Habitat connectivity analysis involves expensive raster operations: buffering, masking, and patch identification. Depending on the size of the raster or vector files, these operations can take minutes to hours to run. When you are iterating on your analysis (trying different buffer distances, updating input data, or fine-tuning parameters), re-running everything from scratch is slow. It is also easy to lose track of which results are up to date, so you end up running it all again just to be sure.
In an ideal world, you would only re-run code that has changed, or whose inputs have changed.
The targets package does exactly this. You can think of it as a kind of intelligent caching: it tracks every input and output in your pipeline, along with the code that produces them, and only re-runs the steps whose code or upstream inputs have changed. If you add a new interpatch distance (with one branch of the pipeline per distance), targets re-runs only the connectivity step for that distance, not the data preparation or the other interpatch distances.
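As a toy illustration of that last point, the sketch below uses targets' dynamic branching (`pattern = map(...)`) in a throwaway pipeline created in a temporary directory with `tar_dir()`. It assumes only that the targets package is installed. The names echo this vignette, but `interpatch_distance * 2` is a placeholder, not an urbioconnect function. After a third distance is added, the branches for 10 and 20 are reused from the cache and only the new branch is built.

```r
library(targets)

# Toy pipeline: "connectivity" branches over each interpatch distance.
# interpatch_distance * 2 is a placeholder for the real connectivity step.
connectivity_values <- tar_dir({
  tar_script({
    list(
      tar_target(interpatch_distance, c(10, 20)),
      tar_target(
        connectivity,
        interpatch_distance * 2,
        pattern = map(interpatch_distance)  # one branch per distance
      )
    )
  }, ask = FALSE)
  tar_make(reporter = "silent")  # first run: builds the branches for 10 and 20

  # Add a third distance. On the next run the branches for 10 and 20 are
  # up to date and skipped; only the branch for 30 is built.
  tar_script({
    list(
      tar_target(interpatch_distance, c(10, 20, 30)),
      tar_target(
        connectivity,
        interpatch_distance * 2,
        pattern = map(interpatch_distance)
      )
    )
  }, ask = FALSE)
  tar_make(reporter = "silent")
  tar_read(connectivity)  # combined values from all three branches
})
```

In an interactive project, `tar_progress_branches()` run after the second `tar_make()` summarises how many branches were skipped versus built.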
urbioconnect works well in a targets pipeline, and
this vignette unpacks an example pipeline and describes how it works.
We first discuss a minimal pipeline, before going on to add multiple interpatch distances, and then finally demonstrate how to take advantage of parallel processing.
_targets.R

The following _targets.R file uses the built-in lizard
example data and runs the connectivity analysis at a single interpatch
distance.
Place this code in a file in the root of your project directory, and
name it _targets.R:
# _targets.R
library(geotargets)
library(tarchetypes)
library(targets)
library(terra)
library(urbioconnect)
## Source any R files in the R/ directory (e.g. your own helper functions)
tar_source()
## Assign with <- as in regular R, piping each command into a target factory such as tar_target() or tar_terra_rast()
tar_assign({
species <- tar_target("Blue-Tongued-Lizard")
data_resolution <- tar_target(10)
interpatch_distance <- tar_target(10)
barrier <- example_barrier() |> tar_terra_rast()
habitat <- example_habitat() |> tar_terra_rast()
barrier_mask <- create_barrier_mask(barrier) |> tar_terra_rast()
remaining <- drop_habitat_under_barrier(habitat, barrier_mask) |>
tar_terra_rast()
buffered_habitat <- habitat_buffer(remaining, interpatch_distance = interpatch_distance) |>
tar_terra_rast()
fragmentation_raster <- fragment_habitat(buffered_habitat, barrier_mask) |>
tar_terra_rast()
# get IDs of connected areas
# intersect with habitat to get area IDs of habitat patches
patches <- assign_patches_to_fragments(remaining, fragmentation_raster) |>
add_patch_area() |>
tar_terra_rast()
areas <- aggregate_connected_patches(patches) |>
tar_target()
# or as one step
areas_connected <- habitat_connectivity(
habitat = habitat,
barrier = barrier,
species = species,
interpatch_distance = interpatch_distance
) |>
tar_target()
results_connect_habitat <- summarise_connectivity(
connectivity = areas_connected
) |>
tar_target()
})

The tar_assign({ ... }) block does something special: it lets us
use <- as we do in normal R, while marking each assignment as a
target in the pipeline.
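To see what `tar_assign()` is doing, the sketch below writes two small pipelines to temporary directories, one with `tar_assign()` and one with the classic `list(tar_target(name, command))` form, and compares their manifests. It assumes targets, tarchetypes (which provides `tar_assign()`) and R 4.1 or later (for `|>`) are available. `manifest_of()` is a hypothetical helper, and `habitat_buffer()` is only named, never run, because `tar_manifest()` inspects the pipeline without executing its commands.

```r
library(targets)

# Pipeline written with tar_assign()
assign_style <- c(
  "library(targets)",
  "library(tarchetypes)",
  "tar_assign({",
  "  interpatch_distance <- tar_target(10)",
  "  buffered_habitat <- habitat_buffer(remaining, interpatch_distance = interpatch_distance) |>",
  "    tar_target()",
  "})"
)

# The same pipeline in the classic list(tar_target(name, command)) form
list_style <- c(
  "library(targets)",
  "library(tarchetypes)",
  "list(",
  "  tar_target(interpatch_distance, 10),",
  "  tar_target(",
  "    buffered_habitat,",
  "    habitat_buffer(remaining, interpatch_distance = interpatch_distance)",
  "  )",
  ")"
)

# Hypothetical helper: write a script into a temporary directory and return
# the target names and commands that targets sees in it.
manifest_of <- function(script_lines) {
  tar_dir({
    writeLines(script_lines, "_targets.R")
    as.data.frame(tar_manifest(fields = command))
  })
}

same_pipeline <- identical(manifest_of(assign_style), manifest_of(list_style))
same_pipeline
```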
Each of the values below is declared as a target with
tar_target(), so targets watches it for changes:
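species <- tar_target("Blue-Tongued-Lizard")
data_resolution <- tar_target(10)
interpatch_distance <- tar_target(10)

This means that if any of these values change, say
interpatch_distance changes from 10 to 20, then every target
downstream of it (buffered_habitat, fragmentation_raster, patches,
areas, areas_connected, and results_connect_habitat) becomes outdated
and is re-run the next time the pipeline runs, while habitat, barrier,
barrier_mask, and remaining are left as they are.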
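A small sketch of this behaviour, assuming only the targets package is installed: a toy pipeline with the same dependency shape as the lizard example (placeholder commands, not urbioconnect calls) is run once, then interpatch_distance is edited from 10 to 20, and `tar_outdated()` lists what the next `tar_make()` would re-run. `write_pipeline()` is a hypothetical helper. `habitat` does not depend on the distance, so it should not appear in the list.

```r
library(targets)

# Hypothetical helper: write a toy _targets.R for a given distance.
# habitat does not depend on interpatch_distance; buffered_habitat does.
write_pipeline <- function(distance) {
  lines <- c(
    "library(targets)",
    "list(",
    "  tar_target(habitat, 1:5),",
    sprintf("  tar_target(interpatch_distance, %s),", distance),
    "  tar_target(buffered_habitat, habitat + interpatch_distance),",
    "  tar_target(areas_connected, sum(buffered_habitat))",
    ")"
  )
  writeLines(lines, "_targets.R")
}

outdated <- tar_dir({
  write_pipeline(10)
  tar_make(reporter = "silent")
  before <- tar_outdated()  # nothing is outdated right after a run
  write_pipeline(20)        # change interpatch_distance from 10 to 20
  after <- tar_outdated()   # targets the next tar_make() would re-run
  list(before = before, after = after)
})
outdated$after
```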
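Two targets in the pipeline are set up differently:

barrier <- example_barrier() |> tar_terra_rast()
habitat <- example_habitat() |> tar_terra_rast()

Here example_habitat() returns an example habitat raster, loaded
from a file that ships with the package:

example_habitat()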
#> class : SpatRaster
#> size : 763, 766, 1 (nrow, ncol, nlyr)
#> resolution : 2, 2 (x, y)
#> extent : 326109.6, 327641.6, 5820362, 5821888 (xmin, xmax, ymin, ymax)
#> coord. ref. : GDA94 / MGA zone 55 (EPSG:28355)
#> source : lizard_habitat_raster.tif
#> name : Pseudo Layer
#> min value : 1
#> max value : 1

Rather than tar_target(), these targets use tar_terra_rast().
terra raster objects (SpatRaster) are pointers to data held in C++
memory, so they cannot be saved to and restored from the targets cache
with the default storage format; they need to be written to and read
back from a raster file instead.
This is made possible by the R package geotargets, which
extends targets to handle geospatial objects. Read more at https://github.com/ropensci/geotargets.
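A minimal round trip through `tar_terra_rast()`, assuming targets, geotargets and terra are installed: a tiny in-memory raster stands in for example_habitat(), geotargets writes it to a raster file in the targets store, and `tar_read()` returns a SpatRaster read back from that file. `toy_habitat` is a hypothetical target name.

```r
library(targets)

raster_values <- tar_dir({
  tar_script({
    library(geotargets)
    list(
      # A 2 x 2 raster standing in for example_habitat()
      tar_terra_rast(
        toy_habitat,
        terra::rast(nrows = 2, ncols = 2, vals = c(1, 0, 1, 1))
      )
    )
  }, ask = FALSE)
  tar_make(reporter = "silent")
  # tar_read() returns a SpatRaster backed by the file geotargets wrote
  r <- tar_read(toy_habitat)
  as.vector(terra::values(r))
})
raster_values
```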
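In a project using real data, you would replace
example_habitat() and example_barrier() with
your own loading code. A common pattern is to track the raster file on
disk as its own target (for example with tar_target(..., format = "file"))
and read it in a tar_terra_rast() target. Targets then re-runs the
loading step only if the file, habitat_file, changes.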
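A sketch of such loading code, assuming targets, tarchetypes, geotargets and terra are installed, and that your habitat raster lives at data/habitat.tif (a hypothetical path; a tiny raster is written there so the example is self-contained). `habitat_file` is tracked with `format = "file"`, so targets hashes the file itself, and `habitat` reads it with `terra::rast()`. After the file is overwritten, `tar_outdated()` reports both targets as out of date.

```r
library(targets)

result <- tar_dir({
  dir.create("data")
  # Hypothetical input file standing in for your own habitat raster
  terra::writeRaster(terra::rast(nrows = 2, ncols = 2, vals = 1), "data/habitat.tif")

  tar_script({
    library(geotargets)
    library(tarchetypes)
    tar_assign({
      # format = "file": the target's value is a path, and targets hashes the file
      habitat_file <- tar_target("data/habitat.tif", format = "file")
      habitat <- terra::rast(habitat_file) |> tar_terra_rast()
    })
  }, ask = FALSE)
  tar_make(reporter = "silent")
  unchanged <- tar_outdated()  # file untouched: nothing to re-run

  # Edit the input file on disk
  terra::writeRaster(
    terra::rast(nrows = 2, ncols = 2, vals = 0),
    "data/habitat.tif",
    overwrite = TRUE
  )
  edited <- tar_outdated()     # the file target and its reader are now outdated
  list(unchanged = unchanged, edited = edited)
})
result$edited
```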
The rest of the code follows the same steps as the other examples;
the most important difference is that every step must be
declared as a target, using tar_target(), tar_terra_rast(), or
a similar target factory.
From an R session in your project directory, run the pipeline with:

targets::tar_make()

On the first run, every target is computed and cached. When you run it again, only out-of-date targets are re-computed.
If you need to force everything to re-run, clear the cached results (for example with targets::tar_invalidate() or targets::tar_destroy()) and call tar_make() again.
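One way to force a full re-run, sketched on a toy pipeline and assuming only the targets package is installed: `tar_invalidate(everything())` deletes the metadata for every target (the stored values are kept), so `tar_outdated()` reports all of them and the next `tar_make()` rebuilds everything. `tar_destroy()` goes further and deletes the whole _targets/ store; it may ask for confirmation in an interactive session.

```r
library(targets)

invalidated <- tar_dir({
  tar_script({
    list(
      tar_target(interpatch_distance, 10),
      tar_target(buffered, interpatch_distance * 2)
    )
  }, ask = FALSE)
  tar_make(reporter = "silent")
  # Drop the metadata of every target so all of them count as outdated.
  # everything() is a tidyselect helper.
  tar_invalidate(everything())
  tar_outdated()
})
invalidated
```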
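You can load individual targets back into your R session with
tar_load(), for example tar_load(results_connect_habitat), which creates a
variable named after the target. tar_read() returns a target's value
instead, so you can assign it to a name of your choice.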
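A toy sketch of loading results back, assuming only the targets package is installed: `tar_load()` creates a variable named after the target, while `tar_read()` returns the value so you can assign it yourself. Here `tar_load()` is pointed at a fresh environment so the example does not create variables in your workspace.

```r
library(targets)

loaded <- tar_dir({
  tar_script({
    list(
      tar_target(interpatch_distance, 10),
      tar_target(buffered, interpatch_distance * 2)
    )
  }, ask = FALSE)
  tar_make(reporter = "silent")
  env <- new.env()
  tar_load(buffered, envir = env)  # creates a variable called `buffered` in env
  value <- tar_read(buffered)      # returns the value instead of assigning it
  list(loaded = env$buffered, read = value)
})
loaded$read
```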
This is a very simple demonstration of using targets. For more complex examples, including Quarto report generation and parallel execution, see:
https://github.com/urbio-ecology/urbio-eco-targets
That repository demonstrates:

- prepare_rasters()
- plot_barrier_habitat_interpatch_dist()
- geotargets to store terra rasters natively in the targets cache