---
title: "Using urbioconnect in a targets pipeline"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using urbioconnect in a targets pipeline}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r load-package}
library(urbioconnect)
```

## Why use targets for connectivity analysis?

Habitat connectivity analysis involves expensive raster operations: buffering, masking, and patch identification. Depending on the size of the raster or vector files, these operations can take minutes to hours to run. When you are iterating on your analysis (trying different buffer distances, updating input data, or fine-tuning parameters), re-running everything from scratch is slow, and it is easy to lose track of which results are up to date, so you end up running everything again just to be safe. Ideally, you would only re-run code that has changed, or whose dependencies have changed.

The [targets](https://docs.ropensci.org/targets/) package does exactly this. You can think of it as a kind of **intelligent caching**: it tracks every input and output in your pipeline and only re-runs the steps whose code or inputs have changed. For example, in a pipeline that analyses several interpatch distances, adding a new distance means targets re-runs only the connectivity step for that distance, not the data preparation or the other interpatch distances.

`urbioconnect` works well in a targets pipeline, and this vignette unpacks an example pipeline, describing how it works. We first walk through a minimal pipeline, then show how to run it and inspect the results, and finally point to a more complex example workflow that includes parallel execution.

## A minimal `_targets.R`

The following `_targets.R` file uses the built-in lizard example data and runs connectivity analysis at one interpatch distance.
Place this code in a file named `_targets.R` in the root of your project directory:

```{r targets-file, eval=FALSE}
# _targets.R
library(geotargets)
library(tarchetypes)
library(targets)
library(terra)
library(urbioconnect)

## Source any R scripts in the R/ directory
tar_source()

## Assign as in regular R, but make sure each right-hand side is piped
## into (or wrapped in) a tar_*() function such as tar_target()
tar_assign({
  species <- tar_target("Blue-Tongued-Lizard")
  target_resolution <- tar_target(500)
  data_resolution <- tar_target(10)
  aggregation_factor <- tar_target(target_resolution / data_resolution)
  interpatch_distance <- tar_target(10)

  barrier <- example_barrier() |>
    tar_terra_rast()

  habitat <- example_habitat() |>
    tar_terra_rast()

  barrier_mask <- create_barrier_mask(barrier) |>
    tar_terra_rast()

  remaining <- drop_habitat_under_barrier(habitat, barrier_mask) |>
    tar_terra_rast()

  buffered_habitat <- habitat_buffer(remaining, interpatch_distance = interpatch_distance) |>
    tar_terra_rast()

  fragmentation_raster <- fragment_habitat(buffered_habitat, barrier_mask) |>
    tar_terra_rast()

  # Get the IDs of connected areas, then intersect them with the habitat
  # to get the area IDs of the habitat patches
  patches <- assign_patches_to_fragments(remaining, fragmentation_raster) |>
    add_patch_area() |>
    tar_terra_rast()

  areas <- aggregate_connected_patches(patches) |>
    tar_target()

  # Or compute connectivity in a single step
  areas_connected <- habitat_connectivity(
    habitat = habitat,
    barrier = barrier,
    interpatch_distance = interpatch_distance
  ) |>
    tar_target()

  results_connect_habitat <- summarise_connectivity(
    area = areas_connected$area,
    interpatch_distance = interpatch_distance,
    target_resolution = target_resolution,
    data_resolution = data_resolution,
    aggregation_factor = aggregation_factor,
    species = species
  ) |>
    tar_target()
})
```

### What each section does

The `tar_assign()` call that wraps the whole pipeline does something special:

```r
tar_assign({

})
```

Inside it, you can use `<-` just as you would in ordinary R, and each assignment becomes a step (a *target*) in the targets pipeline, named after the variable on the left-hand side.
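To make the effect of `tar_assign()` concrete, here is a sketch comparing three of the parameter targets written inside `tar_assign()` with the same targets written in the standard `tar_target(name, command)` form that targets uses without it. In effect, `tar_assign()` moves the name on the left of `<-` into the first argument of the target function, so both versions should define targets with the same names and commands. The object names `with_assign` and `explicit` are illustrative only.

```r
library(targets)
library(tarchetypes)

# Parameter targets written with tar_assign(), as in the pipeline above
with_assign <- tar_assign({
  target_resolution <- tar_target(500)
  data_resolution <- tar_target(10)
  aggregation_factor <- tar_target(target_resolution / data_resolution)
})

# The same targets in the standard tar_target(name, command) form
explicit <- list(
  tar_target(target_resolution, 500),
  tar_target(data_resolution, 10),
  tar_target(aggregation_factor, target_resolution / data_resolution)
)
```

The same rule applies to the raster targets: because R's native pipe rewrites `example_barrier() |> tar_terra_rast()` as `tar_terra_rast(example_barrier())`, the line `barrier <- example_barrier() |> tar_terra_rast()` corresponds to `tar_terra_rast(barrier, example_barrier())`. The standard form is also what you will see in most targets documentation.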
We use `tar_target()` to tell targets to track each of these values:

```r
species <- tar_target("Blue-Tongued-Lizard")
target_resolution <- tar_target(500)
data_resolution <- tar_target(10)
aggregation_factor <- tar_target(target_resolution / data_resolution)
interpatch_distance <- tar_target(10)
```

This means that if any of these values change, say `interpatch_distance` changes from 10 to 20, then every target that depends on `interpatch_distance` is re-run the next time the pipeline runs, while targets that do not depend on it (such as `barrier_mask`) are left as they are.

These parts:

```r
barrier <- example_barrier() |> tar_terra_rast()
habitat <- example_habitat() |> tar_terra_rast()
```

are slightly different. `example_habitat()` creates an example habitat raster:

```{r}
example_habitat()
```

but instead of `tar_target()`, we use `tar_terra_rast()`. This is because terra raster objects (`SpatRaster`) point to data held outside of R, so they cannot be saved to and restored from the targets cache like ordinary R objects. This special handling is provided by the R package [geotargets](https://github.com/ropensci/geotargets), which extends targets to support geospatial objects.

In a project using real data, you would replace `example_habitat()` and `example_barrier()` with your own loading code inside `tar_assign()`, which might look like this:

```r
habitat_file <- tar_file("data/habitat.tif")
habitat <- rast(habitat_file) |> tar_terra_rast()
```

Because `habitat_file` is declared with `tar_file()`, targets tracks the file itself, so `habitat` is re-computed only when the contents of `data/habitat.tif` change.

The rest of the pipeline follows the same steps as in the other examples; the important difference is that every step must be declared as a target, using `tar_target()`, `tar_terra_rast()`, or a similar function.

## Running and inspecting the pipeline

### Running

From an R session in your project directory, run:

```{r run-pipeline, eval=FALSE}
targets::tar_make()
```

On the first run, every target is computed and cached. On subsequent runs, only out-of-date targets are re-computed.
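To check which targets are out of date without running anything, you can use `targets::tar_outdated()`. The sketch below illustrates the caching behaviour on a small toy pipeline (not an urbioconnect pipeline) inside a temporary directory created by `targets::tar_dir()`, so it does not touch your project's cache. The target names `doubled` and `tripled` are hypothetical and chosen only for illustration.

```r
library(targets)

# tar_dir() runs the code in a fresh temporary directory, so this toy
# pipeline never touches your real project or its _targets/ cache
outdated <- tar_dir({
  # Version 1: one parameter and one target that depends on it
  tar_script({
    list(
      tar_target(interpatch_distance, 10),
      tar_target(doubled, interpatch_distance * 2)
    )
  }, ask = FALSE)
  before_first_run <- tar_outdated()   # nothing has been built yet
  tar_make(reporter = "silent")
  after_first_run <- tar_outdated()    # everything is up to date

  # Version 2: the same targets plus one new target
  tar_script({
    list(
      tar_target(interpatch_distance, 10),
      tar_target(doubled, interpatch_distance * 2),
      tar_target(tripled, interpatch_distance * 3)
    )
  }, ask = FALSE)
  after_adding_target <- tar_outdated()  # only the new target

  list(
    before_first_run = before_first_run,
    after_first_run = after_first_run,
    after_adding_target = after_adding_target
  )
})
outdated
```

In this sketch, `before_first_run` should list both targets, `after_first_run` should be empty, and `after_adding_target` should contain only `tripled`, because the commands and inputs of the existing targets have not changed.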
If you need to force everything to re-run, invalidate all targets first:

```{r run-all, eval=FALSE}
library(targets)
tar_invalidate(everything())
tar_make()
```

### Inspecting results

You can load individual targets into your R session using `tar_load()`:

```{r inspect-results, eval=FALSE}
# The combined summary table
targets::tar_load(results_connect_habitat)
results_connect_habitat

# The connectivity results for the interpatch distance used in the pipeline
targets::tar_load(areas_connected)
areas_connected
```

### Visualising the dependency graph

Before running, you can inspect the pipeline graph to check that the dependency structure looks correct (this requires the visNetwork package):

```{r pipeline-graph, eval=FALSE}
targets::tar_visnetwork()
```

## Example workflows

This is a very simple demonstration of using targets. For more complex examples, including Quarto report generation and parallel execution, see:

****

That repository demonstrates:

- Loading real habitat and barrier shapefiles and converting them with `prepare_rasters()`
- Saving habitat interpatch distance plots to files with `plot_barrier_habitat_interpatch_dist()`
- Rendering a Quarto report as a targets artefact
- Using `geotargets` to store terra rasters natively in the targets cache
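As a starting point for your own extensions, here is a sketch of a `_targets.R` that runs connectivity at several interpatch distances in parallel. It is not the approach used in the repository above, and it rests on assumptions: it uses `tarchetypes::tar_map()` to create one connectivity target per distance, and the crew package (assumed to be installed) as a parallel backend via `crew::crew_controller_local()`. The distances 10, 50 and 100 are illustrative values only. The targets are written in the standard `tar_target(name, command)` form because `tar_map()` takes target objects as its arguments.

```r
# _targets.R (sketch): several interpatch distances, run in parallel
library(geotargets)
library(tarchetypes)
library(targets)
library(terra)
library(urbioconnect)

tar_option_set(
  # Packages each parallel worker loads before running a target
  packages = c("geotargets", "terra", "urbioconnect"),
  # Run independent targets on two local worker processes
  # (assumes the crew package is installed; adjust `workers` to suit)
  controller = crew::crew_controller_local(workers = 2)
)

list(
  tar_terra_rast(habitat, example_habitat()),
  tar_terra_rast(barrier, example_barrier()),
  # tar_map() creates one copy of the target below per value of
  # interpatch_distance, named connectivity_10, connectivity_50 and
  # connectivity_100 (illustrative distances only)
  tar_map(
    values = list(interpatch_distance = c(10, 50, 100)),
    tar_target(
      connectivity,
      habitat_connectivity(
        habitat = habitat,
        barrier = barrier,
        interpatch_distance = interpatch_distance
      )
    )
  )
)
```

After `targets::tar_make()`, each distance has its own target, so `targets::tar_load(connectivity_50)` loads the result for an interpatch distance of 50. Adding another distance to `values` creates one new target, and the existing distances stay up to date.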