2025-02-26
https://datascience.cct.arizona.edu/
We offer:
Workshops
Drop-in support hours
Incubator projects for ALVSCE
Collaboration on funded projects
You’ll need the following R packages today:
targets
tarchetypes
usethis
tidyverse
visNetwork
Understand what problems targets (and workflow managers in general) solve
Be able to set up a simple project using targets, view the dependency graph, and run the workflow
Write a (good enough) custom R function
Have an awareness of the possibilities: parallelization, cloud storage, iteration, Bayesian analyses, geospatial targets
Know where to go for help with targets
Working toward reproducible analyses benefits:
You in the future
Your collaborators
The greater community
Image from the Openscapes blog Tidy Data for reproducibility, efficiency, and collaboration by Julia Lowndes and Allison Horst
Automatically detect dependencies
Run your entire analysis with one master command
Skip steps that don’t need to be re-run
Scalable

To install a demo targets pipeline, run the following R code and follow the prompts:
usethis::use_course("cct-datascience/targets-demo")
In this new project, try the following:
library(targets)
tar_visnetwork() # view dependency graph
tar_make() # run the pipeline
tar_visnetwork()
tar_read(lm_full) # view the "lm_full" target
Now, in R/fit_models.R, change the response variable from flipper_length_mm to bill_length_mm and run tar_visnetwork() again.
The result of each step is stored as an R object in _targets/
Everything that happens to make that target is written as a function
_targets.R
R/
_targets/
Generated by tar_make()
Contains targets saved as R objects
Should not be tracked in version control
refactor
/rēˈfaktər/
verb
- restructure (the source code of an application or piece of software) so as to improve operation without altering functionality.
First, let’s start with a “traditional” R analysis project. Download and take a look around:
usethis::use_course("cct-datascience/targets-refactor")
use_targets() to set up infrastructure including _targets.R
tar_option_set()
tar_make()
We’re going to walk through this process together
use_targets()
Exercise
Run use_targets() and open the _targets.R file it generates
Exercise
Figure out what packages are needed and add them to tar_option_set() in _targets.R.
_targets.R
tar_option_set(
packages = c(
"tidyverse",
"lubridate",
#add more packages here
),
format = "rds"
)
Tip
This would also be a good time to install any necessary packages with install.packages()!
Functions in R are created with the function() function
add_ten <- #name of function
function(x) { #argument names and defaults defined here
x + 10 #what the function does
}
add_ten(x = 13)
[1] 23
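Arguments can also be given default values, which are used whenever the caller omits them:

```r
# `n` has a default value of 10
add_n <- function(x, n = 10) {
  x + n
}

add_n(13)        # uses the default n = 10
#> [1] 23
add_n(13, n = 2) # overrides the default
#> [1] 15
```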
Using multiple arguments:
to_fractional_power <-
function(x, numerator, denominator) {
power <- numerator / denominator
x^power
}
to_fractional_power(27, 1, 3)
[1] 3
Important
A function returns the value of its last evaluated expression, so don’t end a function by assigning the result with <- (the value would be returned invisibly)!
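For example, a function that ends with an assignment returns its value invisibly, so nothing prints at the console:

```r
# Ends with an assignment: the value is returned *invisibly*
bad_mean <- function(x) {
  result <- mean(x)
}

# Ends with an expression: the value is returned (and printed) as expected
good_mean <- function(x) {
  mean(x)
}

bad_mean(1:3)  # prints nothing at the console
good_mean(1:3)
#> [1] 2
```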
WORSE:
m1 <- function(x) {
lm(stem_length ~ watershed, data = x)
}
BETTER:
# Fit linear model to explain stem length as a function of watershed treatment
# data_clean = a tibble; the `data_clean` target
fit_model <- function(data_clean) {
lm(stem_length ~ watershed, data = data_clean)
}
Exercise
Convert the code in 01-read_wrangle_data.R into a function that takes the file path to the raw data as an argument and returns the cleaned maples data frame.
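One possible shape for the answer (a sketch: the column names `year` and `stem_length` and the cleaning steps are hypothetical; adapt them to the actual raw data):

```r
library(dplyr)

# Hypothetical sketch: read the raw CSV and return a cleaned data frame
clean_maples <- function(file) {
  read.csv(file) |>
    filter(!is.na(stem_length)) |>   # drop rows with missing measurements
    mutate(year = as.integer(year))  # ensure year is an integer
}
```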
Steps in the workflow are defined by the tar_target() function inside of the list() at the end of _targets.R
_targets.R
list(
tar_target(name = file, command = "data/penguins_raw.csv", format = "file"),
tar_target(name = data, command = get_data(file))
)
Usually you only need to use the first two arguments, name for the name of the target, and command for what that target does
Targets that are files (or create files) need the additional argument format = "file" (for files on disk) or format = "url" (for files on the web)
Tip
A common mistake is to leave a trailing comma after the last target. This will result in the error: Error running targets::tar_make() Last error: argument 5 is empty
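The error comes from base R itself: a trailing comma in any function call creates an empty argument. A minimal demonstration:

```r
# Any R call with a trailing comma has an "empty argument":
list(1, 2, )
#> Error in list(1, 2, ): argument 3 is empty
```

The same thing happens with the list() of tar_target() calls at the end of _targets.R, where the argument number in the message corresponds to the position after the last target.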
Use a naming convention (nouns are good)
Use concise but descriptive target names
WORSE:
data1, data2, data3
histogram_by_site_plot
BETTER:
data_file, data_raw, data_clean
plot_hist_site
Exercise
Create targets for the input CSV file and the results of your data wrangling function in _targets.R. Remember to use format = "file" in the target for the CSV file.
Check your progress by running tar_visnetwork() and tar_make().
Once you’ve gotten those first two targets working, try creating functions and targets for additional steps in the analysis.
Error messages from tar_make() are sometimes uninformative because code is run in a separate R session
Use tar_meta() to access error messages:
tar_meta(fields = error, complete_only = TRUE)
When a target errors, you can load a “workspace” with all the functions, data, and packages needed to reproduce the error interactively.
Enable this with:
tar_option_set(
workspace_on_error = TRUE
)
Then, load the workspace with tar_workspace(<NAME OF TARGET>)
#load upstream target `data_clean` and `fit_model()` function into global environment
tar_workspace(linear_model)
#try interactively
fit_model(data_clean)
Error in eval(mf, parent.frame()): object 'df_cleaned' not found
Can you spot the source of the error?
fit_model.R
fit_model <- function(data_clean) {
lm(y ~ x + z, data = df_cleaned)
}
tarchetypes provides alternatives to list() for defining the workflow.
tar_plan() allows a name = command() shortcut
tar_assign() allows the assignment operator (<-) and works well with pipes (%>% or |>).
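For comparison, here is the same style of pipeline written with tar_plan() (a sketch: the file name and variable names are placeholders):

```r
library(targets)
library(tarchetypes)

tar_plan(
  tar_file(data_file, "data.csv"),    # targets needing extra arguments use tar_*() calls
  data = read.csv(data_file) |>       # name = command shortcut
    dplyr::filter(year >= 2020),
  model = lm(y ~ x + z, data = data)
)
```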
_targets.R
tarchetypes::tar_assign({
data_file <- tar_file("data.csv")
data <-
read.csv(data_file) |>
filter(year >= 2020) |>
tar_target()
model <-
lm(y ~ x + z, data = data) |>
tar_target()
})
stantargets provides “target factories” for Bayesian analyses
Simplify analyses by automatically creating targets for multiple steps
E.g. defining a target with tar_stan_mcmc() actually generates multiple targets that wrangle data, run the MCMC, create a table of posterior draws, etc.
geotargets provides helpers to use geospatial packages like terra and stars with targets
Iterate targets over a list of inputs with dynamic branching—useful for large tasks where it would be cumbersome to write out individual targets
E.g. fitting a model to each of 100 bootstrap samples of the data
#creates list of 100 bootstrapped dataframes
tar_target(
data_boot,
purrr::map(1:100, ~sample_frac(data, size = 1, replace = TRUE)),
iteration = "list"
),
# Fit model to each data frame, save results as a list
tar_target(
lm_boot,
fit_lm_full(data_boot),
pattern = map(data_boot),
iteration = "list"
),
In the workflow below, the three models and the three plot targets can all be run independently at the same time.
Provide a “controller” created by the crew package to tar_option_set() to run in parallel using multiple workers.
E.g. to use 3 concurrent R sessions on your computer:
tar_option_set(
controller = crew::crew_controller_local(workers = 3)
)
crew.cluster provides additional “controllers” for HPC, including crew_controller_slurm() for running a targets workflow on the UA HPC
By default, the _targets/ store is on your computer and not shared with collaborators
Collaborators will have to run tar_make() to reproduce the workflow, which might not be convenient if some targets take days or weeks to run
Optionally, _targets/ can be stored in the cloud (Amazon Web Services or Google Cloud S3 buckets)
These stores can be versioned, so you can roll back your _targets.R and not have to re-compute targets.
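A sketch of enabling AWS cloud storage in _targets.R (the bucket name and prefix are placeholders; this assumes AWS credentials are already configured):

```r
library(targets)

# Store targets in an AWS S3 bucket instead of the local _targets/ store
tar_option_set(
  repository = "aws",
  resources = tar_resources(
    aws = tar_resources_aws(
      bucket = "my-lab-bucket",  # placeholder bucket name
      prefix = "_targets"       # placeholder key prefix within the bucket
    )
  )
)
```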
When to use targets?
Things to consider:
Are intermediates R objects or files (as opposed to, say, in-place modifications to a database)?
Benefits of parallelization
Your collaborators
Your comfort using targets
“Real life” examples of targets workflows:
A workflow using geotargets and crew.cluster to run on the UA HPC
A workflow using geotargets
targets manual: https://books.ropensci.org/targets/
targets reference: https://docs.ropensci.org/targets/
targets discussion board: https://github.com/ropensci/targets/discussions
CCT Data Science Team drop-in hours every Wednesday afternoon
Make an appointment with Eric to discuss this content and get troubleshooting help
Sign up for our mailing list to be notified about future workshops
