2024-10-17
Ecologist → “Scientific Programmer & Educator”
Attempted (unsuccessfully) to use HPC as PhD student
Successfully used HPC as postdoc
Requirement to use shell commands
Uncomfortable way of editing and running R code
Not seeing HPC resources as “for me”
Key skills that empowered me to use HPC while minimizing time outside of my comfort zone:
GitHub
renv 📦 for managing R package dependencies
Open OnDemand
targets 📦
I can avoid the command line entirely:
RStudio file pane for upload/download of files
RStudio git pane for interacting with git/GitHub
Run parallel R code on HPC cores without SLURM
Cons: can’t load additional modules (?)
targets
_targets.R
R/ with custom functions fit_model() and plot_model()
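As an illustration of what a file in R/ might hold, here is a minimal sketch of the two custom functions. Only the names fit_model() and plot_model() come from the slide; the function bodies, the column names x and y, and the use of ggplot2 are assumptions made for the example.

```r
# R/functions.R (hypothetical file name)
library(ggplot2)

# Fit a simple linear regression.
# Assumes `data` has numeric columns `x` and `y` (hypothetical names).
fit_model <- function(data) {
  lm(y ~ x, data = data)
}

# Plot residuals against fitted values and return the ggplot object,
# so that targets can store the plot as the value of a target.
plot_model <- function(model) {
  df <- data.frame(fitted = fitted(model), residual = resid(model))
  ggplot(df, aes(x = fitted, y = residual)) +
    geom_point() +
    geom_hline(yintercept = 0, linetype = "dashed")
}
```

Keeping each step as a function that returns a value (rather than printing or writing files as a side effect) is what lets targets track and cache the result.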
Visualize pipeline with tar_visnetwork()
Run pipeline with tar_make()
✔ skipped target file
✔ skipped target data
▶ dispatched target model
● completed target model [3.008 seconds, 2.879 kilobytes]
▶ dispatched target plot
● completed target plot [0.101 seconds, 1.081 kilobytes]
▶ ended pipeline [3.779 seconds]
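The log above names four targets: file, data, model, and plot. A _targets.R along the following lines would be consistent with it. This is a sketch, not the pipeline actually used in the talk: the input path "data.csv" and the package list are assumptions, and fit_model() and plot_model() are the custom functions expected in R/.

```r
# _targets.R (sketch; the real pipeline may differ)
library(targets)
tar_source()  # source the custom functions stored in R/
tar_option_set(packages = "ggplot2")  # assumed package dependency

list(
  tar_target(file, "data.csv", format = "file"),  # track the input file for changes
  tar_target(data, read.csv(file)),               # re-read only when the file changes
  tar_target(model, fit_model(data)),             # re-fit only when data or fit_model() changes
  tar_target(plot, plot_model(model))             # re-plot only when model or plot_model() changes
)
```

In the log, file and data are skipped because nothing upstream of them changed, while model and plot are rebuilt.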
crew
We can set up a crew controller to run targets in parallel.
_targets.R
library(targets)
tar_source()  # source custom functions from R/
tar_option_set(
  packages = c("dplyr", "ggplot2"),
  # run up to 3 targets at a time in local R worker processes
  controller = crew::crew_controller_local(workers = 3)
)
list(
  tar_target(file, "data.csv", format = "file"),
  tar_target(data, read.csv(file)),
  # the three models depend only on data, so they can run in parallel
  tar_target(model1, fit_model1(data)),
  tar_target(model2, fit_model2(data)),
  tar_target(model3, fit_model3(data))
)
This “local” controller also works on Open OnDemand!
crew.cluster
Use SLURM (or PBS, SGE, etc.) without writing a bash script!
crew.cluster::crew_controller_slurm(
  workers = 5,                       # up to 5 SLURM jobs at once
  slurm_partition = "standard",
  slurm_time_minutes = 1200,         # 20-hour wall time per worker
  slurm_log_output = "logs/crew_log_%A.out",
  slurm_log_error = "logs/crew_log_%A.err",
  slurm_memory_gigabytes_per_cpu = 5,
  slurm_cpus_per_task = 2,
  # extra lines added to the generated job script
  script_lines = c(
    "#SBATCH --account kristinariemer",
    "module load R"
  ),
  seconds_idle = 600                 # shut down workers idle for 10 minutes
)
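The SLURM controller plugs into the pipeline the same way as the local one, through tar_option_set(). The sketch below uses a subset of the arguments shown above; the argument names follow the slide, and newer crew.cluster releases may organize the slurm_* settings differently, so check the documentation for the installed version.

```r
# _targets.R (sketch): swap the local controller for a SLURM one
library(targets)
tar_source()

tar_option_set(
  controller = crew.cluster::crew_controller_slurm(
    workers = 5,
    seconds_idle = 600,
    slurm_partition = "standard",   # partition name taken from the slide
    script_lines = "module load R"  # site-specific; adjust for your cluster
  )
)

# The target list itself is unchanged from the local-controller version.
list(
  tar_target(file, "data.csv", format = "file"),
  tar_target(data, read.csv(file)),
  tar_target(model1, fit_model1(data))
)
```

Because only the controller changes, the same pipeline can be developed locally or on Open OnDemand and then moved to SLURM workers without editing the targets.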
Links to relevant tutorials for prerequisite skills
Example targets pipeline
Uses renv for package management
Example crew controllers with all required fields set
Includes run.sh to launch targets::tar_make() as a SLURM job
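For someone starting from such a template, the first interactive session might look like the sketch below. It assumes the repository ships an renv.lock file and a _targets.R, and that it is opened as an RStudio project (for example through Open OnDemand); both calls are standard renv and targets functions.

```r
# Run once after cloning the template repository (hypothetical setup)
renv::restore()          # install the package versions recorded in renv.lock

# Inspect and run the pipeline defined in _targets.R
targets::tar_visnetwork()  # view the dependency graph
targets::tar_make()        # build any targets that are out of date
```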
Collaborative workshops led by HPC RSEs & Domain RSEs
Offer workshops on using HPC without the command line
HPC workshops tailored to R/RStudio users
Create a template repo for using targets on your HPC
crew technical details
nanonext, R bindings for NNG (Nanomsg Next Gen), which powers…
crew.cluster
seconds_idle