CCT Data Science, University of Arizona
Jan 31, 2023
Ecologist turned research software engineer
Not an HPC professional or expert
Feels most comfortable never leaving the comfort of RStudio Desktop
Modular workflows & branching create independent targets
use_targets() automatically sets things up
Use tar_make_clustermq() or tar_make_future() to run in parallel (sketch below)
Parallel processes on your computer or jobs on a computing cluster
Potentially easy entry to high performance computing
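As a sketch of what "independent targets" means in practice (the target, file, and column names below are made up for illustration):

# _targets.R (sketch; names are illustrative)
library(targets)

list(
  tar_target(raw_data, read.csv("data.csv")),   # one target per step
  tar_target(summary_a, summary(raw_data$a)),   # these two targets don't
  tar_target(summary_b, summary(raw_data$b))    # depend on each other...
)
# ...so tar_make_clustermq() or tar_make_future() can build them at the same time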
Persistent workers with clustermq
One-time cost to set up workers
System dependency on zeromq
Transient workers with future
Every target gets its own worker (more overhead)
No additional system dependencies
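A rough sketch of how the two backends are configured in _targets.R; the scheduler, plan, and worker counts are examples, not the only options:

# Persistent workers: clustermq needs a scheduler option set once
options(clustermq.scheduler = "multiprocess")  # local processes; "slurm" on a cluster
# targets::tar_make_clustermq(workers = 4)     # fixed pool of 4 workers

# Transient workers: future needs a plan instead
future::plan(future.callr::callr)              # fresh R process per target
# targets::tar_make_future(workers = 4)        # up to 4 targets at a time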
clustermq on a cluster
Install the clustermq R package on the cluster
targets::use_targets() detects the scheduler and writes a template file
clustermq on a cluster: the SLURM template
#!/bin/sh
#SBATCH --job-name={{ job_name }} # job name
#SBATCH --partition=hpg-default # partition
#SBATCH --output={{ log_file | logs/workers/pipeline%j_%a.out }} # log file (%j = job ID, %a = array index)
#SBATCH --error={{ log_file | logs/workers/pipeline%j_%a.err }} # error log
#SBATCH --mem-per-cpu={{ memory | 8192 }} # memory per CPU in MB
#SBATCH --array=1-{{ n_jobs }} # job array: one task per worker
#SBATCH --cpus-per-task={{ cores | 1 }} # cores per worker
#SBATCH --time={{ time | 1440 }} # walltime in minutes
source /etc/profile
ulimit -v $(( 1024 * {{ memory | 8192 }} )) # cap virtual memory (ulimit uses KB; memory is in MB)
module load R/4.0 # R 4.1 not working currently
module load pandoc # for rendering R Markdown
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
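To use a template like this, _targets.R points clustermq at it. A minimal sketch, assuming the file above is saved as slurm_clustermq.tmpl (that file name is an assumption):

# In _targets.R: submit persistent workers through SLURM using the template above
options(
  clustermq.scheduler = "slurm",
  clustermq.template  = "slurm_clustermq.tmpl"  # path to the file shown above
)
# targets::tar_make_clustermq(workers = 4)      # workers become a 4-task job array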
clustermq works
tar_make_clustermq() works
May need to run from command line without RStudio (example below)
Options for using RStudio on the cluster can be clunky
Data store (_targets/) not synced with local computer
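To run from the command line without RStudio (a sketch; the worker count is arbitrary):

# From a cluster shell, no RStudio needed:
#   Rscript -e 'targets::tar_make_clustermq(workers = 4)'
# or from a plain interactive R session on the cluster:
targets::tar_make_clustermq(workers = 4)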
Develop and run workflow on your computer
Targets are sent off to the cluster to be run as SLURM jobs
Results returned and the _targets/ store remains on your computer
Ideal when:
Only some targets need cluster computing
Targets don’t run too long
No comfortable way to use RStudio on the cluster
Configuration lives in two places: ~/.Rprofile on the cluster and _targets.R on your computer.
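A hedged sketch of what those two files typically contain for clustermq's SSH connector; the host name, template path, and log file below are placeholders, not the actual values used:

## ~/.Rprofile on the cluster (placeholders)
options(
  clustermq.scheduler = "slurm",
  clustermq.template  = "~/slurm_clustermq.tmpl"
)

## _targets.R on your computer (placeholders)
options(
  clustermq.scheduler = "ssh",
  clustermq.ssh.host  = "user@hpc.example.edu",  # login node you can SSH into
  clustermq.ssh.log   = "~/cmq_ssh_proxy.log"    # useful for debugging
)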
zeromq: installed it myself because I couldn't get an HPC person to email me back!
The future backend worked, but the overhead was too much to be helpful
targets auto-detects SLURM, but I needed to run it as "multicore"
Template GitHub repo with setup instructions in README
Tell the HPC experts about it
University of Florida:
On the HPC: BrunaLab/hipergator-targets
Using SSH connector: BrunaLab/hipergator-targets-ssh
University of Arizona (WIP): cct-datascience/targets-uahpc