December 4, 2024
Understand the role of GitHub Actions workflows in (research) software development
Know how to trigger workflows in several different ways and determine which trigger is useful for different scientific applications
Be able to export and access data created by workflows in a variety of ways
Use GitHub Actions for distributed computing
Repository: a folder with your code and data in it with changes tracked by git
GitHub: a cloud platform for syncing git repositories, publishing websites, running automated workflows (this workshop), and more
Run almost any workflow on one or more virtual machines in the cloud, in a way that integrates with GitHub
Easily incorporate workflows for common, complex tasks created by others
Designed for continuous integration & delivery (CI/CD)—software development practices that translate to scientific code and data.
Automate integrating changes in code or data into the main version of your project in a safe way. For example:
When data is updated, run some data validation checks
Before incorporating changes from a collaborator, make sure their code adheres to a particular style
“Delivery” is basically any way you make your data, code, or code outputs available to view or download. For example:
Every month, archive a new version of the data with Zenodo and get a new DOI
When code or data is updated, re-render a report for collaborators
Workflows are defined with YAML files placed in .github/workflows/
on:
  workflow_dispatch:
jobs:
  hello-world:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup R
        uses: r-lib/actions/setup-r@v2 # installs R
      - name: Run my code
        run: Rscript mycode.R
workflow_dispatch = a button on GitHub
hello-world = the job
mycode.R = the script the job runs
Go to template repository
Click the green “Use this template” button and select “create new repository”
For “Owner” choose your own GitHub username.
To use actions that publish to GitHub pages, your repository must be public!
In the “Actions” tab, find the “hello_world” example and click “Run workflow”
renv
📦renv analyzes your code and creates a renv.lock file that records the R version and the version and source of each package your project uses.
This can be used with r-lib/actions/setup-renv to install all the required R packages.
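As a minimal sketch of how this fits into a workflow, the steps below restore the packages from renv.lock before running a script. It assumes renv.lock is committed at the root of the repository; the job name is hypothetical and mycode.R is reused from the hello-world example.

```yaml
on:
  workflow_dispatch:

jobs:
  run-with-renv:                  # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup R
        uses: r-lib/actions/setup-r@v2
      - name: Install the packages recorded in renv.lock
        uses: r-lib/actions/setup-renv@v2
      - name: Run my code
        run: Rscript mycode.R
```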
Workflows are triggered by events that occur in your GitHub repository and defined under on:
push: triggered whenever changes are pushed to your repository
pull_request: runs on a “pull request”, a common way of making changes to a repository
workflow_dispatch: creates a “run workflow” button that you can click on GitHub
schedule: runs on a timetable, e.g. on the first Monday of every month
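The four triggers can be combined in a single on: block. The sketch below is illustrative only: the branch name main is an assumption, and the schedule uses the 1st of each month rather than the first Monday because a fixed day of the month is simpler to express in cron syntax. Scheduled times are interpreted as UTC.

```yaml
on:
  push:
    branches: [main]          # assumption: the default branch is named main
  pull_request:               # runs for pull requests
  workflow_dispatch:          # adds the "Run workflow" button
  schedule:
    - cron: "0 9 1 * *"       # 09:00 UTC on the 1st of every month
```

A workflow rarely needs all four; pick the trigger that matches how often the data or code actually changes.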
Workflows contain one or more jobs that run on virtual machines, called runners.
You can use GitHub-provided runners for free
| runs-on: | OS | CPUs | RAM | Storage |
|---|---|---|---|---|
| ubuntu-latest | Ubuntu Linux | 4 | 16 GB | 14 GB |
| windows-latest | Windows | 4 | 16 GB | 14 GB |
| macos-latest | macOS (M1) | 3 | 7 GB | 14 GB |
Each job may contain multiple steps. A step can either run a script or use an action.
Actions are pre-packaged workflows for common complex tasks.
Common actions:
actions/checkout: gets the files in your repository
r-lib/actions/setup-r: installs R
actions/setup-python: installs Python
r-lib/actions/setup-renv: if your project uses renv, installs the R packages in renv.lock
Find additional actions in the marketplace.
Try searching the web for “<thing you want to do> GitHub action”
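To make the distinction between the two kinds of step concrete, here is a small sketch of a job that mixes them: uses: calls an action, run: executes a shell command. The job name, Python version, and script name are hypothetical.

```yaml
on:
  workflow_dispatch:

jobs:
  analysis:                               # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4         # action: gets the files in your repository
      - uses: actions/setup-python@v5     # action: installs Python
        with:
          python-version: "3.12"          # assumption: any supported version works here
      - name: Run a script                # script step: any shell command
        run: python analysis.py           # hypothetical script name
```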
Environment variables can be set in workflows with the env: key and accessed in R code with Sys.getenv()
Secrets can be stored in GitHub and accessed in workflows with the ${{ secrets.<name> }} expression
A matrix strategy can be used for iteration to spawn parallel runners.
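A short sketch of environment variables and secrets, under stated assumptions: SITE_ID and API_KEY are hypothetical names, and the secret would first need to be created in the repository settings on GitHub. The R one-liners only show that the values are visible through Sys.getenv().

```yaml
on:
  workflow_dispatch:

jobs:
  use-env:                                # hypothetical job name
    runs-on: ubuntu-latest
    env:
      SITE_ID: "site-01"                  # hypothetical plain variable
      API_KEY: ${{ secrets.API_KEY }}     # hypothetical secret stored in GitHub
    steps:
      - uses: actions/checkout@v4
      - uses: r-lib/actions/setup-r@v2
      - name: Read the variables from R
        run: |
          Rscript -e 'cat("Site:", Sys.getenv("SITE_ID"), "\n")'
          Rscript -e 'stopifnot(nzchar(Sys.getenv("API_KEY")))'
```

Secret values are masked in the workflow logs, so the second command checks only that the secret is non-empty instead of printing it.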
validate.yaml
Runs R/validate.R, which either errors or doesn’t
testthat.yaml
Runs testthat.R, which uses the testthat R package
Pros: all tests are run even with multiple errors
Cons: more complicated setup; testthat is usually for R packages
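The simpler of the two approaches, running a validation script that either errors or doesn't, might look roughly like the sketch below. This is not the template repository's actual validate.yaml: the data/ path filter and the use of renv are assumptions.

```yaml
on:
  push:
    paths:
      - "data/**"                         # assumption: data files live in data/
  workflow_dispatch:

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: r-lib/actions/setup-r@v2
      - uses: r-lib/actions/setup-renv@v2 # assumption: the project uses renv
      - name: Validate data
        run: Rscript R/validate.R         # an R error gives a non-zero exit, which fails the job
```

Because the script stops at the first error, a failed run reports only one problem at a time, which is the trade-off against the testthat approach.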
render_readme.yaml
Renders a .qmd (Quarto) file to GitHub Flavored Markdown (GFM), which GitHub renders into HTML
Pros: relatively simple with the quarto-actions/render action
Cons: must commit results in order to see them, which could cause git confusion! GFM doesn’t support all Quarto HTML features
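A rough sketch of this render-then-commit pattern follows. It is not the template's actual render_readme.yaml: the file name README.qmd is hypothetical, and the sketch assumes the document has no R code chunks (otherwise the R setup steps would be needed before rendering).

```yaml
on:
  workflow_dispatch:

permissions:
  contents: write                         # lets the job push the rendered file

jobs:
  render-readme:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: quarto-dev/quarto-actions/setup@v2
      - name: Render to GitHub Flavored Markdown
        uses: quarto-dev/quarto-actions/render@v2
        with:
          to: gfm
          path: README.qmd                # hypothetical file name
      - name: Commit the rendered file
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add README.md
          git commit -m "Re-render README" || echo "No changes to commit"
          git push
```

The commit made by the workflow is the source of the "git confusion" noted above: collaborators need to pull before pushing again, or their local branch will be behind.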
validation_report.yaml
Renders a validation report to a webpage served on GitHub
Pros: all Quarto HTML features supported, full-fledged website you can send to your collaborators
Cons: must run quarto publish gh-pages locally once before the action works, repository must be public
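Publishing to GitHub Pages can be sketched as below, assuming quarto publish gh-pages has already been run once locally and that the project uses renv. The job name is hypothetical and the template's actual validation_report.yaml may differ.

```yaml
on:
  workflow_dispatch:

permissions:
  contents: write                         # needed to push to the gh-pages branch

jobs:
  publish-report:                         # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: quarto-dev/quarto-actions/setup@v2
      - uses: r-lib/actions/setup-r@v2
      - uses: r-lib/actions/setup-renv@v2 # assumption: the project uses renv
      - name: Render and publish to the gh-pages branch
        uses: quarto-dev/quarto-actions/publish@v2
        with:
          target: gh-pages
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```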
matrix.yaml
Uses the matrix: key to iterate over multiple weather stations, pull data, fit a model, and combine model summaries
Pros: Scheduling a task like this is easier on GitHub Actions than on, say, an HPC cluster
Cons: Limits on the number of concurrent runners and on the computational power of each runner
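The fan-out and combine pattern described above could be sketched like this. The station IDs, script names, and results/ directory are all hypothetical; the first job runs once per station in parallel, and the second job waits for all of them and gathers their outputs.

```yaml
on:
  workflow_dispatch:

jobs:
  fit-model:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        station: [station-a, station-b, station-c]   # hypothetical station IDs
    env:
      STATION: ${{ matrix.station }}      # read in R with Sys.getenv("STATION")
    steps:
      - uses: actions/checkout@v4
      - uses: r-lib/actions/setup-r@v2
      - name: Pull data and fit a model for one station
        run: Rscript R/fit_model.R        # hypothetical script; writes one file per station to results/
      - uses: actions/upload-artifact@v4
        with:
          name: summary-${{ matrix.station }}
          path: results/

  combine:
    needs: fit-model                      # runs after every matrix job has finished
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: r-lib/actions/setup-r@v2
      - uses: actions/download-artifact@v4
        with:
          pattern: summary-*
          path: results/
          merge-multiple: true            # put all downloaded files in one directory
      - name: Combine model summaries
        run: Rscript R/combine.R          # hypothetical script
```

Each station's output file should have a distinct name, since the downloaded artifacts are merged into a single directory. Artifacts are also one way to download data created by a workflow from the run's page on GitHub.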
US-RSE ’24 tutorial: GitHub Actions for Scientific Data Workflows
Virtual drop-in hours every Wednesday from 2-3pm
Join our email list