Demystifying APIs for Researchers

Eric R. Scott

2024-01-24

Learning Objectives

  • Understand what an API (application programming interface) is
  • Get a sense of what kinds of data are available via APIs
  • Understand how to get data from an API into R programmatically

Slides: https://cct-datascience.quarto.pub/demystifying-apis-slides/

What is an API?

Getting data on the web

  • Made for humans
  • Point and click

Example: CAS Common Chemistry

  1. Search for the name of a chemical (e.g. hexanol)
  2. Click on a result to get more info

What’s missing?

There’s no easy way to get the results in a ready-to-analyze form!

Getting data for analysis

This is where an API is useful

  • Made for machines
  • Programmable

APIs exist for many data portals

Getting data directly from an API

Every API is a little different. For CAS Common Chemistry, the API equivalent of https://commonchemistry.cas.org/results?q=hexanol is https://commonchemistry.cas.org/api/search?q=hexanol.

(I’ll show you how I figured this out in a bit)

What am I looking at?

The result of https://commonchemistry.cas.org/api/search?q=hexanol is in a format called JSON (JavaScript Object Notation), which is made of nested "key": value pairs. For example (SVG images abbreviated):

{
  "count": 2,
  "results": [
    {
      "rn": "111-27-3",
      "name": "1-Hexanol",
      "image": "<svg ...</svg>"
    },
    {
      "rn": "25917-35-5",
      "name": "Hexanol",
      "image": "<svg ...</svg>"
    }
  ]
}
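In R, this structure maps directly onto nested lists. The sketch below assumes the jsonlite package is installed. It parses an abbreviated copy of the response above and pulls out individual values by key:

```r
library(jsonlite)

# An abbreviated copy of the search result shown above
json <- '{
  "count": 2,
  "results": [
    {"rn": "111-27-3", "name": "1-Hexanol", "image": "<svg ...</svg>"},
    {"rn": "25917-35-5", "name": "Hexanol", "image": "<svg ...</svg>"}
  ]
}'

# parse_json() keeps the nesting: objects become named lists, arrays become unnamed lists
parsed <- parse_json(json)

parsed$count               # top-level key: 2
parsed$results[[1]]$name   # name of the first result: "1-Hexanol"
parsed$results[[2]]$rn     # CAS Registry Number of the second result: "25917-35-5"
```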

Structure of an API request

Note

This is a fairly standard API, but not all APIs take requests structured in the same way! You’ll have to find the documentation for each particular API to figure out how to build a query.
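Taking the search URL apart shows the pieces this API expects:

- a base URL (https://commonchemistry.cas.org/api)
- an endpoint (search)
- a query (q=hexanol)

The sketch below uses httr2's url_parse(), which only parses the string and sends nothing to the server. The parsed `path` combines the base path and the endpoint.

```r
library(httr2)

url <- "https://commonchemistry.cas.org/api/search?q=hexanol"

# Split the URL into its components (no request is made)
parts <- url_parse(url)

parts$scheme     # "https"
parts$hostname   # "commonchemistry.cas.org" (the domain)
parts$path       # "/api/search" (base path + endpoint)
parts$query      # named list of query parameters: q = "hexanol"
```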

How do you know how to use an API?

Let’s look at some API documentation: https://commonchemistry.cas.org/api-overview

What endpoints are available? What queries can you use?
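For example, the overview page lists more endpoints than just search. The sketch below assumes there is a `detail` endpoint that takes a `cas_rn` query parameter; confirm both names against the documentation before relying on them. It builds a request for the details of 1-hexanol but does not send it:

```r
library(httr2)

# Assumed from the API overview: a "detail" endpoint keyed by CAS Registry Number
req_detail <- request("https://commonchemistry.cas.org/api") |>
  req_url_path_append("detail") |>
  req_url_query(cas_rn = "111-27-3")

# Inspect the URL that would be sent
req_detail$url
```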

Note

This API only allows one HTTP “method”: GET. Other methods exist (e.g. POST), but GET is the most common.
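In httr2, the method is part of the request object. If you don't set one, httr2 uses GET, or POST when the request has a body. The sketch below sets the method explicitly with req_method(). The POST request is built only to show the difference; since this API accepts GET, it should not be sent.

```r
library(httr2)

req <- request("https://commonchemistry.cas.org/api/search")

# Set the HTTP method explicitly
req_get  <- req_method(req, "GET")
req_post <- req_method(req, "POST")  # illustration only; this API accepts GET

req_get$method   # "GET"
req_post$method  # "POST"
```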

Building an API request in R

  • Reproducible queries

  • Iterate over many queries

  • Create reusable functions

library(httr2)

1. Create a request

Start a request with the base URL, then add the endpoint and the query:

req_search <-
  request("https://commonchemistry.cas.org/api") |>  # set the base URL
  req_url_path_append("search") |>                    # append the search endpoint to the URL
  req_url_query(q = "hexanol")                        # add a query
req_search
<httr2_request>
GET https://commonchemistry.cas.org/api/search?q=hexanol
Body: empty
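A request is an ordinary R object, so building many queries is just iteration. The sketch below creates one search request per chemical name and checks the URLs before anything is sent. The names other than hexanol are arbitrary examples.

```r
library(httr2)

chemicals <- c("hexanol", "ethanol", "caffeine")

# One request object per search term; nothing is sent to the server yet
reqs <- lapply(chemicals, function(chem) {
  request("https://commonchemistry.cas.org/api") |>
    req_url_path_append("search") |>
    req_url_query(q = chem)
})

# Check the URLs that would be requested
urls <- vapply(reqs, function(r) r$url, character(1))
urls
```

With httr2 1.0.0 or later, `req_perform_sequential(reqs)` can then send the requests one after another.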

2. Perform the request

resp_search <- 
  req_perform(req_search)
resp_search
<httr2_response>
GET https://commonchemistry.cas.org/api/search?q=hexanol
Status: 200 OK
Content-Type: application/json
Body: In memory (3053 bytes)
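The printed response summarizes fields you can also read individually, which is useful inside scripts. The sketch below builds an offline stand-in for `resp_search` with httr2's `response()` constructor, so it runs without a network connection. The body is a placeholder, not real API output.

```r
library(httr2)

# Offline stand-in for resp_search (placeholder body, not real API output)
resp_fake <- response(
  status_code = 200,
  url = "https://commonchemistry.cas.org/api/search?q=hexanol",
  headers = list("Content-Type" = "application/json"),
  body = charToRaw('{"count": 0, "results": []}')
)

resp_status(resp_fake)        # HTTP status code: 200
resp_status_desc(resp_fake)   # status description: "OK"
resp_content_type(resp_fake)  # "application/json"
```

By default, `req_perform()` turns HTTP error statuses (4xx and 5xx) into R errors, so a response you get back from it has already passed that check.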

3. Get the results

results_search <- 
  resp_body_json(resp_search)
str(results_search)
List of 2
 $ count  : int 2
 $ results:List of 2
  ..$ :List of 3
  .. ..$ rn   : chr "111-27-3"
  .. ..$ name : chr "1-Hexanol"
  .. ..$ image: chr "<svg width=\"184.65\" viewBox=\"0 0 6155 1002\" text-rendering=\"auto\" stroke-width=\"1\" stroke-opacity=\"1\""| __truncated__
  ..$ :List of 3
  .. ..$ rn   : chr "25917-35-5"
  .. ..$ name : chr "Hexanol"
  .. ..$ image: chr "<svg width=\"142.29\" viewBox=\"0 0 4743 3165\" text-rendering=\"auto\" stroke-width=\"1\" stroke-opacity=\"1\""| __truncated__
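The nested list is faithful to the JSON but awkward to analyze. The sketch below shows two ways to turn the results into a data frame. It uses an offline stand-in built with httr2's `response()` that mirrors the structure of `results_search`, with the SVG strings shortened.

```r
library(httr2)

# Offline stand-in mirroring the structure of resp_search (SVG strings shortened)
resp_fake <- response(
  status_code = 200,
  headers = list("Content-Type" = "application/json"),
  body = charToRaw('{"count": 2, "results": [
    {"rn": "111-27-3", "name": "1-Hexanol", "image": "<svg/>"},
    {"rn": "25917-35-5", "name": "Hexanol", "image": "<svg/>"}]}')
)
results <- resp_body_json(resp_fake)

# Option 1: pull one field out of each result explicitly
hits <- data.frame(
  rn   = vapply(results$results, function(x) x$rn, character(1)),
  name = vapply(results$results, function(x) x$name, character(1))
)

# Option 2: let jsonlite simplify the array of records into a data frame
hits_simple <- resp_body_json(resp_fake, simplifyVector = TRUE)$results

hits
```

Option 1 keeps only the fields you name. Option 2 keeps every field, including the `image` column.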

Live Demo

Other useful functions

  • req_throttle() for obeying API rate limits

  • req_retry() for automatically retrying requests that fail

  • req_oauth_auth_code() and associated helpers for APIs that require access tokens or passwords. See this OAuth article.
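These helpers combine naturally into a reusable function. The sketch below uses two hypothetical helper names, `cc_search_request()` and `cc_search()`. The rate of 30 requests per minute is an illustrative placeholder, not a documented limit of this API; check the documentation for the real one. With its default settings, `req_retry()` retries responses with status 429 (Too Many Requests) or 503 (Service Unavailable).

```r
library(httr2)

# Hypothetical helper: build a polite search request for CAS Common Chemistry
cc_search_request <- function(query) {
  request("https://commonchemistry.cas.org/api") |>
    req_url_path_append("search") |>
    req_url_query(q = query) |>
    req_throttle(rate = 30 / 60) |>  # at most ~30 requests per minute (placeholder rate)
    req_retry(max_tries = 3)         # up to 3 attempts on transient failures
}

# Hypothetical helper: perform the search and parse the JSON body
cc_search <- function(query) {
  cc_search_request(query) |>
    req_perform() |>
    resp_body_json()
}

# Building the request does not contact the server
req <- cc_search_request("hexanol")
req$url
```

Keeping request-building separate from `req_perform()` makes it possible to inspect or test the requests without calling the API.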

Don’t re-invent the wheel!

Getting Help