Crafting Publication Quality Data Visualizations with ggplot2

Author

Eric R. Scott

Learning Objectives
  • Customize color palettes with accessibility in mind
  • Customize appearance of plots with themes
  • Customize axes (ticks, grid-lines, labels, date axes, transformations)
  • Create custom legends
  • Save high resolution or vector formats
  • Arrange multi-panel figures

Required by journals

Journals often require certain modifications to make your plots publication-ready:

  • High resolution
  • Specific file types (TIFF, EPS, PDF are common)
  • Figure size limits
  • Suggested font size & typeface

Not required, but good practice

Other modifications to the appearance of your plot are a good idea, but are less often required by journals or reviewers:

  • Colorblind accessible colors
  • Greyscale friendly colors
  • Perceptually-even colors
  • Screen-reader compatible
  • High data-ink ratio (simplify plot, within reason)
  • Arrangement of related plots into multi-panel figures

Example plots

A version of the plot from part 1 of this workshop series is re-created here.

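# Packages used throughout: tidyverse supplies ggplot2, dplyr, and tibble;
# palmerpenguins supplies the penguins data
library(tidyverse)
library(palmerpenguins)

# Summary function for stat_summary(): mean plus or minus one standard deviation
mean_sd <- function(x) {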
  tibble(y = mean(x), ymin = mean(x) - sd(x), ymax = mean(x) + sd(x))
}
p1 <-
  ggplot(data = penguins |> 
           filter(!is.na(sex)),
         mapping = aes(x = species, 
                       y = body_mass_g,
                       shape = sex)) +
  geom_point(
    alpha = 0.15,
    position = position_jitterdodge(dodge.width = 0.75)
  ) +
  stat_summary(
    fun.data = mean_sd,
    position = position_dodge(width = 0.75)
  )
p1

In addition, I’ll create two more basic plots for practice combining plots into multi-panel figures.

# bill length vs. flipper length
p2 <-
  ggplot(penguins,
         aes(
           x = flipper_length_mm,
           y = bill_length_mm,
           color = species,
           fill = species,
           shape = species
         )) +
  geom_point() +
  geom_smooth(method = "lm")
p2

# bill depth vs. flipper length
p3 <-
  ggplot(penguins,
         aes(
           x = flipper_length_mm,
           y = bill_depth_mm,
           color = species,
           fill = species,
           shape = species
         )) +
  geom_point() +
  geom_smooth(method = "lm")
p3

Color

There are many ways to change plot colors, including using built-in palettes, packages that add color palettes, and manually. It’s usually best to choose a color palette that meets these criteria:

  • Colorblind friendly
  • Greyscale friendly
  • Perceptually even
  • High contrast (with background & each other)

The viridis color palettes built into ggplot2 generally meet these criteria and are a good choice for many visualizations.

Here’s an example plot with a color bar:

v <- 
  ggplot(
    penguins,
    aes(x = bill_length_mm, y = bill_depth_mm, color = body_mass_g)
  ) +
  geom_point()
v

For a continuous color scale, use scale_color_viridis_c().

Other palettes are available by changing the option argument.

# Variations with option = "A" through "H"
v + scale_color_viridis_c(option = "B")

A binned version is available with scale_color_viridis_b().

v + scale_color_viridis_b(option = "B", n.breaks = 5)

The viridis palettes tend to end on a very bright yellow, which doesn’t look great on light backgrounds, especially on a projector. You can “cap” the scale using begin and end.

For example, begin 10% of the way in to skip the darkest purples and end at 90% to skip the brightest yellows.

v + scale_color_viridis_c()
v + scale_color_viridis_c(begin = 0.1, end = 0.9)

Discrete colors

Our initial example plot uses just 3 colors, but we can still use viridis to ensure they are colorblind friendly and perceptually even using scale_color_viridis_d() for “discrete”. The colors still follow a gradient from cool to warm.

p2 + scale_color_viridis_d(end = 0.9, option = "C")

To apply the same color scale to both “fill” and “color”, you can use the aesthetics argument to scale_color_* or scale_fill_* functions.

p2 + scale_color_viridis_d(end = 0.9, option = "C", aesthetics = c("fill", "color"))

Manual color palette

Maybe you don’t like the viridis palettes, or maybe you want a palette more appropriate for diverging or discrete data. There are a ton of tools for generating color palettes, both R packages that extend ggplot2 and external tools.

Packages and resources to check out:

You can use any set of colors as a custom palette with scale_color_manual().

my_cols <- c("#B60A1C", "#E39802", "#309143")

p2 + scale_color_manual(values = my_cols, aesthetics = c("color", "fill"))

Specify which color goes with which factor level by using a named vector.

my_cols <- 
  c("Chinstrap" = "#B60A1C", "Gentoo" = "#E39802", "Adelie" = "#309143")
p2 <- p2 + 
  scale_color_manual(values = my_cols, aesthetics = c("color", "fill"))
p2

p3 <- p3 +
  scale_color_manual(values = my_cols, aesthetics = c("color", "fill"))
p3
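One way to check a manual palette against the criteria listed at the start of this section is to simulate how it appears under color vision deficiency and in greyscale. The sketch below assumes the colorspace package is installed and uses its deutan(), protan(), tritan(), and desaturate() functions; the row labels are just my own names.

```r
library(colorspace)

my_cols <- c("Chinstrap" = "#B60A1C", "Gentoo" = "#E39802", "Adelie" = "#309143")

# Simulate how the palette appears with each type of color vision deficiency
deutan_cols <- deutan(my_cols)    # green-deficient vision
protan_cols <- protan(my_cols)    # red-deficient vision
tritan_cols <- tritan(my_cols)    # blue-deficient vision
grey_cols   <- desaturate(my_cols) # greyscale (all chroma removed)

# Draw one row of swatches per version for a side-by-side comparison
swatchplot(
  "Original"  = my_cols,
  "Deutan"    = deutan_cols,
  "Protan"    = protan_cols,
  "Tritan"    = tritan_cols,
  "Greyscale" = grey_cols
)
```

If two colors become hard to tell apart in any row, consider adjusting the palette or relying on a second aesthetic such as shape, as p2 and p3 already do.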

Labeling scales

Each scale (color, fill, shape), including the x and y axes, can have a label. There are multiple ways to set this, but I’ll demonstrate setting it with labs().

p2 + labs(color = "Penguin Species")

This creates a separate legend for color, because the shape and fill legends still have their default title. Guides with the same title are combined when possible.

p2 + 
  labs(
    color = "Penguin Species",
    shape = "Penguin Species",
    fill = "Penguin Species"
  )

We can also use labs() to rename the axes.

p2 <- p2 + 
  labs(
    color = "Penguin Species",
    shape = "Penguin Species",
    fill = "Penguin Species",
    x = "Flipper Length (mm)",
    y = "Bill Length (mm)"
  )
p2

Custom Axes

Axes are a type of scale and can be modified with ggplot2 functions like scale_x_continuous() for continuous data, scale_x_discrete() for discrete data, etc.
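Date axes and transformed axes work the same way. As a minimal sketch, scale_x_date() controls the breaks and labels of a date axis and scale_y_log10() applies a log transformation. The counts data frame here is made up purely for illustration and is not part of the penguins data.

```r
library(ggplot2)

# Hypothetical daily counts, invented for this example only
counts <- data.frame(
  date = seq(as.Date("2023-01-01"), as.Date("2023-06-30"), by = "day")
)
counts$n <- round(10 * exp(seq(0, 5, length.out = nrow(counts))))

d <- ggplot(counts, aes(x = date, y = n)) +
  geom_line() +
  # one break per month, labelled with abbreviated month and 4-digit year
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  # log10-transform the y-axis; the data are transformed before drawing
  scale_y_log10()
d
```

The date_labels argument takes strftime() format codes, so other label styles only require a different format string.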

Our first plot has a discrete x-axis and a continuous y-axis. We can use scale_*() functions to change the x-axis labels (there are other ways to do this, of course) and to adjust the number of breaks on the y-axis.

scinames <-
  c("Adelie" = "P. adeliae", "Chinstrap" = "P. antarcticus", "Gentoo" = "P. papua")
p1 <- 
  p1 + scale_x_discrete(labels = scinames) + scale_y_continuous(n.breaks = 12)
p1 

You’ll notice that n.breaks = 12 doesn’t produce exactly 12 breaks. It prefers “pretty” breaks at round numbers and tries to get about as many breaks as you asked for. Supply a numeric vector to breaks if you want to specify the breaks exactly.
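For instance, a sketch of placing a break at exactly every 500 g might look like this. The simple scatter plot is a stand-in for p1, and the 3000 to 6000 g range is my own choice for illustration.

```r
library(ggplot2)
library(palmerpenguins)

# Exact break positions: every 500 g from 3000 to 6000
mass_breaks <- seq(3000, 6000, by = 500)

b <- ggplot(na.omit(penguins), aes(x = species, y = body_mass_g)) +
  geom_point() +
  # breaks takes the exact positions, unlike n.breaks, which is only a target
  scale_y_continuous(breaks = mass_breaks)
b
```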

You can set the range of an axis in two ways, and the choice is consequential. Setting limits inside a scale_*() function removes data outside the limits before any statistics are computed, while coord_cartesian() only zooms the plotting area. With limits = c(2000, 5500), the mean and standard deviation for our largest penguins are recalculated from the remaining data, while with coord_cartesian() they are unchanged (except that the error bars now run off the edge of the plotting area).

p1 + scale_y_continuous(limits = c(2000, 5500), n.breaks = 12)

# p1 + ylim(2000, 5500) # shortcut, but it replaces the existing y scale (n.breaks is lost)
p1 + coord_cartesian(ylim = c(2000, 5500))
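To see the difference in numbers rather than by eye, here is a small sketch that pulls the computed mean out of each version with layer_data(). It uses only the Gentoo penguins and a bare stat_summary() layer as a simplified stand-in for p1.

```r
library(ggplot2)
library(palmerpenguins)

gentoo <- subset(penguins, species == "Gentoo" & !is.na(body_mass_g))

base <- ggplot(gentoo, aes(x = species, y = body_mass_g)) +
  stat_summary(fun = mean, geom = "point")

# Scale limits: masses above 5500 g are dropped before the mean is computed
# (ggplot2 warns about the removed rows)
cropped <- base + scale_y_continuous(limits = c(2000, 5500))

# Coord limits: all data are kept and only the view is zoomed
zoomed <- base + coord_cartesian(ylim = c(2000, 5500))

mean_cropped <- layer_data(cropped)$y
mean_zoomed  <- layer_data(zoomed)$y

# The cropped mean is lower because the heaviest penguins were removed
c(cropped = mean_cropped, zoomed = mean_zoomed, raw = mean(gentoo$body_mass_g))
```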

We can use coord_flip() to make a horizontal graph.

p1 + coord_flip()

Important

coord_flip() only changes where the x and y axes are drawn. In the plot above, the “species” axis is still the x-axis and the body_mass_g axis is still the y-axis.

Customizing appearance with theme()

There are default themes built into ggplot2.

p2 + theme_bw()

Additional themes are available in a variety of packages.

Customizing themes “manually” involves knowing the name of the theme element and its corresponding element_*() function. A good place to start is the help page for theme(): https://ggplot2.tidyverse.org/reference/theme.html

Activity

Name some things about the appearance of p1 that you want to change and we’ll figure it out together!

Tip

It’s best to find a built-in theme_*() function that gets you most of the way there and then customize with theme().

p1 + 
  theme_minimal() + 
  theme(axis.line = element_line(linewidth = 1.5, lineend = "round"))
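Journal requirements for font size and typeface can usually be met through the base_size and base_family arguments of the theme_*() functions. Here is a minimal sketch, assuming a hypothetical requirement of 9 pt sans-serif text (check your journal’s author guidelines for the real values). The boxplot is just a stand-in plot.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(na.omit(penguins), aes(x = species, y = body_mass_g)) +
  geom_boxplot() +
  # Hypothetical journal requirement: 9 pt text in a sans-serif typeface
  theme_bw(base_size = 9, base_family = "sans") +
  theme(
    panel.grid.minor = element_blank(),         # drop minor grid lines
    axis.text = element_text(color = "black")   # darker tick labels
  )
p
```

Because the theme() call comes after theme_bw(), it only overrides the elements it names and leaves the rest of the built-in theme intact.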

Multi-panel figures

library(patchwork)
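# `/` stacks plots vertically; `+` places them side by side
p1 /
(p2 + p3) + plot_layout(guides = "collect") + # gather the legends in one place
  plot_annotation(tag_levels = "a") # tag the panels a, b, c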
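When guides are collected, patchwork merges only legends that are identical, so panels whose legends have different titles keep separate legends. One way to handle this, sketched below with two stand-in panels rather than p2 and p3, is to apply the same labels to every panel with the & operator.

```r
library(ggplot2)
library(patchwork)
library(palmerpenguins)

peng <- subset(penguins, !is.na(flipper_length_mm))

# Two stand-in panels that share the same color and shape mappings
a <- ggplot(peng, aes(x = flipper_length_mm, y = bill_length_mm,
                      color = species, shape = species)) +
  geom_point()
b <- ggplot(peng, aes(x = flipper_length_mm, y = bill_depth_mm,
                      color = species, shape = species)) +
  geom_point()

# `&` adds the same labels to every panel, so the legends are identical
# and guides = "collect" can merge them into a single legend
fig <- (a + b + plot_layout(guides = "collect")) &
  labs(color = "Penguin Species", shape = "Penguin Species")
fig
```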

Saving plots

Although we often leave this step until the end, I think it’s good to save your plot to a file early on once the plot starts taking shape. If you have some plot dimensions in mind, this allows you to see how any further changes will look in the finished plot.

ggsave(
  filename = "penguins.png", # filetype comes from filename extension
  plot = p2, # default is prev. plot, but good to specify
  width = 4, 
  height = 3, 
  units = "in", 
  dpi = "print" # resolution for raster images
)

For example, with these dimensions (4x3 in.), the points and lines appear quite large.
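Because the file type comes from the filename extension, the same call can write vector formats that journals commonly ask for. A minimal sketch follows, using a stand-in plot and writing to a temporary folder (swap in your own figures folder).

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(na.omit(penguins), aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point()

out_dir <- tempdir() # assumption: replace with your own output folder

pdf_file <- file.path(out_dir, "penguins.pdf")
eps_file <- file.path(out_dir, "penguins.eps")

# Vector formats have no resolution, so dpi is not needed here
ggsave(pdf_file, plot = p, width = 4, height = 3, units = "in")
ggsave(eps_file, plot = p, width = 4, height = 3, units = "in")
```

Keeping width and height the same across formats means text and point sizes look the same in every file.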

Raster vs. Vector

Raster images (e.g. .jpg, .png, .tiff) are made of pixels and the resolution can vary. Vector images (e.g. .svg, .eps) are not made of pixels and don’t have a resolution.
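For raster output, the pixel dimensions are simply the size in inches multiplied by the dpi. A quick worked example for the 4 x 3 in. figure, using the values that ggsave() documents for its named dpi settings ("screen" = 72, "print" = 300, "retina" = 320):

```r
width_in  <- 4
height_in <- 3

# Named dpi settings accepted by ggsave()
dpi <- c(screen = 72, print = 300, retina = 320)

# Pixel dimensions = inches * dots per inch
pixels <- rbind(width = width_in * dpi, height = height_in * dpi)
pixels
#        screen print retina
# width     288  1200   1280
# height    216   900    960
```

So the same 4 x 3 in. figure is 1200 x 900 pixels at print resolution but only 288 x 216 pixels at screen resolution.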

Tip

Try zooming in on the images below in your browser to see the difference

Raster (72 dpi .png file)

Vector (.svg file)

Use vector images whenever possible, provided the journal accepts them. Vector formats are also more accessible for screen-readers (although you should always provide alt-text when possible).
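One option for keeping alt-text together with a plot is the alt argument of labs(), which can be read back with get_alt_text(). This sketch assumes a reasonably recent version of ggplot2 (these functions are not in very old releases), and the wording of the alt-text is only an example.

```r
library(ggplot2)
library(palmerpenguins)

alt <- paste(
  "Scatter plot of bill length against flipper length",
  "for three penguin species, with points colored by species."
)

p <- ggplot(na.omit(penguins),
            aes(x = flipper_length_mm, y = bill_length_mm, color = species)) +
  geom_point() +
  labs(alt = alt) # store the alt-text with the plot object

# Retrieve the stored alt-text, e.g. to pass along wherever the figure is published
get_alt_text(p)
```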

References

Zeileis, Achim, Jason C. Fisher, Kurt Hornik, Ross Ihaka, Claire D. McWhite, Paul Murrell, Reto Stauffer, and Claus O. Wilke. 2019. “Colorspace: A Toolbox for Manipulating and Assessing Colors and Palettes.”