Creating Publication-Quality Graphics with ggplot2 (2024)

Last updated on 2024-05-12

Estimated time: 80 minutes

Overview

Questions

  • How can I create publication-quality graphics in R?

Objectives

  • To be able to use ggplot2 to generate publication-quality graphics.
  • To apply geometry, aesthetic, and statistics layers to a ggplot plot.
  • To manipulate the aesthetics of a plot using different colors, shapes, and lines.
  • To improve data visualization through transforming scales and paneling by group.
  • To save a plot created with ggplot to disk.

Plotting our data is one of the best ways to quickly explore it and the various relationships between variables.

There are three main plotting systems in R: the base plotting system, the lattice package, and the ggplot2 package.

Today we’ll be learning about the ggplot2 package, because it is the most effective for creating publication-quality graphics.

ggplot2 is built on the grammar of graphics, the idea that any plot can be built from the same set of components: a dataset, mapping aesthetics, and graphical layers:

  • Data sets are the data that you, the user, provide.

  • Mapping aesthetics are what connect the data to the graphics. They tell ggplot2 how to use your data to affect how the graph looks, such as changing what is plotted on the X or Y axis, or the size or color of different data points.

  • Layers are the actual graphical output from ggplot2. Layers determine what kinds of plot are shown (scatterplot, histogram, etc.), the coordinate system used (rectangular, polar, others), and other important aspects of the plot. The idea of layers of graphics may be familiar to you if you have used image editing programs like Photoshop, Illustrator, or Inkscape.

Let’s start off building an example using the gapminder data from earlier. The most basic function is ggplot, which lets R know that we’re creating a new plot. Any of the arguments we give the ggplot function are the global options for the plot: they apply to all layers on the plot.

R

library("ggplot2")
ggplot(data = gapminder)

Here we called ggplot and told it what data we want to show on our figure. This is not enough information for ggplot to actually draw anything. It only creates a blank slate for other elements to be added to.
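To make the "blank slate" idea concrete, here is a minimal sketch that inspects a plot object before and after a geom is added. The `toy` data frame and its values are made up for illustration and stand in for gapminder; reading the `layers` element with `$` is an assumption about the plot object's internals that holds in current ggplot2 releases.

```r
library(ggplot2)

# Hypothetical toy data (made-up values), used only for illustration
toy <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))

# A plot with data but nothing to draw yet
p <- ggplot(data = toy)

# The plot object carries a list of layers; it starts out empty
n_layers <- length(p$layers)
print(n_layers)

# Adding a geom appends one layer to that list
p2 <- p + geom_point(mapping = aes(x = x, y = y))
print(length(p2$layers))
```

The plot object is built up step by step with `+`, and nothing is drawn until it is printed.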

Now we’re going to add in the mapping aesthetics using the aes function. aes tells ggplot how variables in the data map to aesthetic properties of the figure, such as which columns of the data should be used for the x and y locations.

R

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))

Here we told ggplot we want to plot the “gdpPercap” column of the gapminder data frame on the x-axis, and the “lifeExp” column on the y-axis. Notice that we didn’t need to explicitly pass aes these columns (e.g. x = gapminder[, "gdpPercap"]); this is because ggplot is smart enough to know to look in the data for that column!

The final part of making our plot is to tell ggplot how we want to visually represent the data. We do this by adding a new layer to the plot using one of the geom functions.

R

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) + geom_point()

Here we used geom_point, which tells ggplot we want to visually represent the relationship between x and y as a scatterplot of points.

Challenge 1

Modify the example so that the figure shows how life expectancy has changed over time:

R

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) + geom_point()

Hint: the gapminder dataset has a column called “year”, which should appear on the x-axis.

Challenge 2

In the previous examples and challenge we’ve used the aes function to tell the scatterplot geom about the x and y locations of each point. Another aesthetic property we can modify is the point color. Modify the code from the previous challenge to color the points by the “continent” column. What trends do you see in the data? Are they what you expected?

The solution presented below adds color=continent to the call of the aes function. The general trend seems to indicate an increased life expectancy over the years. On continents with stronger economies we find a longer life expectancy.

R

ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp, color=continent)) + geom_point()

Layers

Using a scatterplot probably isn’t the best for visualizing change over time. Instead, let’s tell ggplot to visualize the data as a line plot:

R

ggplot(data = gapminder, mapping = aes(x=year, y=lifeExp, color=continent)) + geom_line()

Instead of adding a geom_point layer, we’ve added a geom_line layer.

However, the result doesn’t look quite as we might have expected: it seems to be jumping around a lot in each continent. Let’s try to separate the data by country, plotting one line for each country:

R

ggplot(data = gapminder, mapping = aes(x=year, y=lifeExp, group=country, color=continent)) + geom_line()

We’ve added the group aesthetic, which tells ggplot to draw a line for each country.
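The effect of the group aesthetic can be checked without looking at the figure. The sketch below uses a small made-up data frame (`toy`, with hypothetical countries "A" and "B" on a hypothetical continent "X") in place of gapminder, and uses ggplot2's `layer_data()` helper to count how many groups the line layer will draw.

```r
library(ggplot2)

# Hypothetical toy data (made-up values) standing in for gapminder
toy <- data.frame(
  country   = rep(c("A", "B"), each = 3),
  continent = "X",
  year      = rep(c(1990, 2000, 2010), times = 2),
  lifeExp   = c(60, 62, 65, 70, 72, 75)
)

# Grouping is implied by the only discrete aesthetic: color = continent
p_continent <- ggplot(toy, aes(x = year, y = lifeExp, color = continent)) +
  geom_line()

# Grouping is stated explicitly: one line per country
p_country <- ggplot(toy, aes(x = year, y = lifeExp, group = country,
                             color = continent)) +
  geom_line()

# Count the distinct groups each line layer will draw
groups_continent <- length(unique(layer_data(p_continent)$group))
groups_country   <- length(unique(layer_data(p_country)$group))
print(c(continent = groups_continent, country = groups_country))
```

Without an explicit group, both countries are joined into a single zig-zagging line per continent, which is the "jumping around" seen earlier.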

But what if we want to visualize both lines and points on the plot? We can add another layer to the plot:

R

ggplot(data = gapminder, mapping = aes(x=year, y=lifeExp, group=country, color=continent)) + geom_line() + geom_point()

It’s important to note that each layer is drawn on top of the previous layer. In this example, the points have been drawn on top of the lines. Here’s a demonstration:

R

ggplot(data = gapminder, mapping = aes(x=year, y=lifeExp, group=country)) + geom_line(mapping = aes(color=continent)) + geom_point()

In this example, the aesthetic mapping of color has been moved from the global plot options in ggplot to the geom_line layer so it no longer applies to the points. Now we can clearly see that the points are drawn on top of the lines.
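As a rough check of how a layer-specific mapping behaves, the sketch below builds the same kind of plot on a small made-up data frame (`toy`, hypothetical values) and compares the colors computed for each layer with `layer_data()`.

```r
library(ggplot2)

# Hypothetical toy data (made-up values): two countries on two continents
toy <- data.frame(
  country   = rep(c("A", "B"), each = 3),
  continent = rep(c("X", "Y"), each = 3),
  year      = rep(c(1990, 2000, 2010), times = 2),
  lifeExp   = c(60, 62, 65, 70, 72, 75)
)

# color is mapped only inside geom_line, not in the global ggplot() call
p <- ggplot(toy, aes(x = year, y = lifeExp, group = country)) +
  geom_line(mapping = aes(color = continent)) +
  geom_point()

# Layer 1 (lines) gets one color per continent;
# layer 2 (points) falls back to a single default color
line_colours  <- unique(layer_data(p, 1)$colour)
point_colours <- unique(layer_data(p, 2)$colour)
print(line_colours)
print(point_colours)
```

Mappings given to `ggplot()` are inherited by every layer, while mappings given to a geom stay local to that geom.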

Tip: Setting an aesthetic to a value instead of a mapping

So far, we’ve seen how to use an aesthetic (such as color) as a mapping to a variable in the data. For example, when we use geom_line(mapping = aes(color=continent)), ggplot will give a different color to each continent. But what if we want to change the color of all lines to blue? You may think that geom_line(mapping = aes(color="blue")) should work, but it doesn’t. Since we don’t want to create a mapping to a specific variable, we can move the color specification outside of the aes() function, like this: geom_line(color="blue").
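The difference between the two forms can be seen in the colors ggplot2 actually computes. This is a minimal sketch on a made-up data frame (`toy`, hypothetical values): inside aes(), "blue" is treated as a data value and passed through the default color scale, whereas outside aes() it is used as the literal color.

```r
library(ggplot2)

# Hypothetical toy data (made-up values) standing in for gapminder
toy <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year    = rep(c(1990, 2000, 2010), times = 2),
  lifeExp = c(60, 62, 65, 70, 72, 75)
)

base <- ggplot(toy, aes(x = year, y = lifeExp, group = country))

# Mapping: "blue" becomes a one-level variable, colored by the default scale
p_mapped <- base + geom_line(mapping = aes(color = "blue"))

# Setting: every line is drawn in the literal color blue
p_set <- base + geom_line(color = "blue")

mapped_colour <- unique(layer_data(p_mapped)$colour)
set_colour    <- unique(layer_data(p_set)$colour)
print(mapped_colour)
print(set_colour)
```

The mapped version also produces a legend with a single entry labelled "blue", which is usually a sign that a value was mapped by accident.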

Challenge 3

Switch the order of the point and line layers from the previous example. What happened?

The lines now get drawn over the points!

R

ggplot(data = gapminder, mapping = aes(x=year, y=lifeExp, group=country)) + geom_point() + geom_line(mapping = aes(color=continent))

Transformations and statistics

ggplot2 also makes it easy to overlay statistical models over the data. To demonstrate we’ll go back to our first example:

R

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) + geom_point()

Currently it’s hard to see the relationship between the points due to some strong outliers in GDP per capita. We can change the scale of units on the x axis using the scale functions. These control the mapping between the data values and visual values of an aesthetic. We can also modify the transparency of the points, using the alpha argument, which is especially helpful when you have a large amount of data which is very clustered.

R

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) + geom_point(alpha = 0.5) + scale_x_log10()

The scale_x_log10 function applied a log10 transformation to the x-axis scale of the plot, so that each multiple of 10 is evenly spaced from left to right. For example, a GDP per capita of 1,000 is the same horizontal distance away from a value of 10,000 as the 10,000 value is from 100,000. This helps to visualize the spread of the data along the x-axis.
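The even spacing can be verified numerically. The sketch below uses three made-up GDP values (the `toy` data frame is hypothetical) and reads back the x positions that ggplot2 computes once the log10 scale has been applied.

```r
library(ggplot2)

# Hypothetical toy data (made-up values): GDP per capita in steps of 10x
toy <- data.frame(
  gdpPercap = c(1000, 10000, 100000),
  lifeExp   = c(50, 65, 80)
)

p <- ggplot(toy, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10()

# Positions used for drawing are the log10 of the original values
x_positions <- layer_data(p)$x
print(x_positions)

# Equal gaps between consecutive positions mean equal spacing on the axis
print(diff(x_positions))
```

Because the scale transforms the data before any statistics are computed, a geom_smooth layer added to such a plot is fitted against log10(gdpPercap) rather than the raw values.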

Tip Reminder: Setting an aesthetic to a value instead of a mapping

Notice that we used geom_point(alpha = 0.5). As the previous tip mentioned, using a setting outside of the aes() function will cause this value to be used for all points, which is what we want in this case. But just like any other aesthetic setting, alpha can also be mapped to a variable in the data. For example, we can give a different transparency to each continent with geom_point(mapping = aes(alpha = continent)).

We can fit a simple relationship to the data by adding another layer, geom_smooth:

R

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) + geom_point(alpha = 0.5) + scale_x_log10() + geom_smooth(method="lm")

OUTPUT

`geom_smooth()` using formula = 'y ~ x'

We can make the line thicker by setting the size aesthetic in the geom_smooth layer:

R

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) + geom_point(alpha = 0.5) + scale_x_log10() + geom_smooth(method="lm", size=1.5)

WARNING

Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
generated.

OUTPUT

`geom_smooth()` using formula = 'y ~ x'

There are two ways an aesthetic can be specified. Here we set the size aesthetic by passing it as an argument to geom_smooth. Previously in the lesson we’ve used the aes function to define a mapping between data variables and their visual representation.
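The deprecation warning shown above suggests linewidth as the replacement for size on lines. The following sketch shows what that might look like, assuming ggplot2 3.4.0 or later is installed; the `toy` data frame and its values are made up for illustration, and `formula = y ~ x` is spelled out only to silence the informational message.

```r
library(ggplot2)

# Hypothetical toy data (made-up values) standing in for gapminder
toy <- data.frame(
  gdpPercap = c(500, 1000, 5000, 10000, 50000),
  lifeExp   = c(45, 52, 63, 70, 79)
)

# Same plot as in the lesson, but with linewidth instead of size
p <- ggplot(toy, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) +
  scale_x_log10() +
  geom_smooth(method = "lm", formula = y ~ x, linewidth = 1.5)

# Layer 2 is the smooth; its line width is the value we set
smooth_linewidth <- unique(layer_data(p, 2)$linewidth)
print(smooth_linewidth)
```

The size argument still controls the size of points, so geom_point(size = 3) is unaffected by this change.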

Challenge 4a

Modify the color and size of the points on the point layer in the previous example.

Hint: do not use the aes function.

Here is a possible solution: Notice that the color argument is supplied outside of the aes() function. This means that it applies to all data points on the graph and is not related to a specific variable.

R

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) + geom_point(size=3, color="orange") + scale_x_log10() + geom_smooth(method="lm", size=1.5)

OUTPUT

`geom_smooth()` using formula = 'y ~ x'

Challenge 4b

Modify your solution to Challenge 4a so that the points are now a different shape and are colored by continent with new trendlines. Hint: The color argument can be used inside the aesthetic.

Here is a possible solution: Notice that supplying the color argument inside the aes() function enables you to connect it to a certain variable. The shape argument, as you can see, modifies all data points the same way (it is outside the aes() call) while the color argument which is placed inside the aes() call modifies a point’s color based on its continent value.

R

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) + geom_point(size=3, shape=17) + scale_x_log10() + geom_smooth(method="lm", size=1.5)

OUTPUT

`geom_smooth()` using formula = 'y ~ x'

Multi-panel figures

Earlier we visualized the change in life expectancy over time across all countries in one plot. Alternatively, we can split this out over multiple panels by adding a layer of facet panels.

Tip

We start by making a subset of data including only countries located in the Americas. This includes 25 countries, which will begin to clutter the figure. Note that we apply a “theme” definition to rotate the x-axis labels to maintain readability. Nearly everything in ggplot2 is customizable.

R

americas <- gapminder[gapminder$continent == "Americas",]
ggplot(data = americas, mapping = aes(x = year, y = lifeExp)) + geom_line() + facet_wrap( ~ country) + theme(axis.text.x = element_text(angle = 45))

The facet_wrap layer took a “formula” as its argument, denoted by the tilde (~). This tells R to draw a panel for each unique value in the country column of the gapminder dataset.
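To see the one-panel-per-value behavior in isolation, here is a small sketch on a made-up data frame (`toy`, with three hypothetical countries) that counts the panels facet_wrap creates.

```r
library(ggplot2)

# Hypothetical toy data (made-up values): three countries, three years each
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 3),
  year    = rep(c(1990, 2000, 2010), times = 3),
  lifeExp = c(60, 62, 65, 70, 72, 75, 55, 58, 61)
)

p <- ggplot(toy, aes(x = year, y = lifeExp)) +
  geom_line() +
  facet_wrap(~ country)

# Each row of the built layer data records which panel it is drawn in
n_panels <- length(unique(layer_data(p)$PANEL))
print(n_panels)
```

The number of panels follows the number of unique values in the faceting column, so filtering the data first is the simplest way to keep a faceted figure readable.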

Modifying text

To clean this figure up for a publication we need to change some of the text elements. The x-axis is too cluttered, and the y axis should read “Life expectancy”, rather than the column name in the data frame.

We can do this by adding a couple of different layers. The theme layer controls the axis text, and overall text size. Labels for the axes, plot title and any legend can be set using the labs function. Legend titles are set using the same names we used in the aes specification. Thus below the color legend title is set using color = "Continent", while the title of a fill legend would be set using fill = "MyTitle".

R

ggplot(data = americas, mapping = aes(x = year, y = lifeExp, color=continent)) +
  geom_line() + facet_wrap( ~ country) +
  labs(
    x = "Year",              # x axis title
    y = "Life expectancy",   # y axis title
    title = "Figure 1",      # main title of figure
    color = "Continent"      # title of legend
  ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Exporting the plot

The ggsave() function allows you to export a plot created with ggplot. You can specify the dimension and resolution of your plot by adjusting the appropriate arguments (width, height and dpi) to create high quality graphics for publication. In order to save the plot from above, we first assign it to a variable lifeExp_plot, then tell ggsave to save that plot in png format to a directory called results. (Make sure you have a results/ folder in your working directory.)

R

lifeExp_plot <- ggplot(data = americas, mapping = aes(x = year, y = lifeExp, color=continent)) +
  geom_line() + facet_wrap( ~ country) +
  labs(
    x = "Year",              # x axis title
    y = "Life expectancy",   # y axis title
    title = "Figure 1",      # main title of figure
    color = "Continent"      # title of legend
  ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

ggsave(filename = "results/lifeExp.png", plot = lifeExp_plot, width = 12, height = 10, dpi = 300, units = "cm")

There are two nice things about ggsave. First, it defaults to the last plot, so if you omit the plot argument it will automatically save the last plot you created with ggplot. Secondly, it tries to determine the format you want to save your plot in from the file extension you provide for the filename (for example .png or .pdf). If you need to, you can specify the format explicitly in the device argument.
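As a small illustration of format detection from the file extension, the sketch below saves a plot as a PDF. The `toy` data frame is made up, and a temporary file is used instead of the results/ folder so the example can run anywhere; both are assumptions for illustration only.

```r
library(ggplot2)

# Hypothetical toy data (made-up values), used only for illustration
toy <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))

p <- ggplot(toy, aes(x = x, y = y)) + geom_point()

# A temporary path ending in .pdf; ggsave picks the PDF device from it
out_file <- tempfile(fileext = ".pdf")

# Pass the plot explicitly rather than relying on the "last plot" default
ggsave(filename = out_file, plot = p, width = 12, height = 10, units = "cm")

# Confirm that a file was written
print(file.exists(out_file))
```

Passing the plot explicitly is a little more typing, but it avoids saving the wrong figure when several plots have been created in the same session.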

This is a taste of what you can do with ggplot2. RStudio provides a really useful cheatsheet of the different layers available, and more extensive documentation is available on the ggplot2 website. All RStudio cheat sheets are available from the RStudio website. Finally, if you have no idea how to change something, a quick Google search will usually send you to a relevant question and answer on Stack Overflow with reusable code to modify!

Challenge 5

Generate boxplots to compare life expectancy between the different continents during the available years.

Advanced:

  • Rename y axis as Life Expectancy.
  • Remove x axis labels.

Here is a possible solution: xlab() and ylab() set labels for the x and y axes, respectively. The axis title, text and ticks are attributes of the theme and must be modified within a theme() call.

R

ggplot(data = gapminder, mapping = aes(x = continent, y = lifeExp, fill = continent)) + geom_boxplot() + facet_wrap(~year) + ylab("Life Expectancy") + theme(axis.title.x=element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank())

Key Points

  • Use ggplot2 to create plots.
  • Think about graphics in layers: aesthetics, geometry, statistics, scale transformation, and grouping.

Author: Madonna Wisozk
