library(tidyverse)
<- tibble(sample = c("SampleA",
data_tbl "SampleB",
"SampleC",
"SampleD",
"SampleE"),
value = c(20,
20.5,
22,
19.8,
21))
ggplot(data = data_tbl, aes(x = sample, y = value)) +
geom_bar(stat = "identity") +
theme_bw()
I recently went down a bit of an educational rabbit 🐇 hole with ggplot is setting axis limits. So I thought I would share some insights I gathered along the way.
ylim
It all started with a simple barplot 📊 using ggplot2. Typically, I prefer to filterthe data before plotting, avoiding axis limits. However, on this occasion, I wanted to focus in on the differences between values that were relatively small compared to the overall bars. Queue example data 👇
Naturally I went for the generically name ylim() ggplot2 function to try and focus in more on the “tops” of the bars.
ggplot(data = data_tbl, aes(x = sample, y = value)) +
geom_bar(stat = "identity") +
theme_bw() +
ylim(15, 25)
And boom! 💥 Everything disappeared. Initially, I was puzzled 🤔 Where did my bars go? After some digging, I realised that ylim() was not the function I was looking for.
The ylim() function removes 🧹 any data outside the specified range. Since I chose a barplot, the bars extend from 0 on the y-axis to the sample value, resulting in them being removed. If I had used geom_point, everything would have been fine:
ggplot(data = data_tbl, aes(x = sample, y = value)) +
geom_point() +
theme_bw() +
ylim(15, 25)
coord_cartesian()
So, how do I achieve the desired outcome with geom_bar? The answer is to zoom rather than limit the data, and that’s where coord_cartesian() comes in:
ggplot(data = data_tbl, aes(x = sample, y = value)) +
geom_bar(stat = "identity") +
theme_bw() +
coord_cartesian(ylim=c(15,25))
Using coord_cartesian, all the underlying data is retained, but the plot is zoomed in as if using a magnifying glass. 🔍 ✅
Now, you might think this is only relevant for bar plots. However, let’s explore why it’s important even for scatter plots with geom_point.
Consider data with a strong linear correlation. What happens when we limit the x-axis? Instead of just showing a zoomed-in focus, the linear regression is recalculated with only the data in view, resulting in a misleading correlation:
library(palmerpenguins)
data(penguins)
ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "gray50") +
theme_bw()
`geom_smooth()` using formula = 'y ~ x'
ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "gray50") +
theme_bw() +
xlim(170, 200)
`geom_smooth()` using formula = 'y ~ x'
You can see that depending on the context this approach could be problematic.
Or, let’s say you’re inspecting time series data, such as growth over time:
<- tibble(days = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 20, 30, 40, 50),
growth_tbl weight = c(2, 5, 4, 7, 10, 13, 17, 17, 15, 13, 20, 25, 45, 90, 200, 500, 2000))
ggplot(data = growth_tbl, aes(x = days, y = weight)) +
geom_point() +
geom_line() +
theme_bw()
And you wanted to focus in on early growth. If you used xlim()
instead of coor_cartesian()
, you will notice below that the last piece of informative data, that the growth recovers and continues to increase, is lost.
xlim()
ggplot(data = growth_tbl, aes(x = days, y = weight)) +
geom_point() +
geom_line() +
xlim(0,10) +
ylim(0,20) +
theme_bw()
coor_cartesian()
ggplot(data = growth_tbl, aes(x = days, y = weight)) +
geom_point() +
geom_line() +
coord_cartesian(xlim=c(0,10), ylim=c(0,20)) +
theme_bw()
While most statistics are computed before plotting, and therefore this may not drastically change data interpretation, using coord_cartesian()
ensures you’re zooming in rather than trimming data.
scales
📢 Shoutout to June Choe for highlighting this, even coord_cartesian()
might not be the function you want.
Instead, consider defining your x or y limits within the scale_*()
functions and using the scales package to handle out-of-bounds (OOB) values. This approach makes your logic clearer to others using your code, or even to yourself when revisiting your script months later. June has a video explaining this a little more 🎥
Let’s revisit our barplot example. By specifying the y-axis limits to be between 15 and 25 and using scales::oob_keep()
, we maintain values outside these limits for the background dataset and other statistical calculations. This effectively zooms into the data within our specified limits without removing out-of-bounds values.
ggplot(data = data_tbl, aes(x = sample, y = value)) +
geom_bar(stat = "identity") +
theme_bw() +
scale_y_continuous(limits = c(15,25), oob = scales::oob_keep)
Maybe you have already been through this journey and are an axis limiting champion. Hopefully though this post helps someone. If you have any suggestions or additions, please share them!