Now, let's plot some Seahorse data. For this we first need to import it into this session. The data is available from GitHub.
It is data from PBMCs in which we followed OCR and ECAR over time by Extracellular Flux analysis on the XFe96. During the run we injected FCCP after three measurement phases, and Antimycin/Rotenone (AM/Rot) after six measurement phases. Much more context can be found in Janssen et al., Scientific Reports.
As you can see from the glimpse output, the data table that we have now (called a tibble in tidy language) contains 7 columns: Measurement, Well, Group, Time, OCR, ECAR and PER. The data is already nicely and tidily organized in the Rate sheet of the Excel file that we loaded. The file was generated in the Agilent Wave software by exporting the Seahorse data directly to xlsx.
I prefer to use lower-case column names without any spaces, so we first have to turn these column names into lower case. We use some easy functions from the janitor package for this.
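Here is a minimal sketch of this clean-up step. The real file is not loaded here, so the block builds a tiny toy tibble called raw. Its values are made up, and it only mimics the column names of the exported Rate sheet. clean_names() from janitor then converts those names to lower-case snake_case:

```r
library(dplyr)
library(janitor)

# Toy stand-in with the column names of the exported Rate sheet
# (values are made up, NOT real Seahorse measurements)
raw <- tibble(
  Measurement = 1:2,
  Well  = c("B08", "B08"),
  Group = c("200.000", "200.000"),
  Time  = c(1.3, 7.8),
  OCR   = c(40.1, 41.9),
  ECAR  = c(10.2, 10.8),
  PER   = c(25.3, 26.1)
)

# lower-case, snake_case column names without spaces
xf <- clean_names(raw)
glimpse(xf)
```

With the real data, you would pipe the imported table straight into clean_names() instead of using the toy tibble.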
Next, we can start plotting data using ggplot.
Let's introduce the filter command from dplyr. Whereas select selects columns, filter selects rows. So let's filter the rows for the group labeled “200.000” (200.000 cells per well) and for the “Background” group.
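Below is a small sketch of both filter styles. It uses a hypothetical toy tibble xf with made-up OCR values for four wells, not the real XF data. The tibble has the lower-case column names from the previous step:

```r
library(dplyr)

# Toy stand-in for the cleaned XF data (made-up numbers, not real measurements)
xf <- tibble(
  well  = rep(c("B08", "C08", "A01", "D04"), each = 4),
  group = rep(c("200.000", "200.000", "Background", "100.000"), each = 4),
  time  = rep(c(0, 18, 36, 54), times = 4),
  ocr   = c(40, 42, 95, 20, 38, 41, 88, 18, 1, 0, 2, 1, 22, 23, 50, 11)
)

# several groups: keep rows whose group matches any element of the vector
xf_sel <- xf %>% filter(group %in% c("200.000", "Background"))

# a single group: a plain equality test is enough
xf_200 <- xf %>% filter(group == "200.000")
```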
Filtering data means selecting rows based on one or more conditions, and you need to understand some semantics here. For filtering on multiple values we use group %in% c("200.000", "Background"); for filtering on a single value we can use group == "200.000". The %in% operator checks, for every item on its left, whether it matches any of the items on its right.
1 %in% c(1, 2, 3, 4, 5) #is TRUE
[1] TRUE
# just like
1 == 1 #is TRUE
[1] TRUE
#the reverse is also possible
c(1,2,3,4,5) %in% 1
[1] TRUE FALSE FALSE FALSE FALSE
# one result per element on the left: TRUE FALSE FALSE FALSE FALSE
Thus the group %in% c("200.000", "Background") statement in the filter function above tells which group items to keep. For 200.000 there is a match (TRUE), but for 100.000 there is no match (FALSE).
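The same logic works on a whole vector of group names, which is exactly what filter() evaluates row by row. Here is a base R sketch with a few example group labels:

```r
groups <- c("200.000", "100.000", "Background", "50.000")

# one TRUE/FALSE per element on the left-hand side
keep_multi  <- groups %in% c("200.000", "Background")  # TRUE FALSE TRUE FALSE
keep_single <- groups == "200.000"                     # TRUE FALSE FALSE FALSE
drop_bg     <- groups != "Background"                  # TRUE TRUE FALSE TRUE

keep_multi
```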
Now that we know how to filter, we can use the filtered data to make the ggplot.
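Here is a minimal sketch of that first plot. It again uses a toy stand-in for the data with made-up numbers, and the variable names xf and p are our own choice. The filtered rows are piped directly into ggplot():

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the XF data (made-up numbers, not real measurements)
xf <- tibble(
  well  = rep(c("B08", "C08", "A01", "D04"), each = 4),
  group = rep(c("200.000", "200.000", "Background", "100.000"), each = 4),
  time  = rep(c(0, 18, 36, 54), times = 4),
  ocr   = c(40, 42, 95, 20, 38, 41, 88, 18, 1, 0, 2, 1, 22, 23, 50, 11)
)

p <- xf %>%
  filter(group %in% c("200.000", "Background")) %>%
  ggplot(aes(x = time, y = ocr, color = group)) +
  geom_point()
p
```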
That plot is not very informative. Let’s make it prettier. First, add a line plot:
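Here is one way to draw the lines, sketched on toy data. geom_line() needs to know which points belong together. By mapping group = well we get one curve per well instead of one zig-zag line per color group:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the XF data (made-up numbers, not real measurements)
xf <- tibble(
  well  = rep(c("B08", "C08", "A01", "D04"), each = 4),
  group = rep(c("200.000", "200.000", "Background", "100.000"), each = 4),
  time  = rep(c(0, 18, 36, 54), times = 4),
  ocr   = c(40, 42, 95, 20, 38, 41, 88, 18, 1, 0, 2, 1, 22, 23, 50, 11)
)

p <- xf %>%
  filter(group %in% c("200.000", "Background")) %>%
  ggplot(aes(x = time, y = ocr, color = group)) +
  geom_point() +
  geom_line(aes(group = well))  # connect the points of each well into one curve
p
```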
Next, change colors:
Change theme and text size:
Add titles:
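Putting the color, theme and title steps together could look like the block below. The palette, base_size, title and axis labels are example choices; OCR in pmol/min and time in minutes are the usual Wave units. The data are again made-up toy values:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the XF data (made-up numbers, not real measurements)
xf <- tibble(
  well  = rep(c("B08", "C08", "A01", "D04"), each = 4),
  group = rep(c("200.000", "200.000", "Background", "100.000"), each = 4),
  time  = rep(c(0, 18, 36, 54), times = 4),
  ocr   = c(40, 42, 95, 20, 38, 41, 88, 18, 1, 0, 2, 1, 22, 23, 50, 11)
)

p <- xf %>%
  filter(group %in% c("200.000", "Background")) %>%
  ggplot(aes(x = time, y = ocr, color = group)) +
  geom_point() +
  geom_line(aes(group = well)) +
  scale_color_brewer(palette = "Set1") +   # colors
  theme_bw(base_size = 16) +              # theme and text size
  labs(title = "OCR of PBMCs (toy data)",  # titles
       x = "Time (min)", y = "OCR (pmol/min)", color = "Group")
p
```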
This is a very nice plot now! It shows the OCR curves of all wells in the 200.000 and the background groups. What the plot does not show yet is which line belongs to which well.
We could consider coloring each line by well, but there are too many wells, so it will not look nice! Try it anyway:
* In the above code, use color = well instead of color = group.
* You will notice that there are not enough colors in the brewer palette Set1, so go back to the default coloring by also deleting the scale_color_brewer line. Use the # to comment out the line.
* Now notice that the legend is huge and not completely visible, again indicating that this is not the way to go.
Instead, we can try to label the lines. We'll use the geom_text or annotate commands from ggplot2.
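Here is a sketch of both approaches on toy data. geom_text() gets its own data. The . %>% ... construct turns the pipe into a function, which ggplot2 applies to the plot data, so the layer only sees one row. annotate() instead places text at coordinates that we pick by hand. The filter criteria used here (last time point, highest OCR) are our assumption of what such a label layer could use:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the XF data (made-up numbers, not real measurements)
xf <- tibble(
  well  = rep(c("B08", "C08", "A01", "D04"), each = 4),
  group = rep(c("200.000", "200.000", "Background", "100.000"), each = 4),
  time  = rep(c(0, 18, 36, 54), times = 4),
  ocr   = c(40, 42, 95, 20, 38, 41, 88, 18, 1, 0, 2, 1, 22, 23, 50, 11)
)

p <- xf %>%
  filter(group %in% c("200.000", "Background")) %>%
  ggplot(aes(x = time, y = ocr, color = group)) +
  geom_line(aes(group = well)) +
  # `. %>% ...` builds a function that ggplot2 calls on the plot data,
  # so this layer only gets the well with the highest OCR at the last time point
  geom_text(
    data = . %>% filter(time == max(time)) %>% filter(ocr == max(ocr)),
    aes(label = well), hjust = -0.2, show.legend = FALSE
  ) +
  # annotate() puts text at hand-picked coordinates (here: the lowest curve)
  annotate("text", x = 54, y = 1, label = "A01", hjust = -0.2)
p
```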
ggrepel
For plot labels in your own R projects, consider using ggrepel. It has the benefit of automatically preventing overlapping text, but unfortunately it is not available for webr.
Although we have now labeled the lines with the minimum and maximum OCR, this is only useful for this one plot under these conditions. The label positions are tweaked for this specific plot, so this is not a quick, general solution to our problem.
In the above ggplot commands we included geom_text, but we only used a subset of the full data for this geom. We use the . (dot) placeholder to get the original data (in our case the filtered data that went into the ggplot) and pipe it into another two filters. Basically we do the following, but then within one layer of the ggplot:
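Here are the two extra filters written out as a separate pipeline on toy data. The exact conditions (last time point, then highest OCR) are an assumption for illustration:

```r
library(dplyr)

# Toy stand-in for the XF data (made-up numbers, not real measurements)
xf <- tibble(
  well  = rep(c("B08", "C08", "A01", "D04"), each = 4),
  group = rep(c("200.000", "200.000", "Background", "100.000"), each = 4),
  time  = rep(c(0, 18, 36, 54), times = 4),
  ocr   = c(40, 42, 95, 20, 38, 41, 88, 18, 1, 0, 2, 1, 22, 23, 50, 11)
)

top <- xf %>%
  filter(group %in% c("200.000", "Background")) %>%  # the data that went into ggplot()
  filter(time == max(time)) %>%                       # only the last measurement
  filter(ocr == max(ocr))                             # only the highest OCR at that time
top
```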
Thus we are filtering all the way down to a single row of the dataset. The well name (“C08” or “B08”) is then passed to the label argument of geom_text.
Let's do some more layout adjustments. Although theme_bw gives a basic plotting layout, we often want to change the formatting. There are again great resources for this, for example https://ggplot2.tidyverse.org/articles/faq-customising.html, but we explain the basics here. By giving options to the theme function, we can change specific elements of a ggplot.
For example, if we want to change the text size of the axis titles (or leave them blank), we give arguments to the axis.title option. Also note the rel(1.2) argument, which means 1.2 times the base_size. I think it is good practice to use rel here instead of absolute numbers.
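Here is a sketch of the axis.title tweak on toy data. The commented-out line shows the element_blank() alternative for leaving the titles out:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the XF data (made-up numbers, not real measurements)
xf <- tibble(
  well  = rep(c("B08", "C08", "A01", "D04"), each = 4),
  group = rep(c("200.000", "200.000", "Background", "100.000"), each = 4),
  time  = rep(c(0, 18, 36, 54), times = 4),
  ocr   = c(40, 42, 95, 20, 38, 41, 88, 18, 1, 0, 2, 1, 22, 23, 50, 11)
)

p <- xf %>%
  filter(group %in% c("200.000", "Background")) %>%
  ggplot(aes(x = time, y = ocr, color = group)) +
  geom_line(aes(group = well)) +
  theme_bw(base_size = 16) +
  theme(
    # rel(1.2): 1.2 times the inherited size, i.e. 1.2 x 16 = 19.2 pt here
    axis.title = element_text(size = rel(1.2))
    # axis.title = element_blank()  # alternative: remove the axis titles
  )
p
```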
Change rel(1.2) to rel(0.5) in the above code and see what happens.
Next, we change the grid lines:
Next, we change the orientation of the x axis labels.
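Both tweaks can go into the same theme() call. Here is a sketch on toy data. Removing the vertical major grid lines and using a 45 degree angle are example choices:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the XF data (made-up numbers, not real measurements)
xf <- tibble(
  well  = rep(c("B08", "C08", "A01", "D04"), each = 4),
  group = rep(c("200.000", "200.000", "Background", "100.000"), each = 4),
  time  = rep(c(0, 18, 36, 54), times = 4),
  ocr   = c(40, 42, 95, 20, 38, 41, 88, 18, 1, 0, 2, 1, 22, 23, 50, 11)
)

p <- xf %>%
  filter(group %in% c("200.000", "Background")) %>%
  ggplot(aes(x = time, y = ocr, color = group)) +
  geom_line(aes(group = well)) +
  theme_bw(base_size = 16) +
  theme(
    panel.grid.minor   = element_blank(),                    # remove minor grid lines
    panel.grid.major.x = element_blank(),                    # keep only horizontal major lines
    axis.text.x        = element_text(angle = 45, hjust = 1) # tilt the x-axis labels
  )
p
```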
ggiraph
ggiraph is another interesting package. It brings some nice interactivity into the plot. Since we are now working with the plot in a browser, this can be very handy. It can also be useful if we want to publish the plot as HTML rather than as a plain PDF. Unfortunately, ggiraph is also not available for wasm/webr, because one of its dependencies (uuid) is not available, and I also can't get it to run via quarto.
Add three vertical lines to the plot. You can use the geom_vline command with xintercept set at 15, 33 and 48, so that each line is approximately at an injection time point. Also give the lines a shade of grey, e.g. grey40.
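Here is a possible solution sketch on toy data. When xintercept is passed directly (outside aes()), ggplot2 draws the lines at fixed positions and does not inherit the plot's color mapping for this layer:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the XF data (made-up numbers, not real measurements)
xf <- tibble(
  well  = rep(c("B08", "C08", "A01", "D04"), each = 4),
  group = rep(c("200.000", "200.000", "Background", "100.000"), each = 4),
  time  = rep(c(0, 18, 36, 54), times = 4),
  ocr   = c(40, 42, 95, 20, 38, 41, 88, 18, 1, 0, 2, 1, 22, 23, 50, 11)
)

p <- xf %>%
  filter(group %in% c("200.000", "Background")) %>%
  ggplot(aes(x = time, y = ocr, color = group)) +
  geom_line(aes(group = well)) +
  geom_vline(xintercept = c(15, 33, 48), color = "grey40")  # approximate injection times
p
```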
Now add the injection labels. Use the annotate
command and *TODO: something seems to be missing here*
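One possible shape for this is shown below. It assumes the same three injection times as the vertical lines. The labels injection 1 to 3 are placeholders, so replace them with the compounds you actually injected. Using y = Inf together with vjust pins the text to the top of the panel:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the XF data (made-up numbers, not real measurements)
xf <- tibble(
  well  = rep(c("B08", "C08", "A01", "D04"), each = 4),
  group = rep(c("200.000", "200.000", "Background", "100.000"), each = 4),
  time  = rep(c(0, 18, 36, 54), times = 4),
  ocr   = c(40, 42, 95, 20, 38, 41, 88, 18, 1, 0, 2, 1, 22, 23, 50, 11)
)

p <- xf %>%
  filter(group %in% c("200.000", "Background")) %>%
  ggplot(aes(x = time, y = ocr, color = group)) +
  geom_line(aes(group = well)) +
  geom_vline(xintercept = c(15, 33, 48), color = "grey40") +
  # placeholder labels (hypothetical): one per vertical line, at the top of the panel
  annotate("text", x = c(15, 33, 48), y = Inf,
           label = c("injection 1", "injection 2", "injection 3"),
           vjust = 1.5, hjust = -0.05, size = 3)
p
```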
Use the facet_wrap command to plot all groups (except the background) in separate plots, showing the individual wells in each plot. First, we need to remove the background data. Instead of selecting all the groups we need, it is better and easier to filter out the background data using filter(group != "Background"). The != operator means “is not equal to”; it is the inverse of the == operator.
Next, add the facet_wrap command to the ggplot. I prefer to always put it at the bottom, so that I can easily see whether a plot is wrapped.
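Here is a sketch of the filter plus facet_wrap() step. The toy tibble now has three made-up density groups plus a background well. Coloring by well is our choice, so that the wells can be told apart inside each panel:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in with several density groups (made-up numbers, not real measurements)
xf <- tibble(
  well  = rep(c("B08", "C08", "D04", "E02", "A01"), each = 3),
  group = rep(c("200.000", "200.000", "100.000", "50.000", "Background"), each = 3),
  time  = rep(c(0, 24, 48), times = 5),
  ocr   = c(40, 95, 20, 38, 88, 18, 22, 50, 11, 12, 26, 6, 1, 2, 1)
)

p <- xf %>%
  filter(group != "Background") %>%  # drop the background wells
  ggplot(aes(x = time, y = ocr, color = well)) +
  geom_line() +
  facet_wrap(~group)                  # one panel per group, kept at the bottom
p
```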
The plot in exercise 3 already looks great, but the order of the plots is important! We would like to see it go from low to high OCR. We can fix that using commands from the forcats package. A nice and quick way to sort is based on the name of the group. It is important to realize that the Group column in the XF data contains characters, not numbers. That is also the reason why it doesn't get sorted in the most natural way: it is sorted character by character, starting with the first, so the “50.000” group comes last. If we changed the group column to double (a number format), it would sort better, but the group names would also change, because the . is recognized as a decimal separator (“200.000” becomes 200). So it is better to leave the group names as they are and do it differently.
In comes forcats: you can re-level and reorder the crap out of your data in the ggplot! We often do the releveling at the point where we use the variable, without changing the type of the column. That means you can use ~fct_reorder(group, group) in the facet_wrap instead of only ~group.
Please note that the first argument of fct_reorder is the variable that you plot, and the second argument is the variable used for sorting. In our case both are group, but we need to add something else: if we left it like this, there would be no difference from giving facet_wrap only ~group. Thus we turn the second argument into a number using as.double.
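Here is a short sketch that makes the sorting difference visible on a few group labels. It is followed by the reordered facets on toy data:

```r
library(dplyr)
library(ggplot2)
library(forcats)

groups <- c("200.000", "50.000", "100.000", "200.000")
levels(factor(groups))                          # character sorting: "100.000" "200.000" "50.000"
as.double(groups)                               # 200 50 100 200 ("." read as decimal separator)
levels(fct_reorder(groups, as.double(groups)))  # numeric order: "50.000" "100.000" "200.000"

# Toy stand-in with several density groups (made-up numbers, not real measurements)
xf <- tibble(
  well  = rep(c("B08", "C08", "D04", "E02", "A01"), each = 3),
  group = rep(c("200.000", "200.000", "100.000", "50.000", "Background"), each = 3),
  time  = rep(c(0, 24, 48), times = 5),
  ocr   = c(40, 95, 20, 38, 88, 18, 22, 50, 11, 12, 26, 6, 1, 2, 1)
)

p <- xf %>%
  filter(group != "Background") %>%
  ggplot(aes(x = time, y = ocr, color = well)) +
  geom_line() +
  facet_wrap(~ fct_reorder(group, as.double(group)))  # panels in numeric order
p
```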
Also, try what happens in the above code if you:
* remove as.double in the facet_wrap
* change the group column to double

Now that the facet_wrap is sorted nicely, we would also like to have the legend sorted nicely. Use the same fct_reorder trick to reorder the color legend.
If you didn't already change the title of the legend, do that as well. You can set the legend title manually using the name argument of the scale_color_brewer command.
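Here is a sketch of the sorted and renamed legend on toy data. Using the same fct_reorder() expression for the color and for the facets keeps both in the same numeric order. The legend title Cells per well is only an example:

```r
library(dplyr)
library(ggplot2)
library(forcats)

# Toy stand-in with several density groups (made-up numbers, not real measurements)
xf <- tibble(
  well  = rep(c("B08", "C08", "D04", "E02", "A01"), each = 3),
  group = rep(c("200.000", "200.000", "100.000", "50.000", "Background"), each = 3),
  time  = rep(c(0, 24, 48), times = 5),
  ocr   = c(40, 95, 20, 38, 88, 18, 22, 50, 11, 12, 26, 6, 1, 2, 1)
)

p <- xf %>%
  filter(group != "Background") %>%
  ggplot(aes(x = time, y = ocr,
             color = fct_reorder(group, as.double(group)))) +  # legend in numeric order
  geom_line(aes(group = well)) +
  scale_color_brewer(palette = "Set1", name = "Cells per well") +  # legend title
  facet_wrap(~ fct_reorder(group, as.double(group)))
p
```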
Please change the facet_wrap command so that the y-axis is not fixed for all groups, i.e. so that each individual plot gets its own y-axis scale.
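Here is a minimal sketch on toy data. The scales argument of facet_wrap() controls this, and scales = "free_y" frees only the y-axis while the x-axis stays shared:

```r
library(dplyr)
library(ggplot2)
library(forcats)

# Toy stand-in with several density groups (made-up numbers, not real measurements)
xf <- tibble(
  well  = rep(c("B08", "C08", "D04", "E02", "A01"), each = 3),
  group = rep(c("200.000", "200.000", "100.000", "50.000", "Background"), each = 3),
  time  = rep(c(0, 24, 48), times = 5),
  ocr   = c(40, 95, 20, 38, 88, 18, 22, 50, 11, 12, 26, 6, 1, 2, 1)
)

p <- xf %>%
  filter(group != "Background") %>%
  ggplot(aes(x = time, y = ocr, color = well)) +
  geom_line() +
  facet_wrap(~ fct_reorder(group, as.double(group)),
             scales = "free_y")  # each panel gets its own y-axis range
p
```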
Now it is up to you to build a whole ggplot using the XF data. Instead of plotting time vs OCR, now plot cell density vs maximal capacity. For this you need to know a few things.
stat_summary()
The previous plot showed the data from individual wells as well as the median for each group. You can also calculate the median before plotting, using the dplyr summarize command. You can find more information on summarize here: https://dplyr.tidyverse.org/reference/summarise.html
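Here are two ways to get the group medians, sketched on a hypothetical per-well table called cap. Its max_capacity column (made-up values) stands in for the maximal capacity you calculate yourself. stat_summary() computes the median while plotting, and summarize() computes it beforehand:

```r
library(dplyr)
library(ggplot2)

# Hypothetical per-well table; max_capacity holds made-up values
cap <- tibble(
  well  = c("A08", "B08", "H08", "A04", "D04", "H04", "A02", "E02", "H02"),
  group = rep(c("200.000", "100.000", "50.000"), each = 3),
  max_capacity = c(95, 88, 101, 50, 47, 55, 26, 24, 30)
)

# option 1: let ggplot2 compute the median at every x position
p <- cap %>%
  mutate(density = as.double(group)) %>%  # "200.000" -> 200
  ggplot(aes(x = density, y = max_capacity)) +
  geom_point() +
  stat_summary(fun = median, geom = "point", color = "red", size = 4)

# option 2: compute the medians yourself with summarize()
medians <- cap %>%
  group_by(group) %>%
  summarize(median_capacity = median(max_capacity), .groups = "drop")
medians
```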
We can also perform a linear regression of the maximal capacity against the cell density. For this we can use the geom_smooth command with the arguments method = "lm" and formula = y ~ x.
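Here is a sketch of the regression on the same kind of hypothetical cap table. We convert group to a number with as.double(), so “200.000” becomes 200 (thousand cells). lm() is fitted separately only to show the coefficients behind the plotted line:

```r
library(dplyr)
library(ggplot2)

# Hypothetical per-well table; max_capacity holds made-up values
cap <- tibble(
  well  = c("A08", "B08", "H08", "A04", "D04", "H04", "A02", "E02", "H02"),
  group = rep(c("200.000", "100.000", "50.000"), each = 3),
  max_capacity = c(95, 88, 101, 50, 47, 55, 26, 24, 30)
) %>%
  mutate(density = as.double(group))

p <- ggplot(cap, aes(x = density, y = max_capacity)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)  # straight-line fit with confidence band

# the same straight line as a model object, to inspect intercept and slope
fit <- lm(max_capacity ~ density, data = cap)
coef(fit)
```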
Also observe the difference when only using the data from rows A and H; you can uncomment the line in the above code for this. Note that I use the very useful str_detect function from the stringr package, which is also part of the tidyverse.
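Here is a sketch of the row selection. It assumes well names start with their plate row letter (as in “B08”). The regular expression ^[AH] matches names that begin with A or H:

```r
library(dplyr)
library(ggplot2)
library(stringr)

# Hypothetical per-well table; max_capacity holds made-up values
cap <- tibble(
  well  = c("A08", "B08", "H08", "A04", "D04", "H04", "A02", "E02", "H02"),
  group = rep(c("200.000", "100.000", "50.000"), each = 3),
  max_capacity = c(95, 88, 101, 50, 47, 55, 26, 24, 30)
)

rows_ah <- cap %>%
  filter(str_detect(well, "^[AH]"))  # keep wells in plate rows A and H only

p <- rows_ah %>%
  mutate(density = as.double(group)) %>%
  ggplot(aes(x = density, y = max_capacity)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
p
```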
Next, you can decide for yourself what you want to plot. Have a glimpse at the data and think of another important visualisation that you want to make, using all the tools you have learned so far or tools you found on the internet.