4 Plotting seahorse

Now, let's plot some Seahorse data. For this we need to import some of it into this session. The data is available from GitHub.

The data comes from PBMCs in which we followed OCR and ECAR over time using Extracellular Flux analysis with the XFe96. During the run we injected FCCP after three measurement phases and Antimycin/Rotenone (AM/Rot) after six measurement phases. Much more context can be found in Janssen et al., Scientific Reports.

As you can see from the glimpse, the data table that we have now (called a tibble in tidy language) contains 7 columns: Measurement, Well, Group, Time, OCR, ECAR and PER. The data is already nicely and tidily organized in the Rate sheet of the Excel file that we have loaded. The file was generated in the Agilent Wave software and comes directly from exporting the Seahorse data to xlsx.

I prefer to use lower case column names without any spaces, so we first turn these column names into lower case. We use some easy functions from the janitor package for this.
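As a minimal sketch of that renaming step, the example below applies `clean_names()` from janitor to a one-row tibble. The tibble `raw_toy` and its values are hypothetical stand-ins for the imported Rate sheet; only the seven column names are taken from the text above.

```r
library(tibble)
library(janitor)

# Hypothetical one-row stand-in for the imported Rate sheet (values invented)
raw_toy <- tibble(
  Measurement = 1L, Well = "A01", Group = "Background",
  Time = 1.3, OCR = 0, ECAR = 0, PER = 0
)

# clean_names() converts the column names to lower case snake_case
clean_toy <- clean_names(raw_toy)
names(clean_toy)
```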

Next, we can start plotting data using ggplot. Let’s introduce the filter command from dplyr. Whereas select selects columns, filter selects rows. So let’s filter the rows for the group which is labeled “200.000” (200.000 cells per well) and the “Background” group.

The filter command

Filtering data means selecting rows based on one or more conditions. You need to understand some semantics here. To keep rows that match any of several values we use group %in% c("200.000", "Background"); to keep rows that match a single value we can use group == "200.000". The %in% operator tests, for each element on its left, whether it occurs anywhere in the vector on its right.

1 %in% c(1, 2, 3, 4, 5) #is TRUE

[1] TRUE

# just like
1 == 1 #is TRUE

[1] TRUE

#the reverse is also possible
c(1,2,3,4,5) %in% 1

[1]  TRUE FALSE FALSE FALSE FALSE

#is TRUE FALSE FALSE FALSE FALSE

Thus the group %in% c("200.000", "Background") statement in the filter function above tells which group items to keep. For 200.000 there is a match (TRUE), but for 100.000 there is no match (FALSE).
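To see the two kinds of condition side by side, here is a small sketch on invented data. The tibble `xf_toy` is a hypothetical stand-in for the cleaned XF data; the well names and OCR values are made up.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the cleaned XF data (values invented)
xf_toy <- tibble(
  well  = rep(c("A01", "B08", "C08", "D08"), each = 3),
  group = rep(c("Background", "200.000", "200.000", "100.000"), each = 3),
  time  = rep(c(1.3, 7.9, 14.5), times = 4),
  ocr   = c(0, 1, -1, 150, 148, 145, 170, 168, 166, 90, 88, 87)
)

# Several allowed values: use %in%
two_groups <- xf_toy %>% filter(group %in% c("200.000", "Background"))

# One allowed value: use ==
one_group <- xf_toy %>% filter(group == "200.000")

unique(two_groups$group)
unique(one_group$group)
```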

Now that we know how to filter we can use the filtered data to make the ggplot.

That plot is not very informative. Let’s make it prettier. First, add a line plot:

Next, change colors:

Change theme and text size:

Add titles:
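The steps above, collected into one sketch. It uses a hypothetical toy tibble (`xf_toy`, values invented) instead of the course data, and the title text and `base_size` value are assumptions; Set1 and theme_bw are the palette and theme mentioned in this chapter.

```r
library(dplyr)
library(tibble)
library(ggplot2)

# Hypothetical stand-in for the cleaned XF data (values invented)
xf_toy <- tibble(
  well  = rep(c("A01", "B08", "C08", "D08"), each = 3),
  group = rep(c("Background", "200.000", "200.000", "100.000"), each = 3),
  time  = rep(c(1.3, 7.9, 14.5), times = 4),
  ocr   = c(0, 1, -1, 150, 148, 145, 170, 168, 166, 90, 88, 87)
)

p <- xf_toy %>%
  filter(group %in% c("200.000", "Background")) %>%
  ggplot(aes(x = time, y = ocr, color = group)) +
  geom_point() +                            # the first, bare plot
  geom_line(aes(group = well)) +            # one line per well
  scale_color_brewer(palette = "Set1") +    # change colors
  theme_bw(base_size = 16) +                # change theme and text size
  labs(                                     # add titles (wording assumed)
    title = "OCR over time",
    x = "time (minutes)",
    y = "OCR (pmol/min)"
  )

p
```

Mapping `group = well` inside geom_line is what keeps the wells as separate lines while the color still follows the experimental group.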

This is a very nice plot now! It shows all OCR curves for each well for the 200.000 and the background groups. What the plot does not yet show is which line belongs to which well.

We could consider coloring each line, but there are too many wells so it will not look nice! Try it yourself:

Change color = group to color = well in the above code.
You will notice that there are not enough colors in the brewer palette Set1, so go back to the default coloring by removing the scale_color_brewer line as well. Use the # to comment out the line.
Now notice that the legend is huge and not completely visible, again indicating that this is not the way to go.

Instead, we can try to label the lines. We’ll use the geom_text or annotate commands from ggplot.

ggrepel

For plot labels in personal R projects, consider using ggrepel. Unfortunately, it is not available for webr, but it has the benefit of automatically preventing text overlap.

Although we now labeled lines that are at the minimum and maximum OCR, this is only useful for this one plot in these conditions. The position of the label is tweaked based on this specific plot, making this not such a quick solution to our problem.

Subsetting of data within the ggplot commands

In the above ggplot commands, we included the geom_text, but we only used a subset of the full data for this geom. We use the . (dot) placeholder to refer to the original data (so in our case the filtered data that went into the ggplot), and pipe that into another two filters. Basically we do the following, but then within one layer of the ggplot:

Thus here we are filtering all the way to getting only one row of the full dataset. The well name “C08” or “B08” is then given to the label argument of geom_text.
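A minimal sketch of this idea on invented data. The tibble `xf_toy` and the helper name `pick_label_row` are hypothetical, and the second filter used here (the last time point) is an assumption chosen for illustration, not necessarily the condition used in the course code.

```r
library(dplyr)
library(tibble)
library(ggplot2)

# Hypothetical stand-in for the cleaned XF data (values invented)
xf_toy <- tibble(
  well  = rep(c("A01", "B08", "C08", "D08"), each = 3),
  group = rep(c("Background", "200.000", "200.000", "100.000"), each = 3),
  time  = rep(c(1.3, 7.9, 14.5), times = 4),
  ocr   = c(0, 1, -1, 150, 148, 145, 170, 168, 166, 90, 88, 87)
)

# A pipe that starts with the dot is a function waiting for data
pick_label_row <- . %>% filter(well == "C08") %>% filter(time == max(time))
label_row <- pick_label_row(xf_toy)   # exactly one row is left

# The same pipe used inline as the data of a single layer
p <- xf_toy %>%
  filter(group %in% c("200.000", "Background")) %>%
  ggplot(aes(x = time, y = ocr, color = group)) +
  geom_line(aes(group = well)) +
  geom_text(
    data = . %>% filter(well == "C08") %>% filter(time == max(time)),
    aes(label = well), hjust = 0, nudge_x = 0.3, show.legend = FALSE
  )

p
```

Because ggplot accepts a function as the data argument of a layer, the dot pipe is applied to the plot data only for that geom_text layer; the other layers still see all rows.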

Let’s do some more layout adjustments. Although the theme_bw gives a basic plotting layout, we often want to change the formatting. There are again great resources for this, for example this one: https://ggplot2.tidyverse.org/articles/faq-customising.html, but we explain the basics here. By giving options to the theme function we can change specific elements of a ggplot.

For example, if we want to change the text size of the axis title (or leave it blank), we give arguments to the axis.title options. Also note the rel(1.2) argument, which sets the size to 1.2 times the size the element would otherwise inherit (ultimately derived from base_size). I think it is good practice to use rel here instead of absolute numbers.
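A small sketch of such a theme call, using invented data (`xf_toy` is hypothetical). Which axis title is blanked and which is enlarged is an arbitrary choice made for illustration.

```r
library(tibble)
library(ggplot2)

# Hypothetical stand-in for the cleaned XF data (values invented)
xf_toy <- tibble(
  well = rep(c("B08", "C08"), each = 3),
  time = rep(c(1.3, 7.9, 14.5), times = 2),
  ocr  = c(150, 148, 145, 170, 168, 166)
)

p <- ggplot(xf_toy, aes(x = time, y = ocr, group = well)) +
  geom_line() +
  theme_bw(base_size = 16) +
  theme(
    axis.title.x = element_blank(),                  # leave the x title blank
    axis.title.y = element_text(size = rel(1.2))     # 1.2 times the inherited size
  )

p
```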

Change the rel 1.2 to 0.5 in the above code and see what happens.

Next, we change the grid lines:

Next, we change the orientation of the x axis labels.

ggiraph

ggiraph is another interesting package. It brings some nice interactivity into the plot. Since we are now working with the plot in a browser, this can be very handy. It can also be useful if we want to publish the plot as html and not as a plain PDF. Unfortunately, ggiraph is also not available for wasm/webr, since one of its dependencies, uuid, is not available, and I also can’t get it to run via quarto.

Exercise 1

Add three vertical lines to the plot. You can use the geom_vline command with the xintercept argument set to 15, 33 and 48, so that the lines are approximately at the injection time points. Also give them a shade of grey, e.g. grey40.

Solution to Exercise 1
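One possible solution, sketched on invented data rather than the course data (`xf_toy` is hypothetical, so the curves themselves mean nothing here):

```r
library(tibble)
library(ggplot2)

# Hypothetical stand-in for the cleaned XF data (values invented)
xf_toy <- tibble(
  well = rep(c("B08", "C08"), each = 4),
  time = rep(c(1.3, 20, 40, 60), times = 2),
  ocr  = c(150, 260, 255, 30, 170, 290, 280, 35)
)

p <- ggplot(xf_toy, aes(x = time, y = ocr, group = well)) +
  geom_line() +
  # xintercept accepts a vector, so one call draws all three lines
  geom_vline(xintercept = c(15, 33, 48), color = "grey40") +
  theme_bw()

p
```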

Exercise 2

Now add the injection labels. Use the annotate command and *TODO: something seems to be missing here*

Solution to Exercise 2

Exercise 3

Use the facet_wrap command to plot all groups (except background) in separate plots and in each plot show the wells. First, we need to filter away the background data. Instead of selecting all the groups we need, it is better and easier to filter out the background data using filter(group != "Background"). The != means “is not”, and it is the inverse of the == operator.

Next, add the facet_wrap command to the ggplot. I prefer to do that always at the bottom, so that I can easily see if a plot is wrapped.

Solution to Exercise 3
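One possible solution, sketched on invented data (`xf_toy` is a hypothetical stand-in for the cleaned XF data):

```r
library(dplyr)
library(tibble)
library(ggplot2)

# Hypothetical stand-in for the cleaned XF data (values invented)
xf_toy <- tibble(
  well  = rep(c("A01", "B08", "C08", "D08"), each = 3),
  group = rep(c("Background", "200.000", "200.000", "100.000"), each = 3),
  time  = rep(c(1.3, 7.9, 14.5), times = 4),
  ocr   = c(0, 1, -1, 150, 148, 145, 170, 168, 166, 90, 88, 87)
)

# != keeps every row whose group is not "Background"
no_background <- xf_toy %>% filter(group != "Background")

p <- ggplot(no_background, aes(x = time, y = ocr, color = group)) +
  geom_line(aes(group = well)) +
  theme_bw() +
  facet_wrap(~group)   # kept at the bottom so it is easy to spot

p
```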

Exercise 4

The plot in exercise 3 looks great already, but the order of the plots is important! We would like to see it go from low to high OCR. We can fix that using the forcats package commands. A nice and quick way to sort is based on the name of the group. It is important to realize that the values in the Group column of the XF data are characters and not numbers. That is also the reason why the groups don’t get sorted in the most natural way. They are sorted character by character, starting with the first, thus the “50.000” group comes last. If we changed the “group” column to double (that is a number format), it would sort better, but the group names would also change because the . is then read as a decimal separator. So it is better to leave the group names as they are and do it differently.

In comes forcats: you can re-level and reorder the crap out of your data in the ggplot! We often do the releveling at the point where the parameter is used, without making any changes to the type of the columns. So that means you can use ~fct_reorder(group, group) in the facet_wrap instead of only ~group.

Please note that the first argument of fct_reorder is the parameter that you plot or need, and the second argument is the parameter that is used for sorting the data. In our case both are “group”, but we need to add something else: written like this, the result is no different from using only ~group in facet_wrap. Thus we turn the second argument into a number by using as.double.
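The effect is easiest to see on the factor levels themselves. The short sketch below uses a hypothetical vector of group names and compares the default ordering with the one produced by sorting on the numeric value.

```r
library(forcats)

# Hypothetical group labels, stored as characters like in the XF data
group <- c("50.000", "100.000", "200.000", "100.000")

# Default: sorted as text, so "50.000" ends up last
default_levels <- levels(factor(group))

# Reordered by numeric value; the labels themselves stay unchanged
by_number <- fct_reorder(group, as.double(group))
numeric_levels <- levels(by_number)

default_levels
numeric_levels
```

Note that as.double("200.000") is 200, not 200000, which is exactly why the conversion is only used for sorting and not for the labels.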

Also, try what happens in the above code if you:

only use as.double in the facet_wrap
change the type of data to double for the group column

Solution to Exercise 4

Exercise 5

Now that the facet_wrap is sorted nicely, we would also like to have the legend sorted nicely. Use the same fct_reorder trick to reorder the color legend.

If you didn’t already change the title of the legend, do that as well. You can specify the name of the legend manually using the name argument in the scale_color_brewer command.

Solution to Exercise 5

Exercise 6

Please change the facet_wrap command so that the y-axis is not fixed for all groups. Make the output so that each individual plot has its own y-axis scale.

Solution to Exercise 6
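One possible solution, sketched on invented data (`xf_toy` is hypothetical). The scales argument of facet_wrap controls which axes may differ between panels.

```r
library(dplyr)
library(tibble)
library(ggplot2)

# Hypothetical stand-in for the cleaned XF data (values invented)
xf_toy <- tibble(
  well  = rep(c("B08", "C08", "D08"), each = 3),
  group = rep(c("200.000", "200.000", "100.000"), each = 3),
  time  = rep(c(1.3, 7.9, 14.5), times = 3),
  ocr   = c(150, 148, 145, 170, 168, 166, 90, 88, 87)
)

p <- ggplot(xf_toy, aes(x = time, y = ocr, color = group)) +
  geom_line(aes(group = well)) +
  theme_bw() +
  # "free_y" gives each panel its own y-axis; the x-axis stays shared
  facet_wrap(~group, scales = "free_y")

p
```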

Exercise 7

Now, it is up to you to build a whole ggplot using the XF data. Instead of plotting time vs OCR, now plot cell density vs maximal capacity. For this you need to know some stuff.

we define maximal capacity as the OCR at measurement 4
we should filter out the “Background” group
we should convert the group names to numbers
we can also add the mean of all wells for each group by using: stat_summary()

Solution to Exercise 7
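One possible solution, sketched on invented data (`xf_toy` is hypothetical). Multiplying by 1000 to get back to cells per well is an assumption made here, because as.double reads the . in the group name as a decimal separator.

```r
library(dplyr)
library(tibble)
library(ggplot2)

# Hypothetical stand-in for the cleaned XF data (values invented)
xf_toy <- tibble(
  measurement = c(3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L),
  well  = c("A01", "A01", "A02", "B01", "B02", "C01", "C02", "H12"),
  group = c("50.000", "50.000", "50.000", "100.000", "100.000",
            "200.000", "200.000", "Background"),
  ocr   = c(20, 40, 44, 85, 91, 160, 172, 0)
)

max_capacity <- xf_toy %>%
  filter(measurement == 4, group != "Background") %>%
  mutate(cells = as.double(group) * 1000)   # "50.000" becomes 50, then 50000

p <- ggplot(max_capacity, aes(x = cells, y = ocr)) +
  geom_point() +                                              # individual wells
  stat_summary(fun = mean, geom = "point",
               color = "red", size = 4) +                     # mean per density
  labs(x = "cells per well", y = "maximal capacity (OCR)") +
  theme_bw()

p
```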

Exercise 8

The previous plot showed the data from individual wells as well as the median for that group. You can also calculate the median before plotting using the dplyr summarize command. You can find summarize info here: https://dplyr.tidyverse.org/reference/summarise.html

Solution to Exercise 8
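One possible solution, sketched on invented data (`max_capacity` is a hypothetical table of the kind exercise 7 produces, with one row per well at measurement 4):

```r
library(dplyr)
library(tibble)

# Hypothetical maximal-capacity table (values invented)
max_capacity <- tibble(
  well  = c("A01", "A02", "B01", "B02", "C01", "C02"),
  cells = c(50000, 50000, 100000, 100000, 200000, 200000),
  ocr   = c(40, 44, 85, 91, 160, 172)
)

# One row per cell density, holding the median OCR of its wells
medians <- max_capacity %>%
  group_by(cells) %>%
  summarize(median_ocr = median(ocr), .groups = "drop")

medians
```

The resulting table can be handed to its own geom, for example geom_point(data = medians, aes(y = median_ocr)), instead of letting stat_summary compute the summary inside the plot.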

Exercise 9

We can also perform a linear regression on the maximal capacity at different densities. For this we can use the geom_smooth command. The arguments should be method = "lm" and formula = y~x.

Solution to Exercise 9

Observe also what the difference is when only using the data from rows A and H. You can uncomment the line in the above code. Please note I use the very useful str_detect function for this from the stringr package that is also part of the tidyverse.
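A sketch of the regression layer together with the row filter, on invented data (`max_capacity` is hypothetical). The regular expression "^[AH]" is one way to match wells whose name starts with A or H; the pattern used in the course code may differ.

```r
library(dplyr)
library(tibble)
library(stringr)
library(ggplot2)

# Hypothetical maximal-capacity table (values invented)
max_capacity <- tibble(
  well  = c("A01", "D01", "A02", "D02", "H03", "D03"),
  cells = c(50000, 50000, 100000, 100000, 200000, 200000),
  ocr   = c(40, 44, 85, 91, 160, 172)
)

# str_detect returns TRUE for wells in plate rows A and H
rows_a_h <- max_capacity %>% filter(str_detect(well, "^[AH]"))

p <- ggplot(max_capacity, aes(x = cells, y = ocr)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +   # straight-line fit
  theme_bw()

p
```

Swapping `max_capacity` for `rows_a_h` in the ggplot call fits the line to the wells in rows A and H only.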

Exercise 10

Next, you can decide yourself what you want to plot. Have a glimpse at the data and think of another important visualisation that you want to make using all the tools that you have learned so far, or the tools that you found on the internet.