7 Seahtrue outputs
7.1 Nested tibbles
The run_seahtrue function returns its data as a list of lists, which is also called nested data. The advantage is that the data is well organized and still easily accessible. Here is an example taken from a tidyr vignette: https://tidyr.tidyverse.org/articles/nest.html.
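A minimal sketch of this kind of nesting, assuming the tidyr package is installed. It may differ from the exact code in the vignette, and the name mtcars_nested is only used for illustration.

```r
library(tidyr)

# Nest every column except cyl into a list-column called "data".
mtcars_nested <- nest(mtcars, data = -cyl)

# Printing shows one row per cyl value, each with its own nested tibble.
mtcars_nested
```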
You can see that the data is now organized by the cylinder parameter, cyl. Since cyl takes only three different values in the mtcars dataset, there are now three rows and two columns: one column holds cyl, and all other data is nested into the data column.
In a recent tidyverse release (dplyr 1.1.0), the .by argument was introduced. Previously we used group_by to tell R how to organize the data. With group_by, the grouping remains attached to the tibble, which can lead to unintended results when you forget that the tibble is grouped. The grouping can be undone with the ungroup command.
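To illustrate how the grouping stays attached, here is a small sketch on mtcars, assuming dplyr is installed. The object names are made up for this example.

```r
library(dplyr)

# group_by() attaches the grouping to the returned tibble.
cars_grouped <- mtcars %>% group_by(cyl)
is_grouped_df(cars_grouped)

# Because the grouping is still attached, mean(mpg) is computed per cyl group.
cars_grouped %>% mutate(mean_mpg = mean(mpg))

# ungroup() removes the grouping again.
cars_ungrouped <- cars_grouped %>% ungroup()
is_grouped_df(cars_ungrouped)
```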
With .by, the grouping applies only within the function call in which you pass it as an argument. group_by and .by do similar things, so either can be used. Let’s have a look at how they work:
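The two approaches could look like the sketch below. It assumes dplyr 1.1.0 or later for .by, and the summary column mean_mpg is only an illustrative choice.

```r
library(dplyr)

# Option 1: group_by() followed by summarize().
via_group_by <- mtcars %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg))

# Option 2: per-operation grouping with the .by argument.
via_by <- mtcars %>%
  summarize(mean_mpg = mean(mpg), .by = cyl)

# Inspect both results, including their classes.
glimpse(via_group_by)
glimpse(via_by)
class(via_group_by)
class(via_by)
```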
If you glimpse the results of the two ways of grouping above, you will see that group_by changes your data in ways you might not want. In this case it turns the mtcars data frame into a tibble, whereas the result of .by in the summarize function is still a data frame. Although it might not really matter whether your data is a tibble or a data frame, it shows that group_by is a bit more invasive on your data.
You can use pluck to get to the nested data. Basically, you just pluck a part of the data out of the full dataset.
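As a sketch, plucking the first nested tibble from a nested version of mtcars could look like this. It assumes tidyr and purrr are installed, and the object names are illustrative.

```r
library(tidyr)
library(purrr)

mtcars_nested <- nest(mtcars, data = -cyl)

# Pluck the first element of the "data" list-column.
first_group <- pluck(mtcars_nested, "data", 1)
first_group
```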
Please note that we use "data" here instead of data. It can be confusing when to use the quotation marks ("") and when not. For example, with the pull function, which takes one full column out of a tibble, you do not use quotation marks.
Also, pluck retrieves its components by index, so it is not possible to directly get the element that belongs to cyl == 4, for example. You would need to filter on that parameter first and then pluck the first row of data.
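The filter-then-pluck pattern can be sketched as follows. The value cyl == 4 is used because mtcars only contains cars with 4, 6, or 8 cylinders, and the object names are illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)

mtcars_nested <- nest(mtcars, data = -cyl)

# Filter on the cylinder value first, then pluck the single remaining element.
four_cyl <- mtcars_nested %>%
  filter(cyl == 4) %>%
  pluck("data", 1)

nrow(four_cyl)
```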
7.2 The purrr map function
The cool thing about a nested tibble is that you can quickly perform an operation on each nested tibble. A really good introduction is this blog post by Rebecca Barter: https://www.rebeccabarter.com/blog/2019-08-19_purrr. You can map a function over the items of the data column, one per row.
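A minimal sketch of mapping a function over the nested tibbles, fitting one linear model per cylinder group. The formula mpg ~ wt is an assumption chosen for illustration, as is the object name models.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Fit a linear model on each nested tibble and store it in a new list-column.
models <- mtcars %>%
  nest(data = -cyl) %>%
  mutate(model = map(data, function(df) lm(mpg ~ wt, data = df)))

models
```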
You see that a new column named model is generated. If you pluck one of the models, you can see the typical output of the linear model (lm) function. For each cylinder value you have now created a linear model!
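Plucking one of the fitted models could look like the sketch below. It rebuilds the model column so that it runs on its own, and it again assumes the illustrative formula mpg ~ wt.

```r
library(dplyr)
library(tidyr)
library(purrr)

models <- mtcars %>%
  nest(data = -cyl) %>%
  mutate(model = map(data, function(df) lm(mpg ~ wt, data = df)))

# Take the first fitted model out of the "model" list-column and inspect it.
first_model <- pluck(models, "model", 1)
summary(first_model)
coef(first_model)
```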
The semantics of the map function and how to use it are nicely explained in the blog post referenced above. Some more pointers: another good resource for the purrr map function is https://dcl-prog.stanford.edu/purrr-basics.html. map also has many more variants and ways to use it, which are summarized in its cheat sheet: https://github.com/rstudio/cheatsheets/blob/main/purrr.pdf.
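As one example of these variants, map_int and map_dbl return plain integer and double vectors instead of lists. The sketch below uses them on the nested mtcars data, and the column names n_cars and mean_mpg are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

per_cyl <- mtcars %>%
  nest(data = -cyl) %>%
  mutate(
    # map_int() returns an integer vector: the number of cars per group.
    n_cars = map_int(data, nrow),
    # map_dbl() returns a double vector: the mean mpg per group.
    mean_mpg = map_dbl(data, function(df) mean(df$mpg))
  )

per_cyl
```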
7.3 The seahtrue output
Now have a look at the run_seahtrue output.
You can also pluck some of the data.
Some data are simple character strings, like the date column, whereas others are large tables, like the raw_data column.
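The sketch below shows the same plucking pattern on a small mock tibble that stands in for the real output. Everything in it is an assumption for illustration: the values are invented, the date format is a placeholder, and O2_mmHg is a hypothetical column name. Only the column names date, raw_data, rate_data, timescale, time_wave, and OCR_wave_bc come from this chapter.

```r
library(tibble)
library(purrr)

# Mock stand-in for the run_seahtrue output; all values are invented.
mock_output <- tibble(
  date = "2024-01-01",
  raw_data = list(tibble(
    timescale = c(0, 14, 28),
    O2_mmHg = c(151.2, 150.8, 150.1)  # hypothetical column name
  )),
  rate_data = list(tibble(
    time_wave = c(1.3, 7.9, 14.5),
    OCR_wave_bc = c(102, 99, 97)
  ))
)

# A simple character string.
pluck(mock_output, "date", 1)

# A nested table.
pluck(mock_output, "raw_data", 1)
```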
With this loaded data (seahtrue_output_donor_A) you can now make similar plots as in the plotting seahorse chapter. For this we only have to pluck the rate_data out of the data set. Be careful: we preprocessed the data, so the column names are different now. First glimpse the data.
You will see that the column names are labeled with wave. In this way we can distinguish, for example, the time column in the raw_data tibble from the time_wave column in the rate_data tibble. Also, please notice that we have OCR_wave_bc and OCR_wave. This distinction is made because OCR data can be background corrected or not. When you click the background slider in the Wave software from Agilent, the OCR data is changed to non-background-corrected. If the data is exported at this point, the xlsx input file is not background corrected. In the seahtrue output this will show up as OCR_wave. Typically, however, the data is background corrected, so most of the time we have OCR_wave_bc.
Since a rate is an aggregate of multiple O2 or pH readings, the timing of each measurement is also defined differently in the rate_data and the raw_data. Therefore, in the seahtrue package the two times are labeled differently. For the rate_data we labeled it time_wave, and for the raw_data we labeled it timescale. And again, we used timescale to distinguish it from the time in the original input file.
Please note that if we want to plot OCR vs time, we have to use OCR_wave_bc vs time_wave in our ggplot aesthetics.
It is good practice to have a quick look at how the groups were named in the experiment. We can use the pull(group)
and unique()
commands for this:
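A sketch of this check on a mock rate_data tibble. The group names and values below are invented placeholders and will differ from those in a real experiment.

```r
library(dplyr)
library(tibble)

# Mock stand-in for the plucked rate_data; group names and values are invented.
mock_rate_data <- tibble(
  group = rep(c("Background", "control", "treated"), each = 2),
  time_wave = rep(c(1.3, 7.9), times = 3),
  OCR_wave_bc = c(0, 0, 100, 98, 150, 148)
)

# List each group name once, in order of first appearance.
group_names <- mock_rate_data %>%
  pull(group) %>%
  unique()

group_names
```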
Next, take some of the groups and plot them in a ggplot:
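Such a plot could be sketched as follows, again on mock data. The group names and values are invented, so the resulting figure is only a placeholder for the plot of the real rate_data. Only the column names group, time_wave, and OCR_wave_bc are taken from this chapter.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Mock stand-in for the plucked rate_data; group names and values are invented.
mock_rate_data <- tibble(
  group = rep(c("control", "treated", "other"), each = 3),
  time_wave = rep(c(1.3, 7.9, 14.5), times = 3),
  OCR_wave_bc = c(100, 98, 97, 150, 148, 151, 60, 61, 59)
)

# Keep two of the groups and plot background-corrected OCR against time.
ocr_plot <- mock_rate_data %>%
  filter(group %in% c("control", "treated")) %>%
  ggplot(aes(x = time_wave, y = OCR_wave_bc, color = group)) +
  geom_point() +
  labs(x = "time_wave", y = "OCR_wave_bc")

ocr_plot
```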
Great, this looks exactly the same as the plot we generated using the data from the downloaded Excel file in the “plotting seahorse” chapter.