3 Plotting cars
3.1 Learn and code
First, let’s make a simple scatter plot. We’ll use the mtcars dataset, a famous dataset that is used a lot in R for educational purposes. The name stands for “Motor Trend Car Road Tests”. See the parameter overview and documentation for more information about the mtcars dataset. It is one of many datasets that ship with base R or the tidyverse packages, so we can always call it without having to load it.
First, we will inspect the dataset. For this we will load the tidyverse:
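A minimal sketch of the loading step, assuming the tidyverse package is already installed (for example with install.packages("tidyverse")):

```r
# Attach the tidyverse collection of packages (includes ggplot2 and dplyr)
library(tidyverse)
```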
Once tidyverse is loaded via the library call, it stays available for the rest of your current session in your browser, so you do not have to load it each time. Let’s have a look at the full dataset. There are several ways to do this:
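The calls below are a few common ways to inspect a data frame. Which of them the original page showed is an assumption; all are standard base R or dplyr functions.

```r
library(tidyverse)

mtcars           # print the whole data frame (32 rows, 11 columns)
head(mtcars)     # only the first six rows
glimpse(mtcars)  # compact overview with one line per column (dplyr)
str(mtcars)      # base R structure summary
```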
Let’s select a small part of the data using select from the dplyr package:
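A short sketch of such a selection. The three columns chosen here and the name mtcars_small are illustrative assumptions, not necessarily the ones used originally.

```r
library(tidyverse)

# Keep only three of the eleven columns (illustrative choice)
mtcars_small <- mtcars %>%
  select(mpg, disp, cyl)

head(mtcars_small)
```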
Next, make a simple plot with the miles per gallon (mpg) and displacement (disp) parameters in the mtcars dataset.
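A basic scatter plot could look like the sketch below. Putting mpg on the x-axis and disp on the y-axis is an assumption; the text does not say which parameter goes on which axis.

```r
library(tidyverse)

# One point per car: miles per gallon against displacement
p <- ggplot(data = mtcars, mapping = aes(x = mpg, y = disp)) +
  geom_point()
p
```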
This is a very basic plot, without much formatting. Let’s make it prettier!
Add color and bring in a third parameter:
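One way to bring in a third parameter is to map it to the color aesthetic. The sketch below assumes that cyl is the third parameter and that mpg and disp stay on the x- and y-axis.

```r
library(tidyverse)

# cyl is numeric, so ggplot draws a continuous color gradient
p <- ggplot(mtcars, aes(x = mpg, y = disp, color = cyl)) +
  geom_point()
p
```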
Here we need to have a look at data types. The cyl parameter is numerical, so ggplot automatically assumes we want a continuous scale for it. However, cyl is really a categorical variable (each car has either 4, 6 or 8 cylinders), so we can explicitly make the cyl parameter categorical like this:
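A sketch of the conversion, assuming it is done with factor() directly inside aes(); converting the column beforehand with mutate() would work equally well.

```r
library(tidyverse)

# factor(cyl) turns the numbers 4, 6 and 8 into three discrete levels,
# so each level gets its own color instead of a gradient
p <- ggplot(mtcars, aes(x = mpg, y = disp, color = factor(cyl))) +
  geom_point()
p

levels(factor(mtcars$cyl))
```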
If you want to have different colors you can use one of the many color palettes available:
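As an illustration, the sketch below uses a ColorBrewer palette through scale_color_brewer(). The palette name "Dark2" is an arbitrary choice, not necessarily the one used originally.

```r
library(tidyverse)

p <- ggplot(mtcars, aes(x = mpg, y = disp, color = factor(cyl))) +
  geom_point() +
  scale_color_brewer(palette = "Dark2")  # qualitative ColorBrewer palette
p
```

For discrete data, scale_color_viridis_d() is another ready-made option.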
Apart from color you can change the shape of the datapoints:
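A sketch of mapping the same categorical parameter to both color and shape, under the same axis assumptions as before:

```r
library(tidyverse)

# Shape needs a discrete variable, hence factor(cyl)
p <- ggplot(mtcars, aes(x = mpg, y = disp,
                        color = factor(cyl), shape = factor(cyl))) +
  geom_point()
p
```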
ggplot can use different themes for your plots, and there are many options to tweak your plots the way you like. You can see some examples below:
Let’s change titles:
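Titles and axis labels can be set with labs(). The label texts below are illustrative; the units follow the mtcars documentation (miles per US gallon, cubic inches).

```r
library(tidyverse)

p <- ggplot(mtcars, aes(x = mpg, y = disp, color = factor(cyl))) +
  geom_point() +
  labs(
    title = "Fuel efficiency versus engine displacement",
    x = "Miles per (US) gallon",
    y = "Displacement (cu. in.)",
    color = "Cylinders"  # legend title
  )
p
```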
Change the plotting theme and base size of the elements:
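A sketch using one of the built-in themes. Both the choice of theme_bw() and the base size of 16 are assumptions made for illustration.

```r
library(tidyverse)

p <- ggplot(mtcars, aes(x = mpg, y = disp, color = factor(cyl))) +
  geom_point() +
  theme_bw(base_size = 16)  # white background, larger text everywhere
p
```

Other complete themes such as theme_classic() or theme_minimal() accept the same base_size argument.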
Change the scaling of the axes. It is good practice to plot graphs from zero:
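One possible way to start both axes at zero is to set the limits explicitly. The sketch below is an assumption about how this could be done: it ends each axis exactly at the largest value and removes the default padding, so points on the edge are cut in half.

```r
library(tidyverse)

p <- ggplot(mtcars, aes(x = mpg, y = disp, color = factor(cyl))) +
  geom_point() +
  # axes run from zero to the largest observed value, without padding
  scale_x_continuous(limits = c(0, max(mtcars$mpg)), expand = c(0, 0)) +
  scale_y_continuous(limits = c(0, max(mtcars$disp)), expand = c(0, 0))
p
```

expand_limits(x = 0, y = 0) is an alternative that only forces zero to be included and keeps the default padding.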
Now the datapoints at the maxima of the axes are not completely visible, so it would be nice to have some more space:
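Extra room can be created by choosing upper limits somewhat beyond the largest values. The limits 40 and 500 below are illustrative round numbers above the maxima of mpg and disp.

```r
library(tidyverse)

p <- ggplot(mtcars, aes(x = mpg, y = disp, color = factor(cyl))) +
  geom_point() +
  scale_x_continuous(limits = c(0, 40)) +   # largest mpg value is below 40
  scale_y_continuous(limits = c(0, 500))    # largest disp value is below 500
p
```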
Now we have generated a nice visualisation of our data using ggplot. Please note that ggplot uses layers, and each time we added a different layer of information to the plot. If you want, you can go wild with ggplot.
Cedric Scherer provides a nice overview of visualisations made with ggplot, the tidyverse and R. The underlying R code for those plots is also available.
3.2 Exercises
3.2.1 Adding layers and changing the MTCARS plot
Give the points in the ggplot some transparency (or opacity), so that individual points are better visible. TIP: use the alpha argument; it should be a number from 0 to 1.
Please note that the alpha we added is not part of the aesthetics (aes), meaning that the value of alpha is not linked to a parameter in our data.
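A possible solution sketch; the value 0.5 is an arbitrary choice between 0 (invisible) and 1 (opaque).

```r
library(tidyverse)

p <- ggplot(mtcars, aes(x = mpg, y = disp, color = factor(cyl))) +
  geom_point(alpha = 0.5)  # fixed value, set outside aes()
p
```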
Add a layer that will generate a smooth linear regression line that shows the relation between mpg and disp. Use the stat_smooth command for this.
Please make sure that the ggplot is not separated into groups. If the data is grouped by color or shape, a different regression line will be generated for each group.
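A possible solution sketch. To get a single regression line, color is mapped only inside geom_point(), so the smoothing layer does not inherit the grouping; this placement is one way to do it, not the only one.

```r
library(tidyverse)

p <- ggplot(mtcars, aes(x = mpg, y = disp)) +
  # color is local to the points, so it does not split the regression
  geom_point(aes(color = factor(cyl)), alpha = 0.5) +
  # method = "lm" fits a straight line through all cars together
  stat_smooth(method = "lm", formula = y ~ x)
p
```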
Use the facet_wrap command to make three separate plots, one for each number of cylinders.
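A possible solution sketch, using the formula notation of facet_wrap:

```r
library(tidyverse)

p <- ggplot(mtcars, aes(x = mpg, y = disp)) +
  geom_point(alpha = 0.5) +
  facet_wrap(~ cyl)  # one panel each for 4, 6 and 8 cylinders
p
```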
3.2.2 Fixing common errors
Below is some code that is not working properly because of mistakes in coding syntax and semantics. Can you spot (and fix) the errors?
Commas are often forgotten, but easily fixed. Within brackets, arguments are separated by commas. R also generates a helpful error message that can point you to the missing comma (,).
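The effect of a missing comma can be reproduced without plotting anything. The broken line below is a hypothetical example, not the original exercise code; parse() only checks whether R can read the code.

```r
# Hypothetical broken call: the comma between mtcars and aes(...) is missing
broken <- "ggplot(mtcars aes(x = mpg, y = disp)) + geom_point()"

# try() catches the syntax error so the script keeps running
result <- try(parse(text = broken), silent = TRUE)
inherits(result, "try-error")                      # TRUE: R cannot parse it
cat(conditionMessage(attr(result, "condition")))   # message points at the problem

# Fixed version, with the comma restored, parses without complaint
fixed <- "ggplot(mtcars, aes(x = mpg, y = disp)) + geom_point()"
parsed <- parse(text = fixed)
```

In an English-language R session the message typically mentions an "unexpected symbol" at the position where the comma should have been.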
ggplot layers are added with a +, not with the pipe operator (%>%).
Make sure that the last layer or line of code does not end with a + or %>%.
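The two mistakes and a working version can be sketched as follows. The broken lines are hypothetical examples and are kept as comments so that the chunk still runs.

```r
library(tidyverse)

# Hypothetical broken versions:
#   mtcars %>% ggplot(aes(x = mpg, y = disp)) %>% geom_point()   # pipe instead of +
#   ggplot(mtcars, aes(x = mpg, y = disp)) + geom_point() +      # trailing +

# Working version: data may be piped INTO ggplot(), layers are added with +
p <- mtcars %>%
  ggplot(aes(x = mpg, y = disp)) +
  geom_point()
p
```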
Although R doesn’t show you an error message, the code does not give you what you want. The plot should show the cyl parameter as different shapes, just as there are three different colors, one for each level of the cyl parameter.
The shape argument should be included in the aesthetics (aes) part of the ggplot.
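The difference between a fixed shape and a mapped shape can be sketched as below. The "wrong" variant is a hypothetical reconstruction of this kind of mistake, not the original exercise code.

```r
library(tidyverse)

# Runs without error, but every point gets the same shape (17 = triangle)
p_fixed <- ggplot(mtcars, aes(x = mpg, y = disp, color = factor(cyl))) +
  geom_point(shape = 17)

# Shape inside aes(): one shape per level of cyl
p_mapped <- ggplot(mtcars, aes(x = mpg, y = disp,
                               color = factor(cyl), shape = factor(cyl))) +
  geom_point()

p_fixed
p_mapped
```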
Since chatGPT can also solve basic programming problems https://doi.org/10.1371/journal.pcbi.1011511, I also tried solving Error 4 with chatGPT. In my chatGPT session, the following solution was given:
chatGPT indeed put the shape in the aes, but also started making other changes. The remark that chatGPT made was: “you should correct the syntax error and use scale_shape_manual() instead of scale_shape().”… “to customize the shape values and labels.”. I didn’t really ask for that, and from there it all went downhill. Let’s call that the “chatGPT loop of death”. Let’s see what happened.
Running the “improved” code showed that there was still no fill, so it did not improve in the way that I wanted.
Next I said “but now the shapes don’t have a fill” and chatGPT replied:
“I see your concern. If you want to have both color and fill for the shapes, you can achieve this by mapping fill within the aes() call and using scale_fill_brewer() for the fill scale. Here’s the modified code to achieve that:”
This again did not give fill unfortunately. Asking for a different solution: “the fill is not there, do you have another solution?”
Also did not give me fill.
Another attempt by chatGPT (still not working):
Another attempt by chatGPT (still not working):
Asking chatGPT to use scale_shape and solid=TRUE gives:
… but apparently chatGPT doesn’t know how to use scale_shape (in the first instance), because scale_shape cannot take values as an argument. So when I reply with “I get the following error: Error: unused argument (values = c(15, 16, 17))”, chatGPT goes back to using scale_fill_manual.
AND FINALLY we have a nice and simple fill with one legend…… but also lots of unnecessary code. So after a few exchanges to make the code more concise this is what came out of it:
After asking to use scale_shape instead of scale_shape_manual, chatGPT generates:
This works nicely, but chatGPT introduces stroke = 1, which is not needed here, so again we have unnecessary code. So after I asked “can I leave out the stroke argument?” we get the easiest solution, which is exactly the same solution that I came up with myself without chatGPT.
Please note that when building the ggplot example, I did use Google (…of course) to find some solutions. I liked the scale_shape and solid=TRUE solution that I found, because it made the code so concise and I don’t like to type in values and breaks manually.
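The difference in effort between the two approaches can be sketched as follows. This is an illustration under the same plot assumptions as before (mpg on x, disp on y), not the exact code from the session.

```r
library(tidyverse)

base <- ggplot(mtcars, aes(x = mpg, y = disp,
                           color = factor(cyl), shape = factor(cyl))) +
  geom_point()

# Concise: let ggplot choose solid (filled-looking) shapes itself
p_auto <- base + scale_shape(solid = TRUE)

# Manual alternative: every shape code has to be typed in by hand
p_manual <- base + scale_shape_manual(values = c(15, 16, 17))

p_auto
p_manual
```

Note that scale_shape() has no values argument, which is why passing values to it fails with the "unused argument" error quoted above.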
ChatGPT use in science and coding is just dipping a toe in the water. ChatGPT is probably better used not as a knowledge database but as “reasoning or inferring agents” https://www.nature.com/articles/s41591-023-02594-z. ChatGPT can produce false information, also described as “hallucinations” https://www.nature.com/articles/d41586-023-00816-5, which makes it difficult to use for obtaining knowledge and facts. That said, it can be used to gain knowledge and learn better coding skills. Here is a nice quick tips paper from PLOS Computational Biology on how to “harness the power of chatGPT” https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011319.