Course Introduction

1 Lecture

1.1 Tibbles

Slides in full screen

1.2 Figures with ggplot2

Slides in full screen

1.3 Data handling

Slides in full screen

2 Tasks

2.1 ggplot2

A helpful resource to consult for this task can be the ggplot2 cheatsheet.

Remember to put library(tidyverse) (or library(ggplot2)) on top of your script to access the ggplot functions.

To practice plotting with the ggplot2 package, we will use a data set on 3 species of penguins on islands in Antarctica.

The data is available from the palmerpenguins package. To get it, you have to:

  • Install the package with install.packages("palmerpenguins")
  • Load the package at the beginning of your script with library(palmerpenguins)

The data set is called penguins:

  • The data set is available after you loaded the package
  • Just type penguins in the console and you should see the penguins tibble
  • Find a description of the variables in the help page ?penguins

Take a moment to get familiar with the data set and its variables.

2.2 Exploratory plots with ggplot 2

Explore the data set and it’s variables with ggplot. Below, you find some suggestions for plots. You can start with the plot type your are most interested in and then continue from there. You don’t have to finish all the plots If you have your own ideas for interesting plots with the penguin data set, feel free to deviate from the tasks.

In this exploratory section, don’t worry about the beauty of your plots. This task is about exploring the data and testing different visualization options.

2.2.1 Relationship between bill length and bill depth (scatterplot)

What is the relationship between bill length and bill depth?

  • Create a scatterplot with bill length on the x-axis and bill depth on the y-axis
  • Can you add a regression line?
  • Add species as color aesthetic. Does your interpretation of the data change?
    • What is the difference between adding color as a global aesthetic or as a local aesthetic of the point layer?
  • Add species as shape aesthetic to distinguish the species

2.2.2 Difference in flipper length between species (boxplot)

Is there a difference in flipper length between the species?

  • Create a boxplot of the flipper length (y-axis) for the different species (x-axis)
  • Try adding notches to the boxplots
  • Extra: Add a layer with geom_point(). Try setting position = position_jitter() as argument in the point layer. What does it do?

2.2.3 Differences between body mass of male and female penguins (boxplot)

Are male penguins heavier than female penguins? And is this different between the 3 species?

  • Create a boxplot with body mass on the y axis and sex on the x axis
  • Add the difference between penguin species to this plot. Try the different options ggplot offers
    • Species as color aesthetic
    • Species as fill aesthetic
    • Species as facet using facet_wrap
  • Extra: What happens if you use geom_violin instead of geom_boxplot? Can you combine both geoms in one plot?

2.2.4 Distribution of flipper length between species (histogram)

Make a histogram of the the flipper length separated by species.

Try different methods of separating the species (color or facet).

Compare stacked and overlapping histograms.

2.2.5 Penguin flipper length by species and sex (heatmap)

Create a heat map that shows:

  • The categories sex and species on x- and y-axis
  • The flipper length as color

2.2.6 Beautify the plots

First, choose one of the tasks 2.2.7 or 2.2.8, then do task 2.2.9 on saving plots. If you still have time, you can come back to the task you didn’t do.

But also here, if you have other ideas, feel free to deviate from the tasks.

2.2.7 Beautify the plots from Task 1

Take a plot you did in the previous task and make it look nicer.

Here’s a list of ideas:

  • Add a theme layer
  • Customize the theme, e.g.
    • Change the position of the legend
    • Make the axis titles bold
  • Change the color/fill scale of the plot
    • Use scale_color_manual or scale_fill_manual
    • Try scale_color_viridis_d() or scale_fill_viridis_d() with different options
    • Try a color scale from the paletteer package
  • Change the labels of the x- and y-axis and add a title to the plot
  • Make the points transparent, give them a different shape, …

2.2.8 Can you reproduce this plot?

Take a look at this plot:

It is similar to the one from Task 1.3.1 but more beautiful. Can you reproduce this plot?

The colors that are used are:

penguin_cols <- c("darkorange", "purple", "cyan4")

2.2.9 Save one of the plots on your machine

Save one of the plots you produced in a variable and then use ggsave to save the plot on your machine. Save the plot in a dedicated plot directory in your RStudio project.

Note: Make sure the path where you save the image exists. If you e.g. want to save in img/, then you first have to create the img folder in your project directory. For this you can use the Files pane of RStudio.

2.2.10 References

Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data. R package version 0.1.0. https://allisonhorst.github.io/palmerpenguins/. doi: 10.5281/zenodo.3960218.

Solution

2.3 dplyr - get started

A helpful resource for to consult for this task can be the dplyr cheatsheet.

Before you start, make sure to load dplyr, ggplot2 and the palmerpenguins package.

library(dplyr)
library(ggplot2)
library(palmerpenguins)

2.4 Data transformation with dplyr

In the following, you find a lot of different data transformation tasks. First, do 1-2 from each category before you do the remaining ones. You don’t have to finish all the tasks but make sure you covered each category. Generally, the first tasks from a category are easier than the last tasks of a category.

Find all penguins that …

  1. … have a bill length between 40 and 45 mm.

  2. … for which we know the sex.

  3. … which are of the species Adelie or Gentoo and live either on Dream or on Torgersen

  4. … lived on the island Dream in the year 2007. How many of them were from each of the 3 species?

Count …

  1. … the number of penguins on each island.

  2. … the number of penguins of each species on each island.

Select …

  1. … only the variables species, sex and year

  2. … variables based on the following vector

cols <- c("species", "bill_length_mm", "flipper_length_mm", "body_mass_g")
  1. … only columns that contain measurements in mm

Add a column …

  1. … with the ratio of bill length to bill depth

  2. … with abbreviations for the species (Adelie = A, Gentoo = G, Chinstrap = C).

Calculate …

  1. … mean flipper length and body mass for the 3 species and male and female penguins separately

  2. … Can you do the same but remove the penguins for which we don’t know the sex first?

2.5 Extras

  1. Make a boxplot of penguin body mass with sex on the x-axis and facets for the different species. Can you remove the penguins with missing values for sex first?

  2. Make a scatterplot with the ratio of bill length to bill depth on the y axis and flipper length on the x axis? Can you distinguish the point between male and female penguins and remove penguins with unknown sex before making the plot?

Solution