Introduction to linear models

1 Lecture

1.1 Overview

1.2 Assumptions of linear models

1.3 Contrast coding

1.4 Typical workflow in R

2 Exercises

  • The first three exercises have one numerical predictor (the third exercise involves more data wrangling).
  • The fourth exercise has one categorical predictor.

2.1 Influence of species richness on productivity in grasslands

The file 05_productivity.csv contains artificial data on species richness (number of grass species) and net primary production (NPP, gCarbon/(m²yr)) of two different grassland sites.

  1. Create a subset of site 1.
  2. Choose the independent and the dependent variable and plot them as points, with the independent variable on the x-axis and the dependent variable on the y-axis.
  3. Fit a linear model and draw the regression line into the plot, either with geom_abline or by creating predictions and using geom_line.
  4. Look at the summary to find the slope and the intercept, and write down the equation of the fitted model. What do the values of the intercept and the slope mean biologically? How can you make the interpretation of the intercept more meaningful? What would you change in the model?
  5. What are the R² and the adjusted R² of your fitted model? What do these values mean?
  6. Check the model assumptions for the linear regression of site 1.
  7. Fit a linear model for site 2 and check the model assumptions.
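The core R steps of this exercise can be sketched as follows. This is a minimal sketch on simulated data, not on 05_productivity.csv; the column names `richness` and `npp` are assumptions, so adapt them to the real file. Centering the predictor is shown as one common way to give the intercept a meaningful interpretation.

```r
# SIMULATED data (not 05_productivity.csv); column names are hypothetical
set.seed(1)
sim <- data.frame(richness = rep(1:10, each = 3))
sim$npp <- 200 + 15 * sim$richness + rnorm(nrow(sim), sd = 20)

# Dependent variable (npp) modelled as a linear function of the predictor
fit <- lm(npp ~ richness, data = sim)
coef(fit)            # intercept and slope
s <- summary(fit)
s$r.squared          # R-squared
s$adj.r.squared      # adjusted R-squared

# Centered predictor: the intercept is now the expected NPP at the
# mean species richness instead of at zero species; the slope is unchanged
sim$richness_c <- sim$richness - mean(sim$richness)
fit_c <- lm(npp ~ richness_c, data = sim)
coef(fit_c)

# Predictions over the observed range, e.g. for drawing the line with geom_line;
# alternatively: geom_abline(intercept = coef(fit)[1], slope = coef(fit)[2])
new <- data.frame(richness = seq(min(sim$richness), max(sim$richness),
                                 length.out = 50))
new$npp_pred <- predict(fit, newdata = new)

# Standard diagnostic plots for checking the model assumptions
par(mfrow = c(2, 2))
plot(fit)
```

With real data, the same calls apply after reading the file and subsetting it to one site.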

Solution for exercise 1

2.2 Influence of temperature on growth of different plant species

We will use an artificial dataset on the growth of five plant species (NPP in gC/(m² day)) in relation to temperature (°C). The file name is 05_growth.csv.

The data of each species are in a separate column (labelled species 1, species 2, …).

What you have to do for each species:

  1. Have a look at the data: how does growth relate to temperature?
  2. Fit a linear regression to the data.
  3. Write down the equation of the model.
  4. Check the model assumptions.
  5. Plot the data and add the regression line.
  6. What could be wrong, and do you have an idea of what you could do with such data?
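Because each species sits in its own column, one way to repeat the same steps for all species is to loop over the columns. The sketch below uses simulated wide-format data; the column names `temperature` and `species_1` … `species_5` are hypothetical (names containing spaces, as in the real file, would need backticks or renaming first).

```r
# SIMULATED wide-format data; column names are hypothetical
set.seed(2)
temp <- seq(0, 30, by = 1)
growth <- data.frame(temperature = temp)
for (i in 1:5) {
  growth[[paste0("species_", i)]] <- 0.5 + 0.1 * i * temp +
    rnorm(length(temp), sd = 0.5)
}

# One linear model per species column, e.g. species_1 ~ temperature
species_cols <- setdiff(names(growth), "temperature")
fits <- lapply(species_cols, function(sp) {
  lm(reformulate("temperature", response = sp), data = growth)
})
names(fits) <- species_cols

# Intercept and slope per species (one row per species) for the equations
coefs <- t(sapply(fits, coef))
coefs

# Residuals vs fitted values for each species as a first assumption check
par(mfrow = c(2, 3))
for (sp in species_cols) {
  plot(fitted(fits[[sp]]), resid(fits[[sp]]), main = sp,
       xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 2)
}
```

A systematic pattern in these residual plots is a hint that a straight line does not describe the relationship well.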

Solution for exercise 2

2.3 Bonus: Ice cover of lakes in response to air temperature

We will use two datasets: 05_ice_cover.csv and 05_ice_air_temp.csv. You can find the reference for the ice cover dataset here and the reference for the temperature dataset here. Both are time series datasets, and we want to explore the influence of air temperature on the ice cover of two lakes.

  1. Plot the ice cover of the two lakes over time.
  2. Remove observations up to 1884 (because they are biased), calculate the yearly mean temperature, and plot the air temperature.
  3. Plot the ice cover duration over time for both lakes.
  4. Calculate the mean ice duration per year and join both datasets.
  5. Fit a linear model with the yearly mean temperature as predictor and the mean ice duration as dependent variable.
  6. Write down the equation of the model.
  7. Add predictions (bonus: prediction interval) to the scatter plot.
  8. Check the model assumptions.
  9. Bonus: instead of the yearly mean temperature, the mean temperature per hydrological year for the ice cover season can be more meaningful. A month belongs to the previous hydrological year if its number is smaller than 10 (January to September); months 10 to 12 (October to December) belong to the current hydrological year. Subset the months to the cold season (November to April) and calculate the mean temperature per hydrological year for the cold months. Use a scatterplot to plot the mean temperature per hydrological year against the yearly mean temperature that you used before. Repeat tasks 5 to 8 with the mean temperature per hydrological year.
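The aggregation, join, and prediction-interval steps can be sketched in base R as below. All data are simulated, and the column names `date`, `air_temp`, `year`, `lakeid`, and `ice_duration` are assumptions; check the real column names in 05_ice_cover.csv and 05_ice_air_temp.csv before adapting this.

```r
# SIMULATED monthly air temperature; column names are hypothetical
set.seed(3)
air <- data.frame(date = seq(as.Date("1880-01-15"), as.Date("1999-12-15"),
                             by = "month"))
air$year <- as.numeric(format(air$date, "%Y"))
air$air_temp <- 8 + rnorm(nrow(air), sd = 3)

# Drop observations up to and including 1884
air <- air[air$year > 1884, ]

# Yearly mean air temperature
temp_year <- aggregate(air_temp ~ year, data = air, FUN = mean)

# SIMULATED ice cover duration (days) for two lakes, colder years = longer ice
ice <- expand.grid(year = 1885:1999, lakeid = c("lake_A", "lake_B"))
ice$ice_duration <- 200 - 10 * temp_year$air_temp[match(ice$year, temp_year$year)] +
  rnorm(nrow(ice), sd = 5)

# Mean ice duration per year over both lakes, then join by year
ice_year <- aggregate(ice_duration ~ year, data = ice, FUN = mean)
dat <- merge(ice_year, temp_year, by = "year")

fit <- lm(ice_duration ~ air_temp, data = dat)
coef(fit)

# Predictions with a 95 % prediction interval across the temperature range
new <- data.frame(air_temp = seq(min(dat$air_temp), max(dat$air_temp),
                                 length.out = 100))
pred <- cbind(new, predict(fit, newdata = new, interval = "prediction",
                           level = 0.95))
head(pred)  # columns: air_temp, fit, lwr, upr
```

The `pred` data frame can be drawn onto the scatter plot with geom_line for `fit` and geom_ribbon for `lwr`/`upr`.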

You can extract the month from a date with either of the following (the second option requires the package lubridate to be installed):

as.numeric(format(sampledate, "%m"))  # base R
lubridate::month(sampledate)          # requires the lubridate package
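The hydrological-year rule from the bonus task can be sketched with the base R option for the month. The data and column names below are hypothetical; the hydrological year is labelled by the calendar year in which it starts (October). Whether this matches the year label of the ice cover data is an assumption you should check before joining.

```r
# SIMULATED monthly temperatures; column names are hypothetical
set.seed(4)
air <- data.frame(date = seq(as.Date("1990-01-15"), as.Date("1993-12-15"),
                             by = "month"))
air$air_temp <- rnorm(nrow(air), mean = 5, sd = 5)
air$year  <- as.numeric(format(air$date, "%Y"))
air$month <- as.numeric(format(air$date, "%m"))

# Months 1-9 belong to the previous hydrological year, months 10-12 to the current one
air$hydro_year <- ifelse(air$month < 10, air$year - 1, air$year)

# Cold season: November to April
cold <- air[air$month >= 11 | air$month <= 4, ]

# Mean cold-season temperature per hydrological year
temp_hydro <- aggregate(air_temp ~ hydro_year, data = cold, FUN = mean)

# Keep only complete seasons (6 cold months); the first and last are truncated here
n_months <- aggregate(air_temp ~ hydro_year, data = cold, FUN = length)
complete <- n_months$hydro_year[n_months$air_temp == 6]
temp_hydro <- temp_hydro[temp_hydro$hydro_year %in% complete, ]
temp_hydro
```

Dropping incomplete seasons avoids means that are based on only a few months at the start or end of the record.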

Solution for exercise 3

2.4 Penguins

We will use the penguins dataset (see here), stored in the file 05_penguins.csv.

Choose bill_length_mm, bill_depth_mm, flipper_length_mm, or body_mass_g as the dependent variable and use species as the independent variable.

  1. Fit a linear model.
  2. Write down the equation of the model.
  3. Check model assumptions.
  4. Plot the data and add the group means. There are different options for plotting the data; try geom_jitter, geom_dotplot, and geom_boxplot.
  5. Add the 95 % prediction interval with geom_pointrange to the plot. You can use the function predict with interval = "prediction" to calculate the prediction interval.
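With a categorical predictor, R uses treatment (dummy) coding by default: the intercept is the mean of the reference level (the first factor level), and each further coefficient is the difference of a group mean to that reference. The sketch below uses simulated flipper lengths with made-up group means, not the real penguin data, and shows how predict with interval = "prediction" yields one interval per species.

```r
# SIMULATED data with made-up group means (not the real penguin values)
set.seed(5)
peng <- data.frame(species = factor(rep(c("Adelie", "Chinstrap", "Gentoo"),
                                        times = c(40, 20, 35))))
group_mean <- c(Adelie = 190, Chinstrap = 195, Gentoo = 215)
peng$flipper_length_mm <- unname(group_mean[as.character(peng$species)]) +
  rnorm(nrow(peng), sd = 6)

fit <- lm(flipper_length_mm ~ species, data = peng)
coef(fit)  # (Intercept) = Adelie mean; other terms = differences to Adelie

# One row per species; the fitted values are the group means
new <- data.frame(species = factor(levels(peng$species),
                                   levels = levels(peng$species)))
pred <- cbind(new, predict(fit, newdata = new, interval = "prediction"))
pred  # columns: species, fit, lwr, upr

# For the plot, e.g.:
# ggplot2::geom_pointrange(data = pred, ggplot2::aes(x = species, y = fit,
#                                                    ymin = lwr, ymax = upr))
```

The prediction interval is wider than a confidence interval for the group mean because it also includes the scatter of individual observations.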

Solution for exercise 4