Data transformation for linear models
1 Lecture
2 Exercise
2.1 Diatoms
We will use an artificial dataset (07_diatoms.csv) to practice data transformation for linear models. The dataset contains the concentration of two diatom species over time. The diatoms were grown at two different pH levels.
- Create a subset with species 1 and low pH only.
- Fit models of the diatom concentration (conc) over time (day).
- Apply a useful transformation.
- Follow the protocol on data transformation for linear models.
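The steps above could be sketched roughly as follows. This is only an illustration on synthetic data, not the course solution: the column names `species` and `pH` and their values are assumptions (only `conc` and `day` are named in the exercise), and the log transformation is one plausible choice for growth data.

```r
# Synthetic stand-in for 07_diatoms.csv. The columns `species` and `pH`
# (and their values) are assumed; only `conc` and `day` come from the exercise.
set.seed(1)
diatoms <- expand.grid(day = 1:10, species = c(1, 2), pH = c("low", "high"))
diatoms$conc <- 50 * exp(0.3 * diatoms$day) * exp(rnorm(nrow(diatoms), sd = 0.1))

# Keep species 1 grown at low pH only
sub <- subset(diatoms, species == 1 & pH == "low")

# Fit the model on the raw scale and on the log scale
fit_raw <- lm(conc ~ day, data = sub)
fit_log <- lm(log(conc) ~ day, data = sub)

# Standard diagnostic plots (residuals vs. fitted, Q-Q plot, ...) for both fits
par(mfrow = c(2, 2))
plot(fit_raw)
plot(fit_log)

# Intercept and slope on the log scale
coef(fit_log)
```

Comparing the two sets of diagnostic plots shows whether the transformation brings the residuals closer to the model assumptions.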
2.2 Bonus: Covid-19
We will use a dataset about Covid-19 cases in Germany (07_bl_infektionen.csv). The dataset contains the reported Covid-19 cases of the last 7 days (bl_inz) and it was downloaded from https://www.corona-daten-deutschland.de/dataset/infektionen_bundeslaender.
- Subset the data for the state of Berlin.
- Plot the reported Covid-19 cases over time.
- Do you see an exponential growth of the number of cases at some point?
- Subset the data to specific timepoints (for example: 11.03.2020 - 22.03.2020 and 15.10.2021 - 08.11.2021) and try to fit an exponential model to the data.
- Check the assumptions for a linear and an exponential model.
- Plot the raw data and the fitted model.
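A minimal sketch of the fitting and plotting steps for one time window, using synthetic numbers instead of the real file. The column name `date` and the helper column `t` are assumptions; `bl_inz` is the case column named above. The exponential model is fitted as a linear model on the log scale and back-transformed for plotting.

```r
# Synthetic stand-in for one growth window of the Berlin data.
# `date` is a hypothetical column name; the values are simulated, not real cases.
set.seed(2)
berlin <- data.frame(
  date = seq(as.Date("2020-03-11"), as.Date("2020-03-22"), by = "day")
)
berlin$t <- as.numeric(berlin$date - min(berlin$date))  # days since window start
berlin$bl_inz <- 2 * exp(0.25 * berlin$t) * exp(rnorm(nrow(berlin), sd = 0.05))

# Linear model and exponential model (linear on the log scale)
fit_lin <- lm(bl_inz ~ t, data = berlin)
fit_exp <- lm(log(bl_inz) ~ t, data = berlin)

# Residuals vs. fitted values for both models
par(mfrow = c(1, 2))
plot(fit_lin, which = 1)
plot(fit_exp, which = 1)

# Raw data with both fitted models; log-scale fits are back-transformed with exp()
par(mfrow = c(1, 1))
plot(bl_inz ~ date, data = berlin)
lines(berlin$date, fitted(fit_lin), lty = 2)   # dashed: linear model
lines(berlin$date, exp(fitted(fit_exp)))       # solid: exponential model
```

With the real data, the subsetting by state and date would come first; the modelling part stays the same.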
2.3 Bonus: Population growth
Use the dataset 07_population-and-demography.csv. The dataset contains the population of all countries over time. The dataset was downloaded from https://ourworldindata.org/population-growth
- Calculate the world population for each year.
- Plot the world population over time.
- Fit a linear model and an exponential model to the world population over time.
- Which model fits better? Why?
- Calculate the population growth rate for each year and plot it over time.
Use the group_by and summarise functions to calculate the world population for each year.
Or use summarise(df, Population = sum(Population), .by = "Year").
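Both variants can be tried on a toy table like the one below. The column `Country` is a hypothetical name; `Year` and `Population` follow the hint. Note that the `.by` argument requires dplyr 1.1.0 or later, and the native pipe `|>` requires R 4.1 or later.

```r
library(dplyr)

# Toy data; `Country` is an assumed column name, the numbers are made up
df <- tibble(
  Country    = rep(c("A", "B"), each = 3),
  Year       = rep(2000:2002, times = 2),
  Population = c(10, 11, 12, 100, 105, 110)
)

# Variant 1: group_by() followed by summarise()
world1 <- df |>
  group_by(Year) |>
  summarise(Population = sum(Population))

# Variant 2: per-operation grouping with .by
world2 <- summarise(df, Population = sum(Population), .by = Year)

world1
world2
```

Both calls return one row per year with the summed population.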
If you assume exponential growth, you can fit the model with the lm function on the log-transformed population: lm(log(Population) ~ Year, data = df).
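As a rough illustration of how the two fits can be compared, here is a sketch on simulated data, assuming the columns are called `Year` and `Population`. Because the two models have different response variables (`Population` and `log(Population)`), their R² values are not directly comparable; one option is to back-transform the exponential fit and compare residuals on the original scale.

```r
# Simulated data with exponential growth; the values are made up
set.seed(3)
df <- data.frame(Year = 1950:2020)
df$Population <- 1000 * exp(0.02 * (df$Year - 1950)) *
  exp(rnorm(nrow(df), sd = 0.01))

fit_lin <- lm(Population ~ Year, data = df)
fit_exp <- lm(log(Population) ~ Year, data = df)

# On the log scale, the slope is the estimated growth rate per year
rate <- coef(fit_exp)[["Year"]]

# Back-transform the predictions of the exponential model
pred_exp <- exp(predict(fit_exp))

# Residual sum of squares of both models on the original scale
rss_lin <- sum(residuals(fit_lin)^2)
rss_exp <- sum((df$Population - pred_exp)^2)
c(linear = rss_lin, exponential = rss_exp)
```

The diagnostic plots of both models (plot(fit_lin), plot(fit_exp)) remain the main tool for judging which model is appropriate.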
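Use the lag function from dplyr to calculate the population growth rate for each year. The growth rate is the relative change compared to the previous year: Population / lag(Population) - 1. The first year has no previous value, so its growth rate is NA.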
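A small sketch of the growth rate calculation on made-up numbers, assuming a table with one row per year and the columns `Year` and `Population`. Here the growth rate is taken as the relative change from the previous year; the column name `growth_rate` is chosen only for this example.

```r
library(dplyr)

# Made-up yearly totals
world <- tibble(
  Year       = 2000:2003,
  Population = c(100, 110, 121, 121)
)

world <- world |>
  arrange(Year) |>                                        # lag() relies on row order
  mutate(growth_rate = Population / lag(Population) - 1)  # relative change per year

world

# Growth rate over time; the first year is NA and is dropped from the plot
plot(growth_rate ~ Year, data = world, type = "b")
```

Sorting by year before calling lag() matters, because lag() simply takes the value from the previous row.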