library(readr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggfortify)
## theme for ggplot
theme_set(theme_classic())
theme_update(text = element_text(size = 14))R
Day 5 and 6
Freie Universität Berlin @ Theoretical Ecology
January 19, 2024
library(readr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggfortify)
## theme for ggplot
theme_set(theme_classic())
theme_update(text = element_text(size = 14))flowchart TD
A[Design your study] --> B[Collect data]
A -.->|already think of| C
B --> C[Fit a model]
C --> D[Access the model fit]
D -.->|if it is not good| C
D --> E[Inference / Hypothesis testing]
D --> F[Prediction]
The study wanted to analyse the Bergmann’s rule, that states that the body size of animals increases with latitude. They measured the size of the carapace at various locations along the eastern US coast. The data set contains the following variables:
latitude: a number denoting the latitude of salt marsh (degree)size: a number denoting carapace width of a crab (millimeter)the data set is taken from lterdatasampler package, see documentation.
Johnson, D. 2019. Fiddler crab body size in salt marshes from Florida to Massachusetts, USA at PIE and VCR LTER and NOAA NERR sites during summer 2016. ver 1. Environmental Data Initiative. doi:10.6073/pasta/4c27d2e778d3325d3830a5142e3839bb
df <- read_csv("data/06_crab_size.csv")
df# A tibble: 392 × 9
date latitude site size air_temp air_temp_sd water_temp water_temp_sd
<date> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2016-07-24 30 GTM 12.4 21.8 6.39 24.5 6.12
2 2016-07-24 30 GTM 14.2 21.8 6.39 24.5 6.12
3 2016-07-24 30 GTM 14.5 21.8 6.39 24.5 6.12
4 2016-07-24 30 GTM 12.9 21.8 6.39 24.5 6.12
5 2016-07-24 30 GTM 12.4 21.8 6.39 24.5 6.12
6 2016-07-24 30 GTM 13.0 21.8 6.39 24.5 6.12
7 2016-07-24 30 GTM 10.3 21.8 6.39 24.5 6.12
8 2016-07-24 30 GTM 11.2 21.8 6.39 24.5 6.12
9 2016-07-24 30 GTM 12.7 21.8 6.39 24.5 6.12
10 2016-07-24 30 GTM 14.6 21.8 6.39 24.5 6.12
# ℹ 382 more rows
# ℹ 1 more variable: name <chr>
lm1 <- lm(size ~ latitude, data = df)
lm1
Call:
lm(formula = size ~ latitude, data = df)
Coefficients:
(Intercept) latitude
-3.6244 0.4851
summary(lm1)
Call:
lm(formula = size ~ latitude, data = df)
Residuals:
Min 1Q Median 3Q Max
-7.8376 -1.8797 0.1144 1.9484 6.9280
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.62442 1.27405 -2.845 0.00468 **
latitude 0.48512 0.03359 14.441 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.832 on 390 degrees of freedom
Multiple R-squared: 0.3484, Adjusted R-squared: 0.3467
F-statistic: 208.5 on 1 and 390 DF, p-value: < 2.2e-16
alpha <- lm1$coefficients[1]
beta <- lm1$coefficients[2]
ggplot(df, aes(latitude, size)) +
geom_point() +
geom_abline(intercept = alpha, slope = beta,
color = "red", linewidth = 1.5)autoplot(lm1, which = 1:2)you can do predictions, test hypotheses, …
We created a linear model:
predict(lm1) # or: fitted(lm1)bind_cols to add several columns to the tablepredict function always uses the same namesdf_pred <- tibble(x = seq(10, 30, length = 20)) %>%
bind_cols(predict(lm1, newdata = ., interval = "confidence")) %>%
select(-fit) %>%
rename(upr_conf = upr, lwr_conf = lwr) %>%
bind_cols(predict(lm1, newdata = ., interval = "prediction")) %>%
rename(y_pred = fit, upr_pred = upr, lwr_pred = lwr)
head(df_pred)# A tibble: 6 × 6
x lwr_conf upr_conf y_pred lwr_pred upr_pred
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 10 23.4 25.6 24.5 16.7 32.3
2 11.1 25.5 27.7 26.6 18.8 34.4
3 12.1 27.5 29.7 28.6 20.8 36.4
4 13.2 29.5 31.9 30.7 22.8 38.5
5 14.2 31.4 34.0 32.7 24.9 40.5
6 15.3 33.3 36.2 34.8 26.9 42.6
Linear models in R