flowchart LR
  A[Variable x] --> M[Mediator]
  M --> O[Outcome y]
  A ==> O
  C[Confounder] --> O
  C --> A
  A --> D
  O --> D[Collider]
  style A fill:#f80
  style O fill:#f80
Day 7
Freie Universität Berlin @ Theoretical Ecology
January 20, 2024
Z. M. Laubach, E. J. Murray, K. L. Hoke, R. J. Safran, and W. Perng, “A biologist’s guide to model selection and causal inference,” Proceedings of the Royal Society B: Biological Sciences, 2021. doi: 10.1098/rspb.2020.2815.
A. T. Tredennick, G. Hooker, S. P. Ellner, and P. B. Adler, “A practical guide to selecting models for exploration, inference, and prediction in ecology,” Ecology, 2021. doi: 10.1002/ecy.3336.
\[ \hat y = m \cdot x + b \]
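The line above is the simplest model that can be used for prediction. The slope m and intercept b are estimated from data, and a prediction is just m · x + b. Below is a minimal R sketch on simulated data. The true slope 2, the true intercept 1 and the names `m_hat` and `b_hat` are made up for illustration.

```r
set.seed(1)
# Hypothetical example data: true slope 2, true intercept 1
x <- runif(50, 0, 10)
y <- 2 * x + 1 + rnorm(50, sd = 1)

fit <- lm(y ~ x)                   # ordinary least squares fit
b_hat <- unname(coef(fit)[1])      # estimated intercept b
m_hat <- unname(coef(fit)[2])      # estimated slope m

# Predictions for new x values: y_hat = m * x + b
x_new <- c(0, 5, 10)
y_hat_manual  <- m_hat * x_new + b_hat
y_hat_predict <- unname(predict(fit, newdata = data.frame(x = x_new)))
all.equal(y_hat_manual, y_hat_predict)   # TRUE: both give the same predictions
```

Computing the prediction by hand and with `predict()` gives the same numbers. This makes explicit that a fitted linear model is nothing more than the estimated m and b.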
Example: analysis of vegetation data from Alps excursion
There is a large overlap between prediction and machine learning:
M. Pichler and F. Hartig, “Machine learning and deep learning—A review for ecologists,” Methods in Ecology and Evolution, 2023. doi: 10.1111/2041-210x.14061.
Example: occupancy model of Black Kite in Spain under current and future climatic conditions
flowchart LR
  A[Variable x] --> M[Mediator]
  M --> O[Outcome y]
  A ==> O
  C[Confounder] --> O
  C --> A
  A --> D
  O --> D[Collider]
  style A fill:#f80
  style O fill:#f80
See also: “Mediators, confounders, colliders – a crash course in causal inference” by Florian Hartig.
flowchart LR
  X -->|0.7| Y
  Confounder -->|0.14| X
  Confounder -->|0.11| Y
  X -->|0.43| Collider
  Y -->|0.21| Collider
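library(dplyr)         # data wrangling; also provides tibble()
library(ggplot2)       # plotting
library(piecewiseSEM)  # piecewise structural equation models

set.seed(123)              # make the simulation reproducible
n <- 2000                  # number of observations
true_effect_size <- 0.7    # causal effect of x on y that we want to recover

# Simulate data following the DAG above
confounder_var <- rnorm(n)
x_var <- rnorm(n, 0.14*confounder_var, 0.4)                             # confounder -> x
y_var <- rnorm(n, true_effect_size*x_var + 0.11*confounder_var, 0.24)  # x -> y, confounder -> y
collider_var <- rnorm(n, 0.43*x_var + 0.21*y_var, 0.22)                # x -> collider <- y

df <- tibble(
  x = x_var,
  y = y_var,
  confounder = confounder_var,
  collider = collider_var)

First, we use x as the only predictor. Our assumed structure is as follows: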
flowchart LR
  X --> Y
(Intercept) x
0.003660183 0.791415440
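Why is the only-x estimate (about 0.79) larger than the true 0.7? Leaving out the confounder lets part of its effect on y load onto x. For ordinary least squares, the expected slope is the true effect plus the omitted-variable bias, β + γ · Cov(x, confounder) / Var(x), where γ = 0.11 is the confounder's effect on y. The sketch below plugs in the parameters of the simulation above. The variable names are ours.

```r
# Parameters of the simulation above
beta  <- 0.7    # true effect x -> y
gamma <- 0.11   # effect confounder -> y
alpha <- 0.14   # effect confounder -> x
sd_x  <- 0.4    # residual SD of x

var_conf   <- 1                          # confounder ~ N(0, 1)
cov_x_conf <- alpha * var_conf           # Cov(x, confounder)
var_x      <- alpha^2 * var_conf + sd_x^2  # Var(x)

# Expected OLS slope of y ~ x when the confounder is omitted
expected_slope <- beta + gamma * cov_x_conf / var_x
round(expected_slope, 3)   # 0.786, close to the 0.79 estimated above
```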
Next, we could include all variables (x, confounder, and collider) as predictors. Our assumed structure:
flowchart LR
  X --> Y
  Confounder --> Y
  Collider --> Y
(Intercept) x confounder collider
0.001066547 0.578216064 0.101149710 0.225615265
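Why does adding the collider pull the estimate below 0.7? Conditioning on a common effect of x and y opens a non-causal path between them. Here is a minimal sketch with made-up numbers, not the course data. x and y are simulated independently, so the true effect is 0. Yet adjusting for their common effect produces a clearly negative coefficient.

```r
set.seed(42)
n <- 2000
x <- rnorm(n)
y <- rnorm(n)                            # y does not depend on x at all
collider <- x + y + rnorm(n, sd = 0.5)   # common effect of x and y

est_naive    <- coef(lm(y ~ x))[["x"]]             # close to 0, as it should be
est_collider <- coef(lm(y ~ x + collider))[["x"]]  # population value is -0.8
c(naive = est_naive, with_collider = est_collider)
```

In the simulation above, the collider also depends positively on both x and y. The same mechanism therefore pushes the x estimate downwards, from 0.7 to about 0.58.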
Next, we could include x and the confounder as predictors, but leave out the collider. Our assumed structure:
flowchart LR
  X --> Y
  Confounder --> Y
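To inspect this model, the x estimate and its standard error can be pulled from `summary()` in base R. The sketch re-creates the simulated data with the same seed and draw order as above, using a plain `data.frame` instead of a tibble. piecewiseSEM fits the y-equation of the SEM below as this same regression, so the estimate should match the SEM's x → y path.

```r
# Re-create the simulated data (same seed and draw order as above)
set.seed(123)
n <- 2000
confounder <- rnorm(n)
x <- rnorm(n, 0.14 * confounder, 0.4)
y <- rnorm(n, 0.7 * x + 0.11 * confounder, 0.24)
collider <- rnorm(n, 0.43 * x + 0.21 * y, 0.22)
df <- data.frame(x, y, confounder, collider)

# Adjust for the confounder, but leave the collider out
fit_adj <- lm(y ~ x + confounder, data = df)
est <- summary(fit_adj)$coefficients["x", c("Estimate", "Std. Error")]
est
```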
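Another option is to specify the complete causal structure, with one regression per response variable. Such a model is called a structural equation model (SEM); here it is fitted piecewise with the piecewiseSEM package.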
mod <- psem(
lm(y ~ confounder + x, data = df),
lm(x ~ confounder, data = df),
lm(collider ~ x + y, data = df)
)
c4 <- coefs(mod, standardize = "none")
c4[2, ]
  Response Predictor Estimate Std.Error DF Crit.Value P.Value
2 y x 0.7088 0.0136 1997 52.2153 0 ***
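Beyond the path coefficients, piecewiseSEM judges whether the specified structure fits the data using Shipley's tests of directed separation. Every pair of variables without an arrow between them implies a conditional independence. In this DAG, the only missing link is between confounder and collider. That gives a single claim: the collider is independent of the confounder, given x and y. The base-R sketch below hand-codes that one test combined into Fisher's C. It is not the package's own implementation, and the variable names are ours.

```r
set.seed(123)   # same seed and draw order as the simulation above
n <- 2000
confounder <- rnorm(n)
x <- rnorm(n, 0.14 * confounder, 0.4)
y <- rnorm(n, 0.7 * x + 0.11 * confounder, 0.24)
collider <- rnorm(n, 0.43 * x + 0.21 * y, 0.22)

# Independence claim: collider _||_ confounder | x, y
# Regress collider on its parents (x, y) plus the tested variable
claim_fit <- lm(collider ~ x + y + confounder)
p_claim <- summary(claim_fit)$coefficients["confounder", "Pr(>|t|)"]

# Fisher's C = -2 * sum(log(p)) over all k claims, chi-squared with 2k df
k <- 1
fisher_c <- -2 * sum(log(p_claim))
p_model <- pchisq(fisher_c, df = 2 * k, lower.tail = FALSE)
c(fisher_c = fisher_c, p_model = p_model)
```

A large `p_model` means the data give no evidence against the assumed structure. With only one claim, `p_model` equals `p_claim`.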
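plot(mod, show = "unstd")   # path diagram with unstandardized coefficients

# Collect the x -> y estimate and its SE from each of the four models
df_summary <- tibble(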
estimate = c(c1$Estimate, c2$Estimate[1], c3$Estimate[1], c4$Estimate[2]),
SE = c(c1$Std.Error, c2$Std.Error[1], c3$Std.Error[1], c4$Std.Error[2]),
label = factor(
c("only x", "all variables", "x + confounder", "SEM"),
levels = c("only x", "all variables", "x + confounder", "SEM"))
)
df_summary %>%
ggplot() +
geom_point(aes(estimate, label), size = 3) +
geom_vline(aes(xintercept = true_effect_size),
color = "orange", linewidth = 1.5) +
annotate("text",
x = true_effect_size, y = 4.3, label = "true effect size",
color = "orange", size = 7, hjust = -0.05) +
geom_errorbar(aes(
xmin = estimate - SE, xmax = estimate + SE,
y = label), width = 0.0) +
labs(y = "predictor variables included", x = "Effect size of x → y (± SE)") +
theme_classic() +
theme(text = element_text(size = 18))
Further reading: “Causal Inference: have you been doing science wrong all this time?” implements the same analysis in Python and from a Bayesian perspective.
flowchart LR
  A[Variable x] --> O[Outcome y]
  D[Confounder] --> O
  style A fill:#f80
  style O fill:#f80
y ~ x + confounder1 + confounder2
Experiments:
Observations:
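The contrast between experiments and observations can be sketched with the same kind of simulation. All numbers below are made up for illustration. When x is assigned at random, it is independent of the confounder, so the simple model y ~ x is already unbiased. When x depends on the confounder, as in observational data, y ~ x is biased and the confounder has to be included.

```r
set.seed(7)
n <- 5000
beta <- 0.7                 # true effect of x on y
confounder <- rnorm(n)

# Experiment: x is randomised, i.e. independent of the confounder
x_exp <- rnorm(n)
y_exp <- beta * x_exp + 0.5 * confounder + rnorm(n, sd = 0.3)

# Observation: x is partly driven by the confounder
x_obs <- 0.8 * confounder + rnorm(n, sd = 0.6)
y_obs <- beta * x_obs + 0.5 * confounder + rnorm(n, sd = 0.3)

est_exp     <- coef(lm(y_exp ~ x_exp))[["x_exp"]]               # population value 0.7
est_obs     <- coef(lm(y_obs ~ x_obs))[["x_obs"]]               # population value 1.1 (biased)
est_obs_adj <- coef(lm(y_obs ~ x_obs + confounder))[["x_obs"]]  # population value 0.7
round(c(experiment = est_exp, observation = est_obs, adjusted = est_obs_adj), 2)
```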
Exploration, Inference, Prediction