P-value adjustment for multiple tests

Day 9

Felix May

Freie Universität Berlin @ Theoretical Ecology

Reproduce slides

library(tidyr)
library(dplyr)
library(ggplot2)
library(readr)

## theme for ggplot
theme_set(theme_classic())
theme_update(text = element_text(size = 14))

Repeat prairie dog analysis

  • 2 Models
    1. All main effects and two-way interactions
    2. All main effects
pdogs1 <- read_csv("data/08_prairiedogs.csv")

pdogs1 <- pdogs1 %>%
  mutate(n.surv = n.tot - n.death,
         p.surv = n.surv / n.tot
  )

model1 <- glm(cbind(n.surv, n.death) ~ mated*hibernation + 
                                       mated*min.weight + 
                                       hibernation*min.weight,
              family = binomial, data = pdogs1)

model2 <- glm(cbind(n.surv, n.death) ~ mated +
                                       hibernation + 
                                       min.weight,
              family = binomial, data = pdogs1)

Several hypothesis tests

drop1(model1, test = "Chi")
Single term deletions

Model:
cbind(n.surv, n.death) ~ mated * hibernation + mated * min.weight + 
    hibernation * min.weight
                       Df Deviance    AIC     LRT Pr(>Chi)
<none>                     0.82565 39.744                 
mated:hibernation       1  1.18134 38.100 0.35569   0.5509
mated:min.weight        1  0.97217 37.891 0.14651   0.7019
hibernation:min.weight  1  1.13001 38.049 0.30436   0.5812
drop1(model2, test = "Chi")
Single term deletions

Model:
cbind(n.surv, n.death) ~ mated + hibernation + min.weight
            Df Deviance    AIC    LRT Pr(>Chi)  
<none>           1.6184 34.537                  
mated        1   1.6781 32.597 0.0597  0.80694  
hibernation  1   7.2750 38.194 5.6566  0.01739 *
min.weight   1   7.2963 38.215 5.6779  0.01718 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Type I error rate

  • 6 tests
  • \(\alpha = 0.05\) in each test
    • \(\alpha\): Probability of Type I error aka false positive rate
  • \(\alpha\) –> defined for a single test
  • What is the probability of making mistakes when we do multiple tests?

Type I error / false positive

Rejecting the Null Hypothesis although it is true

Multiple tests – how many mistakes?

  • 20 tests with \(\alpha = 0.05\)
    • Expectation: 1 false positive
  • 100 tests with \(\alpha = 0.05\)
    • Expectation: 5 false positives
      • (When there are no effects in reality)
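The expectations above follow directly from treating each test as a Bernoulli trial; a minimal check, assuming all null hypotheses are true:

```r
# Expected number of false positives when all null hypotheses are true:
# each test rejects with probability alpha, so the expectation is n * alpha
alpha <- 0.05
n_tests <- c(20, 100)
n_tests * alpha  # 1 and 5 expected false positives
```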

Multiple tests – one or more mistake?

  • New question:
    • What is the probability of at least one mistake in 6 tests?
      • Probability of correct result in one test: \(1 - \alpha = 0.95\)
      • Probability of correct results in 6 tests: \(0.95^6 = 0.735\)
      • Probability of one or more false positives: \(1 - 0.95^6 = 0.265\)
  • >25% chance of making a mistake in our 6 tests!!!
  • What to do?
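The calculation above can be reproduced in one line, assuming the 6 tests are independent:

```r
# Family-wise error rate: probability of at least one false positive
# in 6 independent tests, each at alpha = 0.05
alpha <- 0.05
n_tests <- 6
fwer <- 1 - (1 - alpha)^n_tests
round(fwer, 3)  # 0.265
```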

Multiple tests – adjust p-values!

  • Idea: Maintain the given type I error rate across all tests
    • Keep \(\alpha\)
    • Increase p-values
    • Several approaches: see ?p.adjust
      • Holm’s method –> default in p.adjust()
      • Bonferroni correction –> easy, but very conservative (low risk of false positives, at the cost of statistical power)

Bonferroni correction

  • Multiply p-values by the no. of tests
tests1 <- drop1(model1, test = "Chi")
tests2 <- drop1(model2, test = "Chi")
p_vals <- c(tests1$`Pr(>Chi)`[-1], tests2$`Pr(>Chi)`[-1])
p_vals
[1] 0.55091240 0.70188906 0.58116150 0.80693792 0.01739019 0.01717946
  • Apply Bonferroni correction
p_vals * length(p_vals)
[1] 3.3054744 4.2113343 3.4869690 4.8416275 0.1043412 0.1030768
  • Using p.adjust
p.adjust(p_vals, method = "bonferroni")
[1] 1.0000000 1.0000000 1.0000000 1.0000000 0.1043412 0.1030768
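p.adjust() additionally caps the adjusted values at 1, since a p-value cannot exceed 1; this explains the difference from the plain multiplication above. A minimal check, using the rounded p-values from the output above:

```r
# Bonferroni via p.adjust() equals manual multiplication capped at 1
p_vals <- c(0.5509, 0.7019, 0.5812, 0.8069, 0.0174, 0.0172)
manual <- pmin(p_vals * length(p_vals), 1)
all.equal(manual, p.adjust(p_vals, method = "bonferroni"))  # TRUE
```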

Bonferroni vs. Holm’s method

  • Bonferroni
p.adjust(p_vals, method = "bonferroni")
[1] 1.0000000 1.0000000 1.0000000 1.0000000 0.1043412 0.1030768
  • Holm’s method
p.adjust(p_vals, method = "holm") # default method!
[1] 1.0000000 1.0000000 1.0000000 1.0000000 0.1030768 0.1030768
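Holm's method sorts the p-values, multiplies the i-th smallest by \(n - i + 1\), and enforces that the adjusted values never decrease. A sketch of the procedure (holm_manual is an illustrative helper, not from the slides; p-values rounded from the output above):

```r
# Holm's step-down method by hand
holm_manual <- function(p) {
  n <- length(p)
  o <- order(p)                                        # ascending order
  adj <- pmin(1, cummax((n - seq_len(n) + 1) * p[o]))  # step-down multipliers
  adj[order(o)]                                        # restore original order
}

p_vals <- c(0.5509, 0.7019, 0.5812, 0.8069, 0.0174, 0.0172)
all.equal(holm_manual(p_vals), p.adjust(p_vals, method = "holm"))  # TRUE
```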

Reconsider findings

  • How does that change our interpretation of the prairie dog analysis?
drop1(model2, test = "Chi")
Single term deletions

Model:
cbind(n.surv, n.death) ~ mated + hibernation + min.weight
            Df Deviance    AIC    LRT Pr(>Chi)  
<none>           1.6184 34.537                  
mated        1   1.6781 32.597 0.0597  0.80694  
hibernation  1   7.2750 38.194 5.6566  0.01739 *
min.weight   1   7.2963 38.215 5.6779  0.01718 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
p.adjust(p_vals)
[1] 1.0000000 1.0000000 1.0000000 1.0000000 0.1030768 0.1030768
  • At best very weak evidence (“marginal significance”) of different survival probabilities with vs. without hibernation and with vs. without minimum weight.
  • Collect more data?

Review features of the data set

mated hibernation min.weight n.tot n.death n.surv    p.surv
No    Yes         Yes           60       5     55 0.9166667
Yes   Yes         Yes           17       2     15 0.8823529
No    No          Yes            8       1      7 0.8750000
Yes   No          Yes            2       0      2 1.0000000
No    Yes         No           187      35    152 0.8128342
Yes   Yes         No            85      13     72 0.8470588
No    No          No            51      15     36 0.7058824
Yes   No          No            23       8     15 0.6521739
  • 433 individuals in total, but very few individuals for some factor-level combinations
  • Effect sizes are also limited: observed survival ranges from 65% to 100%
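Both points can be checked directly from the table (counts copied from the table above, so the raw data file is not needed):

```r
# Sample sizes and observed survival per factor-level combination
n_tot  <- c(60, 17, 8, 2, 187, 85, 51, 23)
n_surv <- c(55, 15, 7, 2, 152, 72, 36, 15)
p_surv <- n_surv / n_tot

sum(n_tot)               # 433 individuals in total
round(range(p_surv), 2)  # 0.65 1.00
```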