P-value adjustment for multiple tests

Day 9

Felix May

Freie Universität Berlin @ Theoretical Ecology

Reproduce slides

library(tidyr)
library(dplyr)
library(ggplot2)
library(readr)

## theme for ggplot
theme_set(theme_classic())
theme_update(text = element_text(size = 14))

Repeat prairie dog analysis

  • 2 Models
    1. All main effects and two-way interactions
    2. All main effects
pdogs1 <- read_csv("data/08_prairiedogs.csv")

pdogs1 <- pdogs1 %>%
  mutate(n.surv = n.tot - n.death,
         p.surv = n.surv / n.tot
  )

model1 <- glm(cbind(n.surv, n.death) ~ mated*hibernation + 
                                       mated*min.weight + 
                                       hibernation*min.weight,
              family = binomial, data = pdogs1)

model2 <- glm(cbind(n.surv, n.death) ~ mated +
                                       hibernation + 
                                       min.weight,
              family = binomial, data = pdogs1)

Several hypothesis tests

drop1(model1, test = "Chi")
Single term deletions

Model:
cbind(n.surv, n.death) ~ mated * hibernation + mated * min.weight + 
    hibernation * min.weight
                       Df Deviance    AIC     LRT Pr(>Chi)
<none>                     0.82565 39.744                 
mated:hibernation       1  1.18134 38.100 0.35569   0.5509
mated:min.weight        1  0.97217 37.891 0.14651   0.7019
hibernation:min.weight  1  1.13001 38.049 0.30436   0.5812
drop1(model2, test = "Chi")
Single term deletions

Model:
cbind(n.surv, n.death) ~ mated + hibernation + min.weight
            Df Deviance    AIC    LRT Pr(>Chi)  
<none>           1.6184 34.537                  
mated        1   1.6781 32.597 0.0597  0.80694  
hibernation  1   7.2750 38.194 5.6566  0.01739 *
min.weight   1   7.2963 38.215 5.6779  0.01718 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Type I error rate

  • 6 tests
  • \(\alpha = 0.05\) in each test
    • \(\alpha\): Probability of Type I error aka false positive rate
  • \(\alpha\) –> defined for a single test
  • What is the probability of making mistakes when we do multiple tests?

Type I error / false positive

Rejecting the Null Hypothesis although it is true

Multiple tests – how many mistakes?

  • 20 tests with \(\alpha = 0.05\)
    • Expectation: 1 false positive
  • 100 tests with \(\alpha = 0.05\)
    • Expectation: 5 false positives
      • (When there are no effects in reality)
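The expectations above follow directly from treating each test as a Bernoulli trial; a minimal check, assuming all null hypotheses are true:

```r
# Expected number of false positives when all null hypotheses are true:
# each test rejects with probability alpha, so the expectation is n * alpha
alpha <- 0.05
n_tests <- c(20, 100)
n_tests * alpha  # 1 and 5 expected false positives
```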

Multiple tests – one or more mistake?

  • New question:
    • What is the probability of at least one mistake in 6 tests?
      • Probability of correct result in one test: \(1 - \alpha = 0.95\)
      • Probability of correct results in 6 tests: \(0.95^6 = 0.735\)
      • Probability of one or more false positives: \(1 - 0.95^6 = 0.265\)
  • >25% chance of making a mistake in our 6 tests!!!
  • What to do?
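The calculation above can be reproduced in one line, assuming the 6 tests are independent:

```r
# Family-wise error rate: probability of at least one false positive
# in 6 independent tests, each at alpha = 0.05
alpha <- 0.05
n_tests <- 6
fwer <- 1 - (1 - alpha)^n_tests
round(fwer, 3)  # 0.265
```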

Multiple tests – adjust p-values!

  • Idea: Maintain the given type I error rate across all tests
    • Keep \(\alpha\)
    • Increase p-values
    • Several approaches: see ?p.adjust
      • Holm’s method –> default in p.adjust()
      • Bonferroni correction –> easy, but very conservative (low risk of false positives, at the cost of statistical power)

Bonferroni correction

  • Multiply p-values by the no. of tests
tests1 <- drop1(model1, test = "Chi")
tests2 <- drop1(model2, test = "Chi")
p_vals <- c(tests1$`Pr(>Chi)`[-1], tests2$`Pr(>Chi)`[-1])
p_vals
[1] 0.55091240 0.70188906 0.58116150 0.80693792 0.01739019 0.01717946
  • Apply Bonferroni correction
p_vals * length(p_vals)
[1] 3.3054744 4.2113343 3.4869690 4.8416275 0.1043412 0.1030768
  • Using p.adjust
p.adjust(p_vals, method = "bonferroni")
[1] 1.0000000 1.0000000 1.0000000 1.0000000 0.1043412 0.1030768
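p.adjust() additionally caps the adjusted values at 1, since a p-value cannot exceed 1; this explains the difference from the plain multiplication above. A minimal check, using the rounded p-values from the output above:

```r
# Bonferroni via p.adjust() equals manual multiplication capped at 1
p_vals <- c(0.5509, 0.7019, 0.5812, 0.8069, 0.0174, 0.0172)
manual <- pmin(p_vals * length(p_vals), 1)
all.equal(manual, p.adjust(p_vals, method = "bonferroni"))  # TRUE
```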

Bonferroni vs. Holm’s method

  • Bonferroni
p.adjust(p_vals, method = "bonferroni")
[1] 1.0000000 1.0000000 1.0000000 1.0000000 0.1043412 0.1030768
  • Holm’s method
p.adjust(p_vals, method = "holm") # default method!
[1] 1.0000000 1.0000000 1.0000000 1.0000000 0.1030768 0.1030768
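Holm's method sorts the p-values, multiplies the i-th smallest by \(n - i + 1\), and enforces that the adjusted values never decrease. A sketch of the procedure (holm_manual is an illustrative helper, not from the slides; p-values rounded from the output above):

```r
# Holm's step-down method by hand
holm_manual <- function(p) {
  n <- length(p)
  o <- order(p)                                        # ascending order
  adj <- pmin(1, cummax((n - seq_len(n) + 1) * p[o]))  # step-down multipliers
  adj[order(o)]                                        # restore original order
}

p_vals <- c(0.5509, 0.7019, 0.5812, 0.8069, 0.0174, 0.0172)
all.equal(holm_manual(p_vals), p.adjust(p_vals, method = "holm"))  # TRUE
```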

Reconsider findings

  • How does that change our interpretation of the prairie dog analysis?
drop1(model2, test = "Chi")
Single term deletions

Model:
cbind(n.surv, n.death) ~ mated + hibernation + min.weight
            Df Deviance    AIC    LRT Pr(>Chi)  
<none>           1.6184 34.537                  
mated        1   1.6781 32.597 0.0597  0.80694  
hibernation  1   7.2750 38.194 5.6566  0.01739 *
min.weight   1   7.2963 38.215 5.6779  0.01718 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
p.adjust(p_vals)
[1] 1.0000000 1.0000000 1.0000000 1.0000000 0.1030768 0.1030768
  • At best very weak evidence (“marginal significance”) of different survival probabilities with vs. without hibernation and with vs. without minimum weight.
  • Collect more data?

Review features of the data set

mated hibernation min.weight n.tot n.death n.surv    p.surv
No    Yes         Yes           60       5     55 0.9166667
Yes   Yes         Yes           17       2     15 0.8823529
No    No          Yes            8       1      7 0.8750000
Yes   No          Yes            2       0      2 1.0000000
No    Yes         No           187      35    152 0.8128342
Yes   Yes         No            85      13     72 0.8470588
No    No          No            51      15     36 0.7058824
Yes   No          No            23       8     15 0.6521739
  • 433 individuals in total, but very few individuals for some factor-level combinations
  • Effect sizes are also limited: observed survival ranges from 65% to 100%
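Both points can be checked directly from the table (counts copied from the table above, so the raw data file is not needed):

```r
# Sample sizes and observed survival per factor-level combination
n_tot  <- c(60, 17, 8, 2, 187, 85, 51, 23)
n_surv <- c(55, 15, 7, 2, 152, 72, 36, 15)
p_surv <- n_surv / n_tot

sum(n_tot)               # 433 individuals in total
round(range(p_surv), 2)  # 0.65 1.00
```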