Data transformation for linear models

Day 7

Felix Nößler

Freie Universität Berlin @ Theoretical Ecology

January 23, 2024

1 When to transform data?

  • The relationship between the response and the predictor is non-linear/exponential \(\rightarrow\) log-transform the predictor variable
  • You want an interpretable intercept \(\rightarrow\) center the predictor variables
  • You want to compare the coefficients of different predictors \(\rightarrow\) standardize (center and divide by the standard deviation) the predictor variables
  • You have quantities that span several orders of magnitude \(\rightarrow\) log-transform the variables to be able to compare them
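
A minimal sketch of the transformations listed above, using only the Python standard library. The values in `x` and `body_mass` are made up for illustration and are not course data.

```python
import math
import statistics

# Hypothetical predictor values (not from the course data)
x = [2.0, 4.0, 6.0, 8.0]

# Center: subtract the mean, so that 0 corresponds to an average x
x_mean = statistics.mean(x)
x_centered = [xi - x_mean for xi in x]

# Standardize: center, then divide by the sample standard deviation
x_sd = statistics.stdev(x)
x_standardized = [(xi - x_mean) / x_sd for xi in x]

# Log-transform: hypothetical quantities spanning several orders of magnitude
body_mass = [0.01, 1.0, 100.0, 10000.0]
log_body_mass = [math.log10(m) for m in body_mass]

print(x_centered)
print(x_standardized)
print(log_body_mass)
```

After centering, the intercept of a linear model is the expected response at the mean of the predictor. After standardizing, each coefficient is the change in the response per one standard deviation of its predictor.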

2 When is it better not to use transformations?

  • Often it is better to use a generalized linear model (GLM) instead of transforming the response variable, especially if you want to test a hypothesis

See, for example: R. B. O’Hara and D. J. Kotze, “Do not log‐transform count data,” Methods in Ecology and Evolution, 2010. doi: 10.1111/j.2041-210x.2010.00021.x.
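
One way to see the problem with transforming a count response is a small sketch like the following. The counts are hypothetical, and the added constant of 1 is an arbitrary choice made only because log(0) is undefined.

```python
import math
import statistics

# Hypothetical count data, including a zero
counts = [0, 1, 2, 10]

# log(0) is undefined, so a constant has to be added before transforming
log_counts = [math.log(c + 1) for c in counts]

# Mean on the log scale, back-transformed to the count scale
back_transformed = math.exp(statistics.mean(log_counts)) - 1

# Arithmetic mean of the raw counts
arithmetic_mean = statistics.mean(counts)

print(back_transformed, arithmetic_mean)
```

The back-transformed value is smaller than the arithmetic mean, because averaging on the log scale gives a geometric rather than an arithmetic mean. An intercept-only Poisson GLM with a log link, by contrast, would estimate the logarithm of the arithmetic mean and needs no added constant.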

3 Transformation for visualisation

Source on Wikimedia

4 Workflow

flowchart LR
  A[fit model\nto raw\ndata] --> B[check\nassumptions]
  B -->|if not\nfulfilled| T[transform\ndata]
  T --> J
  J[fit model to\ntransformed\ndata] --> K[check\nassumptions]
  K --> P
  B -->|if fulfilled| P
  P[make\npredictions] --> S[plot raw\ndata and\npredictions]
  
  style T fill:#f80
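
The workflow above could be sketched roughly as follows, using only the Python standard library. The data are simulated without noise, `fit_line` and `rss` are hypothetical helpers written for this example, and the residual sum of squares is only a crude stand-in for proper diagnostic plots of the model assumptions.

```python
import math
import statistics


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def rss(x, y, intercept, slope):
    """Residual sum of squares of a fitted line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: the response is linear in log(x), not in x
x = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]
y = [2.0 + 3.0 * math.log(xi) for xi in x]

# 1. Fit the model to the raw data and check it
b0_raw, b1_raw = fit_line(x, y)
rss_raw = rss(x, y, b0_raw, b1_raw)

# 2. Transform the predictor, refit and check again
log_x = [math.log(xi) for xi in x]
b0_log, b1_log = fit_line(log_x, y)
rss_log = rss(log_x, y, b0_log, b1_log)

# 3. Make predictions for new x values given on the raw scale,
#    so they can be plotted together with the raw data
x_new = [3, 30, 300]
y_pred = [b0_log + b1_log * math.log(xi) for xi in x_new]

print(rss_raw, rss_log)
print(list(zip(x_new, y_pred)))
```

The predictions are computed from untransformed `x_new` values, so the fitted curve can be drawn on the original axes even though the model was fitted on the log scale.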