Creating vectors and tables

Day 3

Felix May

Freie Universität Berlin @ Theoretical Ecology


Overview – Data management topics

What you learned so far

  • Reading data into R
  • Visualising data
  • Summarising data
  • Selecting rows or columns from tables

What you will learn today

  • Creating vectors and tables from scratch
  • Combining tables from different files
  • Adjusting the structure of tables ➔ tidy data

Why is this useful?

  • Preparing tables for data entry
    • E.g., treatment values in experiments
  • Creating input data for model predictions (see days 5–9)
  • Adjusting data for analysis or visualisation

Generating single vectors

Creating vectors in R

  • Combine any set of values with c()
my_num <- c(0.05, 1, 2.5)
my_fac <- c("low", "high", "med")

Note

A vector always holds values of a single data type (e.g., numeric, character, or logical)
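A small sketch of what happens when types are mixed: R does not raise an error but silently converts all values to the most general type. The names mixed and my_fac_f are hypothetical. Note that my_fac above is a character vector; factor() turns it into a true categorical variable with an explicit level order.

```r
# Mixing types coerces everything to the most general type (character here)
mixed <- c(1, "a", TRUE)
class(mixed)          # "character"
mixed                 # "1" "a" "TRUE"

# Numbers and logicals combine to numeric (TRUE -> 1, FALSE -> 0)
c(2.5, TRUE, FALSE)   # 2.5 1.0 0.0

# Turn the character vector into a factor with an explicit level order
my_fac <- c("low", "high", "med")
my_fac_f <- factor(my_fac, levels = c("low", "med", "high"))
levels(my_fac_f)      # "low" "med" "high"
```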

Generating ordered sequences

  • Ordered sequences with user-defined start, end, and step size (argument by)
seq(from = 2, to = 4, by = 0.5)
[1] 2.0 2.5 3.0 3.5 4.0
  • Reminder: Function arguments can be specified by name and/or by position
seq(2, 4, by = 0.5)
[1] 2.0 2.5 3.0 3.5 4.0
  • Alternatively, choose the length of the sequence (argument length.out, which can be abbreviated to length) instead of the step size
    • by is then calculated from the length
seq(2, 4, length = 9)
[1] 2.00 2.25 2.50 2.75 3.00 3.25 3.50 3.75 4.00
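The step size that seq() derives from the length is (to - from) / (length.out - 1); for the example above that is (4 - 2) / (9 - 1) = 0.25. A quick check, where from, to, n, and step are just illustrative variable names:

```r
from <- 2
to <- 4
n <- 9
# Step size implied by the requested length
step <- (to - from) / (n - 1)
step   # 0.25
# Both ways of calling seq() give the same sequence
all.equal(seq(from, to, by = step), seq(from, to, length.out = n))   # TRUE
```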

Generating ordered sequences

  • Shorthand syntax for by = 1: the colon operator
seq(2, 6, by = 1)
[1] 2 3 4 5 6
2:6
[1] 2 3 4 5 6
  • Decreasing sequences (step size -1) work as well
5:-5
 [1]  5  4  3  2  1  0 -1 -2 -3 -4 -5
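One pitfall worth knowing with the colon operator: it binds more tightly than arithmetic, so 1:n - 1 subtracts 1 from every element instead of ending the sequence at n - 1. For steps other than 1 or -1, seq() with a negative by is needed. The variable n is only for illustration.

```r
n <- 5
1:n - 1               # 0 1 2 3 4  (':' is evaluated first, then 1 is subtracted)
1:(n - 1)             # 1 2 3 4    (parentheses make the end point n - 1)

# Decreasing sequence with a non-integer step
seq(10, 0, by = -2.5) # 10.0 7.5 5.0 2.5 0.0
```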

Vectors with repeated values

  • Helpful for observational or experimental data with replicates

  • Replicate a complete vector

rep(1:4, times = 3)
 [1] 1 2 3 4 1 2 3 4 1 2 3 4
  • Replicate each element a given number of times
rep(1:4, each = 3)
 [1] 1 1 1 2 2 2 3 3 3 4 4 4
  • Also works with categorical (character) data
trt_levels <- c("high", "medium", "low")
rep(trt_levels, each = 4)
 [1] "high"   "high"   "high"   "high"   "medium" "medium" "medium" "medium"
 [9] "low"    "low"    "low"    "low"   
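The arguments of rep() can also be combined, which helps when building replicate structures for a design. A sketch of three variants (values chosen only for illustration):

```r
# each is applied first, then the result is repeated 'times' times
rep(1:2, each = 2, times = 2)      # 1 1 2 2 1 1 2 2

# A vector for times gives every element its own number of repeats
rep(c("a", "b"), times = c(2, 3))  # "a" "a" "b" "b" "b"

# length.out cuts or extends the result to a fixed length
rep(1:3, length.out = 7)           # 1 2 3 1 2 3 1
```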

Creating data tables

Creating data tables

library(tidyverse)  # provides tibble(), %>%, mutate(), bind_rows(), read_csv(), write_csv()
mydata <- tibble(x = 1:5, y = seq(100, 200, length = 5))
mydata
# A tibble: 5 × 2
      x     y
  <int> <dbl>
1     1   100
2     2   125
3     3   150
4     4   175
5     5   200
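tibble() builds its columns one after another, so a later column can be computed from an earlier one. A minimal sketch reproducing the y column above from x (mydata2 is a hypothetical name):

```r
library(tibble)

# y is calculated from the x column defined just before it
mydata2 <- tibble(x = 1:5, y = 75 + 25 * x)
mydata2$y   # 100 125 150 175 200
```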

Creating data tables

  • Different columns can hold different data types (each column still has a single type)
tibble(num = c(1, 2, 3),
       fac = c("a", "b", "c"),
       log = c(TRUE, TRUE, FALSE))
# A tibble: 3 × 3
    num fac   log  
  <dbl> <chr> <lgl>
1     1 a     TRUE 
2     2 b     TRUE 
3     3 c     FALSE

Creating data tables

  • Single values (length 1) are recycled to match the length of the other columns
tibble(x = 1:10, treatment = "control")
# A tibble: 10 × 2
       x treatment
   <int> <chr>    
 1     1 control  
 2     2 control  
 3     3 control  
 4     4 control  
 5     5 control  
 6     6 control  
 7     7 control  
 8     8 control  
 9     9 control  
10    10 control  
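The recycling rule of tibble() is stricter than that of base R's data.frame(): only length-1 values are recycled. A short comparison sketch (df and res are illustrative names):

```r
library(tibble)

# data.frame() silently recycles a length-2 vector to fill 4 rows
df <- data.frame(x = 1:4, y = 1:2)
df$y   # 1 2 1 2

# tibble() only recycles length-1 values and raises an error here
res <- tryCatch(tibble(x = 1:4, y = 1:2), error = function(e) e)
inherits(res, "error")   # TRUE
```

The stricter rule makes accidental partial recycling, a common source of silent mistakes, visible as an error.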

Creating data tables

  • Functions for generating vectors can be called within tibble()
tibble(temp = rep(seq(2, 4), times = 2),
       treatment = rep(c("control", "CO2"), each = 3))
# A tibble: 6 × 2
   temp treatment
  <int> <chr>    
1     2 control  
2     3 control  
3     4 control  
4     2 CO2      
5     3 CO2      
6     4 CO2      
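When a design is built from several rep() calls, the vectors must end up with the same length (here 3 × 2 = 6 and 2 × 3 = 6). One way to check that every combination appears as intended is dplyr's count(); the table name design is hypothetical.

```r
library(tibble)
library(dplyr)

design <- tibble(temp = rep(2:4, times = 2),
                 treatment = rep(c("control", "CO2"), each = 3))

# Each treatment should occur three times (once per temperature)
count(design, treatment)
# Each temperature-treatment combination should occur exactly once
count(design, temp, treatment)
```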

Reminder: Adding columns to tables

dat1 <- tibble(temp = rep(2:3, times = 2),
               treatment = rep(c("control", "CO2"), each = 2))
dat2 <- dat1 %>% mutate(tree_species = "pine")
dat2
# A tibble: 4 × 3
   temp treatment tree_species
  <int> <chr>     <chr>       
1     2 control   pine        
2     3 control   pine        
3     2 CO2       pine        
4     3 CO2       pine        
  • Alternatively, using base R (this modifies dat1 directly)
dat1$tree_species <- "pine"
dat1
# A tibble: 4 × 3
   temp treatment tree_species
  <int> <chr>     <chr>       
1     2 control   pine        
2     3 control   pine        
3     2 CO2       pine        
4     3 CO2       pine        
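mutate() can also add several columns in one call, and a new column may be computed from existing ones. In this sketch the names dat3 and temp_x2 are hypothetical:

```r
library(tibble)
library(dplyr)

dat1 <- tibble(temp = rep(2:3, times = 2),
               treatment = rep(c("control", "CO2"), each = 2))

dat3 <- dat1 %>%
  mutate(tree_species = "pine",   # single value, recycled to all rows
         temp_x2 = temp * 2)      # computed from an existing column
dat3
```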

Combining several tables

Preparation: Create several tables and files

  • Example: data for different tree species stored in separate files
dat_pine <- dat1 %>% mutate(tree_species = "pine")
dat_beech <- dat1 %>% mutate(tree_species = "beech")
dat_oak <- dat1 %>% mutate(tree_species = "oak")
  • Write each table to a CSV file (the data folder must already exist)
write_csv(dat_pine, file = "data/03_pine1.csv")
write_csv(dat_beech, file = "data/03_beech1.csv")
write_csv(dat_oak, file = "data/03_oak1.csv")

Combining several tables – by row

  • Reminder: reading tables from files
    • Make sure your working directory is set correctly, because the file paths are relative to it!
dat_pine <- read_csv("data/03_pine1.csv")
dat_beech <- read_csv("data/03_beech1.csv")
dat_oak <- read_csv("data/03_oak1.csv")
  • Put the tables together by row
bind_rows(dat_pine, dat_beech, dat_oak)
# A tibble: 12 × 3
    temp treatment tree_species
   <dbl> <chr>     <chr>       
 1     2 control   pine        
 2     3 control   pine        
 3     2 CO2       pine        
 4     3 CO2       pine        
 5     2 control   beech       
 6     3 control   beech       
 7     2 CO2       beech       
 8     3 CO2       beech       
 9     2 control   oak         
10     3 control   oak         
11     2 CO2       oak         
12     3 CO2       oak         
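With many files, reading each one by hand becomes tedious. A sketch, assuming readr and dplyr are installed, that finds all matching files with list.files(), reads them with lapply(), and stacks them with bind_rows(). It writes to a folder inside tempdir() rather than data/ so it can run anywhere; data_dir, files, and all_dat are hypothetical names.

```r
library(tibble)
library(dplyr)
library(readr)

# Hypothetical folder standing in for "data/"
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)   # create it if it does not exist

dat1 <- tibble(temp = rep(2:3, times = 2),
               treatment = rep(c("control", "CO2"), each = 2))
for (sp in c("pine", "beech", "oak")) {
  write_csv(mutate(dat1, tree_species = sp),
            file.path(data_dir, paste0("03_", sp, "1.csv")))
}

# All files matching the naming scheme, with their full paths
files <- list.files(data_dir, pattern = "^03_.*1\\.csv$", full.names = TRUE)

# Read every file into a list of tables, then stack them by row
all_dat <- bind_rows(lapply(files, read_csv))
all_dat
```

list.files() returns the files in alphabetical order, so the rows appear as beech, oak, pine rather than in the order used above.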

Combining several tables – potential problems

  • bind_rows() matches columns by name, so the tables should have the same column names; otherwise, the missing entries are filled with NA
names(dat_pine)
[1] "temp"         "treatment"    "tree_species"
dat_pine <- dat_pine %>% rename(Species = "tree_species")
names(dat_pine)
[1] "temp"      "treatment" "Species"  
bind_rows(dat_pine, dat_beech)
# A tibble: 8 × 4
   temp treatment Species tree_species
  <dbl> <chr>     <chr>   <chr>       
1     2 control   pine    <NA>        
2     3 control   pine    <NA>        
3     2 CO2       pine    <NA>        
4     3 CO2       pine    <NA>        
5     2 control   <NA>    beech       
6     3 control   <NA>    beech       
7     2 CO2       <NA>    beech       
8     3 CO2       <NA>    beech       
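Before binding, mismatching column names can be spotted with setdiff(). The two tables in this sketch are hypothetical, reduced versions of the ones above:

```r
library(tibble)

dat_pine <- tibble(temp = 2:3, Species = "pine")
dat_beech <- tibble(temp = 2:3, tree_species = "beech")

# Column names present in one table but not in the other
setdiff(names(dat_pine), names(dat_beech))   # "Species"
setdiff(names(dat_beech), names(dat_pine))   # "tree_species"
```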

Combining several tables – solution

Mini-exercise

How would you fix the problem?

Solution code
dat_pine <- dat_pine %>% rename(tree_species = "Species")

bind_rows(dat_pine, dat_beech)
# A tibble: 8 × 3
   temp treatment tree_species
  <dbl> <chr>     <chr>       
1     2 control   pine        
2     3 control   pine        
3     2 CO2       pine        
4     3 CO2       pine        
5     2 control   beech       
6     3 control   beech       
7     2 CO2       beech       
8     3 CO2       beech       
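If the tables do not yet contain a column that identifies their origin, bind_rows() can create one: pass the tables as named arguments and set .id to the name of the new column. A sketch with hypothetical tables dat_a and dat_b:

```r
library(tibble)
library(dplyr)

dat_a <- tibble(temp = 2:3, treatment = "control")
dat_b <- tibble(temp = 2:3, treatment = "CO2")

# The argument names (pine, beech) become the values of the new column
combined <- bind_rows(pine = dat_a, beech = dat_b, .id = "tree_species")
combined
```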

Combining tables – by column

dat2 <- tibble(year = 2023, 
               irrigation = rep(c("yes","no"), times = 2))
dat2
# A tibble: 4 × 2
   year irrigation
  <dbl> <chr>     
1  2023 yes       
2  2023 no        
3  2023 yes       
4  2023 no        
  • Combining by columns
bind_cols(dat_pine, dat2)
# A tibble: 4 × 5
   temp treatment tree_species  year irrigation
  <dbl> <chr>     <chr>        <dbl> <chr>     
1     2 control   pine          2023 yes       
2     3 control   pine          2023 no        
3     2 CO2       pine          2023 yes       
4     3 CO2       pine          2023 no        
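bind_cols() pairs rows purely by position; it does not check any identifiers. If one table is sorted differently, the rows are matched incorrectly without any warning. A sketch with hypothetical plot labels and values:

```r
library(tibble)
library(dplyr)

plots <- tibble(plot = c("A", "B", "C"))
yields <- tibble(yield = c(10, 20, 30))   # hypothetical values for plots A, B, C

bind_cols(plots, yields)                  # A-10, B-20, C-30
# Reordering one table silently changes the pairing
shuffled <- bind_cols(plots, yields[c(3, 1, 2), ])
shuffled                                  # A-30, B-10, C-20
```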

Combining tables – potential problems

  • Tables must have the same number of rows (a table with a single row is recycled)
dat3 <- tibble(year = 2023, 
               irrigation = rep(c("yes","no"), times = 3))
bind_cols(dat_pine, dat3)
Error in `bind_cols()`:
! Can't recycle `..1` (size 4) to match `..2` (size 6).
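The exception for single-row tables can be checked directly. In this sketch, dat_pine is rebuilt with the same structure as above; a one-row table is recycled to four rows, while a six-row table triggers the error (caught here with tryCatch()):

```r
library(tibble)
library(dplyr)

dat_pine <- tibble(temp = c(2, 3, 2, 3),
                   treatment = rep(c("control", "CO2"), each = 2),
                   tree_species = "pine")

# A one-row table is recycled to the 4 rows of dat_pine
out <- bind_cols(dat_pine, tibble(year = 2023))
out

# 6 rows cannot be matched to 4 rows
res <- tryCatch(bind_cols(dat_pine, tibble(x = 1:6)), error = function(e) e)
inherits(res, "error")   # TRUE
```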

Exercise

Now it is your turn!

Exercise: Generating experimental treatments