Day 1
Freie Universität Berlin @ Theoretical Ecology
January 15, 2024
The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
(www.tidyverse.org)
These are the main packages from the tidyverse that we will use:






Install the tidyverse once with:
install.packages("tidyverse")
Then load and attach the packages at the beginning of your script:
library(tidyverse)
You can also install and load the tidyverse packages individually, but since we will use so many of them together, it’s easier to load and attach them together.
readr is a tidyverse package. To use it, you have to load it:
library(readr)
The most important functions are:
read_csv/write_csv to read/write comma delimited files
read_tsv/write_tsv to read/write tab delimited files
read_delim/write_delim to read/write files with any delimiter
read_*()
All read_* functions take a path to the data file as a first argument:
read_*(file = "path/to/your/file", …)
Import files with a readr function fitting the delimiter of your file:
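A minimal sketch of matching the read function to the delimiter. The file contents and the column names species and count are made up for illustration, and the files are written to temporary locations so the example runs on its own:

```r
library(readr)

# Hypothetical example data, written to temporary files
csv_file <- tempfile(fileext = ".csv")
tsv_file <- tempfile(fileext = ".tsv")
writeLines(c("species,count", "oak,3", "beech,5"), csv_file)
writeLines(c("species\tcount", "oak\t3", "beech\t5"), tsv_file)

# Pick the function that fits the delimiter of the file
dat_csv <- read_csv(csv_file, show_col_types = FALSE) # comma delimited
dat_tsv <- read_tsv(tsv_file, show_col_types = FALSE) # tab delimited
```

Both calls should give the same table, since only the delimiter differs between the two files.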
Use read_delim for a generic type of delimiter:
dat <- read_delim("data/your_data.txt", delim = "\t") # tab delimiter
dat <- read_delim("data/your_data.txt", delim = "..xyz..") # ..xyz.. delimiter
All read_* functions return a tibble.
read_*()
The read functions provide several options to modify the reading of data.
Have a look at ?read_delim for all options.
Useful if your data is not a “perfect table”
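As an illustration of such options, here is a small sketch using na, col_types and n_max, which are arguments of the readr read functions. The file, its column names and the use of -999 as a missing value code are assumptions made up for this example:

```r
library(readr)

# Hypothetical file in which -999 marks a missing value
f <- tempfile(fileext = ".csv")
writeLines(c("site,temperature", "A,1.5", "B,-999", "C,0.5"), f)

dat <- read_csv(
  file = f,
  na = c("", "NA", "-999"), # strings to treat as missing values
  col_types = cols(
    site = col_character(),
    temperature = col_double()
  ),
  n_max = 2 # read at most 2 data rows
)
```

Stating the column types explicitly is optional, but it makes the import fail loudly when a column does not contain what you expect.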
read_*()
Specify the number of lines to skip at the start of the file with skip

# without skipping first lines
read_csv(file = "data/meta_data_top.csv")
# A tibble: 6 × 1
Metadata
<chr>
1 Date: June, 12, 1989
2 Author: Selina Baldauf
3 Temperature, Rainfall
4 1.5, 2
5 1, 0
6 0.5, 0.6
# skip meta data lines
read_csv(
file = "data/meta_data_top.csv",
skip = 4
)
# A tibble: 3 × 2
Temperature Rainfall
<dbl> <dbl>
1 1.5 2
2 1 0
3 0.5 0.6
read_*()
Specify whether the first line of the data contains the column names with col_names

# First line expected to be column names
read_csv(file = "data/no_col_names.csv")
# A tibble: 2 × 2
`1.5` `2`
<dbl> <dbl>
1 1 0
2 0.5 0.6
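The output above shows the problem: the first data line was used as column names. A sketch of two ways to avoid this with col_names follows. The file is recreated in a temporary location from the values shown above, and the names Temperature and Rainfall are an assumption borrowed from the earlier example:

```r
library(readr)

# Recreate a file without a header line
f <- tempfile(fileext = ".csv")
writeLines(c("1.5,2", "1,0", "0.5,0.6"), f)

# col_names = FALSE: the first line is treated as data,
# and the columns get the default names X1, X2
dat_default <- read_csv(f, col_names = FALSE, show_col_types = FALSE)

# Alternatively, supply the column names directly (names assumed here)
dat <- read_csv(
  f,
  col_names = c("Temperature", "Rainfall"),
  show_col_types = FALSE
)
```

In both cases all three lines of the file end up as data rows.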
write_*()
Every read_* has a corresponding write_* function to export data from R.
Write data from R, e.g.:
To share transformed or summarized data
Summarize complex raw data and continue working with summarized data
…
write_*()
All write_* functions take the data to write as the first and the file to write to as the second argument:
write_*(x = dat, file = "path/to/save/file.*", …)
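A short sketch of this pattern with write_csv and write_tsv. The data and the temporary output files are hypothetical and only serve to make the example self-contained:

```r
library(readr)

# Hypothetical data to export
dat <- tibble::tibble(species = c("oak", "beech"), count = c(3, 5))

out_csv <- tempfile(fileext = ".csv")
out_tsv <- tempfile(fileext = ".tsv")

# Data first, file to write to second
write_csv(dat, file = out_csv) # comma delimited
write_tsv(dat, file = out_tsv) # tab delimited

# Reading the file back is a quick check that the export worked
dat_back <- read_csv(out_csv, show_col_types = FALSE)
```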
Use write_delim for a generic type of delimiter:
write_delim(dat, file = "data-clean/your_data.txt", delim = "\t") # tab delimiter
write_delim(dat, file = "data-clean/your_data.txt", delim = "..xyz..") # ..xyz.. delimiter
The readxl package is part of the tidyverse, but you need to load it explicitly:
library(readxl)
Use the read_excel function to read an Excel file:
dat <- read_excel(path = "data/your_data.xlsx")
By default, this reads the first sheet. You can read other sheets with
dat <- read_excel(path = "data/your_data.xlsx", sheet = "sheetName") # via sheet name
dat <- read_excel(path = "data/your_data.xlsx", sheet = 2) # via sheet number
read_excel also has other functionality, like skipping rows etc.
A little warning:
Check the imported data, e.g. with the summary function and by checking if the number of rows etc. is correct.
C:/Users/Selina/folder1/folder2/data/file_to_read.csv
data/file_to_read.csv
getwd()
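The two paths above point to the same kind of file in different ways: the first is absolute, the second is relative. R resolves relative paths against the working directory returned by getwd(). A small sketch of that relationship, where the file itself does not need to exist:

```r
# The working directory is the folder against which relative paths are resolved
wd <- getwd()

# Relative path as used above
rel_path <- "data/file_to_read.csv"

# The absolute location R would look up when given the relative path
abs_path <- file.path(wd, rel_path)
abs_path
```

Because the result depends on the working directory, the same relative path can work on different computers, while an absolute path only works where that exact folder structure exists.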
Working with R and RStudio, the best way is to:
Follow these guidelines to make data import to R easier and less frustrating:
Use plain-text file formats (.csv, .txt instead of .xlsx)
Save an Excel spreadsheet as csv
Avoid white space in column names: species_name instead of species name
Clean column names with janitor::clean_names() from the janitor package
Use . as a decimal separator (not ,)
Avoid white space in file and folder names: data-raw/my_data.csv instead of data raw/my data.csv
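If the column names of a file cannot be changed at the source, they can be cleaned after import. A minimal sketch with janitor::clean_names(), assuming the janitor package is installed. The data frame and its column names are made up for illustration:

```r
library(janitor)

# Hypothetical data frame with spaces and capitals in the column names
dat <- data.frame(
  `Species Name` = c("oak", "beech"),
  `Plot ID` = c(1, 2),
  check.names = FALSE # keep the column names exactly as written
)

# clean_names() converts the column names to snake_case
dat_clean <- clean_names(dat)
names(dat_clean)
```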
Introduction to R