Day 2
Freie Universität Berlin @ Theoretical Ecology
January 16, 2024
The built-in data structure for tables in R is a data frame.
Vectors in R can’t represent a data table in which the values in each row belong together.
Data frames are one of the biggest and most important ideas in R, and one of the things that make R different from other programming languages.
(H. Wickham, Advanced R)
| cities | population | area_km2 |
|---|---|---|
| Istanbul | 15100000 | 2576 |
| Moscow | 12500000 | 2561 |
| London | 9000000 | 1572 |
| Saint Petersburg | 5400000 | 1439 |
| Berlin | 3800000 | 891 |
| Madrid | 3200000 | 604 |
| Kyiv | 3000000 | 839 |
| Rome | 2800000 | 1285 |
| Bucharest | 2200000 | 228 |
| Paris | 2100000 | 105 |
A data frame is a named list of vectors of the same length.
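The "named list of vectors of the same length" structure can be checked directly. A minimal sketch, using a small hypothetical data frame `df` (not part of the slides):

```r
# Small example data frame (hypothetical, for illustration only)
df <- data.frame(
  cities     = c("Berlin", "Madrid", "Kyiv"),
  population = c(3.8e6, 3.2e6, 3e6)
)

is.list(df)    # TRUE: a data frame is a list
names(df)      # the list names are the column names
lengths(df)    # every column has the same length (3)

# data.frame() refuses vectors whose lengths cannot be matched up
res <- try(data.frame(a = 1:3, b = 1:2), silent = TRUE)
inherits(res, "try-error")   # TRUE
```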

Data frames are created with the function data.frame():
cities <- c(
  "Istanbul", "Moscow", "London",
  "Saint Petersburg", "Berlin", "Madrid",
  "Kyiv", "Rome", "Bucharest", "Paris")
population <- c(
  15.1e6, 12.5e6, 9e6, 5.4e6, 3.8e6,
  3.2e6, 3e6, 2.8e6, 2.2e6, 2.1e6)
area_km2 <- c(2576, 2561, 1572, 1439,
  891, 604, 839, 1285, 228, 105)
data.frame(
  cities = cities,
  population = population,
  area_km2 = area_km2
)
             cities population area_km2
1          Istanbul   15100000     2576
2            Moscow   12500000     2561
3            London    9000000     1572
4  Saint Petersburg    5400000     1439
5            Berlin    3800000      891
6            Madrid    3200000      604
7              Kyiv    3000000      839
8              Rome    2800000     1285
9         Bucharest    2200000      228
10            Paris    2100000      105
Tibbles are a modern reimagining of the data frame. Tibbles are designed to be (as much as possible) drop-in replacements for data frames.
(Wickham, Advanced R)
Have a look at this book chapter for a full list of the differences between data frames and tibbles and the advantages of using tibbles.
Tibbles have the same basic properties as data frames (named list of vectors)
Everything that you can do with data frames, you can do with tibbles
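These two points can be verified in the console. A minimal sketch, assuming the tibble package is installed; the table `tbl` is a hypothetical example:

```r
library(tibble)

# Hypothetical example table
tbl <- tibble(x = 1:3, y = c("a", "b", "c"))

is.data.frame(tbl)   # TRUE: a tibble is also a data frame
is.list(tbl)         # TRUE: still a named list of vectors
class(tbl)           # "tbl_df" "tbl" "data.frame"

# Base R functions written for data frames accept tibbles
nrow(tbl)
names(tbl)
```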


Tibbles are available from the tibble package.
Before we use tibbles, we need to install the package once using the function install.packages():
# This has to be done only once (in the console, not in the script)
install.packages("tibble")
Then, we need to load the package into our current R session using library():
library(tibble)
Create a tibble using the tibble() function:
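The call itself is not shown at this point. A likely reconstruction, assuming the three vectors from the data.frame() example and the name `cities_tbl` that the rest of this section uses:

```r
library(tibble)

# Same vectors as in the data.frame() example
cities <- c(
  "Istanbul", "Moscow", "London",
  "Saint Petersburg", "Berlin", "Madrid",
  "Kyiv", "Rome", "Bucharest", "Paris")
population <- c(
  15.1e6, 12.5e6, 9e6, 5.4e6, 3.8e6,
  3.2e6, 3e6, 2.8e6, 2.2e6, 2.1e6)
area_km2 <- c(2576, 2561, 1572, 1439,
  891, 604, 839, 1285, 228, 105)

# tibble() takes name = vector pairs, just like data.frame()
cities_tbl <- tibble(
  cities = cities,
  population = population,
  area_km2 = area_km2
)
cities_tbl
```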
# A tibble: 10 × 3
   cities           population area_km2
   <chr>                 <dbl>    <dbl>
 1 Istanbul           15100000     2576
 2 Moscow             12500000     2561
 3 London              9000000     1572
 4 Saint Petersburg    5400000     1439
 5 Berlin              3800000      891
 6 Madrid              3200000      604
 7 Kyiv                3000000      839
 8 Rome                2800000     1285
 9 Bucharest           2200000      228
10 Paris               2100000      105
How many rows?
nrow(cities_tbl)
[1] 10
How many columns?
ncol(cities_tbl)
[1] 3
What are the column headers?
names(cities_tbl)
[1] "cities"     "population" "area_km2"
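Rows and columns can also be queried in one step with base R's dim(). A small sketch on a hypothetical table `tbl`:

```r
library(tibble)

# Hypothetical example table with 4 rows and 2 columns
tbl <- tibble(x = 1:4, y = c("a", "b", "c", "d"))

dim(tbl)                       # number of rows, then number of columns
dim(tbl)[1] == nrow(tbl)       # TRUE
dim(tbl)[2] == ncol(tbl)       # TRUE
```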
Look at the entire table in a separate window with view():
view(cities_tbl)
Or click on the little table sign in the Environment pane.

Get a quick summary of all columns:
summary(cities_tbl)
    cities            population          area_km2
 Length:10          Min.   : 2100000   Min.   : 105.0
 Class :character   1st Qu.: 2850000   1st Qu.: 662.8
 Mode  :character   Median : 3500000   Median :1088.0
                    Mean   : 5910000   Mean   :1210.0
                    3rd Qu.: 8100000   3rd Qu.:1538.8
                    Max.   :15100000   Max.   :2576.0
Indexing tibbles works similarly to indexing vectors, but with 2 dimensions instead of 1:
tibble[row_index, col_index or col_name]
[] always returns another tibble.
# First row and first column
cities_tbl[1, 1]
# A tibble: 1 × 1
  cities
  <chr>
1 Istanbul
This is the same as
cities_tbl[1, "cities"]
# rows 1 & 5, all columns:
cities_tbl[c(1, 5), ]
# A tibble: 2 × 3
  cities   population area_km2
  <chr>         <dbl>    <dbl>
1 Istanbul   15100000     2576
2 Berlin      3800000      891
# All rows, first 2 columns
cities_tbl[, 1:2] # same as cities_tbl[, c(1, 2)]
# same as
cities_tbl[, c("cities", "population")]
# A tibble: 10 × 2
  cities   population
  <chr>         <dbl>
1 Istanbul   15100000
2 Moscow     12500000
3 London      9000000
# ℹ 7 more rows
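That [] always returns a tibble is one of the differences from plain data frames, where selecting a single column drops to a vector. A minimal sketch on hypothetical tables `df` and `tbl`:

```r
library(tibble)

# Hypothetical example data, once as data frame and once as tibble
df  <- data.frame(x = 1:3, y = c("a", "b", "c"))
tbl <- as_tibble(df)

class(df[, 1])       # "integer": the data frame drops to a vector
class(tbl[, 1])      # "tbl_df" "tbl" "data.frame": still a tibble
class(tbl[2, "y"])   # also a tibble, even for a single cell
```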
Indexing columns by name is usually preferred to indexing by position:
cities_tbl[, 1:2] # okay
cities_tbl[, c("cities", "population")] # better
Code is much easier to read
Code is more robust against
General rule
Good code produces errors when something unintended or wrong happens
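One situation where this matters is a change in column order, which is an assumed example and not taken from the slides. In the sketch below, indexing by position silently returns different columns, while indexing by name either returns the intended columns or fails with an error:

```r
library(tibble)

# Hypothetical example table
tbl <- tibble(
  cities     = c("Berlin", "Madrid"),
  population = c(3.8e6, 3.2e6),
  area_km2   = c(891, 604)
)

# Same data, but the column order has changed
reordered <- tbl[, c("area_km2", "cities", "population")]

names(reordered[, 1:2])                         # "area_km2" "cities": not what was intended
names(reordered[, c("cities", "population")])   # still "cities" "population"

# A column name that does not exist produces an error instead of a wrong result
res <- try(reordered[, c("cities", "inhabitants")], silent = TRUE)
inherits(res, "try-error")   # TRUE
```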
$
Select an entire column from a tibble using $ (this returns a vector instead of a tibble):
cities_tbl$cities
 [1] "Istanbul"         "Moscow"           "London"           "Saint Petersburg"
 [5] "Berlin"           "Madrid"           "Kyiv"             "Rome"
 [9] "Bucharest"        "Paris"
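Because $ returns a plain vector, the result can go straight into functions that expect vectors, or be used to filter another column. A small sketch on a hypothetical three-row table `tbl`:

```r
library(tibble)

# Hypothetical example table
tbl <- tibble(
  cities     = c("Berlin", "Madrid", "Kyiv"),
  population = c(3.8e6, 3.2e6, 3e6)
)

pop <- tbl$population
is.numeric(pop)             # TRUE: a plain numeric vector, not a tibble
max(pop)                    # largest population in the table
tbl$cities[pop > 3.1e6]     # cities with more than 3.1 million inhabitants

is_tibble(tbl["population"])   # TRUE: [] keeps the tibble instead
```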
New columns can be added as vectors using the $ operator. The vectors need to have the same length as the tibble has rows.
# add a country column
cities_tbl$country <- c(
"Turkey", "Russia", "UK", "Russia", "Germany", "Spain",
"Ukraine", "Italy", "Romania", "France"
)
# A tibble: 10 × 4
   cities           population area_km2 country
   <chr>                 <dbl>    <dbl> <chr>
 1 Istanbul           15100000     2576 Turkey
 2 Moscow             12500000     2561 Russia
 3 London              9000000     1572 UK
 4 Saint Petersburg    5400000     1439 Russia
 5 Berlin              3800000      891 Germany
 6 Madrid              3200000      604 Spain
 7 Kyiv                3000000      839 Ukraine
 8 Rome                2800000     1285 Italy
 9 Bucharest           2200000      228 Romania
10 Paris               2100000      105 France
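New columns can also be computed from existing ones, and a vector of the wrong length is rejected. A minimal sketch on a hypothetical three-row table; the column name `density` is an assumption for illustration:

```r
library(tibble)

# Hypothetical example table
tbl <- tibble(
  cities     = c("Berlin", "Madrid", "Paris"),
  population = c(3.8e6, 3.2e6, 2.1e6),
  area_km2   = c(891, 604, 105)
)

# Derived column: inhabitants per square kilometre
tbl$density <- tbl$population / tbl$area_km2

# Two values for three rows: the assignment fails and tbl is left unchanged
res <- try(tbl$country <- c("Germany", "Spain"), silent = TRUE)
inherits(res, "try-error")   # TRUE
names(tbl)                   # no "country" column was added
```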
Tables in R: Data frames and tibbles
install.packages("tibble")
library(tibble) at the beginning of your script to load the package
Return result as tibble:
tbl[row_index, col_index or col_name]
Return result as vector:
tbl$colA # select colA
Introduction to R