Tidy Data with tidyr
1 Lecture
2 Exercise
In 2021, students conducted vegetation surveys in Berlin at open sites without any trees (e.g., grasslands and ruderal vegetation) and at half-open sites with some trees on or next to the plots (e.g., parks and forests). The surveys are stored in the file vegetation_berlin.csv, which contains the species list of every plot and the cover of each species, recorded on the Londo scale (in percent).
In addition, the table 03_ellenberg_indicator_values.csv contains the Ellenberg indicator values of the species, which describe their environmental requirements. For example, a low light value means that the species is adapted to shady conditions, e.g., in dark forests, while a high value means that it grows in open conditions with a lot of light.
Your task is to compare the mean light indicator values between open and half-open sites. To do so, take the following steps:
- Join the two tables.
- Calculate the mean light indicator value for each plot and each plot_type, both as an unweighted mean and as a weighted mean with the Londo cover as weight.
- Create an appropriate figure that shows the mean light indicator value depending on the plot type (open vs. half-open).
- What are the primary key and the foreign key here? Remember that a key can include several columns.
- The functions separate_wider_delim() and unite() from tidyr might be helpful here.
- The combination group_by() %>% summarise() can help here.
- With weighted.mean() you can use weights in the calculation.
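The hints above can be tried out on a small toy example before working with the real files. The sketch below is only an illustration: the tables, the column names (plot, plot_type, species, cover, genus, epithet, light) and all values are made up for this purpose, and the real CSV files may be structured differently. It assumes tidyr 1.3.0 or newer, which provides separate_wider_delim().

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; column names and values are assumptions,
# not the contents of the real CSV files.
vegetation <- tibble(
  plot      = c("p1", "p1", "p2", "p2"),
  plot_type = c("open", "open", "half_open", "half_open"),
  species   = c("Poa pratensis", "Achillea millefolium",
                "Poa pratensis", "Acer platanoides"),
  cover     = c(30, 10, 20, 60)
)

ellenberg <- tibble(
  genus   = c("Poa", "Achillea", "Acer"),
  epithet = c("pratensis", "millefolium", "platanoides"),
  light   = c(6, 8, 4)
)

# Option 1: unite() pastes two columns into one key column,
# so that both tables share the column `species`.
ellenberg_key <- ellenberg %>%
  unite("species", genus, epithet, sep = " ")

# Option 2: separate_wider_delim() goes the other way and splits
# one key column into several columns.
vegetation_split <- vegetation %>%
  separate_wider_delim(species, delim = " ", names = c("genus", "epithet"))

# Join the indicator values to the survey data via the shared key.
joined <- vegetation %>%
  left_join(ellenberg_key, by = "species")

# One row per plot with the unweighted and the cover-weighted mean.
plot_means <- joined %>%
  group_by(plot, plot_type) %>%
  summarise(
    light_mean     = mean(light, na.rm = TRUE),
    light_weighted = weighted.mean(light, w = cover, na.rm = TRUE),
    .groups = "drop"
  )

plot_means
```

In this toy example the weighted mean of plot p2 is lower than its unweighted mean, because the shade-tolerant species has the highest cover there. With option 2, the join would instead use by = c("genus", "epithet"), i.e., a key that consists of two columns.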
Boxplots are a good choice here.
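A boxplot of this kind could be drawn with ggplot2 as sketched below. The data frame and its columns (plot, plot_type, light_weighted) are invented for illustration and stand in for the table of per-plot means calculated from the real data; the numbers are not survey results.

```r
library(ggplot2)

# Hypothetical per-plot means, made up for illustration only.
plot_means <- data.frame(
  plot           = paste0("p", 1:8),
  plot_type      = rep(c("open", "half_open"), each = 4),
  light_weighted = c(7.1, 6.8, 7.4, 6.9, 5.2, 5.9, 4.8, 5.5)
)

# One box per plot type, showing the distribution of the plot means.
p <- ggplot(plot_means, aes(x = plot_type, y = light_weighted)) +
  geom_boxplot() +
  labs(x = "Plot type", y = "Weighted mean light indicator value")

p
```

The same code works for the unweighted mean by mapping that column to y instead.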