Probability and Likelihood
1 Lecture
2 Exercises
2.1 Discrete vs. continuous distributions
Which kinds of data in biology or ecology are better modeled with a discrete distribution, and which with a continuous distribution?
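As a small illustration of the distinction, the following R sketch contrasts a discrete and a continuous distribution. The choice of a Poisson distribution for counts and a normal distribution for measurements, as well as all parameter values, are assumptions made for this example only.

```r
# Discrete: a Poisson distribution assigns probabilities to whole numbers,
# e.g. the number of individuals counted in a plot (lambda = 3 is made up).
counts   <- 0:100
p_counts <- dpois(counts, lambda = 3)
p_total  <- sum(p_counts)  # the probabilities of all counts add up to (practically) 1

# Continuous: a normal distribution assigns densities, not probabilities.
# A density can exceed 1 when the distribution is narrow (sd = 0.1 is made up).
dens_peak <- dnorm(0, mean = 0, sd = 0.1)

# Probabilities of a continuous variable belong to intervals, not single values.
p_interval <- pnorm(0.1, mean = 0, sd = 0.1) - pnorm(-0.1, mean = 0, sd = 0.1)

print(c(p_total = p_total, dens_peak = dens_peak, p_interval = p_interval))
```

The density value larger than 1 is not an error: for continuous variables only areas under the density curve are probabilities.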
2.2 Characterize a distribution
Research distributions on the Internet and choose one that seems interesting to you. Briefly characterize it in bullet points and use R to create illustrations of this distribution. Summarize your results on one (max. two) A4 pages.
You can choose, for example, the normal, uniform, binomial, or Poisson distribution.
You can characterize your distribution by, for example:
- Central tendency: where the center of the distribution is located (mean, median, mode).
- Spread: how much the values deviate from the center (variance, standard deviation).
- Shape: symmetric, skewed, unimodal, bimodal.
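One possible way to obtain these characteristics in R is to simulate from the distribution and summarize the sample. The sketch below assumes a Poisson distribution with lambda = 3 purely as an example. Base R has no built-in function for the statistical mode or for skewness, so both are computed by hand here.

```r
set.seed(1)  # make the simulation reproducible

# Hypothetical example: 10,000 draws from a Poisson distribution with lambda = 3.
x <- rpois(10000, lambda = 3)

# Central tendency
x_mean   <- mean(x)
x_median <- median(x)
x_mode   <- as.numeric(names(which.max(table(x))))  # most frequent value

# Spread
x_var <- var(x)
x_sd  <- sd(x)

# Shape: sample skewness (positive values indicate a longer right tail)
x_skew <- mean((x - x_mean)^3) / (mean((x - x_mean)^2))^1.5

print(round(c(mean = x_mean, median = x_median, mode = x_mode,
              variance = x_var, sd = x_sd, skewness = x_skew), 2))
```

For a Poisson distribution the mean and the variance are both equal to lambda, so the two simulated values should be close to each other.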
Visualization tools for distributions include:
- Histograms: bars showing how many observations fall into each bin of values.
- Probability density function (PDF): function giving the relative likelihood of each value of a continuous random variable; probabilities correspond to areas under the curve.
- Cumulative distribution function (CDF): probability that a random variable takes a value less than or equal to a given value.
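The three kinds of plots listed above can be sketched in R as follows. This is a minimal example that assumes a standard normal distribution and uses base graphics to avoid extra packages; the same plots can also be built with ggplot2.

```r
set.seed(42)
x <- rnorm(500, mean = 0, sd = 1)  # hypothetical sample of 500 values

op <- par(mfrow = c(1, 3))         # three panels side by side

# Histogram on the density scale, with the theoretical PDF drawn on top
hist(x, breaks = 30, freq = FALSE, main = "Histogram", xlab = "x")
curve(dnorm(x, mean = 0, sd = 1), add = TRUE)

# Probability density function
curve(dnorm(x, mean = 0, sd = 1), from = -4, to = 4,
      main = "PDF", xlab = "x", ylab = "density")

# Cumulative distribution function
curve(pnorm(x, mean = 0, sd = 1), from = -4, to = 4,
      main = "CDF", xlab = "x", ylab = "P(X <= x)")

par(op)                            # restore the previous plot layout
```

In R, the prefixes d, p, q, and r give the density, the cumulative distribution function, the quantile function, and random draws of a distribution, for example dnorm, pnorm, qnorm, and rnorm.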
2.3 Calculate the likelihood
We will use the 05_penguins.csv dataset.
- Make a histogram of the penguin body mass for each species. You can either make three plots or use facet_wrap. What distribution do you think would be appropriate to model this data? Would you use a discrete or continuous distribution?
- Calculate the log-likelihood of \(\mathcal{N}(4000, 500)\), i.e. a normal distribution with mean 4000 g and standard deviation 500 g, for all measurements of the body mass of the Adelie penguins.
- Plot the density of \(\mathcal{N}(4000, 500)\) along with the measurements. Try geom_point, geom_dotplot, and geom_rug to plot the measurements. Does it look like a good fit?
- Try out three different values for the mean and standard deviation of the normal distribution. Which one has the highest likelihood?
- What is the maximum likelihood estimate for the mean and standard deviation of the normal distribution for the body mass of the Adelie penguins?
- Check that your maximum likelihood estimate coincides with the result of lm(body_mass_g ~ 1), where the data is already filtered for the Adelie penguins.
You need a vector of the body mass of the Adelie penguins for the dnorm function. You can get a vector out of a data frame with the $ operator or with pull(tibble_name, column_name).
You have to filter for the Adelie penguins and remove the NAs before you can calculate the likelihood.
Use dnorm with log = TRUE and sum the result to get the log-likelihood.
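The hints above can be combined into the following sketch. It runs on simulated values, not on 05_penguins.csv: the data frame penguins_demo, the column name species, and the helper log_lik are all hypothetical names introduced for this illustration, and base R is used in place of the tidyverse functions.

```r
set.seed(123)

# Hypothetical stand-in for 05_penguins.csv: simulated values, NOT the real data.
# The column name `species` is an assumption; `body_mass_g` appears in the exercise.
penguins_demo <- data.frame(
  species     = rep(c("Adelie", "Gentoo"), each = 100),
  body_mass_g = c(rnorm(100, mean = 3700, sd = 450),
                  rnorm(100, mean = 5000, sd = 500))
)
penguins_demo$body_mass_g[c(5, 150)] <- NA  # add some missing values

# Filter for one species, drop the NAs, and extract a plain vector with `$`.
adelie <- subset(penguins_demo, species == "Adelie" & !is.na(body_mass_g))
mass   <- adelie$body_mass_g

# Helper (hypothetical name): log-likelihood of a normal distribution.
log_lik <- function(x, mean, sd) {
  sum(dnorm(x, mean = mean, sd = sd, log = TRUE))
}

ll_guess <- log_lik(mass, mean = 4000, sd = 500)

# Closed-form maximum likelihood estimates for a normal distribution.
mu_hat    <- mean(mass)
sigma_hat <- sqrt(mean((mass - mu_hat)^2))  # divides by n, not n - 1
ll_mle    <- log_lik(mass, mean = mu_hat, sd = sigma_hat)

# Intercept-only linear model for comparison.
fit <- lm(body_mass_g ~ 1, data = adelie)

print(c(ll_guess = ll_guess, ll_mle = ll_mle))
print(c(mu_hat = mu_hat, lm_intercept = unname(coef(fit)[1])))
print(c(sigma_hat = sigma_hat, lm_sigma = sigma(fit)))
```

The intercept of the linear model equals the maximum likelihood estimate of the mean. The residual standard error reported by lm is slightly larger than the maximum likelihood estimate of the standard deviation, because it divides by n - 1 instead of n.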