Between 2007 and 2009, researchers collected data on penguins on three islands in the Palmer Archipelago in Antarctica: Biscoe, Dream, and Torgersen. The penguins dataset has data for 342 penguins from 3 different species: Chinstrap, Gentoo, and Adélie. It includes variables for each penguin’s species, island, sex, body mass, flipper length, bill length, and bill depth.

Knowing the difference between bill length and bill depth is tricky if you’re not a bird expert (I’m not!), so here’s a helpful diagram:

Penguin bill length vs. bill depth

Cleaning the data

We first need to clean the data a little. Some of the observations are missing the sex of the penguin!

Missing data will mess up our regression models, so we remove any rows with missing sex. That also takes care of the missing values in the other variables, since the rows missing those values were also missing sex.

We’ll save this clean data as a CSV file so we can use it in other analyses (or other files within this analysis).
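As a rough sketch of this cleaning step, here is one way to drop rows with a missing sex value and write the result to a CSV file using only Python's standard library. The toy rows, the helper names `drop_missing` and `save_csv`, and the filename `penguins_clean.csv` are all made-up placeholders for illustration. They are not the real dataset or the code behind this analysis.

```python
import csv
import os
import tempfile

# Hypothetical toy rows standing in for the real data (values are made up).
# An empty string marks a missing value.
rows = [
    {"species": "Adelie", "body_mass_g": "3750", "sex": "male"},
    {"species": "Adelie", "body_mass_g": "", "sex": ""},
    {"species": "Gentoo", "body_mass_g": "5000", "sex": "female"},
    {"species": "Gentoo", "body_mass_g": "4875", "sex": ""},
]


def drop_missing(records, column):
    """Keep only the records whose value in `column` is not empty."""
    return [r for r in records if r[column].strip() != ""]


def save_csv(records, path):
    """Write a list of dicts to `path` as CSV with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)


clean = drop_missing(rows, "sex")

# Hypothetical output location; a temp directory keeps the sketch tidy.
out_path = os.path.join(tempfile.mkdtemp(), "penguins_clean.csv")
save_csv(clean, out_path)

print(f"kept {len(clean)} of {len(rows)} rows")
```

In this toy example, the row that is missing body mass is also missing sex, so filtering on sex alone removes it, which mirrors what happens in the real data.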

Exploratory analysis

First we’ll look for any patterns in the data. Maybe specific species are heavier or have longer wings or longer or taller bills?

Penguin weight by species

Penguin weight by species

It looks like Gentoo penguins are substantially heavier on average than the other two species. Gentoo penguins weigh an average of 5,092 grams, while Adelie and Chinstrap penguins weigh an average of 3,706 and 3,733 grams, respectively.

Bill depth by species

Next we’ll look at bill depth (again, this refers to the distance between the top and bottom of the bill) across species:

Penguin bill depth by species

Again, Gentoo penguins are quite distinctive and have the shallowest bills. On average, Gentoo bills are 15 millimeters deep, while Adelie and Chinstrap bills are 18.3 and 18.4 millimeters deep, respectively.

Penguin location

Are there any patterns in where these birds live?

Penguin location by species

Neat! Gentoo penguins are only on Biscoe Island, Chinstrap penguins are only on Dream Island, and Adelie penguins live on all three of the islands in the dataset—and they’re all alone on Torgersen Island.

Regression analysis

We’ve seen that Gentoo penguins are pretty distinctive: they are heavier and have shallower bills. What’s the overall relationship between bill depth and body weight? Are Gentoos still distinctive?

According to this plot, it looks like there’s a negative relationship between bill depth and body mass—as bills get taller, penguins get lighter. We can create a regression model to see the exact relationship. We’ll use this model:

\[ \widehat{\text{Body mass}} = \beta_0 + \beta_1 \text{Bill depth} + \epsilon \]

| term | estimate | std.error | statistic | p.value |
| --- | --- | --- | --- | --- |
| (Intercept) | 7519.9808 | 342.32506 | 21.967368 | 0 |
| bill_depth_mm | -193.0061 | 19.81378 | -9.741003 | 0 |

Based on this model, a 1-mm increase in bill depth is associated with a 193-gram decrease in body weight, on average.
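To make the slope concrete, the fitted line can be evaluated by hand. The sketch below plugs the estimates from the regression table into the model. The function name `predict_mass` and the two example bill depths are illustrative choices, not part of the original analysis.

```python
# Estimates copied from the bill-depth-only regression table.
INTERCEPT = 7519.9808
SLOPE_BILL_DEPTH = -193.0061  # grams per millimeter of bill depth


def predict_mass(bill_depth_mm):
    """Predicted body mass in grams from the bill-depth-only model."""
    return INTERCEPT + SLOPE_BILL_DEPTH * bill_depth_mm


# Two illustrative bill depths, exactly 1 mm apart.
low = predict_mass(17.0)
high = predict_mass(18.0)

print(round(low, 1), round(high, 1))
# The gap between the two predictions equals the slope.
print(round(high - low, 1))
```

The prediction for the deeper bill comes out about 193 grams lower, which is exactly what the slope says.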

However, that’s misleading! The estimate for \(\beta_1\) is negative here, but we’re not accounting for species. In the original scatterplot the trend line does go down, but if we color the points by species, we can see that the relationship is actually positive within each species.

The dark red line shows the trend when not considering species, while the yellow, purple, and blue lines show the within-species trends. The directions reverse! This is a great example of something called Simpson’s Paradox, which according to Wikipedia means that

…a trend appears in several groups of data but disappears or reverses when the groups are combined.
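A tiny numerical sketch can show how this reversal happens. The numbers below are invented purely for illustration (they are not penguin measurements), and `group_a`, `group_b`, and the `slope` helper are hypothetical names. Each group has a positive slope on its own, but the group with smaller x values has much larger y values, so the pooled slope turns negative.

```python
def slope(xs, ys):
    """Ordinary least squares slope of ys on xs (single predictor)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Made-up toy data: (x values, y values) for two hypothetical groups.
groups = {
    "group_a": ([17.0, 18.0, 19.0], [3500.0, 3700.0, 3900.0]),
    "group_b": ([14.0, 15.0, 16.0], [4800.0, 5000.0, 5200.0]),
}

# Slope within each group, fitted separately.
within = {name: slope(xs, ys) for name, (xs, ys) in groups.items()}

# Slope when both groups are pooled and the grouping is ignored.
all_x = [x for xs, _ in groups.values() for x in xs]
all_y = [y for _, ys in groups.values() for y in ys]
pooled = slope(all_x, all_y)

print(within)
print(round(pooled, 1))
```

Both within-group slopes are positive while the pooled slope is negative, the same pattern the species-colored trend lines show.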

If we control for species in a new regression model, we can see the positive relationship between bill depth and body mass. Here’s the new model:

\[ \widehat{\text{Body mass}} = \beta_0 + \beta_1 \text{Bill depth} + \beta_2 \text{Species} + \epsilon \]

| term | estimate | std.error | statistic | p.value |
| --- | --- | --- | --- | --- |
| (Intercept) | -1000.845936 | 324.96796 | -3.0798295 | 0.0022457 |
| bill_depth_mm | 256.551128 | 17.63746 | 14.5458078 | 0.0000000 |
| speciesChinstrap | 8.111481 | 52.87375 | 0.1534122 | 0.8781672 |
| speciesGentoo | 2245.878347 | 73.95557 | 30.3679398 | 0.0000000 |

It worked! After controlling for species, on average, a 1-mm increase in bill depth is associated with a 256.6 gram increase in weight. Also, interestingly, the coefficients for Chinstrap and Gentoo penguins show how each species’ weight differs from Adelie penguins (the reference category) when bill depth is held constant. Compared to Adelie penguins with the same bill depth, Chinstrap penguins are only 8.1 grams heavier, a difference that is not statistically distinguishable from zero (p = 0.88), while Gentoo penguins are 2,246 grams heavier, on average.
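One way to read the species coefficients is to compute predictions for penguins that share a bill depth but differ in species. The sketch below copies the estimates from the second regression table. The `predict_mass` function and the 17 mm example depth are illustrative assumptions, not part of the original analysis.

```python
# Estimates copied from the regression table that controls for species.
COEFS = {
    "intercept": -1000.845936,
    "bill_depth_mm": 256.551128,
    "Chinstrap": 8.111481,
    "Gentoo": 2245.878347,
}


def predict_mass(bill_depth_mm, species):
    """Predicted body mass in grams; Adelie is the reference category."""
    offset = 0.0 if species == "Adelie" else COEFS[species]
    return COEFS["intercept"] + COEFS["bill_depth_mm"] * bill_depth_mm + offset


depth = 17.0  # illustrative bill depth in millimeters
for species in ("Adelie", "Chinstrap", "Gentoo"):
    print(species, round(predict_mass(depth, species), 1))
```

At any fixed bill depth, the gaps between the three predictions equal the species coefficients: about 8 grams for Chinstrap and about 2,246 grams for Gentoo, both relative to Adelie.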

Conclusion

Therefore, Gentoo penguins are neat. They

  1. only live on Biscoe Island
  2. are heavier than their Chinstrap and Adelie counterparts—they’re the chonky bois of the penguin world
  3. have short bills, height-wise

Also, there seems to be a fairly strong relationship between bill depth and body weight. Within all three species, penguins with taller bills tend to be heavier. This relationship can get hidden by Simpson’s Paradox if we don’t look at within-species trends though.

The end.

Penguins!
