Introduction

Many of us eat cereal every day, which motivates this report: we examine what makes cereals popular among consumers. To do so, we take a closer look at the nutrients in each cereal and at the relationship between consumer popularity and cereal manufacturer. Lastly, we explore how the popularity of cereals is reflected in their shelf level in grocery stores.

Main Research Questions

  1. Do nutrients differ by cereal manufacturer? And if so, how?
  2. Are unhealthy cereals more popular among consumers than traditionally “healthier” cereals?
  3. What is the relationship between shelf level, rating, and manufacturer? (i.e., which manufacturers are the most popular?)

First we will focus on determining whether there is a significant difference in nutrients among cereals manufactured by different companies. Then, we will analyze whether such differences, if any, play a role in the consumer rating of each cereal. Finally, after putting the first two questions together, we will examine which manufacturers are most popular to gain further insight into the relationship between shelf level, rating, and manufacturer of each cereal.

Dataset Description

The dataset we worked with (www.kaggle.com/crawford/80-cereals) contains information about various cereals and has 80 observations. Its variables include name (the name of the cereal) and mfr (the manufacturer of the cereal), coded as A = American Home Food Products, G = General Mills, K = Kellogg, N = Nabisco, P = Post, Q = Quaker Oats, and R = Ralston Purina. The dataset also specifies whether a cereal is eaten hot or cold. Furthermore, it records several nutrient levels: calories (per serving), protein, fat, sodium, fiber, carbo (carbohydrates), sugars, potass (potassium), and vitamins. Lastly, the dataset includes shelf, which represents the level of the display shelf (1, 2, or 3, counting from the floor); weight (ounces in one serving); cups (number of cups in one serving); and rating (a rating of the cereal).
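As a rough illustration of the structure described above, the following sketch parses rows with these column names and decodes the mfr codes. It assumes a comma-separated file with a header row; the two sample rows are made up for the example and are not taken from the dataset, and the hot/cold column is left out because its name is not given here.

```python
import csv
import io

# Manufacturer codes as listed in the dataset description.
MFR_NAMES = {
    "A": "American Home Food Products",
    "G": "General Mills",
    "K": "Kellogg",
    "N": "Nabisco",
    "P": "Post",
    "Q": "Quaker Oats",
    "R": "Ralston Purina",
}

NUMERIC_COLUMNS = [
    "calories", "protein", "fat", "sodium", "fiber", "carbo", "sugars",
    "potass", "vitamins", "shelf", "weight", "cups", "rating",
]


def load_cereals(handle):
    """Parse cereal rows, converting numeric columns and decoding mfr."""
    rows = []
    for raw in csv.DictReader(handle):
        row = dict(raw)
        for col in NUMERIC_COLUMNS:
            row[col] = float(row[col])
        row["mfr_name"] = MFR_NAMES[row["mfr"]]
        rows.append(row)
    return rows


# Two made-up rows (NOT real data) standing in for the downloaded CSV.
SAMPLE_CSV = """name,mfr,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
Example Flakes,K,100,2,0,200,1,21,3,35,25,1,1,1,45.0
Example Oats,Q,110,4,2,0,2,18,1,110,0,3,1,0.67,50.0
"""

cereals = load_cereals(io.StringIO(SAMPLE_CSV))
for cereal in cereals:
    print(cereal["name"], "|", cereal["mfr_name"], "| shelf", int(cereal["shelf"]))
```

With the real file, the same function could be called on an open file handle instead of the in-memory sample.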

Question 1: Do nutrients differ by cereal manufacturer? And if so, how?

We wanted to test whether nutrient levels are associated with particular manufacturers and, as an offshoot, whether we can classify manufacturers by their nutrients. To do this, we focus on the variables mfr (manufacturer), protein, fat, sodium, fiber, carbo, sugars, potass, and calories.

First, we created a pairs plot of all the variables described above to get an idea of how mfr could be associated with each nutrient variable.

To shed more light on how nutrients could be related to manufacturers, we performed principal component analysis (PCA) to help us visualize where clusters might be.

After running PCA, we created an elbow plot to choose the number of principal components needed to capture most of the variance in the data.

We concluded from this plot that 7 principal components would be the optimal number to retain. However, 7 dimensions would be difficult both to plot and to interpret, so we settled on plotting two principal components, which account for about 58% of the variance in the data.
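The numbers behind an elbow plot are the shares of variance explained by each principal component. The sketch below shows one way to compute them without any plotting: it standardizes the columns, takes the eigenvalues of the resulting correlation matrix with a Jacobi rotation routine, and prints each component's share and the cumulative share. The data here are randomly generated stand-ins for the 8 nutrient columns, not the cereal data, and the choice to standardize before PCA is an assumption rather than a description of our exact procedure.

```python
import math
import random


def correlation_matrix(rows):
    """Correlation matrix of the columns (covariance of standardized data)."""
    n, p = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
           for c, m in zip(cols, means)]
    z = [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues of a symmetric matrix via Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    p = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                # Angle that zeroes the (i, j) entry.
                theta = 0.5 * math.atan2(2 * a[i][j], a[i][i] - a[j][j])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # rotate columns i and j
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki + s * akj, -s * aki + c * akj
                for k in range(p):  # rotate rows i and j
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik + s * ajk, -s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)


def explained_variance(rows):
    """Return eigenvalues, per-component shares, and cumulative shares."""
    eig = jacobi_eigenvalues(correlation_matrix(rows))
    total = sum(eig)
    share = [e / total for e in eig]
    cumulative, running = [], 0.0
    for s in share:
        running += s
        cumulative.append(running)
    return eig, share, cumulative


# Synthetic stand-in data (NOT the cereal data): 80 rows, 8 columns,
# where the first four columns share a common factor.
random.seed(0)
rows = []
for _ in range(80):
    base = random.gauss(0, 1)
    rows.append([base + random.gauss(0, 1) for _ in range(4)]
                + [random.gauss(0, 1) for _ in range(4)])

eigenvalues, share, cumulative = explained_variance(rows)
for k, (s, c) in enumerate(zip(share, cumulative), start=1):
    print(f"PC{k}: {s:.1%} of variance, {c:.1%} cumulative")
```

Reading down the cumulative column shows the trade-off discussed above: each additional component adds less, and a two-component plot only represents the cumulative share reported for the second component.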

From our plot we do not see any clusters of manufacturers. There does not seem to be any pattern between nutrients and manufacturers.

To answer whether we could predict which manufacturer made a certain cereal from its nutrient levels, we created a dendrogram based on the quantitative nutrient variables (the same ones used in the PCA above). We colored 7 groups, one for each of the 7 manufacturers.

The dendrogram takes all of the nutrient data into account to see whether cereals group by manufacturer based on these nutrients. The blue cluster seems to contain most of the red cereals, and the purple cluster contains only orange cereals. Overall, however, the clusters do not seem to match the manufacturers.
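The comparison between clusters and manufacturers can also be made numerically by cutting the tree into a fixed number of groups and cross-tabulating group membership against mfr. The sketch below does this with a naive agglomerative clustering written from scratch. Complete linkage and Euclidean distance are assumptions made for the example, and the six points and their manufacturer labels are invented toy values, not cereals from the dataset.

```python
import math
from collections import Counter


def complete_linkage(points, k):
    """Naive agglomerative clustering (complete linkage), merged down to k clusters."""
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        pairs = ((dist(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        _, a, b = min(pairs)  # closest pair of clusters
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(points)
    for cluster_id, members in enumerate(clusters):
        for i in members:
            labels[i] = cluster_id
    return labels


# Toy values (NOT real cereals): three tight pairs of points, with the two
# hypothetical manufacturers K and G spread across all of them.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.0, 0.2)]
mfr = ["K", "G", "K", "G", "K", "G"]

labels = complete_linkage(points, k=3)
table = Counter(zip(labels, mfr))  # (cluster, manufacturer) -> count

for (cluster_id, code), count in sorted(table.items()):
    print(f"cluster {cluster_id}, mfr {code}: {count}")
```

If clusters matched manufacturers, each cluster would be dominated by a single mfr code. In this toy example every cluster mixes both codes, which is the kind of pattern we describe for the cereal dendrogram with 7 groups.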

Conclusion

In this report, we examined a dataset comprised of nutritional information on 80 different cereals. For our first research question, we explored whether nutrients differ by manufacturer, and found that, on average, the nutrients used in cereals do not seem to differ by manufacturer. For our second research question, contrary to our initial hypothesis, we concluded that healthy cereals are actually rated more favorably among consumers than unhealthy cereals. Lastly, for our final research question, we found that Kellogg and General Mills cereals are the most popular among consumers. Additionally, we discovered that cereals on the 3rd and 1st shelves are rated as the most popular. Overall, given our three research questions, we were able to examine the interplay between the nutrients, manufacturers, and popularity of different cereals. Based on our analyses and conclusions, we have a richer understanding of what consumers look for when purchasing cereal.

In terms of future work, examining how nutrients differ by cup rather than by serving may provide more insight into the relationship between nutrition and cereal. Additionally, categorizing each cereal as healthy or unhealthy based on its mix of nutrients may allow us to better understand the relationship between the healthiness of cereals and their corresponding manufacturers. We are also interested in finding geospatial data for cereal popularity throughout the United States with which we could explore relationships between states/regions and the healthiness of their favorite cereals.