Introduction
Many of us eat cereal on a daily basis. Given this, our motivation in this report is to examine the popularity of cereals among consumers. To understand what drives popularity, we will take a closer look at the nutrients in cereal, as well as the relationship between popularity among consumers and cereal manufacturer. Lastly, we will explore how the popularity of cereals is reflected in shelf levels in grocery stores.
Main Research Questions
- Do nutrients differ by cereal manufacturer? And if so, how?
- Are unhealthy cereals more popular among consumers than traditionally “healthier” cereals?
- What is the relationship between shelf level, rating and manufacturer? (i.e. which manufacturers are the most popular?)
First we will focus on determining whether there is a significant difference in nutrients among cereals manufactured by different companies. Then, we will analyze whether such differences, if any, play a role in the consumer rating of each cereal. Finally, after putting the first two questions together, we will examine which manufacturers are most popular to gain further insight into the relationship between shelf level, rating, and manufacturer of each cereal.
Dataset Description
The dataset (www.kaggle.com/crawford/80-cereals) we worked with contains information about various cereals and has 80 observations. Its variables include the name of the cereal and mfr (the manufacturer of the cereal), where A = American Home Food Products, G = General Mills, K = Kellogg, N = Nabisco, P = Post, Q = Quaker Oats, R = Ralston Purina. The dataset also specifies whether a cereal is eaten hot or cold. Furthermore, it records several nutrient levels: calories (per serving), protein, fat, sodium, fiber, carbo, sugars, potass, and vitamins. Lastly, the dataset includes shelf, which represents the level of the display shelf (1, 2, or 3, counting from the floor), weight (the weight in ounces of one serving), cups (the number of cups in one serving), and rating (a rating of the cereals).
For reference, these are all the variables used in the dataset:
- Name: Name of cereal
- Mfr: Manufacturer of cereal
  - A = American Home Food Products
  - G = General Mills
  - K = Kellogg
  - N = Nabisco
  - P = Post
  - Q = Quaker Oats
  - R = Ralston Purina
- Type: whether the cereal is eaten hot or cold
- Calories: Calories per serving
- Protein: grams of protein
- Fat: grams of fat
- Sodium: milligrams of sodium
- Fiber: grams of dietary fiber
- Carbo: grams of complex carbohydrates
- Sugars: grams of sugars
- Potass: milligrams of potassium
- Vitamins: vitamins and minerals (0, 25, or 100), indicating the typical percentage of the FDA recommendation
- Shelf: display shelf (1, 2, or 3, counting from the floor)
- Weight: weight in ounces of one serving
- Cups: number of cups in one serving
- Rating: a rating of the cereals
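As a rough illustration of how these variables might be read into Python, the sketch below parses rows with the standard csv module and decodes the mfr codes listed above. The lowercase column names, the type code "C", and the sample row ("Example Flakes") are assumptions made for illustration; they are not taken from the dataset.

```python
import csv
import io

# Manufacturer codes as listed in the dataset description.
MFR_NAMES = {
    "A": "American Home Food Products",
    "G": "General Mills",
    "K": "Kellogg",
    "N": "Nabisco",
    "P": "Post",
    "Q": "Quaker Oats",
    "R": "Ralston Purina",
}

# Assumed column names for the numeric variables.
NUMERIC_COLUMNS = [
    "calories", "protein", "fat", "sodium", "fiber", "carbo",
    "sugars", "potass", "vitamins", "shelf", "weight", "cups", "rating",
]

def load_cereals(handle):
    """Read cereal rows from an open CSV handle into a list of dicts."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {"name": raw["name"], "mfr": raw["mfr"], "type": raw["type"]}
        for col in NUMERIC_COLUMNS:
            row[col] = float(raw[col])
        # Fall back to the raw code if it is not one of the seven known codes.
        row["manufacturer"] = MFR_NAMES.get(raw["mfr"], raw["mfr"])
        rows.append(row)
    return rows

# Hypothetical one-row sample, not a real observation.
SAMPLE_CSV = (
    "name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,"
    "potass,vitamins,shelf,weight,cups,rating\n"
    "Example Flakes,K,C,100,3,1,200,2.5,15,6,90,25,3,1.0,0.75,45.0\n"
)

cereals = load_cereals(io.StringIO(SAMPLE_CSV))
print(cereals[0]["name"], "is made by", cereals[0]["manufacturer"])
```

With the real file, the same function could be called on an opened file handle instead of the in-memory sample.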
Question 1: Do nutrients differ by cereal manufacturer? And if so, how?
We wanted to test whether nutrients are associated with certain manufacturers and, as an offshoot, whether we can classify manufacturers by their nutrients. To do this, we focus on the variables mfr (manufacturer), protein, fat, sodium, fiber, carbo, sugars, potass, and calories.
First, we created a pairs plot of all the variables described above to get an idea of how mfr could be associated with each nutrient variable.

To cast more light on how nutrients could be related to manufacturers, we performed principal component analysis (PCA) to help us visualize where clusters could be.
After running PCA, we created an elbow plot to decide how many principal components are needed to capture most of the variance.

We concluded from this plot that 7 principal components would be optimal. However, 7 dimensions would be difficult to both plot and interpret, so we settled on plotting the first two principal components, which together account for about 58% of the variance in the data.
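The numbers behind an elbow plot can be sketched as follows. This is a minimal illustration rather than our exact procedure: it assumes NumPy is available, assumes the variables are standardized before PCA, and uses random placeholder data in place of the 8 nutrient columns, so its output will not reproduce the 58% figure.

```python
import numpy as np

def explained_variance_ratio(X):
    """Share of total variance captured by each principal component."""
    X = np.asarray(X, dtype=float)
    # Standardize (an assumption) so that variables on large scales,
    # such as sodium in mg, do not dominate variables on small scales.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Eigenvalues of the covariance matrix of Z are the component variances.
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))
    eigvals = np.clip(eigvals[::-1], 0.0, None)  # descending, clip round-off
    return eigvals / eigvals.sum()

def components_needed(ratios, threshold):
    """Smallest number of components whose cumulative share reaches threshold."""
    cumulative = np.cumsum(ratios)
    count = int(np.searchsorted(cumulative, threshold)) + 1
    return min(count, len(ratios))

# Placeholder data: 80 rows and 8 columns of random noise, NOT the cereal data.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 8))

ratios = explained_variance_ratio(X)
print("Cumulative variance explained:", np.round(np.cumsum(ratios), 3))
print("Components needed for 90%:", components_needed(ratios, 0.90))
```

Plotting `ratios` against the component index gives the elbow plot; the cumulative sum at index 2 corresponds to the share of variance shown in a two-component plot.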

From our plot we do not see any clusters of manufacturers. There does not seem to be any pattern between nutrients and manufacturers.
To answer whether we could predict which manufacturer made a certain cereal from its nutrient levels, we created a dendrogram based on the quantitative nutrient variables, the same ones used in the above PCA. We colored the tree into 7 groups, matching the 7 manufacturers.

The dendrogram takes into account all the nutrient data to see whether cereals group by manufacturer based on these nutrients. The blue cluster seems to contain most of the red cereals, and the purple cluster contains only orange cereals. Overall, however, the clusters do not seem to match the manufacturers.
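One way to put a number on how well clusters match manufacturers is a cross-tabulation together with a purity score. This is a supplementary sketch, not part of the analysis above: purity is a technique we introduce here for illustration, and the cluster and manufacturer labels below are hypothetical.

```python
from collections import Counter, defaultdict

def cross_tab(clusters, manufacturers):
    """Count how many cereals of each manufacturer fall in each cluster."""
    table = defaultdict(Counter)
    for cluster, mfr in zip(clusters, manufacturers):
        table[cluster][mfr] += 1
    return table

def purity(clusters, manufacturers):
    """Fraction of cereals that belong to their cluster's majority manufacturer.

    1.0 means every cluster contains a single manufacturer; values near
    the share of the largest manufacturer mean the clusters say little.
    """
    table = cross_tab(clusters, manufacturers)
    majority_total = sum(max(counts.values()) for counts in table.values())
    return majority_total / len(clusters)

# Hypothetical labels for 8 cereals (not results from the dendrogram).
clusters = [1, 1, 1, 2, 2, 2, 3, 3]
manufacturers = ["K", "K", "G", "G", "P", "K", "Q", "Q"]

for cluster, counts in sorted(cross_tab(clusters, manufacturers).items()):
    print(cluster, dict(counts))
print("purity:", purity(clusters, manufacturers))
```

In practice, the cluster labels would come from cutting the dendrogram into 7 groups.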
Question 2: Are unhealthy cereals more popular among consumers than traditionally “healthier” cereals?
For our second research question, we are interested in examining whether unhealthier cereals are more popular among consumers than traditionally “healthier” cereals. In order to answer this question, we plotted rating against calories, sodium, carbs, sugars, and fat per serving of cereal as research tells us that high values of these nutrients are associated with unhealthy lifestyles. Below, we see our scatterplot colored by nutrient.

Contrary to our hypothesis that unhealthy cereals are more popular, it appears that calories, sodium, sugars, and fat have a moderate to strong negative association with rating, while carbs have a weak positive association with rating. This suggests that, on average, cereals with lower amounts of calories, sodium, sugars, and fat, which correspond to healthy cereals, are actually more popular among consumers, while cereals with higher amounts of carbs are somewhat more popular.
Lastly, as a part of our analysis, we plotted rating against potassium, protein, and fiber, as research tells us that healthy diets are made up of high amounts of these nutrients. Unlike in our previous plot, all nutrients in our plot below have a moderate to strong positive relationship with rating. Thus, the evidence suggests that cereals with higher amounts of potassium, protein, and fiber are more popular among consumers.

Taken together, these two plots suggest that healthier cereals are, in fact, more popular among consumers than unhealthy cereals.
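The direction and strength of each association read off the scatterplots can also be summarized with a Pearson correlation coefficient. The sketch below is a supplementary illustration and not a computation from the report; the sugar and rating values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values: grams of sugar and rating for five cereals.
sugars = [0, 3, 6, 9, 12]
rating = [60, 50, 45, 35, 30]

r = pearson_r(sugars, rating)
print(f"r = {r:.2f}")  # negative r: more sugar goes with lower rating
```

A value near -1 or 1 indicates a strong linear association, while a value near 0 indicates a weak one.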
Question 3: Which manufacturers are the most popular?
In order to gain insight into which manufacturers are most popular, we examined the relationship between shelf level, rating and manufacturer with a faceted bar plot. We faceted by shelf level with manufacturer on the x-axis and rating on the y-axis.

From the above plot, we conclude that, on average across all shelf levels, cereals by manufacturers K and G are the most popular, while cereals by manufacturer A are the least popular. More specifically, cereals placed on the 3rd shelf generally received higher ratings, as seen for those manufactured by G, K, P and Q. Additionally, cereals manufactured by R, P, N and G had higher ratings on the 1st shelf than on the 2nd shelf. Thus, another main takeaway from this plot is that, across manufacturers, the shelf levels rank as follows from highest to lowest rated: 3, 1, and 2.
Because the plot is faceted by shelf level, we can compare the ratings of each manufacturer's cereals not only within each shelf level, but also across different shelf levels. Thus, this plot is informative in demonstrating the relationship between shelf level, rating and manufacturer, and in turn answers our research question.
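The comparison in the faceted bar plot amounts to summarizing rating within each combination of shelf level and manufacturer. A minimal sketch of that grouping is shown below; it assumes the bars represent mean ratings, and the rows are hypothetical rather than taken from the dataset.

```python
from collections import defaultdict

def mean_rating_by(rows, *keys):
    """Average the 'rating' field within each group defined by the given keys."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of ratings, count]
    for row in rows:
        group = tuple(row[key] for key in keys)
        totals[group][0] += row["rating"]
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}

# Hypothetical rows, for illustration only.
rows = [
    {"mfr": "K", "shelf": 3, "rating": 40.0},
    {"mfr": "K", "shelf": 3, "rating": 60.0},
    {"mfr": "K", "shelf": 2, "rating": 30.0},
    {"mfr": "G", "shelf": 1, "rating": 45.0},
    {"mfr": "G", "shelf": 3, "rating": 35.0},
]

by_shelf_and_mfr = mean_rating_by(rows, "shelf", "mfr")  # one value per bar
by_shelf = mean_rating_by(rows, "shelf")                 # one value per facet

for group, mean in sorted(by_shelf_and_mfr.items()):
    print(group, mean)
```

Grouping by shelf alone gives the across-manufacturer comparison of shelf levels, while grouping by both keys gives the per-bar values.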
Conclusion
In this report, we examined a dataset containing nutritional information on 80 different cereals. For our first research question, we explored whether nutrients differ by manufacturer, and found that, on average, the nutrients in cereals do not seem to differ by manufacturer. For our second research question, contrary to our initial hypothesis, we concluded that healthy cereals are actually rated more favorably among consumers than unhealthy cereals. Lastly, for our final research question, we found that Kellogg and General Mills cereals are the most popular among consumers. Additionally, we discovered that cereals on the 3rd and 1st shelves are rated the highest. Overall, given our three research questions, we were able to examine the interplay between the nutrients, manufacturers, and popularity of different cereals. Based on our analyses and conclusions, we have a richer understanding of what consumers look for when purchasing cereal.
In terms of future work, examining how nutrients differ by cup rather than by serving may provide more insight into the relationship between nutrition and cereal. Additionally, categorizing each cereal as healthy or unhealthy based on its mix of nutrients may allow us to better understand the relationship between the healthiness of cereals and their corresponding manufacturers. We are also interested in finding geospatial data on cereal popularity throughout the United States, with which we could explore relationships between states or regions and the healthiness of their favorite cereals.