Governments have the responsibility to make effective food, health, and wellbeing decisions at scale. In recent years, organizations such as the Good Food Purchasing Program (GFPP) have aimed to be transparent about sustainable food purchases. The GFPP was developed to promote thoughtful food purchasing decisions in city governments. The program stresses five factors: local economies, environmental sustainability, valued workforce, animal welfare, and nutrition. We would expect local and less processed foods, such as fresh produce, dairy, and grain, to be prioritized as healthier food options.
To see how well food purchasing decisions have been made in recent years, we analyzed a sample of the U.S. that is assisted by the GFPP, using the Good Food Purchasing Data from NYC Open Data. The dataset contains twelve variables and 17,208 observations. The variables we analyze from the dataset are:
Food Product Category - food product (low level categorization)
Food Product Group - food product (high level categorization)
Time Period - time period of product purchase
Total Weight in lbs - product weight in pounds
Product Name - product name (normalized from type of product)
Number of units - number of units of product
As we will see later in our discussion, we use the above variables to create new ones that help further analyze NYC’s food purchases.
We ask the following to determine the effectiveness of NYC’s food purchasing: What are the overall trends in product purchasing, to inform NYC about the success of their initiative? Which agencies are ordering the most units, and what are their recent food product group purchasing trends? Are food purchases from local farms increasing?
One of the primary goals of the Good Food Purchasing Program is to promote health by offering produce, grains, and foods that are less processed. If this goal has been implemented, we would expect purchases within these categories to increase, or at least shift, over the 3 years since the initiative started. To explore such trends, it is helpful to consider the different product types and how their purchasing has progressed over the period covered by the data. Demand for certain types of food may increase over the years as the priorities of food purchasing change. First, we explore a plot of the total number of units purchased each year within the categories of Produce and Bread/Grains/Legumes:
We see that, seemingly contrary to the goals of the initiative, the overall amount of produce and grains has decreased over time rather than increased. In particular, the purchasing of vegetables has decreased a fair amount since 2018. We also notice that grain products make up the primary purchases within the bread, grains, and legumes food family, far outweighing any other individual category across produce and grains. Within produce, the primary items purchased are fruits and vegetables, but these declined sharply in 2019-2020. We assess whether this difference in produce purchases across the years is significant using a mosaic plot of produce category purchases over time, together with a chi-squared test:
##
## Pearson's Chi-squared test
##
## data: produces.table
## X-squared = 268403, df = 4, p-value < 2.2e-16
Based on the shading of the mosaic plot, which is determined by the Pearson residuals, all the categories appear to be significantly different. The Pearson residuals all have absolute values greater than 2, so there is a significant difference between the product types and purchases made over the years. This conclusion is further supported by the chi-squared test, whose p-value is less than \(2.2 \cdot 10^{-16}\), indicating a significant difference among the purchases made per year.
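For readers who want to see what the test statistic and the Pearson residuals measure, the following is a minimal Python sketch (the output above comes from R, so this is an illustration and not the code behind the report). The function name `chi_squared` and the counts in `counts` are hypothetical; the 3-by-3 shape is chosen only because it gives the same 4 degrees of freedom as the reported test.

```python
def chi_squared(table):
    """Return (statistic, degrees of freedom, Pearson residuals) for a 2-D count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, observed in enumerate(row):
            # Expected count if the row and column factors were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            # Pearson residual: standardized gap between observed and expected.
            residual = (observed - expected) / expected ** 0.5
            residual_row.append(residual)
            statistic += residual ** 2
        residuals.append(residual_row)

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof, residuals


# Hypothetical counts (NOT the report's data): rows are years, columns are produce categories.
counts = [
    [120, 90, 40],
    [100, 95, 45],
    [60, 110, 70],
]
statistic, dof, residuals = chi_squared(counts)
print(f"X-squared = {statistic:.2f}, df = {dof}")
for row in residuals:
    print([round(r, 2) for r in row])
```

The statistic is the sum of the squared residuals, which is why cells with large absolute residuals are the ones that drive a significant result.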
However, it is possible that while the overall produce and grain purchases have decreased since 2018, the NYC government may be purchasing higher quality items, such as more fresh produce or healthier grains. Thus, we would like to explore a word cloud over these 3 years, comparing the common words found in product names.
Within this word cloud, we see that from 2018 to 2020, there has been an increase in the prevalence of certain grains in comparison to produce. In particular, pasta, cereal, and beans have grown relatively more common, while the relative frequency of produce words, such as green, carrots, apple, potatoes, and vegetable, has stayed roughly the same over time. We also see that across all years, the most common words were “frozen” and “can”, suggesting that NYC has not significantly implemented its initiative to purchase more fresh food.
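A word cloud is a visual encoding of word frequencies, so the comparison above can also be checked with plain counts. The sketch below is one possible way to tabulate words in product names per year; the function `word_counts` and the sample product names are hypothetical and are not taken from the dataset.

```python
import re
from collections import Counter


def word_counts(product_names):
    """Count lowercase alphabetic words across a list of product names."""
    counts = Counter()
    for name in product_names:
        counts.update(re.findall(r"[a-z]+", name.lower()))
    return counts


# Hypothetical product names grouped by year (illustration only).
names_by_year = {
    2018: ["Frozen Green Beans", "Canned Carrots", "Apple Slices"],
    2020: ["Frozen Pasta Meal", "Whole Grain Cereal", "Frozen Beans"],
}
for year, names in names_by_year.items():
    print(year, word_counts(names).most_common(3))
```

Comparing relative frequencies (a word's count divided by the year's total) would be needed before comparing years with different numbers of purchases.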
Based on our analysis of produce purchases over time, it may be beneficial to explore whether produce purchases are made through a specific vendor, to understand more about the sourcing of food.
This treemap relates the total purchases made to the area of each individual box, separated by the Vendor and by the Produce Category. All vendors who provide less than the 70th percentile in amount of produce have been grouped into the category All Other Vendors to make the chart easier to read. We see that Frank Gargiulo & Sons is a primary provider of many of the produce types. This is a produce company that provides food across the East Coast. Other large providers include Teri Nichols, US Foods, and FoodCo. We also see a large amount of overlap in general between the different produce types and the vendors. For the NYC government, it may be beneficial to diversify its vendors in order to prevent potential reliance on a single company. Should there be a problem with that vendor, it would be beneficial to have other connections that can still provide food.
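The grouping rule described for the treemap can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function `group_small_vendors`, the vendor labels, and the amounts are hypothetical, and the percentile uses linear interpolation (`method="inclusive"`), which may differ slightly from the definition used for the original chart.

```python
from statistics import quantiles


def group_small_vendors(totals, percentile=70, other_label="All Other Vendors"):
    """Fold vendors whose total falls below the given percentile into one bucket."""
    # quantiles(..., n=100) returns the 1st..99th percentile cut points.
    cut_points = quantiles(list(totals.values()), n=100, method="inclusive")
    cutoff = cut_points[percentile - 1]
    grouped = {}
    for vendor, amount in totals.items():
        key = vendor if amount >= cutoff else other_label
        grouped[key] = grouped.get(key, 0) + amount
    return grouped


# Hypothetical produce totals per vendor (illustration only).
totals = {f"Vendor {letter}": 10 * (i + 1) for i, letter in enumerate("ABCDEFGHIJK")}
for vendor, amount in group_small_vendors(totals).items():
    print(vendor, amount)
```

Folding the long tail into one box keeps the treemap legible, at the cost of hiding how many small vendors there are.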
Another goal of the Good Food Purchasing Program is to buy local food. As with the previous question, we would like to see whether there is a general increase in local foods purchased and how the program is supporting local farms. To answer this question, we define local as the following Northeastern states: Connecticut, Delaware, Maine, Maryland, Massachusetts, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, and Vermont. First, we look at the general trend by examining the amount of local products bought in each category.
Overall, we can see that for all categories, there has been a general decrease in the amount of local foods purchased over time. While COVID-19 could have forced the city to have more budget constraints, it is also important that these local farmers and producers are continuously supported economically. We can continue analyzing local foods to understand more about the products that are being purchased and not just the categories that foods are purchased in. Hopefully, we see that fresh food is bought relatively more and that more nutritious foods such as fruits and meats are being purchased. Below is a word cloud of all of the products that are bought locally and we will use this word cloud to see the relative amounts of foods and types of foods bought.
From this word cloud, we see that frozen is the word that appears the most, showing that frozen foods are the most purchased. It is concerning that most of the local foods are not fresh. We know that milk and dairy are the largest categories bought locally, so this must mean that most of the other categories, such as produce and meals, are frozen. The purple words are the second most frequent and include juice, dessert, cooked, and cheese. This is concerning, as juice and dessert are not fresh foods, and cooked might indicate that these foods are highly processed. The orange words are the third most frequent, and here we do see more nutritionally dense foods such as yogurt, chicken, and vegetable. This group also contains kosher, which shows that care is taken to make sure that groups with specific dietary requirements are able to access the food they need. Overall, it is concerning that most locally sourced foods are still frozen and that there seems to be a lack of high-nutrition foods among the most commonly bought products. Next, we focus more on the farms that this program aims to support. The graph below shows the percent change in the amount of local food bought from 2018 to 2021.
The above graph shows the percent change in the amount of food bought from each state. We see that from 2018 to 2021, most states saw a decrease in purchases. Maryland and Connecticut saw the sharpest declines, with Connecticut seeing a 99.6% decrease in the amount of food purchased. Though this may be concerning, New York saw an increase in spending, which shows that local food is being bought within the state, ensuring peak freshness. Overall, there has been a decrease in local foods bought; buying from states that have seen decreases in purchases can help support their economies and increase the amount of fresh food offered to New York City residents.
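The percent change plotted per state is presumably the usual relative difference between the first and last year. A minimal sketch, assuming that definition; the helper `percent_change`, the state labels, and the totals are hypothetical and chosen only to illustrate the arithmetic.

```python
def percent_change(start, end):
    """Relative change from start to end, expressed in percent."""
    return (end - start) / start * 100.0


# Hypothetical purchase totals as (2018, 2021) pairs (illustration only).
purchases = {
    "State A": (1000.0, 4.0),
    "State B": (500.0, 650.0),
}
for state, (start, end) in purchases.items():
    print(f"{state}: {percent_change(start, end):+.1f}%")
```

Because the change is relative to the starting value, a state with very small 2018 purchases can show a dramatic percentage swing from a modest absolute change.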
Given the Good Food Purchasing Program is a relatively recent initiative, we assume that their resources may be less robust. Thus, we wanted to investigate whether there was a way for them to more smartly allocate their attention and resources to specific agencies.
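To explore this question, we investigate two primary points: which agency or agencies place the most unit orders, and how the food product group purchases of those agencies have trended in recent years.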
The results of these investigations would hopefully allow the Good Food Purchasing Program to focus their resources and attention on one or a few, high-priority agencies, and use the trends about those agencies’ food product group purchases to better optimize their impact with them.
We start by finding which agency or agencies are responsible for the most unit orders. To do this, we tabulate the total units ordered by each agency. The table below shows that the Department of Education orders the most by a large margin, at around 90% of total units. The next highest contributor, Health and Hospitals, accounts for only about 5% of total unit orders. Thus, we proceed by focusing on the Department of Education’s food product group purchasing trends.
Units Ordered per Agency

Agency | Units |
---|---|
Administration for Childrens Services | 456,632 |
Department of Correction | 10,183,003 |
Department of Education | 297,052,844 |
Department of Homeless Services | 4,338,003 |
Health + Hospitals | 15,941,166 |
Human Resources Administration | 634,611 |
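The shares quoted above follow directly from the table. As a quick check, this short Python sketch recomputes each agency's percentage of total units from the tabulated values (the variable names are ours, not the report's).

```python
# Units ordered per agency, copied from the table above.
units = {
    "Administration for Childrens Services": 456_632,
    "Department of Correction": 10_183_003,
    "Department of Education": 297_052_844,
    "Department of Homeless Services": 4_338_003,
    "Health + Hospitals": 15_941_166,
    "Human Resources Administration": 634_611,
}

total = sum(units.values())
shares = {agency: 100 * n / total for agency, n in units.items()}

# Print agencies from largest to smallest share of total units.
for agency, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{agency}: {share:.1f}%")
```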
With the knowledge that the Department of Education is the biggest contributor to the number of units ordered, we focus our following analysis of food product group purchasing trends on that agency. To investigate the Department of Education’s trends, we begin with some exploratory data analysis of the distribution of our individual predictors, `Time.Period` and `Food.Product.Group`, as well as their relationship with the number of units ordered, `X..of.Units`.
From our univariate histograms of each variable, we find that each level under each of the variables has a reasonable number of observations, with the exception of Seafood under `Food.Product.Group`, which only has 24 observations. With this in mind, we remove Seafood from our future analysis, as we consider it to not have a large enough sample size (ideally \(n > 30\)) to run an accurate linear regression.
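The sample-size filter described above can be sketched in a few lines. This is a hypothetical Python illustration: `keep_large_groups` and the toy rows are ours, with only the threshold of 30 and the Seafood count of 24 taken from the text.

```python
from collections import Counter

MIN_OBSERVATIONS = 30  # threshold from the text (ideally n > 30)


def keep_large_groups(rows, min_n=MIN_OBSERVATIONS):
    """Keep only rows whose group has more than min_n observations.

    rows is a list of (group, units) tuples.
    """
    counts = Counter(group for group, _ in rows)
    return [(group, units) for group, units in rows if counts[group] > min_n]


# Toy data: 40 hypothetical Produce rows and 24 Seafood rows.
rows = [("Produce", 10)] * 40 + [("Seafood", 5)] * 24
filtered = keep_large_groups(rows)
print(sorted({group for group, _ in filtered}))
```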
When plotting our predictors against the raw number of units ordered with boxplots, we found many high outliers, so we applied a log transformation to `X..of.Units + 1`. The “+1” keeps the transformation defined for any orders with zero units and ensures that none of the log outputs are negative. With this transformation applied, we see the final box plots below. Ultimately, we conclude that the log number of units doesn’t appear to change greatly with each year. However, there does appear to be a difference in the number of units ordered by food product group.
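To make the effect of the transformation concrete, here is a small Python sketch of \(\log(x + 1)\) applied to a few hypothetical unit counts (the helper name `log_units` and the sample values are ours). It uses `math.log1p`, which computes \(\log(1 + x)\) directly.

```python
import math


def log_units(units):
    """Natural log of (units + 1); defined at zero and never negative for units >= 0."""
    return math.log1p(units)


# Hypothetical unit counts spanning several orders of magnitude.
for units in [0, 1, 10, 1_000, 1_000_000]:
    print(units, round(log_units(units), 3))
```

The transformation compresses the large values far more than the small ones, which is what pulls the high outliers in toward the rest of the distribution.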
Next, we observe how the Department of Education’s food product group purchases have changed across time together in a faceted bar plot. We find that with the log transformation, all of the individual food product groups appear to have slight, decreasing linear trends with each passing year. The observed linear trends among each individual food product group across each year reasonably satisfy our linear model assumptions, and so we can proceed with our model and analysis.
With our initial assumptions reasonably satisfied, we run a multiple linear regression model on our subset of Department of Education data to model how year and food product group predict the log number of units bought by the agency.
Note that the variables `Time.Period` and `Food.Product.Group` have been renamed to `Year` and `Group`, respectively, in the model output below for conciseness.
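Judging only from the terms that appear in the model output (an intercept, `Year`, and `Year:Group` interactions, with no separate `Group` main effects), the fitted model appears to have roughly the following form. This is our reading of the output and not a specification stated in the analysis; the symbols \(\beta_0\), \(\beta_1\), \(\gamma_g\), and \(\varepsilon_i\) are introduced here for illustration.

\[
\log(\text{Units}_i + 1) = \beta_0 + \beta_1\,\text{Year}_i + \sum_{g} \gamma_g\,\text{Year}_i \cdot \mathbf{1}[\text{Group}_i = g] + \varepsilon_i
\]

Here \(g\) ranges over the six food product groups that have an interaction term, and the reference group (presumably Beverages, which appears in the diagnostics but has no interaction term) is described by \(\beta_0\) and \(\beta_1\) alone. Under this reading, each \(\gamma_g\) is the difference between group \(g\)'s yearly slope and the reference group's slope.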
We ultimately find that all of our interaction terms are significant and positive, assuming an alpha of 0.05, which implies that with each year, we expect the log number of units (which increases monotonically with the raw number of units) to grow when associated with the featured food product groups. We break down each coefficient in greater detail as follows:
`Year:GroupBread, Grains & Legumes`’s estimate is \(1.458 \cdot 10^{-3}\), which implies that if the food product group is Bread, Grains, and Legumes, then with each year, the number of log units to be purchased per order is estimated to increase by \(1.458 \cdot 10^{-3}\) on average, when holding all other variables constant (95% CI \([1.238 \cdot 10^{-3}, 1.678 \cdot 10^{-3}]\), \(t(2631)=13.012\), \(p < 2 \cdot 10^{-16}\)).
`Year:GroupCondiments & Snacks`’s estimate is \(9.745 \cdot 10^{-4}\), which implies that if the food product group is Condiments & Snacks, then with each year, the number of log units to be purchased per order is estimated to increase by \(9.745 \cdot 10^{-4}\) on average, when holding all other variables constant (95% CI \([7.951 \cdot 10^{-4}, 1.154 \cdot 10^{-3}]\), \(t(2631)=10.653\), \(p < 2 \cdot 10^{-16}\)).
`Year:GroupMeals`’s estimate is \(1.473 \cdot 10^{-3}\), which implies that if the food product group is Meals, then with each year, the number of log units to be purchased per order is estimated to increase by \(1.473 \cdot 10^{-3}\) on average, when holding all other variables constant (95% CI \([1.277 \cdot 10^{-3}, 1.668 \cdot 10^{-3}]\), \(t(2631)=14.799\), \(p < 2 \cdot 10^{-16}\)).
`Year:GroupMeat`’s estimate is \(1.420 \cdot 10^{-3}\), which implies that if the food product group is Meat, then with each year, the number of log units to be purchased per order is estimated to increase by \(1.420 \cdot 10^{-3}\) on average, when holding all other variables constant (95% CI \([1.171 \cdot 10^{-3}, 1.669 \cdot 10^{-3}]\), \(t(2631)=11.179\), \(p < 2 \cdot 10^{-16}\)).
`Year:GroupMilk & Dairy`’s estimate is \(1.935 \cdot 10^{-3}\), which implies that if the food product group is Milk & Dairy, then with each year, the number of log units to be purchased per order is estimated to increase by \(1.935 \cdot 10^{-3}\) on average, when holding all other variables constant (95% CI \([1.712 \cdot 10^{-3}, 2.157 \cdot 10^{-3}]\), \(t(2631)=17.062\), \(p < 2 \cdot 10^{-16}\)).
`Year:GroupProduce`’s estimate is \(1.442 \cdot 10^{-3}\), which implies that if the food product group is Produce, then with each year, the number of log units to be purchased per order is estimated to increase by \(1.442 \cdot 10^{-3}\) on average, when holding all other variables constant (95% CI \([1.265 \cdot 10^{-3}, 1.619 \cdot 10^{-3}]\), \(t(2631)=15.986\), \(p < 2 \cdot 10^{-16}\)).
We also find that we do not have enough evidence to suggest that `Year` itself has a significant relationship with log number of units ordered.
Units by Year and Food Product Group for Dept. of Education

Term | Estimate | Std. Err. | t-stat | p-value |
---|---|---|---|---|
(Intercept) | 81.4939 | 126.9313 | 0.6420 | 0.5209 |
Year | −0.0381 | 0.0629 | −0.6057 | 0.5448 |
Year:GroupBread, Grains & Legumes | 0.0015 | 0.0001 | 13.0118 | 0.0000 |
Year:GroupCondiments & Snacks | 0.0010 | 0.0001 | 10.6532 | 0.0000 |
Year:GroupMeals | 0.0015 | 0.0001 | 14.7990 | 0.0000 |
Year:GroupMeat | 0.0014 | 0.0001 | 11.1786 | 0.0000 |
Year:GroupMilk & Dairy | 0.0019 | 0.0001 | 17.0619 | 0.0000 |
Year:GroupProduce | 0.0014 | 0.0001 | 15.9862 | 0.0000 |
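Because the table rounds the standard errors to 0.0001, it can help to see how the confidence intervals quoted earlier relate to the estimates and t-statistics. The sketch below recovers the standard error as estimate divided by t-statistic and builds an approximate 95% interval. It assumes a normal approximation to the t distribution, which is reasonable with 2631 degrees of freedom; the function name `approx_ci` is ours.

```python
from statistics import NormalDist


def approx_ci(estimate, t_stat, level=0.95):
    """Approximate confidence interval from an estimate and its t-statistic.

    Assumes the t distribution is close to normal (large degrees of freedom).
    """
    std_err = estimate / t_stat  # since t = estimate / standard error
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_err, estimate + z * std_err


# Values reported for Year:GroupBread, Grains & Legumes.
low, high = approx_ci(1.458e-3, 13.012)
print(f"[{low:.3e}, {high:.3e}]")
```

The result should land close to the interval reported for that term in the coefficient breakdown; small differences would come from rounding and from using the normal quantile in place of the exact t quantile.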
With our model created, we must run model diagnostics to ensure that our assumption of constant variance is fulfilled. We do so with the following box plot of our model’s residuals, split by year and food product group. From the graph below, while imperfect in some groups such as Beverages, it appears that the distribution of residuals is sufficiently similar across each of the subsets of data, and so we can conclude that our assumption of constant variance is reasonably satisfied.
We don’t need to plot our residuals versus the predictors or fitted values because there is no clear linear trend to assess when our predictors are both categorical/discrete.
We also analyze a QQ plot of our model’s residuals to assess the normality of our errors. We find in our QQ plot that the residuals align closely with the ideal theoretical vs. observed line, only deviating slightly towards the extremities of the graph. Thus, we can conclude that our model reasonably satisfies the normality assumption for errors.
Ultimately, we conclude that the Department of Education contributes to the most unit orders, and thus the Good Food Purchasing Program should direct most of its resources to ensuring it abides by the Good Food policies. Within the Department of Education, all of the food groups we investigated produced a slight, positive trend, and thus there isn’t a specific food group we can recommend that the Good Food Purchasing Program should focus on. Instead, we might recommend that the Good Food Purchasing Program simply focus on the categories that make up the majority of the Department of Education’s unit orders, such as Milk & Dairy.
Before we conclude this section, we note some caveats with the conclusions drawn. First, since the Good Food Purchasing Program is relatively new, we only had three discrete years of data to work with in our regression. This makes our model and results susceptible to noise and outliers, as well as difficult to extrapolate. In the future, we would hope to have either more years’ worth of data to work with or a more granular breakdown of the date and month in which each product was purchased. Second, most of this data was collected during COVID, which introduces a significant external factor that likely impacted the number of units purchased in the years 2020-21. This factor is hard to account for, and thus was not taken into consideration in our linear model, which may make the results slightly inaccurate or hard to extrapolate. Having more years’ worth of data in the future would also help address this issue.
Through this research, our goal was to determine the efficiency of the Good Food Purchasing Program. With our first research question, we found that fresh produce purchasing actually decreased and that frozen foods were the most purchased in recent years. With our second research question, we found that purchases from local farms decreased, while also confirming that frozen foods are the most frequently bought. With our third research question, we found that the Department of Education places the most unit orders by a large margin, largely in milk and dairy purchases.
We urge the GFPP to commit more to efficient food purchases, as the above analysis contrasts with its stated goals.