Inspiration

In 2017, we may have hoped for self-driving cars or some other new technological advancement, but instead we got a weird craze for avocados. From social media articles to market analyses, everyone was hopping on the trend and talking about the “Avocado-pocalypse of 2017.” So the question has to be asked: was the avocadopocalypse real? Did prices or total volume sold in 2017 differ significantly from other years?

The Data

Our data is taken from Kaggle here, and the most important variables are the following:

PLU stands for product lookup code. The 3 different PLUs are simply 3 different kinds of Hass avocados. Other varieties (such as greenskins) are not included. The data comes from US observations, so prices are in dollars and regions are strictly American. (‘Size’ of avocado relates to the total weight in ounces or an estimate of the shape.)

Measurements of avocado sizes ^

Since we want to delve into whether or not 2017 could be considered an “Avocadopocalypse,” we had three main research questions:

  1. Do the price, volume, and type of avocado change over time?

  2. Does the number of avocados sold differ across regions and cities?

  3. Is there a preference for certain avocado sizes? Did 2017 change those preferences?

So let’s address the first general question. Was 2017 a statistically significant year for avocado prices? How about for Volume?

The figure below shows the average price and log(Total Volume) of avocado sales over the 4 years. The key thing to note is that we only used the rows whose region is a city. Statistical tests run on the city data gave similar results to those run on the broader regions, which should be no surprise, as the region data are just aggregated city data.

Just from a visualization standpoint, yes for Average Price. We split into two groups because we know traditionally organic goods are more expensive, so we want to control for that. Just to be sure, we ran a t-test to compare the Average Price of conventional vs organic avocados, and found a statistically significant result. No surprises there.
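For readers who want to try this comparison themselves, here is a minimal sketch of a two-sample t-test in plain Python. It assumes the Welch (unequal-variance) form, which may not be the exact variant we ran, and the function name `welch_t` and the sample prices are made up for illustration. It returns the t statistic and the approximate degrees of freedom; turning those into a p-value needs a t-distribution table or a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Variance of each sample mean (sample variance divided by sample size).
    var_mean_a = variance(sample_a) / n_a
    var_mean_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(var_mean_a + var_mean_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (var_mean_a + var_mean_b) ** 2 / (
        var_mean_a ** 2 / (n_a - 1) + var_mean_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


# Hypothetical prices for illustration only; NOT values from the dataset.
conventional = [1.0, 1.1, 1.2, 1.3, 1.4]
organic = [1.5, 1.6, 1.7, 1.8, 1.9]
t_stat, dof = welch_t(conventional, organic)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.1f}")
```

A large absolute t value relative to the t distribution with that many degrees of freedom indicates that the two group means differ.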

What about within conventional and organic themselves? Do the years differ significantly? We ran a Bartlett test to determine whether or not the variances are equal and found a significant p-value, so we ran a Dunnett test under the assumption that the variances are not equal. That test was also significant: the average price of avocados is not equal across the 4 years, for both conventional and organic avocados.
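To make the equal-variance check concrete, here is a rough sketch of the Bartlett test statistic in plain Python. The function name `bartlett_statistic` and the yearly prices are hypothetical, and this is not the code behind our results. Under the null hypothesis of equal variances, the statistic approximately follows a chi-square distribution with (number of groups - 1) degrees of freedom, so with 4 years the 5% critical value is about 7.815.

```python
from math import log
from statistics import variance


def bartlett_statistic(groups):
    """Return Bartlett's test statistic for equality of variances."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [variance(g) for g in groups]
    n_total = sum(sizes)
    # Pooled variance, weighting each group by its degrees of freedom.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * log(pooled) - sum(
        (n - 1) * log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (
        sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)
    ) / (3 * (k - 1))
    return numerator / correction


# Hypothetical weekly prices per year, for illustration only.
prices_by_year = {
    2015: [1.05, 1.10, 1.00, 1.08, 1.12],
    2016: [1.10, 1.25, 0.95, 1.30, 1.15],
    2017: [1.20, 1.80, 1.40, 1.95, 1.10],
    2018: [1.15, 1.20, 1.10, 1.18, 1.22],
}
stat = bartlett_statistic(list(prices_by_year.values()))
# Compare against 7.815, the 5% chi-square critical value for 3 degrees of freedom.
print(f"Bartlett statistic = {stat:.3f}")
```

The follow-up pairwise comparison (the Dunnett test) is not sketched here; in practice it is easier to take it from a statistics package than to write it by hand.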

Main conclusion: For both conventional and organic avocados, 2017 differs significantly from all other years when comparing average price.

How about Total Volume? We found that only 2017 and 2018 differed significantly. We’ll explore why this might be the case later on.

The dates in our dataset are listed in weekly intervals from 2015-01-04 to 2018-03-25. The dataset actually has 18249 observations because each date is repeated for region/city-specific information. We decided to create a variation of our dataset where we grouped by unique date (there are 169 unique dates) and calculated the mean average price per date, the median total volume per date, the standard deviation of average price per date, and the standard deviation of total volume per date. We looked at the mean average price since we found the standard deviations of average price per date to be consistently low (around 0.5), and we looked at the median total volume per date mainly because we found the standard deviations of total volume per date to be high and varied.
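The per-date aggregation described above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not our original code: the function name `summarize_by_date`, the `(date, average_price, total_volume)` row layout, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def summarize_by_date(rows):
    """Aggregate (date, average_price, total_volume) rows into per-date statistics."""
    prices = defaultdict(list)
    volumes = defaultdict(list)
    for date, price, volume in rows:
        prices[date].append(price)
        volumes[date].append(volume)
    summary = {}
    for date in sorted(prices):
        summary[date] = {
            "mean_price": mean(prices[date]),
            "sd_price": stdev(prices[date]),  # needs at least 2 rows per date
            "median_volume": median(volumes[date]),
            "sd_volume": stdev(volumes[date]),
        }
    return summary


# Hypothetical rows for illustration only; NOT values from the dataset.
rows = [
    ("2015-01-04", 1.0, 100.0),
    ("2015-01-04", 1.5, 300.0),
    ("2015-01-04", 2.0, 200.0),
    ("2015-01-11", 1.2, 150.0),
    ("2015-01-11", 1.4, 250.0),
]
for date, stats in summarize_by_date(rows).items():
    print(date, stats)
```

The median is used for volume because it is less sensitive than the mean to a few very large markets, which matches the high and varied standard deviations we saw.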

We produced two ACF graphs with lag 169 to measure trends from 2015-01-04 to 2018-03-25. It appears that the mean average price of avocados per date is significantly different around 50-100 weeks from our start date; this would roughly correspond to the start of 2016 to the end of 2017. The ACF graph measuring the median total volume of avocados per date shows significant peaks around 75 weeks and 125 weeks from the start date. The two ACF graphs do not appear similar, even though we had hypothesized that Price and Volume might show similar correlation patterns over time.
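For anyone curious how an ACF value is computed, here is a minimal sketch in plain Python. The autocorrelation at lag k measures how strongly the series correlates with a copy of itself shifted by k weeks. The function names `acf` and `significance_bound` and the toy series are illustrative assumptions, not the code or data behind our graphs. The bound 1.96 / sqrt(n) is the usual large-sample approximation for a series with no autocorrelation.

```python
from math import sqrt
from statistics import mean


def acf(series, max_lag):
    """Return sample autocorrelations for lags 0..max_lag (max_lag < len(series))."""
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be between 0 and len(series) - 1")
    center = mean(series)
    deviations = [x - center for x in series]
    denom = sum(d * d for d in deviations)
    return [
        sum(deviations[t] * deviations[t + lag] for t in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]


def significance_bound(n):
    """Approximate 95% bound for the autocorrelation of white noise."""
    return 1.96 / sqrt(n)


# Toy series for illustration only; NOT values from the dataset.
toy_prices = [1.0, 1.2, 1.5, 1.3, 1.1, 1.0, 1.2, 1.6, 1.4, 1.1]
bound = significance_bound(len(toy_prices))
for lag, value in enumerate(acf(toy_prices, 5)):
    flag = "significant" if abs(value) > bound else ""
    print(lag, round(value, 3), flag)
```

Note that a series with 169 points supports lags only up to 168, which is why the sketch rejects larger values.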

The graph shows that around the start of each year there is a spike in the total number of avocados sold; we could not find an explanation for this, since avocado season appears to start in March at the earliest for top-producing states like California. Overall, it appears that the median of total avocados sold per date has been increasing from 2015 to 2018, although we do not see the notable spike in 2017 that we expected. In fact, the distributions of avocados sold in 2016 and 2017 look almost the same.

We produced a moving average graph with width 10 to study how avocado price changes in quarterly intervals. It appears that there is a seasonal trend in the pricing of avocados; that is, prices are highest around summertime in the U.S. The overall prices of avocados increased from 2015 to 2016, and there is a huge spike in price from 2016 to 2017. Interestingly, the previous graph shows that the number of avocados sold does not change from 2016 to 2017 as drastically as the price does.
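A moving average of width 10 can be sketched as follows in plain Python. We assume a simple trailing window here (each point is the mean of 10 consecutive weekly values); the smoother behind the graph may have been centered instead. The function name `moving_average` and the toy prices are hypothetical.

```python
def moving_average(values, width=10):
    """Return the simple moving average of `values` over windows of `width` points."""
    if width <= 0 or width > len(values):
        raise ValueError("width must be between 1 and len(values)")
    window_sum = sum(values[:width])
    averages = [window_sum / width]
    for i in range(width, len(values)):
        # Slide the window: add the newest value and drop the oldest one.
        window_sum += values[i] - values[i - width]
        averages.append(window_sum / width)
    return averages


# Toy weekly prices for illustration only; NOT values from the dataset.
weekly_prices = [1.0, 1.1, 1.3, 1.2, 1.4, 1.6, 1.5, 1.3, 1.2, 1.1, 1.0, 1.2]
print(moving_average(weekly_prices, width=10))
```

A wider window smooths out more week-to-week noise but also flattens short spikes, so the width is a trade-off between readability and detail.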

This map shows the difference in avocado prices between 2017 and all other years recorded, by region. Each point on the map represents one or two cities included in the dataset. Each coordinate was found using Google Maps. The locations in the dataset that included two cities (e.g. BaltimoreWashington) were placed on the map using a coordinate between the cities. Regions in the dataset that were not cities are not shown on this map. The price difference was found by subtracting the average avocado price across all recorded years other than 2017 from the average price in 2017. The map shows that all cities (except for Pittsburgh, which showed a $0.002 decrease) showed an increase in average price in 2017 compared to other years.
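The two calculations behind the map can be sketched in plain Python. The function names `price_difference_2017` and `midpoint`, the row layout, and the sample prices are hypothetical. Using the simple midpoint for two-city locations is our assumption about what "a coordinate between the cities" could look like, and the coordinates below are approximate.

```python
from collections import defaultdict
from statistics import mean


def price_difference_2017(rows):
    """Return {city: mean 2017 price minus mean price over all other years}."""
    in_2017 = defaultdict(list)
    other_years = defaultdict(list)
    for city, year, price in rows:
        target = in_2017 if year == 2017 else other_years
        target[city].append(price)
    # Only cities with data both in 2017 and in other years can be compared.
    return {
        city: mean(in_2017[city]) - mean(other_years[city])
        for city in in_2017
        if city in other_years
    }


def midpoint(coord_a, coord_b):
    """Return the simple midpoint of two (latitude, longitude) pairs."""
    return ((coord_a[0] + coord_b[0]) / 2, (coord_a[1] + coord_b[1]) / 2)


# Hypothetical prices for illustration only; NOT values from the dataset.
rows = [
    ("CityA", 2016, 1.0),
    ("CityA", 2017, 1.5),
    ("CityA", 2018, 1.2),
    ("CityB", 2015, 1.1),
    ("CityB", 2017, 1.0),
]
print(price_difference_2017(rows))

# Approximate coordinates for Baltimore and Washington, D.C.
print(midpoint((39.29, -76.61), (38.91, -77.04)))
```

A positive difference means the city was more expensive in 2017 than in the other years combined; a negative one (as with Pittsburgh on our map) means it was cheaper.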

This last graph shows the mean volume of each PLU sold over time. For every PLU, the trends in 2017 don’t look any different from those in the other years.

Conclusion

  1. The price of avocados has been increasing since 2015; we saw an increase in prices from 2015 to 2016 and then a pretty huge increase in prices from 2016 to 2017. The volume or total number of avocados sold has been increasing over time, although the trends over 2016 and 2017 appear the same. The BBC in this article gives us some insight as to why the price significantly changed, but the total volume sold did not. It may have been due to a decreased global supply, which led to increased prices. For whatever reason, demand also increased but due to global avocado shortages, the amount sold was unchanged. Perhaps due to these combined factors, society deemed 2017 as an ‘avocado-pocalypse’.

  2. The prices of avocados tended to be higher for cities on the coast than those inland. We found that there was an increase in average avocado prices in 2017 compared to the average prices in 2015, 2016 and 2018. This trend was seen in all cities observed except for Pittsburgh, suggesting that the “avocado-pocalypse” caused a spike in avocado prices in 2017. Again, there did not seem to be a difference in the volume of avocados sold by region or city.

  3. The most frequently purchased avocados are PLU4046 and PLU4225. While the sizing scale of avocados can be vague and vary vendor-to-vendor, these two PLUs correspond to the smaller avocado sizes in this dataset, which likely implies they are more attractive to customers. In 2017, it appears the historical skew towards PLU4225 avocados was greatly reduced, with the weekly PLU4046 volume tracking the PLU4225 volume more closely than in years prior. It cannot be said whether this shift came from a difference in consumer demand or possible production shortfalls.