Introduction

Happiness is an undeniably important part of human lives. On an individual level, happy people are often more successful, more creative, and even live longer. But how does that play out on a national and global level? To investigate the predictors of country-wide happiness, we used the World Happiness Report, a publication of the Sustainable Development Solutions Network.

The annual report contains data on a variety of variables related to quality of life in countries around the world:

* Happiness
* Health (life expectancy)
* Economy (log GDP per capita)
* Freedom
* Family (social support)
* Generosity
* Trust (lack of perceptions of corruption in government and businesses)

More information about each of these variables can be found here.

The research questions we will explore are:

1. How has average happiness around the world/within regions changed over time?
2. Which factors are most influential in overall happiness in countries?
3. How effective are the factors that contribute most to global happiness as predictors of future generations' growth?

Question 1: How has average happiness around the world/within regions changed over time?

Our first research question examined how average happiness around the world and within regions changed over the eight years from 2015 to 2022, and what trends emerged over this period. We begin our exploration of the dataset with a boxplot of the happiness index by region, grouped by year.

The boxplot lets us clearly distinguish the distribution of happiness scores in each region across the eight years we observed. By following the shifts in each region's median score, we can also gauge how people's happiness changed from year to year. Overall, no region's median index shifted by more than 1 unit. North America, Australia/New Zealand, and Western Europe tended to have the highest happiness indices overall, while Sub-Saharan Africa and Southern Asia had the lowest. North America and Australia/New Zealand most clearly exhibited a downward trend over the eight years. Unlike most other regions, Southern Asia's median happiness score increased in 2020, which is surprising given that the pandemic began that year. Finally, the ranges for all ten regions remained generally constant.
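The claim that no region's median moved by more than 1 unit is read off the boxplot by eye; it can also be checked numerically. A minimal sketch, assuming the `happiness` data frame has `Region`, `Year`, and `Happiness` columns (the column names used in the code elsewhere in this report); the helper name `median_spread` is hypothetical.

```r
# Hypothetical helper: for each region, compute the median Happiness in
# every year, then report how far those yearly medians spread (max - min).
median_spread <- function(df) {
  # One row per Region x Year combination with the median Happiness score
  yearly <- aggregate(Happiness ~ Region + Year, data = df, FUN = median)
  # Spread of the yearly medians within each region
  spread <- aggregate(Happiness ~ Region, data = yearly,
                      FUN = function(x) max(x) - min(x))
  names(spread)[2] <- "MedianSpread"
  # Regions whose median moved the most come first
  spread[order(spread$MedianSpread, decreasing = TRUE), ]
}

# Usage (assumes `happiness` is already loaded):
# median_spread(happiness)
```

If every `MedianSpread` value is below 1, then no region's median moved by more than 1 unit between 2015 and 2022.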

We continued our regional exploration of global happiness with choropleth maps comparing 2015 and 2022.

library(ggplot2)  # ggplot(), map_data(), coord_map()
library(dplyr)    # left_join()
# map_data("world") needs the maps package; coord_map() needs mapproj

# Keep only the first and last survey years
happData15 <- subset(happiness, Year == 2015)
happData22 <- subset(happiness, Year == 2022)

# World country outlines, matched to happiness rows by country name
map.world <- map_data("world")
happWorld15 <- left_join(happData15, map.world, by=c('Country' = 'region'))
happWorld22 <- left_join(happData22, map.world, by=c('Country' = 'region'))

# Diverging palette centered at a score of 5 (blue = less happy, pink = happier)
ggplot(happWorld15) + geom_polygon(aes(x=long, y=lat, group=group, fill=Happiness), color="black") +
  scale_fill_gradient2(low="steelblue1", mid="lavender", high ="plum1", midpoint = 5) +
  theme_void() +  # removes axes, so the x/y labels below are not displayed
  coord_map("mercator") + labs(x="Longitude", y="Latitude", title = "2015 Global Happiness Colored Chart")

ggplot(happWorld22) + geom_polygon(aes(x=long, y=lat, group=group, fill=Happiness), color="black") +
  scale_fill_gradient2(low="steelblue1", mid="lavender", high ="plum1", midpoint = 5) +
  theme_void() +
  coord_map("mercator") + labs(x="Longitude", y="Latitude", title = "2022 Global Happiness Colored Chart")

The choropleth maps further visualize the spread of happiness and how it changed between 2015 and 2022. In general, the 2022 map shows less pink (happier) saturation in every region except Africa. Africa shows more saturated blues, denoting lower happiness scores, in 2015; overall, the region reported happier scores in 2022. In contrast, India and Afghanistan appear less happy in 2022 than in 2015.
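Because the maps are built by joining on exact country names, any country spelled differently in the two sources silently drops out of the plot. For instance, `map_data("world")` from the maps package labels the United States as "USA" and the United Kingdom as "UK". A quick base-R check like the sketch below can reveal such gaps before color differences are interpreted; `unmatched_countries` is a hypothetical helper name.

```r
# Hypothetical helper: list country names in the happiness data that have no
# exact match in the map's `region` column. Such countries receive NA
# coordinates in the left_join() used for the maps and are not drawn.
unmatched_countries <- function(countries, map_regions) {
  sort(setdiff(unique(countries), unique(map_regions)))
}

# Usage (assumes happData15 and map.world from the map code):
# unmatched_countries(happData15$Country, map.world$region)
```

Any names it returns can be recoded in the happiness data to the spelling used by the map before joining.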

Question 2: Which factors are most influential in overall happiness in countries?

Our second research question asked which factors were most influential in determining a country's overall happiness; specifically, whether social factors, such as the government system or trust, had more influence on the happiness index than geographical factors. Our data capture people's feelings toward the institutions that make long-term decisions directly affecting them, as well as how they perceive their own wellbeing, on a worldwide scale.

To get a general understanding of the quantitative variables and how they relate to each other, we created a correlation matrix. Happiness and Rank have the largest correlation coefficient in magnitude, which is expected because they are related by definition: the smaller the rank, the larger the happiness score (i.e., the country ranked 1 has the highest happiness score), so their correlation is strongly negative. Among the explanatory variables Year, Family, Health, Economy, Generosity, Freedom, and Trust, Economy has the highest positive correlation with Happiness, followed by Health and Family. Generosity and Year have the smallest correlation coefficients with Happiness; the weak correlation with Year suggests that global happiness shows no strong linear trend between 2015 and 2022. Health and Economy are also relatively highly correlated with each other, which could indicate collinearity.
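The ranking of explanatory variables by their correlation with Happiness can be produced directly from the correlation matrix. A minimal sketch, assuming the numeric columns are named as in the text; Rank is left out because it is derived from Happiness. The helper name `rank_correlations` is hypothetical.

```r
# Hypothetical helper: correlate each explanatory variable with the response
# and order the results by absolute correlation (strongest first).
rank_correlations <- function(df, response = "Happiness", predictors) {
  vars <- c(response, predictors)
  # Pairwise-complete correlations, so a missing value in one column does not
  # discard that row for every pair of variables
  r <- cor(df[, vars], use = "pairwise.complete.obs")[predictors, response]
  r[order(abs(r), decreasing = TRUE)]
}

# Usage (assumes `happiness` is loaded):
# rank_correlations(happiness, predictors = c("Year", "Family", "Health",
#   "Economy", "Generosity", "Freedom", "Trust"))
# The Health/Economy collinearity can be checked with
# cor(happiness$Health, happiness$Economy, use = "complete.obs")
```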

We then used best subsets regression to select the two variables that best predict Happiness. We chose two predictors because we expected such a model to explain more of the variation in Happiness than a single-predictor model while remaining simple and limiting multicollinearity. Best subsets identified Health and Freedom as the best pair of predictors of Happiness.
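For a fixed subset size, best subsets regression compares every model of that size; choosing two predictors out of seven means only `choose(7, 2) = 21` fits. The report does not show which tool was used (`leaps::regsubsets()` with `nvmax = 2` is a common packaged option), so the sketch below is a base-R illustration of the same exhaustive search. The helper name `best_pair` is hypothetical.

```r
# Best subsets of size two by exhaustive search: fit every pair of predictors
# and keep the pair with the highest R^2 (equivalently, the lowest RSS).
best_pair <- function(df, response, predictors) {
  # Use the same complete rows for every candidate model so fits are comparable
  df <- df[complete.cases(df[, c(response, predictors)]), ]
  pairs <- combn(predictors, 2, simplify = FALSE)
  r2 <- vapply(pairs, function(p) {
    summary(lm(reformulate(p, response = response), data = df))$r.squared
  }, numeric(1))
  best <- which.max(r2)
  list(predictors = pairs[[best]], r.squared = r2[best])
}

# Usage (assumes `happiness` is loaded):
# best_pair(happiness, "Happiness", c("Year", "Family", "Health", "Economy",
#   "Generosity", "Freedom", "Trust"))
```

Because all candidates have the same number of predictors and use the same rows, ranking by R², adjusted R², Mallows' Cp, or BIC selects the same pair.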

To further explore this relationship, we created a scatterplot of Happiness against Health, colored by Region and sized by Freedom.

In this plot, each point represents a country in a specific year. There is a clear positive association between Happiness and Health, as expected from the correlation matrix. Health also appears related to Region, since points from the same region cluster together on the scatterplot. Within these regional clusters, many of the points with lower Happiness indices are also smaller, indicating lower Freedom values. This is consistent with our analysis, since Freedom is positively correlated with Happiness, and it makes intuitive sense: people who do not feel free to make their own life choices are less able to pursue what makes them happy.
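The observation that less free countries tend to be less happy within a region can be quantified by computing the correlation separately for each region, which separates the within-region pattern from the between-region differences visible in the plot. A minimal sketch, assuming `Region`, `Freedom`, and `Happiness` columns; `within_region_cor` is a hypothetical helper. Each country contributes one row per year, so these rows are not independent observations.

```r
# Hypothetical helper: correlation between Freedom and Happiness computed
# separately within each region (regions with fewer than 3 rows give NA).
within_region_cor <- function(df) {
  sapply(split(df, df$Region), function(d) {
    if (nrow(d) < 3) return(NA_real_)
    cor(d$Freedom, d$Happiness, use = "complete.obs")
  })
}

# Usage (assumes `happiness` is loaded):
# sort(within_region_cor(happiness))
```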

Question 3: How effective are the most influential factors in global happiness as predictors of future generations' growth in education?

Our previous graphs explored how region, Health, and Freedom relate to a country's average happiness index. We next investigated whether these values can help predict the growth of future generations. We chose education, specifically UNICEF-sourced global upper secondary school attendance, as a marker of growth. Since Health and Freedom were the strongest pair of predictors of a country's average happiness, how well do they reflect the next generation's success and potential? To answer this question, we colored countries by region and drew contour plots of attendance against Health and against Freedom.

library(ggplot2)   # plotting
library(dplyr)     # left_join()
library(gridExtra) # grid.arrange()
library(grid)      # textGrob()
# UNICEF education indicators, joined to the 2022 happiness data by country name
educ <- read.csv(file = 'global_education_attendance_completion_rates.csv')
worldEd22 <- left_join(happData22, educ, by = 'Country')
# Drop every row that has a missing value in any column
worldEd22 <- worldEd22[complete.cases(worldEd22), ]

# Attendance vs. Health: 2D density contours, with countries colored by region
g1 <- ggplot(data=worldEd22, aes(x=Health, y=USCR.ALL/100)) + 
  geom_density2d() +
  geom_point(aes(col = Region)) + labs(x="Health Value", y="Upper Secondary Net Attendance Rates")

# Attendance vs. Freedom, drawn the same way
g2 <- ggplot(data=worldEd22, aes(x=Freedom, y=USCR.ALL/100)) + 
  geom_density2d() +
  geom_point(aes(col = Region)) + labs(x="Freedom Value", y="Upper Secondary Net Attendance Rates")

# Stack the two panels under a shared title
grid.arrange(g1, g2, ncol = 1, top=textGrob("2022 Upper Secondary School Attendance Between Health and Freedom Values"))

The contour plots reveal two modes of roughly similar size, both when comparing Health with attendance rates and when comparing Freedom with attendance rates. Each cluster also tended to be dominated by one or two regions of the world. For example, low upper secondary attendance rates and low Health values occurred frequently in Sub-Saharan Africa, while Central and Eastern Europe and Latin America and Caribbean showed pairs of higher attendance and higher Health values. The same relationships appear in the Freedom and attendance plot. The scatterplots also suggest that higher ‘Freedom’ and ‘Health’ values are positively associated with upper secondary attendance rates. Since school attendance reflects home circumstances and the desire, purpose, or ability to pursue education, it makes sense that it is associated with the freedom in one’s life and with health.
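The positive associations read off the scatterplots can be summarized with one number per predictor. Because the contour plots suggest two clusters rather than a single straight-line trend, a rank-based Spearman correlation is a reasonable summary. This is a suggested check rather than part of the original analysis, and `spearman_table` is a hypothetical helper.

```r
# Hypothetical helper: rank-based (Spearman) association between each
# predictor and an outcome, with the p-value reported by cor.test().
spearman_table <- function(df, predictors, outcome) {
  rows <- lapply(predictors, function(p) {
    # exact = FALSE uses the asymptotic p-value, which also handles ties
    ct <- cor.test(df[[p]], df[[outcome]], method = "spearman", exact = FALSE)
    data.frame(predictor = p, rho = unname(ct$estimate),
               p.value = ct$p.value, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Usage (assumes worldEd22 from the code above):
# spearman_table(worldEd22, c("Health", "Freedom"), "USCR.ALL")
```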

Conclusion

In conclusion, we were able to gain insight into our research questions through visualizations of the dataset.

From the box plot, we saw that between 2015 and 2022, average happiness increased, decreased, or remained relatively constant depending on the region. The box plot and choropleth maps also showed that happiness differed between regions, with Western Europe, North America, and Australia and New Zealand being the happiest regions and Sub-Saharan Africa and Southern Asia the least happy, on average.

From our correlation matrix, we found that among the explanatory variables, log GDP per capita, life expectancy, and social support were most highly correlated with happiness, while year and generosity were least correlated. Our best subsets regression identified Health (life expectancy) and Freedom as the best two explanatory variables, and the scatterplot showed happiness scores rising with both life expectancy and freedom. Finally, the contour plots showed that education, our measure of a country’s future success, was positively associated with health and freedom values.

In future work, we would like to explore why some regions’ happiness scores increased over the past eight years while others decreased. In particular, Southern Asia, Southeastern Asia, and Central and Eastern Europe appeared to have higher happiness scores in 2020 and 2021, during the COVID-19 pandemic lockdowns, while North America and Australia and New Zealand had decreasing scores for most of the eight years. We would like to identify the factors behind this difference between regions.