Happiness is an undeniably important part of human lives. On an individual level, happy people are often more successful, more creative, and even live longer. But how does this play out on a national and global level? To investigate the predictors of country-wide happiness, we used the World Happiness Report, a publication of the Sustainable Development Solutions Network.
The annual report contains data about a variety of variables related to quality of life in countries around the world, as follows:

* Happiness
* Health (life expectancy)
* Economy (log GDP per capita)
* Freedom
* Family (social support)
* Generosity
* Trust (lack of perceptions of corruption in government and businesses)
More information about each of these variables can be found here
The research questions we will explore are:

1. How has average happiness around the world/within regions changed over time?
2. Which factors are most influential in overall happiness in countries?
3. With the established highest contributing factors to global happiness, how effective are they as predictor variables of future generation growth?
Our first research question asks how average happiness around the world and within regions changed over the eight years from 2015 to 2022, and what trends are apparent over that period. We start our exploration of this dataset with a boxplot of the happiness index by region, grouped by year.
The boxplot lets us distinguish the distribution of happiness scores per region across the eight years we observed. In addition, by looking at shifts in the median score for each region, we can gauge how people’s happiness generally changed from year to year. Overall, no region’s median index shifted by more than 1 unit. North America, Australia/New Zealand, and Western Europe tended to have the highest happiness indices overall, while Sub-Saharan Africa and Southern Asia had the lowest. In particular, North America and Australia/New Zealand most clearly exhibited a downward trend over the eight years. The median happiness score for Southern Asia increased in 2020, unlike in most other regions, which is surprising considering that was the year the pandemic began. Furthermore, the ranges for all ten regions generally remained constant.
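The code for the boxplot is not shown in this report, so the following is only a minimal sketch of how such a plot and the per-region median shifts could be produced. The data frame `toy` is simulated, and the column names `Region`, `Year`, and `Happiness` are assumed to mirror the ones used elsewhere in this report.

```r
library(ggplot2)

# Simulated stand-in for the happiness data (NOT the real report data)
set.seed(1)
toy <- expand.grid(Region = c("Western Europe", "Southern Asia"),
                   Year = 2015:2017,
                   Country = 1:5)
toy$Happiness <- ifelse(toy$Region == "Western Europe", 7, 4.5) +
  rnorm(nrow(toy), sd = 0.3)

# One box per region and year: region on the x axis, fill by year
p <- ggplot(toy, aes(x = Region, y = Happiness, fill = factor(Year))) +
  geom_boxplot() +
  labs(x = "Region", y = "Happiness Index", fill = "Year")

# Median happiness for every region/year combination
medians <- aggregate(Happiness ~ Region + Year, data = toy, FUN = median)

# Largest difference between yearly medians within each region
median_range <- tapply(medians$Happiness, medians$Region,
                       function(m) max(m) - min(m))
```

Printing `p` draws the plot, and `median_range` gives a number per region that can be compared against the "no more than 1 unit" observation above.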
We continued our regional exploration of global happiness with choropleth maps comparing 2015 and 2022.
# Keep only the two years being compared
happData15 <- subset(happiness, Year == 2015)
happData22 <- subset(happiness, Year == 2022)
# Country outlines as a data frame of polygon vertices (requires the maps package)
map.world <- map_data("world")
# Attach polygon coordinates to each country's record;
# rows match only where Country equals the map's region name
happWorld15 <- left_join(happData15, map.world, by = c('Country' = 'region'))
happWorld22 <- left_join(happData22, map.world, by = c('Country' = 'region'))
# 2015 map: diverging fill centered on a happiness score of 5
ggplot(happWorld15) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = Happiness), color = "black") +
  scale_fill_gradient2(low = "steelblue1", mid = "lavender", high = "plum1", midpoint = 5) +
  theme_void() +
  coord_map("mercator") +
  labs(x = "Longitude", y = "Latitude", title = "2015 Global Happiness Colored Chart")
# 2022 map with the same color scale, so the two maps are directly comparable
ggplot(happWorld22) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = Happiness), color = "black") +
  scale_fill_gradient2(low = "steelblue1", mid = "lavender", high = "plum1", midpoint = 5) +
  theme_void() +
  coord_map("mercator") +
  labs(x = "Longitude", y = "Latitude", title = "2022 Global Happiness Colored Chart")
The choropleth maps let us further visualize the spread of happiness and its change between 2015 and 2022. In general, there is less pink (happy) saturation in 2022 in all regions except Africa. Africa shows more saturated blues, denoting lower happiness scores, in 2015; in general, the region reported higher scores in 2022. In contrast, India and Afghanistan show lower happiness in 2022 than in 2015.
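Color differences between two maps can be hard to judge by eye, so a per-country difference table is one way to check such observations. The sketch below is illustrative only: the countries "A", "B", and "C" and their scores are made up, and only the column names `Country`, `Year`, and `Happiness` are taken from the code above.

```r
# Made-up scores for three hypothetical countries (NOT real report data)
toy <- data.frame(
  Country   = c("A", "B", "C", "A", "B", "C"),
  Year      = c(2015, 2015, 2015, 2022, 2022, 2022),
  Happiness = c(4.0, 6.5, 5.0, 4.6, 6.1, 5.0)
)

# One row per country for each year
h15 <- subset(toy, Year == 2015, select = c(Country, Happiness))
h22 <- subset(toy, Year == 2022, select = c(Country, Happiness))

# Put the two years side by side and compute the change in score
change <- merge(h15, h22, by = "Country", suffixes = c("_2015", "_2022"))
change$Delta <- change$Happiness_2022 - change$Happiness_2015

# Largest decreases first, largest increases last
change <- change[order(change$Delta), ]
```

Countries at the top of `change` became less happy between the two years, and countries at the bottom became happier.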
Our second research question asks which factors are most influential in determining overall happiness in countries, specifically whether social factors, such as government system or trust, have more influence on the happiness index than geographical factors. From our data, we can observe people’s feelings towards the institutions in charge of making long-term decisions that directly impact them, as well as how they personally perceive their wellbeing to be, on a worldwide scale.
To get a general understanding of the quantitative variables and how
they relate to each other, we created a correlation matrix as follows:
We observe that Happiness and Rank appear to have the largest correlation coefficient in terms of magnitude, which makes sense as they are, by definition, directly related: the smaller the rank, the larger the happiness score (i.e., the country with rank 1 has the highest happiness score). Among the explanatory variables Year, Family, Health, Economy, Generosity, Freedom, and Trust, we see that Economy has the highest positive correlation with Happiness, followed by Health and Family. Generosity and Year appear to have the smallest correlation coefficients with Happiness; most notably, this could suggest that Happiness on a global scale has not differed significantly between 2015 and 2022. Health and Economy also seem to be relatively highly correlated with each other, which could indicate collinearity.
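The code behind the correlation matrix is not shown, so here is a minimal sketch of how it could be computed with base R. The data are simulated so that the example runs on its own; only the variable names are borrowed from this report, and the coefficients used to generate the data are arbitrary.

```r
# Simulated data (NOT the real report data); coefficients are arbitrary
set.seed(42)
n <- 200
Economy    <- rnorm(n)
Health     <- 0.8 * Economy + rnorm(n, sd = 0.5)   # built to correlate with Economy
Generosity <- rnorm(n)                             # built to be unrelated
Happiness  <- 0.6 * Economy + 0.3 * Health + rnorm(n, sd = 0.5)
toy <- data.frame(Happiness, Economy, Health, Generosity)

# Pairwise Pearson correlations between all numeric columns
corr <- cor(toy, use = "pairwise.complete.obs")

# Correlations with Happiness, ordered by magnitude
happ_corr <- sort(abs(corr["Happiness", colnames(corr) != "Happiness"]),
                  decreasing = TRUE)

round(corr, 2)
```

A large off-diagonal entry between two explanatory variables, such as `corr["Health", "Economy"]` here, is the kind of value that hints at collinearity.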
We then used best subsets regression to select the two variables that would best predict Happiness. We chose to use two predictor variables as we believed that it would produce a model that could explain more of the variation in Happiness than a model with a single predictor variable, while keeping it simple and minimizing multicollinearity. Through best subsets, we found that Health and Freedom were the best two predictors of Happiness.
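The report does not say which implementation or selection criterion was used for best subsets; `regsubsets()` from the leaps package is one common choice. As an illustration of the idea only, the sketch below does an exhaustive search over all two-variable models in base R and ranks them by adjusted R-squared, which is an assumed criterion. The data are simulated so that Health and Freedom drive Happiness by construction.

```r
# Simulated data (NOT the real report data): Happiness depends on Health and Freedom
set.seed(7)
n <- 150
toy <- data.frame(Health = rnorm(n), Freedom = rnorm(n),
                  Generosity = rnorm(n), Trust = rnorm(n))
toy$Happiness <- 1.0 * toy$Health + 0.8 * toy$Freedom + rnorm(n, sd = 0.5)

# Every possible pair of candidate predictors
predictors <- setdiff(names(toy), "Happiness")
pairs <- combn(predictors, 2, simplify = FALSE)

# Fit one linear model per pair and record its adjusted R-squared
adj_r2 <- sapply(pairs, function(vars) {
  fit <- lm(reformulate(vars, response = "Happiness"), data = toy)
  summary(fit)$adj.r.squared
})
names(adj_r2) <- sapply(pairs, paste, collapse = " + ")

# The pair whose model explains the most variation
best_pair <- pairs[[which.max(adj_r2)]]
sort(adj_r2, decreasing = TRUE)
```

With only a handful of candidate variables, fitting every pair is cheap; the search grows combinatorially with more variables or larger subset sizes.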
To further explore this relationship, we created a scatterplot of Happiness and Health, colored by Region and sized by Freedom:
In this plot, each point represents a country during a specific year. We see a clear positive association between Happiness and Health, which is to be expected from the correlation plot. We also see a relationship between Health and Region, as points from the same region are grouped together in the same area of the scatterplot. Within these groups of regions, many of the points with lower Happiness indices are also smaller in size, indicating lower Freedom values. This makes sense both within our analysis, as Freedom is positively correlated with Happiness, and logically: people who do not feel that they have the freedom to make their own life choices are less likely to be able to pursue what makes them happy.
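The plotting code for this scatterplot is not included in the report. A plot with this encoding could be sketched with ggplot2 roughly as follows; the four rows of data are invented for illustration, and the axis labels are assumptions.

```r
library(ggplot2)

# Invented country-year rows (NOT real report data)
toy <- data.frame(
  Health    = c(0.3, 0.4, 0.8, 0.9),
  Happiness = c(4.0, 4.5, 6.8, 7.2),
  Freedom   = c(0.2, 0.4, 0.5, 0.6),
  Region    = c("Sub-Saharan Africa", "Sub-Saharan Africa",
                "Western Europe", "Western Europe")
)

# Position shows Health vs. Happiness, color shows Region, point size shows Freedom
p <- ggplot(toy, aes(x = Health, y = Happiness, color = Region, size = Freedom)) +
  geom_point(alpha = 0.6) +
  labs(x = "Health (life expectancy)", y = "Happiness Index")

# Numeric check of the association that the plot shows visually
r_health <- cor(toy$Health, toy$Happiness)
```

Semi-transparent points (`alpha = 0.6`) help when many country-year points overlap, which is likely with eight years of data per country.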
Our previous graphs explored the influence of region, as well as the Health and Freedom factors, on a country’s average happiness index. We hoped to further investigate how well these values predict future generations’ growth. We chose education, specifically UNICEF-sourced global secondary school attendance, as a marker of growth. Since Health and Freedom are the established most influential factors in a country’s average happiness, how well do these variables represent the next generation’s success and potential? To answer this question, we colored countries by region and drew contour plots of attendance against Health and Freedom values.
library(gridExtra)
library(grid)
# Load the education data
educ <- read.csv(file = 'global_education_attendance_completion_rates.csv')
educ <- as.data.frame(educ)
# Join the 2022 happiness data with the education data by country name,
# then keep only rows with no missing values
worldEd22 <- left_join(happData22, educ, by = "Country")
worldEd22 <- worldEd22[complete.cases(worldEd22),]
# Density contours plus region-colored points; USCR.ALL is divided by 100
# (presumably converting a percentage to a proportion)
g1 <- ggplot(data = worldEd22, aes(x = Health, y = USCR.ALL/100)) +
  geom_density2d() +
  geom_point(aes(col = Region)) +
  labs(x = "Health Value", y = "Upper Secondary Net Attendance Rates")
g2 <- ggplot(data = worldEd22, aes(x = Freedom, y = USCR.ALL/100)) +
  geom_density2d() +
  geom_point(aes(col = Region)) +
  labs(x = "Freedom Value", y = "Upper Secondary Net Attendance Rates")
# Stack the two panels under a shared title
grid.arrange(g1, g2, ncol = 1, top = textGrob("2022 Upper Secondary School Attendance Between Health and Freedom Values"))
The contour plots show an interesting pattern: both the health-attendance plot and the freedom-attendance plot have two modes in approximately similar positions. We also noted that, in general, each cluster was dominated by one or two specific regions of the world. For example, low net secondary school attendance rates paired with low health values occurred frequently in Sub-Saharan Africa. Central and Eastern Europe and Latin America and the Caribbean showed higher attendance paired with higher health values. The same relationships can be observed in the freedom and attendance rate plot as well. From the scatterplots, we can also generally affirm that how much ‘Freedom’ or ‘Health’ an individual believes they have is positively correlated with attendance rates for upper secondary schools. Since school attendance reflects home situations and the desire, purpose, or ability to pursue education, it makes sense that the value is associated with the freedom in one’s life and the health of the family.
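The positive association described above is read off the plots. One way to put a number on it would be a correlation test, sketched below with simulated data. Pearson correlation is an assumed choice here, and the `Attendance` column stands in for the rescaled `USCR.ALL` values; none of the numbers come from the report.

```r
# Simulated data (NOT the real report data); Attendance stands in for USCR.ALL/100
set.seed(3)
n <- 60
toy <- data.frame(Health = runif(n, 0.2, 0.9),
                  Freedom = runif(n, 0.2, 0.7))
raw <- 0.1 + 0.5 * toy$Health + 0.5 * toy$Freedom + rnorm(n, sd = 0.08)
toy$Attendance <- pmin(pmax(raw, 0), 1)   # keep rates inside [0, 1]

# Pearson correlation tests of attendance against each factor
ct_health  <- cor.test(toy$Attendance, toy$Health)
ct_freedom <- cor.test(toy$Attendance, toy$Freedom)

# Estimated correlation coefficients
c(health = unname(ct_health$estimate), freedom = unname(ct_freedom$estimate))
```

Each test object also carries a p-value and a confidence interval (`ct_health$p.value`, `ct_health$conf.int`), which give a sense of how much weight a visual impression can bear.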
In conclusion, we were able to gain insight into our research questions through visualizations of the dataset.
From the box plot, we saw that between 2015 and 2022, average happiness increased, decreased, or remained relatively constant depending on the region. The box plot and choropleth maps also showed that happiness scores differed between regions, with Western Europe, North America, and Australia and New Zealand being the happiest regions and Sub-Saharan Africa and Southern Asia the unhappiest, on average.
From our correlation matrix, we found that out of the explanatory variables, log GDP per capita, life expectancy, and social support were the most highly correlated variables with happiness indices, while year and generosity were the least correlated. From our best subsets regression, the best two explanatory variables were Health (life expectancy) and Freedom. We saw this effect in a scatterplot, where happiness scores were positively associated with life expectancy and freedom values. From our contour plots, we saw how education, our measure of a country’s future success, correlated with health and freedom values.
Some future work that we would be interested in exploring is why certain regions’ happiness scores increased throughout the past eight years, while others decreased. In particular, regions like Southern Asia, Southeastern Asia, and Central and Eastern Europe appeared to have higher happiness scores throughout 2020 and 2021, during the COVID-19 pandemic lockdowns. Meanwhile, North America and Australia and New Zealand had decreasing scores for the majority of the eight years. We would like to identify the factors that contribute to this difference between regions.