For our project, we use data collected by the Gallup World Poll for the 2021 World Happiness Report. In the poll, Gallup asked respondents to evaluate their current life as a whole using the image of a ladder, with 10 being the highest rung and 0 being the lowest. The Ladder.score in the dataset is the mean reported evaluation from the respondents of each country and is the primary variable we examine throughout this report. Other variables, such as Log(GDP) or Healthy Life Expectancy, are quantitative measures collected from the World Bank and from reports published by the individual countries. Still other variables, like Generosity or Perception of corruption, are reported as values that estimate how much of the Ladder.score is explained by those qualitative measures.
With that in mind, our project revolves around three questions: Which regions of the world have the highest density of high- and low-scoring countries? How do the happiness ladder scores released this year differ from those of previous years, and have scores for certain countries and/or regions increased or decreased over time? And on a macro scale, what are some key predictors of happiness, and how are geographical regions associated with these variables?
In this visualization, we get a global view of how ladder scores vary across countries and regions. Unsurprisingly, countries in Scandinavia, Western Europe, and North America reported higher mean ladder scores, while countries in Africa and South Asia reported lower scores. The countries and regions colored gray are either regions without a single recognized government (e.g., Antarctica) or countries embroiled in civil and/or military conflicts that made it impossible for Gallup to conduct its poll.
The question here is which regions tend to have the most countries with lower happiness scores (i.e., which regions have the unhappiest countries). This plot lets us see at a glance which regions tend to be less happy than others. We can see that Sub-Saharan Africa and South Asia tend to have countries that are typically very unhappy.
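As a rough illustration of the comparison behind this plot, the sketch below counts, for each region, how many countries fall below a score cutoff. The rows are made-up placeholder values rather than figures from the report, and both the `low_score_share` helper and the cutoff of 4.5 are hypothetical choices made only for this example.

```python
from collections import defaultdict

# Made-up illustrative rows (region, ladder score); NOT values from the report.
rows = [
    ("Region A", 7.2),
    ("Region A", 6.9),
    ("Region B", 4.1),
    ("Region B", 3.8),
    ("Region B", 5.6),
    ("Region C", 4.4),
    ("Region C", 6.1),
]

def low_score_share(rows, threshold):
    """Return {region: (n_low, n_total, share_low)} for scores below threshold."""
    counts = defaultdict(lambda: [0, 0])  # region -> [n_low, n_total]
    for region, score in rows:
        counts[region][1] += 1
        if score < threshold:
            counts[region][0] += 1
    return {r: (low, total, low / total) for r, (low, total) in counts.items()}

# The cutoff of 4.5 is a hypothetical choice, not one taken from the report.
shares = low_score_share(rows, threshold=4.5)

# List regions from the largest share of low-scoring countries to the smallest.
for region, (low, total, share) in sorted(shares.items(), key=lambda kv: -kv[1][2]):
    print(f"{region}: {low}/{total} countries below cutoff ({share:.0%})")
```

Reporting the share alongside the raw count matters because regions contain different numbers of countries, so a large region can have many low-scoring countries without being unusually unhappy overall.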
After exploring the difference in reported happiness scores between regions, we were particularly interested in the lowest scoring regions and how their scores varied over time.
Given that Sub-Saharan Africa and South Asia had the lowest mean reported ladder scores, we wanted to see whether the 2021 scores were part of a larger trend or differed significantly because of some global event (like the COVID-19 pandemic). For the six lowest-scoring countries in both regions, ladder scores show either a consistent decline or little change over the years. The World Happiness Report makes a similar observation: its authors note a “surprising resilience to the COVID-19 pandemic” particularly in the South Asia region (which may indicate that larger issues have a more significant effect on overall happiness).
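One simple way to put a number on "consistent decline" versus "little change" is the least-squares slope of a country's ladder score against the year. The sketch below is only an illustration: the two countries, their scores, the year range, and the 0.05 threshold used for labeling are all made up for this example and are not taken from our data.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Made-up scores for two hypothetical countries; NOT values from the report.
years = [2017, 2018, 2019, 2020, 2021]
history = {
    "Country X": [4.0, 3.9, 3.8, 3.7, 3.6],  # steady decline
    "Country Y": [4.2, 4.3, 4.2, 4.3, 4.2],  # no systematic change
}

trends = {name: ols_slope(years, scores) for name, scores in history.items()}

# The +/-0.05 points-per-year threshold is an arbitrary illustrative choice.
for name, slope in trends.items():
    if slope < -0.05:
        label = "declining"
    elif slope > 0.05:
        label = "rising"
    else:
        label = "roughly flat"
    print(f"{name}: {slope:+.3f} ladder points per year ({label})")
```

A slope on its own does not say whether 2021 is an outlier; comparing the 2021 residual from the fitted line against earlier residuals would be one possible follow-up.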
To address our third question, we first used best subsets regression to determine which combinations of explanatory variables yield the highest R-squared values. This let us find the best predictors of the 2021 happiness ladder scores among logged GDP per capita, social support, healthy life expectancy, freedom to make life choices, generosity, and perceptions of corruption.
After taking multicollinearity into account, we found that the best single predictor of the 2021 ladder score is logged GDP per capita, and the best pair of predictors is logged GDP per capita and freedom to make life choices. Since the two-predictor regression model has a higher adjusted R-squared value than the single-predictor model, these two variables combined are the strongest predictors of happiness. (The model also fulfills the assumptions of linear regression.)
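The best subsets procedure described above can be sketched as an exhaustive search: fit every combination of predictors of a given size and keep the one with the highest adjusted R-squared. The code below is a minimal NumPy sketch on synthetic data, not the code used for our analysis; the `adjusted_r2` and `best_subsets` helpers and the predictor names `x1` to `x4` are hypothetical, and the multicollinearity check mentioned above is not included.

```python
from itertools import combinations

import numpy as np

def adjusted_r2(X, y):
    """Fit OLS with an intercept and return the adjusted R-squared."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - rss / tss
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def best_subsets(X, y, names, max_size):
    """For each subset size, return (adjusted R-squared, predictor names) of the best fit."""
    best = {}
    for k in range(1, max_size + 1):
        scored = [
            (adjusted_r2(X[:, list(idx)], y), tuple(names[i] for i in idx))
            for idx in combinations(range(X.shape[1]), k)
        ]
        best[k] = max(scored)
    return best

# Synthetic data only: y depends on x1 (strongly) and x2 (weakly), not on x3 or x4.
rng = np.random.default_rng(0)
n = 150
names = ["x1", "x2", "x3", "x4"]
X = rng.normal(size=(n, 4))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

result = best_subsets(X, y, names, max_size=2)
for k, (score, subset) in result.items():
    print(f"best {k}-predictor model: {subset}, adjusted R-squared = {score:.3f}")
```

Within a fixed subset size, R-squared and adjusted R-squared rank models identically; the adjustment only matters when comparing models of different sizes, such as the best one-predictor model against the best two-predictor model.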
Although it’s interesting to note which combination of explanatory variables is most strongly associated with the ladder scores, a multiple linear regression with two quantitative covariates is not easy to visualize. As such, we’ll use Logged.GDP.per.capita as our explanatory variable for a simple linear regression model and color the countries by their geographical region to gain more insight into how logged GDP is associated with happiness score.
We can see from this plot that the regions form clusters: for example, many Sub-Saharan African countries have lower logged GDP per capita and ladder scores, while many Western European countries have higher GDPs and ladder scores. Furthermore, as nearly all of the regression lines have positive slopes (except for North America and ANZ and the Commonwealth of Independent States), we can identify a general trend across geographic regions of a positive relationship between logged GDP per capita and happiness score. This result is also, unsurprisingly, supported by the overall trend of the scatterplot, which shows a definite positive correlation between logged GDP per capita and ladder score: generally, the higher a country’s logged GDP per capita, the higher it scores on happiness.
As another note, although the regression lines in the graph are clearly not parallel, when we regress ladder score on logged GDP and region in a linear regression model with interaction and look at the summary output, we find that none of the interaction terms is significant. In other words, the differences in slope between regions are not statistically distinguishable in this model.
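To make the comparison concrete, the sketch below builds the additive model (score on logged GDP plus region) and the interaction model (which also lets each region have its own slope) and computes a joint F statistic for the added interaction terms. This is a NumPy illustration on synthetic data, not our actual fit: it tests all interaction terms jointly rather than reporting the per-term tests from a regression summary, the helper names are hypothetical, and turning the statistic into a p-value would need an F distribution (for example from SciPy), which is left out here.

```python
import numpy as np

def fit_rss(design, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def interaction_f_stat(x, groups, y):
    """Joint F statistic comparing y ~ x + group against y ~ x * group."""
    groups = np.asarray(groups)
    levels = sorted(set(groups.tolist()))
    # One dummy column per level except the first, which acts as the baseline.
    dummies = np.column_stack([(groups == g).astype(float) for g in levels[1:]])
    additive = np.column_stack([np.ones(len(y)), x, dummies])
    full = np.column_stack([additive, dummies * x[:, None]])  # adds x:group terms
    rss_add, rss_full = fit_rss(additive, y), fit_rss(full, y)
    df_num = dummies.shape[1]        # number of interaction terms tested
    df_den = len(y) - full.shape[1]  # residual degrees of freedom, full model
    f_stat = ((rss_add - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den

# Synthetic data only: three hypothetical regions, 60 countries each.
rng = np.random.default_rng(1)
groups = np.repeat(["A", "B", "C"], 60)
x = rng.uniform(7.0, 11.0, size=groups.size)
noise = rng.normal(scale=0.5, size=groups.size)
offset = np.where(groups == "A", 0.0, np.where(groups == "B", 0.5, 1.0))

y_parallel = -1.0 + 0.7 * x + offset + noise   # same true slope in every region
slope = np.where(groups == "A", 0.7, np.where(groups == "B", 0.0, -0.7))
y_crossing = 3.0 + slope * x + offset + noise  # true slopes differ by region

f_parallel, df_num, df_den = interaction_f_stat(x, groups, y_parallel)
f_crossing, _, _ = interaction_f_stat(x, groups, y_crossing)
print(f"parallel slopes:  F({df_num}, {df_den}) = {f_parallel:.2f}")
print(f"differing slopes: F({df_num}, {df_den}) = {f_crossing:.2f}")
```

Fitted lines from a finite sample will almost never be exactly parallel, so visibly different slopes in a plot are compatible with non-significant interaction terms, especially for regions that contain only a handful of countries.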
We can also perform principal component analysis to visualize whether, and in what way, the specific geographic regions are associated with the variables of logged GDP, social support, healthy life expectancy, freedom to make life choices, generosity, and perceptions of corruption.
From this biplot, we can see some clusters of geographical regions, such as Sub-Saharan Africa and Western Europe, which were also noticeable in the scatterplot above, further underscoring the similarity of the countries within those clusters. Looking at the variable arrows, logged GDP, social support, healthy life expectancy, and the freedom to make life choices all point to the left, where a mix of regions resides, meaning those regions tend to have higher levels of those variables, while countries in Sub-Saharan Africa, for instance, tend to have lower values. Similar logic can be applied to the other variables; thus, this plot gives us a better understanding of how the geographical regions tend to be associated with these variables.
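For readers curious where the points and arrows in a biplot come from, the sketch below computes them with a singular value decomposition of the standardized data: the scores give each observation's position and the loadings give each variable's arrow direction. It uses synthetic data and hypothetical names (`pca_biplot_coords`, `v1` to `v4`), so it illustrates the technique rather than reproducing our plot.

```python
import numpy as np

def pca_biplot_coords(X):
    """Standardize columns, then return (scores, loadings, explained_ratio) via SVD."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s                    # observation coordinates on each component
    loadings = Vt.T                   # variable directions (the biplot arrows)
    explained = s**2 / np.sum(s**2)   # share of total variance per component
    return scores, loadings, explained

# Synthetic data only: v1-v3 share a common latent factor, v4 is unrelated noise.
rng = np.random.default_rng(2)
n = 120
latent = rng.normal(size=n)
X = np.column_stack([
    latent + rng.normal(scale=0.3, size=n),
    latent + rng.normal(scale=0.3, size=n),
    latent + rng.normal(scale=0.3, size=n),
    rng.normal(size=n),
])
names = ["v1", "v2", "v3", "v4"]

scores, loadings, explained = pca_biplot_coords(X)
print("variance explained by PC1, PC2:", np.round(explained[:2], 3))
for name, (pc1, pc2) in zip(names, loadings[:, :2]):
    print(f"{name}: arrow direction = ({pc1:+.2f}, {pc2:+.2f})")
```

The sign of each principal component is arbitrary, so whether correlated variables point left or right carries no meaning by itself; what matters is that their arrows point the same way and that observations lying in that direction tend to have higher values of those variables.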
In general, we can see that countries in Sub-Saharan Africa and South Asia tend to have lower happiness scores. We also find that logged GDP per capita is significantly and positively correlated with the happiness score. It would be interesting to see how these scores change given the push for vaccination in the United States, and also in countries such as India, which is suffering from a devastating second wave, and Colombia, which is in the middle of major protests against a COVID-19 tax reform bill and the police response to those protests. We would anticipate that countries with ready access to vaccines will see an improvement in happiness scores, while those without will probably have scores that continue to decline or remain the same.