Introduction

This research paper aims to examine the world happiness dataset, which includes the following variables:

Country name
Year
Regional Indicator
Life Ladder Score: Happiness index
Log GDP per capita: economic production
Social Support
Life Expectancy
Freedom
Generosity
Corruption

The data come from a recurring report that ranks 155 countries by their happiness levels. Governments, organizations, and civil society use the report to help inform policy decisions. The measures above are used to assess the progress of nations and to help explain individual and national variations in happiness.

With this dataset, we are exploring three research questions:

Which variables (Log GDP per capita, Social Support, Healthy life expectancy at birth, Freedom to make life choices, Generosity, Perceptions of corruption) have the most significant impact in explaining happiness levels?
How did global happiness levels change due to the COVID-19 pandemic, and how do happiness levels depend on a country’s freedom and corruption?
Are happiness levels the same across regions?

Research Question 1: How do the variables (Log GDP per capita, Social Support, Healthy life expectancy at birth, Freedom to make life choices, Generosity, and Perceptions of corruption) impact happiness levels?

To answer this question, we examine visualizations of the relationship between the life ladder score (happiness index) and each of these covariates.

Since we focus only on how Log GDP per capita, Social Support, Healthy life expectancy at birth, Freedom to make life choices, Generosity, and Perceptions of corruption relate to happiness, we omit the year, country name, positive affect, negative affect, and regional variables that are also included in our dataset.
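As a rough numeric complement to the visualizations, one could rank the covariates by the strength of their linear association with Life Ladder. The sketch below is a minimal pure-Python version under stated assumptions: the column names mirror the variables above but are assumed (the dataset's actual headers may differ), each row is a dict, and the toy values are hypothetical. Pearson's r measures only pairwise linear association; it does not separate each variable's effect while holding the others fixed, which would require a multiple regression.

```python
import math

# Assumed column names; they mirror the variables listed in the introduction.
COVARIATES = [
    "Log GDP per capita",
    "Social support",
    "Healthy life expectancy at birth",
    "Freedom to make life choices",
    "Generosity",
    "Perceptions of corruption",
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rank_covariates(rows, target="Life Ladder", covariates=COVARIATES):
    """Return (covariate, r) pairs sorted by |r|, strongest linear association first."""
    results = []
    for col in covariates:
        # Keep only rows where both the covariate and the target are present.
        pairs = [(row[col], row[target]) for row in rows
                 if row.get(col) is not None and row.get(target) is not None]
        xs, ys = zip(*pairs)
        results.append((col, pearson(xs, ys)))
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical rows for illustration only (not taken from the dataset).
toy_rows = [
    {"Life Ladder": 4.2, "Log GDP per capita": 7.6, "Generosity": 0.10},
    {"Life Ladder": 5.1, "Log GDP per capita": 8.9, "Generosity": -0.05},
    {"Life Ladder": 6.3, "Log GDP per capita": 10.1, "Generosity": 0.02},
    {"Life Ladder": 7.4, "Log GDP per capita": 10.9, "Generosity": 0.15},
]
print(rank_covariates(toy_rows, covariates=["Log GDP per capita", "Generosity"]))
```

Ranking by |r| rather than r keeps strong negative associations (such as one expected for corruption) near the top.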

Higher GDP per capita is often associated with a higher standard of living. Our pairs plot indicated that countries with a higher standard of living may also have higher levels of happiness. To examine this further, the faceted density plot shows the distribution of each region’s happiness index, comparing regions with higher GDP per capita against regions with lower GDP per capita. Because it relates three variables (happiness, GDP, and region), the plot shows that the peaks of the distributions for higher-GDP regions sit further to the right than those for lower-GDP regions, suggesting a relationship between higher GDP per capita and a higher happiness index.
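One simple way to form higher- and lower-GDP groups and compare their happiness is a median split. This is only a sketch: the paper does not state how regions were grouped, so the median cutoff, the column names, and the toy rows are all assumptions.

```python
from statistics import mean, median


def happiness_by_gdp_group(rows, gdp_col="Log GDP per capita", target="Life Ladder"):
    """Split rows at the median GDP value and return the mean happiness of each half."""
    cutoff = median(row[gdp_col] for row in rows)
    # Rows exactly at the median go to the higher-GDP group (an arbitrary choice).
    high = [row[target] for row in rows if row[gdp_col] >= cutoff]
    low = [row[target] for row in rows if row[gdp_col] < cutoff]
    return {"cutoff": cutoff, "higher_gdp_mean": mean(high), "lower_gdp_mean": mean(low)}


# Hypothetical rows for illustration only.
toy_rows = [
    {"Life Ladder": 4.2, "Log GDP per capita": 7.6},
    {"Life Ladder": 5.1, "Log GDP per capita": 8.9},
    {"Life Ladder": 6.3, "Log GDP per capita": 10.1},
    {"Life Ladder": 7.4, "Log GDP per capita": 10.9},
]
print(happiness_by_gdp_group(toy_rows))
```

Comparing group means summarizes what the density plot shows visually (a rightward shift of the higher-GDP peaks), at the cost of hiding the shape of each distribution.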

In the graph above, we see a clear positive correlation between happiness and life expectancy, and the relationship appears to strengthen toward the high end of the life expectancy spectrum. We also observe clear partitions in the scatterplot by region: notably, Sub-Saharan Africa has lower life expectancy at birth, while Western Europe has higher life expectancy at birth.

We observe from the graph above that the distribution of corruption is left-skewed, with most countries having relatively high levels of perceived corruption. There is a negative correlation between Life Ladder and corruption, especially visible in the top half of the graph. Western Europe, a region with low corruption, also shows the highest levels of happiness, specifically in its lowest-corruption countries. Sub-Saharan Africa contains a small group of outliers in the bottom left of the graph: countries with low corruption that nonetheless show low happiness, most likely due to other factors.
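Whether a distribution is left-skewed can also be checked numerically with the moment coefficient of skewness, where a negative value means a long left tail (most values bunched at the high end). The sketch below assumes a plain list of perceptions-of-corruption scores; the toy values are hypothetical.

```python
def sample_skewness(values):
    """Fisher-Pearson moment coefficient of skewness, g1 = m3 / m2 ** 1.5.

    Negative values indicate a left-skewed distribution (long left tail),
    i.e. most observations sit at the high end of the scale.
    """
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mu) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical perceptions-of-corruption scores, bunched near the top of the 0-1 scale.
toy_corruption = [0.25, 0.60, 0.75, 0.80, 0.85, 0.88, 0.90]
print(round(sample_skewness(toy_corruption), 3))
```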

Research Question 2a: Did regions see changes in happiness levels from 2019 to 2020-2021 (Covid-19 pandemic period)?

The Covid-19 pandemic took the world by surprise, and new variants continue to disrupt many aspects of our lives. Because the dataset records happiness scores for every country it covers, we use the plot below to examine how the average happiness score (Life Ladder in the dataset) has changed over time.

This time-series plot shows the average happiness scores for all regions of the world from 2005 to 2020, colored by region. A dotted line at year = 2019 highlights the trend from 2019 to 2020. A key takeaway is that many regions had fluctuating average happiness scores from 2005 to 2019, which makes sense because many different events in a region can affect its average happiness score in a given year. Overall, average happiness ranges from about 4 to above 7.0.

From 2019 to 2020, the scores of the Middle East and North Africa, the Commonwealth of Independent States, and Central and Eastern Europe increased, as did those of South Asia and Sub-Saharan Africa. East Asia’s score also continued to increase, which is notable because the region received heavy negative media attention over the coronavirus, so one might have expected its scores to fall. Southeast Asia’s average happiness score decreased, as did that of Latin America and the Caribbean.
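The regional averages behind the time series can be sketched as a group-by mean over (region, year), followed by a difference between 2019 and 2020. The column names `Regional indicator`, `year`, and `Life Ladder` are assumptions about how the data are stored, and the toy rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def regional_yearly_means(rows):
    """Average Life Ladder score per (region, year) pair."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["Regional indicator"], row["year"])].append(row["Life Ladder"])
    return {key: mean(scores) for key, scores in groups.items()}


def change_between(means, region, start=2019, end=2020):
    """Change in a region's average score between two years (positive = increase)."""
    return means[(region, end)] - means[(region, start)]


# Hypothetical rows for illustration only.
toy_rows = [
    {"Regional indicator": "South Asia", "year": 2019, "Life Ladder": 4.0},
    {"Regional indicator": "South Asia", "year": 2019, "Life Ladder": 4.4},
    {"Regional indicator": "South Asia", "year": 2020, "Life Ladder": 4.5},
    {"Regional indicator": "South Asia", "year": 2020, "Life Ladder": 4.7},
]
means = regional_yearly_means(toy_rows)
print(round(change_between(means, "South Asia"), 2))
```

Note that an average over whichever countries reported in a given year can shift simply because the set of reporting countries changes, so year-to-year differences are worth reading alongside country counts.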

We observe that North America and Western Europe have higher average happiness scores than the rest of the world, and from 2019 to 2020 they mostly held steady, continuing their pre-pandemic trends. Some regions’ average happiness scores decreased, whereas many others actually increased. This chart addresses our research question of how average happiness changed over time, especially over the course of the pandemic.

Research Question 2b: Is free will good or bad? Is freedom to make life choices in communist countries (China, Laos and Vietnam) vs non-communist countries (US, Taiwan, Switzerland and Singapore) correlated with happiness?

Currently, there are five countries in the world that are officially communist: China, North Korea, Laos, Vietnam, and Cuba. The dataset contains data only for China, Laos, and Vietnam, so we compare them with four non-communist countries (US, Taiwan, Switzerland, and Singapore) in terms of happiness and freedom to make life choices.

The visualization above plots happiness scores against freedom to make life choices for three communist countries and four non-communist countries. Higher scores are better for both variables. On average, happiness scores for the non-communist countries in our sample are higher than those for the communist countries.

We also observe that freedom to make life choices takes a wide range of values for non-communist countries, with a considerable number of data points on both the left and right sides of the chart; even so, these countries are mostly happier than the communist countries. For communist countries, freedom to make life choices scores are mostly concentrated between 0.76 and 1.0, and happiness ranges from 4.0 to 5.6. Another interesting takeaway is that the slopes for communist and non-communist countries are very similar. This raises the question of whether being a communist state is actually significant in predicting happiness, although the chart does address the question we set out to explore: the relationship between communism, happiness, and freedom to make life choices.
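One way to make the slope comparison concrete is to fit a separate least-squares line for each group. In the sketch below, the boolean `communist` field is a hypothetical label added for illustration, the column names are assumed, and the toy rows are made up. Comparing two fitted slopes by eye does not test significance; a regression with a freedom × communist interaction term would be the usual way to test whether the slopes differ.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x: sum of cross-deviations over sum of squared x-deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def slopes_by_group(rows, x_col="Freedom to make life choices",
                    y_col="Life Ladder", group_col="communist"):
    """Fit one happiness-vs-freedom slope per group (e.g. communist vs non-communist)."""
    groups = {}
    for row in rows:
        xs, ys = groups.setdefault(row[group_col], ([], []))
        xs.append(row[x_col])
        ys.append(row[y_col])
    return {label: ols_slope(xs, ys) for label, (xs, ys) in groups.items()}


# Hypothetical rows; the boolean "communist" field is an illustrative label.
toy_rows = [
    {"communist": True, "Freedom to make life choices": 0.80, "Life Ladder": 4.5},
    {"communist": True, "Freedom to make life choices": 0.90, "Life Ladder": 5.0},
    {"communist": True, "Freedom to make life choices": 1.00, "Life Ladder": 5.5},
    {"communist": False, "Freedom to make life choices": 0.50, "Life Ladder": 6.2},
    {"communist": False, "Freedom to make life choices": 0.70, "Life Ladder": 6.8},
    {"communist": False, "Freedom to make life choices": 0.90, "Life Ladder": 7.4},
]
print(slopes_by_group(toy_rows))
```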

Research Question 2c: Are perceptions of corruption correlated with happiness differently depending on whether a country is communist?

For this question, we work with the same sample of countries as in research question 2b, but some perceptions-of-corruption values are missing for the communist countries, which could affect the analysis.

This visualization shows whether perceptions of corruption affect happiness scores differently depending on whether a country is classified as communist. A higher perceptions-of-corruption score is bad, while a higher happiness score is good. The relationship is not linear, so we fitted a loess model to the data. As in the previous chart, non-communist countries had higher happiness scores on average than communist countries. However, some non-communist countries had very high perceptions of corruption (even higher than some of the communist countries), with one country’s data point falling into the cluster of communist countries’ data points at around the 0.75 level.
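Loess fits a weighted regression in a moving neighbourhood around each point. The sketch below is a simplified version (local linear fits with tricube weights), not R's `loess` itself, which by default uses local quadratic fits (`degree = 2`) with `span = 0.75`. Here the hypothetical `frac` parameter plays the role of the span, and the toy data are made up.

```python
def lowess(xs, ys, frac=0.75):
    """Local linear smoother with tricube weights.

    For each x0, fit a weighted least-squares line to roughly the nearest
    frac * n points and return that line's value at x0.
    """
    n = len(xs)
    k = min(n, max(3, int(round(frac * n))))  # neighbourhood size
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest point (guard against 0).
        h = sorted(abs(x - x0) for x in xs)[k - 1]
        h = h if h > 0 else 1e-12
        # Tricube weights: 1 at x0, falling to 0 at distance h and beyond.
        w = [(1 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0 for x in xs]
        sw = sum(w)
        xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
        ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted


toy_x = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]  # hypothetical corruption scores
toy_y = [6.0, 6.6, 7.0, 7.2, 7.1, 6.8, 6.2, 5.5]  # hypothetical happiness scores
print([round(v, 2) for v in lowess(toy_x, toy_y)])
```

Because each fit is local, the smoothed curve can bend, which is what allows it to follow a rise-then-fall pattern that a single straight line would miss.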

As in the previous plot, perceptions of corruption take a wide range of values for non-communist countries, while for communist countries the data are clustered in a specific region of the chart (mostly from about 0.50 to above 0.75) and the range is much tighter. Interestingly, the data for non-communist countries follow an inverted parabola: happiness increases with perceptions of corruption up to a certain point and then declines. Finally, many of the non-communist countries are happy yet still have high perceptions of corruption. This chart also addresses the question we set out to explore: whether a country being communist affects the relationship between perceptions of corruption and happiness.

Research Question 3: Are happiness levels the same across regions?

This research question asks whether happiness levels are the same throughout the world. Ideally, it would be comforting to know that happiness is homogeneous across the Earth, so that no matter where one lives, one is just as likely to find happiness. We therefore study whether happiness levels are the same across regions.

To start off, we want to examine an areal map of the world and color it by the average happiness score of countries.

From the areal plot above, North America, South America, Europe, Australia, and New Zealand appear to have higher average happiness scores than other regions of the world, while most of Asia and Africa appear to have lower average happiness scores.

Next, we look at the distribution of average happiness scores within each region to get a better picture, and then run tests of equal means and equal variances across regions.
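The two tests mentioned above can be sketched as follows: the one-way ANOVA F statistic for equal means, and Levene's W for equal variances, which is the same F statistic computed on absolute deviations from each group's mean. This sketch returns only the test statistics; p-values would come from an F distribution with k − 1 and N − k degrees of freedom (k groups, N observations in total). In practice, `scipy.stats.f_oneway` and `scipy.stats.levene` return both a statistic and a p-value; note that `scipy.stats.levene` centers on the median by default (the Brown-Forsythe variant) rather than the mean used here. The regional scores below are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def levene_w(groups):
    """Levene's W: the ANOVA F statistic on absolute deviations from each group's mean."""
    deviations = [[abs(v - mean(g)) for v in g] for g in groups]
    return one_way_anova_f(deviations)


# Hypothetical Life Ladder scores per region, for illustration only.
toy_regions = {
    "Western Europe": [7.1, 7.4, 6.9, 7.6],
    "South Asia": [4.1, 4.8, 3.9, 5.0],
    "Sub-Saharan Africa": [4.3, 3.6, 5.2, 4.6],
}
groups = list(toy_regions.values())
print(one_way_anova_f(groups), levene_w(groups))
```

A large F relative to the F(k − 1, N − k) distribution is evidence against equal regional means; a large W is evidence against equal regional variances.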

The boxplots above lead to findings very similar to those from the areal map. North America and ANZ, Western Europe, and Latin America and Caribbean have higher average happiness scores than other regions, while South Asia and Sub-Saharan Africa have lower ones. The boxplots also show each region’s range, median, quartiles, and outliers. Without going through every region, we note that North America and ANZ is skewed left and has a very small range of average happiness levels, Middle East and North Africa has the widest range of all regions, and Sub-Saharan Africa, South Asia, and Latin America and Caribbean are the regions with outliers.
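The quartiles and outliers read off the boxplots can be reproduced with the common rule that flags points more than 1.5 × IQR beyond the box. This sketch assumes that rule (the usual boxplot default) and uses `method="inclusive"`, which gives the same linear-interpolation quartiles as R's default `quantile` (type 7); the example values are hypothetical.

```python
from statistics import quantiles


def boxplot_summary(values, whisker=1.5):
    """Quartiles, IQR, and points lying more than whisker * IQR outside the box."""
    # "inclusive" treats the min and max as the 0th and 100th percentiles.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    outliers = sorted(v for v in values if v < low or v > high)
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr, "outliers": outliers}


# Hypothetical values: nine ordinary scores and one extreme point.
print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```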

Thus, our visualizations indicate that happiness levels are not the same across regions: some regions tend to have markedly higher or lower happiness levels than others.

Conclusion

We were able to answer our research questions. From the series of graphs and analyses above, our boxplot analysis showed that happiness levels differ by region. We also found that, in general, less developed regions have lower happiness levels (by life ladder score) and more developed regions have higher happiness levels, as highlighted by the faceted density plot of life ladder against log GDP per capita. Lastly, we found that countries with more corruption and oppression have populations with lower happiness levels, as expressed through the regression models.