Motivation

Global development is shaped by a complex interplay of economic, social, and political factors. Understanding these interactions is vital for addressing inequalities, improving living conditions, and fostering sustainable growth. This report leverages the World Development Indicators dataset to examine three key aspects of global development:

Regional Socio-Economic Disparities: Socio-economic indicators such as GDP per capita, literacy, and internet access vary widely across geographic regions. Identifying these disparities can illuminate areas requiring targeted intervention.

The Role of Technology in Education: Internet access is increasingly recognized as a cornerstone of education and human capital development. By exploring how internet usage interacts with income levels to influence literacy rates, we can better understand the digital divide and its implications for global equity.

Unemployment and Well-Being: Beyond economic hardship, unemployment has profound social and psychological impacts. Investigating its relationship with mental health indicators, such as suicide rates, highlights the broader implications of labor market dynamics.

These research questions are interconnected, as they each address critical dimensions of development—economic prosperity, technological advancement, and social stability—providing a holistic view of the challenges and opportunities facing global development efforts.


Dataset Overview

The World Development Indicators (WDI) dataset, sourced from the World Bank, provides a comprehensive view of development metrics across countries and regions from 2013 to 2022. This dataset is ideal for exploring relationships among socio-economic, environmental, and political indicators, as well as observing trends and disparities across regions. More information about the WDI dataset, including variables, can be found here: https://cmustatistics.github.io/data-repository/politics/world-bank.html


Key Features:

  • Timeframe: Data covers ten years (2013–2022).
  • Scope: Includes 266 countries and regions, including aggregates like “Sub-Saharan Africa.”
  • Variables: Features 40 indicators capturing diverse aspects of development.
  • Granularity: Each row represents a single country, territory, or region in a given year.
  • Limitations: Not all variables are available for all countries in all years, and more recent data is missing more often than older data.

Variables Used in Analysis

To address our research questions, we selected the following seven variables, representing key aspects of national prosperity. For each variable, its form and value range are described.


1. GDP per Capita (GDPperCapita)

  • Definition: The gross domestic product (GDP) divided by the total population of a country or region.
  • Form: Continuous numeric variable, measured in USD.
  • Range: Varies widely, e.g., from hundreds in low-income countries to over $100,000 in high-income nations.
  • Relevance: A critical measure of economic prosperity, often used to compare development levels across regions.

2. Internet Usage (Internet)

  • Definition: The percentage of the population with Internet access.
  • Form: Continuous numeric variable, measured as a percentage.
  • Range: 0% to 100%, where 0% indicates no Internet access and 100% indicates universal Internet access within the population.
  • Relevance: Reflects technological development and access to digital resources.

3. Birth Rate (Birth)

  • Definition: The crude birth rate, expressed as the number of live births per 1,000 people per year.
  • Form: Continuous numeric variable, measured in live births per 1,000 people.
  • Range: Typically between 5 (low birth rates in developed countries) and 50 (high birth rates in developing regions).
  • Relevance: Provides insights into population growth trends and socio-economic factors such as healthcare access.

4. Literacy Rate (Literacy)

  • Definition: The percentage of adults (15 years and older) who can read and write.
  • Form: Continuous numeric variable, measured as a percentage.
  • Range: 0% to 100%, where higher values indicate better educational outcomes.
  • Relevance: A strong indicator of human capital, with implications for economic productivity and quality of life.

5. Access to Electricity (Electricity)

  • Definition: The percentage of the population with access to electricity.
  • Form: Continuous numeric variable, measured as a percentage.
  • Range: 0% to 100%, where 0% indicates no access and 100% indicates universal access within the population.
  • Relevance: An essential infrastructure metric, reflecting living standards and economic development.

6. Political Stability (PoliticalStability)

  • Definition: A z-score measuring the likelihood of political instability or violence within a country.
  • Form: Continuous numeric variable, normalized as a z-score.
  • Range: Typically ranges from -2.5 (very unstable) to 2.5 (highly stable).
  • Relevance: Captures governance quality and security, crucial for understanding development risks.

7. Income Group (Income group)

  • Definition: A classification of a country’s income level as of 2023 based on its gross national income (GNI).
  • Form: Categorical variable.
  • Range: Four categories: “High income,” “Upper middle income,” “Lower middle income,” and “Low income.”
  • Relevance: Measures a country’s economic status, which is important for analyzing disparities in development, access to resources, and socio-economic outcomes across income groups.

Why These Variables?

These indicators were selected to represent a balanced view of economic, social, and political development:

  • Economic: GDPperCapita and Electricity
  • Technological: Internet
  • Demographic: Birth
  • Social: Literacy
  • Political: PoliticalStability

Together, they provide a robust framework for examining regional clustering and disparities in national prosperity.


Research Question #1: How do geographic regions differ by various indicators of national prosperity?


To answer the above question, we examine how geographic regions cluster on six metrics: GDP, Internet, Birth, Literacy, Electricity, and PoliticalStability. Because a country's total GDP scales with its population, we divide GDP by population to obtain a normalized variable, GDPperCapita.




The above 2D MDS plot suggests some clustering of Sub-Saharan Africa, Europe & Central Asia, and Latin America & Caribbean, along with overlap among the clusters of other regions, but clustering across all 6 regions is difficult to discern. We could create side-by-side plots for each region, but doing so would make it difficult to gauge the distances between clusters. Instead, we use plotly to create an interactive 3D MDS plot to better separate the clusters.
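As a rough illustration of the projection behind these plots, the following Python sketch standardizes a few indicator columns and applies classical (Torgerson) MDS. It is not the code behind the report's figures: the choice of classical MDS, the z-score standardization, the function names, and the toy values are all assumptions made for illustration.

```python
import math
import random


def zscore_columns(rows):
    """Scale each column to mean 0 and standard deviation 1 so that no single
    indicator (e.g. GDP per capita in USD) dominates the distances.
    Assumes every column has some variation (non-zero standard deviation)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def distance_matrix(rows):
    """Pairwise Euclidean distances between rows."""
    return [[math.dist(a, b) for b in rows] for a in rows]


def classical_mds(dist, k=2, iters=1000, seed=0):
    """Classical MDS: double-center the squared distances, then take the
    top-k eigenpairs (found here by power iteration with deflation)."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    # The distance matrix is symmetric, so row means equal column means.
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        v_norm = math.sqrt(sum(x * x for x in v))
        v = [x / v_norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            w_norm = math.sqrt(sum(x * x for x in w))
            if w_norm < 1e-12:  # nothing left to extract
                break
            v = [x / w_norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # Deflate so the next pass finds the next-largest eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Made-up rows (GDPperCapita, Internet, Literacy); not values from the WDI data.
toy = [[1200.0, 20.0, 60.0], [45000.0, 90.0, 99.0],
       [8000.0, 55.0, 92.0], [600.0, 10.0, 40.0]]
embedding = classical_mds(distance_matrix(zscore_columns(toy)), k=2)
```

Setting `k=3` would give coordinates for a 3D view. Points that end up close together in the embedding are countries with similar standardized indicator profiles.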








The above 3D MDS plot shows a clearer distinction among the geographic clusters, which vary in spread. We observe that Sub-Saharan Africa and Middle East & North Africa are the most distinct of all the regions on the chosen indicators. In comparison, the other 4 regions show noticeable overlap in clustering, especially Europe & Central Asia and Latin America & Caribbean, suggesting regional similarities. Together, these two MDS plots suggest meaningful differences and similarities across regions on these important metrics of national prosperity.


Plot 3: by Samantha

This graph shows the relationship between Internet Usage (%) and GDP per Capita (USD) across regions and income groups, revealing significant global disparities in economic prosperity. Regions like North America and Europe & Central Asia show a strong positive correlation between internet usage and GDP per capita, particularly among high-income countries. In contrast, regions such as Sub-Saharan Africa and South Asia show much lower GDP per capita even as internet access increases, suggesting that internet penetration alone does not translate into higher GDP per capita. The gap between high-income and low-income countries within each region points to significant economic inequality, with low-income countries consistently lagging in prosperity. Outliers in regions like East Asia & Pacific show that some countries achieve higher GDP per capita with only moderate internet usage, which suggests that other factors, such as favorable economic policies, strong governance, or unique regional conditions, may drive prosperity in these cases. Overall, while internet access is closely associated with development, this analysis suggests that addressing broader structural issues, including governance, education, and infrastructure, is essential for achieving equitable economic growth globally.



Research Question #2: Does the interaction between income group and internet usage impact the literacy rate within a population?


To answer the above question, we examine the variables Literacy, Internet, and Income.group. To look for initial patterns among these 3 variables, we first plot literacy rate against the percentage of people using the internet, since both are continuous variables. We then introduce Income Group as a categorical variable by grouping the points accordingly and drawing a contour plot, which makes the groupings easy to identify from the density of the data points. Rows with NA or missing values in any of these variables were removed before plotting.
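The two ideas in this paragraph, dropping incomplete rows and drawing contours from the density of points, can be sketched in Python as follows. This is an illustrative sketch rather than the report's own code: the function names, the product-Gaussian kernel, the fixed bandwidths, and the toy rows are assumptions.

```python
import math


def complete_cases(rows, keys):
    """Keep only rows where every listed variable is present (not None)."""
    return [r for r in rows if all(r.get(k) is not None for k in keys)]


def kde2d(points, x, y, hx, hy):
    """Product-Gaussian kernel density estimate at (x, y).
    hx and hy are bandwidths; contour lines join locations of equal density."""
    total = 0.0
    for px, py in points:
        zx = (x - px) / hx
        zy = (y - py) / hy
        total += math.exp(-0.5 * (zx * zx + zy * zy))
    return total / (len(points) * 2.0 * math.pi * hx * hy)


# Made-up rows for illustration only; these are not values from the WDI data.
rows = [
    {"Internet": 12.0, "Literacy": 45.0, "Income.group": "Low income"},
    {"Internet": 15.0, "Literacy": 50.0, "Income.group": "Low income"},
    {"Internet": 88.0, "Literacy": 99.0, "Income.group": "High income"},
    {"Internet": None, "Literacy": 97.0, "Income.group": "High income"},
]
clean = complete_cases(rows, ["Internet", "Literacy", "Income.group"])

# One density surface per income group, evaluated on a coarse 0-100 grid.
grid = range(0, 101, 10)
surfaces = {}
for group in sorted({r["Income.group"] for r in clean}):
    pts = [(r["Internet"], r["Literacy"]) for r in clean
           if r["Income.group"] == group]
    surfaces[group] = [[kde2d(pts, gx, gy, 10.0, 10.0) for gx in grid]
                       for gy in grid]
```

A plotting library would then draw contour lines through each surface. The bandwidths control how smooth the contours look, so they would need tuning on real data.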


Question 1

The above graph suggests that there are 4 main groupings of the data, matching the number of income groups. Across these groups there appears to be a positive relationship between internet usage and literacy rate, running from low values for the low income groups to high values for the higher income groups. This suggests that income group and access to digital resources are both related to literacy rates, with higher-income countries outperforming lower-income ones in both areas. An interesting observation is that the low income and lower middle income data are more spread out, covering a wider range, than the data for the upper middle and high income groups.

It is also useful to explore whether time plays a role in these groupings and in the relationship shown in the contour plot. The above graphs plot the average internet usage and literacy rate of each income group for the oldest year in the data set (2013) and the most recent year shown (2021). There is a clear ordering by income group: higher income groups have higher internet usage and literacy rates. All income groups show growth in both internet usage and literacy rate over time, so external factors such as population growth, economic expansion, and technological advancement could be contributing, alongside income group, to the relationship between internet usage and literacy rate. Returning to the earlier observation about the spread of the data in the contour plot, large changes between 2013 and 2020 could explain the wider range of internet usage or literacy values; one example is the growth in literacy rate between 2013 and 2020 for the low income group.
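The per-year group averages described above amount to a grouped mean. A minimal Python sketch is shown below, assuming rows carry a `Year` field (a hypothetical column name) alongside `Income.group`; the helper name and the toy values are also made up for illustration.

```python
from collections import defaultdict


def group_means(rows, value_key, group_keys=("Year", "Income.group")):
    """Average value_key within each (year, income group) cell,
    skipping rows where the value is missing."""
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [running sum, count]
    for r in rows:
        v = r.get(value_key)
        if v is None:
            continue
        cell = totals[tuple(r[k] for k in group_keys)]
        cell[0] += v
        cell[1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


# Made-up rows for illustration only; these are not values from the WDI data.
rows = [
    {"Year": 2013, "Income.group": "Low income", "Internet": 4.0},
    {"Year": 2013, "Income.group": "Low income", "Internet": 6.0},
    {"Year": 2021, "Income.group": "Low income", "Internet": 20.0},
    {"Year": 2021, "Income.group": "Low income", "Internet": None},
    {"Year": 2021, "Income.group": "High income", "Internet": 90.0},
]
means = group_means(rows, "Internet")

# Change in the group average between the two years being compared.
change = means[(2021, "Low income")] - means[(2013, "Low income")]
```

Because missing values are skipped per cell, a group's average in a sparse year may rest on far fewer countries than in a well-covered year, which is worth checking before comparing years.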


Research Question #3: How does the unemployment rate vary across different regions, and what are its negative impacts?


Motivation & Introduction

Unemployment remains a critical economic and social issue worldwide, significantly affecting individual well-being, societal stability, and economic development. The variation in unemployment rate across regions is influenced by many factors, such as economic structures and social systems; thus, it is crucial to understand its global distribution and impacts. My research focuses on examining how the unemployment rate varies across different regions and investigating its potential negative effects, such as mental health issues, using the suicide mortality rate (per 100,000 population) as the key indicator.

The graphs explore the implications of the unemployment rate in different ways. First, the conditional density plot reveals the regional disparities in the unemployment rate, highlighting regions with higher levels of unemployment. Second, the scatterplots with regression of the unemployment rate versus the suicide mortality rate suggest potential social and psychological impacts of unemployment. Third, the contour plot and heat map offer a more detailed view of the interaction patterns between these two variables.

I specifically use data from 2019 for the conditional density plot and the scatterplots because 2019 is the most recent year with the fewest missing values, which keeps the analysis both relevant and reliable. Using this year minimizes the potential biases introduced by incomplete data and allows a more reliable comparison. On the other hand, I use data from all available years for the contour plot and heat map. This provides a more comprehensive understanding of the long-term interaction patterns between the unemployment rate and the suicide mortality rate, capturing trends and variations that may not be evident in a single year.

By analyzing regional differences in unemployment and its negative impacts, my research seeks to guide targeted policy to reduce the harmful effects of unemployment and promote economic stability worldwide.


Conditional Density Plot

According to EBC Financial Group, an unemployment rate below 2% signifies a prosperous economy, a rate between 2% and 4% reflects a healthy economy, a rate between 4% and 6% indicates a stable economy, and a rate between 6% and 8% signals that the economy is beginning to face challenges. Under this scale, an unemployment rate of 6% is a critical threshold, marking the transition from economic stability to potential difficulties.
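The bands quoted above can be written as a small lookup. The Python sketch below is illustrative only: the function name is hypothetical, and the handling of exact boundary values (for example, counting exactly 6% as the start of the "challenges" band) is an assumption, since the quoted scale does not say which band a boundary value belongs to or what applies above 8%.

```python
def classify_unemployment(rate_pct):
    """Map an unemployment rate (in percent) to the bands quoted above.
    Boundary handling is an assumption; the quoted scale does not specify it."""
    if rate_pct < 0:
        raise ValueError("unemployment rate cannot be negative")
    if rate_pct < 2:
        return "prosperous"
    if rate_pct < 4:
        return "healthy"
    if rate_pct < 6:
        return "stable"
    if rate_pct <= 8:
        return "beginning to face challenges"
    # The quoted scale stops at 8%, so higher rates are left unlabeled.
    return "above 8% (not covered by the quoted bands)"


# Made-up rates for illustration only.
labels = [classify_unemployment(r) for r in (1.5, 3.0, 5.9, 6.0, 12.0)]
```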

I divide the analysis of the conditional density plot into three parts as follows:

First, East Asia & Pacific and Europe & Central Asia exhibit steep unimodal distributions. Although the peak of Europe & Central Asia is positioned to the right of East Asia & Pacific’s peak, both of them are below 5%. This indicates that the most prevalent unemployment rates in these two regions fall within the stable or even healthy range. Additionally, while there are minor fluctuations in the tail ends of these distributions, the vast majority of the data lie below the 6% threshold. These observations collectively suggest that the economies of these two regions are robust and stable.

Second, North America and South Asia exhibit bimodal distributions. North America features two peaks of nearly equal height, both below 6%, and almost the entire distribution falls below 6%. In contrast, South Asia has one prominent peak below 5% and a secondary, smaller peak above 10%. However, the majority of the distribution remains below 6%. Overall, the economies in these two regions also appear to be robust and stable.

Third, Latin America & Caribbean, Middle East & North Africa, and Sub-Saharan Africa exhibit relatively flat distributions. While the peaks of Latin America & Caribbean and Sub-Saharan Africa fall below 6%, approximately half of their distributions exceed this threshold. For Middle East & North Africa, it is challenging to identify a clear peak; however, the vast majority of the distribution lies above 6%. Therefore, we may infer that these regions are facing certain economic challenges.
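Statements such as "approximately half of the distribution exceeds this threshold" can be checked numerically by computing, for each region, the share of countries above 6%. The Python sketch below shows one way to do this; the function name and the toy values are assumptions, and the shares it produces are not results from the report.

```python
from collections import defaultdict

THRESHOLD_PCT = 6.0  # the threshold discussed above


def share_above(rows, threshold=THRESHOLD_PCT):
    """Fraction of non-missing observations in each region whose
    unemployment rate exceeds the threshold."""
    counts = defaultdict(lambda: [0, 0])  # region -> [n above, n total]
    for region, rate in rows:
        if rate is None:
            continue
        counts[region][1] += 1
        if rate > threshold:
            counts[region][0] += 1
    return {region: above / total for region, (above, total) in counts.items()}


# Made-up (region, unemployment %) pairs; not values from the WDI data.
rows = [
    ("Region A", 3.0), ("Region A", 4.5), ("Region A", 7.0), ("Region A", None),
    ("Region B", 9.0), ("Region B", 11.0),
]
shares = share_above(rows)
```

A share near 0.5 would correspond to a region whose density is split roughly evenly around the threshold, while a share near 1 would match a distribution lying almost entirely above it.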


Scatterplots with Regression

These two scatterplots with regression explore how the unemployment rate varies across different regions and examine its potential negative impacts, specifically its association with the suicide mortality rate. The x-axis represents the unemployment rate, the y-axis represents the suicide mortality rate, and the data points are colored by region to highlight geographic differences.

Both a linear regression line and a LOESS curve are fitted to the data, and both show a positive relationship between the unemployment rate and the suicide mortality rate. Apart from a brief downward shift in the middle, the LOESS curve is quite similar to the linear regression line, so the relationship appears to be approximately linear. This suggests that as the unemployment rate increases, the suicide mortality rate tends to rise modestly.
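For readers who want to see what the straight-line fit computes, here is a minimal Python sketch of simple least squares using the closed-form slope and intercept. It covers only the linear fit, not LOESS, and it is not the report's own code: the function name and toy values are assumptions. Pairs with a missing value are dropped before fitting.

```python
def ols_fit(x, y):
    """Simple linear regression of y on x by least squares.
    Returns (intercept, slope, r_squared, n_used).
    Pairs with a missing value (None) are dropped first."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    slope = sxy / sxx
    intercept = my - slope * mx
    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in pairs)
    ss_tot = sum((b - my) ** 2 for _, b in pairs)
    return intercept, slope, 1.0 - ss_res / ss_tot, n


# Made-up values for illustration only; not values from the WDI data.
unemployment = [1.0, 2.0, 3.0, 4.0, None]
suicide = [3.0, 5.0, 7.0, 9.0, 10.0]
intercept, slope, r_squared, n_used = ols_fit(unemployment, suicide)
```

Comparing such a fit with a smoother like LOESS is a common informal check: if the two curves track each other closely, a straight line is a reasonable summary of the trend.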

Regional clustering is apparent in the data. For example, the points for Latin America & Caribbean are concentrated in the bottom-left corner, with a few points at the center and the right, indicating low unemployment and suicide mortality rates. Most Europe & Central Asia points lie above the Latin America & Caribbean points and above the regression line, showing a higher suicide mortality rate. East Asia & Pacific follows a pattern similar to Latin America & Caribbean, with very limited scattering. In addition, Sub-Saharan Africa and Latin America & Caribbean have a few outliers with notably high suicide mortality rates, which may influence or distort the overall trend.


## 
## Call:
## lm(formula = Suicide ~ Unemployment, data = data.2019)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.198  -4.308  -1.946   2.733  60.712 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    7.7987     0.9781   7.974 2.06e-13 ***
## Unemployment   0.2304     0.1108   2.079   0.0391 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.764 on 172 degrees of freedom
##   (38 observations deleted due to missingness)
## Multiple R-squared:  0.02452,    Adjusted R-squared:  0.01885 
## F-statistic: 4.324 on 1 and 172 DF,  p-value: 0.03906

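The output above summarizes the linear regression of the suicide mortality rate on the unemployment rate. The estimated slope is 0.2304: two countries whose unemployment rates differ by one percentage point are expected to differ by about 0.23 suicide deaths per 100,000 population. The slope is statistically significant at the 5% level (p = 0.0391), but the model explains only about 2.5% of the variance in the suicide mortality rate (R-squared = 0.02452), so the association is weak.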
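The arithmetic behind the regression summary can be reproduced from the reported numbers alone. The Python sketch below plugs in the coefficients, standard error, and residual degrees of freedom printed in the output; the variable and function names are hypothetical. It relies on two standard identities for a regression with a single predictor: the F-statistic equals the squared t-value of the slope, and R-squared equals F / (F + residual degrees of freedom).

```python
# Values copied from the regression output above.
INTERCEPT = 7.7987
SLOPE = 0.2304
SLOPE_SE = 0.1108
RESID_DF = 172


def predicted_suicide_rate(unemployment_pct):
    """Fitted suicide mortality rate (per 100,000) at a given unemployment rate."""
    return INTERCEPT + SLOPE * unemployment_pct


# t-value of the slope: estimate divided by its standard error.
t_value = SLOPE / SLOPE_SE

# With one predictor, F = t^2 and R^2 = F / (F + residual df).
f_stat = t_value ** 2
r_squared = f_stat / (f_stat + RESID_DF)

# Fitted values at two illustrative unemployment rates (5% and 10%).
at_5 = predicted_suicide_rate(5.0)
at_10 = predicted_suicide_rate(10.0)
```

The recomputed values agree with the printed t value, F-statistic, and multiple R-squared up to rounding. The difference between the fitted values at 10% and 5% is five times the slope, which is small relative to the residual standard error of 7.764.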


Heat Map & Contour Plot

The heat map and the contour plot reveal patterns in the joint distribution of the unemployment rate and the suicide mortality rate, such as modes, and how they relate to region. Both plots show a single mode at the bottom-left, with irregular shapes radiating from this local maximum. This mode corresponds to countries with both a low unemployment rate and a low suicide mortality rate. Countries in the mode come mainly from three regions: Sub-Saharan Africa, Middle East & North Africa, and Europe & Central Asia. However, although many Europe & Central Asia countries are in the mode, many others spread toward the right and upper right.