Global development is shaped by a complex interplay of economic, social, and political factors. Understanding these interactions is vital for addressing inequalities, improving living conditions, and fostering sustainable growth. This report leverages the World Development Indicators dataset to examine three key aspects of global development:
Regional Socio-Economic Disparities: Socio-economic indicators such as GDP per capita, literacy, and internet access vary widely across geographic regions. Identifying these disparities can illuminate areas requiring targeted intervention.
The Role of Technology in Education: Internet access is increasingly recognized as a cornerstone of education and human capital development. By exploring how internet usage interacts with income levels to influence literacy rates, we can better understand the digital divide and its implications for global equity.
Unemployment and Well-Being: Beyond economic hardship, unemployment has profound social and psychological impacts. Investigating its relationship with mental health indicators, such as suicide rates, highlights the broader implications of labor market dynamics.
These research questions are interconnected, as they each address critical dimensions of development—economic prosperity, technological advancement, and social stability—providing a holistic view of the challenges and opportunities facing global development efforts.
The World Development Indicators (WDI) dataset, sourced from the World Bank, provides a comprehensive view of development metrics across countries and regions from 2013 to 2022. This dataset is ideal for exploring relationships among socio-economic, environmental, and political indicators, as well as observing trends and disparities across regions. More information about the WDI dataset, including variables, can be found here: https://cmustatistics.github.io/data-repository/politics/world-bank.html
To address our research questions, we selected the following variables, representing key aspects of national prosperity. For each variable, its form and value range are described.
- GDPperCapita
- Internet
- Birth
- Literacy
- Electricity
- PoliticalStability
- Income group
These indicators were selected to represent a balanced view of economic, social, and political development:
- Economic: GDPperCapita and Electricity
- Technological: Internet
- Demographic: Birth
- Social: Literacy
- Political: PoliticalStability
Together, they provide a robust framework for examining regional clustering and disparities in national prosperity.
To answer the above question, we observe the clustering behavior of geographic regions on important metrics, such as GDP, Internet, Birth, Literacy, Electricity, and Political Stability. Since total GDP scales with population size, we normalize it by population into a new variable, GDPperCapita.
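The normalization can be sketched in a couple of lines of R. This is only an illustration on made-up numbers: the data frame `wdi` and the column names `GDP` and `Population` are assumptions, not necessarily the names used in the WDI data.

```r
# Hypothetical toy data; the real column names in the WDI data may differ.
wdi <- data.frame(
  Country    = c("A", "B", "C"),
  GDP        = c(2.0e12, 5.0e10, 3.0e11),  # total GDP in USD
  Population = c(5.0e7, 1.0e7, 6.0e7)      # number of inhabitants
)

# Dividing total GDP by population removes the effect of country size.
wdi$GDPperCapita <- wdi$GDP / wdi$Population
print(wdi[, c("Country", "GDPperCapita")])
```

Countries B and C have very different total GDP in this toy example, yet the same GDP per capita, which is why the per-capita form is the more comparable measure across countries.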
The above 2D MDS plot suggests some clustering of Sub-Saharan Africa, Europe & Central Asia, and Latin America & Caribbean, as well as some overlap among the clusters of other regions, but clustering of all 6 regions is difficult to observe. We could create side-by-side plots for each cluster, but doing so makes it difficult to gauge the distance between clusters. Instead, we use plotly to create an interactive 3D MDS plot to further differentiate the clusters.
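As a rough sketch of how 2D and 3D MDS embeddings like these can be produced, the example below runs classical MDS with `cmdscale()` on standardized indicators. The simulated data, the choice of classical MDS, the Euclidean distance, and the standardization step are all assumptions made for illustration; the report does not state which variant was used.

```r
set.seed(1)

# Hypothetical stand-in for the indicator table: two made-up regions.
toy <- data.frame(
  Region       = rep(c("Region A", "Region B"), each = 10),
  GDPperCapita = c(rnorm(10, 40000, 5000), rnorm(10, 3000, 800)),
  Internet     = c(rnorm(10, 85, 5), rnorm(10, 30, 8)),
  Literacy     = c(rnorm(10, 98, 1), rnorm(10, 65, 8))
)

# Standardize so that no indicator dominates the distances (an assumption).
indicators <- scale(toy[, c("GDPperCapita", "Internet", "Literacy")])
d <- dist(indicators)  # Euclidean distances between countries

mds2 <- cmdscale(d, k = 2)  # coordinates for a 2D plot
mds3 <- cmdscale(d, k = 3)  # coordinates for a 3D plot

# How far the embedded distances are from the original ones.
err2 <- sum((as.vector(d) - as.vector(dist(mds2)))^2)
err3 <- sum((as.vector(d) - as.vector(dist(mds3)))^2)

# Optional interactive 3D scatterplot, only if plotly is installed.
if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(
    x = mds3[, 1], y = mds3[, 2], z = mds3[, 3],
    color = toy$Region, type = "scatter3d", mode = "markers"
  )
}
```

The third dimension can only reduce the distortion of the pairwise distances (`err3` is no larger than `err2`), which is one reason clusters that overlap in 2D may separate in 3D.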
The above 3D MDS plot shows a clearer distinction among the geographic clusters, which vary in spread. We observe that Sub-Saharan Africa and Middle East & North Africa are the most distinct of all the regions on the chosen indicators. In comparison, the other four regions show noticeable overlap in clustering, especially Europe & Central Asia and Latin America & Caribbean, suggesting regional similarities. These two MDS plots suggest meaningful differences and similarities across regions on these important metrics of national prosperity.
This graph shows the relationship between Internet Usage (%) and GDP per Capita (USD) across regions and income groups, revealing significant global disparities in economic prosperity. Regions like North America and Europe & Central Asia show a strong positive correlation between internet usage and GDP, particularly among high-income countries. In contrast, regions such as Sub-Saharan Africa and South Asia demonstrate much lower GDP per capita, even with increasing internet access, indicating that internet penetration alone is insufficient to drive economic growth. This gap between high-income and low-income countries within each region shows significant economic inequality, with low-income countries consistently lagging in prosperity. Outliers in regions like East Asia & Pacific highlight that some countries achieve higher GDP per capita even with only moderate internet usage. This suggests that other factors, such as favorable economic policies, strong governance, or unique regional conditions, may drive prosperity in these cases. Overall, while internet access is an important driver of development, this analysis emphasizes that addressing broader structural issues, including governance, education, and infrastructure, is essential for achieving equitable economic growth globally.
To answer the above question, we examine the variables Literacy, Internet, and Income.group. To initially discover any patterns that may exist in the data between these 3 variables, we first plot literacy rate against the percentage of people using the internet, since both are continuous variables. Then, we introduce Income Group as a categorical variable by grouping the points by income group and using a contour plot to identify these groupings based on the density of the data points. Rows with NA or missing values in any of these variables were removed before plotting.
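A minimal sketch of the missing-value step, assuming the data sit in a data frame with the three columns named above. The toy values are made up, and `complete.cases()` is one common way to do this, not necessarily the one used for the report.

```r
# Hypothetical toy rows; NA marks a missing measurement.
toy <- data.frame(
  Literacy     = c(95, NA, 60, 72),
  Internet     = c(80, 50, NA, 35),
  Income.group = c("High income", "Upper middle income",
                   "Low income", "Lower middle income")
)

# Keep only rows where all three plotted variables are observed.
keep  <- complete.cases(toy[, c("Literacy", "Internet", "Income.group")])
clean <- toy[keep, ]
print(clean)
```

Only the first and last rows survive here, because each of the other two is missing one of the plotted variables.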
Question 1
The above graph suggests that there are 4 main groupings of the data, matching the number of income groups. Across the income groups, there appears to be a positive relationship between internet usage (%) and literacy rate, running from low values in the low income groups to high values in the higher income groups. This suggests that income group and access to digital resources both appear to influence literacy rates, with higher-income countries outperforming lower-income ones in both areas. An interesting observation is that the low income and lower middle income data are more spread out, covering a wider range, than the data for the upper middle and high income groups.
It can also be useful to explore whether time is a factor in these groupings and in the relationship shown in the contour plot. The above graphs plot income group against the average internet usage and literacy rate for each group in the oldest data (2013) in the data set and in the most recent (2021). There is a clear gap between income groups in both internet usage and literacy rate: higher income groups have higher values. Overall, all income groups saw growth in internet usage and literacy rate; therefore, other external factors such as population growth, economic expansion, and technological advancement could be contributing to the relationship between internet usage and literacy rate, along with income group. Circling back to the earlier observation about the range of data in the contour plot, certain major growths between the years 2013 and 2020 could explain the wider range of data for internet usage or literacy rate; for example, the growth in literacy rate between 2013 and 2020 for the low income group.
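The group averages behind such a comparison could be computed along the following lines. The toy numbers and the column names `Year`, `Income.group`, `Internet`, and `Literacy` are assumptions made for illustration only.

```r
# Hypothetical toy data: two income groups observed in two years.
toy <- data.frame(
  Year         = rep(c(2013, 2021), each = 4),
  Income.group = rep(c("Low income", "High income"), times = 4),
  Internet     = c(5, 70, 9, 80, 20, 88, 24, 92),
  Literacy     = c(50, 96, 54, 98, 60, 97, 64, 99)
)

# Mean internet usage and literacy rate per income group and year.
avg <- aggregate(cbind(Internet, Literacy) ~ Income.group + Year,
                 data = toy, FUN = mean)
print(avg)
```

Comparing the rows for the same income group across the two years gives the growth per group, while comparing groups within a year gives the gap between them.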
Unemployment remains a critical economic and social issue worldwide, significantly affecting individual well-being, societal stability, and economic development. The variation in unemployment rate across regions is influenced by many aspects, such as economic structures and social systems; thus, it is crucial to understand its global distribution and impacts. My research focuses on examining how the unemployment rate varies across different regions and investigating its potential negative effects, such as mental health issues, by using the key indicator variable suicide mortality rate (per 100,000 population).
The graphs explore the implications of the unemployment rate in different ways. First, the conditional density plot reveals the regional disparities in the unemployment rate, highlighting regions with higher levels of unemployment. Second, the scatterplots with regression of the unemployment rate versus the suicide mortality rate suggest potential social and psychological impacts of unemployment. Third, the contour plot and heat map offer a more detailed view of the interaction patterns between these two variables.
I specifically use data from 2019 for the conditional density plot and the scatterplots since 2019 is the most recent year with the least missing values, ensuring both relevance and reliability. That is, this minimizes the potential biases introduced by incomplete data, allowing a reliable and accurate comparison. On the other hand, I use data from all available years for the contour plot and heat map. This provides a more comprehensive understanding of the long-term interaction patterns between the unemployment rate and the suicide mortality rate, capturing trends and variations that may not be evident in a single year.
By analyzing regional differences in unemployment and its negative impacts, my research seeks to guide targeted policy to reduce the harmful effects of unemployment and promote economic stability worldwide.
It is widely recognized that an unemployment rate below 2% signifies a prosperous economy; a rate between 2% and 4% reflects a healthy economy; a rate between 4% and 6% indicates a stable economy; while a rate between 6% and 8% signals that the economy is beginning to face challenges (sourced from EBC Financial Group). In other words, an unemployment rate of 6% is considered a critical threshold, marking the transition from economic stability to potential difficulties.
I divide the analysis of the conditional density plot into three parts as follows:
First, East Asia & Pacific and Europe & Central Asia exhibit steep unimodal distributions. Although the peak of Europe & Central Asia is positioned to the right of East Asia & Pacific’s peak, both of them are below 5%. This indicates that the most prevalent unemployment rates in these two regions fall within the stable or even healthy range. Additionally, while there are minor fluctuations in the tail ends of these distributions, the vast majority of the data lie below the 6% threshold. These observations collectively suggest that the economies of these two regions are robust and stable.
Second, North America and South Asia exhibit bimodal distributions. North America features two peaks of nearly equal height, both below 6%. Moreover, almost the entire distribution falls below the 6% threshold. In contrast, South Asia has one prominent peak below 5% and a secondary, smaller peak above 10%. However, the majority of the distribution remains below 6%. Overall, the economies in these two regions also appear to be robust and stable.
Third, Latin America & Caribbean, Middle East & North Africa, and Sub-Saharan Africa exhibit relatively flat distributions. While the peaks of Latin America and Sub-Saharan Africa fall below 6%, approximately half of their distributions exceed this threshold. For Middle East & North Africa, it is challenging to identify a clear peak; however, it is evident that the vast majority of the distribution lies above 6%. Therefore, we may infer that these regions are facing certain economic challenges.
These two scatterplots with regression explore how the unemployment rate varies across different regions and examine its potential negative impacts, specifically its association with the suicide mortality rate. The x-axis represents the unemployment rate, the y-axis represents the suicide mortality rate, and the data points are colored by region to highlight geographic differences.
Both a linear regression line and a LOESS curve are fitted to the data, and both show a positive relationship between the unemployment rate and the suicide mortality rate. The LOESS curve, apart from a brief downward shift in the middle, is quite similar to the linear regression line, so we may infer that the relationship is approximately linear. This suggests that as the unemployment rate increases, the suicide mortality rate tends to rise moderately.
Regional clustering is apparent in the data. For example, the points of Latin America & Caribbean are concentrated in the bottom-left corner, with a few points at the center and the right, indicating a low unemployment rate and a low suicide mortality rate. Most of the Europe & Central Asia points lie above the Latin America & Caribbean points and above the regression line, showing a higher suicide mortality rate. East Asia & Pacific follows a similar pattern to Latin America & Caribbean, with very limited scattering. In addition, Sub-Saharan Africa and Latin America & Caribbean have a few outliers with a notably high suicide mortality rate, which may influence or distort the overall trend.
##
## Call:
## lm(formula = Suicide ~ Unemployment, data = data.2019)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -11.198  -4.308  -1.946   2.733  60.712
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)    7.7987     0.9781   7.974 2.06e-13 ***
## Unemployment   0.2304     0.1108   2.079   0.0391 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.764 on 172 degrees of freedom
##   (38 observations deleted due to missingness)
## Multiple R-squared:  0.02452, Adjusted R-squared:  0.01885
## F-statistic: 4.324 on 1 and 172 DF,  p-value: 0.03906
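Above is the summary of the linear regression model that predicts the suicide mortality rate from the unemployment rate. The estimated slope of 0.2304 is the expected difference in the suicide mortality rate (per 100,000 population) between countries whose unemployment rates differ by one percentage point. The slope is statistically significant at the 5% level (p = 0.0391), but the multiple R-squared of 0.02452 shows that the unemployment rate explains only about 2.5% of the variation in the suicide mortality rate.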
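To make the coefficients concrete, the fitted line from the output above can be written out and evaluated at two illustrative unemployment rates. The values 5% and 10% are chosen here only as examples and are not taken from the data.

```latex
\begin{align*}
\widehat{\text{Suicide}} &= 7.7987 + 0.2304 \times \text{Unemployment} \\
\text{Unemployment} = 5:\quad \widehat{\text{Suicide}} &= 7.7987 + 0.2304 \times 5 = 8.9507 \\
\text{Unemployment} = 10:\quad \widehat{\text{Suicide}} &= 7.7987 + 0.2304 \times 10 = 10.1027
\end{align*}
```

Under this model, a five-point difference in the unemployment rate corresponds to a predicted difference of about 1.15 suicides per 100,000 population, which is small relative to the residual standard error of 7.764.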
The heat map and the contour plot reveal patterns in the joint distribution of the unemployment rate and the suicide mortality rate, such as modes, and how regions relate to them. In both plots there is only one mode, at the bottom-left, with irregular contours radiating from this maximum. This mode corresponds to countries with both a low unemployment rate and a low suicide mortality rate. Countries in the mode come mainly from three regions: Sub-Saharan Africa, Middle East & North Africa, and Europe & Central Asia. However, although many Europe & Central Asia countries are in the mode, many others diverge toward the right and upper right.