Education plays a pivotal role in shaping individuals’ opportunities and socio-economic outcomes, making it crucial to understand educational attainment levels across different regions. In England, towns and cities vary widely in their educational infrastructure and socio-economic conditions. Recognizing the importance of these factors, our project aims to explore and analyze educational attainment and socio-economic dynamics across English towns and cities.
The “Educational Attainment of Young People in English Towns” dataset provides comprehensive insights into various educational and socio-economic indicators for different towns and cities across England. This dataset contains a total of 1104 observations, each representing an individual town or area, and encompasses a wide range of variables. Geographical variables include codes and names for specific locations, along with population size and regional classifications, providing context for social composition. Educational indicators encompass measures like qualification levels, key stage attainment rates, pupil counts, and participation in further education and apprenticeships. Socio-economic factors like income deprivation, job density, and coastal classification offer additional context, along with details on travel-to-work areas and university presence. It is a rich resource for understanding educational conditions and socio-economic dynamics at the local level.
Research Questions: We would like to explore the correlations between educational and socio-economic factors among young people in English towns and cities.
Firstly, we seek to understand the relationship between education and the presence of universities. We want to investigate how the accessibility and proximity of higher education institutions impact young people’s educational choices and outcomes.
Furthermore, we intend to explore the connections between education and income levels. We recognize that socio-economic factors, particularly income, play a significant role in influencing access to educational resources, opportunities, and outcomes. Through meticulous analysis, we aim to elucidate how variations in income levels within different communities impact educational attainment among young people.
Finally, we investigate how the presence or availability of jobs influences the educational outcomes or qualifications of the population residing in that area, and vice versa. We would like to uncover whether areas with higher job densities tend to have higher levels of educational attainment to understand the intricate interplay between education and employment dynamics.
Through this comprehensive investigation, we would like to contribute valuable insights about how to improve educational outcomes and socio-economic well-being among young people in England.
In this section, we explore the spatial relationship between education scores and the density of universities in towns and cities across England. Utilizing geographic information systems (GIS) techniques, we will map the mean education scores for each region, providing a visual representation of educational attainment levels across the country. Additionally, we will create a university density plot to depict the distribution of universities within each region of England. By analyzing these spatial datasets, we want to investigate the potential correlation between education scores and the number or presence of universities.
We first plot the average education score of each region on a map.
From the graph above, we see that the region with the highest average education score is North West, while regions such as South West and North East have the lowest average education score.
We then create a graph that shows the density of universities across England.
The plot above depicts the density of universities in each region, which is estimated by taking the proportion of towns that have universities in each region. We see that regions such as South West, South East and West Midlands have the highest university density, while regions such as North West and East Midlands have the lowest density.
This estimate may be slightly flawed, since towns may have more than 1 university, which would increase the density in that region. We also notice that the London area has university density 0, which is not realistic. Upon further inspection, we note that this is because the data set only contains 2 towns from the London region, and both towns have missing data in the university_flag variable.
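The density estimate and the London caveat can be illustrated with a minimal Python sketch (not necessarily the language of the original analysis). The rows, the "No university" label, and the function name university_density are hypothetical; only the university_flag idea and the "University" level come from the report. The sketch shows how treating a missing flag as "no university" produces a density of 0, whereas excluding missing flags leaves the density undefined.

```python
from collections import defaultdict

# Hypothetical rows: (region, university_flag); None marks a missing flag.
towns = [
    ("South West", "University"),
    ("South West", "No university"),
    ("North West", "No university"),
    ("North West", "No university"),
    ("North West", "University"),
    ("North West", "No university"),
    ("London", None),
    ("London", None),
]


def university_density(rows, drop_missing):
    """Proportion of towns per region that are flagged as having a university.

    If drop_missing is True, towns with a missing flag are excluded from the
    denominator; a region with no usable towns then gets None instead of 0.
    """
    with_university = defaultdict(int)
    counted = defaultdict(int)
    regions = []
    for region, flag in rows:
        if region not in regions:
            regions.append(region)
        if flag is None and drop_missing:
            continue
        counted[region] += 1
        if flag == "University":
            with_university[region] += 1
    return {
        r: (with_university[r] / counted[r] if counted[r] else None)
        for r in regions
    }


naive = university_density(towns, drop_missing=False)   # missing counted as "no university"
strict = university_density(towns, drop_missing=True)   # missing excluded
print(naive)
print(strict)
```

Under these assumptions, the naive version reports 0 for London while the strict version reports an undefined value, which may be a more honest way to display regions with no usable data.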
The two plots above seem to imply that regions with higher education scores also have lower university density. For example, North West has the highest average education score but one of the lowest university densities, while South West has one of the highest university densities but one of the lowest average education scores. This seems somewhat counter-intuitive, since one may think that greater availability of higher education would increase overall educational attainment.
We further investigate the relationship between education scores and the presence of universities, looking for any potential associations or patterns between educational attainment levels and the availability of higher education opportunities through the use of a side-by-side violin-box-plot.
This plot seems to suggest that the conditional mean of education score given the presence of a university is not very different from the conditional mean given the absence of one. We then formally test this result with a t-test. By fitting a linear model of education score on the university flag, we perform a t-test on the difference between the two conditional means.
| | Model 1 |
|---|---|
| (Intercept) | 0.027 |
| | (0.807) |
| factor(university_flag)University | -0.557 |
| | (0.239) |
| R2 | 0.001 |
| Num.Obs. | 1100 |
From the table above, we note that the p-value on the coefficient of university flag (0.239, shown in parentheses) is greater than 0.05. Therefore, we fail to reject the null hypothesis that the conditional mean of education score is the same for towns with a university and towns without one. We have no statistical evidence that education score depends on the university flag, although failing to reject the null hypothesis does not prove that there is no effect.
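To see why a linear model on a binary flag tests a difference in conditional means, consider the following minimal Python sketch. The data and the names has_university, education_score, and ols_simple are hypothetical. With a 0/1 regressor, the least-squares intercept equals the mean of the group coded 0 and the slope equals the difference between the two group means.

```python
# Hypothetical data: 1 = town has a university, 0 = it does not.
has_university = [0, 0, 0, 0, 1, 1, 1]
education_score = [0.5, -1.0, 1.5, -0.2, 0.1, -1.3, -0.6]


def avg(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def ols_simple(x, y):
    """Least-squares intercept and slope for a single regressor."""
    x_bar, y_bar = avg(x), avg(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = ols_simple(has_university, education_score)

mean_without = avg([y for x, y in zip(has_university, education_score) if x == 0])
mean_with = avg([y for x, y in zip(has_university, education_score) if x == 1])

print(intercept, mean_without)          # intercept matches the mean of group 0
print(slope, mean_with - mean_without)  # slope matches the difference in means
```

Note that the t-test on this slope in an ordinary least-squares fit corresponds to the pooled-variance two-sample t-test, which assumes equal variances in the two groups.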
In this section, we aim to explore the relationship between education and income levels. To accomplish this, we will visualize the data and identify any correlations. We will also introduce other variables (such as population) to see the correlations between different features and their joint effects.
How do income levels influence education scores among young people in England? We first draw a violin plot to show the education scores in each income group.
The violin plot above displays the conditional distribution of education scores given the income level. We see that the average education score is lowest for higher deprivation towns and highest for lower deprivation towns, showing that higher deprivation (lower income) towns have lower education scores on average. The box for lower deprivation towns does not overlap with the box for higher deprivation towns, which may imply a significant difference between the two groups. We also see that lower deprivation towns have several outliers with very high education scores, while higher deprivation towns have several outliers with very low education scores.
To test this, we use three t-tests, one for each pair of income groups. Because we are performing 3 tests, we apply a Bonferroni correction and only reject at target \(\alpha = 0.05 / 3 \approx 0.0167\).
##
## Welch Two Sample t-test
##
## data: education_score by income_flag
## t = -27.485, df = 840.33, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Higher deprivation towns and group Lower deprivation towns is not equal to 0
## 95 percent confidence interval:
## -5.550171 -4.810303
## sample estimates:
## mean in group Higher deprivation towns mean in group Lower deprivation towns
## -2.421538 2.758699
##
## Welch Two Sample t-test
##
## data: education_score by income_flag
## t = 14.664, df = 467.25, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Lower deprivation towns and group Mid deprivation towns is not equal to 0
## 95 percent confidence interval:
## 2.963435 3.880538
## sample estimates:
## mean in group Lower deprivation towns mean in group Mid deprivation towns
## 2.758699 -0.663287
##
## Welch Two Sample t-test
##
## data: education_score by income_flag
## t = -8.1282, df = 381.26, p-value = 6.16e-15
## alternative hypothesis: true difference in means between group Higher deprivation towns and group Mid deprivation towns is not equal to 0
## 95 percent confidence interval:
## -2.183572 -1.332930
## sample estimates:
## mean in group Higher deprivation towns mean in group Mid deprivation towns
## -2.421538 -0.663287
From our t-tests, we observe that the p-values are all approximately zero, far below our rejection threshold adjusted for multiple testing. Thus, we reject the null hypothesis in each test and conclude that the differences in mean education score between Lower, Mid, and Higher deprivation towns are all significant.
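The adjusted threshold and the Welch statistic reported in the output above can be sketched in a few lines of Python. The two samples and the function name welch_t are hypothetical; the list of p-values is taken from the printed output, where two of the three are reported only as upper bounds (p < 2.2e-16).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Bonferroni correction for three pairwise comparisons.
n_tests = 3
alpha_adjusted = 0.05 / n_tests

# Hypothetical education scores for two income groups.
higher_deprivation = [-3.1, -2.0, -2.6, -1.9, -2.8]
lower_deprivation = [2.2, 3.4, 2.9, 1.8, 3.1, 2.7]
t_stat, df = welch_t(higher_deprivation, lower_deprivation)
print(t_stat, df)

# p-values (or their reported upper bounds) from the three tests above.
reported_p = [2.2e-16, 2.2e-16, 6.16e-15]
all_rejected = all(p < alpha_adjusted for p in reported_p)
print(alpha_adjusted, all_rejected)
```

Welch's version is used here because it does not assume equal variances across groups, which matches the "Welch Two Sample t-test" header in the output.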
Besides the effect of income alone, do other features such as population also influence education scores together with income? Here we use a scatter plot to show the relationship between population and education, with fitted linear trends to show the direction of the relationship.
From the plot, we can see that people who live in lower income areas tend to have lower education scores compared to those who live in higher income areas. There is no obvious linear relationship between population and education scores (the variability is larger for lower populations, and the points are almost symmetric around zero). We can also see that cities (with highest income) tend to have higher populations, while towns have a wide range of populations. This plot is informative because it addresses how the population and income of an area might influence the education levels of its residents, showing education scores, population, and town income together. Note that cities have high population values but relatively low education scores, which does not align with intuition. The city category also cannot be interpreted as an income level, so we think it might be an invalid label here.
In order to formally test the relationship between population and education scores, we fit a linear regression of education scores on log population and income level. In particular, we are interested in whether population and income can help us predict the level of education.
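Judging by the terms reported in the model table, the regression appears to include main effects and an interaction between log population and income level, with the remaining category (apparently cities) as the baseline. Under that assumption, and using our own notation for the coefficients, the specification can be sketched as:

\[
\text{education\_score}_i = \beta_0 + \beta_1 \log(\text{population\_2011}_i) + \sum_{k} \gamma_k \, \mathbb{1}[\text{income\_flag}_i = k] + \sum_{k} \delta_k \, \log(\text{population\_2011}_i) \, \mathbb{1}[\text{income\_flag}_i = k] + \varepsilon_i
\]

Here \(k\) ranges over Higher, Mid, and Lower deprivation towns, \(\gamma_k\) shifts the intercept for group \(k\), and \(\delta_k\) shifts the slope on log population for group \(k\).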
| | Model 1 |
|---|---|
| (Intercept) | -8.242 |
| | (0.682) |
| log(population_2011) | 0.450 |
| | (0.774) |
| income_flagHigher deprivation towns | 0.068 |
| | (0.997) |
| income_flagLower deprivation towns | 12.992 |
| | (0.519) |
| income_flagMid deprivation towns | 5.121 |
| | (0.800) |
| log(population_2011) × income_flagHigher deprivation towns | 0.123 |
| | (0.938) |
| log(population_2011) × income_flagLower deprivation towns | -0.660 |
| | (0.676) |
| log(population_2011) × income_flagMid deprivation towns | -0.199 |
| | (0.900) |
| R2 | 0.436 |
| Num.Obs. | 1100 |
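From the model summary above, none of the individual coefficients, including those for log population and income level, is statistically significant: every p-value (shown in parentheses) is greater than 0.05. This result should be interpreted with caution. The model's R2 of 0.436 indicates that the predictors jointly explain a substantial share of the variation in education scores, and the t-tests above showed clear differences between income groups. The large p-values may therefore reflect the difficulty of separating main effects from interaction terms rather than the absence of any relationship. Based on the individual coefficients alone, we cannot conclude that population size affects education scores beyond income level. Further investigation, such as comparing this model with one that omits the interaction terms, may be needed to better understand the factors influencing education scores in these areas.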
For our final research question, we are interested in seeing whether education scores are related to job density. The dataset categorizes towns and cities into three distinct job density classifications: Mixed, Residential, and Working. By plotting the density of education scores within each of these categories, the graph provides insights into how job density correlates with the level of education in these regions.
The distribution of education scores in mixed job density areas appears to be centered slightly below zero.
The distribution of education scores in residential areas is centered around zero but slightly positive, which is higher than the Mixed category, suggesting that the average educational attainment might be slightly higher in residential areas compared to mixed ones.
The distribution for areas with high job density shows the peak of education scores substantially higher than both the Mixed and Residential categories. This suggests that areas with a high density of jobs tend to have higher average educational attainment.
All three distributions seem to have a broadly similar spread, as indicated by the width of the curves, although the peak of the distribution for the working areas appears to be sharper and taller, indicating somewhat less variability in those areas.
The distributions appear to be slightly right skewed, with a longer tail extending towards the higher education scores. The “Working” areas have the most notable skewness and the “Residential” areas have the least. This could indicate that higher education scores are more prevalent in areas with a higher job density.
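The visual impression of skewness could be checked numerically. The following Python sketch computes the moment-based sample skewness for each job density category. The scores are hypothetical and the function name sample_skewness is our own; only the three category names come from the dataset. A positive value indicates a longer right tail.

```python
from statistics import mean


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (positive: longer right tail)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical education scores by job density category.
scores_by_job_density = {
    "Mixed": [-1.2, -0.6, -0.3, 0.0, 0.4, 2.5],
    "Residential": [-1.0, -0.4, 0.1, 0.3, 0.8, 1.4],
    "Working": [0.2, 0.5, 0.7, 0.9, 1.1, 4.8],
}

skewness = {group: sample_skewness(v) for group, v in scores_by_job_density.items()}
for group, g1 in skewness.items():
    print(group, round(g1, 3))
```

Comparing these values across categories would give a quantitative counterpart to the description of which group is most and least skewed.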
Through our analysis of the relationship between education scores and university density, income levels, population, and job density, we present the following findings:
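Based on our analysis, education scores are clearly associated with income deprivation: towns with lower deprivation have significantly higher average education scores. Areas with high job density also appear to have higher education scores, although we did not test this formally. In contrast, we found no statistically significant association between education scores and the presence of a university. Overall, our analysis underscores the importance of considering socio-economic factors in educational outcomes, providing valuable insights for policymakers and educators alike.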
Some questions were left for future work for several reasons. Firstly, we encountered limitations in data availability and completeness. For example, we only have access to categorical variables (income_flag) but not their quantitative values (income amounts), which prevented us from fully exploring certain aspects. Future work could involve collecting additional data, such as more detailed socio-economic indicators or longitudinal educational outcomes, to provide a more comprehensive understanding of the relationships under investigation. Secondly, we may want to analyze the interaction effects of multiple variables or assess causal relationships, which requires more advanced analytical methods such as non-linear modeling or other machine learning models. Future work could involve applying these techniques to unravel complex relationships within the data. By addressing these questions, the project can further contribute to the understanding of educational and socio-economic dynamics among young people in England, supporting informed policies and interventions that promote educational attainment and social mobility.