The file education.dat contains school expenditure in 1970 (SE70)
along with three candidate predictors: PI68 (average personal income,
1968), Y69 (school-aged population per capita, 1969), and Urban70
(urban population per capita, 1970), for the 50 states and
Washington, DC. The data also include variables for region (general)
and locale (specific). The question of interest is, ``which variables
have an effect on school expenditures?''
> education <- read.table("education.dat", header=TRUE)
> is.factor(education$Region)
[1] T
> is.factor(education$Locale)
[1] T
> attach(education)
Let's try a model with all of the numeric predictors.
> summary(education.lm <- lm(SE70 ~ PI68 + Y69 + Urban70))
Call: lm(formula = SE70 ~ PI68 + Y69 + Urban70)
Residuals:
    Min     1Q Median    3Q   Max
 -60.24 -15.74 -1.156 15.88 51.38
Coefficients:
                Value Std. Error t value Pr(>|t|)
(Intercept) -286.8388    64.9199 -4.4183   0.0001
       PI68    0.0807     0.0093  8.6738   0.0000
        Y69    0.8173     0.1598  5.1151   0.0000
    Urban70   -0.1058     0.0343 -3.0863   0.0034
Residual standard error: 26.69 on 47 degrees of freedom
Multiple R-Squared: 0.6896
F-statistic: 34.81 on 3 and 47 degrees of freedom, the p-value is 5.337e-12
Correlation of Coefficients:
        (Intercept)    PI68     Y69
   PI68     -0.3064
    Y69     -0.9398  0.0934
Urban70     -0.0711 -0.6784  0.0381
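This model appears to fit well (note the reasonably high multiple
R-squared of 0.6896 and the small p-values). However, the coefficient
estimates for Urban70 and PI68 are strongly correlated (-0.6784 in the
table above), which suggests that these two predictors are collinear,
so Urban70 should be removed from the model.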
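The collinearity judgment above can be checked more directly. The following is a minimal sketch in R, not part of the original session. The helper names coef_correlation and vif are hypothetical, and using variance inflation factors here is our own suggestion rather than something the original analysis does. The first helper reproduces a coefficient correlation matrix from the estimated covariance of the coefficients. The second computes a variance inflation factor for each predictor column, using the identity that the VIFs are the diagonal of the inverse of the predictors' correlation matrix.

```r
# Sketch only: coef_correlation() and vif() are hypothetical helper names,
# built from base R functions (vcov, cov2cor, model.matrix, cor, solve).

# Correlation matrix of the coefficient estimates, obtained by rescaling
# their estimated covariance matrix.
coef_correlation <- function(fit) {
  cov2cor(vcov(fit))
}

# Variance inflation factor for each non-intercept column of the design
# matrix. VIF_j equals 1 / (1 - R^2_j), where R^2_j comes from regressing
# column j on the other columns; equivalently, it is the j-th diagonal
# element of the inverse of the columns' correlation matrix.
vif <- function(fit) {
  X <- model.matrix(fit)
  X <- X[, colnames(X) != "(Intercept)", drop = FALSE]
  setNames(diag(solve(cor(X))), colnames(X))
}

# Usage with the model fitted above (assumes education.lm and education
# exist in the workspace):
# coef_correlation(education.lm)
# vif(education.lm)
# cor(education[, c("PI68", "Y69", "Urban70")])
```

A common rule of thumb treats VIF values well above 5 or 10 as a sign of problematic collinearity. Looking at cor() of the predictors themselves is also worthwhile, since a correlation between coefficient estimates is not the same quantity as a correlation between the predictors.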