The file education.dat
contains data for modeling school expenditure in 1970 (SE70)
using PI68 (average personal income, 1968), Y69 (school-aged population
per capita, 1969), and Urban70 (urban population per capita, 1970)
for the 50 states and Washington, DC. The data also include factor
variables for region (Region, general) and locale (Locale, specific).
The question of interest is, ``Which variables have an effect on school expenditures?''
> education <- read.table("education.dat",header=T)
> is.factor(education$Region)
[1] T
> is.factor(education$Locale)
[1] T
> attach(education)

Let's try a model with all of the numeric predictors.
> summary(education.lm <- lm(SE70 ~ PI68 + Y69 + Urban70))

Call: lm(formula = SE70 ~ PI68 + Y69 + Urban70)
Residuals:
    Min     1Q Median    3Q   Max
 -60.24 -15.74 -1.156 15.88 51.38

Coefficients:
                Value Std. Error  t value Pr(>|t|)
(Intercept) -286.8388    64.9199  -4.4183   0.0001
       PI68    0.0807     0.0093   8.6738   0.0000
        Y69    0.8173     0.1598   5.1151   0.0000
    Urban70   -0.1058     0.0343  -3.0863   0.0034

Residual standard error: 26.69 on 47 degrees of freedom
Multiple R-Squared: 0.6896
F-statistic: 34.81 on 3 and 47 degrees of freedom, the p-value is 5.337e-12

Correlation of Coefficients:
        (Intercept)    PI68    Y69
   PI68     -0.3064
    Y69     -0.9398  0.0934
Urban70     -0.0711 -0.6784 0.0381

This model appears to fit well (note the high multiple R-squared and small p-values). However,
Urban70 and PI68 are highly collinear (note the correlation of -0.6784
between their coefficient estimates), so Urban70 should be removed from the model.
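
One way to look at collinearity more directly is to examine the predictors themselves rather than only the correlation of the coefficient estimates. The following is a minimal sketch in R, not part of the original session. It uses a synthetic data frame (called toy here, a hypothetical name) with the same column names but made-up values, so its numbers say nothing about the real education data. The helper vif is also hypothetical; it computes a variance inflation factor as 1 / (1 - R^2) from a regression of one predictor on the others.

```r
# Synthetic stand-in for the education data: same column names, made-up values.
set.seed(1)
n <- 51
PI68 <- rnorm(n, mean = 3000, sd = 500)
Urban70 <- 0.2 * PI68 + rnorm(n, sd = 60)  # deliberately correlated with PI68
Y69 <- rnorm(n, mean = 360, sd = 25)
SE70 <- -280 + 0.07 * PI68 + 0.8 * Y69 + rnorm(n, sd = 25)
toy <- data.frame(SE70, PI68, Y69, Urban70)

predictors <- c("PI68", "Y69", "Urban70")

# Pairwise correlations among the predictors themselves.
pred.cor <- cor(toy[, predictors])
print(round(pred.cor, 3))

# Variance inflation factor for one predictor: 1 / (1 - R^2), where R^2 comes
# from regressing that predictor on the remaining predictors.
vif <- function(name, predictors, data) {
  others <- setdiff(predictors, name)
  fit <- lm(reformulate(others, response = name), data = data)
  1 / (1 - summary(fit)$r.squared)
}
vifs <- sapply(predictors, vif, predictors = predictors, data = toy)
print(round(vifs, 2))

# Refit without Urban70 and compare the standard error of the PI68 coefficient.
full.lm <- lm(SE70 ~ PI68 + Y69 + Urban70, data = toy)
reduced.lm <- update(full.lm, . ~ . - Urban70)
print(coef(summary(full.lm))["PI68", "Std. Error"])
print(coef(summary(reduced.lm))["PI68", "Std. Error"])
```

With the real data, the same steps could be applied to the education data frame in place of toy. A large correlation between PI68 and Urban70, or a variance inflation factor well above 1, would support dropping one of the two predictors.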