
Multiple Regression

The file education.dat contains school expenditure in 1970 (SE70) along with three numeric predictors, PI68 (average personal income, 1968), Y69 (school-aged population per capita, 1969), and Urban70 (urban population per capita, 1970), for the 50 states and Washington, DC. The data also include two categorical variables, Region (a general grouping) and Locale (a more specific one). The question of interest is, "Which variables have an effect on school expenditures?"

> education <- read.table("education.dat",header=T)
> is.factor(education$Region)
[1] T
> is.factor(education$Locale)
[1] T
> attach(education)

Let's try a model with all of the numeric predictors.

> summary(education.lm <- lm(SE70 ~ PI68 + Y69 + Urban70))

Call: lm(formula = SE70 ~ PI68 + Y69 + Urban70)
Residuals:
    Min     1Q Median    3Q   Max
 -60.24 -15.74 -1.156 15.88 51.38

Coefficients:
                Value Std. Error   t value  Pr(>|t|)
(Intercept) -286.8388   64.9199    -4.4183    0.0001
       PI68    0.0807    0.0093     8.6738    0.0000
        Y69    0.8173    0.1598     5.1151    0.0000
    Urban70   -0.1058    0.0343    -3.0863    0.0034

Residual standard error: 26.69 on 47 degrees of freedom
Multiple R-Squared: 0.6896
F-statistic: 34.81 on 3 and 47 degrees of freedom, the p-value is 5.337e-12

Correlation of Coefficients:
        (Intercept)    PI68     Y69
   PI68 -0.3064
    Y69 -0.9398      0.0934
Urban70 -0.0711     -0.6784  0.0381
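
As a reading aid for the coefficient table, the sketch below recomputes the t values and two-sided p-values from the printed Value and Std. Error columns. It is written for R rather than S-PLUS, the names value, std.error, t.check, and p.check are illustrative only, and because the inputs are the rounded figures shown above, the results agree with the printed table only to about two decimal places.

```r
# Estimates and standard errors copied from the printed coefficient table
value     <- c("(Intercept)" = -286.8388, PI68 = 0.0807, Y69 = 0.8173, Urban70 = -0.1058)
std.error <- c("(Intercept)" =   64.9199, PI68 = 0.0093, Y69 = 0.1598, Urban70 =  0.0343)

# Each t value is the estimate divided by its standard error
t.check <- value / std.error

# Residual degrees of freedom: 51 observations (50 states plus DC)
# minus 4 estimated coefficients
n.obs    <- 51
resid.df <- n.obs - length(value)

# Two-sided p-value from the t distribution with resid.df degrees of freedom
p.check <- 2 * pt(-abs(t.check), df = resid.df)

print(resid.df)
print(round(cbind(t.check, p.check), 4))
```

The residual degrees of freedom computed this way match the 47 reported with the residual standard error.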

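This model appears to fit reasonably well: the multiple R-squared is 0.6896 and every coefficient has a small p-value. However, the estimated coefficients for PI68 and Urban70 have a correlation of -0.6784, which suggests that these two predictors are collinear, so Urban70 is a candidate for removal from the model.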
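
The correlation between coefficient estimates is only an indirect sign of collinearity. A more direct check is to look at the correlations among the predictors themselves and at their variance inflation factors (VIFs). The following is a minimal sketch in R, not part of the original analysis. Since education.dat is not reproduced here, it runs on a synthetic data frame sim whose values are invented purely for illustration, and the helper vif.by.hand is a hypothetical name.

```r
# Synthetic stand-in for education.dat: these values are simulated
# for illustration and are NOT the real data.
set.seed(1)
n       <- 51
PI68    <- rnorm(n, mean = 3200, sd = 500)
Urban70 <- 660 + 0.2 * (PI68 - 3200) + rnorm(n, sd = 100)  # built to track PI68
Y69     <- rnorm(n, mean = 360, sd = 25)
SE70    <- -280 + 0.08 * PI68 + 0.8 * Y69 - 0.1 * Urban70 + rnorm(n, sd = 25)
sim     <- data.frame(SE70, PI68, Y69, Urban70)

predictors <- c("PI68", "Y69", "Urban70")

# Pairwise correlations among the predictors
pred.cor <- cor(sim[, predictors])
print(round(pred.cor, 3))

# VIF for one predictor: regress it on the other predictors and
# compute 1 / (1 - R^2). Values well above 1 indicate collinearity.
vif.by.hand <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif.by.hand(sim, predictors)
print(round(vifs, 2))
```

With the real data attached, the same two steps would be applied to the columns PI68, Y69, and Urban70 of education. A large correlation between PI68 and Urban70, or a VIF well above 1 for either, would support dropping one of them.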

Brian Junker 2002-08-26