
Removing Outliers

If you plot this model, you will see that one state, Alaska, has a relatively large Cook's distance. To exclude an outlier from the analysis, tell S-PLUS to fit the model to only a subset of the data.

> summary(lm(SE70 ~ PI68 + Y69 + Locale, subset=-50))
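The above produces a summary of a regression that excludes Alaska, which is row 50 of the data. Because no data argument is given, this form relies on the education data frame being attached. Another way to get the same fit is: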

> summary(lm(SE70 ~ PI68 + Y69 + Locale, data=education[-50,]))
Here education[-50,] means the education data frame without row 50 but with all columns. I think the first way is the "proper" thing to do; my understanding is that the data argument exists to allow modeling without attaching a data frame.
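The sketch below is written in R, whose model syntax follows S; it has not been checked under S-PLUS, where `cooks.distance` and `which.max` may not be available. It uses a small hypothetical data frame, `toy`, in place of `education`. Instead of hard-coding a row number, it finds the row with the largest Cook's distance, then confirms that the `subset=` form and the row-indexing form give the same coefficients. It also shows that `subset=` can be combined with `data=`, so the variables need not be attached.

```r
# Hypothetical toy data standing in for the education data frame;
# the last observation is deliberately far off the line.
toy <- data.frame(x = 1:10,
                  y = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 40))

# Fit to all rows and locate the observation with the largest Cook's distance.
fit.all <- lm(y ~ x, data = toy)
cd <- cooks.distance(fit.all)
worst <- unname(which.max(cd))

# Way 1: keep data= and drop the row with subset= (no attach() needed).
fit1 <- lm(y ~ x, data = toy, subset = -worst)

# Way 2: drop the row from the data frame itself, keeping all columns.
fit2 <- lm(y ~ x, data = toy[-worst, ])

# Both fits use the same nine rows, so the coefficients agree.
print(worst)
print(all.equal(coef(fit1), coef(fit2)))
```

With the real data, the combined form would be `lm(SE70 ~ PI68 + Y69 + Locale, data=education, subset=-50)`.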



Brian Junker 2002-08-26