START WITH END OF KIDIQ!!!

Anscombe's quartet as a warning: if identical fit statistics can hide data that are a visual disaster, what about data you can't check visually?
(also make a random data set that has Anscombe's fit statistics: intercept ~3, slope ~0.5, R^2 ~0.67, residual SE ~1.24)
y <- .5*(x <- rnorm(11,0,3)) + rnorm(11,3,1.236)
summary(lm(y~x))

Discuss the standard R diagnostic plots:
- residuals (raw vs standardized)
- normal Q-Q plot: normality considerations (pp 69 ff)
- scale-location plot of the standardized residuals (it plots sqrt(|standardized residual|); the sqrt reduces skewness)
- standardized residuals vs leverage: "good" vs "bad" leverage points
-- leverage point: h_ii > 4/n (the average h_ii is 2/n in simple regression) [generally h_ii > 2*mean(h_ii)]
-- good vs bad: a leverage point is bad if |standardized residual| > 2
- Cook's D (pp 67-68, somewhat compressed)

http://strata.uga.edu/8370/rtips/regressionPlots.html
-- has some nice examples, with code at the end
-- could make a good exercise
-- could pre-make the data sets...

General Stata-centric discussion of residuals:
http://www.philender.com/courses/linearmodels/notes1/resid0.html

Recommendations, for (a) patterns and (b) leverage points:
- generally these are conversation points and could be very informative about things the scientist cares about
- delete or edit data ONLY after careful investigation and explanation

-------------------------------------------------

** Package "car" has most of the useful alr functions (renamed). summary(powerTransform(...)) does Box-Cox...
** alr3 and alr4 have the data sets

Transformations
- transform X (full discussion waits for multiple regression)
- transform Y (deal with here)

Transforms on Y (mostly power transforms)
- note that log is the limit of the power transforms: (y^lambda - 1)/lambda -> log(y) as lambda -> 0
- intuitive variance stabilization (e.g., sqrt for counts, log when the error is multiplicative)
- automagic (Box-Cox): bctrans command from library(alr3)
-- why not just take the estimated power at face value?
-- natural power laws are never y^(0.2356)

Why it's kind of hopeless to focus on y itself:
you really want to transform y so that the RESIDUALS are normal; this is indirect and difficult, and it is where Box-Cox comes in theoretically.
- you can also "guess and check", or use Box-Cox to suggest a convenient power
- inverse response plot: inverse.response.plot from library(alr3) (inverseResponsePlot in car)

Summary of power transforms:
- first try something intuitive and/or related to the substantive theory
- focus on the distribution of the residuals, not of x or y
- as a last resort, use Box-Cox or a similar method to suggest a convenient power
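A minimal R sketch for the Anscombe warning at the top of these notes. It uses the `anscombe` data frame that ships with base R to show that the four regressions share essentially the same intercept, slope, R^2 and residual SE. It then builds a random data set the same way as the one-liner in the notes. The seed is arbitrary, and with n = 11 the random data only roughly match Anscombe's numbers. The plots are the point.

```r
# Anscombe's quartet: base R ships it as the data frame `anscombe`
# (columns x1..x4 and y1..y4, 11 rows).
fits <- lapply(1:4, function(i) {
  d <- data.frame(x = anscombe[[paste0("x", i)]],
                  y = anscombe[[paste0("y", i)]])
  lm(y ~ x, data = d)
})

# Collect the usual fit statistics, one row per data set.
anscombe_stats <- t(sapply(fits, function(m) {
  s <- summary(m)
  c(intercept = unname(coef(m)[1]),
    slope     = unname(coef(m)[2]),
    r.squared = s$r.squared,
    sigma     = s$sigma)
}))
print(round(anscombe_stats, 2))   # four essentially identical rows

# A random data set with roughly the same fit statistics, generated
# as in the notes: slope 0.5, intercept 3, error SD 1.236.
set.seed(8370)   # arbitrary seed, for reproducibility only
x <- rnorm(11, 0, 3)
y <- 0.5 * x + rnorm(11, 3, 1.236)
print(summary(lm(y ~ x)))

# Always look: the same numbers can come from very different pictures.
op <- par(mfrow = c(2, 3))
for (i in 1:4) {
  plot(y ~ x, data = fits[[i]]$model, main = paste("Anscombe", i))
  abline(fits[[i]])
}
plot(x, y, main = "random data")
abline(lm(y ~ x))
par(op)
```

Rerunning the random part with different seeds is a quick way to show students how much the summary statistics wobble at n = 11.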
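A sketch of the standard diagnostic plots together with the leverage rules of thumb quoted in the notes (h_ii > 4/n for a leverage point, "bad" if |standardized residual| > 2). The data are simulated with one planted bad leverage point. The data set, seed, and variable names are illustrative assumptions and do not come from a particular text. The cutoffs are the rules of thumb above, not hard limits.

```r
# Simulated data with one planted high-leverage point (illustrative only).
set.seed(8370)
n <- 30
x <- c(runif(n - 1, 0, 10), 25)   # last x value is far from the rest
y <- 2 + 0.8 * x + rnorm(n)
y[n] <- 2                         # ...and far below the true line
m <- lm(y ~ x)

# The four standard plots: residuals vs fitted, normal Q-Q,
# scale-location (sqrt(|standardized resid|) vs fitted),
# and standardized residuals vs leverage (with Cook's D contours).
op <- par(mfrow = c(2, 2))
plot(m)
par(op)

h  <- hatvalues(m)       # leverages h_ii; they sum to p, so mean(h) = p/n
r  <- rstandard(m)       # standardized residuals
cd <- cooks.distance(m)  # Cook's distance

# Rules of thumb from the notes (simple regression, p = 2):
high_lev <- h > 4 / n                  # 4/n = 2 * average leverage
bad_lev  <- high_lev & abs(r) > 2      # leverage point with a large residual
good_lev <- high_lev & abs(r) <= 2     # leverage point that follows the pattern

flagged <- data.frame(h = round(h, 3), r_std = round(r, 2),
                      cooks_d = round(cd, 3), good_lev, bad_lev)
print(flagged[high_lev | abs(r) > 2, ])
```

Moving `y[n]` back onto the line (e.g. `y[n] <- 2 + 0.8 * 25`) turns the same point into a "good" leverage point, which makes a nice before/after exercise.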
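A quick simulation for "intuitive variance stabilization", assuming two standard textbook cases chosen only for illustration. For Poisson counts the variance equals the mean, so sqrt roughly stabilizes it. With a multiplicative lognormal error, the SD grows in proportion to the mean, so log stabilizes it.

```r
set.seed(1)

# Poisson counts: Var(y) = mean, so SD(y) grows like sqrt(mean);
# sqrt(y) has SD close to 1/2 whatever the mean.
means   <- c(5, 20, 80, 320)
counts  <- lapply(means, function(mu) rpois(5000, mu))
pois_sd <- sapply(counts, sd)
sqrt_sd <- sapply(counts, function(y) sd(sqrt(y)))

# Multiplicative error: y = mu * exp(e), e ~ N(0, 0.2), so SD(y) is
# proportional to mu, while log(y) = log(mu) + e has SD 0.2 throughout.
mus     <- c(1, 10, 100, 1000)
mult    <- lapply(mus, function(mu) mu * exp(rnorm(5000, 0, 0.2)))
mult_sd <- sapply(mult, sd)
log_sd  <- sapply(mult, function(y) sd(log(y)))

print(rbind(mean = means, sd_raw = round(pois_sd, 2), sd_sqrt = round(sqrt_sd, 3)))
print(rbind(level = mus, sd_raw = round(mult_sd, 2), sd_log = round(log_sd, 3)))
```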
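A hand-rolled version of the Box-Cox profile likelihood. It makes the theory concrete and checks numerically that log is the lambda -> 0 limit of the scaled power family. The simulated data assume sqrt(y) is linear in x with normal errors, so the estimate should land near 0.5, which you would then round to the convenient power 0.5. The helpers `bc` and `bc_loglik` are hypothetical names for this sketch. If car is installed, `summary(powerTransform(...))` reports the same kind of estimate with likelihood-ratio tests.

```r
# Scaled power transform; lambda = 0 is defined as log(y), the limit
# of (y^lambda - 1) / lambda as lambda -> 0.
bc <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Profile log-likelihood of lambda for the model bc(y, lambda) ~ x,
# up to an additive constant. The Jacobian term (lambda - 1) * sum(log y)
# is what makes different lambdas comparable.
bc_loglik <- function(lambda, y, x) {
  n   <- length(y)
  rss <- sum(resid(lm(bc(y, lambda) ~ x))^2)
  -n / 2 * log(rss / n) + (lambda - 1) * sum(log(y))
}

set.seed(42)
x <- runif(100, 1, 10)
y <- (2 + 0.7 * x + rnorm(100, 0, 0.4))^2   # sqrt(y) is linear in x

grid <- seq(-1, 2, by = 0.01)
ll   <- sapply(grid, bc_loglik, y = y, x = x)
lambda_hat <- grid[which.max(ll)]

# Approximate 95% interval: lambdas within qchisq(0.95, 1) / 2 of the max.
ci <- range(grid[ll > max(ll) - qchisq(0.95, 1) / 2])
print(c(lambda_hat = lambda_hat, lower = ci[1], upper = ci[2]))

plot(grid, ll, type = "l", xlab = "lambda", ylab = "profile log-likelihood")
abline(v = ci, lty = 2)

# log really is the lambda -> 0 limit:
print(max(abs(bc(y, 1e-4) - log(y))))

# Same idea via car, if available:
if (requireNamespace("car", quietly = TRUE)) {
  print(summary(car::powerTransform(lm(y ~ x))))
}
```

The point for class: the estimate will be something like 0.47 or 0.53, not exactly 0.5. The interval is what tells you 0.5 is a perfectly good, interpretable choice.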
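A sketch of the idea behind the inverse response plot. First fit y ~ x. Then regress the fitted values on y^lambda (log y at lambda = 0) over a grid and keep the lambda with the smallest RSS. As in the Box-Cox sketch, the data assume sqrt(y) is linear in x, and the helper `pw` is a hypothetical name. `car::inverseResponsePlot` draws this plot for a fitted lm. It appears to be the car rename of alr3's `inverse.response.plot`.

```r
set.seed(7)
x <- runif(100, 1, 10)
y <- (2 + 0.7 * x + rnorm(100, 0, 0.4))^2   # sqrt(y) is linear in x
m <- lm(y ~ x)
yhat <- fitted(m)

# Scaled power with log at lambda = 0; the scaling does not change the RSS
# because the regression below has an intercept.
pw <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Regress the fitted values on the transformed response for each lambda.
grid <- seq(-1, 2, by = 0.01)
rss  <- sapply(grid, function(l) sum(resid(lm(yhat ~ pw(y, l)))^2))
lambda_irp <- grid[which.min(rss)]
print(lambda_irp)

# The inverse response plot: fitted values vs y, with the best curve
# and the log (0) and untransformed (1) curves for comparison.
lams <- c(lambda_irp, 0, 1)
o <- order(y)
plot(y, yhat, xlab = "y", ylab = "fitted values")
for (k in seq_along(lams)) {
  fit_k <- fitted(lm(yhat ~ pw(y, lams[k])))
  lines(y[o], fit_k[o], lty = k)
}
legend("topleft", legend = paste("lambda =", round(lams, 2)), lty = seq_along(lams))

if (requireNamespace("car", quietly = TRUE)) {
  print(car::inverseResponsePlot(m))
}
```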