START WITH END OF KIDIQ

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Anscombe as a warning -- if the summary statistics can mislead this
badly when the data are an obvious visual disaster, what about when
you can't tell visually?
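
A minimal sketch of the Anscombe point, using the `anscombe` data frame
that ships with R (columns x1..x4, y1..y4). The names `fits` and `stats`
are just illustrative. All four fits give nearly identical summaries even
though the scatterplots look nothing alike:

```r
# Fit the same simple regression to each of Anscombe's four data sets.
fits <- lapply(1:4, function(i) {
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe)
})

# Collect the headline fit statistics, one row per data set.
stats <- t(sapply(fits, function(f) {
  s <- summary(f)
  c(intercept = unname(coef(f)[1]),
    slope     = unname(coef(f)[2]),
    sigma     = s$sigma,
    r.squared = s$r.squared)
}))
print(round(stats, 2))
```

Plotting each pair (e.g. plot(y3 ~ x3, data = anscombe)) is what exposes
the differences the table hides.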

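(also make a random data set that roughly matches Anscombe's fit statistics)
 x <- rnorm(11, 9, sqrt(11))            # Anscombe's x: mean 9, variance 11
 y <- 3 + 0.5*x + rnorm(11, 0, 1.236)   # line y = 3 + 0.5x, resid SE ~ 1.236
 summary(lm(y ~ x))                     # matches only up to sampling noise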


discuss standard R diagnostic plots:

residuals (raw vs standardized)
QQ plot
  - normality considerations (pp 69 ff)
scale-location plot for standardized residuals (sqrt reduces skewness)
standardized residuals vs leverage
  - "good" vs "bad" leverage points
  - high leverage: h_ii > 4/n (average is 2/n in simple regression)
    [generally h_ii > 2*mean(h_ii) = 2p/n]
  - a leverage point is "bad" if |standardized residual| > 2, else "good"
Cook's D (pp 67 - 68, somewhat compressed)
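
A rough sketch of the quantities behind these plots, using base R only.
The choice of Anscombe's third data set and the names `flags`,
`high_leverage` and `big_resid` are assumptions for illustration; the
cutoffs are the rules of thumb listed above, not hard tests:

```r
# Illustrative data: Anscombe's third set, which has one obvious outlier.
fit <- lm(y3 ~ x3, data = anscombe)
n <- nobs(fit)
p <- length(coef(fit))      # p = 2 for simple regression

h    <- hatvalues(fit)      # leverages h_ii; their mean is p/n
rstd <- rstandard(fit)      # standardized residuals
cook <- cooks.distance(fit) # Cook's distance for each observation

flags <- data.frame(
  leverage      = round(h, 3),
  std_resid     = round(rstd, 2),
  cooks_d       = round(cook, 3),
  high_leverage = h > 2 * p / n,    # rule of thumb: h_ii > 2 * mean(h_ii)
  big_resid     = abs(rstd) > 2     # rule of thumb: |std resid| > 2
)
print(flags)

# The four default plots: residuals vs fitted, QQ, scale-location,
# and standardized residuals vs leverage (with Cook's distance contours).
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
```

plot(fit, which = 4) draws Cook's distance by observation if a separate
panel is wanted.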

http://strata.uga.edu/8370/rtips/regressionPlots.html
 -- has some nice examples, with code at the end
 -- could make a good exercise
    -- could pre-make the data sets...

general Stata-centric discussion of residuals:
http://www.philender.com/courses/linearmodels/notes1/resid0.html

Recommendations:

(a) patterns
(b) leverage points

generally these are conversation points & could be very informative
about things the scientist cares about
  - delete or edit data ONLY after good investigation & explanation

-------------------------------------------------

** Package "car" has most of the useful alr functions (renamed).
   summary(powerTransform) does Box-Cox...
   
** alr3 and alr4 have the data sets

transformations

- transform X  (full discussion waits for multiple regression)
- transform Y  (deal with here)

transforms on Y (mostly power xforms)

  note that log is the limit of power xforms
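
  A quick numerical check of that limit, as a sketch. It assumes the
  scaled power form (y^lambda - 1)/lambda; the helper name `scaled_power`
  is made up for illustration:

```r
# Scaled power family: (y^lambda - 1) / lambda, defined as log(y) at lambda = 0.
scaled_power <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

y <- c(0.5, 1, 2, 10)

# As lambda shrinks toward 0 the scaled power approaches log(y).
for (lambda in c(1, 0.1, 0.01, 0.001)) {
  gap <- max(abs(scaled_power(y, lambda) - log(y)))
  cat(sprintf("lambda = %5.3f   max |scaled power - log(y)| = %.5f\n",
              lambda, gap))
}
```

  The subtraction of 1 and division by lambda are what make the family
  continuous at lambda = 0; the raw power y^lambda just goes to 1.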

  Intuitive
  variance stabilization
  Automagic (Box-Cox)
   -- bctrans command from library(alr3).
   -- why not..
   -- natural power laws are never y^(0.2356).
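
  A hedged sketch of "use Box-Cox to suggest, then pick something
  convenient". It swaps in MASS::boxcox (MASS ships with R) rather than the
  alr3/car functions named in these notes, and the data are simulated so
  that log(y) is linear in x; `candidates` and `convenient` are
  illustrative names:

```r
library(MASS)   # assumption: MASS::boxcox stands in for bctrans / powerTransform

set.seed(1)
x <- runif(200, 1, 10)
y <- exp(1 + 0.4 * x + rnorm(200, sd = 0.3))   # simulated: log(y) linear in x

# Profile log-likelihood over a grid of lambda values (no plot).
bc <- boxcox(lm(y ~ x), lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
cat("grid maximum of the profile likelihood: lambda =", lambda_hat, "\n")

# Snap to the nearest interpretable power instead of reporting y^(0.2356).
candidates <- c(-1, -0.5, 0, 0.5, 1)
convenient <- candidates[which.min(abs(candidates - lambda_hat))]
cat("nearest convenient power:", convenient, "(0 means log)\n")
```

  Calling boxcox() with the default plotit = TRUE also draws the
  likelihood curve with an approximate 95% interval, which is a better
  guide than the point estimate alone.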

  why it's kind of hopeless to focus on y

  you really want to xform y so that RESIDS are normal
  this is indirect and difficult
     - Box-Cox comes in theoretically here.
     - you can also "guess and check", or use Box-Cox to suggest
       a convenient power
     - inverse response plot
        inverse.response.plot from library(alr3)
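
  A base-R sketch of the idea behind the inverse response plot, not the
  alr3 function itself: fit y ~ x untransformed, then see which power of y
  is most nearly linear in the fitted values. The data are simulated so
  that sqrt(y) is linear in x; `scaled_power`, `lambdas` and `best` are
  illustrative names:

```r
set.seed(2)
x <- seq(1, 10, length.out = 50)
y <- (2 + 0.5 * x + rnorm(50, sd = 0.05))^2   # simulated: sqrt(y) linear in x

yhat <- fitted(lm(y ~ x))                     # fitted values, y untransformed

scaled_power <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# For each candidate power, regress yhat on the transformed y and record the RSS.
lambdas <- c(-1, -0.5, 0, 0.5, 1)
rss <- sapply(lambdas, function(l) deviance(lm(yhat ~ scaled_power(y, l))))
names(rss) <- lambdas
print(signif(rss, 3))

best <- lambdas[which.min(rss)]
cat("power with smallest RSS:", best, "\n")

# The plot itself: yhat against y, with the chosen power's curve overlaid.
plot(y, yhat)
o <- order(y)
lines(y[o], fitted(lm(yhat ~ scaled_power(y, best)))[o])
```

  Comparing only a handful of convenient powers, rather than optimizing
  over lambda, keeps this in line with the "guess and check" approach.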

summary of power xforms

  - first try something intuitive and/or related to the substantive
    theory
  - focus on distrib of resids, not x or y
  - as a last resort, use Box-Cox or a similar method to suggest a
    convenient power