The following data is from Box, Hunter and Hunter (1978) and is also analyzed in Chapter 13 of the SPLUS Guide to Statistics. It gives blood coagulation times for each of four diets.
We enter the data more or less as in the SPLUS manual:
402 > coag.times _ scan()
1: 62 63 68 56
5: 60 67 66 62
9: 63 71 71 60
13: 59 64 67 61
17: 65 68 63
20: 66 68 64
23: 63
24: 59
25:
402 > diet _ factor(c(rep(LETTERS[1:4],4),rep(LETTERS[1:3],2),c("A","A")))
402 > split(coag.times,diet) # check that the factor labels are right
$A:
[1] 62 60 63 59 65 66 63 59

$B:
[1] 63 67 71 64 68 68

$C:
[1] 68 66 71 67 63 64

$D:
[1] 56 62 60 61

402 > sapply(split(coag.times,diet),mean)
     A        B    C     D
62.125 66.83333 66.5 59.75
402 > coag _ data.frame(coag=coag.times,diet=diet)
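As a cross-check on the factor construction, here is a minimal Python sketch. It is an illustration, not part of the S-PLUS session, and the names `coag_times`, `diet`, `groups` and `means` are hypothetical. It rebuilds the diet labels the way the `rep()` calls do, groups the times as `split()` does, and recomputes the diet means.

```python
# Python sketch mirroring the S-PLUS data entry (names are illustrative only).
from collections import defaultdict

# Coagulation times in the order they were typed into scan().
coag_times = [62, 63, 68, 56, 60, 67, 66, 62, 63, 71, 71, 60,
              59, 64, 67, 61, 65, 68, 63, 66, 68, 64, 63, 59]

# Same labels as c(rep(LETTERS[1:4],4), rep(LETTERS[1:3],2), c("A","A")):
# rep(x, 4) repeats the whole vector x four times.
diet = list("ABCD") * 4 + list("ABC") * 2 + ["A", "A"]
assert len(diet) == len(coag_times)

# Analogue of split(coag.times, diet).
groups = defaultdict(list)
for time, label in zip(coag_times, diet):
    groups[label].append(time)

# Analogue of sapply(split(coag.times, diet), mean).
means = {label: sum(vals) / len(vals) for label, vals in sorted(groups.items())}

if __name__ == "__main__":
    for label, vals in sorted(groups.items()):
        print(label, vals, round(means[label], 5))
```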
Now we fit the model, look at some diagnostic plots, and consider the analysis of variance table. More details on the SPLUS parts of the problem can be found in Chapter 13 of the SPLUS Guide to Statistics.
402 > coag.aov _ aov(coag ~ diet, data=coag)
402 > par(mfrow=c(2,3))
402 > plot(coag.aov)
402 > model.tables(coag.aov,type="means")
Refitting model to allow projection
Tables of means
Grand mean

64

 diet
         A     B    C     D
     62.12 66.83 66.5 59.75
rep   8.00  6.00  6.0  4.00
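As a small worked check (plain arithmetic, not additional S-PLUS output), the grand mean of 64 reported by `model.tables` is the average of all 24 observations, which equals the cell means weighted by their replication counts `rep`:

$$\bar{y}_{\cdot\cdot} = \frac{1}{n}\sum_{i=1}^{k} n_i \bar{y}_{i\cdot} = \frac{8(62.125) + 6(66.8\overline{3}) + 6(66.5) + 4(59.75)}{24} = \frac{497 + 401 + 399 + 239}{24} = \frac{1536}{24} = 64.$$

Because the design is unbalanced, this differs from the unweighted average of the four cell means, $(62.125 + 66.8\overline{3} + 66.5 + 59.75)/4 \approx 63.80$.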
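The cell means $\bar{y}_{i\cdot}$ and the grand mean $\bar{y}_{\cdot\cdot}$ are illustrated in the figure below; the cell means $\bar{y}_{i\cdot}$ are the estimates of the $\mu_i$'s in the cell means model

$$y_{ij} = \mu_i + \epsilon_{ij}, \qquad i = 1, \ldots, k, \quad j = 1, \ldots, n_i,$$

where $k$ is the number of cells, there are $n_i$ observations in the $i$th cell, and $n = \sum_{i=1}^{k} n_i$ is the total number of observations. As usual in regression, the error terms ($\epsilon_{ij}$ in this case) are independently distributed $N(0, \sigma^2)$ for some unknown error variance $\sigma^2$.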
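The estimation step in the cell means model can be checked numerically. The Python sketch below (Python rather than S-PLUS so that it runs anywhere; the dictionary `coag` and the names `mu_hat`, `residuals` and `sigma2_hat` are illustrative, not from the source) takes each diet's sample mean as $\hat{\mu}_i$, forms the residuals $y_{ij} - \bar{y}_{i\cdot}$, confirms they sum to zero within each cell, and pools them into the usual unbiased estimate of $\sigma^2$, namely the residual sum of squares divided by $n - k$.

```python
# Sketch: fit the cell means model y_ij = mu_i + e_ij by hand (illustrative names).
# Coagulation times grouped by diet, as printed by split(coag.times, diet).
coag = {
    "A": [62, 60, 63, 59, 65, 66, 63, 59],
    "B": [63, 67, 71, 64, 68, 68],
    "C": [68, 66, 71, 67, 63, 64],
    "D": [56, 62, 60, 61],
}

# Estimate each mu_i by the cell mean ybar_i.
mu_hat = {d: sum(ys) / len(ys) for d, ys in coag.items()}

# Residuals e_ij = y_ij - ybar_i; within each cell they sum to zero.
residuals = {d: [y - mu_hat[d] for y in ys] for d, ys in coag.items()}

n = sum(len(ys) for ys in coag.values())  # total number of observations
k = len(coag)                             # number of cells
sse = sum(e * e for es in residuals.values() for e in es)
sigma2_hat = sse / (n - k)                # unbiased estimate of sigma^2

if __name__ == "__main__":
    for d, es in residuals.items():
        print(d, "mu_hat =", round(mu_hat[d], 4), "residual sum =", round(sum(es), 10))
    print("SSE =", round(sse, 4), "  sigma^2 estimate =", round(sigma2_hat, 5))
```

The value of `sigma2_hat` should agree with the residual mean square in S-PLUS's ANOVA table for this model.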
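Now recall that we can write the deviation of a single observation from the grand mean as

$$y_{ij} - \bar{y}_{\cdot\cdot} = (y_{ij} - \bar{y}_{i\cdot}) + (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}).$$

If we square and sum these terms, the magic of orthogonal sums of squares tells us that

$$\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2 + \sum_{i=1}^{k} n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2,$$

that is, $SS_{\text{total}} = SS_{\text{within}} + SS_{\text{between}}$, with degrees of freedom $n-1$, $n-k$ and $k-1$ respectively. (The cross-product term $2\sum_{i}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})\sum_{j}(y_{ij} - \bar{y}_{i\cdot})$ vanishes because each inner sum is zero.) This gives rise to the ANOVA table

$$\begin{array}{lcccc}
\text{Source} & \text{df} & \text{Sum of squares} & \text{Mean square} & F \\ \hline
\text{Between cells (factor)} & k-1 & SS_{\text{between}} & MS_{\text{between}} = SS_{\text{between}}/(k-1) & MS_{\text{between}}/MS_{\text{within}} \\
\text{Within cells (error)} & n-k & SS_{\text{within}} & MS_{\text{within}} = SS_{\text{within}}/(n-k) & \\ \hline
\text{Total} & n-1 & SS_{\text{total}} & &
\end{array}$$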
Here is SPLUS's ANOVA table for this ANOVA model.
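402 > anova(coag.aov) # some output omitted below
Terms added sequentially (first to last)
          Df Sum of Sq  Mean Sq  F Value       Pr(F)
     diet  3  186.0417 62.01389 8.055931 0.001028171
Residuals 20  153.9583  7.69792
402 > 186.0417 /( 186.0417 +153.9583) # R^2
[1] 0.5471815

The F statistic for testing whether the factor explains the variation in $Y$ is the ratio of the between-cell mean square to the within-cell (residual) mean square,

$$F = \frac{\sum_{i=1}^{k} n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 / (k-1)}{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2 / (n-k)};$$

here $F = 62.01389/7.69792 = 8.055931$.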
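Under the null hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$, $F$ is distributed as $F_{k-1,\,n-k}$ (here $F_{3,20}$). Large values of $F$ argue in favor of the alternative $H_1$: not all of the $\mu_i$ are equal.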
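To see where the numbers in the S-PLUS table come from, here is a plain-Python sketch that recomputes the sums of squares, $F$, $R^2$ and the upper-tail probability $P(F_{3,20} > F)$. The helper names `one_way_anova` and `f_upper_tail` are hypothetical, and the p-value is obtained by a swapped-in technique, not by whatever routine S-PLUS uses: it relies on the fact that $d_1F/(d_1F + d_2)$ has a $\mathrm{Beta}(d_1/2, d_2/2)$ distribution and integrates that density with Simpson's rule, assuming $d_2 \ge 2$ so the integrand stays finite.

```python
# Sketch of the one-way ANOVA F test (hypothetical helper names, stdlib only).
import math

# Coagulation times grouped by diet, as printed by split(coag.times, diet).
coag = {
    "A": [62, 60, 63, 59, 65, 66, 63, 59],
    "B": [63, 67, 71, 64, 68, 68],
    "C": [68, 66, 71, 67, 63, 64],
    "D": [56, 62, 60, 61],
}


def one_way_anova(groups):
    """Return (SS_between, SS_within, df_between, df_within, F)."""
    all_y = [y for ys in groups.values() for y in ys]
    n, k = len(all_y), len(groups)
    grand = sum(all_y) / n
    cell_means = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    ss_between = sum(len(ys) * (cell_means[g] - grand) ** 2 for g, ys in groups.items())
    ss_within = sum((y - cell_means[g]) ** 2 for g, ys in groups.items() for y in ys)
    df_b, df_w = k - 1, n - k
    f_stat = (ss_between / df_b) / (ss_within / df_w)
    return ss_between, ss_within, df_b, df_w, f_stat


def f_upper_tail(f, d1, d2, steps=20000):
    """P(F_{d1,d2} > f) using d1*F/(d1*F + d2) ~ Beta(d1/2, d2/2).

    Integrates the Beta density from x to 1 by Simpson's rule (steps must be even).
    Assumes d2 >= 2 so the density is finite at t = 1.
    """
    a, b = d1 / 2, d2 / 2
    x = d1 * f / (d1 * f + d2)
    beta_ab = math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

    def density(t):
        return t ** (a - 1) * (1 - t) ** (b - 1) / beta_ab

    h = (1.0 - x) / steps
    total = density(x) + density(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(x + i * h)
    return total * h / 3


ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(coag)
r_squared = ss_b / (ss_b + ss_w)
p_value = f_upper_tail(f_stat, df_b, df_w)

if __name__ == "__main__":
    print(f"SS diet = {ss_b:.4f}, SS residuals = {ss_w:.4f}")
    print(f"F = {f_stat:.6f} on ({df_b}, {df_w}) df, p = {p_value:.6g}")
    print(f"R^2 = {r_squared:.7f}")
```

The lower integration limit $d_1F/(d_1F + d_2)$ simplifies to $SS_{\text{between}}/SS_{\text{total}}$, so it is exactly the $R^2$ of about 0.547 computed in the session above; in a one-way ANOVA with fixed $n$ and $k$, the p-value depends on the data only through $R^2$. Where SciPy is available, `scipy.stats.f.sf(f_stat, 3, 20)` computes the same tail probability directly.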