We can illustrate the one-way analysis of variance using the sales data:
402 > attach(salesdata)
402 > sales205090 _ salesdata[((price==20)|(price==50)|(price==90)),]
402 > sales205090$price _ factor(sales205090$price)
402 > table(sales205090$price)   # There are 11 months of sales at each price
 20 50 90 
 11 11 11 
402 > view(sales205090$price)
structure(.Data = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3,
        1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
        .Label = c("20", "50", "90"), class = "factor")
402 > plot.factor(sales205090, style.bxp="old")
402 > sales.aov _ aov(sales ~ price, data=sales205090)
402 > anova(sales.aov)
Analysis of Variance Table

Response: sales

Terms added sequentially (first to last)
          Df Sum of Sq  Mean Sq  F Value Pr(F) 
    price  2  13332276  6666138 895.2989     0
Residuals 30    223371     7446               
402 > 13332276/(13332276 + 223371)   # calculate R^2
[1] 0.9835219

This tells us that the ``price'' variable explains much of the variability in the ``sales'' data, but it isn't very informative about the details, such as the mean sales at each price level.
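Rather than retyping the sums of squares from the printed table, the same R^2 can be computed from the fitted object itself. The following is only a sketch using standard extractor functions (it was not part of the original session):

# Sketch: compute R^2 directly from the fitted aov object.
rss <- sum(residuals(sales.aov)^2)                            # residual sum of squares
tss <- sum((sales205090$sales - mean(sales205090$sales))^2)   # total sum of squares
1 - rss/tss                                                   # should reproduce the 0.9835 above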
To obtain this, we need to think about the model in a different way. In ANOVA notation, we would write the model (1) as

$$ y_{ij} = \mu + \alpha_i + \epsilon_{ij}, $$
where i=1, 2 or 3, depending on whether the price was 20, 50 or 90, and j indexes the 11 observations at each price level.
Nomenclature: $\mu$ is the grand mean, $\alpha_i$ is the effect of the $i$th price level (its deviation from the grand mean), and $\epsilon_{ij}$ is the random error for the $j$th observation at that price level. Written this way, the three group means are described by four parameters, so the model is overparametrized.
We can add one linear constraint to remove the overparametrization. The usual one for the ``grand means plus effects'' model is to make the effects sum to zero:

$$ \alpha_1 + \alpha_2 + \alpha_3 = 0. $$
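Under this constraint, and with a balanced design like ours (11 observations per price), the least-squares estimates take a familiar form; this is a standard result, stated here (it is not in the original notes) to connect the notation to the numbers below:

$$ \hat{\mu} = \bar{y}_{\cdot\cdot}, \qquad \hat{\alpha}_i = \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}, $$

where $\bar{y}_{i\cdot}$ is the average of the 11 observations at price level $i$ and $\bar{y}_{\cdot\cdot}$ is the overall average.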
The $\alpha_i$ can then be estimated with the model.tables() function:
402 > model.tables(sales.aov, type="effects")
Refitting model to allow projection
Tables of effects

 price 
     20     50     90 
  826.3 -106.6 -719.7 
402 > sum(scan())   # verify that the alpha's sum to zero
1: 826.3 -106.6 -719.7
4: 
[1] -1.136868e-013

An alternative parameterization with just three parameters is to take out the intercept and simply estimate the mean sales at each price level; this is called the cell means model:

$$ y_{ij} = \mu_i + \epsilon_{ij}, $$

where $\mu_i$ is the mean sales at the $i$th price level.
Again, we can estimate the cell means with model.tables():
402 > model.tables(sales.aov, type="means")
Refitting model to allow projection
Tables of means
Grand mean
 6185

 price 
   20   50   90 
 7011 6078 5465 
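As a cross-check (this was not part of the original transcript), the same quantities can be computed directly from the data with mean() and tapply(), the standard S tools for group summaries:

# Sketch: reproduce the model.tables() output directly from the data.
mean(sales205090$sales)                                 # grand mean (about 6185)
grp.means <- tapply(sales205090$sales, sales205090$price, mean)
grp.means                                               # cell means (about 7011, 6078, 5465)
grp.means - mean(sales205090$sales)                     # effects (about 826, -107, -720)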
Let's compare these results with fitting the model ``by hand'' using lm().
402 > lmsales _ sales205090$sales
402 > lmprice _ sales205090$price
402 > p20 _ ifelse(lmprice==20, 1, 0)
402 > p50 _ ifelse(lmprice==50, 1, 0)
402 > p90 _ ifelse(lmprice==90, 1, 0)
402 > sales.lm _ lm(lmsales ~ p20 + p50 + p90)
Error in lm.fit.qr(x, y): computed fit is singular, rank 3
Dumped
402 > # oops, forgot to delete a column
402 > sales.lm _ lm(lmsales ~ p50 + p90)
402 > summary(sales.lm)

Call: lm(formula = lmsales ~ p50 + p90)
Residuals:
    Min     1Q  Median    3Q   Max 
 -194.8 -55.67  -4.224 78.53 113.8

Coefficients:
                 Value Std. Error  t value Pr(>|t|) 
(Intercept)  7010.8662    26.0170 269.4728   0.0000
        p50  -932.8187    36.7936 -25.3528   0.0000
        p90 -1545.9577    36.7936 -42.0171   0.0000

Residual standard error: 86.29 on 30 degrees of freedom
Multiple R-Squared: 0.9835 
F-statistic: 895.3 on 2 and 30 degrees of freedom, the p-value is 0 

Correlation of Coefficients:
    (Intercept)     p50 
p50     -0.7071        
p90     -0.7071  0.5000

Actually, when lm() sees a factor variable in a regression it will construct dummy variables automatically, but unfortunately it then transforms the X matrix to make the columns orthogonal; we will learn more about S-PLUS's coding of dummy variables later.
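In fact, the whole dummy-variable construction can be left to lm(). The following is only a sketch (not from the original session); the coefficient values it prints depend on which contrast coding is in effect, which is why they will not match the 0/1 dummy-variable fit above under the default coding:

# Sketch: let lm() build the dummy variables from the factor itself.
sales.lm2 <- lm(sales ~ price, data = sales205090)
summary(sales.lm2)   # same fit; coefficients depend on the contrast coding
# The coding is controlled by the contrasts option; for example,
#   options(contrasts = c("contr.treatment", "contr.poly"))
# requests simple 0/1 dummy variables (more on this later).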