Factors

Factors are how S-PLUS represents categorical data. A factor is a vector whose elements each take one of a fixed set of levels of a variable, where the relative effects of those levels are not known in advance.

Suppose an experiment were performed on ten patients, using three different treatments: placebo, medicine A, and medicine B. Suppose the first three received the placebo, the next four received medicine A, and the final three received medicine B. The response variable is degree of improvement. The data will consist of two vectors: which treatment a patient received and the extent to which he or she recovered.

If the medicine vector were coded as numbers, it would imply an ordering and a scale of the effects of the three treatments. For instance, coding placebo as 0, medicine A as 1 and medicine B as 2 would imply that medicine B is twice as good (or twice as bad) as medicine A, and that the difference between medicine B and medicine A is the same as the difference between medicine A and the placebo. In this experiment, we will not know the relative values of the three treatments until after the experiment is complete, so we cannot code the treatments as numbers. Instead, enter a vector of characters and convert it to a factor.

> medicine <- c(rep("Placebo", 3), rep("A", 4), rep("B", 3))
> medicine
 [1] "Placebo" "Placebo" "Placebo" "A"       "A"       "A"       "A"
 [8] "B"       "B"       "B"
> medicine <- as.factor(medicine)
> medicine
 [1] Placebo Placebo Placebo A       A       A       A       B       B
[10] B
(The function rep(what, n) is a shortcut which repeats the first argument n times.)

> levels(medicine)
[1] "A"       "B"       "Placebo"
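
The internal representation can be inspected directly. The following is a minimal sketch, written in S syntax and assuming R-compatible behaviour (it has not been checked against a particular S-PLUS release). The names counts and codes are introduced here for illustration only. It shows that the integers stored inside a factor are merely positions in the levels vector, not measurements of any treatment effect.

```r
# Rebuild the treatment factor from the text.
medicine <- as.factor(c(rep("Placebo", 3), rep("A", 4), rep("B", 3)))

# Count the patients in each treatment group.
counts <- table(medicine)
print(counts)

# The internal codes index into levels(medicine).
# They are labels, not a scale: "Placebo" gets code 3 only because
# it sorts last alphabetically.
codes <- as.integer(medicine)
print(codes)
```
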
The vector medicine is a factor with three levels, but no particular order to the levels. If a factor has a definite order to its levels, you can specify that by declaring it to be an "ordered factor". For example, suppose a marketing firm were trying to gauge the appeal of a new movie. The data could consist of a factor for viewers' responses, and perhaps several vectors of demographic data on each viewer.

> responses
 [1] Great Good  Good  Poor  Great Fair  Awful Fair  Good  Fair  Poor  Great
[13] Fair  Good  Poor  Fair  Fair  Great
> levels(responses)
[1] "Awful" "Fair"  "Good"  "Great" "Poor"
The factor responses has five levels, but S-PLUS has sorted them alphabetically, which is not their natural order. Use the ordered function to convert responses to an ordered factor.

> responses <- ordered(responses, levels=c("Awful", "Poor", "Fair", "Good", "Great"))
> responses
 [1] Great Good  Good  Poor  Great Fair  Awful Fair  Good  Fair  Poor  Great
[13] Fair  Good  Poor  Fair  Fair  Great

 Awful < Poor < Fair < Good < Great
Now S-PLUS understands that there is an ordering to the data, with "Awful" being the lowest level and "Great" being the highest.
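
One practical consequence of declaring the order is that comparisons between responses become meaningful. The sketch below rebuilds the eighteen responses shown above and counts the favorable ones. It assumes R-compatible behaviour for comparison operators on ordered factors, and the names favorable and n_favorable are hypothetical, chosen only for this illustration.

```r
# The eighteen viewer responses from the text.
responses <- c("Great", "Good", "Good", "Poor", "Great", "Fair",
               "Awful", "Fair", "Good", "Fair", "Poor", "Great",
               "Fair", "Good", "Poor", "Fair", "Fair", "Great")
responses <- ordered(responses,
                     levels = c("Awful", "Poor", "Fair", "Good", "Great"))

# Comparison operators respect the declared order of the levels.
favorable <- responses >= "Good"
n_favorable <- sum(favorable)
print(n_favorable)

# Tabulation lists the levels in their declared order, not alphabetically.
print(table(responses))
```

On an unordered factor the same comparison is not meaningful, which is one reason to declare the order explicitly.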

WARNING: Do not try to change the order of the levels by making an assignment to the levels attribute; doing so relabels the data instead of reordering the levels. Always change the order of the levels by using the ordered function.
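
A small example makes the danger concrete. This is a sketch assuming R-compatible behaviour; the three-element vector and the names wrong and right are made up for illustration and are not part of the movie data.

```r
# A tiny factor; its levels are sorted alphabetically by default.
responses <- factor(c("Great", "Poor", "Awful"))
levels(responses)              # "Awful" "Great" "Poor"

# Wrong: assigning to levels renames them, so the data themselves change.
wrong <- responses
levels(wrong) <- c("Awful", "Poor", "Great")
as.character(wrong)            # "Poor" "Great" "Awful"

# Right: ordered() reorders the levels and leaves the data alone.
right <- ordered(responses, levels = c("Awful", "Poor", "Great"))
as.character(right)            # "Great" "Poor" "Awful"
```

In the wrong version every "Great" has silently become "Poor" and vice versa, because the assignment replaced the labels position by position.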

The only reason to make an assignment to the levels attribute is to change the labels on the levels.

> medicine
 [1] Placebo Placebo Placebo A       A       A       A       B       B
[10] B
> levels(medicine)
[1] "A"       "B"       "Placebo"
> levels(medicine) <- c("Medicine A", "Medicine B", "Placebo")
> medicine
 [1] Placebo    Placebo    Placebo    Medicine A Medicine A Medicine A
 [7] Medicine A Medicine B Medicine B Medicine B
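
The levels and their labels can also be fixed when the factor is first created, which avoids relabeling afterwards. The following is a sketch under the assumption that factor accepts levels and labels arguments as it does in R; the name treatment is hypothetical. Listing "Placebo" first only controls the order in which the levels are stored and printed; the factor remains unordered.

```r
# Raw treatment assignments, as in the text.
treatment <- c(rep("Placebo", 3), rep("A", 4), rep("B", 3))

# levels gives the values as they appear in the data;
# labels gives the names to display, in the same order.
medicine <- factor(treatment,
                   levels = c("Placebo", "A", "B"),
                   labels = c("Placebo", "Medicine A", "Medicine B"))

print(levels(medicine))
print(is.ordered(medicine))    # the levels have a storage order but no ranking
```
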

Question: Generate a vector, moreresponses, which consists of "Best Movie Ever" repeated twice and "Worst Movie Ever" repeated three times. Make it an ordered factor, so that "Worst Movie Ever" is the lower of the two levels. Append moreresponses to responses. What does this new vector look like? Do some editing to make "Worst Movie Ever" the lowest level (below "Awful") and "Best Movie Ever" the highest (above "Great"). Change the labels on the levels, if necessary.


Pantelis Vlachos
1/15/1999