Factors are the S-PLUS tool for representing categorical data. A factor is a vector whose elements represent different levels of a variable, where the relative effects of those levels are unknown.
Suppose an experiment were performed on ten patients, using three different treatments: placebo, medicine A, and medicine B. Suppose the first three received the placebo, the next four received medicine A, and the final three received medicine B. The response variable is degree of improvement. The data will consist of two vectors: which treatment a patient received and the extent to which he or she recovered.
If the medicine vector were coded as numbers, it would imply an ordering and a scale of the effects of the three treatments. For instance, coding placebo as 0, medicine A as 1 and medicine B as 2 would imply that medicine B is twice as good (or twice as bad) as medicine A, and that the difference between medicine B and medicine A is the same as the difference between medicine A and the placebo. In this experiment, we will not know the relative values of the three treatments until after the experiment is complete, so we cannot code the treatments as numbers. Instead, enter a vector of characters and convert it to a factor.
> medicine <- c(rep("Placebo", 3), rep("A", 4), rep("B", 3))
> medicine
 [1] "Placebo" "Placebo" "Placebo" "A" "A" "A" "A"
 [8] "B" "B" "B"
> medicine <- as.factor(medicine)
> medicine
 [1] Placebo Placebo Placebo A A A A B B
[10] B

(The function rep(what, n) is a shortcut which repeats the first argument n times.)

> levels(medicine)
[1] "A" "B" "Placebo"

The vector medicine is a factor with three levels, but no particular order to the levels. If a factor has a definite order to its levels, you can specify that by declaring it to be an "ordered factor". For example, suppose a marketing firm were trying to gauge the appeal of a new movie. The data could consist of a factor for viewers' responses, and perhaps several vectors of demographic data on each viewer.

> responses
 [1] Great Good Good Poor Great Fair Awful Fair Good Fair Poor Great
[13] Fair Good Poor Fair Fair Great
> levels(responses)
[1] "Awful" "Fair" "Good" "Great" "Poor"

The factor responses has five levels, but they are listed alphabetically rather than from worst to best. Use the ordered function to convert responses to an ordered factor.

> responses <- ordered(responses, levels=c("Awful", "Poor", "Fair", "Good", "Great"))
> responses
 [1] Great Good Good Poor Great Fair Awful Fair Good Fair Poor Great
[13] Fair Good Poor Fair Fair Great
Awful < Poor < Fair < Good < Great

Now S-PLUS understands that there is an ordering to the data, with "Awful" being the lowest level and "Great" being the highest.
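One practical consequence of the ordering is that the levels can be compared with one another. The following is a minimal sketch, assuming that comparison operators on ordered factors behave in S-PLUS as they do in R. The vector resp and its values are hypothetical and are not the survey data shown above.

```r
# Hypothetical responses for illustration only (not the data in the text).
resp <- ordered(c("Good", "Awful", "Great", "Fair", "Poor", "Good"),
                levels = c("Awful", "Poor", "Fair", "Good", "Great"))

# Comparisons use the declared level order, not alphabetical order.
favourable <- resp >= "Good"
sum(favourable)                     # number of "Good" or "Great" responses: 3

# Select the responses that fall below "Fair".
as.character(resp[resp < "Fair"])   # "Awful" "Poor"
```

The same comparisons on an unordered factor are not meaningful, which is one reason to declare the order explicitly.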
WARNING: Do not try to change the order of the levels by making an assignment to the levels attribute. Such an assignment renames the existing levels in place, so every observation silently takes on a new label. Always change the order of the levels by using the ordered function.
The only reason to make an assignment to the levels attribute is to change the labels on the levels.
> medicine
 [1] Placebo Placebo Placebo A A A A B B
[10] B
> levels(medicine)
[1] "A" "B" "Placebo"
> levels(medicine) <- c("Medicine A", "Medicine B", "Placebo")
> medicine
 [1] Placebo Placebo Placebo Medicine A Medicine A Medicine A
 [7] Medicine A Medicine B Medicine B Medicine B

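To make the warning concrete, the sketch below contrasts the two approaches on a small hypothetical vector, ratings, which does not appear elsewhere in this text. It is written against R semantics and assumes that S-PLUS behaves the same way, with as.factor sorting the levels alphabetically as in the examples above.

```r
# Hypothetical data for illustration only.
ratings <- as.factor(c("Poor", "Great", "Awful", "Poor"))
levels(ratings)            # alphabetical: "Awful" "Great" "Poor"

# WRONG: assigning to levels() relabels the levels position by position.
wrong <- ratings
levels(wrong) <- c("Awful", "Poor", "Great")
as.character(wrong)        # "Great" "Poor" "Awful" "Great" (the data changed)

# RIGHT: ordered() reorders the levels and leaves each observation alone.
right <- ordered(ratings, levels = c("Awful", "Poor", "Great"))
as.character(right)        # "Poor" "Great" "Awful" "Poor" (the data is intact)
levels(right)              # "Awful" "Poor" "Great"
```

In the wrong version every "Poor" has become "Great" and every "Great" has become "Poor", with no error or warning to signal the change.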
Question: Generate a vector, moreresponses, which consists of "Best Movie Ever" repeated twice and "Worst Movie Ever" repeated three times. Make it an ordered factor, so that "Worst Movie Ever" is the lower of the two levels. Append moreresponses to responses. What does this new vector look like? Do some editing to make "Worst Movie Ever" the lowest level (below "Awful") and "Best Movie Ever" the highest (above "Great"). Change the labels on the levels, if necessary.