Next: A Statistical Example with Up: Reading Data in S-PLUS Previous: Attaching a Data Frame

Selecting Parts of a Data Frame

The data in a data frame are arranged like a matrix, with observations in rows and variables in columns, so you can select parts of it using matrix notation. For instance, the element in the fifth row, second column of pain.relief is

> pain.relief[5,2]
[1] 20

The elements in the first, second and fifth rows, first and second columns are

> pain.relief[c(1,2,5),c(1,2)]
  Dose Water
1  0.1    10
2  0.2    10
5  0.1    20

This works just like selection from vectors, only now in two dimensions.
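As a minimal sketch of the parallel between vector and data frame selection, the following uses a small hypothetical data frame called toy. Its values are made up for illustration and are not the pain.relief data; the names dose, second.dose, one.cell and block are likewise only illustrative.

```r
# Hypothetical toy data frame; the values are invented for illustration
# and are NOT the pain.relief data.
toy <- data.frame(Dose  = c(0.1, 0.2, 0.3, 0.4),
                  Water = c(10, 10, 20, 20))

# One subscript: the second element of a vector
dose <- toy$Dose
second.dose <- dose[2]

# Two subscripts: the element in row 2, column 1 of the data frame
one.cell <- toy[2, 1]

# Vectors of subscripts: rows 1 and 3, columns 1 and 2
block <- toy[c(1, 3), c(1, 2)]
print(block)
```

The row subscript always comes first and the column subscript second, with a comma between them.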

Sometimes it is important to exclude part of a data frame. For instance, suppose you discover that the third observation in the pain.relief data frame is an outlier. To see the Dose variable with the third observation removed, use a negative subscript:

> Dose[-3]
   1   2   4   5   6   7   8   9  10  11  12
 0.1 0.2 0.4 0.1 0.2 0.3 0.4 0.1 0.2 0.3 0.4

Dose is a one-dimensional vector, so it needs only one subscript. The pain.relief data frame has two dimensions, so a single subscript such as pain.relief[-3] does not say whether you mean "without row 3" or "without column 3". The correct way is to give two subscripts separated by a comma; leaving a subscript empty means "use all rows" or "use all columns". To see pain.relief without row 3 but with all columns, type:

> pain.relief[-3,]
   Dose Water Relief
 1  0.1    10      7
 2  0.2    10     15
 4  0.4    10     23
 5  0.1    20     15
 6  0.2    20     13
 7  0.3    20     26
 8  0.4    20     38
 9  0.1    30     21
10  0.2    30     28
11  0.3    30     31
12  0.4    30     47

Similarly, if you want to see all rows but without the second column, you can type pain.relief[,-2].
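The two forms of exclusion can be compared side by side in a short sketch. It again assumes a hypothetical data frame named toy with made-up values (not the pain.relief data), and the names no.row3 and no.col2 are only illustrative.

```r
# Hypothetical toy data frame; the values are invented for illustration
# and are NOT the pain.relief data.
toy <- data.frame(Dose   = c(0.1, 0.2, 0.3, 0.4),
                  Water  = c(10, 10, 20, 20),
                  Relief = c(5, 9, 14, 20))

# Negative row subscript, empty column subscript:
# drop row 3 and keep all columns
no.row3 <- toy[-3, ]

# Empty row subscript, negative column subscript:
# keep all rows and drop column 2
no.col2 <- toy[, -2]

print(no.row3)
print(no.col2)
```

Note that neither command changes toy itself. To keep the reduced data frame you have to assign the result to a name, as done here with no.row3 and no.col2.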

Question: How do you exclude multiple rows?


Brian Junker 2002-08-26