An Example in S-PLUS

In the previous example we had to type the data in by hand. S-PLUS can read data from a file, as will be described in a future handout. For now, consider a data set that comes pre-loaded into S-PLUS.

These data concern cars purchased in 1974. There are three vectors of data: car.time, the number of days since the car was purchased; car.miles, the number of miles driven between this fill-up and the previous fill-up; and car.gals, the number of gallons required to fill the tank. These vectors are already loaded into S-PLUS, so you should be able to access them by typing their names.

> car.time
  [1]   1.0   7.0  17.0  28.0  42.0  56.0  65.0  74.0  76.0  88.0  92.0 100.0
 [13] 107.0 119.0 126.0 135.0 139.0 146.0 151.0 156.0 161.0 170.0 177.0 192.0
 [25] 200.0 208.0 215.0 219.0 226.0 234.0 239.0 245.0 251.0 257.0 262.0 270.0
 [37] 277.0 282.0 288.0 289.0 292.0 292.5 294.0 295.0 300.0 305.0 310.0 314.0
 [49] 321.0 323.0 330.0 338.0 347.0 355.0 360.0 364.0 368.0 370.0 370.5 371.0
 [61] 377.0 382.0 389.0 395.0 405.0 409.0 413.0 416.0 421.0 423.0 427.0 433.0
 [73] 438.0 447.0 453.0 461.0 466.0 475.0 482.0 487.0 491.0 501.0 516.0 524.0
 [85] 534.0 543.0 548.0 558.0 565.0 572.0 576.0 583.0 595.0 603.0 607.0 607.5
 [97] 609.0 610.0 610.5 614.0 622.0 635.0 645.0 652.0 660.0 667.0 686.0 702.0
[109] 717.0 728.0 737.0 747.0 753.0 764.0 771.0 777.0 783.0 786.0
> car.miles
  [1] 210.0 199.0 182.0 208.4 217.0 379.8 204.0 180.6 157.5 226.8 181.0 194.5
 [13] 194.5 208.3 188.6 193.0 212.0 207.5 216.2 203.1 202.5 193.5 116.0 221.6
 [25] 194.2 213.6 217.3 211.5 200.0 191.0 227.0 207.4 209.8 212.5 209.0 213.1
 [37] 215.4 215.8 204.1 165.0 176.3 243.0 142.0 293.0 118.0 224.0 221.9 199.0
 [49] 242.0 222.0 219.0 202.6 221.0 226.0 207.0 218.0 228.0  97.0 266.0 253.0
 [61] 191.0 228.0 212.7 217.0 190.0 213.0 210.6 217.0 176.7 206.0 208.3 190.0
 [73] 212.0 313.0 197.0 189.0 188.0 334.0 203.0 209.0 236.0 206.0 163.0 195.0
 [85] 201.0 213.0 209.5 197.4 199.6 167.0 208.0 204.0 188.5 212.3  88.7 200.0
 [97] 179.0 153.0 207.0 234.0 193.0 210.0 196.0 217.0 228.0 201.0 216.0 189.4
[109] 184.7 189.0 197.0 168.0 209.0 202.0 183.0 204.0 197.0 204.0
> car.gals
  [1] 13.3 12.2 11.5 13.5 14.3 25.7 13.3 12.7  8.9 14.2 12.0 13.2 12.8 13.6 12.3
 [16] 13.2 13.1 13.6 13.6 13.0 12.5 11.6  7.5 14.2 11.9 13.1 13.6 14.2 12.1 12.8
 [31] 13.9 13.0 12.5 13.6 12.8 14.2 13.0 13.8 12.3  8.9 10.3 11.6  7.8 14.5  7.1
 [46] 12.0 12.3 12.5 14.1 13.0 12.3 13.2 13.7 13.4 12.5 12.9 13.3  6.2 13.2 12.8
 [61] 11.2 13.5 13.2 14.1 13.6 13.0 12.8 13.2 12.9 13.0 13.0 12.0 14.5 21.2 13.2
 [76] 12.9 12.5 22.0 13.1 12.3 13.2 13.7 12.5 13.2 13.5 13.9 12.5 12.4 12.3 10.5
 [91] 11.5 12.8 12.0 13.0  5.8  9.5  9.0  7.7 10.1 11.9 11.4 12.6 12.5 12.7 13.6
[106] 11.7 13.5 12.3 13.0 13.2 13.2 11.6 13.8 13.5 13.9 14.2 13.0 13.0
Suppose we are interested in how many miles per gallon these cars got. Let's create a "miles per gallon" vector by dividing the car.miles vector by the car.gals vector. S-PLUS will divide each element of car.miles by the corresponding element of car.gals, which is exactly what we want.

> car.mpg <- car.miles/car.gals
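
As a quick check on the element-wise division, the following sketch confirms that the two source vectors have the same length and that the first entry of car.mpg equals the quotient of the first entries of car.miles and car.gals. It uses only the vectors above and the built-in length function; the indices 1 and 1:3 are arbitrary choices for illustration. Both comparisons should come back true.

> length(car.miles) == length(car.gals)    # the vectors must match in length
> car.mpg[1] == car.miles[1]/car.gals[1]   # first element: 210.0/13.3
> car.miles[1:3]/car.gals[1:3]             # the same division on the first three entries
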
Now we can analyze the miles per gallon data. Some summary statistics might be a good place to start:

> summary(car.mpg)
  Min. 1st Qu. Median  Mean 3rd Qu.  Max.
 13.04   15.19  15.93 16.21   16.78 21.05
> var(car.mpg)
[1] 2.438916
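
The variance is measured in squared units, so the standard deviation is often easier to interpret. A minimal sketch, assuming only the built-in sqrt, var, and quantile functions: take the square root of the variance, which from the value above is about 1.56 MPG, and request the quartiles directly.

> sqrt(var(car.mpg))                 # standard deviation, in MPG
> quantile(car.mpg, c(0.25, 0.75))   # first and third quartiles
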
A stem-and-leaf plot might also prove interesting:

> stem(car.mpg)

N = 118   Median = 15.92875
Quartiles = 15.18182, 16.784

Decimal point is at the colon

   13 : 02
   13 : 7
   14 : 02234
   14 : 5667788899999
   15 : 00001122223333333444
   15 : 5566666777888899999
   16 : 000001122233333344
   16 : 556666778889999
   17 : 01111122
   17 : 789
   18 : 012
   18 : 57
   19 :
   19 : 7899

High: 20.15152 20.20690 20.49505 20.94828 21.05263
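
The number of values in any range of this display can also be computed directly. As a sketch, using only the built-in sum function: a comparison such as car.mpg > 19.5 produces a logical vector, and sum counts its true entries. The cutoffs 15, 17, and 19.5 are chosen here for illustration.

> sum(car.mpg >= 15 & car.mpg < 17)   # how many values lie between 15 and 17 MPG
> sum(car.mpg > 19.5)                 # how many values exceed 19.5 MPG
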
The stem-and-leaf plot shows that most of these cars got between 15 and 17 miles per gallon, although a few got more than 19.5 MPG. The high values might be related to how long the car has been owned. We can summarize the car.time vector and then select the cars with the good MPG numbers to see how long they have been owned and compare them to the rest.

> summary(car.time)
 Min. 1st Qu. Median  Mean 3rd Qu. Max.
    1   235.2  370.8 388.2   563.2  786
> car.time[car.mpg > 19.5]
[1] 292.5 295.0 370.5 371.0 607.5 609.0 610.0 610.5 614.0
There does not appear to be any pattern to this: some of the cars are relatively old, some are rather new.
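
One way to make this comparison concrete is to summarize car.time separately for the high-MPG values and for the rest, and to plot MPG against time. This is a sketch rather than part of the original analysis; the name high is introduced here for illustration only.

> high <- car.mpg > 19.5      # logical vector, true where MPG exceeds 19.5
> summary(car.time[high])     # times (in days) for the high-MPG values
> summary(car.time[!high])    # times for the remaining values
> plot(car.time, car.mpg)     # scatterplot of MPG against days since purchase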

Question: How would you do a stem-and-leaf plot of miles per gallon for all cars owned for fewer than 600 days?


Pantelis Vlachos
1/15/1999