STAT3002
Introduction to S-Plus
S-Plus is a software package designed for data analysis and graphical display. It is widely used by mathematical statisticians for data analysis and research, owing to its high-quality graphics, the interactive nature of the language and the ease with which it can be extended with new methodology.
S-Plus for Windows has a standard help system, available from the Help item on the main menu. At the command line prompt you can also use the commands help or ?. For example, the command
> help(help)
will display the help page about the help function. You can interrupt S-Plus by pressing either Ctrl-C or the ESC key. To end your S-Plus session, either give the command q() at the command line or choose Exit from the File menu.
To list the objects that S-Plus currently has stored, use the objects() command. To list the places where S-Plus looks for objects, use the search() command. By default, the object you attached most recently is in the second position on this list. The command objects(2) lists the objects in the second position on the search list.
To add a directory or object to the S-Plus search path use the attach() command. This will enable you to use the directory or object during your S-Plus session. To remove a directory or object from the search path you can use the detach() command.
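As a minimal sketch of attaching and detaching an object, the following uses a made-up list called scores (the name and its values are assumptions for illustration only). It is written to run in both S-Plus and R, although the exact text printed by search() differs between the two.

```r
# 'scores' is a hypothetical list used only for illustration
scores <- list(maths = c(71, 85, 64), physics = c(58, 90, 77))

attach(scores)      # by default the list is placed in position 2 of the search list
search()            # list the places that are searched for objects
objects(2)          # list the objects in position 2: the components of 'scores'
mean(maths)         # while attached, a component can be used by name alone
detach("scores")    # remove 'scores' from the search list again
```

After detach(), the name maths is no longer found on its own, but scores$maths is still available.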
The two basic types of data structures in S-Plus are vectors and lists. Vectors may contain either numbers or characters, but all elements must be of the same type.
> w <- seq(1,10,0.5) # generate a sequence between 1 and 10 in steps of 0.5
> y <- rexp(10,2) # generate 10 exponential random numbers with rate 2
> y
[1] 0.02006780 0.03229059 1.56062583 0.13418912 0.17417082 0.34303562 0.03701264
[8] 0.08701415 0.30156031 0.35891966
> y[2]
[1] 0.03229059
> y[4:9]
[1] 0.13418912 0.17417082 0.34303562 0.03701264 0.08701415 0.30156031
> y[y>1]
[1] 1.560626
> x <- c(1.5, 2.6, 3.7, 4.8)
> x
[1] 1.5 2.6 3.7 4.8
> y[c(1, 5, 9)]
[1] 0.0200678 0.1741708 0.3015603
> cities <- c("Brisbane", "Melbourne", "Adelaide", "Sydney")
> cities
[1] "Brisbane" "Melbourne" "Adelaide" "Sydney"
> cities[c(1, 3)]
[1] "Brisbane" "Adelaide"
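Because the transcript above uses random numbers, its output changes from run to run. The same ways of extracting elements can be sketched on a small fixed vector; the name v and its values are made up for illustration.

```r
# 'v' is a hypothetical vector with fixed values, so the results are reproducible
v <- c(0.5, 1.2, 3.4, 0.1, 2.8)

v[2]          # a single element by position: 1.2
v[2:4]        # a run of consecutive elements: 1.2 3.4 0.1
v[c(1, 5)]    # elements chosen by a vector of positions: 0.5 2.8
v[v > 1]      # elements satisfying a condition: 1.2 3.4 2.8
length(v)     # number of elements: 5
```

The condition v > 1 is itself a vector of true and false values, and only the elements where it is true are kept.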
Lists can contain anything at all. The components of a list may be numbers or characters, or even vectors or other lists.
> countries <- list(Name=c("Australia", "Sweden"), Capital=c("Canberra",
"Stockholm"), Population=c(18,10))
> countries
$Name:
[1] "Australia" "Sweden"
$Capital:
[1] "Canberra" "Stockholm"
$Population:
[1] 18 10
> countries$Capital
[1] "Canberra" "Stockholm"
> countries$Population
[1] 18 10
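A list may also hold components of different types, including another list. The following is a minimal sketch; the list city, its component names and the temperature values are assumptions made up for illustration.

```r
# 'city' is a hypothetical list; the temperatures are invented values
city <- list(Name = "Brisbane",
             Temps = c(29.1, 28.4, 27.9),
             Location = list(State = "Queensland", Country = "Australia"))

city$Temps             # the numeric vector component
city$Temps[2]          # the second element of that vector: 28.4
city$Location$State    # a component of the nested list: "Queensland"
names(city)            # the component names: "Name" "Temps" "Location"
length(city)           # the number of top-level components: 3
```

The $ operator can be chained, so each further $ reaches one level deeper into the nested lists.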
Everything that is created in S-Plus is referred to as an object. S-Plus knows that different types of objects should be treated differently depending upon the type of information they contain.
> car.price
 [1] 24760 13150 16145 10320 14525 10945 12495  9745 15395 15350  8895 12267  7402
[14]  6319 17257 11470  9483 14980 12145  6635  9410 13945 12459 23300 14944  6599
[27]  8672 10989 17879 13249 17899  7399 11650 10565  7254 11499  9599 11588  8748
[40] 21498  6488  9995 18450
> summary(car.price)
  Min. 1st Qu. Median  Mean 3rd Qu.  Max.
  6319    9446  11590 12410   14960 24760
> car.type
 [1] Medium  Medium  Large   Compact Large   Compact Medium  Sporty  Van     Medium
[11] Small   Van     Small   Small   Large   Sporty  Compact Medium  Compact Small
[21] Sporty  Sporty  Compact Medium  Van     Small   Small   Compact Compact Sporty
[31] Medium  Small   Compact Compact Small   Compact Small   Compact Small   Medium
[41] Small   Small   Compact
> summary(car.type)
 Compact Large Medium Small Sporty Van
      12     3      8    12      5   3
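The contrast between the two summaries above can be reproduced on a small made-up example. The vectors price and type below are hypothetical and are not the car data; the sketch only illustrates that summary() treats a numeric vector and a factor differently.

```r
# Hypothetical data, invented for illustration
price <- c(12.5, 8.0, 15.2, 9.9, 11.4)                        # a numeric vector
type <- factor(c("Small", "Van", "Small", "Large", "Van"))    # a factor (categories)

summary(price)    # numeric vector: minimum, quartiles, mean and maximum
summary(type)     # factor: the number of observations at each level
levels(type)      # the distinct categories, in alphabetical order
```

The same command is typed in both cases; the kind of object determines which summary is produced.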
The commands print, summary and plot can be applied to any object with more or less sensible results, because each adapts what it does to the type of object it is given. Such commands are referred to as generic commands.
> print(car.price)
 [1] 24760 13150 16145 10320 14525 10945 12495  9745 15395 15350  8895 12267  7402
[14]  6319 17257 11470  9483 14980 12145  6635  9410 13945 12459 23300 14944  6599
[27]  8672 10989 17879 13249 17899  7399 11650 10565  7254 11499  9599 11588  8748
[40] 21498  6488  9995 18450
> par(mfrow=c(2,2)) #Sets up a graphing window in a 2 by 2 matrix
> plot(car.price)
> plot(car.type)
> plot(car.type,car.price)
A data frame is a type of object used to store a data matrix. It can be thought of as a list of variables of the same length, but possibly of different types. An S-Plus data frame is the equivalent of a Minitab or Microsoft Excel spreadsheet.
> auction
   Age Bidders Price
 1 127      13  1235
 2 115      12  1080
 3 127       7   845
 4 150       9  1522
 5 156       6  1047
 6 182      11  1979
 7 156      12  1822
 8 132      10  1253
 9 137       9  1297
10 113       9   946
11 137      15  1713
12 117      11  1024
13 137       8  1147
14 153       6  1092
15 117      13  1152
16 126      10  1336
17 170      14  2131
18 182       8  1550
19 162      11  1884
20 184      10  2041
21 143       6   854
22 159       9  1483
23 108      14  1055
24 175       8  1545
25 108       6   729
26 179       9  1792
27 111      15  1175
28 187       8  1593
29 111       7   785
30 115       7   744
31 194       5  1356
32 168       7  1262
Data may be read into a data frame using the read.table() command. For example, the command
> auction <- read.table("clipboard",header=T)
will read the data on the clipboard into a data frame named auction, taking the variable names from the first line of the data. A data frame may also be created from a group of vectors of the same length:
> z <- data.frame(u,v,x)
To remove the original vectors you may use the rm() command:
> rm(u,v,x)
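As a minimal sketch of these two steps, the following builds a data frame from three vectors of equal length and then removes the originals. The values of u and v are made up for illustration; x repeats the vector used earlier.

```r
# Hypothetical vectors of the same length, invented for illustration
u <- c(10, 20, 30, 40)
v <- c("a", "b", "c", "d")
x <- c(1.5, 2.6, 3.7, 4.8)

z <- data.frame(u, v, x)    # one column per vector, named after the vectors
rm(u, v, x)                 # the original vectors are no longer needed

z$x          # a column is extracted like a list component: 1.5 2.6 3.7 4.8
z[2, ]       # the second row, across all columns
dim(z)       # number of rows and columns: 4 3
names(z)     # the column names: "u" "v" "x"
```

The data frame keeps its own copy of the data, so the columns remain available through z after the vectors are removed.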