STAT3002

 

Introduction to S-Plus

 

S-Plus is a software package for data analysis and graphical display. It is widely used by mathematical statisticians for data analysis and research because of its high-quality graphics, the interactive nature of its language and the ease with which it can be extended with new methodology.

 

Working with S-Plus

 

S-Plus for Windows has a standard help system, available from the Help item on the main menu. At the command-line prompt you can also use the help command or the ? operator. For example, either of

 

> help(help)

> ?help

 

will display the help page for the help function. To interrupt S-Plus, press either Ctrl-C or the Esc key. To end your S-Plus session, either type q() at the command line or choose Exit from the File menu.

 

To list the objects that S-Plus currently has stored, use the objects() command. To list the places where S-Plus looks for objects (the search list), use the search() command. By default, the object you attached most recently is placed in the second position on the search list, so the command objects(2) lists the objects stored there.

 

To add a directory or object to the S-Plus search list, use the attach() command. Its contents can then be used by name during your S-Plus session. To remove a directory or object from the search list, use the detach() command.
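
The following is a minimal sketch of how attach(), search(), objects() and detach() fit together. The data frame toy and its columns height and weight are hypothetical and exist only for this illustration.

```r
# Hypothetical data frame used only for this illustration.
toy <- data.frame(height = c(150, 160, 170), weight = c(50, 60, 70))

attach(toy)                  # add toy to the search list
second.place <- search()[2]  # name of the entry in the second position
found <- objects(2)          # names of the objects stored in that position
mean.height <- mean(height)  # columns of toy can now be used by name

detach(2)                    # remove the second entry (toy) from the search list
```

After detach(2), the name height can no longer be used on its own, although toy$height still works.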

 

Vectors and Lists

 

The two basic data structures in S-Plus are vectors and lists. A vector may contain numbers, character strings or logical values, but all of its elements must be of the same type.

 

> w <- seq(1,10,0.5) # generate a sequence from 1 to 10 in steps of 0.5

> y <- rexp(10,2) # generate 10 random values from an exponential distribution with rate 2

> y

 [1] 0.02006780 0.03229059 1.56062583 0.13418912 0.17417082 0.34303562 0.03701264

 [8] 0.08701415 0.30156031 0.35891966

> y[2]

[1] 0.03229059

> y[4:9]

[1] 0.13418912 0.17417082 0.34303562 0.03701264 0.08701415 0.30156031

> y[y>1]

[1] 1.560626

> x <- c(1.5, 2.6, 3.7, 4.8)

> x

[1] 1.5 2.6 3.7 4.8

> y[c(1, 5, 9)]

[1] 0.0200678 0.1741708 0.3015603

> cities <- c("Brisbane", "Melbourne", "Adelaide", "Sydney")

> cities

[1] "Brisbane"  "Melbourne" "Adelaide"  "Sydney"

> cities[c(1, 3)]

[1] "Brisbane" "Adelaide"
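
As a small sketch of the rule that all elements of a vector share one type, the code below combines numbers with a character string and checks the result with mode(). It also shows two further ways of selecting elements. The names nums and mixed are hypothetical.

```r
nums <- c(1.5, 2.6, 3.7, 4.8)

# Combining numbers with a character string converts every element
# to a character string, because a vector holds only one type.
mixed <- c(nums, "Sydney")
mode(nums)     # "numeric"
mode(mixed)    # "character"

# A negative index drops elements instead of selecting them.
all.but.first <- nums[-1]

# Logical conditions can be combined with & (and) or | (or).
middle <- nums[nums > 2 & nums < 4]
```

Here all.but.first holds the last three values of nums, and middle holds the values 2.6 and 3.7.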

 

A list can contain components of any type, and the components need not share the same type or length. They may be numbers, character strings, vectors or even other lists.

 

> countries <- list(Name=c("Australia", "Sweden"), Capital=c("Canberra", "Stockholm"), Population=c(18,10))

> countries

$Name:

[1] "Australia" "Sweden"  

 

$Capital:

[1] "Canberra"  "Stockholm"

 

$Population:

[1] 18 10

 

> countries$Capital

[1] "Canberra"  "Stockholm"

> countries$Population

[1] 18 10
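
The countries list can be explored a little further. The sketch below assumes the list has been created exactly as above, and uses names(), double square brackets and a logical condition to pick out individual values. The name sweden.pop is hypothetical.

```r
countries <- list(Name = c("Australia", "Sweden"),
                  Capital = c("Canberra", "Stockholm"),
                  Population = c(18, 10))

names(countries)     # the component names: Name, Capital, Population
length(countries)    # the number of components, here 3

# Double square brackets extract a component by name, like $ does.
second.capital <- countries[["Capital"]][2]

# A component can be indexed using a condition on another component.
sweden.pop <- countries$Population[countries$Name == "Sweden"]
```

Here second.capital is "Stockholm" and sweden.pop is 10.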

 

Object-Oriented Programming

 

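Everything that is created in S-Plus is referred to as an object. S-Plus treats different types of objects differently, depending on the type of information they contain. In the example below, the same summary command gives a numerical summary for the numeric vector car.price but a table of counts for the factor car.type.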

 

> car.price

 [1] 24760 13150 16145 10320 14525 10945 12495  9745 15395 15350  8895 12267  7402

[14]  6319 17257 11470  9483 14980 12145  6635  9410 13945 12459 23300 14944  6599

[27]  8672 10989 17879 13249 17899  7399 11650 10565  7254 11499  9599 11588  8748

[40] 21498  6488  9995 18450

 

> summary(car.price)

 Min. 1st Qu. Median  Mean 3rd Qu.  Max.

 6319    9446  11590 12410   14960 24760

 

> car.type

 [1] Medium  Medium  Large   Compact Large   Compact Medium  Sporty  Van     Medium

[11] Small   Van     Small   Small   Large   Sporty  Compact Medium  Compact Small

[21] Sporty  Sporty  Compact Medium  Van     Small   Small   Compact Compact Sporty

[31] Medium  Small   Compact Compact Small   Compact Small   Compact Small   Medium

[41] Small   Small   Compact

 

> summary(car.type)

 Compact Large Medium Small Sporty Van

      12     3      8    12      5   3

 

The commands print, summary and plot are generic commands: they can be applied to any object, and each adapts what it does to the type of object it is given. This is why summary gives quantiles and a mean for car.price but counts for car.type.

 

> print(car.price)

 [1] 24760 13150 16145 10320 14525 10945 12495  9745 15395 15350  8895 12267  7402

[14]  6319 17257 11470  9483 14980 12145  6635  9410 13945 12459 23300 14944  6599

[27]  8672 10989 17879 13249 17899  7399 11650 10565  7254 11499  9599 11588  8748

[40] 21498  6488  9995 18450

 

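> par(mfrow=c(2,2)) # divide the graphics window into a 2 by 2 grid of plots

> plot(car.price) # plot the prices against their position in the vector

> plot(car.type) # plot the number of cars of each type

> plot(car.type,car.price) # plot the distribution of price separately for each type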
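
The behaviour of a generic command can be checked on a small example. The sketch below uses hypothetical vectors price and type (not the car data above) to show summary() returning different kinds of results for a numeric vector and a factor.

```r
# Hypothetical values, chosen only to illustrate the idea.
price <- c(9000, 12000, 7500, 15000)
type <- factor(c("Small", "Van", "Small", "Large"))

is.numeric(price)   # TRUE: price is a numeric vector
is.factor(type)     # TRUE: type is a factor

num.summary <- summary(price)  # quantiles and mean, as for car.price
fac.summary <- summary(type)   # counts for each level, as for car.type
fac.summary
```

The counts in fac.summary are 1 for Large, 2 for Small and 1 for Van, with the levels listed in alphabetical order.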

 

 

Data Frames

 

A data frame is a type of object used to store a data matrix, with one row for each observation and one column for each variable. It can be thought of as a list of variables of the same length, but possibly of different types. An S-Plus data frame is roughly equivalent to a Minitab worksheet or a Microsoft Excel spreadsheet.

 

> auction

   Age Bidders Price

 1 127      13   1235

 2 115      12   1080

 3 127       7    845

 4 150       9   1522

 5 156       6   1047

 6 182      11   1979

 7 156      12   1822

 8 132      10   1253

 9 137       9   1297

10 113       9    946

11 137      15   1713

12 117      11   1024

13 137       8   1147

14 153       6   1092

15 117      13   1152

16 126      10   1336

17 170      14   2131

18 182       8   1550

19 162      11   1884

20 184      10   2041

21 143       6    854

22 159       9   1483

23 108      14   1055

24 175       8   1545

25 108       6    729

26 179       9   1792

27 111      15   1175

28 187       8   1593

29 111       7    785

30 115       7    744

31 194       5   1356

32 168       7   1262
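
Because a data frame can be thought of as a list of variables, its columns can be extracted with the same $ notation used for lists. The sketch below rebuilds only the first four rows of auction by hand, under the hypothetical name small.auction, so that it does not depend on the full data set being available.

```r
# First four rows of the auction data shown above, entered by hand.
small.auction <- data.frame(Age = c(127, 115, 127, 150),
                            Bidders = c(13, 12, 7, 9),
                            Price = c(1235, 1080, 845, 1522))

dim(small.auction)      # number of rows and columns: 4 and 3
names(small.auction)    # the variable names

# Extract one column as a vector and summarise it.
mean.price <- mean(small.auction$Price)

# Select the rows that satisfy a condition, keeping all columns.
many.bidders <- small.auction[small.auction$Bidders > 10, ]
```

Here mean.price is 1170.5, and many.bidders contains the two auctions that had more than 10 bidders.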

 

Data may be read into a data frame using the read.table() command. For example, the command

 

> auction <- read.table("clipboard",header=T)

 

will read the data on the clipboard into a data frame named auction. The argument header=T tells S-Plus that the first line of the data contains the variable names. A data frame may also be created from a group of vectors:

 

> z <- data.frame(u,v,x)

 

Once the vectors have been stored in the data frame, the originals are no longer needed and may be removed with the rm() command:

 

> rm(u,v,x)
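
The read.table() command shown earlier reads from the Windows clipboard, but the same command can also read a plain text file. The sketch below is one way to try this without preparing a file in advance: it writes the first three rows of the auction data to a temporary file and reads them back. The names tmp and first.rows are hypothetical, and the sketch assumes that tempfile() is available in your version of S-Plus.

```r
# Name of a temporary file (assumed to be writable).
tmp <- tempfile()

# Write a header line and three data lines, separated by newlines.
cat("Age Bidders Price\n127 13 1235\n115 12 1080\n127 7 845\n", file = tmp)

# header=T says that the first line holds the variable names.
first.rows <- read.table(tmp, header = T)
first.rows
```

In practice the temporary file name would be replaced by the name of your own data file, given in quotes.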