36-350
3 September 2014
v[5]
, v[“name”]a[5,6,2]
, a[,6,]
, a[rowname, colname, layername]
m[5,6]
, m[,6]
, m[,colname]
l[[3]]
, l$name
library(datasets)
states <- data.frame(state.x77, abb=state.abb, region=state.region, division=state.division)
data.frame()
is combining here a pre-existing matrix (state.x77
), a vector of characters (state.abb
), and two vectors of qualitative categorical variables (factors; state.region
, state.division
)
Column names are preserved or guessed if not explicitly set
colnames(states)
[1] "Population" "Income" "Illiteracy" "Life.Exp" "Murder"
[6] "HS.Grad" "Frost" "Area" "abb" "region"
[11] "division"
states[1,]
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
abb region division
Alabama AL South East South Central
states[49,3]
[1] 0.7
states["Wisconsin","Illiteracy"]
[1] 0.7
states["Wisconsin",]
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Wisconsin 4589 4468 0.7 72.48 3 54.5 149 54464
abb region division
Wisconsin WI North Central East North Central
Exercise: what class is states["Wisconsin",]
?
head(states[,3])
[1] 2.1 1.5 1.8 1.9 1.1 0.7
head(states[,"Illiteracy"])
[1] 2.1 1.5 1.8 1.9 1.1 0.7
head(states$Illiteracy)
[1] 2.1 1.5 1.8 1.9 1.1 0.7
states[states$division=="New England", "Illiteracy"]
[1] 1.1 0.7 1.1 0.7 1.3 0.6
states[states$region=="South", "Illiteracy"]
[1] 2.1 1.9 0.9 1.3 2.0 1.6 2.8 0.9 2.4 1.8 1.1 2.3 1.7 2.2 1.4 1.4
Parts or all of the dataframe can be assigned to:
summary(states$HS.Grad)
Min. 1st Qu. Median Mean 3rd Qu. Max.
37.8 48.0 53.2 53.1 59.2 67.3
states$HS.Grad <- states$HS.Grad/100
summary(states$HS.Grad)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.378 0.480 0.532 0.531 0.592 0.673
states$HS.Grad <- 100*states$HS.Grad
What percentage of literate adults graduated HS?
head(100*(states$HS.Grad/(100-states$Illiteracy)))
[1] 42.19 67.72 59.16 40.67 63.30 64.35
with()
takes a data frame and evaluates an expression “inside” it:
with(states, head(100*(HS.Grad/(100-Illiteracy))))
[1] 42.19 67.72 59.16 40.67 63.30 64.35
Lots of functions take data
arguments, and look variables up in
that data frame:
plot(Illiteracy~Frost, data=states)
\( R^2 =0.45 \), \( p \approx {10}^{-7} \)
Have the computer decide what to do next
if the country code is not "US", multiply prices by current exchange rate
Simplest conditional:
if (x >= 0) {
x
} else {
-x
}
Condition in if
needs to give one TRUE
or FALSE
value
else
clause is optional
one-line actions don't need braces
if (x >= 0) x else -x
if
can nest arbitrarily deeply:
if (x^2 < 1) {
x^2
} else {
if (x >= 0) {
2*x-1
} else {
-2*x-1
}
}
Can get ugly though
&
work |
like +
or *
: combine terms element-wise
Flow control wants one Boolean value, and to skip calculating what's not needed
&&
and ||
give one Boolean, lazily:
(0 > 0) && (all.equal(42%%6, 169%%13))
[1] FALSE
This never evaluates the complex expression on the right
Use &&
and ||
for control, &
and |
for subsetting
Repeat similar actions multiple times:
table.of.logarithms <- vector(length=7,mode="numeric")
table.of.logarithms
[1] 0 0 0 0 0 0 0
for (i in 1:length(table.of.logarithms)) {
table.of.logarithms[i] <- log(i)
}
table.of.logarithms
[1] 0.0000 0.6931 1.0986 1.3863 1.6094 1.7918 1.9459
for (i in 1:length(table.of.logarithms)) {
table.of.logarithms[i] <- log(i)
}
for
increments a counter (here i
) along a vector (here 1:length(table.of.logarithms)
) and loops through the *body until it runs through the vector
“iterates over the vector”
N.B., there is a better way to do this job!
Can contain just about anything, including:
c <- matrix(0, nrow=nrow(a), ncol=ncol(b))
if (ncol(a) == nrow(b)) {
for (i in 1:nrow(c)) {
for (j in 1:ncol(c)) {
for (k in 1:ncol(a)) {
c[i,j] <- c[i,j] + a[i,k]*b[k,j]
}
}
}
} else {
stop("matrices a and b non-conformable")
}
while (max(x) - 1 > 1e-06) {
x <- sqrt(x)
}
Condition in the argument to while
must be a single Boolean value (like if
)
Body is looped over until the condition is FALSE
so can loop forever
Loop never begins unless the condition starts TRUE
for() is better when the number of times to repeat (values to iterate over) is clear in advance
while() is better when you can recognize when to stop once you're there, even if you can't guess it to begin with
Every for() could be replaced with a while()
Exercise: show this
R has many ways of avoiding iteration, by acting on whole objects
How many languages add 2 vectors:
c <- vector(length(a))
for (i in 1:length(a)) { c[i] <- a[i] + b[i] }
How R adds 2 vectors:
a+b
or a triple for()
loop for matrix multiplication vs. a %*% b
Many functions are set up to vectorize automatically
abs(-3:3)
[1] 3 2 1 0 1 2 3
log(1:7)
[1] 0.0000 0.6931 1.0986 1.3863 1.6094 1.7918 1.9459
See also apply()
from last week
We'll come back to this in great detail later
ifelse(x^2 > 1, 2*abs(x)-1, x^2)
1st argument is a Boolean vector, then pick from the 2nd or 3rd vector arguments as TRUE
or FALSE
if
, nested if
, switch
for
, while
0 counts as FALSE
; other numeric values count as TRUE
; the strings “TRUE” and “FALSE” count as you'd hope; most everything else gives an error
Advice: Don't play games here; try to make sure control expressions are getting Boolean values
Conversely, in arithmetic, FALSE
is 0 and TRUE
is 1
mean(states$Murder > 7)
[1] 0.48
Simplify nested if
with switch()
: give a variable to select on, then a value for each option
switch(type.of.summary,
mean=mean(states$Murder),
median=median(states$Murder),
histogram=hist(states$Murder),
"I don't understand")
Set type.of.summary
to, succesively, “mean”, “median”, “histogram”, and “mode”, and explain what happens
repeat {
print("Help! I am Dr. Morris Culpepper, trapped in an endless loop!")
}
repeat {
if (watched) { next() }
print("Help! I am Dr. Morris Culpepper, trapped in an endless loop!")
if (rescued) { break() }
}
break()
exits the loop; next()
skips the rest of the body and goes back into the loop
both work with for()
and while()
as well
Exercise: how would you replace while()
with repeat()
?