Last week: R basics

We write programs by composing functions to manipulate data
The basic data types let us represent Booleans, numbers, and characters
Data structures let us group together related values
Vectors let us group values of the same type
Arrays add multi-dimensional structure to vectors
Matrices act like you’d hope they would
Lists let us combine different types of data
Data frames are hybrids of matrices and lists, allowing each column to have a different data type

Part I

Indexing

How R indexes vectors, matrices, lists

There are 3 ways to index a vector, matrix, data frame, or list in R:

Using explicit integer indices (or negative integers)
Using a Boolean vector (often created on-the-fly)
Using names

Note: in general, we have to set the names ourselves. Use names() for vectors and lists, and rownames(), colnames() for matrices and data frames
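As a minimal sketch of setting names (the vector `v`, the matrix `m`, and all of the names below are made up for illustration):

```r
v = c(10, 20, 30)      # Hypothetical vector, made up for illustration
names(v)               # NULL: there are no names until we set them
names(v) = c("a", "b", "c")
names(v)

m = matrix(1:6, 2, 3)  # Hypothetical 2 x 3 matrix
rownames(m) = c("r1", "r2")
colnames(m) = c("c1", "c2", "c3")
dimnames(m)            # Row and column names together, as a list
```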

Indexing with integers

The most transparent way. Can index with an integer, or integer vector (or negative integer, or negative integer vector). Examples for vectors:

set.seed(33) # For reproducibility
x.vec = rnorm(6) # Generate a vector of 6 random standard normals
x.vec

## [1] -0.13592452 -0.04079697  1.01053901 -0.15826244 -2.15663750  0.49864683

x.vec[3] # Third element

## [1] 1.010539

x.vec[c(3,4,5)] # Third through fifth elements

## [1]  1.0105390 -0.1582624 -2.1566375

x.vec[3:5] # Same, but written more succinctly

## [1]  1.0105390 -0.1582624 -2.1566375

x.vec[c(3,5,4)] # Third, fifth, then fourth element

## [1]  1.0105390 -2.1566375 -0.1582624

x.vec[-3] # All but third element

## [1] -0.13592452 -0.04079697 -0.15826244 -2.15663750  0.49864683

x.vec[c(-3,-4,-5)] # All but third through fifth element

## [1] -0.13592452 -0.04079697  0.49864683

x.vec[-c(3,4,5)] # Same

## [1] -0.13592452 -0.04079697  0.49864683

x.vec[-(3:5)] # Same, more succinct (note the parentheses!)

## [1] -0.13592452 -0.04079697  0.49864683

Examples for matrices:

x.mat = matrix(x.vec, 3, 2) # Fill a 3 x 2 matrix with those same 6 normals,
                            # column major order
x.mat

##             [,1]       [,2]
## [1,] -0.13592452 -0.1582624
## [2,] -0.04079697 -2.1566375
## [3,]  1.01053901  0.4986468

x.mat[2,2] # Element in 2nd row, 2nd column

## [1] -2.156638

x.mat[5] # Same (note this is using column major order)

## [1] -2.156638

x.mat[2,] # Second row

## [1] -0.04079697 -2.15663750

x.mat[1:2,] # First and second rows

##             [,1]       [,2]
## [1,] -0.13592452 -0.1582624
## [2,] -0.04079697 -2.1566375

x.mat[,1] # First column

## [1] -0.13592452 -0.04079697  1.01053901

x.mat[,-1] # All but first column

## [1] -0.1582624 -2.1566375  0.4986468
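Note that the last result came back as a plain vector, not a 3 x 1 matrix. If we want to keep the matrix structure, one option is the `drop` argument of `[`. A small sketch, rebuilding the same matrix so that it runs on its own:

```r
set.seed(33)                      # Same seed as above
x.mat = matrix(rnorm(6), 3, 2)    # Rebuild the 3 x 2 matrix
x.mat[, -1]                       # A plain vector of length 3
dim(x.mat[, -1])                  # NULL: the matrix structure was dropped
x.mat[, -1, drop = FALSE]         # Still a matrix, with 3 rows and 1 column
dim(x.mat[, -1, drop = FALSE])
```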

Examples for lists:

x.list = list(x.vec, letters, sample(c(TRUE,FALSE),size=4,replace=TRUE))
x.list

## [[1]]
## [1] -0.13592452 -0.04079697  1.01053901 -0.15826244 -2.15663750  0.49864683
## 
## [[2]]
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y" "z"
## 
## [[3]]
## [1]  TRUE  TRUE FALSE FALSE

x.list[[3]] # Third element of list

## [1]  TRUE  TRUE FALSE FALSE

x.list[3] # Third element of list, kept as a list

## [[1]]
## [1]  TRUE  TRUE FALSE FALSE

x.list[1:2] # First and second elements of list (note the single brackets!)

## [[1]]
## [1] -0.13592452 -0.04079697  1.01053901 -0.15826244 -2.15663750  0.49864683
## 
## [[2]]
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y" "z"

x.list[-1] # All but first element of list

## [[1]]
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y" "z"
## 
## [[2]]
## [1]  TRUE  TRUE FALSE FALSE

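Note: double brackets [[ ]] will not do what you want for either of the above commands: x.list[[-1]] throws an error, and x.list[[1:2]] indexes recursively (it returns the second element of the first element) rather than returning two elements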
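A hedged sketch of how double brackets behave with these two kinds of indices, using a small stand-in list (made up for illustration) rather than the one above. The error is caught with `tryCatch()` so the code runs from start to finish:

```r
y.list = list(c(1.5, 2.5, 3.5), letters, c(TRUE, FALSE))  # Hypothetical stand-in list
y.list[[1:2]]   # Recursive indexing: same as y.list[[1]][[2]]
res = tryCatch(y.list[[-1]], error = function(e) "error")
res             # The negative index with [[ ]] failed on this 3-element list
```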

Indexing with booleans

This may seem a bit trickier at first, but it is very useful, especially when we define a Boolean vector “on-the-fly”. Examples for vectors:

x.vec[c(F,F,T,F,F,F)] # Third element

## [1] 1.010539

x.vec[c(T,T,F,T,T,T)] # All but third element

## [1] -0.13592452 -0.04079697 -0.15826244 -2.15663750  0.49864683

pos.vec = x.vec > 0 # Boolean vector indicating whether each element is positive
pos.vec

## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE

x.vec[pos.vec] # Pull out only positive elements

## [1] 1.0105390 0.4986468

x.vec[x.vec > 0] # Same, but more succinct (this is done "on-the-fly")

## [1] 1.0105390 0.4986468

Works the same way for lists; in lab, we’ll explore logical indexing for matrices
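For instance, here is a small sketch of Boolean indexing on a list. The list is a made-up stand-in, and `sapply()` (from the apply family) is just one possible way to build the Boolean vector on-the-fly:

```r
y.list = list(c(1.5, -2), letters, c(TRUE, FALSE))  # Hypothetical stand-in list
y.list[c(TRUE, FALSE, TRUE)]          # Keep the first and third elements
is.num = sapply(y.list, is.numeric)   # Which elements are numeric?
is.num
y.list[is.num]                        # Keep only the numeric elements (still a list)
```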

Indexing with names

Indexing with names can also be quite useful. We must have names in the first place; with vectors or lists, use names() to set the names

names(x.list) = c("normals", "letters", "bools")
x.list[["letters"]] # "letters" (second) element

##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y" "z"

x.list$letters # Same, just using different notation

##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y" "z"

x.list[c("normals","bools")]

## $normals
## [1] -0.13592452 -0.04079697  1.01053901 -0.15826244 -2.15663750  0.49864683
## 
## $bools
## [1]  TRUE  TRUE FALSE FALSE

Indexing by names will be especially useful when we talk more about data frames, shortly
In lab, we’ll practice using rownames() and colnames() and named indexing with matrices
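As a preview, a minimal sketch of named indexing with a matrix. The matrix and its row and column names are made up for illustration, and are set here through the `dimnames` argument of `matrix()`:

```r
m = matrix(1:6, 2, 3,
           dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))  # Hypothetical names
m["r2", "c3"]          # Element in row "r2", column "c3"
m[, c("c1", "c3")]     # Columns "c1" and "c3"
m["r1", ]              # Row "r1", returned as a named vector
```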

Part II

Control flow (if, else, etc.)

Control flow

Summary of the control flow tools in R:

if(), else if(), else: standard conditionals
ifelse(): conditional function that vectorizes nicely
switch(): handy for deciding between several options

`if()` and `else`

Use if() and else to decide whether to evaluate one block of code or another, depending on a condition

x = 0.5

if (x >= 0) {
  x
} else {
  -x
}

## [1] 0.5

Condition in if() needs to give one TRUE or FALSE value
Note that the else statement is optional
Single line actions don’t need braces, i.e., could shorten above to if (x >= 0) x else -x
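When the thing we are testing is a vector, one way to get a single TRUE or FALSE is to wrap the comparison in `all()` or `any()`. A small sketch with a made-up vector:

```r
y = c(0.5, -1, 2)   # Hypothetical vector
# if (y >= 0) ...   # Not a single TRUE/FALSE: an error in recent versions of R
#                   # (older versions gave a warning and used the first element)
if (all(y >= 0)) {
  msg = "all nonnegative"
} else {
  msg = "at least one negative"
}
msg
```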

`else if()`

We can use else if() arbitrarily many times following an if() statement

x = -2

if (x^2 < 1) {
  x^2 
} else if (x >= 1) {
  2*x-1
} else {
 -2*x+1
}

## [1] 5

Each else if() only gets considered if the conditions above it were not TRUE
The else statement gets evaluated if none of the above conditions were TRUE
Note again that the else statement is optional

Quick decision making

In the ifelse() function we specify a condition, then a value if the condition holds, and a value if the condition fails

ifelse(x > 0, x, -x)

## [1] 2

One advantage of ifelse() is that it vectorizes nicely; we’ll see this in the lab
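A quick sketch of what vectorizing means here: the same call, applied to a whole (made-up) vector, makes the decision separately for each element:

```r
x.vals = c(-2, 0.5, 3, -1)                      # Hypothetical input vector
abs.vals = ifelse(x.vals > 0, x.vals, -x.vals)  # Decide elementwise
abs.vals                                        # Absolute value of each element
```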

Deciding between many options

Instead of an if() statement followed by else if() statements (and perhaps a final else), we can use switch(). We pass a variable to select on, then a value for each option

type.of.summary = "mode"

switch(type.of.summary,
       mean=mean(x.vec),
       median=median(x.vec),
       histogram=hist(x.vec),
       "I don't understand")

## [1] "I don't understand"

Here we are expecting type.of.summary to be a string, either “mean”, “median”, or “histogram”; we specify what to do for each
The last passed argument has no name, and it serves as the else clause
Try changing type.of.summary above and see what happens
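One way to experiment is to wrap the switch() call in a small function. This is only a sketch: the helper `summarize.vec()` and the vector `v` are made up for illustration, and the histogram option is left out so that nothing gets plotted:

```r
summarize.vec = function(v, type.of.summary) {   # Hypothetical helper
  switch(type.of.summary,
         mean = mean(v),
         median = median(v),
         "I don't understand")                   # Unnamed: the "else" clause
}

v = c(1, 2, 6)               # Hypothetical data
summarize.vec(v, "mean")     # Matches the "mean" option
summarize.vec(v, "median")   # Matches the "median" option
summarize.vec(v, "mode")     # No match, so we fall through to the default
```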

Reminder: Boolean operators

Remember our standard Boolean operators, & and |. These combine terms elementwise

u.vec = runif(10, -1, 1)
u.vec

##  [1]  0.54949775 -0.22561403 -0.72846986  0.80071515  0.13290531 -0.91453168
##  [7] -0.02336149 -0.29755356  0.93932343  0.57915778

u.vec[-0.5 <= u.vec & u.vec <= 0.5] = 999 
u.vec

##  [1]   0.5494977 999.0000000  -0.7284699   0.8007152 999.0000000  -0.9145317
##  [7] 999.0000000 999.0000000   0.9393234   0.5791578

Lazy Boolean operators

In contrast to the standard Boolean operators, && and || give just a single Boolean, and they do so “lazily”: evaluation stops as soon as the result is determined

(0 > 0) && all(matrix(0,2,2) == matrix(0,3,3))

## [1] FALSE

(0 > 0) && (ThisVariableIsNotDefined == 0)

## [1] FALSE

Note R never evaluates the expression on the right in each line (each would throw an error)
In control flow, we typically just want one Boolean
Rule of thumb: use & and | for indexing or subsetting, and && and || for conditionals
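A small sketch of this rule of thumb, with made-up inputs. The lazy && lets a cheap check guard a second check that only makes sense when the first one passes:

```r
v = NULL    # Hypothetical input that might be empty
if (!is.null(v) && v[1] > 0) {   # Right side is skipped when v is NULL
  msg = "first element is positive"
} else {
  msg = "empty, or first element not positive"
}
msg

a = c(TRUE, FALSE)
b = c(TRUE, TRUE)
a & b       # Elementwise, gives one Boolean per element: use for indexing
```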

Part III

Iteration

Iteration

Computers: good at applying rigid rules over and over again. Humans: not so good at this. Iteration is at the heart of programming

Summary of the iteration methods in R:

for(), while() loops: standard loop constructs
Vectorization: use it whenever possible! Often faster and simpler
The apply family of functions: alternative to for() loop, these are base R functions
The map family of functions: another alternative, very useful, from the purrr package

`for()`

A for() loop increments a counter variable along a vector. It repeatedly runs a code block, called the body of the loop, with the counter set at its current value, until it runs through the vector

n = 10
log.vec = vector(length=n, mode="numeric")
for (i in 1:n) {
  log.vec[i] = log(i)
}
log.vec

##  [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
##  [8] 2.0794415 2.1972246 2.3025851

Here i is the counter and the vector we are iterating over is 1:n. The body is the code in between the braces
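One thing to watch for with 1:n, sketched below with made-up counters: when n is 0, the vector 1:n is c(1, 0), so the body still runs. Using seq_len(n) is a common way to guard against this:

```r
n = 0
count.colon = 0
for (i in 1:n) {           # 1:0 is c(1, 0), so the body runs twice
  count.colon = count.colon + 1
}
count.seq = 0
for (i in seq_len(n)) {    # seq_len(0) is empty, so the body never runs
  count.seq = count.seq + 1
}
c(count.colon, count.seq)
```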

Breaking from the loop

We can break out of a for() loop early (before the counter has been iterated over the whole vector), using break

n = 10
log.vec = vector(length=n, mode="numeric")
for (i in 1:n) {
  if (log(i) > 2) {
    cat("I'm outta here. I don't like numbers bigger than 2\n")
    break
  }
  log.vec[i] = log(i)
}

## I'm outta here. I don't like numbers bigger than 2

log.vec

##  [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
##  [8] 0.0000000 0.0000000 0.0000000

Variations on standard `for()` loops

Many different variations on standard for() are possible. Two common ones:

Nonnumeric counters: counter variable always gets iterated over a vector, but it doesn’t have to be numeric
Nested loops: body of the for() loop can contain another for() loop (or several others)

for (str in c("Prof", "Ryan", "Tibs")) {
  cat(paste(str, "declined to comment\n"))
}

## Prof declined to comment
## Ryan declined to comment
## Tibs declined to comment

for (i in 1:4) {
  for (j in 1:i^2) {
    cat(paste(j,""))
  }
  cat("\n")
}

## 1 
## 1 2 3 4 
## 1 2 3 4 5 6 7 8 9 
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

`while()`

A while() loop repeatedly runs a code block, again called the body, until some condition is no longer true

i = 1
log.vec = c()
while (log(i) <= 2) {
  log.vec = c(log.vec, log(i))
  i = i+1
}
log.vec

## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101

`for()` versus `while()`

for() is better when the number of times to repeat (values to iterate over) is clear in advance
while() is better when you can recognize when to stop once you’re there, even if you can’t guess it to begin with
while() is more general, in that every for() could be replaced with a while() (but not vice versa)
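To illustrate the last point, here is a sketch of the earlier log example written both ways. The while() version has to create and increment the counter itself:

```r
n = 10
log.for = numeric(n)
for (i in 1:n) {
  log.for[i] = log(i)
}

log.while = numeric(n)
i = 1                      # We manage the counter ourselves
while (i <= n) {
  log.while[i] = log(i)
  i = i + 1
}
all.equal(log.for, log.while)   # Check that the two versions agree
```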

`while(TRUE)` or `repeat`

while(TRUE) and repeat: both do the same thing, just repeat the body indefinitely, until something causes the flow to break. Example (try running in your console):

repeat {
  ans = readline("Who is the best Professor of Statistics at CMU? ")
  if (ans == "Tibs" || ans == "Tibshirani" || ans == "Ryan") {
    cat("Yes! You get an 'A'.")
    break
  } else {
    cat("Wrong answer!\n")
  } 
}
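The example above needs someone typing at the console. As a sketch of the same pattern that runs without input, we can keep drawing standard normals until one exceeds 1 (the seed, the threshold, and the variable names are arbitrary choices for illustration):

```r
set.seed(1)     # Arbitrary seed, for reproducibility
n.draws = 0
repeat {
  z = rnorm(1)             # Draw one standard normal
  n.draws = n.draws + 1
  if (z > 1) break         # Stop at the first draw exceeding 1
}
n.draws                    # Number of draws it took
```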

Avoiding explicit iteration

Warning: some people have a tendency to overuse for() and while() loops in R
They aren’t always needed. Remember vectorization should be used whenever possible
We’ll emphasize this in the lab, and try to hit upon it throughout the course
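As a small sketch, the log examples from earlier can be done without any loop at all, since log() and the comparison operators both work on whole vectors:

```r
n = 10
log.loop = numeric(n)
for (i in 1:n) {                    # Loop version
  log.loop[i] = log(i)
}
log.vec = log(1:n)                  # Vectorized version, same values
all.equal(log.loop, log.vec)
log.small = log.vec[log.vec <= 2]   # Boolean indexing instead of break or while()
log.small
```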

Summary

Three ways to index vectors, matrices, data frames, lists: integers, Booleans, names
Boolean on-the-fly indexing can be very useful
Named indexing will be especially useful for data frames
Indexing lists can be a bit tricky (beware of the difference between [ ] and [[ ]])
if(), else if(), else: standard conditionals
ifelse(): shortcut for using if() and else in combination
switch(): shortcut for using if(), else if(), and else in combination
for(), while(), repeat: standard loop constructs
Don’t overuse explicit for() loops, vectorization is your friend!
apply() and **ply(): can also be very useful (we’ll see them later)

Indexing and Iteration

Statistical Computing, 36-350

Tuesday September 7, 2021
