Name:
Andrew ID:
Collaborated with:

This lab is to be done in class (completed outside of class time if need be). You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as an knitted PDF file on Gradescope, by Friday 9pm, this week.

This week’s agenda: creating and updating functions; understanding argument and return structures; revisiting Shakespeare’s plays; code refactoring.

Huber loss function

The Huber loss function (or just Huber function, for short) is defined as: \[ \psi(x) = \begin{cases} x^2 & \text{if $|x| \leq 1$} \\ 2|x| - 1 & \text{if $|x| > 1$} \end{cases} \] This function is quadratic on the interval [-1,1], and linear outside of this interval. It transitions from quadratic to linear “smoothly”, and looks like this. It is often used in place of the usual squared error loss for robust estimation. For example, the sample average, \(\bar{X}\)—which given a sample \(X_1,\ldots,X_n\) minimizes the squared error loss \(\sum_{i=1}^n (X_i-m)^2\) over all choices of \(m\)—can be inaccurate as an estimate of \(\mathbb{E}(X)\) if the distribution of \(X\) is heavy-tailed. In such cases, minimizing Huber loss can give a better estimate.

Q1. Some simple function tasks

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE

Q2. Plotting practice, side effects

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE

Q3. Exploring function environments

huber = function(x, a=1) {
  x.squared = x^2
  ifelse(abs(x) <= a, x.squared, 2*a*abs(x)-a^2)
}
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
huber.sloppy = function(x) {
  ifelse(abs(x) <= a, x^2, 2*a*abs(x)-a^2)
}
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE

Shakespeare’s complete works

Recall, as we saw in Week 4, that the complete works of William Shakespeare are available freely from Project Gutenberg. We’ve put this text file up at http://www.stat.cmu.edu/~ryantibs/statcomp/data/shakespeare.txt.

Q4. Getting lines of text play-by-play

# get.wordtab.from.url: get a word table from text on the web
# Inputs:
# - str.url: string, specifying URL of a web page 
# - split: string, specifying what to split on. Default is the regex pattern
#   "[[:space:]]|[[:punct:]]"
# - tolower: Boolean, TRUE if words should be converted to lower case before
#   the word table is computed. Default is TRUE
# - keep.nums: Boolean, TRUE if words containing numbers should be kept in the
#   word table. Default is FALSE
# Output: list, containing lines, words, word table, and some basic summaries

get.wordtab.from.url = function(str.url, split="[[:space:]]|[[:punct:]]",
                                tolower=TRUE, keep.nums=FALSE) {
  lines = readLines(str.url)
  text = paste(lines, collapse=" ")
  words = strsplit(text, split=split)[[1]]
  words = words[words != ""]
    
  # Convert to lower case, if we're asked to
  if (tolower) words = tolower(words)
    
  # Get rid of words with numbers, if we're asked to
  if (!keep.nums) 
    words = grep("[0-9]", words, inv=TRUE, val=TRUE)
  
  # Compute the word table
  wordtab = table(words)
  
  return(list(wordtab=wordtab,
              number.unique.words=length(wordtab),
              number.total.words=sum(wordtab),
              longest.word=words[which.max(nchar(words))]))
}
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE

Q5. Getting word tables play-by-play

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE
# YOUR CODE GOES HERE

Q6. Refactoring the word table functions

# YOUR CODE GOES HERE
# YOUR CODE GOES HERE