Name:
Andrew ID:
Collaborated with:

On this homework, you can collaborate with your classmates, but you must identify their names above, and you must submit your own homework as a knitted HTML file on Canvas, by Sunday 10pm, this week.

Huber loss function

Recall, as covered in lab, the Huber loss function (or just Huber function, for short), with cutoff \(a\), which is defined as: \[ \psi_a(x) = \begin{cases} x^2 & \text{if $|x| \leq a$} \\ 2a|x| - a^2 & \text{if $|x| > a$} \end{cases} \] This function is quadratic on the interval \([-a,a]\), and linear outside of this interval. It transitions from quadratic to linear “smoothly”, and looks like this (when \(a=1\)):

Plotting practice, side effects

huber = function(x, a=1) {
  ifelse(abs(x) <= a, x^2, 2*a*abs(x)-a^2)
}
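As a quick sanity check on the "smooth" transition described above, the sketch below compares the value and the slope of the Huber function just inside and just outside the cutoff. The step size `eps` and the variable names are our own illustrative choices, not part of the lab code.

```r
# Same definition as above, repeated so this block runs on its own
huber = function(x, a=1) {
  ifelse(abs(x) <= a, x^2, 2*a*abs(x)-a^2)
}

a = 1
eps = 1e-6  # illustrative step size (an assumption, not from the lab)

# Continuity: values just inside and just outside the cutoff nearly agree
val.inside = huber(a - eps, a)
val.outside = huber(a + eps, a)

# Matching slopes: one-sided finite differences on either side of the cutoff
slope.inside = (huber(a, a) - huber(a - eps, a)) / eps
slope.outside = (huber(a + eps, a) - huber(a, a)) / eps

# Both slopes should be close to 2*a, the derivative of x^2 at x = a
c(slope.inside, slope.outside)
```

Both pieces equal \(a^2\) at \(|x| = a\), and both have slope \(2a\) there, which is why the plot shows no kink at the cutoff.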

Exploring function environments

huber = function(x, a=1) {
  x.squared = x^2
  ifelse(abs(x) <= a, x.squared, 2*a*abs(x)-a^2)
}
huber.sloppy = function(x) {
  ifelse(abs(x) <= a, x^2, 2*a*abs(x)-a^2)
}
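To see how these two definitions differ, here is a minimal sketch of where each function finds its variables. It assumes the code is run at the top level of a fresh R session; the names `val.a1`, `val.a2`, `val.safe`, and `local.leaked` are hypothetical, chosen only for illustration.

```r
huber = function(x, a=1) {
  x.squared = x^2
  ifelse(abs(x) <= a, x.squared, 2*a*abs(x)-a^2)
}
huber.sloppy = function(x) {
  ifelse(abs(x) <= a, x^2, 2*a*abs(x)-a^2)
}

# huber.sloppy has no argument or local variable called a, so R looks for a
# in the environment where the function was defined
a = 1
val.a1 = huber.sloppy(2)  # uses a = 1: 2*1*2 - 1^2 = 3
a = 2
val.a2 = huber.sloppy(2)  # uses a = 2: |2| <= 2, so the result is 2^2 = 4

# huber uses its own argument a (default 1), whatever a is outside
val.safe = huber(2)       # still 3

# x.squared lives only inside a call to huber, so it is not visible here
local.leaked = exists("x.squared")  # FALSE
```

If `a` were not defined anywhere outside the function, calling `huber.sloppy()` would fail with an "object 'a' not found" error, whereas `huber()` never depends on outside state.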

Shakespeare’s complete works

Once more, as in lab (and lab/hw from Week 3), we’re going to look at the complete works of William Shakespeare from Project Gutenberg. We’ve put this text file up at http://www.stat.cmu.edu/~ryantibs/statcomp-S18/data/shakespeare.txt.
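As a reminder of the basic workflow, the sketch below builds a word table from a couple of lines of text. The two lines are a hypothetical stand-in so that the block runs offline, and the splitting pattern is just one reasonable choice; for the real data you would instead read the file with `readLines()` on the URL above.

```r
# Hypothetical stand-in for the lines of the file; for the real data, use
# readLines("http://www.stat.cmu.edu/~ryantibs/statcomp-S18/data/shakespeare.txt")
shakespeare.lines = c("To be, or not to be", "that is the question")

# Collapse the lines into one long string, then split on spaces and punctuation
shakespeare.text = paste(shakespeare.lines, collapse=" ")
shakespeare.words = strsplit(tolower(shakespeare.text),
                             split="[[:space:]]|[[:punct:]]")[[1]]

# Splitting leaves empty strings wherever two separators are adjacent
shakespeare.words = shakespeare.words[shakespeare.words != ""]

# Word table: counts of each unique word, with names in sorted order
shakespeare.wordtab = table(shakespeare.words)
shakespeare.wordtab
```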

Functions for word tables

get.dtmat.from.wordtabs = function(wordtab.list) {
  # First get all the unique words
  master.words = c() # Compute the master list here
  master.words = sort(master.words)
  
  # Then build the document-term matrix
  dt.mat = matrix(0, nrow=length(wordtab.list), ncol=length(master.words))
  rownames(dt.mat) = names(wordtab.list)
  colnames(dt.mat) = master.words
  for (i in seq_len(nrow(dt.mat))) {
    # Populate the ith row of dt.mat here
  }
  
  return(dt.mat)
}
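To make the target of this function concrete, here is a small sketch of an input and the document-term matrix we would expect back. The toy word tables and the name `expected.dt.mat` are hypothetical, and the matrix is typed out by hand rather than computed, so this shows the shape of the output only and not how to fill in the function.

```r
# Hypothetical toy input: a named list of word tables, one per document
wordtab.list = list(
  doc1 = table(c("be", "not", "be", "to")),
  doc2 = table(c("question", "the", "to"))
)

# Expected result: one row per document, one column per unique word (sorted),
# with each entry counting how often that word appears in that document
expected.dt.mat = matrix(c(2, 1, 0, 0, 1,
                           0, 0, 1, 1, 1),
                         nrow=2, byrow=TRUE,
                         dimnames=list(c("doc1", "doc2"),
                                       c("be", "not", "question", "the", "to")))

# One useful check: each row should sum to the word count of its document
rowSums(expected.dt.mat) == sapply(wordtab.list, sum)
```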