Name:
Andrew ID:
Collaborated with:

This lab is to be completed in class. You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as an Rmd file on Blackboard, by 11:59pm on the day of the lab.

There are Homework 4 questions dispersed throughout. These must be written up in a separate Rmd document, together with all Homework 4 questions from other labs. Your homework writeup must start as this one does: by listing your name, Andrew ID, and who you collaborated with. You must submit your own homework as a knit HTML file on Blackboard, by 6pm on Sunday October 2. This document contains 12 of the 45 total points for Homework 4.

Important remark on compiling the homework: many homework questions depend on variables that are defined in the surrounding lab. It is easiest just to copy and paste the contents of all these labs into one big Rmd file, with your lab solutions and homework solutions filled in, knit it, and submit the HTML file.

Huber function

Hw4 Q1 (4 points). Reproduce the plot of the Huber function that you see above at the start of the lab. Everything should match: the axes and title, the Huber curve (in black), the red dotted lines at the values -1 and 1, and the text “Linear”, “Quadratic”, “Linear”.

Hw4 Q2 (2 points). Your instructor computed the Huber function values \(\psi_a(x)\) over a bunch of different \(x\) values, stored in huber.vals and x.vals, respectively. However, the cutoff \(a\) was, let’s say, lost. Using huber.vals, x.vals, and the definition of the Huber function, you should be able to figure out the cutoff value \(a\), at least roughly. Estimate \(a\) and explain how you got there. (Hint: draw in R or on a piece of paper the quadratic function \(y=x^2\) on top of the Huber function; when are they different?)

x.vals = seq(0, 5, length=21)
huber.vals = c(0.0000, 0.0625, 0.2500, 0.5625, 1.0000, 1.5625, 2.2500,
               3.0625, 4.0000, 5.0625, 6.2500, 7.5625, 9.0000, 10.5000,
               12.0000, 13.5000, 15.0000, 16.5000, 18.0000, 19.5000, 
               21.0000)
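
For reference, the Huber function with cutoff \(a\) can be sketched in R as below. This assumes the definition \(\psi_a(x) = x^2\) for \(|x| \le a\) and \(\psi_a(x) = 2a|x| - a^2\) otherwise, so check it against the definition given in the lab before relying on it. The function name huber, the default a=1, and the cutoff of 2 used in the demonstration are illustrative choices, not part of the lab.

```r
# Sketch only: assumes psi_a(x) = x^2 when |x| <= a, and 2*a*|x| - a^2 otherwise
huber = function(x, a=1) {
  ifelse(abs(x) <= a, x^2, 2*a*abs(x) - a^2)
}

# Evaluate on a small grid, using a known (made-up) cutoff a = 2
x = seq(0, 5, length=21)
y = huber(x, a=2)

# Grid points where the Huber values differ from the plain quadratic x^2
x[y != x^2]
```

With a known cutoff of 2, the first grid point at which the two curves disagree is 2.25, so a comparison like this pins down the cutoff only up to the grid spacing.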

Get word table function

# get.wordtab: get a word table from text on the web
# Inputs:
# - str.url: string, specifying URL of a web page 
# - split: string, specifying what to split on. Default is the regex pattern
#   "[[:space:]]|[[:punct:]]"
# - tolower: boolean, TRUE if words should be converted to lower case before
#   the word table is computed. Default is TRUE
# Output: word table, i.e., vector with counts as entries and associated
#   words as names

get.wordtab = function(str.url, split="[[:space:]]|[[:punct:]]",
                       tolower=TRUE) {
  lines = readLines(str.url)
  text = paste(lines, collapse=" ")
  words = strsplit(text, split=split)[[1]]
  words = words[words != ""]
    
  # Convert to lower case, if we're asked to
  if (tolower) words = tolower(words)
  
  table(words)
}
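
To see what the steps inside get.wordtab() do without reading from the web, here is a small sketch that applies the same operations to two made-up lines of text. The sentences and the variable name toy.lines are invented for illustration.

```r
# Two made-up lines, standing in for what readLines() would return
toy.lines = c("The cat sat.", "The dog sat, too!")

# Same steps as in get.wordtab(): collapse, split, drop empty strings
text = paste(toy.lines, collapse=" ")
words = strsplit(text, split="[[:space:]]|[[:punct:]]")[[1]]
words = words[words != ""]

# Lower-case the words and count them
toy.wordtab = table(tolower(words))
toy.wordtab
```

Splitting on single spaces or punctuation marks leaves empty strings wherever two separators are adjacent (such as a period followed by a space), which is why the empty strings are filtered out before counting.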

# Add up the integers from 1 to 10 with a for() loop
sum = 0
for (i in 1:10) {
  sum = sum + i
}
sum
## [1] 55

Here, we’ve added up the integers from 1 to 10 and stored the result in the variable sum. Below, we demonstrate how to use a for() loop to populate a list of length 5, where the 1st element contains the numeric 1, the 2nd element contains the numeric 2, etc.

my.list = list()
for (i in 1:5) {
  my.list[[i]] = i
}
my.list
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 3
## 
## [[4]]
## [1] 4
## 
## [[5]]
## [1] 5

(You don’t have to do anything for this part; just study this code, so that you can solve the next question.)
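
The same pattern extends to a list whose entries are computed from the elements of a vector. The sketch below is one way to write it, using a made-up vector of strings and nchar() as a stand-in for whatever function you want to apply; the names strs and len.list are hypothetical.

```r
# Made-up input: a vector of strings
strs = c("apple", "kiwi", "banana")

# Pre-allocate a list with one slot per string
len.list = vector(mode="list", length=length(strs))

# Fill slot i with the result of applying a function to the ith string
for (i in seq_along(strs)) {
  len.list[[i]] = nchar(strs[i])
}

# Label the entries so they can be looked up by name
names(len.list) = strs
len.list[["kiwi"]]
```

Using seq_along(strs) rather than 1:length(strs) keeps the loop from running when the input vector is empty.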

Hw4 Q3 (6 points). Run your function get.wordtabs() on a vector of the four strings which specify the appropriate URLs to the speeches from Trump, Clinton, Pence, Kaine. Save the result as four.wordtabs. Check that its entries (which are word tables) are equal to trump.wordtab, clinton.wordtab, pence.wordtab, kaine.wordtab, respectively (which you computed previously). (Hint: use all(), as demonstrated in the “Function Basics” mini-lecture.)
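
As a rough illustration of the all() hint, the sketch below compares two small word tables built from made-up words. The names tab1 and tab2 are hypothetical, and the comparison assumes the two tables have the same length.

```r
# Two made-up word vectors with the same words in a different order
tab1 = table(c("a", "b", "a"))
tab2 = table(c("b", "a", "a"))

# table() sorts its names, so the two tables line up entry by entry
same.names = all(names(tab1) == names(tab2))
same.counts = all(tab1 == tab2)
same.names && same.counts
```

Checking the names as well as the counts guards against two tables that happen to have the same counts for different words.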

Then use get.wordtabs() to get the word tables for the Gingrich, Melania Trump, Obama, and Sanders speeches, which are stored in the files gingrich.txt, melania.txt, obama.txt, sanders.txt, at the usual base link http://www.stat.cmu.edu/~ryantibs/statcomp-F16/data/. Save the result as four.more.wordtabs, and plot these four word tables in a 2 x 2 grid, with Gingrich’s and Melania Trump’s in the top row, and Obama’s and Sanders’ in the bottom row.
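
One way to lay out four plots in a 2 x 2 grid is par(mfrow=c(2,2)), which fills the grid row by row. The sketch below uses made-up word tables rather than the speech data; the names toy.words and toy.tabs, and the axis labels, are illustrative only.

```r
# Four made-up word vectors, turned into a named list of word tables
toy.words = list(first=c("a", "b", "a"), second=c("c", "c", "d"),
                 third=c("e", "f", "f", "f"), fourth=c("g", "h"))
toy.tabs = lapply(toy.words, table)

# 2 x 2 grid, filled row by row: first and second on top, third and fourth below
par(mfrow=c(2,2))
for (i in 1:4) {
  plot(toy.tabs[[i]], main=names(toy.tabs)[i], xlab="Word", ylab="Count")
}
```

Because mfrow fills rows first, the order of the entries in the list determines which plot lands in which cell.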