Name:
Andrew ID:
Collaborated with:
This lab is to be completed in class. You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as an Rmd file on Blackboard, by 11:59pm on the day of the lab.
There are Homework 7 questions dispersed throughout. These must be written up in a separate Rmd document, together with all Homework 7 questions from other labs. Your homework writeup must start the same way as this lab: by listing your name, Andrew ID, and who you collaborated with. You must submit your own homework as a knit HTML file on Blackboard, by 6pm on Sunday October 23. This document contains 15 of the 30 total points for Homework 7.
In this section of the lab, you will fix a bunch of buggy function definitions. Probably the easiest workflow is to define the function in your console, and then run the sample commands: they will either give errors or produce the wrong outputs. Using any combination of reading the error messages, traceback(), and cat() or print() statements, you must find and fix the bugs. Sometimes it can also help to try multiple different inputs, i.e., try new function calls, rather than just looking at the sample calls given to you, in order to determine the bugs. You shouldn't show any of your debugging work in your final knitted answers (so, don't show calls to traceback(), and don't leave any cat() or print() calls in the final, fixed functions). (You don't have to do anything yet; this was just to set up this section of the lab.)
Below is a function called cols.with.ab.zeros(), but it has a few bugs. A few sample matrices are given below in mat and identity.mat, along with some sample calls that currently produce the wrong outputs. After fixing the bugs, the calls to cols.with.ab.zeros() should produce the outputs as described in comments.
# Function: cols.with.ab.zeros, to retrieve columns of matrix that have between
# a and b zeros, each
# Inputs:
# - my.mat: the original matrix
# - a: lower bound for number of zeros allowed; default is 0
# - b: upper bound for number of zeros allowed; default is Inf
# Output: the new matrix
cols.with.ab.zeros = function(my.mat, a=0, b=Inf) {
zeros.per.column = colSums(mat != 0)
i.to.keep = a <= zeros.per.column && zeros.per.column <= b
return(my.mat[i.to.keep,])
}
mat = matrix(c(0,0,1,0,1,1,1,1,1), 3, 3)
identity.mat = diag(1, 3)
cols.with.ab.zeros(mat) # Should get back original matrix
## [,1] [,2] [,3]
## [1,] 0 0 1
## [2,] 0 1 1
## [3,] 1 1 1
cols.with.ab.zeros(mat, a=1, b=2) # Should get back first 2 columns of mat
## [,1] [,2] [,3]
## [1,] 0 0 1
## [2,] 0 1 1
## [3,] 1 1 1
cols.with.ab.zeros(mat, a=2, b=2) # Should get just 1st column of mat; note
# this should still be a matrix though, and not a numeric vector!
## [,1] [,2] [,3]
cols.with.ab.zeros(identity.mat, a=2, b=2) # Should get back original matrix
## [,1] [,2] [,3]
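For reference, here is a sketch of one possible fixed version (not necessarily the only reasonable repair): count zeros in my.mat rather than non-zeros in the global mat, use the vectorized & rather than &&, and subset columns (not rows) with drop=FALSE so that a single selected column is still returned as a matrix.
cols.with.ab.zeros = function(my.mat, a=0, b=Inf) {
  zeros.per.column = colSums(my.mat == 0)                    # count zeros in my.mat, not non-zeros in mat
  i.to.keep = a <= zeros.per.column & zeros.per.column <= b  # & is vectorized; && only looks at one element
  return(my.mat[, i.to.keep, drop=FALSE])                    # select columns; drop=FALSE keeps it a matrix
}
With these changes, the four sample calls above return the original matrix, the first two columns of mat, the first column of mat (still as a matrix), and the original identity matrix, respectively.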
Below is a function called list.extractor(), but it has a few bugs. A sample list is given below in cool.list, along with some sample calls that give errors. After fixing the bugs, the calls to list.extractor() should produce the outputs as described in comments.
# Function: list.extractor, to extract elements of a list
# Inputs:
# - my.list: the original list
# - i.to.keep: vector of indices, corresponding to elements of the list we
# want to keep. Default is NULL, in which case this argument is ignored
# - i.to.remove: vector of indices, corresponding to elements of the list we
# want to remove. Default is NULL, in which case this argument is ignored.
# NOTE: if both i.to.keep and i.to.remove are non-NULL, then the first
# one should take precedence (i.e., we don't remove anything)
# Output: the new list
list.extractor = function(my.list, i.to.keep=NULL, i.to.remove=NULL) {
if (i.to.keep!=NULL) {
return(my.list[[i.to.keep]])
}
if (i.to.remove!=NULL) {
return(my.list[[-i.to.remove]])
}
}
cool.list = list(ints=1:10, lets=letters[1:8], fracs=1:7/7,
bools=sample(c(TRUE,FALSE), 5, replace=TRUE))
list.extractor(cool.list, i.to.keep=c(1,3)) # Should get list with ints, fracs
## Error in if (i.to.keep != NULL) {: argument is of length zero
list.extractor(cool.list, i.to.remove=4) # Should get list without bools
## Error in if (i.to.keep != NULL) {: argument is of length zero
list.extractor(cool.list, i.to.keep=2:4, i.to.remove=4) # Should get list with
# lets, fracs, and bools (the i.to.remove argument should be ignored)
## Error in if (i.to.keep != NULL) {: argument is of length zero
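For reference, a sketch of one possible fixed version: comparing against NULL with != produces a zero-length logical (hence the errors above), so the checks should use is.null(); and extracting several elements of a list requires single brackets [ ], since [[ ]] can only grab one element. Checking i.to.keep first automatically gives it precedence over i.to.remove.
list.extractor = function(my.list, i.to.keep=NULL, i.to.remove=NULL) {
  if (!is.null(i.to.keep)) {       # is.null(), not != NULL
    return(my.list[i.to.keep])     # [ ] returns a sub-list of the kept elements
  }
  if (!is.null(i.to.remove)) {
    return(my.list[-i.to.remove])  # [ ] with negative indices drops those elements
  }
}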
Hw7 Q1 (6 points). Below we copied the random.walk() function from Lab 6w, but introduced a few bugs along the way. Some sample calls are given below that produce errors. After fixing the bugs, the calls to random.walk() should produce the outputs as described in comments. Briefly describe what bugs you found, how you found them, and how you fixed them.
random.walk = function(x.start=5, plot.walk=TRUE, seed=NULL) {
if (!is.null(seed)) set.seed(seed) # Set the seed, if we need to
x.vals = x.start
while (TRUE) {
r = runif(1, -2, 1)
if (tail(x.vals+r,1) <= 0) break
else x.vals = c(x.vals, x.vals+r)
}
if (plot.walk <- TRUE)
plot(x.vals, xlab="Iteration", ylab="Random walk values", type="o")
return(x.vals=x.vals, num.steps=length(x.vals))
}
random.walk(x.start=5, seed=3)$num.steps # Should print 8 (this is how many
# steps it took the random walk), and produce a plot
## Error in return(x.vals = x.vals, num.steps = length(x.vals)): multi-argument returns are not permitted
random.walk(x.start=10, seed=7)$num.steps # Should print 14 (this is how many
# steps it took the random walk), and produce a plot
## Error in return(x.vals = x.vals, num.steps = length(x.vals)): multi-argument returns are not permitted
random.walk(x.start=10, plot.walk=FALSE, seed=7)$num.steps # Should print 14
# (this is how many steps it took the random walk), and not produce a plot
## Error in return(x.vals = x.vals, num.steps = length(x.vals)): multi-argument returns are not permitted
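If you want something to check your own answer against, here is a sketch of one possible corrected version: the walk should append the last value plus r (not the whole vector plus r), the plotting condition should test plot.walk rather than assign TRUE to it, and return() should be given a single list rather than multiple arguments.
random.walk = function(x.start=5, plot.walk=TRUE, seed=NULL) {
  if (!is.null(seed)) set.seed(seed) # Set the seed, if we need to
  x.vals = x.start
  while (TRUE) {
    r = runif(1, -2, 1)
    if (tail(x.vals, 1) + r <= 0) break            # stop once the walk would cross zero
    else x.vals = c(x.vals, tail(x.vals, 1) + r)   # append one new value, not a whole shifted vector
  }
  if (plot.walk)                                   # test the flag; plot.walk <- TRUE would always plot
    plot(x.vals, xlab="Iteration", ylab="Random walk values", type="o")
  return(list(x.vals=x.vals, num.steps=length(x.vals)))  # wrap the outputs in a single list
}
With these fixes, the three calls above should return num.steps of 8, 14, and 14, as stated in the comments.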
plot(x <- seq(0,20,length=101), dgamma(x, shape=1, scale=2),
type="l", xlab="x", ylab="Density", main="Gamma densities")
lines(x, dgamma(x, shape=2, scale=2), col=2)
lines(x, dgamma(x, shape=3, scale=2), col=3)
lines(x, dgamma(x, shape=4, scale=2), col=4)
legend("topright", legend=paste0("shape=", 1:4, ", scale=", 2),
lty=1, col=1:4)
It can be shown that the gamma distribution with shape parameter \(a\) and scale parameter \(s\) has mean \(\mu=as\) and variance \(\sigma^2=as^2\). If the mean \(\mu\) and variance \(\sigma^2\) are known, then we can solve for the parameters: \[ a = \frac{\mu^2}{\sigma^2}, \;\;\; s = \frac{\sigma^2}{\mu}. \] (You don't have to do anything yet; this was just an introduction to the gamma distribution.)
Below is a function estimate.gamma.params() to estimate the gamma parameters. You have to replace the lines shape = 1 and scale = 1 with proper estimates of the shape and scale parameters, based on the formulae above, which show how the shape and scale parameters depend on the mean and variance. After doing so, run estimate.gamma.params() on the cats heart weight data, cats$Hwt, and check that the shape parameter is estimated to be 19.06531, and the scale parameter to be 0.5575862.
estimate.gamma.params = function(dat.vec) {
mean.dat = mean(dat.vec)
var.dat = var(dat.vec)
shape = 1 # TODO: estimate shape from mean.dat, var.dat
scale = 1 # TODO: estimate scale from mean.dat, var.dat
return(list(shape=shape, scale=scale))
}
library(MASS)
estimate.gamma.params(cats$Hwt)
## $shape
## [1] 1
##
## $scale
## [1] 1
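For reference, the completed function could look like the sketch below, plugging the sample mean and variance into the formulae \(a = \mu^2/\sigma^2\) and \(s = \sigma^2/\mu\).
estimate.gamma.params = function(dat.vec) {
  mean.dat = mean(dat.vec)
  var.dat = var(dat.vec)
  shape = mean.dat^2 / var.dat   # a = mu^2 / sigma^2
  scale = var.dat / mean.dat     # s = sigma^2 / mu
  return(list(shape=shape, scale=scale))
}
estimate.gamma.params(cats$Hwt)  # should give a shape of about 19.065 and a scale of about 0.558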
Then plot the estimated gamma density on top of the cats$Hwt data (hint: use dgamma()). Does this density appear to be a good fit?
Hw7 Q2 (9 points). Recall the jackknife procedure from Lab 7w: here, we'll use it to compute standard errors on the shape and scale parameters that we estimated for the gamma distribution of cats heart weights. The idea is to leave one data point (here, one cat) out at a time, recompute the shape parameter, then take the standard deviation of those values and multiply by \(\sqrt{(n-1)^2/n}\), where \(n\) is the number of data points (here 144); the same procedure is applied for the scale parameter.
Below is a function estimate.gamma.std.errs() that runs the jackknife procedure for the shape and scale parameters. It uses sapply() (rather than a for() loop, as you did in Lab 7w). When estimate.gamma.std.errs() is run on the cats$Hwt data, we encounter an error, and get the very unhelpful error message Error: is.atomic(x) is not TRUE, as you can see below. Run this line in your console, then call traceback() and report what you see.
Using traceback(), cat() or print() statements, and other investigations (just running chunks of code in your console), explain why the bug occurs, and summarize your process for finding it. (Hint: there is a fairly nasty clash of data types between the return value of sapply() and what apply() is expecting in the next line. If you'd like, you can also use browser(), which you'll learn about in the next mini-lecture.) Then fix the bug, and check that estimate.gamma.std.errs() on the cats$Hwt data gives a standard error of 2.74062279 for the shape parameter, and 0.07829436 for the scale parameter.
Lastly, estimate the gamma shape and scale parameters, as well as jackknife standard errors of these parameters, for just the male cats in the cats heart weight data. Then do the same for just the female cats. (Hint: look at cats$Sex.) Now investigate whether the two distributions have statistically significantly different gamma parameters. You can do this by computing rough 95% confidence intervals for (say) the gamma parameters for the male cats: the confidence interval for the male shape parameter is defined by the estimated male shape parameter plus or minus two (male) standard errors. Does the estimated female shape parameter fall in this interval? Answer the same question for the scale parameter.
# Function: estimate.gamma.std.errs, to compute jackknife standard errors for
# the estimated gamma shape and scale parameters
# Input:
# - dat.vec: vector of data points
# Output: vector of standard errors for the shape and scale parameters
estimate.gamma.std.errs = function(dat.vec) {
  n = length(dat.vec)
  jack.estimates = sapply(1:n, function(i) {
    estimate.gamma.params(dat.vec[-i])
  })
  sd.of.ests = apply(jack.estimates, 1, sd)
  return(sqrt((n-1)^2/n) * sd.of.ests)
}
estimate.gamma.std.errs(cats$Hwt)
## Error: is.atomic(x) is not TRUE
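For reference, one way the fix could look: estimate.gamma.params() returns a list, so sapply() ends up building a matrix whose cells are list elements rather than plain numbers, and apply()/sd() cannot compute standard deviations of such an object (hence the is.atomic() complaint). Wrapping the inner result in unlist() gives sapply() named numeric vectors, which it simplifies into an ordinary numeric matrix. This is a sketch of one possible fix, not the only one.
estimate.gamma.std.errs = function(dat.vec) {
  n = length(dat.vec)
  jack.estimates = sapply(1:n, function(i) {
    unlist(estimate.gamma.params(dat.vec[-i]))  # named numeric vector, not a list
  })
  sd.of.ests = apply(jack.estimates, 1, sd)     # row-wise standard deviations (shape, scale)
  return(sqrt((n-1)^2/n) * sd.of.ests)          # jackknife scaling factor
}
estimate.gamma.std.errs(cats$Hwt)  # should give roughly 2.7406 (shape) and 0.0783 (scale)
For the last part of the question, the male/female comparison might be organized along these lines (the object names below are just illustrative):
hwt.male = cats$Hwt[cats$Sex == "M"]     # heart weights of the male cats
hwt.female = cats$Hwt[cats$Sex == "F"]   # heart weights of the female cats
est.male = estimate.gamma.params(hwt.male)
se.male = estimate.gamma.std.errs(hwt.male)
est.female = estimate.gamma.params(hwt.female)
se.female = estimate.gamma.std.errs(hwt.female)
shape.interval = est.male$shape + c(-2, 2) * se.male["shape"]  # rough 95% CI: estimate +/- 2 standard errors
est.female$shape  # does the female shape estimate fall inside shape.interval?
scale.interval = est.male$scale + c(-2, 2) * se.male["scale"]
est.female$scale  # does the female scale estimate fall inside scale.interval?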