Histograms and Heatmaps

Statistical Computing, 36-350

Wednesday September 21, 2016

Plotting a histogram

To plot a histogram of a numeric vector, use hist()

trump.lines = readLines("http://www.stat.cmu.edu/~ryantibs/statcomp-F16/data/trump.txt")
trump.words = strsplit(paste(trump.lines, collapse=" "),
                       split="[[:space:]]|[[:punct:]]")[[1]]
trump.words = tolower(trump.words[trump.words != ""])
trump.wlens = nchar(trump.words)
hist(trump.wlens)
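
Besides drawing the plot, hist() invisibly returns an object of class "histogram" that records the bins it used. A minimal sketch, using a small made-up vector toy.wlens (an assumption, standing in for trump.wlens so the example runs without downloading anything):

```r
# Hypothetical word lengths, standing in for trump.wlens
toy.wlens = c(1, 2, 2, 3, 3, 3, 4, 4, 5, 7)
h = hist(toy.wlens, breaks=0:8, plot=FALSE) # plot=FALSE: compute bins only
class(h)   # "histogram"
h$breaks   # Bin edges: 0, 1, ..., 8
h$counts   # Number of values falling in each bin
sum(h$counts) == length(toy.wlens) # Every value lands in exactly one bin
```

Inspecting the returned object is a convenient way to see which bins hist() chose before adjusting breaks.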

Histogram options

Several options are available as arguments to hist(), such as col, freq, breaks, xlab, ylab, main

hist(trump.wlens, col="pink", freq=TRUE) # Frequency scale, default

hist(trump.wlens, col="pink", freq=FALSE) # Probability scale

hist(trump.wlens, col="pink", freq=FALSE, breaks=0:20,
     xlab="Word length", main="Trump word lengths")
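
On the probability scale (freq=FALSE), bar heights are densities, so the bar areas (height times bin width) sum to 1, whatever the bin width. A small sketch checking this, again on a made-up vector toy.wlens (an assumption, not the course data):

```r
# Hypothetical word lengths, standing in for trump.wlens
toy.wlens = c(1, 2, 2, 3, 3, 3, 4, 4, 5, 7)
h = hist(toy.wlens, breaks=seq(0, 8, by=2), plot=FALSE) # Bins of width 2
h$counts  # Bar heights used when freq=TRUE
h$density # Bar heights used when freq=FALSE: counts / (n * bin width)
sum(h$density * diff(h$breaks)) # Total bar area
```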

Adding a density curve to a histogram

To estimate a density from a numeric vector, use density(). This returns an object of class "density", which is a list with components x and y (among others), so we can call lines() directly on the returned object

density.est = density(trump.wlens, adjust=2) # Twice the default bw
class(density.est)
## [1] "density"
names(density.est)
## [1] "x"         "y"         "bw"        "n"         "call"      "data.name"
## [7] "has.na"
hist(trump.wlens, col="pink", freq=FALSE, breaks=0:20,
     xlab="Word length", main="Trump word lengths")
lines(density.est, lwd=3)
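
To see what adjust does, we can compare the bandwidths stored in the returned objects. A rough sketch on simulated data (toy.x is a hypothetical vector, used only so the example is self-contained):

```r
set.seed(1)
toy.x = rnorm(200)              # Hypothetical data, not the word lengths
d1 = density(toy.x)             # Default bandwidth
d2 = density(toy.x, adjust=2)   # Default bandwidth multiplied by 2
d2$bw / d1$bw                   # Ratio of the two bandwidths
length(d2$x)                    # Number of grid points where the density is evaluated
sum(d2$y) * diff(d2$x[1:2])     # Riemann sum: area under the curve, close to 1
```

A larger bandwidth gives a smoother curve; a smaller one follows the data more closely.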

Adding a histogram to an existing plot

To add a histogram to an existing plot (say, another histogram), use hist() with add=TRUE

hist(trump.wlens, col="pink", freq=FALSE, breaks=0:20,
     xlab="Word length", main="Trump word lengths")
hist(trump.wlens + 2, col=rgb(0,0.5,0.5,0.5), # Note the use of transparency
     freq=FALSE, breaks=0:20, add=TRUE)
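
When overlaying histograms, both calls should use the same breaks, and the breaks must span all of the data, since hist() stops with an error if some values fall outside them. One way to arrange this, sketched on simulated vectors (toy.a, toy.b and shared.breaks are hypothetical names, not from the course data):

```r
set.seed(1)
toy.a = rpois(500, lambda=4) + 1         # Hypothetical positive integers
toy.b = toy.a + 2                        # Shifted copy, as in the slide above
shared.breaks = 0:max(toy.a, toy.b)      # Covers the range of both vectors
hist(toy.a, col="pink", freq=FALSE, breaks=shared.breaks,
     xlab="Value", main="Two overlaid histograms")
hist(toy.b, col=rgb(0,0.5,0.5,0.5),      # Fourth argument of rgb() is the opacity
     freq=FALSE, breaks=shared.breaks, add=TRUE)
```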

Plotting a heatmap

To plot a heatmap of a numeric matrix, use image()

(mat = 1:5 %o% 6:10) # %o% gives the outer product
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    6    7    8    9   10
## [2,]   12   14   16   18   20
## [3,]   18   21   24   27   30
## [4,]   24   28   32   36   40
## [5,]   30   35   40   45   50
image(mat) # Red means low, white means high (heat.colors(12), the default before R 4.0.0)
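
The %o% operator is shorthand for outer() with its default of multiplication, so entry [i,j] of the matrix is the product of the i-th element of 1:5 and the j-th element of 6:10. A quick check (the name toy.mat is ours, for illustration):

```r
toy.mat = 1:5 %o% 6:10
identical(toy.mat, outer(1:5, 6:10)) # Same matrix either way
toy.mat[3, 2] == 3 * 7               # Entry [i,j] is (1:5)[i] * (6:10)[j]
dim(toy.mat)                         # 5 rows, 5 columns
```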

Orientation of image()

image() lays out the matrix elements, written as (row, column), in the following order, so that rows run along the horizontal axis and columns run up the vertical axis:

\[\begin{array}{cccc} (1,\text{ncol}) & (2, \text{ncol}) & \ldots & (\text{nrow},\text{ncol}) \\ \vdots & & & \\ (1,2) & (2,2) & \ldots & (\text{nrow}, 2) \\ (1,1) & (2,1) & \ldots & (\text{nrow}, 1) \end{array}\]

This is a 90-degree counterclockwise rotation of the “usual” printed order. Therefore, if you want the displayed heatmap to follow the usual order, you must rotate the matrix 90 degrees clockwise before passing it to image(). (Reverse the row order, then take the transpose)

clockwise90 = function(a) { t(a[nrow(a):1,]) }
image(clockwise90(mat))
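
To convince ourselves that the rotation does what we want, we can try it on a small non-square matrix, where the change of shape is easy to see. A minimal sketch (small and rot are hypothetical names; clockwise90() is repeated so the example stands alone):

```r
clockwise90 = function(a) { t(a[nrow(a):1,]) }
small = matrix(1:6, nrow=2) # 2 x 3 matrix with rows (1,3,5) and (2,4,6)
small
rot = clockwise90(small)    # 3 x 2 matrix with rows (2,1), (4,3), (6,5)
dim(rot)
rot
```

The first row of rot is the first column of small read from bottom to top, which is what a clockwise quarter turn does.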

Color scale

With heat.colors(12), the default before R 4.0.0, image() uses a red-to-white color scale (newer versions of R default to a yellow-to-red palette). But the col argument can take any vector of colors. Functions gray.colors(), heat.colors(), terrain.colors(), rainbow(), etc., all return contiguous color vectors of a given length

scores.mat = as.matrix(read.table("http://www.stat.cmu.edu/~ryantibs/statcomp-F16/data/scores.dat"))
image(scores.mat) # Default color scale (col=heat.colors(12) before R 4.0.0)

image(scores.mat, col=heat.colors(20)) # More colors

image(scores.mat, col=terrain.colors(20)) # Terrain colors

image(scores.mat, col=cm.colors(20)) # Cyan-magenta colors
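
The palette functions simply return character vectors of color codes, so they can be inspected, stored, or combined like any other vector. A short sketch, using the outer product matrix from earlier instead of the scores data (cols, gray.pal and toy.mat are hypothetical names):

```r
cols = heat.colors(5)
class(cols)    # "character"
length(cols)   # One color code per requested color
cols           # Hexadecimal color codes
gray.pal = gray.colors(20)     # A 20-color grayscale palette
toy.mat = 1:5 %o% 6:10
image(toy.mat, col=gray.pal)   # Any color vector can be passed as col
```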

Drawing contour lines

To draw contour lines from a numeric matrix, use contour(); to add contours to an existing plot (such as a heatmap), use contour() with add=TRUE

contour(scores.mat)

image(scores.mat, col=terrain.colors(20))
contour(scores.mat, add=TRUE)
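
By default contour() picks the contour heights itself; the levels argument lets us choose them, and contourLines() returns the same curves as a list of coordinates instead of drawing them. A rough sketch on the outer product matrix from earlier (toy.mat, my.levels and cl are hypothetical names, and the levels are arbitrary):

```r
toy.mat = 1:5 %o% 6:10                 # Values range from 6 to 50
my.levels = c(11, 22, 33, 44)          # Arbitrary heights inside that range
image(toy.mat, col=terrain.colors(20))
contour(toy.mat, levels=my.levels, add=TRUE)
cl = contourLines(toy.mat, levels=my.levels) # List; each element has level, x, y
sapply(cl, function(l) l$level)        # Height of each returned curve
```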