Statistical Computing, 36-350
Monday September 23, 2019
nchar(), substr(): functions for substring extractions and replacementsstrsplit(), paste(): functions for splitting and combining stringstable(): function to get word counts, useful way of summarizing text dataPlot basics
Base R has a set of powerful plotting tools. An overview:
plot(): generic plotting functionpoints(): add points to an existing plotlines(), abline(): add lines to an existing plottext(), legend(): add text to an existing plotrect(), polygon(): add shapes to an existing plothist(), image(): histogram and heatmapheat.colors(), topo.colors(), etc: create a color vectordensity(): estimate density, which can be plottedcontour(): draw contours, or add to existing plotcurve(): draw a curve, or add to existing plotTo make a scatter plot of one variable versus another, use plot()
The type argument controls the plot type. Default is p for points; set it to l for lines
Try also b or o, for both points and lines
The main argument controls the title; xlab and ylab are the x and y labels
Use the pch argument to control point type
Try also 20 for small filled circles, or "." for single pixels
Use the lty argument to control the line type, and lwd to control the line width
Use the col argument to control the color. Can be:
The function colors() returns a string vector of the available colors
To set up a plotting grid of arbitrary dimension, use the par() function, with the argument mfrow. Note: in general this will affect all following plots! (Except in separate R Markdown code chunks …)
par(mfrow=c(2,2)) # Grid elements are filled by row
plot(x, y, main="Red cubic", pch=20, col="red")
plot(x, y, main="Blue cubic", pch=20, col="blue")
plot(rev(x), y, main="Flipped green", pch=20, col="green")
plot(rev(x), y, main="Flipped purple", pch=20, col="purple")Default margins in R are large (and ugly); to change them, use the par() function, with the argument mar. Note: in general this will affect all following plots! (Except in separate R Markdown code chunks …)
par(mfrow=c(2,2), mar=c(4,4,2,0.5))
plot(x, y, main="Red cubic", pch=20, col="red")
plot(x, y, main="Blue cubic", pch=20, col="blue")
plot(rev(x), y, main="Flipped green", pch=20, col="green")
plot(rev(x), y, main="Flipped purple", pch=20, col="purple")Use the pdf() function to save a pdf file of your plot, in your R working directory. Use getwd() to get the working directory, and setwd() to set it
## [1] "/Users/ryantibshirani/Dropbox/Teaching/f19-350/lectures/plotting"pdf(file="noisy_cubics.pdf", height=7, width=7) # Height, width are in inches
par(mfrow=c(2,2), mar=c(4,4,2,0.5))
plot(x, y, main="Red cubic", pch=20, col="red")
plot(x, y, main="Blue cubic", pch=20, col="blue")
plot(rev(x), y, main="Flipped green", pch=20, col="green")
plot(rev(x), y, main="Flipped purple", pch=20, col="purple")
graphics.off()Also, use the jpg() and png() functions to save jpg and png files
The main tools for this are:
points(): add points to an existing plotlines(), abline(): add lines to an existing plottext(), legend(): add text to an existing plotrect(), polygon(): add shapes to an existing plotYou’ll get practice with this on lab. Pay attention to layers—they work just like they would if you were painting a picture by hand
Histograms and heatmaps
To plot a histogram of a numeric vector, use hist()
trump.lines =
  readLines("http://www.stat.cmu.edu/~ryantibs/statcomp-F19/data/trump.txt")
trump.words = strsplit(paste(trump.lines, collapse=" "),
                       split="[[:space:]]|[[:punct:]]")[[1]]
trump.words = tolower(trump.words[trump.words != ""])
trump.wlens = nchar(trump.words)
hist(trump.wlens)Several options are available as arguments to hist(), such as col, freq, breaks, xlab, ylab, main
hist(trump.wlens, col="pink", freq=FALSE, # Probability scale, and more options
     breaks=0:20, xlab="Word length", main="Trump word lengths") To add a histogram to an existing plot (say, another histogram), use hist() with add=TRUE
hist(trump.wlens, col="pink", freq=FALSE, breaks=0:20, 
     xlab="Word length", main="Trump word lengths")
hist(trump.wlens + 5, col=rgb(0,0.5,0.5,0.5), # Note: using a transparent color
     freq=FALSE, breaks=0:20, add=TRUE)To estimate a density from a numeric vector, use density(). This returns a list; it has components x and y, so we can actually call lines() directly on the returned object
## [1] "density"## [1] "x"         "y"         "bw"        "n"         "call"      "data.name"
## [7] "has.na"hist(trump.wlens, col="pink", freq=FALSE, breaks=0:20, 
     xlab="Word length", main="Trump word lengths")
lines(density.est, lwd=3)To plot a heatmap of a numeric matrix, use image()
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    6    7    8    9   10
## [2,]   12   14   16   18   20
## [3,]   18   21   24   27   30
## [4,]   24   28   32   36   40
## [5,]   30   35   40   45   50image()The orientation of image() is to plot the heatmap according to the following order, in terms of the matrix elements:
\[\begin{array}{cccc} (1,\text{ncol}) & (2, \text{ncol}) & \ldots & (\text{nrow},\text{ncol}) \\ \vdots & & & \\ (1,2) & (2,2) & \ldots & (\text{nrow},2) \\ (1,1) & (2,1) & \ldots & (\text{nrow},1) \end{array}\]
This is a 90 degrees counterclockwise rotation of the “usual” printed order for a matrix:
\[\begin{array}{cccc} (1,1) & (1,2) & \ldots & (1,\text{ncol}) \\ (2,1) & (2,2) & \ldots & (2,\text{ncol}) \\ \vdots & & & \\ (\text{nrow},1) & (\text{nrow},2) & \ldots & (\text{nrow},\text{ncol}) \end{array}\]
Therefore, if you want the displayed heatmap to follow the usual order, you must rotate the matrix 90 degrees clockwise before passing it in to image(). (Equivalently: reverse the row order, then take the transpose.) Convenient way of doing so:
The default is to use a red-to-white color scale in image(). But the col argument can take any vector of colors. Built-in functions gray.colors(), rainbow(), heat.colors(), topo.colors(), terrain.colors(), cm.colors() all return continguous color vectors of given length
phi = dnorm(seq(-2,2,length=50))
normal.mat = phi %o% phi
image(normal.mat) # Default is col=heat.colors(12)To draw contour lines from a numeric matrix, use contour(); to add contours to an existing plot (like, a heatmap), use contour() with add=TRUE
Curves, surfaces, and colors
To draw a curve of a function, use curve()
To add a curve to an existing plot, use curve() with add=TRUE
n = 50
set.seed(0)
x = sort(runif(n, min=-2, max=2))
y = x^3 + rnorm(n)
plot(x, y)
curve(x^3, lwd=3, col="red", add=TRUE)To add a rug to an existing plot (just tick marks, for where the x points occur), use rug()
To draw a surface, use surface(), available at http://www.stat.cmu.edu/~ryantibs/statcomp-F19/lectures/surface.R. (This is a function written by Professor Tibs, relying on the built-in persp() function)
source("http://www.stat.cmu.edu/~ryantibs/statcomp-F19/lectures/surface.R")
surface(x^3 + y^3, from.x=-3, to.x=3, from.y=-3, to.y=3)surface(x^3 + y^3, from.x=-3, to.x=3, from.y=-3, to.y=3,
        theta=25, phi=15, col=terrain.colors(30), 
        ticktype="detailed", mar=c(2,2,2,2))To add points to a surface, save the output of surface(). Then use trans3d(), to transform (x,y,z) coordinates to (x,y) coordinates that you can pass to points()
persp.mat = surface(x^3 + y^3, from.x=-3, to.x=3, from.y=-3, to.y=3,
                    theta=25, phi=15, col=rgb(0,0,1,alpha=0.2),
                    ticktype="detailed", mar=c(2,2,2,2))
n = 500
x = runif(n, -3, 3)
y = runif(n, -3, 3)
z = x^3 + y^3 + 5*rnorm(n)
xy.list = trans3d(x, y, z, persp.mat)
points(xy.list, pch=20)Color palettes are functions for creating vectors of contiguous colors, just like gray.colors(), rainbow(), heat.colors(), terrain.colors(), topo.colors(), cm.colors(). Given a number n, each of these functions just returns a vector of colors (names, stored as strings) of length n
n = 50
plot(0, 0, type="n", xlim=c(1,n), ylim=c(1,6))
points(1:n, rep(6,n), col=gray.colors(n), pch=19)
points(1:n, rep(5,n), col=rainbow(n), pch=19)
points(1:n, rep(4,n), col=heat.colors(n), pch=19)
points(1:n, rep(3,n), col=terrain.colors(n), pch=19)
points(1:n, rep(2,n), col=topo.colors(n), pch=19)
points(1:n, rep(1,n), col=cm.colors(n), pch=19)To create a custom palette, that interpolates between a set of base colors, colorRampPalette()
## [1] "function"Coloring points according to the value of some variable can just be done with a bit of indexing, and the tools you already know about colors
# Function to retrieve a color according to a value
# - val: the value in question
# - lim: a vector of length 2, lower and upper limits for possible values
# - col.vec: the color vector to choose from
get.col.from.val = function(val, lim, col.vec) {
  col.vec[(val-lim[1])/(lim[2]-lim[1]) * (length(col.vec)-1) + 1]
}
# Let's color points according to y value
col.vec = heat.colors(30)
lim = c(-1, 1)
theta = seq(0, 6*pi, length=200)
plot(theta, sin(theta), type="o", pch=19, 
     col=get.col.from.val(sin(theta), lim, col.vec))# Another example, now in 3d
persp.mat = surface(x^3 + y^3, from.x=-3, to.x=3, from.y=-3, to.y=3,
                    theta=25, phi=15, col=rgb(1,1,1,alpha=0.2),
                    ticktype="detailed", mar=c(2,2,2,2))
# Let's color points according to z value
col.vec = terrain.colors(30)
lim = c(min(z), max(z))
xy.list = trans3d(x, y, z, persp.mat)
points(xy.list, pch=20, col=get.col.from.val(z, lim, col.vec))plot(): generic plotting functionpoints(): add points to an existing plotlines(), abline(): add lines to an existing plottext(), legend(): add text to an existing plotrect(), polygon(): add shapes to an existing plothist(), image(): histogram and heatmapheat.colors(), topo.colors(), etc: create a color vectordensity(): estimate density, which can be plottedcontour(): draw contours, or add to existing plotcurve(): draw a curve, or add to existing plot