Function Design

Statistical Computing, 36-350

Friday September 30, 2016

Environment: what the function can see and do

Environment examples

x = 7
y = c("A","C","G","T","U")
adder = function(y) { x = x+y; x }
adder(1)
## [1] 8
x
## [1] 7
y
## [1] "A" "C" "G" "T" "U"
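To see why the global x is untouched, it can help to check what exists locally inside the function at each point. The sketch below uses a hypothetical helper, show.scope(), which is not part of the original example: before the assignment, x is found only by looking outside the function; the assignment then creates a separate local x.

```r
x = 7

# Hypothetical helper, for illustration only
show.scope = function(y) {
  # Is there an x in this function's own environment yet?
  local.before = exists("x", envir=environment(), inherits=FALSE)
  # The right-hand x is looked up outside; the left-hand x is created locally
  x = x + y
  local.after = exists("x", envir=environment(), inherits=FALSE)
  c(local.before=local.before, local.after=local.after)
}

show.scope(1)
## local.before  local.after
##        FALSE         TRUE
x
## [1] 7
```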

(Continued)

circle.area = function(r) { pi*r^2 }
circle.area(1:3)
## [1]  3.141593 12.566371 28.274334
true.pi = pi
pi = 3 # Valid in 1800s Indiana
circle.area(1:3)
## [1]  3 12 27
pi = true.pi # Restore sanity
circle.area(1:3)
## [1]  3.141593 12.566371 28.274334

Relying on variables outside of the function’s environment
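One possible way to reduce this kind of dependence, sketched here as an assumption rather than as the course's prescribed fix, is to pass the value in as an argument with an explicit default. The argument name pi.value is made up for this illustration; base::pi refers to the built-in constant regardless of any pi defined in the workspace.

```r
# Sketch: the constant comes in as an argument instead of being
# looked up in whatever environment the function happens to see
circle.area = function(r, pi.value=base::pi) { pi.value*r^2 }

pi = 3 # Redefining pi in the workspace no longer matters
circle.area(1:3)
## [1]  3.141593 12.566371 28.274334
rm(pi) # Remove the workspace copy, so pi means base::pi again
```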

Top-down function design

  1. Start with the big-picture view of the task
  2. Break the task into a few big parts
  3. Figure out how to fit the parts together
  4. Repeat this for each part

Start off with a code sketch

You can write top-level code, right away, for your function’s design:

# Not actual code
big.job = function(lots.of.arguments) {
  first.result = first.step(some.of.the.args)
  second.result = second.step(first.result, more.of.the.args)
  final.result = third.step(second.result, rest.of.the.args)
  return(final.result)
}

Once the design is written down, write the sub-functions (here first.step(), second.step(), and third.step()). The process is often iterative: you may write a sub-function, realize the design needs to change, and go back to revise the sketch.
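As a small runnable instance of this pattern, here is a sketch with the three sub-functions filled in. The task (drop missing values, shift by a center, report the mean absolute deviation) and the arguments center and digits are invented for illustration only.

```r
# Hypothetical sub-functions, each doing one small job
first.step = function(x) { x[!is.na(x)] }          # Drop missing values
second.step = function(x, center) { x - center }   # Shift by a given center
third.step = function(x, digits) { round(mean(abs(x)), digits) }

# Top-level function: same shape as the sketch above
big.job = function(x, center=0, digits=2) {
  first.result = first.step(x)
  second.result = second.step(first.result, center)
  final.result = third.step(second.result, digits)
  return(final.result)
}

big.job(c(1, NA, 3, 5), center=3)
## [1] 1.33
```

Because each step is its own function, each one can be tested and rewritten separately without touching big.job().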

Example of a code sketch

Suppose we want (or are instructed) to write a function that takes a vector of strings, each of which is a URL, builds a document-term matrix from the documents at those URLs, computes correlations between the documents, and, as a side effect if asked, prints a summary to the console.

Sounds complicated! But let’s write a code sketch:

compare.docs = function(str.urls, split="[[:space:]]|[[:punct:]]",
                        tolower=TRUE, keep.numbers=FALSE, print.summary=TRUE) {
  # Compute the document-term matrix
  dt.mat = get.dt.mat(str.urls, split, tolower, keep.numbers)
  # Compute correlations
  cor.mat = cor(t(dt.mat))
  # Print a summary, if we're asked to
  if (print.summary) print.dt.mat(dt.mat)
  # Return a list with document-term matrix and correlations
  return(list(dt.mat=dt.mat, cor.mat=cor.mat))
}

(Continued)

That wasn’t too bad, and now we know exactly what to work on next! More code sketching:

get.dt.mat = function(str.urls, split="[[:space:]]|[[:punct:]]",
                      tolower=TRUE, keep.numbers=FALSE) {
  # First, compute all the individual word tables
  wordtabs = get.wordtabs(str.urls, split, tolower, keep.numbers)
  # Then, build the document-term matrix from these, and return it
  return(dt.mat.from.wordtabs(wordtabs))
}

Luckily, we’ve already written get.wordtabs(), so we only need to write dt.mat.from.wordtabs(). We also need to sketch and write print.dt.mat().
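A minimal sketch of the two remaining functions might look like the following. It assumes that get.wordtabs() returns a list with one word-count table per document (as produced by table()), which is not confirmed here; the contents of the printed summary are also a guess at what would be useful.

```r
# Assumption: wordtabs is a list of word-count tables, one per document
dt.mat.from.wordtabs = function(wordtabs) {
  # All distinct words across the documents, in sorted order
  all.words = sort(unique(unlist(lapply(wordtabs, names))))
  # One row per document, one column per word, starting from zero counts
  dt.mat = matrix(0, nrow=length(wordtabs), ncol=length(all.words))
  rownames(dt.mat) = names(wordtabs)
  colnames(dt.mat) = all.words
  # Fill in each document's counts, matching words by column name
  for (i in seq_along(wordtabs)) {
    dt.mat[i, names(wordtabs[[i]])] = as.vector(wordtabs[[i]])
  }
  return(dt.mat)
}

# Assumption: a short console summary is all that is wanted
print.dt.mat = function(dt.mat) {
  cat("Number of documents:", nrow(dt.mat), "\n")
  cat("Number of distinct words:", ncol(dt.mat), "\n")
  cat("Total words per document:", rowSums(dt.mat), "\n")
  invisible(dt.mat)
}

# Toy input standing in for the output of get.wordtabs()
toy.wordtabs = list(doc1=table(c("a","b","a")), doc2=table(c("b","c")))
toy.dt.mat = dt.mat.from.wordtabs(toy.wordtabs)
print.dt.mat(toy.dt.mat)
```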