The big idea: A function wraps up a bunch of commands that we run over and over again.
Another perspective: You can also think of a function as a black box that takes in arguments, and returns a result.
A key detail: Things that you define within the function won’t be available outside of it, unless they’re part of what you return. And they’ll supersede things with the same name defined outside the function.
Let’s take an example from the “cats” dataset:
library(MASS)
data(cats)
Suppose that I’m interested in summary statistics for many different subgroups of cats. I could write a function for that:
cat_summary = function(selection){
  # selection: a TRUE/FALSE vector with one entry per row of cats
  mean_Bwt = mean(cats$Bwt[selection])
  std_Bwt = sd(cats$Bwt[selection])
  mean_Hwt = mean(cats$Hwt[selection])
  std_Hwt = sd(cats$Hwt[selection])
  # bundle the four statistics into one named list
  to_return = list(mean_Bwt=mean_Bwt, std_Bwt=std_Bwt, mean_Hwt=mean_Hwt, std_Hwt=std_Hwt)
  return(to_return)
}
So I could look at just heavy male cats:
bigboys = (cats$Sex=='M') & (cats$Bwt>=2.5)
cat_summary(bigboys)
## $mean_Bwt
## [1] 3.04125
##
## $std_Bwt
## [1] 0.38309
##
## $mean_Hwt
## [1] 11.92375
##
## $std_Hwt
## [1] 2.333425
Or all of the female cats:
girls = (cats$Sex=='F')
cat_summary(girls)
## $mean_Bwt
## [1] 2.359574
##
## $std_Bwt
## [1] 0.2739879
##
## $mean_Hwt
## [1] 9.202128
##
## $std_Hwt
## [1] 1.357666
Now that I’ve written cat_summary, I could think of it as a black box: a vector of TRUE/FALSE goes in, summary statistics come out. That’s how I’d describe it if I shared it with a friend. This is probably how you think about functions like mean() and plot() that you’ve used already.
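Because the result comes back as a named list, you can store it and pull out individual pieces with $. Here is a minimal sketch of that idea. The toy_cats data frame and the shortened toy_summary function are made up for illustration and are not part of the example above:

```r
# Hypothetical toy data standing in for the cats data frame
toy_cats = data.frame(Bwt = c(2.0, 2.5, 3.0, 3.5),
                      Hwt = c(7.0, 9.0, 12.0, 14.0))

# Shortened stand-in for cat_summary: returns a named list
toy_summary = function(selection){
  list(mean_Bwt = mean(toy_cats$Bwt[selection]),
       mean_Hwt = mean(toy_cats$Hwt[selection]))
}

heavy = toy_cats$Bwt >= 3.0        # a TRUE/FALSE vector goes in
heavy_stats = toy_summary(heavy)   # a named list comes out
heavy_stats$mean_Bwt               # pull out one element by name
names(heavy_stats)                 # see what the list contains
```

Storing the result this way means you only have to know the names in the list, not how the function computed them.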
Note that the number of arguments you pass to a function can vary, because an argument can have a default value. Suppose we defined cat_summary a little differently:
cat_summary = function(selection, catdata=cats){
  # catdata defaults to cats when the caller doesn't supply it
  mean_Bwt = mean(catdata$Bwt[selection])
  std_Bwt = sd(catdata$Bwt[selection])
  mean_Hwt = mean(catdata$Hwt[selection])
  std_Hwt = sd(catdata$Hwt[selection])
  to_return = list(mean_Bwt=mean_Bwt, std_Bwt=std_Bwt, mean_Hwt=mean_Hwt, std_Hwt=std_Hwt)
  return(to_return)
}
We now have some options for how we call cat_summary:
cat_summary(girls)
## $mean_Bwt
## [1] 2.359574
##
## $std_Bwt
## [1] 0.2739879
##
## $mean_Hwt
## [1] 9.202128
##
## $std_Hwt
## [1] 1.357666
cat_summary(girls,cats)
## $mean_Bwt
## [1] 2.359574
##
## $std_Bwt
## [1] 0.2739879
##
## $mean_Hwt
## [1] 9.202128
##
## $std_Hwt
## [1] 1.357666
Again, you’ve seen this with plot().
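The different ways of supplying an argument that has a default can be sketched side by side. The data frames toy_cats and other_cats and the function mean_bwt are hypothetical stand-ins chosen to keep the example small:

```r
# Hypothetical toy data frames, not the real cats data
toy_cats = data.frame(Bwt = c(2.0, 2.5, 3.0))
other_cats = data.frame(Bwt = c(4.0, 5.0, 6.0))

# catdata has a default, so callers may leave it out
mean_bwt = function(selection, catdata = toy_cats){
  mean(catdata$Bwt[selection])
}

keep = c(TRUE, FALSE, TRUE)
by_default = mean_bwt(keep)                   # catdata falls back to toy_cats
by_position = mean_bwt(keep, toy_cats)        # same call, argument given by position
by_name = mean_bwt(catdata = other_cats, selection = keep)  # named, in any order
print(c(by_default, by_position, by_name))
```

Naming the arguments makes the call easier to read, and it lets you override just the defaults you care about.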
When you’re writing a function you probably make some assumptions about what kind of data you’re going to get. For example, I assumed that the vector of TRUE/FALSE had one entry per row of the data frame. When you make those kinds of assumptions, you should check that the input you’re given matches them. stopifnot() helps with this: it stops the function with an error unless every condition you give it is TRUE.
cat_summary = function(selection, catdata=cats){
  # check the assumption: one TRUE/FALSE entry per row of catdata
  stopifnot(length(selection)==nrow(catdata))
  mean_Bwt = mean(catdata$Bwt[selection])
  std_Bwt = sd(catdata$Bwt[selection])
  mean_Hwt = mean(catdata$Hwt[selection])
  std_Hwt = sd(catdata$Hwt[selection])
  to_return = list(mean_Bwt=mean_Bwt, std_Bwt=std_Bwt, mean_Hwt=mean_Hwt, std_Hwt=std_Hwt)
  return(to_return)
}
# Uncommenting the next line gives an error, because girls has more entries than cats[1:40,] has rows:
#cat_summary(girls,cats[1:40,])
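To see the check fire without halting a whole script, one option is to wrap the bad call in tryCatch() and look at the error message. This is only a sketch: toy_cats and mean_bwt are made-up stand-ins, and tryCatch() is used here purely so the example keeps running.

```r
# Hypothetical toy data standing in for the cats data frame
toy_cats = data.frame(Bwt = c(2.0, 2.5, 3.0))

mean_bwt = function(selection, catdata){
  # refuse to continue unless there is one entry per row
  stopifnot(length(selection) == nrow(catdata))
  mean(catdata$Bwt[selection])
}

# Lengths match (3 and 3), so the check passes quietly
good = mean_bwt(c(TRUE, FALSE, TRUE), toy_cats)
print(good)

# Lengths differ (2 and 3), so stopifnot raises an error;
# tryCatch captures the message instead of stopping the script
bad = tryCatch(
  mean_bwt(c(TRUE, FALSE), toy_cats),
  error = function(e) conditionMessage(e)
)
print(bad)
```

Failing early like this is usually easier to debug than a wrong answer computed from mismatched input.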
If you create a variable within the function, it won’t “survive” outside the function.
numberchanger = function(){
  the_best_number = 4
  return(the_best_number)
}
numberchanger()
## [1] 4
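# Uncommenting the next line gives an error, because the_best_number only existed inside the function:
#the_best_number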
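If you’d rather check this without triggering an error, exists() reports whether a name is currently defined. A small sketch, using the hypothetical names make_value and local_only_value, and assuming nothing called local_only_value has been defined elsewhere in your session:

```r
make_value = function(){
  local_only_value = 4      # created inside the function only
  return(local_only_value)
}

returned = make_value()     # the value comes back through return()
print(returned)

# Ask whether the name survived outside the function
still_there = exists("local_only_value")
print(still_there)
```

The value 4 is available as returned, but the name used inside the function is gone once the function finishes.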
If a variable outside the function and one inside it have the same name, the function will work with its own copy and ignore the “outside” one. The outside variable is not modified.
the_best_number = 2
numberchanger = function(){
  the_best_number = 4
  return(the_best_number)
}
numberchanger()
## [1] 4
the_best_number
## [1] 2
If a variable isn’t defined within the function, but is defined outside, the function will refer to the outside value:
the_best_number = 2
numberchanger = function(){
  return(the_best_number)
}
numberchanger()
## [1] 2
the_best_number
## [1] 2
But remember, if I try to modify the_best_number from inside my function, R will instead create a local copy of the_best_number. Generally speaking, getting variable values this way instead of as arguments is a poor programming practice.
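The alternative is to hand the value to the function as an argument, so that everything the function depends on is visible in the call. A minimal sketch, with the hypothetical names add_two and start_value:

```r
# The function only uses what it is given
add_two = function(number){
  return(number + 2)
}

start_value = 10
bumped = add_two(start_value)   # the input is explicit in the call
print(bumped)
print(start_value)              # the original is untouched

# To update the outside variable, assign the result back to it yourself
start_value = add_two(start_value)
print(start_value)
```

Written this way, the function gives the same answer for the same input no matter what else happens to be defined in your workspace.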
Here’s a tricky example: what’s going to happen?
the_best_number = 2
numberchanger = function(){
  the_best_number = the_best_number + 2
  return(the_best_number)
}
numberchanger()
the_best_number
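Once you’ve made a prediction, one way to check it is to store both results and compare them. The sketch below repeats the example with two hypothetical helper names, inside_result and outside_value. On the right-hand side the function reads the outside value, because no local copy exists yet. The assignment then creates a local copy, so the outside variable is left alone.

```r
the_best_number = 2

numberchanger = function(){
  # reads the outside value, then stores the sum in a new local copy
  the_best_number = the_best_number + 2
  return(the_best_number)
}

inside_result = numberchanger()   # what the function returns
outside_value = the_best_number   # what the outside variable holds afterwards
print(inside_result)
print(outside_value)
```

The two printed values differ: the function returns 4, while the outside the_best_number is still 2.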
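Functions can also call themselves. Here is a recursive function that returns the nth Fibonacci number:
fibonacci = function(n){
  stopifnot(n > 0)
  if ((n == 2) || (n == 1)){
    # base cases: the first two Fibonacci numbers are both 1
    return(1)
  } else {
    # recursive case: the sum of the two previous Fibonacci numbers
    return(fibonacci(n-1) + fibonacci(n-2))
  }
}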
entries = 1:10
for (i in entries){
  print(fibonacci(i))
}
## [1] 1
## [1] 1
## [1] 2
## [1] 3
## [1] 5
## [1] 8
## [1] 13
## [1] 21
## [1] 34
## [1] 55