a*ply() and l*ply()
Statistical Computing, 36-350
Wednesday November 9, 2016
for()
We’ve learned some tools in R for iteration without explicit for() loops:
apply(): apply a function to rows or columns of a matrix or data frame
lapply(): apply a function to elements of a list or vector
sapply(): same as the above, but simplify the output (if possible)
tapply(): apply a function to levels of a factor vector
Clever indexing + vectorization is always useful, when possible
The apply() family is often useful, but it has some issues: primarily, inconsistent output
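To make the inconsistency concrete, here is a small base-R sketch. The list x is a made-up toy input, not something from the lecture. The same sapply() call shape gives back a vector, a matrix, or a list, depending only on what the applied function happens to return:

```r
# Hypothetical toy input, not from the lecture
x = list(a=1:3, b=4:6)

r1 = sapply(x, mean)                   # every result has length 1: named numeric vector
r2 = sapply(x, range)                  # every result has length 2: 2 x 2 matrix
r3 = sapply(x, function(v) v[v > 2])   # results have lengths 1 and 3: plain list

class(r1); class(r2); class(r3)
```

Code that uses the result has to be ready for any of these three shapes, which is the issue the plyr functions are meant to avoid.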
plyr package
Most popular R package of all time (most downloads): plyr
This is true for a good reason! It provides us with an extremely useful family of apply-like functions. Its advantage over the built-in apply() family is its consistency
All plyr functions are of the form **ply(). Replace ** with characters denoting types:
First character: input type, one of a (array), d (data frame), l (list)
Second character: output type, one of a, d, l, or _ (drop)
a*ply(): the input is an array
The signature for all a*ply() functions is:
a*ply(.data, .margins, .fun, ...)
.data: an array
.margins: index (or indices) to split the array by
.fun: the function to be applied to each piece
...: additional arguments to be passed to the function
Note that this looks like:
apply(X, MARGIN, FUN, ...)
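As a rough illustration of how closely the two signatures line up, the sketch below calls apply() and aaply() with the same arguments. The array small.array is a made-up toy input. It assumes plyr is installed.

```r
library(plyr)

# Hypothetical toy array, not from the lecture: 2 x 3 x 4
small.array = array(1:24, c(2,3,4))

# Same call shape: data, margin(s), function
base.sums = apply(small.array, 1, sum)
plyr.sums = aaply(small.array, 1, sum)

# Extra arguments after the function are passed on through ...
base.q = apply(small.array, 1, quantile, probs=c(0.25, 0.5, 0.75))
plyr.q = aaply(small.array, 1, quantile, probs=c(0.25, 0.5, 0.75))

dim(base.q)   # apply() puts each result in a column
dim(plyr.q)   # aaply() keeps the split dimension first
```

The values agree, but apply() returns the quantiles transposed relative to aaply(), so the split dimension moves from first to last.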
my.array = array(1:27, c(3,3,3))
rownames(my.array) = c("Curly", "Larry", "Moe")
colnames(my.array) = c("Groucho", "Harpo", "Zeppo")
dimnames(my.array)[[3]] = c("Bart", "Lisa", "Maggie")
my.array
## , , Bart
##
## Groucho Harpo Zeppo
## Curly 1 4 7
## Larry 2 5 8
## Moe 3 6 9
##
## , , Lisa
##
## Groucho Harpo Zeppo
## Curly 10 13 16
## Larry 11 14 17
## Moe 12 15 18
##
## , , Maggie
##
## Groucho Harpo Zeppo
## Curly 19 22 25
## Larry 20 23 26
## Moe 21 24 27
library(plyr)
aaply(my.array, 1, sum) # Get back an array
## Curly Larry Moe
## 117 126 135
adply(my.array, 1, sum) # Get back a data frame
## X1 V1
## 1 Curly 117
## 2 Larry 126
## 3 Moe 135
alply(my.array, 1, sum) # Get back a list
## $`1`
## [1] 117
##
## $`2`
## [1] 126
##
## $`3`
## [1] 135
##
## attr(,"split_type")
## [1] "array"
## attr(,"split_labels")
## X1
## 1 Curly
## 2 Larry
## 3 Moe
aaply(my.array, 2:3, sum) # Get back a 3 x 3 array
## X2
## X1 Bart Lisa Maggie
## Groucho 6 33 60
## Harpo 15 42 69
## Zeppo 24 51 78
adply(my.array, 2:3, sum) # Get back a data frame
## X1 X2 V1
## 1 Groucho Bart 6
## 2 Harpo Bart 15
## 3 Zeppo Bart 24
## 4 Groucho Lisa 33
## 5 Harpo Lisa 42
## 6 Zeppo Lisa 51
## 7 Groucho Maggie 60
## 8 Harpo Maggie 69
## 9 Zeppo Maggie 78
alply(my.array, 2:3, sum) # Get back a list
## $`1`
## [1] 6
##
## $`2`
## [1] 15
##
## $`3`
## [1] 24
##
## $`4`
## [1] 33
##
## $`5`
## [1] 42
##
## $`6`
## [1] 51
##
## $`7`
## [1] 60
##
## $`8`
## [1] 69
##
## $`9`
## [1] 78
##
## attr(,"split_type")
## [1] "array"
## attr(,"split_labels")
## X1 X2
## 1 Groucho Bart
## 2 Harpo Bart
## 3 Zeppo Bart
## 4 Groucho Lisa
## 5 Harpo Lisa
## 6 Zeppo Lisa
## 7 Groucho Maggie
## 8 Harpo Maggie
## 9 Zeppo Maggie
l*ply(): the input is a list
The signature for all l*ply() functions is:
l*ply(.data, .fun, ...)
.data: a list
.fun: the function to be applied to each element
...: additional arguments to be passed to the function
Note that this looks like:
lapply(X, FUN, ...)
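A minimal sketch of the parallel with lapply(), using a made-up list small.list and a made-up helper min.max() (both are assumptions for illustration, and plyr is assumed to be installed):

```r
library(plyr)

# Hypothetical toy list, not from the lecture
small.list = list(a=c(3,1,2), b=c(10,30,20,40))

# Same call shape as lapply(): data, function, then extra arguments via ...
base.out = lapply(small.list, quantile, probs=0.5)
plyr.out = llply(small.list, quantile, probs=0.5)

# Hypothetical helper that always returns the same two named pieces
min.max = function(v) c(min=min(v), max=max(v))

# ldply() stacks the per-element results into a data frame, one row per element
min.max.df = ldply(small.list, min.max)
min.max.df
```

Because min.max() returns results of the same length and type for every element, the data frame output is well defined.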
my.list = list(nums=rnorm(1000), lets=letters, pops=state.x77[,"Population"])
laply(my.list, range) # Get back an array
## 1 2
## [1,] "-3.9408500719166" "2.82907461660228"
## [2,] "a" "z"
## [3,] "365" "21198"
ldply(my.list, range) # Get back a data frame
## .id V1 V2
## 1 nums -3.9408500719166 2.82907461660228
## 2 lets a z
## 3 pops 365 21198
llply(my.list, range) # Get back a list
## $nums
## [1] -3.940850 2.829075
##
## $lets
## [1] "a" "z"
##
## $pops
## [1] 365 21198
laply(my.list, summary) # Doesn't work! Outputs have different types/lengths
## Error: Results must have one or more dimensions.
ldply(my.list, summary) # Doesn't work! Outputs have different types/lengths
## Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor): Results do not have equal lengths
llply(my.list, summary) # Works just fine
## $nums
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.941000 -0.677300 0.002948 0.013530 0.691200 2.829000
##
## $lets
## Length Class Mode
## 26 character character
##
## $pops
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 365 1080 2838 4246 4968 21200
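One way to see why laply() and ldply() fail here is to look at the lengths of the individual results. This is a sketch of one possible check, not a recommended recipe. The list mixed.list is a made-up toy input, and plyr is assumed to be installed.

```r
library(plyr)

# Hypothetical toy list mixing numeric and character elements
mixed.list = list(nums=c(1.5, 2.5, 3.5), lets=c("a","b","c"))

# llply() always works, so use it to inspect the pieces first
pieces = llply(mixed.list, summary)
piece.lengths = sapply(pieces, length)
piece.lengths

# Only results of a common length can be stacked into an array or data frame
same.length = length(unique(piece.lengths)) == 1
same.length
```

Here the numeric summary has six entries and the character summary has three, so the pieces cannot be combined row by row.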
*
The fourth option for * is _: the function a_ply() (or l_ply()) has no explicit return object, but still runs the given function over the given array (or list), possibly producing side effects
par(mfrow=c(3,3), mar=c(4,4,1,1))
a_ply(my.array, 2:3, plot, ylim=range(my.array), pch=19, col=6)
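The same idea for lists can be sketched with l_ply(). The list small.list and the environment log.env are made up for illustration; log.env is just a device for recording a side effect that can be inspected afterwards. plyr is assumed to be installed.

```r
library(plyr)

# Hypothetical toy list, not from the lecture
small.list = list(a=1:3, b=4:6)

# An environment used to record side effects (illustrative device)
log.env = new.env()
log.env$total = 0

# l_ply() runs the function on each element, only for its side effects
out = l_ply(small.list, function(v) {
  log.env$total = log.env$total + sum(v)
})

is.null(out)     # nothing is returned
log.env$total    # but the running total was updated
```

Plotting, printing, and writing files are more typical side effects. A running total is used here only because it is easy to check.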