a*ply() and l*ply()
We’ve learned some tools in R for iteration without explicit for() loops:
apply(): apply a function to rows or columns of a matrix or data frame
lapply(): apply a function to elements of a list or vector
sapply(): same as the above, but simplify the output (if possible)
tapply(): apply a function to levels of a factor vector
Clever indexing + vectorization is always useful, when possible
The apply() family is often useful, but it has some issues: primarily, inconsistent output
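As a minimal sketch of what "inconsistent output" means here (the toy lists below are made up for illustration and are not from the slides), the same sapply() call can return a vector, a matrix, or a list, depending on what the applied function returns:

```r
# Toy inputs, invented for illustration
same.len = list(a = 1:3, b = 4:6)
diff.len = list(a = 1:3, b = 1:5)

# One number per element: sapply() simplifies to a named numeric vector
res.vec = sapply(same.len, mean)

# Two numbers per element: sapply() simplifies to a 2 x 2 matrix
res.mat = sapply(same.len, range)

# Results of different lengths: no simplification is possible, so we get a list
res.lst = sapply(diff.len, function(v) v[v > 2])

class(res.vec)
is.matrix(res.mat)
is.list(res.lst)
```

Code that consumes the result therefore has to know in advance which of the three shapes it will receive.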
plyr package
Most popular R package of all time (most downloads): plyr
This is true for a good reason! Provides us with an extremely useful family of apply-like functions. Advantage over the built-in apply() family is its consistency
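All plyr functions are of the form **ply(). Replace ** with two characters denoting types:
First character: input type, one of a, d, l
Second character: output type, one of a, d, l, or _ (drop)
Here a stands for array, d for data frame, and l for list
a*ply(): the input is an array
The signature for all a*ply() functions is: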
a*ply(.data, .margins, .fun, ...)
.data : an array
.margins : index (or indices) to split the array by
.fun : the function to be applied to each piece
... : additional arguments to be passed to the function
Note that this looks like:
apply(X, MARGIN, FUN, ...)
my.array = array(1:27, c(3,3,3))
rownames(my.array) = c("Curly", "Larry", "Moe")
colnames(my.array) = c("Groucho", "Harpo", "Zeppo")
dimnames(my.array)[[3]] = c("Bart", "Lisa", "Maggie")
my.array
## , , Bart
## 
##       Groucho Harpo Zeppo
## Curly       1     4     7
## Larry       2     5     8
## Moe         3     6     9
## 
## , , Lisa
## 
##       Groucho Harpo Zeppo
## Curly      10    13    16
## Larry      11    14    17
## Moe        12    15    18
## 
## , , Maggie
## 
##       Groucho Harpo Zeppo
## Curly      19    22    25
## Larry      20    23    26
## Moe         21    24    27
library(plyr)
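Since the a*ply() signature mirrors apply(), it may help to compare the two directly. The following is a sketch under stated assumptions: it assumes plyr is installed and uses an unnamed copy of the same 3 x 3 x 3 array, called arr here for illustration. For a scalar result the two agree. For a vector result, apply() places each result in a column, whereas aaply() appends the result dimension after the split dimension:

```r
library(plyr)  # assumes the plyr package is installed

# Same values as my.array in the slides, without the dimnames
arr = array(1:27, c(3,3,3))

# Scalar result per slice: both give the same three sums
base.sum = apply(arr, 1, sum)
plyr.sum = aaply(arr, 1, sum)

# Vector result per slice: the two layouts are transposes of each other
base.rng = apply(arr, 1, range)   # 2 x 3: one column per slice
plyr.rng = aaply(arr, 1, range)   # 3 x 2: one row per slice

dim(base.rng)
dim(plyr.rng)
```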
aaply(my.array, 1, sum) # Get back an array
## Curly Larry   Moe 
##   117   126   135
adply(my.array, 1, sum) # Get back a data frame
##      X1  V1
## 1 Curly 117
## 2 Larry 126
## 3   Moe 135
alply(my.array, 1, sum) # Get back a list
## $`1`
## [1] 117
## 
## $`2`
## [1] 126
## 
## $`3`
## [1] 135
## 
## attr(,"split_type")
## [1] "array"
## attr(,"split_labels")
##      X1
## 1 Curly
## 2 Larry
## 3   Moe
aaply(my.array, 2:3, sum) # Get back a 3 x 3 array
##          X2
## X1        Bart Lisa Maggie
##   Groucho    6   33     60
##   Harpo     15   42     69
##   Zeppo     24   51     78
adply(my.array, 2:3, sum) # Get back a data frame
##        X1     X2 V1
## 1 Groucho   Bart  6
## 2   Harpo   Bart 15
## 3   Zeppo   Bart 24
## 4 Groucho   Lisa 33
## 5   Harpo   Lisa 42
## 6   Zeppo   Lisa 51
## 7 Groucho Maggie 60
## 8   Harpo Maggie 69
## 9   Zeppo Maggie 78
alply(my.array, 2:3, sum) # Get back a list
## $`1`
## [1] 6
## 
## $`2`
## [1] 15
## 
## $`3`
## [1] 24
## 
## $`4`
## [1] 33
## 
## $`5`
## [1] 42
## 
## $`6`
## [1] 51
## 
## $`7`
## [1] 60
## 
## $`8`
## [1] 69
## 
## $`9`
## [1] 78
## 
## attr(,"split_type")
## [1] "array"
## attr(,"split_labels")
##        X1     X2
## 1 Groucho   Bart
## 2   Harpo   Bart
## 3   Zeppo   Bart
## 4 Groucho   Lisa
## 5   Harpo   Lisa
## 6   Zeppo   Lisa
## 7 Groucho Maggie
## 8   Harpo Maggie
## 9   Zeppo Maggie
l*ply(): the input is a list
The signature for all l*ply() functions is:
l*ply(.data, .fun, ...)
.data : a list
.fun : the function to be applied to each element
... : additional arguments to be passed to the function
Note that this looks like:
lapply(X, FUN, ...)
my.list = list(nums=rnorm(1000), lets=letters, pops=state.x77[,"Population"])
laply(my.list, range) # Get back an array
##      1                   2                 
## [1,] "-2.79241863705677" "3.38700434933263"
## [2,] "a"                 "z"               
## [3,] "365"               "21198"
ldply(my.list, range) # Get back a data frame
##    .id                V1               V2
## 1 nums -2.79241863705677 3.38700434933263
## 2 lets                 a                z
## 3 pops               365            21198
llply(my.list, range) # Get back a list
## $nums
## [1] -2.792419  3.387004
## 
## $lets
## [1] "a" "z"
## 
## $pops
## [1]   365 21198
laply(my.list, summary) # Doesn't work! Outputs have different types/lengths
## Error: Results must have one or more dimensions.
ldply(my.list, summary) # Doesn't work! Outputs have different types/lengths
## Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor): Results do not have equal lengths
llply(my.list, summary) # Works just fine
## $nums
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -2.79200 -0.67330 -0.05625 -0.02510  0.65430  3.38700 
## 
## $lets
##    Length     Class      Mode 
##        26 character character 
## 
## $pops
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     365    1080    2838    4246    4968   21200
The fourth option for * is _: the function a_ply() (or l_ply()) has no explicit return object, but still runs the given function over the given array (or list), possibly producing side effects
par(mfrow=c(3,3), mar=c(4,4,1,1))
a_ply(my.array, 2:3, plot, ylim=range(my.array), pch=19, col=6)
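The same idea carries over to lists. Below is a minimal sketch of l_ply() called purely for its side effects, assuming plyr is installed. The list toy.list, the environment log.env, and the helper report() are all made up for illustration:

```r
library(plyr)  # assumes the plyr package is installed

# Toy list, invented for illustration
toy.list = list(nums = 1:5, lets = letters)

# Environment used to record a side effect of each call
log.env = new.env()
log.env$lengths = integer(0)

# Hypothetical helper: prints a line and records the element's length
report = function(x) {
  cat("element of class", class(x), "with length", length(x), "\n")
  log.env$lengths = c(log.env$lengths, length(x))
}

res = l_ply(toy.list, report)

is.null(res)       # l_ply() discards the function's return values
log.env$lengths    # but the side effects still happened
```

Printing, plotting, and writing files are the typical uses. Recording into an environment is used here only so that the side effect can be inspected afterwards.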