Plyr: a*ply() and l*ply()

Statistical Computing, 36-350

Wednesday November 9, 2016

Reminder: iterating in R without for()

We’ve learned some tools in R for iteration without explicit for() loops:

Clever indexing + vectorization is always useful, when possible

The apply() family is often useful, but it has some issues: primarily, inconsistent output
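
To make the inconsistency concrete, here is a small sketch using only base R. The toy inputs are made up for illustration. The same sapply() call shape returns a vector, a matrix, or a list, depending on what the supplied function happens to return:

```r
# sapply() chooses its return type from the shape of the results
v <- sapply(1:3, function(i) i^2)         # every result has length 1: numeric vector
m <- sapply(1:3, function(i) c(i, i^2))   # every result has length 2: 2 x 3 matrix
l <- sapply(1:3, function(i) seq_len(i))  # results differ in length: list

class(v)
class(m)
class(l)
```

Code that consumes such output has to check which of these cases it received, which is the issue the plyr functions are meant to avoid.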

The plyr package

One of the most popular R packages of all time (as measured by downloads): plyr

This is true for good reason: it provides an extremely useful family of apply-like functions. Its advantage over the built-in apply() family is consistency

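All plyr functions are of the form **ply(). Replace ** with two characters denoting types: the first gives the input type (a for array, d for data frame, l for list) and the second gives the output type (a, d, l, or _ for no output)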

a*ply(): the input is an array

The signature for all a*ply() functions is:

a*ply(.data, .margins, .fun, ...)

Note that this looks like:

apply(X, MARGIN, FUN, ...)
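
As a quick check of the resemblance, the following sketch calls apply() and aaply() with the same arguments in the same order. The matrix X and its dimension names are made up for this illustration:

```r
library(plyr)

# A small toy matrix (hypothetical data, not from the examples in these notes)
X <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))

base.sums <- apply(X, 1, sum)  # base R: sum over each row
plyr.sums <- aaply(X, 1, sum)  # plyr: same arguments, same order

# Compare the values only; as.numeric() strips names and attributes
all.equal(as.numeric(base.sums), as.numeric(plyr.sums))
```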

Examples

my.array = array(1:27, c(3,3,3))
rownames(my.array) = c("Curly", "Larry", "Moe")
colnames(my.array) = c("Groucho", "Harpo", "Zeppo")
dimnames(my.array)[[3]] = c("Bart", "Lisa", "Maggie")
my.array
## , , Bart
## 
##       Groucho Harpo Zeppo
## Curly       1     4     7
## Larry       2     5     8
## Moe         3     6     9
## 
## , , Lisa
## 
##       Groucho Harpo Zeppo
## Curly      10    13    16
## Larry      11    14    17
## Moe        12    15    18
## 
## , , Maggie
## 
##       Groucho Harpo Zeppo
## Curly      19    22    25
## Larry      20    23    26
## Moe        21    24    27

(Continued)

library(plyr)
aaply(my.array, 1, sum) # Get back an array
## Curly Larry   Moe 
##   117   126   135
adply(my.array, 1, sum) # Get back a data frame
##      X1  V1
## 1 Curly 117
## 2 Larry 126
## 3   Moe 135
alply(my.array, 1, sum) # Get back a list
## $`1`
## [1] 117
## 
## $`2`
## [1] 126
## 
## $`3`
## [1] 135
## 
## attr(,"split_type")
## [1] "array"
## attr(,"split_labels")
##      X1
## 1 Curly
## 2 Larry
## 3   Moe

(Continued)

aaply(my.array, 2:3, sum) # Get back a 3 x 3 array
##          X2
## X1        Bart Lisa Maggie
##   Groucho    6   33     60
##   Harpo     15   42     69
##   Zeppo     24   51     78
adply(my.array, 2:3, sum) # Get back a data frame
##        X1     X2 V1
## 1 Groucho   Bart  6
## 2   Harpo   Bart 15
## 3   Zeppo   Bart 24
## 4 Groucho   Lisa 33
## 5   Harpo   Lisa 42
## 6   Zeppo   Lisa 51
## 7 Groucho Maggie 60
## 8   Harpo Maggie 69
## 9   Zeppo Maggie 78
alply(my.array, 2:3, sum) # Get back a list
## $`1`
## [1] 6
## 
## $`2`
## [1] 15
## 
## $`3`
## [1] 24
## 
## $`4`
## [1] 33
## 
## $`5`
## [1] 42
## 
## $`6`
## [1] 51
## 
## $`7`
## [1] 60
## 
## $`8`
## [1] 69
## 
## $`9`
## [1] 78
## 
## attr(,"split_type")
## [1] "array"
## attr(,"split_labels")
##        X1     X2
## 1 Groucho   Bart
## 2   Harpo   Bart
## 3   Zeppo   Bart
## 4 Groucho   Lisa
## 5   Harpo   Lisa
## 6   Zeppo   Lisa
## 7 Groucho Maggie
## 8   Harpo Maggie
## 9   Zeppo Maggie

l*ply(): the input is a list

The signature for all l*ply() functions is:

l*ply(.data, .fun, ...)

Note that this looks like:

lapply(X, FUN, ...)
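
A minimal sketch of the correspondence, using a made-up list called words: llply() takes the list and the function in the same order as lapply() and returns a list with the same names.

```r
library(plyr)

# Toy list (hypothetical data for illustration)
words <- list(first = c("a", "b"), second = c("c", "d", "e"))

base.out <- lapply(words, length)  # base R
plyr.out <- llply(words, length)   # plyr: same list, same function

unlist(base.out)
unlist(plyr.out)
```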

Examples

my.list = list(nums=rnorm(1000), lets=letters, pops=state.x77[,"Population"])
laply(my.list, range) # Get back an array
##      1                  2                 
## [1,] "-3.9408500719166" "2.82907461660228"
## [2,] "a"                "z"               
## [3,] "365"              "21198"
ldply(my.list, range) # Get back a data frame
##    .id               V1               V2
## 1 nums -3.9408500719166 2.82907461660228
## 2 lets                a                z
## 3 pops              365            21198
llply(my.list, range) # Get back a list
## $nums
## [1] -3.940850  2.829075
## 
## $lets
## [1] "a" "z"
## 
## $pops
## [1]   365 21198

(Continued)

laply(my.list, summary) # Doesn't work! Outputs have different types/lengths
## Error: Results must have one or more dimensions.
ldply(my.list, summary) # Doesn't work! Outputs have different types/lengths
## Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor): Results do not have equal lengths
llply(my.list, summary) # Works just fine
## $nums
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -3.941000 -0.677300  0.002948  0.013530  0.691200  2.829000 
## 
## $lets
##    Length     Class      Mode 
##        26 character character 
## 
## $pops
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     365    1080    2838    4246    4968   21200
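
The failures above arise because summary() returns results of different lengths and types across the list elements. As a contrasting sketch, using a made-up list toy.list and length() as a function whose output has the same type and length for every element, all three variants succeed:

```r
library(plyr)

# Toy list with elements of different types (hypothetical data)
toy.list <- list(nums = c(1.5, 2.5, 3.5), lets = letters, flags = c(TRUE, FALSE))

len.array <- laply(toy.list, length)  # array output (here, one value per element)
len.df    <- ldply(toy.list, length)  # data frame, one row per list element
len.list  <- llply(toy.list, length)  # list, one entry per list element
```

When the outputs cannot be made uniform, llply() remains the safe choice, since a list can hold results of any shape.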

The fourth option for *

The fourth option for * is _: the function a_ply() (or l_ply()) returns nothing explicitly, but still runs the given function over the given array (or list), typically for its side effects, such as plotting

par(mfrow=c(3,3), mar=c(4,4,1,1))
a_ply(my.array, 2:3, plot, ylim=range(my.array), pch=19, col=6)
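
The list counterpart, l_ply(), can be sketched the same way. The list toy.list and the variable seen below are made up for illustration; the function is run only for its side effects (printing and updating seen), and the value assigned to out is NULL:

```r
library(plyr)

toy.list <- list(small = 1:3, large = 1:10)  # hypothetical input
seen <- integer(0)                           # records the side effects for inspection

out <- l_ply(toy.list, function(x) {
  cat("length:", length(x), "\n")  # side effect: print to the console
  seen <<- c(seen, length(x))      # side effect: update a variable outside the function
})

is.null(out)  # l_ply() has no meaningful return value
seen
```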