Functional style
- Three techniques:
- Functionals:
- Replace many loops.
- E.g.,
map(), reduce()
.
- The most important, used all the time in data analysis.
- Function factories:
- Functions that create functions.
- Partition work between different parts of your code.
- Function operators:
- Functions that take a function as input and return a function as output.
- Typically modify the operation of a function.
- Called higher-order functions.
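A minimal base R sketch of the last two techniques. `power()` is a hypothetical function factory and `with_logging()` a hypothetical function operator; neither comes from a package.

```r
# Hypothetical function factory: returns a new function that remembers `exp`
power <- function(exp) {
  force(exp)  # evaluate exp now so the returned function captures its value
  function(x) x^exp
}
square <- power(2)
cube <- power(3)
square(4)  # 16
cube(2)    # 8

# Hypothetical function operator: takes a function, returns a modified one
with_logging <- function(f) {
  force(f)
  function(...) {
    message("Calling function")  # the added behaviour
    f(...)                       # the original behaviour
  }
}
logged_sum <- with_logging(sum)
logged_sum(1:10)  # emits a message, then returns 55
```

The factory splits the work: `exp` is fixed once when the function is created, and `x` is supplied later at each call.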
Functionals
- Functional:
- Takes a function as an input.
- Returns a vector (or scalar) as output.
randomise <- function(f) f(runif(1e3))
randomise(mean)
## [1] 0.5106934
randomise(mean)
## [1] 0.5093112
randomise(sum)
## [1] 492.8197
Outline
library(tidyverse)
- Focus on the purrr package:
- Consistent interface that makes it easier to use and understand.
- We will only briefly introduce one base R functional: apply().
- purrr::map().
- Combine multiple simple functionals to solve larger problems.
- The 18 important variants of purrr::map().
- purrr::reduce().
- Predicates (functions returning a single TRUE or FALSE) and the functionals using them.
Map
Warm-up: purrr::map()
- The most fundamental functional:
- Takes a vector and a function.
- Calls the function once for each element of the vector.
- Returns the results in a list.
- E.g., map(1:3, f) is equivalent to list(f(1), f(2), f(3)).
triple <- function(x) x * 3
map(1:3, triple)
## [[1]]
## [1] 3
##
## [[2]]
## [1] 6
##
## [[3]]
## [1] 9
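To make the "once per element, collect in a list" idea concrete, here is a minimal base R sketch. `simple_map()` is a hypothetical helper, not purrr's implementation: unlike `purrr::map()` it drops names and does not understand the `~ .x` shortcut.

```r
# Hypothetical, simplified map(): apply f to each element, collect in a list
simple_map <- function(x, f) {
  out <- vector("list", length(x))  # preallocate one slot per element
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]])
  }
  out
}

triple <- function(x) x * 3
simple_map(1:3, triple)  # list(3, 6, 9), same values as map(1:3, triple)
```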
map() returns a list.
- 4 more specific variants return atomic vectors: map_dbl(), map_chr(), map_int(), and map_lgl().
Producing atomic vectors
map_dbl() always returns a double vector.
map_dbl(mtcars, mean)
## mpg cyl disp hp drat wt qsec
## 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250 17.848750
## vs am gear carb
## 0.437500 0.406250 3.687500 2.812500
map_chr() always returns a character vector.
map_chr(mtcars, typeof)
## mpg cyl disp hp drat wt qsec vs
## "double" "double" "double" "double" "double" "double" "double" "double"
## am gear carb
## "double" "double" "double"
map_int() always returns an integer vector.
map_int(mtcars, function(x) length(unique(x)))
## mpg cyl disp hp drat wt qsec vs am gear carb
## 25 3 27 22 22 29 30 2 2 3 6
map_lgl() always returns a logical vector.
map_lgl(mtcars, is.double)
## mpg cyl disp hp drat wt qsec vs am gear carb
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
- Remarks:
- Suffixes refer to the output.
- But map_*() can take any type of vector as input.
- The examples rely on two facts:
- mtcars is a data frame.
- Data frames are lists containing vectors of the same length.
- Each call to the function must return a single value.
map_dbl(1:2, function(x) c(x, x))
#> Result 1 must be a single double, not an integer vector of length 2
- It must also return the correct type.
map_dbl(1:2, as.character)
#> Error: Can't coerce element 1 from a character to a double
- In either case, use map() to see the problematic output!
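The two rules above (one value per call, of the right type) can be sketched as a checked loop. `simple_map_dbl()` is a hypothetical helper written for illustration; its error message is its own, not purrr's.

```r
# Hypothetical simplified map_dbl(): like map(), but checks that every
# result is a single number, and returns a named double vector
simple_map_dbl <- function(x, f, ...) {
  out <- numeric(length(x))
  for (i in seq_along(x)) {
    res <- f(x[[i]], ...)
    if (length(res) != 1 || !is.numeric(res)) {
      stop("Result ", i, " must be a single number, not a ",
           typeof(res), " of length ", length(res))
    }
    out[[i]] <- res
  }
  names(out) <- names(x)
  out
}

simple_map_dbl(mtcars, mean)[["mpg"]]       # 20.090625
# simple_map_dbl(1:2, function(x) c(x, x))  # error: result has length 2
# simple_map_dbl(1:2, as.character)         # error: result is a character
```

Base R's `vapply(x, f, numeric(1))` performs a comparable length-and-type check.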
Anonymous functions and shortcuts
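- map() can use anonymous functions.
- purrr also accepts a formula shortcut: ~ length(unique(.x)) is equivalent to function(x) length(unique(x)), with .x standing for the current element.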
map_dbl(mtcars, function(x) length(unique(x)))
## mpg cyl disp hp drat wt qsec vs am gear carb
## 25 3 27 22 22 29 30 2 2 3 6
map_dbl(mtcars, ~ length(unique(.x)))
## mpg cyl disp hp drat wt qsec vs am gear carb
## 25 3 27 22 22 29 30 2 2 3 6
- Useful for generating random data.
x <- map(1:3, ~ runif(2))
str(x)
## List of 3
## $ : num [1:2] 0.301 0.921
## $ : num [1:2] 0.724 0.376
## $ : num [1:2] 0.762 0.904
Passing arguments with ...
- To pass along additional arguments, use an anonymous function, or list them after the function so that map() forwards them via ...
x <- list(1:5, c(1:10, NA))
map_dbl(x, ~ mean(.x, na.rm = TRUE))
## [1] 3.0 5.5
x <- list(1:5, c(1:10, NA))
map_dbl(x, mean, na.rm = TRUE)
## [1] 3.0 5.5
- Additional arguments are passed whole to every call: a vector argument is not split across the elements of the input.
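A minimal base R sketch of how `...` forwarding behaves, assuming a hypothetical `simple_map()` that passes the same `...` to every call. The point is that `y` below is a vector but is handed over in full each time, not element by element.

```r
# Hypothetical simplified map() that forwards extra arguments via ...
simple_map <- function(x, f, ...) {
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]], ...)  # the same ... goes to every call
  }
  out
}

# y is NOT split across the elements of x: each call receives all of y
res <- simple_map(list(1, 2), function(x, y) x + y, y = c(10, 20))
str(res)  # list(c(11, 21), c(12, 22))
```

To vary a second argument in step with the first, use map2().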
Map variants
- map() has 23 primary variants.
- So far we have seen five: map(), map_dbl(), map_chr(), map_int(), map_lgl().
- 18 (!!) more to learn.
- Five new ideas:
- Output type:
- Return output of the same type as the input with modify().
- Return nothing with walk().
- Input type:
- Iterate over two inputs with map2().
- Iterate over one input and its index (or names) with imap().
- Iterate over any number of inputs with pmap().
|                      | List   | Atomic        | Same type | Nothing |
| One argument         | map()  | map_lgl(), …  | modify()  | walk()  |
| Two arguments        | map2() | map2_lgl(), … | modify2() | walk2() |
| One argument + index | imap() | imap_lgl(), … | imodify() | iwalk() |
| N arguments          | pmap() | pmap_lgl(), … | (none)    | pwalk() |
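Two of the new ideas, keeping the input type and iterating over many inputs, can be sketched in a few lines of base R. `simple_modify()` and `simple_pmap()` are hypothetical helpers: they assume equal-length inputs and skip purrr's recycling and error reporting.

```r
# Hypothetical simplified modify(): assign results back into a copy of x,
# so the output keeps the input's type (a data frame stays a data frame)
simple_modify <- function(x, f, ...) {
  for (i in seq_along(x)) {
    x[[i]] <- f(x[[i]], ...)
  }
  x
}

# Hypothetical simplified pmap(): .l is a list of equal-length vectors;
# f receives the i-th element of each, matched by name when .l is named
simple_pmap <- function(.l, f, ...) {
  n <- length(.l[[1]])
  out <- vector("list", n)
  for (i in seq_len(n)) {
    args <- lapply(.l, function(v) v[[i]])  # i-th element of every input
    out[[i]] <- do.call(f, c(args, list(...)))
  }
  out
}

df <- data.frame(x = 1:3, y = c(10, 20, 30))
simple_modify(df, function(col) col * 2)  # still a data frame
simple_pmap(list(x = 1:3, y = 4:6, z = 7:9), function(x, y, z) x + y + z)
# list(12, 15, 18)
```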
No outputs: walk() and friends
welcome <- function(x) {
cat("Welcome ", x, "!\n", sep = "")
}
names <- c("Hadley", "Jenny")
# As well as generating the welcomes, map() also shows
# the return value of cat(), which is NULL
map(names, welcome)
## Welcome Hadley!
## Welcome Jenny!
## [[1]]
## NULL
##
## [[2]]
## NULL
walk(names, welcome)
## Welcome Hadley!
## Welcome Jenny!
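A minimal base R sketch of the walk() idea: call the function only for its side effects and hand back the input invisibly, so nothing extra is printed. `simple_walk()` is a hypothetical helper, not purrr's code.

```r
# Hypothetical simplified walk(): call f for its side effects only,
# then return the input invisibly so it can still be passed along
simple_walk <- function(x, f, ...) {
  for (i in seq_along(x)) {
    f(x[[i]], ...)
  }
  invisible(x)
}

welcome <- function(x) cat("Welcome ", x, "!\n", sep = "")
names <- c("Hadley", "Jenny")
simple_walk(names, welcome)  # prints the two welcomes and nothing else
```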
Two inputs: map2() and friends
- How do we find the vector of weighted means?
xs <- map(1:8, ~ runif(10))
ws <- map(1:8, ~ rpois(10, 5) + 1)
xs
## [[1]]
## [1] 0.2271982 0.3196603 0.1726035 0.4312708 0.8947273 0.1149747 0.1303645
## [8] 0.9494872 0.1723842 0.5318115
##
## [[2]]
## [1] 0.11329206 0.03601022 0.24068169 0.28830587 0.07024510 0.85852935
## [7] 0.74631596 0.03071322 0.87007659 0.16599592
##
## [[3]]
## [1] 0.4443118 0.1632040 0.5210672 0.3751942 0.1914044 0.1612504 0.5388899
## [8] 0.5066812 0.2326832 0.2332805
##
## [[4]]
## [1] 0.94740926 0.48194669 0.61236067 0.18484342 0.76383276 0.07597317
## [7] 0.30996611 0.76928881 0.54550755 0.17940089
##
## [[5]]
## [1] 0.87192390 0.12525782 0.40557795 0.80324028 0.69103718 0.06128023
## [7] 0.93588816 0.67666827 0.11804369 0.61266477
##
## [[6]]
## [1] 0.8283395 0.7224854 0.2188552 0.7109752 0.3437943 0.6866659 0.2797798
## [8] 0.8735264 0.1442089 0.9490841
##
## [[7]]
## [1] 0.28687855 0.20648425 0.82641649 0.13025664 0.01454089 0.19600902
## [7] 0.66340291 0.77002564 0.52482237 0.19972811
##
## [[8]]
## [1] 0.08410161 0.89569706 0.85282003 0.33613543 0.20792388 0.79476840
## [7] 0.62895827 0.20287110 0.16050154 0.11773924
ws
## [[1]]
## [1] 6 5 5 5 6 7 10 5 4 6
##
## [[2]]
## [1] 4 5 9 7 6 9 7 7 6 6
##
## [[3]]
## [1] 6 8 5 12 1 5 6 5 4 4
##
## [[4]]
## [1] 9 4 6 6 11 6 8 4 6 4
##
## [[5]]
## [1] 7 8 11 4 8 4 8 5 5 6
##
## [[6]]
## [1] 5 3 4 6 8 4 6 4 7 4
##
## [[7]]
## [1] 7 8 5 7 4 10 3 9 2 5
##
## [[8]]
## [1] 3 6 4 7 7 6 9 4 8 2
- Use map_dbl() to compute the unweighted means.
xs <- map(1:8, ~ runif(10))
ws <- map(1:8, ~ rpois(10, 5) + 1)
map_dbl(xs, mean)
## [1] 0.6522101 0.5499636 0.4522153 0.7381881 0.6047583 0.4610659 0.4495295
## [8] 0.5610550
- Passing ws as an additional argument doesn't work: the whole list ws is passed to every call.
xs <- map(1:8, ~ runif(10))
ws <- map(1:8, ~ rpois(10, 5) + 1)
map_dbl(xs, weighted.mean, w = ws)
#> Error in weighted.mean.default(.x[[i]], ...): 'x' and 'w' must have the same length
- map2() varies both arguments in each call.
set.seed(0)
xs <- map(1:8, ~ runif(10))
ws <- map(1:8, ~ rpois(10, 5) + 1)
map2_dbl(xs, ws, weighted.mean)
## [1] 0.6389754 0.5211719 0.4711301 0.5323738 0.5958825 0.4148373 0.5072938
## [8] 0.6268950
- Additional arguments still go afterwards.
set.seed(0)
xs <- map(1:8, ~ runif(10))
xs[[1]][[1]] <- NA
ws <- map(1:8, ~ rpois(10, 5) + 1)
map2_dbl(xs, ws, weighted.mean, na.rm = TRUE)
## [1] 0.5841410 0.5211719 0.4711301 0.5323738 0.5958825 0.4148373 0.5072938
## [8] 0.6268950
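A minimal base R sketch of what map2_dbl() does with two inputs and an extra argument. `simple_map2_dbl()` is a hypothetical helper; unlike purrr it does not recycle length-1 inputs. Small hand-made inputs replace the random ones so the results are easy to check.

```r
# Hypothetical simplified map2_dbl(): walk two inputs in parallel and pass
# the i-th element of each (plus any extra arguments) to f
simple_map2_dbl <- function(x, y, f, ...) {
  stopifnot(length(x) == length(y))
  out <- numeric(length(x))
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]], y[[i]], ...)
  }
  out
}

xs <- list(c(1, 2), c(3, 5, NA))
ws <- list(c(1, 1), c(1, 3, 1))
res <- simple_map2_dbl(xs, ws, weighted.mean, na.rm = TRUE)
res  # 1.5 4.5
```

As a rough base R analogue, `mapply(weighted.mean, xs, ws, MoreArgs = list(na.rm = TRUE))` gives the same values.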
Iterating over values and indices
- Three basic ways to loop over a vector with for:
- Over the elements: for (x in xs) f(x)
- Over the names: for (nm in names(xs)) f(nm)
- Over the indices: for (i in seq_along(xs)) f(i)
- The first kind is similar to map(xs, f).
- The other two are similar to imap(xs, f):
- Same as map2(xs, names(xs), f) if xs has names.
- Same as map2(xs, seq_along(xs), f) otherwise.
imap_chr(iris, ~ paste0("The first value of ", .y, " is ", .x[[1]]))
## Sepal.Length
## "The first value of Sepal.Length is 5.1"
## Sepal.Width
## "The first value of Sepal.Width is 3.5"
## Petal.Length
## "The first value of Petal.Length is 1.4"
## Petal.Width
## "The first value of Petal.Width is 0.2"
## Species
## "The first value of Species is setosa"
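A minimal base R sketch of the "names if present, otherwise positions" rule. `simple_imap()` is a hypothetical helper written for illustration.

```r
# Hypothetical simplified imap(): pass each element together with its
# name (if x has names) or its position (otherwise) to f
simple_imap <- function(x, f, ...) {
  idx <- if (is.null(names(x))) seq_along(x) else names(x)
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]], idx[[i]], ...)
  }
  names(out) <- names(x)
  out
}

simple_imap(c(a = 1, b = 2), function(val, nm) paste(nm, val))  # list(a = "a 1", b = "b 2")
simple_imap(c(10, 20), function(val, i) val * i)                # list(10, 40)
```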
Reduce
- The next most important (family of) functionals.
- Much smaller: two main variants, reduce() and accumulate().
- Powers the map-reduce framework.
purrr::reduce():
- Takes a vector of length n.
- Produces a single result by calling a function with a pair of values at a time.
- reduce(1:2, f) is equivalent to f(1, 2).
- reduce(1:3, f) is equivalent to f(f(1, 2), 3).
- reduce(1:4, f) is equivalent to f(f(f(1, 2), 3), 4).
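The nesting pattern above can be checked directly with base R's `Reduce()`, which also folds from the left by default. Note that it takes the function first and the vector second. The string-building `f` is a hypothetical helper chosen so that the nesting is visible in the output.

```r
# f records its call as text, so the result shows the order of evaluation
f <- function(a, b) paste0("f(", a, ", ", b, ")")

Reduce(f, 1:4)                       # "f(f(f(1, 2), 3), 4)"
Reduce(`+`, 1:4)                     # ((1 + 2) + 3) + 4 = 10
Reduce(`+`, 1:4, accumulate = TRUE)  # 1 3 6 10
```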
Reduce family
- Useful to generalize a function that works with two inputs to work with any number of inputs.
- Problem: find the values that occur in every element.
set.seed(0)
l <- map(1:4, ~ sample(1:10, 15, replace = TRUE))
str(l)
## List of 4
## $ : int [1:15] 9 4 7 1 2 7 2 3 1 5 ...
## $ : int [1:15] 9 5 5 9 9 5 5 2 10 9 ...
## $ : int [1:15] 10 6 4 4 10 9 7 6 9 8 ...
## $ : int [1:15] 7 3 10 6 8 2 2 6 6 1 ...
set.seed(0)
l <- map(1:4, ~ sample(1:10, 15, replace = TRUE))
out <- l[[1]]
out <- intersect(out, l[[2]])
out <- intersect(out, l[[3]])
out <- intersect(out, l[[4]])
out
## [1] 10 6
set.seed(0)
l <- map(1:4, ~ sample(1:10, 15, replace = TRUE))
reduce(l, intersect)
## [1] 10 6
- Like map(), reduce() can also pass additional arguments via ... to every call of the function.
- A simple implementation:
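simple_reduce <- function(x, f, ...) {
  # Start with the first element, then fold in the rest one at a time
  out <- x[[1]]
  # seq_along(x)[-1] is empty when length(x) == 1, unlike seq(2, 1)
  for (i in seq_along(x)[-1]) {
    out <- f(out, x[[i]], ...)
  }
  out
}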
Accumulate
- Like reduce(), but returns all the intermediate results as well as the final one.
set.seed(0)
l <- map(1:4, ~ sample(1:10, 15, replace = TRUE))
accumulate(l, intersect)
## [[1]]
## [1] 9 4 7 1 2 7 2 3 1 5 5 10 6 10 7
##
## [[2]]
## [1] 9 4 1 2 3 5 10 6
##
## [[3]]
## [1] 9 4 10 6
##
## [[4]]
## [1] 10 6
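A minimal base R sketch of accumulate(), written in the style of `simple_reduce()` above. `simple_accumulate()` is a hypothetical helper: it stores each intermediate value instead of overwriting it.

```r
# Hypothetical simplified accumulate(): like simple_reduce(), but keeps
# every intermediate result instead of only the last one
simple_accumulate <- function(x, f, ...) {
  out <- vector("list", length(x))
  out[[1]] <- x[[1]]
  for (i in seq_along(x)[-1]) {
    out[[i]] <- f(out[[i - 1]], x[[i]], ...)
  }
  out
}

simple_accumulate(1:4, `+`)  # list(1, 3, 6, 10)
```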
Predicate functionals
- A predicate:
- Function that returns a single TRUE or FALSE.
- E.g., is.character(), is.null(), or all().
- A predicate matches a vector if it returns TRUE.
- A predicate functional:
- Applies a predicate to each element of a vector.
- 6 functions in 3 pairs.
- some(.x, .p)/every(.x, .p):
- Returns TRUE if any/all elements match.
- Similar to any(map_lgl(.x, .p))/all(map_lgl(.x, .p)).
- But they terminate early.
- detect(.x, .p)/detect_index(.x, .p):
- Returns the value/location of the first match.
- keep(.x, .p)/discard(.x, .p):
- Keeps/drops all matching elements.
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = TRUE)
detect(df, is.factor)
## [1] a b c
## Levels: a b c
detect_index(df, is.factor)
## [1] 2
# str(keep(df, is.factor))
# str(discard(df, is.factor))
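A minimal base R sketch of three of the predicate functionals. `simple_some()`, `simple_detect_index()` and `simple_keep()` are hypothetical helpers; the first two return as soon as they find a match, which is the "terminate early" behaviour described above.

```r
# Hypothetical simplified some(): stop at the first element that matches
simple_some <- function(x, p) {
  for (i in seq_along(x)) {
    if (isTRUE(p(x[[i]]))) return(TRUE)
  }
  FALSE
}

# Hypothetical simplified detect_index(): position of the first match,
# or 0 when nothing matches
simple_detect_index <- function(x, p) {
  for (i in seq_along(x)) {
    if (isTRUE(p(x[[i]]))) return(i)
  }
  0L
}

# Hypothetical simplified keep(): subset x to the matching elements
simple_keep <- function(x, p) {
  x[vapply(x, p, logical(1))]
}

df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = TRUE)
simple_some(df, is.factor)          # TRUE
simple_detect_index(df, is.factor)  # 2
names(simple_keep(df, is.factor))   # "y"
```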
Map variants with predicates: map_if() and modify_if()
df <- data.frame(
num1 = c(0, 10, 20),
num2 = c(5, 6, 7),
chr1 = c("a", "b", "c"),
stringsAsFactors = FALSE
)
str(map_if(df, is.numeric, mean))
## List of 3
## $ num1: num 10
## $ num2: num 6
## $ chr1: chr [1:3] "a" "b" "c"
str(modify_if(df, is.numeric, mean))
## 'data.frame': 3 obs. of 3 variables:
## $ num1: num 10 10 10
## $ num2: num 6 6 6
## $ chr1: chr "a" "b" "c"
str(map(keep(df, is.numeric), mean))
## List of 2
## $ num1: num 10
## $ num2: num 6
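The difference between map_if() and modify_if() comes down to where the results are stored. A minimal base R sketch, using a hypothetical `simple_modify_if()` that assigns back into the input so a data frame stays a data frame (a length-1 result is recycled to fill the column).

```r
# Hypothetical simplified modify_if(): apply f only where p is TRUE,
# assigning back into x so the output keeps the input's type
simple_modify_if <- function(x, p, f, ...) {
  for (i in seq_along(x)) {
    if (isTRUE(p(x[[i]]))) x[[i]] <- f(x[[i]], ...)
  }
  x
}

df <- data.frame(
  num1 = c(0, 10, 20),
  num2 = c(5, 6, 7),
  chr1 = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
res <- simple_modify_if(df, is.numeric, mean)
str(res)  # data frame: num1 all 10, num2 all 6, chr1 unchanged
```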
Base functionals
- Some base R functionals have no purrr equivalent:
- Working with matrices and arrays (two-dimensional and higher): base::apply() summarizes by collapsing rows/columns to a single value.
Matrices and arrays: base::apply()
- Summarizes by collapsing rows/columns to a single value.
a2d <- matrix(1:20, nrow = 5)
apply(a2d, 1, mean)
## [1] 8.5 9.5 10.5 11.5 12.5
apply(a2d, 2, mean)
## [1] 3 8 13 18
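To see what "collapsing rows to a single value" means, here is a hand-rolled loop for margin 1 only. `apply_rows()` is a hypothetical helper that assumes `f` returns one value per row; `apply()` itself also handles other margins, higher-dimensional arrays and functions returning vectors.

```r
# Hypothetical row-wise version of apply(m, 1, f) for a matrix,
# assuming f returns a single value per row
apply_rows <- function(m, f, ...) {
  out <- vector("list", nrow(m))
  for (i in seq_len(nrow(m))) {
    out[[i]] <- f(m[i, ], ...)  # m[i, ] is row i, dropped to a vector
  }
  unlist(out)
}

a2d <- matrix(1:20, nrow = 5)
apply_rows(a2d, mean)  # 8.5 9.5 10.5 11.5 12.5
rowMeans(a2d)          # same values from a dedicated base function
```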