Functional style

Functionals

  • Functional:
    • Takes a function as an input.
    • Returns a vector (or scalar) as an output.
randomise <- function(f) f(runif(1e3))
randomise(mean)
## [1] 0.5106934
randomise(mean)
## [1] 0.5093112
randomise(sum)
## [1] 492.8197

Outline

library(tidyverse)
  • Focus on the purrr package:
    • Consistent interface that makes it easier to use/understand.
    • We will only briefly introduce one base R functional, apply().
  • purrr::map().
  • Combine multiple simple functionals to solve larger problems.
  • The 18 important variants of purrr::map().
  • purrr::reduce().
  • Predicates (functions returning a single TRUE or FALSE) and the functionals using them.

Map

Warm-up: purrr::map()

  • The most fundamental functional:
    • Takes a vector and a function.
    • Calls the function once for each element of the vector.
    • Returns the results in a list.
    • E.g., map(1:3, f) is equivalent to list(f(1), f(2), f(3)).
triple <- function(x) x * 3
map(1:3, triple)
## [[1]]
## [1] 3
## 
## [[2]]
## [1] 6
## 
## [[3]]
## [1] 9
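To make the definition above concrete, here is a minimal base R sketch of what map() does. The name simple_map is illustrative and is not part of purrr; the real map() also preserves names and supports the shortcuts introduced later.

```r
# A simplified stand-in for purrr::map(), written in base R only.
simple_map <- function(x, f, ...) {
  out <- vector("list", length(x))   # pre-allocate the result list
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]], ...)       # call f once per element
  }
  out
}

triple <- function(x) x * 3
res <- simple_map(1:3, triple)
str(res)
```

The loop shows why the result is always a list of the same length as the input.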

  • map() always returns a list.
  • 4 more specific variants:
    • map_dbl(), map_chr(), map_int() and map_lgl().

Producing atomic vectors

  • map_dbl() always returns a double vector.
map_dbl(mtcars, mean)
##        mpg        cyl       disp         hp       drat         wt       qsec 
##  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750 
##         vs         am       gear       carb 
##   0.437500   0.406250   3.687500   2.812500
  • map_chr() always returns a character vector.
map_chr(mtcars, typeof)
##      mpg      cyl     disp       hp     drat       wt     qsec       vs 
## "double" "double" "double" "double" "double" "double" "double" "double" 
##       am     gear     carb 
## "double" "double" "double"
  • map_int() always returns an integer vector.
map_int(mtcars, function(x) length(unique(x)))
##  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
##   25    3   27   22   22   29   30    2    2    3    6
  • map_lgl() always returns a logical vector.
map_lgl(mtcars, is.double)
##  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
  • Remarks:
    • Suffixes refer to the output.
    • But map_*() can take any type of vector as input.
  • Examples rely on two facts:
    • mtcars is a data frame.
    • Data frames are lists containing vectors of the same length.

  • Each call to the function must return a single value.
map_dbl(1:2, function(x) c(x, x))
#> Error: Result 1 must be a single double, not an integer vector of length 2
  • And obviously return the correct type.
map_dbl(1:2, as.character)
#> Error: Can't coerce element 1 from a character to a double
  • In either case, use map() to see the problematic output!
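As a sketch of that debugging step, switching the two failing calls to map() returns the raw results, which can then be inspected for length and type. The variable names bad_len and bad_type are illustrative.

```r
library(purrr)

# Same calls as above, but with map() so nothing is coerced or checked.
bad_len  <- map(1:2, function(x) c(x, x))
bad_type <- map(1:2, as.character)

lengths(bad_len)           # each result has length 2, not 1
map_chr(bad_type, typeof)  # each result is a character, not a double
```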

Anonymous functions and shortcuts

  • map() can use anonymous functions.
map_dbl(mtcars, function(x) length(unique(x)))
##  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
##   25    3   27   22   22   29   30    2    2    3    6
  • Less verbose shortcut.
map_dbl(mtcars, ~ length(unique(.x)))
##  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
##   25    3   27   22   22   29   30    2    2    3    6
  • Useful for generating random data.
x <- map(1:3, ~ runif(2))
str(x)
## List of 3
##  $ : num [1:2] 0.301 0.921
##  $ : num [1:2] 0.724 0.376
##  $ : num [1:2] 0.762 0.904
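The formula shortcut is converted into an ordinary function before it is called; purrr::as_mapper() exposes that conversion. The sketch below also uses the base R lambda syntax \(x), which assumes R 4.1 or later. The names n_unique and counts are illustrative.

```r
library(purrr)

# as_mapper() turns the ~ shortcut into a regular function of .x
n_unique <- as_mapper(~ length(unique(.x)))
n_unique(c(1, 1, 2))

# Base R lambda (assumes R >= 4.1), an alternative to the ~ shortcut
counts <- map_int(mtcars[1:2], \(x) length(unique(x)))
counts
```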

Extracting elements from a vector

x <- list(
  list(-1, x = 1, y = c(2), z = "a"),
  list(-2, x = 4, y = c(5, 6), z = "b"),
  list(-3, x = 8, y = c(9, 10, 11))
)
# Select by name
map_dbl(x, "x")
## [1] 1 4 8
# Or by position
map_dbl(x, 1)
## [1] -1 -2 -3
# Or by both
map_dbl(x, list("y", 1))
## [1] 2 5 9
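Extraction by name fails when a component is missing, as with z in the third element. A hedged sketch of the usual workaround is to supply a .default value; NA_character_ is used here so the default already has the output type.

```r
library(purrr)

x <- list(
  list(-1, x = 1, y = c(2), z = "a"),
  list(-2, x = 4, y = c(5, 6), z = "b"),
  list(-3, x = 8, y = c(9, 10, 11))
)

# The third element has no "z", so fall back to a missing value
z <- map_chr(x, "z", .default = NA_character_)
z
```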

Passing arguments with ...

  • To pass along additional arguments, use an anonymous function.
x <- list(1:5, c(1:10, NA))
map_dbl(x, ~ mean(.x, na.rm = TRUE))
## [1] 3.0 5.5
  • Or the simpler form.
x <- list(1:5, c(1:10, NA))
map_dbl(x, mean, na.rm = TRUE)
## [1] 3.0 5.5
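The two forms are not always interchangeable. In this sketch (plus is an illustrative helper), an extra argument passed directly is evaluated once and shared by every call, while the same expression inside an anonymous function is re-evaluated for each element.

```r
library(purrr)

plus <- function(x, y) x + y
x <- c(0, 0, 0, 0)

set.seed(1)
# runif(1) is evaluated once, so every element gets the same offset
once <- map_dbl(x, plus, runif(1))
# runif(1) is evaluated on each call, so the offsets differ
each <- map_dbl(x, ~ plus(.x, runif(1)))

once
each
```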

  • Additional vector arguments

Map variants

                       List     Atomic            Same type    Nothing
One argument           map()    map_lgl(), …      modify()     walk()
Two arguments          map2()   map2_lgl(), …     modify2()    walk2()
One argument + index   imap()   imap_lgl(), …     imodify()    iwalk()
N arguments            pmap()   pmap_lgl(), …                  pwalk()

Same type of output/input: modify()

df <- data.frame(x = 1:3, y = 6:4)
map(df, ~ .x * 2)
## $x
## [1] 2 4 6
## 
## $y
## [1] 12 10  8
modify(df, ~ .x * 2) %>%
  print()
##   x  y
## 1 2 12
## 2 4 10
## 3 6  8
simple_modify <- function(x, f, ...) {
  for (i in seq_along(x)) {
    x[[i]] <- f(x[[i]], ...)
  }
  x
}
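A short usage sketch of simple_modify() in base R: because only the elements of x are replaced, the container keeps its class, and the original data frame is left untouched.

```r
simple_modify <- function(x, f, ...) {
  for (i in seq_along(x)) {
    x[[i]] <- f(x[[i]], ...)   # overwrite each element in place
  }
  x
}

df <- data.frame(x = 1:3, y = 6:4)
doubled <- simple_modify(df, function(col) col * 2)

class(doubled)  # still a data frame
df$x            # the input is unchanged: modification happens on a copy
```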

No outputs: walk() and friends

welcome <- function(x) {
  cat("Welcome ", x, "!\n", sep = "")
}
names <- c("Hadley", "Jenny")
# As well as generating the welcomes, map() also shows
# the return value of cat(), which is NULL
map(names, welcome)
## Welcome Hadley!
## Welcome Jenny!
## [[1]]
## NULL
## 
## [[2]]
## NULL
walk(names, welcome)
## Welcome Hadley!
## Welcome Jenny!
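walk() calls the function for its side effects and returns its input invisibly, which is what makes it convenient in the middle of a pipeline. A small sketch, with guests as an illustrative variable name:

```r
library(purrr)

welcome <- function(x) {
  cat("Welcome ", x, "!\n", sep = "")
}
guests <- c("Hadley", "Jenny")

# The messages are printed; the return value is the input itself
out <- walk(guests, welcome)
identical(out, guests)
```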

Two inputs: map2() and friends

xs <- map(1:8, ~ runif(10))
ws <- map(1:8, ~ rpois(10, 5) + 1)

xs
## [[1]]
##  [1] 0.2271982 0.3196603 0.1726035 0.4312708 0.8947273 0.1149747 0.1303645
##  [8] 0.9494872 0.1723842 0.5318115
## 
## [[2]]
##  [1] 0.11329206 0.03601022 0.24068169 0.28830587 0.07024510 0.85852935
##  [7] 0.74631596 0.03071322 0.87007659 0.16599592
## 
## [[3]]
##  [1] 0.4443118 0.1632040 0.5210672 0.3751942 0.1914044 0.1612504 0.5388899
##  [8] 0.5066812 0.2326832 0.2332805
## 
## [[4]]
##  [1] 0.94740926 0.48194669 0.61236067 0.18484342 0.76383276 0.07597317
##  [7] 0.30996611 0.76928881 0.54550755 0.17940089
## 
## [[5]]
##  [1] 0.87192390 0.12525782 0.40557795 0.80324028 0.69103718 0.06128023
##  [7] 0.93588816 0.67666827 0.11804369 0.61266477
## 
## [[6]]
##  [1] 0.8283395 0.7224854 0.2188552 0.7109752 0.3437943 0.6866659 0.2797798
##  [8] 0.8735264 0.1442089 0.9490841
## 
## [[7]]
##  [1] 0.28687855 0.20648425 0.82641649 0.13025664 0.01454089 0.19600902
##  [7] 0.66340291 0.77002564 0.52482237 0.19972811
## 
## [[8]]
##  [1] 0.08410161 0.89569706 0.85282003 0.33613543 0.20792388 0.79476840
##  [7] 0.62895827 0.20287110 0.16050154 0.11773924
ws
## [[1]]
##  [1]  6  5  5  5  6  7 10  5  4  6
## 
## [[2]]
##  [1] 4 5 9 7 6 9 7 7 6 6
## 
## [[3]]
##  [1]  6  8  5 12  1  5  6  5  4  4
## 
## [[4]]
##  [1]  9  4  6  6 11  6  8  4  6  4
## 
## [[5]]
##  [1]  7  8 11  4  8  4  8  5  5  6
## 
## [[6]]
##  [1] 5 3 4 6 8 4 6 4 7 4
## 
## [[7]]
##  [1]  7  8  5  7  4 10  3  9  2  5
## 
## [[8]]
##  [1] 3 6 4 7 7 6 9 4 8 2
xs <- map(1:8, ~ runif(10))
ws <- map(1:8, ~ rpois(10, 5) + 1)

map_dbl(xs, mean)
## [1] 0.6522101 0.5499636 0.4522153 0.7381881 0.6047583 0.4610659 0.4495295
## [8] 0.5610550
xs <- map(1:8, ~ runif(10))
ws <- map(1:8, ~ rpois(10, 5) + 1)
map_dbl(xs, weighted.mean, w = ws)
#> Error in weighted.mean.default(.x[[i]], ...): 'x' and 'w' must have the same length

set.seed(0)
xs <- map(1:8, ~ runif(10))
ws <- map(1:8, ~ rpois(10, 5) + 1)
map2_dbl(xs, ws, weighted.mean)
## [1] 0.6389754 0.5211719 0.4711301 0.5323738 0.5958825 0.4148373 0.5072938
## [8] 0.6268950

set.seed(0)
xs <- map(1:8, ~ runif(10))
xs[[1]][[1]] <- NA
ws <- map(1:8, ~ rpois(10, 5) + 1)
map2_dbl(xs, ws, weighted.mean, na.rm = TRUE)
## [1] 0.5841410 0.5211719 0.4711301 0.5323738 0.5958825 0.4148373 0.5072938
## [8] 0.6268950
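The idea behind map2() can be sketched in base R as a loop that walks over both inputs in parallel. The name simple_map2 is illustrative; unlike the real map2(), this sketch does not recycle length-1 inputs or preserve names.

```r
# A simplified stand-in for purrr::map2()
simple_map2 <- function(x, y, f, ...) {
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]], y[[i]], ...)  # pair up the i-th elements
  }
  out
}

xs <- list(c(1, 2, 3), c(4, 6))
ws <- list(c(1, 1, 2), c(3, 1))
res <- simple_map2(xs, ws, weighted.mean)
str(res)
```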

Any number of inputs: pmap()

xs <- map(1:8, ~ runif(10))
xs[[1]][[1]] <- NA
ws <- map(1:8, ~ rpois(10, 5) + 1)
pmap_dbl(list(xs, ws), weighted.mean)
## [1]        NA 0.6142809 0.6622019 0.4570432 0.4215889 0.6998617 0.3807412
## [8] 0.5044364
pmap_dbl(list(xs, ws), weighted.mean, na.rm = TRUE)
## [1] 0.5414482 0.6142809 0.6622019 0.4570432 0.4215889 0.6998617 0.3807412
## [8] 0.5044364
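Since pmap() takes a list of inputs, it can also be given a data frame, with each row supplying one set of arguments. In this sketch the column names n, min and max are chosen to match the arguments of runif(), so they are matched by name; params and draws are illustrative names.

```r
library(purrr)

# One row per call: runif(n = 1, min = 0, max = 1), and so on
params <- data.frame(
  n   = 1:3,
  min = c(0, 10, 100),
  max = c(1, 100, 1000)
)

draws <- pmap(params, runif)
str(draws)
```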

Iterating over values and indices

imap_chr(iris, ~ paste0("The first value of ", .y, " is ", .x[[1]]))
##                             Sepal.Length 
## "The first value of Sepal.Length is 5.1" 
##                              Sepal.Width 
##  "The first value of Sepal.Width is 3.5" 
##                             Petal.Length 
## "The first value of Petal.Length is 1.4" 
##                              Petal.Width 
##  "The first value of Petal.Width is 0.2" 
##                                  Species 
##   "The first value of Species is setosa"
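For a vector without names, imap() passes the position as the second argument instead of the name. A small sketch, with labels as an illustrative name:

```r
library(purrr)

# .x is the value, .y is the index because the input has no names
labels <- imap_chr(c("a", "b"), ~ paste0(.y, ": ", .x))
labels
```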

Reduce

  • After map(), the next most important family of functionals.
    • Much smaller than the map family (two main variants).
    • Powers the map-reduce framework.
  • purrr::reduce():
    • Takes a vector of length n.
    • Produces a vector of length 1 by calling a function with a pair of values at a time.
    • reduce(1:2, f) is equivalent to f(1, 2).
    • reduce(1:3, f) is equivalent to f(f(1, 2), 3).
    • reduce(1:4, f) is equivalent to f(f(f(1, 2), 3), 4).
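The nesting order can be made visible by reducing with a function that builds the call as a string. This is only an illustration; show_call and nested are hypothetical names.

```r
library(purrr)

# Record each pairwise call as text instead of computing a value
show_call <- function(a, b) paste0("f(", a, ", ", b, ")")
nested <- reduce(c("1", "2", "3", "4"), show_call)
nested

# With a real function, the same left-to-right folding applies
total <- reduce(1:4, `+`)
total
```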

Reduce family

  • Useful to generalize a function that works with two inputs to work with any number of inputs.
  • Problem: find the values that occur in every element.
set.seed(0)
l <- map(1:4, ~ sample(1:10, 15, replace = TRUE))
str(l)
## List of 4
##  $ : int [1:15] 9 4 7 1 2 7 2 3 1 5 ...
##  $ : int [1:15] 9 5 5 9 9 5 5 2 10 9 ...
##  $ : int [1:15] 10 6 4 4 10 9 7 6 9 8 ...
##  $ : int [1:15] 7 3 10 6 8 2 2 6 6 1 ...
  • Two solutions
set.seed(0)
l <- map(1:4, ~ sample(1:10, 15, replace = TRUE))
out <- l[[1]]
out <- intersect(out, l[[2]])
out <- intersect(out, l[[3]])
out <- intersect(out, l[[4]])
out
## [1] 10  6
set.seed(0)
l <- map(1:4, ~ sample(1:10, 15, replace = TRUE))
reduce(l, intersect)
## [1] 10  6
  • Can also pass additional arguments.
  • Simple implementation.
simple_reduce <- function(x, f, ...) {
  out <- x[[1]]
  # seq_along(x)[-1] is empty when length(x) == 1, unlike seq(2, length(x))
  for (i in seq_along(x)[-1]) {
    out <- f(out, x[[i]], ...)
  }
  out
}
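One edge case worth sketching is empty input: there is no first element to start from, so purrr::reduce() needs a starting value supplied through its .init argument. The variable names below are illustrative.

```r
library(purrr)

# Without .init, an empty input is an error
empty_fails <- inherits(try(reduce(integer(), `+`), silent = TRUE), "try-error")

# With .init, the starting value is returned for empty input
total_empty <- reduce(integer(), `+`, .init = 0L)
# and it is used as the first accumulator otherwise
total <- reduce(1:3, `+`, .init = 0L)

empty_fails
total_empty
total
```

Choosing the identity of the operation (0 for addition, 1 for multiplication) as .init keeps the result consistent for inputs of any length.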

Accumulate

set.seed(0)
l <- map(1:4, ~ sample(1:10, 15, replace = TRUE))
accumulate(l, intersect)
## [[1]]
##  [1]  9  4  7  1  2  7  2  3  1  5  5 10  6 10  7
## 
## [[2]]
## [1]  9  4  1  2  3  5 10  6
## 
## [[3]]
## [1]  9  4 10  6
## 
## [[4]]
## [1] 10  6
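A smaller sketch of the same idea: accumulate() keeps every intermediate result of the reduction, so with addition it reproduces a cumulative sum, and its last element equals the result of reduce().

```r
library(purrr)

running <- accumulate(1:5, `+`)
running

# The final element is what reduce() alone would return
final <- reduce(1:5, `+`)
final
```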

Predicate functionals

  • A predicate:
    • Function that returns a single TRUE or FALSE.
    • E.g., is.character(), is.null(), or all().
    • A predicate matches a vector if it returns TRUE.
  • A predicate functional:
    • Applies a predicate to each element of a vector.
    • 6 functions in 3 pairs.
    • some(.x, .p)/every(.x, .p).
      • Returns TRUE if any/all elements match.
      • Similar to any(map_lgl(.x, .p))/all(map_lgl(.x, .p)).
      • But terminate early.
    • detect(.x, .p)/detect_index(.x, .p).
      • Returns the value/location of the first match.
    • keep(.x, .p)/discard(.x, .p).
      • Keeps/drops all matching elements.
# stringsAsFactors = TRUE is needed in R >= 4.0 for y to be a factor
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = TRUE)
detect(df, is.factor)
## [1] a b c
## Levels: a b c
detect_index(df, is.factor)
## [1] 2
# str(keep(df, is.factor))
# str(discard(df, is.factor))
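The three pairs can be compared side by side on a small list. This is an illustrative sketch; the list x is made up for the example.

```r
library(purrr)

x <- list(1, "a", 3)

some(x, is.character)          # TRUE: at least one element matches
every(x, is.numeric)           # FALSE: "a" does not match
detect(x, is.character)        # "a": the first matching value
detect_index(x, is.character)  # 2: its position
str(keep(x, is.numeric))       # the two numeric elements
str(discard(x, is.numeric))    # everything else
```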

Map variants

df <- data.frame(
  num1 = c(0, 10, 20),
  num2 = c(5, 6, 7),
  chr1 = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
str(map_if(df, is.numeric, mean))
## List of 3
##  $ num1: num 10
##  $ num2: num 6
##  $ chr1: chr [1:3] "a" "b" "c"
str(modify_if(df, is.numeric, mean))
## 'data.frame':    3 obs. of  3 variables:
##  $ num1: num  10 10 10
##  $ num2: num  6 6 6
##  $ chr1: chr  "a" "b" "c"
str(map(keep(df, is.numeric), mean))
## List of 2
##  $ num1: num 10
##  $ num2: num 6

Base functionals

  • Some base R functionals have no purrr equivalent:
    • Working with two-dimensional and higher vectors:
      • base::apply(): summarizes by collapsing rows/columns to a single value.

Matrices and arrays: base::apply()

  • Summarizes by collapsing rows/columns to a single value.
a2d <- matrix(1:20, nrow = 5)
apply(a2d, 1, mean)
## [1]  8.5  9.5 10.5 11.5 12.5
apply(a2d, 2, mean)
## [1]  3  8 13 18
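Two points are worth checking with a short base R sketch: the second argument selects the dimension to keep (1 for rows, 2 for columns), and apply() converts a data frame to a matrix first, so a mixed-type data frame becomes all character. The variable names are illustrative.

```r
a2d <- matrix(1:20, nrow = 5)

# Same results as the dedicated helpers rowMeans() and colMeans()
row_means <- apply(a2d, 1, mean)
col_means <- apply(a2d, 2, mean)

# apply() coerces a data frame to a matrix before iterating,
# so every column of this mixed-type data frame is seen as character
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
col_types <- apply(df, 2, typeof)
col_types
```

For data frames, map() and its variants are usually the safer choice because they iterate over the columns without coercion.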