Function

Functional style

Three techniques:
- Functionals:
  - Replace many loops.
  - E.g., map(), reduce().
  - The most important, used all the time in data analysis.
- Function factories:
  - Functions that create functions.
  - Partition work between different parts of your code.
- Function operators:
  - Functions that take functions as input and return functions as output.
  - Typically modify the operation of a function.

Collectively, these are called higher-order functions.
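A minimal base R sketch of the three techniques, for orientation. The names power, square, loudly, and loud_mean are hypothetical and chosen only for illustration.

```r
# Functional: takes a function as input and returns a vector.
# sapply() is a base R functional.
squares <- sapply(1:3, function(x) x^2)

# Function factory: a function that creates functions.
power <- function(exp) {
  force(exp)  # evaluate exp now, so later changes cannot affect the result
  function(x) x^exp
}
square <- power(2)

# Function operator: takes a function and returns a modified function.
loudly <- function(f) {
  function(...) {
    message("Calling the wrapped function")
    f(...)
  }
}
loud_mean <- loudly(mean)

square(4)              # 16
loud_mean(c(1, 2, 3))  # prints a message, then returns 2
```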

Functionals

Functional:
- Takes a function as an input.
- Returns a vector (or scalar) as an output.

randomise <- function(f) f(runif(1e3))
randomise(mean)

## [1] 0.5106934

randomise(mean)

## [1] 0.5093112

randomise(sum)

## [1] 492.8197
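The two calls to randomise(mean) above give different results because runif() draws fresh random numbers each time. A small sketch, assuming you want reproducible output: fix the seed before each call. The seed value 42 is arbitrary.

```r
randomise <- function(f) f(runif(1e3))

# Resetting the seed makes runif() produce the same draws again.
set.seed(42)
first <- randomise(mean)
set.seed(42)
second <- randomise(mean)

identical(first, second)  # TRUE
```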

Outline

library(tidyverse)

Focus on the purrr package:
- Consistent interface that makes it easier to use/understand.
- We will only briefly introduce one base R functional, apply().
purrr::map().
Combine multiple simple functionals to solve larger problems.
The 18 important variants of purrr::map().
purrr::reduce().
Predicates (functions returning a single TRUE or FALSE) and the functionals using them.

Map

Warm-up: `purrr::map()`

The most fundamental functional:
- Takes a vector and a function.
- Calls the function once for each element of the vector.
- Returns the results in a list.
- E.g., map(1:3, f) is equivalent to list(f(1), f(2), f(3)).

triple <- function(x) x * 3
map(1:3, triple)

## [[1]]
## [1] 3
## 
## [[2]]
## [1] 6
## 
## [[3]]
## [1] 9
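To make the behaviour concrete, here is a simplified base R sketch of what map() does. simple_map is a hypothetical name, and the real purrr implementation differs (for example, it preserves names and is written in C).

```r
simple_map <- function(x, f, ...) {
  # Allocate the output list up front, then fill it one element at a time.
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]], ...)
  }
  out
}

triple <- function(x) x * 3
result <- simple_map(1:3, triple)
str(result)
```

The result has the same shape as list(triple(1), triple(2), triple(3)).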

map() always returns a list.
Four more specific variants return atomic vectors:
- map_dbl(), map_chr(), map_int() and map_lgl().

Producing atomic vectors

map_dbl() always returns a double vector.

map_dbl(mtcars, mean)

##        mpg        cyl       disp         hp       drat         wt       qsec 
##  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750 
##         vs         am       gear       carb 
##   0.437500   0.406250   3.687500   2.812500

map_chr() always returns a character vector.

map_chr(mtcars, typeof)

##      mpg      cyl     disp       hp     drat       wt     qsec       vs 
## "double" "double" "double" "double" "double" "double" "double" "double" 
##       am     gear     carb 
## "double" "double" "double"

map_int() always returns an integer vector.

map_int(mtcars, function(x) length(unique(x)))

##  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
##   25    3   27   22   22   29   30    2    2    3    6

map_lgl() always returns a logical vector.

map_lgl(mtcars, is.double)

##  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

Remarks:
- Suffixes refer to the output.
- But map_*() can take any type of vector as input.
Examples rely on two facts:
- mtcars is a data frame.
- data frames are lists containing vectors of the same length.

Each call to the function must return a single value.

map_dbl(1:2, function(x) c(x, x))
#> Error: Result 1 must be a single double, not an integer vector of length 2

The value must also be of the correct type.

map_dbl(1:2, as.character)
#> Error: Can't coerce element 1 from a character to a double

In either case, use map() to see the problematic output!
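Base R's vapply() enforces a similar contract to map_dbl(): every result must match a template, here a single double. The sketch below uses only base R, with lapply() standing in for map() when inspecting the problematic output.

```r
# Each column mean is a single double, so this succeeds.
means <- vapply(mtcars, mean, FUN.VALUE = numeric(1))

# A length-2 result violates the template and raises an error.
outcome <- tryCatch(
  vapply(1:2, function(x) c(x, x), FUN.VALUE = numeric(1)),
  error = function(e) "error"
)
outcome

# Dropping down to a list-returning functional shows what went wrong.
str(lapply(1:2, function(x) c(x, x)))
```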

Anonymous functions and shortcuts

map() and its variants accept anonymous functions.

map_dbl(mtcars, function(x) length(unique(x)))

##  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
##   25    3   27   22   22   29   30    2    2    3    6

A less verbose shortcut: a one-sided formula, where .x stands for the current element.

map_dbl(mtcars, ~ length(unique(.x)))

##  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
##   25    3   27   22   22   29   30    2    2    3    6

Useful for generating random data.

x <- map(1:3, ~ runif(2))
str(x)

## List of 3
##  $ : num [1:2] 0.301 0.921
##  $ : num [1:2] 0.724 0.376
##  $ : num [1:2] 0.762 0.904
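The ~ shortcut is understood only by functions that support it, such as those in purrr. As a point of comparison, R 4.1 and later also provide a built-in shorthand, \(x), that works anywhere a function is expected. This sketch assumes R 4.1 or later and uses base R's vapply() in place of map_int().

```r
# Full anonymous function.
long_form <- vapply(mtcars, function(x) length(unique(x)), integer(1))

# Same function written with the \(x) shorthand (R >= 4.1).
short_form <- vapply(mtcars, \(x) length(unique(x)), integer(1))

identical(long_form, short_form)  # TRUE
```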

Extracting elements from a vector

x <- list(
  list(-1, x = 1, y = c(2), z = "a"),
  list(-2, x = 4, y = c(5, 6), z = "b"),
  list(-3, x = 8, y = c(9, 10, 11))
)
# Select by name
map_dbl(x, "x")

## [1] 1 4 8

# Or by position
map_dbl(x, 1)

## [1] -1 -2 -3

# Or by both
map_dbl(x, list("y", 1))

## [1] 2 5 9
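The name and position shortcuts are roughly equivalent to writing the extraction out with [[. A base R sketch of the three calls above, using vapply() in place of map_dbl():

```r
x <- list(
  list(-1, x = 1, y = c(2), z = "a"),
  list(-2, x = 4, y = c(5, 6), z = "b"),
  list(-3, x = 8, y = c(9, 10, 11))
)

# map_dbl(x, "x"): extract the element named "x" from each sub-list.
by_name <- vapply(x, function(el) el[["x"]], numeric(1))

# map_dbl(x, 1): extract the first element of each sub-list.
by_position <- vapply(x, function(el) el[[1]], numeric(1))

# map_dbl(x, list("y", 1)): extract "y", then its first value.
by_both <- vapply(x, function(el) el[["y"]][[1]], numeric(1))
```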

Passing arguments with …

To pass along additional arguments, use an anonymous function.

x <- list(1:5, c(1:10, NA))
map_dbl(x, ~ mean(.x, na.rm = TRUE))

## [1] 3.0 5.5

Or use the simpler form, where arguments after the function are passed along to it.

x <- list(1:5, c(1:10, NA))
map_dbl(x, mean, na.rm = TRUE)

## [1] 3.0 5.5
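The two forms are not always interchangeable. An extra argument is evaluated once and reused, while code inside an anonymous function runs again on every call. The sketch below shows this with base R's vapply(); plus is a hypothetical helper, and the same distinction is generally described for map().

```r
plus <- function(x, y) x + y
x <- c(0, 0, 0, 0)

set.seed(1)
# Extra argument: runif(1) is evaluated once, so every element gets the same value.
once <- vapply(x, plus, numeric(1), runif(1))

# Anonymous function: runif(1) is evaluated again for each element.
each <- vapply(x, function(el) plus(el, runif(1)), numeric(1))

once
each
```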

Additional vector arguments

Map variants

23 primary variants of map():
- map(), map_dbl(), map_chr(), map_int(), map_lgl()
- 18 (!!) more to learn.
- Five new ideas:
  - Output type:
    - Output same type as input with modify().
    - Return nothing with walk().
  - Input type:
    - Iterate over two inputs with map2().
    - Iterate with an index using imap().
    - Iterate over any number of inputs with pmap().

|  | List | Atomic | Same type | Nothing |
|---|---|---|---|---|
| One argument | `map()` | `map_lgl()`, … | `modify()` | `walk()` |
| Two arguments | `map2()` | `map2_lgl()`, … | `modify2()` | `walk2()` |
| One argument + index | `imap()` | `imap_lgl()`, … | `imodify()` | `iwalk()` |
| N arguments | `pmap()` | `pmap_lgl()`, … | - | `pwalk()` |

Same type of output/input: `modify()`

df <- data.frame(x = 1:3, y = 6:4)
map(df, ~ .x * 2)

## $x
## [1] 2 4 6
## 
## $y
## [1] 12 10  8

modify(df, ~ .x * 2) %>%
  print()

##   x  y
## 1 2 12
## 2 4 10
## 3 6  8

A simple implementation.

simple_modify <- function(x, f, ...) {
  for (i in seq_along(x)) {
    x[[i]] <- f(x[[i]], ...)
  }
  x
}
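A short usage sketch for simple_modify(), repeated here so the block runs on its own. Because the loop assigns into x rather than building a new list, the data frame structure survives. The base R idiom df[] <- lapply(...) is shown alongside for comparison.

```r
simple_modify <- function(x, f, ...) {
  for (i in seq_along(x)) {
    x[[i]] <- f(x[[i]], ...)
  }
  x
}

df <- data.frame(x = 1:3, y = 6:4)
doubled <- simple_modify(df, function(col) col * 2)
is.data.frame(doubled)  # TRUE

# Base R idiom with the same effect: assign into df2[] to keep the structure.
df2 <- df
df2[] <- lapply(df, function(col) col * 2)
```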

No outputs: `walk()` and friends

welcome <- function(x) {
  cat("Welcome ", x, "!\n", sep = "")
}
names <- c("Hadley", "Jenny")
# map() prints the welcomes and also shows
# the return value of cat() (NULL) for each element
map(names, welcome)

## Welcome Hadley!
## Welcome Jenny!

## [[1]]
## NULL
## 
## [[2]]
## NULL

walk(names, welcome)

## Welcome Hadley!
## Welcome Jenny!
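A simplified base R sketch of the idea behind walk(): call the function for its side effect, ignore its return value, and return the input invisibly. simple_walk and people are hypothetical names.

```r
simple_walk <- function(x, f, ...) {
  for (el in x) {
    f(el, ...)  # called only for its side effect
  }
  invisible(x)  # hand back the input without printing it
}

welcome <- function(x) {
  cat("Welcome ", x, "!\n", sep = "")
}

people <- c("Hadley", "Jenny")
result <- simple_walk(people, welcome)
```

Returning the input invisibly keeps the console clean and lets the call sit in the middle of a pipeline.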

Two inputs: `map2()` and friends

How do we find the vector of weighted means?

xs <- map(1:8, ~ runif(10))
ws <- map(1:8, ~ rpois(10, 5) + 1)

xs

## [[1]]
##  [1] 0.2271982 0.3196603 0.1726035 0.4312708 0.8947273 0.1149747 0.1303645
##  [8] 0.9494872 0.1723842 0.5318115
## 
## [[2]]
##  [1] 0.11329206 0.03601022 0.24068169 0.28830587 0.07024510 0.85852935
##  [7] 0.74631596 0.03071322 0.87007659 0.16599592
## 
## [[3]]
##  [1] 0.4443118 0.1632040 0.5210672 0.3751942 0.1914044 0.1612504 0.5388899
##  [8] 0.5066812 0.2326832 0.2332805
## 
## [[4]]
##  [1] 0.94740926 0.48194669 0.61236067 0.18484342 0.76383276 0.07597317
##  [7] 0.30996611 0.76928881 0.54550755 0.17940089
## 
## [[5]]
##  [1] 0.87192390 0.12525782 0.40557795 0.80324028 0.69103718 0.06128023
##  [7] 0.93588816 0.67666827 0.11804369 0.61266477
## 
## [[6]]
##  [1] 0.8283395 0.7224854 0.2188552 0.7109752 0.3437943 0.6866659 0.2797798
##  [8] 0.8735264 0.1442089 0.9490841
## 
## [[7]]
##  [1] 0.28687855 0.20648425 0.82641649 0.13025664 0.01454089 0.19600902
##  [7] 0.66340291 0.77002564 0.52482237 0.19972811
## 
## [[8]]
##  [1] 0.08410161 0.89569706 0.85282003 0.33613543 0.20792388 0.79476840
##  [7] 0.62895827 0.20287110 0.16050154 0.11773924

ws

## [[1]]
##  [1]  6  5  5  5  6  7 10  5  4  6
## 
## [[2]]
##  [1] 4 5 9 7 6 9 7 7 6 6
## 
## [[3]]
##  [1]  6  8  5 12  1  5  6  5  4  4
## 
## [[4]]
##  [1]  9  4  6  6 11  6  8  4  6  4
## 
## [[5]]
##  [1]  7  8 11  4  8  4  8  5  5  6
## 
## [[6]]
##  [1] 5 3 4 6 8 4 6 4 7 4
## 
## [[7]]
##  [1]  7  8  5  7  4 10  3  9  2  5
## 
## [[8]]
##  [1] 3 6 4 7 7 6 9 4 8 2

Use map_dbl() to compute the unweighted means.

xs <- map(1:8, ~ runif(10))
ws <- map(1:8, ~ rpois(10, 5) + 1)

map_dbl(xs, mean)

## [1] 0.6522101 0.5499636 0.4522153 0.7381881 0.6047583 0.4610659 0.4495295
## [8] 0.5610550

Passing ws as an additional argument doesn’t work, because the whole list ws is passed to every call instead of one element at a time.

xs <- map(1:8, ~ runif(10))
ws <- map(1:8, ~ rpois(10, 5) + 1)
map_dbl(xs, weighted.mean, w = ws)
#> Error in weighted.mean.default(.x[[i]], ...): 'x' and 'w' must have the same length

With map2(), both arguments vary in each call.

set.seed(0)
xs <- map(1:8, ~ runif(10))
ws <- map(1:8, ~ rpois(10, 5) + 1)
map2_dbl(xs, ws, weighted.mean)

## [1] 0.6389754 0.5211719 0.4711301 0.5323738 0.5958825 0.4148373 0.5072938
## [8] 0.6268950

Additional arguments still go afterwards.

set.seed(0)
xs <- map(1:8, ~ runif(10))
xs[[1]][[1]] <- NA
ws <- map(1:8, ~ rpois(10, 5) + 1)
map2_dbl(xs, ws, weighted.mean, na.rm = TRUE)

## [1] 0.5841410 0.5211719 0.4711301 0.5323738 0.5958825 0.4148373 0.5072938
## [8] 0.6268950
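A simplified base R sketch of map2(), to show that the only change from map() is indexing into both inputs in step. simple_map2 is a hypothetical name, and the small xs and ws here are made-up values chosen so the weighted means are easy to check by hand.

```r
simple_map2 <- function(x, y, f, ...) {
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    # The i-th element of both inputs goes into the same call.
    out[[i]] <- f(x[[i]], y[[i]], ...)
  }
  out
}

xs <- list(c(1, 2, 3), c(4, 6))
ws <- list(c(1, 1, 2), c(1, 3))

weighted <- unlist(simple_map2(xs, ws, weighted.mean))
weighted  # 2.25 5.50

# Base R's mapply() does the same pairing, with the function first.
weighted_base <- mapply(weighted.mean, xs, ws)
```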

Any number of inputs: `pmap()`

After map() and map2(), do we need map3(), map4(), map5()?
Instead, there is pmap():
- Supply it a single list, which contains any number of arguments.
- In most cases, a list of equal-length vectors (e.g., a data frame).
pmap(list(x), f) is the same as map(x, f).
pmap(list(x, y), f) is the same as map2(x, y, f).
pmap(list(x, y), f, na.rm = TRUE) is the same as map2(x, y, f, na.rm = TRUE).

xs <- map(1:8, ~ runif(10))
xs[[1]][[1]] <- NA
ws <- map(1:8, ~ rpois(10, 5) + 1)
pmap_dbl(list(xs, ws), weighted.mean)

## [1]        NA 0.6142809 0.6622019 0.4570432 0.4215889 0.6998617 0.3807412
## [8] 0.5044364

pmap_dbl(list(xs, ws), weighted.mean, na.rm = TRUE)

## [1] 0.5414482 0.6142809 0.6622019 0.4570432 0.4215889 0.6998617 0.3807412
## [8] 0.5044364
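A simplified base R sketch of the idea behind pmap(): take the i-th element of every input and pass them all to one call. simple_pmap and params are hypothetical names. Because the names of the list are kept, a data frame's column names are matched to the function's argument names.

```r
simple_pmap <- function(l, f, ...) {
  n <- length(l[[1]])
  out <- vector("list", n)
  for (i in seq_len(n)) {
    # Collect the i-th element of each input, keeping any names.
    args <- lapply(l, function(v) v[[i]])
    out[[i]] <- do.call(f, c(args, list(...)))
  }
  out
}

# Unnamed inputs are matched by position, as with map2().
xs <- list(c(1, 2, 3), c(4, 6))
ws <- list(c(1, 1, 2), c(1, 3))
weighted <- unlist(simple_pmap(list(xs, ws), weighted.mean))

# Columns n, min, max are matched by name to the arguments of runif().
params <- data.frame(n = 1:3, min = c(0, 10, 100), max = c(1, 100, 1000))
draws <- simple_pmap(params, runif)
lengths(draws)  # 1 2 3
```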

Iterating over values and indices

Three basic ways to loop over a vector with for:
- Over the elements: for (x in xs) f(x)
- Over the names: for (nm in names(xs)) f(nm)
- Over the indices: for (i in seq_along(xs)) f(i)
First kind: similar to map(xs, f).
The other two: imap(xs, f).
- Same as map2(xs, names(xs), f) if xs has names.
- Same as map2(xs, seq_along(xs), f) otherwise.

imap_chr(iris, ~ paste0("The first value of ", .y, " is ", .x[[1]]))

##                             Sepal.Length 
## "The first value of Sepal.Length is 5.1" 
##                              Sepal.Width 
##  "The first value of Sepal.Width is 3.5" 
##                             Petal.Length 
## "The first value of Petal.Length is 1.4" 
##                              Petal.Width 
##  "The first value of Petal.Width is 0.2" 
##                                  Species 
##   "The first value of Species is setosa"
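A simplified base R sketch of the rule imap() follows: pass the names as the second argument when the input has them, and the positions otherwise. simple_imap is a hypothetical name.

```r
simple_imap <- function(x, f, ...) {
  # Use names when present, positions otherwise.
  idx <- if (is.null(names(x))) seq_along(x) else names(x)
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]], idx[[i]], ...)
  }
  names(out) <- names(x)
  out
}

# Named input: the second argument is the name.
named <- simple_imap(list(a = 10, b = 20), function(v, nm) paste0(nm, "=", v))

# Unnamed input: the second argument is the position.
unnamed <- simple_imap(c(5, 6), function(v, i) v * i)
```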

Reduce

The next most important (family of) functionals.
- Much smaller (two main variants).
- Powers the map-reduce framework.
purrr::reduce():
- Takes a vector of length n.
- Produces a vector of length 1 by calling a function with a pair of values at a time.
- reduce(1:2, f) is equivalent to f(1, 2).
- reduce(1:3, f) is equivalent to f(f(1, 2), 3).
- reduce(1:4, f) is equivalent to f(f(f(1, 2), 3), 4).
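The nesting can be made visible with a function that records how it was called. The sketch below uses base R's Reduce(), which folds from the left in the same way; show_call is a hypothetical helper.

```r
# Build a string describing each call instead of computing a number.
show_call <- function(a, b) paste0("f(", a, ", ", b, ")")

nested <- Reduce(show_call, 1:4)
nested  # "f(f(f(1, 2), 3), 4)"
```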

Reduce family

Useful to generalize a function that works with two inputs to work with any number of inputs.
Problem: find the values that occur in every element.

set.seed(0)
l <- map(1:4, ~ sample(1:10, 15, replace = TRUE))
str(l)

## List of 4
##  $ : int [1:15] 9 4 7 1 2 7 2 3 1 5 ...
##  $ : int [1:15] 9 5 5 9 9 5 5 2 10 9 ...
##  $ : int [1:15] 10 6 4 4 10 9 7 6 9 8 ...
##  $ : int [1:15] 7 3 10 6 8 2 2 6 6 1 ...

Two solutions

set.seed(0)
l <- map(1:4, ~ sample(1:10, 15, replace = TRUE))
out <- l[[1]]
out <- intersect(out, l[[2]])
out <- intersect(out, l[[3]])
out <- intersect(out, l[[4]])
out

## [1] 10  6

set.seed(0)
l <- map(1:4, ~ sample(1:10, 15, replace = TRUE))
reduce(l, intersect)

## [1] 10  6

Can also pass additional arguments.
Simple implementation.

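simple_reduce <- function(x, f, ...) {
  # Start from the first element, then fold in the rest one at a time.
  out <- x[[1]]
  # seq_along(x)[-1] is empty when length(x) == 1;
  # seq(2, length(x)) would count down from 2 to 1 and fail.
  for (i in seq_along(x)[-1]) {
    out <- f(out, x[[i]], ...)
  }
  out
}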

Accumulate

set.seed(0)
l <- map(1:4, ~ sample(1:10, 15, replace = TRUE))
accumulate(l, intersect)

## [[1]]
##  [1]  9  4  7  1  2  7  2  3  1  5  5 10  6 10  7
## 
## [[2]]
## [1]  9  4  1  2  3  5 10  6
## 
## [[3]]
## [1]  9  4 10  6
## 
## [[4]]
## [1] 10  6
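accumulate() keeps every intermediate result, whereas reduce() returns only the last one. A smaller example makes the relationship easier to see. This sketch uses base R's Reduce(), whose accumulate argument plays the same role.

```r
# Final result only: ((1 + 2) + 3) + 4.
total <- Reduce(`+`, 1:4)

# All intermediate results: 1, 1 + 2, (1 + 2) + 3, ...
running <- Reduce(`+`, 1:4, accumulate = TRUE)
running  # 1 3 6 10

# The last accumulated value is the reduced value.
running[length(running)] == total
```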

Predicate functionals

A predicate:
- Function that returns a single TRUE or FALSE.
- E.g., is.character(), is.null(), or all().
- A predicate matches a vector if it returns TRUE.
A predicate functional:
- Applies a predicate to each element of a vector.
- 6 functions in 3 pairs.
- some(.x, .p)/every(.x, .p).
  - Returns TRUE if any/all elements match.
  - Similar to any(map_lgl(.x, .p))/all(map_lgl(.x, .p)).
  - But they terminate early: some() returns at the first TRUE, every() at the first FALSE.
- detect(.x, .p)/detect_index(.x, .p).
  - Returns the value/location of the first match.
- keep(.x, .p)/discard(.x, .p).
  - Keeps/drops all matching elements.

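# stringsAsFactors = TRUE makes y a factor (the default is FALSE since R 4.0)
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = TRUE)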
detect(df, is.factor)

## [1] a b c
## Levels: a b c

detect_index(df, is.factor)

## [1] 2

# str(keep(df, is.factor))
# str(discard(df, is.factor))
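Base R has close relatives of three of these functionals: Find(), Position(), and Filter() correspond roughly to detect(), detect_index(), and keep(). The sketch below also includes simple_some, a hypothetical helper that illustrates early termination. The call counter is there only to show how many elements were tested.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = TRUE)

first_factor <- Find(is.factor, df)         # value of the first match
first_index <- Position(is.factor, df)      # location of the first match
only_factors <- Filter(is.factor, df)       # all matching columns

simple_some <- function(x, p) {
  for (el in x) {
    if (p(el)) return(TRUE)  # stop at the first match
  }
  FALSE
}

calls <- 0
counted_is_character <- function(el) {
  calls <<- calls + 1  # record each time the predicate runs
  is.character(el)
}

found <- simple_some(list("a", 1, 2), counted_is_character)
calls  # 1: the remaining elements were never tested
```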

Map variants

df <- data.frame(
  num1 = c(0, 10, 20),
  num2 = c(5, 6, 7),
  chr1 = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
str(map_if(df, is.numeric, mean))

## List of 3
##  $ num1: num 10
##  $ num2: num 6
##  $ chr1: chr [1:3] "a" "b" "c"

str(modify_if(df, is.numeric, mean))

## 'data.frame':    3 obs. of  3 variables:
##  $ num1: num  10 10 10
##  $ num2: num  6 6 6
##  $ chr1: chr  "a" "b" "c"

str(map(keep(df, is.numeric), mean))

## List of 2
##  $ num1: num 10
##  $ num2: num 6

Base functionals

Some base R functionals have no purrr equivalent:
- Working with two-dimensional and higher vectors:
  - base::apply(): summarizes by collapsing rows/columns to a single value.

Matrices and arrays: `base::apply()`

Summarizes by collapsing rows/columns to a single value.

a2d <- matrix(1:20, nrow = 5)
apply(a2d, 1, mean)

## [1]  8.5  9.5 10.5 11.5 12.5

apply(a2d, 2, mean)

## [1]  3  8 13 18
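The second argument of apply() names the dimension, or dimensions, to keep. The sketch below checks the results above against rowMeans() and colMeans(), then applies the same idea to a three-dimensional array. a3d is a made-up example.

```r
a2d <- matrix(1:20, nrow = 5)

# MARGIN = 1 keeps rows, MARGIN = 2 keeps columns.
row_means <- apply(a2d, 1, mean)
col_means <- apply(a2d, 2, mean)
all.equal(row_means, rowMeans(a2d))  # TRUE
all.equal(col_means, colMeans(a2d))  # TRUE

# With more dimensions, MARGIN can keep one or several of them.
a3d <- array(1:24, c(2, 3, 4))
by_first <- apply(a3d, 1, mean)            # 12 13
by_first_two <- apply(a3d, c(1, 2), mean)  # a 2 x 3 matrix
```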

Function

Jitong

11/29/2021
