Control Flow

  • Two primary tools of control flow:
    • Choices.
    • Loops.
  • Choices:
    • E.g. if(), ifelse(), switch().
    • Allow you to run different code depending on the input.
  • Loops:
    • E.g. for, while, repeat.
    • Allow you to repeatedly run code, typically with changing options.

if() statements

if (condition) true_action
if (condition) true_action else false_action
  • The basic idea for if statements:
    • If condition is TRUE, true_action is evaluated.
    • If condition is FALSE, the optional false_action is evaluated.
  • Typically, actions are compound statements contained within {}.
x <- 87

if (x > 90) {
  print("A")
} else if (x > 80) {
  print("B")
} else if (x > 50) {
  print("C")
} else {
  print("F")
}
## [1] "B"
  • if returns a value so that you can assign the results:
    • Only do that when it fits on one line; otherwise hard to read.
x1 <- if (TRUE) 1 else 2
x2 <- if (FALSE) 1 else 2
c(x1, x2)
## [1] 1 2
  • When using if without else:
    • Returns NULL if the condition is FALSE.
    • Useful with functions like c() and paste(), which drop NULL inputs.
greet <- function(name, birthday = FALSE) {
  paste0("Hi ", name, if (birthday) " and HAPPY BIRTHDAY")
}
greet("Maria", FALSE)
## [1] "Hi Maria"
greet("Jaime", TRUE)
## [1] "Hi Jaime and HAPPY BIRTHDAY"
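
The same idea works with c(). As a small sketch, the hypothetical describe() helper below (introduced only for illustration) relies on each if without else returning NULL when its condition is FALSE:

```r
# Hypothetical helper: builds a character vector of labels for a number.
# Each `if` without `else` yields NULL when its condition is FALSE,
# and c() silently drops NULL inputs.
describe <- function(x) {
  c(
    "number",
    if (x > 0) "positive",
    if (x %% 2 == 0) "even"
  )
}

describe(4)    # returns c("number", "positive", "even")
describe(-3)   # returns "number"
```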

Invalid inputs

  • The condition should evaluate to a single TRUE or FALSE:
if ("x") 1
#> Error in if ("x") 1: argument is not interpretable as logical
if (logical()) 1
#> Error in if (logical()) 1: argument is of length zero
if (NA) 1
#> Error in if (NA) 1: missing value where TRUE/FALSE needed
  • The exception (frequent source of bugs, avoid):
    • In R versions before 4.2.0, a logical vector of length greater than 1 only generates a warning; from R 4.2.0 onwards it is an error.
if (c(TRUE, FALSE)) 1
## Warning in if (c(TRUE, FALSE)) 1: the condition has length > 1 and only the
## first element will be used
## [1] 1
  • In R 3.5.0 and later, you can turn this into an error (good practice):
Sys.setenv("_R_CHECK_LENGTH_1_CONDITION_" = "true")
if (c(TRUE, FALSE)) 1
#> Error in if (c(TRUE, FALSE)) 1: the condition has length > 1
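
When a condition may come from a vector or contain missing values, one option is to collapse it to a single TRUE or FALSE before handing it to if. A minimal sketch using base R's any(), all(), and isTRUE(); the vector x and the status labels are made up for illustration:

```r
x <- c(1, 5, NA)

# any()/all() reduce a logical vector to a single value.
any(x > 3)                  # TRUE: at least one element is known to be > 3
all(x > 0)                  # NA: the missing element makes the answer unknown
all(x > 0, na.rm = TRUE)    # TRUE: missing values are ignored

# isTRUE() is TRUE only for a single, non-missing TRUE,
# so its result is always a valid if condition.
if (isTRUE(all(x > 0))) {
  status <- "all positive"
} else {
  status <- "not known to be all positive"
}
status                      # "not known to be all positive"
```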

Vectorised if() statements

  • if only works with a single TRUE or FALSE.
  • What if you have a vector of logical values?
  • Answer: ifelse(), a vectorised function with test, yes, and no vectors (recycled to the same length).
    • Missing values are propagated into the output.
    • Advice: use ifelse() only when the yes and no vectors are the same type (otherwise it is hard to predict the output type).
x <- 1:10
ifelse(x %% 5 == 0, "XXX", as.character(x))
##  [1] "1"   "2"   "3"   "4"   "XXX" "6"   "7"   "8"   "9"   "XXX"
ifelse(x %% 2 == 0, "even", "odd")
##  [1] "odd"  "even" "odd"  "even" "odd"  "even" "odd"  "even" "odd"  "even"
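
A short sketch of the two points above, using made-up inputs: a missing value in test yields a missing value in the result, and when yes and no have different types, the output type depends on which of them is actually selected.

```r
# Missing values in the test propagate to the output.
x <- c(1, NA, 4)
parity <- ifelse(x %% 2 == 0, "even", "odd")
parity                                      # "odd" NA "even"

# With yes and no of different types, the result type
# depends on the values of the test.
typeof(ifelse(TRUE, 1L, "no"))              # "integer"
typeof(ifelse(FALSE, 1L, "no"))             # "character"
typeof(ifelse(c(TRUE, FALSE), 1L, "no"))    # "character"
```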

switch() statements

switch() lets you replace code like:

x_option <- function(x) {
  if (x == "a") {
    "option 1"
  } else if (x == "b") {
    "option 2"
  } else {
    stop("Invalid `x` value")
  }
}

with:

x_option <- function(x) {
  switch(x,
    a = "option 1",
    b = "option 2",
    stop("Invalid `x` value")
  )
}

A few tips:

  • The last component should always throw an error. Otherwise, unmatched inputs invisibly return NULL:
(switch("c", a = 1, b = 2))
## NULL
  • When multiple inputs share an output:
    • Use empty right hand sides of =.
    • Empty values fall through to the next non-empty value, as in C’s switch statement.
legs <- function(x) {
  switch(x,
    cow = ,
    horse = ,
    dog = 4,
    human = ,
    chicken = 2,
    plant = 0,
    stop("Unknown input")
  )
}
legs("cow")
## [1] 4
legs("dog")
## [1] 4
  • switch() with a numeric x is not recommended.
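
A brief sketch of why a numeric x is harder to reason about: the arguments are selected by position, non-integer values are truncated, and an out-of-range index silently returns NULL. The values are made up for illustration.

```r
# With a numeric first argument, switch() picks by position.
switch(2, "first", "second", "third")            # "second"

# Non-integer values are truncated to an integer, not rounded.
switch(2.9, "first", "second", "third")          # "second"

# An index with no matching argument returns NULL invisibly,
# so mistakes pass without any error.
is.null(switch(4, "first", "second", "third"))   # TRUE
```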

Loops

for loops are used to iterate over items in a vector.

for (item in vector) perform_action
  • For each item in vector, perform_action is called once, updating the value of item each time.
for (i in 1:3) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
  • When iterating over indices, use very short variable names like i, j, or k by convention.
  • Important: for assigns the item to the current environment, overwriting any existing variable with the same name.
i <- 100
for (i in 1:3) {}
i
## [1] 3

Early termination

Two ways to terminate a for loop early:

  • next exits the current iteration.
  • break exits the entire for loop.

for (i in 1:10) {
  if (i < 3)
    next
  print(i)
  if (i >= 5)
    break
}
## [1] 3
## [1] 4
## [1] 5

Common pitfalls

Three common pitfalls to watch out for when using for:

  • Preallocation.
  • Iteration over e.g. 1:length(x).
  • Iteration over S3 vectors.

Preallocation:

  • If you’re generating data, preallocate the output.
  • Otherwise the loop will be very slow.
  • vector() function is helpful.
means <- c(1, 50, 20)
out <- vector("list", length(means))
for (i in 1:length(means)) {
  out[[i]] <- rnorm(10, means[[i]])
}
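
To see the difference, here is a rough sketch comparing a loop that grows its output with one that preallocates it. The function names grow_squares() and prealloc_squares() are hypothetical; timings depend on the machine, so none are shown.

```r
# Grows the result one element at a time: each c() call copies the vector.
grow_squares <- function(n) {
  out <- c()
  for (i in seq_len(n)) {
    out <- c(out, i^2)
  }
  out
}

# Preallocates the result, then fills it in place.
prealloc_squares <- function(n) {
  out <- vector("double", n)
  for (i in seq_len(n)) {
    out[[i]] <- i^2
  }
  out
}

# Both give the same answer; wrap the calls in system.time() to compare speed.
identical(grow_squares(1000), prealloc_squares(1000))   # TRUE
```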

Iteration over e.g. 1:length(x)

Next, beware of iterating over 1:length(x), which will fail in unhelpful ways if x has length 0.

means <- c()
out <- vector("list", length(means))
for (i in 1:length(means)) {
  out[[i]] <- rnorm(10, means[[i]])
}
#> Error in rnorm(10, means[[i]]): invalid arguments

# The reason? `:` works with both increasing and decreasing sequences.
1:length(means)
## [1] 1 0
  • Use seq_along(x) instead.
means <- c()
seq_along(means)
## integer(0)
out <- vector("list", length(means))
for (i in seq_along(means)) {
  out[[i]] <- rnorm(10, means[[i]])
}

Iterating over S3 vectors

  • Finally, problems arise when iterating over S3 vectors, as loops typically strip the attributes.
xs <- as.Date(c("2020-01-01", "2010-01-01"))
for (x in xs) {
  print(x)
}
## [1] 18262
## [1] 14610
  • Work around this by using [[.
for (i in seq_along(xs)) {
  print(xs[[i]])
}
## [1] "2020-01-01"
## [1] "2010-01-01"

Functions

  • Two important ideas:
    • Functions can be broken down into three components: arguments, body, and environment.
    • Functions are objects, just as vectors are objects.
  • In the following:
    • The basics: how to create functions and the three main components of a function.
    • Function composition: the three forms commonly used in R code.
    • Lazy evaluation: the fact that function arguments are only evaluated when used for the first time.
    • The special ... argument: how to pass on extra arguments to another function.
    • Exiting a function: how a function can exit, and how to use exit handlers.

Function components

A function has three parts:

  • The formals(), the list of arguments that controls how you call the function.
  • The body(), the code inside the function.
  • The environment(), the data structure that determines how the function finds the values associated with the names.
f02 <- function(x, y) {
  # A comment
  x + y
}

How are those defined?

  • Explicitly for the formals and body.
  • Implicitly for the environment (where the function was defined).

formals(f02)
## $x
## 
## 
## $y
body(f02)
## {
##     x + y
## }
environment(f02)
## <environment: R_GlobalEnv>
  • Functions can possess any number of additional attributes().
  • One attribute in base R is srcref, short for source reference.
    • Points to the source code used to create the function.
    • Used for printing because, unlike body(), it contains code comments and other formatting.
attr(f02, "srcref")
## function(x, y) {
##   # A comment
##   x + y
## }

Primitive functions

  • One exception to the three components rule.
  • Call C code directly.
sum
## function (..., na.rm = FALSE)  .Primitive("sum")
`[`
## .Primitive("[")
  • Type is either builtin or special.
typeof(sum)
## [1] "builtin"
typeof(`[`)
## [1] "special"
  • formals(), body(), and environment() are all NULL.
formals(sum)
## NULL
body(sum)
## NULL
environment(sum)
## NULL

First-class functions

  • R functions are objects in their own right!
  • This language property is often called “first-class functions”.
  • Unlike in many other languages, no special syntax:
    • Create a function object (with function).
    • Bind it to a name with <-.
f01 <- function(x) {
  sin(1 / x ^ 2)
}

More on functions

  • The binding step is not compulsory.
  • A function without a name is called an anonymous function.
lapply(mtcars, function(x) length(unique(x)))
## $mpg
## [1] 25
## 
## $cyl
## [1] 3
## 
## $disp
## [1] 27
## 
## $hp
## [1] 22
## 
## $drat
## [1] 22
## 
## $wt
## [1] 29
## 
## $qsec
## [1] 30
## 
## $vs
## [1] 2
## 
## $am
## [1] 2
## 
## $gear
## [1] 3
## 
## $carb
## [1] 6
integrate(function(x) sin(x) ^ 2, 0, pi)
## 1.570796 with absolute error < 1.7e-14
  • Also possible to put functions in a list.
funs <- list(
  half = function(x) x / 2,
  double = function(x) x * 2
)
funs$double(10)
## [1] 20
  • In R, functions are often called closures.
  • The name reflects the fact that R functions capture, or enclose, their environments.

Invoking a function

  • The standard way:
mean(1:10, na.rm = TRUE)
## [1] 5.5
  • The alternative way:
args <- list(1:10, na.rm = TRUE)
do.call(mean, args)
## [1] 5.5

Function composition

What if you want to apply a function to the output of another function?

  • Imagine you want to compute the population standard deviation using sqrt() and mean().
square <- function(x) x^2
deviation <- function(x) x - mean(x)
  • Either nest the function calls.
x <- runif(100)
sqrt(mean(square(deviation(x))))
## [1] 0.2845536
  • Or save the intermediate results as variables.
x <- runif(100)

out <- deviation(x)
out <- square(out)
out <- mean(out)
out <- sqrt(out)
out
## [1] 0.2903238

An alternative: Piping

  • The third option using the magrittr package:
    • The operator %>%, called pipe and pronounced as “and then”.
x <- runif(100)

library(magrittr)
x %>%
  deviation() %>%
  square() %>%
  mean() %>%
  sqrt()
## [1] 0.2791327
  • Advantages: Focus on the high-level composition of functions, not the low-level flow of data.
    • Focus on what’s being done (the verbs), not on what’s being modified (the nouns).
    • Makes your code more readable by:
      • Structuring sequences of data operations left-to-right.
      • Minimizing the need for local variables and function definitions.
      • Making it easy to add steps anywhere in the sequence.

Basic piping

  • x %>% f is equivalent to f(x)
  • x %>% f(y) is equivalent to f(x, y)
  • x %>% f(y) %>% g(z) is equivalent to g(f(x, y), z)
x <- 1:10
y <- x + 1
z <- y + 1
f <- function(x, y) x + y
x %>% sum
## [1] 55
x %>% f(y)
##  [1]  3  5  7  9 11 13 15 17 19 21
x %>% f(y) %>% f(z)
##  [1]  6  9 12 15 18 21 24 27 30 33

The argument (“dot”) placeholder

  • x %>% f(y, .) is equivalent to f(y, x)
  • x %>% f(y, z = .) is equivalent to f(y, z = x)
x <- 1:10
y <- 2 * x
f <- function(z, y) y / z
x %>% f(y, .)
##  [1] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
x %>% f(y, z = .)
##  [1] 2 2 2 2 2 2 2 2 2 2

Function composition: Summary

  • Each of the three options has its own strengths and weaknesses:
    • Nesting, f(g(x)):
      • Concise, and well suited for short sequences.
      • Longer sequences harder to read (inside out & right to left).
      • Arguments can get spread out over long distances creating the Dagwood sandwich problem.
    • Intermediate objects, y <- f(x); g(y):
      • Requires you to name intermediate objects.
      • A strength when objects are important, but a weakness when values are truly intermediate.
    • Piping, x %>% f() %>% g():
      • Allows you to read code in a straightforward left-to-right fashion.
      • Doesn’t require you to name intermediate objects.
      • Only for linear sequences of transformations of a single object.
  • Most code uses a combination of all three styles, but...
  • Piping is more common in data analysis code!

Lazy evaluation

  • In R, function arguments are lazily evaluated:
    • Only evaluated if accessed.
    • What will this code return?
h01 <- function(x) {
  10
}
h01(stop("This is an error!"))
## [1] 10
  • Allows you to include expensive computations in function arguments that are only evaluated if needed.
  • Powered by promises, a data structure with three components:
    • An expression, like x + y, giving rise to the delayed computation.
    • An environment, where the expression should be evaluated.
    • A value.

Promises: the environment

  • The environment is where the expression should be evaluated.
    • i.e., where the function is called.
    • What will this code return?
y <- 10
h02 <- function(x) {
  y <- 100
  x + 1
}
h02(y)
## [1] 11
  • Also means that when assigning inside a call to a function, the variable is bound outside the function, not inside.
y <- 10
h02 <- function(x) {
  y <- 100
  x + 1
}
h02(y <- 1000)
## [1] 1001
y
## [1] 1000

Promises: the value

  • The value:
    • Computed and cached the first time a promise is accessed, when the expression is evaluated in the specified environment.
    • Ensures that the promise is evaluated at most once.
  • What will this code return?
x <- 1:10
double <- function(x) {
  message("Calculating...")
  x * 2
}
h03 <- function(x) {
  c(x, x)
}
h03(double(x))
## Calculating...
##  [1]  2  4  6  8 10 12 14 16 18 20  2  4  6  8 10 12 14 16 18 20
  • Can’t manipulate promises with R code: any inspection attempt with code will force an immediate evaluation, making the promise disappear.

Default arguments

  • Thanks to lazy evaluation:
    • Default values can be defined in terms of other arguments.
    • Or even in terms of variables defined later in the function.
  • What will this code return?
h04 <- function(x = 1, y = x * 2, z = a + b) {
  a <- 10
  b <- 100
  c(x, y, z)
}
h04()
## [1]   1   2 110
  • Many use this technique, but not recommended:
    • Makes the code harder to understand.
    • To predict what will be returned, need to know the exact order in which default arguments are evaluated.
  • The evaluation environment.
    • User supplied arguments: evaluated in the environment where the function is called (here, the global environment).
    • Default arguments: evaluated inside the function.
  • Seemingly identical calls can yield different results.
h05 <- function(x = ls()) {
  a <- 1
  x
}
# ls() evaluated in global environment:
h05(ls())
##  [1] "args"      "deviation" "double"    "f"         "f01"       "f02"      
##  [7] "funs"      "greet"     "h01"       "h02"       "h03"       "h04"      
## [13] "h05"       "i"         "legs"      "means"     "out"       "square"   
## [19] "x"         "x_option"  "x1"        "x2"        "xs"        "y"        
## [25] "z"
# ls() evaluated inside h05:
h05()
## [1] "a" "x"
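
Strictly speaking, a user-supplied argument is evaluated in the environment where the call is made; in the example above that happens to be the global environment. A small sketch with the hypothetical functions show_value() and caller():

```r
v <- "global"

# Returns its argument unchanged, forcing the promise.
show_value <- function(x) x

caller <- function() {
  v <- "inside caller"
  # `v` is looked up where the call is written, i.e. inside caller().
  show_value(v)
}

caller()          # "inside caller"
show_value(v)     # "global"
```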

Missing arguments

  • Use missing() to determine if an argument’s value comes from the user or from a default.
h06 <- function(x = 10) {
  list(missing(x), x)
}
# default
str(h06())
## List of 2
##  $ : logi TRUE
##  $ : num 10
# user supplied
str(h06(10))
## List of 2
##  $ : logi FALSE
##  $ : num 10
  • How many arguments are required?
args(sample)
## function (x, size, replace = FALSE, prob = NULL) 
## NULL
  • A “better” sample():
    • Use an explicit NULL to indicate that size is not required but can be supplied.
sample <- function(x, size = NULL, replace = FALSE, prob = NULL) {
  if (is.null(size)) {
    size <- length(x)
  }
  x[sample.int(length(x), size, replace = replace, prob = prob)]
}

... (dot-dot-dot)

  • The special argument ... (pronounced dot-dot-dot).
    • Makes a function take any number of additional arguments.
    • In other programming languages:
      • This is often called varargs (short for variable arguments).
      • A function that uses it is said to be variadic.
  • Can pass those additional arguments on to another function.
i01 <- function(y, z) {
  list(y = y, z = z)
}
i02 <- function(x, ...) {
  i01(...)
}
str(i02(x = 1, y = 2, z = 3))
## List of 2
##  $ y: num 2
##  $ z: num 3

The two primary uses of ...

  • If a function takes a function as an argument, you want some way to pass additional arguments to that function.
x <- list(c(1, 3, NA), c(4, NA, 6))
str(lapply(x, mean, na.rm = TRUE))
## List of 2
##  $ : num 2
##  $ : num 5
  • If a function is an S3 generic, it needs some way to allow methods to take arbitrary extra arguments.
print(factor(letters), max.levels = 4)
##  [1] a b c d e f g h i j k l m n o p q r s t u v w x y z
## 26 Levels: a b c ... z
  • list(...) evaluates the arguments and stores them in a list.
i04 <- function(...) {
  list(...)
}
str(i04(a = 1, b = 2))
## List of 2
##  $ a: num 1
##  $ b: num 2
  • In general, using ... comes with two downsides:
    • When using it to pass arguments to another function, need to carefully explain to the user where those arguments go.
      • Makes it hard to understand what a function can do.
    • A misspelled argument will not raise an error.
      • Makes it easy for typos to go unnoticed.
sum(1, 2, NA, na_rm = TRUE)
## [1] NA
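
The reason for the NA above: because sum() collects everything in ..., the misspelled na_rm = TRUE is treated as one more value to add (TRUE counts as 1) rather than as an option. A short sketch:

```r
# Misspelled: na_rm is swallowed by ... and summed as the number 1.
sum(1, 2, na_rm = TRUE)        # 4

# Correct spelling: na.rm is matched as an option and NA is dropped.
sum(1, 2, NA, na.rm = TRUE)    # 3
```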

Exiting a function

  • Most functions exit in one of two ways:
    • They either return a value, indicating success.
    • Or they throw an error, indicating failure.
  • In the next few slides:
    • Return values.
      • Implicit versus explicit.
      • Visible versus invisible.
    • Errors.
    • Exit handlers, which allow you to run code when a function exits.

Implicit versus explicit returns

  • Implicit, where the last evaluated expression is the return value.
j01 <- function(x) {
  if (x < 10) {
    0
  } else {
    10
  }
}
j01(5)
## [1] 0
j01(15)
## [1] 10
  • Explicit, by calling return().
j02 <- function(x) {
  if (x < 10) {
    return(0)
  } else {
    return(10)
  }
}

Invisible values

  • Most functions return visibly: calling the function in an interactive context prints the result.
j03 <- function() 1
j03()
## [1] 1
  • Applying invisible() to the last value prevents this.
j04 <- function() invisible(1)
j04()

# Verify that the value exists with `print` or `()`.
print(j04())
## [1] 1
(j04())
## [1] 1
  • The most common function that returns invisibly is <-.
a <- 2
(a <- 2)
## [1] 2
  • This is what makes it possible to chain assignments.
a <- b <- c <- d <- 2
  • Functions called primarily for a side effect (like <-, print(), or plot()) should return an invisible value (often the value of the first argument).
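
As a sketch of this convention, the hypothetical function report() below emits a message as its side effect and returns its input invisibly, so the result can still be assigned or passed on. withVisible() is used only to check the visibility.

```r
# Hypothetical side-effect function: reports a value, returns it invisibly.
report <- function(x) {
  message("Value: ", format(x))
  invisible(x)
}

report(42)                       # emits the message, but prints no "[1] 42"
y <- report(42)                  # the value is still returned
y                                # 42
withVisible(report(42))$visible  # FALSE
```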

Errors

  • If a function cannot complete its assigned task, it should throw an error with stop():
    • Immediately terminates the execution of the function.
    • Indicates that something has gone wrong, and forces the user to deal with the problem.
j05 <- function() {
  stop("I'm an error")
  return(10)
}
j05()
#> Error in j05(): I'm an error
  • Some languages rely on special return values to indicate problems, but in R you should always throw an error.

Exit handlers

j06 <- function(x) {
  cat("Hello\n")
  on.exit(cat("Goodbye!\n"), add = TRUE)
  if (x) {
    return(10)
  } else {
    stop("Error")
  }
}
j06(TRUE)
## Hello
## Goodbye!
## [1] 10
j06(FALSE)
#> Hello
#> Error in j06(FALSE): Error
#> Goodbye!

Exit handlers with on.exit()

  • Always set add = TRUE:
    • If you don’t, each call to on.exit() overwrites previous ones.
    • Even when only registering a single handler, it’s good practice to set add = TRUE.
  • on.exit() is useful because it allows you to place clean-up code directly next to the code that requires clean-up.
cleanup <- function(dir, code) {
  old_dir <- setwd(dir)
  on.exit(setwd(old_dir), add = TRUE)
  old_opt <- options(stringsAsFactors = FALSE)
  on.exit(options(old_opt), add = TRUE)
}
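
To see what add = TRUE changes, here is a small sketch with two hypothetical functions that each register two handlers. Without add = TRUE only the last handler survives; with it, both run in the order they were registered.

```r
# Without add = TRUE, the second on.exit() replaces the first.
f_overwrite <- function() {
  on.exit(cat("first\n"))
  on.exit(cat("second\n"))
  invisible(NULL)
}

# With add = TRUE, handlers accumulate and run in registration order.
f_add <- function() {
  on.exit(cat("first\n"), add = TRUE)
  on.exit(cat("second\n"), add = TRUE)
  invisible(NULL)
}

f_overwrite()   # prints "second"
f_add()         # prints "first", then "second"
```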

  • Coupled with lazy evaluation, a useful pattern for running a block of code in an altered environment.
with_dir <- function(dir, code) {
  old <- setwd(dir)
  on.exit(setwd(old), add = TRUE)
  force(code)
}
getwd()
## [1] "C:/Users/susuz/Dropbox/Programming_course/Slides"
with_dir("~", getwd())
## [1] "C:/Users/susuz/Documents"
getwd()
## [1] "C:/Users/susuz/Dropbox/Programming_course/Slides"
  • force() isn’t strictly necessary here as simply referring to code will force its evaluation.
  • But it makes it clear that we are deliberately forcing the execution.