---
title: Non-standard evaluation
layout: default
---
```{r, echo = FALSE}
library(pryr)
```
```{r, echo = FALSE, eval = FALSE}
library(stringr)
special <- c("substitute", "deparse")
funs <- lapply(special, function(x) {
match <- paste0("^", x, "$")
c(
find_funs("package:base", fun_calls, match),
find_funs("package:utils", fun_calls, match),
find_funs("package:stats", fun_calls, match)
)
})
names(funs) <- special
names(Filter(function(x) length(x) == 2, ggplot2:::invert(funs)))
names(Filter(function(x) length(x) == 1, ggplot2:::invert(funs)))
```
# Non-standard evaluation {#nse}
> "Flexibility in syntax, if it does not lead to ambiguity, would seem a
> reasonable thing to ask of an interactive programming language."
>
> --- Kent Pitman, <http://www.nhplace.com/kent/Papers/Special-Forms.html>
R has powerful tools for computing not only on values, but also on the actions that lead to those values. If you're coming from another programming language, they are one of the most surprising features of R. Take the following simple snippet of code that draws a sine curve:
```{r plot-labels}
x <- seq(0, 2 * pi, length.out = 100)
sinx <- sin(x)
plot(x, sinx, type = "l")
```
Look at the labels on the axes. How did R know that the variable on the x axis was called `x` and the variable on the y axis was called `sinx`? In most programming languages, you can only access the values of a function's arguments. In R, you can also access the code used to compute them. This makes __non-standard evaluation__ (NSE) possible, and is particularly useful for functions designed to facilitate interactive data analysis because it can dramatically reduce the amount of typing.
The goal of this chapter is to help you understand NSE in existing R code, and to show you how to write your own functions that use it. In [Capturing expressions](#capturing-expressions) you'll learn how to capture unevaluated expressions using `substitute()`. In [non-standard evaluation](#subset) you'll learn how `subset()` combines `substitute()` with `eval()` to let you succinctly select rows from a data frame. [Scoping issues](#scoping-issues) will teach you about the scoping issues that arise in NSE, and show you how to resolve them.
NSE is great for interactive use, but can be hard to program with. [Calling from another function](#calling-from-another-function) shows why every function that uses NSE should have an escape hatch, a version that uses regular evaluation. [Substitute](#substitute) shows you how you can use `substitute()` to modify expressions, which makes it suitable as a general escape hatch.
While powerful, NSE makes code substantially more difficult to reason about. The chapter concludes with a look at the downsides of NSE in [The downsides](#nse-downsides).
### Prereqs
Before reading this chapter, make sure you're familiar with environments ([Environments](#environments)) and lexical scoping ([Lexical scoping](#lexical-scoping)). You'll also need to install the pryr package with `devtools::install_github("hadley/pryr")`. Some exercises require the plyr package, which you can install from CRAN with `install.packages("plyr")`.
## Capturing expressions
`substitute()` is the tool that makes non-standard evaluation possible. It looks at a function argument, and instead of seeing the value, it sees the code used to compute the value:
```{r}
f <- function(x) {
substitute(x)
}
f(1:10)
x <- 10
f(x)
y <- 13
f(x + y ^ 2)
```
We won't worry about exactly what `substitute()` returns (that's the topic of [the following chapter](#metaprogramming)), but we'll call it an expression.
`substitute()` works because function arguments in R are a special type of object called a __promise__. A promise captures the expression needed to compute the value and the environment in which to compute it. You're not normally aware of promises because the first time you access a promise its code is evaluated in its environment, returning a value.
One other function is usually paired with `substitute()`: `deparse()`. It takes the result of `substitute()` (an expression) and turns it into a character vector.
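A small sketch of the difference between inspecting a promise and forcing it. `capture_code()` and `force_arg()` are illustrative names, not functions from base R or pryr: the first only looks at the argument's code, so the `stop()` inside it never runs; the second accesses the argument, which evaluates that code.

```{r}
capture_code <- function(arg) {
  substitute(arg)  # reads the promise's expression; `arg` is never forced
}
force_arg <- function(arg) {
  arg  # the first access evaluates the promise's code
  "forced"
}
capture_code(stop("never run"))  # returns the call stop("never run")
tryCatch(force_arg(stop("boom")), error = conditionMessage)  # "boom"
```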
```{r}
g <- function(x) deparse(substitute(x))
g(1:10)
g(x)
g(x + y ^ 2)
```
There are a lot of functions in base R that use these ideas. Some use them to avoid quotes:
```{r, eval = FALSE}
library(ggplot2)
# the same as
library("ggplot2")
```
Other functions, like `plot.default()`, use them to provide default labels:
```{r, eval = FALSE}
plot.default <- function(x, y = NULL, xlabel = NULL, ylabel = NULL, ...) {
...
xlab <- if (is.null(xlabel) && !missing(x)) deparse(substitute(x))
ylab <- if (is.null(ylabel) && !missing(y)) deparse(substitute(y))
...
}
```
(The real code is a little more complicated because `plot()` uses `xy.coords()` to standardise the multiple ways that `x` and `y` can be supplied.)
`data.frame()` labels variables with the expression used to compute them:
```{r}
x <- 1:4
y <- letters[1:4]
names(data.frame(x, y))
```
This wouldn't be possible in most programming languages because functions usually only see values (e.g. `1:4` and `c("a", "b", "c", "d")`), not the expressions that created them (`x` and `y`).
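To see how a function can label its inputs in the way `data.frame()` does, here is one minimal sketch built from `substitute()` and `deparse()`. `labelled_list()` is a hypothetical helper, not part of base R, and it skips details the real `data.frame()` handles, such as name checking. Explicitly named arguments keep their names; unnamed ones are labelled with the code that produced them.

```{r}
labelled_list <- function(...) {
  values <- list(...)
  # The unevaluated expressions, one per argument in `...`
  exprs <- as.list(substitute(list(...)))[-1]
  labels <- vapply(exprs, function(e) paste(deparse(e), collapse = " "),
                   character(1))
  nms <- names(values)
  if (is.null(nms)) nms <- rep("", length(values))
  # Only fill in names the caller didn't supply
  nms[nms == ""] <- labels[nms == ""]
  setNames(values, nms)
}
height <- c(150, 160)
weight <- c(50, 60)
names(labelled_list(height, weight, bmi = weight / (height / 100)^2))
```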
### Exercises
1. There's one important feature of `deparse()` to be aware of when
programming with it: it can return multiple strings if the input is long.
For example, calling `g()` as follows will return a vector of length two.
```{r}
g(a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q +
r + s + t + u + v + w + x + y + z)
```
Why does this happen? Carefully read the documentation. Can you write a
wrapper around `deparse()` that always returns a single string?
2. Why does `as.Date.default()` use `substitute()` and `deparse()`?
Why does `pairwise.t.test()` use them? Read the source code.
3. `pairwise.t.test()` is written under the assumption that `deparse()`
always returns a length one character vector. Can you construct an
input that violates this expectation? What happens?
4. `f()`, defined above, just calls `substitute()`. Why can't we use it
to define `g()`? In other words, what will the following code return?
First make a prediction, then run the code and think about the results.
```{r, eval = FALSE}
f <- function(x) substitute(x)
g <- function(x) deparse(f(x))
g(1:10)
g(x)
g(x + y ^ 2 / z + exp(a * sin(b)))
```
5. The pattern `deparse(substitute(x))` is very common in base R code.
Why can't you write a function that does both things in one step?
## Non-standard evaluation in subset {#subset}
Just printing out the expression used to generate an argument value is useful, but we can do more with the unevaluated code. For example, take `subset()`. It's a useful interactive shortcut for subsetting data frames: instead of repeating the name of the data frame you're working with again and again, you can save some typing:
```{r}
sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))
subset(sample_df, a >= 4)
# equivalent to:
# sample_df[sample_df$a >= 4, ]
subset(sample_df, b == c)
# equivalent to:
# sample_df[sample_df$b == sample_df$c, ]
```
`subset()` is special because the expressions `a >= 4` or `b == c` aren't evaluated in the global environment: instead they're evaluated in the data frame. In other words, `subset()` implements different scoping rules, so instead of looking for those variables in the current environment, `subset()` looks in the specified data frame. This is the essence of non-standard evaluation.
How does `subset()` work? We've already seen how to capture the expression that computes an argument, rather than its result, so we just need to figure out how to evaluate that expression in the right context, so that `a` is interpreted as `sample_df$a`, not `globalenv()$a`. To do this we need `eval()`, which takes an expression and evaluates it in the specified environment.
Before we can explore `eval()` we need one more useful function: `quote()`. It captures an unevaluated expression like `substitute()`, but you don't need to use it inside a function. This makes it useful for interactive experimentation.
```{r}
quote(1:10)
quote(x)
quote(x + y ^ 2)
```
We need `quote()` to experiment with `eval()` because the first argument to `eval()` is an expression. If you only provide one argument, it evaluates the expression in the current environment. This makes `eval(quote(x))` exactly equivalent to typing `x`, regardless of what `x` is:
```{r, error = TRUE}
eval(quote(x <- 1))
eval(quote(x))
eval(quote(y))
```
Note that `quote()` and `eval()` are effectively opposites. In the example below, each `eval()` peels off one layer of quoting.
```{r}
quote(2 + 2)
eval(quote(2 + 2))
quote(quote(2 + 2))
eval(quote(quote(2 + 2)))
eval(eval(quote(quote(2 + 2))))
```
The second argument to `eval()` controls the environment in which the code is executed:
```{r}
x <- 10
eval(quote(x))
e <- new.env()
e$x <- 20
eval(quote(x), e)
```
Instead of an environment, the second argument can also be a list or a data frame. This works because lists and data frames bind names to values in a similar way to environments.
```{r}
eval(quote(x), list(x = 30))
eval(quote(x), data.frame(x = 40))
```
This gives us one part of `subset()`:
```{r}
eval(quote(a >= 4), sample_df)
eval(quote(b == c), sample_df)
```
A common mistake when first starting to use `eval()` is to forget to quote the first argument. Compare the results in the following example:
```{r, error = TRUE}
a <- 10
eval(quote(a), sample_df)
eval(a, sample_df)
eval(quote(b), sample_df)
eval(b, sample_df)
```
We can use `eval()` and `substitute()` to write `subset()`. First we capture the call representing the condition, then evaluate it in the context of the data frame and use the result for subsetting:
```{r}
subset2 <- function(x, condition) {
condition_call <- substitute(condition)
r <- eval(condition_call, x)
x[r, ]
}
subset2(sample_df, a >= 4)
```
### Exercises
1. Implement your own version of `quote()` using `substitute()`.
2. What will this code return?
```{r, eval = FALSE}
eval(quote(eval(quote(eval(quote(2 + 2))))))
eval(eval(quote(eval(quote(eval(quote(2 + 2)))))))
quote(eval(quote(eval(quote(eval(quote(2 + 2)))))))
```
3. `subset2()` has a bug if you use it with a single column data frame.
What should the following code return? How can you modify `subset2()`
so it returns the correct type of object?
```{r}
sample_df2 <- data.frame(x = 1:10)
subset2(sample_df2, x > 8)
```
4. What happens if you use `quote()` instead of `substitute()` inside of
`subset2()`?
5. The real subset function (`subset.data.frame()`) removes missing
values in the condition. Modify `subset2()` to also drop these rows.
6. The real subset function also performs variable selection. It allows you
to work with variable names like they are positions, so you can do things
like `subset(mtcars, , -cyl)` to drop the cylinder variable, or
`subset(mtcars, , disp:drat)` to select all the variables between `disp`
and `drat`. How does it work? I've made it easier to
understand by extracting it out into its own function.
```{r, eval = FALSE}
select <- function(df, vars) {
vars <- substitute(vars)
var_pos <- setNames(as.list(seq_along(df)), names(df))
pos <- eval(vars, var_pos)
df[, pos, drop = FALSE]
}
select(mtcars, -cyl)
```
7. What does `evalq()` do? Use it to reduce the amount of typing for the
examples above that use both `eval()` and `quote()`.
## Scoping issues
It certainly looks like our `subset2()` function works. But since we're working with expressions instead of values, we need to test a little more carefully. For example, you might expect that the following uses of `subset2()` should all return the same value because each variable refers to the same value:
```{r, error = TRUE}
y <- 4
x <- 4
condition <- 4
condition_call <- 4
subset2(sample_df, a == 4)
subset2(sample_df, a == y)
subset2(sample_df, a == x)
subset2(sample_df, a == condition)
subset2(sample_df, a == condition_call)
```
What's going wrong? You can get a hint from the variable names I've chosen: they are all variables defined inside `subset2()`. If `eval()` can't find the variable inside the data frame (its second argument), it looks in the environment of `subset2()`. That's obviously not what we want, so we need some way to tell `eval()` to look somewhere else if it can't find the variables in the data frame.
The key is the third argument to `eval()`: `enclos`. This allows us to specify a parent (or enclosing) environment for objects that don't have one (like lists and data frames). If a binding is not found in `envir`, `eval()` will next look in `enclos`, and then in the parents of `enclos`. `enclos` is ignored if `envir` is a real environment. We want to look for `x` in the environment from which `subset2()` was called. In R terminology this is called the __parent frame__ and is accessed with `parent.frame()`. This is an example of [dynamic scope](http://en.wikipedia.org/wiki/Scope_%28programming%29#Dynamic_scoping) because the values come from the location where the function was called, not where it was defined.
With this modification our function works:
```{r}
subset2 <- function(x, condition) {
condition_call <- substitute(condition)
r <- eval(condition_call, x, parent.frame())
x[r, ]
}
x <- 4
subset2(sample_df, a == x)
```
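The lookup order that `enclos` controls can be checked directly with `eval()`. A small sketch with illustrative names (`lookup_env`, `scale_by`, `vals`, `isolated`): with a list as `envir`, names missing from the list are looked up in `enclos`; with an environment as `envir`, `enclos` plays no part.

```{r}
lookup_env <- new.env()
assign("scale_by", 10, envir = lookup_env)
vals <- list(a = 1:3)
# `a` is found in the list; `scale_by` is not, so eval() continues in enclos
eval(quote(a * scale_by), vals, lookup_env)
# When envir is an environment, enclos is ignored: only `isolated` and its
# parent (here the empty environment) are searched, so the lookup fails
isolated <- new.env(parent = emptyenv())
tryCatch(eval(quote(scale_by), isolated, lookup_env),
         error = function(e) "not found")
```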
Using `enclos` is just a shortcut for converting a list or data frame to an environment. We can get the same behaviour by using `list2env()` to turn a list into an environment with an explicit parent:
```{r}
subset2a <- function(x, condition) {
condition_call <- substitute(condition)
env <- list2env(x, parent = parent.frame())
r <- eval(condition_call, env)
x[r, ]
}
x <- 5
subset2a(sample_df, a == x)
```
When using NSE it's also a good idea to test that your code works outside of the global environment:
```{r}
f <- function() {
x <- 5
subset2a(sample_df, a == x)
}
f()
```
### Exercises
1. What does `transform()` do? Read the documentation. How does it work?
Read the source code for `transform.data.frame()`. What does
`substitute(list(...))` do? Create a function that does only that
and experiment with it.
2. `plyr::arrange()` works similarly to `subset()`, but instead of selecting
rows, it reorders them. How does it work? What does
`substitute(order(...))` do?
3. `plyr::mutate()` is similar to `transform()` but it applies the
transformations sequentially so that later transformations can refer to
columns that were just created:
```{r, eval = FALSE}
df <- data.frame(x = 1:5)
transform(df, x2 = x * x, x3 = x2 * x)
plyr::mutate(df, x2 = x * x, x3 = x2 * x)
```
How does mutate work? What's the key difference between `mutate()` and
`transform()`?
4. What does `with()` do? How does it work? Read the source code for
`with.default()`.
5. What does `within()` do? How does it work? Read the source code for
`within.data.frame()`. Why is the code so much more complex than
`with()`?
## Calling from another function
Typically, computing on the language is most useful for functions called directly by the user, not by other functions. `subset()`, for example, saves typing but is difficult to use non-interactively, from inside another function. Imagine we want a function that randomly reorders a subset of the data. A nice way to write that function would be to compose a function for random reordering and a function for subsetting. Let's try that:
```{r}
subset2 <- function(x, condition) {
condition_call <- substitute(condition)
r <- eval(condition_call, x, parent.frame())
x[r, ]
}
scramble <- function(x) x[sample(nrow(x)), ]
subscramble <- function(x, condition) {
scramble(subset2(x, condition))
}
```
But it doesn't work:
```{r, error = TRUE}
subscramble(sample_df, a >= 4)
# Error in eval(expr, envir, enclos) : object 'a' not found
traceback()
#> 5: eval(expr, envir, enclos)
#> 4: eval(condition_call, x, parent.frame()) at #3
#> 3: subset2(x, condition) at #1
#> 2: scramble(subset2(x, condition)) at #2
#> 1: subscramble(sample_df, a >= 4)
```
What's gone wrong? To figure it out, let's use `debugonce()` on `subset2()` and work through the code line by line:
```{r, eval = FALSE}
debugonce(subset2)
subscramble(sample_df, a >= 4)
#> debugging in: subset2(x, condition)
#> debug at #1: {
#> condition_call <- substitute(condition)
#> r <- eval(condition_call, x, parent.frame())
#> x[r, ]
#> }
n
#> debug at #2: condition_call <- substitute(condition)
n
#> debug at #3: r <- eval(condition_call, x, parent.frame())
r <- eval(condition_call, x, parent.frame())
#> Error in eval(expr, envir, enclos) : object 'a' not found
condition_call
#> condition
eval(condition_call, x)
#> Error in eval(expr, envir, enclos) : object 'a' not found
Q
```
Can you see what the problem is? `condition_call` contains the symbol `condition`, so evaluating it looks up `condition`. That isn't a column of the data frame, so `eval()` finds it in `subscramble()`'s environment, where it is a promise for `a >= 4`. Forcing that promise evaluates `a >= 4` in the environment `subscramble()` was called from (here, the global environment), which doesn't contain an object called `a`. If `a` is set in the global environment, far more confusing things can happen:
```{r}
a <- 4
subscramble(sample_df, a == 4)
a <- c(1, 1, 4, 4, 4, 4)
subscramble(sample_df, a >= 4)
```
This is an example of the general tension between functions that are designed for interactive use and functions that are safe to program with. A function that uses `substitute()` might save typing, but it's difficult to call from another function. As a developer you should always provide an escape hatch: an alternative version that uses standard evaluation. In this case, we could write a version of `subset2()` that takes a quoted expression:
```{r}
subset2_q <- function(x, condition) {
r <- eval(condition, x, parent.frame())
x[r, ]
}
```
I usually suffix these functions with `q` to indicate that they take a quoted call. Most users won't need them so the name can be a little longer. We can then rewrite both `subset2()` and `subscramble()` to use `subset2_q()`:
```{r}
subset2 <- function(x, condition) {
subset2_q(x, substitute(condition))
}
subscramble <- function(x, condition) {
condition <- substitute(condition)
scramble(subset2_q(x, condition))
}
subscramble(sample_df, a >= 3)
subscramble(sample_df, a >= 3)
```
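Because `subset2_q()` takes a quoted call, other code can build that call with ordinary, standard-evaluation functions such as `call()` and `as.name()`. A sketch, assuming the column name arrives as a string (for example, from user input). `subset_by_call()` mirrors `subset2_q()` above; `filter_column_at_least()` and `demo_df` are hypothetical names.

```{r}
subset_by_call <- function(x, condition) {
  r <- eval(condition, x, parent.frame())
  x[r, ]
}
filter_column_at_least <- function(df, col_name, threshold) {
  # Build the call `<col_name> >= <threshold>` without any NSE
  condition <- call(">=", as.name(col_name), threshold)
  subset_by_call(df, condition)
}
demo_df <- data.frame(a = 1:5, b = 5:1)
filter_column_at_least(demo_df, "b", 4)
```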
Base R functions tend to use a different sort of escape hatch. They often have an argument that turns off NSE. For example, `require()` has `character.only = TRUE`. I don't think using an argument to change the behaviour of another argument is a good idea because it means you must completely and carefully read all of the function arguments to understand what one function argument means. Since you can't understand the effect of each argument in isolation, it's harder to predict what the function will do.
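Here is that argument-based escape hatch in use. In this sketch `pkg_name` is an illustrative variable; without `character.only = TRUE`, `require(pkg_name)` would capture the argument with `substitute()` and look for a package literally called "pkg_name".

```{r}
pkg_name <- "stats"
# character.only = TRUE turns off NSE, so the value of pkg_name is used
loaded <- require(pkg_name, character.only = TRUE)
loaded
```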
### Exercises
1. The following function attempts to figure out if the input is already
a quoted expression using `is.call()`. Why won't it work?
```{r}
is.call(123)
is.call(quote(a == b))
subset3 <- function(x, condition) {
if (!is.call(condition)) {
condition <- substitute(condition)
}
r <- eval(condition, x)
x[r, ]
}
```
2. The following R functions all use non-standard evaluation. For each,
describe how it uses non-standard evaluation. Read the documentation
to determine the escape hatch: how do you force the function to use
standard evaluation rules?
* `rm()`
* `library()` and `require()`
* `substitute()`
* `data()`
* `data.frame()`
* `ls()`
3. Add an escape hatch to `plyr::mutate()` by splitting it into two functions.
One function should capture the unevaluated inputs, and the other should
take a data frame and list of expressions and perform the computation.
4. What's the escape hatch for `ggplot2::aes()`? What about `plyr::.()`?
What do they have in common? What are the advantages and disadvantages
of their differences?
5. The version of `subset2_q()` I presented is actually somewhat simplified.
Why is the following version better?
```{r}
subset2_q <- function(x, condition, env = parent.frame()) {
r <- eval(condition, x, env)
x[r, ]
}
```
Rewrite `subset2()` and `subscramble()` to use this improved version.
## Substitute
Most functions that use non-standard evaluation provide an escape hatch. But what happens if you want to call a function without one? For example, imagine you want to create a lattice graphic given the names of two variables:
```{r, error = TRUE}
library(lattice)
xyplot(mpg ~ disp, data = mtcars)
x <- quote(mpg)
y <- quote(disp)
xyplot(x ~ y, data = mtcars)
```
We can turn to `substitute()` and use it for another purpose: to modify an expression. Unfortunately `substitute()` has a feature that makes modifying calls interactively a bit of a pain: it never does substitutions when run from the global environment, and just behaves like `quote()`:
```{r, eval = FALSE}
a <- 1
b <- 2
substitute(a + b + z)
#> a + b + z
```
However, if you run it inside a function, `substitute()` substitutes what it can and leaves everything else as is:
```{r}
f <- function() {
a <- 1
b <- 2
substitute(a + b + z)
}
f()
```
To make it easier to experiment with `substitute()`, `pryr` provides the `subs()` function. It works exactly the same way as `substitute()` except it has a shorter name and it works in the global environment. Together, this makes it much easier to experiment with substitution:
```{r}
a <- 1
b <- 2
subs(a + b + z)
```
The second argument (to both `subs()` and `substitute()`) can override the use of the current environment, and provide an alternative list of name-value pairs to use. The following example uses that technique to show some variations on substituting a string, variable name or function call:
```{r}
subs(a + b, list(a = "y"))
subs(a + b, list(a = quote(y)))
subs(a + b, list(a = quote(y())))
```
Remember that every action in R is a function call, so we can also replace `+` with another function:
```{r}
subs(a + b, list("+" = quote(f)))
subs(a + b, list("+" = quote(`*`)))
```
It's quite possible to make nonsense commands with `substitute()`:
```{r}
subs(y <- y + 1, list(y = 1))
```
You can also use `substitute()` to insert arbitrary objects into an expression, but this is a bad idea. In the example below, the expression doesn't print correctly, but it returns the correct result when we evaluate it:
```{r}
df <- data.frame(x = 1)
x <- subs(class(df))
x
eval(x)
```
Formally, substitution takes place by examining each object name in the expression. If the name is:
* an ordinary variable, it's replaced by the value of the variable.
* a promise (a function argument), it's replaced by the expression associated
with the promise.
* `...`, it's replaced by the contents of `...`
Otherwise it's left as is.
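All three rules can be seen in a single call. In this sketch `show_rules()` is a hypothetical function: `local_var` is an ordinary variable in its environment, `arg` is a promise, `...` holds extra arguments, and `f` and `other` aren't bound in the function's environment, so they are left alone. Note that `substitute()` only examines bindings in that environment itself, not in its parents.

```{r}
show_rules <- function(arg, ...) {
  local_var <- 1  # ordinary variable: replaced by its value
  # `arg` is a promise: replaced by its expression
  # `...` is replaced by the expressions it holds
  # `f` and `other` are unbound here: left as is
  substitute(f(local_var, arg, other, ...))
}
show_rules(p + q, r, s)  # the call f(1, p + q, other, r, s)
```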
We can use these rules to create the right call to `xyplot()`:
```{r}
x <- quote(mpg)
y <- quote(disp)
subs(xyplot(x ~ y, data = mtcars))
```
It's even simpler inside a function, because we don't need to explicitly quote the x and y variables. Following the rules above, `substitute()` replaces named arguments with their expressions, not their values:
```{r}
xyplot2 <- function(x, y, data = data) {
substitute(xyplot(x ~ y, data = data))
}
xyplot2(mpg, disp, data = mtcars)
```
If we include `...` in the call to substitute, we can add additional arguments to the call:
```{r}
xyplot3 <- function(x, y, ...) {
substitute(xyplot(x ~ y, ...))
}
xyplot3(mpg, disp, data = mtcars, col = "red", aspect = "xy")
```
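The functions above only return the call; they don't draw anything. To make the plot, the generated call can be evaluated in the caller's environment. A minimal sketch, assuming the lattice package is installed; `xyplot4()` is a hypothetical name.

```{r}
library(lattice)
xyplot4 <- function(x, y, ...) {
  plot_call <- substitute(xyplot(x ~ y, ...))
  # Evaluate where the user called xyplot4(), so `mtcars` etc. are found there
  eval(plot_call, parent.frame())
}
xyplot4(mpg, disp, data = mtcars, col = "red")
```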
### Non-standard evaluation in substitute
`substitute()` is itself a function that uses non-standard evaluation, but doesn't have an escape hatch. For example, we can't use `substitute()` if we already have an expression saved in a variable:
```{r}
x <- quote(a + b)
substitute(x, list(a = 1, b = 2))
```
Although `substitute()` doesn't have a built-in escape hatch, we can use `substitute()` itself to create one:
```{r}
substitute2 <- function(x, env) {
call <- substitute(substitute(y, env), list(y = x))
eval(call)
}
x <- quote(a + b)
substitute2(x, list(a = 1, b = 2))
```
The implementation of `substitute2()` is short, but deep. Let's work through the example above: `substitute2(x, list(a = 1, b = 2))`. It's a little tricky: because of `substitute()`'s non-standard evaluation rules, we can't use the usual technique of working through the parentheses inside-out.
1. First `substitute(substitute(y, env), list(y = x))` is evaluated.
The expression `substitute(y, env)` is captured and `y` is replaced by the
value of `x`. Because we've put `x` inside a list, it will be evaluated and
the rules of substitute will replace `y` with its value. This yields the
expression `substitute(a + b, env)`.
2. Next we evaluate that expression inside the current function.
`substitute()` captures its first argument, `a + b`, without evaluating it,
and looks for name-value pairs in `env`, which evaluates to
`list(a = 1, b = 2)`. Those are both values (not promises), so the result
will be `1 + 2`.
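Both steps can be made visible by keeping the intermediate call instead of evaluating it straight away. `show_steps()` is a hypothetical variant of `substitute2()` written for this purpose:

```{r}
show_steps <- function(x, env) {
  # Step 1: build the call substitute(<value of x>, env)
  step1 <- substitute(substitute(y, env), list(y = x))
  # Step 2: evaluate it in this function's environment, where `env` is bound
  step2 <- eval(step1)
  list(step1 = step1, step2 = step2)
}
steps <- show_steps(quote(a + b), list(a = 1, b = 2))
steps$step1  # the call substitute(a + b, env)
steps$step2  # the call 1 + 2
```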
### Capturing unevaluated ... {#capturing-dots}
Another useful technique is to capture all of the unevaluated expressions in `...`. Base R functions do this in many ways, but there's one technique that works well in a wide variety of situations:
```{r}
dots <- function(...) {
eval(substitute(alist(...)))
}
```
This uses the `alist()` function which simply captures all its arguments. This function is the same as `pryr::dots()`. Pryr also provides `pryr::named_dots()`, which ensures all arguments are named, using deparsed expressions as default names, just like `data.frame()`.
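A quick check of what this technique returns. `capture_dots()` is an illustrative copy of `dots()` above; none of the variables used in the call needs a value, because nothing is evaluated. Names given by the caller are kept, and unnamed arguments get an empty name.

```{r}
capture_dots <- function(...) {
  eval(substitute(alist(...)))
}
captured <- capture_dots(x + y, z = mean(w))
length(captured)  # 2
names(captured)   # "" and "z"
captured$z        # the unevaluated call mean(w)
```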
### Exercises
1. Use `subs()` to convert the LHS to the RHS for each of the following pairs:
* `a + b + c` -> `a * b * c`
* `f(g(a, b), c)` -> `(a + b) * c`
* `f(a < b, c, d)` -> `if (a < b) c else d`
2. For each of the following pairs of expressions, describe why you can't
use `subs()` to convert between them.
* `a + b + c` -> `a + b * c`
* `f(a, b)` -> `f(a, b, c)`
* `f(a, b, c)` -> `f(a, b)`
3. How does `pryr::named_dots()` work? Read the source.
## The downsides of non-standard evaluation {#nse-downsides}
A big downside of non-standard evaluation is that it is not [referentially transparent](http://en.wikipedia.org/wiki/Referential_transparency_(computer_science)). A function is __referentially transparent__ if you can replace its arguments with their values and its behaviour doesn't change. For example, if a function `f()` is referentially transparent, and both `x` and `y` are 10, then `f(x)` and `f(y)` evaluate to the same result, which will be the same as `f(10)`. Referentially transparent code is easier to reason about because the names of objects don't matter, and you can always work from the innermost parentheses outwards.
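A one-line NSE function is enough to break referential transparency. In this sketch (`label_of()`, `ten_a` and `ten_b` are illustrative names), all three arguments have the value 10, yet each call returns a different result:

```{r}
label_of <- function(v) deparse(substitute(v))
ten_a <- 10
ten_b <- 10
label_of(ten_a)  # "ten_a"
label_of(ten_b)  # "ten_b"
label_of(10)     # "10"
```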
There are many important functions that by their very nature are not referentially transparent. Take the assignment operator. You can't take `a <- 1` and replace `a` by its value and get the same behaviour. This is one reason that people usually write assignments at the top-level of functions. It's hard to reason about code like this:
```{r}
a <- 1
b <- 2
if ((a <- a + 1) > (b <- b - 1)) {
b <- b + 2
}
```
Using NSE automatically prevents a function from being referentially transparent. This makes the mental model needed to correctly predict the output much more complicated, so it's only worthwhile to use NSE if there is significant gain. For example, `library()` and `require()` allow you to call them either with or without quotes, because internally they use `deparse(substitute(x))` plus a couple of other tricks. That means that these two lines do exactly the same thing:
```{r, eval = FALSE}
library(ggplot2)
library("ggplot2")
```
However, things start to get complicated if the variable is associated with a value. What package will this load?
```{r, eval = FALSE}
ggplot2 <- "plyr"
library(ggplot2)
```
There are a number of other R functions that work in this way, like `ls()`, `rm()`, `data()`, `demo()`, `example()` and `vignette()`. To me, eliminating two keystrokes is not worth the loss of referential transparency, and I don't recommend you use NSE for this purpose.
One situation where non-standard evaluation is more useful is `data.frame()`. It uses the input to automatically name the output variables if not explicitly supplied:
```{r}
x <- 10
y <- "a"
df <- data.frame(x, y)
names(df)
```
I think it is worthwhile in `data.frame()` because it eliminates a lot of redundancy in the common scenario when you're creating a data frame from existing variables, and importantly, it's easy to override this behaviour by supplying names for each variable.
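Overriding the automatic names only requires naming the argument. A small check, with illustrative variable names: the unnamed column is labelled from its expression, the named one keeps the supplied name.

```{r}
p_vals <- 10
q_vals <- "a"
names(data.frame(p_vals, label = q_vals))  # "p_vals" "label"
```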
Non-standard evaluation allows you to write functions that are extremely powerful, but the lack of referential transparency makes it harder to model the behaviour of a function, and makes it harder to program with. As well as always providing an escape hatch that gets back to standard evaluation, carefully consider both the benefits and costs of NSE before using it in a new domain.
### Exercises
1. What does the following function do? What's the escape hatch?
Do you think that this is an appropriate use of NSE?
```{r}
nl <- function(...) {
dots <- named_dots(...)
lapply(dots, eval, parent.frame())
}
```
2. Instead of relying on promises, you can use formulas created with `~`
to explicitly capture an expression and its environment. What are the
advantages and disadvantages of making quoting explicit? How does it
impact referential transparency?
3. Read the [standard non-standard evaluation rules](http://developer.r-project.org/nonstandard-eval.pdf).