forked from Tazinho/Advanced-R-Solutions
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path07-Functional_programming.Rmd
216 lines (168 loc) · 11.1 KB
/
07-Functional_programming.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
# Functional programming
## Annonymous functions
1. __<span style="color:red">Q</span>__: Given a function, like `"mean"`, `match.fun()` lets you find a function.
Given a function, can you find its name? Why doesn't that make sense in R?
__<span style="color:green">A</span>__: If you know `body()`, `formals()` and `environment()` it can be possible to find the function. However, this won't be possible for primitive functions, since they return `NULL` for those three properties. Also annonymous functions won't be found, because they are not bound to a name. On the other hand it could be that different names in an environment contain binding to one (or more functions) with the same `body()`, `formals()` and `environment()` which means that the solution wouldn't be unique. More general: In R a (function) name has an object, but an object (i.e. a function) doesn't have a name (just a binding sometimes).
2. __<span style="color:red">Q</span>__: Use `lapply()` and an anonymous function to find the coefficient of
variation (the standard deviation divided by the mean) for all columns in
the `mtcars` dataset
__<span style="color:green">A</span>__: `lapply(mtcars, function(x) sd(x)/mean(x))`.
3. __<span style="color:red">Q</span>__: Use `integrate()` and an anonymous function to find the area under the
curve for the following functions.
Use [Wolfram Alpha](http://www.wolframalpha.com/) to check your answers.
1. `y = x ^ 2 - x`, x in [0, 10]
1. `y = sin(x) + cos(x)`, x in [-$\pi$, $\pi$]
1. `y = exp(x) / x`, x in [10, 20]
__<span style="color:green">A</span>__:
```{r, eval = FALSE}
integrate(function(x) x^2 - x, 0, 10)
integrate(function(x) sin(x) + cos(x), -pi, pi)
integrate(function(x) exp(x) / x, 10, 20)
```
4. __<span style="color:red">Q</span>__: A good rule of thumb is that an anonymous function should fit on one line
and shouldn't need to use `{}`. Review your code. Where could you have
used an anonymous function instead of a named function? Where should you
have used a named function instead of an anonymous function?
__<span style="color:green">A</span>__:
## Closures
1. __<span style="color:red">Q</span>__: Why are functions created by other functions called closures?
__<span style="color:green">A</span>__: As stated in the book:
> because they enclose the environment of the parent function and can access all its variables.
2. __<span style="color:red">Q</span>__: What does the following statistical function do? What would be a better
name for it? (The existing name is a bit of a hint.)
```{r}
bc <- function(lambda) {
if (lambda == 0) {
function(x) log(x)
} else {
function(x) (x ^ lambda - 1) / lambda
}
}
```
__<span style="color:green">A</span>__: It is the logarithm, when lambda equals zero and `x ^ lambda - 1 / lambda` otherwise. A better name might be `box_cox_transformation` (one parametric), you can read about it (here)[https://en.wikipedia.org/wiki/Power_transform].
3. __<span style="color:red">Q</span>__: What does `approxfun()` do? What does it return?
__<span style="color:green">A</span>__: `approxfun` basically takes a combination of 2-dimensional data points + some extra specifications as arguments and returns a stepwise linear or constant interpolation function (defined on the range of given x-values, by default).
4. __<span style="color:red">Q</span>__: What does `ecdf()` do? What does it return?
__<span style="color:green">A</span>__: "ecdf" means empirical density function. For a numeric vector, `ecdf()` returns the appropriate density function (of class "ecdf", which is inheriting from class "stepfun"). You can describe it's behaviour in 2 steps. In the first part of it's body, the `(x,y)` pairs for the nodes of the density function are calculated. In the second part these pairs are given to `approxfun`.
5. __<span style="color:red">Q</span>__: Create a function that creates functions that compute the ith
[central moment](http://en.wikipedia.org/wiki/Central_moment) of a numeric
vector. You can test it by running the following code:
```{r, eval = FALSE}
m1 <- moment(1)
m2 <- moment(2)
x <- runif(100)
stopifnot(all.equal(m1(x), 0))
stopifnot(all.equal(m2(x), var(x) * 99 / 100))
```
__<span style="color:green">A</span>__: For a discrete formulation look [here](http://www.r-tutor.com/elementary-statistics/numerical-measures/moment)
```{r, eval = FALSE}
moment <- function(i){
function(x) sum((x - mean(x)) ^ i) / length(x)
}
```
6. __<span style="color:red">Q</span>__: Create a function `pick()` that takes an index, `i`, as an argument and
returns a function with an argument `x` that subsets `x` with `i`.
```{r, eval = FALSE}
lapply(mtcars, pick(5))
# should do the same as this
lapply(mtcars, function(x) x[[5]])
```
__<span style="color:green">A</span>__:
```{r, eval = FALSE}
pick <- function(i){
function(x) x[[i]]
}
stopifnot(identical(lapply(mtcars, pick(5)),
lapply(mtcars, function(x) x[[5]]))
)
```
## Lists of functions
1. __<span style="color:red">Q</span>__: Implement a summary function that works like `base::summary()`, but uses a
list of functions. Modify the function so it returns a closure, making it
possible to use it as a function factory.
__<span style="color:green">A</span>__: We have two possibilities, we can imitate `base::summary()` completely or create a new summary based on our preferences. Both is not so easy, since it involves a lot of design decisions. We choose the second option, since we just like to create a first draft to apply what we have learned and get some feeling for the challenges that might appear.
Some properties, that our new summary function `summary2` should have are nice default actions for specific data types and they should of course be changeable as this is also a part of the exercise. To limit our efforts, we focus on summaries for data frames. Everything else will be explained, via comments on the code:
```{r, error = TRUE}
# The arguments of our function factory are the lists of functions that are
# applied to data frame columns, depending on their type.
# We focus on the most important, so they can be set for characters, integer,
# double, logical, factor and date. By default they are set to NULL, but if you
# supply a list with functions, this will override the real default, for the
# specific type, which is set inside the function factory.
summary2 <- function(character_functions = NULL, integer_functions = NULL,
double_functions = NULL, logical_functions = NULL,
factor_functions = NULL, date_functions = NULL){
# The following functional will later be six times applied on the data frame,
# one time for every column type in the scope of our function
apply_typefunction <- function(df, pred, functions){
lapply(df[vapply(df, pred, logical(1))],
function(x) unlist(lapply(functions, function(y) y(x))))
}
# The following lists of functions are "somehow" similar to those, that are used
# by base::summary, so we define them once...
default_1 <- list(Table = table)
default_2 <- list(Min = min, `1st Qu.` = function(x) quantile(x)[[2]],
Median = median, Mean = mean,
`3rd Qu.` = function(x) quantile(x)[[4]], Max = max)
# All those function list, that are not specified, when calling the
# function factory, are now set to their default values
if(is.null(character_functions)) {character_functions = default_1}
if(is.null(integer_functions)) {integer_functions = default_2}
if(is.null(double_functions)) {double_functions = default_2}
if(is.null(logical_functions)) {logical_functions = default_1}
if(is.null(factor_functions)) {factor_functions = default_1}
if(is.null(date_functions)) {date_functions = default_2}
# Finally the returned function is created
function(df){
# For every column type, the specific functions will be applied to the
# appropriate columns.
characters <- apply_typefunction(df, is.character, character_functions)
integers <- apply_typefunction(df, is.integer , integer_functions )
doubles <- apply_typefunction(df, is.double , double_functions )
logicals <- apply_typefunction(df, is.logical , logical_functions )
factors <- apply_typefunction(df, is.factor , factor_functions )
dates <- apply_typefunction(df, function(x) inherits(x, 'Date'),
date_functions)
# The results will be collected in a list and if empty lists appear, because
# of non occuring columntypes, these empty lists will be removed from the output.
# There are a lot of formatting steps, like ordering, naming and converting
# output, that we could do, but we think that the idea is more important for now
out <- list(characters, integers, doubles, logicals,
factors, dates)
out[lengths(out) != 0]
}
}
# Now we can apply the function factory
summary2_default <- summary2()
# And the resulting function
summary2_default(df = iris)
# Unfortunately, we will fail if there are any NAs in integer columns
df_nas <- data.frame(integers_na = c(NA, 2:19))
summary2_default(df_nas)
# But since, we can define new functions for integer columns, we can solve this
summary2_naversion <- summary2(integer_functions = list(
Mean_na = function(x) mean(x, na.rm = TRUE),
Median_na = function(x) median(x, na.rm = TRUE),
NAs = function(x) sum(is.na(x)))
)
summary2_naversion(df_nas)
```
2. __<span style="color:red">Q</span>__: Which of the following commands is equivalent to `with(x, f(z))`?
(a) `x$f(x$z)`.
(b) `f(x$z)`.
(c) `x$f(z)`.
(d) `f(z)`.
(e) It depends.
__<span style="color:green">A</span>__: b is equivalent. If `x` is the current environment, also d would work.
## Case study: numerical integration
1. __<span style="color:red">Q</span>__: Instead of creating individual functions (e.g., `midpoint()`,
`trapezoid()`, `simpson()`, etc.), we could store them in a list. If we
did that, how would that change the code? Can you create the list of
functions from a list of coefficients for the Newton-Cotes formulae?
__<span style="color:green">A</span>__:
2. __<span style="color:red">Q</span>__: The trade-off between integration rules is that more complex rules are
slower to compute, but need fewer pieces. For `sin()` in the range
[0, $\pi$], determine the number of pieces needed so that each rule will
be equally accurate. Illustrate your results with a graph. How do they
change for different functions? `sin(1 / x^2)` is particularly challenging.
__<span style="color:green">A</span>__: