07-Functional_programming.Rmd

# Functional programming

## Annonymous functions

1. __<span style="color:red">Q</span>__: Given a function, like `"mean"`, `match.fun()` lets you find a function. 
   Given a function, can you find its name? Why doesn't that make sense in R?  
   __<span style="color:green">A</span>__: If you know `body()`, `formals()` and `environment()` it can be possible to find the function. However, this won't be possible for primitive functions, since they return `NULL` for those three properties. Also annonymous functions won't be found, because they are not bound to a name. On the other hand it could be that different names in an environment contain binding to one (or more functions) with the same `body()`, `formals()` and `environment()` which means that the solution wouldn't be unique. More general: In R a (function) name has an object, but an object (i.e. a function) doesn't have a name (just a binding sometimes).

2. __<span style="color:red">Q</span>__: Use `lapply()` and an anonymous function to find the coefficient of 
   variation (the standard deviation divided by the mean) for all columns in 
   the `mtcars` dataset
    
    __<span style="color:green">A</span>__: `lapply(mtcars, function(x) sd(x)/mean(x))`.  

3. __<span style="color:red">Q</span>__: Use `integrate()` and an anonymous function to find the area under the 
   curve for the following functions. 
   Use [Wolfram Alpha](http://www.wolframalpha.com/) to check your answers.

    1. `y = x ^ 2 - x`, x in [0, 10]
    1. `y = sin(x) + cos(x)`, x in [-$\pi$, $\pi$]
    1. `y = exp(x) / x`, x in [10, 20]

    __<span style="color:green">A</span>__: 
    
    ```{r, eval = FALSE}
    integrate(function(x) x^2 - x, 0, 10)
    integrate(function(x) sin(x) + cos(x), -pi, pi)
    integrate(function(x) exp(x) / x, 10, 20)
    ```

4. __<span style="color:red">Q</span>__: A good rule of thumb is that an anonymous function should fit on one line 
   and shouldn't need to use `{}`. Review your code. Where could you have 
   used an anonymous function instead of a named function? Where should you 
   have used a named function instead of an anonymous function?  
    __<span style="color:green">A</span>__: 
   
## Closures

1.  __<span style="color:red">Q</span>__: Why are functions created by other functions called closures?  
__<span style="color:green">A</span>__: As stated in the book:

    > because they enclose the environment of the parent function and can access all its variables.

2.  __<span style="color:red">Q</span>__: What does the following statistical function do? What would be a better 
    name for it? (The existing name is a bit of a hint.)

    ```{r}
    bc <- function(lambda) {
      if (lambda == 0) {
        function(x) log(x)
      } else {
        function(x) (x ^ lambda - 1) / lambda
      }
    }
    ```  
    
    __<span style="color:green">A</span>__: It is the logarithm, when lambda equals zero and `x ^ lambda - 1 / lambda` otherwise. A better name might be `box_cox_transformation` (one parametric), you can read about it (here)[https://en.wikipedia.org/wiki/Power_transform].

3.  __<span style="color:red">Q</span>__: What does `approxfun()` do? What does it return?  
__<span style="color:green">A</span>__: `approxfun` basically takes a combination of 2-dimensional data points + some extra specifications as arguments and returns a stepwise linear or constant interpolation function (defined on the range of given x-values, by default).

4.  __<span style="color:red">Q</span>__: What does `ecdf()` do? What does it return?  
__<span style="color:green">A</span>__: "ecdf" means empirical density function. For a numeric vector, `ecdf()` returns the appropriate density function (of class "ecdf", which is inheriting from class "stepfun"). You can describe it's behaviour in 2 steps. In the first part of it's body, the `(x,y)` pairs for the nodes of the density function are calculated. In the second part these pairs are given to `approxfun`.

5.  __<span style="color:red">Q</span>__: Create a function that creates functions that compute the ith 
    [central moment](http://en.wikipedia.org/wiki/Central_moment) of a numeric 
    vector. You can test it by running the following code:

    ```{r, eval = FALSE}
    m1 <- moment(1)
    m2 <- moment(2)

    x <- runif(100)
    stopifnot(all.equal(m1(x), 0))
    stopifnot(all.equal(m2(x), var(x) * 99 / 100))
    ```  
    
    __<span style="color:green">A</span>__: For a discrete formulation look [here](http://www.r-tutor.com/elementary-statistics/numerical-measures/moment)
    
    ```{r, eval = FALSE}
    moment <- function(i){
      function(x) sum((x - mean(x)) ^ i) / length(x)
      }
    ```

6.  __<span style="color:red">Q</span>__: Create a function `pick()` that takes an index, `i`, as an argument and 
    returns a function with an argument `x` that subsets `x` with `i`.

    ```{r, eval = FALSE}
    lapply(mtcars, pick(5))
    # should do the same as this
    lapply(mtcars, function(x) x[[5]])
    ```  
    
    __<span style="color:green">A</span>__: 
    
    ```{r, eval = FALSE}
    pick <- function(i){
      function(x) x[[i]]
      }
    
    stopifnot(identical(lapply(mtcars, pick(5)),
                        lapply(mtcars, function(x) x[[5]]))
              )
    ```
    
## Lists of functions

1.  __<span style="color:red">Q</span>__: Implement a summary function that works like `base::summary()`, but uses a 
    list of functions. Modify the function so it returns a closure, making it 
    possible to use it as a function factory.
    
    __<span style="color:green">A</span>__: We have two possibilities, we can imitate `base::summary()` completely or create a new summary based on our preferences. Both is not so easy, since it involves a lot of design decisions. We choose the second option, since we just like to create a first draft to apply what we have learned and get some feeling for the challenges that might appear.
    
    Some properties, that our new summary function `summary2` should have are nice default actions for specific data types and they should of course be changeable as this is also a part of the exercise. To limit our efforts, we focus on summaries for data frames. Everything else will be explained, via comments on the code:
    
    ```{r, error = TRUE}
    # The arguments of our function factory are the lists of functions that are
    # applied to data frame columns, depending on their type.
    # We focus on the most important, so they can be set for characters, integer,
    # double, logical, factor and date. By default they are set to NULL, but if you 
    # supply a list with functions, this will override the real default, for the
    # specific type, which is set inside the function factory.
    summary2 <- function(character_functions = NULL, integer_functions = NULL,
                         double_functions = NULL, logical_functions = NULL, 
                         factor_functions = NULL, date_functions = NULL){
      
      # The following functional will later be six times applied on the data frame,
      # one time for every column type in the scope of our function
      apply_typefunction <- function(df, pred, functions){
        lapply(df[vapply(df, pred, logical(1))],
               function(x) unlist(lapply(functions, function(y) y(x))))
      }
      
      # The following lists of functions are "somehow" similar to those, that are used
      # by base::summary, so we define them once...
      default_1 <- list(Table = table)
      default_2 <- list(Min = min, `1st Qu.` = function(x) quantile(x)[[2]],
                        Median = median, Mean = mean,
                        `3rd Qu.` = function(x) quantile(x)[[4]], Max = max)
      
      # All those function list, that are not specified, when calling the 
      # function factory, are now set to their default values
      if(is.null(character_functions)) {character_functions = default_1}
      if(is.null(integer_functions))   {integer_functions   = default_2}
      if(is.null(double_functions))    {double_functions    = default_2}
      if(is.null(logical_functions))   {logical_functions   = default_1}
      if(is.null(factor_functions))    {factor_functions    = default_1}
      if(is.null(date_functions))      {date_functions      = default_2}
      
      # Finally the returned function is created
      function(df){
        
        # For every column type, the specific functions will be applied to the
        # appropriate columns. 
        characters <- apply_typefunction(df, is.character, character_functions)
        integers   <- apply_typefunction(df, is.integer  , integer_functions  )
        doubles    <- apply_typefunction(df, is.double   , double_functions   )
        logicals   <- apply_typefunction(df, is.logical  , logical_functions  )

        factors    <- apply_typefunction(df, is.factor   , factor_functions   )
        dates      <- apply_typefunction(df, function(x) inherits(x, 'Date'), 
                                         date_functions)
        
        # The results will be collected in a list and if empty lists appear, because
        # of non occuring columntypes, these empty lists will be removed from the output.
        # There are a lot of formatting steps, like ordering, naming and converting
        # output, that we could do, but we think that the idea is more important for now
        out <- list(characters, integers, doubles, logicals, 
                factors, dates)
        out[lengths(out) != 0]
      }
    }
    
    # Now we can apply the function factory
    summary2_default <- summary2()
    # And the resulting function
    summary2_default(df = iris)
    
    # Unfortunately, we will fail if there are any NAs in integer columns
    df_nas <- data.frame(integers_na = c(NA, 2:19))
    summary2_default(df_nas)
    
    # But since, we can define new functions for integer columns, we can solve this
    summary2_naversion <- summary2(integer_functions = list(
      Mean_na = function(x) mean(x, na.rm = TRUE),
      Median_na = function(x) median(x, na.rm = TRUE),
      NAs = function(x) sum(is.na(x)))
      )
    summary2_naversion(df_nas)
    ```

2. __<span style="color:red">Q</span>__: Which of the following commands is equivalent to `with(x, f(z))`?

    (a) `x$f(x$z)`.
    (b) `f(x$z)`.
    (c) `x$f(z)`.
    (d) `f(z)`.
    (e) It depends.
    
    __<span style="color:green">A</span>__: b is equivalent. If `x` is the current environment, also d would work.
    
## Case study: numerical integration

1.  __<span style="color:red">Q</span>__: Instead of creating individual functions (e.g., `midpoint()`, 
      `trapezoid()`, `simpson()`, etc.), we could store them in a list. If we 
    did that, how would that change the code? Can you create the list of 
    functions from a list of coefficients for the Newton-Cotes formulae?  
    __<span style="color:green">A</span>__: 

2.  __<span style="color:red">Q</span>__: The trade-off between integration rules is that more complex rules are 
    slower to compute, but need fewer pieces. For `sin()` in the range 
    [0, $\pi$], determine the number of pieces needed so that each rule will 
    be equally accurate. Illustrate your results with a graph. How do they
    change for different functions? `sin(1 / x^2)` is particularly challenging.  
    __<span style="color:green">A</span>__: