Code Inspection with Non-Standard Evaluation (NSE) #14
Comments
What difference is there between using subset and dplyr filter? If you use filter you can use its SE version filter_. |
Sorry, I should have been more clear -
## sum(data$x)
y <- with(data, sum(x))
## aes(mtcars$wt, mtcars$mpg)
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
What I personally prioritize is the challenge of identifying global (aka unknown) objects in R expressions. Being able to identify global objects is important in areas such as:
Although the above cannot be done perfectly during static code inspection, I do think it can be improved if the code inspector has some extra information to go by. Also, for tasks such as memoization and distributed processing, we're in the borderland between static and run-time code inspection, i.e. when it comes to identifying globals in an expression that is to be evaluated elsewhere, we know the state of R and its objects at that point in time, which allows us to better infer what the global variables are. For instance, if we at run-time know that
I'm not sure if this falls under "Contract Programming" (with terms like preconditions, postconditions, errors, and invariants). |
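To make the problem concrete, here is a minimal sketch using `codetools::findGlobals()` (codetools ships with R as a recommended package). The wrapper function `f` and the data frame `data` are assumptions made only for illustration; `data` is never created, because only static inspection is performed.

```r
library(codetools)

# Hypothetical wrapper around the with() example above. 'data' is assumed
# to be a data frame with a column 'x' that exists only at run time.
f <- function() with(data, sum(x))

# Static inspection has no way of knowing that with() evaluates its second
# argument inside 'data', so 'x' is reported as a global next to 'data'.
globals <- findGlobals(f)
print(globals)
```

With extra information about how `with()` treats its arguments, an inspector could instead report `x` as "possibly a column of `data`".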
Would one notice some problems related to this using |
cc @kevinushey |
@HenrikBengtsson My fork of Duncan Temple Lang's CodeDepends package (here: https://github.com/gmbecker/CodeDepends) has facilities for dealing with piping and non-standard evaluation in a way that, while not fully automatic, is specifiable at the function level and has defaults for most
Note the nsevalVars slot (and apologies for the lack of a pretty printing method for the objects). |
@gmbecker, this looks very interesting. I haven't looked at the code, but are you saying that your version of CodeDepends is doing code inspection of
Here are my toy examples (which do not seem to do what I expect):
> subset2 <- function(x, subset, select, drop = FALSE, ...) { x[subset,] }
> code <- "df <- data.frame(x=1:10, y = 21:30); subset2(df, x < sqrt(y))"
> script <- readScript(txt=code)
> getInputs(script)
[...]
Slot "nsevalVars":
character(0)
Slot "sideEffects":
character(0)
Slot "code":
subset2(df, x < sqrt(y))
Below I would expect to pick up something in
> subset3 <- function(x, subset, ...) { rows <- eval(substitute(subset), x); x[rows,] }
> code <- "df <- data.frame(x=1:10, y = 21:30); subset3(df, x < sqrt(y))"
> script <- readScript(txt=code)
> getInputs(script)
[...]
Slot "nsevalVars":
character(0)
Slot "sideEffects":
character(0)
Slot "code":
subset3(df, x < sqrt(y)) |
@HenrikBengtsson Sorry, I was a bit unclear before, as I wrote that message quickly. (My version of) CodeDepends does not currently delve into function definitions to attempt to detect non-standard evaluation, but it is parameterized so that customizing the default (all functions) or function-specific behavior is (relatively) easy. As it works now, it "knows" that subset, the dplyr verbs, etc. use non-standard evaluation, and I've written handlers for the type of non-standard eval they do, which are the defaults for those functions. So it's easy to tell it that your function has NSE, and have it do the right thing (I need to export and doc
It's also "easy" to hack together a function that detects straightforward instances of NSE given a function object. (Note this is nowhere near production grade; I wrote it in like 3 minutes as an illustrative example.) Here we set a specialized handler for the substitute function that collects up all the things that are passed to it.
(recall that
It doesn't do that on its own in one step now, though it's possible it could be made to by default. P.S. so they know we're talking about this @duncantl @nick-ulle |
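The detection idea described in this comment can be sketched in base R. This is not the CodeDepends handler from the comment (that code is not shown here); `find_substituted()` is a hypothetical helper that walks a function body and records which formal arguments are passed directly to `substitute()`.

```r
# Hypothetical helper, not part of CodeDepends: return the formal arguments
# of 'fn' that appear as the first argument of a substitute() call.
find_substituted <- function(fn) {
  found <- character(0)
  walk <- function(expr) {
    if (is.call(expr)) {
      if (identical(expr[[1]], as.name("substitute")) &&
          length(expr) >= 2 && is.name(expr[[2]])) {
        found <<- c(found, as.character(expr[[2]]))
      }
      # Skip empty arguments such as the one in x[rows, ]
      for (part in as.list(expr)) if (!missing(part)) walk(part)
    }
  }
  walk(body(fn))
  intersect(names(formals(fn)), found)
}

# The two toy functions from earlier in this thread
subset2 <- function(x, subset, select, drop = FALSE, ...) { x[subset, ] }
subset3 <- function(x, subset, ...) {
  rows <- eval(substitute(subset), x)
  x[rows, ]
}

nse_args2 <- find_substituted(subset2)
nse_args3 <- find_substituted(subset3)
print(nse_args3)
```

This only catches the direct `substitute(arg)` pattern; it will miss NSE done via `match.call()`, `sys.call()`, or helpers that call `substitute()` on the caller's behalf.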
FWIW, RStudio does something similar when detecting whether a function argument is used in an NSE-way -- we just see if it's present in a call to e.g. |
As an aside, I think the simplest way to offload this validation work would be to allow functions to have an attribute, e.g.
fn <- function(x) {
  # do some NSE
}
attr(fn, "validate") <- function(call) {
  # perform validation
}
Environments embedding R could supply the current call (as a string, or as a call object) and that function could return a list of diagnostic objects that the environment hosting R could present as appropriate. Unfortunately, this does become more complicated when considering S3 / S4 dispatch since the host environment also needs to figure out what method would actually be dispatched to. :/ |
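As a rough sketch of how a host environment might use such an attribute: the `"validate"` attribute is only the convention proposed above, not an existing R feature, and `subset_rows()`, the `known_columns` parameter, and the shape of the diagnostic objects are all assumptions made for illustration.

```r
# Hypothetical NSE function: evaluates 'cond' inside 'data'
subset_rows <- function(data, cond) {
  rows <- eval(substitute(cond), data, parent.frame())
  data[rows, , drop = FALSE]
}

# Hypothetical validator following the proposed attribute convention.
# It returns one diagnostic per symbol in 'cond' that is not a known column.
attr(subset_rows, "validate") <- function(call, known_columns = character(0)) {
  matched <- match.call(definition = subset_rows, call = call)
  unknown <- setdiff(all.vars(matched$cond), known_columns)
  lapply(unknown, function(v) {
    list(symbol = v,
         message = sprintf("'%s' is not a known column; treated as a global", v))
  })
}

# What a host (e.g. an IDE) might do with the call under the cursor
call <- quote(subset_rows(df, x < threshold))
validator <- attr(subset_rows, "validate")
diagnostics <- validator(call, known_columns = c("x", "y"))
str(diagnostics)
```

The validator never evaluates the call, so it can run while the user is still typing; the host only needs to know (or guess) the column names.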
@kevinushey I wouldn't think it is possible/coherent to use nse for arguments that are dispatched on. They seem mutually exclusive. Are there examples of this being done that you know of? |
For the record, the codetools package (used by
addCollectUsageHandler("library", "base", function(e, w) {
    w$enterGlobal("function", "library", e, w)
    if (length(e) > 2)
        for(a in dropMissings(e[-(1:2)])) walkCode(a, w)
})
I've started my own discussion on this over at futureverse/globals#12 |
Is it possible to provide metadata to R functions that use non-standard evaluation (NSE) in order to help static code inspection to identify global/unknown variables? For instance, consider
In this piece of code the expression {x < 3} is ambiguous. For instance, here we know from experience/documentation/manual code inspection that x could be either (i) a global variable or (ii) an element of the data object, i.e. data$x; in other words, expression {x < 3} is basically evaluated as eval({x < 3}, envir=data) such that if data$x exists then that is used for x, otherwise a global x is searched for.
However, without this "human" knowledge, any static code inspection can really only assume the former, i.e. it will identify x as a global/unknown object.
Some thoughts:
subset of subset() should only be interpreted as a non-evaluated expression?
data argument?
data is up front? And what if we know data contains field x?
subset() itself?
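The ambiguity described in this issue can be reproduced with plain `eval()`. This is a minimal sketch; the data frame contents and the variable names `from_global` and `from_data` are made up for illustration.

```r
# A global 'x' and a data frame that initially has no 'x' column
x <- c(1, 2, 3, 4, 5)
data <- data.frame(y = 1:5)

expr <- quote(x < 3)

# 'data' has no column 'x', so the lookup falls through to the global 'x'
from_global <- eval(expr, envir = data)

# Once 'data$x' exists, the very same expression uses the column instead
data$x <- 5:1
from_data <- eval(expr, envir = data)

print(from_global)
print(from_data)

# Statically, both cases look identical: 'x' is just a free symbol
print(all.vars(expr))
```

The expression is unchanged between the two calls; only the run-time contents of `data` decide which `x` is used, which is exactly the information a static inspector lacks.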