-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DRAFT] Execution Settings R6 Class #196
base: develop
Are you sure you want to change the base?
Conversation
}, | ||
|
||
# disconnect to database | ||
disconnect = function() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps a "withConnection" withr style context can be implemented as a method see here:
}, | ||
|
||
#TODO make this more rigorous | ||
# add warning if no connection available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include a DatabaseConnector::dbIsValid call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense. I left this open as I need to think of all checks on the database connection.
connection = NULL, | ||
cdmDatabaseSchema = NULL, | ||
workDatabaseSchema = NULL, | ||
tempEmulationSchema = NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tempEmulationSchema
is normally an option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we have it as an argument? snowflake requires it for everything selfishly hope to keep it here 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, keep it as an argument but the current convention across other packages is to use
tempEmulationSchema = getOption('sqlRenderTempEmulationSchema')
So I'm proposing that this behvaiour is consistent.
#' An R6 class to define an ExecutionSettings object | ||
#' | ||
#' @export | ||
ExecutionSettings <- R6::R6Class( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anthonysena I think this is a useful abstraction that can be used in almost all our packages and CohortGenerator seems an appropriate place to put it.
Some initial points that I think we need:
-
Changing the generate cohorts api so that it creates this object OR uses it if specified - this way we can maintain compatibility for existing scripts. Alternatively we should provide dual APIs (old and new) by creating a new function
executeCohorts
. -
We will need to write a vignette
-
we should update all functions in this package to use this
-
We should look at the stratuegus execution context list to see if there is additional stuff we should replicate there
-
Perhaps we also move or replicate the cdm source meta data function from strategus here?
-
We should add a helper function to create an instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed @azimov and thanks for this contribution @mdlavallee92!
Changing the generate cohorts api so that it creates this object OR uses it if specified - this way we can maintain compatibility for existing scripts. Alternatively we should provide dual APIs (old and new) by creating a new function executeCohorts.
Let's aim to add this as an additional way of encapsulating the settings as you suggested and then fully adopt it when we have the opportunity to break compatibility with v1.x.
@azimov, @anthonysena and @chrisknoll this is the file I use for the R6Class functions. I added to the PR for execution settings for now but it would be better if this lived in some "tools" package we maintain elsewhere. tagging @alondhe and @katy-sadowski for awareness |
These would make good utility functions. I'm not sure I'd use them in my case, as there's cases where multiple checks are made (see here or here). But if it makes sense to use them, all for it. |
ExecutionSettings <- R6::R6Class( | ||
classname = "ExecutionSettings", | ||
public = list( | ||
initialize = function(connectionDetails = NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FYI, the from of initialize here is different than what we discussed about R6 classes in the monthly meeting:
- Initialize takes list-of-list (ie: fromJSON()) or a string which is JSON format and the object is initialized from that.
- The objective of those R6 classes (in CohortIncidence) was to just enshroud object state into a formalized construct. So, it wasn't meant to have behavior about connecting, and the entire object model should be able to be serialized from/to json and I'm not sure connection details is that.
So based on the above I was imagining the execution settings to contain user-input data about schemas, runtime options, etc, and then to execute a study you have the analysis, execution settings and connection details provided separately (especially because sometimes you pass along an actual open connection (for whatever reason) or a connection details and the code decides which to use).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the context of this class is useful for a different reason that serialization or de-serialization. This would allow us to standardize across packages how we encapsulate inputs. At the moment these are fairly inconsistent.
I would say we explicitly never want to serialize connectionDetails as this adds the potential to leak security details. So the construction should be:
createExecutionSettings <- function(connection = NULL, connectionDetails = NULL, cdmDatabaseSchema, cohortDatabaseSchema = ... etc) {
data <- list(
<insertAllCommonSettings>
)
ExecutionSettings$new(connection = connection, connectionDetails = connectionDetails, data = data)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the context of this class is useful for a different reason that serialization or de-serialization.
That's a fair point, but this PR seems to set the basis for one example of implementing the other R6 classes. I was looking at the R6 classes as a means to define the analytic input API (at least at a data structure level, not a functional level). So there will be some that are pure data oriented (to define structure and handle serialization) and maybe there will be some that have behavior (for example, CI has a query builder R6 class I believe).
Maybe we need to see an example of both in this PR so we see where lines are drawn between behavior classes and structure classes.
I think we should seek to expand this list, there will always be exceptions but there are basic things that require a double check. For example, if a value is a big int this is not readily supported by checkmate. The best solution I could think of was to check if the value is numeric and It could also be possible to use
|
Proposal: add an r6 class for an execution settings that handles the routing of the connectionDetails and the databaseSchemas. Class would also have a method to connect, disconnect and identify the dbms.