Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add noSplit option to executeSql #298

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from
Open

add noSplit option to executeSql #298

wants to merge 2 commits into from

Conversation

MPagel
Copy link

@MPagel MPagel commented Dec 19, 2024

This allows for multi-statement execution "at once" for non-Oracle DBMSes and/or for pre-split content.

Specifically this is made to work around issues with sqlRender::splitSql where valid single (but compound) SQL statements are broken into multiple statements that are syntactically invalid on their own. This was observed when trying run executeSql on a CREATE PROCEDURE statement. Please see this issue

this allows for multi-statement execution at once for non-oracle DBMSes and/or for pre-split content.
vectorize
@schuemie
Copy link
Member

schuemie commented Jan 8, 2025

Adding an argument to this specific function feels a bit hacky. Two alternates:

  1. Why don't you directly call lowLevelExecuteSql()? This skips the splitSql() function.
  2. Couldn't executeSql() just detect if the input is a character or a list, and if a list, then use unlist() instead of splitSql()?

@MPagel
Copy link
Author

MPagel commented Jan 8, 2025

alternative 1 would likely function properly. However, if additional error logging (.createErrorReport) or quirks handling (e.g. the Redshift getAutoCommit workaround) or output of progress bar and timing results are desired, code redundant with executeSql would need to be wrapped around the lowLevelExecuteSql call.

alternative 2 would work for my circumstance, but MAY break the implementation of other packages that use DatabaseConnector. For instance, if such packages currently feed a list what I will term "segmented queries"* to executeSql on Oracle or another DBMS that is incompatible with multiple queries in a single query string, they may be counting on executeSql to parse those into individual queries before execution. Thus, such a change as outlined in alternative 2 would move the onus of writing "Oracle friendly" platform-specific code to the author of code calling executeSql directly to have to preprocess with sqlRender::splitSql prior to the executeSql call. I don't know how widespread the use of DatabaseConnector is for non-OHDSI projects, and if any of these use executeSql on lists of multi-statement containing strings and still wish to have each of those broken into multiple queries.

A third alternative: editing SqlRender::splitSql to more robust with regard to parsing SQL statements in examining for valid procedural-DL code, in the process disregarding single-line comments and multi-line comment blocks when parsing for an end-of-statement ; or END would require splitSql to be more DBMS-aware, and parse for all the subtle differences in procedural language between MSSQL, postgres, Oracle and others

I personally favor the adding of an optional parameter to the executeSql function call with a default value that will not cause anyone's existing implementation to break, rather than counting on current executeSql usage to be 100% consistent with the paradigm of being supplied with either a single string of statements or a list of strings of individual statements. But I do understand the desire to keep code bloat minimized in order to aid processing speed, code legibility and ease of debugging. Thus, I can understand if you do not wish to merge this PR into the main branch. I can always use my work-around present in this PR or use alternative 1 in my own code.


*My current working definition of "segmented queries" is a set of statements that are designed to logically encompass grouped statements, but ones that can be executed serially. For instance, a segmented query could include a grouping of drop and create table, key and index definitions, and initialization with static values contained in a multi-statement string
"DROP TABLE x; CREATE TABLE x (id int NOT NULL, str string DEFAULT NULL); Alter table x add Primary Key (id);" or "DROP TABLE y; CREATE TABLE y as (Select * from t); add constraint pk_y primary key (id desc);"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants