-
Postgres: New translations for stringr functions:
str_squish()
,str_remove()
,str_remove_all()
(@shosaco). -
All old lazy eval shims have been removed.
-
median()
now takes a new argumentna.rm
for consistency withmean()
, but still does always remove any missing values (@lorenzwalthert, #483).
-
Internally
DBI::dbExecute()
now usesimmediate = TRUE
; this improves support for session-scoped temporary tables in MS SQL (@krlmlr, #438). -
Subqueries with
ORDER BY
useTOP 9223372036854775807
instead ofTOP 100 PERCENT
on SQL Server for compatibility with Azure Data Warehouse (#337, @alexkyllo). -
escape()
now supportsblob
vectors using newsql_escape_raw()
generic. It enables using blob variables in dplyr verbs, for example to filter nvarchar values by UTF-16 blobs (see r-dbi/DBI#215 (comment)). (@okhoma, #433) -
Added
setOldClass()
calls for"ident"
and"ident_q"
classes for compatibility with dplyr 1.0.0 (#448, @krlmlr). -
Postgres
str_detect()
translation uses same argument names as stringr, and gains anegate
argument (#444). -
semi_join()
andanti_join()
now correctly support thesql_on
argument (#443, @krlmlr).
-
dbplyr now uses RPostgres (instead of RPostgreSQL) and RMariaDB (instead of RMySQL) for its internal tests and data functions (#427).
-
The Date and POSIXt methods for
escape()
now use exportedsql_escape_date()
andsql_escape_datetime()
generics to allow backend specific formatting of date and datetime literals. These are used to provide methods for Athena and Presto backends (@OssiLehtinen, #384, #391). -
first()
,last()
,nth()
,lead()
andlag()
now respect thewindow_frame()
(@krlmlr, #366). -
SQL server: new translations for
str_flatten()
(@PauloJhonny, #405). -
SQL server: temporary datasets are now session-local, not global (#401).
-
Postgres: correct
str_detect()
,str_replace()
andstr_replace_all()
translation (@shosaco, #362).
-
Fix bug when partially evaluating unquoting quosure containing a single symbol (#317)
-
Fixes for rlang and dpylr compatibility.
Minor improvements to SQL generation
-
x %in% y
strips names ofy
(#269). -
Enhancements for scoped verbs (
mutate_all()
,summarise_if()
,filter_at()
etc) (#296, #306). -
MS SQL use
TOP 100 PERCENT
as stop-gap to allow subqueries withORDER BY
(#277). -
Window functions now translated correctly for Hive (#293, @cderv).
-
Error: `con` must not be NULL
: If you see this error, it probably means that you have forgotten to passcon
down to a dbplyr function. Previously, dbplyr defaulted to usingsimulate_dbi()
which introduced subtle escaping bugs. (It's also possible I have forgotten to pass it somewhere that the dbplyr tests don't pick up, so if you can't figure it out, please let me know). -
Subsetting (
[[
,$
, and[
) functions are no longer evaluated locally. This makes the translation more consistent and enables useful new idioms for modern databases (#200).
-
MySQL/MariaDB (https://mariadb.com/kb/en/library/window-functions/) and SQLite (https://www.sqlite.org/windowfunctions.html) translations gain support for window functions, available in Maria DB 10.2, MySQL 8.0, and SQLite 3.25 (#191).
-
Overall, dplyr generates many fewer subqueries:
-
Joins and semi-joins no longer add an unneeded subquery (#236). This is facilitated by the new
bare_identifier_ok
argument tosql_render()
; the previous argument was calledroot
and confused me. -
Many sequences of
select()
,rename()
,mutate()
, andtransmute()
can be collapsed into a single query, instead of always generating a subquery (#213).
-
-
New
vignette("sql")
describes some advantages of dbplyr over SQL (#205) and gives some advice about writing literal SQL inside of dplyr, when you need to (#196). -
New
vignette("reprex")
gives some hints on creating reprexes that work anywhere (#117). This is supported by a newtbl_memdb()
that matches the existingtbl_lazy()
. -
All
..._join()
functions gain ansql_on
argument that allows specifying arbitrary join predicates in SQL code (#146, @krlmlr).
-
New translations for some lubridate functions:
today()
,now()
,year()
,month()
,day()
,hour()
,minute()
,second()
,quarter()
,yday()
(@colearendt, @derekmorr). Also added new translation foras.POSIXct()
. -
New translations for stringr functions:
str_c()
,str_sub()
,str_length()
,str_to_upper()
,str_to_lower()
, andstr_to_title()
(@colearendt). Non-translated stringr functions throw a clear error. -
New translations for bitwise operations:
bitwNot()
,bitwAnd()
,bitwOr()
,bitwXor()
,bitwShiftL()
, andbitwShiftR()
. Unlike the base R functions, the translations do not coerce arguments to integers (@davidchall, #235). -
New translation for
x[y]
toCASE WHEN y THEN x END
. This enablessum(a[b == 0])
to work as you expect from R (#202).y
needs to be a logical expression; if not you will likely get a type error from your database. -
New translations for
x$y
andx[["y"]]
tox.y
, enabling you to index into nested fields in databases that provide them (#158). -
The
.data
and.env
pronouns of tidy evaluation are correctly translated (#132). -
New translation for
median()
andquantile()
. Works for all ANSI compliant databases (SQL Server, Postgres, MariaDB, Teradata) and has custom translations for Hive. Thanks to @edavidaja for researching the SQL variants! (#169) -
na_if()
is correct translated toNULLIF()
(rather thanNULL_IF
) (#211). -
n_distinct()
translation throws an error when given more than one argument. (#101, #133). -
New default translations for
paste()
,paste0()
, and the hyperbolic functions (these previously were only available for ODBC databases). -
Corrected translations of
pmin()
andpmax()
toLEAST()
andGREATEST()
for ANSI compliant databases (#118), toMIN()
andMAX()
for SQLite, and to an error for SQL server. -
New translation for
switch()
to the simple form ofCASE WHEN
(#192).
SQL simulation makes it possible to see what dbplyr will translate SQL to, without having an active database connection, and is used for testing and generating reprexes.
-
SQL simulation has been overhauled. It now works reliably, is better documented, and always uses ANSI escaping (i.e.
`
for field names and'
for strings). -
tbl_lazy()
now actually puts adbplyr::src
in the$src
field. This shouldn't affect any downstream code unless you were previously working around this weird difference betweentbl_lazy
andtbl_sql
classes. It also includes thesrc
class in its class, and when printed, shows the generated SQL (#111).
-
MySQL/MariaDB
-
Translations also applied to connections via the odbc package (@colearendt, #238)
-
Basic support for regular expressions via
str_detect()
and
str_replace_all()
(@colearendt, #168). -
Improved translation for
as.logical(x)
toIF(x, TRUE, FALSE)
.
-
-
Oracle
- New custom translation for
paste()
andpaste0()
(@cderv, #221)
- New custom translation for
-
Postgres
- Basic support for regular expressions via
str_detect()
and
str_replace_all()
(@colearendt, #168).
- Basic support for regular expressions via
-
SQLite
explain()
translation now generatesEXPLAIN QUERY PLAN
which generates a higher-level, more human friendly explanation.
-
SQL server
-
Improved translation for
as.logical(x)
toCAST(x as BIT)
(#250). -
Translates
paste()
,paste0()
, andstr_c()
to+
. -
copy_to()
method applies temporary table name transformation earlier so that you can now overwrite temporary tables (#258). -
db_write_table()
method uses correct argument name for passing along field types (#251).
-
-
Aggregation functions only warn once per session about the use of
na.rm = TRUE
(#216). -
table names generated by
random_table_name()
have the prefix "dbplyr_", which makes it easier to find them programmatically (@mattle24, #111) -
Functions that are only available in a windowed (
mutate()
) query now throw an error when called in a aggregate (summarise()
) query (#129) -
arrange()
understands the.by_group
argument, making it possible sort by groups if desired. The default isFALSE
(#115) -
distinct()
now handles computed variables likedistinct(df, y = x + y)
(#154). -
escape()
,sql_expr()
andbuild_sql()
no longer acceptcon = NULL
as a shortcut forcon = simulate_dbi()
. This made it too easy to forget to passcon
along, introducing extremely subtle escaping bugs.win_over()
gains acon
argument for the same reason. -
New
escape_ansi()
always uses ANSI SQL 92 standard escaping (for use in examples and documentation). -
mutate(df, x = NULL)
dropsx
from the output, just like when working with local data frames (#194). -
partial_eval()
processes inlined functions (including rlang lambda functions). This makes dbplyr work with more forms of scoped verbs likedf %>% summarise_all(~ mean(.))
,df %>% summarise_all(list(mean))
(#134). -
sql_aggregate()
now takes an optional argumentf_r
for passing tocheck_na_rm()
. This allows the warning to show the R function name rather than the SQL function name (@sverchkov, #153). -
sql_infix()
gains apad
argument for the rare operator that doesn't need to be surrounded by spaces. -
sql_prefix()
no longer turns SQL functions into uppercase, allowing for correct translation of case-sensitive SQL functions (#181, @mtoto). -
summarise()
gives a clear error message if you refer to a variable created in that samesummarise()
(#114). -
New
sql_call2()
which is torlang::call2()
assql_expr()
is torlang::expr()
. -
show_query()
andexplain()
usecat()
rather than message. -
union()
,union_all()
,setdiff()
andintersect()
do a better job of matching columns across backends (#183).
- Now supports for dplyr 0.8.0 (#190) and R 3.1.0
-
Calls of the form
dplyr::foo()
are now evaluated in the database, rather than locally (#197). -
vars
argument totbl_sql()
has been formally deprecated; it hasn't actually done anything for a while (#3254). -
src
andtbl
objects now include a class generated from the class of the underlying connection object. This makes it possible for dplyr backends to implement different behaviour at the dplyr level, when needed. (#2293)
-
x %in% y
is now translated toFALSE
ify
is empty (@mgirlich, #160). -
New
as.integer64(x)
translation toCAST(x AS BIGINT)
(#3305) -
case_when
now translates with a ELSE clause if a formula of the formTRUE~<RHS>
is provided . (@cderv, #112) -
cummean()
now generatesAVG()
notMEAN()
(#157) -
str_detect()
now uses correct parameter order (#3397) -
MS SQL
- Cumulative summary functions now work (#157)
ifelse()
usesCASE WHEN
instead ofIIF
; this allows more complex operations, such as%in%
, to work properly (#93)
-
Oracle
- Custom
db_drop_table()
now only drops tables if they exist (#3306) - Custom
setdiff()
translation (#3493) - Custom
db_explain()
translation (#3471)
- Custom
-
SQLite
- Correct translation for
as.numeric()
/as.double()
(@chris-park, #171).
- Correct translation for
-
Redshift
substr()
translation improved (#3339)
-
copy_to()
will only remove existing table whenoverwrite = TRUE
and the table already exists, eliminating a confusing "NOTICE" from PostgreSQL (#3197). -
partial_eval()
handles unevaluated formulas (#184). -
pull.tbl_sql()
now extracts correctly from grouped tables (#3562). -
sql_render.op()
now correctly forwards thecon
argument (@kevinykuo, #73).
- R CMD check fixes
- Forward compatibility fixes for rlang 0.2.0
-
New translations for
- MS Access (#2946) (@DavisVaughan)
- Oracle, via odbc or ROracle (#2928, #2732, @edgararuiz)
- Teradata.
- Redshift.
-
dbplyr now supplies appropriate translations for the RMariaDB and RPostgres packages (#3154). We generally recommend using these packages in favour of the older RMySQL and RPostgreSQL packages as they are fully DBI compliant and tested with DBItest.
-
copy_to()
can now "copy" tbl_sql in the same src, providing another way to cache a query into a temporary table (#3064). You can alsocopy_to
tbl_sqls from another source, andcopy_to()
will automatically collect then copy. -
Initial support for stringr functions:
str_length()
,str_to_upper()
,str_to_lower()
,str_replace_all()
,str_detect()
,str_trim()
. Regular expression support varies from database to database, but most simple regular expressions should be ok.
-
db_compute()
gains ananalyze
argument to matchdb_copy_to()
. -
New
remote_name()
,remote_con()
,remote_src()
,remote_query()
andremote_query_plan()
provide a standard API for get metadata about a remote tbl (#3130, #2923, #2824). -
New
sql_expr()
is a more convenient building block for low-level SQL translation (#3169). -
New
sql_aggregate()
andwin_aggregate()
for generating SQL and windowed SQL functions for aggregates. These take one argument,x
, and warn ifna.rm
is notTRUE
(#3155).win_recycled()
is equivalent towin_aggregate()
and has been soft-deprecated. -
db_write_table
now needs to return the table name
-
Multiple
head()
calls in a row now collapse to a single call. This avoids a printing problem with MS SQL (#3084). -
escape()
now works with integer64 values from the bit64 package (#3230) -
if
,ifelse()
, andif_else()
now correctly scope the false condition so that it only applies to non-NULL conditions (#3157) -
ident()
andident_q()
handle 0-length inputs better, and should be easier to use with S3 (#3212) -
in_schema()
should now work in more places, particularly incopy_to()
(#3013, @baileych) -
SQL generation for joins no longer gets stuck in a endless loop if you request an empty suffix (#3220).
-
mutate()
has better logic for splitting a single mutate into multiple subqueries (#3095). -
Improved
paste()
andpaste0()
support in MySQL, PostgreSQL (#3168), and RSQLite (#3176). MySQL and PostgreSQL gain support forstr_flatten()
which behaves likepaste(x, collapse = "-")
(but for technical reasons can't be implemented as a straightforward translation ofpaste()
). -
same_src.tbl_sql()
now performs correct comparison instead of always returningTRUE
. This means thatcopy = TRUE
once again allows you to perform cross-database joins (#3002). -
select()
queries no longer alias column names unnecessarily (#2968, @DavisVaughan). -
select()
andrename()
are now powered by tidyselect, fixing a few renaming bugs (#3132, #2943, #2860). -
summarise()
once again performs partial evaluation before database submission (#3148). -
test_src()
makes it easier to access a single test source.
-
MS SQL
-
Better support for temporary tables (@Hong-Revo)
-
Different translations for filter/mutate contexts for:
NULL
evaluation (is.na()
,is.null()
), logical operators (!
,&
,&&
,|
,||
), and comparison operators (==
,!=
,<
,>
,>=
,<=
)
-
-
MySQL:
copy_to()
(viadb_write_table()
) correctly translates logical variables to integers (#3151). -
odbc: improved
n()
translation in windowed context. -
SQLite: improved
na_if
translation (@cwarden) -
PostgreSQL: translation for
grepl()
added (@zozlak) -
Oracle: changed VARVHAR to VARCHAR2 datatype (@washcycle, #66)
-
full_join()
over non-overlapping columnsby = character()
translated toCROSS JOIN
(#2924). -
case_when()
now translates to SQL "CASE WHEN" (#2894) -
x %in% c(1)
now generates the same SQL asx %in% 1
(#2898). -
New
window_order()
andwindow_frame()
give you finer control over the window functions that dplyr creates (#2874, #2593). -
Added SQL translations for Oracle (@edgararuiz).
-
x %in% c(1)
now generates the same SQL asx %in% 1
(#2898). -
head(tbl, 0)
is now supported (#2863). -
select()
ing zero columns gives a more information error message (#2863). -
Variables created in a join are now disambiguated against other variables in the same table, not just variables in the other table (#2823).
-
PostgreSQL gains a better translation for
round()
(#60). -
Added custom
db_analyze_table()
for MS SQL, Oracle, Hive and Impala (@edgararuiz) -
Added support for
sd()
for aggregate and window functions (#2887) (@edgararuiz) -
You can now use the magrittr pipe within expressions, e.g.
mutate(mtcars, cyl %>% as.character())
. -
If a translation was supplied for a summarise function, but not for the equivalent windowed variant, the expression would be translated to
NULL
with a warning. Nowsql_variant()
checks that all aggregate functions have matching window functions so that correct translations or clean errors will be generated (#2887)
-
tbl()
andcopy_to()
now work directly with DBI connections (#2423, #2576), so there is no longer a need to generate a dplyr src.library(dplyr) con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") copy_to(con, mtcars) mtcars2 <- tbl(con, "mtcars") mtcars2
-
glimpse()
now works with remote tables (#2665) -
dplyr has gained a basic SQL optimiser, which collapses certain nested SELECT queries into a single query (#1979). This will improve query execution performance for databases with less sophisticated query optimisers, and fixes certain problems with ordering and limits in subqueries (#1979). A big thanks goes to @hhoeflin for figuring out this optimisation.
-
compute()
andcollapse()
now preserve the "ordering" of rows. This only affects the computation of window functions, as the rest of SQL does not care about row order (#2281). -
copy_to()
gains anoverwrite
argument which allows you to overwrite an existing table. Use with care! (#2296) -
New
in_schema()
function makes it easy to refer to tables in schema:in_schema("my_schema_name", "my_table_name")
.
query()
is no longer exported. It hasn't been useful for a while so this shouldn't break any code.
-
Partial evaluation occurs immediately when you execute a verb (like
filter()
ormutate()
) rather than happening when the query is executed (#2370). -
mutate.tbl_sql()
will now generate as many subqueries as necessary so that you can refer to variables that you just created (like in mutate with regular dataframes) (#2481, #2483). -
SQL joins have been improved:
-
SQL joins always use the
ON ...
syntax, avoidingUSING ...
even for natural joins. Improved handling of tables with columns of the same name (#1997, @javierluraschi). They now generate SQL more similar to what you'd write by hand, eliminating a layer or two of subqueries (#2333) -
[API] They now follow the same rules for including duplicated key variables that the data frame methods do, namely that key variables are only kept from
x
, and never fromy
(#2410) -
[API] The
sql_join()
generic now gains avars
argument which lists the variables taken from the left and right sides of the join. If you have a customsql_join()
method, you'll need to update how your code generates joins, following the template insql_join.generic()
. -
full_join()
throws a clear error when you attempt to use it with a MySQL backend (#2045) -
right_join()
andfull_join()
now return results consistent with local data frame sources when there are records in the right table with no match in the left table.right_join()
returns values ofby
columns from the right table.full_join()
returns coalesced values ofby
columns from the left and right tables (#2578, @ianmcook)
-
-
group_by()
can now perform an inline mutate for database backends (#2422). -
The SQL generation set operations (
intersect()
,setdiff()
,union()
, andunion_all()
) have been considerably improved.By default, the component SELECT are surrounded with parentheses, except on SQLite. The SQLite backend will now throw an error if you attempt a set operation on a query that contains a LIMIT, as that is not supported in SQLite (#2270).
All set operations match column names across inputs, filling in non-matching variables with NULL (#2556).
-
rename()
andgroup_by()
now combine correctly (#1962) -
tbl_lazy()
andlazy_tbl()
have been exported. These help you test generated SQL with out an active database connection. -
ungroup()
correctly resets grouping variables (#2704).
-
New
as.sql()
safely coerces an input to SQL. -
More translators for
as.character()
,as.integer()
andas.double()
(#2775). -
New
ident_q()
makes it possible to specifier identifiers that do not need to be quoted. -
Translation of inline scalars:
-
Logical values are now translated differently depending on the backend. The default is to use "true" and "false" which is the SQL-99 standard, but not widely support. SQLite translates to "0" and "1" (#2052).
-
Inf
and-Inf
are correctly escaped -
Better test for whether or not a double is similar to an integer and hence needs a trailing 0.0 added (#2004).
-
Quoting defaults to
DBI::dbEscapeString()
andDBI::dbQuoteIdentifier()
respectively.
-
-
::
and:::
are handled correctly (#2321) -
x %in% 1
is now correctly translated tox IN (1)
(#511). -
ifelse()
andif_else()
use correct argument names in SQL translation (#2225). -
ident()
now returns an object with classc("ident", "character")
. It no longer contains "sql" to indicate that this is not already escaped. -
is.na()
andis.null()
gain extra parens in SQL translation to preserve correct precedence (#2302). -
[API]
log(x, b)
is now correctly translated to the SQLlog(b, x)
(#2288). SQLite does not support the 2-argument log function so it is translated tolog(x) / log(b)
. -
nth(x, i)
is now correctly translated tonth_value(x, i)
. -
n_distinct()
now accepts multiple variables (#2148). -
[API]
substr()
is now translated to SQL, correcting for the difference in the third argument. In R, it's the position of the last character, in SQL it's the length of the string (#2536). -
win_over()
escapes expression using current database rules.
-
copy_to()
now usesdb_write_table()
instead ofdb_create_table()
anddb_insert_into()
.db_write_table.DBIConnection()
usesdbWriteTable()
. -
New
db_copy_to()
,db_compute()
anddb_collect()
allow backends to override the entire database process behindcopy_to()
,compute()
andcollect()
.db_sql_render()
allow additional control over the SQL rendering process. -
All generics whose behaviour can vary from database to database now provide a DBIConnection method. That means that you can easily scan the NAMESPACE to see the extension points.
-
sql_escape_logical()
allows you to control the translation of literal logicals (#2614). -
src_desc()
has been replaced bydb_desc()
and now dispatches on the connection, eliminating the last method that required dispatch on the class of the src. -
win_over()
,win_rank()
,win_recycled()
,win_cumulative()
,win_current_group()
andwin_current_order()
are now exported. This should make it easier to provide customised SQL for window functions (#2051, #2126). -
SQL translation for Microsoft SQL Server (@edgararuiz)
-
SQL translation for Apache Hive (@edgararuiz)
-
SQL translation for Apache Impala (@edgararuiz)
-
collect()
once again defaults to return all rows in the data (#1968). This makes it behave the same asas.data.frame()
andas_tibble()
. -
collect()
only regroups by variables present in the data (#2156) -
collect()
will automatically LIMIT the result to then
, the number of rows requested. This will provide the query planner with more information that it may be able to use to improve execution time (#2083). -
common_by()
gets a better error message for unexpected inputs (#2091) -
copy_to()
no longer checks that the table doesn't exist before creation, instead preferring to fall back on the database for error messages. This should reduce both false positives and false negative (#1470) -
copy_to()
now succeeds for MySQL if a character column containsNA
(#1975, #2256, #2263, #2381, @demorenoc, @eduardgrebe). -
copy_to()
now returns it's output invisibly (since you're often just calling for the side-effect). -
distinct()
reports improved variable information for SQL backends. This means that it is more likely to work in the middle of a pipeline (#2359). -
Ungrouped
do()
on database backends now collects all data locally first (#2392). -
Call
dbFetch()
instead of the deprecatedfetch()
(#2134). UseDBI::dbExecute()
for non-query SQL commands (#1912) -
explain()
andshow_query()
now invisibly return the first argument, making them easier to use inside a pipeline. -
print.tbl_sql()
displays ordering (#2287) and prints table name, if known. -
print(df, n = Inf)
andhead(df, n = Inf)
now work with remote tables (#2580). -
db_desc()
andsql_translate_env()
get defaults for DBIConnection. -
Formatting now works by overriding the
tbl_sum()
generic instead ofprint()
. This means that the output is more consistent with tibble, and thatformat()
is now supported also for SQL sources (tidyverse#14).
-
[API] The signature of
op_base
has changed toop_base(x, vars, class)
-
[API]
translate_sql()
andpartial_eval()
have been refined:-
translate_sql()
no longer takes a vars argument; instead callpartial_eval()
yourself. -
Because it no longer needs the environment
translate_sql()_
now works with a list of dots, rather than alazy_dots
. -
partial_eval()
now takes a character vector of variable names rather than a tbl. -
This leads to a simplification of the
op
data structure: dots is now a list of expressions rather than alazy_dots
.
-
-
[API]
op_vars()
now returns a list of quoted expressions. This enables escaping to happen at the correct time (i.e. when the connection is known).