From 0f505c2f38a5c04235dd8c103bdeabf4208cd191 Mon Sep 17 00:00:00 2001 From: jkammerer <57443493+jkammerer@users.noreply.github.com> Date: Mon, 5 Feb 2024 12:44:11 +0100 Subject: [PATCH] Update doc with changes to numeric type propagation (#79) --- website/docs/releases.md | 5 ++++ website/docs/sql/aggregate.md | 18 ++++++++------ website/docs/sql/datatype/numeric.md | 36 +++++++++++++++++----------- 3 files changed, 38 insertions(+), 21 deletions(-) diff --git a/website/docs/releases.md b/website/docs/releases.md index bc92419..92fda54 100644 --- a/website/docs/releases.md +++ b/website/docs/releases.md @@ -28,6 +28,11 @@ In case you are wondering why all our releases start with `0.0`, read [this FAQ ### Upcoming release +* Overhauled the [SQL type propagation rules for the `NUMERIC` data type](/docs/sql/datatype/numeric). This can lead to differences in the precision and number of decimal digits of `NUMERIC` calculations in existing queries and their results. The changes are: + * Multiplication and division of `NUMERIC` with integer and numeric types now follows different rules to determine the result's precision and scale. + * The `AVG` aggregate function now adds 6 instead of 4 decimal digits. + * The statistical aggregate functions (like `STDDEV_SAMP`) now always return `NUMERIC(38,6)` for integer and numeric inputs. If a scale of 6 is too low for your purposes consider casting the input to `double precision`. + * Integer literals are now treated as if they were of type numeric with the minimal required precision when in arithmetic operations with `NUMERIC` values. * IANA released version 2023d of the Time Zone Database. Hyper’s time zone information is updated accordingly. Noteworthy changes: * Ittoqqortoormiit, Greenland changes time zones on 2024-03-31. * Vostok, Antarctica changed time zones on 2023-12-18. diff --git a/website/docs/sql/aggregate.md b/website/docs/sql/aggregate.md index 3d45ce6..607ec38 100644 --- a/website/docs/sql/aggregate.md +++ b/website/docs/sql/aggregate.md @@ -17,7 +17,7 @@ Function|Argument Type(s)|Return Type|Description| ---|---|---|---| `any_value(expression)`|any type|same as argument data type|an arbitrary, implementation-defined value from the set of input values. The result is non-deterministic. `approx_count_distinct(expression, e)`|any, `double precision`|`bigint`|Computes approximation of `count(distinct expression)`, with expected relative error `e`. Supported values of `e` are in range (0.002, 1]. The `e` argument is optional, if omitted, the value 0.023 is used (2.3% expected relative error to real distinct count). -`avg(expression)`|any numerical type|`numeric` with a scale of 4 for any integer-type argument, `double precision` for a floating-point argument, otherwise the same as the argument data type|the average (arithmetic mean) of all input values +`avg(expression)`|any numerical type|`numeric` with a scale of 6 for any integer-type argument and numeric arguments with a scale less than 6, `double precision` for a floating-point argument, otherwise the same as the argument data type|the average (arithmetic mean) of all input values `bit_and(expression)`|integral types|same as argument data type|the bitwise AND of all non-null input values, or null if none `bit_or(expression)`|integral types|same as argument data type|the bitwise OR of all non-null input values, or null if none `bool_and(expression)`|`bool`|`bool`|true if all input values are true, otherwise false @@ -48,12 +48,16 @@ Function|Argument Type|Return Type|Description `corr(Y, X)`|`double precision`|`double precision`|correlation coefficient `covar_pop(Y, X)`|`double precision`|`double precision`|population covariance `covar_samp(Y, X)`|`double precision`|`double precision`|sample covariance -`stddev(expression)`|any numerical type|`NUMERIC(38,4)` for any integer-type argument, `double precision` for a floating-point argument, `NUMERIC(38,s)` for any `NUMERIC(p,s)`|alias for `stddev_samp` -`stddev_pop(expression)`|any numerical type|`NUMERIC(38,4)` for any integer-type argument, `double precision` for a floating-point argument, `NUMERIC(38,s)` for any `NUMERIC(p,s)`|population standard deviation of the input values -`stddev_samp(expression)`|any numerical type|`NUMERIC(38,4)` for any integer-type argument, `double precision` for a floating-point argument, `NUMERIC(38,s)` for any `NUMERIC(p,s)`|sample standard deviation of the input values -`variance`(``)|any numerical type|`NUMERIC(38,4)` for any integer-type argument, `double precision` for a floating-point argument, `NUMERIC(38,s)` for any `NUMERIC(p,s)`|alias for `var_samp` -`var_pop`(``)|any numerical type|`NUMERIC(38,4)` for any integer-type argument, `double precision` for a floating-point argument, `NUMERIC(38,s)` for any `NUMERIC(p,s)`|population variance of the input values (square of the population standard deviation) -`var_samp`(``)|any numerical type|`NUMERIC(38,4)` for any integer-type argument, `double precision` for a floating-point argument, `NUMERIC(38,s)` for any `NUMERIC(p,s)`|sample variance of the input values (square of the sample standard deviation) +`stddev(expression)`|any numerical type|`NUMERIC(38,6)` for any integer-type or numeric argument, `double precision` for a floating-point argument|alias for `stddev_samp` +`stddev_pop(expression)`|any numerical type|`NUMERIC(38,6)` for any integer-type or numeric argument, `double precision` for a floating-point argument|population standard deviation of the input values +`stddev_samp(expression)`|any numerical type|`NUMERIC(38,6)` for any integer-type or numeric argument, `double precision` for a floating-point argument|sample standard deviation of the input values +`variance`(``)|any numerical type|`NUMERIC(38,6)` for any integer-type or numeric argument, `double precision` for a floating-point argument|alias for `var_samp` +`var_pop`(``)|any numerical type|`NUMERIC(38,6)` for any integer-type or numeric argument, `double precision` for a floating-point argument|population variance of the input values (square of the population standard deviation) +`var_samp`(``)|any numerical type|`NUMERIC(38,6)` for any integer-type or numeric argument, `double precision` for a floating-point argument|sample variance of the input values (square of the sample standard deviation) + +:::tip +Casting the input of an aggregate function can be used to force a different output type. E.g., `VAR_POP(CAST(A AS DOUBLE PRECISION))` can be used to get a `double precision` result independent of the type of the column `A`. +::: ## Ordered set aggregates {#ordered-set} diff --git a/website/docs/sql/datatype/numeric.md b/website/docs/sql/datatype/numeric.md index eb77157..71016c8 100644 --- a/website/docs/sql/datatype/numeric.md +++ b/website/docs/sql/datatype/numeric.md @@ -45,32 +45,40 @@ can be configured. To declare a column of type `numeric` use the syntax Alternatively, `NUMERIC(precision)` selects a scale of 0. Specifying `NUMERIC` selects the maximum precision of 38 and a scale of 0. -The type propagation rules for arithmetic operations with numerics never -decrease the scale and set the precision such that the biggest possible -result will fit into the result type. This may lead to undesired growth -of both scale and precision, especially when chaining multiple -arithmetic operations. Large scale might be undesirable because it takes -away from the digits in front of the decimal point, potentially leading -to overflow errors. Large precision might also be undesirable because -`numeric` values with precision over 18 internally use 128-bit which may -slow down processing. To avoid this, explicit casts to the desired scale -and precision can be added throughout a query. +The type propagation rules for arithmetic operations with numerics often +lead to larger precision and scale in the result type. This may lead to +undesired growth of both scale and precision, especially when chaining +multiple arithmetic operations. Large scale might be undesirable because +it takes away from the digits in front of the decimal point, potentially +leading to overflow errors. Large precision might also be undesirable +because `numeric` values with precision over 18 internally use 128-bit +which may slow down processing. To avoid this, explicit casts to the +desired scale and precision can be added throughout a query. The resulting +precision is always capped at the maximum of 38. Arithmetic operations between a `NUMERIC(p1,s1)` and a `NUMERIC(p2,s2)` have the following results: |Operator|Result Type| |---|---| -| + or - |NUMERIC(precision, scale) with:
scale = max(s1,s2)
precision = min(38, max((p1-s1),(p2-s2)) + 1 + scale)| -|*|NUMERIC(precision, scale) with:
scale = max(max(s1,s2), min(s1+s2, 38 - (p1-s1) - (p2-s2)))
precision = min(p1+p2, 38)| -|/|NUMERIC(precision, scale) with:
scale = max(s1,s2)
precision = min(38, ((p1-s1) + s2 + scale)) | -|%|NUMERIC(precision, scale) with:
scale = max(s1,s2)
precision = min((p1-s1), (p2-s2)) + scale | +| + or - |NUMERIC(precision, scale) with:
scale = max(s1,s2)
precision = max((p1-s1),(p2-s2)) + 1 + scale| +|*|NUMERIC(precision, scale) with:
scale* = s1 + s2
precision = p1+p2| +|/|NUMERIC(precision, scale) with:
scale* = max(6, s1 + p2 + 1)
precision = p1 - s1 + s2 + max(6, s1 + p2 + 1)| +|%|NUMERIC(precision, scale) with:
scale = max(s1,s2)
precision = min((p1-s1), (p2-s2)) + scale| + +*) An additional rule applies for multiplication and division: If the resulting precision +from the above rules exceeds 38, the scale is reduced by the exceeding amount. +During this step, the scale is never reduced below 6. When used in arithmetic operation together with `NUMERIC`, `DOUBLE PRECISION` operands will always give `DOUBLE PRECISION` results, `SMALLINT` behaves the same as `NUMERIC(5,0)`, `INTEGER` as `NUMERIC(10,0)` and `BIGINT` as `NUMERIC(19,0)`. +When used in arithmetic operation together with number literals, the literal will be treated +as the smallest fitting `NUMERIC` type. E.g., `100` will be treated as +`NUMERIC(3,0)`, `10.0` as `NUMERIC(3,1)`. This rule only applies to literals, not to expressions containing literals. E.g., `(1+1)` will be treated as `NUMERIC(10,0)` not `NUMERIC(1,0)`. + :::note In the SQL standard, as well as in PostgreSQL and many other database systems, the types `decimal` and `numeric` are equivalent and both