Skip to content

Commit

Permalink
Update doc with changes to numeric type propagation
Browse files Browse the repository at this point in the history
  • Loading branch information
jkammerer committed Jan 16, 2024
1 parent ad6e07d commit 259dcb4
Show file tree
Hide file tree
Showing 3 changed files with 40 additions and 22 deletions.
6 changes: 6 additions & 0 deletions website/docs/releases.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,12 @@ In case you are wondering why all our releases start with `0.0`, read [this FAQ

### Upcoming release

* Overhauled the [SQL type propagation rules for the `NUMERIC` data type](/docs/sql/datatype/numeric). This can lead to differences in the precision and number of decimal digits of `NUMERIC` calculations in existing queries and their results. The chagnes are:
* Multiplication and division of `NUMERIC` with integer and numeric types now follows different rules to determine the result's precision and scale.
* The `AVG` aggregate function now adds 6 instead of 4 decimal digits.
* The statistical aggregate functions (like `STDDEV_SAMP`) now always return `NUMERIC(38,6)` for integer and numeric inputs. If a scale of 6 is to low for your purposes consider casting the input to `double precision`.
* Integer literals are now treated as if they were of type numeric with the minimal required precision when in arithmetic operations with `NUMERIC` values.

### 0.0.18441 [January 10, 2024]

* Various performance and stability improvements
Expand Down
18 changes: 11 additions & 7 deletions website/docs/sql/aggregate.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ Function|Argument Type(s)|Return Type|Description|
---|---|---|---|
`any_value(expression)`|any type|same as argument data type|an arbitrary, implementation-defined value from the set of input values. The result is non-deterministic.
`approx_count_distinct(expression, e)`|any, `double precision`|`bigint`|Computes approximation of `count(distinct expression)`, with expected relative error `e`. Supported values of `e` are in range (0.002, 1]. The `e` argument is optional, if omitted, the value 0.023 is used (2.3% expected relative error to real distinct count).
`avg(expression)`|any numerical type|`numeric` with a scale of 4 for any integer-type argument, `double precision` for a floating-point argument, otherwise the same as the argument data type|the average (arithmetic mean) of all input values
`avg(expression)`|any numerical type|`numeric` with a scale of 6 for any integer-type argument and numeric arguments with a scale less than 6, `double precision` for a floating-point argument, otherwise the same as the argument data type|the average (arithmetic mean) of all input values
`bit_and(expression)`|integral types|same as argument data type|the bitwise AND of all non-null input values, or null if none
`bit_or(expression)`|integral types|same as argument data type|the bitwise OR of all non-null input values, or null if none
`bool_and(expression)`|`bool`|`bool`|true if all input values are true, otherwise false
Expand Down Expand Up @@ -48,12 +48,16 @@ Function|Argument Type|Return Type|Description
`corr(Y, X)`|`double precision`|`double precision`|correlation coefficient
`covar_pop(Y, X)`|`double precision`|`double precision`|population covariance
`covar_samp(Y, X)`|`double precision`|`double precision`|sample covariance
`stddev(expression)`|any numerical type|`NUMERIC(38,4)` for any integer-type argument, `double precision` for a floating-point argument, `NUMERIC(38,s)` for any `NUMERIC(p,s)`|alias for `stddev_samp`
`stddev_pop(expression)`|any numerical type|`NUMERIC(38,4)` for any integer-type argument, `double precision` for a floating-point argument, `NUMERIC(38,s)` for any `NUMERIC(p,s)`|population standard deviation of the input values
`stddev_samp(expression)`|any numerical type|`NUMERIC(38,4)` for any integer-type argument, `double precision` for a floating-point argument, `NUMERIC(38,s)` for any `NUMERIC(p,s)`|sample standard deviation of the input values
`variance`(`<expression>`)|any numerical type|`NUMERIC(38,4)` for any integer-type argument, `double precision` for a floating-point argument, `NUMERIC(38,s)` for any `NUMERIC(p,s)`|alias for `var_samp`
`var_pop`(`<expression>`)|any numerical type|`NUMERIC(38,4)` for any integer-type argument, `double precision` for a floating-point argument, `NUMERIC(38,s)` for any `NUMERIC(p,s)`|population variance of the input values (square of the population standard deviation)
`var_samp`(`<expression>`)|any numerical type|`NUMERIC(38,4)` for any integer-type argument, `double precision` for a floating-point argument, `NUMERIC(38,s)` for any `NUMERIC(p,s)`|sample variance of the input values (square of the sample standard deviation)
`stddev(expression)`|any numerical type|`NUMERIC(38,6)` for any integer-type or numeric argument, `double precision` for a floating-point argument|alias for `stddev_samp`
`stddev_pop(expression)`|any numerical type|`NUMERIC(38,6)` for any integer-type or numeric argument, `double precision` for a floating-point argument|population standard deviation of the input values
`stddev_samp(expression)`|any numerical type|`NUMERIC(38,6)` for any integer-type or numeric argument, `double precision` for a floating-point argument|sample standard deviation of the input values
`variance`(`<expression>`)|any numerical type|`NUMERIC(38,6)` for any integer-type or numeric argument, `double precision` for a floating-point argument|alias for `var_samp`
`var_pop`(`<expression>`)|any numerical type|`NUMERIC(38,6)` for any integer-type or numeric argument, `double precision` for a floating-point argument|population variance of the input values (square of the population standard deviation)
`var_samp`(`<expression>`)|any numerical type|`NUMERIC(38,6)` for any integer-type or numeric argument, `double precision` for a floating-point argument|sample variance of the input values (square of the sample standard deviation)

:::tip
Casting the input of an aggregate function can be used to force a different output type. E.g., `VAR_POP(CAST(A AS DOUBLE PRECISION))` can be used to get a `double precision` result independent of the type of the column `A`.
:::

## Ordered set aggregates {#ordered-set}

Expand Down
38 changes: 23 additions & 15 deletions website/docs/sql/datatype/numeric.md
Original file line number Diff line number Diff line change
Expand Up @@ -45,32 +45,40 @@ can be configured. To declare a column of type `numeric` use the syntax
Alternatively, `NUMERIC(precision)` selects a scale of 0.
Specifying `NUMERIC` selects the maximum precision of 38 and a scale of 0.

The type propagation rules for arithmetic operations with numerics never
decrease the scale and set the precision such that the biggest possible
result will fit into the result type. This may lead to undesired growth
of both scale and precision, especially when chaining multiple
arithmetic operations. Large scale might be undesirable because it takes
away from the digits in front of the decimal point, potentially leading
to overflow errors. Large precision might also be undesirable because
`numeric` values with precision over 18 internally use 128-bit which may
slow down processing. To avoid this, explicit casts to the desired scale
and precision can be added throughout a query.
The type propagation rules for arithmetic operations with numerics often
lead to larger precision and scale in the result type. This may lead to
undesired growth of both scale and precision, especially when chaining
multiple arithmetic operations. Large scale might be undesirable because
it takes away from the digits in front of the decimal point, potentially
leading to overflow errors. Large precision might also be undesirable
because `numeric` values with precision over 18 internally use 128-bit
which may slow down processing. To avoid this, explicit casts to the
desired scale and precision can be added throughout a query. The resulting
precision is always capped at the maximum of 38.

Arithmetic operations between a `NUMERIC(p1,s1)` and a `NUMERIC(p2,s2)`
have the following results:

|Operator|Result Type|
|Operator| Result Type |
|---|---|
| + or - |NUMERIC(precision, scale) with:<br/>scale = max(s1,s2)<br/>precision = min(38, max((p1-s1),(p2-s2)) + 1 + scale)|
|*|NUMERIC(precision, scale) with:<br/> scale = max(max(s1,s2), min(s1+s2, 38 - (p1-s1) - (p2-s2)))<br/> precision = min(p1+p2, 38)|
|/|NUMERIC(precision, scale) with:<br/> scale = max(s1,s2)<br/>precision = min(38, ((p1-s1) + s2 + scale)) |
|%|NUMERIC(precision, scale) with:<br/> scale = max(s1,s2)<br/>precision = min((p1-s1), (p2-s2)) + scale |
| + or - |NUMERIC(precision, scale) with:<br/>scale = max(s1,s2)<br/>precision = max((p1-s1),(p2-s2)) + 1 + scale|
|*|NUMERIC(precision, scale) with:<br/> scale* = s1 + s2<br/> precision = p1+p2+1|
|/|NUMERIC(precision, scale) with:<br/> scale* = max(6, s1 + p2 + 1)<br/>precision = p1 - s1 + s2 + max(6, s1 + p2 + 1)|
|%|NUMERIC(precision, scale) with:<br/> scale = max(s1,s2)<br/>precision = min((p1-s1), (p2-s2)) + scale|

*) An additional rule applies for multiplication and division: If the resulting precision
from the above rules exceeds 38, the scale is reduced by the exceeding amount.
During this step, the scale is never reduced below 6.

When used in arithmetic operation together with `NUMERIC`,
`DOUBLE PRECISION` operands will always give `DOUBLE PRECISION` results,
`SMALLINT` behaves the same as `NUMERIC(5,0)`, `INTEGER` as
`NUMERIC(10,0)` and `BIGINT` as `NUMERIC(19,0)`.

When used in arithmetic operation together with number literals, the literal will be treated
as the smallest fitting `NUMERIC` type. E.g., `100` will be treated as
`NUMERIC(3,0)`, `10.0` as `NUMERIC(3,1)`. This rule only applies to literals, not to expressions containing literals. E.g., `(1+1)` will be treated as `NUMERIC(10,0)` not `NUMERIC(1,0)`.

:::note
In the SQL standard, as well as in PostgreSQL and many other database
systems, the types `decimal` and `numeric` are equivalent and both
Expand Down

0 comments on commit 259dcb4

Please sign in to comment.