fix!: POSIXct without timezone and naive time handling #878

Merged on Mar 2, 2024 (17 commits)
71 changes: 71 additions & 0 deletions NEWS.md
@@ -32,6 +32,77 @@
(either lexical or physical). This also means that calling `pl$Categorical`
doesn't create a `DataType` anymore. All calls to `pl$Categorical` must be
replaced by `pl$Categorical()` (#860).
- The conversion strategy between the POSIXct type without time zone attribute
and Polars datetime has been changed (#878).
`POSIXct` class vectors without a time zone attribute hold UTC time internally
and are displayed based on the system's time zone. Previous versions of `polars`
only considered the internal value and interpreted it as UTC time, so the
time displayed as `POSIXct` and the time shown in Polars differed.

```r
# polars 0.14.1
Sys.setenv(TZ = "Europe/Paris")
datetime = as.POSIXct("1900-01-01")
datetime
#> [1] "1900-01-01 PMT"

s = polars::as_polars_series(datetime)
s
#> polars Series: shape: (1,)
#> Series: '' [datetime[ms]]
#> [
#> 1899-12-31 23:50:39
#> ]

as.vector(s)
#> [1] "1900-01-01 PMT"
```

Now the internal value is updated to match the displayed value.

```r
# polars 0.15.0
Sys.setenv(TZ = "Europe/Paris")
datetime = as.POSIXct("1900-01-01")
datetime
#> [1] "1900-01-01 PMT"

s = polars::as_polars_series(datetime)
s
#> polars Series: shape: (1,)
#> Series: '' [datetime[ms]]
#> [
#> 1900-01-01 00:00:00
#> ]

as.vector(s)
#> [1] "1900-01-01 PMT"
```

This update may cause errors when converting from Polars to `POSIXct` for non-existent
or ambiguous times. It is recommended to explicitly add a time zone before converting
from Polars to R.

```r
Sys.setenv(TZ = "America/New_York")
ambiguous_time = as.POSIXct("2020-11-01 01:00:00")
ambiguous_time
#> [1] "2020-11-01 01:00:00 EDT"

pls = polars::as_polars_series(ambiguous_time)
pls
#> polars Series: shape: (1,)
#> Series: '' [datetime[ms]]
#> [
#> 2020-11-01 01:00:00
#> ]

## This raises an error!
# pls |> as.vector()

pls$dt$replace_time_zone("UTC") |> as.vector()
#> [1] "2020-11-01 01:00:00 UTC"
```
- Removed argument `eager` in `pl$date_range()` and `pl$struct()` for more
consistency of output. It is possible to replace `eager = TRUE` by calling
`$to_series()` (#882).
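
The gap between the internal value and the displayed value described in the #878 entry above can be reproduced without polars. The following is a minimal base R sketch, not code from the pull request. It uses a 2020 date instead of 1900 so that the offset is a whole hour, and all variable names are illustrative.

```r
# Base R only. Remember the current setting so it can be restored afterwards.
old_tz = Sys.getenv("TZ", unset = NA)
Sys.setenv(TZ = "Europe/Paris")

fmt = "%Y-%m-%d %H:%M:%S"

# No `tz` argument, so the session time zone is used for parsing and printing.
datetime = as.POSIXct("2020-01-01 00:00:00", format = fmt)

# Internal value: seconds since 1970-01-01 00:00:00 UTC.
internal_seconds = as.numeric(datetime)

# The same instant rendered as session-local wall-clock time and as UTC.
local_display = format(datetime, fmt)
utc_display = format(datetime, fmt, tz = "UTC")

print(c(local = local_display, utc = utc_display))

# Restore the previous session time zone.
if (is.na(old_tz)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old_tz)
```

Paris is one hour ahead of UTC in January, so `utc_display` should read one hour earlier than `local_display`. Going by the NEWS entry, polars 0.14.1 stored the UTC reading of such a vector, while 0.15.0 stores the local wall-clock reading.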
42 changes: 42 additions & 0 deletions R/dataframe__frame.R
@@ -60,6 +60,46 @@
#'
#' `$width` returns the number of columns in the DataFrame.
#'
#' @section Conversion to R data types considerations:
#' When converting Polars objects, such as [DataFrames][DataFrame_class]
#' to R objects, for example via the [`as.data.frame()`][as.data.frame.RPolarsDataFrame] generic function,
#' each type in the Polars object is converted to an R type.
#' In some cases, the conversion is not possible and an error is raised.
#' In particular, an error is likely when converting
#' a [Datetime][DataType_Datetime] type without a time zone.
#' A [Datetime][DataType_Datetime] type without a time zone in Polars is converted
#' to the [POSIXct] type in R, which takes into account the time zone in which
#' the R session is running (which can be checked with the [Sys.timezone()]
#' function). In this case, if ambiguous or non-existent times are included, a conversion error
#' will occur. In such cases, change the session time zone using
#' [`Sys.setenv(TZ = "UTC")`][base::Sys.setenv] and then perform the conversion, or use the
#' [`$dt$replace_time_zone()`][ExprDT_replace_time_zone] method on the Datetime type column to
#' explicitly specify the time zone before conversion.
#'
#' ```{r}
#' # Due to daylight saving time, clocks were turned forward 1 hour on Sunday, March 8, 2020, at 2:00:00 am,
#' # so this particular date-time doesn't exist
#' non_existent_time = pl$Series("2020-03-08 02:00:00")$str$strptime(pl$Datetime(), "%F %T")
#'
#' withr::with_envvar(
#' new = c(TZ = "America/New_York"),
#' {
#' tryCatch(
#' # This causes an error because the conversion depends on the `TZ` env var.
#' as.vector(non_existent_time),
#' error = function(e) e
#' )
#' }
#' )
#'
#' withr::with_envvar(
#' new = c(TZ = "America/New_York"),
#' {
#' # This is safe.
#' as.vector(non_existent_time$dt$replace_time_zone("UTC"))
#' }
#' )
#' ```
#' @details Check out the source code in
#' [R/dataframe__frame.R](https://github.com/pola-rs/r-polars/blob/main/R/dataframe__frame.R)
#' to see how public methods are derived from private methods. Check out
@@ -885,6 +925,7 @@ DataFrame_group_by = function(..., maintain_order = polars_options()$maintain_or
#' * `"string"` converts Int64 values to character.
#'
#' @return An R data.frame
#' @inheritSection DataFrame_class Conversion to R data types considerations
#' @keywords DataFrame
#' @examples
#' df = pl$DataFrame(iris[1:3, ])
@@ -917,6 +958,7 @@ DataFrame_to_data_frame = function(..., int64_conversion = polars_options()$int6
#' structure is not very typical or efficient in R.
#'
#' @return R list of vectors
#' @inheritSection DataFrame_class Conversion to R data types considerations
#' @keywords DataFrame
#' @examples
#' pl$DataFrame(iris)$to_list()
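
The roxygen section added above refers to ambiguous and non-existent times in the session time zone. As a rough illustration of what those terms mean, here is a base R sketch with two hypothetical helpers, `is_non_existent()` and `is_ambiguous()`. Neither is part of polars or of this pull request, and the ambiguity check assumes a one-hour daylight saving shift.

```r
# Wall-clock format used for the round trips below.
fmt = "%Y-%m-%d %H:%M:%S"

# Hypothetical helper: TRUE when `wall_time` never appears on a clock in `tz`.
# A non-existent time cannot survive a parse/format round trip, whatever the
# platform does with it (NA, shifted forward, or shifted back).
is_non_existent = function(wall_time, tz) {
  parsed = as.POSIXct(wall_time, tz = tz, format = fmt)
  !isTRUE(format(parsed, fmt, tz = tz) == wall_time)
}

# Hypothetical helper: TRUE when `wall_time` appears twice on a clock in `tz`.
# Assumption: the clock change is exactly one hour.
is_ambiguous = function(wall_time, tz) {
  parsed = as.POSIXct(wall_time, tz = tz, format = fmt)
  if (is.na(parsed)) {
    return(FALSE)
  }
  same_wall = function(instant) {
    isTRUE(format(instant, fmt, tz = tz) == wall_time)
  }
  # Ambiguous if an instant one hour earlier or later shows the same wall time.
  same_wall(parsed) && (same_wall(parsed - 3600) || same_wall(parsed + 3600))
}

print(is_non_existent("2020-03-08 02:00:00", "America/New_York"))
print(is_ambiguous("2020-11-01 01:00:00", "America/New_York"))
print(is_ambiguous("2020-11-01 01:00:00", "UTC"))
```

The first two calls use the same date-times as the examples in this pull request; the third shows that the same wall-clock value is unproblematic in UTC, which is why attaching a time zone before conversion avoids the error.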
53 changes: 43 additions & 10 deletions R/expr__datetime.R
@@ -652,19 +652,52 @@ ExprDT_cast_time_unit = function(tu = c("ns", "us", "ms")) {
#' @aliases (Expr)$dt$convert_time_zone
#' @examples
#' df = pl$DataFrame(
-#' date = pl$date_range(
-#' start = as.Date("2001-3-1"),
-#' end = as.Date("2001-5-1"),
-#' interval = "1mo12m34s"
-#' )
+#' london_timezone = pl$date_range(
+#' as.POSIXct("2020-03-01", tz = "UTC"),
+#' as.POSIXct("2020-07-01", tz = "UTC"),
+#' "1mo",
+#' time_zone = "UTC"
+#' )$dt$convert_time_zone("Europe/London")
#' )
+#'
#' df$select(
-#' pl$col("date"),
-#' pl$col("date")
-#' $dt$replace_time_zone("Europe/Amsterdam")
-#' $dt$convert_time_zone("Europe/London")
-#' $alias("London_with")
+#' "london_timezone",
+#' London_to_Amsterdam = pl$col(
+#' "london_timezone"
+#' )$dt$replace_time_zone("Europe/Amsterdam")
#' )
+#'
+#' # You can use `ambiguous` to deal with ambiguous datetimes:
+#' dates = c(
+#' "2018-10-28 01:30",
+#' "2018-10-28 02:00",
+#' "2018-10-28 02:30",
+#' "2018-10-28 02:00"
+#' )
+#'
+#' df = pl$DataFrame(
+#' ts = pl$Series(dates)$str$strptime(pl$Datetime("us"), "%F %H:%M"),
+#' ambiguous = c("earliest", "earliest", "latest", "latest")
+#' )
+#'
+#' df$with_columns(
+#' ts_localized = pl$col("ts")$dt$replace_time_zone(
+#' "Europe/Brussels",
+#' ambiguous = pl$col("ambiguous")
+#' )
+#' )
+#'
+#' # Polars Datetime type without a time zone will be converted to R
+#' # with respect to the session time zone. If ambiguous times are present
+#' # an error will be raised. It is recommended to add a time zone before
+#' # converting to R.
+#' s_without_tz = pl$Series(dates)$str$strptime(pl$Datetime("us"), "%F %H:%M")
+#' s_without_tz
+#'
+#' s_with_tz = s_without_tz$dt$replace_time_zone("UTC")
+#' s_with_tz
+#'
+#' as.vector(s_with_tz)
ExprDT_convert_time_zone = function(tz) {
check_tz_to_result(tz) |>
map(\(valid_tz) .pr$Expr$dt_convert_time_zone(self, valid_tz)) |>
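
The updated example in this hunk uses both `$dt$convert_time_zone()` and `$dt$replace_time_zone()`. As a loose analogy only, the difference between the two can be sketched in base R: converting keeps the instant and changes the wall clock, while replacing keeps the wall clock and changes the instant. The code below does not call polars, and the variable names are illustrative.

```r
fmt = "%Y-%m-%d %H:%M:%S"

# A reference instant with an explicit UTC time zone.
utc_time = as.POSIXct("2020-07-01 00:00:00", tz = "UTC", format = fmt)

# "Convert": same instant, shown on a London clock (BST in July, UTC+1).
converted = utc_time
attr(converted, "tzone") = "Europe/London"
london_wall = format(converted, fmt, tz = "Europe/London")

# "Replace": keep the London wall-clock reading, but declare it to be
# Amsterdam time (CEST in July, UTC+2). This yields a different instant.
replaced = as.POSIXct(london_wall, tz = "Europe/Amsterdam", format = fmt)

print(london_wall)
print(as.numeric(converted) - as.numeric(utc_time))
print(as.numeric(replaced) - as.numeric(utc_time))
```

The first difference is zero because converting never moves the instant. The second is minus one hour: 01:00 read as Amsterdam summer time is one hour earlier than 01:00 read as London summer time.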
1 change: 1 addition & 0 deletions R/lazyframe__lazy.R
@@ -46,6 +46,7 @@
#'
#' `$width` returns the number of columns in the LazyFrame.
#'
#' @inheritSection DataFrame_class Conversion to R data types considerations
#' @keywords LazyFrame
#' @examples
#' # see all exported methods
2 changes: 2 additions & 0 deletions R/s3_methods.R
@@ -255,6 +255,7 @@ dimnames.RPolarsLazyFrame = function(x) list(NULL, names(x))
#' @param x An object to convert to a [data.frame].
#' @param ... Additional arguments passed to methods.
#' @inheritParams DataFrame_to_data_frame
#' @inheritSection DataFrame_class Conversion to R data types considerations
#' @seealso
#' - [as_polars_df()]
#' - [`<DataFrame>$to_data_frame()`][DataFrame_to_data_frame]
@@ -409,6 +410,7 @@ sum.RPolarsSeries = function(x, ...) x$sum()
#'
#' @param x A Polars Series
#' @param mode Not used.
#' @inheritSection DataFrame_class Conversion to R data types considerations
#' @export
#' @rdname S3_as.vector
as.vector.RPolarsSeries = function(x, mode) x$to_vector()
3 changes: 2 additions & 1 deletion R/series__series.R
@@ -88,6 +88,7 @@
#'
#' `$struct` stores all struct related methods.
#'
#' @inheritSection DataFrame_class Conversion to R data types considerations
#' @keywords Series
#'
#' @examples
@@ -428,7 +429,7 @@ Series_compare = function(other, op) {
#' @details
#' Fun fact: Nested polars Series list must have same inner type, e.g. List(List(Int32))
#' Thus every leaf(non list type) will be placed on the same depth of the tree, and be the same type.
#'
#' @inheritSection DataFrame_class Conversion to R data types considerations
#' @examples
#'
#' series_vec = pl$Series(letters[1:3])
44 changes: 44 additions & 0 deletions man/DataFrame_class.Rd

Some generated files are not rendered by default.

44 changes: 44 additions & 0 deletions man/DataFrame_to_data_frame.Rd

44 changes: 44 additions & 0 deletions man/DataFrame_to_list.Rd