Add coerce_float option to pl.from_arrow #3761

Closed
munro opened this issue Jun 21, 2022 · 4 comments

munro commented Jun 21, 2022

Add coerce_float option to pl.from_arrow

thread '<unnamed>' panicked at 'Arrow datatype Decimal(38, 9) not supported by Polars', /Users/runner/work/polars/polars/polars/polars-core/src/datatypes.rs:1033:19

Currently I can't read in Arrow data with decimals. If adding support isn't easy, another, perhaps quicker option is a coerce_float flag that converts them to floating point, similar to what pandas does [1]. It's kinda hacky, but I need something now 😭 and coercing is totally acceptable for my use case.

[1] https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html#pandas.read_sql

Thanks y'all! This is an amazing project!

munro added the feature label Jun 21, 2022

ghuls (Collaborator) commented Jul 7, 2022

If you need it now, you can cast the decimal columns to float64 on the pyarrow table before converting it to a Polars DataFrame.

It might get supported natively in the future.
jorgecarleitao/arrow2#896

ghuls (Collaborator) commented Apr 12, 2023

Polars has some support for Decimal now. For the time being it still converts pyarrow decimals to float64 by default, but conversion to decimal can be enabled with POLARS_ACTIVATE_DECIMAL=1:

In [40]:     tbl = pa.table(
    ...:         {
    ...:             "a": pa.array([1, 2, 3, 4, 5], pa.decimal128(38, 2)),
    ...:             "b": pa.array([1, 2, 3, 4, 5], pa.int64()),
    ...:         }
    ...:     )

In [41]: tbl.schema
Out[41]: 
a: decimal128(38, 2)
b: int64

In [42]: pl.from_arrow(tbl)
Out[42]: 
shape: (5, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f64 ┆ i64 │
╞═════╪═════╡
│ 1.0 ┆ 1   │
│ 2.0 ┆ 2   │
│ 3.0 ┆ 3   │
│ 4.0 ┆ 4   │
│ 5.0 ┆ 5   │
└─────┴─────┘

In [8]: import os

In [9]: os.environ["POLARS_ACTIVATE_DECIMAL"] = "1"

In [23]: pl.from_arrow(tbl)
Out[23]: 
shape: (5, 2)
┌────────────────┬─────┐
│ a              ┆ b   │
│ ---            ┆ --- │
│ decimal[.38,2] ┆ i64 │
╞════════════════╪═════╡
│ 1              ┆ 1   │
│ 2              ┆ 2   │
│ 3              ┆ 3   │
│ 4              ┆ 4   │
│ 5              ┆ 5   │
└────────────────┴─────┘

ghuls (Collaborator) commented Apr 13, 2023

Or use the config option instead of the environment variable:

pl.Config.activate_decimals(True)

ghuls (Collaborator) commented Apr 13, 2023

Closing as it seems resolved now.

ghuls closed this as completed Apr 13, 2023