Some background: I've been trying to work with a data file that has values labelled with units. For example, one column contains "5.2 t" and "72 kg". Other values are more awkward: they have no single well-defined value and are given as ranges, such as "72.5–89 cm" and "7–9 m".
I've had a few issues; in the simple case of non-ranged values, any missing data gets assigned a "dimensionless" unit and then fails to convert to the correct type. For example:
fails on the second row with DimensionalityError: Cannot convert from 'dimensionless' (dimensionless) to 'pound' ([mass]).
As a minor issue I've noticed that the resulting unit comes from the first row rather than the requested unit. In the above code, removing the second row results in a dtype of pint[pound] rather than pint[kilogram].
When trying to deal with the ranged values, I've been using a regex to split these values apart and then get them back into lower and upper columns. I can get them out with a simple regex, but then struggle to reproduce the nice unit parsing behavior given by read_csv. For example:
pd.Series(["100kg", "1 t"], dtype="pint[t]")
doesn't fail, but the values remain strings, so subsequent numerical operations fail. A kind Stack Overflow user suggested applying the scalar constructor first:
which works here, but feels sub-optimal. I should note that this also fails for missing values similarly to the read_csv example above.
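For reference, the range-splitting step mentioned above could look roughly like the following. This is only a sketch using the standard library: the pattern `RANGE_RE` and the helper `split_range` are hypothetical names, not part of pint or pandas, and the regex assumes plain decimal numbers separated by an en dash or hyphen, followed by a unit that starts with a letter. Single values are returned with identical bounds, and missing entries come back as `None` rather than as a unitless value.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a number, an optional "–" or "-" plus a second number,
# then a unit string that must start with a letter.
RANGE_RE = re.compile(
    r"^\s*(?P<lower>\d+(?:\.\d+)?)"
    r"(?:\s*[–-]\s*(?P<upper>\d+(?:\.\d+)?))?"
    r"\s*(?P<unit>[^\W\d_].*?)\s*$"
)


def split_range(text: object) -> Optional[Tuple[float, float, str]]:
    """Return (lower, upper, unit), or None for missing or unparseable input."""
    # Missing data (None, NaN, ...) is not a string, so report it as missing
    # instead of producing a value without a unit.
    if not isinstance(text, str):
        return None
    match = RANGE_RE.match(text)
    if match is None:
        return None
    lower = float(match["lower"])
    # A single value such as "5.2 t" has no upper bound; reuse the lower one.
    upper = float(match["upper"]) if match["upper"] is not None else lower
    return lower, upper, match["unit"]


for raw in ["72.5–89 cm", "7–9 m", "5.2 t", None]:
    print(repr(raw), "->", split_range(raw))
```

The numeric bounds and the unit string still have to be handed to the unit library afterwards; this sketch only covers the splitting.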
Not sure if this issue is the right place to report all this feedback; I personally like having everything in one place. If you think these are worth fixing, I could have a go at pull requests for them (I'm counting at least three separate issues here).
partial fix for hgrecco#267
previously the code would fail with:
DimensionalityError: Cannot convert from 'dimensionless' (dimensionless) to 'someunit' ([mass]).