Handle two warnings that pollute the output of sssom-py CLI #561

Merged
merged 9 commits into from
Nov 9, 2024
2 changes: 1 addition & 1 deletion src/sssom/parsers.py
Expand Up @@ -425,7 +425,7 @@ def from_sssom_dataframe(
# This is to address: A value is trying to be set on a copy of a slice from a DataFrame
if CONFIDENCE in df.columns:
df2 = df.copy()
- df2[CONFIDENCE].replace(r"^\s*$", np.nan, regex=True, inplace=True)
+ df2[CONFIDENCE] = df2[CONFIDENCE].replace(r"^\s*$", np.nan, regex=True)
df = df2

mapping_set = _get_mapping_set_from_df(df=df, meta=meta)
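The one-line change above swaps an in-place `replace` on a selected column for an explicit assignment back to that column. A minimal sketch of the same pattern outside sssom, assuming a hypothetical `confidence` column holding blank strings (the sample data is invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one real value and two blank entries.
df = pd.DataFrame({"confidence": ["0.9", "  ", ""]})

# Work on an explicit copy so the caller's frame is left untouched.
df2 = df.copy()

# Assign the result back instead of calling replace(..., inplace=True)
# on the selected column, which is the call that triggered the warning.
df2["confidence"] = df2["confidence"].replace(r"^\s*$", np.nan, regex=True)
```

The assignment form states plainly which object is being modified, so pandas has no intermediate object to warn about.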
8 changes: 8 additions & 0 deletions src/sssom/util.py
Expand Up @@ -165,7 +165,15 @@ def from_mapping_set_document(cls, doc: MappingSetDocument) -> "MappingSetDataFr
# For pandas < 2.0.0, call 'infer_objects()' without any parameters
df = df.infer_objects()
# remove columns where all values are blank.

# Context https://github.com/pandas-dev/pandas/issues/57734
try:
pd.set_option("future.no_silent_downcasting", True)
Contributor

I would rather put that somewhere when the library is initialized, so that we can be sure the option is enabled everywhere at all times and not only when the code path goes through this particular function.

Collaborator Author

I added it to sssom/__init__.py, but I am not sure this is the right thing to do (can this affect other libraries that might not want that option to be set for some reason?)

Contributor

I would personally be fine having it in sssom/__init__.py. This forces the behaviour to apply globally, which I believe is a good thing.

That being said, it is true that this could have cascading effects for people who are using sssom as a library rather than as a command-line tool (people who import sssom in their own Python projects, rather than calling sssom-py from the command line), if they also happen to be using pandas in the rest of their code (no problem if their only use of pandas is through sssom).
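For reference, a minimal sketch of what a guarded, set-once call could look like, wherever it ends up being placed. The helper name `enable_no_silent_downcasting` is hypothetical and not part of sssom; the sketch relies on pandas raising an `OptionError` (a `KeyError` subclass) when the option is unknown, which is also what the `except KeyError` in the diff assumes:

```python
import pandas as pd


def enable_no_silent_downcasting() -> bool:
    """Try to enable the option process-wide; return True if it was set.

    Hypothetical helper, shown only to illustrate the guarded call.
    """
    try:
        pd.set_option("future.no_silent_downcasting", True)
    except KeyError:
        # Option does not exist in this version of pandas.
        return False
    return True


# Called once, e.g. at import time or at the start of a CLI entry point.
# The setting is global: it also applies to any other code in the same
# process that uses pandas.
option_enabled = enable_no_silent_downcasting()
```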

If you’re not comfortable forcing the “no downcasting” behaviour to all potential users of the sssom Python module¹, I can think of two options:

(A) Setting the option somewhere at the beginning of the main function in sssom/cli.py. This means the option will be enabled globally in all of sssom when it is used as a CLI tool, but not when it is used as an imported module in another Python project.

It will then be up to people who import sssom to decide whether they want the “no downcasting” behaviour themselves, by setting the option or leaving it unset.

(B) Enabling the option within sssom, but only where we need it and only for the duration we need it. That is, we enable it before calling replace here, and disable it afterwards.

A context manager would come in handy here:

import pandas as pd

class pandas_no_silent_downcasting():
    """Enable 'future.no_silent_downcasting' for the duration of a with block."""

    def __init__(self):
        try:
            self._already_set = pd.get_option('future.no_silent_downcasting')
            self._supported = True
        except pd.errors.OptionError:
            # Option does not exist in this version of pandas
            self._supported = False

    def __enter__(self):
        if self._supported and not self._already_set:
            # Entering the context, set the option
            pd.set_option('future.no_silent_downcasting', True)

    def __exit__(self, exc_type, exc_value, tb):
        if self._supported and not self._already_set:
            # Leaving the context, unset the option
            pd.set_option('future.no_silent_downcasting', False)

It could then be used whenever we have to perform an operation and need to locally make sure the option (if it is available and if it is not already set) is enabled:

with pandas_no_silent_downcasting():
    df.replace(...)

This is equivalent to:

pd.set_option('future.no_silent_downcasting', True)
df.replace(...)
pd.set_option('future.no_silent_downcasting', False)

except that it is more “pythonic” and it deals gracefully with the possible absence of that option or the possibility that the option has already been enabled elsewhere.
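A more compact variant of the same idea, as a sketch only: it uses `contextlib.contextmanager` and restores whatever value the option had on entry, rather than assuming that value was `False`. The function name `no_silent_downcasting` and the sample frame are hypothetical, and the sketch catches `KeyError` on the assumption that the pandas `OptionError` subclasses it:

```python
import contextlib

import numpy as np
import pandas as pd

OPTION = "future.no_silent_downcasting"


@contextlib.contextmanager
def no_silent_downcasting():
    """Enable the option inside the with block, if pandas supports it."""
    try:
        previous = pd.get_option(OPTION)
    except KeyError:
        # Option does not exist in this version of pandas: do nothing.
        yield
        return
    pd.set_option(OPTION, True)
    try:
        yield
    finally:
        # Restore the earlier value even if the body raised an exception.
        pd.set_option(OPTION, previous)


# Usage on a small, made-up frame.
with no_silent_downcasting():
    cleaned = pd.DataFrame({"a": ["", "x"]}).replace("", np.nan)
```

The `try`/`finally` means the previous setting is restored even when the wrapped operation fails, which the plain three-line equivalent above does not do.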


¹ But it should be noted that this behaviour will be forced upon them sooner or later anyway, since it will be the only behaviour available in Pandas 3.0… Better for those people to start dealing with it now.

except KeyError:
# Option does not exist in this version of pandas
pass
df.replace("", np.nan, inplace=True)
df.infer_objects(copy=False)
df.dropna(axis=1, how="all", inplace=True) # remove columns with all row = 'None'-s.

slots = _get_sssom_schema_object().dict["slots"]