Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented MultiIndex.equal_levels #1789

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

itholic
Copy link
Contributor

@itholic itholic commented Sep 23, 2020

This PR proposes MultiIndex.equal_levels.

>>> kmidx1 = ks.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z")])
>>> kmidx2 = ks.MultiIndex.from_tuples([("b", "y"), ("a", "x"), ("c", "z")])
>>> kmidx1.equal_levels(kmidx2)
True

databricks/koalas/indexes.py Outdated Show resolved Hide resolved
databricks/koalas/indexes.py Outdated Show resolved Hide resolved
databricks/koalas/indexes.py Outdated Show resolved Hide resolved
databricks/koalas/tests/test_indexes.py Show resolved Hide resolved
@codecov-io
Copy link

codecov-io commented Nov 11, 2020

Codecov Report

Merging #1789 (d679dc2) into master (138c7b8) will decrease coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1789      +/-   ##
==========================================
- Coverage   94.64%   94.61%   -0.04%     
==========================================
  Files          49       49              
  Lines       10818    10724      -94     
==========================================
- Hits        10239    10146      -93     
+ Misses        579      578       -1     
Impacted Files Coverage Δ
databricks/koalas/missing/indexes.py 100.00% <ø> (ø)
databricks/koalas/indexes.py 96.98% <100.00%> (+0.02%) ⬆️
databricks/koalas/__init__.py 85.93% <0.00%> (-4.69%) ⬇️
databricks/koalas/testing/utils.py 79.12% <0.00%> (-1.40%) ⬇️
...ricks/koalas/tests/plot/test_series_plot_plotly.py 95.83% <0.00%> (-0.47%) ⬇️
databricks/koalas/generic.py 92.79% <0.00%> (-0.13%) ⬇️
databricks/koalas/plot/matplotlib.py 93.23% <0.00%> (-0.12%) ⬇️
databricks/koalas/mlflow.py 95.12% <0.00%> (-0.12%) ⬇️
databricks/koalas/indexing.py 92.53% <0.00%> (-0.10%) ⬇️
databricks/koalas/plot/core.py 92.68% <0.00%> (-0.09%) ⬇️
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 138c7b8...d679dc2. Read the comment docs.

@itholic
Copy link
Contributor Author

itholic commented Dec 3, 2020

Could someone double check this just once more? Seems fine to me.

@ueshin
Copy link
Collaborator

ueshin commented Dec 3, 2020

I'll take a look later.
@itholic Could you push an empty or a merge commit to rerun tests as it's been a while since the last build.

@ueshin ueshin self-requested a review December 3, 2020 01:42
@itholic
Copy link
Contributor Author

itholic commented Dec 3, 2020

Sure, thanks :)

Copy link
Collaborator

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see some issues:

>>> pmidx1 = pd.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z")])
>>> pmidx2 = pd.MultiIndex.from_tuples([("a", "y"), ("b", "x"), ("c", "z")])
>>> pmidx1.equal_levels(pmidx2)
True

>>> kmidx1 = ks.from_pandas(pmidx1)
>>> kmidx2 = ks.from_pandas(pmidx2)
>>> kmidx1.equal_levels(kmidx2)
False

or

>>> pmidx1 = pd.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z"), ("a", "y")])
>>> pmidx2 = pd.MultiIndex.from_tuples([("a", "y"), ("b", "x"), ("c", "z"), ("c", "x")])
>>> pmidx1.equal_levels(pmidx2)
True

>>> kmidx1 = ks.from_pandas(pmidx1)
>>> kmidx2 = ks.from_pandas(pmidx2)
>>> kmidx1.equal_levels(kmidx2)
False

return False
self_frame = self.sort_values().to_frame()
other_frame = other.sort_values().to_frame()
with option_context("compute.ops_on_diff_frames", True):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might avoid force enabling compute.ops_on_diff_frames. let's see.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks for the review! Let me resolve the comments

@xinrong-meng
Copy link
Contributor

HyukjinKwon pushed a commit to apache/spark that referenced this pull request Oct 1, 2021
### What changes were proposed in this pull request?
This PR proposes implementing `MultiIndex.equal_levels`.

```python
>>> psmidx1 = ps.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z")])
>>> psmidx2 = ps.MultiIndex.from_tuples([("b", "y"), ("a", "x"), ("c", "z")])
>>> psmidx1.equal_levels(psmidx2)
True

>>> psmidx1 = ps.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z"), ("a", "y")])
>>> psmidx2 = ps.MultiIndex.from_tuples([("a", "y"), ("b", "x"), ("c", "z"), ("c", "x")])
>>> psmidx1.equal_levels(psmidx2)
True
```

This was originally proposed in databricks/koalas#1789, and all reviews in origin PR has been resolved.

### Why are the changes needed?

We should support the pandas API as much as possible for pandas-on-Spark module.

### Does this PR introduce _any_ user-facing change?

Yes, the `MultiIndex.equal_levels` API is available.

### How was this patch tested?

Unittests

Closes #34113 from itholic/SPARK-36435.

Lead-authored-by: itholic <haejoon.lee@databricks.com>
Co-authored-by: Haejoon Lee <44108233+itholic@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants