Implemented MultiIndex.equal_levels #1789
Codecov Report
@@ Coverage Diff @@
## master #1789 +/- ##
==========================================
- Coverage 94.64% 94.61% -0.04%
==========================================
Files 49 49
Lines 10818 10724 -94
==========================================
- Hits 10239 10146 -93
+ Misses 579 578 -1
Could someone double check this just once more? Seems fine to me.
I'll take a look later.
Sure, thanks :)
I can see some issues where the result differs from pandas:
>>> pmidx1 = pd.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z")])
>>> pmidx2 = pd.MultiIndex.from_tuples([("a", "y"), ("b", "x"), ("c", "z")])
>>> pmidx1.equal_levels(pmidx2)
True
>>> kmidx1 = ks.from_pandas(pmidx1)
>>> kmidx2 = ks.from_pandas(pmidx2)
>>> kmidx1.equal_levels(kmidx2)
False
or
>>> pmidx1 = pd.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z"), ("a", "y")])
>>> pmidx2 = pd.MultiIndex.from_tuples([("a", "y"), ("b", "x"), ("c", "z"), ("c", "x")])
>>> pmidx1.equal_levels(pmidx2)
True
>>> kmidx1 = ks.from_pandas(pmidx1)
>>> kmidx2 = ks.from_pandas(pmidx2)
>>> kmidx1.equal_levels(kmidx2)
False
return False
self_frame = self.sort_values().to_frame()
other_frame = other.sort_values().to_frame()
with option_context("compute.ops_on_diff_frames", True):
We might be able to avoid force-enabling `compute.ops_on_diff_frames`. Let's see.
Okay, thanks for the review! Let me resolve the comments
### What changes were proposed in this pull request?

This PR proposes implementing `MultiIndex.equal_levels`.

```python
>>> psmidx1 = ps.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z")])
>>> psmidx2 = ps.MultiIndex.from_tuples([("b", "y"), ("a", "x"), ("c", "z")])
>>> psmidx1.equal_levels(psmidx2)
True

>>> psmidx1 = ps.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z"), ("a", "y")])
>>> psmidx2 = ps.MultiIndex.from_tuples([("a", "y"), ("b", "x"), ("c", "z"), ("c", "x")])
>>> psmidx1.equal_levels(psmidx2)
True
```

This was originally proposed in databricks/koalas#1789, and all review comments in the original PR have been resolved.

### Why are the changes needed?

We should support the pandas API as much as possible in the pandas-on-Spark module.

### Does this PR introduce _any_ user-facing change?

Yes, the `MultiIndex.equal_levels` API is available.

### How was this patch tested?

Unit tests.

Closes #34113 from itholic/SPARK-36435.

Lead-authored-by: itholic <haejoon.lee@databricks.com>
Co-authored-by: Haejoon Lee <44108233+itholic@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>