A means of viewing all differences between two datatrees #9929
Comments
Hey, I wanna work on it. If it is not assigned to anyone else, kindly assign it to me. |
Thanks for raising @danielfromearth. (FYI @mhuzaifa5 we don't generally assign issues to individuals - usually we discuss how to solve things on an issue and then anyone is free to open a pull request.) My main question about this feature is whether or not the use case could instead be relatively easily handled using existing API, or with a tweak to the behaviour of existing API.
Pseudocode using existing (though private) API:
from xarray import DataTree
from xarray.core.formatting import diff_dataset_repr

def show_all_differences(dt1: DataTree, dt2: DataTree) -> str:
    diff = ''
    # Pair up nodes from both trees; zip stops at the shorter tree and
    # assumes both trees yield corresponding nodes in the same order.
    for ds1, ds2 in zip(
        [node.ds for node in dt1.subtree],
        [node.ds for node in dt2.subtree],
    ):
        # compat selects the comparison, e.g. "equals" or "identical"
        diff += diff_dataset_repr(ds1, ds2, "identical")
    return diff
Tweaking existing API:
We could change that. Either we could show all differences by default, we could show the differences in the structure then show the detailed differences afterwards (I think that would be my vote), or we could even use Exception Groups and put differences for each node in a different Exception Group... It would be useful if others could weigh in on how useful they would find this. |
@TomNicholas I think showing the differences in the structure followed by the detailed differences would be a more organized and friendly way of viewing the differences between the two tree structures. |
@TomNicholas Can I work on this with your guidance? |
Following some conversation with @betolink, note that in ncompare, a side-by-side text report is generated that shows a group node (regardless of whether it is in both trees) followed by all differences in that group, then it proceeds to another group node, etc. That works well enough, but for |
Interesting idea! The implementation of xarray's HTML repr is in |
I have some un-formed thoughts. Mostly about how much to reveal in the details and is it possible to tune that to the user's desires. If the datatree is a giant, possibly lazy, zarr store you might not want to show all of the detailed differences. I have wanted to know each of the possible options: are the trees identical? are there missing groups from either (and what is missing)? what variables are different, and even what data in the variables are different. Not positive this is useful information though. |
Good points @flamingbear. That reminds me that we do already have an |
I would add at least one other level of detail before the last (data value checking) one in your list: "what variable characteristics (e.g., scale factors, dimensions, shape, units) are different?" |
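The levels of detail discussed above could be made tunable along these lines. This is only a sketch under stated assumptions: it does not touch xarray at all, but compares two plain-dict summaries of the form `{group_path: {variable_name: {characteristic: value}}}`, and the names `Detail` and `compare_summaries` are hypothetical. The final level (comparing data values) is left out on purpose, since that is the step that could be expensive for a large, lazily loaded store.

```python
from enum import IntEnum


class Detail(IntEnum):
    IDENTICAL = 0        # only answer "are the trees identical?"
    GROUPS = 1           # also list groups missing from either tree
    VARIABLES = 2        # also list variables missing from either group
    CHARACTERISTICS = 3  # also list differing dims, units, and so on


def compare_summaries(left, right, detail=Detail.GROUPS):
    """Compare two {group_path: {var_name: {characteristic: value}}} summaries."""
    report = [f"identical: {left == right}"]
    if detail >= Detail.GROUPS:
        for path in sorted(left.keys() - right.keys()):
            report.append(f"group only in left: {path}")
        for path in sorted(right.keys() - left.keys()):
            report.append(f"group only in right: {path}")
    if detail >= Detail.VARIABLES:
        for path in sorted(left.keys() & right.keys()):
            lvars, rvars = left[path], right[path]
            # Variables present on exactly one side of this group.
            for name in sorted(lvars.keys() ^ rvars.keys()):
                side = "left" if name in lvars else "right"
                report.append(f"{path}: variable only in {side}: {name}")
            if detail >= Detail.CHARACTERISTICS:
                for name in sorted(lvars.keys() & rvars.keys()):
                    keys = lvars[name].keys() | rvars[name].keys()
                    for key in sorted(keys):
                        a, b = lvars[name].get(key), rvars[name].get(key)
                        if a != b:
                            report.append(f"{path}: {name}.{key}: {a!r} != {b!r}")
    return report
```

For example, `compare_summaries(left, right, Detail.IDENTICAL)` returns a single line, while `Detail.CHARACTERISTICS` adds one line per differing group, variable, and characteristic.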
I think
These are not explicitly part of xarray's model, so aren't available to be compared in the same way. Instead the dataset will have used these values to decode upon opening. (The
Unless you're using |
Is your feature request related to a problem?
It can be frustrating to figure out why two Datatrees are not returning True when running xarray.DataTree.identical() or xarray.DataTree.equals().
Currently, if xarray's diff functions detect any difference in the tree structure, they raise at that point, and so do not show all of the differences. Thus, the current functions excel when the user wants to check that two datatrees are equal, but not when the user wants to discover subtle differences, and there are cases in which discovering such subtle differences is exactly what is needed.
For example, when developing or testing new datatree transformations, I would like to be able to quickly check that the datatree has been modified as expected. Or, when expecting two datasets to be the same but they are not, it would be helpful to be able to quickly traverse the entire tree structure and see the differences.
Describe the solution you'd like
I think it would be useful to have a means of visually representing all the differences between two xarray Datatree objects, either showing the whole trees and highlighting all the differences, or showing only the differences.
I'm imagining a solution that shows a comparison report similar to ncompare, which provides aligned and colorized difference reports for quick assessments of groups, variable names, types, shapes, and attributes (see ncompare's readme gif or the example notebook). In contrast to ncompare, the proposed solution would work on the xarray data model.
The solution could be a new function, perhaps in the testing suite, such as xarray.testing.all_differences(dt1: DataTree, dt2: DataTree). This could be based on the diff_datatree_repr function that is used in assert_isomorphic:
xarray/xarray/testing/assertions.py Line 81 in 1486bea
xarray/xarray/core/formatting.py Line 1053 in 1486bea
Describe alternatives you've considered
Showing differences between Datatrees will achieve similar goals to https://github.com/nasa/ncompare. However, a solution in xarray would be different than ncompare, because ncompare looks directly at the netCDF/HDF files, and makes assumptions that that is the data model you care about. xarray instead opens netCDF (or a range of other formats) into an in-memory object which has a data model that is almost but not quite the same as netCDF's data model, then xarray's assertions compare those. For example, netCDF can have dimensions with no corresponding coordinate values, which aren't a part of xarray's data model. In addition, a solution in xarray would be applicable to data coming from additional formats like Zarr.
Additional context
No response