-
Notifications
You must be signed in to change notification settings - Fork 476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added a facet for array uncertainty propagation #1797
base: master
Are you sure you want to change the base?
Added a facet for array uncertainty propagation #1797
Conversation
Fascinating new package! Are you intending to obsolete the What I like about your approach is (1) nice modularity with facets, and (2) it uses ExtensionArray rather than wrappers. I think that since Pint and Pint-Pandas has gone all-in on ExtensionArrays, that's a great way to go. But I wonder how you deal with parsing uncertainties as may inevitably be required when Quantities must be parsed, not only constructed. And is your package ready to do all the uncertainties tasks that would be required by Pint-Pandas? |
Interesting PR. I think that @MichaelTiemannOSC questions are important, though. |
By this I am guessing you mean converting a string
I don't think I got the ExtensionArray functionality working, primarily because I was having trouble making it work well with Pint normally - and then I saw the update with facets, and realized it was a lot more straightforward than I had thought! Again, having
No, for a few reasons. First off, all the features above that you might want that aren't there yet. Second, it feels presumptuous without feedback from other users. Third, my package requires JAX for autodiff, and pip based binary installs of XLA only work on computers that support AVX instruction sets. Now AVX has been out for around a decade so I'm not sure if this is a big deal for the majority of users, but I have had to deal with older HPC systems where this was an issue. If the PR is approved and users feel that they have everything they need, perhaps we can revisit this question then. Currently I view this PR as adding a bare minimum support for array based uncertainty propagation with units that we can build on going forwards. |
I see you've used a single class for both scalar and arrays. I'd suggest having different classes for each will make it easier to write an ExtensionArray as pandas seems to be written with that in mind. It is possible to make it work as pint-pandas has shown, and it is improving, but there are still some pint-pandas features that don't currently work but would if pint split scalars and arrays. It would have saved many hours debugging too. There are several bits of metadata you'll need to store with the ExtensionArray:
This metadata is in addition to dealing with two bits of data (value and error) per scalar. I haven't yet seen an ExtensionArray that does this yet. I'd suggest getting an ExtensionArray without units working as a first step, before dealing with storing and propgating units. I don't believe there's an ExtensionArray for uncertainties yet, so this would be valueable! |
@MichaelTiemannOSC the latest commits should add parsing support for scalar uncertainties. I looked through your PR #1615 to get an idea how to do it, but I thought this way seemed a bit easier - I simply added a few lines to I haven't tried to deal with array uncertainties, but I notice that pint can't parse array quantities in any case, so I didn't think that was vital to deal with right now. By this I mean from pint import unit_registry
import numpy as np
x = np.asarray([2,3]) * unit_registry('s')
string_rep = str(x)
# This is [2 3] s
y = unit_registry.Quantity(string_rep)
# This is 6 s I guess that's because the string gets tokenized to (2 * 3) * s? |
I'm taking a look at your changes now--seeing how hard/easy it is for me to swap out what I did and swap in what you have done. |
Let me know what you think. You probably need to install the version on GitHub rather than the released version on pypi; I've been adding better typing support. We're using it in production internally and I've been pretty pleased with how smooth it's been, but I'd appreciate any feedback! |
Yes, I am using a GitHub fork of your project (in case I need to suggest changes). So far it's looking very nice, but I'm still getting wires connected between how I used |
Awesome that this conversation is moving forward! |
Here's my attempt to take stock of the various problems and solutions. Python handles a range of numeric types (integer, float, complex), but math.isnan only works for int and float. Numpy supports single- and multi-dimensional arrays. np.isnan operates on sclars and arrays. While one can construct numpy arrays holding arbitrary (object) types, numpy functions such as np.isnan raise errors if it cannot safely convert the data type of the scalar or array to a numeric type. (np.isnat similarly works only on datetime64 and timedelta64.) The Numpy functions (such as isnan) are not overriden per se, but are implemented via A unified Numpy/uncertainty interface would be nice! Pint facets combine The
Which means that like it or not, the Quantity facet does and will implement In a sense the AutoUncertainties changes of this PR are what one would expect if we extended the Measurement facet to handle arrays itself, rather than punting those arrays back to the Quantity facet. And as we see, there's quite a bit of code and structure that has to be duplicated/reimplemented from the Quantity facet to give/(update) Measurements (to) the same functionality. This gives us the potential niceness of being able to broadcast scalar Quantities to arrays of Measurements, Measurements to arrays of Quantities, and to do array operations on any combination of Quantities and Measurements (provided the shapes are compatible). Nice! But is it really so nice? If Quantities can, by themselves, do all the work, what's the point of duplicating so much code to create a new thing that can do only half as much? One thing I've noticed is that AutoUncertainties allows uncertain values to be downcasted to nominal values whereas traditional uncertainties require users to explicitly ask for nominal values. I've seen the same sort of thing where some packages implementing complex numbers will just drop the imaginary component to automatically convert the complex number to a float, whereas other packages require the user to explicitly select the real component if that's what they want. The big changes I think I have seen in the interfaces between Pint and Numpy/Pandas is that object arrays (Numpy) and ExtensionArrays (Pandas) are really solid when it comes to the kind of polymorphic behavior we seek to implement. But the more classes we have running around doing roughly the same things (such as Quantities and Measurements), the more we need to flesh out the combinatorial matrix. I can construct a PintArray from a sequence of Quantities, but not Measurements:
Now, AutoUncertainties has a mostly-implemented Pandas EA for itself, but then we have to explain how PintArrays (which are grounded in Quantities) and UncertaintyArrays (grounded in Uncertainties) interoperate. Finally, I made a Pull Request (hgrecco/pint-pandas#198) to start benchmarking PintArrays. The goal of that benchmarking is to both demonstrate what we consider to be the best, fastest implementation we can cleanly get using PintPandas, document that as a pattern, and ensure that we keep it running fast. New implementation ideas can compare themselves to that benchmark and see whether the new ideas really do help make things better, or instead just different (with possible performance penalties). I acknowledge that the np.isnan vs. unp.isnan thing is ugly. But everything else about using uncertainties in Quantities looks better to me than creating a parallel university with Measurement (even if renamed Uncertainty). |
CodSpeed Performance ReportMerging #1797 will not alter performanceComparing Summary
|
@MichaelTiemannOSC Thanks for the detailed explanation. I do agree with your view! |
I am wondering a better approach is for pint to define an interface for uncertainty libraries, as both autouncertainties and uncertainties use almost the same variables and names. This would make it easier for potential users too as they'd be abe to use the same functiosn and methods from pint. |
@andrewgsavage did you mean to say "as both autouncertainties and Measurements use almost the same variables and names."? I cannot quite parse your comment above. I did attend several Pandas community member meetings in hopes of advancing your ExtensionArray work. I encourage you to put the topic on the discussion agenda and try to attend one of those meetings. |
meant to write 'as both autouncertainties and uncertainties ' ah I'd forgotten about that. was planning to make an EA for uncertainties as a seperate module without it being in pandas for the time being. |
Update branch with upstream changes to Pint (0.24.4)
I found pint quite useful in my research, but the one feature I've always wanted was units + uncertainty propagation + support for arrays. I made a small package that uses automatic differentiation to propagate uncertainties for all numpy functions, and I figured adding a facet for it in pint might be of interest! Let me know what you think.