Fix NaN handling in Record.adc, and other fixes #481
When converting physical to digital sample arrays, we must replace NaN values (which represent a missing sample) with the appropriate invalid-sample sentinel value. This is done correctly for normal uses of the package, but if the application directly invoked adc(inplace=True), NaNs were not handled and were instead set to an implementation-defined value. (Note that we don't use inplace=True internally because this overwrites the original floating-point array. Applications may want to use inplace=True to save memory, but this requires knowing that the original array is no longer needed.)
When converting physical to digital sample arrays, all the information we need is contained in self.e_p_signal, self.adc_gain, self.baseline, and self.fmt. We don't need to rely on self.n_sig here, and we don't use n_sig in the expanded=False case, so for consistency, don't use n_sig in the expanded=True case either.
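As a rough illustration of why n_sig is unnecessary in the expanded case, the sketch below iterates over the per-channel inputs directly. The function name `convert_expanded`, the explicit `d_nans` argument, and the use of rounding and int64 are assumptions made for this example, not the package's actual code.

```python
import numpy as np

def convert_expanded(e_p_signal, adc_gain, baseline, d_nans):
    """Hypothetical sketch: convert a list of per-channel float arrays."""
    e_d_signal = []
    # zip() takes the channel count from the inputs themselves, so no
    # separate n_sig attribute is needed.
    for ch, gain, base, d_nan in zip(e_p_signal, adc_gain, baseline, d_nans):
        # The multiplication allocates a new array; the input is not modified.
        scaled = np.round(np.asarray(ch, dtype=np.float64) * gain + base)
        # Replace missing samples before the integer cast.
        np.copyto(scaled, d_nan, where=np.isnan(scaled))
        e_d_signal.append(scaled.astype(np.int64))
    return e_d_signal

# Channels may have different lengths in the expanded layout.
channels = [np.array([0.0, 1.0]), np.array([np.nan, 0.5, 0.25, 1.0])]
digital = convert_expanded(channels, [100.0, 200.0], [0, 10], [-32768, -32768])
```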
When converting physical to digital sample arrays, we must replace NaN values (which represent a missing sample) with the appropriate invalid-sample sentinel value. Attempting to convert a floating-point NaN to an integer, as was done here, is implementation-defined behavior (and is controlled, to an extent, by the global numpy configuration.) We don't want to be dependent on the hardware or the global numpy configuration, and for efficiency it's best to avoid triggering floating-point errors to begin with. So instead of converting the floating-point array to integers, and fixing up the integer array after the fact, we want to replace the floating-point values *first*, and then convert to integers.
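The ordering described above can be sketched as follows. This is a minimal illustration, not the code from this pull request: the helper name `physical_to_digital`, the rounding step, the int64 output type, and the sentinel value -32768 are all assumptions chosen for the example.

```python
import numpy as np

def physical_to_digital(p_signal, adc_gain, baseline, d_nan):
    """Hypothetical helper: convert one channel of physical samples to integers."""
    # Scale into digital units; NaN inputs simply stay NaN here.
    scaled = np.round(np.asarray(p_signal, dtype=np.float64) * adc_gain + baseline)
    # Replace NaN while the data is still floating point...
    np.copyto(scaled, d_nan, where=np.isnan(scaled))
    # ...so the integer cast never sees an invalid value.
    return scaled.astype(np.int64)

# Even with every floating-point error promoted to an exception,
# the conversion completes because no NaN reaches the cast.
with np.errstate(all="raise"):
    d_signal = physical_to_digital(
        np.array([0.0, np.nan, 1.5]), adc_gain=200.0, baseline=0, d_nan=-32768
    )
```

Because the sentinel is written into the float array before the cast, the result does not depend on the hardware or on the numpy error-handling mode.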
- Test that Record.adc works when n_sig is not set. (Previously, this didn't work with expanded=True.)
- Test that Record.adc handles NaN by mapping it to the correct invalid-sample value. (Previously, this didn't work with expanded=False and inplace=True.) Use multiple formats to test that this takes the format into account.

Furthermore, the previous code relied on implementation-defined behavior to handle NaN, which normally results in a RuntimeWarning. Within the test suite, we set the numpy error handling mode to "raise", so such implementation-defined conversions actually result in a FloatingPointError.
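To make the effect of the "raise" mode concrete, here is a small sketch using `np.errstate`. The helper `naive_cast_is_flagged` is hypothetical and is not part of the test suite. Whether a NaN cast is actually flagged depends on the numpy version, so the example makes no claim about that case.

```python
import numpy as np

def naive_cast_is_flagged(values):
    """Return True if casting `values` to int16 raises FloatingPointError."""
    try:
        # Promote "invalid value" floating-point errors to exceptions,
        # only for the duration of this block.
        with np.errstate(invalid="raise"):
            values.astype(np.int16)
    except FloatingPointError:
        return True
    return False

samples = np.array([0.5, np.nan, -0.25])

# Replacing NaN first means the cast only ever sees finite values.
cleaned = np.where(np.isnan(samples), -32768.0, samples)

print("direct cast flagged: ", naive_cast_is_flagged(samples))
print("cleaned cast flagged:", naive_cast_is_flagged(cleaned))
```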
When converting physical to digital sample arrays, we must replace NaN values with the appropriate invalid-sample sentinel value. To do this, we need to call np.isnan and use the result as a mask to replace entries in the output array. (Although the function np.nan_to_num also exists, it's less efficient: it literally does just this, but also handles infinities.) What we don't need to do is to call any() to check whether there are any true entries - that just means we're iterating through the same array three times rather than once. Furthermore, np.copyto can broadcast d_nans across the rows of p_signal, so all the channels can be handled at once. Also use copyto in adc_inplace_1d for consistency.
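A minimal sketch of the broadcasting behavior described above, assuming a two-channel signal stored as an (n_samples, n_channels) array. The function name `fill_invalid` and the sentinel values are illustrative only.

```python
import numpy as np

def fill_invalid(p_signal, d_nans):
    """Overwrite NaN entries of p_signal (n_samples x n_channels) in place."""
    # d_nans has one entry per channel; copyto broadcasts it across the
    # rows, and `where` restricts the write to the NaN positions.
    np.copyto(p_signal, d_nans, where=np.isnan(p_signal))
    return p_signal

p_signal = np.array([[0.1, np.nan],
                     [np.nan, 0.4],
                     [0.3, 0.5]])
d_nans = np.array([-32768, -2048])  # assumed per-channel sentinels

fill_invalid(p_signal, d_nans)
```

Each NaN is replaced by the sentinel for its own column, with a single pass to build the mask and no separate any() check.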
@bemoody this looks good to me. Were the tests intentionally stopped?
Hmm, it looks like maybe it is failing on format checks (https://github.com/MIT-LCP/wfdb-python/actions/runs/8744000168/job/24038235919) and then the remaining tests are being cancelled.
Yeah, I think it should work, it's failing because the latest version of black requires formatting changes (not related to this pull AFAIK.)
I assume these issues were fixed in #482? If so, please could you rebase this PR on main?
This pull request adds a changelog for `v4.2.0`. The changelog is based on the following auto-generated summary of merge commits generated by GitHub:

```
## What's Changed
* bug-fix: Numpy ValueError when cheking empty list equality by @ajadczaksunriselabs in #459
* bug-fix: Pandas set indexing error by @ajadczaksunriselabs in #460
* fix for /issues/452 by @tecamenz in #465
* Use numpydoc to render documentation by @SnoopJ in #472
* build(deps): bump readthedocs-sphinx-search from 0.1.1 to 0.3.2 in /docs by @dependabot in #477
* Update style by @bemoody in #482
* Fix NaN handling in Record.adc, and other fixes by @bemoody in #481
* Set upper bound on Numpy version (numpy = ">=1.10.1,<2.0.0"). Ref #493. by @tompollard in #494
* Update actions to use actions/checkout@v3 and actions/setup-python@v4. by @tompollard in #495
* Fix: Indent code to ensure 'j' is within for-loop in GQRS algorithm by @tompollard in #499
* Add write_dir argument to csv_to_wfdb. Fixes #67. by @tompollard in #492
* Fix warnings by @cbrnr in #502
* README improvements by @bemoody in #503
* Change in type promotion. Fixes to annotation.py by @tompollard in #506
* Use uv by @cbrnr in #504
* Change in type promotion. Fixes to _signal.py by @tompollard in #507
* Test round-trip write/read of supported binary formats by @bemoody in #509
* Corrected typo and extended allowed types for MultiSegmentRecord by @agent3gatech in #514
* Allow expanded physical signal in `calc_adc_params` by @briangow in #512
* Add capability to write signal with unique `samps_per_frame` to `wfdb.io.wrsamp` by @briangow in #510
* Fix selection of channels when converting to EDF by @SamJelfs in #519
* Change in type promotion introduced in Numpy 2.0. Fixes to edf.py. by @tompollard in #527
* Bump dependencies for NumPy 2 compatibility by @cbrnr in #511
* Bump version to v4.2.0 and update notes on creating new releases by @tompollard in #497

## New Contributors
* @ajadczaksunriselabs made their first contribution in #459
* @tecamenz made their first contribution in #465
* @SnoopJ made their first contribution in #472
* @dependabot made their first contribution in #477
* @agent3gatech made their first contribution in #514
* @SamJelfs made their first contribution in #519

**Full Changelog**: v4.1.2...v4.2.0
```
Fix several bugs in `Record.adc`:

- Previously, the function would try to convert all samples to integers and then, for any samples that were NaN, replace the corresponding elements with the appropriate sentinel value. Even though this was probably safe in most cases, casting NaN to an integer is implementation-defined behavior, and raises a warning by default (issue #480, "Record.adc: RuntimeWarning: invalid value encountered in cast").
- NaN just plain wasn't handled for the `inplace=True, expanded=False` case. (Currently, we don't use `inplace=True` anywhere internally; although it saves a bit of memory, it's destructive and so it's probably wise for high-level functions like `wrsamp` to avoid it.)
- The `expanded=True` case relied on `self.n_sig` (in contrast to `expanded=False`, which operates based on the dimensions of `p_signal`.) This meant it would fail if the caller didn't explicitly set `n_sig`, which was an annoying inconsistency.

Also, tidy up duplicated code and make things a little more efficient.

A side note: I don't think the `inplace=True` mode is particularly great to have. It conflates two things (modifying the Record object attributes, which many applications want; and modifying the array contents, which you may think you want until you realize it subtly breaks something.) It does save some memory, but not as much as you'd hope. (That `copy=False` is pretty much a lie.) And of course I don't like functions whose return type is dependent on their arguments. So I would definitely put `inplace` on the chopping block for 5.0.0. Still, I think the updated code here isn't too terribly ugly.

This set of changes is the first step to making `wfdb.wrsamp` work for multi-frequency (issue #336). Next is to fix `Record.calc_adc_params`, then `Record.set_d_features`.