Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix NaN handling in Record.adc, and other fixes #481

Merged
merged 13 commits into from
Apr 19, 2024
Merged

Fix NaN handling in Record.adc, and other fixes #481

merged 13 commits into from
Apr 19, 2024

Conversation

bemoody
Copy link
Collaborator

@bemoody bemoody commented Apr 18, 2024

Fix several bugs in Record.adc:

  1. Previously, the function would try to convert all samples to integers and then, for any samples that were NaN, replace the corresponding elements with the appropriate sentinel value. Even though this was probably safe in most cases, casting NaN to an integer is implementation-defined behavior, and raises a warning by default (issue Record.adc: RuntimeWarning: invalid value encountered in cast #480).

  2. NaN just plain wasn't handled for the inplace=True, expanded=False case. (Currently, we don't use inplace=True anywhere internally; although it saves a bit of memory, it's destructive and so it's probably wise for high-level functions like wrsamp to avoid it.)

  3. The expanded=True case relied on self.n_sig (in contrast to expanded=False, which operates based on the dimensions of p_signal.) This meant it would fail if the caller didn't explicitly set n_sig, which was an annoying inconsistency.

Also, tidy up duplicated code and make things a little more efficient.

A side note: I don't think the inplace=True mode is particularly great to have. It conflates two things (modifying the Record object attributes, which many applications want; and modifying the array contents, which you may think you want until you realize it subtly breaks something.) It does save some memory, but not as much as you'd hope. (That copy=False is pretty much a lie.) And of course I don't like functions whose return type is dependent on their arguments. So I would definitely put inplace on the chopping block for 5.0.0. Still, I think the updated code here isn't too terribly ugly.

This set of changes is the first step to making wfdb.wrsamp work for multi-frequency (issue #336). Next is to fix Record.calc_adc_params, then Record.set_d_features.

Benjamin Moody added 12 commits April 18, 2024 15:32
When converting physical to digital sample arrays, we must replace NaN
values (which represent a missing sample) with the appropriate
invalid-sample sentinel value.

This is done correctly for normal uses of the package, but if the
application directly invoked adc(inplace=True), NaNs would not have
been handled (and were instead set to an implementation-defined
value.)

(Note that we don't use inplace=True internally because this
overwrites the original floating-point array.  Applications may want
to use inplace=True to save memory, but this requires knowing that the
original array is no longer needed.)
When converting physical to digital sample arrays, all the information
we need is contained in self.e_p_signal, self.adc_gain, self.baseline,
and self.fmt.

We don't need to rely on self.n_sig here, and we don't use n_sig in
the expanded=False case, so for consistency, don't use n_sig in the
expanded=True case either.
When converting physical to digital sample arrays, we must replace NaN
values (which represent a missing sample) with the appropriate
invalid-sample sentinel value.

Attempting to convert a floating-point NaN to an integer, as was done
here, is implementation-defined behavior (and is controlled, to an
extent, by the global numpy configuration.)  We don't want to be
dependent on the hardware or the global numpy configuration, and for
efficiency it's best to avoid triggering floating-point errors to
begin with.

So instead of converting the floating-point array to integers, and
fixing up the integer array after the fact, we want to replace the
floating-point values *first*, and then convert to integers.
- Test that Record.adc works when n_sig is not set.  (Previously, this
  didn't work with expanded=True.)

- Test that Record.adc handles NaN by mapping it to the correct
  invalid-sample value.  (Previously, this didn't work with
  expanded=False and inplace=True.)  Use multiple formats to test
  that this takes the format into account.

Furthermore, the previous code relied on implementation-defined
behavior to handle NaN, which normally results in a RuntimeWarning.
Within the test suite, we set the numpy error handling mode to
"raise", so such implementation-defined conversions actually result in
a FloatingPointError.
When converting physical to digital sample arrays, we must replace NaN
values with the appropriate invalid-sample sentinel value.

To do this, we need to call np.isnan and use the result as a mask to
replace entries in the output array.  (Although the function
np.nan_to_num also exists, it's less efficient: it literally does just
this, but also handles infinities.)

What we don't need to do is to call any() to check whether there are
any true entries - that just means we're iterating through the same
array three times rather than once.  Furthermore, np.copyto can
broadcast d_nans across the rows of p_signal, so all the channels can
be handled at once.

Also use copyto in adc_inplace_1d for consistency.
@tompollard
Copy link
Member

@bemoody this looks good to me. were the tests intentionally stopped?

@tompollard
Copy link
Member

Hmm, it looks like maybe it is failing on format checks (https://github.com/MIT-LCP/wfdb-python/actions/runs/8744000168/job/24038235919) and then the remaining tests are being cancelled.

@bemoody
Copy link
Collaborator Author

bemoody commented Apr 19, 2024

Yeah, I think it should work, it's failing because the latest version of black requires formatting changes (not related to this pull AFAIK.)

@tompollard
Copy link
Member

Yeah, I think it should work, it's failing because the latest version of black requires formatting changes (not related to this pull AFAIK.)

I assume these issues were fixed in #482? If so, please could you rebase this PR on main?

@tompollard tompollard merged commit 34b989e into main Apr 19, 2024
14 checks passed
@tompollard tompollard deleted the bm/adc branch April 19, 2024 19:43
tompollard added a commit that referenced this pull request Jan 21, 2025
This pull request adds a changelog for `v4.2.0`. The changelog is based
on the following auto-generated summary of merge commits generated by
GitHub:

```
## What's Changed

* bug-fix: Numpy ValueError when cheking empty list equality by @ajadczaksunriselabs in #459
* bug-fix: Pandas set indexing error by @ajadczaksunriselabs in #460
* fix for /issues/452 by @tecamenz in #465
* Use numpydoc to render documentation by @SnoopJ in #472
* build(deps): bump readthedocs-sphinx-search from 0.1.1 to 0.3.2 in /docs by @dependabot in #477
* Update style by @bemoody in #482
* Fix NaN handling in Record.adc, and other fixes by @bemoody in #481
* Set upper bound on Numpy version (numpy = ">=1.10.1,<2.0.0"). Ref #493. by @tompollard in #494
* Update actions to use actions/checkout@v3 and actions/setup-python@v4. by @tompollard in #495
* Fix: Indent code to ensure 'j' is within for-loop in GQRS algorithm by @tompollard in #499
* Add write_dir argument to csv_to_wfdb. Fixes #67. by @tompollard in #492
* Fix warnings by @cbrnr in #502
* README improvements by @bemoody in #503
* Change in type promotion. Fixes to annotation.py by @tompollard in #506
* Use uv by @cbrnr in #504
* Change in type promotion. Fixes to _signal.py by @tompollard in #507
* Test round-trip write/read of supported binary formats by @bemoody in #509
* Corrected typo and extended allowed types for MultiSegmentRecord by @agent3gatech in #514
* Allow expanded physical signal in `calc_adc_params` by @briangow in #512
* Add capability to write signal with unique `samps_per_frame` to `wfdb.io.wrsamp` by @briangow in #510
* Fix selection of channels when converting to EDF by @SamJelfs in #519
* Change in type promotion introduced in Numpy 2.0. Fixes to edf.py. by @tompollard in #527
* Bump dependencies for NumPy 2 compatibility by @cbrnr in #511
* Bump version to v4.2.0 and update notes on creating new releases by @tompollard in #497

## New Contributors

* @ajadczaksunriselabs made their first contribution in #459
* @tecamenz made their first contribution in #465
* @SnoopJ made their first contribution in #472
* @dependabot made their first contribution in #477
* @agent3gatech made their first contribution in #514
* @SamJelfs made their first contribution in #519

**Full Changelog**: v4.1.2...v4.2.0
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants