-
-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add type stubs to scipy/signal/_peak_finding.pyi
#87
Conversation
Namely: 1. `argrelmin` 2. `argrelmax` 3. `argrelextrema`
Namely: 1. `peaks_prominences` 2. `peaks_widths`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this!
In general this looks like a solid improvement.
I left a couple of minor suggestions that could help make this even better.
scipy-stubs/signal/_peak_finding.pyi
Outdated
class _FindPeaksResultsDict(TypedDict): | ||
peak_heights: NotRequired[npt.NDArray[np.float64]] | ||
left_thresholds: NotRequired[npt.NDArray[np.float64]] | ||
right_thresholds: NotRequired[npt.NDArray[np.float64]] | ||
prominences: NotRequired[npt.NDArray[np.float64]] | ||
left_bases: NotRequired[npt.NDArray[np.intp]] | ||
right_bases: NotRequired[npt.NDArray[np.intp]] | ||
width_heights: NotRequired[npt.NDArray[np.float64]] | ||
left_ips: NotRequired[npt.NDArray[np.float64]] | ||
right_ips: NotRequired[npt.NDArray[np.float64]] | ||
plateau_sizes: NotRequired[npt.NDArray[np.intp]] | ||
left_edges: NotRequired[npt.NDArray[np.intp]] | ||
right_edges: NotRequired[npt.NDArray[np.intp]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if every key is NotRequired
, then you could also use _FindPeaksResultsDict(TypedDict, total=False): ...
instead. See https://docs.python.org/3/library/typing.html#typing.TypedDict for details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the tip. Haven't used TypedDict
before.
scipy-stubs/signal/_peak_finding.pyi
Outdated
right_edges: NotRequired[npt.NDArray[np.intp]] | ||
|
||
def argrelmin( | ||
data: npt.NDArray[np.generic], axis: int = 0, order: int = 1, mode: _Mode = "clip" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually tend to put a trailing comma in cases like this, which causes ruff format
to place each parameters on a new line, improving readability. But I'm also fine if you leave it like this :)
scipy-stubs/signal/_peak_finding.pyi
Outdated
order: int = 1, | ||
mode: _Mode = "clip", | ||
) -> tuple[npt.NDArray[np.intp], ...]: ... | ||
def peak_prominences(x: npt.ArrayLike, peaks: npt.ArrayLike, wlen: int | None = None) -> _ProminencesResult: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that peaks
must always be an array-like of integers. So even though this is perfectly fine as-is, you could also consider using numpy._typing._ArrayLikeInt_co
for this (no need to worry about the API stability here; at the moment I'm the only active numpy maintainer that works on typing ;) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's nice to know! I feel a bit weird using ArrayLike
when the source assumes the dtype but it is nice to know that there are types like that in numpy although it seems to be behind a private module.
scipy-stubs/signal/_peak_finding.pyi
Outdated
def argrelextrema( | ||
data: npt.NDArray[np.generic], | ||
comparator: Callable[[npt.NDArray[np.generic], npt.NDArray[np.generic]], npt.NDArray[np.bool_]], | ||
axis: int = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The axis
argument accepts everything that implement __index__
.
So for example, instances of this Index
type are also allowed.
>>> class Index:
... def __init__(self, i: int, /) -> None:
... self._i = int(i) # upcast `bool` or other `int` subtypes
... def __index__(self, /) -> int:
... return self._i
You could use the typing.SupportsIndex
or optype.CanIndex
protocols for this. (the latter is optionally generic on the return type of __index__
which allows using it with e.g. Literal
).
Note that there are some scipy functions that also require __lt__
or __add__
as well. But in any case, it's better to have overly-broad parameter types than overly-narrow ones.
scipy-stubs/signal/_peak_finding.pyi
Outdated
peaks: npt.ArrayLike, | ||
rel_height: float = 0.5, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stubgen
was wrong in this case: rel_height
also accepts e.g. np.float16(0.5)
and np.bool_(True)
. The scipy._typing.AnyReal
type alias could be used in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This argument is one of the ones that gets passed straight into a cython function, here is the signature
def _peak_widths(const np.float64_t[::1] x not None,
const np.intp_t[::1] peaks not None,
np.float64_t rel_height,
const np.float64_t[::1] prominences not None,
const np.intp_t[::1] left_bases not None,
const np.intp_t[::1] right_bases not None):
I'm not familiar with cython so I didn't know how it handles type conversion from python into cython. I think the cython docs say something like python objects are converted to their c types automatically for def
functions, so I think rel_height
can be any type that can get converted into np.float64_t
which I think is just an alias for double
. Not sure which types are allowed in that case.
The AnyReal
type alias is a nice suggestion though.
scipy-stubs/signal/_peak_finding.pyi
Outdated
x: npt.ArrayLike, | ||
height: float | npt.NDArray[np.float64] | tuple[float | None, float | None] | None = None, | ||
threshold: float | npt.NDArray[np.float64] | tuple[float | None, float | None] | None = None, | ||
distance: np.float64 | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing that (at least) float
is also allowed here at runtime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same as rel_height
where it is passed straight into a cython function which I didn't know before did automatic type conversion. This means using AnyReal
should be fine I think.
scipy-stubs/signal/_peak_finding.pyi
Outdated
min_length: Untyped | None = None, | ||
vector: npt.ArrayLike, | ||
widths: npt.ArrayLike, | ||
wavelet: Callable[Concatenate[int, float, ...], npt.NDArray[np.float64]] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this always have to return an array of float64
dtype, or can it also be e.g. float16
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had it as a placeholder but wavelet
is the same wavelet
that gets passed into _cwt
def _cwt(data, wavelet, widths, dtype=None, **kwargs):
# Determine output type
if dtype is None:
if np.asarray(wavelet(1, widths[0], **kwargs)).dtype.char in 'FDG':
dtype = np.complex128
else:
dtype = np.float64
output = np.empty((len(widths), len(data)), dtype=dtype)
for ind, width in enumerate(widths):
N = np.min([10 * width, len(data)])
wavelet_data = np.conj(wavelet(N, width, **kwargs)[::-1])
output[ind] = convolve(data, wavelet_data, mode='same')
return output
which means it can just be an ArrayLike
?
scipy-stubs/signal/_peak_finding.pyi
Outdated
vector: npt.ArrayLike, | ||
widths: npt.ArrayLike, | ||
wavelet: Callable[Concatenate[int, float, ...], npt.NDArray[np.float64]] | None = None, | ||
max_distances: Sequence[int] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it's a bit weird, but an ndarray
isn't assignable to a Sequence
, which the docs say that this should accept
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh thats a bit weird/annoying. The reason why I chose Sequence
is because I didn't look at the docs for find_peaks_cwt
(because it is a bit long) but instead for the function that max_distances
gets passed to which is called _identify_ridge_lines
which has says that max_distances
is a 1-D sequence.
The source code for that function only calls len
and indexes max_distances
which fits the definition of Sequence
. The values from max_distances
gets compared to np.intp
as well.
Would the type then be like onpt.Array[tuple[int], AnyInt]
? That might still be too narrow considering that Sequence
is a relatively weak type constraint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with Sequence
is that apart from __getitem__
it also has several other methods like .count()
and .index()
that aren't present in np.ndarray
🤷🏻
And yea I agree that onpt.Array
(or some other direct np.ndarray
alias) would be indeed too narrow. So you could use e.g. _ArrayLikeInt_co
, which is compatible with both Sequence[int]
and npt.NDArray[integer[Any]]
. And as far as I'm concerned, the fact that it would also allow integer scalars isn't much of a problem, as we're talking about the input of a function.
scipy-stubs/signal/_peak_finding.pyi
Outdated
window_size: Untyped | None = None, | ||
) -> Untyped: ... | ||
window_size: int | None = None, | ||
) -> npt.NDArray[np.intp]: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're feeling up to it, you could narrow this to a 1-D array as e.g. optype.numpy.Array[tuple[int], np.intp]
(but it's also perfectly fine if you leave it like this).
scipy-stubs/signal/_peak_finding.pyi
Outdated
_Mode: TypeAlias = Literal["clip", "wrap"] | ||
_ProminencesResult: TypeAlias = tuple[npt.NDArray[np.float64], npt.NDArray[np.intp], npt.NDArray[np.intp]] | ||
_WidthsResult: TypeAlias = tuple[ | ||
npt.NDArray[np.float64], npt.NDArray[np.float64], npt.NDArray[np.float64], npt.NDArray[np.float64] | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen quite a lot of npt.NDArray[np.int64]
and npt.NDArray[np.inpt]
being used in this module, so perhaps it could help to introduce type aliases for these two as well
Specifically for `_peak_finding.pyi`
For `_peak_finding.pyi`.
Thanks for the feedback! The PR wasn't really complete because I'm not too well versed in all of the typing available, so working on this repo is really helping me learn about the possibilities. I'll try to fix some of the issues although I think there are still a few hangups:
|
In `_peak_finding.pyi`
In `_peak_finding.pyi`.
In `_peak_finding.pyi`.
In `_peak_finding.pyi` use them instead of `float` and `int` respectively. This is to allow usage of numpy scalars like `float16` and `int32`.
In `_peak_finding.pyi`. The `wavelet` parameter does not need to return `NDArray[np.float64]` it only needs to return something that can be coerced into a numpy array.
That's completely understandable; it took me quite a while to figure out how to properly type abstract things like an "array-like" and a "dtype-like".
|
After experimenting a bit with it, it looks like it is allowed to return both
So tldr; I guess the signature is best described as an overloaded one, that looks a bit like
(the |
You could try it out for a bit in a (i)python console or something, or you could play it safe and use
As these can be scalar, ndarray or a sequence, it's basically an "array-like". |
In `_peak_finding.pyi`'s `find_peaks` instead of having separate type annotations for scalars and NDArrays that have a concrete dtype, use numpy's _ArrayLike*_co type aliases to cover both cases and allow for various dtypese
scipy-stubs/signal/_peak_finding.pyi
Outdated
widths: npt.ArrayLike, | ||
wavelet: Callable[Concatenate[int, float, ...], npt.ArrayLike] | None = None, | ||
max_distances: Sequence[int] | None = None, | ||
gap_thresh: AnyInt | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also be a float, according to the docs
scipy-stubs/signal/_peak_finding.pyi
Outdated
vector: npt.ArrayLike, | ||
widths: npt.ArrayLike, | ||
wavelet: Callable[Concatenate[int, float, ...], npt.NDArray[np.float64]] | None = None, | ||
max_distances: Sequence[int] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with Sequence
is that apart from __getitem__
it also has several other methods like .count()
and .index()
that aren't present in np.ndarray
🤷🏻
And yea I agree that onpt.Array
(or some other direct np.ndarray
alias) would be indeed too narrow. So you could use e.g. _ArrayLikeInt_co
, which is compatible with both Sequence[int]
and npt.NDArray[integer[Any]]
. And as far as I'm concerned, the fact that it would also allow integer scalars isn't much of a problem, as we're talking about the input of a function.
scipy-stubs/signal/_peak_finding.pyi
Outdated
) -> Untyped: ... | ||
vector: npt.ArrayLike, | ||
widths: npt.ArrayLike, | ||
wavelet: Callable[Concatenate[int, float, ...], npt.ArrayLike] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my other comment on this signature
scipy-stubs/signal/_peak_finding.pyi
Outdated
data: npt.NDArray[np.generic], | ||
comparator: Callable[[npt.NDArray[np.generic], npt.NDArray[np.generic]], npt.NDArray[np.bool_]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
totally optional; but you could use a TypeVar(..., bound=np.generic)
instead of the current np.generic
s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that require the comparator function to have two arguments of the same type? I think that in the ideal case the function signature makes sense but what if someone were to pass a function that takes in a float and an int? They would still be able to do the comparison despite the types being different.
Assuming that using TypeVar
enforces that the two argument types to be the same that is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that require the comparator function to have two arguments of the same type?
Yes exactly.
I think that in the ideal case the function signature makes sense but what if someone were to pass a function that takes in a float and an int?
If someone would pass a function that accepts arrays whose scalar types are incompatible, then it would indeed not be allowed.
And that's a good thing, because it could prevent some potentially type-unsafe situations.
Assuming that using
TypeVar
enforces that the two argument types to be the same that is.
Well, it kinda depends on what you mean with "the same".
Because in this case, we also need to take the relationship of NDArray
and its scalar type parameter into account: It's covariant.
So that means that you're allowed to pass something like (NDArray[np.int32], NDArray[np.integer[Any]]) -> NDArray[np.bool_]
as function if you also pass data: NDarray[np.int32]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah got it. Just misunderstood the semantics. I can add this then.
scipy-stubs/signal/_peak_finding.pyi
Outdated
def argrelmin( | ||
data: npt.NDArray[np.generic], | ||
axis: op.CanIndex = 0, | ||
order: int = 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that besides int
, you could also pass some np.integer[Any]
and np.bool_
to order
. And coincidentally, scipy._typing.AnyInt
is an alias for exactly that :)
There's also some other order
s like this below
In `_peak_finding.pyi`
In `_peak_finding.pyi`.
I finally got around to understanding this signature and I agree that the overloaded ones are good. I'm not sure how to make a type alias like the one you suggested though. I think a simple way to annotate without having to use too many generics is This doesn't enforce that the wavelet second argument is of the same type as the scalar type of the array like but it should be broader right? |
Sorry was a bit busy the last couple of days so couldn't finish off the PR. |
Actually on the topic of wavelet_data = np.conj(wavelet(N, width, **kwargs)[::-1]) so I think it expects that |
For the function `find_peaks_cwt` in `_peak_finding.pyi`.
I also don't think we need to write overloads for this as well although it maybe more accurate. I think the spirit of the wavelet function is that it should be able to handle float values as well because it represents a continuous mathematical function. Following this interpretation as well, the first argument which corresponds to the number of points should be something like _WaveletFunction: TypeAlias = Callable[Concatenate[AnyInt, AnyReal, ...], npt.NDArray[np.generic]] with the only caveats being that the source actually passes in |
For the `argrelextrema` function in `_peak_finding.pyi`.
Yes that seems about right. Just keep in mind that
Yes; the parameters of a |
Good catch! I then just got lucky when I tried it with a |
Found in `_peak_finding.pyi`. Changes are: 1. Use `_ArrayLikeFloat_co` for `widths` parameter This is because the documents specify `float or sequence` so the dtype is implied to be float. 2. Change type annotation for `max_distances` from `_ArrayLikeInt_co` to `_ArrayLikeFloat_co`. In case of `None` the array is given by the `widths` parameter divided by 4.0 so it is has to be compatible with float dtypes. 3. Change the output for `_WaveletFunction` to be more accurate. The output has to be sliceable with the type of the sliced access being an array like.
Found in `_peak_finding.pyi`. Change the type annotation for the `vector` parameter from `ArrayLike` to `NDArray[generic]`. The docs specify the input as `ndarray` not an array like and as there is no conversion in the source code, we can't use `ArrayLike`.
I added the following changes:
|
Great! Can I merge this? |
sorry just found a mistake. I think |
The docs aren't always accurate I have found, so be sure to check that first. But in case they are, then I guess that |
Yeah I actually was a bit confused with the docs because it says |
In `_peak_finding.pyi` and the function `find_peaks_cwt`. According to docs this is an NDArray not an ArrayLike.
Should be good to merge now after CI! |
Thanks @pavyamsiri ! |
I have added type stubs to
scipy/signal/_peak_finding.pyi
although I think the type annotations could be improved a bit.In particular:
npt.NDArray[np.generic]
to denote that the argument must benp.ndarray
but the inner type is not relevant. Is there a better type for that?np.asarray
but some also require that two arraylike arguments have the same shape. Is it possible to write this requirement as a type?TypedDict
but because the presence of the keys are dependent on whether the input arguments to the function are None or not, all values must beNotRequired
but I feel like this makes the typed dict a bit useless. Making overloads is possible I guess but that would be combinatorically bad.Open to any improvements and suggestions.