Add type stubs to scipy/signal/_peak_finding.pyi #87

Merged
merged 22 commits into from
Oct 20, 2024

Conversation

pavyamsiri
Contributor

I have added type stubs to scipy/signal/_peak_finding.pyi although I think the type annotations could be improved a bit.

In particular:

  • I am using npt.NDArray[np.generic] to denote that the argument must be np.ndarray but the inner type is not relevant. Is there a better type for that?
  • I am using a lot of array-likes, as some of the functions cast their arguments using np.asarray, but some also require that two array-like arguments have the same shape. Is it possible to write this requirement as a type?
  • I am using a lot of concrete dtypes for arrays which I found by inspecting the source. I'm not sure how subtyping works with Cython because some of these arrays are passed straight into a cython function which has a concrete dtype.
  • I am using TypedDict, but because the presence of the keys depends on whether the input arguments to the function are None or not, all values must be NotRequired, which I feel makes the typed dict a bit useless. Making overloads is possible, I guess, but that would be combinatorially bad.

Open to any improvements and suggestions.

Owner

@jorenham jorenham left a comment

Thanks for this!
In general this looks like a solid improvement.
I left a couple of minor suggestions that could help make this even better.

Comment on lines 16 to 28
class _FindPeaksResultsDict(TypedDict):
    peak_heights: NotRequired[npt.NDArray[np.float64]]
    left_thresholds: NotRequired[npt.NDArray[np.float64]]
    right_thresholds: NotRequired[npt.NDArray[np.float64]]
    prominences: NotRequired[npt.NDArray[np.float64]]
    left_bases: NotRequired[npt.NDArray[np.intp]]
    right_bases: NotRequired[npt.NDArray[np.intp]]
    width_heights: NotRequired[npt.NDArray[np.float64]]
    left_ips: NotRequired[npt.NDArray[np.float64]]
    right_ips: NotRequired[npt.NDArray[np.float64]]
    plateau_sizes: NotRequired[npt.NDArray[np.intp]]
    left_edges: NotRequired[npt.NDArray[np.intp]]
    right_edges: NotRequired[npt.NDArray[np.intp]]
Owner

if every key is NotRequired, then you could also use _FindPeaksResultsDict(TypedDict, total=False): ... instead. See https://docs.python.org/3/library/typing.html#typing.TypedDict for details
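
To illustrate the suggestion, here is a minimal sketch using only the standard library. The class name `PeakProperties` and the `list[float]` value types are placeholders invented for this example (the stub itself uses `npt.NDArray`); the point is only that `total=False` marks every key as optional, which is the same effect as wrapping each field in `NotRequired`.

```python
from typing import TypedDict


# Hypothetical stand-in for _FindPeaksResultsDict; list[float] replaces the
# NDArray value types so the sketch has no third-party dependencies.
class PeakProperties(TypedDict, total=False):
    peak_heights: list[float]
    prominences: list[float]


# With total=False every key is optional, so both of these are valid.
empty: PeakProperties = {}
partial: PeakProperties = {"peak_heights": [1.0, 2.5]}

# The runtime metadata confirms that no key is required.
print(sorted(PeakProperties.__optional_keys__))
print(sorted(PeakProperties.__required_keys__))
```

A type checker would still reject unknown keys or wrongly typed values, so the dict keeps some value even when nothing is required.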

Contributor Author

Thanks for the tip. Haven't used TypedDict before.

right_edges: NotRequired[npt.NDArray[np.intp]]

def argrelmin(
data: npt.NDArray[np.generic], axis: int = 0, order: int = 1, mode: _Mode = "clip"
Owner

I usually tend to put a trailing comma in cases like this, which causes ruff format to place each parameter on a new line, improving readability. But I'm also fine if you leave it like this :)

order: int = 1,
mode: _Mode = "clip",
) -> tuple[npt.NDArray[np.intp], ...]: ...
def peak_prominences(x: npt.ArrayLike, peaks: npt.ArrayLike, wlen: int | None = None) -> _ProminencesResult: ...
Owner

I believe that peaks must always be an array-like of integers. So even though this is perfectly fine as-is, you could also consider using numpy._typing._ArrayLikeInt_co for this (no need to worry about the API stability here; at the moment I'm the only active numpy maintainer that works on typing ;) )

Contributor Author

That's nice to know! I feel a bit weird using ArrayLike when the source assumes the dtype but it is nice to know that there are types like that in numpy although it seems to be behind a private module.

def argrelextrema(
data: npt.NDArray[np.generic],
comparator: Callable[[npt.NDArray[np.generic], npt.NDArray[np.generic]], npt.NDArray[np.bool_]],
axis: int = 0,
Owner

The axis argument accepts everything that implements __index__.

So for example, instances of this Index type are also allowed.

>>> class Index:
...     def __init__(self, i: int, /) -> None:
...         self._i = int(i)  # upcast `bool` or other `int` subtypes
...     def __index__(self, /) -> int:
...         return self._i

You could use the typing.SupportsIndex or optype.CanIndex protocols for this. (the latter is optionally generic on the return type of __index__ which allows using it with e.g. Literal).

Note that there are some scipy functions that also require __lt__ or __add__ as well. But in any case, it's better to have overly-broad parameter types than overly-narrow ones.
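
As a rough illustration of the `typing.SupportsIndex` option, the sketch below annotates a parameter so that it accepts the `Index` class from above as well as plain `int`. The helper `normalize_axis` is hypothetical and not part of scipy; it only shows how an index-like value can be converted with `operator.index`.

```python
import operator
from typing import SupportsIndex


class Index:
    def __init__(self, i: int, /) -> None:
        self._i = int(i)  # upcast `bool` or other `int` subtypes

    def __index__(self, /) -> int:
        return self._i


def normalize_axis(axis: SupportsIndex, ndim: int) -> int:
    """Hypothetical helper: map any index-like `axis` to an int in [0, ndim)."""
    i = operator.index(axis)  # calls axis.__index__()
    if not -ndim <= i < ndim:
        raise ValueError(f"axis {i} is out of bounds for {ndim} dimensions")
    return i % ndim


print(normalize_axis(Index(-1), 3))
print(normalize_axis(1, 2))
```

Annotating with `int` instead would make a type checker reject `Index(-1)`, even though it works at runtime.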

Comment on lines 46 to 47
peaks: npt.ArrayLike,
rel_height: float = 0.5,
Owner

stubgen was wrong in this case: rel_height also accepts e.g. np.float16(0.5) and np.bool_(True). The scipy._typing.AnyReal type alias could be used in this case.

Contributor Author

This argument is one of the ones that gets passed straight into a cython function, here is the signature

def _peak_widths(const np.float64_t[::1] x not None,
                 const np.intp_t[::1] peaks not None,
                 np.float64_t rel_height,
                 const np.float64_t[::1] prominences not None,
                 const np.intp_t[::1] left_bases not None,
                 const np.intp_t[::1] right_bases not None):

I'm not familiar with cython so I didn't know how it handles type conversion from python into cython. I think the cython docs say something like python objects are converted to their c types automatically for def functions, so I think rel_height can be any type that can get converted into np.float64_t which I think is just an alias for double. Not sure which types are allowed in that case.

The AnyReal type alias is a nice suggestion though.
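
A pure-Python approximation of that coercion, assuming it behaves roughly like calling `float()` on the argument (an assumption, not something confirmed from the Cython source): anything that implements `__float__` is accepted. The helper name `to_double` is made up for this sketch.

```python
from fractions import Fraction
from typing import SupportsFloat


def to_double(value: SupportsFloat) -> float:
    """Hypothetical approximation of the coercion to a C double."""
    # float() calls value.__float__(); unlike a C-level conversion it would
    # also parse strings, which the SupportsFloat annotation rules out.
    return float(value)


print(to_double(True))
print(to_double(Fraction(1, 2)))
```

Under this assumption `bool`, `int`, and other real number types all pass, which is in line with annotating `rel_height` more broadly than `float`.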

x: npt.ArrayLike,
height: float | npt.NDArray[np.float64] | tuple[float | None, float | None] | None = None,
threshold: float | npt.NDArray[np.float64] | tuple[float | None, float | None] | None = None,
distance: np.float64 | None = None,
Owner

I'm guessing that (at least) float is also allowed here at runtime

Contributor Author

This is the same as rel_height where it is passed straight into a cython function which I didn't know before did automatic type conversion. This means using AnyReal should be fine I think.

min_length: Untyped | None = None,
vector: npt.ArrayLike,
widths: npt.ArrayLike,
wavelet: Callable[Concatenate[int, float, ...], npt.NDArray[np.float64]] | None = None,
Owner

Does this always have to return an array of float64 dtype, or can it also be e.g. float16?

Contributor Author

I had it as a placeholder but wavelet is the same wavelet that gets passed into _cwt

def _cwt(data, wavelet, widths, dtype=None, **kwargs):
    # Determine output type
    if dtype is None:
        if np.asarray(wavelet(1, widths[0], **kwargs)).dtype.char in 'FDG':
            dtype = np.complex128
        else:
            dtype = np.float64

    output = np.empty((len(widths), len(data)), dtype=dtype)
    for ind, width in enumerate(widths):
        N = np.min([10 * width, len(data)])
        wavelet_data = np.conj(wavelet(N, width, **kwargs)[::-1])
        output[ind] = convolve(data, wavelet_data, mode='same')
    return output

which means it can just be an ArrayLike?

vector: npt.ArrayLike,
widths: npt.ArrayLike,
wavelet: Callable[Concatenate[int, float, ...], npt.NDArray[np.float64]] | None = None,
max_distances: Sequence[int] | None = None,
Owner

I know it's a bit weird, but an ndarray isn't assignable to a Sequence, which the docs say that this should accept

Contributor Author

Oh, that's a bit weird/annoying. The reason I chose Sequence is that I didn't look at the docs for find_peaks_cwt (because they are a bit long) but at those of the function that max_distances gets passed to, _identify_ridge_lines, which say that max_distances is a 1-D sequence.

The source code for that function only calls len and indexes max_distances, which fits the definition of Sequence. The values from max_distances get compared to np.intp as well.

Would the type then be like onpt.Array[tuple[int], AnyInt]? That might still be too narrow considering that Sequence is a relatively weak type constraint.

Owner

The problem with Sequence is that apart from __getitem__ it also has several other methods like .count() and .index() that aren't present in np.ndarray 🤷🏻

And yea I agree that onpt.Array (or some other direct np.ndarray alias) would be indeed too narrow. So you could use e.g. _ArrayLikeInt_co, which is compatible with both Sequence[int] and npt.NDArray[integer[Any]]. And as far as I'm concerned, the fact that it would also allow integer scalars isn't much of a problem, as we're talking about the input of a function.
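
The mismatch can be reproduced without numpy. In this sketch, `ArrayStandIn` is a made-up class that mimics the relevant part of `np.ndarray` (it has `len()` and indexing but no `.count()` or `.index()`), and `SizedIndexable` is a hypothetical protocol covering only the two operations that `_identify_ridge_lines` is said to use.

```python
from collections.abc import Sequence
from typing import Protocol, runtime_checkable


@runtime_checkable
class SizedIndexable(Protocol):
    """Hypothetical protocol: only len() and integer indexing are required."""

    def __len__(self) -> int: ...
    def __getitem__(self, index: int, /) -> object: ...


class ArrayStandIn:
    """Stand-in for np.ndarray: has len() and indexing, but no count()/index()."""

    def __init__(self, values: list[int]) -> None:
        self._values = values

    def __len__(self) -> int:
        return len(self._values)

    def __getitem__(self, index: int, /) -> int:
        return self._values[index]


arr = ArrayStandIn([3, 1, 2])
print(isinstance(arr, Sequence))              # Sequence also demands count() and index()
print(isinstance(arr, SizedIndexable))        # the narrower protocol is satisfied
print(isinstance([3, 1, 2], SizedIndexable))  # and so is a plain list
```

A structural protocol like this is one alternative to `_ArrayLikeInt_co`; it is narrower in what it promises but says nothing about the element type.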

window_size: Untyped | None = None,
) -> Untyped: ...
window_size: int | None = None,
) -> npt.NDArray[np.intp]: ...
Owner

If you're feeling up to it, you could narrow this to a 1-D array as e.g. optype.numpy.Array[tuple[int], np.intp] (but it's also perfectly fine if you leave it like this).

Comment on lines 10 to 14
_Mode: TypeAlias = Literal["clip", "wrap"]
_ProminencesResult: TypeAlias = tuple[npt.NDArray[np.float64], npt.NDArray[np.intp], npt.NDArray[np.intp]]
_WidthsResult: TypeAlias = tuple[
npt.NDArray[np.float64], npt.NDArray[np.float64], npt.NDArray[np.float64], npt.NDArray[np.float64]
]
Owner

I've seen quite a lot of npt.NDArray[np.int64] and npt.NDArray[np.intp] being used in this module, so perhaps it could help to introduce type aliases for these two as well

@pavyamsiri
Contributor Author

pavyamsiri commented Oct 17, 2024

Thanks for the feedback! The PR wasn't really complete because I'm not too well versed in all of the typing available, so working on this repo is really helping me learn about the possibilities.

I'll try to fix some of the issues although I think there are still a few hangups:

  • Not sure on wavelet regarding the output type and also the input types as well
  • Not sure on max_distances where the docs of two functions disagree a bit on the type constraints.
  • The dtype of NDArray for the find_peaks function arguments are too narrow.

In `_peak_finding.pyi` use them instead of `float` and `int`
respectively. This is to allow usage of numpy scalars like `float16` and
`int32`.
In `_peak_finding.pyi`. The `wavelet` parameter does not need to return
`NDArray[np.float64]` it only needs to return something that can be
coerced into a numpy array.
@jorenham
Owner

The PR wasn't really complete because I'm not too well versed in all of the typing available, so working on this repo is really helping me learn about the possibilities.

That's completely understandable; it took me quite a while to figure out how to properly type abstract things like an "array-like" and a "dtype-like".
In such cases it'll certainly help a lot to know about the available typing tools.
So maybe it'll help to give a quick summary of the ones that I have used in these cases:

  • There's scipy._typing, where I've put several type aliases that are usually related to numpy or scipy in general
  • Similarly, there are the package-specific scipy.{package}._typing types. These are only used within that package.
  • The numpy._typing module is not part of the public API, but as a numpy maintainer I know that these aren't going to change anytime soon. The dtype-specific array-like aliases such as _ArrayLikeFloat_co and _ArrayLike[T: ] are especially useful in scipy-stubs. They are widely used within numpy's bundled stubs and have withstood the test of time. But they aren't documented, so you'll need to learn about them by looking at the code itself, e.g. https://github.com/numpy/numpy/blob/v2.2.0.dev0/numpy/_typing/_array_like.py (I'm not saying you should do this or anything, I just wanted to mention it).
  • optype is another library of mine that I've used quite a bit here, often as alternative to collections.abc. For instance, the single-method protocols (those that start with optype.Can{}) allow for writing very precise annotations.
  • optype.typing is a bit more high-level, and has very useful aliases, such as optype.typing.AnyInt that describes exactly everything that the builtins.int constructor can accept.
  • Finally there's optype.numpy, which I plan to improve in the coming weeks, so that it can be used instead of numpy._typing. Within scipy-stubs I always import this as onpt (matching the conventional npt alias for numpy.typing). One of its most useful types is onpt.Array, which is the shape-typed analogue of npt.NDArray, and the type params are optional, so that onpt.Array is the same as npt.NDArray[np.generic].

@jorenham
Owner

jorenham commented Oct 17, 2024

  • Not sure on wavelet regarding the output type and also the input types as well

After experimenting a bit with it, it looks like it is allowed to return both ndarray and list, as long as they're 1-D, and the length must be equal to the first argument, so the return type should be a (1-d) "array-like".
It always gets passed 2 arguments as the docs say, but their types turn out to be rather weird

  1. The first argument is a builtins.int on the first call, but np.int_ or np.float64 (huh?) after that, depending on whether widths is an array-like of integers or floats.
  2. The second argument is (always) a numpy scalar type SCT: np.generic, and matches the scalar-type/dtype of the widths array-like.

So find_peaks_cwt(x, np.arange(1, 10, dtype=np.uint8), f) first calls f as (int, np.uint8), and then as (np.int64, np.uint8) (np.int_ is an alias for np.int64 on my machine).
And find_peaks_cwt(x, np.arange(1, 10, dtype=np.float16), f) will first call f as (int, np.float16), and then as (np.float64, np.float16). The second time f was called, its *args were (np.float64(10.0), np.float16(1.0)) (wasn't the first argument supposed to be a discrete size?).

tldr;

I guess the signature is best described as an overloaded one, that looks a bit like

  • (int | np.int_, SCT) -> array_like in case of widths: ArrayLike[SCT: np.integer[Any]]
  • (int | np.float64, SCT) -> array_like in case of widths: ArrayLike[SCT: np.floating[Any]]

(the ArrayLike alias is fictitious unfortunately)
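
Given that the first argument may arrive as a float-valued size, a user-supplied wavelet probably wants to coerce it before using it as a length. The function below is a hypothetical example (a sampled Gaussian bump, not scipy's default wavelet) written with the standard library only; it returns a 1-D list whose length equals the requested number of points.

```python
import math


def gaussian_wavelet(points: float, width: float) -> list[float]:
    """Hypothetical wavelet: a sampled Gaussian bump of length int(points)."""
    n = int(points)  # tolerate the float-valued "size" observed above
    center = (n - 1) / 2
    return [math.exp(-(((i - center) / float(width)) ** 2) / 2) for i in range(n)]


print(len(gaussian_wavelet(10.0, 1.0)))
print(len(gaussian_wavelet(5, 2)))
```

Annotating both parameters as `float` keeps the callable compatible with either of the two call patterns described above, at the cost of not expressing that the first argument is meant to be a count.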

@jorenham
Owner

jorenham commented Oct 17, 2024

  • Not sure on max_distances where the docs of two functions disagree a bit on the type constraints.

You could try it out for a bit in a (i)python console or something, or you could play it safe and use npt.ArrayLike (which also covers Sequence)


  • The dtype of NDArray for the find_peaks function arguments are too narrow.

As these can be scalar, ndarray or a sequence, it's basically an "array-like".
And in this case, we probably only should allow reals.
So you could use numpy._typing._ArrayLikeFloat_co for this, which also allows you to get rid of the tuple (although for documentation purposes it might help if it stays).

In `_peak_finding.pyi`'s `find_peaks` instead of having separate type
annotations for scalars and NDArrays that have a concrete dtype, use
numpy's _ArrayLike*_co type aliases to cover both cases and allow for
various dtypes.
widths: npt.ArrayLike,
wavelet: Callable[Concatenate[int, float, ...], npt.ArrayLike] | None = None,
max_distances: Sequence[int] | None = None,
gap_thresh: AnyInt | None = None,
Owner

This could also be a float, according to the docs

) -> Untyped: ...
vector: npt.ArrayLike,
widths: npt.ArrayLike,
wavelet: Callable[Concatenate[int, float, ...], npt.ArrayLike] | None = None,
Owner

See my other comment on this signature

Comment on lines 47 to 48
data: npt.NDArray[np.generic],
comparator: Callable[[npt.NDArray[np.generic], npt.NDArray[np.generic]], npt.NDArray[np.bool_]],
Owner

totally optional; but you could use a TypeVar(..., bound=np.generic) instead of the current np.generics

Contributor Author

Wouldn't that require the comparator function to have two arguments of the same type? I think that in the ideal case the function signature makes sense but what if someone were to pass a function that takes in a float and an int? They would still be able to do the comparison despite the types being different.

Assuming that using TypeVar forces the two argument types to be the same, that is.

Owner

@jorenham jorenham Oct 19, 2024

Wouldn't that require the comparator function to have two arguments of the same type?

Yes exactly.


I think that in the ideal case the function signature makes sense but what if someone were to pass a function that takes in a float and an int?

If someone would pass a function that accepts arrays whose scalar types are incompatible, then it would indeed not be allowed.
And that's a good thing, because it could prevent some potentially type-unsafe situations.


Assuming that using TypeVar forces the two argument types to be the same, that is.

Well, it kinda depends on what you mean with "the same".
Because in this case, we also need to take the relationship of NDArray and its scalar type parameter into account: It's covariant.
So that means that you're allowed to pass something like (NDArray[np.int32], NDArray[np.integer[Any]]) -> NDArray[np.bool_] as function if you also pass data: NDArray[np.int32].

For a demonstration of this, see https://basedpyright.com/?pythonVersion=3.13&typeCheckingMode=all&reportUnusedExpression=true&code=GYJw9gtgBAxmA28CmMAuBLMA7AzgOgEMAjGKdCABzBFSgGUkBHAVySxiQChRIpUBPCuiwBzMpWq0AwgUTFkAGlgEcqTpwDEULGFRI%2BACwK0CMOCAAmwsajB9BKAygDWSEDiUADYak9kcUCo46CJY8vq2UJ7A8GDGnuoCFPoAgiAgBPwA2gAqALpQALz0TKzsSLkFUFpwAG4EIOgEWLQAFADu1M44AJSa9slQaRnZ%2BUUlLGwclVDVZFj1jc1tFmBIOFgA5LSdIM59nBZIwFDAla0WxgQAXEPpmZVKcJQNxtS3MnJEyFlZww-5JT-UZ5PJA%2B7ZIhgBCgnpQAC0AD4oFCENdOLNZiAkKhmCAsIFEK1nhRXrYQBcrllrvCAIxgqCXVAELK0655HoHI4nHwAfWQvJicVQrRudxGWR8DKIt2BWSFxgZAHo4UjxQ9UfA8ujMVBsbj8VAsgR0FAADzFIim4DUQLoJRW%2BZQABe6AooodPTy6kOV3GMBUIrlUqU8HQqlaGVESFaAFZOX1w7ycJIkBZxsBKcylHyBQrUAcgA
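
The same idea can be mimicked without numpy. In the sketch below, `Scalar`, `Integer`, `Int32` and `Array` are hypothetical stand-ins for `np.generic`, `np.integer`, `np.int32` and `npt.NDArray`, and `find_extrema` is an invented function with the same shape as the `argrelextrema` signature. Because `Array` is covariant in its scalar type, a type checker should be able to solve the type variable as `Int32` and accept the mixed comparator.

```python
from typing import Callable, Generic, TypeVar


class Scalar: ...            # stand-in for np.generic
class Integer(Scalar): ...   # stand-in for np.integer
class Int32(Integer): ...    # stand-in for np.int32


S = TypeVar("S", bound=Scalar)
S_co = TypeVar("S_co", bound=Scalar, covariant=True)


class Array(Generic[S_co]):
    """Stand-in for npt.NDArray: covariant in its scalar type."""

    def __init__(self, items: tuple[S_co, ...]) -> None:
        self.items = items


def find_extrema(
    data: Array[S],
    comparator: Callable[[Array[S], Array[S]], bool],
) -> bool:
    return comparator(data, data)


def compare_mixed(a: Array[Int32], b: Array[Integer]) -> bool:
    # The second parameter is broader than the first; with S = Int32 an
    # Array[Int32] is still assignable to it thanks to covariance.
    return len(a.items) == len(b.items)


data = Array((Int32(), Int32()))
result = find_extrema(data, compare_mixed)
print(result)
```

A comparator whose two parameters have unrelated scalar types (say one integer-like and one string-like stand-in) would leave no valid solution for `S`, which is the case the type variable is meant to reject.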

Contributor Author

Ah got it. Just misunderstood the semantics. I can add this then.

def argrelmin(
data: npt.NDArray[np.generic],
axis: op.CanIndex = 0,
order: int = 1,
Owner

I think that besides int, you could also pass some np.integer[Any] and np.bool_ to order. And coincidentally, scipy._typing.AnyInt is an alias for exactly that :)

There are also some other order parameters like this below

@jorenham jorenham added the stubs: improvement Improve or refactor existing annotations label Oct 17, 2024
@pavyamsiri
Contributor Author

  • Not sure on wavelet regarding the output type and also the input types as well

After experimenting a bit with it, it looks like it is allowed to return both ndarray and list, as long as they're 1-D, and the length must be equal to the first argument, so the return type should be a (1-d) "array-like". It always gets passed 2 arguments as the docs say, but their types turn out to be rather weird

  1. The first argument is a builtins.int on the first call, but np.int_ or np.float64 (huh?) after that, depending on whether widths is an array-like of integers or floats.
  2. The second argument is (always) a numpy scalar type SCT: np.generic, and matches the scalar-type/dtype of the widths array-like.

So find_peaks_cwt(x, np.arange(1, 10, dtype=np.uint8), f) first calls f as (int, np.uint8), and then as (np.int64, np.uint8) (np.int_ is an alias for np.int64 on my machine). And find_peaks_cwt(x, np.arange(1, 10, dtype=np.float16), f) will first call f as (int, np.float16), and then as (np.float64, np.float16). The second time f was called, its *args were (np.float64(10.0), np.float16(1.0)) (wasn't the first argument supposed to be a discrete size?).

tldr;

I guess the signature is best described as an overloaded one, that looks a bit like

  • (int | np.int_, SCT) -> array_like in case of widths: ArrayLike[SCT: np.integer[Any]]
  • (int | np.float64, SCT) in case of widths: ArrayLike[SCT: np.floating[Any]]

(the ArrayLike alias is fictitious unfortunately)

I finally got around to understanding this signature and I agree that the overloaded ones are good. I'm not sure how to make a type alias like the one you suggested though.

I think a simple way to annotate without having to use too many generics is widths: _ArrayLikeInt_co implies that wavelet: Callable[Concatenate[int | np.int_, AnyInt, ...], npt.ArrayLike] and then likewise for the float case.

This doesn't enforce that the wavelet second argument is of the same type as the scalar type of the array like but it should be broader right?

@pavyamsiri
Contributor Author

Sorry was a bit busy the last couple of days so couldn't finish off the PR.

@pavyamsiri
Contributor Author

Actually on the topic of wavelet I don't think npt.ArrayLike is the correct output type. The second time it is called the code reverses the output using [::-1]

wavelet_data = np.conj(wavelet(N, width, **kwargs)[::-1])

so I think it expects that wavelet returns some non-scalar arraylike. For example np.float16(0.3) is an array like I believe but you can't reverse it like this. I would opt to put it as NDArray[np.generic] as the output type but I'm open to any improvements to make it less broad.

@pavyamsiri
Contributor Author

Actually on the topic of wavelet I don't think npt.ArrayLike is the correct output type. The second time it is called the code reverses the output using [::-1]

wavelet_data = np.conj(wavelet(N, width, **kwargs)[::-1])

so I think it expects that wavelet returns some non-scalar arraylike. For example np.float16(0.3) is an array like I believe but you can't reverse it like this. I would opt to put it as NDArray[np.generic] as the output type but I'm open to any improvements to make it less broad.

I also don't think we need to write overloads for this, although it may be more accurate. I think the spirit of the wavelet function is that it should be able to handle float values as well, because it represents a continuous mathematical function.

Following this interpretation, the first argument, which corresponds to the number of points, should be something like AnyInt, although the source code does not respect that and passes in np.float64, which kind of ruins things. So maybe the type annotation should be like

_WaveletFunction: TypeAlias = Callable[Concatenate[AnyInt, AnyReal, ...], npt.NDArray[np.generic]]

with the only caveats being that the source sometimes passes in np.float64 as the first argument (I guess we can concede and add it to the type union) and that the output may be any non-scalar array-like, so something like tuple[float, ...] is fine as well.

For the `argrelextrema` function in `_peak_finding.pyi`.
@jorenham
Owner

I think a simple way to annotate without having to use too many generics is widths: _ArrayLikeInt_co implies that wavelet: Callable[Concatenate[int | np.int_, AnyInt, ...], npt.ArrayLike] and then likewise for the float case.

Yes that seems about right. Just keep in mind that _ArrayLikeFloat_co is more general than _ArrayLikeInt_co (the latter is assignable to the former).

This doesn't enforce that the wavelet second argument is of the same type as the scalar type of the array like but it should be broader right?

Yes; the parameters of a Callable behave as if contravariant
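
A small standard-library sketch of what that means in practice: a callback whose parameters are broader than the declared ones is still assignable. The names `WaveletCallback`, `call_wavelet` and `broad_wavelet` are invented for this example, and plain `int`/`float` stand in for the numpy scalar types.

```python
from typing import Callable

# Hypothetical callback type: the caller promises to pass two ints.
WaveletCallback = Callable[[int, int], list[float]]


def call_wavelet(wavelet: WaveletCallback, points: int, width: int) -> list[float]:
    return wavelet(points, width)


def broad_wavelet(points: float, width: float) -> list[float]:
    # Accepts float, which type checkers treat as a supertype of int,
    # so this function is assignable to WaveletCallback.
    return [float(width)] * int(points)


print(call_wavelet(broad_wavelet, 3, 2))
```

The reverse does not hold: a callback that only accepts `int` would be rejected where `Callable[[float, float], ...]` is expected, since it might be called with a value it cannot handle.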

@jorenham
Owner

jorenham commented Oct 19, 2024

Actually on the topic of wavelet I don't think npt.ArrayLike is the correct output type. The second time it is called the code reverses the output using [::-1]

Good catch! I then just got lucky when I tried it with a list as return value. So I guess that anything can be returned that implements __getitem__: (Self, slice) -> npt.ArrayLike, which includes Sequence and npt.NDArray. With optype you could write that as optype.CanGetitem[slice, npt.ArrayLike] btw.
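
For readers without optype at hand, the requirement can be sketched with a hand-written protocol. `SupportsSliceGetitem` and `reverse_output` are hypothetical names; the protocol only approximates what `optype.CanGetitem[slice, ...]` is described as expressing above.

```python
from typing import Protocol, TypeVar

T = TypeVar("T")
T_co = TypeVar("T_co", covariant=True)


class SupportsSliceGetitem(Protocol[T_co]):
    """Hypothetical protocol: indexing with a slice returns a T_co."""

    def __getitem__(self, key: slice, /) -> T_co: ...


def reverse_output(output: SupportsSliceGetitem[T]) -> T:
    # Mirrors the `[::-1]` that is applied to the wavelet's return value.
    return output[::-1]


print(reverse_output([1, 2, 3]))
print(reverse_output((1.0, 2.0)))
```

A scalar such as `0.3` has no `__getitem__`, so a type checker would reject it here, and at runtime the slice raises `TypeError`.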

Found in `_peak_finding.pyi`. Changes are:

1. Use `_ArrayLikeFloat_co` for `widths` parameter
    This is because the documents specify `float or sequence` so the
    dtype is implied to be float.
2. Change type annotation for `max_distances` from `_ArrayLikeInt_co` to
   `_ArrayLikeFloat_co`. In case of `None` the array is given by the
   `widths` parameter divided by 4.0 so it has to be compatible with
   float dtypes.
3. Change the output for `_WaveletFunction` to be more accurate. The
   output has to be sliceable with the type of the sliced access being
   an array like.
Found in `_peak_finding.pyi`.

Change the type annotation for the `vector` parameter from `ArrayLike`
to `NDArray[generic]`. The docs specify the input as `ndarray` not an
array like and as there is no conversion in the source code, we can't
use `ArrayLike`.
@pavyamsiri
Contributor Author

I added the following changes:

  1. Use _ArrayLikeFloat_co for widths parameter
    This is because the documents specify float or sequence so the
    dtype is implied to be float.
  2. Change type annotation for max_distances from _ArrayLikeInt_co to
    _ArrayLikeFloat_co. In case of None the array is given by the
    widths parameter divided by 4.0 so it has to be compatible with
    float dtypes.
  3. Change the output for _WaveletFunction to be more accurate. The
    output has to be sliceable with the type of the sliced access being
    an array like.
  4. Change the type annotation for the vector parameter from ArrayLike
    to NDArray[generic]. The docs specify the input as ndarray not an
    array like and as there is no conversion in the source code, we can't use ArrayLike.

@jorenham
Owner

Great! Can I merge this?

@pavyamsiri
Contributor Author

Great! Can I merge this?

Sorry, just found a mistake. I think max_distances is supposed to be an ndarray according to the docs, so not an array-like. Let me change that first. Would NDArray[np.floating[Any]] be good, or is there a better annotation?

@jorenham
Owner

jorenham commented Oct 20, 2024

Great! Can I merge this?

Sorry, just found a mistake. I think max_distances is supposed to be an ndarray according to the docs, so not an array-like. Let me change that first. Would NDArray[np.floating[Any]] be good, or is there a better annotation?

The docs aren't always accurate I have found, so be sure to check that first. But in case they are, then I guess that NDArray[np.floating[Any] | np.integer[Any] | np.bool_] would work here, so that all coercible dtypes are allowed.

@pavyamsiri
Contributor Author

pavyamsiri commented Oct 20, 2024

Yeah, I was actually a bit confused by the docs because they say max_distances is an ndarray but its default value is widths / 4.0. I thought this conflicted because widths can be a scalar, but the function first casts widths to an ndarray, so I think the docs are right in this case.

In `_peak_finding.pyi` and the function `find_peaks_cwt`. According to
docs this is an NDArray not an ArrayLike.
@pavyamsiri
Contributor Author

Should be good to merge now after CI!

@jorenham jorenham merged commit bc9eba3 into jorenham:master Oct 20, 2024
2 checks passed
@jorenham
Owner

Thanks @pavyamsiri !

@pavyamsiri pavyamsiri deleted the find_peaks branch October 20, 2024 04:43
@jorenham jorenham mentioned this pull request Nov 25, 2024
21 tasks
Labels
scipy.signal stubs: improvement Improve or refactor existing annotations
2 participants