Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MNT] Tags using enums #2235

Open
chrisholder opened this issue Oct 21, 2024 · 8 comments
Open

[MNT] Tags using enums #2235

chrisholder opened this issue Oct 21, 2024 · 8 comments
Labels
enhancement New feature, improvement request or other non-bug code enhancement

Comments

@chrisholder
Copy link
Contributor

Describe the feature or idea you want to propose

In aeon we use the _tags to define the capabilities. For examples:

    _tags = {
        "X_inner_type": ["np-list", "numpy3D"],
        "capability:multivariate": True,
        "capability:unequal_length": True,
        "capability:multithreading": True,
        "algorithm_type": "feature",
    }

For values like "algorithm_type" and "X_inner_type" which we use a string value for it would be nice if these were defined with enums.

Describe your proposed solution

Add an enum for each tag form example:

from enum import Enum

class AlgorithmTypeTag(Enum):
    FEATURE = "feature"
    DISTANCE = "distance"
    DICTIONARY = "dictionary"
    HYBRID = "hybrid"
    INTERVAL = "interval"
    CONVOLUTION = "convolution"
    SHAPELET = "shapelet"
    DEEPLEARNING = "deeplearning"

Assuming we had a enum for each type we could then do something like this:

    _tags = {
        "X_inner_type": [InnerTypeTag.NP_LIST, InnerTypeTag.NUMPY3D],
        "capability:multivariate": True,
        "capability:unequal_length": True,
        "capability:multithreading": True,
        "algorithm_type": AlgorithmTypeTag.FEATURE,
    }

As the values of the enums that they resolve to are the same string values it means we don't need to change any of the actual tag logic. Additionally we can better assert the tag is valid by checking if a value is an instance for Enum.

Describe alternatives you've considered, if relevant

No response

Additional context

No response

@chrisholder chrisholder added the enhancement New feature, improvement request or other non-bug code enhancement label Oct 21, 2024
@itsdivya1309
Copy link
Contributor

Hi, can I take up this issue?

@chrisholder
Copy link
Contributor Author

Yes please! Let me know if you want any help.

@lucifer4073
Copy link

Hi @chrisholder, I'd like to work on this issue. I have a few queries:

  1. Should I address the feature for every Python file that uses _tags, or only for specific files/modules?
  2. There are two approaches I can take to implement this:
    • Separate Enum Classes for Each File: For example, in transformations/collection/shapelet_based, each file (e.g., dilated_shapelet_transform.py, rsast.py, sast.py, and shapelet_transform.py) would have its own enum class.
    • Single Enum Class for the Module/Submodule: For instance, the transformations/collection/shapelet_based module could have a single enum class, with all tags from the respective files assigned values from that class.

Could you please advise on which approach would be more appropriate?

@lucifer4073
Copy link

@aeon-actions-bot assign @lucifer4073

@chrisholder
Copy link
Contributor Author

Hi @lucifer4073! Happy for you to take this issue assuming @itsdivya1309 is also happy with this as they indicated they wanted to work on this.

My idea of how this issue would be developed is incrementally module by module. Since I work closely on the clustering module that could be a good place to start? To begin the enums should be optional (i.e. the changes should not break any module that currently uses the strings). Additionally we don't need enums (for now) for the tags that are just booleans only for fields where you pass a string currently. For example for algorithm_type we could have something like:

class AlgorithmType(Enum):
    """Enum for distance types."""

    DISTANCE = "distance"
    DICTONARY = "dictonary"
    ...etc.

We could then use it:

    _tags = {
        "capability:train_estimate": True,
        "capability:multithreading": True,
        "algorithm_type": AlgorithmType.DISTANCE,
    }

Keep it simple to begin just refactoring one module (clustering if you want). Once the PR is up we will probably need to get the opinion of others since this will be a package wide change.

This file here contains all the info on valid tags currently and will probably be where the enums will exist. I'd imagine once we have incrementally refactored the package we will then remove support for strings and enforce the use of enums.

@itsdivya1309
Copy link
Contributor

Hi, @lucifer4073! I appreciate you taking this up. I was interested in working on this, but I am occupied with other stuff. I’m happy for you to take the lead on this one. Let me know if you need any help!

@lucifer4073
Copy link

Hi @chrisholder @TonyBagnall can you please review the PR?

@MatthewMiddlehurst
Copy link
Member

Someone will review it when they get around to it, no guarantee that is going to be quick.

@lucifer4073 lucifer4073 removed their assignment Jan 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature, improvement request or other non-bug code enhancement
Projects
None yet
Development

No branches or pull requests

4 participants