Initial implementation of the ordinal recoder. #11098

Open
wants to merge 8 commits into
base: master

trivialfis
Member

@trivialfis trivialfis commented Dec 14, 2024

There is no integration yet, just the recoder. Retrieving and storing the data in XGBoost is more complicated than the recoder itself, so that part will be upstreamed in future PRs.

The recoder still uses some utilities in XGBoost like the Span class and an iterator. If we want to extract it to a different project, we can find a different implementation of these utilities. Other things like error handling and memory allocation can be customized through the policy class.

Tests require a container class in XGBoost. We can merge the container into the encoder module if needed. At the moment, the encoder is view-only and doesn't own any memory. After the Python and R integration is finished, more extensive and sophisticated tests will be done.
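
To illustrate the idea for reviewers, here is a minimal sketch. It is not the code in this PR. It assumes the recoder's job is to map the category codes of new data onto the codes used by the training data, working only through non-owning views and delegating error handling to a policy class. `View`, `ThrowPolicy`, `Recode`, and `Demo` are hypothetical names invented for this example; the real recoder uses XGBoost's `Span` and its own policy interface. The sketch also allocates scratch memory with the standard containers, whereas the real policy class is meant to make allocation customizable as well.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a non-owning view such as XGBoost's Span.
template <typename T>
struct View {
  T* data{nullptr};
  std::size_t size{0};
  T& operator[](std::size_t i) const { return data[i]; }
};

// Hypothetical policy: decides how errors are reported.
struct ThrowPolicy {
  static void Error(std::string const& msg) { throw std::invalid_argument{msg}; }
};

// Rewrites `codes` in place so that each code refers to `train_cats`
// instead of `new_cats`. The function owns none of its inputs.
template <typename Policy>
void Recode(View<std::string_view const> train_cats,
            View<std::string_view const> new_cats,
            View<std::int32_t> codes) {
  // Index the training categories by name.
  std::unordered_map<std::string_view, std::int32_t> index;
  for (std::size_t i = 0; i < train_cats.size; ++i) {
    index.emplace(train_cats[i], static_cast<std::int32_t>(i));
  }
  // mapping[j] is the training code for category j of the new data.
  std::vector<std::int32_t> mapping(new_cats.size);
  for (std::size_t j = 0; j < new_cats.size; ++j) {
    auto it = index.find(new_cats[j]);
    if (it == index.end()) {
      Policy::Error("Unknown category: " + std::string(new_cats[j]));
      return;
    }
    mapping[j] = it->second;
  }
  // Apply the mapping to every code.
  for (std::size_t k = 0; k < codes.size; ++k) {
    auto c = codes[k];
    if (c < 0 || static_cast<std::size_t>(c) >= mapping.size()) {
      Policy::Error("Code out of range");
      return;
    }
    codes[k] = mapping[static_cast<std::size_t>(c)];
  }
}

// Small usage example: the new data orders its categories differently.
inline std::vector<std::int32_t> Demo() {
  std::vector<std::string_view> train{"cat", "dog", "bird"};
  std::vector<std::string_view> fresh{"dog", "bird", "cat"};
  std::vector<std::int32_t> codes{0, 1, 2, 0};  // dog, bird, cat, dog
  Recode<ThrowPolicy>({train.data(), train.size()},
                      {fresh.data(), fresh.size()},
                      {codes.data(), codes.size()});
  return codes;  // now expressed in the training encoding: 1, 2, 0, 1
}
```

Swapping `ThrowPolicy` for a policy that logs or aborts changes the failure behaviour without touching `Recode`, which is the kind of customization the policy class is meant to allow.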

ref #11088

@trivialfis trivialfis marked this pull request as draft December 14, 2024 09:37
@trivialfis trivialfis marked this pull request as ready for review December 14, 2024 09:53
@trivialfis
Member Author

cc @rongou .

No integration yet, just the recoder. Retrieving and storing the data in XGBoost is more
complicated than the recoder itself, so that part will be upstreamed in future PRs.

The recoder still uses some utilities in XGBoost like the `Span` class and an iterator. If
we want to extract it to a different project, we can find a different implementation of
these utilities.

Tests require a container class in XGBoost. We can merge the container into the encoder
module if needed. At the moment, the encoder is view-only and doesn't own any
memory. Larger and more sophisticated tests will be done after the Python and R
integration is finished.