Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Easy way to build wheel out of Scikit-Learn-Tree ? #68

Closed
simonprovost opened this issue Jun 28, 2024 · 6 comments
Closed

Easy way to build wheel out of Scikit-Learn-Tree ? #68

simonprovost opened this issue Jun 28, 2024 · 6 comments

Comments

@simonprovost
Copy link

Describe the workflow you want to enable

First and foremost, we appreciate your NeuroData work focused on Scikit-Learn trees. This is brilliant. Anyway, we did some tree modifications, but it surely compiles on Linux and MacOS without much trouble, so we built the wheel on MacOS (local) and re-used it in our dependencies instead of recompiling the whole Scikit-Learn fork, and it works, thus speeding up the installation of our project, which uses an update of Scikit-Learn-Tree. However, in our Docker container, doing the same appears to produce a problem with the wheel generated after installation.

Is there a method to create the wheels we want (MacOS and Linux) so that our end users can use both (open source) instead of having to re compile the whole thing?

Cheers

Describe your proposed solution

None

Describe alternatives you've considered, if relevant

Did not find any yet

Additional context

No response

@adam2392
Copy link
Collaborator

adam2392 commented Jul 2, 2024

I think perhaps when I recommend you take a look at this fork, it was to demonstrate how to leverage the cython refactoring I did.

However, I never wanted to get in the position of maintaining a fork from scikit-learn, so this is only used as a submodule within scikit-tree. You can go over there to see how to implement new trees using the Cython internals.

@simonprovost
Copy link
Author

Cheers Adam! I'll look into those

@simonprovost
Copy link
Author

@adam2392 Actually! I remember seeing a sklearn_fork here once, but the latest version of the submodule v3 returns to Sklearn. Anyway, I have modified the trees and it works great in our direction and am attempting to upload to Pypi while obviously acknowledging your and Scikit-learn's efforts, but I am having trouble uploading this; Twine assumes I want to upload to Scikit-learn's pypi project despite the fact that I have made changes to the setup.py reflecting on our new name. Is there anything I should change from your experience please?

Is there a method that quickly passes from "sklearn" to "<a_new_name>" on the other hand? But we would not mind keeping sklearn in our main library which would refer to our "updated"'s one. E.g. I saw something in a short search, but I am not sure if it is still functional today. https://github.com/neurodata/scikit-tree/blob/98b67fcbaf028ee8a9bd92407089ab4a34008e9b/doc/conf.py#L391

Cheers in advance ! here is our repository; you can already see the readme with acknowledgements but if it needs more than that feel free to mention we have no issue with that your work is tremendously good!

https://github.com/simonprovost/scikit-lexicographical-trees

@simonprovost simonprovost reopened this Jul 2, 2024
@adam2392
Copy link
Collaborator

adam2392 commented Jul 2, 2024

scikit-learn-fork is not maintained and will not be because it is too burdensome. I opted for a submodule strategy compared to a pypi maintained fork. Scikit-tree contains all the necessary cimports. There, I will consistently update the submodule.

I would either replicate the submodule strategy, try to import from scikit-tree, or see if you want to propose the specific tree model within scikit-tree.

If you extend from scikit-tree, or within scikit-tree, then you will have some assurance that the submodule is maintained and aligned with scikit-learn main (because I will do it). If you opt for the submodule strategy, then you will potentially have many breaking changes if/when scikit-learn main modifies various code because I will then update the submodule.

Hopefully this addresses your issues, but if not, let me know.

@adam2392
Copy link
Collaborator

adam2392 commented Jul 2, 2024

Is there a method that quickly passes from "sklearn" to "<a_new_name>" on the other hand? But we would not mind keeping sklearn in our main library which would refer to our "updated"'s one. E.g. I saw something in a short search, but I am not sure if it is still functional today. https://github.com/neurodata/scikit-tree/blob/98b67fcbaf028ee8a9bd92407089ab4a34008e9b/doc/conf.py#L391

No that code should be prolly deleted as its outdated. Thanks for pointing it out!

@simonprovost
Copy link
Author

Thank you very much for your rapid replies! I will work on making it work, but it makes perfect sense! And cheers for helping you out with that you surely helped me out much more!

I just removed my previous comment, but in general, it would not make much sense to provide a new PR to Scikit-Tree because we do something very different, such as needing a vector all along the trees to know how recent a feature is (repeated measurements / longitudinal data) so that we could apply different approaches in consequences. For example, the only one currently is lexicographical: https://link.springer.com/article/10.1007/s10462-024-10718-1

I will open a new issue if I think it would be beneficial to contribute to Scikit-Tree, which is an awesome endeavour to be honest and thanks once more!

Cheers,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants