FEA Implement pruning using honest subsample data to fit the leaves #286

Merged
adam2392 merged 53 commits into neurodata:main from honestprune
Dec 12, 2024

Conversation

adam2392
Collaborator

Changes proposed in this pull request:

  • Adds honest pruning: leaves that receive no samples from the honest subsample are pruned, so no empty leaves occur within the HonestTreeClassifier
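
As an illustration of the idea only, here is a minimal pure-Python sketch of pruning with an honest subsample. It is not the treeple implementation: the `Node` class and the `route_honest`, `prune_empty`, and `predict` helpers are hypothetical names made up for this example, and the tree structure is assumed to have already been built on a separate structure subsample. Honest samples are routed down the tree, any split with a child that receives no honest samples is collapsed into a leaf, and leaf predictions then come from honest labels alone.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical binary tree node with an axis-aligned split."""
    feature: Optional[int] = None          # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    honest_labels: Optional[list] = None   # honest labels reaching this node


def route_honest(node, X, y):
    """Record which honest labels reach each node of an already-built tree."""
    node.honest_labels = list(y)
    if node.feature is None:
        return
    go_left = [x[node.feature] <= node.threshold for x in X]
    left_X = [x for x, flag in zip(X, go_left) if flag]
    left_y = [lab for lab, flag in zip(y, go_left) if flag]
    right_X = [x for x, flag in zip(X, go_left) if not flag]
    right_y = [lab for lab, flag in zip(y, go_left) if not flag]
    route_honest(node.left, left_X, left_y)
    route_honest(node.right, right_X, right_y)


def prune_empty(node):
    """Collapse any split that has a child with no honest samples."""
    if node.feature is None:
        return
    if not node.left.honest_labels or not node.right.honest_labels:
        # Turn this node into a leaf; it keeps its own honest labels.
        node.feature, node.left, node.right = None, None, None
        return
    prune_empty(node.left)
    prune_empty(node.right)


def predict(node, x):
    """Descend to a leaf and return the majority honest label there."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return Counter(node.honest_labels).most_common(1)[0][0]


# Structure assumed to come from the structure subsample (made-up values).
tree = Node(feature=0, threshold=0.5,
            left=Node(),
            right=Node(feature=0, threshold=0.8, left=Node(), right=Node()))

# Honest subsample: no sample has x > 0.8, so the deepest right leaf is empty.
honest_X = [[0.1], [0.2], [0.6], [0.7]]
honest_y = [0, 0, 1, 1]

route_honest(tree, honest_X, honest_y)
prune_empty(tree)
```

After pruning, the split at 0.8 is gone, so a query such as `predict(tree, [0.9])` falls into a leaf that still holds honest samples instead of an empty one. The sketch assumes the honest subsample is non-empty at the root.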

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.

adam2392 added 6 commits June 21, 2024 14:35
Signed-off-by: Adam Li <adam2392@gmail.com>

codecov bot commented Jul 19, 2024

Codecov Report

Attention: Patch coverage is 77.50000% with 9 lines in your changes missing coverage. Please review.

Project coverage is 80.54%. Comparing base (e1c38ad) to head (d1221e4).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| treeple/tree/_honest_tree.py | 78.37% | 3 Missing and 5 partials ⚠️ |
| treeple/stats/forest.py | 0.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #286      +/-   ##
==========================================
+ Coverage   80.50%   80.54%   +0.03%     
==========================================
  Files          24       24              
  Lines        2334     2364      +30     
  Branches      339      343       +4     
==========================================
+ Hits         1879     1904      +25     
- Misses        318      319       +1     
- Partials      137      141       +4     


adam2392 added 23 commits July 22, 2024 07:58
Signed-off-by: Adam Li <adam2392@gmail.com>
adam2392 and others added 11 commits August 13, 2024 10:19
Signed-off-by: Adam Li <adam2392@gmail.com>
Member

@PSSF23 PSSF23 left a comment


Many of the errors come down to:

ERROR treeple/tests/test_honest_forest.py - AttributeError: 'ClassifierTags' object has no attribute 'multi_output'

The others fail test_sklearn_compatible_estimator with "fitting with sample_weight is not equivalent to fitting with removed or repeated data points."

I assume the first is due to the recent changes in sklearn 1.6? For the others, I'll mark the failing checks to be skipped.
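
For readers unfamiliar with the second failure, the quoted check expects an integer sample weight to behave like repeating a row that many times, and a zero weight to behave like deleting the row. The sketch below shows that property on a toy weighted-majority "estimator". The function `weighted_majority` and the data are made up for illustration and are not part of treeple or scikit-learn, and the thread does not say why the honest estimators fail the real check.

```python
from collections import defaultdict


def weighted_majority(y, sample_weight=None):
    """Toy estimator: return the label with the largest total weight."""
    if sample_weight is None:
        sample_weight = [1] * len(y)
    totals = defaultdict(float)
    for label, weight in zip(y, sample_weight):
        totals[label] += weight
    # Iterate labels in sorted order so ties resolve deterministically.
    return max(sorted(totals), key=totals.get)


y = ["a", "b", "b", "c"]
weights = [3, 1, 0, 1]

# The same data expressed by repeating rows (weight 3) and dropping rows
# (weight 0) instead of passing weights.
y_repeated = [label for label, w in zip(y, weights) for _ in range(w)]

pred_weighted = weighted_majority(y, sample_weight=weights)
pred_repeated = weighted_majority(y_repeated)
```

An estimator passes the check when both fits give the same result, as `pred_weighted` and `pred_repeated` do here.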

Member

@PSSF23 PSSF23 left a comment


@adam2392 I skipped all the failing checks, and the package shows no other problems so far. Is there anything left to work on besides adding tests for criterion? Do you have an example in mind?

Member

@PSSF23 PSSF23 left a comment


The lack of test coverage is due to the expansion of honesty into regressors, for which we currently have no applicable cases.

@adam2392
Collaborator Author

Merging. Thanks for the in-depth review and fixes, @PSSF23!

@adam2392 adam2392 merged commit ab12ca9 into neurodata:main Dec 12, 2024
32 of 37 checks passed
@adam2392 adam2392 deleted the honestprune branch December 12, 2024 00:10