Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scarliles/honesty #69

Draft
wants to merge 85 commits into
base: submodulev3
Choose a base branch
from

Conversation

SamuelCarliles3
Copy link
Collaborator

Reference Issues/PRs

What does this implement/fix? Explain your changes.

First draft honesty module

Any other comments?

SamuelCarliles3 and others added 30 commits February 16, 2024 13:36
Copy link
Collaborator

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to edit this comment to add information:

Testing

The current unit test tells me that the code runs, but uncertain if it works to an outsider.

  1. Add a unit-test comparing honest tree and dishonest tree depth on the same dataset def test_honest_tree_depth_vs_dishonest_tree
  2. Add a short Jupyter notebook comparing the visualization of a honest/dishonest tree (https://scikit-learn.org/stable/modules/generated/sklearn.tree.plot_tree.html) on a fixed toy simulated dataset.
  3. etc. please document things you intend on testing w/ a brief sketch?

Questions/Comments

  1. logistics: I think it is possible to keep all sort functions within _partitioner and have this diff be essentially almost gone. See https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/tree/_partitioner.pxd. It's just easier to reason about the code if there's less diff. Similarly in areas where there's diff that isn't related to the functionality of the PR, it'd be good to remove whenever you can.
  2. It's unclear to me exactly the diff between _events.pxd/pyx, _honesty.pxd/pyx files, and the events abstractions created in the splitter/tree files. That is they define relevant events, but unclear what is necessary versus what is not. Can you elaborate by adding a file docstring at the top of the pxd files to help illustrate the intentions?
  3. I think overall this is an interesting design exploration for the reasons we've discussed over the past 6 months. However, I don't see us merging this as is because the changes are going to affect the maintainability of the scikit-learn fork, which is a hard dependency for treeple. With more testing, and a separate naive implementation of honesty, I think we can scope out how to get this functionality into treeple.

I think a separate PR implementing honesty naively as a separate splitter (similar to EconML and does not have to be beautiful) would be good to compare side-by-side. Do you think you can implement that once this PR branch has been tested and documented?

name, criterion, score
)

clf = Tree(criterion=criterion, max_features=2, random_state=0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason max_features=2?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants