-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scarliles/honesty #69
base: submodulev3
Are you sure you want to change the base?
Scarliles/honesty #69
Conversation
… memory utilization in asv
added regression forest benchmark
upstream changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to edit this comment to add information:
Testing
The current unit test tells me that the code runs, but uncertain if it works to an outsider.
- Add a unit-test comparing honest tree and dishonest tree depth on the same dataset
def test_honest_tree_depth_vs_dishonest_tree
- Add a short Jupyter notebook comparing the visualization of a honest/dishonest tree (https://scikit-learn.org/stable/modules/generated/sklearn.tree.plot_tree.html) on a fixed toy simulated dataset.
- etc. please document things you intend on testing w/ a brief sketch?
Questions/Comments
- logistics: I think it is possible to keep all sort functions within
_partitioner
and have this diff be essentially almost gone. See https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/tree/_partitioner.pxd. It's just easier to reason about the code if there's less diff. Similarly in areas where there's diff that isn't related to the functionality of the PR, it'd be good to remove whenever you can. - It's unclear to me exactly the diff between _events.pxd/pyx, _honesty.pxd/pyx files, and the events abstractions created in the splitter/tree files. That is they define relevant events, but unclear what is necessary versus what is not. Can you elaborate by adding a file docstring at the top of the pxd files to help illustrate the intentions?
- I think overall this is an interesting design exploration for the reasons we've discussed over the past 6 months. However, I don't see us merging this as is because the changes are going to affect the maintainability of the scikit-learn fork, which is a hard dependency for treeple. With more testing, and a separate naive implementation of honesty, I think we can scope out how to get this functionality into treeple.
I think a separate PR implementing honesty naively as a separate splitter (similar to EconML and does not have to be beautiful) would be good to compare side-by-side. Do you think you can implement that once this PR branch has been tested and documented?
sklearn/tree/tests/test_tree.py
Outdated
name, criterion, score | ||
) | ||
|
||
clf = Tree(criterion=criterion, max_features=2, random_state=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason max_features=2
?
Reference Issues/PRs
What does this implement/fix? Explain your changes.
First draft honesty module
Any other comments?