[ENH] Add option to permute per forest fraction #145

Merged
merged 34 commits into from
Nov 9, 2023
Changes from 30 commits
a3a002d
Add option to permute per forest fraction
adam2392 Oct 16, 2023
5277a5b
Add sep parallel func for building and predicting
adam2392 Oct 16, 2023
df3a1b1
Finished adding
adam2392 Oct 16, 2023
d46f1ad
Modify parallel building
adam2392 Oct 16, 2023
17b01ac
New submodule
adam2392 Oct 17, 2023
16122d3
Add additional pickle test
adam2392 Oct 17, 2023
7d42ac7
Add changelog
adam2392 Oct 17, 2023
4423377
Remove unnecessary comments
adam2392 Oct 17, 2023
1c8eedc
Merge branch 'main' into might-params
adam2392 Oct 17, 2023
5730b32
Remove extra LOC
adam2392 Oct 17, 2023
cd99a11
Merge branch 'might-params' of https://github.com/neurodata/scikit-tr…
adam2392 Oct 17, 2023
7fe487c
Merge branch 'main' into might-params
adam2392 Oct 17, 2023
6f978cb
Fix pvalue
adam2392 Oct 17, 2023
58d5365
Lint
adam2392 Oct 17, 2023
f5f282a
STart work on permute fraction of forest
adam2392 Oct 18, 2023
921eb2f
Merge branch 'main' into might-params
adam2392 Oct 19, 2023
261e359
Merge branch 'might-params' of https://github.com/neurodata/scikit-tr…
adam2392 Oct 19, 2023
0b167ba
Merging in main
adam2392 Oct 19, 2023
3674cc2
FIX add stratifi
PSSF23 Oct 23, 2023
0f27d01
Try stash
adam2392 Oct 24, 2023
e894708
UPdate and address permute forest fraction
adam2392 Oct 24, 2023
2887909
WIP
adam2392 Oct 24, 2023
739c7be
Adding ability to turn off train/test split
adam2392 Oct 24, 2023
30e9e95
Merging main
adam2392 Nov 8, 2023
36c5582
Fix type checK
adam2392 Nov 8, 2023
1ff7a5c
Fix typing
adam2392 Nov 8, 2023
c9c22e9
Fix ci
adam2392 Nov 8, 2023
2a6cda3
Remove fluff
adam2392 Nov 8, 2023
b183db1
Remove any mention of permute_per_tree
adam2392 Nov 8, 2023
8967d60
Merge branch 'main' into might-params
adam2392 Nov 9, 2023
59ae89b
Fix slow test
adam2392 Nov 9, 2023
2e1d53b
Try to fix slow
adam2392 Nov 9, 2023
f3aa7d7
Update
adam2392 Nov 9, 2023
9d0f2db
Lint
adam2392 Nov 9, 2023
3 changes: 2 additions & 1 deletion doc/whats_new/v0.4.rst
@@ -12,7 +12,8 @@ Version 0.4

 Changelog
 ---------
--

+- |API| ``FeatureImportanceForest*`` now has a hyperparameter, ``permute_per_forest_fraction``, to control the number of permutations done per forest, by `Adam Li`_ (:pr:`145`)
+
 Code and Documentation Contributors
 -----------------------------------
@@ -108,19 +108,13 @@
 ),
 random_state=seed,
 test_size=test_size,
-permute_per_tree=False,
-sample_dataset_per_tree=False,
 )

-print(
-f"Permutation per tree: {est.permute_per_tree} and sampling dataset per tree: "
-f"{est.sample_dataset_per_tree}"
-)
 # we test for the first feature set, which is important and thus should return a pvalue < 0.05
 stat, pvalue = est.test(
 X, y, covariate_index=np.arange(n_features_set, dtype=int), metric="mi", n_repeats=n_repeats
 )
-print(f"Estimated MI difference: {stat} with Pvalue: {pvalue}")
+print(f"Estimated MI difference for the important feature set: {stat} with Pvalue: {pvalue}")

 # we test for the second feature set, which is unimportant and thus should return a pvalue > 0.05
 stat, pvalue = est.test(
@@ -130,7 +124,7 @@
 metric="mi",
 n_repeats=n_repeats,
 )
-print(f"Estimated MI difference: {stat} with Pvalue: {pvalue}")
+print(f"Estimated MI difference for the unimportant feature set: {stat} with Pvalue: {pvalue}")

 # %%
 # References
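Note on the `pvalue < 0.05` / `pvalue > 0.05` expectations in the example above: the sketch below shows one common way a permutation-style p-value is computed from an observed statistic and a set of null statistics. It is a generic illustration, not the code behind `est.test`; the function name `permutation_pvalue` and the synthetic null distribution are assumptions made for this note.

```python
import numpy as np


def permutation_pvalue(observed_stat, null_stats):
    """One-sided p-value: share of null statistics at least as large as the observed one.

    Hypothetical helper for illustration; not part of sktree.
    """
    null_stats = np.asarray(null_stats, dtype=float)
    # Adding 1 to numerator and denominator counts the observed statistic itself,
    # so the estimate can never be exactly zero.
    return (1.0 + np.sum(null_stats >= observed_stat)) / (1.0 + null_stats.size)


# Synthetic null distribution standing in for statistics from permuted data.
rng = np.random.default_rng(0)
null = rng.normal(loc=0.0, scale=1.0, size=999)

# A statistic far above the null (an "important" feature set) gives a small p-value,
# while one far below it (an "unimportant" feature set) gives a large one.
p_important = permutation_pvalue(10.0, null)
p_unimportant = permutation_pvalue(-10.0, null)
print(f"important: {p_important:.3f}, unimportant: {p_unimportant:.3f}")
```

With 999 null draws the smallest attainable p-value is 1/1000, which is why the number of repeats bounds how small a reported p-value can be.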
@@ -134,16 +134,10 @@ def make_multiview_classification(
 ),
 random_state=seed,
 test_size=test_size,
-permute_per_tree=False,
-sample_dataset_per_tree=False,
 )

 mv_results = dict()

-print(
-f"Permutation per tree: {est.permute_per_tree} and sampling dataset per tree: "
-f"{est.sample_dataset_per_tree}"
-)
 # we test for the first feature set, which is important and thus should return a pvalue < 0.05
 stat, pvalue = est.test(
 X, y, covariate_index=np.arange(10, dtype=int), metric="mi", n_repeats=n_repeats
@@ -179,8 +173,6 @@ def make_multiview_classification(
 ),
 random_state=seed,
 test_size=test_size,
-permute_per_tree=False,
-sample_dataset_per_tree=False,
 )

 rf_results = dict()
2 changes: 1 addition & 1 deletion examples/hypothesis_testing/plot_might_auc.py
@@ -84,7 +84,7 @@
 ),
 random_state=seed,
 test_size=test_size,
-permute_per_tree=True,
+permute_forest_fraction=1.0 / n_estimators,
 sample_dataset_per_tree=True,
 )

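Note on replacing `permute_per_tree=True` with `permute_forest_fraction=1.0 / n_estimators`: one plausible reading is that the fraction sets how many trees share a single permutation, so a fraction of `1 / n_estimators` gives every tree its own permutation and a fraction of `1.0` gives one permutation for the whole forest. The sketch below illustrates that reading only. The helper `permutation_groups`, its rounding rule, and its validation are assumptions made for this note, not the sktree implementation.

```python
import numpy as np


def permutation_groups(n_estimators, permute_forest_fraction, n_samples, seed=None):
    """Return one sample permutation per tree, shared within each group of trees.

    Hypothetical helper for illustration; not part of sktree.
    """
    if not (1.0 / n_estimators <= permute_forest_fraction <= 1.0):
        raise ValueError("permute_forest_fraction must lie in [1 / n_estimators, 1]")

    rng = np.random.default_rng(seed)
    # Assumed rule: the fraction of the forest that shares one permutation.
    trees_per_group = max(int(round(permute_forest_fraction * n_estimators)), 1)

    perms = []
    for start in range(0, n_estimators, trees_per_group):
        perm = rng.permutation(n_samples)  # one fresh permutation per group
        n_in_group = min(trees_per_group, n_estimators - start)
        perms.extend([perm] * n_in_group)  # every tree in the group reuses it
    return perms


n_estimators = 10
per_tree = permutation_groups(n_estimators, 1.0 / n_estimators, n_samples=20, seed=0)
halves = permutation_groups(n_estimators, 0.5, n_samples=20, seed=0)
whole = permutation_groups(n_estimators, 1.0, n_samples=20, seed=0)

# Count distinct permutation arrays: 10 (one per tree), 2 (one per half), 1 (whole forest).
print([len({id(p) for p in group}) for group in (per_tree, halves, whole)])
```

Under this reading, a single float covers both extremes of the old boolean flag and everything in between.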
4 changes: 2 additions & 2 deletions examples/hypothesis_testing/plot_might_mv_auc.py
@@ -72,7 +72,7 @@
 ),
 random_state=seed,
 test_size=test_size,
-permute_per_tree=True,
+permute_forest_fraction=1.0 / n_estimators,
 sample_dataset_per_tree=True,
 )

@@ -104,7 +104,7 @@
 ),
 random_state=seed,
 test_size=test_size,
-permute_per_tree=True,
+permute_forest_fraction=1.0 / n_estimators,
 sample_dataset_per_tree=True,
 )

2 changes: 1 addition & 1 deletion sktree/experimental/mutual_info.py
@@ -147,7 +147,7 @@ def mutual_info_ksg(
 algorithm="kd_tree",
 n_jobs: int = -1,
 transform: str = "rank",
-random_seed: int = None,
+random_seed: Optional[int] = None,
 ):
 """Compute the generalized (conditional) mutual information KSG estimate.

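Note on the `random_seed: Optional[int] = None` change: an annotation of plain `int` with a default of `None` is an implicit optional, which strict type checkers (for example, recent mypy versions with default settings) tend to reject. The sketch below shows the explicit form on a small stand-in function. The name `draw_noise` is hypothetical and is not part of `sktree.experimental.mutual_info`.

```python
from typing import Optional

import numpy as np


def draw_noise(n_samples: int, random_seed: Optional[int] = None) -> np.ndarray:
    """Draw standard-normal noise; a seed of None means non-reproducible draws.

    Hypothetical stand-in used only to illustrate the annotation.
    """
    # np.random.default_rng accepts None (fresh entropy) or an int seed.
    rng = np.random.default_rng(random_seed)
    return rng.standard_normal(n_samples)


a = draw_noise(5, random_seed=0)
b = draw_noise(5, random_seed=0)
c = draw_noise(5)  # no seed: allowed by the Optional[int] annotation

print(np.array_equal(a, b), c.shape)
```

The hunk above does not show the corresponding `from typing import Optional` import, so whether that import already exists in the module cannot be confirmed from this excerpt.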