Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Cython axis-aligned multi-view splitter #129

Merged
merged 24 commits into from
Oct 12, 2023

Conversation

adam2392
Copy link
Collaborator

@adam2392 adam2392 commented Sep 26, 2023

Closes #92

Changes proposed in this pull request:

  • Adds an experimental Cython API for multi-view splitting

I foresee multi-view as a separate submodule for now since it's a rather experimental feature that is there to support MIGHT/Co-MIGHT in Cancer research.

Multiview splitting can analogously have its axis-aligned, oblique and morf versions imo. This initial PR implements the axis-aligned multi-view approach.

Note: what is not done is tracking constants, which is an important feature.

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.

Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392
Copy link
Collaborator Author

Possible code to sample different feature sets:

        # dataset1 from 0 to N
        for i in range(max_features//2):
            index = random_index_selection(dataset_1)

            rand_int(0, N, random_state)

        # dataset2 from N+1 to M 
        for i in range(max_features//2):
            index = random_index_selection(dataset_2)

            rand_int(N+1, M, random_state)

Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392 adam2392 marked this pull request as ready for review October 6, 2023 01:39
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392 adam2392 changed the title [ENH, WIP] Cython multiview work [ENH] Cython axis-aligned multi-view splitter Oct 11, 2023
@adam2392 adam2392 requested a review from sampan501 October 11, 2023 14:47
Copy link
Collaborator Author

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly files to review:

  • sktree/tree/_multiview.py
  • sktree/tree/_oblique_splitter.pxd
  • sktree/tree/_oblique_splitter.pyx
  • examples/splitters/plot_multiview_axis_aligned_splitter.py

@adam2392 adam2392 requested review from PSSF23, YuxinB and SUKI-O October 11, 2023 14:48
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Copy link
Member

@sampan501 sampan501 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! @adam2392 will there be another PR exposing a class for the multiview hypothesis test? Maybe in forestht.py?

Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392
Copy link
Collaborator Author

Looks good to me! @adam2392 will there be another PR exposing a class for the multiview hypothesis test? Maybe in forestht.py?

The design allows any arbitrary tree to be used in HonestForest, and the hypothesis test allows any arbitrary HonestForest parametrization to be used, so in principle this is "possible" once this PR is merged.

However, the next PR will show a more explicit example of multi-view learning on simulated data in examples/ to test the API out. I think there should be some guardrails in place on checking values of feature_set_ends relative to X. I need to add these still tho.

Signed-off-by: Adam Li <adam2392@gmail.com>
@sampan501
Copy link
Member

I agree, I just think it would be easier for the user if the whole class was exposed the user. I imagine a class kind of like FeatureImportanceForestClassifier that just calls HonestForest with the correct parameters. I guess I'm referring more to the Multiview test and not specifically the splitter

@codecov
Copy link

codecov bot commented Oct 11, 2023

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (6760662) 88.78% compared to head (e2ee6e5) 89.76%.
Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #129      +/-   ##
==========================================
+ Coverage   88.78%   89.76%   +0.98%     
==========================================
  Files          37       41       +4     
  Lines        3138     3351     +213     
==========================================
+ Hits         2786     3008     +222     
+ Misses        352      343       -9     
Files Coverage Δ
sktree/__init__.py 80.76% <100.00%> (ø)
sktree/ensemble/__init__.py 100.00% <100.00%> (ø)
sktree/ensemble/_multiview.py 100.00% <100.00%> (ø)
sktree/experimental/sdf.py 69.23% <ø> (ø)
sktree/stats/tests/test_coleman.py 100.00% <ø> (ø)
sktree/stats/utils.py 91.89% <ø> (ø)
sktree/tests/test_multiview_forest.py 100.00% <100.00%> (ø)
sktree/tests/test_supervised_forest.py 99.41% <100.00%> (+<0.01%) ⬆️
sktree/tree/__init__.py 100.00% <ø> (ø)
sktree/tree/_classes.py 75.00% <ø> (ø)
... and 6 more

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@adam2392
Copy link
Collaborator Author

I agree, I just think it would be easier for the user if the whole class was exposed the user. I imagine a class kind of like FeatureImportanceForestClassifier that just calls HonestForest with the correct parameters. I guess I'm referring more to the Multiview test and not specifically the splitter

Sure yeah I agree it'll be easier if we can expose a sensible interface that makes semantic sense for users. We can play around w/ the best way to do this.

@adam2392 adam2392 merged commit eb946d4 into neurodata:main Oct 12, 2023
@adam2392 adam2392 deleted the multiviewv2 branch October 12, 2023 02:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants