Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Track v1.4] New PR branch to serve as submodule for scikit-tree #53

Open
wants to merge 332 commits into
base: main
Choose a base branch
from

Conversation

adam2392
Copy link
Collaborator

Reference Issues/PRs

As of v0.2 for sktree, we have decided we do not need a custom built and released via pypi scikit-learn fork. Instead, we just have to keep an updated fork branch here that maintains the changes under tree/ and ensemble/.

This branch has significantly lower diff and less complexity compared to e.g. #44

What does this implement/fix? Explain your changes.

Any other comments?

@github-actions
Copy link

github-actions bot commented Aug 11, 2023

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here


ruff

ruff detected issues. Please run ruff check --fix --output-format=full . locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.5.1.


sklearn/tree/_classes.py:9:1: I001 [*] Import block is un-sorted or un-formatted
   |
 7 |   # SPDX-License-Identifier: BSD-3-Clause
 8 |   
 9 | / import copy
10 | | import numbers
11 | | from abc import ABCMeta, abstractmethod
12 | | from math import ceil
13 | | from numbers import Integral, Real
14 | | 
15 | | import numpy as np
16 | | from scipy.sparse import issparse
17 | | 
18 | | from sklearn.utils import metadata_routing
19 | | from sklearn.base import (
20 | |     BaseEstimator,
21 | |     ClassifierMixin,
22 | |     MultiOutputMixin,
23 | |     RegressorMixin,
24 | |     _fit_context,
25 | |     clone,
26 | |     is_classifier,
27 | | )
28 | | from sklearn.utils import Bunch, check_random_state, compute_sample_weight
29 | | from sklearn.utils._param_validation import Hidden, Interval, RealNotInt, StrOptions
30 | | from sklearn.utils.multiclass import (
31 | |     _check_partial_fit_first_call,
32 | |     check_classification_targets,
33 | | )
34 | | from sklearn.utils.validation import (
35 | |     _assert_all_finite_element_wise,
36 | |     _check_n_features,
37 | |     _check_sample_weight,
38 | |     assert_all_finite,
39 | |     check_is_fitted,
40 | |     validate_data,
41 | | )
42 | | 
43 | | from . import _criterion, _splitter, _tree
44 | | from ._criterion import BaseCriterion
45 | | from ._splitter import BaseSplitter
46 | | from ._tree import (
47 | |     BestFirstTreeBuilder,
48 | |     DepthFirstTreeBuilder,
49 | |     Tree,
50 | |     _build_pruned_tree_ccp,
51 | |     ccp_pruning_path,
52 | | )
53 | | from ._utils import _any_isnan_axis0
54 | | 
55 | | __all__ = [
   | |_^ I001
56 |       "DecisionTreeClassifier",
57 |       "DecisionTreeRegressor",
   |
   = help: Organize imports

Found 1 error.
[*] 1 fixable with the `--fix` option.

Generated for commit: 66658db. Link to the linter CI: here

sklearn/tree/_tree.pyx Outdated Show resolved Hide resolved
@PSSF23

This comment was marked as outdated.

Copy link
Member

@PSSF23 PSSF23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot reproduce the error on my local machine with following code:

from sklearn.ensemble import RandomForestClassifier

import numpy as np
from inspect import signature

rnd = np.random.RandomState(0)
n_samples = 30
X = rnd.uniform(size=(n_samples, 3))
y = np.arange(n_samples)

clf_1 = RandomForestClassifier()
clf_1.set_params(random_state=0)

func = getattr(clf_1, "fit", None)
func(X,y)
args = [p.name for p in signature(func).parameters.values()]

func = getattr(clf_1, "score", None)
func(X,y)
args = [p.name for p in signature(func).parameters.values()]

func = getattr(clf_1, "partial_fit", None)
func(X,y)
args = [p.name for p in signature(func).parameters.values()]

The code should replicate most of the check_fit_score_takes_y test, but it runs smoothly every time. I also don't understand why only these 2 CIs failed when all should have the same test library.

  • Linux_Runs pylatest_conda_forge_mkl
  • macOS pylatest_conda_mkl_no_openmp

@adam2392
Copy link
Collaborator Author

I cannot reproduce the error on my local machine with following code:

from sklearn.ensemble import RandomForestClassifier

import numpy as np
from inspect import signature

rnd = np.random.RandomState(0)
n_samples = 30
X = rnd.uniform(size=(n_samples, 3))
y = np.arange(n_samples)

clf_1 = RandomForestClassifier()
clf_1.set_params(random_state=0)

func = getattr(clf_1, "fit", None)
func(X,y)
args = [p.name for p in signature(func).parameters.values()]

func = getattr(clf_1, "score", None)
func(X,y)
args = [p.name for p in signature(func).parameters.values()]

func = getattr(clf_1, "partial_fit", None)
func(X,y)
args = [p.name for p in signature(func).parameters.values()]

The code should replicate most of the check_fit_score_takes_y test, but it runs smoothly every time. I also don't understand why only these 2 CIs failed when all should have the same test library.

  • Linux_Runs pylatest_conda_forge_mkl
  • macOS pylatest_conda_mkl_no_openmp

If you try running the HonestForestClassifier instead, then the error seems to be able to be produced on my computer

@PSSF23
Copy link
Member

PSSF23 commented Aug 25, 2023

@adam2392 Is sktree updated with the current submodule? After the resizing bug is fixed?

@adam2392
Copy link
Collaborator Author

Yeah it should be. The test is commented out tho rn.

@PSSF23

This comment was marked as outdated.

Copy link
Member

@PSSF23 PSSF23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All CIs passed.

@adam2392 adam2392 force-pushed the submodulev3 branch 2 times, most recently from 62f0c60 to ea330a7 Compare September 8, 2023 18:36
@adam2392
Copy link
Collaborator Author

adam2392 commented Oct 6, 2023

TODO: Change

int -> intp_t
double -> float64_t
SIZE_t -> intp_t
DTYPE_t -> float32_t
INT32_t -> int32_t

see: https://github.com/scikit-learn/scikit-learn/pull/27352/files and related PRs

@adam2392
Copy link
Collaborator Author

adam2392 commented Oct 9, 2023

TODO: Change

int -> intp_t double -> float64_t SIZE_t -> intp_t DTYPE_t -> float32_t INT32_t -> int32_t

see: https://github.com/scikit-learn/scikit-learn/pull/27352/files and related PRs

This was accomplished in 9a5d91b

@adam2392
Copy link
Collaborator Author

adam2392 commented Mar 10, 2024

5ccd00f introduces a major change, where n_constant_features is migrated to WITHIN the SplitRecord. This is to enable over-riding what gets passed from parent node to child node.

I.e. MultiviewSplitRecord could in principle override SplitRecord and store additional n_constant_features. One per feature set. Ofc, I don't think this is actually the best design cuz the idea of "multi-view" really breaks away from the underlying assumption that there is only one feature set.

adam2392 added 13 commits June 21, 2024 08:43
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392
Copy link
Collaborator Author

adam2392 commented Sep 6, 2024

We should remove binning for now as it is a untested feature...

lesteve and others added 30 commits October 29, 2024 13:09
…ant and keep_empty_features is False (scikit-learn#29950)

Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
Co-authored-by: Marc Torrellas Socastro <marc.torsoc@gmail.com>
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
…mes to contribution (scikit-learn#30177)

Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Co-authored-by: rajath.k <rajath.k@dish.com>
…t actually support NaN values (scikit-learn#25330)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
Co-authored-by: adrinjalali <adrin.jalali@gmail.com>
Co-authored-by: Stefanie Senger <91849487+StefanieSenger@users.noreply.github.com>
Co-authored-by: adrinjalali <adrin.jalali@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.