Commit

DOC improve RandomForest docstring by explicitely stating the splitter strategy used (scikit-learn#27746)
dlhaar authored Nov 8, 2023
1 parent 096b525 commit 714c500
Showing 2 changed files with 10 additions and 6 deletions.
6 changes: 3 additions & 3 deletions doc/modules/ensemble.rst
@@ -885,9 +885,9 @@ from a sample drawn with replacement (i.e., a bootstrap sample) from the
 training set.
 
 Furthermore, when splitting each node during the construction of a tree, the
-best split is found either from all input features or a random subset of size
-``max_features``. (See the :ref:`parameter tuning guidelines
-<random_forest_parameters>` for more details).
+best split is found through an exhaustive search of the features values of
+either all input features or a random subset of size ``max_features``.
+(See the :ref:`parameter tuning guidelines <random_forest_parameters>` for more details.)
 
 The purpose of these two sources of randomness is to decrease the variance of
 the forest estimator. Indeed, individual decision trees typically exhibit high
10 changes: 7 additions & 3 deletions sklearn/ensemble/_forest.py
@@ -1177,6 +1177,8 @@ class RandomForestClassifier(ForestClassifier):
 A random forest is a meta estimator that fits a number of decision tree
 classifiers on various sub-samples of the dataset and uses averaging to
 improve the predictive accuracy and control over-fitting.
+Trees in the forest use the best split strategy, i.e. equivalent to passing
+`splitter="best"` to the underlying :class:`~sklearn.tree.DecisionTreeRegressor`.
 The sub-sample size is controlled with the `max_samples` parameter if
 `bootstrap=True` (default), otherwise the whole dataset is used to build
 each tree.
@@ -1565,9 +1567,11 @@ class RandomForestRegressor(ForestRegressor):
 """
 A random forest regressor.
 
-A random forest is a meta estimator that fits a number of decision
-tree regressors on various sub-samples of the dataset and uses averaging
-to improve the predictive accuracy and control over-fitting.
+A random forest is a meta estimator that fits a number of decision tree
+regressors on various sub-samples of the dataset and uses averaging to
+improve the predictive accuracy and control over-fitting.
+Trees in the forest use the best split strategy, i.e. equivalent to passing
+`splitter="best"` to the underlying :class:`~sklearn.tree.DecisionTreeRegressor`.
 The sub-sample size is controlled with the `max_samples` parameter if
 `bootstrap=True` (default), otherwise the whole dataset is used to build
 each tree.
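
Illustration of the wording added to doc/modules/ensemble.rst: the best split is found through an exhaustive search over the feature values of either all input features or a random subset of size `max_features`. The block below is a minimal pure-Python sketch of that idea, not scikit-learn's actual implementation. The helper names `gini` and `best_split`, the use of Gini impurity, and the choice of midpoints between consecutive distinct values as candidate thresholds are all assumptions made for the example.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y, max_features=None, rng=None):
    """Return (feature_index, threshold, weighted_impurity) of the best split.

    Hypothetical helper: scans every midpoint between consecutive distinct
    values of each candidate feature. Candidates are all features, or a
    random subset of size max_features when that is smaller.
    """
    rng = rng or random.Random(0)
    n_features = len(X[0])
    candidates = list(range(n_features))
    if max_features is not None and max_features < n_features:
        candidates = rng.sample(candidates, max_features)

    best = (None, None, float("inf"))
    for j in candidates:
        values = sorted({row[j] for row in X})
        # Exhaustive search: try every threshold between adjacent values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[j] <= threshold]
            right = [label for row, label in zip(X, y) if row[j] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (j, threshold, score)
    return best


# Toy data: feature 0 separates the classes perfectly, feature 1 does not.
X = [[0.0, 5.0], [1.0, 3.0], [2.0, 4.0], [3.0, 6.0]]
y = [0, 0, 1, 1]
feature, threshold, impurity = best_split(X, y)
```

Calling `best_split(X, y, max_features=1)` restricts the search to one randomly chosen feature, which is the second source of randomness the changed paragraph describes. A "random" splitter would instead draw thresholds at random rather than scanning all of them.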