In trying to prevent my Random Forest model from overfitting on the training dataset, I looked at the ccp_alpha parameter. I noticed that it is possible to tune it with a hyperparameter search method (such as GridSearchCV).
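For example, a plain grid search over ccp_alpha might look like this (a minimal sketch with toy data; the grid values are guesses I picked for illustration):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Toy data, just for illustration
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The candidate alphas here are hand-picked guesses
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"ccp_alpha": [0.0, 0.001, 0.005, 0.01, 0.05]},
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_)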
I discovered that there is a Scikit-Learn tutorial for tuning this ccp_alpha parameter for Decision Tree models. The methodology described uses the cost_complexity_pruning_path method of the Decision Tree model. This section explains well how the method works: I understand that it seeks to find a sub-tree of the generated model that reduces overfitting, while using values of ccp_alpha determined by the cost_complexity_pruning_path method.
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(random_state=0)
# Compute candidate alphas and the total leaf impurity at each pruning step
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities
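If I understand the tutorial correctly, its selection step then boils down to something like this (my condensed version, with the plotting omitted; it reuses the variables from above):

# One tree per candidate alpha; keep the alpha with the best held-out accuracy
trees = [
    DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
    for a in ccp_alphas
]
test_scores = [t.score(X_test, y_test) for t in trees]
best_alpha = ccp_alphas[test_scores.index(max(test_scores))]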
However, I wonder why the Random Forest model in Scikit-Learn does not implement this ccp_alpha selection and pruning mechanism. Would it be possible to do this with a little tinkering? It seems more logical to me than trying to find a good value through a hyperparameter search (whichever method you use).
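To make the question concrete, the kind of tinkering I have in mind would be to derive the candidate alphas from a single tree's pruning path and search only over those values, roughly like this (a rough sketch, not something I claim is statistically sound; it reuses the setup from above):

import numpy as np

# Derive candidate alphas from one tree's pruning path instead of guessing
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
candidates = np.unique(path.ccp_alphas[:-1])  # drop the last alpha, which prunes to the root
step = max(1, len(candidates) // 10)          # thin the grid to roughly 10 values

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"ccp_alpha": candidates[::step]},
    cv=5,
)
search.fit(X_train, y_train)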