While trying to prevent my Random Forest model from overfitting on the training dataset, I looked at the ccp_alpha parameter. I notice that it is possible to tune it with a hyperparameter search method (such as GridSearchCV).
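For reference, this is the kind of search I mean (a minimal sketch; the candidate alphas are picked by hand, and X_train/y_train are assumed to exist):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# hand-picked candidate alphas; there is no principled way to choose them here
param_grid = {"ccp_alpha": [0.0, 0.001, 0.01, 0.1]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)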

I discovered that there is a Scikit-Learn tutorial on tuning this ccp_alpha parameter for Decision Tree models. The methodology described uses the cost_complexity_pruning_path method of the Decision Tree model, and the tutorial explains well how it works: it seeks a sub-tree of the fitted model that reduces overfitting, using candidate ccp_alpha values determined by cost_complexity_pruning_path.

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier()
# compute the effective alphas and the corresponding total leaf impurities
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities
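The tutorial then fits one tree per candidate alpha and compares train/test accuracy to pick a value; roughly like this (assuming a held-out X_test/y_test):

# fit one tree per candidate alpha, then compare accuracies
clfs = [DecisionTreeClassifier(ccp_alpha=a).fit(X_train, y_train) for a in ccp_alphas]
train_scores = [clf.score(X_train, y_train) for clf in clfs]
test_scores = [clf.score(X_test, y_test) for clf in clfs]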

However, I wonder why the Random Forest model in Scikit-Learn does not implement this ccp_alpha selection and pruning concept. Would it be possible to do it with a little tinkering, as in the sketch below? It seems more logical to me than searching for a good value with a hyperparameter search (whichever one you use).
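Something like the following is what I have in mind. It is a rough sketch, not a verified recipe: each fitted tree in the forest is a DecisionTreeClassifier, so its pruning path can be computed, but cost_complexity_pruning_path refits a clone of the tree on the data you pass it, not on the bootstrap sample the forest actually used, so the alphas are only approximate. X_val/y_val are assumed to be a held-out validation set.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# collect candidate alphas from every tree's pruning path
alphas = np.unique(np.concatenate([
    tree.cost_complexity_pruning_path(X_train, y_train).ccp_alphas
    for tree in forest.estimators_
]))

# evaluate a thinned-out subset of candidates on the validation set
step = max(1, len(alphas) // 10)
for alpha in alphas[::step]:
    candidate = RandomForestClassifier(
        n_estimators=100, ccp_alpha=alpha, random_state=0
    ).fit(X_train, y_train)
    print(alpha, candidate.score(X_val, y_val))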

I’m voting to close this question because it probably belongs on datascience.stackexchange.com – rickhg12hs Nov 3, 2021 at 20:50
