I have this code, which models an imbalanced class with a decision tree, but somehow the ccp_alpha it picks at the end is not the right value. Based on the "Recall vs alpha for training and testing sets" graph, ccp_alpha should be around 0.005, but the code picks about 0.020.
I am not sure why it ends up with ccp_alpha=0.02044841897041862 instead of 0.005.
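import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import recall_score
from sklearn.tree import DecisionTreeClassifier
class_weight_t = {0: 0.07, 1: 0.89}
clf = DecisionTreeClassifier(random_state=1, class_weight=class_weight_t)
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities
pd.DataFrame(path)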
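For reference, here is a minimal sketch of what cost_complexity_pruning_path hands back, run on a hypothetical synthetic dataset (make_classification with roughly 10% positives, standing in for the real X_train/y_train, which aren't shown). The alphas should come back in non-decreasing order, and the total leaf impurity grows as more of the tree is pruned away.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the real training data: about 9 negatives per positive.
X_train, y_train = make_classification(
    n_samples=1000, n_features=10, weights=[0.9], flip_y=0, random_state=0
)

class_weight_t = {0: 0.07, 1: 0.89}
clf = DecisionTreeClassifier(random_state=1, class_weight=class_weight_t)

# The path is computed on a full (unpruned) tree fitted with the same class weights.
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities

# Effective alphas grow along the path; total leaf impurity grows with them.
print(len(ccp_alphas), "candidate alphas")
print(ccp_alphas[:5])
print(impurities[:5])
```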
clfs = []
# Fit one tree per candidate alpha on the pruning path
for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(
        random_state=1, ccp_alpha=ccp_alpha, class_weight=class_weight_t
    )
    clf.fit(X_train, y_train)
    clfs.append(clf)
    # print(str(clf) + "," + str(ccp_alpha) + "," + str(clfs[-1].tree_.node_count))
print(
    "Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
        clfs[-1].tree_.node_count, ccp_alphas[-1]
    )
)
Number of nodes in the last tree is: 1 with ccp_alpha: 0.29696815935983295
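A quick sanity check on the fitting loop, again sketched on hypothetical synthetic data rather than the real data: each larger alpha should give a tree with the same number of nodes or fewer, ending in a single-node tree, which matches the 1-node output above.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the real training data.
X_train, y_train = make_classification(
    n_samples=800, n_features=10, weights=[0.9], flip_y=0, random_state=0
)
class_weight_t = {0: 0.07, 1: 0.89}

path = DecisionTreeClassifier(
    random_state=1, class_weight=class_weight_t
).cost_complexity_pruning_path(X_train, y_train)

# Refit one tree per alpha and record how many nodes survive pruning.
node_counts = []
for ccp_alpha in path.ccp_alphas:
    clf = DecisionTreeClassifier(
        random_state=1, ccp_alpha=ccp_alpha, class_weight=class_weight_t
    ).fit(X_train, y_train)
    node_counts.append(clf.tree_.node_count)

print(node_counts[:5], "...", node_counts[-1])
```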
recall_train = []
for clf in clfs:
    pred_train = clf.predict(X_train)
    values_train = recall_score(y_train, pred_train)
    recall_train.append(values_train)

recall_test = []
for clf in clfs:
    pred_test = clf.predict(X_test)
    values_test = recall_score(y_test, pred_test)
    recall_test.append(values_test)
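One behaviour that matters when recall is the only score being tracked: once a tree is pruned back to its root, the single leaf predicts whichever class carries more weighted mass. A minimal sketch with the same class_weight_t on hypothetical synthetic data (about 9 negatives per positive) shows that with these weights the root predicts class 1 for every sample (roughly 0.07 * 900 = 63 versus 0.89 * 100 = 89), so recall is exactly 1.0 even though the tree has learned nothing.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score, recall_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: roughly 900 negatives and 100 positives.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9], flip_y=0, random_state=0
)
class_weight_t = {0: 0.07, 1: 0.89}

# Gini-based effective alphas never exceed 0.5 for a binary target,
# so ccp_alpha=1.0 prunes the tree all the way back to its root.
stump = DecisionTreeClassifier(
    random_state=1, ccp_alpha=1.0, class_weight=class_weight_t
).fit(X, y)

pred = stump.predict(X)
rec = recall_score(y, pred)
prec = precision_score(y, pred)  # roughly the positive rate, about 0.1

print(stump.tree_.node_count)  # 1
print(rec)                     # 1.0
print(prec)
```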
fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("Recall")
ax.set_title("Recall vs alpha for training and testing sets")
ax.plot(
    ccp_alphas, recall_train, marker="o", label="train", drawstyle="steps-post"
)
ax.plot(ccp_alphas, recall_test, marker="o", label="test", drawstyle="steps-post")
# ax.plot(
#     ccp_alphas, train_scores, marker="o", label="train", drawstyle="steps-post"
# )
# ax.plot(ccp_alphas, test_scores, marker="o", label="test", drawstyle="steps-post")
ax.legend()
plt.show()
Plot: Recall vs alpha for training and testing sets (https://i.stack.imgur.com/0imAq.png)
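Reading values off the plot can hide small differences between neighbouring alphas. A sketch that tabulates each alpha with its node count and train/test recall, on hypothetical synthetic data standing in for the real splits, makes it easy to see exactly which alphas share the top test recall:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic data standing in for X_train/X_test/y_train/y_test.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9], flip_y=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)
class_weight_t = {0: 0.07, 1: 0.89}

path = DecisionTreeClassifier(
    random_state=1, class_weight=class_weight_t
).cost_complexity_pruning_path(X_train, y_train)

# One row per candidate alpha, with the same metrics the plot shows.
rows = []
for ccp_alpha in path.ccp_alphas:
    clf = DecisionTreeClassifier(
        random_state=1, ccp_alpha=ccp_alpha, class_weight=class_weight_t
    ).fit(X_train, y_train)
    rows.append(
        {
            "ccp_alpha": ccp_alpha,
            "node_count": clf.tree_.node_count,
            "recall_train": recall_score(y_train, clf.predict(X_train)),
            "recall_test": recall_score(y_test, clf.predict(X_test)),
        }
    )

table = pd.DataFrame(rows)
# Rows that share the highest test recall.
print(table[table["recall_test"] == table["recall_test"].max()])
```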
# np.argmax returns the index of the first occurrence of the highest test recall
index_best_model = np.argmax(recall_test)
best_model = clfs[index_best_model]
print(best_model)
DecisionTreeClassifier(ccp_alpha=0.02044841897041862,class_weight={0: 0.07, 1: 0.89}, random_state=1)
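On the selection step itself: np.argmax returns the index of the first occurrence of the maximum, so only the highest test recall counts, and among ties the smallest such alpha wins (the alphas are in increasing order). An alpha near 0.005 that looks good on the plot but sits slightly below the maximum will not be chosen. The recall numbers below are made up purely to illustrate the tie-breaking:

```python
import numpy as np

# Hypothetical values, one per candidate alpha (not the question's data).
ccp_alphas = np.array([0.0, 0.002, 0.005, 0.012, 0.020, 0.030, 0.297])
recall_test = np.array([0.70, 0.78, 0.84, 0.84, 0.90, 0.90, 0.90])

# First index holding the maximum value.
first_best = np.argmax(recall_test)
# Every index tied for the maximum.
tied = np.flatnonzero(recall_test == recall_test.max())

print(ccp_alphas[first_best])  # 0.02
print(ccp_alphas[tied])
```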