
I would like to use scikit-learn's native accuracy_score as a scoring function.

So here is my attempt. Imports and some data:

import numpy as np
from sklearn.cross_validation import KFold, cross_val_score
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import accuracy_score
from sklearn import neighbors
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([0, 1, 0, 0, 0, 1])

Now when I use just k-fold cross-validation without my scoring function, everything works as intended:

parameters = {
    'n_neighbors': [2, 3, 4],
    'weights': ['uniform', 'distance'],
    'p': [1, 2, 3]
}
model = neighbors.KNeighborsClassifier()
k_fold = KFold(len(Y), n_folds=6, shuffle=True, random_state=0)
clf = GridSearchCV(model, parameters, cv=k_fold)  # this is the line I will change below
clf.fit(X, Y)
print(clf.best_score_)

But when I change the line to

clf = GridSearchCV(model, parameters, cv=k_fold, scoring=accuracy_score) # or accuracy_score()

I get the error ValueError: Cannot have number of folds n_folds=10 greater than the number of samples: 6, which in my opinion does not reflect the real problem.

In my opinion the real problem is that accuracy_score does not follow the signature scorer(estimator, X, y) that is described in the documentation.
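
To make that concrete, this is what I understand such a callable would have to look like (a minimal sketch; the function name is my own):

def my_accuracy_scorer(estimator, X, y):
    # a scorer receives the fitted estimator plus the validation fold,
    # predicts itself, and returns a single float (higher is better)
    return accuracy_score(y, estimator.predict(X))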

So how can I fix this problem?

It will work if you change scoring=accuracy_score to scoring='accuracy' (see the documentation for the full list of scorers you can use by name in this way).

In theory, you should be able to pass a custom scoring function the way you're trying to, but my guess is that you're right and accuracy_score doesn't have the right API.
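
In code, only the scoring argument from the question needs to change:

clf = GridSearchCV(model, parameters, cv=k_fold, scoring='accuracy')
clf.fit(X, Y)
print(clf.best_score_)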

Thank you. The first bug is irrelevant, because it is just a typo I introduced when trying to quickly create a simple example. Thank you very much for solving my real problem. I actually thought that I had to pass a real function, not a string. – Salvador Dali Aug 4, 2016 at 5:57

The issue is that you have to create a scorer from the metric. See: scikit-learn.org/stable/modules/generated/… – A. Bollans Oct 19, 2022 at 11:55
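
As the second comment notes, another option is to build a scorer from the metric with make_scorer, which wraps a metric(y_true, y_pred) function into the scorer(estimator, X, y) form that GridSearchCV expects (a minimal sketch reusing the question's variables):

from sklearn.metrics import accuracy_score, make_scorer

# wrap the plain metric into a scorer object GridSearchCV can call
accuracy_scorer = make_scorer(accuracy_score)
clf = GridSearchCV(model, parameters, cv=k_fold, scoring=accuracy_scorer)
clf.fit(X, Y)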

Here is an example of using weighted kappa as the scoring metric for GridSearchCV with a simple random forest model. The key learning for me was to pass the parameters that belong to the metric (here weights="quadratic") through make_scorer.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import cohen_kappa_score, make_scorer

# wrap the metric into a scorer, passing the metric's own
# parameters (weights="quadratic") through make_scorer
kappa_scorer = make_scorer(cohen_kappa_score, weights="quadratic")

# Create the parameter grid based on the results of random search
param_grid = {
    'bootstrap': [True],
    'max_features': range(2, 10),  # try 2 to 9 features
    'min_samples_leaf': [3, 4, 5],
    'n_estimators': [100, 300, 500],
    'max_depth': [5]
}

# Create a base model
random_forest = RandomForestClassifier(class_weight="balanced_subsample", random_state=1)

# Instantiate the grid search model, scoring with the weighted-kappa scorer
grid_search = GridSearchCV(estimator=random_forest, param_grid=param_grid,
                           cv=5, n_jobs=-1, verbose=2, scoring=kappa_scorer)

# Fit the grid search to the training data (final_tr / yTrain are my own data)
grid_search.fit(final_tr, yTrain)
