I am trying to evaluate multiple scoring metrics to determine the best parameters for model performance. That is, I want to be able to say:

To maximize F1, I should use these parameters. To maximize precision, I should use these parameters.

I am working off the following example from this sklearn page:

import numpy as np
from sklearn.datasets import make_hastie_10_2
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
X, y = make_hastie_10_2(n_samples=5000, random_state=42)
scoring = {'PRECISION': 'precision', 'F1': 'f1'}
gs = GridSearchCV(DecisionTreeClassifier(random_state=42),
                  param_grid={'min_samples_split': range(2, 403, 10)},
                  scoring=scoring, refit='F1', return_train_score=True)
gs.fit(X, y)
best_params = gs.best_params_
best_estimator = gs.best_estimator_
print(best_params)
print(best_estimator)

Which yields:

{'min_samples_split': 62}
DecisionTreeClassifier(min_samples_split=62, random_state=42)

However, what I am looking for is to obtain these results for each metric separately, i.e. in this case for both F1 and precision.

How can I get the best parameters for each scoring metric in GridSearchCV?

Note: I believe it has something to do with my usage of refit='F1', but I am not sure how to handle multiple metrics there.

To do so, you'll have to dig into the detailed results of the whole grid search CV procedure; fortunately, these detailed results are returned in the cv_results_ attribute of the GridSearchCV object (docs).
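If you first want a quick look at what this dictionary contains, printing its keys is enough; the exact key names follow the scorer names passed in scoring, so with the 'PRECISION' and 'F1' entries above you should see something like:

print(sorted(gs.cv_results_.keys()))
# includes, among others: 'params', 'mean_test_F1', 'rank_test_F1',
# 'mean_test_PRECISION', 'rank_test_PRECISION', plus per-split scores and fit/score timings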

I have re-run your code as-is (so I am not repeating it here); suffice it to say that, despite the random seed being set explicitly, I get a different final result (presumably due to different scikit-learn versions):

{'min_samples_split': 322}
DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                       max_depth=None, max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=322,
                       min_weight_fraction_leaf=0.0, presort='deprecated',
                       random_state=42, splitter='best')

but this is not important for the issue at hand here.

The easiest way to use the returned cv_results_ dictionary is to convert it to a pandas dataframe:

import pandas as pd
cv_results = pd.DataFrame.from_dict(gs.cv_results_)

Still, as it includes too much info (columns), I will further simplify it here to demonstrate the issue (feel free to explore it more fully yourself):

df = cv_results[['params', 'mean_test_PRECISION', 'rank_test_PRECISION', 'mean_test_F1', 'rank_test_F1']]
pd.set_option("display.max_rows", None, "display.max_columns", None)
pd.set_option('expand_frame_repr', False)
print(df)

Result:

                        params  mean_test_PRECISION  rank_test_PRECISION  mean_test_F1  rank_test_F1
0     {'min_samples_split': 2}             0.771782                    1      0.763041            41
1    {'min_samples_split': 12}             0.768040                    2      0.767331            38
2    {'min_samples_split': 22}             0.767196                    3      0.776677            29
3    {'min_samples_split': 32}             0.760282                    4      0.773634            32
4    {'min_samples_split': 42}             0.754572                    8      0.777967            26
5    {'min_samples_split': 52}             0.754034                    9      0.777550            27
6    {'min_samples_split': 62}             0.758131                    5      0.773348            33
7    {'min_samples_split': 72}             0.756021                    6      0.774301            30
8    {'min_samples_split': 82}             0.755612                    7      0.768065            37
9    {'min_samples_split': 92}             0.750527                   10      0.771023            34
10  {'min_samples_split': 102}             0.741016                   11      0.769896            35
11  {'min_samples_split': 112}             0.740965                   12      0.765353            39
12  {'min_samples_split': 122}             0.731790                   13      0.763620            40
13  {'min_samples_split': 132}             0.723085                   14      0.768605            36
14  {'min_samples_split': 142}             0.713345                   15      0.774117            31
15  {'min_samples_split': 152}             0.712958                   16      0.776721            28
16  {'min_samples_split': 162}             0.709804                   17      0.778287            24
17  {'min_samples_split': 172}             0.707080                   18      0.778528            22
18  {'min_samples_split': 182}             0.702621                   19      0.778516            23
19  {'min_samples_split': 192}             0.697630                   20      0.778103            25
20  {'min_samples_split': 202}             0.693011                   21      0.781047            10
21  {'min_samples_split': 212}             0.693011                   21      0.781047            10
22  {'min_samples_split': 222}             0.693011                   21      0.781047            10
23  {'min_samples_split': 232}             0.692810                   24      0.779705            13
24  {'min_samples_split': 242}             0.692810                   24      0.779705            13
25  {'min_samples_split': 252}             0.692810                   24      0.779705            13
26  {'min_samples_split': 262}             0.692810                   24      0.779705            13
27  {'min_samples_split': 272}             0.692810                   24      0.779705            13
28  {'min_samples_split': 282}             0.692810                   24      0.779705            13
29  {'min_samples_split': 292}             0.692810                   24      0.779705            13
30  {'min_samples_split': 302}             0.692810                   24      0.779705            13
31  {'min_samples_split': 312}             0.692810                   24      0.779705            13
32  {'min_samples_split': 322}             0.688417                   33      0.782772             1
33  {'min_samples_split': 332}             0.688417                   33      0.782772             1
34  {'min_samples_split': 342}             0.688417                   33      0.782772             1
35  {'min_samples_split': 352}             0.688417                   33      0.782772             1
36  {'min_samples_split': 362}             0.688417                   33      0.782772             1
37  {'min_samples_split': 372}             0.688417                   33      0.782772             1
38  {'min_samples_split': 382}             0.688417                   33      0.782772             1
39  {'min_samples_split': 392}             0.688417                   33      0.782772             1
40  {'min_samples_split': 402}             0.688417                   33      0.782772             1

The column names should be self-explanatory: they include the parameters tried, the mean test score for each metric, and the corresponding rank (1 meaning best). You can immediately see, for example, that although 'min_samples_split': 322 does indeed give the best F1 score, it is not the only setting that does so; several other settings share a rank_test_F1 of 1 in the results.
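If you prefer to confirm such ties programmatically rather than by eye, a quick check on the rank column does it:

print((df['rank_test_F1'] == 1).sum())   # 9 in the run shown above, i.e. nine settings tied for the best F1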

From this point, it is trivial to get the info you want; for example, here are the best models for each one of your two metrics:

print(df.loc[df['rank_test_PRECISION']==1]) # best precision
# result:
                     params  mean_test_PRECISION  rank_test_PRECISION  mean_test_F1  rank_test_F1
0  {'min_samples_split': 2}             0.771782                    1      0.763041            41
print(df.loc[df['rank_test_F1']==1]) # best F1
# result:
                        params  mean_test_PRECISION  rank_test_PRECISION  mean_test_F1  rank_test_F1
32  {'min_samples_split': 322}             0.688417                   33      0.782772             1
33  {'min_samples_split': 332}             0.688417                   33      0.782772             1
34  {'min_samples_split': 342}             0.688417                   33      0.782772             1
35  {'min_samples_split': 352}             0.688417                   33      0.782772             1
36  {'min_samples_split': 362}             0.688417                   33      0.782772             1
37  {'min_samples_split': 372}             0.688417                   33      0.782772             1
38  {'min_samples_split': 382}             0.688417                   33      0.782772             1
39  {'min_samples_split': 392}             0.688417                   33      0.782772             1
40  {'min_samples_split': 402}             0.688417                   33      0.782772             1
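If you then want an actual fitted model for each metric, and not just its row in the table, one way (beyond the filtering shown above) is to pull the best index per metric straight from cv_results_ and refit a clone of the estimator yourself; a rough sketch (names like model_prec are just illustrative, and in case of ties np.argmin simply keeps the first candidate):

import numpy as np
from sklearn.base import clone

idx_prec = int(np.argmin(gs.cv_results_['rank_test_PRECISION']))  # first candidate ranked 1 on precision
idx_f1 = int(np.argmin(gs.cv_results_['rank_test_F1']))           # first candidate ranked 1 on F1

params_prec = gs.cv_results_['params'][idx_prec]
params_f1 = gs.cv_results_['params'][idx_f1]

model_prec = clone(gs.estimator).set_params(**params_prec).fit(X, y)  # precision-optimal tree
model_f1 = clone(gs.estimator).set_params(**params_f1).fit(X, y)      # F1-optimal tree (should match gs.best_estimator_, since refit='F1')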
Just for understanding's sake... this would read that min_samples_split: 2 was the best hyperparameter for maximizing precision, even with refit='F1'?
– artemis, Jul 20, 2020 at 16:59

@wundermahn Exactly; going through the df, you can easily confirm that the respective precision value of 0.771782 is indeed the maximum one. What you specify in refit determines what the process will return as best_params_ and best_estimator_ (that's why you got the parameters that maximize F1 here, and not precision), since you cannot optimize for more than one metric simultaneously.
– desertnaut, Jul 20, 2020 at 20:06
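A related option, in case you want GridSearchCV itself to hand back the precision-optimal candidate as best_estimator_: refit also accepts a callable that takes cv_results_ and returns the index of the chosen candidate (see the GridSearchCV docs; best_score_ is then not available). A rough sketch reusing the scoring dict and grid from the question (note that this re-runs the whole search, so refitting manually as sketched earlier is cheaper):

import numpy as np

def precision_best_index(cv_results):
    # select the candidate ranked first on precision
    return int(np.argmin(cv_results['rank_test_PRECISION']))

gs_prec = GridSearchCV(DecisionTreeClassifier(random_state=42),
                       param_grid={'min_samples_split': range(2, 403, 10)},
                       scoring=scoring, refit=precision_best_index)
gs_prec.fit(X, y)
print(gs_prec.best_params_)   # parameters that maximize precision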
        
