Collectives™ on Stack Overflow
Find centralized, trusted content and collaborate around the technologies you use most.
Learn more about Collectives
Teams
Q&A for work
Connect and share knowledge within a single location that is structured and easy to search.
Learn more about Teams
I am implementing an example from the O'Reilly book "
Introduction to Machine Learning with Python
", using Python 2.7 and sklearn 0.16.
The code I am using:
pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
param_grid = {"logisticregression_C": [0.001, 0.01, 0.1, 1, 10, 100], "tfidfvectorizer_ngram_range": [(1,1), (1,2), (1,3)]}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)
print("Best cross-validation score: {:.2f}".format(grid.best_score_))
The error being returned boils down to:
ValueError: Invalid parameter logisticregression_C for estimator Pipeline
Is this an error related to using Make_pipeline from v.0.16? What is causing this error?
There should be two underscores between estimator name and it's parameters in a Pipeline
logisticregression__C
. Do the same for tfidfvectorizer
It is mentioned in the user guide here: https://scikit-learn.org/stable/modules/compose.html#nested-parameters.
See the example at https://scikit-learn.org/stable/auto_examples/compose/plot_compare_reduction.html#sphx-glr-auto-examples-compose-plot-compare-reduction-py
–
–
For a more general answer to using Pipeline
in a GridSearchCV
, the parameter grid for the model should start with whatever name you gave when defining the pipeline. For example:
# Pay attention to the name of the second step, i. e. 'model'
pipeline = Pipeline(steps=[
('preprocess', preprocess),
('model', Lasso())
# Define the parameter grid to be used in GridSearch
param_grid = {'model__alpha': np.arange(0, 1, 0.05)}
search = GridSearchCV(pipeline, param_grid)
search.fit(X_train, y_train)
In the pipeline, we used the name model
for the estimator step. So, in the grid search, any hyperparameter for Lasso regression should be given with the prefix model__
. The parameters in the grid depends on what name you gave in the pipeline. In plain-old GridSearchCV
without a pipeline, the grid would be given like this:
param_grid = {'alpha': np.arange(0, 1, 0.05)}
search = GridSearchCV(Lasso(), param_grid)
You can find out more about GridSearch from this post.
Note that if you are using a pipeline with a voting classifier and a column selector, you will need multiple layers of names:
pipe1 = make_pipeline(ColumnSelector(cols=(0, 1)),
LogisticRegression())
pipe2 = make_pipeline(ColumnSelector(cols=(1, 2, 3)),
SVC())
votingClassifier = VotingClassifier(estimators=[
('p1', pipe1), ('p2', pipe2)])
You will need a param grid that looks like the following:
param_grid = {
'p2__svc__kernel': ['rbf', 'poly'],
'p2__svc__gamma': ['scale', 'auto'],
p2
is the name of the pipe and svc
is the default name of the classifier you create in that pipe. The third element is the parameter you want to modify.
–
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.