Collectives™ on Stack Overflow
Find centralized, trusted content and collaborate around the technologies you use most.
Learn more about Collectives
Teams
Q&A for work
Connect and share knowledge within a single location that is structured and easy to search.
Learn more about Teams
I have a data set with 100 columns of continuous features, and a continuous label, and I want to run SVR; extracting features of relevance, tuning hyper parameters, and then cross-validating my model that is fit to my data.
I wrote this code:
X_train, X_test, y_train, y_test = train_test_split(scaled_df, target, test_size=0.2)
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# define the pipeline to evaluate
model = SVR()
fs = SelectKBest(score_func=mutual_info_regression)
pipeline = Pipeline(steps=[('sel',fs), ('svr', model)])
# define the grid
grid = dict()
#How many features to try
grid['estimator__sel__k'] = [i for i in range(1, X_train.shape[1]+1)]
# define the grid search
#search = GridSearchCV(pipeline, grid, scoring='neg_mean_squared_error', n_jobs=-1, cv=cv)
search = GridSearchCV(
pipeline,
# estimator=SVR(kernel='rbf'),
param_grid={
'estimator__svr__C': [0.1, 1, 10, 100, 1000],
'estimator__svr__epsilon': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 1, 5, 10],
'estimator__svr__gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 1, 5, 10]
scoring='neg_mean_squared_error',
verbose=1,
n_jobs=-1)
for param in search.get_params().keys():
print(param)
# perform the search
results = search.fit(X_train, y_train)
# summarize best
print('Best MAE: %.3f' % results.best_score_)
print('Best Config: %s' % results.best_params_)
# summarize all
means = results.cv_results_['mean_test_score']
params = results.cv_results_['params']
for mean, param in zip(means, params):
print(">%.3f with: %r" % (mean, param))
I get the error:
ValueError: Invalid parameter estimator for estimator Pipeline(memory=None,
steps=[('sel',
SelectKBest(k=10,
score_func=<function mutual_info_regression at 0x7fd2ff649cb0>)),
('svr',
SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1,
gamma='scale', kernel='rbf', max_iter=-1, shrinking=True,
tol=0.001, verbose=False))],
verbose=False). Check the list of available parameters with `estimator.get_params().keys()`.
When I print estimator.get_params().keys()
, as suggested in the error message, I get:
error_score
estimator__memory
estimator__steps
estimator__verbose
estimator__sel
estimator__svr
estimator__sel__k
estimator__sel__score_func
estimator__svr__C
estimator__svr__cache_size
estimator__svr__coef0
estimator__svr__degree
estimator__svr__epsilon
estimator__svr__gamma
estimator__svr__kernel
estimator__svr__max_iter
estimator__svr__shrinking
estimator__svr__tol
estimator__svr__verbose
estimator
n_jobs
param_grid
pre_dispatch
refit
return_train_score
scoring
verbose
Fitting 5 folds for each of 405 candidates, totalling 2025 fits
But when I change the line:
pipeline = Pipeline(steps=[('sel',fs), ('svr', model)])
pipeline = Pipeline(steps=[('estimator__sel',fs), ('estimator__svr', model)])
I get the error:
ValueError: Estimator names must not contain __: got ['estimator__sel', 'estimator__svr']
Could someone explain what I'm doing wrong, i.e. how do I combine the pipeline/feature selection step into the GridSearchCV?
As a side note, if I comment out pipeline
in the GridSearchCV, and uncomment estimator=SVR(kernal='rbf')
, the cell runs without issue, but in that case, I presume I am not incorporating the feature selection in, as it's not called anywhere. I have seen some previous SO questions, e.g. here, but they don't seem to answer this specific question.
Is there a cleaner way to write this?
The first error message is about the pipeline
parameters, not the search
parameters, and indicates that your param_grid
is bad, not the pipeline step names. Running pipeline.get_params().keys()
should show you the right parameter names. Your grid should be:
param_grid={
'svr__C': [0.1, 1, 10, 100, 1000],
'svr__epsilon': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 1, 5, 10],
'svr__gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 1, 5, 10]
I don't know how substituting the plain SVR for the pipeline runs; your parameter grid doesn't specify the right things there either...
–
–
–
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.