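The figure above is taken from the paper. As it shows, data balancing is optional, and missing values can be imputed with either the median or the mean… The last "pipe", the estimator, is a special one: it must not only "choose which kind of pipe to use" (algorithm selection) but also know "how far to turn the valve on that pipe" (hyperparameter selection).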
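To make the "pipe and valve" picture concrete, here is a minimal sketch in plain Python. It is not how autosklearn represents its space; the `SEARCH_SPACE` dictionary, the candidate names under `balancing` and the function `sample_pipeline` are assumptions made for illustration. The two integer ranges are borrowed from the configuration space printed later in this post.

```python
import random

# Hypothetical, heavily simplified search space: each pipeline step maps
# candidate names to the hyperparameter ranges of that candidate.
SEARCH_SPACE = {
    "balancing": {"none": {}, "weighting": {}},
    "imputation": {"mean": {}, "median": {}},
    "estimator": {
        "random_forest": {"min_samples_leaf": (1, 20)},
        "k_nearest_neighbors": {"n_neighbors": (1, 100)},
    },
}


def sample_pipeline(space, rng):
    """Pick one candidate per step, then one value per hyperparameter."""
    config = {}
    for step, candidates in space.items():
        choice = rng.choice(sorted(candidates))  # algorithm selection
        config[step] = choice
        for name, (low, high) in candidates[choice].items():
            # Hyperparameter selection, conditional on the chosen algorithm.
            config[f"{step}:{choice}:{name}"] = rng.randint(low, high)
    return config


config = sample_pipeline(SEARCH_SPACE, random.Random(0))
print(config)
```

Only the hyperparameters of the chosen estimator appear in the result, which is the same idea as the conditions listed in the printed configuration spaces further down.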
Creating the search space
self.configuration_space, configspace_path = self._create_search_space(
Jump to: autosklearn.automl.AutoML#_create_search_space
There we see configuration_space = pipeline.get_configuration_space(
Jump to: autosklearn.util.pipeline.get_configuration_space
In this function, once the info dictionary has been set up, the final block of code is:
if info['task'] in REGRESSION_TASKS:
    return _get_regression_configuration_space(info, include, exclude)
else:
    return _get_classification_configuration_space(info, include, exclude)
- Jump to: autosklearn.util.pipeline._get_classification_configuration_space
The final block of code:
return SimpleClassificationPipeline(
    dataset_properties=dataset_properties,
    include=include, exclude=exclude).\
    get_hyperparameter_search_space()
- Jump to: autosklearn.pipeline.base.BasePipeline#get_hyperparameter_search_space
The final block of code:
if not hasattr(self, 'config_space') or self.config_space is None:
    self.config_space = self._get_hyperparameter_search_space(
        include=self.include_, exclude=self.exclude_,
        dataset_properties=self.dataset_properties_)
return self.config_space
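The guard in this snippet is a lazy-initialization pattern: the search space is built on the first call and reused afterwards. A minimal sketch of the same pattern follows; the class `LazySpaceOwner`, its `build_calls` counter and `_build_search_space` are hypothetical stand-ins, not autosklearn code.

```python
class LazySpaceOwner:
    """Hypothetical class that builds its search space at most once."""

    def __init__(self):
        self.config_space = None
        self.build_calls = 0

    def _build_search_space(self):
        # Stand-in for the expensive construction step.
        self.build_calls += 1
        return {"__choice__": ["no_encoding", "one_hot_encoding"]}

    def get_hyperparameter_search_space(self):
        # Same guard as in the snippet: build only if the space is missing.
        if not hasattr(self, "config_space") or self.config_space is None:
            self.config_space = self._build_search_space()
        return self.config_space


owner = LazySpaceOwner()
first = owner.get_hyperparameter_search_space()
second = owner.get_hyperparameter_search_space()
print(first is second, owner.build_calls)
```

Both calls return the same object, and the builder runs only once.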
- Jump to: autosklearn.pipeline.classification.SimpleClassificationPipeline#_get_hyperparameter_search_space
After many jumps and stack frames, we have finally reached the part with the most substance.
We see the following code:
cs = self._get_base_search_space(
    cs=cs, dataset_properties=dataset_properties,
    exclude=exclude, include=include, pipeline=self.steps)
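Note that self.steps here denotes all the nodes of the Pipeline that autosklearn wants to optimize.
- Jump to: autosklearn.pipeline.base.BasePipeline#_get_base_search_space
Here matches has to be obtained first, so we want to know where matches comes from:
- Jump to: autosklearn.pipeline.create_searchspace_util.get_match_array
Inside the loop for node_name, node in pipeline:, an important variable is built: node_i_choices. It is a two-dimensional list. In the default setup its first dimension has length 7, one entry for each of the 7 Pipeline nodes, and each sub-list holds all the options available for that node.
I take the first four as examples: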
node_i_choices[0]
Out[16]:
[autosklearn.pipeline.components.data_preprocessing.one_hot_encoding.no_encoding.NoEncoding,
autosklearn.pipeline.components.data_preprocessing.one_hot_encoding.one_hot_encoding.OneHotEncoder]
node_i_choices[1]
Out[17]: [Imputation(random_state=None, strategy='median')]
node_i_choices[2]
Out[18]: [VarianceThreshold(random_state=None)]
node_i_choices[3]
Out[19]:
[autosklearn.pipeline.components.data_preprocessing.rescaling.minmax.MinMaxScalerComponent,
autosklearn.pipeline.components.data_preprocessing.rescaling.none.NoRescalingComponent,
autosklearn.pipeline.components.data_preprocessing.rescaling.normalize.NormalizerComponent,
autosklearn.pipeline.components.data_preprocessing.rescaling.quantile_transformer.QuantileTransformerComponent,
autosklearn.pipeline.components.data_preprocessing.rescaling.robust_scaler.RobustScalerComponent,
autosklearn.pipeline.components.data_preprocessing.rescaling.standardize.StandardScalerComponent]
Next, matches_dimensions records the length of each sub-list and is used to build a high-dimensional tensor, matches:
matches_dimensions
Out[20]: [2, 1, 1, 6, 1, 15, 15]
matches = np.ones(matches_dimensions, dtype=int)
pipeline_idxs = [range(dim) for dim in matches_dimensions]
for pipeline_instantiation_idxs in itertools.product(*pipeline_idxs):
This loop can be read as iterating over every possible instantiation of the Pipeline.
pipeline_instantiation_idxs is the coordinate of one particular Pipeline in matches:
pipeline_instantiation_idxs
Out[25]: (0, 0, 0, 0, 0, 0, 0)
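To get a feel for how many coordinates this loop visits, the enumeration can be reproduced with the standard library alone. This is a small sketch that reuses the `matches_dimensions` values printed above; the names `all_coordinates`, `total`, `first_coordinate` and `last_coordinate` are mine.

```python
import itertools
import math

matches_dimensions = [2, 1, 1, 6, 1, 15, 15]  # lengths printed above

# One range per pipeline node; the Cartesian product yields every coordinate.
pipeline_idxs = [range(dim) for dim in matches_dimensions]
all_coordinates = list(itertools.product(*pipeline_idxs))

total = len(all_coordinates)
first_coordinate = all_coordinates[0]
last_coordinate = all_coordinates[-1]
print(total, first_coordinate, last_coordinate)
```

The total equals the product of the dimensions, 2 × 6 × 15 × 15 = 2700, and the first coordinate is the all-zero tuple shown in Out[25].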
node_input = node.get_properties()['input']
node_output = node.get_properties()['output']
node_input
Out[26]: (5, 6, 10)
node_output
Out[27]: (8,)
This operation is puzzling at first glance. Jumping into the get_properties function, we see:
'input': (DENSE, SPARSE, UNSIGNED_DATA),
'output': (PREDICTIONS,)}
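These presumably declare which kinds of data the node accepts as input and what it produces as output. Comparing with the values printed above, the constants appear to be DENSE = 5, SPARSE = 6, UNSIGNED_DATA = 10 and PREDICTIONS = 8.
First, the code checks whether the node can handle the data's sparse or dense format: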
# First check if these two instantiations of this node can work
# together. Do this in multiple if statements to maintain
# readability
if (data_is_sparse and SPARSE not in node_input) or \
        not data_is_sparse and DENSE not in node_input:
    matches[pipeline_instantiation_idxs] = 0
    break
# No need to check if the node can handle SIGNED_DATA; this is
# always assumed to be true
elif not dataset_is_signed and UNSIGNED_DATA not in node_input:
    matches[pipeline_instantiation_idxs] = 0
    break
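The remaining operations are similar: in short, they check whether this Pipeline is valid. The source code is quite sophisticated, so I skip it for now.
Finally, matches is returned.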
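The two checks quoted above can be tried out on a toy pipeline. The sketch below is a simplification under stated assumptions: the components (`scaler_a`, `model_b`, and so on) are invented, the tags are plain strings rather than autosklearn's integer constants, a dictionary replaces the NumPy tensor, and the part of the real function that updates the sparsity flag from each node's output is left out.

```python
import itertools

DENSE, SPARSE, UNSIGNED_DATA = "dense", "sparse", "unsigned"  # placeholder tags

# Hypothetical candidates: two pipeline nodes with two choices each.
# Every choice lists the kinds of input it accepts.
node_choices = [
    [("scaler_a", {DENSE, SPARSE, UNSIGNED_DATA}), ("scaler_b", {DENSE, UNSIGNED_DATA})],
    [("model_a", {DENSE, SPARSE, UNSIGNED_DATA}), ("model_b", {DENSE, SPARSE})],
]


def get_matches(node_choices, data_is_sparse, dataset_is_signed):
    """Return {coordinate: 0 or 1} for every combination of choices."""
    dims = [range(len(choices)) for choices in node_choices]
    matches = {}
    for idxs in itertools.product(*dims):
        matches[idxs] = 1
        for node_idx, choice_idx in enumerate(idxs):
            _, node_input = node_choices[node_idx][choice_idx]
            wanted = SPARSE if data_is_sparse else DENSE
            if wanted not in node_input:
                matches[idxs] = 0  # node cannot read this data format
                break
            if not dataset_is_signed and UNSIGNED_DATA not in node_input:
                matches[idxs] = 0  # node cannot handle unsigned data
                break
    return matches


matches = get_matches(node_choices, data_is_sparse=True, dataset_is_signed=False)
print(matches)
```

For sparse, unsigned data only the combination (0, 0) survives: `scaler_b` does not accept sparse input and `model_b` does not accept unsigned data.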
- Return to: autosklearn.pipeline.base.BasePipeline#_get_base_search_space:293
if not is_choice:
    cs.add_configuration_space(node_name,
        node.get_hyperparameter_search_space(dataset_properties))
# If the node is a choice, we have to figure out which of its
# choices are actually legal choices
else:
    choices_list = \
        autosklearn.pipeline.create_searchspace_util.find_active_choices(
            matches, node, node_idx,
            dataset_properties,
            include.get(node_name),
            exclude.get(node_name))
    sub_config_space = node.get_hyperparameter_search_space(
        dataset_properties, include=choices_list)
    cs.add_configuration_space(node_name, sub_config_space)
If the node is a choice node, execution enters the else branch, and choices_list holds all of its remaining candidates:
choices_list
Out[29]: ['no_encoding', 'one_hot_encoding']
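One way to read find_active_choices is: a candidate of a node stays active if at least one legal Pipeline in matches uses it. The sketch below illustrates that idea on a dictionary-based match table. It is not the library's implementation; the function name `find_active_choice_indices` and the 2 × 3 table are made up for the example.

```python
def find_active_choice_indices(matches, node_idx, n_choices):
    """Keep choice i of node `node_idx` if some legal pipeline uses it."""
    active = []
    for choice_idx in range(n_choices):
        if any(ok and coord[node_idx] == choice_idx for coord, ok in matches.items()):
            active.append(choice_idx)
    return active


# Hypothetical 2 x 3 match table: node 0 has 2 choices, node 1 has 3.
matches = {
    (0, 0): 1, (0, 1): 0, (0, 2): 0,
    (1, 0): 1, (1, 1): 0, (1, 2): 1,
}

active_node0 = find_active_choice_indices(matches, node_idx=0, n_choices=2)
active_node1 = find_active_choice_indices(matches, node_idx=1, n_choices=3)
print(active_node0, active_node1)
```

Here choice 1 of node 1 is dropped because no legal combination contains it, while both choices of node 0 remain.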
Let us print it as well:
sub_config_space
Out[30]:
Configuration space object:
Hyperparameters:
__choice__, Type: Categorical, Choices: {no_encoding, one_hot_encoding}, Default: one_hot_encoding
one_hot_encoding:minimum_fraction, Type: UniformFloat, Range: [0.0001, 0.5], Default: 0.01, on log-scale
one_hot_encoding:use_minimum_fraction, Type: Categorical, Choices: {True, False}, Default: True
Conditions:
one_hot_encoding:minimum_fraction | one_hot_encoding:use_minimum_fraction == 'True'
one_hot_encoding:use_minimum_fraction | __choice__ == 'one_hot_encoding'
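Each line under Conditions reads "child | parent == value": the child hyperparameter is only active when its parent takes that value, and the parent may itself be conditional. A minimal sketch of this activation rule, using the two conditions printed above, is given below. The `CONDITIONS` dictionary layout and the function `active_hyperparameters` are my own illustration, not the ConfigSpace API.

```python
# Conditions copied from the printed space: child -> (parent, required value).
CONDITIONS = {
    "one_hot_encoding:use_minimum_fraction": ("__choice__", "one_hot_encoding"),
    "one_hot_encoding:minimum_fraction": ("one_hot_encoding:use_minimum_fraction", "True"),
}


def active_hyperparameters(values, conditions):
    """Return the names that are active under `values`, following parent chains."""
    def is_active(name):
        if name not in conditions:
            return True  # unconditional hyperparameter
        parent, required = conditions[name]
        return is_active(parent) and values.get(parent) == required

    return sorted(name for name in values if is_active(name))


no_encoding = {"__choice__": "no_encoding",
               "one_hot_encoding:use_minimum_fraction": "True",
               "one_hot_encoding:minimum_fraction": 0.01}
with_threshold = dict(no_encoding, __choice__="one_hot_encoding")
without_threshold = dict(with_threshold, **{"one_hot_encoding:use_minimum_fraction": "False"})

print(active_hyperparameters(no_encoding, CONDITIONS))
print(active_hyperparameters(with_threshold, CONDITIONS))
print(active_hyperparameters(without_threshold, CONDITIONS))
```

With no_encoding selected, only `__choice__` is active; minimum_fraction becomes active only when one_hot_encoding is chosen and use_minimum_fraction is 'True'.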
Now let us print the feature preprocessing part:
Configuration space object:
Hyperparameters:
__choice__, Type: Categorical, Choices: {extra_trees_preproc_for_classification, fast_ica, feature_agglomeration, kernel_pca, kitchen_sinks, liblinear_svc_preprocessor, no_preprocessing, nystroem_sampler, pca, polynomial, random_trees_embedding, select_percentile_classification, select_rates}, Default: no_preprocessing
extra_trees_preproc_for_classification:bootstrap, Type: Categorical, Choices: {True, False}, Default: False
extra_trees_preproc_for_classification:criterion, Type: Categorical, Choices: {gini, entropy}, Default: gini
extra_trees_preproc_for_classification:max_depth, Type: Constant, Value: None
extra_trees_preproc_for_classification:max_features, Type: UniformFloat, Range: [0.0, 1.0], Default: 0.5
extra_trees_preproc_for_classification:max_leaf_nodes, Type: Constant, Value: None
extra_trees_preproc_for_classification:min_impurity_decrease, Type: Constant, Value: 0.0
extra_trees_preproc_for_classification:min_samples_leaf, Type: UniformInteger, Range: [1, 20], Default: 1
extra_trees_preproc_for_classification:min_samples_split, Type: UniformInteger, Range: [2, 20], Default: 2
extra_trees_preproc_for_classification:min_weight_fraction_leaf, Type: Constant, Value: 0.0
extra_trees_preproc_for_classification:n_estimators, Type: Constant, Value: 100
fast_ica:algorithm, Type: Categorical, Choices: {parallel, deflation}, Default: parallel
fast_ica:fun, Type: Categorical, Choices: {logcosh, exp, cube}, Default: logcosh
fast_ica:n_components, Type: UniformInteger, Range: [10, 2000], Default: 100
fast_ica:whiten, Type: Categorical, Choices: {False, True}, Default: False
feature_agglomeration:affinity, Type: Categorical, Choices: {euclidean, manhattan, cosine}, Default: euclidean
feature_agglomeration:linkage, Type: Categorical, Choices: {ward, complete, average}, Default: ward
feature_agglomeration:n_clusters, Type: UniformInteger, Range: [2, 400], Default: 25
feature_agglomeration:pooling_func, Type: Categorical, Choices: {mean, median, max}, Default: mean
kernel_pca:coef0, Type: UniformFloat, Range: [-1.0, 1.0], Default: 0.0
kernel_pca:degree, Type: UniformInteger, Range: [2, 5], Default: 3
kernel_pca:gamma, Type: UniformFloat, Range: [3.0517578125e-05, 8.0], Default: 1.0, on log-scale
kernel_pca:kernel, Type: Categorical, Choices: {poly, rbf, sigmoid, cosine}, Default: rbf
kernel_pca:n_components, Type: UniformInteger, Range: [10, 2000], Default: 100
kitchen_sinks:gamma, Type: UniformFloat, Range: [3.0517578125e-05, 8.0], Default: 1.0, on log-scale
kitchen_sinks:n_components, Type: UniformInteger, Range: [50, 10000], Default: 100, on log-scale
liblinear_svc_preprocessor:C, Type: UniformFloat, Range: [0.03125, 32768.0], Default: 1.0, on log-scale
liblinear_svc_preprocessor:dual, Type: Constant, Value: False
liblinear_svc_preprocessor:fit_intercept, Type: Constant, Value: True
liblinear_svc_preprocessor:intercept_scaling, Type: Constant, Value: 1
liblinear_svc_preprocessor:loss, Type: Categorical, Choices: {hinge, squared_hinge}, Default: squared_hinge
liblinear_svc_preprocessor:multi_class, Type: Constant, Value: ovr
liblinear_svc_preprocessor:penalty, Type: Constant, Value: l1
liblinear_svc_preprocessor:tol, Type: UniformFloat, Range: [1e-05, 0.1], Default: 0.0001, on log-scale
nystroem_sampler:coef0, Type: UniformFloat, Range: [-1.0, 1.0], Default: 0.0
nystroem_sampler:degree, Type: UniformInteger, Range: [2, 5], Default: 3
nystroem_sampler:gamma, Type: UniformFloat, Range: [3.0517578125e-05, 8.0], Default: 0.1, on log-scale
nystroem_sampler:kernel, Type: Categorical, Choices: {poly, rbf, sigmoid, cosine}, Default: rbf
nystroem_sampler:n_components, Type: UniformInteger, Range: [50, 10000], Default: 100, on log-scale
pca:keep_variance, Type: UniformFloat, Range: [0.5, 0.9999], Default: 0.9999
pca:whiten, Type: Categorical, Choices: {False, True}, Default: False
polynomial:degree, Type: UniformInteger, Range: [2, 3], Default: 2
polynomial:include_bias, Type: Categorical, Choices: {True, False}, Default: True
polynomial:interaction_only, Type: Categorical, Choices: {False, True}, Default: False
random_trees_embedding:bootstrap, Type: Categorical, Choices: {True, False}, Default: True
random_trees_embedding:max_depth, Type: UniformInteger, Range: [2, 10], Default: 5
random_trees_embedding:max_leaf_nodes, Type: Constant, Value: None
random_trees_embedding:min_samples_leaf, Type: UniformInteger, Range: [1, 20], Default: 1
random_trees_embedding:min_samples_split, Type: UniformInteger, Range: [2, 20], Default: 2
random_trees_embedding:min_weight_fraction_leaf, Type: Constant, Value: 1.0
random_trees_embedding:n_estimators, Type: UniformInteger, Range: [10, 100], Default: 10
select_percentile_classification:percentile, Type: UniformFloat, Range: [1.0, 99.0], Default: 50.0
select_percentile_classification:score_func, Type: Categorical, Choices: {chi2, f_classif, mutual_info}, Default: chi2
select_rates:alpha, Type: UniformFloat, Range: [0.01, 0.5], Default: 0.1
select_rates:mode, Type: Categorical, Choices: {fpr, fdr, fwe}, Default: fpr
select_rates:score_func, Type: Categorical, Choices: {chi2, f_classif}, Default: chi2
Conditions:
extra_trees_preproc_for_classification:bootstrap | __choice__ == 'extra_trees_preproc_for_classification'
extra_trees_preproc_for_classification:criterion | __choice__ == 'extra_trees_preproc_for_classification'
extra_trees_preproc_for_classification:max_depth | __choice__ == 'extra_trees_preproc_for_classification'
extra_trees_preproc_for_classification:max_features | __choice__ == 'extra_trees_preproc_for_classification'
extra_trees_preproc_for_classification:max_leaf_nodes | __choice__ == 'extra_trees_preproc_for_classification'
extra_trees_preproc_for_classification:min_impurity_decrease | __choice__ == 'extra_trees_preproc_for_classification'
extra_trees_preproc_for_classification:min_samples_leaf | __choice__ == 'extra_trees_preproc_for_classification'
extra_trees_preproc_for_classification:min_samples_split | __choice__ == 'extra_trees_preproc_for_classification'
extra_trees_preproc_for_classification:min_weight_fraction_leaf | __choice__ == 'extra_trees_preproc_for_classification'
extra_trees_preproc_for_classification:n_estimators | __choice__ == 'extra_trees_preproc_for_classification'
fast_ica:algorithm | __choice__ == 'fast_ica'
fast_ica:fun | __choice__ == 'fast_ica'
fast_ica:n_components | fast_ica:whiten == 'True'
fast_ica:whiten | __choice__ == 'fast_ica'
feature_agglomeration:affinity | __choice__ == 'feature_agglomeration'
feature_agglomeration:linkage | __choice__ == 'feature_agglomeration'
feature_agglomeration:n_clusters | __choice__ == 'feature_agglomeration'
feature_agglomeration:pooling_func | __choice__ == 'feature_agglomeration'
kernel_pca:degree | kernel_pca:kernel == 'poly'
kernel_pca:kernel | __choice__ == 'kernel_pca'
kernel_pca:n_components | __choice__ == 'kernel_pca'
kitchen_sinks:gamma | __choice__ == 'kitchen_sinks'
kitchen_sinks:n_components | __choice__ == 'kitchen_sinks'
liblinear_svc_preprocessor:C | __choice__ == 'liblinear_svc_preprocessor'
liblinear_svc_preprocessor:dual | __choice__ == 'liblinear_svc_preprocessor'
liblinear_svc_preprocessor:fit_intercept | __choice__ == 'liblinear_svc_preprocessor'
liblinear_svc_preprocessor:intercept_scaling | __choice__ == 'liblinear_svc_preprocessor'
liblinear_svc_preprocessor:loss | __choice__ == 'liblinear_svc_preprocessor'
liblinear_svc_preprocessor:multi_class | __choice__ == 'liblinear_svc_preprocessor'
liblinear_svc_preprocessor:penalty | __choice__ == 'liblinear_svc_preprocessor'
liblinear_svc_preprocessor:tol | __choice__ == 'liblinear_svc_preprocessor'
nystroem_sampler:degree | nystroem_sampler:kernel == 'poly'
nystroem_sampler:kernel | __choice__ == 'nystroem_sampler'
nystroem_sampler:n_components | __choice__ == 'nystroem_sampler'
pca:keep_variance | __choice__ == 'pca'
pca:whiten | __choice__ == 'pca'
polynomial:degree | __choice__ == 'polynomial'
polynomial:include_bias | __choice__ == 'polynomial'
polynomial:interaction_only | __choice__ == 'polynomial'
preprocessor:kernel_pca:coef0 | preprocessor:kernel_pca:kernel in {'poly', 'sigmoid'}
preprocessor:kernel_pca:gamma | preprocessor:kernel_pca:kernel in {'poly', 'rbf'}
preprocessor:nystroem_sampler:coef0 | preprocessor:nystroem_sampler:kernel in {'poly', 'sigmoid'}
preprocessor:nystroem_sampler:gamma | preprocessor:nystroem_sampler:kernel in {'poly', 'rbf', 'sigmoid'}
random_trees_embedding:bootstrap | __choice__ == 'random_trees_embedding'
random_trees_embedding:max_depth | __choice__ == 'random_trees_embedding'
random_trees_embedding:max_leaf_nodes | __choice__ == 'random_trees_embedding'
random_trees_embedding:min_samples_leaf | __choice__ == 'random_trees_embedding'
random_trees_embedding:min_samples_split | __choice__ == 'random_trees_embedding'
random_trees_embedding:min_weight_fraction_leaf | __choice__ == 'random_trees_embedding'
random_trees_embedding:n_estimators | __choice__ == 'random_trees_embedding'
select_percentile_classification:percentile | __choice__ == 'select_percentile_classification'
select_percentile_classification:score_func | __choice__ == 'select_percentile_classification'
select_rates:alpha | __choice__ == 'select_rates'
select_rates:mode | __choice__ == 'select_rates'
select_rates:score_func | __choice__ == 'select_rates'
Forbidden Clauses:
(Forbidden: preprocessor:feature_agglomeration:affinity in {'cosine', 'manhattan'} && Forbidden: preprocessor:feature_agglomeration:linkage == 'ward')
(Forbidden: preprocessor:liblinear_svc_preprocessor:penalty == 'l1' && Forbidden: preprocessor:liblinear_svc_preprocessor:loss == 'hinge')
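Each forbidden clause above is a conjunction: a configuration is rejected only when every part of the clause holds at the same time. The sketch below checks configurations against the two clauses just printed. The `FORBIDDEN` structure and `is_forbidden` are hypothetical helpers, and the `preprocessor:` prefix is dropped from the names for brevity.

```python
# Each clause is a conjunction: a configuration is illegal if ALL parts hold.
FORBIDDEN = [
    {"feature_agglomeration:affinity": {"cosine", "manhattan"},
     "feature_agglomeration:linkage": {"ward"}},
    {"liblinear_svc_preprocessor:penalty": {"l1"},
     "liblinear_svc_preprocessor:loss": {"hinge"}},
]


def is_forbidden(config, clauses):
    """True if any clause has every one of its parts satisfied by `config`."""
    return any(
        all(config.get(name) in banned for name, banned in clause.items())
        for clause in clauses
    )


ok = {"feature_agglomeration:affinity": "euclidean",
      "feature_agglomeration:linkage": "ward"}
bad = {"feature_agglomeration:affinity": "cosine",
       "feature_agglomeration:linkage": "ward"}
print(is_forbidden(ok, FORBIDDEN), is_forbidden(bad, FORBIDDEN))
```

Ward linkage with euclidean affinity passes, while ward linkage with cosine affinity is rejected, matching the first clause.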
Now let us print the model hyperparameter part:
Configuration space object:
Hyperparameters:
__choice__, Type: Categorical, Choices: {adaboost, bernoulli_nb, decision_tree, extra_trees, gaussian_nb, gradient_boosting, k_nearest_neighbors, lda, liblinear_svc, libsvm_svc, multinomial_nb, passive_aggressive, qda, random_forest, sgd}, Default: random_forest
adaboost:algorithm, Type: Categorical, Choices: {SAMME.R, SAMME}, Default: SAMME.R
adaboost:learning_rate, Type: UniformFloat, Range: [0.01, 2.0], Default: 0.1, on log-scale
adaboost:max_depth, Type: UniformInteger, Range: [1, 10], Default: 1
adaboost:n_estimators, Type: UniformInteger, Range: [50, 500], Default: 50
bernoulli_nb:alpha, Type: UniformFloat, Range: [0.01, 100.0], Default: 1.0, on log-scale
bernoulli_nb:fit_prior, Type: Categorical, Choices: {True, False}, Default: True
decision_tree:criterion, Type: Categorical, Choices: {gini, entropy}, Default: gini
decision_tree:max_depth_factor, Type: UniformFloat, Range: [0.0, 2.0], Default: 0.5
decision_tree:max_features, Type: Constant, Value: 1.0
decision_tree:max_leaf_nodes, Type: Constant, Value: None
decision_tree:min_impurity_decrease, Type: Constant, Value: 0.0
decision_tree:min_samples_leaf, Type: UniformInteger, Range: [1, 20], Default: 1
decision_tree:min_samples_split, Type: UniformInteger, Range: [2, 20], Default: 2
decision_tree:min_weight_fraction_leaf, Type: Constant, Value: 0.0
extra_trees:bootstrap, Type: Categorical, Choices: {True, False}, Default: False
extra_trees:criterion, Type: Categorical, Choices: {gini, entropy}, Default: gini
extra_trees:max_depth, Type: Constant, Value: None
extra_trees:max_features, Type: UniformFloat, Range: [0.0, 1.0], Default: 0.5
extra_trees:max_leaf_nodes, Type: Constant, Value: None
extra_trees:min_impurity_decrease, Type: Constant, Value: 0.0
extra_trees:min_samples_leaf, Type: UniformInteger, Range: [1, 20], Default: 1
extra_trees:min_samples_split, Type: UniformInteger, Range: [2, 20], Default: 2
extra_trees:min_weight_fraction_leaf, Type: Constant, Value: 0.0
extra_trees:n_estimators, Type: Constant, Value: 100
gradient_boosting:early_stop, Type: Categorical, Choices: {off, train, valid}, Default: off
gradient_boosting:l2_regularization, Type: UniformFloat, Range: [1e-10, 1.0], Default: 1e-10, on log-scale
gradient_boosting:learning_rate, Type: UniformFloat, Range: [0.01, 1.0], Default: 0.1, on log-scale
gradient_boosting:loss, Type: Constant, Value: auto
gradient_boosting:max_bins, Type: Constant, Value: 256
gradient_boosting:max_depth, Type: Constant, Value: None
gradient_boosting:max_iter, Type: UniformInteger, Range: [32, 512], Default: 100
gradient_boosting:max_leaf_nodes, Type: UniformInteger, Range: [3, 2047], Default: 31, on log-scale
gradient_boosting:min_samples_leaf, Type: UniformInteger, Range: [1, 200], Default: 20, on log-scale
gradient_boosting:n_iter_no_change, Type: UniformInteger, Range: [1, 20], Default: 10
gradient_boosting:scoring, Type: Constant, Value: loss
gradient_boosting:tol, Type: Constant, Value: 1e-07
gradient_boosting:validation_fraction, Type: UniformFloat, Range: [0.01, 0.4], Default: 0.1
k_nearest_neighbors:n_neighbors, Type: UniformInteger, Range: [1, 100], Default: 1, on log-scale
k_nearest_neighbors:p, Type: Categorical, Choices: {1, 2}, Default: 2
k_nearest_neighbors:weights, Type: Categorical, Choices: {uniform, distance}, Default: uniform
lda:n_components, Type: UniformInteger, Range: [1, 250], Default: 10
lda:shrinkage, Type: Categorical, Choices: {None, auto, manual}, Default: None
lda:shrinkage_factor, Type: UniformFloat, Range: [0.0, 1.0], Default: 0.5
lda:tol, Type: UniformFloat, Range: [1e-05, 0.1], Default: 0.0001, on log-scale
liblinear_svc:C, Type: UniformFloat, Range: [0.03125, 32768.0], Default: 1.0, on log-scale
liblinear_svc:dual, Type: Constant, Value: False
liblinear_svc:fit_intercept, Type: Constant, Value: True
liblinear_svc:intercept_scaling, Type: Constant, Value: 1
liblinear_svc:loss, Type: Categorical, Choices: {hinge, squared_hinge}, Default: squared_hinge
liblinear_svc:multi_class, Type: Constant, Value: ovr
liblinear_svc:penalty, Type: Categorical, Choices: {l1, l2}, Default: l2
liblinear_svc:tol, Type: UniformFloat, Range: [1e-05, 0.1], Default: 0.0001, on log-scale
libsvm_svc:C, Type: UniformFloat, Range: [0.03125, 32768.0], Default: 1.0, on log-scale
libsvm_svc:coef0, Type: UniformFloat, Range: [-1.0, 1.0], Default: 0.0
libsvm_svc:degree, Type: UniformInteger, Range: [2, 5], Default: 3
libsvm_svc:gamma, Type: UniformFloat, Range: [3.0517578125e-05, 8.0], Default: 0.1, on log-scale
libsvm_svc:kernel, Type: Categorical, Choices: {rbf, poly, sigmoid}, Default: rbf
libsvm_svc:max_iter, Type: Constant, Value: -1
libsvm_svc:shrinking, Type: Categorical, Choices: {True, False}, Default: True
libsvm_svc:tol, Type: UniformFloat, Range: [1e-05, 0.1], Default: 0.001, on log-scale
multinomial_nb:alpha, Type: UniformFloat, Range: [0.01, 100.0], Default: 1.0, on log-scale
multinomial_nb:fit_prior, Type: Categorical, Choices: {True, False}, Default: True
passive_aggressive:C, Type: UniformFloat, Range: [1e-05, 10.0], Default: 1.0, on log-scale
passive_aggressive:average, Type: Categorical, Choices: {False, True}, Default: False
passive_aggressive:fit_intercept, Type: Constant, Value: True
passive_aggressive:loss, Type: Categorical, Choices: {hinge, squared_hinge}, Default: hinge
passive_aggressive:tol, Type: UniformFloat, Range: [1e-05, 0.1], Default: 0.0001, on log-scale
qda:reg_param, Type: UniformFloat, Range: [0.0, 1.0], Default: 0.0
random_forest:bootstrap, Type: Categorical, Choices: {True, False}, Default: True
random_forest:criterion, Type: Categorical, Choices: {gini, entropy}, Default: gini
random_forest:max_depth, Type: Constant, Value: None
random_forest:max_features, Type: UniformFloat, Range: [0.0, 1.0], Default: 0.5
random_forest:max_leaf_nodes, Type: Constant, Value: None
random_forest:min_impurity_decrease, Type: Constant, Value: 0.0
random_forest:min_samples_leaf, Type: UniformInteger, Range: [1, 20], Default: 1
random_forest:min_samples_split, Type: UniformInteger, Range: [2, 20], Default: 2
random_forest:min_weight_fraction_leaf, Type: Constant, Value: 0.0
random_forest:n_estimators, Type: Constant, Value: 100
sgd:alpha, Type: UniformFloat, Range: [1e-07, 0.1], Default: 0.0001, on log-scale
sgd:average, Type: Categorical, Choices: {False, True}, Default: False
sgd:epsilon, Type: UniformFloat, Range: [1e-05, 0.1], Default: 0.0001, on log-scale
sgd:eta0, Type: UniformFloat, Range: [1e-07, 0.1], Default: 0.01, on log-scale
sgd:fit_intercept, Type: Constant, Value: True
sgd:l1_ratio, Type: UniformFloat, Range: [1e-09, 1.0], Default: 0.15, on log-scale
sgd:learning_rate, Type: Categorical, Choices: {optimal, invscaling, constant}, Default: invscaling
sgd:loss, Type: Categorical, Choices: {hinge, log, modified_huber, squared_hinge, perceptron}, Default: log
sgd:penalty, Type: Categorical, Choices: {l1, l2, elasticnet}, Default: l2
sgd:power_t, Type: UniformFloat, Range: [1e-05, 1.0], Default: 0.5
sgd:tol, Type: UniformFloat, Range: [1e-05, 0.1], Default: 0.0001, on log-scale
Conditions:
adaboost:algorithm | __choice__ == 'adaboost'
adaboost:learning_rate | __choice__ == 'adaboost'
adaboost:max_depth | __choice__ == 'adaboost'
adaboost:n_estimators | __choice__ == 'adaboost'
bernoulli_nb:alpha | __choice__ == 'bernoulli_nb'
bernoulli_nb:fit_prior | __choice__ == 'bernoulli_nb'
decision_tree:criterion | __choice__ == 'decision_tree'
decision_tree:max_depth_factor | __choice__ == 'decision_tree'
decision_tree:max_features | __choice__ == 'decision_tree'
decision_tree:max_leaf_nodes | __choice__ == 'decision_tree'
decision_tree:min_impurity_decrease | __choice__ == 'decision_tree'
decision_tree:min_samples_leaf | __choice__ == 'decision_tree'
decision_tree:min_samples_split | __choice__ == 'decision_tree'
decision_tree:min_weight_fraction_leaf | __choice__ == 'decision_tree'
extra_trees:bootstrap | __choice__ == 'extra_trees'
extra_trees:criterion | __choice__ == 'extra_trees'
extra_trees:max_depth | __choice__ == 'extra_trees'
extra_trees:max_features | __choice__ == 'extra_trees'
extra_trees:max_leaf_nodes | __choice__ == 'extra_trees'
extra_trees:min_impurity_decrease | __choice__ == 'extra_trees'
extra_trees:min_samples_leaf | __choice__ == 'extra_trees'
extra_trees:min_samples_split | __choice__ == 'extra_trees'
extra_trees:min_weight_fraction_leaf | __choice__ == 'extra_trees'
extra_trees:n_estimators | __choice__ == 'extra_trees'
gradient_boosting:early_stop | __choice__ == 'gradient_boosting'
gradient_boosting:l2_regularization | __choice__ == 'gradient_boosting'
gradient_boosting:learning_rate | __choice__ == 'gradient_boosting'
gradient_boosting:loss | __choice__ == 'gradient_boosting'
gradient_boosting:max_bins | __choice__ == 'gradient_boosting'
gradient_boosting:max_depth | __choice__ == 'gradient_boosting'
gradient_boosting:max_iter | __choice__ == 'gradient_boosting'
gradient_boosting:max_leaf_nodes | __choice__ == 'gradient_boosting'
gradient_boosting:min_samples_leaf | __choice__ == 'gradient_boosting'
gradient_boosting:n_iter_no_change | gradient_boosting:early_stop in {'valid', 'train'}
gradient_boosting:scoring | __choice__ == 'gradient_boosting'
gradient_boosting:tol | __choice__ == 'gradient_boosting'
gradient_boosting:validation_fraction | gradient_boosting:early_stop == 'valid'
k_nearest_neighbors:n_neighbors | __choice__ == 'k_nearest_neighbors'
k_nearest_neighbors:p | __choice__ == 'k_nearest_neighbors'
k_nearest_neighbors:weights | __choice__ == 'k_nearest_neighbors'
lda:n_components | __choice__ == 'lda'
lda:shrinkage | __choice__ == 'lda'
lda:shrinkage_factor | lda:shrinkage == 'manual'
lda:tol | __choice__ == 'lda'
liblinear_svc:C | __choice__ == 'liblinear_svc'
liblinear_svc:dual | __choice__ == 'liblinear_svc'
liblinear_svc:fit_intercept | __choice__ == 'liblinear_svc'
liblinear_svc:intercept_scaling | __choice__ == 'liblinear_svc'
liblinear_svc:loss | __choice__ == 'liblinear_svc'
liblinear_svc:multi_class | __choice__ == 'liblinear_svc'
liblinear_svc:penalty | __choice__ == 'liblinear_svc'
liblinear_svc:tol | __choice__ == 'liblinear_svc'
libsvm_svc:C | __choice__ == 'libsvm_svc'
libsvm_svc:coef0 | libsvm_svc:kernel in {'poly', 'sigmoid'}
libsvm_svc:degree | libsvm_svc:kernel == 'poly'
libsvm_svc:gamma | __choice__ == 'libsvm_svc'
libsvm_svc:kernel | __choice__ == 'libsvm_svc'
libsvm_svc:max_iter | __choice__ == 'libsvm_svc'
libsvm_svc:shrinking | __choice__ == 'libsvm_svc'
libsvm_svc:tol | __choice__ == 'libsvm_svc'
multinomial_nb:alpha | __choice__ == 'multinomial_nb'
multinomial_nb:fit_prior | __choice__ == 'multinomial_nb'
passive_aggressive:C | __choice__ == 'passive_aggressive'
passive_aggressive:average | __choice__ == 'passive_aggressive'
passive_aggressive:fit_intercept | __choice__ == 'passive_aggressive'
passive_aggressive:loss | __choice__ == 'passive_aggressive'
passive_aggressive:tol | __choice__ == 'passive_aggressive'
qda:reg_param | __choice__ == 'qda'
random_forest:bootstrap | __choice__ == 'random_forest'
random_forest:criterion | __choice__ == 'random_forest'
random_forest:max_depth | __choice__ == 'random_forest'
random_forest:max_features | __choice__ == 'random_forest'
random_forest:max_leaf_nodes | __choice__ == 'random_forest'
random_forest:min_impurity_decrease | __choice__ == 'random_forest'
random_forest:min_samples_leaf | __choice__ == 'random_forest'
random_forest:min_samples_split | __choice__ == 'random_forest'
random_forest:min_weight_fraction_leaf | __choice__ == 'random_forest'
random_forest:n_estimators | __choice__ == 'random_forest'
sgd:alpha | __choice__ == 'sgd'
sgd:average | __choice__ == 'sgd'
sgd:epsilon | sgd:loss == 'modified_huber'
sgd:eta0 | sgd:learning_rate in {'invscaling', 'constant'}
sgd:fit_intercept | __choice__ == 'sgd'
sgd:l1_ratio | sgd:penalty == 'elasticnet'
sgd:learning_rate | __choice__ == 'sgd'
sgd:loss | __choice__ == 'sgd'
sgd:penalty | __choice__ == 'sgd'
sgd:power_t | sgd:learning_rate == 'invscaling'
sgd:tol | __choice__ == 'sgd'
Forbidden Clauses:
(Forbidden: liblinear_svc:penalty == 'l1' && Forbidden: liblinear_svc:loss == 'hinge')
(Forbidden: liblinear_svc:dual == 'False' && Forbidden: liblinear_svc:penalty == 'l2' && Forbidden: liblinear_svc:loss == 'hinge')
(Forbidden: liblinear_svc:dual == 'False' && Forbidden: liblinear_svc:penalty == 'l1')
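Many of the ranges above are marked "on log-scale". As I understand it, this means values are drawn uniformly in log space, so each order of magnitude is equally likely. The sketch below illustrates the effect with the range of liblinear_svc:C; `sample_log_uniform` is a hypothetical helper written for this post, not a ConfigSpace function.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw uniformly in log space, then map back to the original scale."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
# Range of liblinear_svc:C as printed above: [0.03125, 32768.0].
samples = [sample_log_uniform(0.03125, 32768.0, rng) for _ in range(10_000)]
share_below_one = sum(s < 1.0 for s in samples) / len(samples)
print(round(share_below_one, 2))
```

The range spans 2^-5 to 2^15, so roughly a quarter of the samples should fall below 1.0. Sampling uniformly on the raw scale would put almost none there.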
At this point, we have largely worked out how the hyperparameter search space is constructed.