Concatenating multiple feature extraction methods
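In many real-world cases there are several ways to extract features from a dataset, and combining several of them often gives better performance. This example shows how to use FeatureUnion to combine features obtained by PCA with features obtained by univariate selection.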
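FeatureUnion fits each of its transformers on the same input and concatenates their outputs column-wise. The sketch below, assuming the iris data and the same PCA and SelectKBest settings used later in this example, checks that the union's output matches stacking the two transformers' outputs side by side with `np.hstack`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion

X, y = load_iris(return_X_y=True)

# The union: 2 PCA components followed by the single best original feature.
union = FeatureUnion([("pca", PCA(n_components=2)),
                      ("univ_select", SelectKBest(k=1))])
X_union = union.fit_transform(X, y)

# The same two transformers fitted independently, outputs stacked in the same order.
X_pca = PCA(n_components=2).fit_transform(X)
X_sel = SelectKBest(k=1).fit_transform(X, y)
X_manual = np.hstack([X_pca, X_sel])

print(X_union.shape)                    # (150, 3): 2 PCA columns + 1 selected column
print(np.allclose(X_union, X_manual))   # True: the union is a column-wise concatenation
```

The column order follows the order of the `transformer_list`, so the PCA columns come first here.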
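The advantage of combining features with this transformer is that the whole process, feature extraction included, can be cross-validated and grid-searched.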
The combination used here is not particularly helpful on this dataset; it only serves to illustrate how FeatureUnion is used.
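Because the union sits inside a Pipeline, cross-validation refits PCA and SelectKBest on each training fold, so the held-out fold never influences feature extraction. The sketch below is one way to check both points above: it cross-validates the union pipeline and, as an assumed baseline not present in the original example, the same linear SVC on the four raw iris features. No scores are quoted here; run it to see how close the two are.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Feature extraction and classifier in one estimator: every CV fold refits
# PCA and SelectKBest on that fold's training split only.
union_pipeline = Pipeline([
    ("features", FeatureUnion([("pca", PCA(n_components=2)),
                               ("univ_select", SelectKBest(k=1))])),
    ("svm", SVC(kernel="linear")),
])

# Assumed baseline for comparison: the same classifier on the raw features.
baseline = SVC(kernel="linear")

union_scores = cross_val_score(union_pipeline, X, y, cv=5)
baseline_scores = cross_val_score(baseline, X, y, cv=5)
print(f"FeatureUnion pipeline: {union_scores.mean():.3f}")
print(f"Raw features:          {baseline_scores.mean():.3f}")
```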
Input:
# Author: Andreas Mueller <amueller@ais.uni-bonn.de>
#
# License: BSD 3 clause
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
iris = load_iris()
X, y = iris.data, iris.target
# This dataset is way too high-dimensional. Better do PCA:
pca = PCA(n_components=2)
# Maybe some of the original features are good, too?
selection = SelectKBest(k=1)
# Build an estimator from PCA and univariate selection:
combined_features = FeatureUnion([("pca", pca), ("univ_select", selection)])
# Use the combined features to transform the dataset:
X_features = combined_features.fit(X, y).transform(X)
print("Combined space has", X_features.shape[1], "features")
svm = SVC(kernel="linear")
# Grid search over k, n_components and C:
pipeline = Pipeline([("features", combined_features), ("svm", svm)])
param_grid = dict(features__pca__n_components=[1, 2, 3],
                  features__univ_select__k=[1, 2],
                  svm__C=[0.1, 1, 10])
grid_search = GridSearchCV(pipeline, param_grid=param_grid, verbose=10)
grid_search.fit(X, y)
print(grid_search.best_estimator_)
Output:
Combined space has 3 features
Fitting 5 folds for each of 18 candidates, totalling 90 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1, score=0.933, total= 0.0s
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1, score=0.933, total= 0.0s
[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1, score=0.867, total= 0.0s
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.0s remaining: 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1, score=0.933, total= 0.0s
[Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 0.0s remaining: 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=0.1, score=1.000, total= 0.0s
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 0.0s remaining: 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1, score=0.900, total= 0.0s
[Parallel(n_jobs=1)]: Done 6 out of 6 | elapsed: 0.0s remaining: 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1, score=1.000, total= 0.0s
[Parallel(n_jobs=1)]: Done 7 out of 7 | elapsed: 0.0s remaining: 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1, score=0.867, total= 0.0s
[Parallel(n_jobs=1)]: Done 8 out of 8 | elapsed: 0.0s remaining: 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1, score=0.933, total= 0.0s
[Parallel(n_jobs=1)]: Done 9 out of 9 | elapsed: 0.1s remaining: 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=1, score=1.000, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10, score=0.933, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10, score=0.900, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10, score=0.933, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=1, features__univ_select__k=1, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1, score=0.933, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1, score=0.967, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1, score=0.933, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1, score=0.933, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=0.1, score=1.000, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1, score=0.933, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1, score=0.967, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1, score=0.933, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1, score=0.933, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=1, score=1.000, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10, score=0.967, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10, score=0.967, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10, score=0.933, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10, score=0.933, total= 0.0s
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=1, features__univ_select__k=2, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1, score=0.933, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1, score=0.867, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1, score=0.933, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=0.1, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1, score=0.967, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1, score=0.933, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1, score=0.933, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=1, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10, score=0.967, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10, score=0.967, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10, score=0.900, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10, score=0.933, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=2, features__univ_select__k=1, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1, score=0.967, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1, score=0.933, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1, score=0.933, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=0.1, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1, score=0.967, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1, score=0.933, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1, score=0.967, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=1, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10, score=0.967, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10, score=0.900, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10, score=0.933, total= 0.0s
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=2, features__univ_select__k=2, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1, score=0.933, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=0.1, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1, score=0.933, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=1, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10, score=0.933, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10
[CV] features__pca__n_components=3, features__univ_select__k=1, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1, score=0.933, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=0.1, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=1, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10, score=1.000, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10, score=0.900, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10, score=0.967, total= 0.0s
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10
[CV] features__pca__n_components=3, features__univ_select__k=2, svm__C=10, score=1.000, total= 0.0s
[Parallel(n_jobs=1)]: Done 90 out of 90 | elapsed: 0.5s finished
Pipeline(steps=[('features',
                 FeatureUnion(transformer_list=[('pca', PCA(n_components=3)),
                                                ('univ_select',
                                                 SelectKBest(k=1))])),
                ('svm', SVC(C=10, kernel='linear'))])
Total running time of the script: (0 minutes 0.477 seconds)
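The grid keys such as `features__pca__n_components` follow the `<step>__<parameter>` convention: `features` is the Pipeline step, `pca` is the transformer name inside the FeatureUnion, and `n_components` is the PCA parameter. The sketch below, assuming the same step names as the example, checks that every key is exposed by `get_params()`, counts the candidates with `ParameterGrid` (3 × 2 × 3 = 18, hence the 90 fits reported for 5-fold CV), and uses `set_params` with the same names to apply the best parameters found above.

```python
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import ParameterGrid
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

# Same structure as the example: a FeatureUnion step named "features", then "svm".
pipeline = Pipeline([
    ("features", FeatureUnion([("pca", PCA(n_components=2)),
                               ("univ_select", SelectKBest(k=1))])),
    ("svm", SVC(kernel="linear")),
])

param_grid = dict(features__pca__n_components=[1, 2, 3],
                  features__univ_select__k=[1, 2],
                  svm__C=[0.1, 1, 10])

# Every grid key must be a nested parameter name exposed by the pipeline.
params = pipeline.get_params()
missing = [name for name in param_grid if name not in params]
print("missing keys:", missing)               # missing keys: []

# The grid is a Cartesian product of the value lists.
n_candidates = len(ParameterGrid(param_grid))
print(n_candidates, "candidates,", n_candidates * 5, "fits")   # 18 candidates, 90 fits

# set_params reaches into nested steps with the same double-underscore names.
pipeline.set_params(features__pca__n_components=3, svm__C=10)
pca_step = dict(pipeline.named_steps["features"].transformer_list)["pca"]
print(pca_step.n_components, pipeline.named_steps["svm"].C)    # 3 10
```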