sklearn.ensemble.VotingClassifier¶

class sklearn.ensemble.VotingClassifier(estimators, *, voting='hard', weights=None, n_jobs=None, flatten_transform=True, verbose=False)

[源码]

针对非拟合估计器的Soft Voting/Majority规则分类器。

0.17新增功能。

在用户指南中阅读更多内容。

参数	说明
estimators	list of (str, estimator) tuples 在投票分类器上调用`fit`方法将你和存储在类属性`self.estimators_`中的原始估计器的克隆体。可以使用`set_params`将评估器设置为“`drop`”。 - 版本0.21中的更改:“drop”收录进该版本。自版本0.22以来已弃用: 使用None删除评估器在0.22中已弃用，在0.24中删除了该功能并使用字符串'drop'代替。
voting	{‘hard’, ‘soft’}, default=’hard’ 如果是"hard"，则使用预测的类标签进行多数决定投票。如果“soft”，预测类标签基于预测概率总和的最大值，这是一个精心校准的分类器的集成推荐。
weights	array-like of shape (n_classifiers,), default=None 权重序列(`float`或`int`)，用于在平均(soft voting)之前对预测的类标签(hard voting)或类概率的出现进行加权。如果没有，使用统一的权重。
n_jobs	int, default=None int, default=None 所有并行`estimators fit`作业数量。除非在`joblib.parallel_backend`中，否则`None`表示是1。-1表示使用所有处理器。参见Glossary了解更多细节。 0.18版本新功能
flatten_transform	bool, default=True 如果`voting= 'soft'`且`flatten_transform=True`, 转化方法返回大小为(n_samples, n_classifiers * n_classes)的矩阵。如果`flat _transform=False`，它返回`(n_classifiers, n_samples, n_classes)`。
verbose	bool, default=False 如果为True，拟合时经过的时间将在拟合完成时打印出来。

属性	说明
estimators_	list of estimators 估计器参数的元素，已在训练数据上拟合。如果一个估计器被设置为“drop”，那么它将不会出现在`estimators_`中。
named_estimators_	`Bunch` 属性来按名称访问任何拟合的子估计器。
classes_	array-like of shape (n_predictions,) 类标签。

另见

VotingRegressor 预测投票回归量。

实例

>>> import numpy as np
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.ensemble import RandomForestClassifier, VotingClassifier
>>> clf1 = LogisticRegression(multi_class='multinomial', random_state=1)
>>> clf2 = RandomForestClassifier(n_estimators=50, random_state=1)
>>> clf3 = GaussianNB()
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> eclf1 = VotingClassifier(estimators=[
...         ('lr', clf1), ('rf', clf2), ('gnb', clf3)], voting='hard')
>>> eclf1 = eclf1.fit(X, y)
>>> print(eclf1.predict(X))
[1 1 1 2 2 2]
>>> np.array_equal(eclf1.named_estimators_.lr.predict(X),
...                eclf1.named_estimators_['lr'].predict(X))
True
>>> eclf2 = VotingClassifier(estimators=[
...         ('lr', clf1), ('rf', clf2), ('gnb', clf3)],
...         voting='soft')
>>> eclf2 = eclf2.fit(X, y)
>>> print(eclf2.predict(X))
[1 1 1 2 2 2]
>>> eclf3 = VotingClassifier(estimators=[
...        ('lr', clf1), ('rf', clf2), ('gnb', clf3)],
...        voting='soft', weights=[2,1,1],
...        flatten_transform=True)
>>> eclf3 = eclf3.fit(X, y)
>>> print(eclf3.predict(X))
[1 1 1 2 2 2]
>>> print(eclf3.transform(X).shape)
(6, 6)

方法

方法	说明
`fit`(X[, y, sample_weight])	拟合估计器。
`fit_transform`(X[, y])	拟合估计器和变换数据集。
`get_params`([deep])	从集成中得到估计器的参数。
`predict`(X)	预测X的类标签。
`score`(X, y[, sample_weight])	返回给定测试数据和标签的平均精度。
`set_params`(params)**	从集成中设置估计器的参数。
`transform`(X)	返回每个估计器`X`的类标签或概率。

__init__(estimators, *, voting='hard', weights=None, n_jobs=None, flatten_transform=True, verbose=False)

[源码]

初始化self。有关准确的签名，请参见help(type(self))。

fit(X, y, sample_weight = None)

[源码]

拟合估计器。

参数	说明
X	{array-like, sparse matrix} of shape (n_samples, n_features) 训练向量，其中n_samples为样本数量，n_features为特征数量。
y	array-like of shape (n_samples,) 目标值。
sample_weight	array-like of shape (n_samples,), default=None 样本权重。如果没有，那么样本的权重相等。注意，只有当所有的潜在估计器都支持样本权值时，才支持此方法。 0.18新增功能

返回值	说明
self	object

fit_transform(X, y=None, **fit_params)

[源码]

拟合数据，然后转换它。

使用可选参数fit_params将transformer与X和y匹配，并返回X的转换版本。

参数	说明
X	{array-like, sparse matrix} of shape (n_samples, n_features) 训练向量，其中n_samples为样本数量，n_features为特征数量。
y	ndarray of shape (n_samples,), default=None 目标值。
**fit_params	dict 其他拟合参数。

返回值	说明
X_new	ndarray array of shape (n_samples, n_features_new) 转化后的数组。

get_params(deep=True)

[源码]

从集成中得到估计器的参数。

参数	说明
deep	deep : bool, default = True 将其设置为True将获得各种分类器以及分类器的参数。

predict(X)

[源码]

预测X的分类。

参数	说明
X	{array-like, sparse matrix} of shape (n_samples, n_features) 输入样本。

返回值	说明
maj	array-like of shape (n_samples,) 预测后的分类标签。

property predict_proba

计算X中样本可能结果的概率。

参数	说明
X	{array-like, sparse matrix} of shape (n_samples, n_features) 输入样本。

返回值	说明
avg	array-like of shape (n_samples,) 加权平均每类每个样本的概率

score(X, y, sample_weight=None)

[源码]

返回给定测试数据和标签的平均精度。

在多标签分类中，这是子集精度，同时也是个苛刻的指标，因为你需要对每个样本正确预测每个标签集。

参数	说明
X	array-like of shape (n_samples, n_features) 测试样本。
y	array-like of shape (n_samples,) or (n_samples, n_outputs) `X`的正确标签。
sample_weight	array-like of shape (n_samples,), default=None 样本权重。

返回值	说明
score	float self.predict(X) 关于y的平均准确率。

set_params(**params)

[源码]

从集成中设置估计器的参数。

有效的参数键可以用get_params()列出。

参数	说明
**params	keyword arguments 使用例如`set_params(parameter_name=new_value)`的特定参数。此外，为了设置堆料估算器的参数，还可以设置叠加估算器的单个估算器，或者通过将它们设置为“`drop`”来删除它们。

transform(X)

[源码]

返回每个估计量X的类标签或概率。

参数	说明
X	{array-like, sparse matrix} of shape (n_samples, n_features) 训练向量，其中`n_samples`为样本数量，`n_features`为特征数量。

返回值说明

probabilities_or_labels If voting='soft' and flatten_transform=True:
返回大小为(n_classifiers, n_samples * n_classes)的ndarray，为每个分类器计算的类概率。
If voting='soft' and flatten_transform=False:
返回大小为(n_classifiers, n_samples, n_classes)的ndarray。
Ifvoting='hard':
返回大小为(n_samples, n_classifiers)的ndarray，是每个分类器预测的类标签。

返回值	说明
probabilities_or_labels	If `voting='soft'` and `flatten_transform=True`: 返回大小为`(n_classifiers, n_samples * n_classes)`的`ndarray`，为每个分类器计算的类概率。 If `voting='soft'` and `flatten_transform=False`: 返回大小为`(n_classifiers, n_samples, n_classes)`的ndarray。 If`voting='hard'`: 返回大小为`(n_samples, n_classifiers)`的ndarray，是每个分类器预测的类标签。

sklearn.ensemble.VotingClassifier使用示例¶

绘制投票分类器的决策边界 ¶

绘制通过投票分类器计算的类概率 ¶