sklearn.ensemble.StackingRegressor¶

class sklearn.ensemble.StackingRegressor(estimators, final_estimator=None, *, cv=None, n_jobs=None, passthrough=False, verbose=0)

[源码]

带有最终分类器的估计器堆栈。

堆叠泛化包括堆叠个别估计器的输出，并使用分类器来计算最终的预测。叠加允许使用每个估计器的强度，使用它们的输出作为最终估计器的输入。

注意，estimators_是在完整的X上拟合而来的，而final_estimator_是通过使用cross_val_predict对基础估计器进行交叉验证的预测来进行训练的。

0.22版本新功能。

在用户指南中阅读更多内容。

参数	说明
estimators	list of (str, estimator) 将被堆叠在一起的基础估计器。列表中的每个元素都被定义为一个字符串元组(即名称)和一个`estimator`实例。可以使用`set_params`将评估器设置为“drop”。
final_estimator	estimator, default=None 一个分类器，它将被用来组合基础估计器。默认的分类器是`RidgeCV`。
cv	int, cross-validation generator or an iterable, default=None 确定`cross_val_predict`中用于训练`final_estimator`的交叉验证拆分策略。`cv`可能的输入有: - None，使用默认的5折交叉验证。 - Integer, 用于指定(分层的)K-Fold中的折叠数。 - 用作交叉验证生成器的对象。 - 一个可迭代的产生的训练、测试分割。对于`integer`/`None`，如果估计器是一个分类器，并且`y`是二进制的或多类的，则使用`StratifiedKFold`。所有其他情况下，使用`K-Fold`。参考用户指南，了解这里可以使用的各种交叉验证策略。注意：如果训练样本的数量足够大，那么分割的数量再大也没有什么好处。事实上，训练时间会增加。cv不是用于模型评估，而是用于预测。
n_jobs	int, default=None 所有并行`estimators fit`作业数量。除非在`joblib.parallel_backend`中，否则`None`表示是1。-1表示使用所有处理器。参见Glossary了解更多细节。
passthrough	bool, default=False 当为`False`时，只使用估计器的预测作为`final_estimator`的训练数据。当为真时，`final_estimator`将在预测和原始训练数据上进行训练。
verbose	int, default=0 冗余水平。

属性	说明
estimators_	list of estimators 估计器参数的元素，已在训练数据上拟合。如果一个估计器被设置为“drop”，那么它将不会出现在`estimators_`中。
named_estimators_	`Bunch` 属性来按名称访问任何拟合的子估计器。
final_estimator_	estimator 给出估计器的输出来预测的`estimators_`。

参考文献

Wolpert, David H. “Stacked generalization.” Neural networks 5.2 (1992): 241-259.

实例

>>> from sklearn.datasets import load_diabetes
>>> from sklearn.linear_model import RidgeCV
>>> from sklearn.svm import LinearSVR
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.ensemble import StackingRegressor
>>> X, y = load_diabetes(return_X_y=True)
>>> estimators = [
...     ('lr', RidgeCV()),
...     ('svr', LinearSVR(random_state=42))
... ]
>>> reg = StackingRegressor(
...     estimators=estimators,
...     final_estimator=RandomForestRegressor(n_estimators=10,
...                                           random_state=42)
... )
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=42
... )
>>> reg.fit(X_train, y_train).score(X_test, y_test)
0.3...

方法

方法	说明
`fit`(X[, y, sample_weight])	拟合估计器。
`fit_transform`(X[, y])	拟合估计器和变换数据集。
`get_params`([deep])	从集成中得到估计器的参数。
`predict`(X, predict_params)**	预测`X`的目标值。
`score`(X, y[, sample_weight])	返回给定测试数据和标签的平均精度。
`set_params`(params)**	从集成中设置估计器的参数。
`transform`(X)	返回每个估计器`X`的类标签或概率。

__init__(estimators, final_estimator=None, *, cv=None, n_jobs=None, passthrough=False, verbose=0)

[源码]

初始化self。有关准确的签名，请参见help(type(self))。

fit(X, y, sample_weight = None)

[源码]

拟合估计器。

参数	说明
X	{array-like, sparse matrix} of shape (n_samples, n_features) 训练向量，其中n_samples为样本数量，n_features为特征数量。
y	array-like of shape (n_samples,) 目标值。
sample_weight	array-like of shape (n_samples,), default=None 样本权重。如果没有，那么样本的权重相等。注意，只有当所有的潜在估计器都支持样本权值时，才支持此方法。

返回值	说明
self	object

fit_transform(X, y=None, **fit_params)

[源码]

拟合数据，然后转换它。

使用可选参数fit_params将transformer与X和y匹配，并返回X的转换版本。

参数	说明
X	{array-like, sparse matrix} of shape (n_samples, n_features) 训练向量，其中n_samples为样本数量，n_features为特征数量。
y	ndarray of shape (n_samples,), default=None 目标值。
**fit_params	dict 其他拟合参数。

返回值	说明
X_new	ndarray array of shape (n_samples, n_features_new) 转化后的数组。

get_params(deep=True)

[源码]

从集成中得到估计器的参数。

参数	说明
deep	deep : bool, default = True 将其设置为True将获得各种分类器以及分类器的参数。

property n_features_in_

fit 过程中可见的特征数量。

predict(X, **predict_params)

[源码]

预测X的目标值。

该方法适用于简单估计器和嵌套对象(如pipline)。后者具有形式为<component>_<parameter>的参数，这样就可以更新嵌套对象的每个组件。

参数	说明
X	{array-like, sparse matrix} of shape (n_samples, n_features) 训练向量，其中n_samples为样本数量，n_features为特征数量。
**predict_params	dict of str -> obj 由`final_estimator`调用的`predict`的参数。请注意，这可能用于从一些使用`return_std`或`return_cov`的估计器返回不确定性。请注意，它只会在最终的估计器中考虑不确定性。

返回值	说明
y_pred	ndarray of shape (n_samples,) or (n_samples, n_output) 预测后的目标值。

score(X, y, sample_weight=None)

[源码]

返回预测的决定系数R^2。

定义系数R^2为(1 - u/v)，其中u为(y_true - y_pred) ** 2).sum()的残差平方和，v为(y_true - y_true.mean()) ** 2).sum()的平方和。最好的可能的分数是1.0，它可能是负的(因为模型可以任意地更糟)。一个常数模型总是预测y的期望值，而不考虑输入特征，得到的R^2得分为0.0。

参数	说明
X	array-like of shape (n_samples, n_features) 测试样品。对于某些估计器，这可能是一个预先计算的内核矩阵或一列通用对象，而不是`shape= (n_samples, n_samples_fitting)`，其中`n_samples_fitting`是用于拟合估计器的样本数量
y	array-like of shape (n_samples,) or (n_samples, n_outputs) `X`的正确值。
sample_weight	array-like of shape (n_samples,), default=None 样本权重。

返回值	说明
score	float self.predict(X) 关于y的平均准确率。

注意：

调用回归变器的score时使用的R2 score，与0.23版本的multioutput='uniform_average'中r2_score的默认值保持一致。这影响了所有多输出回归的score方法(除了MultiOutputRegressor)。

set_params(**params)

[源码]

从集成中设置估计器的参数。

有效的参数键可以用get_params()列出。

参数	说明
**params	keyword arguments 使用例如`set_params(parameter_name=new_value)`的特定参数。此外，为了设置堆料估算器的参数，还可以设置叠加估算器的单个估算器，或者通过将它们设置为“`drop`”来删除它们。

transform(X)

[源码]

返回每个估计量X的类标签或概率。

参数	说明
X	{array-like, sparse matrix} of shape (n_samples, n_features) 训练向量，其中`n_samples`为样本数量，`n_features`为特征数量。

返回值	说明
y_preds	*ndarray of shape (n_samples, n_estimators) or (n_samples, n_classes n_estimators)** 每个估计器的预测输出。

sklearn.ensemble.StackingRegressor使用示例¶

使用stacking的组合预测器 ¶