sklearn.feature_selection.RFECV¶

class sklearn.feature_selection.RFECV(estimator, *, step=1, min_features_to_select=1, cv=None, scoring=None, verbose=0, n_jobs=None)

[源码]

具有递归特征消除和交叉验证选择最佳特征数的特征排序。

有关交叉验证估算器，请参阅词汇表条目。

在用户指南中阅读更多内容。

参数	说明
estimator	object 一种监督学习估计器，其`fit`方法通过`coef_` 属性或`feature_importances_`属性提供有关特征重要性的信息。
step	int or float, optional (default=1) 如果大于或等于1，则`step`对应于每次迭代要删除的个特征个数。如果在（0.0，1.0）之内，则`step`对应于每次迭代要删除的特征的百分比（向下舍入）。请注意，为了达到`min_features_to_select`，最后一次迭代删除的特征可能少于`step`。
min_features_to_select	int, (default=1) 最少要选择的特征数量。即使原始特征数量与`min_features_to_select`之间的差不能被`step`整除，也会对这些特征数进行评分。 0.20版中的新功能。
cv	int, cross-validation generator or an iterable, optional 确定交叉验证拆分策略。可能的输入是： - 无，要使用默认的5倍交叉验证 - 整数，用于指定折数。 - CV分配器 - 可迭代的产生（训练，测试）拆分为索引数组。对于整数或无输入，如果`y`是二分类或多分类，则使用`sklearn.model_selection.StratifiedKFold`。如果估计器是分类器，或者`y`既不是二分类也不是多分类，则使用`sklearn.model_selection.KFold`。有关可在此处使用的各种交叉验证策略，请参阅用户指南。在0.22版本中更改：`cv`无的默认值从3更改为5。
scoring	string, callable or None, optional, (default=None) 字符串(参见模型评估文档)或具有`scorer(estimator, X, y)`签名的scorer可调用对象或函数。
verbose	int, (default=0) 控制输出的详细程度。
n_jobs	int or None, optional (default=None) 跨折时要并行运行的核心数。除非在上下文中设置了`joblib.parallel_backend`参数，否则`None`表示1 。 `-1`表示使用所有处理器。有关更多详细信息，请参见词汇表。 0.18版本中的新功能。

属性	说明
n_features_	int 利用交叉验证所选特征的数量。
support_	array of shape [n_features] 选定特征的掩码。
ranking_	array of shape [n_features] 特征排序，使ranking_[i]对应第i个特征的排序位置。选择的(即估计的最佳)特征被排在第1位。
grid_scores_	array of shape [n_subsets_of_features] 交叉验证得分， `grid_scores_[i]`对应于第i个特征子集的CV得分。
estimator_	object 拟合简化后的数据集的外部估算器。

另见

RFE

递归特征消除

注

grid_scores_的大小等于 ceil((n_features - min_features_to_select) / step) + 1，其中step是每次迭代删除的特征数量。

如果基估计器也可以输入，则允许NaN 或Inf。

参考

1 Guyon, I., Weston, J., Barnhill, S., & Vapnik, V., “Gene selection for cancer classification using support vector machines”, Mach. Learn., 46(1-3), 389–422, 2002.

示例

下面的示例显示如何检索Friedman＃1数据集中的先验未知的5个信息特征。

>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFECV
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel="linear")
>>> selector = RFECV(estimator, step=1, cv=5)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True,  True,  True,  True,  True, False, False, False, False,
       False])
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])

方法

方法	说明
`decision_function`(X)	计算`X`的决策函数。
`fit`(X, y[, groups])	拟合RFE模型并自动调整所选特征数量。
`fit_transform`(X[, y])	拟合数据并对其进行转换。
`get_params`([deep])	获取此估计器的参数。
`get_support`([indices])	获取所选特征的掩码或整数索引。
`inverse_transform`(X)	反向转换操作
`predict`(X)	将X简化为选定的特征，然后使用基估计器进行预测。
`predict_log_proba`(X)	预测X的类对数概率。
`predict_proba`(X)	预测X的类概率。
`score`(X, y)	将X简化为选择的特征，然后返回基估计器的分数。
`set_params`(**params)	设置此估算器的参数。
`transform`(X)	将X缩小为选定的特征。

__init__(estimator, *, step=1, min_features_to_select=1, cv=None, scoring=None, verbose=0, n_jobs=None)

[源码]

初始化self，参见help(type(self))获取更多信息。

decision_function(X)

[源码]

计算X的决策函数。

参数	说明
X	{array-like or sparse matrix} of shape (n_samples, n_features) 输入样本。在内部，如果将稀疏矩阵提供给 `csr_matrix`,它将转换为 `dtype=np.float32`。

返回值	说明
score	array, shape = [n_samples, n_classes] or [n_samples] 输入样本的决策函数。类的顺序与属性classes_中的顺序相对应。回归和二分类产生一个形状为[n_samples]的数组。

fit（X，y，groups = None ）

[源码]

拟合RFE模型并自动调整所选特征的数量。

参数	说明
X	{array-like, sparse matrix} of shape (n_samples, n_features) 训练向量，其中`n_samples`是样本数量， `n_features`是特征数量。
y	array-like of shape (n_samples,) 目标值（用于分类的整数，用于回归的实数）。
groups	array-like of shape (n_samples,) or None 将数据集拆分为训练集和测试集时使用的样本的标签分组。仅与“ Group” cv 实例（例如`GroupKFold`）结合使用。

fit_transform(X, y=None, **fit_params)

[源码]

拟合数据，然后对其进行转换。

使用可选参数fit_params将转换器拟合到X和y，并返回X的转换值。

参数	说明
X	{array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
y	ndarray of shape (n_samples,), default=None 目标值
**fit_params	dict 其他拟合参数。

返回值	说明
X_new	ndarray array of shape (n_samples, n_features_new) 转换后的数组。

get_params(deep=True)

[源码]

获取此估计器的参数。

参数	说明
deep	bool, default=True 如果为True，则将返回此估算器和所包含子对象的参数。

返回值	说明
params	mapping of string to any 参数名称映射到其值。

get_support(indices=False)

[源码]

获取所选特征的掩码或整数索引。

参数	说明
indices	boolean (default False) 如果为True，则返回值将是一个整数数组，而不是布尔掩码。

返回值	说明
support	array 从特征向量中选择保留特征的索引。如果`indices`为False，则为形状为[＃输入特征]的布尔数组，其中元素为True时（如果已选择其对应的特征进行保留）。如果`indices`为True，则这是一个形状为[＃输出特征]的整数数组，其值是输入特征向量的索引。

inverse_transform(X)

[源码]

反向转换操作。

参数	说明
X	array of shape [n_samples, n_selected_features] 输入样本。

返回值	说明
X_r	array of shape [n_samples, n_original_features] `X`中插入的列名为零的特征将被`transform`删除。

predict(X)

[源码]

将X简化为选定的特征，然后使用基估计器进行预测。

参数	说明
X	array of shape [n_samples, n_features] 输入样本。

返回值	说明
y	array of shape [n_samples] 预测目标值。

predict_log_proba（X ）

[源码]

预测X的类对数概率。

参数	说明
X	array of shape [n_samples, n_features] 输入样本。

返回值	说明
p	array of shape [n_samples] 输入样本的类对数概率。类的顺序与属性classes_中的顺序相对应。

predict_proba（X ）

[源码]

参数	说明
X	{array-like or sparse matrix} of shape (n_samples, n_features) 输入样本。在内部，如果将稀疏矩阵提供给`csr_matrix`，它将转换为 `dtype=np.float32`。

返回值	说明
p	array of shape (n_samples, n_classes) 输入样本的分类概率。类的顺序与属性classes_中的顺序相对应。

score（X，y ）

将X简化为选择的特征，然后返回基估计器的分数。

参数	说明
X	array of shape [n_samples, n_features] 输入样本。
y	array of shape [n_samples] 目标值。

set_params(**params)

[源码]

设置此估算器的参数。

该方法适用于简单的估计器以及嵌套对象（例如管道）。后者具有<component>__<parameter>形式的参数，以便可以更新嵌套对象的每个组件。

参数	说明
**params	dict 估计器参数。

返回值	说明
self	object 估计器实例。

transform(X)

[源码]

将X缩小为选定的特征。

参数	说明
X	array of shape [n_samples, n_features] 输入样本。

返回值	说明
X_r	array of shape [n_samples, n_selected_features] 仅具有所选特征的输入样本。

sklearn.feature_selection.RFECV使用示例¶

带交叉验证的递归特征消除 ¶