sklearn.model_selection.permutation_test_score

sklearn.model_selection.permutation_test_score(estimator, X, y, *, groups=None, cv=None, n_permutations=100, n_jobs=None, random_state=0, verbose=0, scoring=None)


Evaluate the significance of a cross-validated score with permutations.

Read more in the User Guide.

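Parameter	Description
estimator	estimator object implementing 'fit'. The object to use to fit the data.
X	array-like of shape at least 2D. The data to fit.
y	array-like of shape (n_samples,) or (n_samples, n_outputs) or None. The target variable to try to predict in the case of supervised learning.
groups	array-like of shape (n_samples,), default=None. Labels that constrain permutation within groups, i.e. `y` values are permuted only among samples with the same group identifier. When not specified, `y` values are permuted among all samples. When a grouped cross-validator is used, the group labels are also passed on to the `split` method of the cross-validator, which uses them to group the samples while splitting the dataset into train and test sets.
scoring	str or callable, default=None. A single str (see "The scoring parameter: defining model evaluation rules") or a callable (see "Defining your scoring strategy from metric functions") used to evaluate the predictions on the test set.
cv	int, cross-validation generator or an iterable, default=None. Determines the cross-validation splitting strategy. Possible inputs for cv are: None, to use the default 5-fold cross-validation; an int, to specify the number of folds in a `(Stratified)KFold`; a CV splitter; an iterable yielding (train, test) splits as arrays of indices. For int or None inputs, if the estimator is a classifier and `y` is either binary or multiclass, `StratifiedKFold` is used. In all other cases, `KFold` is used. Refer to the User Guide for the various cross-validation strategies that can be used here. Changed in version 0.22: the default value of `cv` when None changed from 3-fold to 5-fold.
n_permutations	int, default=100. Number of times to permute `y`.
n_jobs	int, default=None. The number of CPUs to use for the computation. `None` means 1 unless in a `joblib.parallel_backend` context. `-1` means using all processors. See the Glossary for more details.
random_state	int, RandomState instance or None, default=0. Pass an int for reproducible output when permuting `y` values among samples. See the Glossary.
verbose	int, default=0. The verbosity level.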
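
As a hedged illustration of how these parameters fit together, the sketch below calls the function on synthetic data. The dataset, the choice of `LogisticRegression`, and the specific values passed for `cv`, `n_permutations`, and `scoring` are assumptions made for this example, not recommendations from this page.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, permutation_test_score

# Synthetic binary classification data (an assumption for illustration only).
X, y = make_classification(
    n_samples=100, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

# Pass an explicit CV splitter; an int or None would also be accepted.
cv = StratifiedKFold(n_splits=5)

score, permutation_scores, pvalue = permutation_test_score(
    LogisticRegression(max_iter=1000),
    X,
    y,
    cv=cv,
    n_permutations=30,   # kept small so the example runs quickly
    scoring="accuracy",
    random_state=0,      # fixes the sequence of permutations of y
)

print(f"score on original labels: {score:.3f}")
print(f"mean score on permuted labels: {permutation_scores.mean():.3f}")
print(f"p-value: {pvalue:.3f}")
```

A small `n_permutations` limits how small the p-value can get, so real analyses typically use more permutations than this sketch does.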

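Returns	Description
score	float. The true score, obtained without permuting the targets.
permutation_scores	array of shape (n_permutations,). The score obtained for each permutation.
pvalue	float. The p-value, which approximates the probability that the score would be obtained by chance. It is calculated as `(C + 1) / (n_permutations + 1)`, where C is the number of permutations whose score is greater than or equal to the true score. The best possible p-value is 1 / (n_permutations + 1); the worst is 1.0.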
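
The p-value formula can be checked by hand with a few lines of plain Python. The helper name `permutation_pvalue` and the example scores are hypothetical and exist only for this sketch; they are not part of scikit-learn.

```python
def permutation_pvalue(true_score, permutation_scores):
    """Compute (C + 1) / (n_permutations + 1) for a list of permutation scores."""
    # C counts the permutations that scored at least as well as the true labels.
    c = sum(1 for s in permutation_scores if s >= true_score)
    return (c + 1) / (len(permutation_scores) + 1)


# Made-up scores for illustration: one of four permutations reaches 0.9.
example = permutation_pvalue(0.9, [0.5, 0.6, 0.95, 0.4])   # (1 + 1) / (4 + 1)

# Best case: no permutation reaches the true score, giving 1 / (n + 1).
best = permutation_pvalue(0.9, [0.5, 0.6, 0.7, 0.4])       # (0 + 1) / (4 + 1)

# Worst case: every permutation reaches the true score, giving 1.0.
worst = permutation_pvalue(0.5, [0.5, 0.6, 0.7, 0.8])      # (4 + 1) / (4 + 1)
```

The `+ 1` in the numerator and denominator means the p-value can never be exactly zero, which is why its lower bound depends on `n_permutations`.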

Notes

This function implements Test 1 in:

Ojala and Garriga. Permutation Tests for Studying Classifier Performance. The Journal of Machine Learning Research (2010) vol. 11 [pdf].
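
The general idea of a label-permutation test can be sketched by hand: score the estimator on the original labels, then on repeatedly shuffled labels, and compare. The function below is a simplified illustration under stated assumptions, not scikit-learn's implementation. The name `manual_permutation_test`, the synthetic data, and the use of `cross_val_score` with NumPy's `default_rng` are choices made for this example, and the sketch ignores `groups`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def manual_permutation_test(estimator, X, y, n_permutations=20, cv=5, seed=0):
    """Simplified label-permutation test (hypothetical helper, for illustration)."""
    rng = np.random.default_rng(seed)

    # Cross-validated score on the original, unpermuted labels.
    true_score = cross_val_score(estimator, X, y, cv=cv).mean()

    # Cross-validated score for each random shuffle of the labels.
    perm_scores = np.array([
        cross_val_score(estimator, X, rng.permutation(y), cv=cv).mean()
        for _ in range(n_permutations)
    ])

    # Same formula as the documented p-value: (C + 1) / (n_permutations + 1).
    c = int(np.sum(perm_scores >= true_score))
    pvalue = (c + 1) / (n_permutations + 1)
    return true_score, perm_scores, pvalue


# Synthetic data, assumed only for this example.
X_demo, y_demo = make_classification(
    n_samples=100, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
true_score, perm_scores, manual_pvalue = manual_permutation_test(
    LogisticRegression(max_iter=1000), X_demo, y_demo
)
print(f"true score={true_score:.3f}, p-value={manual_pvalue:.3f}")
```

Shuffling `y` breaks any relationship between features and labels, so the permuted scores approximate what the estimator achieves when there is nothing to learn.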

Examples using sklearn.model_selection.permutation_test_score

Test with permutations the significance of a classification score