在cross_val_score和GridSearchCV上进行多指标评估的演示¶
在本案例中,我们通过将评分参数(scoring)设置为多个评估指标的列表、或将评估指标名称映射到评分器可调用项的字典,来执行多个评估指标的搜索。
所有的打分器都在cv_results_字典中,并且可以使用以'_ <scorer_name>'结尾的键(“ mean_test_precision”,“ rank_test_precision”等)进行调用和获取。
best_estimator_,best_index_,best_score_和best_params_对应于设置为refit属性的打分器(键)。
# 作者: Raghav RV <rvraghav93@gmail.com>
# 执照: BSD
import numpy as np
from matplotlib import pyplot as plt
from sklearn.datasets import make_hastie_10_2
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
print(__doc__)
使用多个评估指标运行GridSearchCV
X, y = make_hastie_10_2(n_samples=8000, random_state=42)
# 评分器可以是预定义的度量标准字符串之一,也可以是可调用的评分器,例如make_scorer返回的评分器
scoring = {'AUC': 'roc_auc', 'Accuracy': make_scorer(accuracy_score)}
'''
设置refit ='AUC',使用具有最佳交叉验证AUC得分的参数设置重新拟合整个数据集上的估算器
可以在gs.best_estimator_处获得该估算器,还可以使用gs.best_score_,gs.best_params_和gs.best_index_获得参数
'''
gs = GridSearchCV(DecisionTreeClassifier(random_state=42),
param_grid={'min_samples_split': range(2, 403, 10)},
scoring=scoring, refit='AUC', return_train_score=True)
gs.fit(X, y)
results = gs.cv_results_
绘制结果
plt.figure(figsize=(13, 13))
plt.title("GridSearchCV evaluating using multiple scorers simultaneously",
fontsize=16)
plt.xlabel("min_samples_split")
plt.ylabel("Score")
ax = plt.gca()
ax.set_xlim(0, 402)
ax.set_ylim(0.73, 1)
# 从MaskedArray获取常规的numpy数组
X_axis = np.array(results['param_min_samples_split'].data, dtype=float)
for scorer, color in zip(sorted(scoring), ['g', 'k']):
for sample, style in (('train', '--'), ('test', '-')):
sample_score_mean = results['mean_%s_%s' % (sample, scorer)]
sample_score_std = results['std_%s_%s' % (sample, scorer)]
ax.fill_between(X_axis, sample_score_mean - sample_score_std,
sample_score_mean + sample_score_std,
alpha=0.1 if sample == 'test' else 0, color=color)
ax.plot(X_axis, sample_score_mean, style, color=color,
alpha=1 if sample == 'test' else 0.7,
label="%s (%s)" % (scorer, sample))
best_index = np.nonzero(results['rank_test_%s' % scorer] == 1)[0][0]
best_score = results['mean_test_%s' % scorer][best_index]
# 在以叉(x)标记的那个最佳评估指标的最佳得分上绘制一条垂直虚线
ax.plot([X_axis[best_index], ] * 2, [0, best_score],
linestyle='-.', color=color, marker='x', markeredgewidth=3, ms=8)
# 注释该得分手的评估指标
ax.annotate("%0.2f" % best_score,
(X_axis[best_index], best_score + 0.005))
plt.legend(loc="best")
plt.grid(False)
plt.show()
输出:
脚本的总运行时间:(0分钟27.117秒)