sklearn.gaussian_process.GaussianProcessRegressor

class sklearn.gaussian_process.GaussianProcessRegressor(kernel=None, *, alpha=1e-10, optimizer='fmin_l_bfgs_b', n_restarts_optimizer=0, normalize_y=False, copy_X_train=True, random_state=None)

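[source]

Gaussian process regression (GPR).

The implementation is based on Algorithm 2.1 of Gaussian Processes for Machine Learning (GPML) by Rasmussen and Williams.

In addition to the standard scikit-learn estimator API, GaussianProcessRegressor:

allows prediction without prior fitting (based on the GP prior)
provides an additional method sample_y(X), which evaluates samples drawn from the GP (prior or posterior) at given inputs
exposes a method log_marginal_likelihood(theta), which can be used externally for other ways of selecting hyperparameters, e.g., via Markov chain Monte Carlo.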
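
The three capabilities listed above can be sketched as follows. This is a minimal illustration, not a recommended workflow: the toy data, the RBF kernel, and the alternative length scale of 0.5 are assumptions chosen for the example. Note that theta holds log-transformed hyperparameters, hence the np.log call.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X_query = np.linspace(0.0, 5.0, 20).reshape(-1, 1)
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), random_state=0)

# 1) Predict without fitting: the result comes from the GP prior
#    (zero mean, standard deviation given by the kernel diagonal).
prior_mean, prior_std = gpr.predict(X_query, return_std=True)

# 2) Draw function samples from the prior, then from the posterior.
prior_samples = gpr.sample_y(X_query, n_samples=3, random_state=0)

X_train = np.array([[1.0], [2.5], [4.0]])  # assumed toy data
y_train = np.sin(X_train).ravel()
gpr.fit(X_train, y_train)
posterior_samples = gpr.sample_y(X_query, n_samples=3, random_state=0)

# 3) Evaluate the log-marginal likelihood, first at the fitted
#    hyperparameters and then at an arbitrary alternative length scale.
lml_fitted = gpr.log_marginal_likelihood()
lml_other = gpr.log_marginal_likelihood(np.log([0.5]))
```

An external sampler or search routine could call log_marginal_likelihood(theta) repeatedly in the same way as the last line.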

Read more in the User Guide.

New in version 0.18.

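Parameter	Description
kernel	kernel instance, default=None The kernel specifying the covariance function of the GP. If None is passed, the kernel "1.0 * RBF(1.0)" is used as default. Note that the kernel's hyperparameters are optimized during fitting.
alpha	float or array-like of shape (n_samples,), default=1e-10 Value added to the diagonal of the kernel matrix during fitting. Larger values correspond to an increased noise level in the observations. This can also prevent potential numerical issues during fitting by ensuring that the calculated values form a positive definite matrix. If an array is passed, it must have the same number of entries as the data used for fitting and is used as a datapoint-dependent noise level. Note that this is equivalent to adding a WhiteKernel with c=alpha. Allowing the noise level to be specified directly as a parameter is mainly for convenience and for consistency with Ridge.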
optimizer	"fmin_l_bfgs_b" or callable, default="fmin_l_bfgs_b" Either one of the internally supported optimizers for the kernel's parameters, specified by a string, or an externally defined optimizer passed as a callable. A callable must have the signature optimizer(obj_func, initial_theta, bounds) and return the tuple (theta_opt, func_min), where: 'obj_func' is the objective function to be minimized, which takes the hyperparameters theta as a parameter and an optional flag eval_gradient that determines whether the gradient is returned in addition to the function value; 'initial_theta' is the initial value for theta, which can be used by local optimizers; 'bounds' holds the bounds on the values of theta; theta_opt is the best set of hyperparameters found and func_min is the corresponding value of the objective function. By default, the 'L-BFGS-B' algorithm from scipy.optimize.minimize is used. If None is passed, the kernel's parameters are kept fixed. Available internal optimizers are: "fmin_l_bfgs_b".
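n_restarts_optimizer	int, default=0 The number of restarts of the optimizer for finding the kernel parameters that maximize the log-marginal likelihood. The first run of the optimizer is performed from the kernel's initial parameters; the remaining ones (if any) start from values of theta sampled log-uniformly at random from the space of allowed theta-values. If greater than 0, all bounds must be finite. Note that n_restarts_optimizer == 0 implies that one run is performed.
normalize_y	bool, default=False Whether the target values y are normalized, i.e., the mean and variance of the target values are set to 0 and 1 respectively. This is recommended when zero-mean, unit-variance priors are used. Note that, in this implementation, the normalization is reversed before the GP predictions are reported. Changed in version 0.23.
copy_X_train	bool, default=True If True, a persistent copy of the training data is stored in the object. Otherwise, only a reference to the training data is stored, which might cause predictions to change if the data is modified externally.
random_state	int or RandomState, default=None Determines the random number generation used to draw the starting points of optimizer restarts (see n_restarts_optimizer). Pass an int for reproducible results across multiple function calls. See the Glossary entry for random_state.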
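
As an illustration of the callable form of optimizer, the sketch below wraps SciPy's TNC method. The function name tnc_optimizer, the toy data, and the choice of TNC are assumptions made for this example; any routine that honours the (obj_func, initial_theta, bounds) signature and returns (theta_opt, func_min) should be usable in the same way.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF


def tnc_optimizer(obj_func, initial_theta, bounds):
    """Hypothetical custom optimizer based on SciPy's bounded TNC method."""
    # With eval_gradient=True, obj_func returns (value, gradient);
    # jac=True tells SciPy to unpack both from a single call.
    result = minimize(
        lambda theta: obj_func(theta, eval_gradient=True),
        initial_theta,
        method="TNC",
        jac=True,
        bounds=bounds,
    )
    # Best hyperparameters found and the matching objective value.
    return result.x, result.fun


rng = np.random.RandomState(0)
X = np.linspace(0.0, 10.0, 30).reshape(-1, 1)   # assumed toy inputs
y = np.sin(X).ravel() + 0.1 * rng.randn(30)     # noisy toy targets

gpr = GaussianProcessRegressor(
    kernel=RBF(1.0), alpha=0.01, optimizer=tnc_optimizer
).fit(X, y)
train_r2 = gpr.score(X, y)
```

Both initial_theta and bounds are expressed on the log scale of the kernel hyperparameters, so the optimizer works in that space as well.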

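Attribute	Description
X_train_	array-like of shape (n_samples, n_features) or list of object Feature vectors or other representations of the training data (also required for prediction).
y_train_	array-like of shape (n_samples,) or (n_samples, n_targets) Target values in the training data (also required for prediction).
kernel_	kernel instance The kernel used for prediction. Its structure is the same as that of the kernel passed as a parameter, but with optimized hyperparameters.
L_	array-like of shape (n_samples, n_samples) Lower-triangular Cholesky decomposition of the kernel in X_train_.
alpha_	array-like of shape (n_samples,) Dual coefficients of the training data points in kernel space.
log_marginal_likelihood_value_	float The log-marginal likelihood of self.kernel_.theta.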
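
The relationships between these attributes can be checked numerically. The sketch below assumes a scalar alpha, normalize_y=False, and toy data; under those assumptions L_ is the Cholesky factor of the training kernel matrix with alpha added to its diagonal, and alpha_ is the solution of the corresponding linear system.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.RandomState(0)
X = rng.uniform(0.0, 5.0, size=(15, 1))  # assumed toy inputs
y = np.sin(X).ravel()

noise = 1e-2
gpr = GaussianProcessRegressor(kernel=RBF(1.0), alpha=noise).fit(X, y)

# Training kernel matrix with alpha on the diagonal.
K = gpr.kernel_(gpr.X_train_)
K[np.diag_indices_from(K)] += noise

# L_ @ L_.T should reproduce K.
reconstructed = gpr.L_ @ gpr.L_.T

# alpha_ should solve K @ alpha_ = y_train_.
dual = np.linalg.solve(K, gpr.y_train_)

# The posterior mean at new points is k(X_new, X_train_) @ alpha_.
X_new = np.array([[0.5], [2.0], [4.5]])
mean_manual = gpr.kernel_(X_new, gpr.X_train_) @ gpr.alpha_
mean_predict = gpr.predict(X_new)

# The stored value matches a fresh evaluation at the fitted theta.
lml_again = gpr.log_marginal_likelihood(gpr.kernel_.theta)
```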

Examples

>>> from sklearn.datasets import make_friedman2
>>> from sklearn.gaussian_process import GaussianProcessRegressor
>>> from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel
>>> X, y = make_friedman2(n_samples=500, noise=0, random_state=0)
>>> kernel = DotProduct() + WhiteKernel()
>>> gpr = GaussianProcessRegressor(kernel=kernel,
...         random_state=0).fit(X, y)
>>> gpr.score(X, y)
0.3680...
>>> gpr.predict(X[:2,:], return_std=True)
(array([653.0..., 592.1...]), array([316.6..., 316.6...]))

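Methods

Method	Description
`fit`(self, X, y)	Fit the Gaussian process regression model.
`get_params`(self[, deep])	Get the parameters of this estimator.
`log_marginal_likelihood`(self[, theta, …])	Return the log-marginal likelihood of theta for the training data.
`predict`(self, X[, return_std, return_cov])	Predict using the Gaussian process regression model.
`sample_y`(self, X[, n_samples, random_state])	Draw samples from the Gaussian process and evaluate them at X.
`score`(self, X, y[, sample_weight])	Return the coefficient of determination R^2 of the prediction.
`set_params`(self, **params)	Set the parameters of this estimator.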

__init__(self, kernel=None, *, alpha=1e-10, optimizer='fmin_l_bfgs_b', n_restarts_optimizer=0, normalize_y=False, copy_X_train=True, random_state=None)

[source]

Initialize self. See help(type(self)) for the accurate signature.

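fit(self, X, y)

[source]

Fit the Gaussian process regression model.

Parameter	Description
X	array-like of shape (n_samples, n_features) or list of object Feature vectors or other representations of the training data.
y	array-like of shape (n_samples,) or (n_samples, n_targets) Target values.

Returns	Description
self	Returns an instance of self.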
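
Because y may have shape (n_samples, n_targets), a single model can be fitted to several targets at once. The following is a minimal sketch with two assumed toy targets (a sine and a cosine); it also shows that fit returns the estimator itself, which is what makes call chaining possible.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.linspace(0.0, 5.0, 20).reshape(-1, 1)
# Two targets stacked column-wise: shape (n_samples, n_targets) = (20, 2).
Y = np.column_stack([np.sin(X).ravel(), np.cos(X).ravel()])

gpr = GaussianProcessRegressor(kernel=RBF(1.0), alpha=1e-6)
fitted = gpr.fit(X, Y)  # fit returns the estimator itself

# Predicted means keep one column per target.
Y_pred = gpr.predict(X[:3])
```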

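get_params(self, deep=True)

[source]

Get the parameters of this estimator.

Parameter	Description
deep	bool, default=True If True, return the parameters of this estimator and of contained subobjects that are estimators.

Returns	Description
params	mapping of string to any Parameter names mapped to their values.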
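
The effect of deep can be seen with a composite kernel, since kernels expose their own parameters. In this sketch the kernel RBF(2.0) + WhiteKernel(0.1) is an arbitrary choice for illustration; the k1 and k2 prefixes refer to the left and right operands of the sum.

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

gpr = GaussianProcessRegressor(kernel=RBF(2.0) + WhiteKernel(0.1))

# Only the constructor arguments of the regressor itself.
shallow = gpr.get_params(deep=False)

# Also the nested kernel parameters, joined with double underscores.
deep = gpr.get_params(deep=True)
length_scale = deep["kernel__k1__length_scale"]
noise_level = deep["kernel__k2__noise_level"]
```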

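log_marginal_likelihood(self, theta=None, eval_gradient=False, clone_kernel=True)

[source]

Return the log-marginal likelihood of theta for the training data.

Parameter	Description
theta	array-like of shape (n_kernel_params,), default=None Kernel hyperparameters for which the log-marginal likelihood is evaluated. If None, the precomputed log_marginal_likelihood of self.kernel_.theta is returned.
eval_gradient	bool, default=False If True, the gradient of the log-marginal likelihood with respect to the kernel hyperparameters at position theta is returned additionally. If True, theta must not be None.
clone_kernel	bool, default=True If True, the kernel attribute is copied. If False, the kernel attribute is modified, which may result in a performance improvement.

Returns	Description
log_likelihood	float Log-marginal likelihood of theta for the training data.
log_likelihood_gradient	ndarray of shape (n_kernel_params,), optional Gradient of the log-marginal likelihood with respect to the kernel hyperparameters at position theta. Only returned when eval_gradient is True.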
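
A short sketch of the three ways to call this method follows. The toy data and the scanned length scales (0.1, 1.0, 10.0) are assumptions for illustration. Since theta is log-transformed, a length scale ls is passed as np.log([ls]).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.RandomState(0)
X = rng.uniform(0.0, 5.0, size=(20, 1))        # assumed toy inputs
y = np.sin(X).ravel() + 0.1 * rng.randn(20)    # noisy toy targets

gpr = GaussianProcessRegressor(kernel=RBF(1.0), alpha=0.01).fit(X, y)

# theta=None: return the value precomputed during fit.
lml_cached = gpr.log_marginal_likelihood()

# Explicit theta with gradient: returns a (value, gradient) pair.
theta_fit = gpr.kernel_.theta
lml, lml_grad = gpr.log_marginal_likelihood(theta_fit, eval_gradient=True)

# Scan a few candidate length scales without refitting the model.
lml_by_scale = {
    ls: gpr.log_marginal_likelihood(np.log([ls])) for ls in (0.1, 1.0, 10.0)
}
```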

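predict(self, X, return_std=False, return_cov=False)

[source]

Predict using the Gaussian process regression model.

We can also predict based on an unfitted model by using the GP prior. In addition to the mean of the predictive distribution, its standard deviation (return_std=True) or covariance (return_cov=True) can be returned. Note that at most one of the two can be requested.

Parameter	Description
X	array-like of shape (n_samples, n_features) or list of object Query points where the GP is evaluated.
return_std	bool, default=False If True, the standard deviation of the predictive distribution at the query points is returned along with the mean.
return_cov	bool, default=False If True, the covariance of the joint predictive distribution at the query points is returned along with the mean.

Returns	Description
y_mean	ndarray of shape (n_samples, [n_output_dims]) Mean of the predictive distribution at the query points.
y_std	ndarray of shape (n_samples,), optional Standard deviation of the predictive distribution at the query points. Only returned when `return_std` is True.
y_cov	ndarray of shape (n_samples, n_samples), optional Covariance of the joint predictive distribution at the query points. Only returned when `return_cov` is True.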
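
The two optional outputs are related: the standard deviations are the square roots of the diagonal of the joint covariance. The sketch below checks this on assumed toy data with a single target, calling predict twice because only one of the two flags may be set per call.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.RandomState(0)
X = rng.uniform(0.0, 5.0, size=(20, 1))        # assumed toy inputs
y = np.sin(X).ravel() + 0.1 * rng.randn(20)    # noisy toy targets
gpr = GaussianProcessRegressor(kernel=RBF(1.0), alpha=0.01).fit(X, y)

X_query = np.linspace(0.0, 5.0, 8).reshape(-1, 1)

# Mean and pointwise standard deviation.
y_mean, y_std = gpr.predict(X_query, return_std=True)

# Mean and full joint covariance between the query points.
y_mean_cov, y_cov = gpr.predict(X_query, return_cov=True)

# The pointwise std should equal sqrt(diag(cov)) up to rounding.
std_from_cov = np.sqrt(np.diag(y_cov))
```

The full covariance is mainly useful when correlated draws across query points are needed; for error bars alone, return_std avoids building the n_samples by n_samples matrix.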

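sample_y(self, X, n_samples=1, random_state=0)

[source]

Draw samples from the Gaussian process and evaluate them at X.

Parameter	Description
X	array-like of shape (n_samples, n_features) or list of object Query points where the GP is evaluated.
n_samples	int, default=1 The number of samples drawn from the Gaussian process.
random_state	int, RandomState, default=0 Determines the random number generation used to draw the samples. Pass an int for reproducible results across multiple function calls. See the Glossary entry for random_state.

Returns	Description
y_samples	ndarray of shape (n_samples_X, [n_output_dims], n_samples) Values of the n_samples samples drawn from the Gaussian process and evaluated at the query points.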
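
The output layout and the role of random_state can be illustrated as below, again on assumed single-target toy data. Each column of the result is one sampled function evaluated at all query points.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.RandomState(0)
X = rng.uniform(0.0, 5.0, size=(20, 1))        # assumed toy inputs
y = np.sin(X).ravel() + 0.1 * rng.randn(20)    # noisy toy targets
gpr = GaussianProcessRegressor(kernel=RBF(1.0), alpha=0.01).fit(X, y)

X_query = np.linspace(0.0, 5.0, 8).reshape(-1, 1)

# Five posterior draws: rows index query points, columns index samples.
samples = gpr.sample_y(X_query, n_samples=5, random_state=0)

# The same integer seed reproduces the same draws.
samples_again = gpr.sample_y(X_query, n_samples=5, random_state=0)

# A different seed gives a different set of draws.
samples_other = gpr.sample_y(X_query, n_samples=5, random_state=1)
```

Because random_state defaults to 0 rather than None, repeated calls with default arguments return identical samples; pass a different seed to obtain fresh draws.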

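score(self, X, y, sample_weight=None)

[source]

Return the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.

Parameter	Description
X	array-like of shape (n_samples, n_features) Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead, with shape = (n_samples, n_samples_fitting), where n_samples_fitting is the number of samples used in fitting the estimator.
y	array-like of shape (n_samples,) or (n_samples, n_outputs) True values for X.
sample_weight	array-like of shape (n_samples,), default=None Sample weights.

Returns	Description
score	float R^2 of self.predict(X) wrt. y.

Notes

The R2 score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to stay consistent with the default value of r2_score. This influences the score method of all multioutput regressors (except for MultiOutputRegressor).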
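
The R^2 definition used by score can be reproduced by hand for a single-target model. The sketch below uses assumed toy data and an arbitrary split into training and test points, without sample weights.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.RandomState(0)
X = rng.uniform(0.0, 5.0, size=(40, 1))        # assumed toy inputs
y = np.sin(X).ravel() + 0.1 * rng.randn(40)    # noisy toy targets
X_train, y_train = X[:30], y[:30]
X_test, y_test = X[30:], y[30:]

gpr = GaussianProcessRegressor(kernel=RBF(1.0), alpha=0.01).fit(X_train, y_train)

# Manual R^2 = 1 - u / v, following the definition of score.
y_pred = gpr.predict(X_test)
u = ((y_test - y_pred) ** 2).sum()           # residual sum of squares
v = ((y_test - y_test.mean()) ** 2).sum()    # total sum of squares
r2_manual = 1.0 - u / v

r2_score_method = gpr.score(X_test, y_test)
```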

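set_params(self, **params)

[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that each component of a nested object can be updated.

Parameter	Description
**params	dict Estimator parameters.

Returns	Description
self	object Estimator instance.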
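
The <component>__<parameter> convention applies both to the kernel nested inside the regressor and to a regressor nested inside a pipeline. In the sketch below, the step names "scale" and "gpr" and all parameter values are assumptions chosen for illustration.

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

gpr = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(1.0))

# Update a top-level parameter and a nested kernel parameter in one call.
# k1 is the left operand of the sum kernel (the RBF term).
returned = gpr.set_params(alpha=1e-6, kernel__k1__length_scale=3.0)

# Inside a pipeline, prefix the parameter with the step name.
pipe = Pipeline([("scale", StandardScaler()), ("gpr", GaussianProcessRegressor())])
pipe.set_params(gpr__alpha=1e-3, gpr__n_restarts_optimizer=2)
```

Changing parameters this way does not refit the model; fit has to be called again for the new values to take effect.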

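Examples using sklearn.gaussian_process.GaussianProcessRegressor

Comparison of kernel ridge and Gaussian process regression

Illustration of prior and posterior Gaussian process for different kernels

Gaussian process regression (GPR) with noise-level estimation

Gaussian Processes regression: basic introductory example

Gaussian process regression (GPR) on Mauna Loa CO2 data

Gaussian processes on discrete data structures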