sklearn.decomposition.NMF

class sklearn.decomposition.NMF(n_components=None, *, init=None, solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=200, random_state=None, alpha=0.0, l1_ratio=0.0, verbose=0, shuffle=False)

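Non-Negative Matrix Factorization (NMF).

Find two non-negative matrices (W, H) whose product approximates the non-negative matrix X. This factorization can be used, for example, for dimensionality reduction, source separation, or topic extraction.

The objective function is: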

0.5 * ||X - WH||_Fro^2
+ alpha * l1_ratio * ||vec(W)||_1
+ alpha * l1_ratio * ||vec(H)||_1
+ 0.5 * alpha * (1 - l1_ratio) * ||W||_Fro^2
+ 0.5 * alpha * (1 - l1_ratio) * ||H||_Fro^2

Where:

||A||_Fro^2 = \sum_{i,j} A_{ij}^2 (Frobenius norm)
||vec(A)||_1 = \sum_{i,j} abs(A_{ij}) (Elementwise L1 norm)
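
As a rough illustration, the objective can be evaluated directly with NumPy. The helper name `nmf_objective` is hypothetical and not part of scikit-learn; it only transcribes the formula as written here, and the small matrices are made up for the example.

```python
import numpy as np


def nmf_objective(X, W, H, alpha=0.0, l1_ratio=0.0):
    """Evaluate the regularized Frobenius objective as written above.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    # Data-fit term: 0.5 * ||X - WH||_Fro^2
    fit_term = 0.5 * np.sum((X - W @ H) ** 2)
    # Elementwise L1 penalties on W and H
    l1_term = alpha * l1_ratio * (np.abs(W).sum() + np.abs(H).sum())
    # Squared Frobenius (L2) penalties on W and H
    l2_term = 0.5 * alpha * (1 - l1_ratio) * (np.sum(W ** 2) + np.sum(H ** 2))
    return fit_term + l1_term + l2_term


X = np.array([[1.0, 2.0], [3.0, 4.0]])
W = np.eye(2)   # exact factorization: X = I @ X
H = X.copy()

no_penalty = nmf_objective(X, W, H)                          # 0.0
l1_only = nmf_objective(X, W, H, alpha=1.0, l1_ratio=1.0)    # 2 + 10 = 12.0
l2_only = nmf_objective(X, W, H, alpha=1.0, l1_ratio=0.0)    # 0.5 * (2 + 30) = 16.0
```

Because the factorization is exact, the data-fit term vanishes and only the penalty terms contribute.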

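For the multiplicative-update ('mu') solver, the Frobenius norm (0.5 * ||X - WH||_Fro^2) can be changed into another beta-divergence loss by changing the beta_loss parameter.

The objective function is minimized with an alternating minimization of W and H.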
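
To make the idea of alternating updates concrete, here is a minimal sketch of the classic multiplicative-update rules for the unregularized Frobenius loss. It is a simplified stand-in, not scikit-learn's implementation (whose default solver is coordinate descent); the function name, the `eps` guard, and the iteration count are assumptions chosen for illustration.

```python
import numpy as np


def nmf_multiplicative_updates(X, n_components, n_iter=200, seed=0, eps=1e-10):
    """Alternate multiplicative updates of H and W (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    # Strictly positive random starting point
    W = rng.random((n_samples, n_components)) + eps
    H = rng.random((n_components, n_features)) + eps
    for _ in range(n_iter):
        # Update H with W held fixed, then W with H held fixed.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])

W0, H0 = nmf_multiplicative_updates(X, n_components=2, n_iter=0)  # starting point
W, H = nmf_multiplicative_updates(X, n_components=2)              # after updates

initial_error = np.linalg.norm(X - W0 @ H0)
final_error = np.linalg.norm(X - W @ H)
print(initial_error, final_error)
```

Each update keeps the factors non-negative because it only multiplies non-negative entries by non-negative ratios.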

Read more in the User Guide.

New in version 0.18.

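Parameter	Description
n_components	int or None. Number of components. If n_components is not set, all features are kept.
init	None | 'random' | 'nndsvd' | 'nndsvda' | 'nndsvdar' | 'custom'. Method used to initialize the procedure. Default: None. Valid options: None: 'nndsvd' if n_components <= min(n_samples, n_features), otherwise 'random'; 'random': non-negative random matrices, scaled with sqrt(X.mean() / n_components); 'nndsvd': Nonnegative Double Singular Value Decomposition (NNDSVD) initialization (better for sparseness); 'nndsvda': NNDSVD with zeros filled with the average of X (better when sparsity is not desired); 'nndsvdar': NNDSVD with zeros filled with small random values (generally faster, less accurate alternative to NNDSVDa for when sparsity is not desired); 'custom': use custom matrices W and H.
solver	'cd' | 'mu'. 'cd' is a Coordinate Descent solver. 'mu' is a Multiplicative Update solver. New in version 0.17: Coordinate Descent solver. New in version 0.19: Multiplicative Update solver.
beta_loss	float or string, default 'frobenius'. String must be in {'frobenius', 'kullback-leibler', 'itakura-saito'}. Beta divergence to be minimized, measuring the distance between X and the dot product WH. Note that values different from 'frobenius' (or 2) and 'kullback-leibler' (or 1) lead to significantly slower fits. Note that for beta_loss <= 0 (or 'itakura-saito'), the input matrix X cannot contain zeros. Used only in the 'mu' solver. New in version 0.19.
tol	float, default: 1e-4. Tolerance of the stopping condition.
max_iter	integer, default: 200. Maximum number of iterations before timing out.
random_state	int, RandomState instance, default=None. Used for initialization (when init == 'nndsvdar' or 'random') and in Coordinate Descent. Pass an int for reproducible results across multiple function calls. See the Glossary.
alpha	double, default: 0. Constant that multiplies the regularization terms. Set it to zero to have no regularization. New in version 0.17: alpha used in the Coordinate Descent solver.
l1_ratio	double, default: 0. The regularization mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0 the penalty is an elementwise L2 penalty (aka Frobenius norm). For l1_ratio = 1 it is an elementwise L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. New in version 0.17: regularization parameter l1_ratio used in the Coordinate Descent solver.
verbose	bool, default=False. Whether to be verbose.
shuffle	boolean, default: False. If true, randomize the order of coordinates in the CD solver.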
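
The following sketch shows one way the solver and beta_loss parameters might be combined to fit a Kullback-Leibler objective. The data reuse the small matrix from the Examples section, and `max_iter=500` is an arbitrary choice for this illustration rather than a recommended setting.

```python
import numpy as np
from sklearn.decomposition import NMF

X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])

# beta_loss other than 'frobenius' requires the multiplicative-update solver.
kl_model = NMF(
    n_components=2,
    init='random',
    solver='mu',
    beta_loss='kullback-leibler',
    max_iter=500,
    random_state=0,
)
W_kl = kl_model.fit_transform(X)
H_kl = kl_model.components_
print(W_kl.shape, H_kl.shape)
```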

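Attribute	Description
components_	array, [n_components, n_features]. Factorization matrix, sometimes called the "dictionary".
n_components_	integer. The number of components. It is the same as the n_components parameter if it was given. Otherwise, it is the same as the number of features.
reconstruction_err_	number. Frobenius norm of the matrix difference, or beta-divergence, between the training data X and the reconstructed data WH from the fitted model.
n_iter_	int. Actual number of iterations.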
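
A short sketch of how these attributes could be inspected after fitting. It assumes the default Frobenius loss, for which reconstruction_err_ should match the Frobenius norm of X - WH computed by hand.

```python
import numpy as np
from sklearn.decomposition import NMF

X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])

model = NMF(n_components=2, init='random', max_iter=500, random_state=0)
W = model.fit_transform(X)

print(model.components_.shape)   # (n_components, n_features)
print(model.n_components_)
print(model.n_iter_)

# For the Frobenius loss, compare the stored error with a manual computation.
manual_err = np.linalg.norm(X - W @ model.components_)
print(model.reconstruction_err_, manual_err)
```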

References

Cichocki, A., & Phan, A.-H. (2009). Fast local algorithms for large scale nonnegative matrix and tensor factorizations. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 92(3), 708-721.

Fevotte, C., & Idier, J. (2011). Algorithms for nonnegative matrix factorization with the beta-divergence. Neural Computation, 23(9).

Examples

>>> import numpy as np
>>> X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
>>> from sklearn.decomposition import NMF
>>> model = NMF(n_components=2, init='random', random_state=0)
>>> W = model.fit_transform(X)
>>> H = model.components_

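Methods

Method	Description
`fit`(X[, y])	Learn a NMF model for the data X.
`fit_transform`(X[, y, W, H])	Learn a NMF model for the data X and return the transformed data.
`get_params`([deep])	Get parameters for this estimator.
`inverse_transform`(W)	Transform data back to its original space.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(X)	Transform the data X according to the fitted NMF model.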

__init__(n_components=None, *, init=None, solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=200, random_state=None, alpha=0.0, l1_ratio=0.0, verbose=0, shuffle=False)

Initialize self. See help(type(self)) for accurate signature.

fit(X, y=None, **params)

Learn a NMF model for the data X.

Parameter	Description
X	{array-like, sparse matrix}, shape (n_samples, n_features). Data matrix to be decomposed.
y	Ignored.

Return value	Description
self	The fitted estimator.

fit_transform(X, y=None, W=None, H=None)

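Learn a NMF model for the data X and return the transformed data.

This is more efficient than calling fit followed by transform.

Parameter	Description
X	{array-like, sparse matrix, dataframe} of shape (n_samples, n_features). Data matrix to be decomposed.
y	Ignored.
W	array-like, shape (n_samples, n_components). If `init='custom'`, it is used as the initial guess for the solution.
H	array-like, shape (n_components, n_features). If `init='custom'`, it is used as the initial guess for the solution.

Return value	Description
W	array, shape (n_samples, n_components). Transformed data.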
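
As an illustration of the W and H arguments, the sketch below supplies custom starting matrices with init='custom'. The uniform random initial values and the names `W_init` and `H_init` are assumptions made for this example; any non-negative arrays of the right shape and the same dtype as X should work in the same way.

```python
import numpy as np
from sklearn.decomposition import NMF

X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
n_components = 2

# Hypothetical initial guesses: strictly positive float64 matrices.
rng = np.random.RandomState(0)
W_init = rng.uniform(0.1, 1.0, size=(X.shape[0], n_components))
H_init = rng.uniform(0.1, 1.0, size=(n_components, X.shape[1]))

custom_model = NMF(n_components=n_components, init='custom', max_iter=500)
W_custom = custom_model.fit_transform(X, W=W_init, H=H_init)
print(W_custom.shape, custom_model.components_.shape)
```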

get_params(deep=True)

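Get parameters for this estimator.

Parameter	Description
deep	bool, default=True. If True, return the parameters for this estimator and for contained subobjects that are estimators.

Return value	Description
params	mapping of string to any. Parameter names mapped to their values.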
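
A brief sketch of reading constructor parameters back from an unfitted estimator. Only keys passed explicitly here, plus the long-standing `solver` default, are inspected, since other defaults can differ between scikit-learn versions.

```python
from sklearn.decomposition import NMF

model = NMF(n_components=2, init='random', random_state=0)

# get_params returns a dict of constructor parameter names and their values.
params = model.get_params()
print(params['n_components'], params['init'], params['solver'])
```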

inverse_transform(W)

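Transform data back to its original space.

Parameter	Description
W	{array-like, sparse matrix}, shape (n_samples, n_components). Transformed data matrix.

Return value	Description
X	{array-like, sparse matrix}, shape (n_samples, n_features). Data matrix of original shape. New in version 0.18.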
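
The sketch below illustrates that, for a fitted model, mapping W back to the original space amounts to the matrix product of W with components_. The argument is passed positionally because its keyword name has changed across scikit-learn releases.

```python
import numpy as np
from sklearn.decomposition import NMF

X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])

model = NMF(n_components=2, init='random', max_iter=500, random_state=0)
W = model.fit_transform(X)

# Reconstruct an approximation of X from the transformed data.
X_approx = model.inverse_transform(W)
print(X_approx.shape)
print(np.allclose(X_approx, W @ model.components_))
```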

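set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it is possible to update each component of a nested object.

Parameter	Description
**params	dict. Estimator parameters.

Return value	Description
self	object. Estimator instance.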
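
A small sketch of both usages: setting parameters on a plain estimator, and on a nested estimator through the double-underscore syntax. The pipeline and its step names (`scale`, `nmf`) are hypothetical choices for this example.

```python
from sklearn.decomposition import NMF
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Simple estimator: set_params returns the estimator itself.
nmf = NMF(n_components=2, init='random', random_state=0)
returned = nmf.set_params(n_components=3, max_iter=500)

# Nested object: address the step's parameter as <step name>__<parameter>.
pipe = Pipeline([('scale', MinMaxScaler()), ('nmf', NMF(n_components=2))])
pipe.set_params(nmf__n_components=4)

print(nmf.n_components, pipe.named_steps['nmf'].n_components)
```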

transform(X)

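Transform the data X according to the fitted NMF model.

Parameter	Description
X	{array-like, sparse matrix}, shape (n_samples, n_features). Data matrix to be transformed by the model.

Return value	Description
W	array, shape (n_samples, n_components). Transformed data.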
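
For illustration, a fitted model can be applied to rows it has not seen before. The matrix `X_new` below is made-up data with the same number of features as the training matrix.

```python
import numpy as np
from sklearn.decomposition import NMF

X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
model = NMF(n_components=2, init='random', max_iter=500, random_state=0)
model.fit(X)

# Hypothetical new non-negative samples with n_features = 2.
X_new = np.array([[2.5, 1.1], [7.0, 0.9]])
W_new = model.transform(X_new)   # components_ stays fixed; only W_new is solved for
print(W_new.shape)
```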

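Examples using sklearn.decomposition.NMF

Beta-divergence loss functions

Topic extraction with Non-negative Matrix Factorization and Latent Dirichlet Allocation

Selecting dimensionality reduction with Pipeline and GridSearchCV

Faces dataset decompositions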