sklearn.decomposition.DictionaryLearning

class sklearn.decomposition.DictionaryLearning(n_components=None, *, alpha=1, max_iter=1000, tol=1e-08, fit_algorithm='lars', transform_algorithm='omp', transform_n_nonzero_coefs=None, transform_alpha=None, n_jobs=None, code_init=None, dict_init=None, verbose=False, split_sign=False, random_state=None, positive_code=False, positive_dict=False, transform_max_iter=1000)

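[source]

Dictionary learning

Finds a dictionary (a set of atoms) that can best be used to represent data using a sparse code.

Solves the optimization problem: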

(U^*,V^*) = argmin 0.5 || Y - U V ||_2^2 + alpha * || U ||_1
            (U,V)
            with || V_k ||_2 = 1 for all  0 <= k < n_components

Read more in the User Guide.
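As a rough illustration of the objective above, the following minimal sketch learns a dictionary V from synthetic data Y and then computes a sparse code U. The data sizes, the small `max_iter`, and the variable names are assumptions chosen for illustration; they do not come from the reference itself.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Synthetic data with arbitrary illustrative sizes: 30 samples, 8 features.
rng = np.random.RandomState(0)
Y = rng.randn(30, 8)

# Learn a dictionary V with 5 atoms from Y.
dict_learner = DictionaryLearning(
    n_components=5, alpha=1, max_iter=50, random_state=0
)
dict_learner.fit(Y)
V = dict_learner.components_      # shape (n_components, n_features)

# Encode Y as a sparse code U over the learned atoms (default algorithm: 'omp').
U = dict_learner.transform(Y)     # shape (n_samples, n_components)

# U @ V is a sparse approximation of Y; the residual norm measures the fit.
residual = np.linalg.norm(Y - U @ V)
sparsity = np.mean(U == 0)        # fraction of zero coefficients in the code
print(U.shape, V.shape, residual, sparsity)
```

Random Gaussian data has no real sparse structure, so the residual here is only meaningful as a sanity check, not as a quality benchmark.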

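Parameter	Description
n_components	int, default=n_features Number of dictionary elements to extract.
alpha	float, default=1.0 Sparsity controlling parameter.
max_iter	int, default=1000 Maximum number of iterations to perform.
tol	float, default=1e-8 Tolerance for numerical error.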
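fit_algorithm	{'lars', 'cd'}, default='lars' lars: uses the least angle regression method to solve the lasso problem (linear_model.lars_path). cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). lars will be faster if the estimated components are sparse. New in version 0.17: cd coordinate descent method to improve speed.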
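transform_algorithm	{'lasso_lars', 'lasso_cd', 'lars', 'omp', 'threshold'}, default='omp' Algorithm used to transform the data. lars: uses the least angle regression method (linear_model.lars_path). lasso_lars: uses Lars to compute the Lasso solution. lasso_cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso); lasso_lars will be faster if the estimated components are sparse. omp: uses orthogonal matching pursuit to estimate the sparse solution. threshold: squashes to zero all coefficients less than alpha from the projection `dictionary * X'`. New in version 0.17: lasso_cd coordinate descent method to improve speed.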
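transform_n_nonzero_coefs	int, default=0.1*n_features Number of nonzero coefficients to target in each column of the solution. This is only used by `algorithm='lars'` and `algorithm='omp'`, and is overridden by `alpha` in the `omp` case.
transform_alpha	float, default=1.0 If `algorithm='lasso_lars'` or `algorithm='lasso_cd'`, `alpha` is the penalty applied to the L1 norm. If `algorithm='threshold'`, `alpha` is the absolute value of the threshold below which coefficients will be squashed to zero. If `algorithm='omp'`, `alpha` is the tolerance parameter: the value of the reconstruction error targeted. In this case, it overrides `n_nonzero_coefs`.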
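n_jobs	int or None, default=None Number of parallel jobs to run. `None` means 1 unless in a `joblib.parallel_backend` context. -1 means using all processors. See Glossary for more details.
code_init	array of shape (n_samples, n_components), default=None Initial value for the code, for warm restart.
dict_init	array of shape (n_components, n_features), default=None Initial values for the dictionary, for warm restart.
verbose	bool, default=False Controls the verbosity of the procedure.
split_sign	bool, default=False Whether to split the sparse feature vector into the concatenation of its negative part and its positive part. This can improve the performance of downstream classifiers.
random_state	int, RandomState instance or None, optional (default=None) Used for initializing the dictionary when `dict_init` is not specified, randomly shuffling the data when `shuffle` is set to `True`, and updating the dictionary. Pass an int for reproducible results across multiple function calls. See Glossary.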
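positive_code	bool, default=False Whether to enforce positivity when finding the code. New in version 0.20.
positive_dict	bool, default=False Whether to enforce positivity when finding the dictionary. New in version 0.20.
transform_max_iter	int, default=1000 Maximum number of iterations to perform if `algorithm='lasso_lars'` or `algorithm='lasso_cd'`. New in version 0.22.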
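To make the interaction between `transform_algorithm` and `transform_n_nonzero_coefs` more concrete, here is a small sketch, assuming synthetic data and an arbitrary target of two nonzero coefficients per sample. With `transform_algorithm='omp'`, each encoded sample should then use at most that many atoms.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Synthetic data; the sizes are illustrative assumptions.
rng = np.random.RandomState(0)
X = rng.randn(40, 10)

# Ask OMP to target at most 2 nonzero coefficients per encoded sample.
dl = DictionaryLearning(
    n_components=6,
    max_iter=30,
    transform_algorithm="omp",
    transform_n_nonzero_coefs=2,
    random_state=0,
)
code = dl.fit(X).transform(X)

# Count how many atoms each sample actually uses.
nonzeros_per_sample = np.count_nonzero(code, axis=1)
print(code.shape, nonzeros_per_sample.max())
```

If `transform_alpha` were also set, the table above says it would override the nonzero-coefficient target in the `omp` case, so this sketch leaves it at its default.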

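Attribute	Description
components_	array, [n_components, n_features] Dictionary atoms extracted from the data.
error_	array Vector of errors at each iteration.
n_iter_	int Number of iterations run.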
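The fitted attributes can be inspected after calling `fit`. The sketch below assumes small synthetic data and a deliberately low `max_iter` to keep it fast; the specific sizes are illustrative only.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.RandomState(0)
X = rng.randn(25, 6)  # illustrative sizes

dl = DictionaryLearning(n_components=4, max_iter=20, random_state=0).fit(X)

atoms = dl.components_          # one row per dictionary atom
errors = np.asarray(dl.error_)  # per-iteration error values recorded during fitting
n_iter = dl.n_iter_             # number of iterations actually run

print(atoms.shape, n_iter, errors.shape)
```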

See also:

SparseCoder
MiniBatchDictionaryLearning
SparsePCA
MiniBatchSparsePCA

Notes

References

J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009: Online dictionary learning for sparse coding (https://www.di.ens.fr/sierra/pdfs/icml09.pdf)

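Methods

Method	Description
`fit`(self, X[, y])	Fit the model from data in X.
`fit_transform`(self, X[, y])	Fit to data, then transform it.
`get_params`(self[, deep])	Get parameters for this estimator.
`set_params`(self, **params)	Set the parameters of this estimator.
`transform`(self, X)	Encode the data as a sparse combination of the dictionary atoms.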

__init__(self, n_components=None, *, alpha=1, max_iter=1000, tol=1e-08, fit_algorithm='lars', transform_algorithm='omp', transform_n_nonzero_coefs=None, transform_alpha=None, n_jobs=None, code_init=None, dict_init=None, verbose=False, split_sign=False, random_state=None, positive_code=False, positive_dict=False, transform_max_iter=1000)

[source]

Initialize self. See help(type(self)) for accurate signature.

fit(self, X, y=None)

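[source]

Fit the model from data in X.

Parameter	Description
X	array-like, shape (n_samples, n_features) Training vector, where n_samples is the number of samples and n_features is the number of features.
y	Ignored

Returns	Description
self	object Returns the object itself.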
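Because `fit` returns the estimator itself, calls can be chained. The following sketch, on assumed synthetic data, checks that the returned object is the same instance and shows the chained form.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.RandomState(0)
X = rng.randn(20, 5)  # illustrative sizes

dl = DictionaryLearning(n_components=3, max_iter=20, random_state=0)

# y is omitted here; the table above lists it as ignored.
returned = dl.fit(X)
same_object = returned is dl

# Chained form: fit, then immediately encode the same data.
code = DictionaryLearning(n_components=3, max_iter=20, random_state=0).fit(X).transform(X)
print(same_object, code.shape)
```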

fit_transform(self, X, y=None, **fit_params)

[source]

Fit to data, then transform it.

Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameter	Description
X	{array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
y	ndarray of shape (n_samples,), default=None Target values.
fit_params	dict Additional fit parameters.

Returns	Description
X_new	ndarray of shape (n_samples, n_features_new) Transformed array.
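For this estimator, the number of output columns `n_features_new` corresponds to the number of dictionary atoms (when `split_sign` is left at its default). A minimal sketch on assumed synthetic data:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.RandomState(0)
X = rng.randn(20, 7)  # illustrative sizes

dl = DictionaryLearning(n_components=4, max_iter=20, random_state=0)

# Fit the dictionary and obtain the code for X in a single call.
X_new = dl.fit_transform(X)
print(X.shape, "->", X_new.shape)
```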

get_params(self, deep=True)

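[source]

Get parameters for this estimator.

Parameter	Description
deep	bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns	Description
params	mapping of string to any Parameter names mapped to their values.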
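A short sketch of `get_params` follows; the chosen values for `n_components` and `alpha` are arbitrary. No fitting is needed, since the method only reports constructor parameters.

```python
from sklearn.decomposition import DictionaryLearning

dl = DictionaryLearning(n_components=5, alpha=0.5)

# Returns a dict-like mapping from parameter names to their current values.
params = dl.get_params()
print(params["n_components"], params["alpha"], params["transform_algorithm"])
```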

set_params(self, **params)

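[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it is possible to update each component of a nested object.

Parameter	Description
params	dict Estimator parameters.

Returns	Description
self	object Estimator instance.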
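The nested `<component>__<parameter>` form can be sketched with a pipeline. The step names `scale` and `dictlearn` below are hypothetical labels chosen for this example, and the use of `StandardScaler` is only an assumption to give the pipeline a second step.

```python
from sklearn.decomposition import DictionaryLearning
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simple estimator: parameters are set by name.
dl = DictionaryLearning(n_components=4)
dl.set_params(alpha=0.1, transform_algorithm="lasso_lars")

# Nested object: address a step's parameter as <step name>__<parameter>.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("dictlearn", DictionaryLearning(n_components=4)),
])
result = pipe.set_params(dictlearn__n_components=3, dictlearn__alpha=0.2)

print(dl.alpha, pipe.named_steps["dictlearn"].n_components)
```

As the return table indicates, `set_params` hands back the estimator instance, so `result` is the pipeline itself.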

transform(self, X)

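[source]

Encode the data as a sparse combination of the dictionary atoms.

The coding method is determined by the object parameter transform_algorithm.

Parameter	Description
X	array of shape (n_samples, n_features) Test data to be transformed; it must have the same number of features as the data used to train the model.

Returns	Description
X	array, shape (n_samples, n_components) Transformed data.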
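The sketch below encodes held-out data with a fitted dictionary and also shows the effect of `split_sign=True`, which concatenates the positive and negative parts of the code so that the output has twice as many columns and no negative entries. The train and test arrays are synthetic assumptions.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.RandomState(0)
X_train = rng.randn(30, 6)  # illustrative sizes
X_test = rng.randn(5, 6)    # same number of features as the training data

# Plain encoding: one column per dictionary atom.
dl = DictionaryLearning(n_components=4, max_iter=20, random_state=0).fit(X_train)
code = dl.transform(X_test)

# With split_sign=True, positive and negative parts are stored separately.
dl_split = DictionaryLearning(
    n_components=4, max_iter=20, split_sign=True, random_state=0
).fit(X_train)
code_split = dl_split.transform(X_test)

print(code.shape, code_split.shape, code_split.min())
```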