sklearn.decomposition.MiniBatchDictionaryLearning

class sklearn.decomposition.MiniBatchDictionaryLearning(n_components=None, *, alpha=1, n_iter=1000, fit_algorithm='lars', n_jobs=None, batch_size=3, shuffle=True, dict_init=None, transform_algorithm='omp', transform_n_nonzero_coefs=None, transform_alpha=None, verbose=False, split_sign=False, random_state=None, positive_code=False, positive_dict=False, transform_max_iter=1000)

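[source]

Mini-batch dictionary learning

Finds a dictionary (a set of atoms) that can best be used to represent data using a sparse code.

Solves the optimization problem: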

(U^*,V^*) = argmin 0.5 || Y - U V ||_2^2 + alpha * || U ||_1
             (U,V)
             with || V_k ||_2 = 1 for all  0 <= k < n_components

Read more in the User Guide.
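
The optimization problem above can be illustrated with a minimal sketch. It assumes synthetic Gaussian data with arbitrary sizes, and the variable names (`code`, `dictionary`, `reconstruction`) are illustrative only. The iteration count is left at its default, since the name of that argument may differ between scikit-learn releases.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Synthetic data Y: 100 samples with 8 features (sizes chosen arbitrarily).
rng = np.random.RandomState(0)
X = rng.randn(100, 8)

# Learn 5 atoms; alpha weights the L1 penalty on the code during fitting.
dico = MiniBatchDictionaryLearning(
    n_components=5,
    alpha=1,
    batch_size=10,
    transform_algorithm="lasso_lars",
    transform_alpha=0.1,
    random_state=0,
)

code = dico.fit(X).transform(X)   # U: one sparse code per sample
dictionary = dico.components_     # V: one atom per row

# The product U V approximates the input data Y.
reconstruction = code @ dictionary
print(code.shape, dictionary.shape)
print("mean squared reconstruction error:", np.mean((X - reconstruction) ** 2))
```

A larger `transform_alpha` should give sparser codes at the cost of a larger reconstruction error.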

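Parameter	Description
n_components	int Number of dictionary elements to extract.
alpha	float Sparsity controlling parameter.
n_iter	int Total number of iterations to perform.
fit_algorithm	{‘lars’, ‘cd’} lars: uses the least angle regression method to solve the lasso problem (linear_model.lars_path). cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). Lars will be faster if the estimated components are sparse.
n_jobs	int or None, optional (default=None) Number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
batch_size	int Number of samples in each mini-batch.
shuffle	bool Whether to shuffle the samples before forming batches.
dict_init	array of shape (n_components, n_features) Initial value of the dictionary for warm restart scenarios.
transform_algorithm	{‘lasso_lars’, ‘lasso_cd’, ‘lars’, ‘omp’, ‘threshold’} Algorithm used to transform the data. lars: uses the least angle regression method (linear_model.lars_path). lasso_lars: uses Lars to compute the Lasso solution. lasso_cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso); lasso_lars will be faster if the estimated components are sparse. omp: uses orthogonal matching pursuit to estimate the sparse solution. threshold: squashes to zero all coefficients less than alpha from the projection dictionary * X'.
transform_n_nonzero_coefs	int, `0.1 * n_features` by default Number of nonzero coefficients to target in each column of the solution. This is only used by algorithm='lars' and algorithm='omp', and is overridden by `alpha` in the `omp` case.
transform_alpha	float, 1. by default If `algorithm='lasso_lars'` or `algorithm='lasso_cd'`, `alpha` is the penalty applied to the L1 norm. If `algorithm='threshold'`, `alpha` is the absolute value of the threshold below which coefficients will be squashed to zero. If `algorithm='omp'`, `alpha` is the tolerance parameter: the value of the reconstruction error targeted. In this case, it overrides `n_nonzero_coefs`.
verbose	bool, optional (default: False) Controls the verbosity of the procedure.
split_sign	bool, False by default Whether to split the sparse feature vector into the concatenation of its negative part and its positive part. This can improve the performance of downstream classifiers.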
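random_state	int, RandomState instance or None, optional (default=None) Used for initializing the dictionary when dict_init is not specified, randomly shuffling the data when shuffle is set to True, and updating the dictionary. Pass an int for reproducible results across multiple function calls. See Glossary.
positive_code	bool Whether to enforce positivity when finding the code. New in version 0.20.
positive_dict	bool Whether to enforce positivity when finding the dictionary. New in version 0.20.
transform_max_iter	int, optional (default=1000) Maximum number of iterations to perform if algorithm='lasso_cd' or lasso_lars. New in version 0.22.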
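
As a rough illustration of how the transform-related parameters interact, the sketch below combines `transform_algorithm='omp'`, `transform_n_nonzero_coefs` and `split_sign`. The data is synthetic and the sizes are arbitrary assumptions; `nonzeros_per_sample` is an illustrative name.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.RandomState(0)
X = rng.randn(100, 8)  # synthetic data, sizes chosen for illustration

dico = MiniBatchDictionaryLearning(
    n_components=5,
    batch_size=10,
    transform_algorithm="omp",      # orthogonal matching pursuit at transform time
    transform_n_nonzero_coefs=2,    # target at most 2 active atoms per sample
    split_sign=True,                # positive and negative parts in separate columns
    random_state=0,
)
code = dico.fit(X).transform(X)

# With split_sign=True the code has 2 * n_components columns, all non-negative.
nonzeros_per_sample = np.count_nonzero(code, axis=1)
print(code.shape)
print(nonzeros_per_sample.max())
```

Splitting the sign doubles the width of the code but leaves the number of active entries per sample unchanged, because each coefficient lands in either the positive or the negative half.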

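Attribute	Description
components_	array, [n_components, n_features] Components extracted from the data.
inner_stats_	tuple of (A, B) ndarrays Internal sufficient statistics that are kept by the algorithm. Keeping them is useful in online settings, to avoid losing the history of the evolution, but they should not be of any use to the end user. A (n_components, n_components) is the dictionary covariance matrix. B (n_features, n_components) is the data approximation matrix.
n_iter_	int Number of iterations run.
iter_offset_	int The number of iterations on data batches that have been performed before.
random_state_	RandomState RandomState instance that is generated either from a seed, the random number generator or by np.random.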
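
A short sketch of inspecting the fitted attributes follows, again on assumed synthetic data. Because the internal attributes (`inner_stats_`, `iter_offset_`, `random_state_`) may not be exposed by every scikit-learn release, the sketch only checks for their presence with `hasattr` rather than relying on them.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.RandomState(0)
X = rng.randn(60, 6)  # synthetic data, sizes chosen for illustration

dico = MiniBatchDictionaryLearning(n_components=4, batch_size=10, random_state=0)
dico.fit(X)

# components_ holds one dictionary atom per row: (n_components, n_features).
print(dico.components_.shape)
# n_iter_ reports how many iterations were actually run.
print(dico.n_iter_)

# Internal bookkeeping attributes; availability is treated as an assumption here.
for name in ("inner_stats_", "iter_offset_", "random_state_"):
    print(name, hasattr(dico, name))
```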

See also:

SparseCoder
DictionaryLearning
SparsePCA
MiniBatchSparsePCA

Notes

References

J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009: Online dictionary learning for sparse coding (https://www.di.ens.fr/sierra/pdfs/icml09.pdf)

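Methods

Method	Description
`fit`(self, X[, y])	Fit the model from data in X.
`fit_transform`(self, X[, y])	Fit to data, then transform it.
`get_params`(self[, deep])	Get parameters for this estimator.
`partial_fit`(self, X[, y, iter_offset])	Update the model using the data in X as a mini-batch.
`set_params`(self, **params)	Set the parameters of this estimator.
`transform`(self, X)	Encode the data as a sparse combination of the dictionary atoms.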

__init__(n_components=None, *, alpha=1, n_iter=1000, fit_algorithm='lars', n_jobs=None, batch_size=3, shuffle=True, dict_init=None, transform_algorithm='omp', transform_n_nonzero_coefs=None, transform_alpha=None, verbose=False, split_sign=False, random_state=None, positive_code=False, positive_dict=False, transform_max_iter=1000)

[source]

Initialize the estimator with the parameters listed above.

fit(self, X, y=None)

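[source]

Fit the model from data in X.

Parameter	Description
X	array-like, shape (n_samples, n_features) Training vector, where n_samples is the number of samples and n_features is the number of features.
y	Ignored

Returns	Description
self	object Returns the instance itself.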

fit_transform(self, X, y=None, **fit_params)

[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameter	Description
X	{array-like, sparse matrix, dataframe} of shape (n_samples, n_features)
y	ndarray of shape (n_samples,), default=None Target values.
**fit_params	dict Additional fit parameters.

Returns	Description
X_new	ndarray of shape (n_samples, n_features_new) Transformed array.
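
A minimal sketch of `fit_transform` on assumed synthetic data is given below. For this estimator the number of output columns, `n_features_new`, corresponds to `n_components` (unless `split_sign` is enabled).

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.RandomState(0)
X = rng.randn(60, 6)  # synthetic data, sizes chosen for illustration

dico = MiniBatchDictionaryLearning(n_components=4, batch_size=10, random_state=0)

# Learn the dictionary and encode the same data in a single call; y is not needed.
X_new = dico.fit_transform(X)
print(X_new.shape)
```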

get_params(self, deep=True)

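[source]

Get parameters for this estimator.

Parameter	Description
deep	bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns	Description
params	mapping of string to any Parameter names mapped to their values.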
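
The effect of `deep` is easiest to see on a nested object. The sketch below wraps the estimator in a `Pipeline` under the step name `"dico"`, which is an arbitrary choice made for illustration.

```python
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.pipeline import Pipeline

dico = MiniBatchDictionaryLearning(n_components=5, alpha=0.5)

# On the estimator itself, get_params returns the constructor arguments.
params = dico.get_params()
print(params["n_components"], params["alpha"])

# On a pipeline, deep=True also lists the parameters of contained estimators.
pipe = Pipeline([("dico", dico)])
deep_params = pipe.get_params(deep=True)
shallow_params = pipe.get_params(deep=False)
print("dico__alpha" in deep_params, "dico__alpha" in shallow_params)
```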

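partial_fit(self, X, y=None, iter_offset=None)

[source]

Update the model using the data in X as a mini-batch.

Parameter	Description
X	array-like, shape (n_samples, n_features) Training vector, where n_samples is the number of samples and n_features is the number of features.
y	Ignored
iter_offset	integer, optional The number of iterations on data batches that have been performed before this call to partial_fit. This is optional: if no number is passed, the memory of the object is used.

Returns	Description
self	object Returns the instance itself.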
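
A sketch of online use, assuming the data arrives as a stream of small batches (simulated here with random draws). `iter_offset` is left unset so that the estimator relies on its own internal state between calls.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.RandomState(0)
dico = MiniBatchDictionaryLearning(n_components=5, alpha=1, random_state=0)

# Simulated stream: 20 batches of 10 samples with 8 features each.
for _ in range(20):
    batch = rng.randn(10, 8)
    dico.partial_fit(batch)  # updates the dictionary in place and returns self

print(dico.components_.shape)

# The incrementally learned dictionary can then encode unseen data.
code = dico.transform(rng.randn(3, 8))
print(code.shape)
```

This pattern is useful when the full dataset does not fit in memory, since only one batch needs to be held at a time.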

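set_params(self, **params)

[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it is possible to update each component of a nested object.

Parameter	Description
**params	dict Estimator parameters.

Returns	Description
self	object Estimator instance.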
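
The sketch below shows both usages: setting parameters directly on the estimator, and reaching it through a pipeline with the double-underscore syntax. The step names `"scale"` and `"dico"` are arbitrary choices for illustration.

```python
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

dico = MiniBatchDictionaryLearning(n_components=5, alpha=1)

# Simple estimator: set_params returns the estimator itself.
returned = dico.set_params(alpha=0.5, transform_algorithm="lasso_lars")
print(returned is dico, dico.alpha, dico.transform_algorithm)

# Nested object: <step name>__<parameter> addresses a pipeline component.
pipe = Pipeline([("scale", StandardScaler()), ("dico", dico)])
pipe.set_params(dico__n_components=3)
print(pipe.named_steps["dico"].n_components)
```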

transform(self, X)

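[source]

Encode the data as a sparse combination of the dictionary atoms.

The coding method is determined by the object parameter transform_algorithm.

Parameter	Description
X	array of shape (n_samples, n_features) Test data to be transformed; must have the same number of features as the data used to train the model.

Returns	Description
X_new	array, shape (n_samples, n_components) Transformed data.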
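
A minimal sketch of encoding held-out data follows, assuming synthetic train and test sets that share the same number of features. The choice of `lasso_lars` and the value of `transform_alpha` are illustrative.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.RandomState(0)
X_train = rng.randn(100, 8)
X_test = rng.randn(20, 8)  # same n_features as the training data

dico = MiniBatchDictionaryLearning(
    n_components=5,
    batch_size=10,
    transform_algorithm="lasso_lars",
    transform_alpha=0.5,
    random_state=0,
)
dico.fit(X_train)

# Each test sample is encoded against the dictionary learned from X_train.
X_new = dico.transform(X_test)
print(X_new.shape)
print("fraction of zero coefficients:", np.mean(X_new == 0))
```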

Examples using sklearn.decomposition.MiniBatchDictionaryLearning

Image denoising using dictionary learning

Faces dataset decompositions