sklearn.datasets.make_multilabel_classification¶

sklearn.datasets.make_multilabel_classification(n_samples=100, n_features=20, *, n_classes=5, n_labels=2, length=50, allow_unlabeled=True, sparse=False, return_indicator='dense', return_distributions=False, random_state=None)

[源码]

生成随机的多标签分类问题。

对于每个样本，生成过程为：

选择标签数：n〜Poisson（n_labels）
n次，选择一个类c：c〜多项式（theta）
选择文档长度：k〜泊松（长度）
k次，选择一个单词：w〜多项式（theta_c）

在上述过程中，使用拒绝采样来确保n永远不为零或大于n_classes，并且文档长度永远不为零。同样，我们拒绝已经选择的类。

在用户指南中阅读更多内容。

参数	说明
n_samples	nt, optional (default=100) 样本数。
n_features	int, optional (default=20) 功能总数。
n_classes	int，可选（默认值= 5）分类问题的类数。
n_labels	int, optional (default=2) 每个实例的平均标签数。更准确地说，每个样本的标签数是从泊松分布中以n_labels作为其期望值得出的，但是样本受n_class限制（使用拒绝采样），并且如果allow_unlabeled为False，则该值必须为非零。
length	int, optional (default=50) 特征的总和（如果是文档，则为单词数）是从具有此期望值的泊松分布中得出的。
allow_unlabeled	bool, optional (default=True) 如果为True，则某些实例可能不属于任何类。
sparse	bool, optional (default=False) 如果为True，则返回一个稀疏特征矩阵 0.17版中的新功能：允许稀疏输出的参数。
return_indicator	‘dense’ (default),‘sparse’,False 如果为dense，则以密集二进制指示符格式返回Y。如果为'sparse'，则以稀疏二进制指示符格式返回Y。 False返回标签列表的列表。
return_distributions	bool, optional (default=False) 如果为True，则返回给定类别的特征的先验类别概率和条件概率，并从中得出数据。
random_state	int, RandomState instance, default=None 确定用于生成数据集的随机数生成。为多个函数调用传递可重复输出的int值。请参阅词汇表。

返回值	说明
X	array of shape [n_samples, n_features] 生成的样本。
Y	array or sparse CSR matrix of shape [n_samples, n_classes] 标签集。
p_c	array, shape [n_classes] 绘制每个类的概率。仅在return_distributions = True时返回。
p_w_c	array, shape [n_features, n_classes] 给每个类别绘制每个要素的概率。仅在return_distributions = True时返回。

sklearn.datasets.make_multilabel_classification使用示例¶

多标签分类 ¶

绘制随机生成的多标签数据集 ¶