sklearn.naive_bayes.BernoulliNB

class sklearn.naive_bayes.BernoulliNB(*, alpha=1.0, binarize=0.0, fit_prior=True, class_prior=None)

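Naive Bayes classifier for multivariate Bernoulli models.

Like MultinomialNB, this classifier is suitable for discrete data. The difference is that MultinomialNB works with occurrence counts, whereas BernoulliNB is designed for binary/boolean features. Both classifiers support more than two classes.

Read more in the User Guide.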

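Parameter	Description
alpha	float, default=1.0 Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).
binarize	float or None, default=0.0 Threshold for binarizing (mapping to booleans) the sample features. If None, the input is presumed to already consist of binary vectors.
fit_prior	bool, default=True Whether to learn class prior probabilities. If False, a uniform prior is used.
class_prior	array-like of shape (n_classes,), default=None Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.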
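
The effect of the constructor parameters can be checked on a small example. The following is a minimal sketch on hypothetical toy data (the arrays and variable names are illustrative and not part of the library): it compares two `binarize` thresholds, shows that `binarize=None` expects input that is already binary, and inspects the priors produced by `fit_prior=False` and an explicit `class_prior`.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical toy data: small non-negative counts, two classes.
X = np.array([[0, 3, 1],
              [2, 0, 0],
              [0, 4, 2],
              [1, 0, 0]])
y = np.array([0, 1, 0, 1])

# Default threshold: every value > 0.0 is mapped to 1.
clf_default = BernoulliNB(binarize=0.0).fit(X, y)

# Higher threshold: only values > 1.5 are mapped to 1.
clf_thresh = BernoulliNB(binarize=1.5).fit(X, y)

# binarize=None: the caller supplies input that is already binary.
X_binary = (X > 1.5).astype(int)
clf_none = BernoulliNB(binarize=None).fit(X_binary, y)

# Per-class counts of features that are "on" after thresholding.
print(clf_default.feature_count_)
print(clf_thresh.feature_count_)

# fit_prior=False yields a uniform prior; class_prior fixes the prior explicitly.
clf_uniform = BernoulliNB(fit_prior=False).fit(X, y)
clf_fixed = BernoulliNB(class_prior=[0.3, 0.7]).fit(X, y)
print(np.exp(clf_uniform.class_log_prior_))
print(np.exp(clf_fixed.class_log_prior_))
```

Thresholding inside the estimator and thresholding by hand before fitting with `binarize=None` should give the same model, which is a convenient sanity check when choosing a threshold.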

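Attribute	Description
class_count_	ndarray of shape (n_classes,) Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.
class_log_prior_	ndarray of shape (n_classes,) Log probability of each class (smoothed).
classes_	ndarray of shape (n_classes,) Class labels known to the classifier.
feature_count_	ndarray of shape (n_classes, n_features) Number of samples encountered for each (class, feature) pair during fitting. This value is weighted by the sample weight when provided.
feature_log_prob_	ndarray of shape (n_classes, n_features) Empirical log probability of features given a class, P(x_i | y).
n_features_	int Number of features of each sample.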
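
The fitted attributes relate to each other in a simple way. As a minimal sketch on hypothetical toy data, the snippet below recomputes `feature_log_prob_` from `feature_count_` and `class_count_` using additive smoothing with two outcomes per feature, and recomputes `class_log_prior_` from the class counts. The variable names `manual_feature_log_prob` and `manual_class_log_prior` are illustrative only.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary toy data with string class labels.
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
y = np.array(["a", "a", "b", "b"])

alpha = 1.0
clf = BernoulliNB(alpha=alpha).fit(X, y)

# Smoothed estimate of log P(x_i = 1 | y): each feature has two outcomes,
# so the denominator is increased by 2 * alpha.
manual_feature_log_prob = np.log(
    (clf.feature_count_ + alpha)
    / (clf.class_count_[:, np.newaxis] + 2 * alpha)
)

# With fit_prior=True the class prior is the relative class frequency.
manual_class_log_prior = np.log(clf.class_count_ / clf.class_count_.sum())

print(clf.classes_)
print(np.allclose(manual_feature_log_prob, clf.feature_log_prob_))
print(np.allclose(manual_class_log_prior, clf.class_log_prior_))
```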

References

C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234-265. https://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html

A. McCallum and K. Nigam (1998). A comparison of event models for naive Bayes text classification. Proc. AAAI/ICML-98 Workshop on Learning for Text Categorization, pp. 41-48.

V. Metsis, I. Androutsopoulos and G. Paliouras (2006). Spam filtering with naive Bayes – Which naive Bayes? 3rd Conf. on Email and Anti-Spam (CEAS).

Examples

>>> import numpy as np
>>> rng = np.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100))
>>> Y = np.array([1, 2, 3, 4, 4, 5])
>>> from sklearn.naive_bayes import BernoulliNB
>>> clf = BernoulliNB()
>>> clf.fit(X, Y)
BernoulliNB()
>>> print(clf.predict(X[2:3]))
[3]

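Methods

Method	Description
`fit`(X, y[, sample_weight])	Fit the naive Bayes classifier according to X, y.
`get_params`([deep])	Get the parameters of this estimator.
`partial_fit`(X, y[, classes, sample_weight])	Incremental fit on a batch of samples.
`predict`(X)	Perform classification on an array of test vectors X.
`predict_log_proba`(X)	Return log-probability estimates for the test vector X.
`predict_proba`(X)	Return probability estimates for the test vector X.
`score`(X, y[, sample_weight])	Return the mean accuracy on the given test data and labels.
`set_params`(**params)	Set the parameters of this estimator.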

__init__(*, alpha=1.0, binarize=0.0, fit_prior=True, class_prior=None)

Initialize self. See help(type(self)) for the accurate signature.

fit(X, y, sample_weight=None)

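Fit the naive Bayes classifier according to X, y.

Parameter	Description
X	{array-like, sparse matrix} of shape (n_samples, n_features) Training vectors, where n_samples is the number of samples and n_features is the number of features.
y	array-like of shape (n_samples,) Target values.
sample_weight	array-like of shape (n_samples,), default=None Weights applied to individual samples (1. for unweighted).

Returns	Description
self	object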
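
To illustrate how `sample_weight` enters the fitted counts, here is a minimal sketch on hypothetical toy data. It assumes integer weights so that weighting can be compared against simply repeating rows; the names `X_rep` and `y_rep` are illustrative.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary toy data.
X = np.array([[1, 0],
              [0, 1],
              [1, 1]])
y = np.array([0, 1, 1])
weights = np.array([2.0, 1.0, 1.0])

# Weighted fit: the first sample counts twice.
weighted = BernoulliNB().fit(X, y, sample_weight=weights)

# Unweighted fit on data where the first row is physically repeated.
repeats = np.array([2, 1, 1])
X_rep = np.repeat(X, repeats, axis=0)
y_rep = np.repeat(y, repeats)
repeated = BernoulliNB().fit(X_rep, y_rep)

print(weighted.class_count_)    # weighted number of samples per class
print(weighted.feature_count_)  # weighted count of "on" features per class
```

With integer weights the two fits should agree, which is a quick way to confirm that the weights are being applied as intended.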

get_params(deep=True)

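Get the parameters of this estimator.

Parameter	Description
deep	bool, default=True If True, return the parameters of this estimator and of any contained subobjects that are estimators.

Returns	Description
params	mapping of string to any Parameter names mapped to their values.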
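
A short sketch of how the returned mapping might be used: reading back constructor arguments and building a second, unfitted estimator with the same configuration. The variable name `twin` is illustrative; newer library versions may return additional keys beyond the four constructor parameters listed on this page.

```python
from sklearn.naive_bayes import BernoulliNB

clf = BernoulliNB(alpha=0.5, binarize=None)

# Mapping of constructor parameter names to their current values.
params = clf.get_params()
print(params["alpha"], params["binarize"], params["fit_prior"])

# The mapping can be passed back to the constructor to get an
# independent estimator with the same configuration.
twin = BernoulliNB(**params)
```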

partial_fit(X, y, classes=None, sample_weight=None)

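Incremental fit on a batch of samples.

Parameter	Description
X	{array-like, sparse matrix} of shape (n_samples, n_features) Training vectors, where n_samples is the number of samples and n_features is the number of features.
y	array-like of shape (n_samples,) Target values.
classes	array-like of shape (n_classes,), default=None List of all the classes that can possibly appear in the y vector. Must be provided at the first call to partial_fit; can be omitted in subsequent calls.
sample_weight	array-like of shape (n_samples,), default=None Weights applied to individual samples (1. for unweighted).

Returns	Description
self	object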
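
Incremental fitting can be sketched as follows. The data are hypothetical and generated in memory; in practice the batches would typically come from a source too large to load at once. The batch size of 20 is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical data: 60 binary samples, 10 features, 3 classes.
rng = np.random.RandomState(0)
X = rng.randint(2, size=(60, 10))
y = np.arange(60) % 3
all_classes = np.array([0, 1, 2])

incremental = BernoulliNB()
for start in range(0, 60, 20):
    batch = slice(start, start + 20)
    # classes is required on the first call and may be omitted afterwards.
    incremental.partial_fit(
        X[batch], y[batch],
        classes=all_classes if start == 0 else None,
    )

# Reference model fitted on all data at once.
full = BernoulliNB().fit(X, y)
print(np.allclose(incremental.feature_log_prob_, full.feature_log_prob_))
```

Because the model only accumulates counts, fitting in batches should reach the same parameters as a single call to `fit` on the concatenated data.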

predict(X)

Perform classification on an array of test vectors X.

Parameter	Description
X	array-like of shape (n_samples, n_features)

Returns	Description
C	ndarray of shape (n_samples,) Predicted target values for X.

predict_log_proba(X)

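Return log-probability estimates for the test vector X.

Parameter	Description
X	array-like of shape (n_samples, n_features)

Returns	Description
C	array-like of shape (n_samples, n_classes) The log-probability of the samples for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute classes_.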
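
The column ordering can be made concrete with a small sketch. The labels "ham" and "spam" and the toy data are hypothetical; the point is that column j of the output belongs to `classes_[j]`, so taking the arg-max over columns and indexing into `classes_` should reproduce the output of `predict`.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary toy data with string labels.
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]])
y = np.array(["ham", "ham", "spam", "spam"])

clf = BernoulliNB().fit(X, y)
log_proba = clf.predict_log_proba(X)

# Column j of log_proba corresponds to clf.classes_[j].
best = clf.classes_[np.argmax(log_proba, axis=1)]

print(clf.classes_)
print(log_proba.shape)
print(best)
```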

predict_proba(X)

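Return probability estimates for the test vector X.

Parameter	Description
X	array-like of shape (n_samples, n_features)

Returns	Description
C	array-like of shape (n_samples, n_classes) The probability of the samples for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute classes_.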
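
As a brief sketch on hypothetical random data, the probabilities in each row should sum to 1 and should equal the exponential of the corresponding log-probabilities. The sample `x_new` is an illustrative unseen input.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical data: 30 binary samples, 6 features, 3 classes.
rng = np.random.RandomState(42)
X = rng.randint(2, size=(30, 6))
y = np.arange(30) % 3

clf = BernoulliNB().fit(X, y)

# A single unseen sample must still be passed as a 2-D array.
x_new = np.array([[1, 0, 1, 1, 0, 0]])
proba = clf.predict_proba(x_new)
log_proba = clf.predict_log_proba(x_new)

print(proba.shape)        # one row per sample, one column per class
print(proba.sum(axis=1))  # each row sums to 1 up to rounding
```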

score(X, y, sample_weight=None)

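Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that the entire label set of each sample be predicted correctly.

Parameter	Description
X	array-like of shape (n_samples, n_features) Test samples.
y	array-like of shape (n_samples,) or (n_samples, n_outputs) True labels for X.
sample_weight	array-like of shape (n_samples,), default=None Sample weights.

Returns	Description
score	float Mean accuracy of self.predict(X) with respect to y.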
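
The relationship between `score` and the predictions can be sketched as follows, on hypothetical random data. The snippet compares `score` with a hand-computed fraction of correct predictions and with `sklearn.metrics.accuracy_score`, and shows the weighted variant; the weights are arbitrary illustrative values.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import BernoulliNB

# Hypothetical data: 40 binary samples, 8 features, 2 classes.
rng = np.random.RandomState(0)
X = rng.randint(2, size=(40, 8))
y = np.arange(40) % 2

clf = BernoulliNB().fit(X, y)
pred = clf.predict(X)

# Unweighted: fraction of samples predicted correctly.
acc = clf.score(X, y)
manual = np.mean(pred == y)

# Weighted: correct predictions averaged with the given sample weights.
weights = np.linspace(1.0, 2.0, num=40)
weighted_acc = clf.score(X, y, sample_weight=weights)
manual_weighted = np.average(pred == y, weights=weights)

print(acc, manual, accuracy_score(y, pred))
print(weighted_acc, manual_weighted)
```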

set_params(**params)

Set the parameters of this estimator.

Parameter	Description
**params	dict Estimator parameters.

Returns	Description
self	object Estimator instance.
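
A minimal sketch of typical usage: parameters are updated in place, the same instance is returned so calls can be chained, and an unknown parameter name raises `ValueError`. The name `not_a_parameter` is deliberately invalid and purely illustrative.

```python
from sklearn.naive_bayes import BernoulliNB

clf = BernoulliNB()

# set_params updates the estimator in place and returns the same instance.
returned = clf.set_params(alpha=0.1, fit_prior=False)
print(returned is clf, clf.alpha, clf.fit_prior)

# Unknown parameter names are rejected.
error_message = None
try:
    clf.set_params(not_a_parameter=1)
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```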

Examples using sklearn.naive_bayes.BernoulliNB

Hashing feature transformation using Totally Random Trees

Classification of text documents using sparse features
