logp

Log unconditional probability density of naive Bayes classification model for incremental learning

Syntax

LP = logp(Mdl,X)

Description

LP = logp(Mdl,X) returns the log unconditional probability densities LP of the predictor data X using the naive Bayes classification model for incremental learning Mdl. You can use LP to identify outliers in the training data.

Examples

Train a naive Bayes classification model by using fitcnb, convert it to an incremental learner, and then use the incremental model to detect outliers in streaming data.

Load and Preprocess Data

Load the human activity data set. Randomly shuffle the data.

load humanactivity
rng(1); % For reproducibility
n = numel(actid);
idx = randsample(n,n);
X = feat(idx,:);
Y = actid(idx);

For details on the data set, enter Description at the command line.

Train Naive Bayes Classification Model

Fit a naive Bayes classification model to a random sample of about 25% of the data.

idxtt = randsample([true false false false],n,true);
TTMdl = fitcnb(X(idxtt,:),Y(idxtt))
TTMdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: [1 2 3 4 5]
            ScoreTransform: 'none'
           NumObservations: 6167
         DistributionNames: {1×60 cell}
    DistributionParameters: {5×60 cell}

  Properties, Methods

TTMdl is a ClassificationNaiveBayes model object representing a traditionally trained model.

Convert Trained Model

Convert the traditionally trained model to a naive Bayes classification model for incremental learning.

IncrementalMdl = incrementalLearner(TTMdl)
IncrementalMdl = 
  incrementalClassificationNaiveBayes

                    IsWarm: 1
                   Metrics: [1×2 table]
                ClassNames: [1 2 3 4 5]
            ScoreTransform: 'none'
         DistributionNames: {1×60 cell}
    DistributionParameters: {5×60 cell}

  Properties, Methods

IncrementalMdl is an incrementalClassificationNaiveBayes model object. IncrementalMdl represents a naive Bayes classification model for incremental learning; its parameter values are the same as those in TTMdl.

Detect Outliers

Determine unconditional density thresholds for outliers by using the traditionally trained model and training data. Observations in the streaming data yielding densities beyond the thresholds are considered outliers.

ttlp = logp(TTMdl,X(idxtt,:));
[~,lower,upper] = isoutlier(ttlp)
lower = -336.0424
upper = 399.9853

Detect outliers in the rest of the data, which is considered streaming data, relative to what was learned to create TTMdl. Simulate a data stream by processing 1 observation at a time. At each iteration, call logp to compute the log unconditional probability density of the observation, and store each value.

% Preallocation
idxil = ~idxtt;
nil = sum(idxil);
numObsPerChunk = 1;
nchunk = floor(nil/numObsPerChunk);
lp = zeros(nchunk,1);
iso = false(nchunk,1);
Xil = X(idxil,:);
Yil = Y(idxil);

% Incremental fitting
for j = 1:nchunk
    ibegin = min(nil,numObsPerChunk*(j-1) + 1);
    iend = min(nil,numObsPerChunk*j);
    idx = ibegin:iend;
    lp(j) = logp(IncrementalMdl,Xil(idx,:));
    iso(j) = ((lp(j) < lower) + (lp(j) > upper)) >= 1;
end

Plot the log unconditional probability densities of the streaming data. Identify the outliers.

figure;
h1 = plot(lp);
hold on
x = 1:nchunk;
h2 = plot(x(iso),lp(iso),'r*');
h3 = line(xlim,[lower lower],'Color','g','LineStyle','--');
line(xlim,[upper upper],'Color','g','LineStyle','--')
xlim([0 nchunk]);
ylabel('Unconditional Density')
xlabel('Iteration')
legend([h1 h2 h3],["Log unconditional probabilities" "Outliers" "Thresholds"])
hold off

Input Arguments

Naive Bayes classification model for incremental learning, specified as an incrementalClassificationNaiveBayes model object. You can create Mdl directly or by converting a supported, traditionally trained machine learning model using the incrementalLearner function. For more details, see the corresponding reference page.

You must configure Mdl to compute the log conditional probability densities on a batch of observations.

  • If Mdl is a converted, traditionally trained model, you can compute the log conditional probability densities without any modifications.

  • Otherwise, Mdl.DistributionParameters must be a cell matrix with Mdl.NumPredictors > 0 columns and at least one row, where each row corresponds to each class name in Mdl.ClassNames.
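The first configuration above (converting a traditionally trained model) can be sketched as follows. This is a minimal, hypothetical example using the built-in fisheriris data set rather than the human activity data from this page:

```matlab
% Minimal sketch: convert a traditionally trained naive Bayes model,
% then compute log unconditional probability densities on a batch.
load fisheriris                    % built-in example data: meas (150x4), species
TTMdl = fitcnb(meas,species);      % traditionally trained model
Mdl = incrementalLearner(TTMdl);   % converted model is ready for logp
LP = logp(Mdl,meas(1:5,:));        % 5-by-1 vector of log densities
```

Because the converted model inherits the fitted distribution parameters from TTMdl, no further configuration is needed before calling logp.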

Batch of predictor data with which to compute the log conditional probability densities, specified as an n-by-Mdl.NumPredictors floating-point matrix.

Note

  • logp supports only floating-point input predictor data. If the input model Mdl represents a converted, traditionally trained model fit to categorical data, use dummyvar to convert each categorical variable to a numeric matrix of dummy variables, and concatenate all dummy variable matrices and any other numeric predictors. For more details, see Dummy Variables.

  • For each j = 1 through n, if X(j,:) contains at least one NaN, then LP(j) is NaN.

Data Types: single | double
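The note above requires encoding categorical predictors numerically before calling logp. A minimal sketch using dummyvar (the color variable and its values are hypothetical, for illustration only):

```matlab
% Encode a categorical predictor as dummy variables, then concatenate
% with the remaining numeric predictors.
color = categorical(["red";"blue";"red";"green"]);  % hypothetical categorical predictor
D = dummyvar(color);                                % one 0/1 indicator column per category
Xnum = [D, [1.5; 2.3; 0.7; 1.1]];                   % combine with a numeric predictor
```

Pass the resulting numeric matrix (here Xnum) to logp in place of the raw categorical data.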

Output Arguments

Log unconditional probability densities, returned as an n-by-1 floating-point vector. LP(j) is the log unconditional probability density evaluated at X(j,:).

Data Types: single | double

More About

Unconditional Probability Density

The无条件概率密度of the predictors is the density's distribution marginalized over the classes.

In other words, the unconditional probability density is

$$P(X_1,\ldots,X_P)=\sum_{k=1}^{K}P(X_1,\ldots,X_P,Y=k)=\sum_{k=1}^{K}P(X_1,\ldots,X_P \mid Y=k)\,\pi(Y=k),$$

where π(Y = k) is the class prior probability. The conditional joint density of the predictors given a class, P(X1,...,XP | Y = k), and the class prior probability distribution are training options (that is, you specify them when training the classifier).
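As a hypothetical illustration (not from this reference page), consider K = 2 equally likely classes and a single Gaussian predictor. The unconditional density is then a two-component mixture:

$$P(x)=\pi(Y=1)\,\mathcal{N}(x;\mu_1,\sigma_1^2)+\pi(Y=2)\,\mathcal{N}(x;\mu_2,\sigma_2^2)=\tfrac{1}{2}\,\mathcal{N}(x;\mu_1,\sigma_1^2)+\tfrac{1}{2}\,\mathcal{N}(x;\mu_2,\sigma_2^2).$$

An observation far from both class means has a small P(x), and therefore a low log unconditional probability density, which is why logp is useful for flagging outliers.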

Prior Probability

Theprior probabilityof a class is the assumed relative frequency with which observations from that class occur in a population.

Introduced in R2021a