
resubMargin

Resubstitution classification margins for naive Bayes classifier

Description


m = resubMargin(Mdl) returns the resubstitution classification margins (m) for the naive Bayes classifier Mdl using the training data stored in Mdl.X and the corresponding class labels stored in Mdl.Y.

m is returned as a numeric vector with the same length as Y. The software estimates each entry of m using the trained naive Bayes classifier Mdl, the corresponding row of X, and the true class label in Y.

Examples


Estimate the resubstitution (in-sample) classification margins of a naive Bayes classifier. The margin of an observation is the score for its true class minus the maximal score among the false classes.

Load the fisheriris data set. Create X as a numeric matrix that contains four petal measurements for 150 irises. Create Y as a cell array of character vectors that contains the corresponding iris species.

load fisheriris
X = meas;
Y = species;

Train a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. fitcnb assumes that each predictor is conditionally and normally distributed.

Mdl = fitcnb(X,Y,'ClassNames',{'setosa','versicolor','virginica'})
Mdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'setosa'  'versicolor'  'virginica'}
            ScoreTransform: 'none'
           NumObservations: 150
         DistributionNames: {'normal'  'normal'  'normal'  'normal'}
    DistributionParameters: {3x4 cell}

  Properties, Methods

Mdl is a trained ClassificationNaiveBayes classifier.

Estimate the resubstitution classification margins.

m = resubMargin(Mdl);
median(m)
ans = 1.0000
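
As an optional check (a sketch, not part of the original example), you can reproduce these margins from the posterior scores that predict returns. trueIdx, falseScore, and mManual are hypothetical variable names.

[~,score] = predict(Mdl,X);                   % posterior score for each class
[~,trueIdx] = ismember(Y,Mdl.ClassNames);     % column index of each true class
n = numel(trueIdx);
lin = sub2ind(size(score),(1:n)',trueIdx);
trueScore = score(lin);                       % score for the true class
falseScore = score;
falseScore(lin) = -Inf;                       % mask the true-class column
mManual = trueScore - max(falseScore,[],2);   % true score minus maximal false score
max(abs(mManual - m))                         % numerically zero

This matches the definition of the margin: the score for the true class minus the maximal score for the false classes.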

Display the histogram of the in-sample classification margins.

histogram(m,30,'Normalization','probability')
xlabel('In-Sample Margins')
ylabel('Probability')
title('Probability Distribution of the In-Sample Margins')

Classifiers that yield relatively large margins are preferred.

Perform feature selection by comparing in-sample margins from multiple models. Based solely on this comparison, the model with the highest margins is the best model.

Load the fisheriris data set. Specify the predictors X and class labels Y.

load fisheriris
X = meas;
Y = species;

Define these two data sets:

  • fullX contains all predictors.

  • partX contains the last two predictors.

fullX = X;
partX = X(:,3:4);

Train a naive Bayes classifier for each predictor set.

FullMdl = fitcnb(fullX,Y);
PartMdl = fitcnb(partX,Y);

Estimate the in-sample margins for each classifier.

fullM = resubMargin(FullMdl);
median(fullM)
ans = 1.0000

partM = resubMargin(PartMdl);
median(partM)
ans = 1.0000

The two models have similar performance. However, PartMdl is less complex.
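
Because both medians display as 1.0000, a closer look at the margin distributions can support this conclusion. The following sketch, not part of the original example, compares lower quantiles of the two sets of margins:

quantile(fullM,[0.05 0.25 0.50])   % tail behavior of the full-model margins
quantile(partM,[0.05 0.25 0.50])   % tail behavior of the two-predictor model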

Input Arguments


Mdl — Full, trained naive Bayes classifier, specified as a ClassificationNaiveBayes model trained by fitcnb.

More About


Classification Edge

The classification edge is the weighted mean of the classification margins.

If you supply weights, then the software normalizes them to sum to the prior probability of their respective class. The software uses the normalized weights to compute the weighted mean.

When choosing among multiple classifiers to perform a task such as feature selection, choose the classifier that yields the highest edge.
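
The following sketch illustrates this definition, assuming the trained model Mdl, labels Y, and margins m from the first example, plus example observation weights w (wn and edgeVal are hypothetical names):

w = ones(numel(Y),1);                            % example weights (assumption)
wn = zeros(size(w));
for k = 1:numel(Mdl.ClassNames)
    idx = strcmp(Y,Mdl.ClassNames{k});
    wn(idx) = w(idx)*Mdl.Prior(k)/sum(w(idx));   % class weights sum to the class prior
end
edgeVal = sum(wn.*m)/sum(wn)                     % weighted mean of the margins

With the default weights, this quantity should match what resubEdge(Mdl) returns.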

Classification Margin

The classification margin for each observation is the difference between the score for the true class and the maximal score for the false classes. For example, if the scores for an observation are 0.7, 0.2, and 0.1, and the first class is the true class, then the margin is 0.7 - 0.2 = 0.5. Margins provide a classification confidence measure; among multiple classifiers, those that yield larger margins (on the same scale) are better.

Posterior Probability

The posterior probability is the probability that an observation belongs in a particular class, given the data.

For naive Bayes, the posterior probability that a classification is $k$ for a given observation $(x_1,\ldots,x_P)$ is

$$\hat{P}(Y=k \mid x_1,\ldots,x_P) = \frac{P(X_1,\ldots,X_P \mid y=k)\,\pi(Y=k)}{P(X_1,\ldots,X_P)},$$

where:

  • $P(X_1,\ldots,X_P \mid y=k)$ is the conditional joint density of the predictors given they are in class $k$. Mdl.DistributionNames stores the distribution names of the predictors.

  • $\pi(Y=k)$ is the class prior probability distribution. Mdl.Prior stores the prior distribution.

  • $P(X_1,\ldots,X_P)$ is the joint density of the predictors. The classes are discrete, so $P(X_1,\ldots,X_P)=\sum_{k=1}^{K}P(X_1,\ldots,X_P \mid y=k)\,\pi(Y=k)$.
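
To make the formula concrete, here is a sketch that evaluates the posterior for a single observation by hand, assuming normal conditional distributions and the trained Mdl and data X from the first example (x, like, and post are hypothetical names):

x = X(1,:);                                          % one observation
K = numel(Mdl.ClassNames);
P = numel(x);
like = ones(K,1);                                    % P(X_1,...,X_P | y = k)
for k = 1:K
    for j = 1:P
        mu = Mdl.DistributionParameters{k,j}(1);     % class-conditional mean
        sigma = Mdl.DistributionParameters{k,j}(2);  % class-conditional standard deviation
        like(k) = like(k)*normpdf(x(j),mu,sigma);    % conditional independence of predictors
    end
end
post = like.*Mdl.Prior(:)/sum(like.*Mdl.Prior(:))    % posterior for each class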

Prior Probability

The prior probability of a class is the assumed relative frequency with which observations from that class occur in a population.
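
By default, fitcnb estimates the prior from the relative class frequencies in Y. As a sketch (using the X and Y from the examples above; MdlUniform is a hypothetical name), you can override the prior with the 'Prior' name-value argument:

MdlUniform = fitcnb(X,Y,'Prior','uniform');   % override the default empirical priors
MdlUniform.Prior                              % equal prior for every class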

Classification Score

The naive Bayes score is the class posterior probability given the observation.
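
Because the scores are posterior probabilities, each row of the score matrix that predict returns sums to 1. A quick sketch, assuming the trained Mdl and data X from the first example:

[~,score] = predict(Mdl,X);   % second output contains the posterior scores
max(abs(sum(score,2) - 1))    % each row sums to 1 (numerically zero)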

Introduced in R2014b