predict

Predict labels for Gaussian kernel classification model

Syntax

Label = predict(Mdl,X)

[Label,Score] = predict(Mdl,X)

Description

Label = predict(Mdl,X) returns a vector of predicted class labels for the predictor data in the matrix or table X, based on the binary Gaussian kernel classification model Mdl.

[Label,Score] = predict(Mdl,X) also returns classification scores for both classes.

Examples

Predict Training Set Labels

Predict the training set labels using a binary kernel classification model, and display the confusion matrix for the resulting classification.

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').

load ionosphere

Train a binary kernel classification model that identifies whether the radar return is bad ('b') or good ('g').

rng('default') % For reproducibility
Mdl = fitckernel(X,Y);

Mdl is a ClassificationKernel model.

Predict the training set, or resubstitution, labels.

label = predict(Mdl,X);

Construct a confusion matrix.

ConfusionTrain = confusionchart(Y,label);

[Figure: confusion matrix chart for the training set predictions]

The model misclassifies one radar return for each class.

Predict Test Set Labels

Predict the test set labels using a binary kernel classification model, and display the confusion matrix for the resulting classification.

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').

load ionosphere

Partition the data set into training and test sets. Specify a 15% holdout sample for the test set.

rng('default') % For reproducibility
Partition = cvpartition(Y,'Holdout',0.15);
trainingInds = training(Partition); % Indices for the training set
testInds = test(Partition); % Indices for the test set

Train a binary kernel classification model using the training set. A good practice is to define the class order.

Mdl = fitckernel(X(trainingInds,:),Y(trainingInds),'ClassNames',{'b','g'});

Predict the training set labels and the test set labels.

labelTrain = predict(Mdl,X(trainingInds,:));
labelTest = predict(Mdl,X(testInds,:));

Construct a confusion matrix for the training set.

ConfusionTrain = confusionchart(Y(trainingInds),labelTrain);

[Figure: confusion matrix chart for the training set predictions]

The model misclassifies only one radar return for each class.

Construct a confusion matrix for the test set.

ConfusionTest = confusionchart(Y(testInds),labelTest);

[Figure: confusion matrix chart for the test set predictions]

The model misclassifies one bad radar return as being a good return, and five good radar returns as being bad returns.
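
The counts in the confusion chart can also be condensed into a single misclassification rate. The following is a minimal sketch that repeats the steps of this example so that it runs on its own; the variable names isWrong and testError are introduced here for illustration only and are not part of the original example.

```matlab
load ionosphere
rng('default') % For reproducibility

% Same 15% holdout partition as in the example above
Partition = cvpartition(Y,'Holdout',0.15);
trainingInds = training(Partition);
testInds = test(Partition);

Mdl = fitckernel(X(trainingInds,:),Y(trainingInds),'ClassNames',{'b','g'});
labelTest = predict(Mdl,X(testInds,:));

% Y is a cell array of character vectors, so compare labels with strcmp
isWrong = ~strcmp(labelTest,Y(testInds)); % true where the prediction is wrong
testError = mean(isWrong)                 % fraction of misclassified test observations
```

If the model makes the six test set errors described above, testError equals 6 divided by the number of test set observations.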

Estimate Posterior Class Probabilities

Estimate posterior class probabilities for a test set, and determine the quality of the model by plotting a receiver operating characteristic (ROC) curve. Kernel classification models return posterior probabilities for logistic regression learners only.

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').

load ionosphere

Partition the data set into training and test sets. Specify a 30% holdout sample for the test set.

rng('default') % For reproducibility
Partition = cvpartition(Y,'Holdout',0.30);
trainingInds = training(Partition); % Indices for the training set
testInds = test(Partition); % Indices for the test set

Train a binary kernel classification model. Fit logistic regression learners.

Mdl = fitckernel(X(trainingInds,:),Y(trainingInds), ...
    'ClassNames',{'b','g'},'Learner','logistic');

Predict the posterior class probabilities for the test set.

[~,posterior] = predict(Mdl,X(testInds,:));

Because Mdl has one regularization strength, the output posterior is a matrix with two columns and rows equal to the number of test set observations. Column i contains posterior probabilities of Mdl.ClassNames(i) given a particular observation.

Compute the performance metrics (true positive rates and false positive rates) for a ROC curve and find the area under the ROC curve (AUC) value by creating a rocmetrics object.

rocObj = rocmetrics(Y(testInds),posterior,Mdl.ClassNames);

Plot the ROC curve for the second class by using the plot function of rocmetrics.

plot(rocObj,ClassNames=Mdl.ClassNames(2))

[Figure: ROC curve for class g (AUC = 0.9042), with the model operating point marked]

The AUC is close to 1, which indicates that the model predicts labels well.

Input Arguments

`Mdl` — Binary kernel classification model
`ClassificationKernel` model object

Binary kernel classification model, specified as a ClassificationKernel model object. You can create a ClassificationKernel model object using fitckernel.

`X` — Predictor data to be classified
numeric matrix | table

Predictor data to be classified, specified as a numeric matrix or table.

Each row of X corresponds to one observation, and each column corresponds to one variable.

For a numeric matrix:
- The variables in the columns of X must have the same order as the predictor variables that trained Mdl.
- If you trained Mdl using a table (for example, Tbl) and Tbl contains all numeric predictor variables, then X can be a numeric matrix. To treat numeric predictors in Tbl as categorical during training, identify categorical predictors by using the CategoricalPredictors name-value pair argument of fitckernel. If Tbl contains heterogeneous predictor variables (for example, numeric and categorical data types) and X is a numeric matrix, then predict throws an error.
For a table:
- predict does not support multicolumn variables or cell arrays other than cell arrays of character vectors.
- If you trained Mdl using a table (for example, Tbl), then all predictor variables in X must have the same variable names and data types as those that trained Mdl (stored in Mdl.PredictorNames). However, the column order of X does not need to correspond to the column order of Tbl. Also, Tbl and X can contain additional variables (response variables, observation weights, and so on), but predict ignores them.
- If you trained Mdl using a numeric matrix, then the predictor names in Mdl.PredictorNames and corresponding predictor variable names in X must be the same. To specify predictor names during training, see the PredictorNames name-value pair argument of fitckernel. All predictor variables in X must be numeric vectors. X can contain additional variables (response variables, observation weights, and so on), but predict ignores them.

Data Types: table | double | single
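
The table rules above can be illustrated with a short sketch. It assumes that the model is trained from a table built with array2table, which names the predictors X1 through X34 for the ionosphere data. The names Tbl, TblNew, labelA, and labelB are chosen here for illustration only. The new table reverses the column order and still carries the response variable, which predict is described above as ignoring.

```matlab
load ionosphere
rng('default') % For reproducibility

% Build a training table; array2table names the predictors X1,...,X34
Tbl = array2table(X);
Tbl.Y = Y; % Response stored alongside the predictors
Mdl = fitckernel(Tbl,'Y');

% Same data with the columns in reverse order; the response Y is still present
TblNew = Tbl(:,end:-1:1);

labelA = predict(Mdl,Tbl);
labelB = predict(Mdl,TblNew);

% Compare the two sets of predictions
sameLabels = isequal(labelA,labelB)
```

Because predictor variables in a table are matched by name rather than by position, the two calls should produce the same labels.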

Output Arguments

`Label` — Predicted class labels
categorical array | character array | logical matrix | numeric matrix | cell array of character vectors

Predicted class labels, returned as a categorical or character array, logical or numeric matrix, or cell array of character vectors.

Label has n rows, where n is the number of observations in X, and has the same data type as the observed class labels (Y) used to train Mdl. (The software treats string arrays as cell arrays of character vectors.)

predict classifies observations into the class yielding the highest score.

`Score` — Classification scores
numeric array

Classification scores, returned as an n-by-2 numeric array, where n is the number of observations in X. Score(i,j) is the score for classifying observation i into class j. Mdl.ClassNames stores the order of the classes.

If Mdl.Learner is 'logistic', then classification scores are posterior probabilities.
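
The relationship between the two outputs can be checked with a short sketch. It assumes the ionosphere data and a model trained with default settings. The names idx, labelFromScore, and nDisagree are introduced here for illustration only. The sketch picks the highest-scoring column in each row of the score matrix and maps it back to a class name through Mdl.ClassNames.

```matlab
load ionosphere
rng('default') % For reproducibility
Mdl = fitckernel(X,Y,'ClassNames',{'b','g'});

[label,score] = predict(Mdl,X);

% Column j of score corresponds to Mdl.ClassNames(j)
[~,idx] = max(score,[],2);            % Column index of the highest score in each row
labelFromScore = Mdl.ClassNames(idx); % Map column indices back to class names

% Count the observations where the two labelings differ
nDisagree = sum(~strcmp(label(:),labelFromScore(:)))
```

The two labelings should agree except possibly for observations whose two scores are exactly tied, because max returns the first column in a tie.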

More About

Classification Score

For kernel classification models, the raw classification score for classifying the observation x, a row vector, into the positive class is defined by

$f(x) = T(x)\beta + b.$

- $T(\cdot)$ is a transformation of an observation for feature expansion.
- $\beta$ is the estimated column vector of coefficients.
- $b$ is the estimated scalar bias.

The raw classification score for classifying x into the negative class is −f(x). The software classifies observations into the class that yields a positive score.

If the kernel classification model consists of logistic regression learners, then the software applies the 'logit' score transformation to the raw classification scores (see ScoreTransform).
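
As a worked illustration, assume that the 'logit' transformation is the logistic function $1/(1 + e^{-s})$ applied to each raw score $s$. Applying it to the positive-class score $f(x)$ and the negative-class score $-f(x)$ gives

$\hat{P}(\text{positive} \mid x) = \dfrac{1}{1 + e^{-f(x)}}, \qquad \hat{P}(\text{negative} \mid x) = \dfrac{1}{1 + e^{f(x)}}.$

The two values sum to 1, because

$\dfrac{1}{1 + e^{-f(x)}} + \dfrac{1}{1 + e^{f(x)}} = \dfrac{e^{f(x)}}{1 + e^{f(x)}} + \dfrac{1}{1 + e^{f(x)}} = 1.$

For example, a raw score of $f(x) = 2$ would give a positive-class posterior of $1/(1 + e^{-2}) \approx 0.881$ and a negative-class posterior of about $0.119$. Under this assumption, a positive raw score corresponds to a positive-class posterior greater than 0.5, so choosing the class with the positive raw score and choosing the class with the larger posterior lead to the same label.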

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

Usage notes and limitations:

predict does not support tall table data.

For more information, see Tall Arrays.

Version History

Introduced in R2017b
