crossval
Cross-validate machine learning model
Description
CVMdl = crossval(Mdl,Name,Value) sets an additional cross-validation option. You can specify only one name-value argument. For example, you can specify the number of folds or a holdout sample proportion.
Examples
Cross-Validate SVM Classifier
Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').

load ionosphere
rng(1); % For reproducibility
Train a support vector machine (SVM) classifier. Standardize the predictor data and specify the order of the classes.
SVMModel = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'});
SVMModel is a trained ClassificationSVM classifier. 'b' is the negative class and 'g' is the positive class.
Cross-validate the classifier using 10-fold cross-validation.
CVSVMModel = crossval(SVMModel)
CVSVMModel = 
  ClassificationPartitionedModel
    CrossValidatedModel: 'SVM'
         PredictorNames: {1x34 cell}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'
  Properties, Methods
CVSVMModel is a ClassificationPartitionedModel cross-validated classifier. During cross-validation, the software completes these steps:
1. Randomly partition the data into 10 sets of equal size.
2. Train an SVM classifier on nine of the sets.
3. Repeat steps 1 and 2 k = 10 times. The software leaves out one partition each time and trains on the other nine partitions.
4. Combine generalization statistics for each fold.
Display the first model in CVSVMModel.Trained.
FirstModel = CVSVMModel.Trained{1}
FirstModel = 
  CompactClassificationSVM
           ResponseName: 'Y'
  CategoricalPredictors: []
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'
                  Alpha: [78x1 double]
                   Bias: -0.2209
       KernelParameters: [1x1 struct]
                     Mu: [0.8888 0 0.6320 0.0406 0.5931 0.1205 0.5361 ... ]
                  Sigma: [0.3149 0 0.5033 0.4441 0.5255 0.4663 0.4987 ... ]
         SupportVectors: [78x34 double]
    SupportVectorLabels: [78x1 double]
  Properties, Methods
FirstModel is the first of the 10 trained classifiers. It is a CompactClassificationSVM classifier.
You can estimate the generalization error by passing CVSVMModel to kfoldLoss.
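As a minimal sketch (CVSVMModel follows the example above; the numeric result depends on the random partition, so no expected value is shown):

```matlab
% Average the misclassification rate over the 10 validation folds.
genError = kfoldLoss(CVSVMModel)
```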
Specify Holdout Sample Proportion for Naive Bayes Cross-Validation
Specify a holdout sample proportion for cross-validation. By default, crossval uses 10-fold cross-validation to cross-validate a naive Bayes classifier. However, you have several other options for cross-validation. For example, you can specify a different number of folds or a holdout sample proportion.
Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').

load ionosphere
Remove the first two predictors for stability.
X = X(:,3:end);
rng('default'); % For reproducibility
Train a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. 'b' is the negative class and 'g' is the positive class. fitcnb assumes that each predictor is conditionally normally distributed, given the class.
Mdl = fitcnb(X,Y,'ClassNames',{'b','g'});
Mdl is a trained ClassificationNaiveBayes classifier.
Cross-validate the classifier by specifying a 30% holdout sample.
CVMdl = crossval(Mdl,'Holdout',0.3)
CVMdl = 
  ClassificationPartitionedModel
    CrossValidatedModel: 'NaiveBayes'
         PredictorNames: {1x32 cell}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 1
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'
  Properties, Methods
CVMdl is a ClassificationPartitionedModel cross-validated naive Bayes classifier.
Display the properties of the classifier trained using 70% of the data.
TrainedModel = CVMdl.Trained{1}
TrainedModel = 
  CompactClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b'  'g'}
            ScoreTransform: 'none'
         DistributionNames: {1x32 cell}
    DistributionParameters: {2x32 cell}
  Properties, Methods
TrainedModel is a CompactClassificationNaiveBayes classifier.
Estimate the generalization error by passing CVMdl to kfoldLoss.
kfoldLoss(CVMdl)
ans = 0.2095
The out-of-sample misclassification error is approximately 21%.
Reduce the generalization error by choosing the five most important predictors.
idx = fscmrmr(X,Y); Xnew = X(:,idx(1:5));
Train a naive Bayes classifier for the new predictors.
Mdlnew = fitcnb(Xnew,Y,'ClassNames',{'b','g'});
Cross-validate the new classifier by specifying a 30% holdout sample, and estimate the generalization error.
CVMdlnew = crossval(Mdlnew,'Holdout',0.3);
kfoldLoss(CVMdlnew)
ans = 0.1429
The out-of-sample misclassification error is reduced from approximately 21% to approximately 14%.
Create Cross-Validated Regression GAM Using crossval
Train a regression generalized additive model (GAM) by using fitrgam, and create a cross-validated GAM by using crossval and the holdout option. Then, use kfoldPredict to predict responses for validation-fold observations using a model trained on training-fold observations.
Load the patients data set.

load patients
Create a table that contains the predictor variables (Age, Diastolic, Smoker, Weight, Gender, SelfAssessedHealthStatus) and the response variable (Systolic).
tbl = table(Age,Diastolic,Smoker,Weight,Gender,SelfAssessedHealthStatus,Systolic);
Train a GAM that contains linear terms for predictors.
Mdl = fitrgam(tbl,'Systolic');
Mdl is a RegressionGAM model object.
Cross-validate the model by specifying a 30% holdout sample.
rng('default') % For reproducibility
CVMdl = crossval(Mdl,'Holdout',0.3)
CVMdl = 
  RegressionPartitionedGAM
         CrossValidatedModel: 'GAM'
              PredictorNames: {1x6 cell}
       CategoricalPredictors: [3 5 6]
                ResponseName: 'Systolic'
             NumObservations: 100
                       KFold: 1
                   Partition: [1x1 cvpartition]
           NumTrainedPerFold: [1x1 struct]
           ResponseTransform: 'none'
      IsStandardDeviationFit: 0
  Properties, Methods
The crossval function creates a RegressionPartitionedGAM model object CVMdl with the holdout option. During cross-validation, the software completes these steps:
1. Randomly select and reserve 30% of the data as validation data, and train the model using the rest of the data.
2. Store the compact, trained model in the Trained property of the cross-validated model object RegressionPartitionedGAM.
You can choose a different cross-validation setting by using the 'CrossVal', 'CVPartition', 'KFold', or 'Leaveout' name-value argument.
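For instance, a sketch of switching the same model to 5-fold cross-validation (CVMdl5 and L5 are illustrative names, not part of the example above):

```matlab
% Five folds instead of a 30% holdout set; each fold serves once
% as validation data.
CVMdl5 = crossval(Mdl,'KFold',5);
L5 = kfoldLoss(CVMdl5)   % mean squared error averaged over the folds
```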
Predict responses for the validation-fold observations by using kfoldPredict. The function predicts responses for the validation-fold observations by using the model trained on the training-fold observations. The function assigns NaN to the training-fold observations.
yFit = kfoldPredict(CVMdl);
Find the validation-fold observation indexes, and create a table containing the observation index, observed response values, and predicted response values. Display the first eight rows of the table.
idx = find(~isnan(yFit));
t = table(idx,tbl.Systolic(idx),yFit(idx), ...
    'VariableNames',{'Observation Index','Observed Value','Predicted Value'});
head(t)
ans=8×3 table
    Observation Index    Observed Value    Predicted Value
    _________________    ______________    _______________
            1                 124               130.22
            6                 121               124.38
            7                 130               125.26
           12                 115               117.05
           20                 125               121.82
           22                 123               116.99
           23                 114                  107
           24                 128               122.52
Compute the regression error (mean squared error) for the validation-fold observations.
L = kfoldLoss(CVMdl)
L = 43.8715
Input Arguments
Mdl — Machine learning model
full regression model object | full classification model object
Machine learning model, specified as a full regression or classification model object, as given in the following tables of supported models.
Regression Model Object
Model | Full Regression Model Object
---|---
Gaussian process regression (GPR) model | RegressionGP (If you supply a custom 'ActiveSet' in the call to fitrgp, then you cannot cross-validate the GPR model.)
Generalized additive model (GAM) | RegressionGAM
Neural network model | RegressionNeuralNetwork
Classification Model Object
Model | Full Classification Model Object
---|---
Generalized additive model | ClassificationGAM
k-nearest neighbor model | ClassificationKNN
Naive Bayes model | ClassificationNaiveBayes
Neural network model | ClassificationNeuralNetwork
Support vector machine for one-class and binary classification | ClassificationSVM
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: crossval(Mdl,'KFold',3) specifies using three folds in a cross-validated model.
CVPartition — Cross-validation partition
[] (default) | cvpartition partition object
Cross-validation partition, specified as a cvpartition partition object created by cvpartition. The partition object specifies the type of cross-validation and the indexing for the training and validation sets.
You can specify only one of these four name-value arguments: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.
Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,'KFold',5). Then, you can specify the cross-validated model by using 'CVPartition',cvp.
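A sketch of that workflow (assumes a trained model Mdl fitted to 500 observations, as in the example text above):

```matlab
% Reusing one cvpartition object guarantees that repeated calls to
% crossval see identical training/validation splits.
cvp = cvpartition(500,'KFold',5);
CVMdl = crossval(Mdl,'CVPartition',cvp);
```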
Holdout — Fraction of data for holdout validation
scalar value in the range (0,1)
Fraction of the data used for holdout validation, specified as a scalar value in the range (0,1). If you specify 'Holdout',p, then the software completes these steps:
1. Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.
2. Store the compact, trained model in the Trained property of the cross-validated model. If Mdl does not have a corresponding compact object, then Trained contains a full object.
You can specify only one of these four name-value arguments: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.
Example: 'Holdout',0.1
Data Types: double | single
KFold — Number of folds
10 (default) | positive integer value greater than 1
Number of folds to use in a cross-validated model, specified as a positive integer value greater than 1. If you specify 'KFold',k, then the software completes these steps:
1. Randomly partition the data into k sets.
2. For each set, reserve the set as validation data, and train the model using the other k – 1 sets.
3. Store the k compact, trained models in a k-by-1 cell vector in the Trained property of the cross-validated model. If Mdl does not have a corresponding compact object, then Trained contains a full object.
You can specify only one of these four name-value arguments: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.
Example: 'KFold',5
Data Types: single | double
Leaveout — Leave-one-out cross-validation flag
'off' (default) | 'on'
Leave-one-out cross-validation flag, specified as 'on' or 'off'. If you specify 'Leaveout','on', then for each of the n observations (where n is the number of observations, excluding missing observations, specified in the NumObservations property of the model), the software completes these steps:
1. Reserve the one observation as validation data, and train the model using the other n – 1 observations.
2. Store the n compact, trained models in an n-by-1 cell vector in the Trained property of the cross-validated model. If Mdl does not have a corresponding compact object, then Trained contains a full object.
You can specify only one of these four name-value arguments: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.
Example: 'Leaveout','on'
Output Arguments
CVMdl — Cross-validated machine learning model
cross-validated (partitioned) model object
Cross-validated machine learning model, returned as one of the cross-validated (partitioned) model objects in the following tables, depending on the input model Mdl.
Regression Model Object
Model | Regression Model (Mdl) | Cross-Validated Model (CVMdl)
---|---|---
Gaussian process regression model | RegressionGP | RegressionPartitionedModel
Generalized additive model | RegressionGAM | RegressionPartitionedGAM
Neural network model | RegressionNeuralNetwork | RegressionPartitionedModel
Classification Model Object
Model | Classification Model (Mdl) | Cross-Validated Model (CVMdl)
---|---|---
Generalized additive model | ClassificationGAM | ClassificationPartitionedGAM
k-nearest neighbor model | ClassificationKNN | ClassificationPartitionedModel
Naive Bayes model | ClassificationNaiveBayes | ClassificationPartitionedModel
Neural network model | ClassificationNeuralNetwork | ClassificationPartitionedModel
Support vector machine for one-class and binary classification | ClassificationSVM | ClassificationPartitionedModel
Tips
- Assess the predictive performance of Mdl on cross-validated data by using the kfold functions and properties of CVMdl, such as kfoldPredict, kfoldLoss, kfoldMargin, and kfoldEdge for classification and kfoldPredict and kfoldLoss for regression.
- Return a partitioned classifier with stratified partitioning by using the name-value argument 'KFold' or 'Holdout'.
- Create a cvpartition object cvp using cvp = cvpartition(n,'KFold',k). Return a partitioned classifier with nonstratified partitioning by using the name-value argument 'CVPartition',cvp.
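A sketch contrasting the two partitioning behaviors (Mdl and Y are an assumed trained classifier and its class labels; CVstrat and CVnonstrat are illustrative names):

```matlab
% 'KFold' stratifies the folds by class frequency.
CVstrat = crossval(Mdl,'KFold',5);

% A cvpartition built from the observation count alone ignores the
% class labels, so the resulting folds are nonstratified.
cvp = cvpartition(numel(Y),'KFold',5);
CVnonstrat = crossval(Mdl,'CVPartition',cvp);
```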
Alternative Functionality
Instead of training a model and then cross-validating it, you can create a cross-validated model directly by using a fitting function and specifying one of these name-value arguments: 'CrossVal', 'CVPartition', 'Holdout', 'Leaveout', or 'KFold'.
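For example, a sketch using fitcsvm (X and Y are assumed predictor data and class labels):

```matlab
% The fitting function returns a ClassificationPartitionedModel
% directly when a cross-validation option is specified.
CVMdl = fitcsvm(X,Y,'Standardize',true,'KFold',5);
err = kfoldLoss(CVMdl);
```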
Extended Capabilities
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
Usage notes and limitations:
- This function fully supports GPU arrays for a trained classification model specified as a ClassificationKNN or ClassificationSVM object.
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History