predict

Class: TreeBagger

Predict responses using ensemble of bagged decision trees

Syntax

Yfit = predict(B,X)
Yfit = predict(B,X,Name,Value)
[Yfit,stdevs] = predict(___)
[Yfit,scores] = predict(___)
[Yfit,scores,stdevs] = predict(___)

Description

Yfit = predict(B,X) returns a vector of predicted responses for the predictor data in the table or matrix X, based on the ensemble of bagged decision trees B. Yfit is a cell array of character vectors for classification and a numeric array for regression. By default, predict takes a democratic (nonweighted) average vote from all trees in the ensemble.

B is a trained TreeBagger model object, that is, a model returned by TreeBagger.

X is a table or matrix of predictor data used to generate responses. Rows represent observations and columns represent variables.

  • If X is a numeric matrix:

    • The variables making up the columns of X must have the same order as the predictor variables that trained B.

    • If you trained B using a table (for example, Tbl), then X can be a numeric matrix if Tbl contains all numeric predictor variables. To treat numeric predictors in Tbl as categorical during training, identify categorical predictors using the CategoricalPredictors name-value pair argument of TreeBagger. If Tbl contains heterogeneous predictor variables (for example, numeric and categorical data types) and X is a numeric matrix, then predict throws an error.

  • If X is a table:

    • predict does not support multicolumn variables or cell arrays other than cell arrays of character vectors.

    • If you trained B using a table (for example, Tbl), then all predictor variables in X must have the same variable names and be of the same data types as those that trained B (stored in B.PredictorNames). However, the column order of X does not need to correspond to the column order of Tbl. Tbl and X can contain additional variables (response variables, observation weights, etc.), but predict ignores them.

    • If you trained B using a numeric matrix, then the predictor names in B.PredictorNames and the corresponding predictor variable names in X must be the same. To specify predictor names during training, see the PredictorNames name-value pair argument of TreeBagger. All predictor variables in X must be numeric vectors. X can contain additional variables (response variables, observation weights, etc.), but predict ignores them.

Yfit = predict(B,X,Name,Value) specifies additional options using one or more name-value pair arguments:

  • 'Trees': Array of tree indices to use for computation of responses. The default is 'all'.

  • 'TreeWeights': Array of NTrees weights for weighting votes from the specified trees, where NTrees is the number of trees in the ensemble.

  • 'UseInstanceForTree': Logical matrix of size Nobs-by-NTrees indicating which trees to use to make predictions for each observation, where Nobs is the number of observations. By default, all trees are used for all observations.

For regression, [Yfit,stdevs] = predict(___) also returns the standard deviations of the computed responses over the ensemble of grown trees, using any of the input argument combinations in the previous syntaxes.
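To make the relationship between the two regression outputs concrete, the following is a minimal sketch in Python (not MATLAB, and not TreeBagger's implementation) of how a response and its spread over the trees could be computed for one observation. The function name regression_summary and the sample predictions are hypothetical. Using the sample (N-1) standard deviation is an assumption; this page does not state which normalization predict uses.

```python
from statistics import mean, stdev


def regression_summary(per_tree_predictions):
    """Return (mean, standard deviation) of one observation's per-tree predictions.

    Assumption: the sample (N-1) standard deviation is used; the source
    page does not specify the normalization.
    """
    return mean(per_tree_predictions), stdev(per_tree_predictions)


# Hypothetical predictions from four trees for a single observation.
yfit, sd = regression_summary([10.0, 12.0, 11.0, 13.0])
print(yfit, sd)
```

A large standard deviation relative to the response indicates that the trees disagree about that observation.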

For classification, [Yfit,scores] = predict(___) also returns scores for all classes. scores is a matrix with one row per observation and one column per class. For each observation and each class, the score generated by each tree is the probability of the observation originating from the class, computed as the fraction of observations of the class in a tree leaf. predict averages these scores over all trees in the ensemble.
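The score computation described above can be illustrated with a small Python sketch (not MATLAB, and not TreeBagger's implementation). It assumes that, for one observation, we already know the training labels in the leaf that the observation reaches in each tree. The names leaf_score and ensemble_scores, the class names, and the leaf contents are hypothetical.

```python
from collections import Counter


def leaf_score(leaf_labels, classes):
    """Fraction of training observations of each class in one tree leaf."""
    counts = Counter(leaf_labels)
    n = len(leaf_labels)
    return [counts[c] / n for c in classes]


def ensemble_scores(leaves, classes):
    """Unweighted average of the per-tree leaf scores, one value per class."""
    per_tree = [leaf_score(leaf, classes) for leaf in leaves]
    return [sum(column) / len(per_tree) for column in zip(*per_tree)]


classes = ["setosa", "versicolor"]
# Hypothetical leaf contents reached by one observation in two trees.
leaves = [
    ["setosa", "setosa", "versicolor"],
    ["setosa", "versicolor", "versicolor", "versicolor"],
]
scores = ensemble_scores(leaves, classes)
print(scores)  # one row of the scores matrix, ordered like `classes`
```

Each tree contributes a row of fractions that sums to 1, so the averaged row also sums to 1.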

[Yfit,scores,stdevs] = predict(___) also returns standard deviations of the computed scores for classification. stdevs is a matrix with one row per observation and one column per class, with standard deviations taken over the ensemble of the grown trees.

Algorithms

  • For regression problems, the predicted response for an observation is the weighted average of the predictions using selected trees only. That is,

    $$\hat{y} = \frac{1}{\sum_{t=1}^{T} \alpha_t I(t \in S)} \sum_{t=1}^{T} \alpha_t \hat{y}_t I(t \in S).$$

    • $\hat{y}_t$ is the prediction from tree t in the ensemble.

    • S is the set of indices of selected trees that comprise the prediction (see 'Trees' and 'UseInstanceForTree'). $I(t \in S)$ is 1 if t is in the set S, and 0 otherwise.

    • $\alpha_t$ is the weight of tree t (see 'TreeWeights').

  • For classification problems, the predicted class for an observation is the class that yields the largest weighted average of the class posterior probabilities (i.e., classification scores) computed using selected trees only. That is,

    1. For each class $c \in C$ and each tree t = 1,...,T, predict computes $\hat{P}_t(c \mid x)$, which is the estimated posterior probability of class c given observation x using tree t. C is the set of all distinct classes in the training data. For more details on classification tree posterior probabilities, see fitctree and predict.

    2. predict computes the weighted average of the class posterior probabilities over the selected trees.

      $$\hat{P}(c \mid x) = \frac{1}{\sum_{t=1}^{T} \alpha_t I(t \in S)} \sum_{t=1}^{T} \alpha_t \hat{P}_t(c \mid x) I(t \in S).$$

    3. The predicted class is the class that yields the largest weighted average.

      $$\hat{y} = \arg\max_{c \in C} \{ \hat{P}(c \mid x) \}.$$
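
The two weighted averages in this section share the same form, which the following Python sketch spells out (Python rather than MATLAB; it illustrates the formulas only and is not TreeBagger's implementation). The function names weighted_average, predict_regression, and predict_class, and all sample numbers, are hypothetical. Trees are numbered from 1 to match t = 1,...,T, and ties between classes are broken by list order, which is a choice made for this sketch.

```python
def weighted_average(values, weights, selected):
    """Weighted average of per-tree values over the selected tree indices.

    `selected` plays the role of the set S; trees are numbered 1..T.
    """
    numerator = 0.0
    denominator = 0.0
    for t, (value, alpha) in enumerate(zip(values, weights), start=1):
        if t in selected:  # indicator I(t in S)
            numerator += alpha * value
            denominator += alpha
    if denominator == 0:
        raise ValueError("no selected tree has a nonzero weight")
    return numerator / denominator


def predict_regression(tree_predictions, weights, selected):
    """Regression: weighted average of the per-tree predictions."""
    return weighted_average(tree_predictions, weights, selected)


def predict_class(tree_posteriors, weights, selected, classes):
    """Classification: class with the largest weighted-average posterior."""
    posterior = {
        c: weighted_average([p[c] for p in tree_posteriors], weights, selected)
        for c in classes
    }
    # Sketch choice: on a tie, the class listed first wins.
    return max(classes, key=posterior.get), posterior


# Hypothetical regression example: three trees, only trees 1 and 2 selected.
y_hat = predict_regression([10.0, 20.0, 30.0], [1.0, 2.0, 1.0], {1, 2})

# Hypothetical classification example: three trees, two classes.
tree_posteriors = [
    {"a": 0.9, "b": 0.1},
    {"a": 0.2, "b": 0.8},
    {"a": 0.4, "b": 0.6},
]
label_all, post_all = predict_class(tree_posteriors, [1.0, 1.0, 2.0], {1, 2, 3}, ["a", "b"])
label_sub, post_sub = predict_class(tree_posteriors, [1.0, 1.0, 2.0], {1, 3}, ["a", "b"])
print(y_hat, label_all, label_sub)
```

In this example, dropping tree 2 from the selected set changes the predicted class, which shows how the choice of S and the weights can alter the outcome.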