
Prediction Using Discriminant Analysis Models

predict uses three quantities to classify observations: posterior probability, prior probability, and cost.

predict classifies so as to minimize the expected classification cost:

$$\hat{y} = \underset{y=1,\dots,K}{\arg\min}\;\sum_{k=1}^{K} \hat{P}(k \mid x)\, C(y \mid k),$$

where

  • $\hat{y}$ is the predicted classification.

  • K is the number of classes.

  • $\hat{P}(k \mid x)$ is the posterior probability of class k for observation x.

  • $C(y \mid k)$ is the cost of classifying an observation as y when its true class is k.

The space of X values divides into regions where a classification Y is a particular value. The regions are separated by straight lines for linear discriminant analysis, and by conic sections (ellipses, hyperbolas, or parabolas) for quadratic discriminant analysis. For a visualization of these regions, see Create and Visualize Discriminant Analysis Classifier.
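For concreteness, here is a minimal sketch of training and prediction using the fisheriris sample data that ships with Statistics and Machine Learning Toolbox; the variable names (ldaModel, qdaModel, xnew) are illustrative, not part of any API:

% Train linear and quadratic discriminant classifiers on Fisher's iris data.
load fisheriris                                    % meas: 150-by-4 predictors, species: labels
ldaModel = fitcdiscr(meas,species);                             % linear boundaries between regions
qdaModel = fitcdiscr(meas,species,'DiscrimType','quadratic');   % conic-section boundaries

% predict assigns each observation to the class with minimum expected cost.
xnew = [5.8 2.8 4.5 1.3];           % a hypothetical 1-by-4 observation
labelLinear = predict(ldaModel,xnew)
labelQuadratic = predict(qdaModel,xnew)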

Posterior Probability

The posterior probability that a point x belongs to class k is the product of the prior probability and the multivariate normal density. The density function of the multivariate normal with 1-by-d mean $\mu_k$ and d-by-d covariance $\Sigma_k$ at a 1-by-d point x is

$$P(x \mid k) = \frac{1}{\bigl((2\pi)^d \,|\Sigma_k|\bigr)^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu_k)\,\Sigma_k^{-1}\,(x-\mu_k)^T\right),$$

where $|\Sigma_k|$ is the determinant of $\Sigma_k$, and $\Sigma_k^{-1}$ is its inverse matrix.

Let P(k) represent the prior probability of class k. Then the posterior probability that an observation x is of class k is

$$\hat{P}(k \mid x) = \frac{P(x \mid k)\,P(k)}{P(x)},$$

where P(x) is a normalization constant, namely, the sum over k of P(x|k)P(k).
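Under these definitions, you can reproduce the posterior probabilities that predict reports. The following sketch assumes a linear discriminant model, for which all classes share the pooled covariance stored in the model's Sigma property; mvnpdf supplies the multivariate normal density:

% Compute posterior probabilities manually and compare with predict's score output.
load fisheriris
obj = fitcdiscr(meas,species);            % linear: obj.Sigma is the pooled d-by-d covariance
x = meas(1,:);                            % one 1-by-d observation
K = numel(obj.ClassNames);

likelihood = zeros(1,K);
for k = 1:K
    likelihood(k) = mvnpdf(x,obj.Mu(k,:),obj.Sigma);   % P(x|k)
end
posterior = likelihood .* obj.Prior(:)';  % P(x|k)P(k); (:)' forces a row vector
posterior = posterior / sum(posterior);   % divide by the normalization constant P(x)

[~,score] = predict(obj,x);
max(abs(posterior - score))               % near zero, up to rounding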

Prior Probability

The prior probability is one of three choices:

  • 'uniform' — The prior probability of class k is 1 over the total number of classes.

  • 'empirical' — The prior probability of class k is the number of training samples of class k divided by the total number of training samples.

  • A numeric vector — The prior probability of class k is the kth element of the Prior vector. See fitcdiscr.

After creating a classifier obj, you can set the prior using dot notation:

obj.Prior = v;

where v is a vector of positive elements representing the frequency with which each class occurs. You do not need to retrain the classifier when you set a new prior.
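A short sketch of this workflow, with an illustrative prior vector v:

% Train with the default 'empirical' prior, then change the prior afterward.
load fisheriris
obj = fitcdiscr(meas,species);

v = [2 1 1];        % hypothetical relative class frequencies (positive elements)
obj.Prior = v;      % treated as relative frequencies; no retraining occurs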

Cost

There are two costs associated with discriminant analysis classification: the true misclassification cost per class, and the expected misclassification cost per observation.

True Misclassification Cost per Class

Cost(i,j) is the cost of classifying an observation into class j if its true class is i. By default, Cost(i,j) = 1 if i ~= j, and Cost(i,j) = 0 if i = j. In other words, the cost is 0 for correct classification, and 1 for incorrect classification.

You can set any cost matrix you like when creating a classifier. Pass the cost matrix in the Cost name-value pair in fitcdiscr.

After you create a classifier obj, you can set a custom cost using dot notation:

obj.Cost = B;

B is a square matrix of size K-by-K when there are K classes. You do not need to retrain the classifier when you set a new cost.
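For example, the following sketch makes misclassifying true class 2 five times as costly as other mistakes; the specific numbers are illustrative:

% Build a custom K-by-K cost matrix for a 3-class problem.
load fisheriris
obj = fitcdiscr(meas,species);    % K = 3 classes

B = ones(3) - eye(3);             % start from the default 0/1 cost structure
B(2,1) = 5;                       % classifying true class 2 as class 1 costs 5
B(2,3) = 5;                       % classifying true class 2 as class 3 costs 5
obj.Cost = B;                     % takes effect immediately, without retraining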

Expected Misclassification Cost per Observation

Suppose you have Nobs observations that you want to classify with a trained discriminant analysis classifier obj. Suppose you have K classes. You place the observations into a matrix Xnew with one observation per row. The command

[label,score,cost] = predict(obj,Xnew)

returns, among other outputs, a cost matrix of size Nobs-by-K. Each row of the cost matrix contains the expected (average) cost of classifying the observation into each of the K classes. cost(n,k) is

$$\sum_{i=1}^{K} \hat{P}\bigl(i \mid Xnew(n)\bigr)\, C(k \mid i),$$

where

  • K is the number of classes.

  • $\hat{P}\bigl(i \mid Xnew(n)\bigr)$ is the posterior probability of class i for observation Xnew(n).

  • $C(k \mid i)$ is the cost of classifying an observation as k when its true class is i.
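Putting this together, a minimal runnable version of the command above looks like this (using the fisheriris data for concreteness):

% Classify several observations and inspect the expected misclassification costs.
load fisheriris
obj = fitcdiscr(meas,species);
Xnew = meas(1:5,:);                    % Nobs = 5 observations, one per row

[label,score,cost] = predict(obj,Xnew);
% label : 5-by-1 predicted classes, each minimizing the expected cost
% score : 5-by-3 posterior probabilities P(k|Xnew(n))
% cost  : 5-by-3 expected costs; label(n) corresponds to the minimum of cost(n,:)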
