
Crab Classification

This example illustrates using a neural network as a classifier to identify the sex of crabs from physical dimensions of the crab.

The Problem: Classification of Crabs

In this example we attempt to build a classifier that can identify the sex of a crab from its physical measurements. Six physical characteristics of a crab are considered: species, frontallip, rearwidth, length, width and depth. The problem at hand is to identify the sex of a crab given the observed values for each of these six physical characteristics.

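Why Neural Networks?

Neural networks have proved themselves to be proficient classifiers and are particularly well suited to nonlinear problems. Given the nonlinear nature of real-world phenomena such as crab classification, neural networks are a good candidate for solving this problem.

The six physical characteristics will act as inputs to the neural network, and the sex of the crab will be the target. Given an input, which consists of the six observed values for a crab's physical characteristics, the neural network is expected to identify whether the crab is male or female.

This is achieved by presenting previously recorded inputs to the neural network and then tuning it to produce the desired target outputs. This process is called neural network training.

Preparing the Data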

Data for classification problems are set up for a neural network by organizing the data into two matrices, the input matrix X and the target matrix T.

Each ith column of the input matrix will have six elements representing a crab's species, frontallip, rearwidth, length, width, and depth.

Each corresponding column of the target matrix will have two elements. Female crabs are represented with a one in the first element, male crabs with a one in the second element. (All other elements are zero).
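As a minimal sketch of this target encoding, assume a hypothetical vector `labels` of class indices (1 = female, 2 = male) that is not part of the example. The toolbox function `ind2vec` builds the one-hot target columns, and `vec2ind` inverts them:

```matlab
% Hypothetical class indices for four crabs: 1 = female, 2 = male.
labels = [1 2 2 1];

% ind2vec returns a sparse matrix with a single 1 per column;
% full() converts it to an ordinary matrix for display.
T = full(ind2vec(labels));

% Each column holds one crab's target: [1;0] for female, [0;1] for male.
disp(T)

% vec2ind recovers the class index from each one-hot column.
recovered = vec2ind(T);
```

The target matrix `t` returned by `crab_dataset` follows the same layout, with one column per crab.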

Here the dataset is loaded.

[x,t] = crab_dataset;
size(x)
ans = 1×2
     6   200
size(t)
ans = 1×2
     2   200

Building the Neural Network Classifier

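The next step is to create a neural network that will learn to identify the sex of the crabs.

Since the neural network starts with random initial weights, the results of this example will differ slightly every time it is run. The random seed is set here to avoid this randomness. However, this is not necessary for your own applications.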

setdemorandstream(491218382)

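Two-layer (i.e. one-hidden-layer) feedforward neural networks can learn any input-output relationship given enough neurons in the hidden layer. Layers that are not output layers are called hidden layers.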

We will try a single hidden layer of 10 neurons for this example. In general, more difficult problems require more neurons, and perhaps more layers. Simpler problems require fewer neurons.

The input and output have sizes of 0 because the network has not yet been configured to match our input and target data. This will happen when the network is trained.

net = patternnet(10);
view(net)
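The hidden layer size is the main setting to experiment with. As a hedged sketch, `patternnet` accepts a row vector of hidden layer sizes, so a second hidden layer can be tried by passing two values. The sizes 10 and 5 below are arbitrary illustrations, not recommendations from this example, and `net1` and `net2` are hypothetical names:

```matlab
% One hidden layer of 10 neurons, as used in this example.
net1 = patternnet(10);

% Illustrative alternative: two hidden layers of 10 and 5 neurons.
% The sizes are arbitrary and chosen only to show the syntax.
net2 = patternnet([10 5]);

% numLayers counts the hidden layers plus the output layer.
fprintf('net1 has %d layers, net2 has %d layers\n', ...
    net1.numLayers, net2.numLayers);
```

Whether the extra layer helps on the crab data would have to be checked by training both networks and comparing their test results.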

Now the network is ready to be trained. The samples are automatically divided into training, validation and test sets. The training set is used to teach the network. Training continues as long as the network continues improving on the validation set. The test set provides a completely independent measure of network accuracy.

[net,tr] = train(net,x,t);
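The automatic split is controlled by properties of the network object. The following sketch shows how the division could be inspected or changed before calling `train`, assuming a random split with `dividerand`. The ratios are illustrative values, not ones prescribed by this example:

```matlab
[x,t] = crab_dataset;
net = patternnet(10);

% Illustrative division settings; set them before training.
net.divideFcn = 'dividerand';
net.divideParam.trainRatio = 0.70;
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;

% Suppress the training window so the sketch can run non-interactively.
net.trainParam.showWindow = false;

[net,tr] = train(net,x,t);

% The training record lists which columns of x went to each set.
fprintf('train: %d, validation: %d, test: %d\n', ...
    numel(tr.trainInd), numel(tr.valInd), numel(tr.testInd));
```

The index vectors `tr.trainInd`, `tr.valInd`, and `tr.testInd` do not overlap, which is what makes the test set an independent check.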

Figure Neural Network Training (26-Feb-2022 11:05:54) contains an object of type uigridlayout.

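To see how the network's performance improved during training, either click the "Performance" button in the training tool, or call plotperform.

Performance is measured in terms of squared error and shown on a log scale. It decreased rapidly as the network was trained.

Performance is shown for each of the training, validation, and test sets.

plotperform(tr)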

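Figure Performance (plotperform) contains an axes object. The axes object with title Best Validation Performance is 0.023041 at epoch 21 contains 6 objects of type line. These objects represent Train, Validation, Test, Best.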
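Similar curves can be drawn directly from the training record. The sketch below retrains a network so that it is self-contained, then plots the record fields `perf`, `vperf`, and `tperf` against `epoch`. Because no random seed is set, the values it prints will generally differ from those reported above:

```matlab
[x,t] = crab_dataset;
net = patternnet(10);
net.trainParam.showWindow = false;   % no training window for this sketch
[net,tr] = train(net,x,t);

% Training, validation, and test performance per epoch on a log scale.
figure
semilogy(tr.epoch, tr.perf, tr.epoch, tr.vperf, tr.epoch, tr.tperf)
legend('Train', 'Validation', 'Test')
xlabel('Epoch')
ylabel('Performance')

% Best validation performance and the epoch at which it occurred.
fprintf('Best validation performance %g at epoch %d\n', ...
    tr.best_vperf, tr.best_epoch);
```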


Testing the Classifier

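The trained neural network can now be tested with the test samples. This gives us a sense of how well the network will do when applied to real-world data.

The network outputs will be in the range 0 to 1, so we can use the vec2ind function to get the class indices as the position of the highest element in each output vector.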

testX = x(:,tr.testInd);
testT = t(:,tr.testInd);
testY = net(testX);
testIndices = vec2ind(testY)
testIndices = 1×30
2 2 2 1 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 1 2 1 2 1 1 1 1 1 2 2 1

One measure of how well the neural network has fit the data is the confusion plot. Here the confusion matrix is plotted across all samples.

The confusion matrix shows the percentages of correct and incorrect classifications. Correct classifications are the green squares on the matrix diagonal. Incorrect classifications form the red squares.

If the network has learned to classify properly, the percentages in the red squares should be very small, indicating few misclassifications.

If this is not the case then further training, or training a network with more hidden neurons, would be advisable.

plotconfusion(testT,testY)

Figure Confusion (plotconfusion) contains an axes object. The axes object with title Confusion Matrix contains 29 objects of type patch, text, line.

Here are the overall percentages of correct and incorrect classification.

[c,cm] = confusion(testT,testY)
c = 0.0333
cm = 2×2
    12     1
     0    17
fprintf('Percentage Correct Classification   : %f%%\n', 100*(1-c));
Percentage Correct Classification   : 96.666667%
fprintf('Percentage Incorrect Classification : %f%%\n', 100*c);
Percentage Incorrect Classification : 3.333333%
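The reported error rate can be checked by hand from the confusion matrix. The sketch below uses base MATLAB only and assumes the convention that rows are true classes and columns are predicted classes. The per-class recall is an extra illustration that does not appear in the example:

```matlab
% Confusion matrix reported above (1 = female, 2 = male).
cm = [12 1; 0 17];

total    = sum(cm(:));              % number of test samples
accuracy = sum(diag(cm)) / total;   % fraction classified correctly
c        = 1 - accuracy;            % fraction misclassified

% Per-class recall: correct predictions divided by true class counts.
recall = diag(cm) ./ sum(cm, 2);

fprintf('Accuracy: %.4f, error: %.4f\n', accuracy, c);
fprintf('Recall (female, male): %.4f, %.4f\n', recall(1), recall(2));
```

With 29 of the 30 test samples on the diagonal, the error is 1/30, which matches the 3.33% reported above.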

Another measure of how well the neural network has fit data is the receiver operating characteristic plot. This shows how the false positive and true positive rates relate as the thresholding of outputs is varied from 0 to 1.

The farther left and up the line is, the fewer false positives need to be accepted in order to get a high true positive rate. The best classifiers will have a line going from the bottom left corner, to the top left corner, to the top right corner, or close to that.

plotroc(testT,testY)

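Figure Receiver Operating Characteristic (plotroc) contains an axes object. The axes object with title ROC contains 4 objects of type line. These objects represent Class 1, Class 2.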
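To make the thresholding idea concrete, here is a minimal base MATLAB sketch that computes the true and false positive rates for one class at a few thresholds. The `scores` and `targets` vectors are hypothetical toy data, not outputs of the crab network, and the loop is an illustration of the concept rather than how plotroc is implemented:

```matlab
% Hypothetical scores for one class and the matching 0/1 targets.
scores  = [0.95 0.80 0.60 0.40 0.20 0.05];
targets = [1    1    0    1    0    0   ];

thresholds = 0:0.25:1;
tpr = zeros(size(thresholds));
fpr = zeros(size(thresholds));

for k = 1:numel(thresholds)
    predicted = scores >= thresholds(k);
    % True positive rate: fraction of positives flagged as positive.
    tpr(k) = sum(predicted & (targets == 1)) / sum(targets == 1);
    % False positive rate: fraction of negatives flagged as positive.
    fpr(k) = sum(predicted & (targets == 0)) / sum(targets == 0);
end

plot(fpr, tpr, '-o')
xlabel('False Positive Rate')
ylabel('True Positive Rate')
```

At threshold 0 every sample is flagged, so both rates are 1. At threshold 1 nothing is flagged, so both are 0. The points in between trace the curve.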

This example illustrated using a neural network to classify crabs.

Explore other examples and the documentation for more insight into neural networks and their applications.