Semantic Segmentation Using Deep Learning

This example shows how to train a semantic segmentation network using deep learning.

A semantic segmentation network classifies every pixel in an image, resulting in an image that is segmented by class. Applications for semantic segmentation include road segmentation for autonomous driving and cancer cell segmentation for medical diagnosis. To learn more, see Getting Started with Semantic Segmentation Using Deep Learning (Computer Vision Toolbox).

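To illustrate the training procedure, this example trains DeepLab v3+ [1], a convolutional neural network (CNN) designed for semantic image segmentation. Other types of networks for semantic segmentation include fully convolutional networks (FCN), SegNet, and U-Net. The training procedure shown here can be applied to those networks too.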

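This example uses the CamVid dataset [2] from the University of Cambridge for training. This dataset is a collection of images containing street-level views obtained while driving. The dataset provides pixel-level labels for 32 semantic classes including car, pedestrian, and road.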

Setup

This example creates the DeepLab v3+ network with weights initialized from a pretrained ResNet-18 network. ResNet-18 is an efficient network that is well suited for applications with limited processing resources. Other pretrained networks such as MobileNet v2 or ResNet-50 can also be used depending on application requirements. For more details, see Pretrained Deep Neural Networks.

To get a pretrained ResNet-18, install resnet18. After installation is complete, run the following code to verify that the installation is correct.

resnet18();

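In addition, download a pretrained version of DeepLab v3+. The pretrained model allows you to run the entire example without having to wait for training to complete.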

pretrainedURL = 'https://ssd.mathworks.com/supportfiles/vision/data/deeplabv3plusResnet18CamVid.zip';
pretrainedFolder = fullfile(tempdir,'pretrainedNetwork');
pretrainedNetworkZip = fullfile(pretrainedFolder,'deeplabv3plusResnet18CamVid.zip');
if ~exist(pretrainedNetworkZip,'file')
    mkdir(pretrainedFolder);
    disp('Downloading pretrained network (58 MB)...');
    websave(pretrainedNetworkZip,pretrainedURL);
end
unzip(pretrainedNetworkZip,pretrainedFolder)

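A CUDA-capable NVIDIA™ GPU is highly recommended for running this example. Use of a GPU requires Parallel Computing Toolbox™. For information about the supported compute capabilities, see GPU Support by Release (Parallel Computing Toolbox).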

Download CamVid Dataset

Download the CamVid dataset from the following URLs.

imageURL = 'http://web4.cs.ucl.ac.uk/staff/g.brostow/MotionSegRecData/files/701_StillsRaw_full.zip';
labelURL = 'http://web4.cs.ucl.ac.uk/staff/g.brostow/MotionSegRecData/data/LabeledApproved_full.zip';

outputFolder = fullfile(tempdir,'CamVid');
labelsZip = fullfile(outputFolder,'labels.zip');
imagesZip = fullfile(outputFolder,'images.zip');

if ~exist(labelsZip,'file') || ~exist(imagesZip,'file')
    mkdir(outputFolder)

    disp('Downloading 16 MB CamVid dataset labels...');
    websave(labelsZip,labelURL);
    unzip(labelsZip,fullfile(outputFolder,'labels'));

    disp('Downloading 557 MB CamVid dataset images...');
    websave(imagesZip,imageURL);
    unzip(imagesZip,fullfile(outputFolder,'images'));
end

Note: Download time of the data depends on your Internet connection. The commands used above block MATLAB until the download is complete. Alternatively, you can use your web browser to first download the dataset to your local disk. To use the file you downloaded from the web, change the outputFolder variable above to the location of the downloaded file.

Load CamVid Images

Use imageDatastore to load CamVid images. The imageDatastore enables you to efficiently load a large collection of images on disk.

imgDir = fullfile(outputFolder,'images','701_StillsRaw_full');
imds = imageDatastore(imgDir);

Display one of the images.

I = readimage(imds,559);
I = histeq(I);
imshow(I)

Load CamVid Pixel-Labeled Images

Use pixelLabelDatastore (Computer Vision Toolbox) to load CamVid pixel label image data. A pixelLabelDatastore encapsulates the pixel label data and the mapping from label ID to class name.

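To make training easier, group the 32 original classes in CamVid into 11 classes. Specify these classes.

classes = [
    "Sky"
    "Building"
    "Pole"
    "Road"
    "Pavement"
    "Tree"
    "SignSymbol"
    "Fence"
    "Car"
    "Pedestrian"
    "Bicyclist"
    ];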

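To reduce 32 classes into 11, multiple classes from the original dataset are grouped together. For example, "Car" is a combination of "Car", "SUVPickupTruck", "Truck_Bus", "Train", and "OtherMoving". Return the grouped label IDs by using the supporting function camvidPixelLabelIDs, which is listed at the end of this example.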

labelIDs = camvidPixelLabelIDs();
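The grouping described above can be illustrated with a minimal sketch in base MATLAB. It uses only a two-class subset of the RGB values listed in camvidPixelLabelIDs, and the variable names (groupedIDs, names, pixelRGB) are hypothetical. It shows the idea of mapping one RGB label color to its grouped class, not how pixelLabelDatastore performs the lookup internally.

```matlab
% Hypothetical two-class subset of the grouped label IDs (RGB rows).
groupedIDs = {
    [128 128 128]                 % "Sky"
    [ 64   0 128;  64 128 192]    % "Car": original "Car" and "SUVPickupTruck"
    };
names = ["Sky"; "Car"];

% RGB value of one labeled pixel (the "SUVPickupTruck" color).
pixelRGB = [64 128 192];

% Find the group whose ID matrix contains this RGB row.
idx = find(cellfun(@(ids) ismember(pixelRGB, ids, 'rows'), groupedIDs));
groupName = names(idx)
```

Because the "SUVPickupTruck" color is listed under the second group, the pixel is reported as "Car".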

Use the classes and label IDs to create the pixelLabelDatastore.

labelDir = fullfile(outputFolder,'labels');
pxds = pixelLabelDatastore(labelDir,classes,labelIDs);

Read and display one of the pixel-labeled images by overlaying it on top of an image.

C = readimage(pxds,559);
cmap = camvidColorMap;
B = labeloverlay(I,C,'ColorMap',cmap);
imshow(B)
pixelLabelColorbar(cmap,classes);

Areas with no color overlay do not have pixel labels and are not used during training.

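Analyze Dataset Statistics

To see the distribution of class labels in the CamVid dataset, use countEachLabel (Computer Vision Toolbox). This function counts the number of pixels by class label.

tbl = countEachLabel(pxds)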

tbl=11×3 table
         Name         PixelCount    ImagePixelCount
    ______________    __________    _______________
          ⋮
    {'Road'      }    1.4054e+08      4.8453e+08
    {'Pavement'  }    3.3614e+07      4.7209e+08
    {'Tree'      }    5.4259e+07       4.479e+08
    {'SignSymbol'}    5.2242e…             …
    {'Fence'     }    6.9211e+06       2.516e+08
    {'Car'       }    2.4437e+07      4.8315e+08
    {'Pedestrian'}    3.4029e+06      4.4444e+08
    {'Bicyclist' }         …               …

Visualize the pixel counts by class.

frequency = tbl.PixelCount/sum(tbl.PixelCount);

bar(1:numel(classes),frequency)
xticks(1:numel(classes))
xticklabels(tbl.Name)
xtickangle(45)
ylabel('Frequency')

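Ideally, all classes would have an equal number of observations. However, the classes in CamVid are imbalanced, which is a common issue in automotive datasets of street scenes. Such scenes have more sky, building, and road pixels than pedestrian and bicyclist pixels because sky, buildings, and roads cover more area in the image. If not handled correctly, this imbalance can be detrimental to the learning process because the learning is biased in favor of the dominant classes. Later in this example, you will use class weighting to handle this issue.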

The images in the CamVid dataset are 720 by 960 in size. The image size is chosen such that a large enough batch of images can fit in memory during training on an NVIDIA™ Titan X with 12 GB of memory. If your GPU does not have sufficient memory, you may need to resize the images to a smaller size or reduce the training batch size.

Prepare Training, Validation, and Test Sets

Deeplab v3+ is trained using 60% of the images from the dataset. The rest of the images are split evenly in 20% and 20% for validation and testing respectively. The following code randomly splits the image and pixel label data into a training, validation and test set.

[imdsTrain, imdsVal, imdsTest, pxdsTrain, pxdsVal, pxdsTest] = partitionCamVidData(imds,pxds);

The 60/20/20 split results in the following number of training, validation and test images:

numTrainingImages = numel(imdsTrain.Files)

numTrainingImages = 421

numValImages = numel(imdsVal.Files)

numValImages = 140

numTestingImages = numel(imdsTest.Files)

numTestingImages = 140
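These counts follow from the rounding in the partitioning function. The short check below is a sketch that assumes the dataset holds 701 images (the sum of the three counts above) and reproduces the same round-then-remainder arithmetic used by partitionCamVidData.

```matlab
% Total image count, taken as the sum of the three reported set sizes.
numFiles = 421 + 140 + 140;

% Same arithmetic as partitionCamVidData: round 60% and 20%, rest is test.
numTrain = round(0.60 * numFiles);        % 420.6 rounds up
numVal   = round(0.20 * numFiles);        % 140.2 rounds down
numTest  = numFiles - numTrain - numVal;  % whatever is left over

fprintf('%d training, %d validation, %d test\n', numTrain, numVal, numTest);
```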

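Create the Network

Use the deeplabv3plusLayers function to create a DeepLab v3+ network based on ResNet-18. Choosing the best network for your application requires empirical analysis and is another level of hyperparameter tuning. For example, you can experiment with different base networks such as ResNet-50 or MobileNet v2, or you can try other semantic segmentation network architectures such as SegNet, fully convolutional networks (FCN), or U-Net.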

% Specify the network image size. This is typically the same as the training image sizes.
imageSize = [720 960 3];

% Specify the number of classes.
numClasses = numel(classes);

% Create DeepLab v3+.
lgraph = deeplabv3plusLayers(imageSize,numClasses,"resnet18");

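Balance Classes Using Class Weighting

As shown earlier, the classes in CamVid are not balanced. To improve training, you can use class weighting to balance the classes. Use the pixel label counts computed earlier with countEachLabel (Computer Vision Toolbox) and calculate the median frequency class weights.

imageFreq = tbl.PixelCount ./ tbl.ImagePixelCount;
classWeights = median(imageFreq) ./ imageFreq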

classWeights = 11×1

    0.3182
    0.2082
    5.0924
    0.1744
    0.7103
    0.4175
    4.5371
    1.8386
    1.0000
    6.6059
      ⋮
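As a rough check of the median frequency weighting, the sketch below recomputes three of the weights from the pixel counts reported earlier for Road, Pavement, and Car. It assumes that Car is the median-frequency class, which is inferred from its weight of 1.0000 in the output above, so the full 11-class median is not recomputed here.

```matlab
% PixelCount and ImagePixelCount for Road, Pavement, and Car,
% copied from the countEachLabel output shown earlier.
pixelCount      = [1.4054e8; 3.3614e7; 2.4437e7];
imagePixelCount = [4.8453e8; 4.7209e8; 4.8315e8];

% Per-class frequency: labeled pixels of the class divided by the total
% pixels of the images in which the class appears.
freq = pixelCount ./ imagePixelCount;

% Assumption: Car (third entry) is the median class over all 11 classes.
medianFreq = freq(3);

% Median frequency weights: frequent classes get weights below 1.
w = medianFreq ./ freq
```

The Road and Pavement weights come out near 0.1744 and 0.7103, matching the fourth and fifth entries of classWeights.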

Specify the class weights using a pixelClassificationLayer (Computer Vision Toolbox).

pxLayer = pixelClassificationLayer('Name','labels','Classes',tbl.Name,'ClassWeights',classWeights);
lgraph = replaceLayer(lgraph,"classification",pxLayer);

Select Training Options

The optimization algorithm used for training is stochastic gradient descent with momentum (SGDM). Use trainingOptions to specify the hyperparameters used for SGDM.

% Define validation data.
dsVal = combine(imdsVal,pxdsVal);

% Define training options.
options = trainingOptions('sgdm', ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropPeriod',10, ...
    'LearnRateDropFactor',0.3, ...
    'Momentum',0.9, ...
    'InitialLearnRate',1e-3, ...
    'L2Regularization',0.005, ...
    'ValidationData',dsVal, ...
    'MaxEpochs',30, ...
    'MiniBatchSize',8, ...
    'Shuffle','every-epoch', ...
    'CheckpointPath',tempdir, ...
    'VerboseFrequency',2, ...
    'Plots','training-progress', ...
    'ValidationPatience',4);

The learning rate uses a piecewise schedule. The learning rate is multiplied by a factor of 0.3 every 10 epochs. This allows the network to learn quickly with a higher initial learning rate, while being able to find a solution close to the local optimum once the learning rate drops.
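To make the schedule concrete, the following sketch tabulates the learning rate per epoch for the option values used in this example. It assumes the drop takes effect at the start of epochs 11 and 21, which is one reading of "every 10 epochs"; the formula is an illustration, not the internal implementation of trainingOptions.

```matlab
% Option values from the trainingOptions call in this example.
initialLearnRate = 1e-3;
dropFactor       = 0.3;
dropPeriod       = 10;
maxEpochs        = 30;

% Assumed piecewise rule: multiply by dropFactor after each full period.
epochs    = 1:maxEpochs;
learnRate = initialLearnRate * dropFactor .^ floor((epochs - 1) / dropPeriod);

% Show the rate at the first epoch of each period.
disp([epochs([1 11 21]); learnRate([1 11 21])])
```

Under that assumption the rate is 1e-3 for epochs 1 to 10, 3e-4 for epochs 11 to 20, and 9e-5 for epochs 21 to 30.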

The network is tested against the validation data every epoch by setting the 'ValidationData' parameter. The 'ValidationPatience' is set to 4 to stop training early when the validation accuracy converges. This prevents the network from overfitting on the training dataset.

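A mini-batch size of 8 is used to reduce memory usage while training. You can increase or decrease this value based on the amount of GPU memory you have on your system.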

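In addition, 'CheckpointPath' is set to a temporary location. This name-value pair enables the saving of network checkpoints at the end of every training epoch. If training is interrupted due to a system failure or power outage, you can resume training from the saved checkpoint. Make sure that the location specified by 'CheckpointPath' has enough space to store the network checkpoints. For example, saving 100 DeepLab v3+ checkpoints requires ~6 GB of disk space because each checkpoint is 61 MB.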
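The disk-space estimate is simple arithmetic, sketched below. It assumes one checkpoint per epoch, as described above, and 1 GB = 1000 MB. The 30-epoch figure is derived from the 'MaxEpochs' setting in this example and is an upper bound, since early stopping can end training sooner.

```matlab
checkpointSizeMB = 61;     % size of one DeepLab v3+ checkpoint, from the text

% 100 checkpoints, the case quoted in the text.
spaceFor100GB = 100 * checkpointSizeMB / 1000;

% One checkpoint per epoch for a full 30-epoch run of this example.
maxEpochs     = 30;
spaceForRunGB = maxEpochs * checkpointSizeMB / 1000;

fprintf('100 checkpoints: %.2f GB, full run: %.2f GB\n', spaceFor100GB, spaceForRunGB);
```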

Data Augmentation

Data augmentation is used to improve network accuracy by randomly transforming the original data during training. By using data augmentation, you can add more variety to the training data without increasing the number of labeled training samples. To apply the same random transformation to both image and pixel label data, use the datastore functions combine and transform. First, combine imdsTrain and pxdsTrain.

dsTrain = combine(imdsTrain,pxdsTrain);

Next, use the datastore transform function to apply the desired data augmentation defined in the supporting function augmentImageAndLabel. Here, random left/right reflection and random X/Y translation of +/- 10 pixels are used for data augmentation.

xTrans = [-10 10];
yTrans = [-10 10];
dsTrain = transform(dsTrain, @(data)augmentImageAndLabel(data,xTrans,yTrans));

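Note that data augmentation is not applied to the test and validation data. Ideally, test and validation data should be representative of the original data and left unmodified for unbiased evaluation.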
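The key point of the augmentation step is that the random parameters are drawn once and then applied to both the image and its labels, so the two stay aligned. The toy sketch below illustrates this with base MATLAB only. Here fliplr and circshift stand in for the randomAffine2d and imwarp calls used in the supporting function, and unlike imwarp, circshift wraps pixels around instead of moving them out of view.

```matlab
rng(0);                          % fixed seed so the sketch is repeatable

img = magic(6);                  % toy "image"
lbl = img > 18;                  % toy "label" mask derived from the image

% Draw the random parameters ONCE.
doFlip = rand > 0.5;             % random left/right reflection
shift  = randi([-2 2], 1, 2);    % random [row col] translation in pixels

% Apply the SAME parameters to the image and to the labels.
if doFlip
    img = fliplr(img);
    lbl = fliplr(lbl);
end
img = circshift(img, shift);
lbl = circshift(lbl, shift);

% The labels still describe the transformed image pixel for pixel.
aligned = isequal(lbl, img > 18)
```

If the image and the labels were transformed with independently drawn parameters, aligned would generally be false, which is why the example combines the two datastores before calling transform.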

Start Training

Start training using trainNetwork if the doTraining flag is true. Otherwise, load a pretrained network.

Note: The training was verified on an NVIDIA™ Titan X with 12 GB of GPU memory. If your GPU has less memory, you may run out of memory during training. If this happens, try setting 'MiniBatchSize' to 1 in trainingOptions, or reducing the network input and resizing the training data. Training this network takes about 70 minutes. Depending on your GPU hardware, it may take longer.

doTraining = false;
if doTraining
    [net, info] = trainNetwork(dsTrain,lgraph,options);
else
    pretrainedNetwork = fullfile(pretrainedFolder,'deeplabv3plusResnet18CamVid.mat');
    data = load(pretrainedNetwork);
    net = data.net;
end

Test Network on One Image

As a quick sanity check, run the trained network on one test image.

I = readimage(imdsTest,35);
C = semanticseg(I,net);

Display the results.

B = labeloverlay(I,C,'Colormap',cmap,'Transparency',0.4);
imshow(B)
pixelLabelColorbar(cmap,classes);

Compare the results in C with the expected ground truth stored in pxdsTest. The green and magenta regions highlight areas where the segmentation results differ from the expected ground truth.

expectedResult = readimage(pxdsTest,35);
actual = uint8(C);
expected = uint8(expectedResult);
imshowpair(actual,expected)

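Visually, the semantic segmentation results overlap well for classes such as road, sky, and building. However, smaller objects like pedestrians and cars are not as accurate. The amount of overlap per class can be measured using the intersection-over-union (IoU) metric, also known as the Jaccard index. Use the jaccard (Image Processing Toolbox) function to measure IoU.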

iou = jaccard(C,expectedResult);
table(classes,iou)

ans=11×2 table
    classes    iou
    _______    ___
       ⋮        ⋮

The IoU metric confirms the visual results. Road, sky, and building classes have high IoU scores, while classes such as pedestrian and car have low scores. Other common segmentation metrics include the dice (Image Processing Toolbox) and the bfscore (Image Processing Toolbox) contour matching score.
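For reference, the per-class IoU is the number of pixels where prediction and ground truth both have the class, divided by the number of pixels where either has it. The sketch below computes it by hand on a made-up 3-by-3 label map with three classes. The arrays are illustrative only and are not CamVid data, and the loop is a plain restatement of the definition rather than the implementation of jaccard.

```matlab
% Toy predicted and ground-truth label maps (class indices 1..3).
predicted = [1 1 2
             1 2 2
             3 3 2];
truth     = [1 1 2
             1 1 2
             3 3 3];
numClasses = 3;

iou = zeros(numClasses, 1);
for k = 1:numClasses
    p = predicted == k;                      % pixels predicted as class k
    t = truth == k;                          % pixels labeled as class k
    iou(k) = nnz(p & t) / nnz(p | t);        % intersection over union
end
iou
```

For this toy input the three scores are 3/4, 2/4, and 2/3. A class that covers few pixels loses a large fraction of its score for every mislabeled pixel, which is consistent with the low scores for small objects noted above.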

Evaluate Trained Network

To measure accuracy for multiple test images, run semanticseg (Computer Vision Toolbox) on the entire test set. A mini-batch size of 4 is used to reduce memory usage while segmenting images. You can increase or decrease this value based on the amount of GPU memory you have on your system.

pxdsResults = semanticseg(imdsTest,net, ...
    'MiniBatchSize',4, ...
    'WriteLocation',tempdir, ...
    'Verbose',false);

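semanticseg returns the results for the test set as a pixelLabelDatastore object. The actual pixel label data for each test image in imdsTest is written to disk in the location specified by the 'WriteLocation' parameter. Use evaluateSemanticSegmentation (Computer Vision Toolbox) to measure semantic segmentation metrics on the test set results.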

metrics = evaluateSemanticSegmentation(pxdsResults,pxdsTest,'Verbose',false);

evaluateSemanticSegmentation returns various metrics for the entire dataset, for individual classes, and for each test image. To see the dataset level metrics, inspect metrics.DataSetMetrics.

metrics.DataSetMetrics

ans=1×5 table
    GlobalAccuracy    MeanAccuracy    MeanIoU    WeightedIoU    MeanBFScore
    ______________    ____________    _______    ___________    ___________

       0.87695          0.85392       0.6302       0.80851        0.65051

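The dataset metrics provide a high-level overview of the network performance. To see the impact each class has on the overall performance, inspect the per-class metrics using metrics.ClassMetrics.

metrics.ClassMetrics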

ans=11×3 table
                  Accuracy      IoU      MeanBFScore
                  ________    _______    ___________

    Sky           0.93112     0.90209      0.8952
    Building      0.78453     0.76098      0.58511
    Pole          0.71586     0.21477      0.5144
    Road          0.93024     0.91465      0.76696
    Pavement      0.88466     0.70571      0.70919
    Tree          0.87377     0.76323      0.70875
    SignSymbol    0.79358     0.39309      0.48302
    Fence         0.81506     0.46484      0.48565
    Car           0.90956     0.76799      0.69233
    Pedestrian    0.87629      0.4366      0.60792
    Bicyclist     0.87844     0.60829      0.55089

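Although the overall dataset performance is quite high, the class metrics show that underrepresented classes such as Pedestrian, Bicyclist, and Car are not segmented as well as classes such as Road, Sky, and Building. Additional data that includes more samples of the underrepresented classes might help improve the results.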
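The link between the two tables can be checked numerically. In the sketch below, the per-class Accuracy and IoU values are copied from the class metrics above, and their unweighted means reproduce the dataset-level MeanAccuracy and MeanIoU. This holds for these two columns only: the mean of the per-class MeanBFScore values does not match the dataset-level MeanBFScore, so that score is evidently aggregated differently.

```matlab
% Per-class values copied from the class metrics table (Sky ... Bicyclist).
accuracy = [0.93112 0.78453 0.71586 0.93024 0.88466 0.87377 ...
            0.79358 0.81506 0.90956 0.87629 0.87844];
iou      = [0.90209 0.76098 0.21477 0.91465 0.70571 0.76323 ...
            0.39309 0.46484 0.76799 0.4366  0.60829];

% Unweighted averages over the 11 classes.
meanAccuracy = mean(accuracy);
meanIoU      = mean(iou);

fprintf('MeanAccuracy = %.5f, MeanIoU = %.4f\n', meanAccuracy, meanIoU);
```

Because every class counts equally in these averages, the low IoU of small classes such as Pole and SignSymbol pulls MeanIoU well below WeightedIoU, which weights each class by its pixel count.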

Supporting Functions

function labelIDs = camvidPixelLabelIDs()
% Return the label IDs corresponding to each class.
%
% The CamVid dataset has 32 classes. Group them into 11 classes following
% the original SegNet training methodology [1].
%
% The 11 classes are:
%   "Sky", "Building", "Pole", "Road", "Pavement", "Tree", "SignSymbol",
%   "Fence", "Car", "Pedestrian", and "Bicyclist".
%
% CamVid pixel label IDs are provided as RGB color values. Group them into
% 11 classes and return them as a cell array of M-by-3 matrices. The
% original CamVid class names are listed alongside each RGB value. Note
% that the Other/Void class is excluded below.
labelIDs = { ...

    % "Sky"
    [
    128 128 128; ... % "Sky"
    ]

    % "Building"
    [
    000 128 064; ... % "Bridge"
    128 000 000; ... % "Building"
    064 192 000; ... % "Wall"
    064 000 064; ... % "Tunnel"
    192 000 128; ... % "Archway"
    ]

    % "Pole"
    [
    192 192 128; ... % "Column_Pole"
    000 000 064; ... % "TrafficCone"
    ]

    % "Road"
    [
    128 064 128; ... % "Road"
    128 000 192; ... % "LaneMkgsDriv"
    192 000 064; ... % "LaneMkgsNonDriv"
    ]

    % "Pavement"
    [
    000 000 192; ... % "Sidewalk"
    064 192 128; ... % "ParkingBlock"
    128 128 192; ... % "RoadShoulder"
    ]

    % "Tree"
    [
    128 128 000; ... % "Tree"
    192 192 000; ... % "VegetationMisc"
    ]

    % "SignSymbol"
    [
    192 128 128; ... % "SignSymbol"
    128 128 064; ... % "Misc_Text"
    000 064 064; ... % "TrafficLight"
    ]

    % "Fence"
    [
    064 064 128; ... % "Fence"
    ]

    % "Car"
    [
    064 000 128; ... % "Car"
    064 128 192; ... % "SUVPickupTruck"
    192 128 192; ... % "Truck_Bus"
    192 064 128; ... % "Train"
    128 064 064; ... % "OtherMoving"
    ]

    % "Pedestrian"
    [
    064 064 000; ... % "Pedestrian"
    192 128 064; ... % "Child"
    064 000 192; ... % "CartLuggagePram"
    064 128 064; ... % "Animal"
    ]

    % "Bicyclist"
    [
    000 128 192; ... % "Bicyclist"
    192 000 192; ... % "MotorcycleScooter"
    ]

    };
end

function pixelLabelColorbar(cmap, classNames)
% Add a colorbar to the current axis. The colorbar is formatted
% to display the class names with the color.

colormap(gca,cmap)

% Add colorbar to current figure.
c = colorbar('peer', gca);

% Use class names for tick marks.
c.TickLabels = classNames;
numClasses = size(cmap,1);

% Center tick labels.
c.Ticks = 1/(numClasses*2):1/numClasses:1;

% Remove tick mark.
c.TickLength = 0;
end

function cmap = camvidColorMap()
% Define the colormap used by CamVid dataset.

cmap = [
    128 128 128   % Sky
    128 0 0       % Building
    192 192 192   % Pole
    128 64 128    % Road
    60 40 222     % Pavement
    128 128 0     % Tree
    192 128 128   % SignSymbol
    64 64 128     % Fence
    64 0 128      % Car
    64 64 0       % Pedestrian
    0 128 192     % Bicyclist
    ];

% Normalize between [0 1].
cmap = cmap ./ 255;
end

function [imdsTrain, imdsVal, imdsTest, pxdsTrain, pxdsVal, pxdsTest] = partitionCamVidData(imds,pxds)
% Partition CamVid data by randomly selecting 60% of the data for training.
% The rest is split between validation (20%) and testing.

% Set initial random state for example reproducibility.
rng(0);
numFiles = numel(imds.Files);
shuffledIndices = randperm(numFiles);

% Use 60% of the images for training.
numTrain = round(0.60 * numFiles);
trainingIdx = shuffledIndices(1:numTrain);

% Use 20% of the images for validation.
numVal = round(0.20 * numFiles);
valIdx = shuffledIndices(numTrain+1:numTrain+numVal);

% Use the rest for testing.
testIdx = shuffledIndices(numTrain+numVal+1:end);

% Create image datastores for training, validation, and test.
trainingImages = imds.Files(trainingIdx);
valImages = imds.Files(valIdx);
testImages = imds.Files(testIdx);

imdsTrain = imageDatastore(trainingImages);
imdsVal = imageDatastore(valImages);
imdsTest = imageDatastore(testImages);

% Extract class and label IDs info.
classes = pxds.ClassNames;
labelIDs = camvidPixelLabelIDs();

% Create pixel label datastores for training, validation, and test.
trainingLabels = pxds.Files(trainingIdx);
valLabels = pxds.Files(valIdx);
testLabels = pxds.Files(testIdx);

pxdsTrain = pixelLabelDatastore(trainingLabels, classes, labelIDs);
pxdsVal = pixelLabelDatastore(valLabels, classes, labelIDs);
pxdsTest = pixelLabelDatastore(testLabels, classes, labelIDs);
end

function data = augmentImageAndLabel(data, xTrans, yTrans)
% Augment images and pixel label images using random reflection and
% translation.

for i = 1:size(data,1)

    tform = randomAffine2d(...
        'XReflection',true,...
        'XTranslation', xTrans, ...
        'YTranslation', yTrans);

    % Center the view at the center of image in the output space while
    % allowing translation to move the output image out of view.
    rout = affineOutputView(size(data{i,1}), tform, 'BoundsStyle', 'centerOutput');

    % Warp the image and pixel labels using the same transform.
    data{i,1} = imwarp(data{i,1}, tform, 'OutputView', rout);
    data{i,2} = imwarp(data{i,2}, tform, 'OutputView', rout);

end
end

References

[1] Chen, Liang-Chieh et al. "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation." ECCV (2018).

[2] Brostow, G. J., J. Fauqueur, and R. Cipolla. "Semantic object classes in video: A high-definition ground truth database." Pattern Recognition Letters. Vol. 30, Issue 2, 2009, pp 88-97.