
Lane Detection Optimized with GPU Coder

This example shows how to generate CUDA® code from a deep learning network, represented by a SeriesNetwork object. In this example, the series network is a convolutional neural network that can detect and output lane marker boundaries from an image.

Prerequisites

  • CUDA® enabled NVIDIA® GPU.

  • NVIDIA CUDA toolkit and driver.

  • NVIDIA cuDNN library.

  • OpenCV libraries for video read and image display operations.

  • Environment variables for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-Party Hardware (GPU Coder). For setting up the environment variables, see Setting Up the Prerequisite Products (GPU Coder).

Verify GPU Environment

Use the coder.checkGpuInstall (GPU Coder) function to verify that the compilers and libraries necessary for running this example are set up correctly.

envCfg = coder.gpuEnvConfig('host');
envCfg.DeepLibTarget = 'cudnn';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);

Get Pretrained SeriesNetwork

[laneNet, coeffMeans, coeffStds] = getLaneDetectionNetworkGPU();

The network takes an image as input and outputs two lane boundaries that correspond to the left and right lanes of the ego vehicle. Each lane boundary is represented by the parabolic equation $y = ax^2 + bx + c$, where y is the lateral offset and x is the longitudinal distance from the vehicle. The network outputs the three parameters a, b, and c per lane. The network architecture is similar to AlexNet except that the last few layers are replaced by a smaller fully connected layer and a regression output layer.

laneNet.Layers
ans =
  23x1 Layer array with layers:

     1   'data'          Image Input                   227x227x3 images with 'zerocenter' normalization
     2   'conv1'         Convolution                   96 11x11x3 convolutions with stride [4 4] and padding [0 0 0 0]
     3   'relu1'         ReLU                          ReLU
     4   'norm1'         Cross Channel Normalization   cross channel normalization with 5 channels per element
     5   'pool1'         Max Pooling                   3x3 max pooling with stride [2 2] and padding [0 0 0 0]
     6   'conv2'         Convolution                   256 5x5x48 convolutions with stride [1 1] and padding [2 2 2 2]
     7   'relu2'         ReLU                          ReLU
     8   'norm2'         Cross Channel Normalization   cross channel normalization with 5 channels per element
     9   'pool2'         Max Pooling                   3x3 max pooling with stride [2 2] and padding [0 0 0 0]
    10   'conv3'         Convolution                   384 3x3x256 convolutions with stride [1 1] and padding [1 1 1 1]
    11   'relu3'         ReLU                          ReLU
    12   'conv4'         Convolution                   384 3x3x192 convolutions with stride [1 1] and padding [1 1 1 1]
    13   'relu4'         ReLU                          ReLU
    14   'conv5'         Convolution                   256 3x3x192 convolutions with stride [1 1] and padding [1 1 1 1]
    15   'relu5'         ReLU                          ReLU
    16   'pool5'         Max Pooling                   3x3 max pooling with stride [2 2] and padding [0 0 0 0]
    17   'fc6'           Fully Connected               4096 fully connected layer
    18   'relu6'         ReLU                          ReLU
    19   'drop6'         Dropout                       50% dropout
    20   'fcLane1'       Fully Connected               16 fully connected layer
    21   'fcLane1Relu'   ReLU                          ReLU
    22   'fcLane2'       Fully Connected               6 fully connected layer
    23   'output'        Regression Output             mean-squared-error with 'leftLane_a', 'leftLane_b', and 4 other responses
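To make the lane-boundary model concrete, the following sketch evaluates one boundary $y = ax^2 + bx + c$ over a range of longitudinal distances. The coefficient values here are hypothetical placeholders and are not produced by the network; detect_lane.m performs the same evaluation with polyval in its computeBoundaryModel helper.

% Sketch only: evaluate a lane boundary with made-up coefficients [a b c].
boundary = [-0.002 0.05 1.8];          % hypothetical coefficients, for illustration
xWorld = 3:30;                         % longitudinal distance ahead of the sensor, in meters
yWorld = polyval(boundary, xWorld);    % lateral offset y = a*x^2 + b*x + c
plot(xWorld, yWorld), xlabel('x (m)'), ylabel('y (m)')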

Examine Main Entry-Point Function

type detect_lane.m
function [laneFound, ltPts, rtPts] = detect_lane(frame, laneCoeffMeans, laneCoeffStds)
% From the networks output, compute left and right lane points in the
% image coordinates. The camera coordinates are described by the caltech
% mono camera model.

%#codegen

% A persistent object mynet is used to load the series network object.
% At the first call to this function, the persistent object is constructed
% and set up. When the function is called subsequent times, the same
% object is reused to call predict on inputs, thus avoiding reconstructing
% and reloading the network object.
persistent lanenet;
if isempty(lanenet)
    lanenet = coder.loadDeepLearningNetwork('laneNet.mat', 'lanenet');
end

lanecoeffsNetworkOutput = lanenet.predict(permute(frame, [2 1 3]));

% Recover original coeffs by reversing the normalization steps
params = lanecoeffsNetworkOutput .* laneCoeffStds + laneCoeffMeans;

isRightLaneFound = abs(params(6)) > 0.5; % c should be more than 0.5 for it to be a right lane
isLeftLaneFound  = abs(params(3)) > 0.5;

vehicleXPoints = 3:30; % meters, ahead of the sensor
ltPts = coder.nullcopy(zeros(28,2,'single'));
rtPts = coder.nullcopy(zeros(28,2,'single'));

if isRightLaneFound && isLeftLaneFound
    rtBoundary = params(4:6);
    rt_y = computeBoundaryModel(rtBoundary, vehicleXPoints);

    ltBoundary = params(1:3);
    lt_y = computeBoundaryModel(ltBoundary, vehicleXPoints);

    % Visualize lane boundaries of the ego vehicle
    tform = get_tformToImage;
    % map vehicle to image coordinates
    ltPts = tform.transformPointsInverse([vehicleXPoints', lt_y']);
    rtPts = tform.transformPointsInverse([vehicleXPoints', rt_y']);
    laneFound = true;
else
    laneFound = false;
end
end

function yWorld = computeBoundaryModel(model, xWorld)
yWorld = polyval(model, xWorld);
end

function tform = get_tformToImage
% Compute extrinsics based on camera setup
yaw = 0;
pitch = 14; % pitch of the camera in degrees
roll = 0;

translation = translationVector(yaw, pitch, roll);
rotation    = rotationMatrix(yaw, pitch, roll);

% Construct a camera matrix
focalLength    = [309.4362, 344.2161];
principalPoint = [318.9034, 257.5352];
Skew = 0;

camMatrix = [rotation; translation] * intrinsicMatrix(focalLength, ...
    Skew, principalPoint);

% Turn camMatrix into 2-D homography
tform2D = [camMatrix(1,:); camMatrix(2,:); camMatrix(4,:)]; % drop Z

tform = projective2d(tform2D);
tform = tform.invert();
end

function translation = translationVector(yaw, pitch, roll)
SensorLocation = [0 0];
Height = 2.1798; % mounting height in meters from the ground
rotationMatrix = (...
    rotZ(yaw)*...      % last rotation
    rotX(90-pitch)*...
    rotZ(roll)...      % first rotation
    );

% Adjust for the SensorLocation by adding a translation
sl = SensorLocation;
translationInWorldUnits = [sl(2), sl(1), Height];
translation = translationInWorldUnits*rotationMatrix;
end

%------------------------------------------------------------------
% Rotation around X-axis
function R = rotX(a)
a = deg2rad(a);
R = [...
    1   0       0;
    0   cos(a) -sin(a);
    0   sin(a)  cos(a)];
end

%------------------------------------------------------------------
% Rotation around Y-axis
function R = rotY(a)
a = deg2rad(a);
R = [...
    cos(a)  0 sin(a);
    0       1 0;
   -sin(a)  0 cos(a)];
end

%------------------------------------------------------------------
% Rotation around Z-axis
function R = rotZ(a)
a = deg2rad(a);
R = [...
    cos(a) -sin(a) 0;
    sin(a)  cos(a) 0;
    0       0      1];
end

%------------------------------------------------------------------
% Given the Yaw, Pitch, and Roll, determine the appropriate Euler
% angles and the sequence in which they are applied to
% align the camera's coordinate system with the vehicle coordinate
% system. The resulting matrix is a Rotation matrix that together
% with the Translation vector defines the extrinsic parameters of the camera.
function rotation = rotationMatrix(yaw, pitch, roll)
rotation = (...
    rotY(180)*...      % last rotation: point Z up
    rotZ(-90)*...      % X-Y swap
    rotZ(yaw)*...      % point the camera forward
    rotX(90-pitch)*... % "un-pitch"
    rotZ(roll)...      % 1st rotation: "un-roll"
    );
end

function intrinsicMat = intrinsicMatrix(FocalLength, Skew, PrincipalPoint)
intrinsicMat = ...
    [FocalLength(1)   , 0                , 0; ...
     Skew             , FocalLength(2)   , 0; ...
     PrincipalPoint(1), PrincipalPoint(2), 1];
end

Generate Code for Network and Post-Processing Code

The network computes the parameters a, b, and c that describe the parabolic equation for the left and right lane boundaries.

From these parameters, compute the x and y coordinates corresponding to the lane positions. The coordinates must be mapped to image coordinates. The function detect_lane.m performs all these computations. Generate CUDA code for this function by creating a GPU code configuration object for a 'lib' target and setting the target language to C++. Use the coder.DeepLearningConfig (GPU Coder) function to create a CuDNN deep learning configuration object and assign it to the DeepLearningConfig property of the GPU code configuration object. Run the codegen command.

cfg = coder.gpuConfig('lib');
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
cfg.GenerateReport = true;
cfg.TargetLang = 'C++';
codegen -args {ones(227,227,3,'single'),ones(1,6,'double'),ones(1,6,'double')} -config cfg detect_lane
Code generation successful: To view the report, open('codegen/lib/detect_lane/html/report.mldatx').
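Optionally, you can exercise the entry-point function directly in MATLAB to confirm that its interface matches the codegen -args specification. The call below is an interface check only and is not part of the original example: the frame is a dummy all-ones image, so the detected lane points are meaningless, and it assumes the laneNet.mat file referenced by detect_lane is on the path.

% Interface check with a dummy frame matching the -args sizes and types.
frame = ones(227,227,3,'single');
[laneFound, ltPts, rtPts] = detect_lane(frame, coeffMeans, coeffStds);
size(ltPts)   % 28x2 single: [x y] image-coordinate points of the left boundary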

Generated Code Description

The series network is generated as a C++ class containing an array of 23 layer classes.

class c_lanenet {
 public:
  int32_T batchSize;
  int32_T numLayers;
  real32_T *inputData;
  real32_T *outputData;
  MWCNNLayer *layers[23];

 public:
  c_lanenet(void);
  void setup(void);
  void predict(void);
  void cleanup(void);
  ~c_lanenet(void);
};

The setup() method of the class sets up handles and allocates memory for each layer object. The predict() method invokes prediction for each of the 23 layers in the network.

The cnn_lanenet_conv*_w and cnn_lanenet_conv*_b files are the binary weights and bias files for the convolution layers in the network. The cnn_lanenet_fc*_w and cnn_lanenet_fc*_b files are the binary weights and bias files for the fully connected layers in the network.

codegendir = fullfile('codegen', 'lib', 'detect_lane');
dir(codegendir)
.                                      cnn_lanenet0_0_conv4_w.bin
..                                     cnn_lanenet0_0_conv5_b.bin
.gitignore                             cnn_lanenet0_0_conv5_w.bin
DeepLearningNetwork.cu                 cnn_lanenet0_0_data_offset.bin
DeepLearningNetwork.h                  cnn_lanenet0_0_data_scale.bin
DeepLearningNetwork.o                  cnn_lanenet0_0_fc6_b.bin
MWCNNLayerImpl.cu                      cnn_lanenet0_0_fc6_w.bin
MWCNNLayerImpl.hpp                     cnn_lanenet0_0_fcLane1_b.bin
MWCNNLayerImpl.o                       cnn_lanenet0_0_fcLane1_w.bin
MWCudaDimUtility.cu                    cnn_lanenet0_0_fcLane2_b.bin
MWCudaDimUtility.hpp                   cnn_lanenet0_0_fcLane2_w.bin
MWCustomLayerForCuDNN.cpp              cnn_lanenet0_0_responseNames.txt
MWCustomLayerForCuDNN.hpp              codeInfo.mat
MWCustomLayerForCuDNN.o                codedescriptor.dmr
MWElementwiseAffineLayer.cpp           compileInfo.mat
MWElementwiseAffineLayer.hpp           defines.txt
MWElementwiseAffineLayer.o             detect_lane.a
MWElementwiseAffineLayerImpl.cu        detect_lane.cu
MWElementwiseAffineLayerImpl.hpp       detect_lane.h
MWElementwiseAffineLayerImpl.o         detect_lane.o
MWElementwiseAffineLayerImplKernel.cu  detect_lane_data.cu
MWElementwiseAffineLayerImplKernel.o   detect_lane_data.h
MWFusedConvReLULayer.cpp               detect_lane_data.o
MWFusedConvReLULayer.hpp               detect_lane_initialize.cu
MWFusedConvReLULayer.o                 detect_lane_initialize.h
MWFusedConvReLULayerImpl.cu            detect_lane_initialize.o
MWFusedConvReLULayerImpl.hpp           detect_lane_ref.rsp
MWFusedConvReLULayerImpl.o             detect_lane_rtw.mk
MWKernelHeaders.hpp                    detect_lane_terminate.cu
MWTargetNetworkImpl.cu                 detect_lane_terminate.h
MWTargetNetworkImpl.hpp                detect_lane_terminate.o
MWTargetNetworkImpl.o                  detect_lane_types.h
buildInfo.mat                          examples
cnn_api.cpp                            gpu_codegen_info.mat
cnn_api.hpp                            html
cnn_api.o                              interface
cnn_lanenet0_0_conv1_b.bin             mean.bin
cnn_lanenet0_0_conv1_w.bin             predict.cu
cnn_lanenet0_0_conv2_b.bin             predict.h
cnn_lanenet0_0_conv2_w.bin             predict.o
cnn_lanenet0_0_conv3_b.bin             rtw_proj.tmw
cnn_lanenet0_0_conv3_w.bin             rtwtypes.h
cnn_lanenet0_0_conv4_b.bin
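As a quick sanity check, you can read one of these weight files back into MATLAB. The snippet below is a sketch only and assumes the weights are stored as raw single-precision values; under that assumption the conv1 file (96 filters of size 11-by-11-by-3) contains 11*11*3*96 = 34848 elements.

% Sketch: inspect a generated weight file (assumes raw single-precision storage).
fid = fopen(fullfile(codegendir, 'cnn_lanenet0_0_conv1_w.bin'), 'r');
w = fread(fid, inf, 'single');
fclose(fid);
numel(w)   % expected to be 34848 if the storage assumption holds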

Generate Additional Files for Post-Processing the Output

Export the mean and std values from the trained network for use during execution.

codegendir = fullfile(pwd, 'codegen', 'lib', 'detect_lane');
fid = fopen(fullfile(codegendir, 'mean.bin'), 'w');
A = [coeffMeans coeffStds];
fwrite(fid, A, 'double');
fclose(fid);
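You can optionally verify the exported file by reading it back; the 12 double values are the six coefficient means followed by the six standard deviations. This check is not part of the original example.

% Optional read-back check of mean.bin.
fid = fopen(fullfile(codegendir, 'mean.bin'), 'r');
B = fread(fid, inf, 'double');
fclose(fid);
isequal(B(:), [coeffMeans(:); coeffStds(:)])   % expected to return true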

Main File

Compile the network code by using a main file. The main file uses the OpenCV VideoCapture method to read frames from the input video. Each frame is processed and classified until no more frames are read. Before displaying the output for each frame, the outputs are post-processed by using the detect_lane function generated in detect_lane.cu.

type main_lanenet.cu
/* Copyright 2016 The MathWorks, Inc. */
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include "detect_lane.h"

using namespace cv;

void readData(float *input, Mat& orig, Mat & im)
{
    Size size(227,227);
    resize(orig, im, size, 0, 0, INTER_LINEAR);
    for (int j = 0; j < 227*227; j++)
    {
        // BGR to RGB
        input[2*227*227+j] = (float)(im.data[j*3+0]);
        input[1*227*227+j] = (float)(im.data[j*3+1]);
        input[0*227*227+j] = (float)(im.data[j*3+2]);
    }
}

void addLane(float pts[28][2], Mat & im, int numPts)
{
    std::vector iArray;
    for (int k = 0; k

        > orig;
        if (orig.empty()) break;
        readData(inputBuffer, orig, im);

        writeData(inputBuffer, orig, 6, means, stds);

        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        char strbuf[50];
        float milliseconds = -1.0;
        cudaEventElapsedTime(&milliseconds, start, stop);
        fps = fps*.9+1000.0/milliseconds*.1;
        sprintf (strbuf, "%.2f FPS", fps);
        putText(orig, strbuf, Point(200,30), FONT_HERSHEY_DUPLEX, 1, CV_RGB(0,0,0), 2);
        imshow("Lane detection demo", orig);
        if( waitKey(50)%256 == 27 ) break; // stop capturing by pressing ESC */
    }
    destroyWindow("Lane detection demo");

    free(inputBuffer);
    free(outputBuffer);

    return 0;
}

Download Example Video

if ~exist('./caltech_cordova1.avi', 'file')
    url = 'https://www.mathworks.com/supportfiles/gpucoder/media/caltech_cordova1.avi';
    websave('caltech_cordova1.avi', url);
end
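With the video downloaded, you can prototype an equivalent of the main file's frame-processing loop directly in MATLAB before building the executable. The sketch below is illustrative only and is not the generated C++ code: it uses VideoReader and insertShape (Computer Vision Toolbox) instead of OpenCV, and it assumes the laneNet.mat file used by detect_lane is on the path. The frame is permuted before the call because detect_lane applies permute(frame, [2 1 3]) internally to match the transposed layout used by the generated code.

% Illustrative MATLAB prototype of the main loop (not the shipped main_lanenet.cu).
v = VideoReader('caltech_cordova1.avi');
while hasFrame(v)
    orig  = readFrame(v);
    input = single(imresize(orig, [227 227]));   % network input size, values kept in 0-255 range
    [laneFound, ltPts, rtPts] = detect_lane(permute(input, [2 1 3]), coeffMeans, coeffStds);
    if laneFound
        % ltPts/rtPts are 28x2 [x y] points in the original image coordinates
        orig = insertShape(orig, 'Line', reshape(ltPts', 1, []), 'Color', 'yellow', 'LineWidth', 2);
        orig = insertShape(orig, 'Line', reshape(rtPts', 1, []), 'Color', 'yellow', 'LineWidth', 2);
    end
    imshow(orig), drawnow
end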

Build Executable

if ispc
    setenv('MATLAB_ROOT', matlabroot);
    vcvarsall = mex.getCompilerConfigurations('C++').Details.CommandLineShell;
    setenv('VCVARSALL', vcvarsall);
    system('make_win_lane_detection.bat');
    cd(codegendir);
    system('lanenet.exe ..\..\..\caltech_cordova1.avi');
else
    setenv('MATLAB_ROOT', matlabroot);
    system('make -f Makefile_lane_detection.mk');
    cd(codegendir);
    system('./lanenet ../../../caltech_cordova1.avi');
end

Input Screenshot

Output Screenshot

Related Topics