Stereo Disparity

This example shows how to generate a CUDA® MEX function from a MATLAB® function that computes the stereo disparity of two images.

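Third-Party Prerequisites

Required

This example generates CUDA MEX and has the following third-party requirements.

CUDA-enabled NVIDIA® GPU and compatible driver. For half-precision code generation, the GPU device must have a minimum compute capability of 6.0.

Optional

For non-MEX builds such as static libraries, dynamic libraries, or executables, this example has the following additional requirements.

NVIDIA toolkit.
Environment variables for the compilers and libraries. For more information, see Third-Party Hardware and Setting Up the Prerequisite Products.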

Verify GPU Environment

To verify that the compilers and libraries needed to run this example are set up correctly, use the coder.checkGpuInstall function.

envCfg = coder.gpuEnvConfig('host');
envCfg.BasicCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);

Stereo Disparity Calculation

The stereoDisparity.m entry-point function takes two images and returns a stereo disparity map computed from them.

type stereoDisparity

%% Modified Algorithm for Stereo Disparity Block Matching
% In this implementation, instead of finding the shifted image, indices are
% mapped accordingly to save memory and some processing. RGBA column-major
% packed data is used as input for compatibility with CUDA intrinsics.
% Convolution is performed using separable filters (horizontal and then
% vertical).

function [out_disp] = stereoDisparity(img0,img1) %#codegen

%   Copyright 2017-2019 The MathWorks, Inc.

% GPU code generation pragma
coder.gpu.kernelfun;

%% Stereo Disparity Parameters
% |WIN_RAD| is the radius of the window to be operated on. |min_disparity|
% is the minimum disparity level the search covers. |max_disparity|
% is the maximum disparity level the search covers.
WIN_RAD = 8;
min_disparity = -16;
max_disparity = 0;

%% Image Dimensions for Loop Control
% Four channels (RGBA) are packed, so nChannels is 4.
[imgHeight,imgWidth]=size(img0);
nChannels = 4;
imgHeight = imgHeight/nChannels;

%% Store the Raw Differences
diff_img = zeros([imgHeight+2*WIN_RAD,imgWidth+2*WIN_RAD],'int32');

% Store the minimum cost
min_cost = zeros([imgHeight,imgWidth],'int32');
min_cost(:,:) = 99999999;

% Store the final disparity
out_disp = zeros([imgHeight,imgWidth],'int16');

%% Filters for Aggregating the Differences
% |filt_h| is the horizontal filter used in separable convolution.
% |filt_v| is the vertical filter used in separable convolution, which
% operates on the output of the row convolution.
filt_h = ones([1 17],'int32');
filt_v = ones([17 1],'int32');

% Main loop that runs for all the disparity levels. This loop is
% expected to run on the CPU.
for d=min_disparity:max_disparity

    % Find the difference matrix for the current disparity level. Expect
    % this to generate a kernel function.
    coder.gpu.kernel;
    for colIdx=1:imgWidth+2*WIN_RAD
        coder.gpu.kernel;
        for rowIdx=1:imgHeight+2*WIN_RAD
            % Row index calculation.
            ind_h = rowIdx - WIN_RAD;

            % Column index calculation for the left image.
            ind_w1 = colIdx - WIN_RAD;

            % Column index calculation for the right image.
            ind_w2 = colIdx + d - WIN_RAD;

            % Border clamping for row indices.
            if ind_h <= 0
                ind_h = 1;
            end
            if ind_h > imgHeight
                ind_h = imgHeight;
            end

            % Border clamping for column indices of the left image.
            if ind_w1 <= 0
                ind_w1 = 1;
            end
            if ind_w1 > imgWidth
                ind_w1 = imgWidth;
            end

            % Border clamping for column indices of the right image.
            if ind_w2 <= 0
                ind_w2 = 1;
            end
            if ind_w2 > imgWidth
                ind_w2 = imgWidth;
            end

            % In this step, the sum of absolute differences (SAD) is
            % computed across the four channels.
            tDiff = int32(0);
            for chIdx = 1:nChannels
                tDiff = tDiff + abs(int32(img0((ind_h-1)*(nChannels)+chIdx,ind_w1))-int32(img1((ind_h-1)*(nChannels)+chIdx,ind_w2)));
            end

            % Store the SAD cost into a matrix.
            diff_img(rowIdx,colIdx) = tDiff;
        end
    end

    % Aggregate the differences using separable convolution. Expect this
    % to generate two kernels using shared memory. The first kernel is the
    % convolution with the horizontal filter, and the second kernel applies
    % the column-wise convolution to its output.
    cost_v = conv2(diff_img,filt_h,'valid');
    cost = conv2(cost_v,filt_v,'valid');

    % This part updates the min_cost matrix by comparing its values
    % with the cost at the current disparity level.
    for ll=1:imgWidth
        for kk=1:imgHeight
            % Load the cost.
            temp_cost = int32(cost(kk,ll));

            % Compare against the minimum cost available and store the
            % disparity value.
            if min_cost(kk,ll) > temp_cost
                min_cost(kk,ll) = temp_cost;
                out_disp(kk,ll) = abs(d) + 8;
            end
        end
    end

end
end
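The aggregation step relies on the fact that a 17-by-17 box filter is separable. The following is a minimal sketch, not part of the shipped example, that checks this on random data in plain MATLAB. The variable names and the 40-by-50 image size are assumptions made for illustration, and the data is stored as double rather than int32 so that the sketch does not depend on code generation.

```matlab
% Sketch: a 17x17 box filter applied as a horizontal pass followed by a
% vertical pass gives the same result as a single 2-D convolution.
WIN_RAD = 8;
winSize = 2*WIN_RAD + 1;                 % 17, the length of filt_h and filt_v

% Hypothetical stand-in for diff_img: integer-valued costs stored as double,
% padded by WIN_RAD on every side like the matrix in the entry-point function.
rng(0);
diffImg = double(randi([0 255], 40 + 2*WIN_RAD, 50 + 2*WIN_RAD));

% Two 1-D passes, in the same order as the entry-point function.
costH = conv2(diffImg, ones(1, winSize), 'valid');
costSeparable = conv2(costH, ones(winSize, 1), 'valid');

% One 2-D pass with the full box kernel.
costFull = conv2(diffImg, ones(winSize), 'valid');

% The sums are integers well below 2^53, so the comparison is exact.
sameResult = isequal(costSeparable, costFull);
outSize = size(costSeparable);           % the padding is consumed by 'valid'
```

The two 1-D passes need 17 + 17 = 34 additions per output pixel instead of 17 * 17 = 289 for the full kernel, and the 'valid' option trims the WIN_RAD padding so that the cost matrix returns to the unpadded image size.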

Read Images and Pack Data into RGBA Packed Column-Major Order

img0 = imread('scene_left.png');
img1 = imread('scene_right.png');
[imgRGB0] = pack_rgbData(img0);
[imgRGB1] = pack_rgbData(img1);

Left Image

Right Image
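The pack_rgbData helper is not listed in this example. Judging only from how the entry-point function indexes its inputs, img0((ind_h-1)*nChannels+chIdx, ind_w1), the packed array appears to have four rows (R, G, B, A) for every image row. The sketch below shows one way to produce that layout. It is a hypothetical stand-in, not the shipped helper, and the zero-filled alpha channel is an assumption.

```matlab
% Hypothetical packing sketch (not the shipped pack_rgbData helper).
% Small synthetic RGB image so that the sketch runs without image files.
h = 3; w = 5;
img = uint8(randi([0 255], h, w, 3));

% Append an alpha plane. Filling it with zeros is an assumption.
rgba = cat(3, img, zeros(h, w, 'uint8'));

% Reorder to channel-by-row-by-column, then fold the channel dimension
% into the rows: packed((r-1)*4 + c, col) equals rgba(r, col, c).
packed = reshape(permute(rgba, [3 1 2]), [4*h, w]);

% Spot check one pixel using the indexing from the entry-point function.
r = 2; col = 4; c = 3;                   % blue channel of pixel (2,4)
matches = packed((r-1)*4 + c, col) == img(r, col, c);
```

With this layout, size(packed, 1) divided by 4 recovers the image height, which matches the imgHeight = imgHeight/nChannels step in the entry-point function.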

Generate GPU Code

cfg = coder.gpuConfig('mex');
codegen -config cfg -args {imgRGB0, imgRGB1} stereoDisparity;

Code generation successful: To view the report, open('codegen/mex/stereoDisparity/html/report.mldatx').

Run Generated MEX and Show the Output Disparity

out_disp = stereoDisparity_mex(imgRGB0,imgRGB1);
imagesc(out_disp);

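Half Precision

The computations in this example can also be done in half-precision floating-point numbers, using the stereoDisparityHalfPrecision.m entry-point function. Generating and executing code with half-precision data types requires a CUDA compute capability of 6.0 or higher. Set the ComputeCapability property of the code configuration object to '6.0'. For half precision, the memory allocation (malloc) mode for generating CUDA code must be set to 'Discrete'.

cfg.GpuConfig.ComputeCapability = '6.0';
cfg.GpuConfig.MallocMode = 'Discrete';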

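The standard imread command represents the RGB channels of an image with integers, one per channel for each pixel. The integers range from 0 to 255. Simply casting the inputs to the half type might result in overflow during the convolutions. In this case, you can scale the images to values between 0 and 1.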
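A quick worst-case estimate shows why the scaling matters. The sketch below assumes that the half-precision entry point uses the same 17-by-17 window and four channels as stereoDisparity.m, which this example does not confirm, and it uses 65504, the largest finite value of the IEEE half-precision format.

```matlab
% Worst-case aggregated cost for one window, with and without scaling.
WIN_RAD = 8;                              % assumed to match stereoDisparity.m
winSize = 2*WIN_RAD + 1;                  % 17
nChannels = 4;
halfMax = 65504;                          % largest finite IEEE half value

% Largest absolute difference per channel is 255 for raw uint8 data
% and 1 after dividing the inputs by 255.
worstRaw = winSize^2 * nChannels * 255;   % 294780
worstScaled = winSize^2 * nChannels * 1;  % 1156

rawOverflows = worstRaw > halfMax;        % true
scaledFits = worstScaled < halfMax;       % true
```

Under these assumptions the unscaled sum can exceed the half-precision range by more than a factor of four, while the scaled sum stays far below it.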

img0 = imread('scene_left.png');
img1 = imread('scene_right.png');
[imgRGB0] = half(pack_rgbData(img0))/255;
[imgRGB1] = half(pack_rgbData(img1))/255;

Generate CUDA MEX for the Function

Generate code for the stereoDisparityHalfPrecision.m function.

codegen -config cfg -args {imgRGB0, imgRGB1} stereoDisparityHalfPrecision;

Code generation successful: To view the report, open('codegen/mex/stereoDisparityHalfPrecision/html/report.mldatx').

See Also

Functions

codegen | coder.checkGpuInstall | coder.gpu.constantMemory | coder.gpu.kernel | coder.gpu.kernelfun | gpucoder.matrixMatrixKernel | gpucoder.stencilKernel

Objects

coder.CodeConfig | coder.EmbeddedCodeConfig | coder.gpuConfig | coder.gpuEnvConfig