CUDA kernel does not work when data is transferred to the GPU. Any problem with my MEX gateway code?


Moein Mozaffarzadeh on 19 Mar 2021


Commented: Moein Mozaffarzadeh on 24 May 2021

Accepted Answer: Richard

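Hello,

I am trying to transfer data to a CUDA kernel, perform some operations, and output the result back to MATLAB. I have already evaluated the kernel in Visual Studio and it works. However, when I build a MEX file from the code, it does not give me the output I expect. Here is my MEX gateway code: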

    #include "cuda_runtime.h"
    #include "device_launch_parameters.h"
    #include "cuda.h"
    using namespace std;
    #include <mex.h>

    __global__ void kernel_Reconstruction2(int* Dev_RfData, int* ReconstructedImage_GPU, int Transmit, int NStart_Transmit) {
        int TID = threadIdx.y * blockDim.x + threadIdx.x;
        int BlockOFFset = blockDim.x * blockDim.y * blockIdx.x;
        int RowOFFset = blockDim.x * blockDim.y * gridDim.x * blockIdx.y;
        int GID = RowOFFset + BlockOFFset + TID;
    }

    void mexFunction(int nlhs, mxArray* plhs[],
                     int nrhs, const mxArray* prhs[]) {
        int* RfData; // RF data; pinned memory is dedicated to this
        int* ReconstructedImage_GPU;
        RfData = (int*)mxGetPr(prhs[0]);
        plhs[0] = mxCreateNumericMatrix(1, 64 * 64, mxINT32_CLASS, mxREAL);
        ReconstructedImage_GPU = (int*)mxGetData(plhs[0]);
        int ArrayByteSize_RfData = sizeof(int) * (96 * 96 * 4096);
        int BYTES_PER_STREAM = ArrayByteSize_RfData / 96;
        // Memory allocation: RfData; we send the RF data to the device with streams
        int* Device_RfData; // device pointer to the RF data
        (cudaMalloc((int**)&Device_RfData, ArrayByteSize_RfData));
        int* Device_ReconstructedImage_GPU; // device pointer to the reconstructed image
        int ArrayByteSize_ReconstructedImage_GPU = sizeof(int) * (96 * 96);
        (cudaMalloc((int**)&Device_ReconstructedImage_GPU, ArrayByteSize_ReconstructedImage_GPU));
        printf("CUDA reconstruction started...\n");
        dim3 block(1024, 1);
        dim3 grid(64 * 64, 96); // SystemSetup.NumberOfTransmitter
        cudaStream_t* streams = new cudaStream_t[96]; // SystemSetup.NumberOfTransmitter
        int NStart_Transmit{};
        for (int Transmit = 0; Transmit < 96; Transmit++) {
            cudaStreamCreate(&streams[Transmit]);
            NStart_Transmit = Transmit * (96 * 4096);
            cudaMemcpyAsync(&Device_RfData[NStart_Transmit], &RfData[NStart_Transmit], BYTES_PER_STREAM, cudaMemcpyHostToDevice, streams[Transmit]);
            kernel_Reconstruction2<<<grid, block, 0, streams[Transmit]>>>(&Device_RfData[NStart_Transmit], Device_ReconstructedImage_GPU, Transmit, NStart_Transmit);
            (cudaPeekAtLastError());
        }
        for (int Transmit = 0; Transmit < 96; Transmit++) { cudaStreamDestroy(streams[Transmit]); } // destroy the streams
        delete[] streams;
        cudaDeviceSynchronize();
        (cudaMemcpy(ReconstructedImage_GPU, Device_ReconstructedImage_GPU, ArrayByteSize_ReconstructedImage_GPU, cudaMemcpyDeviceToHost));
        (cudaFree(Device_RfData));
        (cudaFree(Device_ReconstructedImage_GPU));
    }

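I also checked whether "RfData" contains actual values on the host, and everything is fine there. So I think the problem is with the values in "Device_RfData". Is there anything I have missed?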
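The streamed copy relies on NStart_Transmit (an element offset into RfData) and BYTES_PER_STREAM (a byte count) describing the same slices. The host-only C++ sketch below, with hypothetical helper names, replays that arithmetic for the sizes in the posted code (96 transmitters, 96 receivers, 4096 samples) and checks that the per-stream slices tile the whole RF buffer without gaps or overlaps. It assumes the [transmitter][receiver][sample] layout that the posted offsets imply and does not touch the GPU.

```cpp
#include <cassert>
#include <cstddef>

// One stream's slice of the RF buffer (hypothetical helper type).
struct Chunk {
    std::size_t startElement;  // corresponds to NStart_Transmit (counted in ints)
    std::size_t bytes;         // corresponds to BYTES_PER_STREAM (counted in bytes)
};

// Slice copied by stream `transmit`, assuming a [transmitter][receiver][sample] layout.
Chunk chunkForTransmitter(std::size_t transmit, std::size_t numReceivers,
                          std::size_t numSamples) {
    const std::size_t elementsPerChunk = numReceivers * numSamples;
    return {transmit * elementsPerChunk, elementsPerChunk * sizeof(int)};
}

// True if each slice starts exactly where the previous one ended and
// together the slices cover the whole buffer.
bool chunksTileBuffer(std::size_t numTransmitters, std::size_t numReceivers,
                      std::size_t numSamples) {
    assert(numTransmitters > 0 && numReceivers > 0 && numSamples > 0);
    const std::size_t totalBytes =
        sizeof(int) * numTransmitters * numReceivers * numSamples;
    std::size_t nextStartByte = 0;
    for (std::size_t t = 0; t < numTransmitters; ++t) {
        const Chunk c = chunkForTransmitter(t, numReceivers, numSamples);
        if (c.startElement * sizeof(int) != nextStartByte) return false;
        nextStartByte += c.bytes;
    }
    return nextStartByte == totalBytes;
}
```

With the posted sizes, the slice for transmitter 1 starts at element 393216 (96 * 4096) and its byte count equals ArrayByteSize_RfData / 96, so the offsets themselves look consistent; that suggests the slicing is not where the values go wrong.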

Regards,

Moein.


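Moein Mozaffarzadeh on 22 Mar 2021

Hello Richard,

To explain a bit more: ReconstructedImage_GPU is an image. For each pixel of the image, I need to sum a number of Dev_RfData samples. This happens in my kernel; what I posted is a simplified version of my code. Here is what actually happens: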

                               
    __global__ void kernel_Reconstruction2(Setup* SetupLoaded_p, float* MediumZ_p, float* MediumX_p, float* TRansducerCorrZ_p, float* TRansducerCorrX_p,
        int* RfData, int* Dir, int Dir_Size, int Reconstruct_SoundSpeed, int* ReconstructedImage_GPU, int Transmit, int NStart_Transmit, int Size, float* Device_ConvArrivalTime) {
        int TID = threadIdx.y * blockDim.x + threadIdx.x;
        int BlockOFFset = blockDim.x * blockDim.y * blockIdx.x;
        int RowOFFset = blockDim.x * blockDim.y * gridDim.x * blockIdx.y;
        int GID = RowOFFset + BlockOFFset + TID;
        int GID_RowBased = BlockOFFset + TID;
        int D1D2, Sam, Pz_man, Px_man, Receive, RoundTripSample, IndexingReceive, IndexingTransmit;
        float ReceiveTime, RoundTripTime, TransmitTime;
        if (GID_RowBased < Size) {
            Px_man = (GID_RowBased) % (SetupLoaded_p->Nx);
            Pz_man = (GID_RowBased) / (SetupLoaded_p->Nx);
            Receive = blockIdx.y;
            IndexingReceive = Receive * Dir_Size + (GID_RowBased);
            IndexingTransmit = Transmit * Dir_Size + (GID_RowBased);
            TransmitTime = (sqrtf(((TRansducerCorrX_p[Transmit] - MediumX_p[Px_man]) * (TRansducerCorrX_p[Transmit] - MediumX_p[Px_man]))
                                + ((TRansducerCorrZ_p[Transmit] - MediumZ_p[Pz_man]) * (TRansducerCorrZ_p[Transmit] - MediumZ_p[Pz_man])))) / Reconstruct_SoundSpeed;
            ReceiveTime = (sqrtf(((TRansducerCorrX_p[Receive] - MediumX_p[Px_man]) * (TRansducerCorrX_p[Receive] - MediumX_p[Px_man]))
                               + ((TRansducerCorrZ_p[Receive] - MediumZ_p[Pz_man]) * (TRansducerCorrZ_p[Receive] - MediumZ_p[Pz_man])))) / Reconstruct_SoundSpeed;
            RoundTripTime = (TransmitTime + ReceiveTime);
            RoundTripTime += (SetupLoaded_p->TransmissionOffset);
            RoundTripSample = lroundf(RoundTripTime * SetupLoaded_p->Fs) - 1;
            ReconstructedImage_GPU[GID_RowBased] += ((RfData[RoundTripSample + ((Receive) * SetupLoaded_p->NumberOfSamples)])
                * (Dir[IndexingReceive] * Dir[IndexingTransmit]));
        }
    }
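The per-pixel delay in this kernel can be checked on the host before it reaches the GPU. The C++ sketch below mirrors the TransmitTime / ReceiveTime / RoundTripSample arithmetic for a single transmitter, receiver and pixel. The function name and the example numbers (sound speed 1500 m/s, sampling rate 1 MHz) are illustrative assumptions, not values from the thread.

```cpp
#include <cassert>
#include <cmath>

// Host-side mirror of the kernel's delay calculation (hypothetical helper).
// Returns the 0-based RF sample index for one transmit/receive pair and one pixel.
long roundTripSample(float txX, float txZ, float rxX, float rxZ,
                     float pixelX, float pixelZ, float soundSpeed,
                     float transmissionOffset, float fs) {
    // One-way travel times: Euclidean distance divided by the speed of sound.
    const float transmitTime = std::sqrt((txX - pixelX) * (txX - pixelX) +
                                         (txZ - pixelZ) * (txZ - pixelZ)) / soundSpeed;
    const float receiveTime = std::sqrt((rxX - pixelX) * (rxX - pixelX) +
                                        (rxZ - pixelZ) * (rxZ - pixelZ)) / soundSpeed;
    const float roundTripTime = transmitTime + receiveTime + transmissionOffset;
    // Seconds -> samples, then the "- 1" that converts to a 0-based index.
    return std::lround(roundTripTime * fs) - 1;
}
```

A pixel 15 mm in front of an element that both transmits and receives gives a 20 microsecond round trip, i.e. 0-based sample 19. A pixel sitting on the element with zero offset gives -1; since the kernel uses RoundTripSample as an index without a range check, delays outside [0, NumberOfSamples) would read another receiver's samples or run past the buffer.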

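Of course, I copy all the arrays used in the kernel to the device beforehand with cudaMemcpy. Sorry if this does not match the MEX gateway code (I tried to provide compilable code in my first post).

I am sure the kernel works correctly, because it gives me the correct image in my Visual Studio project (sorry that I can't post all the code here; it needs many variables and a large dataset).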
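One thing worth checking in the kernel above, independently of MATLAB: the write target ReconstructedImage_GPU[GID_RowBased] does not depend on blockIdx.y (the receiver index), so every block in a grid column updates the same pixels. The host-only C++ sketch below (hypothetical helper; blockDim.y assumed to be 1, as in the posted launch) replays the index arithmetic and counts how many threads land on each GID and on each GID_RowBased.

```cpp
#include <cassert>
#include <map>

struct IndexCounts {
    std::map<int, int> perGid;          // threads per GID (expected: one each)
    std::map<int, int> perGidRowBased;  // threads per GID_RowBased (the write target)
};

// Replays the kernel's index arithmetic for a gridX-by-gridY grid of
// blockX-by-1 blocks; threadIdx.y is always 0 in that configuration.
IndexCounts replayIndices(int gridX, int gridY, int blockX) {
    const int blockY = 1;
    IndexCounts counts;
    for (int by = 0; by < gridY; ++by)
        for (int bx = 0; bx < gridX; ++bx)
            for (int tx = 0; tx < blockX; ++tx) {
                const int ty = 0;
                const int tid = ty * blockX + tx;                    // TID
                const int blockOffset = blockX * blockY * bx;        // BlockOFFset
                const int rowOffset = blockX * blockY * gridX * by;  // RowOFFset
                ++counts.perGid[rowOffset + blockOffset + tid];      // GID
                ++counts.perGidRowBased[blockOffset + tid];          // GID_RowBased
            }
    return counts;
}
```

GID is unique per thread, but every GID_RowBased is hit once per blockIdx.y. With the posted launch that would be 96 blocks per pixel per kernel, and the 96 launches on separate streams may also overlap, so the unsynchronized += can race. Whether this explains the VS/MATLAB difference is not established in the thread.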

I think MATLAB is not launching the kernel correctly. Do you know what is wrong?

Moein.
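If several blocks (and concurrently running streams) add into the same pixel, each += is a separate read and write, so updates can be lost depending on timing, which could make results differ between environments. In CUDA the usual remedy is atomicAdd(&ReconstructedImage_GPU[GID_RowBased], value), which has an int overload; another option is to let each thread loop over the receivers itself and write its pixel once. The C++ sketch below is only a deterministic host model of one unlucky interleaving next to an atomic read-modify-write, with std::atomic standing in for atomicAdd; the function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Two writers perform pixel += a and pixel += b as plain read-then-write.
// The schedule is forced: both read before either writes.
int lostUpdate(int initial, int a, int b) {
    int pixel = initial;
    const int readA = pixel;  // writer A reads
    const int readB = pixel;  // writer B reads the same old value
    pixel = readA + a;        // A writes its sum
    pixel = readB + b;        // B overwrites it, so A's contribution is lost
    return pixel;
}

// Each fetch_add is an indivisible read-modify-write, so no contribution
// can be overwritten (std::atomic plays the role of CUDA's atomicAdd).
int atomicUpdate(int initial, int a, int b) {
    std::atomic<int> pixel{initial};
    pixel.fetch_add(a);
    pixel.fetch_add(b);
    return pixel.load();
}
```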


Accepted Answer

Richard on 24 Mar 2021


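Edited: Richard on 24 Mar 2021

Hi Moein,

There is no way to directly pin the memory of a MATLAB array for CUDA. Inputs passed to a MEX function from MATLAB will always have been allocated by MATLAB's memory manager.

You should be able to allocate your own pinned host memory and copy the mxArray data into it before using streams for the asynchronous copies to the device. Whether that is worthwhile is a trade-off: I don't know whether this extra copy would cost more than you save by using streams.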

In my tests, I used a basic call to mexcuda:

    mexcuda TUI_CUDA.cu MexFunctions.cu

You may also want to run mex -setup to make sure MATLAB is using the same compiler that you use in VS.

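I don't think the difference you are seeing is likely to be caused by compiler flags. It is better to focus on verifying that the inputs, and then the outputs of each processing step, match between your VS version and your MATLAB version.

Looking at the output visualization in your question, the ordering of the arrays in memory differs between the two. You clearly have a 2-D (or even 3-D) data structure for the input, and your kernel is designed to work on one slice of it along one of those dimensions, but the input array is 1-D. Is the input data definitely being imported correctly from MATLAB? If a 2-D matrix is involved at some stage, there may be a missing transpose. Have you tried reshaping the input into an (N-by-nTransmitters) image to check that the two 1-D streams have the expected structure?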
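The suspicion of a missing transpose comes from MATLAB storing matrices in column-major order, whereas C/C++ code written in Visual Studio often assumes row-major order. The C++ sketch below (hypothetical helper names, 0-based indices) shows the two index formulas and the explicit conversion that would be missing if one side assumes the other layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// MATLAB: element (i, j) of a rows-by-cols matrix lives at i + j * rows.
int columnMajorIndex(int i, int j, int rows) { return i + j * rows; }
// Typical C/C++ assumption: element (i, j) lives at i * cols + j.
int rowMajorIndex(int i, int j, int cols) { return i * cols + j; }

// Rearrange a column-major buffer (as received from MATLAB) into row-major order.
std::vector<int> columnToRowMajor(const std::vector<int>& colMajor, int rows, int cols) {
    assert(colMajor.size() == static_cast<std::size_t>(rows) * cols);
    std::vector<int> rowMajor(colMajor.size());
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            rowMajor[rowMajorIndex(i, j, cols)] = colMajor[columnMajorIndex(i, j, rows)];
    return rowMajor;
}
```

For the MATLAB matrix [1 2 3; 4 5 6], the buffer a MEX function receives is {1, 4, 2, 5, 3, 6}; reading it with row-major indexing scrambles the image, while columnToRowMajor recovers {1, 2, 3, 4, 5, 6}. On the MATLAB side, reshape(x, N, nTransmitters) followed by imagesc is one quick way to look at the structure described above.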

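Once you have verified that the input data streams are identical, you will need to find exactly where in the kernel processing the results diverge. It may help to construct a much smaller example input, so that you have fewer data items to trace and compare between the two systems while you do this. Another approach is to (temporarily) cut the kernel code down step by step until the outputs match; you could even have the kernel simply output thread indices as a basic test that both approaches execute the kernel identically.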
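When comparing the two builds step by step, locating the first element where their outputs diverge is more reliable than comparing images by eye. A minimal host-side C++ helper might look like this (hypothetical names; the vectors would be filled from dumps of the VS and MEX outputs), together with the expected pattern for the "just write the thread index" test.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Expected output if each thread simply writes its own global index (0, 1, 2, ...).
std::vector<int> threadIndexPattern(std::size_t n) {
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);
    return v;
}

// Index of the first element where a and b differ; a length mismatch counts
// as a difference at the end of the shorter buffer. Returns -1 if identical.
long firstMismatch(const std::vector<int>& a, const std::vector<int>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i)
        if (a[i] != b[i]) return static_cast<long>(i);
    return a.size() == b.size() ? -1 : static_cast<long>(n);
}
```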

Richard.

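The immediate problem is that C variables you want to refer to on later calls of a function need to be declared static, and their declarations need to be at an appropriate scope, i.e. not inside the if() branch. Only the code that creates the contents of those variables needs to be guarded so that it runs just once. They also should not be freed at the end of mexFunction, and you should not call cudaDeviceReset, because either of these will make the CUDA pointers invalid on the next call. Instead, register a cleanup function with mexAtExit so that they are freed when the MEX file is cleared; see //www.tianjin-qmedu.com/help/matlab/apiref/mexatexit.html.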
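The pattern described above (file-scope static variables, a guarded one-time initialization, and cleanup registered for exit instead of at the end of each call) can be sketched without MATLAB or CUDA. In this C++ sketch, new[]/delete[] stand in for cudaMalloc/cudaFree and std::atexit stands in for mexAtExit; note that mexAtExit cleanup also runs when the MEX file is cleared (for example by clear mex), which std::atexit does not model. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>

// File-scope statics survive between calls (as between MEX invocations).
static int* g_persistentBuffer = nullptr;
static int g_initCount = 0;  // how many times the expensive setup ran

// Analogue of the cleanup function that would be registered with mexAtExit.
static void releasePersistentBuffer() {
    delete[] g_persistentBuffer;
    g_persistentBuffer = nullptr;
}

// Called on every invocation; only the allocation is guarded to run once,
// and nothing is freed at the end of the call.
int* getPersistentBuffer(int size) {
    assert(size > 0);
    if (g_persistentBuffer == nullptr) {
        g_persistentBuffer = new int[size]();  // value-initialized to zero
        ++g_initCount;
        std::atexit(releasePersistentBuffer);
    }
    return g_persistentBuffer;
}
```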

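However, I don't think this approach is the best way to tackle the problem. It means the MEX function will not work correctly without user intervention: you would need to state clearly in the user documentation that users must set the "TransferToHost" flag whenever some of the other input variables contain different data, and that they should not set the flag at other times if they want the best performance. This interface is confusing, and users are likely to make mistakes. It also means users have no real control over the lifetime of the persistent GPU data; there is no obvious way for them to say "OK, I've finished with this code, and now I want to use the whole GPU for something else".

If you accept gpuArray as an input to the MEX function, you only need to document that gpuArrays are supported as inputs, and users type "otherData = gpuArray(otherData)" before calling your MEX function. You can point them to the MathWorks documentation on how to create a gpuArray, and the lifetime of the persistent data will be controlled by the standard MATLAB rules for data: it persists for as long as the user has a variable containing it, and it is freed when that variable is cleared.

Looking at your StackOverflow question, it seems you are making progress with the latter option. The simplest way to compile these GPU MEX functions for MATLAB is the mexcuda command in MATLAB. I understand this is part of a larger project that you want to take to production, but I suggest using mexcuda as a starting point so that you can test and confirm your code works as expected, and then work on configuring VS (I suspect you may just need to point it at the gpu.lib file found in the extern/lib/win64/microsoft directory).

Richard.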

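Moein Mozaffarzadeh on 24 May 2021

Hello Richard,

Thank you very much for the explanation. Yes, I think you are right about documenting support for gpuArrays, so I will switch to using this feature of MATLAB.

I have just found that my MEX gateway code has a memory-management problem. As a first step towards using gpuArrays, I put "cudaDeviceReset" at the end of my code (while still freeing all the memory) to make sure I fully control all the memory defined in the code and am not missing anything. However, the MexFunction does not give the correct output: the output nearly doubles every time I run the MexFunction in MATLAB. The first time I run the MexFunction, I get the output I expect and everything is fine. The question, with the code, is at the following link:

//www.tianjin-qmedu.com/matlabcentral/answers/838088-why-the-output-of-my-mex-function-become-incorrect-when-used-in-a-for-loop-in-matlab-memory-managem?s_tid=prof_contriblnk

Since you are already familiar with my code and project :), I would appreciate your comments and support. Thanks in advance.

Moein.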
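One plausible cause of output that grows on each call, assuming the kernel accumulates with += as in the kernel posted on 22 Mar: cudaMalloc does not initialize device memory, and the full gateway code posted on 22 Mar never clears Device_ReconstructedImage_GPU (its host-to-device copy is commented out). If the allocator hands back the same block on the next call, the previous result is added to again. The host-only C++ model below (hypothetical names) shows the effect and the fix; on the GPU the corresponding call would be cudaMemset(Device_ReconstructedImage_GPU, 0, ArrayByteSize_ReconstructedImage_GPU) before the first kernel launch. This is a hypothesis from reading the posted code, not a diagnosis confirmed in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Models one MEX call: a "+=" kernel accumulating into a buffer that is
// reused from the previous call, optionally cleared first.
std::vector<int> runOnce(std::vector<int>& reusedBuffer,
                         const std::vector<int>& contribution, bool clearFirst) {
    assert(reusedBuffer.size() == contribution.size());
    if (clearFirst)  // on the GPU: cudaMemset(devPtr, 0, bytes)
        std::fill(reusedBuffer.begin(), reusedBuffer.end(), 0);
    for (std::size_t i = 0; i < reusedBuffer.size(); ++i)
        reusedBuffer[i] += contribution[i];  // like ReconstructedImage_GPU[...] += ...
    return reusedBuffer;
}
```

Without clearing, the second call returns twice the first result, which matches the "nearly doubles" symptom; with clearing, every call returns the same image.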


Answers (1)

Richard on 22 Mar 2021


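Edited: Richard on 22 Mar 2021

Moein, there are a couple of problems I can see in the code.

The first problem is that it does not initialize the GPU device in any way. The simplest way to do this is to add a call to mxInitGPU, as documented at //www.tianjin-qmedu.com/help/parallel-computing/mxinitgpu.html. This uses MATLAB's normal rules for selecting the default GPU. I added a simple assignment line in the kernel to produce dummy output values and, like you, observed that none were returned; I then confirmed that adding a call to mxInitGPU at the start of mexFunction() caused the expected values to be returned (after restarting MATLAB to clear the GPU state).

The second problem is that the code example appears to go out of array bounds when copying from Device_ReconstructedImage_GPU to ReconstructedImage_GPU. The output mxArray is created as a 1-by-(64x64) matrix, but the array used on the device is 1-by-(96x96). This could crash MATLAB: the size of the output mxArray needs to match the size of the GPU device array.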
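A cheap safeguard against this kind of overrun is to compare the device byte count with the size of the output mxArray before the device-to-host copy. In a MEX gateway the element count could come from mxGetNumberOfElements(plhs[0]); the host-only C++ sketch below uses std::memcpy in place of cudaMemcpy, and the helper name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copies srcBytes from src into dst only if they fit; returns false instead
// of writing past the end of dst (e.g. a 64x64 output vs a 96x96 device image).
bool copyIfFits(std::vector<int>& dst, const int* src, std::size_t srcBytes) {
    if (srcBytes > dst.size() * sizeof(int)) return false;
    std::memcpy(dst.data(), src, srcBytes);
    return true;
}
```

In the gateway, a false result would be a natural place to call mexErrMsgIdAndTxt rather than let MATLAB crash.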


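Moein Mozaffarzadeh on 22 Mar 2021

Hello Richard,

Thank you for your help. First, let me confirm that the 1-by-(96x96) definition was my mistake when I posted the simplified code here. Here is my actual MEX gateway code, which is fully adaptive: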

    #include "Header.cuh"
    #define NRHS 10 // number of inputs
    #define NLHS 2  // number of outputs

    __global__ void test(int* Device_MediumZ) {
        Device_MediumZ[threadIdx.x] += 1;
        //printf("Device_MediumZ: %d", Device_MediumZ[threadIdx.x]);
    }

    void mexFunction(int nlhs, mxArray* plhs[],
                     int nrhs, const mxArray* prhs[]) {
        /* Check for the proper number of arguments */
        if (nrhs != NRHS) {
            mexErrMsgIdAndTxt("MyToolbox:arrayProduct:nrhs", "Number of inputs must be %d.", NRHS);
        }
        if (nlhs != NLHS) {
            mexErrMsgIdAndTxt("MyToolbox:arrayProduct:nlhs", "Number of outputs must be %d.", NLHS);
        }
        int counter = 0;
        double TransferDataToDevice;
        TransferDataToDevice = mxGetScalar(prhs[counter]); counter++; // we can set this variable to "0" once the data has been transferred
                                 
        Setup SystemSetup; // variable containing the properties of the setup
        fillSetup(SystemSetup, prhs[counter], 1); counter++; // set the last argument to zero if you don't want the setup properties to be displayed
        float* MediumX; // X
        float* MediumZ; // Z coordinates of the pixels
        int* LensZIndex; // Z index of the lens; 1 has already been subtracted in MATLAB
        int* RfData; // RF data; pinned memory is dedicated to this
        int* Dir; // directivity pattern of the elements; pinned memory is dedicated to this
        float* LensArrivalTime; // arrival times at the lens index for the array elements; pinned memory is dedicated to this
        float* TRansducerCorrX; // X
        float* TRansducerCorrZ; // Z coordinates of the array elements
        int* ReconstructedImage_GPU;
        //const mxArray* pMxArray = prhs[counter];
        MediumX = (float*)mxGetPr(prhs[counter]); counter++;
        MediumZ = (float*)mxGetPr(prhs[counter]); counter++;
        LensZIndex = (int*)mxGetPr(prhs[counter]); counter++;
        RfData = (int*)mxGetPr(prhs[counter]); counter++;
        Dir = (int*)mxGetPr(prhs[counter]); counter++;
        LensArrivalTime = (float*)mxGetPr(prhs[counter]); counter++;
        TRansducerCorrX = (float*)mxGetPr(prhs[counter]); counter++;
        TRansducerCorrZ = (float*)mxGetPr(prhs[counter]); counter++;
        plhs[0] = mxCreateNumericMatrix(1, SystemSetup.Nz * SystemSetup.Nx, mxINT32_CLASS, mxREAL);
        //plhs[0] = mxCreateDoubleMatrix(1, SystemSetup.Nz * SystemSetup.Nx, mxREAL);
        ReconstructedImage_GPU = (int*)mxGetData(plhs[0]);
        int* RFOUT;
        plhs[1] = mxCreateNumericMatrix(1, (SystemSetup.NumberOfTransmitter * SystemSetup.NumberOfReceiver * SystemSetup.NumberOfSamples), mxINT32_CLASS, mxREAL);
        RFOUT = (int*)mxGetData(plhs[1]);
        printf("RfData: %d %d %d.\n", RfData[22], RfData[3538942], RfData[3538943]);
                                 
        int* ZIndexProximal = (int*)calloc(SystemSetup.Nx, sizeof(int)); // pointer for the proximal end
        int* ZIndexDistal = (int*)calloc(SystemSetup.Nx, sizeof(int)); // pointer for the distal end
        //int* ReconstructedImage_GPU = (int*)calloc(SystemSetup.Nz * SystemSetup.Nx, sizeof(int)); // pointer to the reconstructed image
        //int* ReconstructedImage_GPU; // pointer to the reconstructed image
        //int* ReconstructedImage_GPU = new int[SystemSetup.Nz * SystemSetup.Nx]{}; // pointer to the reconstructed image
        //for (int Pz = 0; Pz < SystemSetup.Nz; Pz++) {
        //    for (int Px = 0; Px < SystemSetup.Nx; Px++) {
        //        ReconstructedImage_GPU[Pz * SystemSetup.Nx + Px] = 0;
        //    }
        //}
        //int* ReconstructedImage_GPU; // pointer to the reconstructed image
        int NStart_Transmit{};
        bool ShowTime = 1;
        clock_t CUDA_start, MAX_Find_Start, CUDA_end, MAX_Find_End; // for measuring the processing time of the different stages
        //float* MediumX = (float*)calloc(SystemSetup.Nx, sizeof(float));
        //float* MediumZ = (float*)calloc(SystemSetup.Nz, sizeof(float));
        //int* LensZIndex = new int[SystemSetup.Nx]; // Z index of the lens; 1 has already been subtracted in MATLAB
        //int* RfData; // RF data; pinned memory is dedicated to this
        int ArrayByteSize_RfData = sizeof(int) * (SystemSetup.NumberOfTransmitter * SystemSetup.NumberOfReceiver * SystemSetup.NumberOfSamples);
        //int* Dir; // directivity pattern of the elements; pinned memory is dedicated to this
        int Dir_Size = SystemSetup.Nz * SystemSetup.Nx; // keep this constant; to refer each pixel to each element we need to shift by Dir_Size
        int ArrayByteSize_Dir = sizeof(int) * (SystemSetup.NumberOfReceiver * Dir_Size);
        //float* LensArrivalTime; // arrival times at the lens index for the array elements; pinned memory is dedicated to this
        size_t ArrayByteSize_LensArrivalTime = sizeof(float) * (SystemSetup.NumberOfReceiver * SystemSetup.Nx);
        //float* TRansducerCorrX = new float[SystemSetup.NumberOfReceiver]; // X
        //float* TRansducerCorrZ = new float[SystemSetup.NumberOfReceiver]; // Z coordinates of the array elements
        //int* ZIndexProximal = (int*)calloc(SystemSetup.Nx, sizeof(int)); // pointer for the proximal end
        //int* ZIndexDistal = (int*)calloc(SystemSetup.Nx, sizeof(int)); // pointer for the distal end
        //int* ReconstructedImage_GPU = new int[SystemSetup.Nz * SystemSetup.Nx]{}; // pointer to the reconstructed image
        //float* ReconstructedImage_GPU_Filtered = new float[SystemSetup.Nz * SystemSetup.Nx]{}; // pointer to the filtered reconstructed image
        //int NStart_Transmit{};
        //bool ShowTime = 1; // set to "1" to display the processing time in the console
        //clock_t CUDA_start, CUDA_end, MAX_Find_Start, MAX_Find_End; // for measuring the processing time of the different stages
                                 
        ////>>>>>>>>>>>>>>>>> Pointers for the device >>>>>>>>>>>>>>>>>>>>>>
        // Byte counts are size_t, the type cudaMalloc/cudaMemcpy take for sizes
        Setup* Device_SystemSetup; // device pointer to the medium properties
        size_t ArrayByteSize_MediumZ = sizeof(float) * SystemSetup.Nz;
        float* Device_MediumZ; // device pointer to the Z coordinates of the medium
        size_t ArrayByteSize_MediumX = sizeof(float) * SystemSetup.Nx;
        float* Device_MediumX; // device pointer to the X coordinates of the medium
        size_t ArrayByteSize_TRansducerCorrZ = sizeof(float) * SystemSetup.NumberOfReceiver;
        float* Device_TRansducerCorrZ; // device pointer to the Z coordinates of the array elements
        size_t ArrayByteSize_TRansducerCorrX = sizeof(float) * SystemSetup.NumberOfReceiver;
        float* Device_TRansducerCorrX; // device pointer to the X coordinates of the array elements
        int ArrayByteSize_LensZIndex = sizeof(int) * SystemSetup.Nx;
        int* Device_LensZIndex; // device pointer to the Z coordinates of the lens
        float* Device_LensArrivalTime; // device pointer to the lens arrival times
        int* Device_RfData; // device pointer to the RF data
        int BYTES_PER_STREAM = ArrayByteSize_RfData / SystemSetup.NumberOfTransmitter;
        int* Device_Dir; // device pointer to the directivity of the array elements
        int ArrayByteSize_ReconstructedImage_GPU = sizeof(int) * (SystemSetup.Nz * SystemSetup.Nx);
        int* Device_ReconstructedImage_GPU; // device pointer to the reconstructed image
        int Size_Proximal = (SystemSetup.Perio_ZEnd - SystemSetup.LenseThicknessIndex) * SystemSetup.Nx;
        size_t ArrayByteSize_ProximalArrivalTime = sizeof(float) * (SystemSetup.NumberOfReceiver * Size_Proximal);
        float* Device_ProximalArrivalTime; // device pointer to the arrival times of the pixels we need to reconstruct for the proximal end
        int ArrayByteSize_ReconstructedImage_GPU_ProximalSegmentation = sizeof(int) * ((SystemSetup.Perio_ZEnd - SystemSetup.LenseThicknessIndex) * SystemSetup.Nx);
        float* Device_DistalArrivalTime; // device pointer to the arrival times of the pixels we need to reconstruct to find the distal end
        float* Device_RestArrivalTime; // device pointer to the arrival times of the pixels we need to reconstruct after finding the distal end
        int* Device_ZIndexProximal; // device pointer to the segmented proximal end
        int* Device_ZIndexDistal; // device pointer to the segmented distal end
        int GridX, Size, ReconstructionSoundSpeed, Size_Distal;
        int NumberOfPixels{};
        /// Pointers needed for the parabola fitting:
        int* xParabola = new int[SystemSetup.Nx];
        int* FittingOutput = new int[SystemSetup.Nx];
        for (int i = 0; i < SystemSetup.Nx; i++) {
            xParabola[i] = i;
        }
        int Segmentation_ZStartPixel;
        int Segmentation_ZEndPixel;
        int Nz_Seg;
        size_t ArrayByteSize_DistalArrivalTime;
        size_t ArrayByteSize_RestArrivalTime;
                                 
        //>>>>>>>>>>>>>>>>>>>> Prepare the data on the device to run the kernels <<<<<<<<<<<<<<<<<<<<
        // Memory allocation and transfer to the device: SystemSetup; this should be transferred only once
        gpuErrchk(cudaMalloc((void**)&Device_SystemSetup, sizeof(Setup)));
        gpuErrchk(cudaMemcpy(Device_SystemSetup, &SystemSetup, sizeof(Setup), cudaMemcpyHostToDevice));
        // Memory allocation and transfer to the device: MediumZ; this should be transferred only once
        gpuErrchk(cudaMalloc((float**)&Device_MediumZ, ArrayByteSize_MediumZ));
        gpuErrchk(cudaMemcpy(Device_MediumZ, MediumZ, ArrayByteSize_MediumZ, cudaMemcpyHostToDevice));
        // Memory allocation and transfer to the device: MediumX; this should be transferred only once
        gpuErrchk(cudaMalloc((float**)&Device_MediumX, ArrayByteSize_MediumX));
        gpuErrchk(cudaMemcpy(Device_MediumX, MediumX, ArrayByteSize_MediumX, cudaMemcpyHostToDevice));
        // Memory allocation and transfer to the device: TRansducerCorrZ; this should be transferred only once
        gpuErrchk(cudaMalloc((float**)&Device_TRansducerCorrZ, ArrayByteSize_TRansducerCorrZ));
        gpuErrchk(cudaMemcpy(Device_TRansducerCorrZ, TRansducerCorrZ, ArrayByteSize_TRansducerCorrZ, cudaMemcpyHostToDevice));
        // Memory allocation and transfer to the device: TRansducerCorrX; this should be transferred only once
        gpuErrchk(cudaMalloc((float**)&Device_TRansducerCorrX, ArrayByteSize_TRansducerCorrX));
        gpuErrchk(cudaMemcpy(Device_TRansducerCorrX, TRansducerCorrX, ArrayByteSize_TRansducerCorrX, cudaMemcpyHostToDevice));
        // Memory allocation and transfer to the device: LensZIndex; this should be transferred only once
        gpuErrchk(cudaMalloc((int**)&Device_LensZIndex, ArrayByteSize_LensZIndex));
        gpuErrchk(cudaMemcpy(Device_LensZIndex, LensZIndex, ArrayByteSize_LensZIndex, cudaMemcpyHostToDevice));
        // Memory allocation and transfer to the device: LensArrivalTime; this should be transferred only once
        gpuErrchk(cudaMalloc((float**)&Device_LensArrivalTime, ArrayByteSize_LensArrivalTime));
        gpuErrchk(cudaMemcpy(Device_LensArrivalTime, LensArrivalTime, ArrayByteSize_LensArrivalTime, cudaMemcpyHostToDevice));
        // Memory allocation and transfer to the device: Dir; this should be transferred only once
        /*gpuErrchk(cudaMalloc((int**)&Device_Dir, ArrayByteSize_Dir));
        gpuErrchk(cudaMemcpy(Device_Dir, Dir, ArrayByteSize_Dir, cudaMemcpyHostToDevice));*/
        gpuErrchk(cudaMalloc((int**)&Device_Dir, ArrayByteSize_Dir));
        gpuErrchk(cudaMemcpy(Device_Dir, Dir, ArrayByteSize_Dir, cudaMemcpyHostToDevice));
        //gpuErrchk(cudaBindTexture(NULL, tex, Device_Dir, ArrayByteSize_Dir));
        // Create the texture object
        //cudaResourceDesc resDesc;
        //memset(&resDesc, 0, sizeof(resDesc));
        //resDesc.resType = cudaResourceTypeLinear;
        //resDesc.res.linear.devPtr = Device_Dir;
        //resDesc.res.linear.desc.f = cudaChannelFormatKindSigned;
        //resDesc.res.linear.desc.x = 32; // bits per channel
        //resDesc.res.linear.sizeInBytes = ArrayByteSize_Dir;
        //cudaTextureDesc texDesc;
        //memset(&texDesc, 0, sizeof(texDesc));
        //texDesc.readMode = cudaReadModeElementType;
        //// Create the texture object: we only have to do this once!
        //cudaTextureObject_t tex = 0;
        //cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);
        // Memory allocation: ReconstructedImage_GPU; this should be allocated only once
        gpuErrchk(cudaMalloc((int**)&Device_ReconstructedImage_GPU, ArrayByteSize_ReconstructedImage_GPU));
        //for (int Pz = 0; Pz < SystemSetup.Nz; Pz++) {
        //    for (int Px = 0; Px < SystemSetup.Nx; Px++) {
        //        ReconstructedImage_GPU[Pz * SystemSetup.Nx + Px] = 0;
        //    }
        //}
        //gpuErrchk(cudaMemcpy(Device_ReconstructedImage_GPU, ReconstructedImage_GPU, ArrayByteSize_ReconstructedImage_GPU, cudaMemcpyHostToDevice)); // we do not transfer this, as we don't need a host image
        // Memory allocation: Z index of the proximal end segmented with Dijkstra; this should be allocated only once
        gpuErrchk(cudaMalloc((int**)&Device_ZIndexProximal, ArrayByteSize_LensZIndex));
        // Memory allocation: Z index of the distal end segmented with Dijkstra; this should be allocated only once
        gpuErrchk(cudaMalloc((int**)&Device_ZIndexDistal, ArrayByteSize_LensZIndex));
        // Memory allocation: arrival times of all pixels that we reconstruct with the conventional method
        float* Device_ConvArrivalTime; // device pointer
        size_t ArrayByteSize_Device_ConvArrivalTime = sizeof(float) * (SystemSetup.NumberOfReceiver * SystemSetup.Nz * SystemSetup.Nx);
        gpuErrchk(cudaMalloc((float**)&Device_ConvArrivalTime, ArrayByteSize_Device_ConvArrivalTime));
        // Memory allocation: arrival times of all pixels we need to reconstruct to segment the proximal end
        gpuErrchk(cudaMalloc((float**)&Device_ProximalArrivalTime, ArrayByteSize_ProximalArrivalTime));
        /*
        // Transfer to the device: PixelToSyntheticElements; this should be transferred only once
        size_t ArrayByteSize_PixelToSyntheticElements = sizeof(float) * SystemSetup.Nx;
        float* Device_PixelToSyntheticElements;
        gpuErrchk(cudaMalloc((float**)&Device_PixelToSyntheticElements, ArrayByteSize_PixelToSyntheticElements));
        */
        cout << "Pointers allocated..." << endl;
                                 
        // ******************************************************************
        // From here on, the code must run again each time new data comes from Verasonics.
        // ******************************************************************
                                 
        if (ShowTime == 1) { CUDA_start = clock(); }
        // Memory allocation: RfData; we send the RF data to the device with streams
        gpuErrchk(cudaMalloc((int**)&Device_RfData, ArrayByteSize_RfData));
        // gpuErrchk(cudaMemcpy(Device_RfData, RfData, ArrayByteSize_RfData, cudaMemcpyHostToDevice));
        printf("CUDA reconstruction started...\n");
        dim3 block(1024, 1);
        // >>>>>>>>>> Find the type of image reconstruction: conventional/advanced >>>>>>>>>>
        if (SystemSetup.ProcessingType == 1) {
            NumberOfPixels = (SystemSetup.Nz) * SystemSetup.Nx;
            GridX = (NumberOfPixels / block.x);
            ReconstructionSoundSpeed = SystemSetup.WaterSoundSpeed;
        }
        else if (SystemSetup.ProcessingType == 2) {
            NumberOfPixels = SystemSetup.LenseThicknessIndex * SystemSetup.Nx;
            GridX = (NumberOfPixels / block.x + (((NumberOfPixels % block.x) != 0) * 1));
            ReconstructionSoundSpeed = SystemSetup.LenseSoundSpeed;
        }
        else {
            cout << "Error: processing type does not exist..." << endl;
        }
        // dim3 grid(GridX, SystemSetup.NumberOfTransmitter); // SystemSetup.NumberOfTransmitter
        dim3 grid(GridX, SystemSetup.NumberOfTransmitter); // SystemSetup.NumberOfTransmitter
        cudaStream_t* streams = new cudaStream_t[SystemSetup.NumberOfTransmitter]; // SystemSetup.NumberOfTransmitter
                                 
        for (int Transmit = 0; Transmit < SystemSetup.NumberOfTransmitter; Transmit++) {
            cudaStreamCreate(&streams[Transmit]);
            NStart_Transmit = Transmit * (SystemSetup.NumberOfReceiver * SystemSetup.NumberOfSamples);
            cudaMemcpyAsync(&Device_RfData[NStart_Transmit], &RfData[NStart_Transmit], BYTES_PER_STREAM, cudaMemcpyHostToDevice, streams[Transmit]);
            kernel_Reconstruction2<<<grid, block, 0, streams[Transmit]>>>(Device_SystemSetup, Device_MediumZ, Device_MediumX, Device_TRansducerCorrZ, Device_TRansducerCorrX,
                &Device_RfData[NStart_Transmit], Device_Dir, Dir_Size, ReconstructionSoundSpeed, Device_ReconstructedImage_GPU, Transmit, NStart_Transmit, NumberOfPixels, Device_ConvArrivalTime);
            gpuErrchk(cudaPeekAtLastError());
        }
        for (int Transmit = 0; Transmit < SystemSetup.NumberOfTransmitter; Transmit++) { cudaStreamDestroy(streams[Transmit]); } // destroy the streams
        delete[] streams;
        cudaDeviceSynchronize();
                                 
        if (ShowTime == 1) {
            CUDA_end = clock();
            // cout << "Processing time on GPU: " << (double)((double)(CUDA_end - CUDA_start) / CLOCKS_PER_SEC) << " [s]." << endl;
            printf("Processing time on GPU: %f [s].\n", (double)((double)(CUDA_end - CUDA_start) / CLOCKS_PER_SEC));
        }
        gpuErrchk(cudaMemcpy(ReconstructedImage_GPU, Device_ReconstructedImage_GPU, ArrayByteSize_ReconstructedImage_GPU, cudaMemcpyDeviceToHost));
        ofstream fout30("ReconstructedImage_GPU_Check.txt");
        for (int Pz = 0; Pz < SystemSetup.Nz; Pz++) {
            for (int Px = 0; Px < SystemSetup.Nx; Px++) {
                fout30 << ReconstructedImage_GPU[Pz * SystemSetup.Nx + Px] << ",";
            }
            fout30 << endl;
        }
        fout30.close();
        /* Create the output matrix */
        // plhs[0] = mxCreateDoubleMatrix(1, SystemSetup.Nz * SystemSetup.Nx, mxREAL);
        // ReconstructedImage_GPU = (int*)mxGetData(plhs[0]);
        mexPrintf("Code is OK\n");
                                 
        // Free the memory allocated for the items
        gpuErrchk(cudaFree(Device_SystemSetup));
        gpuErrchk(cudaFree(Device_MediumZ));
        gpuErrchk(cudaFree(Device_MediumX));
        gpuErrchk(cudaFree(Device_TRansducerCorrZ));
        gpuErrchk(cudaFree(Device_TRansducerCorrX));
        gpuErrchk(cudaFree(Device_LensZIndex));
        gpuErrchk(cudaFree(Device_LensArrivalTime));
        gpuErrchk(cudaFree(Device_Dir));
        gpuErrchk(cudaFree(Device_ConvArrivalTime));
        gpuErrchk(cudaFree(Device_RfData));
        //
        gpuErrchk(cudaFree(Device_ReconstructedImage_GPU));
        gpuErrchk(cudaFree(Device_ProximalArrivalTime));
        //if (SystemSetup.ProcessingType == 2) {
        gpuErrchk(cudaFree(Device_ZIndexProximal));
        // gpuErrchk(cudaFree(Device_DistalArrivalTime));
        gpuErrchk(cudaFree(Device_ZIndexDistal));
        // gpuErrchk(cudaFree(Device_RestArrivalTime));
        //
        //}
        cudaDeviceReset();
        delete[] xParabola;
        delete[] FittingOutput;
        //delete[] MediumX;
        //delete[] MediumZ;
        //free(MediumZ);
        //delete[] TRansducerCorrX;
        //delete[] TRansducerCorrZ;
        //delete[] ReconstructedImage_GPU;
        //delete[] ReconstructedImage_GPU_Filtered;
        //delete[] LensZIndex;
        free(ZIndexProximal);
        free(ZIndexDistal);
        // cudaFreeHost(RfData); // free pinned memory
        // cudaFreeHost(Dir); // free pinned memory
        // cudaFreeHost(LensArrivalTime); // free pinned memory
    }
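In the reconstruction-type selection above, GridX is computed differently in the two branches: the ProcessingType == 2 branch rounds up, so a final partial block is launched when NumberOfPixels is not a multiple of block.x, whereas the ProcessingType == 1 branch truncates, which leaves the last NumberOfPixels % block.x pixels without a thread in that case. A minimal host-side sketch of both forms; the helper names ceilDiv and floorDiv are hypothetical, not part of the posted code:

```cpp
#include <cassert>

// Number of blocks needed to cover `n` work items with `blockSize` threads each,
// rounding up so that a final partial block is still launched.
// Equivalent to the ProcessingType == 2 expression
//   NumberOfPixels / block.x + ((NumberOfPixels % block.x) != 0)
inline unsigned int ceilDiv(unsigned int n, unsigned int blockSize) {
    return (n + blockSize - 1) / blockSize;
}

// Truncating form, as in the ProcessingType == 1 branch: any remainder is dropped.
inline unsigned int floorDiv(unsigned int n, unsigned int blockSize) {
    return n / blockSize;
}
```

For a 64 x 256 image (16384 pixels) and 1024 threads per block, both forms give 16 blocks, so the difference only matters for other sizes. With rounding up, the extra threads of the last block need a guard in the kernel such as `if (GID >= Size) return;`; whether kernel_Reconstruction2 already has one is not visible in the posted code.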

Here is also my Header.cuh code:

    #ifndef HEADER
    #define HEADER
    //#include <windows.h>
    #include "cuda_runtime.h"
    #include "device_launch_parameters.h"
    #include "cuda.h"
    //#include "cuda_fp16.h"
    #include <iostream>
    #include <iomanip>
    struct Setup {
        int ProcessingType, NumberOfReceiver, NumberOfTransmitter, NumberOfSamples, WaterSoundSpeed, CBone, LenseSoundSpeed, Nz, Nx, Perio_ZStart, DetectionAngle,
            Perio_ZEnd, LenseThicknessIndex, Distal_ZStart_Index, Distal_ZEnd_Index, FittingDegree;
        float F0, Fs, TransmissionOffset, ArrayElementWidth, Lambda;
    };
    #define pi 3.14159265
    #include <iostream>
    #include <fstream>
    #include <sstream>
    #include <iterator>
    #include <stdio.h>
    #include <cstdio>
    using namespace std;
    #include <vector>
    #include <cstdlib>
    #include <string>
    #include <iomanip>
    #include <math.h>
    #include <cmath>
    #include <ctime>
    #include <mex.h>
    #include "MexFunctions.cuh"
    // texture<int, 1, cudaReadModeElementType> tex;
    // texture<int> tex;
    //__constant__ float constData[512];
    #define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
    inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort = true)
    {
        if (code != cudaSuccess)
        {
            fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
            if (abort) exit(code);
        }
    }
    __global__ void kernel_Reconstruction2(Setup* SetupLoaded_p, float* MediumZ_p, float* MediumX_p, float* TRansducerCorrZ_p, float* TRansducerCorrX_p,
        int* RfData, int* Dir, int Dir_Size, int Reconstruct_SoundSpeed, int* ReconstructedImage_GPU, int Transmit, int NStart_Transmit, int Size, float* Device_ConvArrivalTime);
    #endif // !HEADER

So, when I want to call the MEX file in MATLAB, I use:

    clear TUI_CUDA
    gpuDevice();
    [Image, RF] = TUI_CUDA(TransferDataToDevice, Setup, MediumX, MediumZ, Lens_Index_Array, ...
        data_Rearranged_saved, Dir_Rearranged_saved, LensToElementsArrivalTime, MediumX, MediumZ);

The link you provided is out of date, but I found this one:

//www.tianjin-qmedu.com/help/parallel-computing/mxinitgpu.html

What is the difference between "gpuDevice();" and

    #include "gpu/mxGPUArray.h"
    int mxInitGPU()

in a CUDA project? By the way, where can I find this gpu/mxGPUArray.h?

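Now that I have provided the complete code, I think things are clearer. I hope I haven't made things more complicated than in my first post. Please advise.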

Moein

Moein Mozaffarzadeh on 22 Mar 2021

Oh, and here is my MexFunctions.cu:

    #include "MexFunctions.cuh"
    void fillSetup(Setup& SystemSetup, const mxArray* pMxArray, const int ShowInput) {
        double* pMxData;
        pMxData = mxGetPr(pMxArray);
        // Read the experiment setup.
        int CounterSystem = 0;
        SystemSetup.ProcessingType = (int)pMxData[CounterSystem]; CounterSystem++;
        SystemSetup.NumberOfTransmitter = (int)pMxData[CounterSystem]; CounterSystem++;
        SystemSetup.NumberOfReceiver = (int)pMxData[CounterSystem]; CounterSystem++;
        SystemSetup.NumberOfSamples = (int)pMxData[CounterSystem]; CounterSystem++;
        SystemSetup.Fs = ((float)pMxData[CounterSystem]) * 1000; CounterSystem++;
        SystemSetup.F0 = (float)pMxData[CounterSystem]; CounterSystem++;
        SystemSetup.WaterSoundSpeed = (int)pMxData[CounterSystem]; CounterSystem++;
        SystemSetup.CBone = (int)pMxData[CounterSystem]; CounterSystem++;
        SystemSetup.LenseThicknessIndex = (int)pMxData[CounterSystem]; CounterSystem++;
        SystemSetup.LenseSoundSpeed = (int)pMxData[CounterSystem]; CounterSystem++;
        SystemSetup.Nz = (int)pMxData[CounterSystem]; CounterSystem++;
        SystemSetup.Nx = (int)pMxData[CounterSystem]; CounterSystem++;
        SystemSetup.TransmissionOffset = ((float)pMxData[CounterSystem]) / 1000; CounterSystem++;
        SystemSetup.DetectionAngle = (int)pMxData[CounterSystem]; CounterSystem++;
        SystemSetup.ArrayElementWidth = (float)pMxData[CounterSystem]; CounterSystem++;
        SystemSetup.Perio_ZStart = (int)pMxData[CounterSystem]; CounterSystem++; // pixel index where we start searching for the proximal surface
        SystemSetup.Perio_ZEnd = (int)pMxData[CounterSystem]; CounterSystem++; // pixel index where we stop searching for the proximal surface
        SystemSetup.Distal_ZStart_Index = (int)pMxData[CounterSystem]; CounterSystem++; // pixel index where we start searching for the distal surface
        SystemSetup.Distal_ZEnd_Index = (int)pMxData[CounterSystem]; CounterSystem++; // pixel index where we stop searching for the distal surface
        SystemSetup.FittingDegree = (int)pMxData[CounterSystem]; CounterSystem++;
        SystemSetup.Lambda = SystemSetup.WaterSoundSpeed / SystemSetup.F0 / 1000; // Lambda in (mm)
                                 
        // Display the setup properties
        if (ShowInput == 1) {
            if (SystemSetup.ProcessingType == 1) {
                cout << "Conventional image reconstruction." << endl;
            }
            else {
                cout << "Advanced image reconstruction." << endl;
            }
            cout << "Number of transmitters: " << SystemSetup.NumberOfTransmitter << endl;
            cout << "Number of receivers: " << SystemSetup.NumberOfReceiver << endl;
            cout << "Number of samples per channel: " << SystemSetup.NumberOfSamples << endl;
            cout << "Central frequency (MHz): " << SystemSetup.F0 << endl;
            cout << "Sampling frequency (MHz): " << SystemSetup.Fs / 1000 << endl;
            cout << "Speed of sound in water (m/s): " << SystemSetup.WaterSoundSpeed << endl;
            cout << "Speed of sound in bone (m/s): " << SystemSetup.CBone << endl;
            cout << "Thickness of the lens (pixels): " << SystemSetup.LenseThicknessIndex << endl;
            cout << "Speed of sound in the lens (m/s): " << SystemSetup.LenseSoundSpeed << endl;
            cout << "Number of pixels in Z direction: " << SystemSetup.Nz << endl;
            cout << "Number of pixels in X direction: " << SystemSetup.Nx << endl;
            cout << "Transmission offset (ms): " << SystemSetup.TransmissionOffset << endl;
            cout << "Detection angle (degrees): " << SystemSetup.DetectionAngle << endl;
            cout << "Element width (mm): " << SystemSetup.ArrayElementWidth << endl;
            cout << "Proximal surface segmentation (pixels): " << SystemSetup.Perio_ZStart << " - " << SystemSetup.Perio_ZEnd << endl;
            cout << "Distal surface segmentation (pixels): " << SystemSetup.Distal_ZStart_Index << " - " << SystemSetup.Distal_ZEnd_Index << endl;
            cout << "Fitting degree: " << SystemSetup.FittingDegree << endl;
            cout << "------------------------------------------------------------" << endl;
        }
    }
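Because fillSetup fills the caller's Setup field by field, its Setup parameter has to be a reference (or a pointer); if it were passed by value, only a local copy would be filled and the caller's struct would stay uninitialized. A host-only sketch of the same sequential-counter pattern, assuming a plain `const double*` in place of mxGetPr(pMxArray) and a reduced, hypothetical MiniSetup struct so it compiles without mex.h; the field order follows the first seven reads in fillSetup:

```cpp
#include <cassert>
#include <cmath>

// Reduced, hypothetical stand-in for the Setup struct in Header.cuh.
struct MiniSetup {
    int ProcessingType, NumberOfTransmitter, NumberOfReceiver, NumberOfSamples;
    float Fs, F0;
    int WaterSoundSpeed;
    float Lambda;
};

// `data` plays the role of mxGetPr(pMxArray); `setup` is taken by reference,
// so the values written here are visible to the caller.
void fillMiniSetup(MiniSetup& setup, const double* data) {
    int counter = 0;
    setup.ProcessingType      = (int)data[counter]; counter++;
    setup.NumberOfTransmitter = (int)data[counter]; counter++;
    setup.NumberOfReceiver    = (int)data[counter]; counter++;
    setup.NumberOfSamples     = (int)data[counter]; counter++;
    setup.Fs = ((float)data[counter]) * 1000; counter++;  // same scaling as fillSetup
    setup.F0 = (float)data[counter]; counter++;            // MHz
    setup.WaterSoundSpeed     = (int)data[counter]; counter++;  // m/s
    // Wavelength as in fillSetup: c [m/s] / F0 [MHz] gives micrometres; / 1000 gives mm.
    setup.Lambda = setup.WaterSoundSpeed / setup.F0 / 1000;
}
```

With c = 1500 m/s and F0 = 5 MHz, this gives Lambda = 0.3 mm.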

I build the MEX file with Visual Studio. Maybe I should build it directly in MATLAB?!

I tried this in MATLAB:

    mexcuda COMPFLAGS='$COMPFLAGS --use-local-env -maxrregcount=0 --machine 64 --compile -cudart static -Xptxas -dlcm=cg -use_fast_math -DWIN32 -DWIN64 -DNDEBUG -D_CONSOLE -D_MBCS -Xcompiler' MyFunction.cu

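However, the kernel output (i.e., the image in MATLAB) is not correct. As you can see, I also write out "ReconstructedImage_GPU_Check" to check whether the problem is in the output gateway, but again, the results in ReconstructedImage_GPU_Check and in "Image" are the same. I am really confused!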

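Richard on 22 Mar 2021

Hi Moein,

Thanks for pointing out the broken link. I think it picked up a trailing dot; hopefully I have fixed it.

mxInitGPU does the same thing as calling gpuDevice. The main advantage is that you don't have to remember to call gpuDevice, but it is also best practice to use it, in case a future version of MATLAB needs to add extra initialization before the mxGPU functions can be used.

gpu/mxGPUArray.h is in <matlabroot>/toolbox/parallel/gpu/extern/include. The mexcuda command adds this include path for you, but if you are using a different build system, you will need to add this path yourself.

Building with VS shouldn't be a problem: since you are compiling successfully and it isn't causing a crash, it is unlikely to be the issue.

I compiled and ran your original example with just one small change, assigning a fixed output value in the kernel. That ran successfully for me and returned an array containing the fixed value.

I have managed to compile your full example, but I'm not sure how to run it; as you say, it is more complicated. Is there a simple way for me to generate data, run this, and compare it with the expected output? Alternatively, is there a simpler kernel than the one in your original example that produces an output you believe is incorrect?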

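Richard on 23 Mar 2021

Hi Moein, thanks for the example data.

There are a few things you might want to check. First, I did manage to run the code, and I got some images. This is the image I get using the code you posted:

Is this the same incorrect output you are seeing?

I think the reshape code you are using to turn the output into a matrix is incorrect. Remember that MATLAB stores arrays in column-major order, whereas C uses row-major order. When I use this code: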

    DAS_Ccode = reshape(Image, [64, 256]);
    abs1 = abs(DAS_Ccode);
    figure, imagesc(abs1);

I get this image:

That looks more reasonable to me, but it still isn't the same as the image you posted. Is the one above the correct image, or not?
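To make the storage-order point concrete: the C code writes pixel (Pz, Px) of an Nz x Nx image to index Pz * Nx + Px (row-major, as in the ReconstructedImage_GPU_Check loop), while MATLAB's reshape fills an M x N matrix column by column, so element (i, j) (0-based) is taken from index i + j * M. A small host-side sketch, with hypothetical helper names, checking that the C index of (Pz, Px) equals the column-major index of (Px, Pz) in an Nx x Nz matrix; in other words, reshape(Image, [Nx, Nz]) yields the transpose of the image, and a trailing .' turns it back into Nz x Nx:

```cpp
#include <cassert>
#include <cstddef>

// Linear index of (row, col) in a row-major array with `cols` columns (C layout).
inline std::size_t rowMajorIndex(std::size_t row, std::size_t col, std::size_t cols) {
    return row * cols + col;
}

// Linear index of (row, col) in a column-major array with `rows` rows (MATLAB layout).
inline std::size_t colMajorIndex(std::size_t row, std::size_t col, std::size_t rows) {
    return row + col * rows;
}
```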

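There are a couple of possible code issues I noticed:

(1) You have a type mismatch for the input data. The MATLAB array is uint32, i.e., unsigned data, but in the C code you read it through a signed integer pointer. Given the values in the sample data you seem to get away with it, but it should be fixed.

(2) When running the code, I experienced persistent crashes in some versions of MATLAB, which I eventually traced to a memory access error returned by the cudaMemcpyAsync call that transfers the data to the device. The asynchronous memory API requires the memory to be "pinned", and in this case the source is mxArray data that was not allocated through CUDA.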
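On point (1): reading uint32 data through an int* reuses each 32-bit pattern unchanged, so values below 2^31 come through intact and only larger values turn negative, which is consistent with the sample data appearing to work. A minimal sketch of that reinterpretation, assuming 32-bit integers; readAsSigned is a hypothetical helper. In the gateway itself, the cleaner route would be to read the input with mxGetData and an unsigned type (for example uint32_T*), or to pass int32 data from MATLAB.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Models reading a uint32 element through a signed 32-bit pointer:
// the bits are copied unchanged (memcpy avoids strict-aliasing problems).
inline std::int32_t readAsSigned(std::uint32_t value) {
    std::int32_t out;
    std::memcpy(&out, &value, sizeof out);
    return out;
}
```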

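Richard

Moein Mozaffarzadeh on 23 Mar 2021

Hello Richard,

Thanks for your help. I think you are right about "DAS_Ccode = reshape(Image, [64, 256]);".

Yes, the first picture you posted is what I also get here with my MEX function; it is not sensible and it is not correct. The second picture you posted seems more sensible to me, but it is still not the correct image. The correct picture is the one I posted in my previous comment.

Regarding the issues you found:

1 - Yes, you are right. That was indeed an issue in my code. I have fixed it now, but I still can't get the correct image. I guess this is because of the next issue.

2 - In my CUDA project in Visual Studio, I use: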

    int* RfData; // RF data; pinned memory is dedicated to it
    int* Device_RfData; // device pointer for the RF data
    int ArrayByteSize_RfData = sizeof(int) * (SystemSetup.NumberOfTransmitter * SystemSetup.NumberOfReceiver * SystemSetup.NumberOfSamples);
    cudaMallocHost((int**)&RfData, ArrayByteSize_RfData); // pinned memory
    gpuErrchk(cudaMalloc((int**)&Device_RfData, ArrayByteSize_RfData));

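and I use "cudaMemcpyAsync(&Device_RfData[NStart_Transmit], &RfData[NStart_Transmit], BYTES_PER_STREAM, cudaMemcpyHostToDevice, streams[Transmit]);" inside the "for" loop that contains my kernel, to transfer "RfData" to the kernel asynchronously and save time.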
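In that loop, transmitter Transmit copies the chunk starting at NStart_Transmit = Transmit * (NumberOfReceiver * NumberOfSamples) elements, so RfData is split into NumberOfTransmitter consecutive chunks. A host-only sketch of that arithmetic, assuming (the posts do not show its definition) that BYTES_PER_STREAM equals NumberOfReceiver * NumberOfSamples * sizeof(int); the Chunk struct and transmitChunks function are hypothetical. Checking that the chunks are contiguous and add up to ArrayByteSize_RfData is a cheap sanity test before looking at the asynchronous copies themselves:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Chunk {
    std::size_t elementOffset;  // corresponds to NStart_Transmit
    std::size_t byteCount;      // corresponds to the assumed BYTES_PER_STREAM
};

// One chunk per transmitter, mirroring
// NStart_Transmit = Transmit * (NumberOfReceiver * NumberOfSamples).
std::vector<Chunk> transmitChunks(int numTransmitters, int numReceivers, int numSamples) {
    std::vector<Chunk> chunks;
    const std::size_t perTransmit = static_cast<std::size_t>(numReceivers) * numSamples;
    for (int t = 0; t < numTransmitters; ++t) {
        chunks.push_back({t * perTransmit, perTransmit * sizeof(int)});
    }
    return chunks;
}
```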

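In my MEX code, as you can see, I first define a pointer with "int* RfData;" and then use

    RfData = (int*)mxGetPr(prhs[Counter]); Counter++;

to get "data_Rearranged_saved" from MATLAB into "RfData". I suspect that, because "RfData" is not pinned memory in the MEX code, this is why my kernel doesn't work in the MEX version. It works in the Visual Studio project because I use pinned memory there, and, as you mentioned, the asynchronous API requires "pinned" memory.

Now my question is: how do I define pinned memory and put "data_Rearranged_saved" from MATLAB into it? Please let me know.

Moein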

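Moein Mozaffarzadeh on 23 Mar 2021

Richard,

To add to my previous comment: I also checked a version of my code that does not need pinned memory. It works well in my CUDA project in Visual Studio, but again not as a MEX file.

My guess: I compile my MEX code in Visual Studio. Is it any different to compile it in MATLAB? If so, could you give me the command I need to use in MATLAB?

I have already used this: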

    mexcuda COMPFLAGS='$COMPFLAGS --use-local-env -maxrregcount=0 --machine 64 -compatibleArrayDims --compile -cudart static -Xptxas -dlcm=cg -use_fast_math -DWIN32 -DWIN64 -DNDEBUG -D_CONSOLE -D_MBCS -Xcompiler' TUI_Main.cu MexFunctions.cu CudaKernels.cu
But the same wrong image is generated.

Moein
