Performance

Troubleshoot code generation issues, improve code execution time, and reduce memory usage of generated code

Some of the most common reasons why GPU Coder™ generated code is not performing as expected are:

CUDA® kernels are not created.
Host-to-device and device-to-host memory transfers (`cudaMemcpy`) are throttling performance.
There is not enough parallelism, or there are device issues.

These topics elaborate on the common causes of these symptoms and describe how to use the built-in screener to detect these issues. They also explain how to work around these issues and generate more efficient CUDA code.

Apps

GPU Coder

GPU Coder	Generate GPU code from MATLAB code
GPU Environment Check	Verify and set up GPU code generation environment

Functions

Code Generation

`codegen`	Generate C/C++ code from MATLAB code
`gpucoder`	Open GPU Coder app
`gpucoder.profile`	Create an execution profile report for generated CUDA code

Programming for Code Generation

`coder.gpu.kernel`	Pragma that maps `for`-loops to GPU kernels
`coder.gpu.kernelfun`	Pragma that maps a function to GPU kernels
`coder.gpu.nokernel`	Pragma to disable kernel creation for loops
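
As a rough illustration of how the loop-level pragmas listed above are typically placed, the following sketch puts `coder.gpu.kernel` directly before a loop that should become a kernel and `coder.gpu.nokernel` directly before a loop that should not. The function name `vecAddAndSum` and its arguments are hypothetical and not part of GPU Coder. Whether a kernel is actually created depends on the release and on the loop itself, so treat this as an assumption-laden sketch rather than a recommended pattern.

```matlab
function [c, s] = vecAddAndSum(a, b) %#codegen
% vecAddAndSum  Hypothetical entry-point function used only for illustration.

% Preallocate the output with the same class as the input.
c = zeros(size(a), 'like', a);

% Request that the loop that immediately follows be mapped to a GPU kernel.
coder.gpu.kernel();
for i = 1:numel(a)
    c(i) = a(i) + b(i);
end

% Accumulate a scalar sum of the result.
s = zeros(1, 'like', a);

% Ask GPU Coder not to create a kernel for the loop that follows.
coder.gpu.nokernel();
for i = 1:numel(c)
    s = s + c(i);
end
end
```

The pragmas are code generation directives, so running the function directly in MATLAB should produce the same numerical results as it would without them.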

Objects

Code configuration

`coder.gpuConfig`	Configuration parameters for CUDA code generation from MATLAB code by using GPU Coder
`coder.CodeConfig`	Configuration parameters for C/C++ code generation from MATLAB code
`coder.EmbeddedCodeConfig`	Configuration parameters for C/C++ code generation from MATLAB code with Embedded Coder
`coder.gpuEnvConfig`	Create configuration object containing the parameters passed to `coder.checkGpuInstall` for performing GPU code generation environment checks
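
The configuration objects above are usually combined in a short script: check the environment first, then generate code with a GPU configuration. The sketch below assumes a host GPU, a working CUDA toolchain, and a hypothetical entry-point function `myEntryPoint` that takes one 1-by-1024 single-precision vector. The property values shown are illustrative choices, not required settings.

```matlab
% Check the GPU code generation environment on the host machine.
envCfg = coder.gpuEnvConfig('host');
envCfg.BasicCodegen = 1;   % run the basic code generation check
envCfg.Quiet = 1;          % suppress command-line output from the check
results = coder.checkGpuInstall(envCfg);  % struct of check results

% Create a GPU code configuration object for a CUDA MEX target.
cfg = coder.gpuConfig('mex');
cfg.GenerateReport = true; % produce a code generation report

% Generate CUDA MEX code for the hypothetical function myEntryPoint,
% which you must supply on the MATLAB path.
codegen -config cfg -args {ones(1,1024,'single')} myEntryPoint
```

Inspecting `results` before calling `codegen` can help separate environment problems from performance problems in the generated code.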

Topics

Workflow
GPU Coder troubleshooting workflow.
Code Generation Reports
Create and view reports generated during code generation.
Trace Between Generated CUDA Code and MATLAB Source Code
Highlight sections of MATLAB code that run on the GPU.
Generating a GPU Code Metrics Report for Code Generated from MATLAB Code
Create and explore GPU static code metrics report.
Debug CUDA MEX Functions
Suggestions for debugging CUDA MEX functions.
Kernel Analysis
Recommendations for generating efficient CUDA kernels.
Memory Bottleneck Analysis
Reduce memory bottleneck issues when using GPU Coder.
Analyze Execution Profiles of the Generated Code
Fine-grained profiling of the MATLAB algorithm and its generated CUDA code through software-in-the-loop (SIL) execution.
Analysis with NVIDIA Profiler
Improve performance by using the information obtained from NVIDIA Profiler (nvvp).
GPU Coder Limitations
See current limitations of GPU Coder.
Register Count nvlink Error
Troubleshoot compilation failures due to a register count nvlink error.

Featured Examples

GPU Execution Profiling of the Generated Code

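Generate an execution profile report for the generated CUDA® code by using the `gpucoder.profile` function. Fog rectification is used as an example to demonstrate this concept.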