Performance
Some of the most common reasons why GPU Coder™ generated code is not performing as expected are:
- CUDA® kernels are not created.
- Host-to-device and device-to-host memory transfers (cudaMemcpy) are throttling performance.
- There is not enough parallelism, or there are device issues.
These topics elaborate on the common causes of these symptoms and describe how to use the built-in screener to detect these issues. You can also find information on how to work around these issues and generate more efficient CUDA code.
Topics
- Workflow
GPU Coder troubleshooting workflow.
- Code Generation Reports
Create and view reports generated during code generation.
- Trace Between Generated CUDA Code and MATLAB Source Code
Highlight sections of MATLAB code that run on the GPU.
- Generating a GPU Code Metrics Report for Code Generated from MATLAB Code
Create and explore GPU static code metrics report.
- Debug CUDA MEX Functions
Suggestions for debugging CUDA MEX functions.
- Kernel Analysis
Recommendations for generating efficient CUDA kernels.
- Memory Bottleneck Analysis
Reduce memory bottleneck issues when using GPU Coder.
- Analyze Execution Profiles of the Generated Code
Fine-grain profiling for the MATLAB algorithm and its generated CUDA code through SIL.
- Analysis with NVIDIA Profiler
Improve performance by using the information obtained from NVIDIA Profiler (nvvp).
- GPU Coder Limitations
See current limitations of GPU Coder.
- Register Count nvlink Error
Troubleshoot compilation failures due to a register count nvlink error.