Kernel Creation from MATLAB Code

MATLAB code structures and patterns that create CUDA® GPU kernels

GPU Coder™ generates and executes optimized CUDA kernels for specific algorithm structures and patterns in your MATLAB® code. The generated code calls optimized NVIDIA® CUDA libraries, including cuFFT, cuSOLVER, cuBLAS, cuDNN, and TensorRT. You can integrate the generated code into your project as source code, static libraries, or dynamic libraries, and compile it for desktops, servers, and GPUs embedded on NVIDIA Jetson, DRIVE, and other platforms. GPU Coder also lets you incorporate handwritten CUDA code into your algorithms and into the generated code.

Apps

GPU Coder

GPU Coder	Generate GPU code from MATLAB code
GPU Environment Check	Verify and set up GPU code generation environment

Functions

Code Generation

`codegen`	Generate C/C++ code from MATLAB code
`gpucoder`	Open GPU Coder app
`coder.checkGpuInstall`	Verify GPU code generation environment
`coder.gpuConfig`	Configuration parameters for CUDA code generation from MATLAB code by using GPU Coder

GPU Kernel Pragmas

`coder.gpu.kernel`	Pragma that maps `for`-loops to GPU kernels
`coder.gpu.kernelfun`	Pragma that maps function to GPU kernels
`coder.gpu.nokernel`	Pragma to disable kernel creation for loops
`coder.ceval`	Call external C/C++ function
`coder.gpu.iterations`	Pragma that provides information to the code generator for making parallelization decisions on variable bound loops
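
As a minimal sketch of how the function-level pragma is typically used, the hypothetical entry-point function below (the name `scaleAndShift` and its arguments are assumptions for illustration, not part of this page) places `coder.gpu.kernelfun` at the top of the function so that the code generator can consider the element-wise loop for kernel creation. Whether a kernel is actually created depends on the code generator's analysis of the loop.

```matlab
function y = scaleAndShift(x, a, b) %#codegen
% Hypothetical entry-point function: computes y(i) = a*x(i) + b.
% The pragma asks GPU Coder to map computation in this function to
% GPU kernels where its analysis allows it.
coder.gpu.kernelfun;

y = zeros(size(x), 'like', x);   % preallocate output with the input type
for i = 1:numel(x)
    y(i) = a * x(i) + b;         % independent iterations, element-wise math
end
end

% To generate a CUDA MEX function (requires GPU Coder and a supported GPU),
% one possible invocation from the MATLAB command line is:
%   cfg = coder.gpuConfig('mex');
%   codegen -config cfg -args {ones(1,1024,'single'), single(2), single(1)} scaleAndShift
```

The loop-level pragma `coder.gpu.kernel` is the alternative when only a specific `for`-loop should be mapped; it is placed immediately before that loop.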

GPU Memory Pragmas

`coder.gpu.constantMemory`	Pragma that maps a variable to the constant memory on GPU
`coder.gpu.persistentMemory`	Pragma to allocate a variable as persistent memory on the GPU

GPU Atomic Operations

`gpucoder.atomicAdd`	Atomically add a specified value to a variable in global or shared memory
`gpucoder.atomicAnd`	Atomically perform bit-wise AND between a specified value and a variable in global or shared memory
`gpucoder.atomicCAS`	Atomically compare and swap the value of a variable in global or shared memory
`gpucoder.atomicDec`	Atomically decrement a variable in global or shared memory within a specified upper bound
`gpucoder.atomicExch`	Atomically exchange a variable in global or shared memory with the specified value
`gpucoder.atomicInc`	Atomically increment a variable in global or shared memory within a specified upper bound
`gpucoder.atomicMax`	Atomically find the maximum between a specified value and a variable in global or shared memory
`gpucoder.atomicMin`	Atomically find the minimum between a specified value and a variable in global or shared memory
`gpucoder.atomicOr`	Atomically perform bit-wise OR between a specified value and a variable in global or shared memory
`gpucoder.atomicSub`	Atomically subtract a specified value from a variable in global or shared memory
`gpucoder.atomicXor`	Atomically perform bit-wise XOR between a specified value and a variable in global or shared memory

Programming for Code Generation

`gpucoder.stencilKernel`	Create CUDA code for stencil functions
`gpucoder.matrixMatrixKernel`	Optimized GPU implementation of functions containing matrix-matrix operations
`gpucoder.batchedMatrixMultiply`	Optimized GPU implementation of batched matrix multiply operation
`gpucoder.stridedMatrixMultiply`	Optimized GPU implementation of strided and batched matrix multiply operation
`gpucoder.batchedMatrixMultiplyAdd`	Optimized GPU implementation of batched matrix multiply with add operation
`gpucoder.stridedMatrixMultiplyAdd`	Optimized GPU implementation of strided and batched matrix multiply with add operation
`gpucoder.sort`	Optimized GPU implementation of the MATLAB sort function
`gpucoder.transpose`	Optimized GPU implementation of the MATLAB transpose function
`gpucoder.reduce`	Optimized GPU implementation for reduction operations
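
To illustrate the reduction entry in the list above, here is a small sketch that sums an array with `gpucoder.reduce`. The function names `sumWithReduce` and `addPair` are hypothetical; the sketch assumes the two-argument form `gpucoder.reduce(A, fun)`, where `fun` is a handle to a binary function that is associative and commutative.

```matlab
function s = sumWithReduce(A) %#codegen
% Hypothetical entry point: reduce all elements of A to a single sum.
s = gpucoder.reduce(A, @addPair);
end

function c = addPair(a, b)
% Binary combiner used by the reduction. It takes two inputs and returns
% one output, and addition is both associative and commutative.
c = a + b;
end
```

Because the order in which a parallel reduction combines elements is not fixed, floating-point results can differ slightly from a sequential sum.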

Objects

Code configuration

`coder.gpuConfig`	Configuration parameters for CUDA code generation from MATLAB code by using GPU Coder
`coder.CodeConfig`	Configuration parameters for C/C++ code generation from MATLAB code
`coder.EmbeddedCodeConfig`	Configuration parameters for C/C++ code generation from MATLAB code with Embedded Coder
`coder.gpuEnvConfig`	Create configuration object containing the parameters passed to `coder.checkGpuInstall` for performing GPU code generation environment checks
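
The following sketch shows one way the environment configuration object and `coder.checkGpuInstall` might be combined on a host machine. The choice of checks is an assumption for illustration; it requests only basic code generation and execution tests and requires GPU Coder with a supported GPU and toolchain.

```matlab
% Create an environment configuration object for the host GPU.
envCfg = coder.gpuEnvConfig('host');

% Enable the basic checks: generate code and run the generated code.
envCfg.BasicCodegen  = 1;
envCfg.BasicCodeexec = 1;

% Suppress command-line output and inspect the returned results instead.
envCfg.Quiet = 1;

% Run the checks; the function returns a structure describing the outcome.
results = coder.checkGpuInstall(envCfg);
disp(results)
```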

Topics

Kernels from Element-Wise Loops
Create kernels from MATLAB functions containing scalarized, element-wise math operations.
Kernels from Scatter-Gather Type Operations
Create kernels from MATLAB functions containing reduction operations.
Kernels from Library Calls
Target GPU optimized math libraries such as cuBLAS, cuSOLVER, cuFFT, and Thrust.
- cuBLAS Example
- cuSOLVER Example
- FFT Example
- Thrust Example
Support for GPU Arrays
Generate CUDA code that uses GPU arrays.
Legacy Code Integration
Integrate custom GPU code with MATLAB code intended for code generation.
Design Patterns
Create kernels for MATLAB functions containing computational design patterns.
GPU Memory Allocation and Minimization
Memory allocation options and optimizations for GPU Coder.
What is Half Precision?
Introduction to the half-precision data type in MATLAB and Simulink®.
Half Precision Code Generation Support
C/C++ and GPU code generation support for functions that support half-precision inputs.

Featured Examples

Simulate Diffraction Patterns Using CUDA FFT Libraries

Use GPU Coder™ to leverage the CUDA® Fast Fourier Transform library (cuFFT) to compute a two-dimensional FFT on an NVIDIA® GPU. The two-dimensional Fourier transform is used in optics to calculate far-field diffraction patterns. You can observe these diffraction patterns when a monochromatic light source passes through a small aperture, such as in Young's double-slit experiment. This example also shows how to use GPU pointers as inputs to an entry-point function when generating CUDA MEX, source code, static libraries, dynamic libraries, and executables. Passing GPU pointers improves the performance of the generated code by minimizing the number of cudaMemcpy calls.

QR Decomposition on NVIDIA GPU Using cuSOLVER Libraries

Create a standalone CUDA® executable that leverages the CUDA Solver library (cuSOLVER). The example uses a curve fitting application that mimics automatic lane tracking on a road to illustrate:

Stencil Processing on GPU

Generate CUDA® kernels for stencil-type operations by implementing John H. Conway's "Game of Life".

Benchmark Solving a Linear System by Using GPU Coder

Benchmark solving a linear system by generating CUDA® code. Use matrix left division, also known as mldivide or the backslash operator (\), to solve the system of linear equations A*x = b for x (that is, compute x = A\b).
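
As a rough sketch of the kind of entry-point function such a benchmark could generate code from, the hypothetical function below (the name `solveLinear` is an assumption, not taken from the example) wraps the backslash operator and adds the kernel pragma.

```matlab
function x = solveLinear(A, b) %#codegen
% Hypothetical entry-point function: solve A*x = b for x.
% The pragma lets GPU Coder map this computation to the GPU where possible.
coder.gpu.kernelfun;

x = A \ b;   % matrix left division (mldivide)
end
```

For instance, with `A = [2 0; 0 4]` and `b = [2; 8]`, the solution is `x = [1; 2]`.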

Fog Rectification

Use image processing functions for GPU code generation. The example takes a foggy image as input and produces a defogged image. It is a typical implementation of a fog rectification algorithm and uses the conv2, im2gray, and imhist functions.

Stereo Disparity

Generate a CUDA® MEX function from a MATLAB® function that computes the stereo disparity of two images.
