coder.gpu.kernel
Pragma that maps for-loops to GPU kernels
Description
coder.gpu.kernel() is a loop-level pragma that you must place immediately before a for loop. It generates a kernel with dimensions computed from the loop parameters.
Note
The coder.gpu.kernel pragma overrides all parallel loop analysis checks that the software performs. Use coder.gpu.kernelfun first, before using the more advanced functionality of the coder.gpu.kernel pragma.
coder.gpu.kernel(B,T) is a loop-level pragma that you must place immediately before a for loop. It generates a kernel with the dimensions specified by B and T. B[Bx,By,1] is an array that defines the number of blocks in the grid along dimensions x and y (z is not used). T[Tx,Ty,Tz] is an array that defines the number of threads in the block along dimensions x, y, and z.
A value of -1 for B and T indicates that GPU Coder™ must infer the grid and block dimensions automatically. The coder.gpu.kernel pragma generates errors for invalid grid and block dimensions.
coder.gpu.kernel(B,T,M,name) expects the same B and T arguments. You can specify the optional arguments M and name. M is a positive integer specifying the minimum number of blocks per streaming multiprocessor. Sometimes, increasing M can reduce the register usage within a kernel and improve kernel occupancy. A value of -1 for M indicates that GPU Coder must use the default value of 1. name is a character array that allows you to customize the name of the generated kernel.
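As an illustration of the four-argument form, the sketch below leaves B and T to be inferred, requests at least two blocks per streaming multiprocessor, and supplies a kernel name. The function name squareElements, the name 'squareKernel', the value M = 2, and the length 2048 are all assumptions made for this example. Whether M = 2 actually helps depends on the kernel and the target GPU.

```matlab
function out = squareElements(in) %#codegen
% squareElements  Hypothetical example: square each element.
% Assumes in is a 1-by-2048 double row vector.

out = zeros(1, 2048);

% B = -1, T = -1      -> GPU Coder infers grid and block dimensions
% M = 2               -> minimum of 2 blocks per streaming multiprocessor
% name = 'squareKernel' -> custom name for the generated kernel (illustrative)
coder.gpu.kernel(-1, -1, 2, 'squareKernel');
for i = 1:2048
    out(i) = in(i) * in(i);
end
end
```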
Specifying the kernel pragma overrides all parallel loop analysis checks. This override allows loops to be parallelized in situations where parallel loop analysis cannot prove that all iterations are independent of each other. Because the checks are bypassed, you are responsible for confirming that the loop is safe to parallelize before you apply the pragma.
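A sketch of a loop where the override can matter: each iteration writes through an index vector, so independence depends on the run-time contents of idx, which static analysis generally cannot establish. The function name scatterByIndex and the requirement that idx is a permutation of 1:1024 are assumptions of this example. If idx contained repeated values, two iterations would write the same element and the loop would not be safe to parallelize.

```matlab
function out = scatterByIndex(in, idx) %#codegen
% scatterByIndex  Hypothetical example: out(idx(i)) = in(i).
% Assumes in is a 1-by-1024 double row vector and idx is a permutation
% of 1:1024, so every element of out is written exactly once.

out = zeros(1, 1024);

% The caller guarantees that idx has no repeated entries, so the
% iterations are independent even though this is not visible in the code.
coder.gpu.kernel();
for i = 1:1024
    out(idx(i)) = in(i);
end
end
```

By contrast, a loop with a loop-carried dependence, such as one where iteration i reads a value written by iteration i-1, should not be forced into a kernel with this pragma.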
This function is a code generation function. It has no effect in MATLAB®.
Examples
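A minimal sketch of the no-argument form, assuming a hypothetical function scaleAndAdd that operates on 1-by-1024 row vectors. The pragma sits immediately before the for loop, and the kernel dimensions are computed from the loop parameters.

```matlab
function out = scaleAndAdd(a, b, scale) %#codegen
% scaleAndAdd  Hypothetical example: out = a*scale + b, element by element.
% Assumes a and b are 1-by-1024 double row vectors and scale is a scalar.
%
% One possible way to generate a MEX function with GPU Coder:
%   cfg = coder.gpuConfig('mex');
%   codegen -config cfg -args {ones(1,1024), ones(1,1024), 2} scaleAndAdd

out = zeros(1, 1024);

% Map the following for loop to a GPU kernel. No arguments are given,
% so the kernel dimensions come from the loop bounds.
coder.gpu.kernel();
for i = 1:1024
    out(i) = a(i) * scale + b(i);
end
end
```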
See Also
Functions
codegen | coder.gpu.kernelfun | gpucoder.stencilKernel | coder.gpu.constantMemory | gpucoder.reduce | gpucoder.sort | coder.gpu.nokernel