What is the role of the streaming multiprocessor (MultiprocessorCount in gpuDevice()) in GPU Coder?

I use the GPU Coder app with my graphics card, a GeForce GTX 1070 Ti, and I noticed something odd in the results.
I ran my code with 1000, 1200, 1400, 1600, 1800 and 1900 nodes.
Up to 1800 nodes, the elapsed time is shorter when the number of nodes is smaller.
However, when I ran it with 1900 nodes, the elapsed time was much shorter than for the run with 1000 nodes.
I suspect this is related to the MultiprocessorCount of my graphics card, which is 19.
I would like to know what role the multiprocessors play, and the exact reason why the elapsed time with 1900 nodes is shorter than with 1000 nodes.
Also, here is my code.
function [D,N,B,R] = fcn_prm_dh_complete(node_PRM,PosMap,Map_Obs,Obs_mat,CR) %#codegen
n = length(node_PRM);
D = coder.nullcopy(zeros(n));
N = coder.nullcopy(zeros(n));
B = coder.nullcopy(ones(n));
% Set the upper triangle of B to 1
coder.gpu.kernel;
for i0 = 1:n-1
    coder.gpu.kernel;
    for j0 = i0+1:n
        b_val = 1;
        B(i0,j0) = b_val;
    end
end
R = coder.nullcopy(zeros(n));
len = (1:n)';
pos_mat = PosMap(node_PRM(len),:);
% Weighted pairwise distance D and CollisionCheck_sp8 result N for each node pair
coder.gpu.kernel;
for i1 = 1:n-1
    coder.gpu.kernel;
    for j1 = i1+1:n
        dist = sqrt((pos_mat(i1,1)-pos_mat(j1,1))^2+(pos_mat(i1,2)-pos_mat(j1,2))^2+5*(pos_mat(i1,3)-pos_mat(j1,3))^2);
        D(i1,j1) = dist;
        C = CollisionCheck_sp8(pos_mat(i1,:),pos_mat(j1,:));
        N(i1,j1) = C;
    end
end
% Walk N(i2,j2) steps along each edge; clear B if the pair is too far apart
% or a sampled point lands on a cell where Obs_mat is 0
coder.gpu.kernel;
for i2 = 1:n-1
    coder.gpu.kernel;
    for j2 = i2+1:n
        coder.gpu.kernel;
        for k = 1:N(i2,j2)
            if D(i2,j2) <= CR
                node1 = pos_mat(i2,:);
                node2 = pos_mat(j2,:);
                dx = (node2(1)-node1(1))/N(i2,j2);
                x = node1(1)+dx*k;
                dy = (node2(2)-node1(2))/N(i2,j2);
                y = node1(2)+dy*k;
                dz = (node2(3)-node1(3))/N(i2,j2);
                z = node1(3)+dz*k;
                node_x = round(x/0.3);
                node_y = round(y/0.3);
                node_z = round(z/0.3+1);
                Idxpt = node_x+(node_y-1)*size(Map_Obs,2)+(node_z-1)*size(Map_Obs,1)*size(Map_Obs,2);
                if Obs_mat(Idxpt) == 0
                    b_val = 0;
                    B(i2,j2) = b_val;
                end
            else
                b_val = 0;
                B(i2,j2) = b_val;
            end
        end
    end
end
% Symmetric result: distance where the edge is kept, 0 otherwise
coder.gpu.kernel;
for i3 = 1:n-1
    coder.gpu.kernel;
    for j3 = i3+1:n
        d_val = D(i3,j3);
        b_val = B(i3,j3);
        R(i3,j3) = d_val*b_val;
        R(j3,i3) = d_val*b_val;
    end
end

Answers (1)

Aditya Patil on 12 Jul 2021
Streaming Multiprocessors (SMs) are a concept from NVIDIA GPUs, where each SM processes threads in blocks. The more SMs a GPU has, the more computing power it has.
As for the performance increase, there can be many different reasons. They are difficult to predict from the MATLAB code alone; you would need to look at the generated code and benchmark it. However, one possible reason is that this specific number of nodes avoids some memory contention (where there are reads/writes to the same memory bank).
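As a starting point for the benchmarking mentioned above, here is a minimal sketch that times a workload at the node counts from the question using `timeit`. The `workload` handle is a hypothetical CPU stand-in (it computes only the weighted pairwise distance from the question's code), not the generated GPU code; to measure the real thing, replace it with a call to the generated MEX function and its actual inputs.

```matlab
% Node counts taken from the question.
nodeCounts = [1000 1200 1400 1600 1800 1900];
elapsed = zeros(size(nodeCounts));

% Hypothetical stand-in workload: weighted pairwise distances between the
% rows of P (uses implicit expansion, R2016b or later). Swap in the
% generated MEX call here to benchmark the GPU code itself.
workload = @(P) sqrt((P(:,1) - P(:,1)').^2 + ...
                     (P(:,2) - P(:,2)').^2 + ...
                     5*(P(:,3) - P(:,3)').^2);

for idx = 1:numel(nodeCounts)
    P = rand(nodeCounts(idx), 3);            % random node positions
    % timeit runs the handle several times and returns a typical time in seconds
    elapsed(idx) = timeit(@() workload(P));
end

% Show the timings side by side so jumps between node counts are easy to spot.
disp(table(nodeCounts', elapsed', 'VariableNames', {'Nodes', 'Seconds'}));
```

Repeating the measurement at a few node counts around 1900 (for example 1850 and 1950, chosen here only for illustration) can help show whether the speed-up is specific to that size. The MultiprocessorCount of the active device can be read with `gpuDevice().MultiprocessorCount` (or `d = gpuDevice; d.MultiprocessorCount` on older releases).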
