What is the role of the streaming multiprocessor (MultiprocessorCount in gpuDevice()) in GPU Coder?

lim daehee on 13 Jan 2020

Answered: Aditya Patil on 12 Jul 2021

I use the GPU Coder app with my graphics card, a GeForce GTX 1070 Ti, and I found something odd in the results.

I simulated my code with 1000, 1200, 1400, 1600, 1800 and 1900 nodes.

The elapsed time is shorter for smaller numbers of nodes, and this holds up to 1800 nodes.

However, when I simulated with 1900 nodes, the elapsed time was much shorter than with 1000 nodes.

I suspect this is related to the MultiprocessorCount of my graphics card, which is 19.

What is the role of MultiprocessorCount, and what is the exact reason the run with 1900 nodes is faster than the run with 1000 nodes?

Also, here is my code.

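    function [D, N, B, R] = fcn_PRM_DH_complete(node_PRM, PosMap, Map_Obs, Obs_mat, CR) %#codegen
    n = length(node_PRM);

    % Note: coder.nullcopy skips initialization, so any entry that is never
    % assigned below (for example the diagonal of R) is left uninitialized.
    D = coder.nullcopy(zeros(n));   % pairwise distances
    N = coder.nullcopy(zeros(n));   % result of collisionCheck_SP8, used below as a sample count
    B = coder.nullcopy(ones(n));    % 1 if the edge is kept, 0 if it is rejected

    % Start by marking every pair with i0 < j0 as kept.
    coder.gpu.kernel;
    for i0 = 1:n-1
        coder.gpu.kernel;
        for j0 = i0+1:n
            b_val = 1;
            B(i0,j0) = b_val;
        end
    end

    R = coder.nullcopy(zeros(n));
    len = (1:n)';
    pos_mat = PosMap(node_PRM(len),:);

    % Weighted distance (z term scaled by 5) and collision-check value per pair.
    % collisionCheck_SP8 is a user function that is not shown in this post.
    coder.gpu.kernel;
    for i1 = 1:n-1
        coder.gpu.kernel;
        for j1 = i1+1:n
            dist = sqrt((pos_mat(i1,1)-pos_mat(j1,1))^2 + (pos_mat(i1,2)-pos_mat(j1,2))^2 + 5*(pos_mat(i1,3)-pos_mat(j1,3))^2);
            D(i1,j1) = dist;
            C = collisionCheck_SP8(pos_mat(i1,:), pos_mat(j1,:));
            N(i1,j1) = C;
        end
    end

    % Sample N(i2,j2) points along each edge and reject the edge if it is
    % longer than CR or if Obs_mat is 0 at a sampled grid cell.
    coder.gpu.kernel;
    for i2 = 1:n-1
        coder.gpu.kernel;
        for j2 = i2+1:n
            coder.gpu.kernel;
            for k = 1:N(i2,j2)
                if D(i2,j2) <= CR
                    node1 = pos_mat(i2,:);
                    node2 = pos_mat(j2,:);
                    dx = (node2(1)-node1(1))/N(i2,j2);
                    x = node1(1) + dx*k;
                    dy = (node2(2)-node1(2))/N(i2,j2);
                    y = node1(2) + dy*k;
                    dz = (node2(3)-node1(3))/N(i2,j2);
                    z = node1(3) + dz*k;
                    % Convert the sample to grid indices (grid spacing 0.3).
                    node_x = round(x/0.3);
                    node_y = round(y/0.3);
                    node_z = round(z/0.3+1);
                    % Linear index of the sampled cell.
                    Idxpt = node_x + (node_y-1)*size(Map_Obs,2) + (node_z-1)*size(Map_Obs,1)*size(Map_Obs,2);
                    if Obs_mat(Idxpt) == 0
                        b = 0;
                        B(i2,j2) = b;
                    end
                else
                    b = 0;
                    B(i2,j2) = b;
                end
            end
        end
    end

    % Symmetric result: distance for kept edges, 0 for rejected edges.
    coder.gpu.kernel;
    for i3 = 1:n-1
        coder.gpu.kernel;
        for j3 = i3+1:n
            d_val = D(i3,j3);
            b_val = B(i3,j3);
            R(i3,j3) = d_val*b_val;
            R(j3,i3) = d_val*b_val;
        end
    end
    end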

Answers (1)

Aditya Patil on 12 Jul 2021

Streaming Multiprocessors (SMs) are a concept from NVIDIA GPUs: each SM executes many threads in parallel. The more SMs a GPU has, the more computational power it has.
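To make the role of the SM count more concrete, here is a rough sketch of how the work in the question's double loops could be spread over the SMs. The block size of 512 threads and the one-thread-per-pair mapping are assumptions for illustration only; the launch configuration that GPU Coder actually generates may differ. If no GPU is available, the sketch falls back to the 19 SMs reported in the question.

```matlab
% Sketch only: blockSize and the one-thread-per-pair mapping are assumptions,
% not necessarily what GPU Coder generates for this function.
try
    g = gpuDevice;                      % needs Parallel Computing Toolbox and an NVIDIA GPU
    numSMs = g.MultiprocessorCount;
catch
    numSMs = 19;                        % value reported in the question (GTX 1070 Ti)
end

blockSize  = 512;                                   % assumed threads per block
nodeCounts = [1000 1200 1400 1600 1800 1900];
numPairs   = nodeCounts .* (nodeCounts - 1) / 2;    % iterations of the i < j double loop
numBlocks  = ceil(numPairs / blockSize);            % thread blocks needed to cover all pairs
blocksPerSM = numBlocks / numSMs;                   % average number of blocks per SM

for k = 1:numel(nodeCounts)
    fprintf('n = %4d: %8d pairs, %5d blocks, %.1f blocks per SM\n', ...
        nodeCounts(k), numPairs(k), numBlocks(k), blocksPerSM(k));
end
```

Under these assumptions the work per SM grows roughly with the square of the number of nodes, so the SM count by itself would not be expected to make 1900 nodes faster than 1000.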

As for the performance increase, there can be many different reasons. They are difficult to predict from the MATLAB code alone; you would need to look at the generated code and benchmark it. One possible reason is that this specific number of nodes avoids some memory contention (cases where several reads or writes go to the same memory bank).
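A simple benchmarking loop along the following lines may help separate real effects from measurement noise. It is only a sketch: runOnce is a hypothetical stand-in workload so that the code runs anywhere, and you would replace it with a call to your own generated MEX function and its real inputs. The warm-up call matters because the first call of a GPU function usually includes one-time initialization cost, which could inflate whichever size is measured first.

```matlab
% Hypothetical stand-in workload; replace with a call to your generated MEX
% function and real inputs for each node count.
runOnce = @(n) sum(sum(triu(ones(n), 1)));

nodeCounts = [1000 1200 1400 1600 1800 1900];
numRuns    = 5;                          % repetitions per node count
medianTime = zeros(size(nodeCounts));

runOnce(nodeCounts(1));                  % warm-up call, excluded from the timings

for s = 1:numel(nodeCounts)
    t = zeros(1, numRuns);
    for r = 1:numRuns
        tStart = tic;
        runOnce(nodeCounts(s));
        t(r) = toc(tStart);
    end
    medianTime(s) = median(t);           % median is less sensitive to outliers than the mean
end

for s = 1:numel(nodeCounts)
    fprintf('n = %4d: median %.4f s over %d runs\n', ...
        nodeCounts(s), medianTime(s), numRuns);
end
```

If the 1900-node run is still faster than the 1000-node run after a warm-up and several repetitions, inspecting the generated CUDA code is the next step.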

Release: R2019b
