Loren在Matlab的艺术上

Turn ideas into MATLAB

A Prime Case Study for Making MATLAB Code Go Faster

今天我想欢迎一位客人博客, Mike Croucher. ,最近在支持学术计算研究的长期职业生涯后,最近加入了MathWorks作为客户成功工程师。万博1manbetx

Introduction

我的大多数职业都致力于与科学家和工程师合作,以改善他们的计算工作流程。有许多事情可以做到,包括鼓励使用版本控制或使用 unit testing 但是,到目前为止,最常见的要求 “你能做到这件事我做得更快吗?
More than twenty years of being asked this simple question has led me to explore many areas of technical computing such as parallel computing, numerical analysis, HPC和云计算 GPU计算 那mathematical optimization, automatic differentiation 和多种语言只有几个语言。
在其核心,制作程序更快的艺术将迭代两个简单的步骤。
  1. Find out exactly what is slow.
  2. Try something different to see if it's any faster while still giving the right answer
As you might imagine, there's a world of complexity hidden in the above. 'Trying something different' could mean using language features to optimise the algorithm I've written, trying out parallelisation or even switching to a completely different algorithm althogether. I usually add an additional constraint to point 2 which is 'expend as little effort as possible' because I am fundamentally lazy.
Matlab的探查器,哪个 最近收到了界面大修 to provide flame graph views of code execution time, helps take care of the first point. Exploring the profiler would provide a rich seam of bloggable content as would exploring all of the algorithmic performance enhancements we make from release to release.
Thanks to MATLAB执行引擎 (以前讨论过这个博客 这里 and 这里 ) 和 各种其他优化 ,现代MATLAB比以往任何时候都更快,如下图所示,我们不断改进该领域。
当然,我们有时我们需要优化我们的速度代码,并且我想分享最近在内部讨论的一个例子。

Don't Fear the Looper

灵感来自这篇文章 ,同事在Matlab中写下以下内容来计算素数。
typemycountprimes.m.
%计数有多少次近摄的次次次次初步函数res = mycountprimes(max_n)res = 0;对于i = 2:max_n,如果myisprime(i)res = res + 1;结束端末端%函数检查n是prime函数结果= myisprime(n)如果n <= 1结果= false;返回端Q =楼层(SQRT(N));对于i = 2:q如果mod(n,i)== 0结果= false;返回结束最终结果= true;结尾
numprimes = mycountprimes(1000000)
numprimes = 78498.
This code is very heavy on loops - exactly the sort of thing I was trained to avoid in my early MATLAB days but thanks to the Execution Engine, it's not too bad. (See 最近的帖子更多的是循环速度 在Matlab)
f = @()mycountprimes(1000000);
OriginalSpeed = timeit(f)
OriginalSpeed = 3.7878.
即便如此,我们也可以尝试加快这段特定的代码。我们可以利用循环的每一次迭代的事实 mycountpimes. 是独立的,并使用并行 parloop 。This is easy to do but might be heavy on computational resources. Once you realise you can leverage par ,试图使您的问题抛出尽可能多的CPU核心让您的问题变得令人诱人。这几天我最喜欢做这件事的方式是发射一个 Cloud Center cluster 并升起核心计数,就像有道理一样高。

Changingmodtofloor

探查器表明我在线中花了很多时间
如果mod(n,i)== 0
所以一种方法是问 'Can we rewrite that line to be any faster?' 。这并不总是很明显哪种替代方案可能工作,所以我常常简单地弄清楚不同的方式来做一些事情并试一试。写表达的另一种方法 mod(n,i)== 0 floor(n/i) == n/i 事实证明要快得多!我这样做了 floorMyCountPrimes 功能,检查它给出了正确的结果,并证明它确实更快。
numPrimes = floorMyCountPrimes(1000000)
numprimes = 78498.
f = @()floormycountprimes(1000000);
侧板= Timeit(F)
侧坡= 0.5296.

Trying Integer Types for an Integer Algorithm

默认情况下,MATLAB中的所有数字都是浮点类型 双倍的 这对于大量数值计算非常重要,但在这里我们正在处理纯粹的整数算法。它有时可以支付股息来使这个明确在代码中,这是其中一个时代!事实证明 mod 函数必须在处理浮点数时进行更多的工作,而不是严格为此特定示例。作为程序员,我知道我的算法基于完全整数,所以我可以切换到使用 uint32 对于一些变量。也就是说,我改变了主循环 myisprime. to
q =楼层(SQRT(N));
n = UINT32(n);
为了i = uint32(2):q
如果mod(n,i)== 0
result = false;
return
结尾
结尾
This allows MATLAB to use a specialised and faster version of the mod 功能函数现在知道我们正在处理整数而不是双打。
我这样做了 uint32MyIsPrime function defined at the end of this post and called it via uint32 mycountpimes。 首先让我们检查结果与原始功能一致,然后时间
numprimes = uint32mycountprimes(1000000)
numprimes = 78498.
f = @()uint32mycountprimes(1000000);
uint32Speed = timeit(f)
uint32speed = 0.5660.
使用这种方法或基于落地的方法,我们实现了相对较少的工作和技术的显着性能提升 par parallelisation are still on the table if we need even more speed.

Matlab作为算法的存储库

Finally, we could recognise that the fundamental algorithm we are using to count primes is not optimal and use a different one instead. Taking this to the extreme, we could remember that MATLAB is a repository of numerical algorithms as well as a programming language and so find the one that directly solves the problem we want to solve rather than rolling our own. A quick search of the documentation reveals the 素瓜 返回次数阵列的函数。我们生成此数组并按如下方式计算元素。
numPrimes = numel(primes(1000000))
numprimes = 78498.
f = @()numel(primes(1000000));
builtinSpeed = timeit(f)
BULLESEDPEED = 0.0029
More than fast enough for my purposes!

系统信息

如果您在自己的机器上运行此脚本,则您几乎肯定会对我提供不同的结果。使用 版本 功能和优秀 CPU Info by Ben Tordoff, here are the details of my machine and MATLAB version.
版本info = ['matlab发布r'版本('-release'
版本info ='Matlab发布R2021A'
cpuinfo()
ANS =.结构与字段:
Name: 'Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz' Clock: '2112 MHz' Cache: '1024 KB' NumProcessors: 4 OSType: 'Windows' OSVersion: 'Microsoft Windows 10 Pro'

你需要速度?

Matlab比以往任何时候都快,将来会继续变得更快。这将与改进的硬件相结合,将帮助您回答比以往更大更复杂的问题。但是,我们总会有更多的地方,我们需要更多。您认为Matlab加速技术有用,您希望在Matlab更快的是什么?让我们知道 这里
Helper functions
function结果= floormyisprime(n)
%函数检查n是素数
如果n <= 1
result = false;
return
结尾
q =楼层(SQRT(N));
为了i = 2:q
如果floor(n/i) == n/i
result = false;
return
结尾
结尾
结果=真;
结尾
functionres = floormycountprimes(max_n)
%计算有多少个质数和萤火虫ing max_n
res = 0;
为了i = 2:max_n
如果floormyisprime(i)
Res = Res + 1;
结尾
结尾
结尾
function结果= uint32myisprime(n)
%函数检查n是素数
如果n <= 1
result = false;
return
结尾
q =楼层(SQRT(N));
n = UINT32(n);
为了i = uint32(2):q
如果mod(n,i)== 0
result = false;
return
结尾
结尾
结果=真;
结尾
functionRES = UINT32MYCOUNTPRIMES(MAX_N)
%计算有多少个质数和萤火虫ing max_n
res = 0;
为了i = 2:max_n
如果Uint32myisprime(i)
Res = Res + 1;
结尾
结尾
结尾
Copyright 2021 The MathWorks, Inc.
|

Comments

To leave a comment, please click这里to sign in to your MathWorks Account or create a new one.