Scale Up Deep Learning in Parallel, on GPUs, and in the Cloud
Training deep networks is computationally intensive and can take many hours of computing time; however, neural networks are inherently parallel algorithms. You can take advantage of this parallelism by using high-performance GPUs and compute clusters.
It is recommended to train using a GPU or multiple GPUs. Use a single CPU or multiple CPUs only if you do not have a GPU. CPUs are normally much slower than GPUs for both training and inference. Running on a single GPU typically offers much better performance than running on multiple CPU cores.
If you do not have a suitable GPU, you can rent high-performance GPUs and clusters in the cloud. For more information on how to access MATLAB® for deep learning in the cloud, see Deep Learning in the Cloud.
Using a GPU or training in parallel requires Parallel Computing Toolbox™. Using a GPU also requires a supported GPU device. For information on supported devices, see GPU Support by Release (Parallel Computing Toolbox). Using a remote cluster also requires MATLAB Parallel Server™.
Tip

For trainNetwork workflows, GPU support is automatic. By default, the trainNetwork function uses a GPU if one is available. If you have access to a machine with multiple GPUs, specify the ExecutionEnvironment training option as "multi-gpu".

To run custom training workflows, including dlnetwork workflows, on a GPU, use minibatchqueue to automatically convert data to gpuArray objects.
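As a minimal sketch of the tip above, the following code enables multi-GPU training with trainNetwork. The variables XTrain, YTrain, and layers are placeholders for your own training data and network architecture.

```matlab
% Sketch: let trainNetwork use all available local GPUs.
% XTrain, YTrain, and layers are placeholders for your own data
% and network architecture.
options = trainingOptions("sgdm", ...
    "MiniBatchSize",256, ...
    "ExecutionEnvironment","multi-gpu");  % use all local GPUs

net = trainNetwork(XTrain,YTrain,layers,options);
```

With a single GPU, you can omit the ExecutionEnvironment option entirely; trainNetwork uses the GPU automatically when one is available.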
You can use parallel resources to scale up deep learning for a single network. You can also train multiple networks at the same time. The following sections show the available options for deep learning in parallel in MATLAB:
Note
If you run MATLAB on a single remote machine, for example, a cloud machine that you connect to via SSH or remote desktop protocol, follow the steps for local resources. For more information on connecting to cloud resources, see Deep Learning in the Cloud.
Train Single Network in Parallel
Use Local Resources to Train Single Network in Parallel
The following table shows the available options for training and inference with a single network on your local workstation.
Resource | trainNetwork Workflows | Custom Training Workflows | Required Products
---|---|---|---
Single CPU | Automatic if no GPU is available. Training using a single CPU is not recommended. | Training using a single CPU is not recommended. | MATLAB, Deep Learning Toolbox
Multiple CPU cores | Training using multiple CPU cores is not recommended if you have access to a GPU. | Training using multiple CPU cores is not recommended if you have access to a GPU. | MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox
Single GPU | Automatic. By default, training and inference run on the GPU if one is available. Alternatively, specify the ExecutionEnvironment training option as "gpu". | Use minibatchqueue to automatically convert data to gpuArray objects. | MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox
Multiple GPUs | Specify the ExecutionEnvironment training option as "multi-gpu". For an example, see Train Network Using Automatic Multi-GPU Support. | Start a local parallel pool with as many workers as available GPUs. For more information, see Deep Learning with MATLAB on Multiple GPUs. For an example, see Train Network in Parallel with Custom Training Loop. | MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox
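For the custom-training rows above, a minimal sketch of moving data onto a GPU with minibatchqueue might look like the following. The datastore ds is a placeholder for your own data source.

```matlab
% Sketch: custom training loop on a single GPU. ds is a placeholder
% datastore. When a supported GPU is available, minibatchqueue returns
% each mini-batch as a gpuArray by default.
mbq = minibatchqueue(ds, ...
    "MiniBatchSize",128, ...
    "MiniBatchFormat","SSCB");   % spatial, spatial, channel, batch

while hasdata(mbq)
    X = next(mbq);               % X is a formatted dlarray on the GPU
    % ... evaluate gradients with dlfeval and update the network here ...
end
```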
Use Remote Cluster Resources to Train Single Network in Parallel
The following table shows the available options for training and inference with a single network on a remote cluster.
Resource | trainNetwork Workflows | Custom Training Workflows | Required Products
---|---|---|---
Multiple CPUs | Training using multiple CPU cores is not recommended if you have access to a GPU. | Training using multiple CPU cores is not recommended if you have access to a GPU. | MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox, MATLAB Parallel Server
Multiple GPUs | Specify the desired cluster as your default cluster profile. For more information, see Manage Cluster Profiles and Automatic Pool Creation. Specify the ExecutionEnvironment training option as "parallel". For an example, see Train Network in the Cloud Using Automatic Parallel Support. | Start a parallel pool in the desired cluster with as many workers as available GPUs. For more information, see Deep Learning with MATLAB on Multiple GPUs. For an example, see Train Network in Parallel with Custom Training Loop. | MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox, MATLAB Parallel Server
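As a sketch of the trainNetwork row above, the following sets a cluster as the default profile and trains in parallel. "MyCluster" is a placeholder profile name, and XTrain, YTrain, and layers are placeholders for your own data and network.

```matlab
% Sketch: train on a remote cluster. "MyCluster" is a placeholder
% for a cluster profile you have already configured.
parallel.defaultClusterProfile("MyCluster");   % make it the default

options = trainingOptions("sgdm", ...
    "ExecutionEnvironment","parallel");        % pool starts automatically
net = trainNetwork(XTrain,YTrain,layers,options);
```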
Use Deep Network Designer and Experiment Manager to Train Single Network in Parallel
You can train a single network in parallel using Deep Network Designer. You can train using local resources or a remote cluster.

- To train locally using multiple GPUs, set the ExecutionEnvironment option to multi-gpu in the Training Options dialog.
- To train using a remote cluster, set the ExecutionEnvironment option to parallel in the Training Options dialog. If there is no current parallel pool, the software starts one using the default cluster profile. If the pool has access to GPUs, then only workers with a unique GPU perform training computation. If the pool does not have GPUs, then training takes place on all available CPU workers instead.
You can use Experiment Manager to run a single trial using multiple parallel workers. For more information, see Use Experiment Manager to Train Networks in Parallel.
Train Multiple Networks in Parallel
Use Local or Remote Cluster Resources to Train Multiple Networks in Parallel
To train multiple networks in parallel, train each network on a different parallel worker. You can modify the network or training parameters on each worker to perform parameter sweeps in parallel.
Use parfor (Parallel Computing Toolbox) or parfeval (Parallel Computing Toolbox) to train a single network on each worker. To run in the background without blocking your local MATLAB, use parfeval. You can plot results using the OutputFcn training option.
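The parfeval approach described above can be sketched as a small parameter sweep. XTrain, YTrain, and layers are placeholders for your own data and network; only the mini-batch size is swept here.

```matlab
% Sketch: train one network per worker in the background with parfeval.
miniBatchSizes = [64 128 256];
for i = 1:numel(miniBatchSizes)
    opts = trainingOptions("sgdm", ...
        "MiniBatchSize",miniBatchSizes(i), ...
        "ExecutionEnvironment","auto");
    % parfeval returns immediately; training runs on a pool worker
    f(i) = parfeval(@trainNetwork,1,XTrain,YTrain,layers,opts);
end

% Collect each trained network as it finishes
trainedNets = cell(1,numel(miniBatchSizes));
for i = 1:numel(miniBatchSizes)
    [idx,net] = fetchNext(f);
    trainedNets{idx} = net;
end
```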
You can run locally or using a remote cluster. Using a remote cluster requires MATLAB Parallel Server.
Resource | trainNetwork Workflows | Custom Training Workflows | Required Products
---|---|---|---
Multiple CPUs | Specify the desired cluster as your default cluster profile. For more information, see Manage Cluster Profiles and Automatic Pool Creation. Use parfor or parfeval to train one network on each worker. For examples, see Use parfor to Train Multiple Deep Learning Networks and Use parfeval to Train Multiple Deep Learning Networks. | Specify the desired cluster as your default cluster profile. For more information, see Manage Cluster Profiles and Automatic Pool Creation. Use parfor or parfeval to train one network on each worker. | MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox (MATLAB Parallel Server for a remote cluster)
Multiple GPUs | Start a parallel pool in the desired cluster with as many workers as available GPUs. For more information, see Deep Learning with MATLAB on Multiple GPUs. Use parfor or parfeval to train one network on each worker. For examples, see Use parfor to Train Multiple Deep Learning Networks and Use parfeval to Train Multiple Deep Learning Networks. | Start a parallel pool in the desired cluster with as many workers as available GPUs. For more information, see Deep Learning with MATLAB on Multiple GPUs. Use parfor or parfeval to train one network on each worker. Convert each mini-batch of data to gpuArray objects. | MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox (MATLAB Parallel Server for a remote cluster)
Use Experiment Manager to Train Multiple Networks in Parallel
You can use Experiment Manager to run trials on multiple parallel workers simultaneously. Set up your parallel environment and enable the Use Parallel option before running your experiment. Experiment Manager runs as many simultaneous trials as there are workers in your parallel pool. For more information, see Use Experiment Manager to Train Networks in Parallel.
Batch Deep Learning
You can offload deep learning computations to run in the background using the batch (Parallel Computing Toolbox) function. This means that you can continue using MATLAB while your computation runs in the background, or you can close your client MATLAB and fetch the results later.
You can run batch jobs locally or in a remote cluster. To offload your deep learning computations, use batch to submit a script or function that runs in the cluster. You can perform any kind of deep learning computation as a batch job, including parallel computations. For an example, see Send Deep Learning Batch Job to Cluster.
To run in parallel, use a script or function that contains the same code that you would run in parallel locally or in a cluster. For example, your script or function can run trainNetwork using the "ExecutionEnvironment","parallel" option, or run a custom training loop in parallel. Use batch to submit the script or function to the cluster, and use the Pool option to specify the number of workers you want to use. For more information on running parallel computations with batch, see Run Batch Parallel Jobs (Parallel Computing Toolbox).
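A minimal sketch of submitting such a job might look like the following. "trainMyNetwork" and "MyCluster" are placeholder names for your own training script and cluster profile.

```matlab
% Sketch: offload a training script as a batch job. "trainMyNetwork"
% is a placeholder script that calls trainNetwork with the
% "ExecutionEnvironment","parallel" option.
job = batch("trainMyNetwork", ...
    "Pool",3, ...                 % 3 workers for the parallel part
    "Profile","MyCluster");       % placeholder cluster profile name

% You can keep using (or close) your client MATLAB while the job runs.
```

Note that batch uses one additional worker to run the script itself, so this job occupies four workers in total.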
To run deep learning computations on multiple networks, it is recommended to submit a single batch job for each network. Doing so avoids the overhead of starting a parallel pool in the cluster and allows you to use the job monitor to observe the progress of each network computation individually.
You can submit multiple batch jobs. If the submitted jobs require more workers than are currently available in the cluster, then later jobs are queued until earlier jobs have finished. Queued jobs start when enough workers are available to run the job.
The default search paths of the workers might not be the same as those of your client MATLAB. To ensure that workers in the cluster have access to the needed files, such as code files, data files, or model files, specify paths to add to the workers using the AdditionalPaths option.
To retrieve results after the job is finished, use the fetchOutputs (Parallel Computing Toolbox) function. fetchOutputs retrieves all variables in the batch worker workspace. When you submit a batch job as a script, by default, workspace variables are copied from the client to the workers. To avoid recursively copying workspace variables, submit batch jobs as functions instead of as scripts.
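The function-based workflow above can be sketched as follows. "trainMyNetwork" is a placeholder function that returns a trained network, and the AdditionalPaths value is a placeholder path.

```matlab
% Sketch: submit a batch job as a function and retrieve its results.
% Submitting a function (rather than a script) avoids copying client
% workspace variables to the workers.
job = batch(@trainMyNetwork,1,{XTrain,YTrain,layers}, ...
    "AdditionalPaths","/shared/code");   % placeholder path for workers

wait(job);                % block until the job finishes (optional)
out = fetchOutputs(job);  % cell array of the function outputs
net = out{1};
```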
You can use the diary (Parallel Computing Toolbox) function to capture command line output while running batch jobs. This can be useful when executing the trainNetwork function with the Verbose option set to true.
Manage Cluster Profiles and Automatic Pool Creation
Parallel Computing Toolbox comes preconfigured with the cluster profile local for running parallel code on your local desktop machine. By default, MATLAB starts all parallel pools using the local cluster profile. If you want to run code on a remote cluster, you must start a parallel pool using the remote cluster profile. You can manage cluster profiles using the Cluster Profile Manager. For more information about managing cluster profiles, see Discover Clusters and Use Cluster Profiles (Parallel Computing Toolbox).
Some functions, including trainNetwork, predict, classify, parfor, and parfeval, can automatically start a parallel pool. To take advantage of automatic parallel pool creation, set your desired cluster as the default cluster profile in the Cluster Profile Manager. Alternatively, you can create the pool manually and specify the desired cluster resource when you create the pool.
If you want to use multiple GPUs in a remote cluster to train multiple networks in parallel or to run custom training loops, the best practice is to manually start a parallel pool in the desired cluster with as many workers as available GPUs. For more information, see Deep Learning with MATLAB on Multiple GPUs.
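The manual pool creation described above can be sketched as follows. "MyCluster" is a placeholder profile name, and the cluster is assumed here to have four GPUs available.

```matlab
% Sketch: manually start a pool on a remote cluster with one worker
% per available GPU (assumed to be 4 here).
numGPUs = 4;
pool = parpool("MyCluster",numGPUs);

% Work submitted with parfor, parfeval, or the
% "ExecutionEnvironment","parallel" option now runs on this pool.
```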
Deep Learning Precision
For best performance, it is recommended to use a GPU for all deep learning workflows. Because single-precision and double-precision performance of GPUs can differ substantially, it is important to know in which precision computations are performed. Typically, GPUs offer much better performance for calculations in single precision.
If you only use a GPU for deep learning, then single-precision performance is one of the most important characteristics of a GPU. If you also use a GPU for other computations using Parallel Computing Toolbox, then high double-precision performance is important. This is because many functions in MATLAB use double-precision arithmetic by default. For more information, see Improve Performance Using Single Precision Calculations (Parallel Computing Toolbox).
When you train a network using the trainNetwork function, or when you use prediction or validation functions with DAGNetwork and SeriesNetwork objects, the software performs these computations using single-precision, floating-point arithmetic. Functions for training, prediction, and validation include trainNetwork, predict, classify, and activations. The software uses single-precision arithmetic when you train networks using both CPUs and GPUs.
For custom training workflows, it is recommended to convert data to single precision for training and inference. If you use minibatchqueue to manage mini-batches, your data is converted to single precision by default.
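The default single-precision conversion can be made explicit with the OutputCast option of minibatchqueue. The datastore ds is a placeholder for your own data source.

```matlab
% Sketch: minibatchqueue casts mini-batches to single precision by
% default; OutputCast makes the choice explicit. ds is a placeholder
% datastore.
mbq = minibatchqueue(ds, ...
    "MiniBatchSize",128, ...
    "OutputCast","single", ...      % the default, shown for clarity
    "MiniBatchFormat","SSCB");
```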
See Also

trainingOptions | minibatchqueue | trainNetwork | Deep Network Designer | Experiment Manager
Related Topics
- Deep Learning with MATLAB on Multiple GPUs
- Deep Learning with Big Data
- Deep Learning in the Cloud
- Train Deep Learning Networks in Parallel
- Send Deep Learning Batch Job to Cluster
- Use parfeval to Train Multiple Deep Learning Networks
- Use parfor to Train Multiple Deep Learning Networks
- Upload Deep Learning Data to the Cloud
- Run Custom Training Loops on a GPU and in Parallel
- Use Experiment Manager to Train Networks in Parallel