
Scale Up Deep Learning in Parallel, on GPUs, and in the Cloud

Training deep networks is computationally intensive and can take many hours of computing time; however, neural networks are inherently parallel algorithms. You can take advantage of this parallelism by running in parallel using high-performance GPUs and computer clusters.

It is recommended to train using a GPU or multiple GPUs. Only use a single CPU or multiple CPUs if you do not have a GPU. CPUs are normally much slower than GPUs for both training and inference. Running on a single GPU typically offers much better performance than running on multiple CPU cores.

If you do not have a suitable GPU, you can rent high-performance GPUs and clusters in the cloud. For more information on how to access MATLAB® for deep learning in the cloud, see Deep Learning in the Cloud.

Using a GPU or parallel options requires Parallel Computing Toolbox™. Using a GPU also requires a supported GPU device. For information on supported devices, see GPU Support by Release (Parallel Computing Toolbox). Using a remote cluster also requires MATLAB Parallel Server™.

Tip

For trainNetwork workflows, GPU support is automatic. By default, the trainNetwork function uses a GPU if one is available. If you have access to a machine with multiple GPUs, specify the ExecutionEnvironment training option as "multi-gpu".

To run custom training workflows, including dlnetwork workflows, on the GPU, use minibatchqueue to automatically convert data to gpuArray objects.
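
For example, here is a minimal sketch of both approaches. It assumes that the training data (XTrain, YTrain), the layer array layers, and the datastore dsX already exist in your workspace; these names are placeholders.

    % trainNetwork workflow: train on all available local GPUs.
    options = trainingOptions("sgdm", ...
        "ExecutionEnvironment","multi-gpu", ...
        "MiniBatchSize",128);
    net = trainNetwork(XTrain,YTrain,layers,options);

    % Custom training workflow: let minibatchqueue move the data to the GPU.
    % dsX is a placeholder datastore that returns one variable (the predictors) per read.
    mbq = minibatchqueue(dsX, ...
        "MiniBatchSize",128, ...
        "MiniBatchFormat","SSCB", ...       % spatial, spatial, channel, batch
        "OutputEnvironment","gpu");         % mini-batches are returned as gpuArray-backed dlarray objects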

You can use parallel resources to scale up deep learning for a single network. You can also train multiple networks at the same time. The following sections show the options available for deep learning in parallel in MATLAB:

Note

If you run MATLAB on a single remote machine, for example, a cloud machine that you connect to via SSH or remote desktop protocol, then follow the steps for local resources. For more information on connecting to cloud resources, see Deep Learning in the Cloud.

Train Single Network in Parallel

Use Local Resources to Train Single Network in Parallel

The following options are available for training and inference with a single network on your local workstation.

Single CPU

  • trainNetwork workflows: Automatic if no GPU is available. Training using a single CPU is not recommended.

  • Custom training workflows: Training using a single CPU is not recommended.

  • Required products: MATLAB, Deep Learning Toolbox™

Multiple CPU cores

  • trainNetwork workflows: Training using multiple CPU cores is not recommended if you have access to a GPU.

  • Custom training workflows: Training using multiple CPU cores is not recommended if you have access to a GPU.

  • Required products: MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox

Single GPU

  • trainNetwork workflows: Automatic. By default, training and inference run on the GPU if one is available. Alternatively, specify the ExecutionEnvironment training option as "gpu".

  • Custom training workflows: Use minibatchqueue to automatically convert data to gpuArray objects. For more information, see Run Custom Training Loops on a GPU and in Parallel. For an example, see Train Network Using Custom Training Loop.

  • Required products: MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox

Multiple GPUs

  • trainNetwork workflows: Specify the ExecutionEnvironment training option as "multi-gpu". For an example, see Train Network Using Automatic Multi-GPU Support.

  • Custom training workflows: Start a local parallel pool with as many workers as available GPUs. For more information, see Deep Learning with MATLAB on Multiple GPUs. Use parpool to execute training or inference with a portion of a mini-batch on each worker. Convert each partial mini-batch of data to gpuArray objects. For training, aggregate the gradients, loss, and state parameters after each iteration. For more information, see Run Custom Training Loops on a GPU and in Parallel. For an example, see Train Network in Parallel with Custom Training Loop, and set the executionEnvironment variable to "auto" or "gpu". A minimal sketch of this setup follows the table.

  • Required products: MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox
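
The multiple-GPU custom training setup above can be started with a minimal sketch like the following, assuming your machine has more than one supported GPU.

    % Start a local parallel pool with one worker per available GPU.
    numGPUs = gpuDeviceCount("available");   % in older releases, use gpuDeviceCount with no arguments
    pool = parpool("local",numGPUs);

    % Inside the custom training loop, each worker then processes a portion of every
    % mini-batch on its own GPU, and you aggregate the gradients, loss, and state
    % after each iteration (see Run Custom Training Loops on a GPU and in Parallel).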

Use Remote Cluster Resources to Train Single Network in Parallel

The following options are available for training and inference with a single network on a remote cluster.

Multiple CPUs

  • trainNetwork workflows: Training using multiple CPU cores is not recommended if you have access to a GPU.

  • Custom training workflows: Training using multiple CPU cores is not recommended if you have access to a GPU.

  • Required products: MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox, MATLAB Parallel Server

Multiple GPUs

  • trainNetwork workflows: Set the desired cluster as your default cluster profile. For more information, see Manage Cluster Profiles and Automatic Pool Creation. Specify the ExecutionEnvironment training option as "parallel". For an example, see Train Network in the Cloud Using Automatic Parallel Support. A minimal sketch follows the table.

  • Custom training workflows: Start a parallel pool in the desired cluster with as many workers as available GPUs. For more information, see Deep Learning with MATLAB on Multiple GPUs. Use parpool to execute training or inference with a portion of a mini-batch on each worker. Convert each partial mini-batch of data to gpuArray objects. For training, aggregate the gradients, loss, and state parameters after each iteration. For more information, see Run Custom Training Loops on a GPU and in Parallel. For an example, see Train Network in Parallel with Custom Training Loop, and set the executionEnvironment variable to "auto" or "gpu".

  • Required products: MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox, MATLAB Parallel Server
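
For example, here is a minimal sketch of the trainNetwork workflow on a remote cluster. It assumes a cluster profile named "MyCluster" has already been created; the profile name, data, and layers are placeholders.

    % Make the remote cluster the default so that trainNetwork can open a pool on it automatically.
    parallel.defaultClusterProfile("MyCluster");

    options = trainingOptions("sgdm", ...
        "ExecutionEnvironment","parallel", ...   % train using a parallel pool on the default cluster
        "MiniBatchSize",256);
    net = trainNetwork(XTrain,YTrain,layers,options);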

Use Deep Network Designer and Experiment Manager to Train Single Network in Parallel

You can train a single network in parallel using Deep Network Designer. You can train using local resources or a remote cluster.

  • To train locally using multiple GPUs, set the ExecutionEnvironment option to multi-gpu in the Training Options dialog.

  • To train using a remote cluster, set the ExecutionEnvironment option to parallel in the Training Options dialog. If there is no current parallel pool, the software starts one using the default cluster profile. If the pool has access to GPUs, then only workers with a unique GPU perform training computation. If the pool does not have GPUs, then training takes place on all available CPU workers instead.

You can use Experiment Manager to run a single trial using multiple parallel workers. For more information, see Use Experiment Manager to Train Networks in Parallel.

Train Multiple Networks in Parallel

Use Local or Remote Cluster Resources to Train Multiple Networks in Parallel

To train multiple networks in parallel, train each network on a different parallel worker. You can modify the network or training parameters on each worker to perform parameter sweeps in parallel.

Use parfor (Parallel Computing Toolbox) or parfeval (Parallel Computing Toolbox) to train a single network on each worker. To run in the background without blocking your local MATLAB, use parfeval. You can plot results using the OutputFcn training option.
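
For example, here is a minimal sketch of a parallel parameter sweep with parfor. The data (XTrain, YTrain) and the layer array layers are placeholders; each worker trains one network with a different mini-batch size.

    miniBatchSizes = [64 128 256 512];
    trainedNets = cell(1,numel(miniBatchSizes));

    parfor i = 1:numel(miniBatchSizes)
        options = trainingOptions("sgdm", ...
            "ExecutionEnvironment","cpu", ...   % use "gpu" instead if every worker has its own GPU
            "MiniBatchSize",miniBatchSizes(i));
        trainedNets{i} = trainNetwork(XTrain,YTrain,layers,options);
    end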

You can run locally or use a remote cluster. Using a remote cluster requires MATLAB Parallel Server.

Multiple CPUs

  • trainNetwork workflows: Set the desired cluster as your default cluster profile. For more information, see Manage Cluster Profiles and Automatic Pool Creation. Use parfor or parfeval to simultaneously execute training or inference on each worker. Specify the ExecutionEnvironment training option as "cpu" for each network. A parfeval-based sketch follows the table.

  • Custom training workflows: Set the desired cluster as your default cluster profile. For more information, see Manage Cluster Profiles and Automatic Pool Creation. Use parfor or parfeval to simultaneously execute training or inference on each worker. For more information, see Run Custom Training Loops on a GPU and in Parallel.

  • Required products: MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox, (optional) MATLAB Parallel Server

Multiple GPUs

  • trainNetwork workflows: Start a parallel pool in the desired cluster with as many workers as available GPUs. For more information, see Deep Learning with MATLAB on Multiple GPUs. Use parfor or parfeval to simultaneously execute a network on each worker. Specify the ExecutionEnvironment training option as "gpu" for each network.

  • Custom training workflows: Start a parallel pool in the desired cluster with as many workers as available GPUs. For more information, see Deep Learning with MATLAB on Multiple GPUs. Use parfor or parfeval to simultaneously execute training or inference on each worker. For more information, see Run Custom Training Loops on a GPU and in Parallel. Convert each mini-batch of data to gpuArray. Use minibatchqueue and set the OutputEnvironment property to "gpu" to automatically convert data to gpuArray objects.

  • Required products: MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox, (optional) MATLAB Parallel Server
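
To train multiple networks in the background instead of blocking MATLAB, you can use parfeval, as in this minimal sketch. XTrain, YTrain, and layers are placeholders, and an open parallel pool is assumed.

    miniBatchSizes = [64 128 256 512];
    f(1:numel(miniBatchSizes)) = parallel.FevalFuture;   % preallocate the array of futures

    for i = 1:numel(miniBatchSizes)
        options = trainingOptions("sgdm", ...
            "ExecutionEnvironment","cpu", ...
            "MiniBatchSize",miniBatchSizes(i));
        f(i) = parfeval(@trainNetwork,1,XTrain,YTrain,layers,options);
    end

    % Collect each trained network when it is ready; MATLAB stays responsive in the meantime.
    trainedNets = cell(1,numel(f));
    for i = 1:numel(f)
        trainedNets{i} = fetchOutputs(f(i));
    end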

Use Experiment Manager to Train Multiple Networks in Parallel

You can use Experiment Manager to run trials on multiple parallel workers simultaneously. Set up your parallel environment and enable the Use Parallel option before running your experiment. Experiment Manager runs as many simultaneous trials as there are workers in your parallel pool. For more information, see Use Experiment Manager to Train Networks in Parallel.

Batch Deep Learning

You can offload deep learning computations to run in the background using the batch (Parallel Computing Toolbox) function. This means that you can continue using MATLAB while your computation runs in the background, or you can close your client MATLAB and fetch results later.

You can run batch jobs locally or in a remote cluster. To offload your deep learning computations, use batch to submit a script or function that runs in the cluster. You can perform any kind of deep learning computation as a batch job, including parallel computations. For an example, see Send Deep Learning Batch Job to Cluster.

To run in parallel, use a script or function that contains the same code that you would run in parallel locally or in a cluster. For example, your script or function can run trainNetwork using the "ExecutionEnvironment","parallel" option, or run a custom training loop in parallel. Use batch to submit the script or function to the cluster, and use the Pool option to specify the number of workers you want to use. For more information on running parallel computations with batch, see Run Batch Parallel Jobs (Parallel Computing Toolbox).
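
For example, here is a minimal sketch that submits a script as a batch job with a pool. The cluster profile "MyCluster" and the script file trainNetworksScript.m are placeholder names.

    c = parcluster("MyCluster");
    job = batch(c,"trainNetworksScript", ...
        "Pool",3);    % 3 pool workers, plus 1 worker that runs the script itself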

When you run deep learning computations on multiple networks, it is recommended that you submit a single batch job for each network. Doing so avoids the overhead of starting a parallel pool in the cluster and allows you to use the job monitor to observe the progress of each network computation individually.

You can submit multiple batch jobs. If the submitted jobs require more workers than are currently available in the cluster, then later jobs are queued until earlier jobs have finished. Queued jobs start when enough workers are available to run the job.

The default search paths of the workers might not be the same as those of your client MATLAB. To ensure that workers in the cluster have access to the needed files, such as code files, data files, or model files, specify paths to add to the workers using the AdditionalPaths option.

To retrieve results after the job is finished, use the fetchOutputs (Parallel Computing Toolbox) function. fetchOutputs retrieves all variables in the batch worker workspace. When you submit batch jobs as a script, by default, workspace variables are copied from the client to workers. To avoid recursion of workspace variables, submit batch jobs as functions instead of as scripts.
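
A minimal sketch of the function-based pattern, where trainNetworksFcn is a placeholder function that trains and returns one network:

    c = parcluster("MyCluster");
    job = batch(c,@trainNetworksFcn,1,{XTrain,YTrain,layers,options},"Pool",3);

    wait(job);                   % optional: block until the job finishes
    out = fetchOutputs(job);     % cell array with one element per function output
    net = out{1};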

You can use the diary (Parallel Computing Toolbox) function to capture command line output while running batch jobs. This can be useful when executing the trainNetwork function with the Verbose option set to true.
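
For example, after submitting a batch job you can display its captured output:

    diary(job)   % job is a batch job object returned by the batch function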

Manage Cluster Profiles and Automatic Pool Creation

Parallel Computing Toolbox comes pre-configured with the cluster profile local for running parallel code on your local desktop machine. By default, MATLAB starts all parallel pools using the local cluster profile. If you want to run code on a remote cluster, you must start a parallel pool using the remote cluster profile. You can manage cluster profiles using the Cluster Profile Manager. For more information about managing cluster profiles, see Discover Clusters and Use Cluster Profiles (Parallel Computing Toolbox).

Some functions, including trainNetwork, predict, classify, parfor, and parfeval, can automatically start a parallel pool. To take advantage of automatic parallel pool creation, set your desired cluster as the default cluster profile in the Cluster Profile Manager. Alternatively, you can create the pool manually and specify the desired cluster resource when you create the pool.

If you want to use multiple GPUs in a remote cluster to train multiple networks in parallel or to run custom training loops, best practice is to manually start a parallel pool in the desired cluster with as many workers as available GPUs. For more information, see Deep Learning with MATLAB on Multiple GPUs.
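
A minimal sketch of this best practice, assuming a cluster profile named "MyCluster" whose workers have access to eight GPUs in total (both the profile name and the worker count are placeholders):

    c = parcluster("MyCluster");
    pool = parpool(c,8);   % one worker per available GPU in the cluster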

Deep Learning Precision

For best performance, it is recommended to use a GPU for all deep learning workflows. Because single-precision and double-precision performance of GPUs can differ substantially, it is important to know in which precision computations are performed. Typically, GPUs offer much better performance for calculations in single precision.

If you only use a GPU for deep learning, then single-precision performance is one of the most important characteristics of a GPU. If you also use a GPU for other computations using Parallel Computing Toolbox, then high double-precision performance is important. This is because many functions in MATLAB use double-precision arithmetic by default. For more information, seeImprove Performance Using Single Precision Calculations(Parallel Computing Toolbox)

When you train a network using the trainNetwork function, or when you use prediction or validation functions with DAGNetwork and SeriesNetwork objects, the software performs these computations using single-precision, floating-point arithmetic. Functions for training, prediction, and validation include trainNetwork, predict, classify, and activations. The software uses single-precision arithmetic when you train networks using both CPUs and GPUs.

For custom training workflows, it is recommended to convert data to single precision for training and inference. If you use minibatchqueue to manage mini-batches, your data is converted to single precision by default.
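
For example, a minimal sketch of working in single precision in a custom training workflow, with dsX as a placeholder datastore:

    % Cast in-memory data explicitly.
    X = single(XTrain);
    dlX = dlarray(X,"SSCB");

    % minibatchqueue casts its outputs to single by default; the OutputCast property controls this.
    mbq = minibatchqueue(dsX, ...
        "MiniBatchFormat","SSCB", ...
        "OutputCast","single");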

See Also


Related Topics