Use Automatic Differentiation In Deep Learning Toolbox
Custom Training and Calculations Using Automatic Differentiation
Automatic differentiation makes it easier to create custom training loops, custom layers, and other deep learning customizations.
Generally, the simplest way to customize deep learning training is to create a dlnetwork object. Include the layers you want in the network. Then perform training in a custom loop by using some sort of gradient descent, where the gradient is the gradient of the objective function. The objective function can be classification error, cross-entropy, or any other relevant scalar function of the network weights. See List of Functions with dlarray Support.
This example is a high-level version of a custom training loop. Here, f is the objective function, such as loss, and g is the gradient of the objective function with respect to the weights in the network net. The update function represents some type of gradient descent.
% High-level training loop
n = 1;
while (n < nmax)
    [f,g] = dlfeval(@model,net,X,T);
    net = update(net,g);
    n = n + 1;
end
You call dlfeval to compute the numeric value of the objective and gradient. To enable the automatic computation of the gradient, the data X must be a dlarray.
X = dlarray(X);
The objective function has a dlgradient call to calculate the gradient. The dlgradient call must be inside of the function that dlfeval evaluates.
function [f,g] = model(net,X,T)
    % Calculate objective using supported functions for dlarray
    Y = forward(net,X);
    f = fcnvalue(Y,T);                % crossentropy or similar
    g = dlgradient(f,net.Learnables); % Automatic gradient
end
For an example using a dlnetwork with a dlfeval-dlgradient-dlarray syntax and a custom training loop, see Train Network Using Custom Training Loop. For further details on custom training using automatic differentiation, see Define Custom Training Loops, Loss Functions, and Networks.
Use dlgradient and dlfeval Together for Automatic Differentiation
To use automatic differentiation, you must call dlgradient inside a function and evaluate the function using dlfeval. Represent the point where you take a derivative as a dlarray object, which manages the data structures and enables tracing of evaluation. For example, the Rosenbrock function is a common test function for optimization.
function [f,grad] = rosenbrock(x)
    f = 100*(x(2) - x(1).^2).^2 + (1 - x(1)).^2;
    grad = dlgradient(f,x);
end
Calculate the value and gradient of the Rosenbrock function at the point x0 = [-1,2]. To enable automatic differentiation in the Rosenbrock function, pass x0 as a dlarray.
x0 = dlarray([-1,2]);
[fval,gradval] = dlfeval(@rosenbrock,x0)
fval =
  1x1 dlarray
   104

gradval =
  1x2 dlarray
   396   200
For an example using automatic differentiation, seeTrain Network Using Custom Training Loop.
Derivative Trace
To evaluate a gradient numerically, a dlarray constructs a data structure for reverse mode differentiation, as described in Automatic Differentiation Background. This data structure is the trace of the derivative computation. Keep in mind these guidelines when using automatic differentiation and the derivative trace:
Do not introduce a new dlarray inside of an objective function calculation and attempt to differentiate with respect to that object. For example:

function [dy,dy1] = fun(x1)
    x2 = dlarray(0);
    y = x1 + x2;
    dy = dlgradient(y,x2);  % Error: x2 is untraced
    dy1 = dlgradient(y,x1); % No error even though y has an untraced portion
end
Do not use extractdata with a traced argument. Doing so breaks the tracing. For example:

fun = @(x)dlgradient(x + atan(extractdata(x)),x);
% Gradient for any point is 1 due to the leading 'x' term in fun.
dlfeval(fun,dlarray(2.5))
ans =
  1x1 dlarray
   1
However, you can use extractdata to introduce a new independent variable from a dependent one.

When working in parallel, moving traced dlarray objects between the client and workers breaks the tracing. The traced dlarray object is saved on the worker and loaded in the client as an untraced dlarray object. To avoid breaking tracing when working in parallel, compute all required gradients on the worker and then combine the gradients on the client. For an example, see Train Network in Parallel with Custom Training Loop.
Use only supported functions. For a list of supported functions, seeList of Functions with dlarray Support. To use an unsupported functionf, try to implementfusing supported functions.
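The last guideline can be illustrated with a short sketch. Suppose, hypothetically, that a softsign function were not in the supported list; because abs, addition, and elementwise division are supported for dlarray, you can compose it from those primitives so the derivative trace stays intact. (The softsign example and variable names here are illustrative assumptions, not from the original text.)

```matlab
% Sketch: if a function f is unsupported for dlarray, compose it from
% supported primitives so automatic differentiation still works.
% softsign(x) = x/(1+|x|), built from abs, +, and ./ (dlarray-supported).
softsign = @(x) x ./ (1 + abs(x));
g = dlfeval(@(t) dlgradient(softsign(t),t), dlarray(2))
```

Because every operation in the composition is traced, dlgradient can propagate derivatives through it like any built-in supported function.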
Characteristics of Automatic Derivatives
You can evaluate gradients using automatic differentiation only for scalar-valued functions. Intermediate calculations can have any number of variables, but the final function value must be scalar. If you need to take derivatives of a vector-valued function, take derivatives of one component at a time. In this case, consider setting the dlgradient 'RetainData' name-value pair argument to true.

A call to dlgradient evaluates derivatives at a particular point. The software generally makes an arbitrary choice for the value of a derivative when there is no theoretical value. For example, the relu function, relu(x) = max(x,0), is not differentiable at x = 0. However, dlgradient returns a value for the derivative.

x = dlarray(0);
y = dlfeval(@(t)dlgradient(relu(t),t),x)

y =
  1x1 dlarray
   0

The value at the nearby point eps is different.

x = dlarray(eps);
y = dlfeval(@(t)dlgradient(relu(t),t),x)

y =
  1x1 dlarray
   1
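As a sketch of the one-component-at-a-time approach above, you can call dlgradient once per scalar component and keep the trace alive between calls with 'RetainData'. The function and variable names below are illustrative assumptions, not from the original text.

```matlab
% Sketch: differentiate a vector-valued computation one component at a time.
% 'RetainData',true preserves the trace so the second dlgradient call is valid.
function [g1,g2] = componentGrads(x)
    y1 = x(1).^2 + x(2);  % first scalar component
    y2 = x(1).*x(2);      % second scalar component
    g1 = dlgradient(y1,x,'RetainData',true);
    g2 = dlgradient(y2,x);
end
```

Evaluate it through dlfeval as usual, for example [g1,g2] = dlfeval(@componentGrads,dlarray([3,4])); each output is the gradient of one component with respect to x.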
See Also
dlarray | dlgradient | dlfeval | dlnetwork
Related Topics
- Train Generative Adversarial Network (GAN)
- Define Custom Training Loops, Loss Functions, and Networks
- Train Network Using Custom Training Loop
- Specify Training Options in Custom Training Loop
- Define Model Loss Function for Custom Training Loop
- Train Network Using Model Function
- Initialize Learnable Parameters for Model Function
- List of Functions with dlarray Support