
Choose Training Configurations for LSTM Using Bayesian Optimization

This example shows how to create a deep learning experiment to find optimal network hyperparameters and training options for long short-term memory (LSTM) networks using Bayesian optimization. In this example, you use Experiment Manager to train LSTM networks that predict the remaining useful life (RUL) of engines. The experiment uses the Turbofan Engine Degradation Simulation Data Set described in [1] (see References). For more information on processing this data set for sequence-to-sequence regression, see Sequence-to-Sequence Regression Using Deep Learning.

Bayesian optimization provides an alternative strategy to sweeping hyperparameters in an experiment. You specify a range of values for each hyperparameter and select a metric to optimize, and Experiment Manager searches for a combination of hyperparameters that optimizes your selected metric. Bayesian optimization requires Statistics and Machine Learning Toolbox™. For more information, see Tune Experiment Hyperparameters by Using Bayesian Optimization.

RUL captures how many operational cycles an engine can complete before failure. To focus on the sequence data from when the engines are close to failing, preprocess the data by clipping the responses at a specified threshold. This preprocessing operation allows the network to focus on predictor behavior close to failure by treating instances with higher RUL values as equal. For example, this figure shows the first response observation and the corresponding clipped response with a threshold of 150.
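For illustration, this minimal sketch shows the clipping operation that the setup function later applies to each training response. Here, rul is a hypothetical 1-by-T vector of RUL values for a single engine:

thr = 150;               % clipping threshold
rul(rul > thr) = thr;    % values above the threshold are treated as equal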

When you train a deep learning network, how you preprocess data, the number of layers and hidden units, and the initial learning rate in the network can affect the training behavior and performance of the network. Choosing the depth of an LSTM network involves balancing speed and accuracy. For example, deeper networks can be more accurate but take longer to train and converge [2].

By default, when you run a built-in training experiment for regression, Experiment Manager computes the loss and root mean squared error (RMSE) for each trial in your experiment. This example compares the performance of the network in each trial by using a custom metric that is specific to the problem data set. For more information on using custom metric functions, see Evaluate Deep Learning Experiments by Using Metric Functions.

Open Experiment

First, open the example. Experiment Manager loads a project with a preconfigured experiment. To open the experiment, in the Experiment Browser, double-click the name of the experiment (SequenceRegressionExperiment).

Built-in training experiments consist of a description, a table of hyperparameters, a setup function, and a collection of metric functions to evaluate the results of the experiment. Experiments that use Bayesian optimization include additional options to limit the duration of the experiment. For more information, see Configure Built-In Training Experiment.

The Description field contains a textual description of the experiment. For this example, the description is:

Sequence-to-sequence regression to predict the remaining useful life (RUL) of engines. This experiment uses Bayesian optimization to compare network performance when changing the data threshold level, LSTM layer depth, number of hidden units, and initial learning rate.

The Hyperparameters table specifies the strategy (Bayesian Optimization) and hyperparameter values to use for the experiment. For each hyperparameter, specify these options:

  • Range — Enter a two-element vector that gives the lower bound and upper bound of a real- or integer-valued hyperparameter, or a string array or cell array that lists the possible values of a categorical hyperparameter.

  • Type — Select real (real-valued hyperparameter), integer (integer-valued hyperparameter), or categorical (categorical hyperparameter).

  • Transform — Select none (no transform) or log (logarithmic transform). For log, the hyperparameter must be real or integer and positive. With this option, the hyperparameter is searched and modeled on a logarithmic scale.

When you run the experiment, Experiment Manager searches for the best combination of hyperparameters. Each trial uses a new combination of the hyperparameter values based on the results of the previous trials. This example uses these hyperparameters:

  • Threshold sets all response data above the threshold value equal to the threshold value. To prevent uniform response data, use threshold values greater than or equal to 150. To limit the set of allowable values to 150, 200, and 250, the experiment models Threshold as a categorical hyperparameter.

  • LSTMDepth indicates the number of LSTM layers used in the network. Specify this hyperparameter as an integer between 1 and 3.

  • NumHiddenUnits determines the number of hidden units, or the amount of information stored at each time step, used in the network. Increasing the number of hidden units can result in overfitting the data and in a longer training time. Decreasing the number of hidden units can result in underfitting the data. Specify this hyperparameter as an integer between 50 and 300.

  • InitialLearnRate specifies the initial learning rate used for training. If the learning rate is too low, then training takes a long time. If the learning rate is too high, then training can reach a suboptimal result or diverge. The best learning rate depends on your data as well as the network you are training. The experiment models this hyperparameter on a logarithmic scale because the range of values (0.001 to 0.1) spans several orders of magnitude. For an illustrative code analog of this search space, see the sketch after this list.
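Experiment Manager defines these settings in the hyperparameter table rather than in code. Purely for illustration, comparable search-space definitions written with the optimizableVariable function from Statistics and Machine Learning Toolbox (the toolbox that Experiment Manager requires for Bayesian optimization) might look like this sketch:

% Illustrative sketch only; Experiment Manager builds this search space
% from the hyperparameter table, not from code like this.
vars = [
    optimizableVariable("Threshold",{'150','200','250'},Type="categorical")
    optimizableVariable("LSTMDepth",[1 3],Type="integer")
    optimizableVariable("NumHiddenUnits",[50 300],Type="integer")
    optimizableVariable("InitialLearnRate",[1e-3 1e-1],Transform="log")];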

Under Bayesian Optimization Options, you can specify the duration of the experiment by entering the maximum time (in seconds) and the maximum number of trials. To best use the power of Bayesian optimization, perform at least 30 objective function evaluations.

The Setup Function configures the training data, network architecture, and training options for the experiment. The input to the setup function is a structure with fields from the hyperparameter table. The setup function returns four outputs that you use to train a network for sequence-to-sequence regression problems. In this example, the setup function has three sections.
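The setup function, listed in full in Appendix 1, has this signature:

function [XTrain,YTrain,layers,options] = SequenceRegressionExperiment_setup1(params)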

  • Load and Preprocess Data downloads and extracts the Turbofan Engine Degradation Simulation Data Set from https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/ [3]. This section of the setup function also filters out constant-valued features, normalizes the predictor data to have zero mean and unit variance, clips the response data by using the numeric value of the hyperparameter Threshold, and randomly selects training examples to use for validation.

dataFolder = fullfile(tempdir,"turbofan");
if ~exist(dataFolder,"dir")
    mkdir(dataFolder);
    oldDir = cd(dataFolder);
    filename = "CMAPSSData.zip";
    websave(filename,"https://ti.arc.nasa.gov/c/6/", ...
        weboptions("Timeout",Inf));
    unzip(filename,dataFolder);
    cd(oldDir);
end

filenameTrainPredictors = fullfile(dataFolder,"train_FD001.txt");
[XTrain,YTrain] = processTurboFanDataTrain(filenameTrainPredictors);

XTrain = helperFilter(XTrain);
XTrain = helperNormalize(XTrain);

thr = str2double(params.Threshold);
for i = 1:numel(YTrain)
    YTrain{i}(YTrain{i} > thr) = thr;
end

for i = 1:numel(XTrain)
    sequence = XTrain{i};
    sequenceLengths(i) = size(sequence,2);
end

[~,idx] = sort(sequenceLengths,"descend");
XTrain = XTrain(idx);
YTrain = YTrain(idx);

idx = randperm(numel(XTrain),10);
XValidation = XTrain(idx);
XTrain(idx) = [];
YValidation = YTrain(idx);
YTrain(idx) = [];
  • Define Network Architecture defines the architecture of an LSTM network for sequence-to-sequence regression. The network consists of LSTM layers followed by a fully connected layer of size 100 and a dropout layer with dropout probability 0.5. The hyperparameters LSTMDepth and NumHiddenUnits specify the number of LSTM layers and the number of hidden units for each layer.

numResponses = size(YTrain{1},1);
featureDimension = size(XTrain{1},1);
LSTMDepth = params.LSTMDepth;
numHiddenUnits = params.NumHiddenUnits;

layers = sequenceInputLayer(featureDimension);

for i = 1:LSTMDepth
    layers = [layers;lstmLayer(numHiddenUnits,OutputMode="sequence")];
end

layers = [layers
    fullyConnectedLayer(100)
    reluLayer()
    dropoutLayer(0.5)
    fullyConnectedLayer(numResponses)
    regressionLayer];
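As a quick sanity check during development, you can, for example, pass the resulting layer array to the analyzeNetwork function to visualize the architecture produced for a given depth. This step is not part of the experiment itself:

% Optional check, assuming layers is defined as above.
analyzeNetwork(layers)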
  • Specify Training Options defines the training options for the experiment. Because deeper networks take longer to converge, the number of epochs is set to 300 to ensure all network depths converge. This example validates the network every 30 iterations. The initial learning rate equals the InitialLearnRate value from the hyperparameter table and drops by a factor of 0.2 every 15 epochs. With the training option ExecutionEnvironment set to "auto", the experiment runs on a GPU if one is available. Otherwise, Experiment Manager uses the CPU. Because this example compares network depths and trains for many epochs, using a GPU can significantly speed up training time. Using a GPU requires Parallel Computing Toolbox™ and a supported GPU device. For more information, see GPU Support by Release (Parallel Computing Toolbox).

maxEpochs = 300;
miniBatchSize = 20;

options = trainingOptions("adam", ...
    ExecutionEnvironment="auto", ...
    MaxEpochs=maxEpochs, ...
    MiniBatchSize=miniBatchSize, ...
    ValidationData={XValidation,YValidation}, ...
    ValidationFrequency=30, ...
    InitialLearnRate=params.InitialLearnRate, ...
    LearnRateDropFactor=0.2, ...
    LearnRateDropPeriod=15, ...
    GradientThreshold=1, ...
    Shuffle="never", ...
    Verbose=false);

To inspect the setup function, under Setup Function, click Edit. The setup function opens in MATLAB Editor. In addition, the code for the setup function appears in Appendix 1 at the end of this example.

The Metrics section specifies optional functions that evaluate the results of the experiment. Experiment Manager evaluates these functions each time it finishes training the network. To inspect a metric function, select the name of the metric function and click Edit. The metric function opens in MATLAB Editor.

The prediction of the RUL of an engine requires careful consideration. If the prediction underestimates the RUL, engine maintenance might be scheduled before it is necessary. If the prediction overestimates the RUL, the engine might fail while in operation, resulting in high costs or safety concerns. To help mitigate these scenarios, this example includes a metric function MeanMaxAbsoluteError that identifies networks that underpredict or overpredict the RUL.

The MeanMaxAbsoluteError metric calculates the maximum absolute error, averaged across the entire training set. This metric calls the predict function to make a sequence of RUL predictions from the training set. Then, after calculating the maximum absolute error between each training response and predicted response sequence, the function computes the mean of all maximum absolute errors. This metric identifies the maximum deviations between the actual and predicted responses. The code for the metric function appears in Appendix 3 at the end of this example.
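In outline, the metric computes its value as in this minimal sketch, where YTrain and YPred are cell arrays of true and predicted response sequences:

% Sketch of the core computation in MeanMaxAbsoluteError.
maxAbsErrors = zeros(1,numel(YTrain));
for i = 1:numel(YTrain)
    maxAbsErrors(i) = max(abs(YTrain{i} - YPred{i}));
end
metricOutput = mean(maxAbsErrors);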

Run Experiment

When you run the experiment, Experiment Manager searches for the best combination of hyperparameters with respect to the chosen metric. Each trial in the experiment uses a new combination of hyperparameter values based on the results of the previous trials.

Training can take some time. To limit the duration of the experiment, you can modify the Bayesian Optimization Options by reducing the maximum running time or the maximum number of trials. However, note that running fewer than 30 trials can prevent the Bayesian optimization algorithm from converging to an optimal set of hyperparameters.

By default, Experiment Manager runs one trial at a time. If you have Parallel Computing Toolbox™, you can run multiple trials at the same time or offload your experiment as a batch job in a cluster.

  • To run one trial of the experiment at a time, on the Experiment Manager toolstrip, under Mode, select Sequential and click Run.

  • To run multiple trials at the same time, under Mode, select Simultaneous and click Run. If there is no current parallel pool, Experiment Manager starts one using the default cluster profile. Experiment Manager then executes multiple simultaneous trials, depending on the number of parallel workers available. For best results, before you run your experiment, start a parallel pool with as many workers as GPUs, as in the sketch after this list. For more information, see Use Experiment Manager to Train Networks in Parallel and GPU Support by Release (Parallel Computing Toolbox).

  • To offload the experiment as a batch job, under Mode, select Batch Sequential or Batch Simultaneous, specify your Cluster and Pool Size, and click Run. For more information, see Offload Experiments as Batch Jobs to Cluster.
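For illustration, this minimal sketch starts a parallel pool with one worker per available GPU. It assumes Parallel Computing Toolbox and at least one supported GPU device:

% Start one worker per available GPU before running the experiment.
numGPUs = gpuDeviceCount("available");
if numGPUs > 0
    parpool(numGPUs);
end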

A table of results displays the metric function values for each trial. Experiment Manager highlights the trial with the optimal value for the selected metric. For example, in this experiment, trial 23 produced the smallest mean maximum absolute error.

While the experiment is running, click Training Plot to display the training plot and track the progress of each trial. The elapsed time for a trial to complete training increases with network depth.

Evaluate Results

In the table of results, the MeanMaxAbsoluteError value quantifies how much the network underpredicts or overpredicts the RUL. The Validation RMSE value quantifies how well the network generalizes to unseen data. To find the best result for your experiment, sort the table of results and select the trial that has the lowest MeanMaxAbsoluteError and Validation RMSE values.

  1. Point to the MeanMaxAbsoluteError column.

  2. Click the triangle icon.

  3. Select Sort in Ascending Order.

Similarly, find the trial with the smallest validation RMSE by opening the drop-down menu for the Validation RMSE column and selecting Sort in Ascending Order.

If no single trial minimizes both values, opt for a trial that ranks well for both metrics. For instance, in these results, trial 23 has the smallest mean maximum absolute error and the seventh smallest validation RMSE. Among the trials with a lower validation RMSE, only trial 29 has a comparable mean maximum absolute error. Which of these trials is preferable depends on whether you favor a lower mean maximum absolute error or a lower validation RMSE.

To record observations about the results of your experiment, add an annotation.

  1. In the results table, right-click the MeanMaxAbsoluteError cell of the best trial.

  2. Select Add Annotation.

  3. In the Annotations pane, enter your observations in the text box.

  4. Repeat the previous steps for the Validation RMSE cell.

To test the best trial in your experiment, export the trained network and display the predicted response sequences for several randomly chosen test sequences.

  1. Select the best trial in your experiment.

  2. On the Experiment Manager toolstrip, click Export > Trained Network.

  3. In the dialog window, enter the name of a workspace variable for the exported network. The default name is trainedNetwork.

  4. Use the exported network and the Threshold value of the network as inputs to the helper function plotSequences, which is listed in Appendix 4 at the end of this example. For instance, in the MATLAB Command Window, enter:

plotSequences(trainedNetwork,200)

The function plots the true and predicted response sequences of unseen test data.

Close Experiment

In the Experiment Browser, right-click the name of the project and select Close Project. Experiment Manager closes all of the experiments and results contained in the project.

Appendix 1: Setup Function

This function configures the training data, network architecture, and training options for the experiment.

Input

  • params is a structure with fields from the Experiment Manager hyperparameter table.

Output

  • XTrain is a cell array containing the training data.

  • YTrain is a cell array containing the regression values for training.

  • layers is a layer array that defines the neural network architecture.

  • options is a trainingOptions object.

function [XTrain,YTrain,layers,options] = SequenceRegressionExperiment_setup1(params)

dataFolder = fullfile(tempdir,"turbofan");
if ~exist(dataFolder,"dir")
    mkdir(dataFolder);
    oldDir = cd(dataFolder);
    filename = "CMAPSSData.zip";
    websave(filename,"https://ti.arc.nasa.gov/c/6/", ...
        weboptions("Timeout",Inf));
    unzip(filename,dataFolder);
    cd(oldDir);
end

filenameTrainPredictors = fullfile(dataFolder,"train_FD001.txt");
[XTrain,YTrain] = processTurboFanDataTrain(filenameTrainPredictors);

XTrain = helperFilter(XTrain);
XTrain = helperNormalize(XTrain);

thr = str2double(params.Threshold);
for i = 1:numel(YTrain)
    YTrain{i}(YTrain{i} > thr) = thr;
end

for i = 1:numel(XTrain)
    sequence = XTrain{i};
    sequenceLengths(i) = size(sequence,2);
end

[~,idx] = sort(sequenceLengths,"descend");
XTrain = XTrain(idx);
YTrain = YTrain(idx);

idx = randperm(numel(XTrain),10);
XValidation = XTrain(idx);
XTrain(idx) = [];
YValidation = YTrain(idx);
YTrain(idx) = [];

numResponses = size(YTrain{1},1);
featureDimension = size(XTrain{1},1);
LSTMDepth = params.LSTMDepth;
numHiddenUnits = params.NumHiddenUnits;

layers = sequenceInputLayer(featureDimension);

for i = 1:LSTMDepth
    layers = [layers;lstmLayer(numHiddenUnits,OutputMode="sequence")];
end

layers = [layers
    fullyConnectedLayer(100)
    reluLayer()
    dropoutLayer(0.5)
    fullyConnectedLayer(numResponses)
    regressionLayer];

maxEpochs = 300;
miniBatchSize = 20;

options = trainingOptions("adam", ...
    ExecutionEnvironment="auto", ...
    MaxEpochs=maxEpochs, ...
    MiniBatchSize=miniBatchSize, ...
    ValidationData={XValidation,YValidation}, ...
    ValidationFrequency=30, ...
    InitialLearnRate=params.InitialLearnRate, ...
    LearnRateDropFactor=0.2, ...
    LearnRateDropPeriod=15, ...
    GradientThreshold=1, ...
    Shuffle="never", ...
    Verbose=false);

end

Appendix 2: Filter and Normalize Predictive Maintenance Data

The helper function helperFilter filters the data by removing features with constant values. Features that remain constant for all time steps can negatively impact the training.

function [XTrain,XTest] = helperFilter(XTrain,XTest)

m = min([XTrain{:}],[],2);
M = max([XTrain{:}],[],2);
idxConstant = M == m;

for i = 1:numel(XTrain)
    XTrain{i}(idxConstant,:) = [];
    if nargin > 1
        XTest{i}(idxConstant,:) = [];
    end
end

end

The helper function helperNormalize normalizes the training and test predictors to have zero mean and unit variance.

function [XTrain,XTest] = helperNormalize(XTrain,XTest)

mu = mean([XTrain{:}],2);
sig = std([XTrain{:}],0,2);

for i = 1:numel(XTrain)
    XTrain{i} = (XTrain{i} - mu) ./ sig;
    if nargin > 1
        XTest{i} = (XTest{i} - mu) ./ sig;
    end
end

end

Appendix 3: Compute Mean of Maximum Absolute Errors

This metric function calculates the maximum absolute error of the trained network, averaged over the training set.

function metricOutput = MeanMaxAbsoluteError(trialInfo)

net = trialInfo.trainedNetwork;
thr = str2double(trialInfo.parameters.Threshold);

filenamePredictors = fullfile(tempdir,"turbofan","train_FD001.txt");
[XTrain,YTrain] = processTurboFanDataTrain(filenamePredictors);

XTrain = helperFilter(XTrain);
XTrain = helperNormalize(XTrain);

for i = 1:numel(YTrain)
    YTrain{i}(YTrain{i} > thr) = thr;
end

YPred = predict(net,XTrain,MiniBatchSize=1);

maxAbsErrors = zeros(1,numel(YTrain));
for i = 1:numel(YTrain)
    absError = abs(YTrain{i} - YPred{i});
    maxAbsErrors(i) = max(absError);
end

metricOutput = mean(maxAbsErrors);

end

Appendix 4: Plot Predictive Maintenance Sequences

This function plots the true and predicted response sequences to allow you to evaluate the performance of the trained network. This function uses the helper functions helperFilter and helperNormalize, which are listed in Appendix 2.

function plotSequences(net,threshold)

filenameTrainPredictors = fullfile(tempdir,"turbofan","train_FD001.txt");
filenameTestPredictors = fullfile(tempdir,"turbofan","test_FD001.txt");
filenameTestResponses = fullfile(tempdir,"turbofan","RUL_FD001.txt");

[XTrain,YTrain] = processTurboFanDataTrain(filenameTrainPredictors);
[XTest,YTest] = processTurboFanDataTest(filenameTestPredictors,filenameTestResponses);

[XTrain,XTest] = helperFilter(XTrain,XTest);
[~,XTest] = helperNormalize(XTrain,XTest);

for i = 1:numel(YTrain)
    YTrain{i}(YTrain{i} > threshold) = threshold;
    YTest{i}(YTest{i} > threshold) = threshold;
end

YPred = predict(net,XTest,MiniBatchSize=1);

idx = randperm(100,4);
figure
for i = 1:numel(idx)
    subplot(2,2,i)
    plot(YTest{idx(i)},"--")
    hold on
    plot(YPred{idx(i)},".-")
    hold off
    ylim([0 threshold + 25])
    title("Test Observation " + idx(i))
    xlabel("Time Step")
    ylabel("RUL")
end
legend(["Test Data" "Predicted"],Location="southwest")

end

References

[1] Saxena, Abhinav, Kai Goebel, Don Simon, and Neil Eklund. "Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation."2008 International Conference on Prognostics and Health Management(2008): 1–9.

[2] Jozefowicz, Rafal, Wojciech Zaremba, and Ilya Sutskever. "An Empirical Exploration of Recurrent Network Architectures."Proceedings of the 32nd International Conference on Machine Learning(2015): 2342–2350.

[3] Saxena, Abhinav, Kai Goebel. "Turbofan Engine Degradation Simulation Data Set."NASA Ames Prognostics Data Repository,https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/, NASA Ames Research Center, Moffett Field, CA.
