
Choose Training Configurations for LSTM Using Bayesian Optimization

Since R2020b

This example shows how to create a deep learning experiment to find optimal network hyperparameters and training options for long short-term memory (LSTM) networks using Bayesian optimization. In this example, you use Experiment Manager to train LSTM networks that predict the remaining useful life (RUL) of engines. The experiment uses the Turbofan Engine Degradation Simulation data set. For more information on processing this data set for sequence-to-sequence regression, see Sequence-to-Sequence Regression Using Deep Learning.

Bayesian optimization provides an alternative strategy to sweeping hyperparameters in an experiment. You specify a range of values for each hyperparameter and select a metric to optimize, and Experiment Manager searches for a combination of hyperparameters that optimizes your selected metric. Bayesian optimization requires Statistics and Machine Learning Toolbox™. For more information, see Tune Experiment Hyperparameters by Using Bayesian Optimization.

RUL captures how many operational cycles an engine can perform before failure. To focus on the sequence data from when the engines are close to failing, preprocess the data by clipping the responses at a specified threshold. This preprocessing operation allows the network to focus on predictor behavior close to failure by treating all instances with higher RUL values as equal. For example, this figure shows the first response observation and the corresponding clipped response with a threshold of 150.
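
As a minimal sketch of the clipping operation, using a synthetic, linearly decreasing RUL sequence (the actual engine data is downloaded and preprocessed later in this example):

% Sketch: clip a synthetic RUL sequence at a threshold of 150.
threshold = 150;
rul = 250:-1:1;                    % RUL counts down to failure at time step 250
rulClipped = min(rul,threshold);   % values above the threshold become equal

plot(rul)
hold on
plot(rulClipped)
hold off
xlabel("Time Step")
ylabel("RUL")
legend(["Original" "Clipped"])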

When you train a deep learning network, how you preprocess data, the number of layers and hidden units, and the initial learning rate in the network can affect the training behavior and performance of the network. Choosing the depth of an LSTM network involves balancing speed and accuracy. For example, deeper networks can be more accurate but take longer to train and converge [2].

By default, when you run a built-in training experiment for regression, Experiment Manager computes the loss and root mean squared error (RMSE) for each trial in your experiment. This example compares the performance of the network in each trial by using a custom metric that is specific to the problem data set. For more information on using custom metric functions, see Evaluate Deep Learning Experiments by Using Metric Functions.

Open Experiment

First, open the example. Experiment Manager loads a project with a preconfigured experiment. To open the experiment, in the Experiment Browser, double-click SequenceRegressionExperiment.

Built-in training experiments consist of a description, a table of hyperparameters, a setup function, and a collection of metric functions to evaluate the results of the experiment. Experiments that use Bayesian optimization include additional options to limit the duration of the experiment. For more information, see Configure Built-In Training Experiment.

The Description field contains a textual description of the experiment. For this example, the description is:

Sequence-to-sequence regression to predict the remaining useful life (RUL) of engines. This experiment compares network performance using Bayesian optimization when changing data thresholding level, LSTM layer depth, the number of hidden units, and the initial learn rate.

The Hyperparameters section specifies the strategy and hyperparameter options to use for the experiment. For each hyperparameter, you can specify these options:

  • Range — Enter a two-element vector that gives the lower bound and upper bound of a real- or integer-valued hyperparameter, or a string array or cell array that lists the possible values of a categorical hyperparameter.

  • Type — Select real for a real-valued hyperparameter, integer for an integer-valued hyperparameter, or categorical for a categorical hyperparameter.

  • Transform — Select none to use no transform or log to use a logarithmic transform. When you select log, the hyperparameter values must be positive. With this setting, the Bayesian optimization algorithm models the hyperparameter on a logarithmic scale.

When you run the experiment, Experiment Manager searches for the best combination of hyperparameters. Each trial uses a new combination of the hyperparameter values based on the results of the previous trials. This example uses these hyperparameters:

  • Threshold sets all response data above the threshold value equal to the threshold value. To prevent uniform response data, use threshold values greater than or equal to 150. To limit the set of allowable values to 150, 200, and 250, the experiment models Threshold as a categorical hyperparameter.

  • LSTMDepth indicates the number of LSTM layers used in the network. Specify this hyperparameter as an integer between 1 and 3.

  • NumHiddenUnits determines the number of hidden units, or the amount of information stored at each time step, used in the network. Increasing the number of hidden units can result in overfitting the data and in a longer training time. Decreasing the number of hidden units can result in underfitting the data. Specify this hyperparameter as an integer between 50 and 300.

  • InitialLearnRate specifies the initial learning rate used for training. If the learning rate is too low, then training takes a long time. If the learning rate is too high, then training can reach a suboptimal result or diverge. The best learning rate depends on your data as well as the network you are training. The experiment models this hyperparameter on a logarithmic scale because the range of values (0.001 to 0.1) spans several orders of magnitude.
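
For reference, this is a minimal sketch of how the same search space could be expressed as optimizableVariable objects from Statistics and Machine Learning Toolbox, for example to reproduce the search outside Experiment Manager. The variable definitions below are an illustration, not part of the experiment itself:

% Sketch: the search space from the hyperparameter table, as
% optimizableVariable objects (Statistics and Machine Learning Toolbox).
vars = [
    optimizableVariable("Threshold",{'150','200','250'},Type="categorical")
    optimizableVariable("LSTMDepth",[1 3],Type="integer")
    optimizableVariable("NumHiddenUnits",[50 300],Type="integer")
    optimizableVariable("InitialLearnRate",[1e-3 1e-1],Transform="log")];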

Under Bayesian Optimization Options, you can specify the duration of the experiment by entering the maximum time (in seconds) and the maximum number of trials to run. To best use the power of Bayesian optimization, perform at least 30 objective function evaluations.
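
These limits correspond to the MaxTime and MaxObjectiveEvaluations options of the bayesopt function. As a hedged sketch only, using the vars array from the previous sketch and a hypothetical objective function objFcn that trains a network and returns the metric value to minimize:

% Sketch: objFcn is a hypothetical function handle that accepts one row of
% hyperparameter values and returns the value of the metric to minimize.
results = bayesopt(objFcn,vars, ...
    MaxObjectiveEvaluations=30, ...  % perform at least 30 evaluations
    MaxTime=8*60*60);                % maximum running time, in seconds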

The Setup Function section specifies a function that configures the training data, network architecture, and training options for the experiment. To open this function in MATLAB® Editor, click Edit. The code for the function also appears in Setup Function. The input to the setup function is a structure with fields from the hyperparameter table. The function returns four outputs that you use to train a network for sequence-to-sequence regression problems. In this example, the setup function has these sections:

  • Load and Preprocess Data downloads and extracts the Turbofan Engine Degradation Simulation data set from https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/ [3]. This section of the setup function also filters out constant-valued features, normalizes the predictor data to have zero mean and unit variance, clips the response data by using the numerical value of the hyperparameter Threshold, and randomly selects training examples to use for validation.

% Download and extract the Turbofan Engine Degradation Simulation data set.
dataFolder = fullfile(tempdir,"turbofan");
if ~exist(dataFolder,"dir")
    mkdir(dataFolder);
    filename = matlab.internal.examples.downloadSupportFile("nnet", ...
        "data/TurbofanEngineDegradationSimulationData.zip");
    unzip(filename,dataFolder);
end

filenameTrainPredictors = fullfile(dataFolder,"train_FD001.txt");
[XTrain,YTrain] = processTurboFanDataTrain(filenameTrainPredictors);

% Filter out constant-valued features and normalize the predictors.
XTrain = helperFilter(XTrain);
XTrain = helperNormalize(XTrain);

% Clip the responses at the value of the Threshold hyperparameter.
thr = str2double(params.Threshold);
for i = 1:numel(YTrain)
    YTrain{i}(YTrain{i} > thr) = thr;
end

% Sort the sequences by length, longest first.
for i = 1:numel(XTrain)
    sequence = XTrain{i};
    sequenceLengths(i) = size(sequence,2);
end
[~,idx] = sort(sequenceLengths,"descend");
XTrain = XTrain(idx);
YTrain = YTrain(idx);

% Hold out 10 random observations for validation.
idx = randperm(numel(XTrain),10);
XValidation = XTrain(idx);
XTrain(idx) = [];
YValidation = YTrain(idx);
YTrain(idx) = [];
  • Define Network Architecture defines the architecture for an LSTM network for sequence-to-sequence regression. The network consists of LSTM layers followed by a fully connected layer of size 100 and a dropout layer with a dropout probability of 0.5. The hyperparameters LSTMDepth and NumHiddenUnits specify the number of LSTM layers and the number of hidden units for each layer.

% Define an LSTM network for sequence-to-sequence regression.
numResponses = size(YTrain{1},1);
featureDimension = size(XTrain{1},1);
LSTMDepth = params.LSTMDepth;
numHiddenUnits = params.NumHiddenUnits;

layers = sequenceInputLayer(featureDimension);

% Stack LSTMDepth LSTM layers, each returning the full output sequence.
for i = 1:LSTMDepth
    layers = [layers;lstmLayer(numHiddenUnits,OutputMode="sequence")];
end

layers = [layers
    fullyConnectedLayer(100)
    reluLayer()
    dropoutLayer(0.5)
    fullyConnectedLayer(numResponses)
    regressionLayer];
  • Specify Training Options defines the training options for the experiment. Because deeper networks take longer to converge, the number of epochs is set to 300 to ensure all network depths converge. This example validates the network every 30 iterations. The initial learning rate equals the InitialLearnRate value from the hyperparameter table and drops by a factor of 0.2 every 15 epochs. With the training option ExecutionEnvironment set to "auto", the experiment runs on a GPU if one is available. Otherwise, Experiment Manager uses the CPU. Because this example compares network depths and trains for many epochs, using a GPU speeds up training time considerably. Using a GPU requires Parallel Computing Toolbox™ and a supported GPU device. For more information, see GPU Computing Requirements (Parallel Computing Toolbox).

maxEpochs = 300;
miniBatchSize = 20;

options = trainingOptions("adam", ...
    ExecutionEnvironment="auto", ...
    MaxEpochs=maxEpochs, ...
    MiniBatchSize=miniBatchSize, ...
    ValidationData={XValidation,YValidation}, ...
    ValidationFrequency=30, ...
    InitialLearnRate=params.InitialLearnRate, ...
    LearnRateSchedule="piecewise", ... % required for the drop factor and period to take effect
    LearnRateDropFactor=0.2, ...
    LearnRateDropPeriod=15, ...
    GradientThreshold=1, ...
    Shuffle="never", ...
    Verbose=false);

The Metrics section specifies optional functions that evaluate the results of the experiment. Experiment Manager evaluates these functions each time it finishes training the network. This example includes a metric function MeanMaxAbsoluteError that identifies networks that underpredict or overpredict the RUL. If the prediction underestimates the RUL, engine maintenance might be scheduled before it is necessary. If the prediction overestimates the RUL, the engine might fail while in operation, resulting in high costs or safety concerns. To help mitigate these scenarios, the MeanMaxAbsoluteError metric calculates the maximum absolute error, averaged across the entire training set. This metric calls the predict function to make a sequence of RUL predictions from the training set. Then, after calculating the maximum absolute error between each training response and predicted response sequence, the metric function computes the mean of all maximum absolute errors, identifying the maximum deviations between the actual and predicted responses. To open this function in MATLAB Editor, select the name of the metric function and click Edit. The code for the function also appears in Compute Mean of Maximum Absolute Errors.

Run Experiment

When you run the experiment, Experiment Manager searches for the best combination of hyperparameters with respect to the chosen metric. Each trial in the experiment uses a new combination of hyperparameter values based on the results of the previous trials.

Training can take some time. To limit the duration of the experiment, you can modify the Bayesian Optimization Options by reducing the maximum running time or the maximum number of trials. However, note that running fewer than 30 trials can prevent the Bayesian optimization algorithm from converging to an optimal set of hyperparameters.

By default, Experiment Manager runs one trial at a time. If you have Parallel Computing Toolbox™, you can run multiple trials at the same time or offload your experiment as a batch job in a cluster:

  • To run one trial of the experiment at a time, on the Experiment Manager toolstrip, under Mode, select Sequential and click Run.

  • To run multiple trials at the same time, under Mode, select Simultaneous and click Run. If there is no current parallel pool, Experiment Manager starts one using the default cluster profile. Experiment Manager then runs as many simultaneous trials as there are workers in your parallel pool. For best results, before you run your experiment, start a parallel pool with as many workers as GPUs, as sketched after this list. For more information, see Use Experiment Manager to Train Networks in Parallel and GPU Computing Requirements (Parallel Computing Toolbox).

  • To offload the experiment as a batch job, under Mode, select Batch Sequential or Batch Simultaneous, specify your Cluster and Pool Size, and click Run. For more information, see Offload Experiments as Batch Jobs to Cluster.
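
A minimal sketch of starting such a pool, assuming Parallel Computing Toolbox and at least one supported GPU device:

% Sketch: start a parallel pool with one worker per detected GPU device.
numGPUs = gpuDeviceCount;
if numGPUs > 0
    parpool(numGPUs);
end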

A table of results displays the metric function values for each trial. Experiment Manager highlights the trial with the optimal value for the selected metric. For example, in this experiment, trial 23 produces the smallest mean maximum absolute error.

To display the training plot and track the progress of each trial while the experiment is running, under Review Results, click Training Plot. The elapsed time for a trial to complete training increases with network depth.

Evaluate Results

In the table of results, the MeanMaxAbsoluteError value quantifies how much the network underpredicts or overpredicts the RUL. The Validation RMSE value quantifies how well the network generalizes to unseen data. To find the best result for your experiment, sort the table of results and select the trial that has the lowest MeanMaxAbsoluteError and Validation RMSE values.

  1. Point to the MeanMaxAbsoluteError column.

  2. Click the triangle icon.

  3. Select Sort in Ascending Order.

Similarly, find the trial with the smallest validation RMSE by opening the drop-down menu for the Validation RMSE column and selecting Sort in Ascending Order.

If no single trial minimizes both values, opt for a trial that ranks well for both metrics. For instance, in these results, trial 23 has the smallest mean maximum absolute error and the seventh smallest validation RMSE. Among the trials with a lower validation RMSE, only trial 29 has a comparable mean maximum absolute error. Which of these trials is preferable depends on whether you favor a lower mean maximum absolute error or a lower validation RMSE.

To record observations about the results of your experiment, add an annotation.

  1. In the results table, right-click the MeanMaxAbsoluteError cell of the best trial.

  2. Select Add Annotation.

  3. In the Annotations pane, enter your observations in the text box.

  4. Repeat the previous steps for the Validation RMSE cell.

To test the best trial in your experiment, export the trained network and display the predicted response sequence for several randomly chosen test sequences.

  1. Select the best trial in your experiment.

  2. On the Experiment Manager toolstrip, click Export > Trained Network.

  3. In the dialog window, enter the name of a workspace variable for the exported network. The default name is trainedNetwork.

  4. Use the exported network and the Threshold value of the network as inputs to the helper function plotSequences. To view the code for this function, see Plot Predictive Maintenance Sequences. For instance, in the MATLAB Command Window, enter:

plotSequences(trainedNetwork,200)

The function plots the true and predicted response sequences of unseen test data.

Close Experiment

In the Experiment Browser, right-click the name of the project and select Close Project. Experiment Manager closes all of the experiments and results contained in the project.

Setup Function

This function configures the training data, network architecture, and training options for the experiment. The input to this function is a structure with fields from the hyperparameter table. The function returns four outputs that you use to train a network for sequence-to-sequence regression problems.

function [XTrain,YTrain,layers,options] = SequenceRegressionExperiment_setup(params)

Load and Preprocess Data

% Download and extract the Turbofan Engine Degradation Simulation data set.
dataFolder = fullfile(tempdir,"turbofan");
if ~exist(dataFolder,"dir")
    mkdir(dataFolder);
    filename = matlab.internal.examples.downloadSupportFile("nnet", ...
        "data/TurbofanEngineDegradationSimulationData.zip");
    unzip(filename,dataFolder);
end

filenameTrainPredictors = fullfile(dataFolder,"train_FD001.txt");
[XTrain,YTrain] = processTurboFanDataTrain(filenameTrainPredictors);

% Filter out constant-valued features and normalize the predictors.
XTrain = helperFilter(XTrain);
XTrain = helperNormalize(XTrain);

% Clip the responses at the value of the Threshold hyperparameter.
thr = str2double(params.Threshold);
for i = 1:numel(YTrain)
    YTrain{i}(YTrain{i} > thr) = thr;
end

% Sort the sequences by length, longest first.
for i = 1:numel(XTrain)
    sequence = XTrain{i};
    sequenceLengths(i) = size(sequence,2);
end
[~,idx] = sort(sequenceLengths,"descend");
XTrain = XTrain(idx);
YTrain = YTrain(idx);

% Hold out 10 random observations for validation.
idx = randperm(numel(XTrain),10);
XValidation = XTrain(idx);
XTrain(idx) = [];
YValidation = YTrain(idx);
YTrain(idx) = [];

Define Network Architecture

% Define an LSTM network for sequence-to-sequence regression.
numResponses = size(YTrain{1},1);
featureDimension = size(XTrain{1},1);
LSTMDepth = params.LSTMDepth;
numHiddenUnits = params.NumHiddenUnits;

layers = sequenceInputLayer(featureDimension);

% Stack LSTMDepth LSTM layers, each returning the full output sequence.
for i = 1:LSTMDepth
    layers = [layers;lstmLayer(numHiddenUnits,OutputMode="sequence")];
end

layers = [layers
    fullyConnectedLayer(100)
    reluLayer()
    dropoutLayer(0.5)
    fullyConnectedLayer(numResponses)
    regressionLayer];

Specify Training Options

maxEpochs = 300;
miniBatchSize = 20;

options = trainingOptions("adam", ...
    ExecutionEnvironment="auto", ...
    MaxEpochs=maxEpochs, ...
    MiniBatchSize=miniBatchSize, ...
    ValidationData={XValidation,YValidation}, ...
    ValidationFrequency=30, ...
    InitialLearnRate=params.InitialLearnRate, ...
    LearnRateSchedule="piecewise", ... % required for the drop factor and period to take effect
    LearnRateDropFactor=0.2, ...
    LearnRateDropPeriod=15, ...
    GradientThreshold=1, ...
    Shuffle="never", ...
    Verbose=false);
end

Filter and Normalize Predictive Maintenance Data

The helper function helperFilter filters the data by removing features with constant values. Features that remain constant for all time steps can negatively impact the training.

function [XTrain,XTest] = helperFilter(XTrain,XTest)
% Remove features that are constant for all time steps.
m = min([XTrain{:}],[],2);
M = max([XTrain{:}],[],2);
idxConstant = M == m;

for i = 1:numel(XTrain)
    XTrain{i}(idxConstant,:) = [];
    if nargin > 1
        XTest{i}(idxConstant,:) = [];
    end
end
end

The helper function helperNormalize normalizes the training and test predictors to have zero mean and unit variance.

function [XTrain,XTest] = helperNormalize(XTrain,XTest)
% Normalize the predictors to zero mean and unit variance,
% using statistics computed from the training data only.
mu = mean([XTrain{:}],2);
sig = std([XTrain{:}],0,2);

for i = 1:numel(XTrain)
    XTrain{i} = (XTrain{i} - mu) ./ sig;
    if nargin > 1
        XTest{i} = (XTest{i} - mu) ./ sig;
    end
end
end

Compute Mean of Maximum Absolute Errors

This metric function calculates the maximum absolute error of the trained network, averaged over the training set.

function metricOutput = MeanMaxAbsoluteError(trialInfo)
net = trialInfo.trainedNetwork;
thr = str2double(trialInfo.parameters.Threshold);

% Reload and preprocess the training data in the same way as the setup function.
filenamePredictors = fullfile(tempdir,"turbofan","train_FD001.txt");
[XTrain,YTrain] = processTurboFanDataTrain(filenamePredictors);
XTrain = helperFilter(XTrain);
XTrain = helperNormalize(XTrain);

for i = 1:numel(YTrain)
    YTrain{i}(YTrain{i} > thr) = thr;
end

% Predict RUL sequences and compute the maximum absolute error per sequence.
YPred = predict(net,XTrain,MiniBatchSize=1);
maxAbsErrors = zeros(1,numel(YTrain));
for i = 1:numel(YTrain)
    absError = abs(YTrain{i}-YPred{i});
    maxAbsErrors(i) = max(absError);
end

metricOutput = mean(maxAbsErrors);
end

Plot Predictive Maintenance Sequences

This function plots the true and predicted response sequences to allow you to evaluate the performance of your trained network. This function uses the helper functions helperFilter and helperNormalize. To view the code for these functions, see Filter and Normalize Predictive Maintenance Data.

function plotSequences(net,threshold)
% Load the training and test data.
filenameTrainPredictors = fullfile(tempdir,"turbofan","train_FD001.txt");
filenameTestPredictors = fullfile(tempdir,"turbofan","test_FD001.txt");
filenameTestResponses = fullfile(tempdir,"turbofan","RUL_FD001.txt");

[XTrain,YTrain] = processTurboFanDataTrain(filenameTrainPredictors);
[XTest,YTest] = processTurboFanDataTest(filenameTestPredictors,filenameTestResponses);

% Filter and normalize the test data using training-set statistics.
[XTrain,XTest] = helperFilter(XTrain,XTest);
[~,XTest] = helperNormalize(XTrain,XTest);

% Clip the responses at the threshold.
for i = 1:numel(YTrain)
    YTrain{i}(YTrain{i} > threshold) = threshold;
    YTest{i}(YTest{i} > threshold) = threshold;
end

YPred = predict(net,XTest,MiniBatchSize=1);

% Plot the true and predicted responses for four random test observations.
idx = randperm(100,4);
figure
for i = 1:numel(idx)
    subplot(2,2,i)
    plot(YTest{idx(i)},"--")
    hold on
    plot(YPred{idx(i)},".-")
    hold off
    ylim([0 threshold+25])
    title("Test Observation " + idx(i))
    xlabel("Time Step")
    ylabel("RUL")
end
legend(["Test Data" "Predicted"],Location="southwest")
end

References

[1] Saxena, Abhinav, Kai Goebel, Don Simon, and Neil Eklund. "Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation." 2008 International Conference on Prognostics and Health Management (2008): 1–9.

[2] Jozefowicz, Rafal, Wojciech Zaremba, and Ilya Sutskever. "An Empirical Exploration of Recurrent Network Architectures." Proceedings of the 32nd International Conference on Machine Learning (2015): 2342–2350.

[3] Saxena, Abhinav, and Kai Goebel. "Turbofan Engine Degradation Simulation Data Set." NASA Ames Prognostics Data Repository, NASA Ames Research Center, Moffett Field, CA. https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/
