matlab.io.datastore.MiniBatchable class

Package: matlab.io.datastore

Add mini-batch support to datastore

Description

matlab.io.datastore.MiniBatchable is an abstract mixin class that adds support for mini-batches to your custom datastore for use with Deep Learning Toolbox™. A mini-batch datastore contains training and test data sets for use in Deep Learning Toolbox training, prediction, and classification.

To use this mixin class, you must inherit from the matlab.io.datastore.MiniBatchable class in addition to inheriting from the matlab.io.Datastore base class. Type the following syntax as the first line of your class definition file:

classdef MyDatastore < matlab.io.Datastore & ...
                       matlab.io.datastore.MiniBatchable
    ...
end

To add support for mini-batches to your datastore:

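  • Inherit from an additional class, matlab.io.datastore.MiniBatchable.

  • Define two additional properties: MiniBatchSize and NumObservations.

For more details and the steps to create a custom mini-batch datastore that optimizes performance during training, prediction, and classification, see Develop Custom Mini-Batch Datastore.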
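The two steps above can be sketched as a small class. This is only an illustrative outline, not the reference implementation: the class name InMemorySequenceDatastore, its private properties, and its constructor arguments are hypothetical, and the sketch assumes that read should return a table with predictors in the first column and responses in the second. Save it as InMemorySequenceDatastore.m to try it.

```matlab
classdef InMemorySequenceDatastore < matlab.io.Datastore & ...
                                     matlab.io.datastore.MiniBatchable
    % Hypothetical example class: serves in-memory sequences in mini-batches.

    properties
        MiniBatchSize       % observations returned by each call to read
    end

    properties (SetAccess = protected)
        NumObservations     % total observations, the length of one epoch
    end

    properties (Access = private)
        Predictors          % N-by-1 cell array of sequences (hypothetical)
        Labels              % N-by-1 categorical labels (hypothetical)
        CurrentIndex = 1    % index of the next observation to read
    end

    methods
        function ds = InMemorySequenceDatastore(predictors,labels,miniBatchSize)
            ds.Predictors = predictors(:);
            ds.Labels = labels(:);
            ds.NumObservations = numel(ds.Predictors);
            ds.MiniBatchSize = miniBatchSize;
        end

        function tf = hasdata(ds)
            % True while unread observations remain.
            tf = ds.CurrentIndex <= ds.NumObservations;
        end

        function [data,info] = read(ds)
            % Return the next mini-batch; the last one may be smaller.
            last = min(ds.CurrentIndex + ds.MiniBatchSize - 1, ...
                ds.NumObservations);
            idx = ds.CurrentIndex:last;
            data = table(ds.Predictors(idx),ds.Labels(idx), ...
                'VariableNames',{'Predictors','Response'});
            info = struct('Indices',idx);
            ds.CurrentIndex = last + 1;
        end

        function reset(ds)
            % Rewind to the first observation.
            ds.CurrentIndex = 1;
        end
    end

    methods (Hidden = true)
        function frac = progress(ds)
            % Fraction of observations read so far, between 0 and 1.
            frac = (ds.CurrentIndex - 1)/ds.NumObservations;
        end
    end
end
```

With three sequences and a mini-batch size of 2, the first call to read on this sketch returns two rows and the second returns the remaining one.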

Properties

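MiniBatchSize: Number of observations that are returned in each batch, or call of the read function. For training, prediction, and classification, the MiniBatchSize property is set to the mini-batch size defined in trainingOptions.

Attributes:

Abstract true
Access Public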

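NumObservations: Total number of observations contained within the datastore. This number of observations is the length of one training epoch.

Attributes:

Abstract true
SetAccess Protected
ReadAccess Public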
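As a rough illustration of how the two properties relate, the following sketch counts the calls to read needed to pass over the data once. The values are hypothetical, and the sketch assumes that each read returns MiniBatchSize observations except possibly the last.

```matlab
% Hypothetical values, chosen only for illustration.
numObservations = 100;   % plays the role of NumObservations
miniBatchSize = 27;      % plays the role of MiniBatchSize

% Number of read calls needed to traverse the data once.
numReads = ceil(numObservations/miniBatchSize);

% Size of the final, possibly partial, batch.
lastBatchSize = numObservations - (numReads - 1)*miniBatchSize;
```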

Attributes

Abstract true
Sealed false

For information on class attributes, see Class Attributes.

Copy Semantics

Handle. To learn how handle classes affect copy operations, see Copying Objects.

Examples

This example shows how to train a deep learning network on out-of-memory sequence data by transforming and combining datastores.

A transformed datastore transforms or processes data read from an underlying datastore. You can use a transformed datastore as a source of training, validation, test, and prediction data sets for deep learning applications. Use transformed datastores to read out-of-memory data or to perform specific preprocessing operations when reading batches of data. When you have separate datastores containing predictors and labels, you can combine them so you can input the data into a deep learning network.

When training the network, the software creates mini-batches of sequences of the same length by padding, truncating, or splitting the input data. For in-memory data, the trainingOptions function provides options to pad and truncate input sequences. For out-of-memory data, however, you must pad and truncate the sequences manually.

Load Training Data
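For comparison, the in-memory case can be sketched with the sequence options of trainingOptions. The option values here are illustrative choices, not settings used elsewhere in this example, and the snippet assumes Deep Learning Toolbox is installed.

```matlab
% Illustrative settings for in-memory sequence data only.
options = trainingOptions('adam', ...
    'SequenceLength','longest', ...   % pad each mini-batch to its longest sequence
    'SequencePaddingValue',0);        % value used to pad the shorter sequences
```

These options do not apply to data read from a datastore, which is why the example pads and truncates in a transform function instead.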

Load the Japanese Vowels data set as described in [1] and [2]. The zip file japaneseVowels.zip contains sequences of varying length. The sequences are divided into two folders, Train and Test, which contain training sequences and test sequences, respectively. In each of these folders, the sequences are divided into subfolders, which are numbered from 1 to 9. The names of these subfolders are the label names. A MAT file represents each sequence. Each sequence is a matrix with 12 rows, with one row for each feature, and a varying number of columns, with one column for each time step. The number of rows is the sequence dimension and the number of columns is the sequence length.

Unzip the sequence data.

filename = "japaneseVowels.zip";
outputFolder = fullfile(tempdir,"japaneseVowels");
unzip(filename,outputFolder);

For the training predictors, create a file datastore and specify the read function to be the load function. The load function loads the data from the MAT-file into a structure array. To read files from the subfolders in the training folder, set the 'IncludeSubfolders' option to true.

folderTrain = fullfile(outputFolder,"Train");
fdsPredictorTrain = fileDatastore(folderTrain, ...
    'ReadFcn',@load, ...
    'IncludeSubfolders',true);

Preview the datastore. The returned struct contains a single sequence from the first file.

preview(fdsPredictorTrain)
ans = struct with fields:
    X: [12×20 double]

For the labels, create a file datastore and specify the read function to be the readLabel function, defined at the end of the example. The readLabel function extracts the label from the subfolder name.

classNames = string(1:9);
fdsLabelTrain = fileDatastore(folderTrain, ...
    'ReadFcn',@(filename) readLabel(filename,classNames), ...
    'IncludeSubfolders',true);

Preview the datastore. The output corresponds to the label of the first file.

preview(fdsLabelTrain)
ans = categorical
     1

Transform and Combine Datastores

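To input the sequence data from the datastore of predictors to a deep learning network, the mini-batches of the sequences must have the same length. Transform the datastore using the padSequence function, defined at the end of the example, which pads or truncates the sequences to have length 20.

sequenceLength = 20;
tdsTrain = transform(fdsPredictorTrain,@(data) padSequence(data,sequenceLength));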

Preview the transformed datastore. The output corresponds to the padded sequence from the first file.

X = preview(tdsTrain)
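X = 1×1 cell array
    {12×20 double}

To input both the predictors and the labels from the two datastores into a deep learning network, combine them using the combine function.

cdsTrain = combine(tdsTrain,fdsLabelTrain);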

Preview the combined datastore. The datastore returns a 1-by-2 cell array. The first element corresponds to the predictors. The second element corresponds to the label.

preview(cdsTrain)
ans = 1×2 cell array
    {12×20 double}    {[1]}

Define LSTM Network Architecture

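Define the LSTM network architecture. Specify the number of features of the input data as the input size. Specify an LSTM layer with 100 hidden units that outputs the last element of the sequence. Finally, specify a fully connected layer with output size equal to the number of classes, followed by a softmax layer and a classification layer.

numFeatures = 12;
numClasses = numel(classNames);
numHiddenUnits = 100;
layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits,'OutputMode','last')
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];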

Specify the training options. Set the solver to 'adam' and 'GradientThreshold' to 2. Set the mini-batch size to 27 and set the maximum number of epochs to 75. The datastores do not support shuffling, so set 'Shuffle' to 'never'.

Because the mini-batches are small with short sequences, the CPU is better suited for training. Set 'ExecutionEnvironment' to 'cpu'. To train on a GPU, if available, set 'ExecutionEnvironment' to 'auto' (the default value).

miniBatchSize = 27;
options = trainingOptions('adam', ...
    'ExecutionEnvironment','cpu', ...
    'MaxEpochs',75, ...
    'MiniBatchSize',miniBatchSize, ...
    'GradientThreshold',2, ...
    'Shuffle','never', ...
    'Verbose',0, ...
    'Plots','training-progress');

Train the LSTM network with the specified training options.

net = trainNetwork(cdsTrain,layers,options);

Test the Network

Create a transformed datastore containing the held-out test data using the same steps as for the training data.

folderTest = fullfile(outputFolder,"Test");
fdsPredictorTest = fileDatastore(folderTest, ...
    'ReadFcn',@load, ...
    'IncludeSubfolders',true);
tdsTest = transform(fdsPredictorTest,@(data) padSequence(data,sequenceLength));

Make predictions on the test data using the trained network.

YPred = classify(net,tdsTest,'MiniBatchSize',miniBatchSize);

Calculate the classification accuracy on the test data. To get the labels of the test set, create a file datastore with the read function readLabel and specify to include subfolders. Specify that the outputs are vertically concatenable by setting the 'UniformRead' option to true.

fdsLabelTest = fileDatastore(folderTest, ...
    'ReadFcn',@(filename) readLabel(filename,classNames), ...
    'IncludeSubfolders',true, ...
    'UniformRead',true);
YTest = readall(fdsLabelTest);
accuracy = mean(YPred == YTest)
accuracy = 0.9351

Functions

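The readLabel function extracts the label from the specified file name, using the categories in classNames.

function label = readLabel(filename,classNames)
% The label is the name of the folder that contains the file.
filepath = fileparts(filename);
[~,label] = fileparts(filepath);
label = categorical(string(label),classNames);
end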

The padSequence function pads or truncates the sequence in data.X to have the specified sequence length and returns the result in a 1-by-1 cell.

function sequence = padSequence(data,sequenceLength)
sequence = data.X;
[C,S] = size(sequence);
if S < sequenceLength
    % Right-pad short sequences with zeros.
    padding = zeros(C,sequenceLength-S);
    sequence = [sequence padding];
else
    % Truncate long sequences on the right.
    sequence = sequence(:,1:sequenceLength);
end
sequence = {sequence};
end
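As a quick check of the padding logic, the following sketch applies the same two branches inline to synthetic matrices. The variable names and the sequence lengths 15 and 26 are hypothetical; only the target length of 20 and the 12 features come from the example.

```matlab
sequenceLength = 20;

% Synthetic sequences with 12 features: one too short, one too long.
shortSeq = ones(12,15);
longSeq = ones(12,26);

% Short sequence: append zero columns on the right up to the target length.
padded = [shortSeq zeros(12,sequenceLength - size(shortSeq,2))];

% Long sequence: keep only the first sequenceLength time steps.
truncated = longSeq(:,1:sequenceLength);
```

Both results are 12-by-20, so they can be stacked in the same mini-batch. Padding on the right with zeros is the choice made by padSequence; other padding values or directions would need a different transform function.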

References

[1] Kudo, M., J. Toyama, and M. Shimbo. "Multidimensional Curve Classification Using Passing-Through Regions." Pattern Recognition Letters. Vol. 20, No. 11–13, pp. 1103–1111.

[2] Kudo, M., J. Toyama, and M. Shimbo. Japanese Vowels Data Set. https://archive.ics.uci.edu/ml/datasets/Japanese+Vowels

Version History

Introduced in R2018a

Not recommended starting in R2019a