Preprocess Data for Domain-Specific Deep Learning Applications

Data preprocessing is used for training, validation, and inference. Preprocessing consists of a series of deterministic operations that normalize or enhance desired data features. For example, you can normalize data to a fixed range or rescale data to the size required by the network input layer.
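As a minimal sketch of the "normalize data to a fixed range" operation mentioned above, the following plain Python snippet linearly rescales a list of values to [0, 1]. The function name `rescale_to_range` and the sample pixel values are assumptions made for illustration; this is not a toolbox function.

```python
def rescale_to_range(values, lower=0.0, upper=1.0):
    """Linearly map values so that the minimum becomes `lower` and the maximum `upper`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input has no spread to scale, so map everything to the lower bound.
        return [lower for _ in values]
    scale = (upper - lower) / (hi - lo)
    return [lower + (v - lo) * scale for v in values]


# Hypothetical 8-bit pixel intensities.
pixels = [0, 64, 128, 255]
normalized = rescale_to_range(pixels)
```

Because the operation is deterministic, applying it during training, validation, and inference gives the network consistently scaled inputs.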

Preprocessing can occur at two stages in the deep learning workflow.

Commonly, preprocessing occurs as a separate step that you complete before preparing the data to be fed to the network. You load your original data, apply the preprocessing operations, then save the result to disk. The advantage of this approach is that the preprocessing overhead is only required once, then the preprocessed images are readily available as a starting place for all future trials of training a network.

If you load your data into a datastore, then you can also apply preprocessing during training by using the `transform` and `combine` functions. For more information, see Datastores for Deep Learning. The transformed images are not stored in memory. This approach is convenient to avoid writing a second copy of training data to disk if your preprocessing operations are not computationally expensive and do not noticeably impact the speed of training the network.
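The idea of applying preprocessing at read time, without keeping a transformed copy, can be sketched with a Python generator. This is a conceptual illustration only: the `transform` function below is a hypothetical stand-in written for this sketch, not the datastore function of the same name, and the `center` operation and sample data are assumptions.

```python
def transform(records, fn):
    """Yield fn(record) lazily; transformed records are not cached between passes."""
    for record in records:
        yield fn(record)


def center(signal):
    """Deterministic preprocessing step: subtract the mean from a signal."""
    mean = sum(signal) / len(signal)
    return [x - mean for x in signal]


# Hypothetical raw training records.
raw = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]

for epoch in range(2):
    # The preprocessing is recomputed on every pass over the data.
    batch = list(transform(raw, center))
```

The trade-off matches the text: no second copy of the data is written, but the preprocessing cost is paid again on every pass.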

Data augmentation consists of randomized operations that are applied to the training data while the network is training. Augmentation increases the effective amount of training data and helps to make the network invariant to common distortion in the data. For example, you can add artificial noise to training data so that the network is invariant to noise.

To augment training data, start by loading your data into a datastore. For more information, see Datastores for Deep Learning. Some built-in datastores apply a specific and limited set of augmentations to data for specific applications. You can also apply your own set of augmentation operations on data in the datastore by using the `transform` and `combine` functions. During training, the datastore randomly perturbs the training data for each epoch, so that each epoch uses a slightly different data set.
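To illustrate how randomized augmentation gives each epoch a slightly different data set, here is a minimal Python sketch that adds Gaussian noise to every sample each time the data is read. The names `add_noise` and `epochs`, the noise level, and the sample data are assumptions for illustration, not part of any toolbox.

```python
import random


def add_noise(sample, rng, sigma=0.05):
    """Randomized augmentation: perturb each value with zero-mean Gaussian noise."""
    return [x + rng.gauss(0.0, sigma) for x in sample]


def epochs(data, num_epochs, seed=0):
    """Yield one freshly augmented copy of the data per epoch."""
    rng = random.Random(seed)
    for _ in range(num_epochs):
        yield [add_noise(sample, rng) for sample in data]


# Hypothetical training samples.
data = [[0.0, 0.5, 1.0], [1.0, 0.5, 0.0]]
epoch_a, epoch_b = list(epochs(data, 2))
```

The original samples are left untouched, and the two epochs differ because the random generator advances between reads.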

Image Processing Applications

Augment image data to simulate variations in the image acquisition. For example, the most common image augmentation operations are geometric transformations such as rotation and translation, which simulate variations in the camera orientation with respect to the scene. Color jitter simulates variations of lighting conditions and color in the scene. Artificial noise simulates distortions caused by electrical fluctuations in the sensor and analog-to-digital conversion errors. Blur simulates an out-of-focus lens or movement of the camera with respect to the scene.

Common image preprocessing operations include noise removal, edge-preserving smoothing, color space conversion, contrast enhancement, and morphology.

If you have Image Processing Toolbox™, then you can process data using these operations as well as any other functionality in the toolbox. For an example that shows how to create and apply these transformations, see Augment Images for Deep Learning Workflows Using Image Processing Toolbox.

Processing Type	Description	Sample Functions
Resize images	Resize images by a fixed scaling factor or to a target size	`imresize`, `imresize3` (Image Processing Toolbox)
Warp images	Apply random reflection, rotation, scale, shear, and translation to images	`randomAffine2d` (Image Processing Toolbox), `randomAffine3d` (Image Processing Toolbox)
Crop images	Crop an image to a target size from the center or a random position	`centerCropWindow2d` (Image Processing Toolbox), `centerCropWindow3d` (Image Processing Toolbox); `randomWindow2d` (Image Processing Toolbox), `randomCropWindow3d` (Image Processing Toolbox)
Jitter color	Randomly adjust image hue, saturation, brightness, or contrast	`jitterColorHSV` (Image Processing Toolbox)
Simulate noise	Add random Gaussian, Poisson, salt and pepper, or multiplicative noise	`imnoise` (Image Processing Toolbox)
Simulate blur	Add Gaussian or directional motion blur	`imgaussfilt` (Image Processing Toolbox), `imgaussfilt3` (Image Processing Toolbox); `imfilter` (Image Processing Toolbox)

Object Detection

Object detection data consists of an image and bounding boxes that describe the location and characteristics of objects in the image.

If you have Computer Vision Toolbox™, then you can use the Image Labeler (Computer Vision Toolbox) and the Video Labeler (Computer Vision Toolbox) apps to interactively label ROIs and export the label data for training a neural network. If you have Automated Driving Toolbox™, then you can also use the Ground Truth Labeler (Automated Driving Toolbox) app to create labeled ground truth training data.

When you transform an image, you must perform an identical transformation to the corresponding bounding boxes. If you have Computer Vision Toolbox, then you can process bounding box data using the operations in the table. For an example that shows how to create and apply these transformations, see Augment Bounding Boxes for Object Detection. For more information, see Getting Started with Object Detection Using Deep Learning (Computer Vision Toolbox).

Processing Type	Description	Sample Functions
Resize bounding boxes	Resize bounding boxes by a fixed scaling factor or to a target size	`bboxresize` (Computer Vision Toolbox)
Crop bounding boxes	Crop a bounding box to a target size from the center or a random position	`bboxcrop` (Computer Vision Toolbox)
Warp bounding boxes	Apply reflection, rotation, scale, shear, and translation to bounding boxes	`bboxwarp` (Computer Vision Toolbox)
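The requirement that an image and its bounding boxes receive an identical transformation can be illustrated with a horizontal reflection. The sketch below is plain Python, not the toolbox functions in the table; it assumes boxes are stored as `[x, y, width, height]` with zero-based pixel coordinates, and the name `flip_horizontal` and the sample data are hypothetical.

```python
def flip_horizontal(image, boxes):
    """Reflect an image left-to-right and move each [x, y, w, h] box to match."""
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # A box that started x pixels from the left edge now ends x pixels from the right edge.
    flipped_boxes = [[width - x - w, y, w, h] for x, y, w, h in boxes]
    return flipped_image, flipped_boxes


# Hypothetical 3-by-5 image whose object (value 9) occupies the top-right corner.
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 0, 0],
]
boxes = [[3, 0, 2, 2]]

flipped_image, flipped_boxes = flip_horizontal(image, boxes)
```

After the flip, the box still encloses the object. Transforming only the image would leave the box pointing at background pixels.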

Semantic Segmentation

Semantic segmentation data consists of images and corresponding pixel labels represented as categorical arrays.

If you have Computer Vision Toolbox, then you can use the Image Labeler (Computer Vision Toolbox) and the Video Labeler (Computer Vision Toolbox) apps to interactively label pixels and export the label data for training a neural network. If you have Automated Driving Toolbox, then you can also use the Ground Truth Labeler (Automated Driving Toolbox) app to create labeled ground truth training data.

When you transform an image, you must perform an identical transformation to the corresponding pixel label image. If you have Image Processing Toolbox, then you can preprocess pixel label images using the functions in the table and any other toolbox function that supports categorical input. For an example that shows how to create and apply these transformations, see Augment Pixel Labels for Semantic Segmentation. For more information, see Getting Started with Semantic Segmentation Using Deep Learning (Computer Vision Toolbox).

Processing Type	Description	Sample Functions
Resize pixel labels	Resize pixel label images by a fixed scaling factor or to a target size	`imresize`
Crop pixel labels	Crop a pixel label image to a target size from the center or a random position	`imcrop` (Image Processing Toolbox); `centerCropWindow2d` (Image Processing Toolbox), `centerCropWindow3d` (Image Processing Toolbox); `randomWindow2d` (Image Processing Toolbox), `randomCropWindow3d` (Image Processing Toolbox)
Warp pixel labels	Apply random reflection, rotation, scale, shear, and translation to pixel label images	`randomAffine2d` (Image Processing Toolbox), `randomAffine3d` (Image Processing Toolbox)
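As a rough illustration of applying the same transformation to an image and its pixel labels, the Python sketch below computes one center crop window and reuses it for both arrays. The helper names `center_crop_window` and `crop`, the window layout `(top, left, height, width)`, and the sample data are assumptions for this sketch, not toolbox behavior.

```python
def center_crop_window(size, target):
    """Return (top, left, height, width) of a centered window of the target size."""
    rows, cols = size
    target_rows, target_cols = target
    top = (rows - target_rows) // 2
    left = (cols - target_cols) // 2
    return top, left, target_rows, target_cols


def crop(grid, window):
    """Crop a 2-D list of lists to the given window."""
    top, left, height, width = window
    return [row[left:left + width] for row in grid[top:top + height]]


# Hypothetical 4-by-4 image and its per-pixel class labels.
image = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
    [40, 41, 42, 43],
]
labels = [
    ["sky", "sky", "sky", "sky"],
    ["sky", "car", "car", "sky"],
    ["road", "car", "car", "road"],
    ["road", "road", "road", "road"],
]

# Compute the window once, then apply it to both arrays so they stay aligned.
window = center_crop_window((4, 4), (2, 2))
image_crop = crop(image, window)
label_crop = crop(labels, window)
```

Sharing a single window matters most for random crops, where computing the window separately for the image and the labels would misalign them.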

Signal Processing Applications

Signal Processing Toolbox™ enables you to denoise, smooth, detrend, and resample signals. You can augment training data with noise, multipath fading, and synthetic signals such as pulses and chirps. You can also create labeled sets of signals by using the Signal Labeler (Signal Processing Toolbox) app and the `labeledSignalSet` (Signal Processing Toolbox) object. For an example that shows how to create and apply these transformations, see Waveform Segmentation Using Deep Learning.

Wavelet Toolbox™ and Signal Processing Toolbox enable you to generate 2-D time-frequency representations of time series data that you can use as image inputs for signal classification applications. For an example, see Classify Time Series Using Wavelet Analysis and Deep Learning. Similarly, you can extract sequences from signal data to use as input for LSTM networks. For an example, see Classify ECG Signals Using Long Short-Term Memory Networks (Signal Processing Toolbox).

Communications Toolbox™ expands on signal processing functionality to enable you to perform error correction, interleaving, modulation, filtering, synchronization, and equalization of communication systems. For an example that shows how to create and apply these transformations, see Modulation Classification with Deep Learning.

You can process signal data using the functions in the table as well as any other functionality in each toolbox.

Processing Type	Description	Sample Functions
Clean signals	Apply median filtering or moving average to signal; remove polynomial trend; resample signal to new fixed rate	`medfilt1` (Signal Processing Toolbox), `smoothdata`; `detrend`; `downsample` (Signal Processing Toolbox), `interp` (Signal Processing Toolbox), `upsample` (Signal Processing Toolbox)
Filter signals	Perform bandpass, bandstop, highpass, and lowpass filtering of signals; design IIR and FIR filters; apply IIR and FIR filters	`bandpass` (Signal Processing Toolbox), `bandstop` (Signal Processing Toolbox), `highpass` (Signal Processing Toolbox), `lowpass` (Signal Processing Toolbox); `butter` (Signal Processing Toolbox), `designfilt` (Signal Processing Toolbox), `fir1` (Signal Processing Toolbox), `gaussdesign` (Signal Processing Toolbox), `rcosdesign` (Signal Processing Toolbox); `filter`
Augment signals	Add white Gaussian noise to signal using Communications Toolbox; adjust time information of the signal, and perform multipath fading using Communications Toolbox; add synthetic chirps and waveforms	`awgn` (Communications Toolbox); `chirp` (Signal Processing Toolbox), `square` (Signal Processing Toolbox), `rectpuls` (Signal Processing Toolbox), `sawtooth` (Signal Processing Toolbox)
Create time-frequency representations	Create spectrograms, scalograms, and other 2-D representations of 1-D signals	`pspectrum` (Signal Processing Toolbox), `xspectrogram` (Signal Processing Toolbox); `fsst` (Signal Processing Toolbox), `ifsst` (Signal Processing Toolbox); `stft` (Signal Processing Toolbox), `istft` (Signal Processing Toolbox); `cwt` (Wavelet Toolbox)
Extract features from signals	Estimate instantaneous frequency and spectral entropy	`instfreq` (Signal Processing Toolbox), `pentropy` (Signal Processing Toolbox)
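The "add white Gaussian noise" augmentation in the table can be sketched in plain Python as follows. This is a conceptual illustration of scaling noise to a target signal-to-noise ratio based on the measured signal power; it does not reproduce the `awgn` function, and the name `add_white_gaussian_noise`, the test tone, and the 10 dB target are assumptions.

```python
import math
import random


def add_white_gaussian_noise(signal, snr_db, rng):
    """Add zero-mean Gaussian noise scaled to a target SNR (in dB) for this signal."""
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR in dB is 10*log10(signal_power / noise_power), solved here for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


rng = random.Random(0)
# Hypothetical clean signal: 5 cycles of a sine wave over 1000 samples.
clean = [math.sin(2 * math.pi * 5 * n / 1000) for n in range(1000)]
noisy = add_white_gaussian_noise(clean, snr_db=10, rng=rng)
```

Drawing a different SNR for each training example is one way to expose the network to a range of noise conditions.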

Audio Processing Applications

Audio Toolbox™ provides tools for audio processing, speech analysis, and acoustic measurement. Use these tools to extract auditory features and transform audio signals. Augment audio data with randomized or deterministic time scaling, time stretching, and pitch shifting. You can also create labeled ground truth training data by using the Audio Labeler (Audio Toolbox) app. You can process audio data using the functions in this table as well as any other functionality in the toolbox. For an example that shows how to create and apply these transformations, see Augment Audio Dataset (Audio Toolbox).

Processing Type	Description	Sample Functions	Sample Output
Augment audio data	Perform random or deterministic pitch shifting, time-scale modification, time shifting, noise addition, and volume control	`audioDataAugmenter` (Audio Toolbox), `audioTimeScaler` (Audio Toolbox), `shiftPitch` (Audio Toolbox), `stretchAudio` (Audio Toolbox)
Extract audio features	Extract spectral parameters from audio segments	`audioFeatureExtractor` (Audio Toolbox), `mfcc` (Audio Toolbox)	Processed output: ans = struct with fields: mfcc: [1 2 3 4 5 6 7 8 9 10 11 12 13] mfccDelta: [14 15 16 17 18 19 20 21 22 23 24 25 26] mfccDeltaDelta: [27 28 29 30 31 32 33 34 35 36 37 38 39] spectralCentroid: 40 pitch: 41
Create time-frequency representations	Create mel spectrograms and other 2-D representations of audio signals	`melSpectrogram` (Audio Toolbox), `mdct` (Audio Toolbox)

Text Analytics

Text Analytics Toolbox™ includes tools for processing raw text from sources such as equipment logs, news feeds, surveys, operator reports, and social media. Use these tools to extract text from popular file formats, preprocess raw text, extract individual words or multiword phrases (n-grams), convert text into numerical representations, and build statistical models. You can process text data using the functions in this table as well as any other functionality in the toolbox. For an example showing how to get started, see Prepare Text Data for Analysis (Text Analytics Toolbox).

Processing Type	Description	Sample Functions	Sample Output
Tokenize text	Parse text into words and punctuation	`tokenizedDocument` (Text Analytics Toolbox)	Original: `"A few tree limbs greater than 6 inches down on HWY 18 in Roseland."` Processed output: `15 tokens: A few tree limbs greater than 6 inches down on HWY 18 in Roseland.`
Clean text	Remove variations in word forms and case; remove punctuation; remove stop words, short words, and long words	`normalizeWords` (Text Analytics Toolbox); `erasePunctuation` (Text Analytics Toolbox); `removeStopWords` (Text Analytics Toolbox), `removeShortWords` (Text Analytics Toolbox), `removeLongWords` (Text Analytics Toolbox)	Processed output: `15 tokens: a few tree limb great than 6 inch down on hwy 18 in roseland.`; `14 tokens: a few tree limb great than 6 inch down on hwy 18 in roseland`; `8 tokens: few tree limb great inch down hwy roseland`
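The tokenize-then-clean sequence in the table can be approximated with a short Python sketch. It is only a rough illustration and does not reproduce the Text Analytics Toolbox functions: the regular expression, the four-word stop list, and the minimum word length are all assumptions, and word-form normalization (for example, "limbs" to "limb") is deliberately left out.

```python
import re

# Tiny hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"a", "than", "on", "in"}


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def clean(tokens, min_length=3):
    """Lowercase, drop punctuation, then drop stop words and short words."""
    lowered = [token.lower() for token in tokens]
    words = [token for token in lowered if re.fullmatch(r"\w+", token)]
    return [w for w in words if w not in STOP_WORDS and len(w) >= min_length]


text = "A few tree limbs greater than 6 inches down on HWY 18 in Roseland."
tokens = tokenize(text)
cleaned = clean(tokens)
```

On the sample sentence from the table, this sketch yields 15 tokens before cleaning and 8 after, though the surviving words keep their original forms because no stemming is applied.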
