
audioFeatureExtractor

Streamline audio feature extraction

Since R2019b

Description

audioFeatureExtractor encapsulates multiple audio feature extractors into a streamlined and modular implementation.

Creation

Description

aFE = audioFeatureExtractor() creates an audio feature extractor with default property values.


aFE = audioFeatureExtractor(Name=Value) specifies nondefault properties for aFE using one or more name-value arguments.
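The following sketch shows both ways of configuring the object: nondefault properties passed as name-value arguments at creation, and a feature enabled afterward by setting its property directly (the object display in the Algorithms section notes that obj.mfcc = true adds mfcc to the enabled features). The 16 kHz sample rate and the 25 ms window with a 10 ms hop are arbitrary illustrative choices.

```matlab
% Illustrative values: 16 kHz audio, 25 ms Hann window, 10 ms hop
fs = 16e3;
winLength = round(0.025*fs);            % 400 samples
hopLength = round(0.010*fs);            % 160 samples

% Create the extractor with nondefault properties (name-value arguments)
aFE = audioFeatureExtractor(SampleRate=fs, ...
    Window=hann(winLength,"periodic"), ...
    OverlapLength=winLength - hopLength, ...
    mfcc=true);

% Enable an additional feature after creation by setting its property
aFE.spectralCentroid = true;
```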

Properties


Main Properties

Window

Analysis window, specified as a real vector.

Data Types: single | double

OverlapLength

Overlap length of adjacent analysis windows, specified as an integer in the range [0, numel(Window)).

Data Types: single | double

FFTLength

FFT length, specified as an integer. The default value, [], means that the FFT length is equal to the window length, numel(Window).

Data Types: single | double

SampleRate

Input sample rate in Hz, specified as a positive scalar.

Data Types: single | double
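Window, OverlapLength, and FFTLength together determine the shape of the output from extract. The sketch below assumes that extract analyzes only complete frames, which matches the hop counts reported in the datastore example (for example, 1053 hops for a 539648-sample file with the default 1024-sample window and 512-sample overlap). It also assumes that the one-sided linear spectrum over the default [0, SampleRate/2] range has FFTLength/2 + 1 bins. The signal and framing values are illustrative.

```matlab
fs = 16e3;
x = randn(fs,1);                        % 1 s of noise as a stand-in signal
win = hamming(512,"periodic");
overlapLength = 256;
fftLength = 1024;                       % zero-pad each frame to 1024 points

aFE = audioFeatureExtractor(SampleRate=fs, Window=win, ...
    OverlapLength=overlapLength, FFTLength=fftLength, ...
    linearSpectrum=true);
S = extract(aFE,x);

% Expected shape, assuming only complete frames are analyzed
hopLength = numel(win) - overlapLength;
numHops = floor((numel(x) - numel(win))/hopLength) + 1;
numBins = fftLength/2 + 1;              % one-sided spectrum over [0, fs/2]
```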

SpectralDescriptorInput

Input to spectral descriptors, specified as "linearSpectrum", "melSpectrum", "barkSpectrum", or "erbSpectrum".

Spectral descriptors affected by this property are spectralCentroid, spectralCrest, spectralDecrease, spectralEntropy, spectralFlatness, spectralFlux, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope, and spectralSpread.

The spectrum input to the spectral descriptors is the same as the output from the corresponding spectrum feature (linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum). For example, if you set SpectralDescriptorInput to "barkSpectrum" and spectralCentroid to true, then aFE returns the centroid of the default Bark spectrum.

[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
aFE = audioFeatureExtractor(SampleRate=fs, ...
    SpectralDescriptorInput="barkSpectrum", ...
    spectralCentroid=true);
barkSpectralCentroid = extract(aFE,audioIn);

If you specify a nondefault barkSpectrum using setExtractorParameters, then the nondefault Bark spectrum is the input to the spectral descriptors. For example, if you call setExtractorParameters(aFE,"barkSpectrum",NumBands=40), then aFE returns the centroid of a 40-band Bark spectrum.

setExtractorParameters(aFE,"barkSpectrum",NumBands=40)
bark40SpectralCentroid = extract(aFE,audioIn);

Data Types: char | string

FeatureVectorLength

This property is read-only.

Total number of features output from extract for the current object configuration, specified as a positive integer. FeatureVectorLength is equal to the second dimension of the output from the extract function.

Data Types: single | double
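As a quick check of how FeatureVectorLength tracks the enabled features, this sketch enables MFCC (13 coefficients by default) and the spectral centroid (one value per hop). The sample rate and noise input are illustrative.

```matlab
fs = 16e3;
aFE = audioFeatureExtractor(SampleRate=fs, mfcc=true, spectralCentroid=true);

% 13 MFCCs (default NumCoeffs) + 1 spectral centroid
n = aFE.FeatureVectorLength;

% The second dimension of the output from extract matches FeatureVectorLength
features = extract(aFE,randn(fs,1));
```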

Features to Extract

linearSpectrum

Extract the one-sided linear spectrum, specified as true or false.

To set parameters of the linear spectrum extraction, use setExtractorParameters:

setExtractorParameters(aFE,"linearSpectrum",Name=Value)

Settable parameters for the linear spectrum extraction are:

  • FrequencyRange –– Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].

  • SpectrumType –– Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".

  • WindowNormalization –– Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.

Data Types: logical

melSpectrum

Extract the one-sided mel spectrum, specified as true or false.

To set parameters of the mel spectrum extraction, use setExtractorParameters:

setExtractorParameters(aFE,"melSpectrum",Name=Value)

Settable parameters for the mel spectrum extraction are:

  • FrequencyRange –– Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].

  • SpectrumType –– Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".

  • NumBands –– Number of mel bands, specified as an integer. If unspecified, NumBands defaults to 32.

  • FilterBankNormalization –– Normalization applied to bandpass filters, specified as "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".

  • WindowNormalization –– Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.

  • FilterBankDesignDomain –– Domain in which the filter bank is designed, specified as either "linear" or "warped". If unspecified, FilterBankDesignDomain defaults to "linear".

Data Types: logical
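A sketch of setting several mel spectrum parameters at once. The band count, frequency range, and spectrum type are illustrative choices; each mel band becomes one output column.

```matlab
fs = 16e3;
aFE = audioFeatureExtractor(SampleRate=fs, melSpectrum=true);

% Request 40 magnitude-scaled mel bands limited to 50 Hz to 7 kHz
setExtractorParameters(aFE,"melSpectrum", ...
    NumBands=40, ...
    FrequencyRange=[50 7000], ...
    SpectrumType="magnitude");

melSpec = extract(aFE,randn(fs,1));     % one column per mel band
```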

barkSpectrum

Extract the one-sided Bark spectrum, specified as true or false.

To set parameters of the Bark spectrum extraction, use setExtractorParameters:

setExtractorParameters(aFE,"barkSpectrum",Name=Value)

Settable parameters for the Bark spectrum extraction are:

  • FrequencyRange –– Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].

  • SpectrumType –– Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".

  • NumBands –– Number of Bark bands, specified as an integer. If unspecified, NumBands defaults to 32.

  • FilterBankNormalization –– Normalization applied to bandpass filters, specified as "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".

  • WindowNormalization –– Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.

  • FilterBankDesignDomain –– Domain in which the filter bank is designed, specified as either "linear" or "warped". If unspecified, FilterBankDesignDomain defaults to "linear".

Data Types: logical

erbSpectrum

Extract the one-sided ERB spectrum, specified as true or false.

To set parameters of the ERB spectrum extraction, use setExtractorParameters:

setExtractorParameters(aFE,"erbSpectrum",Name=Value)

Settable parameters for the ERB spectrum extraction are:

  • FrequencyRange –– Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].

  • SpectrumType –– Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".

  • NumBands –– Number of ERB bands, specified as an integer. If unspecified, NumBands defaults to ceil(hz2erb(FrequencyRange(2))-hz2erb(FrequencyRange(1))).

  • FilterBankNormalization –– Normalization applied to bandpass filters, specified as "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".

  • WindowNormalization –– Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.

Data Types: logical
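Unlike the mel and Bark spectra, the default ERB band count depends on the frequency range. This sketch evaluates the default NumBands formula for the default FrequencyRange at an illustrative 16 kHz sample rate and compares it with the number of output columns.

```matlab
fs = 16e3;
aFE = audioFeatureExtractor(SampleRate=fs, erbSpectrum=true);

% Default NumBands formula, with the default FrequencyRange = [0, fs/2]
frequencyRange = [0 fs/2];
defaultNumBands = ceil(hz2erb(frequencyRange(2)) - hz2erb(frequencyRange(1)));
```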

mfcc

Extract mel-frequency cepstral coefficients (MFCC), specified as true or false.

To set parameters of the MFCC extraction, use setExtractorParameters:

setExtractorParameters(aFE,"mfcc",Name=Value)

Settable parameters for the MFCC extraction are:

  • NumCoeffs –– Number of coefficients returned for each window, specified as a positive integer. If unspecified, NumCoeffs defaults to 13.

  • DeltaWindowLength –– Delta window length, specified as an odd integer greater than 2. If unspecified, DeltaWindowLength defaults to 9. This parameter affects the mfccDelta and mfccDeltaDelta features.

  • Rectification –– Type of nonlinear rectification, specified as "log" or "cubic-root".

The mel-frequency cepstral coefficients are calculated from the melSpectrum.

Data Types: logical
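Because parameters set on mfcc also govern mfccDelta and mfccDeltaDelta, a single setExtractorParameters call changes all three features. In this sketch the parameter values are illustrative; with 20 coefficients and all three features enabled, each hop yields 60 columns, and info reports which columns belong to which feature.

```matlab
fs = 16e3;
aFE = audioFeatureExtractor(SampleRate=fs, ...
    mfcc=true, mfccDelta=true, mfccDeltaDelta=true);

% 20 coefficients with cubic-root rectification and a 5-frame delta window.
% These settings also apply to mfccDelta and mfccDeltaDelta.
setExtractorParameters(aFE,"mfcc", ...
    NumCoeffs=20, ...
    DeltaWindowLength=5, ...
    Rectification="cubic-root");

features = extract(aFE,randn(fs,1));
idx = info(aFE);                        % column mapping for each feature
```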

mfccDelta

Extract the delta of the MFCC, specified as true or false.

The delta MFCC is calculated from the extracted MFCC. Parameters set on mfcc affect mfccDelta.

Data Types: logical

mfccDeltaDelta

Extract the delta-delta of the MFCC, specified as true or false.

The delta-delta MFCC is calculated from the extracted MFCC. Parameters set on mfcc affect mfccDeltaDelta.

Data Types: logical
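This page does not spell out how the delta is computed from DeltaWindowLength. A common regression-based definition, given here as an assumption rather than as the documented toolbox algorithm, computes the delta of coefficient c at frame t over a half-window M. Applying the same operation to the deltas gives the delta-delta. Handling of the first and last frames is not specified here.

```latex
\Delta c_t = \frac{\sum_{k=-M}^{M} k \, c_{t+k}}{\sum_{k=-M}^{M} k^{2}},
\qquad M = \frac{\mathrm{DeltaWindowLength} - 1}{2}
```

With the default DeltaWindowLength of 9, M = 4, and the denominator is 2(1 + 4 + 9 + 16) = 60.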

gtcc

Extract gammatone cepstral coefficients (GTCC), specified as true or false.

To set parameters of the GTCC extraction, use setExtractorParameters:

setExtractorParameters(aFE,"gtcc",Name=Value)

Settable parameters for the GTCC extraction are:

  • NumCoeffs –– Number of coefficients returned for each window, specified as a positive integer. If unspecified, NumCoeffs defaults to 13.

  • DeltaWindowLength –– Delta window length, specified as an odd integer greater than 2. If unspecified, DeltaWindowLength defaults to 9. This parameter affects the gtccDelta and gtccDeltaDelta features.

  • Rectification –– Type of nonlinear rectification, specified as "log" or "cubic-root".

The gammatone cepstral coefficients are calculated from the erbSpectrum.

Data Types: logical

gtccDelta

Extract the delta of the GTCC, specified as true or false.

The delta GTCC is calculated from the extracted GTCC. Parameters set on gtcc affect gtccDelta.

Data Types: logical

gtccDeltaDelta

Extract the delta-delta of the GTCC, specified as true or false.

The delta-delta GTCC is calculated from the extracted GTCC. Parameters set on gtcc affect gtccDeltaDelta.

Data Types: logical
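You can enable a delta feature without enabling the coefficients it is computed from. In that case the GTCCs are computed internally but are not output. This sketch, using an illustrative sample rate and noise input, enables only gtccDelta and gets one column per default GTCC coefficient.

```matlab
fs = 16e3;

% Enable only gtccDelta; the GTCCs it depends on are computed internally
aFE = audioFeatureExtractor(SampleRate=fs, gtccDelta=true);

d = extract(aFE,randn(fs,1));           % 13 delta columns (default NumCoeffs)
```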

spectralCentroid

Extract spectral centroid, specified as true or false.

The spectral centroid is calculated on the spectral representation specified by the SpectralDescriptorInput property: "linearSpectrum", "melSpectrum", "barkSpectrum", or "erbSpectrum".

Data Types: logical
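As a sketch of what this descriptor measures, the spectral centroid of one frame is the mean frequency weighted by the spectral values. The toy spectrum and bin frequencies below are hypothetical. audioFeatureExtractor computes the descriptor for every hop on the representation selected by SpectralDescriptorInput.

```matlab
% Hypothetical one-frame spectrum: bin center frequencies (Hz) and values
f = [0; 100; 200; 300; 400];
s = [0; 1; 2; 1; 0];

% Spectral centroid: weighted mean of f with weights s
centroid = sum(f.*s)/sum(s);
```

For this symmetric toy spectrum, the centroid falls on the middle bin, 200 Hz.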

spectralCrest

Extract spectral crest, specified as true or false.

The spectral crest is calculated on the spectral representation specified by the SpectralDescriptorInput property: "linearSpectrum", "melSpectrum", "barkSpectrum", or "erbSpectrum".

Data Types: logical

spectralDecrease

Extract spectral decrease, specified as true or false.

The spectral decrease is calculated on the spectral representation specified by the SpectralDescriptorInput property: "linearSpectrum", "melSpectrum", "barkSpectrum", or "erbSpectrum".

Data Types: logical

spectralEntropy

Extract spectral entropy, specified as true or false.

The spectral entropy is calculated on the spectral representation specified by the SpectralDescriptorInput property: "linearSpectrum", "melSpectrum", "barkSpectrum", or "erbSpectrum".

Data Types: logical

spectralFlatness

Extract spectral flatness, specified as true or false.

The spectral flatness is calculated on the spectral representation specified by the SpectralDescriptorInput property: "linearSpectrum", "melSpectrum", "barkSpectrum", or "erbSpectrum".

Data Types: logical

spectralFlux

Extract spectral flux, specified as true or false.

The spectral flux is calculated on the spectral representation specified by the SpectralDescriptorInput property: "linearSpectrum", "melSpectrum", "barkSpectrum", or "erbSpectrum".

To set parameters of the spectral flux extraction, use setExtractorParameters:

setExtractorParameters(aFE,"spectralFlux",Name=Value)

Settable parameters for the spectral flux extraction are:

  • NormType –– Norm type used to calculate the spectral flux, specified as 1 or 2. If unspecified, NormType defaults to 2.

Data Types: logical
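To illustrate the effect of NormType, this sketch applies the common definition of spectral flux (the p-norm of the per-bin change between consecutive frames) to two hypothetical frames. It is an assumption that this matches the toolbox definition exactly. The frames are toy values, not extractor output.

```matlab
% Hypothetical spectra for two consecutive frames (one column per frame)
S = [1 2;
     3 1;
     0 2];

d = S(:,2) - S(:,1);                    % per-bin change between frames

flux1 = sum(abs(d));                    % NormType = 1
flux2 = sqrt(sum(abs(d).^2));           % NormType = 2 (default)
```

Here the per-bin change is [1; -2; 2], so the 1-norm flux is 5 and the 2-norm flux is 3.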

spectralKurtosis

Extract spectral kurtosis, specified as true or false.

The spectral kurtosis is calculated on the spectral representation specified by the SpectralDescriptorInput property: "linearSpectrum", "melSpectrum", "barkSpectrum", or "erbSpectrum".

Data Types: logical

spectralRolloffPoint

Extract spectral rolloff point, specified as true or false.

The spectral rolloff point is calculated on the spectral representation specified by the SpectralDescriptorInput property: "linearSpectrum", "melSpectrum", "barkSpectrum", or "erbSpectrum".

To set parameters of the spectral rolloff point extraction, use setExtractorParameters:

setExtractorParameters(aFE,"spectralRolloffPoint",Name=Value)

Settable parameters for the spectral rolloff point extraction are:

  • Threshold –– Threshold of the rolloff point, specified as a scalar in the range (0, 1). If unspecified, Threshold defaults to 0.95.

Data Types: logical
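A minimal sketch of what Threshold controls, assuming the common definition of the rolloff point: the lowest bin frequency at which the cumulative spectral sum reaches the Threshold fraction of the total. The one-frame spectrum is hypothetical.

```matlab
% Hypothetical one-frame power spectrum and bin frequencies (Hz)
f = [0; 100; 200; 300];
s = [4; 3; 2; 1];

threshold = 0.95;                       % default Threshold
% First bin at which the cumulative sum reaches threshold*total
k = find(cumsum(s) >= threshold*sum(s), 1);
rolloff = f(k);
```

The cumulative sums are 4, 7, 9, and 10, so 95% of the total (9.5) is first reached at the 300 Hz bin. Lowering Threshold to 0.85 would move the rolloff point to 200 Hz.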

spectralSkewness

Extract spectral skewness, specified as true or false.

The spectral skewness is calculated on the spectral representation specified by the SpectralDescriptorInput property: "linearSpectrum", "melSpectrum", "barkSpectrum", or "erbSpectrum".

Data Types: logical

spectralSlope

Extract spectral slope, specified as true or false.

The spectral slope is calculated on the spectral representation specified by the SpectralDescriptorInput property: "linearSpectrum", "melSpectrum", "barkSpectrum", or "erbSpectrum".

Data Types: logical

spectralSpread

Extract spectral spread, specified as true or false.

The spectral spread is calculated on the spectral representation specified by the SpectralDescriptorInput property: "linearSpectrum", "melSpectrum", "barkSpectrum", or "erbSpectrum".

Data Types: logical

pitch

Extract pitch, specified as true or false.

To set parameters of the pitch extraction, use setExtractorParameters:

setExtractorParameters(aFE,"pitch",Name=Value)

Settable parameters for the pitch extraction are:

  • Method –– Method used to calculate the pitch, specified as "PEF", "NCF", "CEP", "LHS", or "SRH". If unspecified, Method defaults to "NCF". For a description of available pitch extraction methods, see pitch.

  • Range –– Range within which to search for the pitch, in Hz, specified as a two-element row vector of increasing values. If unspecified, Range defaults to [50,400].

  • MedianFilterLength –– Median filter length used to smooth pitch estimates over time, specified as a positive integer. If unspecified, MedianFilterLength defaults to 1 (no median filtering).

Data Types: logical
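A sketch of configuring all three pitch parameters on a synthetic 200 Hz tone. The tone, the narrowed search range (chosen so that it excludes the 100 Hz subharmonic), and the 3-hop median filter are illustrative values.

```matlab
fs = 16e3;
t = (0:fs-1)'/fs;
x = sin(2*pi*200*t);                    % 1 s, 200 Hz test tone

aFE = audioFeatureExtractor(SampleRate=fs, pitch=true);
setExtractorParameters(aFE,"pitch", ...
    Method="NCF", ...
    Range=[150 300], ...
    MedianFilterLength=3);

f0 = extract(aFE,x);                    % one pitch estimate (Hz) per hop
```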

harmonicRatio

Extract harmonic ratio, specified as true or false.

Data Types: logical

zerocrossrate

Extract zero-crossing rate, specified as true or false.

To set parameters of the zero-crossing rate extraction, use setExtractorParameters:

setExtractorParameters(aFE,"zerocrossrate",Name=Value)

Settable parameters for the zero-crossing rate extraction are:

  • Method –– Method for computing the zero-crossing rate, specified as "difference" or "comparison". If unspecified, Method defaults to "difference". For more information, see zerocrossrate.

  • Level –– Signal level for which the crossing rate is computed, specified as a real scalar. audioFeatureExtractor subtracts the Level value from the signal and then finds the zero crossings. If unspecified, Level defaults to 0.

  • Threshold –– Threshold above and below the Level value over which the crossing rate is computed, specified as a real scalar. audioFeatureExtractor sets all the values of the input in the range [-Threshold, Threshold] to 0 and then finds the zero crossings. If unspecified, Threshold defaults to 0.

  • TransitionEdge –– Transitions to include when counting zero crossings, specified as "falling", "rising", or "both". If you specify "falling", only negative-going transitions are counted. If you specify "rising", only positive-going transitions are counted. If unspecified, TransitionEdge defaults to "both".

  • ZeroPositive –– Sign convention, specified as a logical scalar. If you specify ZeroPositive as true, then 0 is considered positive. If you specify ZeroPositive as false, then audioFeatureExtractor considers 0, -1, and +1 to have distinct signs, following the convention of the sign function. If unspecified, ZeroPositive defaults to false.

Data Types: logical
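A sketch combining Level, Threshold, and TransitionEdge. The input is noise with an illustrative DC offset, so crossings are counted around that offset rather than around 0. Per the parameter descriptions, values within Threshold of the level are set to 0 before counting. The specific values are arbitrary.

```matlab
fs = 16e3;
x = 0.1*randn(fs,1) + 0.05;             % noise with a DC offset of 0.05

aFE = audioFeatureExtractor(SampleRate=fs, zerocrossrate=true);
% Count only rising crossings of the 0.05 level, treating values within
% 0.01 of that level as zero
setExtractorParameters(aFE,"zerocrossrate", ...
    Level=0.05, ...
    Threshold=0.01, ...
    TransitionEdge="rising");

zcr = extract(aFE,x);                   % one rate per hop
```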

shortTimeEnergy

Extract short-time energy, specified as true or false. The short-time energy is computed using

sTE = sum(xbw.^2,1),

where xbw is the buffered and windowed signal.

Example: Chirp Function

Generate a chirp sampled at 1 kHz for 3 seconds. The instantaneous frequency is 100 Hz at t = 0 and crosses 200 Hz at t = 1 second. Divide the signal into 103-sample segments with 43 samples of overlap between adjoining segments. Window each segment with a periodic Hamming window.

fs = 1e3;
x = chirp(0:1/fs:3,100,1,200)';
win = hamming(103,"periodic");
nover = 43;
[xb,~] = buffer(x,length(win),nover,"nodelay");
xbw = xb.*win;

Compute the short-time energy using the definition.

Edef = sum(xbw.^2,1)';

Use audioFeatureExtractor to compute the short-time energy.

EaFE = extract(audioFeatureExtractor(shortTimeEnergy=true, ...
    SampleRate=fs,Window=win,OverlapLength=nover),x);

Verify that both procedures give the same short-time energy.

dff = max(abs(EaFE-Edef))

dff = 0

Data Types: logical

Object Functions

extract Extract audio features
setExtractorParameters Set nondefault parameter values for individual feature extractors
info Output mapping and individual feature extractor parameters
generateMATLABFunction Create MATLAB function compatible with C/C++ code generation
plotFeatures Plot extracted audio features

Examples


Read in an audio signal.

[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

Create an audioFeatureExtractor object that extracts the MFCC, delta MFCC, delta-delta MFCC, pitch, spectral centroid, zero-crossing rate, and short-time energy of the signal. Use a 30 ms analysis window with 20 ms overlap.

aFE = audioFeatureExtractor( ...
    SampleRate=fs, ...
    Window=hamming(round(0.03*fs),"periodic"), ...
    OverlapLength=round(0.02*fs), ...
    mfcc=true, ...
    mfccDelta=true, ...
    mfccDeltaDelta=true, ...
    pitch=true, ...
    spectralCentroid=true, ...
    zerocrossrate=true, ...
    shortTimeEnergy=true);

Call extract to extract the audio features from the audio signal.

features = extract(aFE,audioIn);

Use info to determine which column of the feature extraction matrix corresponds to the requested pitch extraction.

idx = info(aFE)

idx = struct with fields:

                mfcc: [1 2 3 4 5 6 7 8 9 10 11 12 13]
           mfccDelta: [14 15 16 17 18 19 20 21 22 23 24 25 26]
      mfccDeltaDelta: [27 28 29 30 31 32 33 34 35 36 37 38 39]
    spectralCentroid: 40
               pitch: 41
       zerocrossrate: 42
     shortTimeEnergy: 43

Plot the detected pitch over time.

t = linspace(0,size(audioIn,1)/fs,size(features,1));
plot(t,features(:,idx.pitch))
title("Pitch")
xlabel("Time (s)")
ylabel("Frequency (Hz)")

[Figure: Pitch over time. x-axis: Time (s); y-axis: Frequency (Hz).]

Plot the zero-crossing rate over time.

plot(t,features(:,idx.zerocrossrate))
title("Zero-Crossing Rate")
xlabel("Time (s)")

[Figure: Zero-crossing rate over time. x-axis: Time (s).]

Plot the short-time energy over time.

plot(t,features(:,idx.shortTimeEnergy))
title("Short-Time Energy")
xlabel("Time (s)")

[Figure: Short-time energy over time. x-axis: Time (s).]

Create an audio datastore that points to audio samples included with Audio Toolbox®.

folder = fullfile(matlabroot,"toolbox","audio","samples");
ads = audioDatastore(folder);

Find all files that correspond to a sample rate of 44.1 kHz, and then subset the datastore.

keepFile = cellfun(@(x)contains(x,"44p1"),ads.Files);
ads = subset(ads,keepFile);

Convert the data to a tall array. Tall arrays are evaluated only when you request them explicitly using gather. MATLAB® automatically optimizes the queued calculations by minimizing the number of passes through the data. If you have Parallel Computing Toolbox™, you can spread the calculations across multiple workers. The audio data is represented as an M-by-1 tall cell array, where M is the number of files in the audio datastore.

adsTall = tall(ads)

Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).

adsTall =

  M×1 tall cell array

    {  539648×1 double}
    {  227497×1 double}
    {    8000×1 double}
    {  685056×1 double}
    {  882688×2 double}
    { 1115760×2 double}
    {  505200×2 double}
    { 3195904×2 double}
        :
        :

Create an audioFeatureExtractor object to extract the mel spectrum, Bark spectrum, ERB spectrum, and linear spectrum from each audio file. Use the default analysis window and overlap length for the spectrum extraction.

aFE = audioFeatureExtractor(SampleRate=44.1e3, ...
    melSpectrum=true, ...
    barkSpectrum=true, ...
    erbSpectrum=true, ...
    linearSpectrum=true);

Use cellfun to apply extract to each cell of the tall array. Call gather to evaluate the tall array.

specsTall = cellfun(@(x)extract(aFE,x),adsTall,UniformOutput=false);
specs = gather(specsTall);

Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 14 sec
Evaluation completed in 14 sec

The specs variable returned from gather is a numFiles-by-1 cell array, where numFiles is the number of files in the datastore. Each element of the cell array is a numHops-by-numFeatures-by-numChannels array. The number of hops and the number of channels depend on the length and number of channels of the audio file, and the number of features is the total number of requested features.

numFiles = numel(specs)

numFiles = 12

[numHops1,numFeaturesFile1,numChannelsFile1] = size(specs{1})

numHops1 = 1053
numFeaturesFile1 = 620
numChannelsFile1 = 1

[numHops2,numFeaturesFile2,numChannelsFile2] = size(specs{2})

numHops2 = 443
numFeaturesFile2 = 620
numChannelsFile2 = 1
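The 620 features per hop can be accounted for from the defaults documented in the Properties section. This sketch assumes that the one-sided linear spectrum spans FFTLength/2 + 1 bins, where FFTLength defaults to the 1024-sample default window length, and that the mel and Bark spectra use their default 32 bands. The remainder is the default ERB band count for a 44.1 kHz sample rate.

```matlab
fs = 44.1e3;
fftLength = 1024;                           % default window length (FFTLength = [])

numLinear = fftLength/2 + 1;                % one-sided bins over [0, fs/2]
numMel = 32;                                % default mel NumBands
numBark = 32;                               % default Bark NumBands
numErb = ceil(hz2erb(fs/2) - hz2erb(0));    % default ERB NumBands

total = numLinear + numMel + numBark + numErb;

aFE = audioFeatureExtractor(SampleRate=fs, melSpectrum=true, ...
    barkSpectrum=true, erbSpectrum=true, linearSpectrum=true);
```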

Use plotFeatures to visualize audio features extracted with an audioFeatureExtractor object.

Read in an audio signal from a file.

[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

Create an audioFeatureExtractor object that extracts the gammatone cepstral coefficients (GTCCs) and the delta of the GTCCs. Set the SampleRate property to the sample rate of the audio signal, and use the default values for the other properties.

afe = audioFeatureExtractor(SampleRate=fs,gtcc=true,gtccDelta=true);

Plot the features extracted from the audio signal.

plotFeatures(afe,audioIn)

[Figure: audioFeatureExtractor feature plot with two panels, GTCC and GTCC Delta, each showing Coefficient versus Time (s).]

Algorithms

audioFeatureExtractor creates a feature extraction pipeline based on your selected features. To reduce computations, audioFeatureExtractor reuses intermediate representations and outputs some intermediate representations as features.

For example, to create an object that extracts the centroid of the Bark spectrum, the flux of the Bark spectrum, the pitch, the harmonic ratio, and the delta-delta of the MFCC, specify the audioFeatureExtractor as follows.

aFE = audioFeatureExtractor( ...
    SpectralDescriptorInput="barkSpectrum", ...
    spectralCentroid=true, ...
    spectralFlux=true, ...
    pitch=true, ...
    harmonicRatio=true, ...
    mfccDeltaDelta=true)

aFE = 
  audioFeatureExtractor with properties:

   Properties
                     Window: [1024×1 double]
              OverlapLength: 512
                 SampleRate: 44100
                  FFTLength: []
    SpectralDescriptorInput: 'barkSpectrum'

   Enabled Features
     mfccDeltaDelta, spectralCentroid, spectralFlux, pitch, harmonicRatio

   Disabled Features
     linearSpectrum, melSpectrum, barkSpectrum, erbSpectrum, mfcc, mfccDelta
     gtcc, gtccDelta, gtccDeltaDelta, spectralCrest, spectralDecrease, spectralEntropy
     spectralFlatness, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope, spectralSpread

   To extract a feature, set the corresponding property to true.
   For example, obj.mfcc = true, adds mfcc to the list of enabled features.
This configuration corresponds to the highlighted feature extraction pipeline.
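Although this pipeline computes the Bark spectrum and the MFCC internally, only the enabled features become output columns. The following sketch repeats the configuration above and checks that claim: 13 delta-delta MFCC columns plus one column each for the centroid, flux, pitch, and harmonic ratio.

```matlab
aFE = audioFeatureExtractor( ...
    SpectralDescriptorInput="barkSpectrum", ...
    spectralCentroid=true, ...
    spectralFlux=true, ...
    pitch=true, ...
    harmonicRatio=true, ...
    mfccDeltaDelta=true);

% Only enabled features appear in the output mapping and the column count
idx = info(aFE);
n = aFE.FeatureVectorLength;            % 13 + 1 + 1 + 1 + 1
```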

Note

Because audioFeatureExtractor reuses intermediate representations, the features output from audioFeatureExtractor might not match the features output by the corresponding individual feature extractors in their default configurations.


Version History

Introduced in R2019b
