Pretrained Models

Transfer learning, sound classification, feature embeddings, pretrained audio deep learning networks

Audio Toolbox™ provides MATLAB® and Simulink® support for pretrained audio deep learning networks. Locate and classify sounds with YAMNet and estimate pitch with CREPE. Extract VGGish or OpenL3 feature embeddings to input to machine learning and deep learning systems. Use i-vector systems to produce compact representations of audio signals for applications such as speaker recognition, verification, identification, and diarization; speech emotion recognition; and acoustic machine fault detection.

This functionality requires Deep Learning Toolbox™. The Audio Toolbox pretrained networks are available in Deep Network Designer (Deep Learning Toolbox).
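
As a rough illustration of how the functions listed below fit together, the following sketch runs sound classification, pitch estimation, and embedding extraction on one signal. It assumes that Audio Toolbox, Deep Learning Toolbox, and the pretrained YAMNet, CREPE, and VGGish models are already installed. The synthetic test tone, its sample rate, and the variable names are illustrative choices and do not come from the documentation above.

```matlab
% Minimal sketch: assumes Audio Toolbox, Deep Learning Toolbox, and the
% pretrained YAMNet, CREPE, and VGGish models are installed.

fs = 16e3;                      % sample rate in Hz (chosen for illustration)
t = (0:2*fs-1)'/fs;             % two seconds of time samples
audioIn = 0.5*sin(2*pi*220*t);  % synthetic 220 Hz test tone (placeholder signal)

% Classify sounds with YAMNet; returns the labels detected in the signal.
sounds = classifySound(audioIn,fs);

% Estimate the fundamental frequency over time with CREPE.
f0 = pitchnn(audioIn,fs);

% Extract VGGish feature embeddings (one 128-element row per analysis frame).
embeddings = vggishEmbeddings(audioIn,fs);
```

In practice, you would likely replace the test tone with recorded audio, for example a file loaded with `audioread`, and pass the embeddings to a downstream classifier.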

Functions

VGGish

`vggish`	VGGish neural network
`vggishPreprocess`	Preprocess audio for VGGish feature extraction
`vggishEmbeddings`	Extract VGGish feature embeddings

YAMNet

`classifySound`	Classify sounds in audio signal
`yamnet`	YAMNet neural network
`yamnetGraph`	Graph of YAMNet AudioSet ontology
`yamnetPreprocess`	Preprocess audio for YAMNet classification

OpenL3

`openl3`	OpenL3 neural network
`openl3Preprocess`	Preprocess audio for OpenL3 feature extraction
`openl3Embeddings`	Extract OpenL3 feature embeddings

CREPE

`crepe`	CREPE neural network
`crepePreprocess`	Preprocess audio for CREPE deep learning network
`crepePostprocess`	Postprocess output of CREPE deep learning network
`pitchnn`	Estimate pitch with deep learning neural network

i-Vectors

`ivectorSystem`	Create i-vector system
`speakerRecognition`	Pretrained speaker recognition system

Blocks

VGGish

VGGish Embeddings	Extract VGGish embeddings
VGGish Preprocess	Preprocess audio for VGGish feature extraction
VGGish	VGGish embeddings extraction network

YAMNet

Sound Classifier	Classify sounds in audio signal
YAMNet	YAMNet sound classification network
YAMNet Preprocess	Preprocess audio for YAMNet classification

Apps

Deep Network Designer

Design, visualize, and train deep learning networks

Related Information

Speech-to-Text Transcription Using wav2vec 2.0

Featured Examples

Transfer Learning with Pretrained Audio Networks in Deep Network Designer

Interactively fine-tune a pretrained network to classify new audio signals using Deep Network Designer.

3-D Sound Event Localization and Detection Using Trained Recurrent Convolutional Neural Network

Perform 3-D sound event localization and detection using a pretrained deep learning model.

Speech Command Recognition in Simulink

Detect the presence of speech commands in audio using a Simulink model.

Investigate Audio Classifications Using Deep Learning Interpretability Techniques

Use interpretability techniques to investigate the predictions of a deep neural network trained to classify audio data.
