Audio Processing Using Deep Learning

Extend deep learning workflows with audio and speech processing applications

Apply deep learning to audio and speech processing applications by using Deep Learning Toolbox™ together with Audio Toolbox™. For signal processing applications, see Signal Processing Using Deep Learning. For applications in wireless communications, see Wireless Communications Using Deep Learning.

Apps

Signal Labeler	Label signal attributes, regions, and points of interest, and extract features

Functions

Data Management and Augmentation

`audioDatastore`	Datastore for collection of audio files
`audioDataAugmenter`	Augment audio data

Feature Extraction

`audioFeatureExtractor`	Streamline audio feature extraction
`openl3Embeddings`	Extract OpenL3 feature embeddings
`pitchnn`	Estimate pitch with deep learning neural network
`vggishEmbeddings`	Extract VGGish feature embeddings

Pretrained Networks

`classifySound`	Classify sounds in audio signal
`crepe`	CREPE neural network
`crepePreprocess`	Preprocess audio for CREPE deep learning network
`crepePostprocess`	Postprocess output of CREPE deep learning network
`openl3`	OpenL3 neural network
`openl3Embeddings`	Extract OpenL3 feature embeddings
`openl3Preprocess`	Preprocess audio for OpenL3 feature extraction
`pitchnn`	Estimate pitch with deep learning neural network
`vggish`	VGGish neural network
`vggishEmbeddings`	Extract VGGish feature embeddings
`vggishPreprocess`	Preprocess audio for VGGish feature extraction
`yamnet`	YAMNet neural network
`yamnetGraph`	Graph of YAMNet AudioSet ontology
`yamnetPreprocess`	Preprocess audio for YAMNet classification

Blocks

VGGish	VGGish embeddings extraction network
VGGish Embeddings	Extract VGGish embeddings
YAMNet	YAMNet sound classification network
Sound Classifier	Classify sounds in audio signal

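Topics

Introduction to Deep Learning for Audio Applications (Audio Toolbox)
Learn common tools and workflows to apply deep learning to audio applications.
Classify Sound Using Deep Learning (Audio Toolbox)
Train, validate, and test a simple long short-term memory (LSTM) network to classify sounds.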
Transfer Learning with Pretrained Audio Networks in Deep Network Designer
This example shows how to interactively fine-tune a pretrained network to classify new audio signals using Deep Network Designer.
Speaker Identification Using Custom SincNet Layer and Deep Learning
Perform speech recognition using a custom deep learning layer that implements a mel-scale filter bank.
Dereverberate Speech Using Deep Learning Networks
Train a deep learning model that removes reverberation from speech.
Speech Command Recognition in Simulink
Detect the presence of speech commands in audio using a Simulink® model.
Sequential Feature Selection for Audio Features
This example shows a typical workflow for feature selection applied to the task of spoken digit recognition.
Train Spoken Digit Recognition Network Using Out-of-Memory Audio Data
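This example trains a spoken digit recognition network on out-of-memory audio data using a transformed datastore.
Train Spoken Digit Recognition Network Using Out-of-Memory Features
This example trains a spoken digit recognition network on out-of-memory auditory spectrograms using a transformed datastore.
Investigate Audio Classifications Using Deep Learning Interpretability Techniques
This example shows how to use interpretability techniques to investigate the predictions of a deep neural network trained to classify audio data.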

Featured Examples

Train 3-D Sound Event Localization and Detection (SELD) Using Deep Learning

Train a deep learning model to perform sound localization and event detection from ambisonic data.

3-D Sound Event Localization and Detection Using Trained Recurrent Convolutional Neural Network

Perform 3-D sound event localization and detection using a pretrained deep learning model.

Speaker Recognition Using x-vectors

Develop an x-vector system to perform speaker recognition.

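Speaker Diarization Using x-vectors

Speaker diarization is the process of partitioning an audio signal into segments according to speaker identity. It answers the question "who spoke when" without prior knowledge of the speakers and, depending on the application, without prior knowledge of the number of speakers.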

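Speech Command Recognition Using Deep Learning

Train a deep learning model that detects the presence of speech commands in audio. The example uses the Speech Commands Dataset [1] to train a convolutional neural network to recognize a given set of commands.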

Keyword Spotting in Noise Using MFCC and LSTM Networks

Identify a keyword in noisy speech using a deep learning network. In particular, the example uses a Bidirectional Long Short-Term Memory (BiLSTM) network and mel frequency cepstral coefficients (MFCC).

Denoise Speech Using Deep Learning Networks

Denoise speech signals using deep learning networks. The example compares two types of networks applied to the same task: fully connected, and convolutional.

Train Generative Adversarial Network (GAN) for Sound Synthesis

Train and use a generative adversarial network (GAN) to generate sounds.

Voice Activity Detection in Noise Using Deep Learning

Detect regions of speech in a low signal-to-noise environment using deep learning. The example uses the Speech Commands Dataset to train a Bidirectional Long Short-Term Memory (BiLSTM) network to detect voice activity.

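Speech Emotion Recognition

This example illustrates a simple speech emotion recognition (SER) system using a BiLSTM network. You begin by downloading the data set and then testing the trained network on individual files. The network was trained on a small German-language database [1].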

Acoustic Scene Recognition Using Late Fusion

Create a multi-model late fusion system for acoustic scene recognition. The example trains a convolutional neural network (CNN) using mel spectrograms and an ensemble classifier using wavelet scattering. The example uses the TUT dataset for training and evaluation [1].

End-to-End Deep Speech Separation

Perform speaker-independent speech separation using an end-to-end deep learning network.

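Acoustics-Based Machine Fault Recognition

Develop a deep learning model to detect faults in an air compressor and package the system to operate on streaming data.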

Accelerate Audio Deep Learning Using GPU-Based Feature Extraction

Leverage GPUs for feature extraction to reduce the time required to train an audio deep learning model.

Keyword Spotting in Noise Code Generation on Raspberry Pi

This example demonstrates code generation for keyword spotting using a Bidirectional Long Short-Term Memory (BiLSTM) network and mel frequency cepstral coefficient (MFCC) feature extraction on Raspberry Pi™. MATLAB® Coder™ with Deep Learning Support enables the generation of a standalone executable (.elf) file on Raspberry Pi. Communication between the MATLAB® (.mlx) file and the generated executable file occurs over asynchronous User Datagram Protocol (UDP). The incoming speech signal is displayed using a timescope. A mask is shown as a blue rectangle surrounding spotted instances of the keyword, YES. For more details on MFCC feature extraction and deep learning network training, see Keyword Spotting in Noise Using MFCC and LSTM Networks.

Speech Command Recognition Code Generation with Intel MKL-DNN

Deploy feature extraction and a convolutional neural network (CNN) for speech command recognition on Intel® processors. To generate the feature extraction and network code, you use MATLAB® Coder and the Intel® Math Kernel Library for Deep Neural Networks (MKL-DNN). In this example, the generated code is a MATLAB executable (MEX) function, which is called by a MATLAB script that displays the predicted speech command along with the time domain signal and auditory spectrogram. For details about audio preprocessing and network training, see Speech Command Recognition Using Deep Learning.

Acoustics-Based Machine Fault Recognition Code Generation with Intel MKL-DNN

Generate a MATLAB® standalone executable for acoustics-based machine fault recognition.

Speech Command Recognition on Raspberry Pi Using Simulink

Deploy feature extraction and a convolutional neural network for speech command recognition on Raspberry Pi™.

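Speech Command Recognition Code Generation with Intel MKL-DNN Using Simulink

Use Embedded Coder® in Simulink and Intel® MKL-DNN to deploy feature extraction and a convolutional neural network for speech command recognition on Intel processors.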
