Machine Learning and Deep Learning for Audio

Dataset management, labeling, and augmentation; segmentation and feature extraction for audio, speech, and acoustic applications

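Audio Toolbox™ provides functionality for developing machine learning and deep learning solutions for audio, speech, and acoustic applications, including speaker identification, speech command recognition, and acoustic scene recognition.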

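  • Use audioDatastore to ingest large audio data sets and process files in parallel.

  • Use Audio Labeler to build audio data sets by annotating audio recordings manually and automatically.

  • Use audioDataAugmenter to create randomized pipelines of built-in or custom signal processing methods for augmenting and synthesizing audio data sets.

  • Use audioFeatureExtractor to extract combinations of different features while sharing intermediate computations.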

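Audio Toolbox also provides access to third-party APIs for text-to-speech and speech-to-text, and it includes pretrained VGGish and YAMNet models so that you can perform transfer learning, classify sounds, and extract feature embeddings. Using the pretrained networks requires Deep Learning Toolbox™.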

Featured Examples

Speaker Verification Using i-Vectors

Speaker verification, or authentication, is the task of confirming that a speaker is who they purport to be. Speaker verification has been an active research area for many years. An early performance breakthrough was to use a Gaussian mixture model and universal background model (GMM-UBM) [1] on acoustic features (usually MFCCs). For an example, see Speaker Verification Using Gaussian Mixture Models. One of the main difficulties of GMM-UBM systems is intersession variability. Joint factor analysis (JFA) was proposed to compensate for this variability by separately modeling inter-speaker variability and channel or session variability [2] [3]. However, [4] discovered that the channel factors in JFA also contained information about the speakers, and proposed combining the channel and speaker spaces into a single total variability space, in which each utterance is represented by a low-dimensional vector known as an i-vector. Intersession variability was then compensated for by using backend procedures, such as linear discriminant analysis (LDA) and within-class covariance normalization (WCCN), followed by scoring, such as the cosine similarity score. [5] proposed replacing cosine similarity scoring with a probabilistic LDA (PLDA) model. [11] and [12] proposed a method to Gaussianize the i-vectors so that Gaussian assumptions can be made in the PLDA, referred to as G-PLDA or simplified PLDA. While i-vectors were originally proposed for speaker verification, they have since been applied to many other problems, such as language recognition, speaker diarization, emotion recognition, age estimation, and anti-spoofing [10]. Recently, deep learning techniques have been proposed to replace i-vectors with d-vectors or x-vectors [8] [6].
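
The cosine similarity scoring step mentioned above can be illustrated with a minimal sketch. This is not the Audio Toolbox implementation: it is written in plain Python, the function names cosine_score and verify are hypothetical, the vectors stand in for i-vectors that have already been through any backend compensation, and the 0.5 threshold is an arbitrary placeholder that would normally be tuned on held-out data.

```python
import math


def cosine_score(enrolled, test):
    """Return the cosine similarity between two equal-length vectors.

    Both arguments are assumed to be i-vectors (plain sequences of floats)
    that have already been projected by any backend such as LDA or WCCN.
    """
    if len(enrolled) != len(test):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(enrolled, test))
    norm_enrolled = math.sqrt(sum(a * a for a in enrolled))
    norm_test = math.sqrt(sum(b * b for b in test))
    if norm_enrolled == 0.0 or norm_test == 0.0:
        raise ValueError("a zero vector has no direction to compare")
    return dot / (norm_enrolled * norm_test)


def verify(enrolled, test, threshold=0.5):
    """Accept the identity claim when the score reaches the threshold.

    The default threshold is an illustrative assumption, not a
    recommended operating point.
    """
    return cosine_score(enrolled, test) >= threshold
```

Because the score depends only on the angle between the two vectors, scaling either vector leaves the decision unchanged, which is one reason the cosine score is a convenient baseline before moving to a PLDA model.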