detectSpeech

Detect boundaries of speech in audio signal

Syntax

idx = detectSpeech(audioIn,fs)

idx = detectSpeech(audioIn,fs,Name,Value)

[idx,thresholds] = detectSpeech(___)

detectSpeech(___)

Description

idx = detectSpeech(audioIn,fs) returns indices of audioIn that correspond to the boundaries of speech signals.

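idx = detectSpeech(audioIn,fs,Name,Value) specifies options using one or more Name,Value pair arguments.

Example: detectSpeech(audioIn,fs,'Window',hann(512,'periodic'),'OverlapLength',256) detects speech using a 512-point periodic Hann window with 256 points of overlap.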

[idx,thresholds] = detectSpeech(___) also returns the thresholds used to compute the boundaries of speech.

detectSpeech(___) with no output arguments displays a plot of the detected speech regions in the input signal.

Examples

Plot Detected Speech Regions

Read in an audio signal. Clip the audio signal to 20 seconds.

[audioIn,fs] = audioread('Rainbow-16-8-Mono-114secs.wav');
audioIn = audioIn(1:20*fs);

Call detectSpeech. Specify no output arguments to display a plot of the detected speech regions.

detectSpeech(audioIn,fs);

[Figure: plot of the detected speech regions, titled "Detected Speech".]

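The detectSpeech function uses a thresholding algorithm based on the energy and spectral spread of each analysis frame. You can modify Window, OverlapLength, and MergeDistance to fine-tune the algorithm for your specific needs.

windowDuration = 0.074; % seconds
numWindowSamples = round(windowDuration*fs);
win = hamming(numWindowSamples,'periodic');
percentOverlap = 35;
overlap = round(numWindowSamples*percentOverlap/100);
mergeDuration = 0.44; % seconds
mergeDist = round(mergeDuration*fs);
detectSpeech(audioIn,fs,"Window",win,"OverlapLength",overlap,"MergeDistance",mergeDist)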

[Figure: plot of the detected speech regions with the modified settings, titled "Detected Speech".]

Reuse Decision Thresholds

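Read in an audio file containing speech. Split the audio signal into a first half and a second half.

[audioIn,fs] = audioread('Counting-16-44p1-Mono-15secs.wav');
firstHalf = audioIn(1:floor(numel(audioIn)/2));
secondHalf = audioIn(numel(firstHalf)+1:end); % start one sample after the first half ends

Call detectSpeech on the first half of the audio signal. Specify two output arguments to return the indices corresponding to the regions of detected speech and the thresholds used for the decision.

[speechIndices,thresholds] = detectSpeech(firstHalf,fs);

Call detectSpeech on the second half with no output arguments to plot the regions of detected speech. Specify the thresholds determined from the previous call to detectSpeech.

detectSpeech(secondHalf,fs,'Thresholds',thresholds)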
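
When indices are requested for a segment rather than plotted, they are relative to the vector that was passed in, so they need an offset before they can be used on the full recording. The following is a minimal sketch of that bookkeeping in base MATLAB. The names `segStart`, `idxSegment`, and `idxFull` and all numeric values are illustrative placeholders, not output from detectSpeech.

```matlab
% Hypothetical full-signal length and the sample where the segment begins,
% that is, segment = signal(segStart:end).
numSamples = 1000;
segStart   = 501;

% Placeholder for indices returned for the segment (relative to the segment).
idxSegment = [10 120; 300 450];

% Sample 1 of the segment is sample segStart of the full signal.
idxFull = idxSegment + (segStart - 1);

% Guard against indices that fall outside the full signal.
assert(all(idxFull(:) >= 1 & idxFull(:) <= numSamples))
disp(idxFull)
```

The same offset applies to any segment, as long as `segStart` is the index of the first sample that was passed to the function.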

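Work with Large Data Sets

Reusing speech detection thresholds provides significant computational efficiency when you work with large data sets, or when you deploy a deep learning or machine learning pipeline for real-time inference. Download and extract the data set [1].

url = 'https://storage.googleapis.com/download.tensorflow.org/data/speech_commands_v0.01.tar.gz';
downloadFolder = tempdir;
datasetFolder = fullfile(downloadFolder,'google_speech');
if ~exist(datasetFolder,'dir')
    disp('Downloading data set (1.9 GB) ...')
    untar(url,datasetFolder)
end

Create an audio datastore that points to the recordings. Use the folder names as labels.

ads = audioDatastore(datasetFolder,"IncludeSubfolders",true,"LabelSource","foldernames");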

Reduce the data set by 95% for the purposes of this example.

ads = splitEachLabel(ads,0.05,'Exclude','_background_noise');

Create two datastores: one for training and one for testing.

[adsTrain,adsTest] = splitEachLabel(ads,0.8);

Compute the average thresholds over the training data set.

thresholds = zeros(numel(adsTrain.Files),2);
for ii = 1:numel(adsTrain.Files)
    [audioIn,adsInfo] = read(adsTrain);
    [~,thresholds(ii,:)] = detectSpeech(audioIn,adsInfo.SampleRate);
end
thresholdAverage = mean(thresholds,1); % one [energy, spectral spread] pair

Use the precomputed thresholds to detect speech regions on files from the test data set. Plot the detected region for three files.

[audioIn,adsInfo] = read(adsTest);
detectSpeech(audioIn,adsInfo.SampleRate,'Thresholds',thresholdAverage);

[audioIn,adsInfo] = read(adsTest);
detectSpeech(audioIn,adsInfo.SampleRate,'Thresholds',thresholdAverage);

[audioIn,adsInfo] = read(adsTest);
detectSpeech(audioIn,adsInfo.SampleRate,'Thresholds',thresholdAverage);

References

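[1] Warden, Pete. "Speech Commands: A Public Dataset for Single-Word Speech Recognition." Distributed by TensorFlow. Creative Commons Attribution 4.0 License.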

Remove Silent Regions from Speech Signal

Read in an audio file and listen to it. Plot the spectrogram.

[audioIn,fs] = audioread('Counting-16-44p1-Mono-15secs.wav');
sound(audioIn,fs)
spectrogram(audioIn,hann(1024,'periodic'),512,1024,fs,'yaxis')

[Figure: spectrogram of the audio signal.]

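For machine learning applications, you often want to extract features from an audio signal. Call the spectralEntropy function on the audio signal, then plot the histogram to display the distribution of spectral entropy.

entropy = spectralEntropy(audioIn,fs);
numBins = 40;
histogram(entropy,numBins,'Normalization','probability')
title('Spectral Entropy of Audio Signal')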

[Figure: histogram titled "Spectral Entropy of Audio Signal".]

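Depending on your application, you might want to extract spectral entropy from only the regions of speech. The resulting statistics are more characteristic of the speaker and less characteristic of the channel. Call detectSpeech on the audio signal and then create a new signal that contains only the regions of detected speech.

speechIndices = detectSpeech(audioIn,fs);
speechSignal = [];
for ii = 1:size(speechIndices,1)
    % Append the samples of the ii-th detected region
    speechSignal = [speechSignal;audioIn(speechIndices(ii,1):speechIndices(ii,2))];
end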
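
For long recordings with many regions, growing a vector inside a loop can become slow. One possible alternative, sketched here in base MATLAB, marks the speech samples in a logical mask and indexes once. The stand-in signal `demoSignal` and the boundaries in `demoIndices` are made-up values for illustration. The sketch assumes the regions are sorted and do not overlap, in which case the result matches the concatenation approach.

```matlab
% Stand-in signal and placeholder region boundaries (illustrative values only).
demoSignal  = (1:20)';
demoIndices = [3 6; 12 15];

% Mark every sample that falls inside a region.
speechMask = false(size(demoSignal));
for ii = 1:size(demoIndices,1)
    speechMask(demoIndices(ii,1):demoIndices(ii,2)) = true;
end

% A single indexing operation keeps only the marked samples, in order.
speechOnly = demoSignal(speechMask);
disp(speechOnly')
```

The mask is also convenient when the non-speech samples are needed, because `demoSignal(~speechMask)` returns them directly.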

Listen to the speech signal and plot the spectrogram.

sound(speechSignal,fs)
spectrogram(speechSignal,hann(1024,'periodic'),512,1024,fs,'yaxis')

[Figure: spectrogram of the speech signal.]

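Call the spectralEntropy function on the speech signal, then plot the histogram to display the distribution of spectral entropy.

entropy = spectralEntropy(speechSignal,fs);
histogram(entropy,numBins,'Normalization','probability')
title('Spectral Entropy of Speech Signal')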

[Figure: histogram titled "Spectral Entropy of Speech Signal".]

Input Arguments

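`audioIn` - Audio input
column vector

Audio input, specified as a column vector.

Data Types: single | double

`fs` - Sample rate (Hz)
scalar

Sample rate in Hz, specified as a scalar.

Data Types: single | double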

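Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: detectSpeech(audioIn,fs,'MergeDistance',100)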

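`Window` - Window applied in time domain
`hann(round(0.03*fs),'periodic')` (default) | vector

Window applied in the time domain, specified as the comma-separated pair consisting of 'Window' and a real vector. The number of elements in the vector must be in the range [2, size(audioIn,1)]. The number of elements in the vector must also be greater than OverlapLength.

Data Types: single | double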

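`OverlapLength` - Number of samples overlapping between adjacent windows
`0` (default) | scalar in the range [0, `numel(Window)`−1]

Number of samples overlapping between adjacent windows, specified as the comma-separated pair consisting of 'OverlapLength' and an integer in the range [0, size(Window,1)).

Data Types: single | double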

`MergeDistance` - Number of samples over which to merge positive speech detection decisions
`numel(Window)`*5 (default) | nonnegative scalar

Number of samples over which to merge positive speech detection decisions, specified as the comma-separated pair consisting of 'MergeDistance' and a nonnegative scalar.

Note

The resolution for speech detection is given by the hop length, where the hop length is equal to numel(Window) − OverlapLength.
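
As a worked example of the hop-length relationship, the sketch below computes the detection resolution for a 74 ms window with 35% overlap. The 8 kHz sample rate is an assumed value chosen for illustration, and the variable names are not part of the detectSpeech interface.

```matlab
fs = 8000;                                          % assumed sample rate (Hz)
numWindowSamples = round(0.074*fs);                 % 74 ms window
overlapLength    = round(numWindowSamples*35/100);  % 35% overlap

% Hop length is the window length minus the overlap.
hopLength    = numWindowSamples - overlapLength;
resolutionMs = 1000*hopLength/fs;                   % resolution in milliseconds

fprintf('Hop length: %d samples (%.1f ms)\n',hopLength,resolutionMs)
```

Under these assumptions the window is 592 samples, the overlap is 207 samples, and the hop length is 385 samples, so region boundaries can only be resolved in steps of roughly 48 ms. Increasing OverlapLength shortens the hop and refines the resolution at the cost of more frames to process.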

Data Types: single | double

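`Thresholds` - Thresholds for decision
two-element vector

Thresholds for decision, specified as the comma-separated pair consisting of 'Thresholds' and a two-element vector.

If you do not specify Thresholds, the detectSpeech function derives thresholds by using histograms of the features calculated over the current input frame.
If you specify Thresholds, the detectSpeech function skips the derivation of new decision thresholds. Reusing speech decision thresholds provides significant computational efficiency when you work with large data sets, or when you deploy a deep learning or machine learning pipeline for real-time inference.

Data Types: single | double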

Output Arguments

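`idx` - Start and end indices of speech regions
N-by-2 matrix

Start and end indices of speech regions, returned as an N-by-2 matrix. N corresponds to the number of individual speech regions detected. The first column corresponds to the start indices of speech regions and the second column corresponds to the end indices of speech regions.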
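
Because the indices are sample numbers, converting them to times only requires the sample rate. The following sketch shows one way to do this in base MATLAB. The matrix `idx` holds placeholder values rather than real output, the 16 kHz sample rate is assumed, and the convention that the first sample sits at t = 0 is a choice made for this illustration.

```matlab
fs  = 16000;                       % assumed sample rate (Hz)
idx = [1601 8000; 12001 20000];    % placeholder boundaries, not real output

% Sample n occurs at time (n-1)/fs when the first sample is at t = 0.
startTimes = (idx(:,1) - 1)/fs;
endTimes   = (idx(:,2) - 1)/fs;

% Each region includes both endpoints, hence the +1.
durations  = (idx(:,2) - idx(:,1) + 1)/fs;

regionTable = table(startTimes,endTimes,durations);
disp(regionTable)
```

A table keeps the start time, end time, and duration of each region on one row, which is convenient for filtering out regions that are too short to be useful.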

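Data Types: single | double

`thresholds` - Thresholds used for decision
two-element vector

Thresholds used for decision, returned as a two-element vector. The thresholds are in the order [Energy Threshold, Spectral Spread Threshold].

Data Types: single | double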

Algorithms

The detectSpeech algorithm is based on [1], although modified so that the statistics to threshold are short-term energy and spectral spread, instead of short-term energy and spectral centroid. The diagram and steps provide a high-level overview of the algorithm. For details, see [1].

[Diagram: sequence of stages in the algorithm.]

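The audio signal is converted to a time-frequency representation using the specified Window and OverlapLength.
The short-term energy and spectral spread are calculated for each frame. The spectral spread is calculated according to spectralSpread.
Histograms are created for both the short-term energy and the spectral spread distributions.
For each histogram, a threshold is determined according to $t = \frac{w \times m_{1} + m_{2}}{w + 1}$, where m₁ and m₂ are the first and second local maxima, respectively. w is set to 5.
Both the spectral spread and the short-term energy are smoothed across time by passing through successive five-element moving median filters.
Masks are created by comparing the short-term energy and spectral spread with their respective thresholds. To declare a frame as containing speech, a feature must be above its threshold.
The masks are combined. For a frame to be declared as speech, both the short-term energy and the spectral spread must be above their respective thresholds.
Regions declared as speech are merged if the distance between them is less than MergeDistance.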
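
The thresholding, smoothing, masking, and merging stages can be sketched in base MATLAB on synthetic per-frame features. This is an illustration of the steps as described, not the detectSpeech implementation. Several details are assumptions: the bin count, reading m₁ and m₂ as the bin centers of the first two histogram peaks, using two median passes for "successive" filters, the frame-to-sample mapping, and measuring the distance between regions from the end of one to the start of the next. All feature values are made up.

```matlab
% Synthetic per-frame features (illustrative values, not measured data).
numFrames = 60;
isSpeech  = false(numFrames,1);
isSpeech([11:20 24:30 45:55]) = true;
energy = 1 + 7*isSpeech;            % low in silence, high in speech
spread = 100 + 400*isSpeech;
energy(35) = 8;                     % isolated one-frame spike
spread(35) = 500;
features = [energy spread];

% Histogram-based thresholds: t = (w*m1 + m2)/(w + 1).
w = 5;
numBins = 10;                       % assumed bin count
thresholds = zeros(1,2);
for k = 1:2
    x = features(:,k);
    edges   = linspace(min(x),max(x),numBins + 1);
    counts  = histcounts(x,edges);
    centers = (edges(1:end-1) + edges(2:end))/2;
    % Zero-pad so that peaks in the first or last bin are also found.
    isPeak  = islocalmax([0 counts 0]);
    peaks   = centers(isPeak(2:end-1));
    assert(numel(peaks) >= 2,'Need at least two histogram peaks.')
    thresholds(k) = (w*peaks(1) + peaks(2))/(w + 1);
end

% Smooth each feature with two passes of a five-element moving median.
smoothed = movmedian(movmedian(features,5,1),5,1);

% A frame counts as speech only if both features exceed their thresholds.
mask = smoothed(:,1) > thresholds(1) & smoothed(:,2) > thresholds(2);

% Convert runs of speech frames to sample indices (assumed frame layout).
winLen = 480; hop = 240; mergeDistance = 5*winLen;
d = diff([0; mask; 0]);
startFrames = find(d == 1);
endFrames   = find(d == -1) - 1;
regions = [(startFrames - 1)*hop + 1, (endFrames - 1)*hop + winLen];

% Merge neighboring regions separated by fewer than mergeDistance samples.
idx = zeros(0,2);
for k = 1:size(regions,1)
    if ~isempty(idx) && regions(k,1) - idx(end,2) < mergeDistance
        idx(end,2) = regions(k,2);
    else
        idx(end+1,:) = regions(k,:);
    end
end
disp(idx)
```

In this synthetic case the median filter removes the one-frame spike at frame 35, the first two runs of speech frames lie 481 samples apart and are merged, and the third run is more than 2400 samples away and stays separate.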

References

[1] Giannakopoulos, Theodoros. "A Method for Silence Removal and Segmentation of Speech Signals, Implemented in MATLAB." (University of Athens, Athens, 2009).

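Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).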

Version History

Introduced in R2020a

See Also

spectralSpread | voiceActivityDetector

Topics

Keyword Spotting in Noise Using MFCC and LSTM Networks
