
Detect Anomalies in Signals Using deepSignalAnomalyDetector

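This example shows how to detect anomalies in signals using deepSignalAnomalyDetector. The deepSignalAnomalyDetector object implements autoencoder architectures that can be trained using semi-supervised or unsupervised learning. The detector can find anomalous points or regions, or identify whole signals as anomalous. The object also provides several convenience functions that you can use to visualize and analyze results.

Anomalies are data points that deviate from the overall pattern of an entire data set. Detecting anomalies in time-series data has broad applications in domains such as manufacturing, predictive maintenance, and human health monitoring. In many scenarios, manually labeling an entire data set to train a model to detect anomalies is unrealistic, especially when the relevant data has many more normal samples than abnormal ones. In these scenarios, anomaly detection based on semi-supervised or unsupervised learning is a more viable solution.

deepSignalAnomalyDetector provides two types of autoencoder architectures. Autoencoders are deep neural networks trained to replicate the input data at their output such that the reconstruction error is as small as possible. The data used to train an autoencoder can consist entirely of normal samples or can include a small percentage of samples with anomalies. The data does not need to be labeled. After you train the autoencoder, it can reconstruct test data, compute the reconstruction error of each sample, and declare as anomalies the samples whose reconstruction error exceeds a specified threshold.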
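The reconstruct-then-threshold idea can be illustrated without any toolbox. The following is a minimal sketch, not the detector's implementation: it assumes a linear projection onto two principal directions as a stand-in for a trained autoencoder, synthetic sinusoids as "normal" data, and an arbitrary 50% margin on the threshold. All variable names are hypothetical.

```matlab
rng(0)                                        % reproducible toy data
t = linspace(0,1,100);                        % 100 time steps per signal

% Training set: 200 "normal" signals (one per row), noisy sinusoids with random phase
phase = 2*pi*rand(200,1);
Xtrain = sin(2*pi*3*t + phase) + 0.05*randn(200,100);

% Linear stand-in for an autoencoder (assumption): encode onto the top two
% principal directions of the training data, then decode back to signal space
mu = mean(Xtrain,1);
[~,~,V] = svd(Xtrain - mu,"econ");
W = V(:,1:2);
reconstruct = @(X) (X - mu)*W*W' + mu;

% Reconstruction error (RMSE) of each signal
rmse = @(X) sqrt(mean((X - reconstruct(X)).^2,2));

% Threshold: largest training error plus an arbitrary 50% margin
threshold = 1.5*max(rmse(Xtrain));

% Test signals: one normal, one with an injected offset in samples 40 to 45
xNormal = sin(2*pi*3*t + 1) + 0.05*randn(1,100);
xAnomalous = xNormal;
xAnomalous(40:45) = xAnomalous(40:45) + 3;

isAnomaly = rmse([xNormal; xAnomalous]) > threshold
```

The normal signal lies close to the subspace learned from the training data, so its error stays near the noise level, while the injected offset cannot be represented and produces a much larger error.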

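Scenario 1: Detect Abnormal Heartbeat Sequences

This section uses a deepSignalAnomalyDetector object to detect abnormal heartbeat sequences in data from the BIDMC Congestive Heart Failure Database [1]. The heartbeat collection has 5405 electrocardiogram (ECG) sequences of variable length, each sampled at 250 Hz, and contains three classes of heartbeats:

  • N - Normal

  • r - R-on-T premature ventricular contraction

  • V - Premature ventricular contraction

The data is labeled, but this example uses the labels only for testing and performance evaluation. The autoencoder training process is entirely unsupervised.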

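Load Data

Download the heartbeat data from https://ssd.mathworks.com/supportfiles/SPT/data/PhysionetBIDMC.zip using the downloadSupportFile function. The whole data set is approximately 2 MB. The variable ecgSignals contains the signals and ecgLabels contains the labels.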

datasetZipFile = matlab.internal.examples.downloadSupportFile("SPT","data/PhysionetBIDMC.zip");
datasetFolder = fullfile(fileparts(datasetZipFile),"PhysionetBDMC");
if ~exist(datasetFolder,"dir")
    unzip(datasetZipFile,datasetFolder);
end
ds1 = load(fullfile(datasetFolder,"chf07.mat"));
ecgSignals1 = ds1.ecgSignals

ecgSignals1 = 5405×1 cell array
    {146×1 double}
    {140×1 double}
    {139×1 double}
    {143×1 double}
    {143×1 double}
    {145×1 double}
    {147×1 double}
    {139×1 double}
    {143×1 double}
    {139×1 double}
    {146×1 double}
    {143×1 double}
    {144×1 double}
    {142×1 double}
    {142×1 double}
    {140×1 double}
    ⋮

ecgLabels1 = ds1.ecgLabels;
cnt = countlabels(ecgLabels1)

cnt = 3×3 table
    Label    Count    Percent
    _____    _____    _______
      N      5288      97.835
      r       111      2.0537
      V         6     0.11101

Visualize typical waveforms corresponding to each of the three heartbeat classes.

helperPlotECG(ecgSignals1,ecgLabels1)

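Obtain the indices that correspond to each class and split the data set into a training set and a test set. In the training set, include 60% of the samples and keep the natural distribution of anomalies. Exclude the class V samples from the training set but include them in the test set. Including these samples in the test set shows whether the autoencoder can detect anomaly types it has not seen before.

idxN = find(strcmp(ecgLabels1,"N"));
idxR = find(strcmp(ecgLabels1,"r"));
idxV = find(strcmp(ecgLabels1,"V"));
idx = splitlabels(ecgLabels1,0.6,Exclude="V");
idxTrain = [idx{1}];
idxTest = [idx{2};idxV];

countlabels(ecgLabels1(idxTrain))

ans = 2×3 table
    Label    Count    Percent
    _____    _____    _______
      N      3173     97.932
      r        67     2.0679

countlabels(ecgLabels1(idxTest))

ans = 3×3 table
    Label    Count    Percent
    _____    _____    _______
      N      2115      97.691
      r        44      2.0323
      V         6     0.27714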

Create and Train Detector

Create a deepSignalAnomalyDetector object with a long short-term memory (LSTM) model. Set WindowLength to "fullSignal" so that the detector classifies each complete signal segment as normal or abnormal.

DLSTM1 = deepSignalAnomalyDetector(1,"lstm",WindowLength="fullSignal")

DLSTM1 =
  deepSignalAnomalyDetectorLSTM with properties:

                IsTrained: 0
              NumChannels: 1

   Model Information
                ModelType: 'lstm'
       EncoderHiddenUnits: [32 16]
       DecoderHiddenUnits: [16 32]

   Threshold Information
                Threshold: []
          ThresholdMethod: 'contaminationFraction'
       ThresholdParameter: 0.0100

   Window Information
             WindowLength: 'fullSignal'
    WindowLossAggregation: 'mean'

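Train the detector using the adaptive moment estimation (Adam) optimizer, which is one of the most popular solvers for deep learning training. The maximum number of epochs often needs to be adjusted depending on the data set size and the training process. Because the number of samples is relatively large, set MaxEpochs to 100.

opts = trainingOptions("adam",MaxEpochs=100,MiniBatchSize=500);
trainDetector(DLSTM1,ecgSignals1(idxTrain),opts);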
Training on single GPU.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |     RMSE     |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |                |         0.46 |          0.1 |          0.0010 |
|       9 |          50 |       00:00:14 |         0.21 |      2.2e-02 |          0.0010 |
|      17 |         100 |       00:00:29 |         0.20 |      2.0e-02 |          0.0010 |
|      25 |         150 |       00:00:45 |         0.18 |      1.6e-02 |          0.0010 |
|      34 |         200 |       00:01:00 |         0.17 |      1.4e-02 |          0.0010 |
|      42 |         250 |       00:01:15 |         0.17 |      1.4e-02 |          0.0010 |
|      50 |         300 |       00:01:29 |         0.17 |      1.4e-02 |          0.0010 |
|      59 |         350 |       00:01:44 |         0.16 |      1.3e-02 |          0.0010 |
|      67 |         400 |       00:01:59 |         0.16 |      1.2e-02 |          0.0010 |
|      75 |         450 |       00:02:13 |         0.14 |      1.0e-02 |          0.0010 |
|      84 |         500 |       00:02:27 |         0.10 |      4.6e-03 |          0.0010 |
|      92 |         550 |       00:02:42 |         0.09 |      3.9e-03 |          0.0010 |
|     100 |         600 |       00:02:56 |         0.10 |      4.6e-03 |          0.0010 |
|========================================================================================|
Training finished: Max epochs completed.
Computing threshold...
Threshold computation completed.

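Adjust Threshold

By default, a deepSignalAnomalyDetector object computes the threshold assuming that 1% of the data in the training set is abnormal. This assumption does not always hold, so you often need to adjust the threshold, either by changing the automatic threshold method or by setting the threshold manually.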
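To make the 1% assumption concrete, the sketch below picks a threshold so that a chosen fraction of training losses lies above it. It is only an illustration with made-up loss values and hypothetical variable names, and it is not necessarily how the object computes its threshold internally.

```matlab
rng(1)                                             % reproducible toy data
% Hypothetical per-signal training losses: 990 small values and 10 large ones
losses = [0.01*rand(990,1); 0.05 + 0.05*rand(10,1)];

contaminationFraction = 0.01;                      % assume 1% of signals are anomalous
numFlagged = round(contaminationFraction*numel(losses));

% Place the threshold midway between the last flagged and first unflagged loss
sortedLosses = sort(losses,"descend");
threshold = mean(sortedLosses(numFlagged:numFlagged+1));

fractionFlagged = mean(losses > threshold)
```

If the true share of anomalies differs from the assumed fraction, a threshold chosen this way either flags normal signals or misses anomalies, which is why inspecting the losses before fixing the threshold is worthwhile.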

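Use the plotLoss function to visualize the losses of the training set together with the current threshold. Each stem corresponds to the reconstruction error of one signal in the training data set.

figure
plotLoss(DLSTM1,ecgSignals1(idxTrain))
ylim([0 0.1])

Based on the plotLoss output, set the threshold manually so that the few sporadic losses that exceed it are likely to be anomalies.

updateDetector(DLSTM1,ThresholdMethod="manual",Threshold=0.02)

To validate the choice of threshold, plot the distribution of the reconstruction errors for normal and abnormal data using plotLossDistribution. The histogram to the left of the threshold corresponds to the distribution of normal data. The histogram to the right of the threshold corresponds to the distribution of abnormal data. The chosen threshold successfully separates the normal and abnormal groups.

% ecgSignals1(idxN) contains normal signals
% ecgSignals1([idxR;idxV]) contains abnormal signals
plotLossDistribution(DLSTM1,ecgSignals1(idxN),ecgSignals1([idxR;idxV]))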

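Detect Anomalies and Evaluate Performance

Choose one sample from each class in the test set and plot the reconstructed signal using plotAnomalies. Red lines represent signals that the detector classifies as anomalous. A good sign that the detector has been trained successfully is that it can adequately reconstruct normal signals but cannot adequately reconstruct abnormal signals.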

figure(Position=[0 0 500 300])
idxNTest = intersect(idxN,idxTest); % Class N signals in the test set
plotAnomalies(DLSTM1,ecgSignals1(idxNTest(1)),PlotReconstruction=true)

figure(Position=[0 0 500 300])
idxVTest = intersect(idxV,idxTest); % Class V signals in the test set
plotAnomalies(DLSTM1,ecgSignals1(idxVTest(1)),PlotReconstruction=true)

figure(Position=[0 0 500 300])
idxRTest = intersect(idxR,idxTest); % Class r signals in the test set
plotAnomalies(DLSTM1,ecgSignals1(idxRTest(1)),PlotReconstruction=true)

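Use the detect object function of the detector to detect anomalies in the training set and the test set and to compute the reconstruction losses.

[labelsTrainPred1,lossTrainPred1] = detect(DLSTM1,ecgSignals1(idxTrain));
[labelsTestPred1,lossTestPred1] = detect(DLSTM1,ecgSignals1(idxTest));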

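There are two different anomaly detection tasks.

  • Detecting anomalies contained in the training set, also known as outlier detection

  • Detecting anomalies in observations outside the training set, also known as novelty detection

Analyze the performance of the trained autoencoder in both tasks.

You can use a receiver operating characteristic (ROC) curve to evaluate the accuracy of the detector over a range of decision thresholds. The area under the ROC curve (AUC) measures the overall performance. The closer the AUC is to 1, the stronger the detection ability of the detector. Compute the AUC using the rocmetrics (Deep Learning Toolbox) function. The AUC is close to 1 for outlier detection and slightly smaller, but still very good, for novelty detection.

figure("Position",[0 0 600 300])
tiledlayout(1,2,TileSpacing="compact")
nexttile
rocc = rocmetrics(ecgLabels1(idxTrain)~="N",cell2mat(lossTrainPred1),true);
plot(rocc,ShowModelOperatingPoint=false)
title(["Training Set ROC Curve","(Outlier Detection)"])
nexttile
rocc = rocmetrics(ecgLabels1(idxTest)~="N",cell2mat(lossTestPred1),true);
plot(rocc,ShowModelOperatingPoint=false)
title(["Test Set ROC Curve","(Novelty Detection)"])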
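As a rough illustration of what the AUC measures, it can be read as the probability that a randomly chosen anomalous signal receives a higher loss than a randomly chosen normal one. The sketch below computes that pairwise fraction directly on made-up values. It is a hand computation under those assumptions, not a replacement for rocmetrics, and the variable names are hypothetical.

```matlab
% Hypothetical reconstruction losses and ground truth (true = anomalous)
loss      = [0.10 0.40 0.35 0.80 0.70 0.20]';
isAnomaly = logical([0 0 1 1 1 0]');

lossAnomalous = loss(isAnomaly);      % column vector
lossNormal    = loss(~isAnomaly)';    % row vector, so comparisons form all pairs

% AUC = fraction of (anomalous, normal) pairs ranked correctly; ties count as half
auc = mean(lossAnomalous > lossNormal,"all") + ...
      0.5*mean(lossAnomalous == lossNormal,"all")
```

Here eight of the nine anomalous-normal pairs are ordered correctly, so the AUC is 8/9, or about 0.89. Because the measure depends only on the ordering of the losses, it is independent of any particular threshold.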

Compute the detection accuracy for the previously specified threshold.

figure("Position",[0 0 1000 300])
tiledlayout(1,2,TileSpacing="compact")
nexttile
cm = confusionchart(ecgLabels1(idxTrain)~="N",cell2mat(labelsTrainPred1));
cm.RowSummary = "row-normalized";
title("Training Set Accuracy (Outlier Detection)")
nexttile
cm = confusionchart(ecgLabels1(idxTest)~="N",cell2mat(labelsTestPred1));
cm.RowSummary = "row-normalized";
title("Test Set Accuracy (Novelty Detection)")
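The row-normalized summary in a confusion chart reports, for each true class, the fraction of signals assigned to each predicted class. A small sketch of the same arithmetic on made-up labels (hypothetical variable names, no toolbox functions) might look like this:

```matlab
% Hypothetical ground truth and detector decisions (true = anomalous)
trueAnomaly = logical([0 0 0 0 1 1 0 1]');
predAnomaly = logical([0 0 1 0 1 0 0 1]');

% counts(i,j): number of signals with true class i and predicted class j,
% where index 1 is normal and index 2 is anomalous
counts = accumarray([trueAnomaly predAnomaly] + 1,1,[2 2]);

% Divide each row by its total to get per-class detection rates
rowNormalized = counts./sum(counts,2)
```

With heavily imbalanced classes, as in the heartbeat data, the per-row rates are more informative than overall accuracy, because a detector that labels everything as normal would still score a high overall accuracy.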

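Scenario 2: Detect Anomalous Points in Continuous Long Time Series

The previous section showed how to detect anomalies in a data set that contains multiple signal segments and how to determine whether each segment is abnormal. In this section, the data set is a single signal. The goal is to detect anomalies in the signal and the times at which they occur.

Use a deepSignalAnomalyDetector on a long ECG recording to detect anomalies caused by ventricular tachycardia. The data is from the Sudden Cardiac Death Holter Database [2]. The ECG signal has a sample rate of 250 Hz.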

Download and Prepare Data

Download the data from https://ssd.mathworks.com/supportfiles/SPT/data/PhysionetSDDB.zip using the downloadSupportFile function. The data set contains two timetables. The timetable X contains the ECG signal. The timetable Y contains labels that indicate whether each sample of the ECG signal is normal. As in the previous section, you use the labels only to verify the accuracy of the detector.

datasetZipFile = matlab.internal.examples.downloadSupportFile("SPT","data/PhysionetSDDB.zip");
datasetFolder = fullfile(fileparts(datasetZipFile),"PhysionetSDDB");
if ~exist(datasetFolder,"dir")
    unzip(datasetZipFile,datasetFolder);
end
ds2 = load(fullfile(datasetFolder,"sddb49.mat"));
ecgSignals2 = ds2.X;
ecgLabels2 = ds2.y;

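Normalize and visualize the signal. Overlay the anomaly locations. In this case, anomaly detection is challenging because, as often occurs in ECG recordings, the signal has baseline drift. The changes in the baseline level can easily be misclassified as anomalies.

A common approach to choosing training data is to use a segment of the signal that evidently contains no anomalies. In many situations, the beginning of a recording is normal, as it is in this ECG signal. Choose the first 200 seconds of the recording to train the model with purely normal data. Use the rest of the recording to test the performance of the anomaly detector. The training data contains segments with baseline drift, so ideally the detector learns this pattern, adapts to it, and treats it as normal.

dataProcessed = normalize(ecgSignals2);
figure
plot(dataProcessed.Time,dataProcessed.Variables)
hold on
plot(dataProcessed(ecgLabels2.anomaly,:).Time,dataProcessed.Variables(ecgLabels2.anomaly,:),".")
hold off
xlabel("Time (s)")
ylabel("Normalized ECG Amplitude")
title("sddb49 ECG Signal")
legend(["Signal" "Anomaly"])

Split the data set into a training set and a test set.

fs = 250;
idxTrain2 = 1:200*fs;
idxTest2 = idxTrain2(end)+1:height(dataProcessed);
dataProcessedTrain = dataProcessed(idxTrain2,:);
labelsTrainTrue = ecgLabels2(idxTrain2,:);
dataProcessedTest = dataProcessed(idxTest2,:);
labelsTestTrue = ecgLabels2(idxTest2,:);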

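Create and Train Detector

Create a deepSignalAnomalyDetector with a convolutional autoencoder model.

The training set contains only normal data. It is therefore reasonable to use the maximum reconstruction error as the threshold for declaring a signal segment abnormal. Set the ThresholdMethod property to "max". Because baseline drift adds complexity to the signal, use a larger network than the default. To detect anomalies at each sample of the signal, keep the window length at its default value of 1 sample.

DCONV2 = deepSignalAnomalyDetector(1,"conv",FilterSize=32,NumFilters=16,NumDownsampleLayers=4,ThresholdMethod="max")

DCONV2 =
  deepSignalAnomalyDetectorCNN with properties:

                IsTrained: 0
              NumChannels: 1

   Model Information
                ModelType: 'conv'
               FilterSize: 32
               NumFilters: 16
      NumDownsampleLayers: 4
         DownsampleFactor: 2
       DropoutProbability: 0.2000

   Threshold Information
                Threshold: []
          ThresholdMethod: 'max'
       ThresholdParameter: 1

   Window Information
             WindowLength: 1
            OverlapLength: 'auto'
    WindowLossAggregation: 'mean'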
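The WindowLength and WindowLossAggregation properties shown in the display control how per-sample errors are turned into the loss values that are compared with the threshold. The following sketch illustrates the idea of mean aggregation on made-up per-sample errors. It assumes non-overlapping windows and hypothetical variable names, and it is not the object's internal code.

```matlab
% Hypothetical per-sample reconstruction errors of one 8-sample signal
sampleError = [0.1 0.2 0.1 0.9 1.1 0.2 0.1 0.1]';

% Window length 1: every sample gets its own loss
lossPerSample = sampleError;

% Window length 4, no overlap: average the errors inside each window
windowLength = 4;
lossPerWindow = mean(reshape(sampleError,windowLength,[]),1)'

% Full signal: one loss value for the entire signal
lossFullSignal = mean(sampleError)
```

Shorter windows localize an anomaly more precisely in time, while longer windows average out isolated fluctuations. A window length of 1 therefore suits the goal of this section, which is to locate the anomalous samples.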

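To ensure complete training of the larger network, set the maximum number of epochs to 500. To plot the training progress instead of displaying a table, set the Plots training option to "training-progress" and Verbose to false.

opts = trainingOptions("adam",MaxEpochs=500,Plots="training-progress",Verbose=false);
trainDetector(DCONV2,dataProcessedTrain,opts)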

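Detect Anomalies and Evaluate Performance

Plot the reconstruction error of the test signal and compare it with the ground truth labels. There is a distinct high-loss peak that corresponds to the location of the anomaly. The loss also contains multiple smaller fluctuations.

figure
tiledlayout(2,1)
nexttile
plotLoss(DCONV2,dataProcessed(idxTest2,:));
nexttile
stem(ecgLabels2{idxTest2,:},".")
grid on
yticks([0 1])
yticklabels({"Normal","Abnormal"})
title("Ground Truth Labels")
xlabel("Window Index")

View the signal reconstruction for a region of the test set with abnormal heartbeats and for a region of the test set with baseline drift. The reconstructed signal follows the baseline well and deviates from the original signal only at the anomalous points.

plotAnomalies(DCONV2,dataProcessed(250*fs:300*fs,:),PlotReconstruction=true)
title("Test Region with Abnormal Heartbeats")
grid on

plotAnomalies(DCONV2,dataProcessed(210*fs:250*fs,:),PlotReconstruction=true)
title("Test Region with Baseline Drift")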

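Scenario 3: Detect Anomalous Regions in Multichannel Signals

In some scenarios, the data contains multiple signals that come from different measurements. These signals can include, for example, the acceleration, temperature, and rotational speed of a motor. You can train a deepSignalAnomalyDetector object with multivariate signals and detect anomalies in these multi-measurement observations.

Load and Prepare Data

Load the waveform data set WaveformData. The observations are arrays of size numChannels-by-numTimeSteps, where numChannels is the number of channels and numTimeSteps is the number of time steps in the sequence. Transpose the arrays so that the rows correspond to time steps and the columns to channels. Display the first few cells of the data.

load WaveformData
data = cellfun(@(x)x',data,UniformOutput=false);
head(data)

    {103×3 double}
    {136×3 double}
    {140×3 double}
    {124×3 double}
    {127×3 double}
    {200×3 double}
    {141×3 double}
    {151×3 double}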

Visualize the first few sequences in a plot.

numChannels = size(data{1},2);
tiledlayout(2,2)
for ii = 1:4
    nexttile
    stackedplot(data{ii},DisplayLabels="Channel "+(1:numChannels));
    title("Observation "+ii)
    xlabel("Time Step")
end

Partition the data into training and test partitions. Use 90% of the data for training and 10% for testing.

numObservations = numel(data);
rng default
[idxTrain3,~,idxTest3] = dividerand(numObservations,0.9,0,0.1);
signalTrain3 = data(idxTrain3);
signalTest3 = data(idxTest3);

Create and Train Detector

Create a default anomaly detector and specify the number of channels as 3.

DCONV3 = deepSignalAnomalyDetector(3);
trainDetector(DCONV3,signalTrain3)
Training on single GPU.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |     RMSE     |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |                |         1.92 |          1.9 |          0.0010 |
|       8 |          50 |       00:00:03 |         1.09 |          0.6 |          0.0010 |
|      15 |         100 |       00:00:06 |         1.03 |          0.5 |          0.0010 |
|      22 |         150 |       00:00:10 |         0.97 |          0.5 |          0.0010 |
|      29 |         200 |       00:00:14 |         0.91 |          0.4 |          0.0010 |
|      30 |         210 |       00:00:15 |         0.90 |          0.4 |          0.0010 |
|========================================================================================|
Training finished: Max epochs completed.
Computing threshold...
Threshold computation completed.

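Detect Anomalies and Evaluate Performance

To test the detector, randomly select 50 of the test sequences and add artificial anomalies to them.

signalTest3New = signalTest3;
numAnomalousSequences = 50;
rng default
idx = randperm(numel(signalTest3),numAnomalousSequences);

In each selected sequence, pick one channel at random and replace samples 40 through 60 (a region of roughly 20 samples) with five times the absolute value of their amplitude.

for ii = 1:numAnomalousSequences
    X = signalTest3New{idx(ii)};
    idxPatch = 40:60;
    nch = randi(3);
    OldRegion = X(idxPatch,nch);
    newRegion = 5*abs(OldRegion);
    X(idxPatch,nch) = newRegion;
    signalTest3New{idx(ii)} = X;
end

Use the anomaly detector to find the anomalous regions. Visualize the results for two of the signals. The detector declares an anomaly in a signal when any of its channels shows abnormal behavior.

figure
plotAnomalies(DCONV3,signalTest3New{idx(2)})

figure
plotAnomalies(DCONV3,signalTest3New{idx(20)})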

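Conclusion

This example shows how to use a deepSignalAnomalyDetector object, trained without labels, to detect point, region, or whole-observation anomalies in signal segments, long signals, and multivariate signals.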

References

[1] Baim, Donald S., Wilson S. Colucci, E. Scott Monrad, Harton S. Smith, Richard F. Wright, Alyce Lanoue, Diane F. Gauthier, Bernard J. Ransil, William Grossman, and Eugene Braunwald. "Survival of Patients with Severe Congestive Heart Failure Treated with Oral Milrinone." Journal of the American College of Cardiology 7, no. 3 (March 1986): 661-70. https://doi.org/10.1016/S0735-1097(86)80478-8.

[2] Greenwald, Scott David. "Development and Analysis of a Ventricular Fibrillation Detector." (M.S. thesis, MIT Dept. of Electrical Engineering and Computer Science, 1986).

[3] Goldberger, Ary L., Luis A. N. Amaral, Leon Glass, Jeffrey M. Hausdorff, Plamen Ch. Ivanov, Roger G. Mark, Joseph E. Mietus, George B. Moody, Chung-Kang Peng, and H. Eugene Stanley. "PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals." Circulation 101, no. 23 (June 13, 2000). https://doi.org/10.1161/01.CIR.101.23.e215

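Supporting Functions

function helperPlotECG(ecgData,ecgLabels)
% Plot one example waveform for each of the three heartbeat classes
figure(Position=[0 0 900 250])
tiledlayout(1,3,TileSpacing="compact");
classes = {"N","r","V"};
for i = 1:length(classes)
    x = ecgData(ecgLabels==classes{i});
    nexttile
    plot(x{4})
    xticks([0 70 length(x{4})])
    axis tight
    title("Signal with class "+classes{i})
end
end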
