Data Smoothing and Outlier Detection

Data smoothing refers to techniques for eliminating unwanted noise or behaviors in data, while outlier detection identifies data points that are significantly different from the rest of the data.

Moving Window Methods

Moving window methods process data in smaller batches at a time, typically in order to statistically represent a neighborhood of points in the data. The moving average is a common data smoothing technique that slides a window along the data, computing the mean of the points inside each window. This can help to eliminate insignificant variations from one data point to the next.
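To make the sliding-window idea concrete, the following is a minimal sketch of a centered moving mean written as an explicit loop. The sample vector `x`, the odd window length `k`, and the helper names `half`, `lo`, and `hi` are illustrative choices rather than part of the original example. The window is truncated at the ends of the vector, which is assumed to mirror the default endpoint handling of movmean.

```matlab
% Illustrative data and an odd window length (both are assumptions).
x = [4 8 6 -1 -2 -3 -1 3 4 5];
k = 3;
half = (k-1)/2;          % number of neighbors on each side of the center
n = numel(x);
y = zeros(1,n);
for i = 1:n
    lo = max(1,i-half);  % clip the window at the start of the vector
    hi = min(n,i+half);  % clip the window at the end of the vector
    y(i) = mean(x(lo:hi));
end
% Largest difference from the built-in function; expected to be
% at the level of floating-point round-off.
maxDiff = max(abs(y - movmean(x,k)));
```

In practice the built-in movmean is the better choice; the loop only spells out what each window contributes.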

For example, consider wind speed measurements taken every minute for about 3 hours. Use the movmean function with a window size of 5 minutes to smooth out high-speed wind gusts.

load windData.mat
mins = 1:length(speed);
window = 5;
meanspeed = movmean(speed,window);
plot(mins,speed,mins,meanspeed)
axis tight
legend('Measured Wind Speed','Average Wind Speed over 5 min Window','location','best')
xlabel('Time')
ylabel('Speed')

Figure: plot of Measured Wind Speed and Average Wind Speed over 5 min Window.

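Similarly, you can compute the median wind speed over a sliding window using the movmedian function.

medianspeed = movmedian(speed,window);
plot(mins,speed,mins,medianspeed)
axis tight
legend('Measured Wind Speed','Median Wind Speed over 5 min Window','location','best')
xlabel('Time')
ylabel('Speed')

Figure: plot of Measured Wind Speed and Median Wind Speed over 5 min Window.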

Not all data is suitable for smoothing with a moving window method. For example, create a sinusoidal signal with injected random noise.

t = 1:0.2:15;
A = sin(2*pi*t) + cos(2*pi*0.5*t);
Anoise = A + 0.5*rand(1,length(t));
plot(t,A,t,Anoise)
axis tight
legend('Original Data','Noisy Data','location','best')

Figure: plot of Original Data and Noisy Data.

Use a moving mean with a window size of 3 to smooth the noisy data.

window = 3;
Amean = movmean(Anoise,window);
plot(t,A,t,Amean)
axis tight
legend('Original Data','Moving Mean - Window Size 3')

Figure: plot of Original Data and Moving Mean - Window Size 3.

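The moving mean captures the general shape of the data, but it does not capture the valleys (local minima) very accurately. Because each valley point is surrounded by two larger neighbors in its window, the mean is not a good approximation to those points. If you make the window size larger, the mean eliminates the shorter peaks altogether. For this type of data, you might consider alternative smoothing techniques.

Amean = movmean(Anoise,5);
plot(t,A,t,Amean)
axis tight
legend('Original Data','Moving Mean - Window Size 5','location','best')

Figure: plot of Original Data and Moving Mean - Window Size 5.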
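The valley effect can be checked by hand on a tiny vector. The sketch below uses a made-up symmetric valley `v` (an illustrative assumption, not data from the example) and compares the moving mean at the minimum for window sizes 3 and 5.

```matlab
% Hypothetical valley: the true minimum is 1 at index 3.
v = [5 3 1 3 5];
m3 = movmean(v,3);   % at index 3: mean([3 1 3]) = 7/3
m5 = movmean(v,5);   % at index 3: mean([5 3 1 3 5]) = 17/5
% Both averages sit above the true minimum, and the wider
% window overshoots it by more.
overshoot3 = m3(3) - v(3);
overshoot5 = m5(3) - v(3);
```

The wider the window, the more of the surrounding larger values are averaged in, so the valley is flattened further.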

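Common Smoothing Methods

The smoothdata function provides several smoothing options, such as the Savitzky-Golay method, which is a popular smoothing technique used in signal processing. By default, smoothdata chooses a best-guess window size for the method depending on the data.

Use the Savitzky-Golay method to smooth the noisy signal Anoise, and output the window size that it uses. This method provides a better valley approximation than movmean.

[Asgolay,window] = smoothdata(Anoise,'sgolay');
plot(t,A,t,Asgolay)
axis tight
legend('Original Data','Savitzky-Golay','location','best')

Figure: plot of Original Data and Savitzky-Golay.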

window
window = 3
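As a rough check of why a local polynomial fit tracks curvature better than a plain average, the sketch below smooths noise-free quadratic data with both methods. It assumes the documented default polynomial degree of 2 for the 'sgolay' method; the vector `x`, the fixed window length of 5, and the restriction to interior points are illustrative choices.

```matlab
% Noise-free quadratic data (illustrative, not from the example above).
x = (1:10).^2;
ySG = smoothdata(x,'sgolay',5);   % local quadratic fit, window of 5 samples
yMM = movmean(x,5);               % plain moving mean, same window
interior = 3:8;                   % points whose window is not truncated
% A quadratic fit should reproduce quadratic data up to round-off.
errSG = max(abs(ySG(interior) - x(interior)));
% The moving mean of i^2 over i-2..i+2 equals i^2 + 2, a constant bias.
errMM = max(abs(yMM(interior) - x(interior)));
```

The constant bias of the moving mean on curved data is the same effect that flattens the valleys in the noisy sinusoid.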

The robust Lowess method is another smoothing method that is particularly helpful when outliers are present in the data in addition to noise. Inject an outlier into the noisy data, and use robust Lowess to smooth the data, which eliminates the outlier.

Anoise(36) = 20;
Arlowess = smoothdata(Anoise,'rlowess',5);
plot(t,Anoise,t,Arlowess)
axis tight
legend('Noisy Data','Robust Lowess')

Figure: plot of Noisy Data and Robust Lowess.

Detecting Outliers

Outliers in data can significantly skew data processing results and other computed quantities. For example, if you try to smooth data containing outliers with a moving median, you can get misleading peaks or valleys.

Amedian = smoothdata(Anoise,'movmedian');
plot(t,Anoise,t,Amedian)
axis tight
legend('Noisy Data','Moving Median')

Figure: plot of Noisy Data and Moving Median.

The isoutlier function returns a logical 1 when an outlier is detected. Verify the index and value of the outlier in Anoise.

TF = isoutlier(Anoise);
ind = find(TF)
ind = 36
Aoutlier = Anoise(ind)
Aoutlier = 20
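For readers who want to see what the detection rule looks like, the sketch below reproduces the documented default of isoutlier by hand: a point is flagged when it lies more than three scaled median absolute deviations (MAD) from the median. The sample vector `x` and the variable names are illustrative assumptions, not part of the example above.

```matlab
% Illustrative data with two obvious outliers (100 and 300).
x = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];
% Scaling constant that makes the MAD comparable to a standard
% deviation for normally distributed data (about 1.4826).
c = -1/(sqrt(2)*erfcinv(3/2));
scaledMAD = c*median(abs(x - median(x)));
% Flag points more than three scaled MAD away from the median.
TFmanual = abs(x - median(x)) > 3*scaledMAD;
indManual = find(TFmanual);
% Compare with the built-in default.
sameAsBuiltin = isequal(TFmanual,isoutlier(x));
```

Because the rule is built on medians, a single extreme value such as the injected 20 in Anoise does not distort the threshold used to detect it.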

You can use the filloutliers function to replace outliers in your data by specifying a fill method. For example, fill the outlier in Anoise with the value of its neighbor immediately to the right.

Afill = filloutliers(Anoise,'next');
plot(t,Anoise,t,Afill)
axis tight
legend('Noisy Data with Outlier','Noisy Data with Filled Outlier')

Figure: plot of Noisy Data with Outlier and Noisy Data with Filled Outlier.
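Other fill methods follow the same pattern. The sketch below, on a short made-up vector `x` (an illustrative assumption), compares three of the documented fill methods of filloutliers: 'next', 'previous', and 'linear'.

```matlab
% Hypothetical data: the value 100 at index 4 is the only outlier
% under the default detection rule.
x = [1 2 3 100 5 6 7];
xNext   = filloutliers(x,'next');      % replace with the next non-outlier value
xPrev   = filloutliers(x,'previous');  % replace with the previous non-outlier value
xLinear = filloutliers(x,'linear');    % interpolate linearly between neighbors
```

Which method suits a data set depends on what the replaced value should represent; 'linear' tends to be a reasonable choice for smoothly varying signals, while 'next' and 'previous' keep only values that were actually measured.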

Nonuniform Data

Not all data consists of equally spaced points, which can affect methods for data processing. Create a datetime vector that contains irregular sampling times for the data in Airreg. The time vector represents samples taken every minute for the first 30 minutes, then hourly over two days.

t0 = datetime(2014,1,1,1,1,1);
timeminutes = sort(t0 + minutes(1:30));
timehours = t0 + hours(1:48);
time = [timeminutes timehours];
Airreg = rand(1,length(time));
plot(time,Airreg)
axis tight

Figure: plot of the irregularly sampled data Airreg against time.

By default, smoothdata smooths with respect to equally spaced integers, in this case, 1,2,...,78. Since integer time stamps do not match the sampling of the points in Airreg, the first half hour of data still appears noisy after smoothing.

% Moving mean over a window of 3 samples, using the default integer sample points
Adefault = smoothdata(Airreg,'movmean',3);
plot(time,Airreg,time,Adefault)
axis tight
legend('Original Data','Smoothed Data with Default Sample Points')

Figure: plot of Original Data and Smoothed Data with Default Sample Points.

Many data processing functions in MATLAB®, including smoothdata, movmean, and filloutliers, allow you to provide sample points, ensuring that data is processed relative to its sampling units and frequencies. To remove the high-frequency variation in the first half hour of data in Airreg, use the 'SamplePoints' option with the time stamps in time.

% Moving mean over a 3-hour window, measured in the units of the time vector
Asamplepoints = smoothdata(Airreg,'movmean',...
    hours(3),'SamplePoints',time);
plot(time,Airreg,time,Asamplepoints)
axis tight
legend('Original Data','Smoothed Data with Sample Points')

Figure: plot of Original Data and Smoothed Data with Sample Points.
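The effect of sample points is easier to see on a handful of values. In the sketch below, the sample locations `ts` and data `x` are made up for illustration: three closely spaced samples are followed by a gap and two more samples. With 'SamplePoints', a window of length 3 is measured in the units of `ts` rather than in number of elements.

```matlab
% Hypothetical nonuniform sample locations and data.
ts = [1 2 3 10 11];
x  = [1 2 3 10 20];
% Window of 3 elements: neighbors are chosen by position in the vector,
% so the value at ts = 3 is averaged with the distant value at ts = 10.
yIndex = movmean(x,3);
% Window of 3 units of ts: only samples within 1.5 units of each
% location are averaged, so the two clusters stay separate.
ySample = movmean(x,3,'SamplePoints',ts);
```

Here `yIndex(3)` mixes values from both clusters, while `ySample(3)` uses only the samples near `ts = 3`, which is the same reason the first half hour of Airreg is smoothed properly once the time stamps are supplied.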
