Data Smoothing and Outlier Detection
Data smoothing refers to techniques for eliminating unwanted noise or behaviors in data, while outlier detection identifies data points that are significantly different from the rest of the data.
Moving Window Methods
Moving window methods process data in smaller batches at a time, typically to statistically represent a neighborhood of points in the data. The moving average is a common data smoothing technique that slides a window along the data, computing the mean of the points inside each window. This can help to eliminate insignificant variations from one data point to the next.
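The window mechanics can be sketched by hand. The following is a minimal illustration, assuming an odd window length and made-up sample values (the names x, k, manual, and M are hypothetical and not part of the wind example); it compares a hand-written loop against movmean, which shortens the window at the endpoints.

```matlab
% Hypothetical sample data, not taken from the wind data set.
x = [4 8 6 -1 -2 -3 -1 3 4 5];
k = 3;                      % window length (odd, so the window is centered)
half = (k - 1)/2;

% Manual moving mean: average each point with its neighbors.
% The window is truncated where it would run past either end.
manual = zeros(size(x));
for i = 1:numel(x)
    lo = max(1, i - half);
    hi = min(numel(x), i + half);
    manual(i) = mean(x(lo:hi));
end

M = movmean(x, k);          % same computation using the built-in function
disp(max(abs(manual - M)))  % largest difference between the two results
```

The loop is for illustration only; movmean is the practical choice for real data.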
For example, consider wind speed measurements taken every minute for about 3 hours. Use the movmean function with a window size of 5 minutes to smooth out high-speed wind gusts.
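load windData.mat
mins = 1:length(speed);
window = 5;
meanspeed = movmean(speed,window);   % 5-point moving mean (one sample per minute)
plot(mins,speed,mins,meanspeed)
axis tight
legend('Measured Wind Speed','Average Wind Speed over 5 min Window', ...
    'location','best')
xlabel('Time')
ylabel('Speed')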
Similarly, you can compute the median wind speed over a sliding window using the movmedian function.
medianspeed = movmedian(speed,window);   % 5-point moving median
plot(mins,speed,mins,medianspeed)
axis tight
legend('Measured Wind Speed','Median Wind Speed over 5 min Window', ...
    'location','best')
xlabel('Time')
ylabel('Speed')
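To see how the two statistics treat a short gust differently, here is a small sketch on made-up readings (the vector gusty and the variables gMean and gMedian are hypothetical, not part of the wind data). A moving mean spreads a one-sample spike across its neighbors, while a moving median of the same width rejects it.

```matlab
% Hypothetical readings: steady wind with a single one-sample gust.
gusty = [5 5 5 20 5 5 5];

gMean   = movmean(gusty, 3);     % the gust is averaged into three samples
gMedian = movmedian(gusty, 3);   % the isolated gust is rejected

disp([gMean; gMedian])
```

A gust lasting several samples would survive a median of this width, so the window length still needs to match the duration of the features to be removed.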
Not all data is suitable for smoothing with a moving window method. For example, create a sinusoidal signal with injected random noise.
t = 1:0.2:15;
A = sin(2*pi*t) + cos(2*pi*0.5*t);
Anoise = A + 0.5*rand(1,length(t));   % add uniform random noise
plot(t,A,t,Anoise)
axis tight
legend('Original Data','Noisy Data','location','best')
Use a moving mean with a window size of 3 to smooth the noisy data.
window = 3;
Amean = movmean(Anoise,window);
plot(t,A,t,Amean)
axis tight
legend('Original Data','Moving Mean - Window Size 3')
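The moving mean achieves the general shape of the data, but does not capture the valleys (local minima) very accurately. Because each valley point is surrounded by two larger neighbors in its window, the mean is not a good approximation to that point. If you make the window size larger, the mean eliminates the shorter peaks altogether. For this type of data, you might consider alternative smoothing techniques.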
Amean = movmean(Anoise,5);
plot(t,A,t,Amean)
axis tight
legend('Original Data','Moving Mean - Window Size 5', ...
    'location','best')
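The effect on a valley can be checked with simple arithmetic. This sketch uses a made-up five-point vector (v, m3, and m5 are hypothetical names) whose minimum sits between larger neighbors, and shows how the moving mean raises the minimum as the window grows.

```matlab
% Hypothetical data with a valley (local minimum) at index 3.
v = [5 3 1 3 5];

m3 = movmean(v, 3);   % valley becomes (3 + 1 + 3)/3
m5 = movmean(v, 5);   % valley becomes (5 + 3 + 1 + 3 + 5)/5

% True valley value, then the window-3 and window-5 estimates.
disp([v(3) m3(3) m5(3)])
```

The true minimum is 1, while the two moving means report roughly 2.33 and 3.4 at the same location.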
Common Smoothing Methods
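The smoothdata function provides several smoothing options such as the Savitzky-Golay method, which is a popular smoothing technique used in signal processing. By default, smoothdata chooses a best-guess window size for the method depending on the data.
Use the Savitzky-Golay method to smooth the noisy signal Anoise, and output the window size that it uses. This method provides a better valley approximation compared to movmean.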
[Asgolay,window] = smoothdata(Anoise,'sgolay');   % second output is the window size used
plot(t,A,t,Asgolay)
axis tight
legend('Original Data','Savitzky-Golay','location','best')
window
window = 3
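One way to see why a Savitzky-Golay filter tends to preserve valleys is that it fits a low-degree polynomial inside each window rather than averaging. The sketch below is an illustration on made-up, noise-free data (x, y, ySgolay, and yMean are hypothetical names); it sets the 'Degree' option to 2 explicitly so that the fit is a local quadratic.

```matlab
% Hypothetical noise-free parabola with its valley at x = 6.
x = 1:11;
y = (x - 6).^2;

ySgolay = smoothdata(y, 'sgolay', 5, 'Degree', 2);  % local quadratic fit
yMean   = movmean(y, 5);                            % local average

% True valley value, then the Savitzky-Golay and moving-mean estimates.
disp([y(6) ySgolay(6) yMean(6)])
```

Because the data here is exactly quadratic, the local fit reproduces the minimum, while the five-point moving mean lifts it to 2. With noisy data the match is only approximate.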
The robust Lowess method is another smoothing method that is particularly helpful when outliers are present in the data in addition to noise. Inject an outlier into the noisy data, and use robust Lowess to smooth the data, which eliminates the outlier.
Anoise(36) = 20;                             % inject a single outlier
Arlowess = smoothdata(Anoise,'rlowess',5);
plot(t,Anoise,t,Arlowess)
axis tight
legend('Noisy Data','Robust Lowess')
Detecting Outliers
Outliers in data can significantly skew data processing results and other computed quantities. For example, if you try to smooth data containing outliers with a moving median, you can get misleading peaks or valleys.
Amedian = smoothdata(Anoise,'movmedian');
plot(t,Anoise,t,Amedian)
axis tight
legend('Noisy Data','Moving Median')
The isoutlier function returns a logical 1 when an outlier is detected. Verify the index and value of the outlier in Anoise.
TF = isoutlier(Anoise);
ind = find(TF)
ind = 36
Aoutlier = Anoise(ind)
Aoutlier = 20
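The text above does not say how isoutlier decides what counts as an outlier. With no method specified, it flags elements more than three scaled median absolute deviations (MAD) from the median. The following sketch reproduces that rule by hand on made-up data (x, c, scaledMAD, manualTF, and autoTF are hypothetical names) and compares it with the built-in result.

```matlab
% Hypothetical data with two obvious outliers (100 and 300).
x = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

c = -1/(sqrt(2)*erfcinv(3/2));               % scaling factor, about 1.4826
scaledMAD = c*median(abs(x - median(x)));    % scaled median absolute deviation
manualTF = abs(x - median(x)) > 3*scaledMAD; % hand-written default rule

autoTF = isoutlier(x);                       % built-in detection, default method
disp(find(autoTF))
```

For this vector, both approaches flag elements 4 and 9. Other detection methods, such as 'mean' or 'quartiles', use different thresholds and may flag different points.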
You can use the filloutliers function to replace outliers in your data by specifying a fill method. For example, fill the outlier in Anoise with the value of its neighbor immediately to the right.
Afill = filloutliers(Anoise,'next');   % replace each outlier with the next non-outlier value
plot(t,Anoise,t,Afill)
axis tight
legend('Noisy Data with Outlier','Noisy Data with Filled Outlier')
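The choice of fill method changes the replacement values. As a small sketch on made-up data (x, xNext, and xLinear are hypothetical names), compare the 'next' method with linear interpolation between the neighboring non-outlier values.

```matlab
% Hypothetical data with outliers at indices 4 and 9.
x = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

xNext   = filloutliers(x, 'next');     % copy the next non-outlier value
xLinear = filloutliers(x, 'linear');   % interpolate between non-outlier neighbors

% Filled values at the two outlier positions, one row per method.
disp([xNext(4) xNext(9); xLinear(4) xLinear(9)])
```

Here 'next' fills with 59 and 61, while 'linear' fills both positions with 59.5, the midpoint of each pair of neighbors.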
Nonuniform Data
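Not all data consists of equally spaced points, which can affect methods for data processing. Create a datetime vector that contains irregular sampling times for the data in Airreg. The time vector represents samples taken every minute for the first 30 minutes, then hourly over two days.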
t0 = datetime(2014,1,1,1,1,1);
timeminutes = sort(t0 + minutes(1:30));   % one sample per minute for 30 minutes
timehours = t0 + hours(1:48);             % then one sample per hour for two days
time = [timeminutes timehours];
Airreg = rand(1,length(time));
plot(time,Airreg)
axis tight
By default, smoothdata smooths with respect to equally spaced integers, in this case, 1,2,...,78. Since integer time stamps do not coordinate with the sampling of the points in Airreg, the first half hour of data still appears noisy after smoothing.
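Adefault = smoothdata(Airreg,'movmean',3);   % window of 3 elements, sample times ignored
plot(time,Airreg,time,Adefault)
axis tight
legend('Original Data','Smoothed Data with Default Sample Points')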
Many data processing functions in MATLAB®, including smoothdata, movmean, and filloutliers, allow you to provide sample points, ensuring that data is processed relative to its sampling units and frequencies. To remove the high-frequency variation in the first half hour of data in Airreg, use the 'SamplePoints' option with the time stamps in time.
Asamplepoints = smoothdata(Airreg,'movmean', ...
    hours(3),'SamplePoints',time);           % window of 3 hours, measured in time
plot(time,Airreg,time,Asamplepoints)
axis tight
legend('Original Data','Smoothed Data with Sample Points')
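The difference between index-based and sample-point-based windows is easier to see on a tiny numeric example. This sketch uses made-up values and plain numeric sample points instead of datetime values (ts, xs, byIndex, and byTime are hypothetical names).

```matlab
% Hypothetical irregular samples: three close together, then two far apart.
ts = [1 2 3 10 20];      % sample locations
xs = [1 2 3 100 200];    % sample values

byIndex = movmean(xs, 3);                      % window spans 3 neighboring elements
byTime  = movmean(xs, 3, 'SamplePoints', ts);  % window spans 3 units of ts

disp([byIndex; byTime])
```

Without sample points, the third value is averaged with the distant samples and jumps to 35. With sample points, only values within the window around ts = 3 contribute, and the isolated samples at 10 and 20 are left unchanged.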
See Also
smoothdata | isoutlier | filloutliers | movmean | movmedian