Visualization of Tall Arrays
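Visualizing large data sets requires that the data be summarized, binned, or sampled in some way to reduce the number of points plotted on the screen. In some cases, functions such as histogram and pie bin the data to reduce its size, while other functions such as plot and scatter use a more complex approach that avoids plotting duplicate pixels on the screen. For problems where the pixel overlap is relevant to the analysis, the binscatter function also offers an efficient way to visualize density patterns.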
Visualizing tall arrays does not require the use of gather. MATLAB® immediately evaluates and displays visualizations of tall arrays. Currently, you can visualize tall arrays using the functions and methods in this table.
Function | Required Toolboxes | Notes |
---|---|---|
plot | — | plot, scatter, and binscatter plot in iterations, progressively adding to the plot as more data is read. During the updates, a progress indicator shows the proportion of data that has been plotted. Zooming and panning are supported during the updating process, before the plot is complete. To stop the update process, press the pause button in the progress indicator. |
scatter | — | |
binscatter | — | |
histogram | — | |
histogram2 | — | |
pie | — | For visualizing categorical data only. |
binScatterPlot (Statistics and Machine Learning Toolbox) | Statistics and Machine Learning Toolbox™ | Figure contains a slider to control the brightness and color detail in the image. The slider adjusts the value of the |
ksdensity (Statistics and Machine Learning Toolbox) | Statistics and Machine Learning Toolbox | Produces a probability density estimate for the data, evaluated at 100 points for univariate data, or 900 points for bivariate data. |
datasample (Statistics and Machine Learning Toolbox) | Statistics and Machine Learning Toolbox | |
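As a small illustration of the point that no gather call is needed, the following sketch builds a tall array from in-memory random data and passes it straight to histogram. The variable names and the random data are assumptions made for this illustration only; they are not part of the airline example that follows.

```matlab
% Minimal sketch (illustrative data, not from the airline example).
x = randn(1e4,1);                      % ordinary in-memory column vector
tx = tall(x);                          % tall array; most operations on it are deferred
h = histogram(tx,'BinLimits',[-4 4]);  % evaluated and drawn immediately, without gather
title('Histogram of a tall array')
```

By contrast, a computation such as mean(tx) stays unevaluated until you call gather on it; the plotting functions in the table trigger the evaluation themselves.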
Tall Array Plotting Examples
This example shows several different ways you can visualize tall arrays.
Create a datastore for the airlinesmall.csv data set, which contains rows of airline flight data. Select a subset of the table variables to work with and remove rows that contain missing values.
ds = tabularTextDatastore('airlinesmall.csv','TreatAsMissing','NA');
ds.SelectedVariableNames = {'Year','Month','ArrDelay','DepDelay','Origin','Dest'};
T = tall(ds);
T = rmmissing(T)
T =
  Mx6 tall table
    Year    Month    ArrDelay    DepDelay    Origin     Dest
    ____    _____    ________    ________    _______    _______
    1987     10          8          12       {'LAX'}    {'SJC'}
    1987     10          8           1       {'SJC'}    {'BUR'}
    1987     10         21          20       {'SAN'}    {'SMF'}
    1987     10         13          12       {'BUR'}    {'SJC'}
    1987     10          4          -1       {'SMF'}    {'LAX'}
    1987     10         59          63       {'LAX'}    {'SJC'}
    1987     10          3          -2       {'SAN'}    {'SFO'}
    1987     10         11          -1       {'SEA'}    {'LAX'}
     :        :         :           :           :          :
     :        :         :           :           :          :
Pie Chart of Flights by Month
Convert the numeric Month variable into a categorical variable that reflects the name of the month. Then plot a pie chart showing how many flights are in the data for each month of the year.
T.Month = categorical(T.Month,1:12,{'Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'})
T =
  Mx6 tall table
    Year    Month    ArrDelay    DepDelay    Origin     Dest
    ____    _____    ________    ________    _______    _______
    1987     Oct         8          12       {'LAX'}    {'SJC'}
    1987     Oct         8           1       {'SJC'}    {'BUR'}
    1987     Oct        21          20       {'SAN'}    {'SMF'}
    1987     Oct        13          12       {'BUR'}    {'SJC'}
    1987     Oct         4          -1       {'SMF'}    {'LAX'}
    1987     Oct        59          63       {'LAX'}    {'SJC'}
    1987     Oct         3          -2       {'SAN'}    {'SFO'}
    1987     Oct        11          -1       {'SEA'}    {'LAX'}
     :        :         :           :           :          :
     :        :         :           :           :          :
pie(T.Month)
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 2: Completed in 1.3 sec
- Pass 2 of 2: Completed in 1 sec
Evaluation completed in 3.1 sec
Histogram of Delays
Plot a histogram of the arrival delays for each flight in the data. Since the data has a long tail, limit the plotting area using the BinLimits name-value pair.
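histogram(T.ArrDelay,'BinLimits',[-50 150])
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 2: Completed in 2.3 sec
- Pass 2 of 2: Completed in 0.86 sec
Evaluation completed in 3.9 sec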
Scatter Plot of Delays
Plot a scatter plot of arrival and departure delays. You can expect a strong correlation between these variables since flights that leave late are also likely to arrive late.
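When operating on tall arrays, the plot, scatter, and binscatter functions plot the data in iterations, progressively adding to the plot as more data is read. During the updates, a progress indicator at the top of the plot shows how much data has been plotted. Zooming and panning are supported during the updating process, before the plot is complete.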
scatter(T.ArrDelay,T.DepDelay)
xlabel('Arrival Delay')
ylabel('Departure Delay')
xlim([-140 1000])
ylim([-140 1000])
The progress bar also includes a Pause/Resume button. Use the button to stop the plot updates early once enough data is displayed.
Fit Trend Line
Use the polyfit and polyval functions to overlay a linear trend line on the plot of arrival and departure delays.
hold on
p = polyfit(T.ArrDelay,T.DepDelay,1);
x = sort(T.ArrDelay,1);
yp = polyval(p,x);
plot(x,yp,'r-')
hold off
Visualize Density
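Scatter plots are helpful up to a point, but it can be hard to decipher information from the plot when the points overlap extensively. In that case, it helps to visualize the density of points in the plot to spot trends.
Use the binscatter function to visualize the density of points in the plot of arrival and departure delays.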
binscatter(T.ArrDelay,T.DepDelay,'XLimits',[-100 1000],'YLimits',[-100 1000])
xlim([-100 1000])
ylim([-100 1000])
xlabel('Arrival Delay')
ylabel('Departure Delay')
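Adjust the CLim property of the axes so that all bin values greater than 150 are colored the same. This prevents a few bins with very large values from dominating the plot.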
ax = gca;
ax.CLim = [0 150];
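When points overlap heavily, another option listed in the table at the top of this page is to plot a random subsample drawn with datasample. The sketch below is a hedged illustration rather than part of the original example: it assumes Statistics and Machine Learning Toolbox is installed, uses synthetic stand-in data instead of the airline table, and picks a sample size of 500 arbitrarily. It also assumes that sampling a tall array must be done without replacement, so 'Replace' is set to false explicitly.

```matlab
% Hedged sketch: requires Statistics and Machine Learning Toolbox.
% The data and the sample size are illustrative assumptions.
rng default                                % make the sample repeatable
X = randn(1e5,2);                          % synthetic two-column data
X(:,2) = 2*X(:,1) + X(:,2);                % give the columns a linear relationship
tX = tall(X);                              % tall array standing in for a large data set
s = datasample(tX,500,'Replace',false);    % random subsample of 500 rows
s = gather(s);                             % the sample is small enough for memory
scatter(s(:,1),s(:,2),'.')
xlabel('Column 1')
ylabel('Column 2')
```

Because the gathered sample is an ordinary in-memory matrix, it can also be passed to plotting or fitting functions that do not accept tall arrays directly.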