Compare Multiple Distribution Fits

This example shows how to fit multiple probability distribution objects to the same set of sample data, and obtain a visual comparison of how well each distribution fits the data.

Step 1. Load sample data.

Load the sample data.

load carsmall

This data contains miles per gallon (MPG) measurements for different makes and models of cars, grouped by country of origin (Origin), model year (Model_Year), and other vehicle characteristics.

Step 2. Create a categorical array.

Transform Origin into a categorical array and remove the Italian car from the sample data. Because there is only one Italian car, fitdist cannot fit a distribution to that group using anything other than a kernel distribution.

Origin = categorical(cellstr(Origin));
MPG2 = MPG(Origin~='Italy');
Origin2 = Origin(Origin~='Italy');
Origin2 = removecats(Origin2,'Italy');
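The claim that only one Italian car is present can be checked before filtering. The following is a minimal sketch, assuming the carsmall data set that ships with Statistics and Machine Learning Toolbox; the variable names cats and counts are illustrative and not part of the original example.

```matlab
% Sketch: count the observations in each country-of-origin group.
load carsmall
Origin = categorical(cellstr(Origin));

cats = categories(Origin);    % category names, as a cell column
counts = countcats(Origin);   % number of cars in each category

% Display the counts side by side with the country names.
table(cats,counts,'VariableNames',{'Country','NumCars'})
```

A group with a single observation gives a parametric fit nothing to estimate spread from, which is why that group is dropped before the parametric fits.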

Step 3. Fit multiple distributions by group.

Use fitdist to fit Weibull, normal, logistic, and kernel distributions to each country of origin group in the MPG data.

[WeiByOrig,Country] = fitdist(MPG2,'weibull','by',Origin2);
[NormByOrig,Country] = fitdist(MPG2,'normal','by',Origin2);
[LogByOrig,Country] = fitdist(MPG2,'logistic','by',Origin2);
[KerByOrig,Country] = fitdist(MPG2,'kernel','by',Origin2);
WeiByOrig
WeiByOrig=1×5 cell array
  Columns 1 through 2
    {1x1 prob.WeibullDistribution}    {1x1 prob.WeibullDistribution}
  Columns 3 through 4
    {1x1 prob.WeibullDistribution}    {1x1 prob.WeibullDistribution}
  Column 5
    {1x1 prob.WeibullDistribution}
Country
Country = 5x1 cell
    {'France' }
    {'Germany'}
    {'Japan'  }
    {'Sweden' }
    {'USA'    }

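Each country group now has four distribution objects associated with it. For example, the cell array WeiByOrig contains five Weibull distribution objects, one for each country represented in the sample data. Likewise, the cell array NormByOrig contains five normal distribution objects, and so on. Each object contains properties that hold information about the data, distribution, and parameters. The array Country lists the origin of each group in the same order as the distribution objects are stored in the cell arrays.

Step 4. Compute the pdf for each distribution.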

Extract the four probability distribution objects for USA and compute the pdf for each distribution. As shown in Step 3, USA is in position 5 in each cell array.

WeiUSA = WeiByOrig{5};
NormUSA = NormByOrig{5};
LogUSA = LogByOrig{5};
KerUSA = KerByOrig{5};
x = 0:1:50;
pdf_Wei = pdf(WeiUSA,x);
pdf_Norm = pdf(NormUSA,x);
pdf_Log = pdf(LogUSA,x);
pdf_Ker = pdf(KerUSA,x);
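Hard-coding position 5 works for this data set, but the position can also be looked up by name from the group labels that fitdist returns. The following is a minimal sketch under that assumption; the names keep and idxUSA are illustrative and do not appear in the original example.

```matlab
% Sketch: locate the USA group by name instead of by fixed position.
load carsmall
Origin = categorical(cellstr(Origin));
keep = Origin ~= 'Italy';                    % drop the single Italian car
MPG2 = MPG(keep);
Origin2 = removecats(Origin(keep),'Italy');

[WeiByOrig,Country] = fitdist(MPG2,'weibull','by',Origin2);

% Country is a cell array of group labels in the same order as WeiByOrig.
idxUSA = find(strcmp(Country,'USA'));
WeiUSA = WeiByOrig{idxUSA};
```

Looking up the index this way keeps the extraction correct if the set of groups changes, for example after removing or adding a country.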

Step 5. Plot the pdf for each distribution.

Plot the pdf for each distribution fit to the USA data, superimposed on a histogram of the sample data. Normalize the histogram for easier display.

Create a histogram of the USA sample data.

data = MPG2(Origin2=='USA');
figure
histogram(data,10,'Normalization','pdf','FaceColor',[1,0.8,0]);

Plot the pdf of each fitted distribution.

line(x,pdf_Wei,'LineStyle','-','Color','r')
line(x,pdf_Norm,'LineStyle','-.','Color','b')
line(x,pdf_Log,'LineStyle','--','Color','g')
line(x,pdf_Ker,'LineStyle',':','Color','k')
legend('Data','Weibull','Normal','Logistic','Kernel','Location','Best')
title('MPG for Cars from USA')
xlabel('MPG')

Figure contains an axes object. The axes object with title MPG for Cars from USA contains 5 objects of type histogram, line. These objects represent Data, Weibull, Normal, Logistic, Kernel.

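Superimposing the pdf plots over a histogram of the sample data provides a visual comparison of how well each type of distribution fits the data. Only the nonparametric kernel distribution KerUSA comes close to revealing the two modes in the original data.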
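The visual comparison can be complemented with a numeric one. The following is a minimal sketch, not part of the original example, that refits the three parametric distributions to the USA data and compares their negative log-likelihoods and AIC values. The names usaMPG, nll, k, and aic are illustrative, and AIC is computed here by hand as 2k + 2·NLL.

```matlab
% Sketch: compare the parametric fits to the USA data numerically.
load carsmall
Origin = categorical(cellstr(Origin));
usaMPG = MPG(Origin == 'USA');   % fitdist ignores NaN values in the data

pdW = fitdist(usaMPG,'weibull');
pdN = fitdist(usaMPG,'normal');
pdL = fitdist(usaMPG,'logistic');

% Negative log-likelihood of each fitted distribution (lower is better).
nll = [negloglik(pdW); negloglik(pdN); negloglik(pdL)];

% Akaike information criterion: penalize NLL by the parameter count.
k = [pdW.NumParameters; pdN.NumParameters; pdL.NumParameters];
aic = 2*k + 2*nll;

table(nll,aic,'RowNames',{'Weibull','Normal','Logistic'})
```

Because all three candidates have the same number of parameters, the AIC ranking matches the likelihood ranking here. None of these unimodal families can represent two modes, so a low value indicates only the best of the three, not a good fit in absolute terms.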

Step 6. Further group USA data by year.

To investigate the two modes revealed in Step 5, group the MPG data by both country of origin (Origin) and model year (Model_Year), and use fitdist to fit kernel distributions to each group.

[KerByYearOrig,Names] = fitdist(MPG,'Kernel','By',{Origin Model_Year});

Each unique combination of origin and model year now has a kernel distribution object associated with it.

Names
Names = 14x1 cell
    {'France...' }
    {'France...' }
    {'Germany...'}
    {'Germany...'}
    {'Germany...'}
    {'Italy...'  }
    {'Japan...'  }
    {'Japan...'  }
    {'Japan...'  }
    {'Sweden...' }
    {'Sweden...' }
    {'USA...'    }
    {'USA...'    }
    {'USA...'    }

Plot the three probability distributions for each USA model year, which are in positions 12, 13, and 14 in the cell array KerByYearOrig.

figure
hold on
for i = 12 : 14
    plot(x,pdf(KerByYearOrig{i},x))
end
legend('1970','1976','1982')
title('MPG in USA Cars by Model Year')
xlabel('MPG')
hold off

Figure contains an axes object. The axes object with title MPG in USA Cars by Model Year contains 3 objects of type line. These objects represent 1970, 1976, 1982.

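When further grouped by model year, the pdf plots reveal two distinct peaks in the MPG data for cars made in the USA: one for the model year 1970 and one for the model year 1982. This explains why the histogram for the combined USA miles per gallon data shows two peaks instead of one.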
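The shift between model years can also be checked with simple group means. The following is a minimal sketch, assuming the carsmall variables Origin, Model_Year, and MPG; the names isUSA, yr, and meanMPG are illustrative and not part of the original example.

```matlab
% Sketch: mean MPG of USA cars for each model year.
load carsmall
isUSA = strcmp(cellstr(Origin),'USA');   % cellstr strips trailing blanks

% Group the USA observations by model year.
[g,yr] = findgroups(Model_Year(isUSA));
mpgUSA = MPG(isUSA);

% Average within each group, skipping missing MPG values.
meanMPG = splitapply(@(v) mean(v,'omitnan'),mpgUSA,g);

table(yr,meanMPG)
```

Well-separated means for the earliest and latest model years are consistent with the two peaks seen in the combined USA histogram.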
