Linear Mixed-Effects Model Workflow

This example shows how to fit and analyze a linear mixed-effects model (LME).

Load the sample data.

load flu

The flu dataset array has a Date variable and 10 variables containing estimated influenza rates (in 9 different regions, estimated from Google® searches, plus a nationwide estimate from the CDC).

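Reorganize and plot the data.

To fit a linear mixed-effects model, your data must be in a properly formatted dataset array. To fit a linear mixed-effects model with the influenza rates as the responses, combine the nine columns corresponding to the regions into a single array. The new dataset array, flu2, must have the response variable FluRate, the nominal variable Region that shows which region each estimate is from, the nationwide estimate WtdILI, and the grouping variable Date.

flu2 = stack(flu,2:10,'NewDataVarName','FluRate',...
    'IndVarName','Region');
flu2.Date = nominal(flu2.Date);

Define flu2 as a table.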

flu2 = dataset2table(flu2);

Plot flu rates versus the nationwide estimate.

plot(flu2.WtdILI,flu2.FluRate,'ro')
xlabel('WtdILI')
ylabel('Flu Rate')

You can see that the flu rates in the regions have a direct relationship with the nationwide estimate.

Fit an LME model and interpret the results.

Fit a linear mixed-effects model with the nationwide estimate as the predictor variable and a random intercept that varies by Date.

lme = fitlme(flu2,'FluRate ~ 1 + WtdILI + (1|Date)')
lme = 
Linear mixed-effects model fit by ML

Model information:
    Number of observations             468
    Fixed effects coefficients           2
    Random effects coefficients         52
    Covariance parameters                2

Formula:
    FluRate ~ 1 + WtdILI + (1 | Date)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    286.24    302.83    -139.12          278.24  

Fixed effects coefficients (95% CIs):
    Name                   Estimate    SE          tStat     DF     pValue    
    {'(Intercept)'}        0.16385     0.057525    2.8484    466     0.0045885
    {'WtdILI'     }         0.7236     0.032219    22.459    466    3.0502e-76

    Lower       Upper  
    0.050813    0.27689
     0.66028    0.78691

Random effects covariance parameters (95% CIs):
Group: Date (52 Levels)
    Name1                  Name2                  Type           Estimate    Lower      Upper  
    {'(Intercept)'}        {'(Intercept)'}        {'std'}        0.17146     0.13227    0.22226

Group: Error
    Name               Estimate    Lower      Upper  
    {'Res Std'}        0.30201     0.28217    0.32324
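Written out, the formula FluRate ~ 1 + WtdILI + (1 | Date) describes a random-intercept model. The notation below is only a sketch: the indices $i$ (observation) and $m$ (date), and the symbols $\beta_0$, $\beta_1$, $b_{0m}$, $\varepsilon_{im}$, and $\sigma$ are labels introduced here for illustration, not names taken from the fitlme output.

```latex
\begin{aligned}
\text{FluRate}_{im} &= \beta_0 + \beta_1\,\text{WtdILI}_{im} + b_{0m} + \varepsilon_{im},
  \qquad m = 1,\dots,52,\\
b_{0m} &\sim N(0,\sigma_b^2), \qquad \varepsilon_{im} \sim N(0,\sigma^2).
\end{aligned}
```

Under this reading, the 2 fixed-effects coefficients are $\beta_0$ and $\beta_1$, the 52 random-effects coefficients are the per-date intercepts $b_{0m}$, and the 2 covariance parameters are $\sigma_b$ (reported as the Date 'std') and $\sigma$ (reported as 'Res Std').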

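The small $p$-values of 0.0045885 and 3.0502e-76 indicate that both the intercept and the nationwide estimate are significant. Also, the confidence limits for the standard deviation of the random-effects term, $\sigma_{b}$, do not include 0 (0.13227, 0.22226), which indicates that the random-effects term is significant.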
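As a rough check, the fixed-effects intervals are consistent with the usual Wald form, estimate plus or minus a $t$ quantile times the standard error. The critical value 1.965 used below is an approximation to the 0.975 quantile of a $t$ distribution with 466 degrees of freedom; it is an assumption made for this arithmetic, not a value printed by fitlme.

```latex
\hat\beta \pm t_{0.975,\,466}\,\mathrm{SE}(\hat\beta):
\qquad
\begin{aligned}
0.16385 \pm 1.965 \times 0.057525 &\approx (0.0508,\ 0.2769),\\
0.7236 \pm 1.965 \times 0.032219 &\approx (0.6603,\ 0.7869).
\end{aligned}
```

Both results agree with the reported Lower and Upper limits to the displayed precision.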

Plot the raw residuals versus the fitted values.

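figure();
plotResiduals(lme,'fitted')

The variance of the residuals increases with increasing fitted response values, which is known as heteroscedasticity.

Find the two observations on the top right that appear to be outliers.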

find(residuals(lme) > 1.5)
ans = 98 107

Refit the model by removing these observations.

lme = fitlme(flu2,'FluRate ~ 1 + WtdILI + (1|Date)','Exclude',[98,107]);

Improve the model.

Determine if including an independent random term for the nationwide estimate grouped by Date improves the model.

altlme = fitlme(flu2,'FluRate ~ 1 + WtdILI + (1|Date) + (WtdILI-1|Date)',...
    'Exclude',[98,107])
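altlme = 
Linear mixed-effects model fit by ML

Model information:
    Number of observations             466
    Fixed effects coefficients           2
    Random effects coefficients        104
    Covariance parameters                3

Formula:
    FluRate ~ 1 + WtdILI + (1 | Date) + (WtdILI | Date)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    179.39    200.11    -84.694          169.39  

Fixed effects coefficients (95% CIs):
    Name                   Estimate    SE          tStat     DF     pValue    
    {'(Intercept)'}        0.17837     0.054585    3.2676    464      0.001165
    {'WtdILI'     }        0.70836     0.030594    23.153    464    2.1234e-79

    Lower      Upper  
     0.0711    0.28563
    0.64824    0.76849

Random effects covariance parameters (95% CIs):
Group: Date (52 Levels)
    Name1                  Name2                  Type           Estimate    Lower      Upper  
    {'(Intercept)'}        {'(Intercept)'}        {'std'}        0.16631     0.12977    0.21313

Group: Date (52 Levels)
    Name1             Name2             Type           Estimate      Lower    Upper
    {'WtdILI'}        {'WtdILI'}        {'std'}        4.6672e-08    NaN      NaN  

Group: Error
    Name               Estimate    Lower      Upper  
    {'Res Std'}        0.26691     0.24934    0.28572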

The estimated standard deviation of the WtdILI term is nearly 0, and its confidence interval cannot be computed. This is an indication that the model is overparameterized and the (WtdILI-1 | Date) term is not significant. You can formally test this using the compare method as follows: compare(lme,altlme,'CheckNesting',true).

Add a random-effects term for intercept grouped by Region to the initial model lme.

lme2 = fitlme(flu2,'FluRate ~ 1 + WtdILI + (1|Date) + (1|Region)',...
    'Exclude',[98,107]);

Compare the models lme and lme2.

compare(lme,lme2,'CheckNesting',true)
ans = 
    Theoretical Likelihood Ratio Test

    Model    DF    AIC       BIC       LogLik     LRStat    deltaDF    pValue
    lme      4     177.39    193.97    -84.694                               
    lme2     5     62.265    82.986    -26.133    117.12    1          0     

The $p$-value of 0 indicates that lme2 is a better fit than lme.
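The columns of the comparison table can be reproduced by hand from the two log-likelihoods. The sketch below assumes the textbook definitions AIC = -2 logL + 2k, BIC = -2 logL + k log(n), and a chi-square reference distribution with deltaDF degrees of freedom; the variable names are illustrative, and only base MATLAB functions are used.

```matlab
% Values copied from the compare(lme,lme2) output.
logLik = [-84.694; -26.133];   % log-likelihoods of lme and lme2
k      = [4; 5];               % DF column: number of estimated parameters
n      = 466;                  % observations left after excluding rows 98 and 107

% Information criteria under the assumed textbook definitions.
aic = -2*logLik + 2*k;
bic = -2*logLik + k*log(n);

% Likelihood ratio statistic and its chi-square upper-tail probability.
lrStat  = 2*(logLik(2) - logLik(1));
deltaDF = k(2) - k(1);
pValue  = gammainc(lrStat/2, deltaDF/2, 'upper');  % chi-square survival function

fprintf('AIC:    %.3f  %.3f\n', aic);
fprintf('BIC:    %.3f  %.3f\n', bic);
fprintf('LRStat: %.3f, deltaDF: %d, pValue: %.3g\n', lrStat, deltaDF, pValue);
```

The computed AIC, BIC, and LRStat agree with the table to rounding. The p-value computed this way is a very small positive number rather than exactly 0; the table shows it as 0 at the displayed precision.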

Now, check if adding a potentially correlated random-effects term for the intercept and national average improves the model lme2.

lme3 = fitlme(flu2,'FluRate ~ 1 + WtdILI + (1|Date) + (1 + WtdILI|Region)',...
    'Exclude',[98,107])
lme3 = 
Linear mixed-effects model fit by ML

Model information:
    Number of observations             466
    Fixed effects coefficients           2
    Random effects coefficients         70
    Covariance parameters                5

Formula:
    FluRate ~ 1 + WtdILI + (1 | Date) + (1 + WtdILI | Region)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    13.338    42.348    0.33076          -0.66153

Fixed effects coefficients (95% CIs):
    Name                   Estimate    SE          tStat     DF     pValue    
    {'(Intercept)'}         0.1795     0.054953    3.2665    464     0.0011697
    {'WtdILI'     }        0.70719      0.04252    16.632    464    4.6451e-49

    Lower       Upper  
    0.071514    0.28749
     0.62363    0.79074

Random effects covariance parameters (95% CIs):
Group: Date (52 Levels)
    Name1                  Name2                  Type           Estimate    Lower      Upper  
    {'(Intercept)'}        {'(Intercept)'}        {'std'}        0.17634     0.14093    0.22064

Group: Region (9 Levels)
    Name1                  Name2                  Type            Estimate     Lower         Upper     
    {'(Intercept)'}        {'(Intercept)'}        {'std' }        0.0077037    3.1945e-16    1.8578e+11
    {'WtdILI'     }        {'(Intercept)'}        {'corr'}        -0.059603      -0.99996       0.99995
    {'WtdILI'     }        {'WtdILI'     }        {'std' }         0.088069      0.051693       0.15004

Group: Error
    Name               Estimate    Lower      Upper  
    {'Res Std'}        0.20976     0.19568    0.22486

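The estimate for the standard deviation of the random-effects term for intercept grouped by Region is 0.0077037. Its confidence interval is very large and includes zero. This indicates that the random effects for intercept grouped by Region are insignificant. The correlation between the random effects for intercept and WtdILI is -0.059604. Its confidence interval is also very large and includes zero. This is an indication that the correlation is not significant.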
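It can help to see why this model reports three rows for the Region group. The term (1 + WtdILI | Region) gives each region a random intercept and a random slope that are allowed to be correlated. In the sketch below, the symbols $b_{0r}$, $b_{1r}$, $\sigma_0$, $\sigma_1$, and $\rho$ are labels introduced here for illustration.

```latex
\begin{pmatrix} b_{0r} \\ b_{1r} \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix}
\sigma_0^2 & \rho\,\sigma_0\sigma_1 \\
\rho\,\sigma_0\sigma_1 & \sigma_1^2
\end{pmatrix}
\right), \qquad r = 1,\dots,9.
```

The three Region rows then correspond to $\sigma_0$ ('std' for the intercept), $\rho$ ('corr'), and $\sigma_1$ ('std' for WtdILI). Together with the Date standard deviation and the residual standard deviation, this accounts for the 5 covariance parameters, and 52 date intercepts plus 2 × 9 region effects account for the 70 random-effects coefficients.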

Refit the model by eliminating the intercept from the (1 + WtdILI | Region) random-effects term.

lme3 = fitlme(flu2,'FluRate ~ 1 + WtdILI + (1|Date) + (WtdILI - 1|Region)',...
    'Exclude',[98,107])
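lme3 = 
Linear mixed-effects model fit by ML

Model information:
    Number of observations             466
    Fixed effects coefficients           2
    Random effects coefficients         61
    Covariance parameters                3

Formula:
    FluRate ~ 1 + WtdILI + (1 | Date) + (WtdILI | Region)

Model fit statistics:
    AIC       BIC      LogLikelihood    Deviance
    9.3395    30.06    0.33023          -0.66046

Fixed effects coefficients (95% CIs):
    Name                   Estimate    SE          tStat     DF     pValue
    {'(Intercept)'}         0.1795     0.054892    3.2702    464    …     
    {'WtdILI'     }        0.70718     0.042486    16.645    464    …     

    Lower       Upper  
    0.071637    0.28737
     0.62369    0.79067

Random effects covariance parameters (95% CIs):
Group: Date (52 Levels)
    Name1                  Name2                  Type           Estimate    Lower      Upper  
    {'(Intercept)'}        {'(Intercept)'}        {'std'}        0.17633     0.14092    0.22062

Group: Region (9 Levels)
    Name1             Name2             Type           Estimate    Lower       Upper  
    {'WtdILI'}        {'WtdILI'}        {'std'}        0.087925    0.054474    0.14192

Group: Error
    Name               Estimate    Lower      Upper  
    {'Res Std'}        0.20979     0.19585    0.22473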

All terms in the new model lme3 are significant.

Compare lme2 and lme3.

compare(lme2,lme3,'CheckNesting',true,'NSim',100)
ans = 
    Simulated Likelihood Ratio Test: Nsim = 100, Alpha = 0.05

    Model    DF    AIC       BIC       LogLik     LRStat    pValue      Lower         Upper   
    lme2     5     62.265    82.986    -26.133                                                
    lme3     5     9.3395     30.06    0.33023    52.926    0.009901    0.00025064    0.053932

The $p$-value of 0.009901 indicates that lme3 is a better fit than lme2.
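The reported value 0.009901 equals 1/101, which is what a simulation-based test gives when none of the 100 simulated statistics reaches the observed one. The sketch below assumes the common (nAbove + 1)/(nSim + 1) convention for simulated p-values; nAbove is a hypothetical name, and its value of 0 is inferred from the output rather than reported by compare.

```matlab
nSim   = 100;   % the 'NSim' value passed to compare
nAbove = 0;     % assumed: no simulated LR statistic reached the observed 52.926

% Simulated p-value under the assumed (nAbove + 1)/(nSim + 1) convention.
pValue = (nAbove + 1)/(nSim + 1);
fprintf('Simulated p-value: %.6f\n', pValue);
```

Under this convention, 1/(nSim + 1) is the smallest p-value the simulation can return, so a larger 'NSim' is needed to resolve smaller p-values and to narrow the reported confidence interval.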

Add a quadratic fixed-effects term to the model lme3.

lme4 = fitlme(flu2,'FluRate ~ 1 + WtdILI^2 + (1|Date) + (WtdILI - 1|Region)',...
    'Exclude',[98,107])
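lme4 = 
Linear mixed-effects model fit by ML

Model information:
    Number of observations             466
    Fixed effects coefficients           3
    Random effects coefficients         61
    Covariance parameters                3

Formula:
    FluRate ~ 1 + WtdILI + WtdILI^2 + (1 | Date) + (WtdILI | Region)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    6.7234    31.588    2.6383           -5.2766 

Fixed effects coefficients (95% CIs):
    Name                   Estimate     SE         tStat       DF     pValue    
    {'(Intercept)'}        -0.063406    0.12236    -0.51821    463       0.60456
    {'WtdILI'     }           1.0594    0.16554      6.3996    463    3.8232e-10
    {'WtdILI^2'   }        -0.096919     0.0441     -2.1977    463      0.028463

    Lower       Upper    
    -0.30385      0.17704
     0.73406       1.3847
    -0.18358    -0.010259

Random effects covariance parameters (95% CIs):
Group: Date (52 Levels)
    Name1                  Name2                  Type           Estimate    Lower      Upper  
    {'(Intercept)'}        {'(Intercept)'}        {'std'}        0.16732     0.13326    0.21009

Group: Region (9 Levels)
    Name1             Name2             Type           Estimate    Lower       Upper 
    {'WtdILI'}        {'WtdILI'}        {'std'}        0.087865    0.054443    0.1418

Group: Error
    Name               Estimate    Lower      Upper  
    {'Res Std'}        0.20979     0.19585    0.22473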

The $p$-value of 0.028463 indicates that the coefficient of the quadratic term WtdILI^2 is significant.
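To get a feel for what the quadratic term does, you can evaluate the fixed-effects part of lme4 by hand. This is a minimal sketch that uses only the three fixed-effects estimates from the output and ignores the Date and Region random effects, so it describes a population-level curve rather than a prediction for any particular date or region. The input value 1.625 and the variable names are illustrative choices.

```matlab
% Fixed-effects estimates copied from the lme4 output,
% ordered for polyval: [WtdILI^2, WtdILI, (Intercept)].
beta = [-0.096919, 1.0594, -0.063406];

x = 1.625;                      % example nationwide estimate
yMarginal = polyval(beta, x);   % fixed-effects-only fitted flu rate

% The negative quadratic coefficient makes the curve concave;
% its vertex is where the fitted slope reaches zero.
xPeak = -beta(2)/(2*beta(1));

fprintf('Fixed-effects fit at WtdILI = %.3f: %.4f\n', x, yMarginal);
fprintf('Curve flattens near WtdILI = %.3f\n', xPeak);
```

The slope in WtdILI shrinks as the nationwide estimate grows, which is the curvature the significant WtdILI^2 coefficient captures.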

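Plot the fitted response versus the observed response and residuals.

F = fitted(lme4);
R = response(lme4);
figure();
plot(R,F,'rx')
xlabel('Response')
ylabel('Fitted')

The fitted versus the observed response values form almost a 45-degree angle, indicating a good fit.

Plot the residuals versus the fitted values.

figure();
plotResiduals(lme4,'fitted')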

Although it has improved, you can still see some heteroscedasticity in the model. This might be due to another predictor that does not exist in the data set, hence not in the model.

Find the fitted flu rate value for region ENCentral, date 11/6/2005.

F(flu2.Region == 'ENCentral' & flu2.Date == '11/6/2005')
ans = 1.4860

Randomly generate response values.

Randomly generate response values for a national estimate of 1.625, region MidAtl, and date 4/23/2006. First, define the new table. Because Date and Region are nominal in the original table, you must define them similarly in the new table.

tblnew.Date = nominal('4/23/2006');
tblnew.WtdILI = 1.625;
tblnew.Region = nominal('MidAtl');
tblnew = struct2table(tblnew);

Now, generate the response value.

random(lme4,tblnew)
ans = 1.2679