Nonlinear Regression

What Are Parametric Nonlinear Regression Models?

Parametric nonlinear models represent the relationship between a continuous response variable and one or more continuous predictor variables in the form

y=f(X,β)+ε,

where

  • y is an n-by-1 vector of observations of the response variable.

  • f is any function of X and β that evaluates each row of X along with the vector β to compute the prediction for the corresponding row of y.

  • X is an n-by-p matrix of predictors, with one row for each observation, and one column for each predictor.

  • β is a p-by-1 vector of unknown parameters to be estimated.

  • ε is an n-by-1 vector of independent, identically distributed random disturbances.

In contrast, nonparametric models do not attempt to characterize the relationship between predictors and response with model parameters. Descriptions are often graphical, as in the case of decision trees.

fitnlm attempts to find values of the parameters β that minimize the mean squared differences between the observed responses y and the predictions of the model f(X,β). To do so, it needs a starting value beta0, which it then modifies iteratively until it reaches a vector β with minimal mean squared error.
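As a rough illustration of this criterion, the following sketch evaluates the mean squared error of a small model at two parameter vectors. The exponential model, the data, and the names bTrue and mse are assumptions made for illustration only; this is not how fitnlm is implemented internally.

```matlab
% Hypothetical two-parameter model and noise-free data (illustration only).
f = @(b,X) b(1)*exp(-b(2)*X);
X = (0:4)';
bTrue = [2; 0.5];
y = f(bTrue,X);

% Mean squared difference between observed responses and predictions.
mse = @(b) mean((y - f(b,X)).^2);

mseAtTruth = mse(bTrue);    % zero, because y was generated without noise
mseAtGuess = mse([1; 1]);   % positive for a poor parameter guess
```

An iterative fit starts from a guess such as [1; 1] and moves toward parameter values that make this quantity smaller.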

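Prepare Data

To begin fitting a regression, put your data into a form that fitting functions expect. All regression techniques begin with input data in an array X and response data in a separate vector y, or input data in a table or dataset array tbl and response data as a column in tbl. Each row of the input data represents one observation. Each column represents one predictor (variable).

For a table or dataset array tbl, indicate the response variable with the 'ResponseVar' name-value pair:

mdl = fitlm(tbl,'ResponseVar','BloodPressure');

The response variable is the last column by default.

You cannot use categorical predictors for nonlinear regression. A categorical predictor is one that takes values from a fixed set of possibilities.

Represent missing data as NaN for both input data and response data.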
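If the raw data marks missing values with a sentinel code, one possible way to convert them is sketched below. The value -999 and the small arrays are hypothetical; substitute whatever code your data source uses.

```matlab
% Hypothetical raw data in which -999 marks a missing measurement.
X = [1.2 3.4; -999 2.2; 0.7 -999];
y = [10; 12; -999];

% Replace the sentinel with NaN in both the predictors and the response.
X(X == -999) = NaN;
y(y == -999) = NaN;

% Count the observations that contain at least one missing value.
nMissingRows = sum(any(isnan(X),2) | isnan(y));
```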

Dataset Array for Input and Response Data

For example, to create a dataset array from an Excel® spreadsheet:

ds = dataset('XLSFile','hospital.xls',...
    'ReadObsNames',true);

To create a dataset array from workspace variables:

load carsmall
ds = dataset(Weight,Model_Year,MPG);

Table for Input and Response Data

To create a table from an Excel spreadsheet:

tbl = readtable('hospital.xls',...
    'ReadRowNames',true);

To create a table from workspace variables:

load carsmall
tbl = table(Weight,Model_Year,MPG);

Numeric Matrix for Input Data and Numeric Vector for Response

For example, to create numeric arrays from workspace variables:

load carsmall
X = [Weight Horsepower Cylinders Model_Year];
y = MPG;

To create numeric arrays from an Excel spreadsheet:

[X, Xnames] = xlsread('hospital.xls');
y = X(:,4); % response y is systolic pressure
X(:,4) = []; % remove y from the X matrix

Notice that the nonnumeric entries, such as sex, do not appear in X.

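Represent the Nonlinear Model

There are several ways to represent a nonlinear model. Use whichever is most convenient.

The nonlinear model is a required input to fitnlm, in the modelfun input.

fitnlm assumes that the response function f(X,β) is smooth in the parameters β. If your function is not smooth, fitnlm can fail to provide optimal parameter estimates.

Function Handle to Anonymous Function or Function File

The function handle @modelfun(b,x) accepts a vector b and a matrix, table, or dataset array x. The function handle should return a vector f with the same number of rows as x. For example, the function file hougen.m computes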

$$\mathrm{hougen}(b,x) = \frac{b(1)\,x(2) - x(3)/b(5)}{1 + b(2)\,x(1) + b(3)\,x(2) + b(4)\,x(3)}$$

Examine the function by entering type hougen at the MATLAB® command line.

function yhat = hougen(beta,x)
%HOUGEN Hougen-Watson model for reaction kinetics.
%   YHAT = HOUGEN(BETA,X) gives the predicted values of the
%   reaction rate, YHAT, as a function of the vector of
%   parameters, BETA, and the matrix of data, X.
%   BETA must have 5 elements and X must have three
%   columns.
%
%   The model form is:
%   y = (b1*x2 - x3/b5)./(1+b2*x1+b3*x2+b4*x3)
%
%   Reference:
%      [1] Bates, Douglas, and Watts, Donald, "Nonlinear
%      Regression Analysis and Its Applications", Wiley
%      1988 p. 271-272.

%   Copyright 1993-2004 The MathWorks, Inc.
%   B.A. Jones 1-06-95.

b1 = beta(1);
b2 = beta(2);
b3 = beta(3);
b4 = beta(4);
b5 = beta(5);

x1 = x(:,1);
x2 = x(:,2);
x3 = x(:,3);

yhat = (b1*x2 - x3/b5)./(1+b2*x1+b3*x2+b4*x3);

You can write an anonymous function that performs the same calculation as hougen.m.

modelfun = @(b,x)(b(1)*x(:,2) - x(:,3)/b(5))./...
    (1 + b(2)*x(:,1) + b(3)*x(:,2) + b(4)*x(:,3));
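Before passing a model function to a fitting routine, it can help to confirm that it returns one prediction per row of x. The sketch below does this for the anonymous function; the parameter vector b and the two-row matrix x are hypothetical values chosen so the result is easy to verify by hand.

```matlab
% Anonymous version of the Hougen-Watson model.
modelfun = @(b,x)(b(1)*x(:,2) - x(:,3)/b(5))./...
    (1 + b(2)*x(:,1) + b(3)*x(:,2) + b(4)*x(:,3));

b = ones(5,1);          % hypothetical parameter values
x = [1 2 3; 0 4 2];     % two hypothetical observations, three predictors

yhat = modelfun(b,x);

% The output should be a column vector with one entry per row of x.
sameRows = isequal(size(yhat),[size(x,1) 1]);
```

With these inputs, the first row gives (2 - 3)/(1 + 1 + 2 + 3) = -1/7 and the second gives (4 - 2)/(1 + 0 + 4 + 2) = 2/7.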

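Text Representation of Formula

For data in a matrix X and response in a vector y:

  • Represent the formula using 'x1' as the first predictor (column) in X, 'x2' as the second predictor, etc.

  • Represent the vector of parameters to optimize as 'b1', 'b2', etc.

  • Write the formula as 'y ~ (mathematical expressions)'.

For example, to represent the response to the reaction data:

modelfun = 'y ~ (b1*x2 - x3/b5)/(1 + b2*x1 + b3*x2 + b4*x3)';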
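A minimal sketch of how such a formula might be used with matrix data follows. It assumes the reaction sample data (which supplies reactants and rate) and Statistics and Machine Learning Toolbox are available; the all-ones starting vector is an arbitrary choice.

```matlab
load reaction   % sample data: reactants (predictors) and rate (response)

% Formula using the default names x1, x2, x3 for the columns of reactants.
modelfun = 'y ~ (b1*x2 - x3/b5)/(1 + b2*x1 + b3*x2 + b4*x3)';

beta0 = ones(5,1);                        % arbitrary starting vector
mdl = fitnlm(reactants,rate,modelfun,beta0);

estimates = mdl.Coefficients.Estimate;    % one estimate per parameter b1..b5
```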

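For data in a table or dataset array, you can write the formula using the variable names from the table or dataset array. Put the response variable name at the left of the formula, followed by a ~, followed by a character vector representing the response formula.

This example shows how to create a character vector to represent the response to the reaction data that is in a dataset array.

  1. Load the reaction data.

    load reaction
  2. Put the data into a dataset array, where each variable has a name given in xn or yn.

    ds = dataset({reactants,xn(1,:),xn(2,:),xn(3,:)},...
        {rate,yn});
  3. Examine the first row of the dataset array.

    ds(1,:)
    ans = 
        Hydrogen    n_Pentane    Isopentane    ReactionRate
        470         300          10            8.55
  4. Write the hougen formula using names in the dataset array.

    modelfun = ['ReactionRate ~ (b1*n_Pentane - Isopentane/b5)/'...
        ' (1 + Hydrogen*b2 + n_Pentane*b3 + Isopentane*b4)']
    modelfun =
    ReactionRate ~ (b1*n_Pentane - Isopentane/b5)/ (1 + Hydrogen*b2 + n_Pentane*b3 + Isopentane*b4)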

Choose Initial Vector beta0

The initial vector for the fitting iterations, beta0, can greatly influence the quality of the resulting fitted model. beta0 gives the dimensionality of the problem, meaning it needs the correct length. A good choice of beta0 leads to a quick, reliable model, while a poor choice can lead to a long computation, or to an inadequate model.

It is difficult to give advice on choosing a good beta0. If you believe certain components of the vector should be positive or negative, set your beta0 to have those characteristics. If you know the approximate value of other components, include them in beta0. However, if you don't know good values, try a random vector, such as

beta0 = randn(nVars,1);
% or
beta0 = 10*rand(nVars,1);
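When no good starting value is known, one possible strategy is to try several random starting vectors and keep the fit with the smallest root mean squared error. The sketch below illustrates the idea; the exponential model, the simulated data, the number of restarts, and the names bestRMSE and bestMdl are all assumptions made for this example, not a recommendation from the fitnlm documentation.

```matlab
rng('default')                          % reproducible random starts
x = linspace(0,5,50)';
modelfun = @(b,x) b(1)*exp(-b(2)*x);    % hypothetical two-parameter model
y = modelfun([3 0.7],x) + 0.05*randn(50,1);

nVars = 2;
bestRMSE = Inf;
for k = 1:5
    beta0 = rand(nVars,1);              % positive random starting vector
    try
        trial = fitnlm(x,y,modelfun,beta0);
    catch
        continue                        % skip starts for which the fit fails
    end
    if trial.RMSE < bestRMSE
        bestRMSE = trial.RMSE;          % keep the best fit found so far
        bestMdl = trial;
    end
end
```

Drawing the starts with rand rather than randn reflects a belief that both parameters are positive, in line with the advice above.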

Fit Nonlinear Model to Data

The syntax for fitting a nonlinear regression model using a table or dataset array tbl is

mdl = fitnlm(tbl,modelfun,beta0)

The syntax for fitting a nonlinear regression model using a numeric array X and numeric response vector y is

mdl = fitnlm(X,y,modelfun,beta0)

For information on representing the input parameters, see Prepare Data, Represent the Nonlinear Model, and Choose Initial Vector beta0.

fitnlm assumes that the response variable in a table or dataset array tbl is the last column. To change this, use the ResponseVar name-value pair to name the response column.
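The following sketch shows one way the ResponseVar name-value pair might be used when the response is not the last column. The simulated data and the exponential model are hypothetical and exist only to make the example self-contained.

```matlab
rng('default')
x = linspace(0,5,40)';
y = 3*exp(-0.7*x) + 0.05*randn(40,1);   % simulated response

tbl = table(y,x);                        % response is the FIRST column here

modelfun = @(b,x) b(1)*exp(-b(2)*x);     % hypothetical model
beta0 = [1 1];

% Name the response column explicitly instead of relying on the default.
mdl = fitnlm(tbl,modelfun,beta0,'ResponseVar','y');

responseName = mdl.ResponseName;
```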

Examine Quality and Adjust the Fitted Nonlinear Model

There are diagnostic plots to help you examine the quality of a model. plotDiagnostics(mdl) gives a variety of plots, including leverage and Cook's distance plots. plotResiduals(mdl) gives the difference between the fitted model and the data.

There are also properties of mdl that relate to the model quality. mdl.RMSE gives the root mean square error between the data and the fitted model. mdl.Residuals.Raw gives the raw residuals. mdl.Diagnostics contains several fields, such as Leverage and CooksDistance, that can help you identify particularly interesting observations.
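As a sketch of how these properties could be used programmatically, the code below flags observations whose leverage exceeds 2*p/n. That threshold is a common rule of thumb and an assumption of this example, not something fitnlm prescribes. The example assumes the reaction sample data and hougen.m are on the path.

```matlab
load reaction
mdl = fitnlm(reactants,rate,@hougen,ones(5,1));

n = mdl.NumObservations;      % number of observations used in the fit
p = mdl.NumCoefficients;      % number of model parameters

% Rule-of-thumb cutoff for high leverage (an assumption of this sketch).
highLev = find(mdl.Diagnostics.Leverage > 2*p/n);

% Raw residuals and overall error summary.
rawResiduals = mdl.Residuals.Raw;
rmse = mdl.RMSE;
```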

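This example shows how to examine a fitted nonlinear model using diagnostic, residual, and slice plots.

Load the sample data.

load reaction

Create a nonlinear model of rate as a function of reactants using the hougen.m function.

beta0 = ones(5,1);
mdl = fitnlm(reactants,...
    rate,@hougen,beta0);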

Make a leverage plot of the data and model.

plotDiagnostics(mdl)

There is one point that has high leverage. Locate the point.

[~,maxl] = max(mdl.Diagnostics.Leverage)
maxl = 6

Examine a residuals plot.

plotResiduals(mdl,'fitted')

Nothing stands out as an outlier.

Use a slice plot to show the effect of each predictor on the model.

plotSlice(mdl)

You can drag the vertical dashed blue lines to see the effect of a change in one predictor on the response. For example, drag the X2 line to the right, and notice that the slope of the X3 line changes.

Predict or Simulate Responses Using a Nonlinear Model

This example shows how to use the methods predict, feval, and random to predict and simulate responses to new data.

Randomly generate a sample from a Cauchy distribution.

rng('default')
X = rand(100,1);
X = tan(pi*X - pi/2);

Generate the response according to the model y = b1*(pi/2 + atan((x - b2)/b3)) and add noise to the response.

modelfun = @(b,x) b(1) * ...
    (pi/2 + atan((x - b(2))/b(3)));
y = modelfun([12 5 10],X) + randn(100,1);

Fit a model starting from the arbitrary parameters b = [1,1,1].

beta0 = [1 1 1]; % An arbitrary guess
mdl = fitnlm(X,y,modelfun,beta0)
mdl = 
Nonlinear regression model:
    y ~ b1*(pi/2 + atan((x - b2)/b3))

Estimated Coefficients:
          Estimate      SE       tStat       pValue  
          ________    _______    ______    __________

    b1     12.082     0.80028    15.097    3.3151e-27
    b2     5.0603      1.0825    4.6747    9.5063e-06
    b3       9.64     0.46499    20.732    2.0382e-37

Number of observations: 100, Error degrees of freedom: 97
Root Mean Squared Error: 1.02
R-Squared: 0.92,  Adjusted R-Squared 0.918
F-statistic vs. zero model: 6.45e+03, p-value = 1.72e-111

The fitted values are within a few percent of the parameters [12,5,10].

Examine the fit.

plotSlice(mdl)

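predict

The predict method predicts the mean responses and, if requested, gives confidence bounds. Find the predicted response values and predicted confidence intervals about the response at X values [-15;5;12].

Xnew = [-15;5;12];
[ynew,ynewci] = predict(mdl,Xnew)
ynew = 3×1

    5.4122
   18.9022
   26.5161

ynewci = 3×2

    4.8233    6.0010
   18.4555   19.3490
   25.0170   28.0151

The confidence intervals are reflected in the slice plot.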
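The bounds returned by predict depend on its name-value options. The following sketch refits the same simulated model and compares the default bounds on the mean response with bounds for a new observation ('Prediction','observation') and with 99% bounds ('Alpha',0.01). The variable names ciCurve, ciObs, and ci99 are chosen here for illustration.

```matlab
% Recreate the simulated data and model so the sketch runs on its own.
rng('default')
X = rand(100,1);
X = tan(pi*X - pi/2);
modelfun = @(b,x) b(1)*(pi/2 + atan((x - b(2))/b(3)));
y = modelfun([12 5 10],X) + randn(100,1);
mdl = fitnlm(X,y,modelfun,[1 1 1]);

Xnew = [-15;5;12];
[~,ciCurve] = predict(mdl,Xnew);                            % 95% bounds on the mean
[~,ciObs]   = predict(mdl,Xnew,'Prediction','observation'); % bounds for new observations
[~,ci99]    = predict(mdl,Xnew,'Alpha',0.01);               % 99% bounds on the mean

% Interval widths: the observation and 99% intervals should be wider.
widthCurve = diff(ciCurve,1,2);
widthObs   = diff(ciObs,1,2);
width99    = diff(ci99,1,2);
```

Observation intervals account for the disturbance term as well as the uncertainty in the fitted curve, which is why they are wider.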

feval

The feval method predicts the mean responses. feval is often more convenient to use than predict when you construct a model from a dataset array.

Create the nonlinear model from a dataset array.

ds = dataset({X,'X'},{y,'y'});
mdl2 = fitnlm(ds,modelfun,beta0);

Find the predicted model responses (CDF) at X values [-15;5;12].

Xnew = [-15;5;12];
ynew = feval(mdl2,Xnew)
ynew = 3×1

    5.4122
   18.9022
   26.5161

random

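The random method simulates new random response values, equal to the mean prediction plus a random disturbance with the same variance as the training data.

Xnew = [-15;5;12];
ysim = random(mdl,Xnew)
ysim = 3×1

    6.0505
   19.0893
   25.4647

Rerun the random method. The results change.

ysim = random(mdl,Xnew)
ysim = 3×1

    6.3813
   19.2157
   26.6541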
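If a simulation needs to be repeatable, one option is to reset the random number generator before each call to random. The sketch below refits the same simulated model and assumes a seed of 1, which is an arbitrary choice.

```matlab
% Recreate the simulated data and model so the sketch runs on its own.
rng('default')
X = rand(100,1);
X = tan(pi*X - pi/2);
modelfun = @(b,x) b(1)*(pi/2 + atan((x - b(2))/b(3)));
y = modelfun([12 5 10],X) + randn(100,1);
mdl = fitnlm(X,y,modelfun,[1 1 1]);

Xnew = [-15;5;12];

rng(1)                        % arbitrary seed
ysim1 = random(mdl,Xnew);

rng(1)                        % same seed, so the same disturbances are drawn
ysim2 = random(mdl,Xnew);

sameDraws = isequal(ysim1,ysim2);
```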