Evaluating Goodness of Fit

How to Evaluate Goodness of Fit

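After fitting data with one or more models, you should evaluate the goodness of fit. A visual examination of the fitted curve displayed in the Curve Fitting app should be your first step. Beyond that, the toolbox provides methods to assess goodness of fit for both linear and nonlinear parametric fits, including goodness-of-fit statistics, residual analysis, and confidence and prediction bounds.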

As is common in statistical literature, the term goodness of fit is used here in several senses. A "good fit" might be a model

that your data could reasonably have come from, given the assumptions of least-squares fitting
in which the model coefficients can be estimated with little uncertainty
that explains a high proportion of the variability in your data, and is able to predict new observations with high certainty

A particular application might dictate still other aspects of model fitting that are important to achieving a good fit, such as a simple model that is easy to interpret. The methods described here can help you determine goodness of fit in all these senses.

These methods group into two types: graphical and numerical. Plotting residuals and prediction bounds are graphical methods that aid visual interpretation, while computing goodness-of-fit statistics and coefficient confidence bounds yields numerical measures that aid statistical reasoning.

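Generally speaking, graphical measures are more beneficial than numerical measures because they allow you to view the entire data set at once, and they can easily display a wide range of relationships between the model and the data. Numerical measures are more narrowly focused on a particular aspect of the data and often try to compress that information into a single number. In practice, depending on your data and analysis requirements, you might need to use both types to determine the best fit.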

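Note that, based on these methods, it is possible that none of your fits is suitable for your data. In that case, you might need to select a different model. It is also possible that all the goodness-of-fit measures indicate that a particular fit is suitable. However, if your goal is to extract fitted coefficients that have physical meaning, but your model does not reflect the physics of the data, the resulting coefficients are useless. In this case, understanding what your data represents and how it was measured is just as important as evaluating the goodness of fit.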

Goodness-of-Fit Statistics

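After using graphical methods to evaluate the goodness of fit, you should examine the goodness-of-fit statistics. Curve Fitting Toolbox™ software supports these goodness-of-fit statistics for parametric models: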

The sum of squares due to error (SSE)
R-square
Adjusted R-square
Root mean squared error (RMSE)

For the current fit, these statistics are displayed in the Results pane in the Curve Fitting app. For all fits in the current curve-fitting session, you can compare the goodness-of-fit statistics in the Table of Fits.

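To get goodness-of-fit statistics at the command line, do either of the following:

In the Curve Fitting app, select Fit > Save to Workspace to export your fit and goodness of fit to the workspace.
Specify the gof output argument with the fit function.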

Sum of Squares Due to Error

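This statistic measures the total deviation of the response values from the fit to the response values. It is also called the summed square of residuals and is usually labeled as SSE.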

$SSE = \sum_{i = 1}^{n} w_{i} {( y_{i} - {\hat{y}}_{i} )}^{2}$

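A value closer to 0 indicates that the model has a smaller random error component, and that the fit will be more useful for prediction.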
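The SSE formula above can be sketched in a few lines of plain Python. This is an illustration of the arithmetic only, not the toolbox's implementation; the function name `weighted_sse`, the default of unit weights, and the sample data are assumptions made for the example.

```python
def weighted_sse(y, y_hat, w=None):
    """Weighted sum of squared residuals: sum of w_i * (y_i - y_hat_i)**2."""
    if w is None:
        w = [1.0] * len(y)  # assumption: unit weights when none are supplied
    if not (len(y) == len(y_hat) == len(w)):
        raise ValueError("y, y_hat, and w must have the same length")
    return sum(wi * (yi - fi) ** 2 for yi, fi, wi in zip(y, y_hat, w))


# Hypothetical response values and fitted values, for illustration only.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.5, 2.0, 2.5, 4.0]

sse_unweighted = weighted_sse(y, y_hat)
sse_weighted = weighted_sse(y, y_hat, w=[2.0, 1.0, 1.0, 1.0])
print(sse_unweighted, sse_weighted)
```

Doubling the weight of the first point doubles its contribution to the sum, which is how weights let more trusted observations influence the statistic more strongly.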

R-Square

This statistic measures how successful the fit is in explaining the variation of the data. Put another way, R-square is the square of the correlation between the response values and the predicted response values. It is also called the square of the multiple correlation coefficient and the coefficient of multiple determination.

R-square is defined as the ratio of the sum of squares of the regression (SSR) and the total sum of squares (SST). SSR is defined as

$SSR = \sum_{i = 1}^{n} w_{i} {( {\hat{y}}_{i} - \bar{y} )}^{2}$

SST is also called the sum of squares about the mean, and is defined as

$SST = \sum_{i = 1}^{n} w_{i} {( y_{i} - \bar{y} )}^{2}$

where SST = SSR + SSE. Given these definitions, R-square is expressed as

$\text{R-square} = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$

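R-square can take on any value between 0 and 1, with a value closer to 1 indicating that a greater proportion of variance is accounted for by the model. For example, an R-square value of 0.8234 means that the fit explains 82.34% of the total variation in the data about the average.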
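As a rough check of these definitions, the following Python sketch fits a straight line with a constant term by ordinary least squares and computes R-square both ways. It assumes unit weights; `fit_line` and `sums_of_squares` are illustrative helpers written for this example, not toolbox functions, and the data are made up.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = a + b*x with unit weights."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sums_of_squares(y, y_hat):
    """Return (SSE, SSR, SST) for unit weights."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return sse, ssr, sst


# Hypothetical data, for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

a, b = fit_line(x, y)
y_hat = [a + b * xi for xi in x]
sse, ssr, sst = sums_of_squares(y, y_hat)

r2_from_ssr = ssr / sst        # SSR / SST
r2_from_sse = 1.0 - sse / sst  # 1 - SSE / SST
print(r2_from_ssr, r2_from_sse)
```

For this data both expressions give 0.64, because a least-squares model that includes a constant term satisfies SST = SSR + SSE.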

If you increase the number of fitted coefficients in your model, R-square will increase although the fit may not improve in a practical sense. To avoid this situation, you should use the degrees of freedom adjusted R-square statistic described below.

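Note that it is possible to get a negative R-square for equations that do not contain a constant term. Because R-square is defined as the proportion of variance explained by the fit, if the fit is actually worse than just fitting a horizontal line, then R-square is negative. In this case, R-square cannot be interpreted as the square of a correlation. Such situations indicate that a constant term should be added to the model.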
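A small Python sketch can make the negative case concrete. It assumes unit weights, a hypothetical model y = b*x with no constant term, and made-up data in which the response decreases while the model is forced through the origin; `r_square` is an illustrative helper using the 1 - SSE/SST form.

```python
def r_square(y, y_hat):
    """R-square computed as 1 - SSE/SST with unit weights."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0]
y = [6.0, 5.0, 4.0]

# Least-squares slope for the through-origin model y = b*x.
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
y_hat = [b * xi for xi in x]

r2 = r_square(y, y_hat)
print(b, r2)
```

Here SSE is 21 while SST is only 2, so the through-origin fit is far worse than the horizontal line at the mean and R-square comes out at -9.5.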

Degrees of Freedom Adjusted R-Square

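This statistic uses the R-square statistic defined above, and adjusts it based on the residual degrees of freedom. The residual degrees of freedom is defined as the number of response values n minus the number of fitted coefficients m estimated from the response values.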

v = n - m

v indicates the number of independent pieces of information involving the n data points that are required to calculate the sum of squares. Note that if parameters are bounded and one or more of the estimates are at their bounds, then those estimates are regarded as fixed. The degrees of freedom is increased by the number of such parameters.
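The bookkeeping described here can be written as a one-line calculation. The Python sketch below is only an illustration of that rule; the function name `residual_dof` and the parameter `n_at_bounds` are hypothetical names chosen for the example.

```python
def residual_dof(n, m, n_at_bounds=0):
    """Residual degrees of freedom v = n - m.

    Coefficient estimates that sit at a bound are treated as fixed,
    so each one adds a degree of freedom back.
    """
    if not 0 <= n_at_bounds <= m:
        raise ValueError("n_at_bounds must be between 0 and m")
    return n - m + n_at_bounds


v_free = residual_dof(n=10, m=3)                    # no estimates at a bound
v_bounded = residual_dof(n=10, m=3, n_at_bounds=1)  # one estimate at its bound
print(v_free, v_bounded)
```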

The adjusted R-square statistic is generally the best indicator of the fit quality when you compare two models that are nested, that is, a series of models each of which adds additional coefficients to the previous model.

$\text{adjusted R-square} = 1 - \frac{SSE ( n - 1 )}{SST ( v )}$

The adjusted R-square statistic can take on any value less than or equal to 1, with a value closer to 1 indicating a better fit. Negative values can occur when the model contains terms that do not help to predict the response.
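To see how the adjustment behaves, the following Python sketch evaluates the formula for two hypothetical fits of five data points. The function name `adjusted_r_square` and all the numbers are assumptions made for illustration; it is not the toolbox's code.

```python
def adjusted_r_square(sse, sst, n, v):
    """Adjusted R-square: 1 - SSE*(n - 1) / (SST*v), with v = n - m."""
    if v <= 0:
        raise ValueError("residual degrees of freedom v must be positive")
    return 1.0 - (sse * (n - 1)) / (sst * v)


# Hypothetical fit 1: two coefficients, R-square = 1 - 3.6/10 = 0.64.
adj_good = adjusted_r_square(sse=3.6, sst=10.0, n=5, v=3)

# Hypothetical fit 2: three coefficients that explain almost nothing,
# R-square = 1 - 9.5/10 = 0.05.
adj_poor = adjusted_r_square(sse=9.5, sst=10.0, n=5, v=2)

print(adj_good, adj_poor)
```

The first fit drops from an R-square of 0.64 to an adjusted value of 0.52, while the second goes from 0.05 to -0.9, showing how coefficients that do not help predict the response push the adjusted statistic below zero.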

Root Mean Squared Error

This statistic is also known as the fit standard error and the standard error of the regression. It is an estimate of the standard deviation of the random component in the data, and is defined as

$RMSE = s = \sqrt{MSE}$