
Unconstrained Nonlinear Optimization Algorithms

Unconstrained Optimization Definition

Unconstrained minimization is the problem of finding a vector x that is a local minimum to a scalar function f(x):

$$\min_x f(x)$$

The term unconstrained means that no restriction is placed on the range of x.

fminunc trust-region Algorithm

Trust-Region Methods for Nonlinear Minimization

Many of the methods used in Optimization Toolbox™ solvers are based on trust regions, a simple yet powerful concept in optimization.

To understand the trust-region approach to optimization, consider the unconstrained minimization problem, minimize f(x), where the function takes vector arguments and returns scalars. Suppose you are at a point x in n-space and you want to improve, i.e., move to a point with a lower function value. The basic idea is to approximate f with a simpler function q, which reasonably reflects the behavior of function f in a neighborhood N around the point x. This neighborhood is the trust region. A trial step s is computed by minimizing (or approximately minimizing) over N. This is the trust-region subproblem,
$$\min_{s} \left\{ q(s),\; s \in N \right\}. \qquad (1)$$

The current point is updated to be x + s if f(x + s) < f(x); otherwise, the current point remains unchanged and N, the region of trust, is shrunk and the trial step computation is repeated.

The key questions in defining a specific trust-region approach to minimizing f(x) are how to choose and compute the approximation q (defined at the current point x), how to choose and modify the trust region N, and how accurately to solve the trust-region subproblem. This section focuses on the unconstrained problem. Later sections discuss additional complications due to the presence of constraints on the variables.

In the standard trust-region method ([48]), the quadratic approximation q is defined by the first two terms of the Taylor approximation to f at x; the neighborhood N is usually spherical or ellipsoidal in shape. Mathematically the trust-region subproblem is typically stated
$$\min\left\{ \frac{1}{2}s^T H s + s^T g \;\text{ such that }\; \|Ds\| \le \Delta \right\}, \qquad (2)$$

where g is the gradient of f at the current point x, H is the Hessian matrix (the symmetric matrix of second derivatives), D is a diagonal scaling matrix, Δ is a positive scalar, and ‖·‖ is the 2-norm. Good algorithms exist for solving Equation 2 (see [48]); such algorithms typically involve the computation of all eigenvalues of H and a Newton process applied to the secular equation

$$\frac{1}{\Delta} - \frac{1}{\|s\|} = 0.$$

Such algorithms provide an accurate solution to Equation 2. However, they require time proportional to several factorizations of H. Therefore, for large-scale problems a different approach is needed. Several approximation and heuristic strategies, based on Equation 2, have been proposed in the literature ([42] and [50]). The approximation approach followed in Optimization Toolbox solvers is to restrict the trust-region subproblem to a two-dimensional subspace S ([39] and [42]). Once the subspace S has been computed, the work to solve Equation 2 is trivial even if full eigenvalue/eigenvector information is needed (since in the subspace, the problem is only two-dimensional). The dominant work has now shifted to the determination of the subspace.

The two-dimensional subspace S is determined with the aid of a preconditioned conjugate gradient process described below. The solver defines S as the linear space spanned by s1 and s2, where s1 is in the direction of the gradient g, and s2 is either an approximate Newton direction, i.e., a solution to
$$H \cdot s_2 = -g, \qquad (3)$$

or a direction of negative curvature,

$$s_2^T \cdot H \cdot s_2 < 0. \qquad (4)$$

The philosophy behind this choice of S is to force global convergence (via the steepest descent direction or negative curvature direction) and achieve fast local convergence (via the Newton step, when it exists).

A sketch of unconstrained minimization using trust-region ideas is now easy to give:

  1. Formulate the two-dimensional trust-region subproblem.

  2. Solve Equation 2 to determine the trial step s.

  3. If f(x + s) < f(x), then x = x + s.

  4. Adjust Δ.

These four steps are repeated until convergence. The trust-region dimension Δ is adjusted according to standard rules. In particular, it is decreased if the trial step is not accepted, i.e., f(x + s) ≥ f(x). See [46] and [49] for a discussion of this aspect.
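The following MATLAB sketch illustrates these four steps under simplifying assumptions; it is an illustration only, not the Optimization Toolbox implementation. The function name trustRegionSketch and the handle objFun (assumed to return the value, gradient, and Hessian) are made up for this example, the candidate steps are limited to the clipped gradient and Newton directions rather than an exact solution of the two-dimensional subproblem, and the Δ-update constants are arbitrary.

function [x, fval] = trustRegionSketch(objFun, x0)
% Simplified trust-region iteration (illustration only; not the toolbox code).
x = x0;  Delta = 1;  tol = 1e-8;
qmodel = @(s, g, H) g.'*s + 0.5*(s.'*H*s);   % quadratic model of f(x+s) - f(x)
for iter = 1:400
    [f, g, H] = objFun(x);                   % value, gradient, Hessian at x
    if norm(g) < tol, break; end
    % Steps 1-2: candidate trial steps from the gradient and Newton directions,
    % each restricted to the trust region of radius Delta.
    sGrad = -g;      if norm(sGrad) > Delta, sGrad = Delta*sGrad/norm(sGrad); end
    sNewt = -(H\g);  if norm(sNewt) > Delta, sNewt = Delta*sNewt/norm(sNewt); end
    if qmodel(sNewt, g, H) < qmodel(sGrad, g, H), s = sNewt; else, s = sGrad; end
    % Steps 3-4: accept or reject the trial step and adjust Delta.
    if objFun(x + s) < f
        x = x + s;  Delta = 2*Delta;         % accept and expand the trust region
    else
        Delta = Delta/2;                     % reject, shrink, and retry
    end
end
fval = objFun(x);
end

Called with a function that returns the value, gradient, and Hessian of Rosenbrock's function (Equation 5) and started from [-1.9; 2], this sketch should work its way along the valley toward the minimizer at [1; 1], though far less efficiently than the toolbox solver.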

Optimization Toolbox solvers treat a few important special cases of f with specialized functions: nonlinear least-squares, quadratic functions, and linear least-squares. However, the underlying algorithmic ideas are the same as for the general case. These special cases are discussed in later sections.

Preconditioned Conjugate Gradient Method

A popular way to solve large, symmetric, positive definite systems of linear equations Hp = -g is the method of Preconditioned Conjugate Gradients (PCG). This iterative approach requires the ability to calculate matrix-vector products of the form H·v where v is an arbitrary vector. The symmetric positive definite matrix M is a preconditioner for H. That is, M = C², where C⁻¹HC⁻¹ is a well-conditioned matrix or a matrix with clustered eigenvalues.

In a minimization context, you can assume that the Hessian matrix H is symmetric. However, H is guaranteed to be positive definite only in the neighborhood of a strong minimizer. Algorithm PCG exits when it encounters a direction of negative (or zero) curvature, that is, dᵀHd ≤ 0. The PCG output direction p is either a direction of negative curvature or an approximate solution to the Newton system Hp = -g. In either case, p helps to define the two-dimensional subspace used in the trust-region approach discussed in Trust-Region Methods for Nonlinear Minimization.
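As a concrete illustration of the matrix-vector product requirement, MATLAB's pcg function accepts a function handle that returns H*v in place of H itself, together with a preconditioner M. The sketch below uses an arbitrary test Hessian and a simple Jacobi preconditioner; it is not the specialized PCG routine inside the solvers, which additionally watches for directions of negative curvature.

n = 1000;
H = gallery('tridiag', n, -1, 4, -1);   % symmetric positive definite test Hessian
g = ones(n, 1);                          % gradient at the current point
Hv = @(v) H*v;                           % matrix-vector product; H need not be formed explicitly
M = spdiags(diag(H), 0, n, n);           % Jacobi (diagonal) preconditioner, a crude M approximating H
[p, flag] = pcg(Hv, -g, 1e-8, 200, M);   % approximately solve H*p = -g
% flag == 0 indicates pcg converged to the requested tolerance.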

fminunc quasi-newton Algorithm

Basics of Unconstrained Optimization

Although a wide spectrum of methods exists for unconstrained optimization, methods can be broadly categorized in terms of the derivative information that is, or is not, used. Search methods that use only function evaluations (e.g., the simplex search of Nelder and Mead [30]) are most suitable for problems that are not smooth or have a number of discontinuities. Gradient methods are generally more efficient when the function to be minimized is continuous in its first derivative. Higher order methods, such as Newton's method, are only really suitable when the second-order information is readily and easily calculated, because calculation of second-order information, using numerical differentiation, is computationally expensive.

Gradient methods use information about the slope of the function to dictate a direction of search where the minimum is thought to lie. The simplest of these is the method of steepest descent, in which a search is performed in a direction, -∇f(x), where ∇f(x) is the gradient of the objective function. This method is very inefficient when the function to be minimized has long narrow valleys as, for example, is the case for Rosenbrock's function

$$f(x) = 100\left(x_2 - x_1^2\right)^2 + \left(1 - x_1\right)^2. \qquad (5)$$

The minimum of this function is at x = [1,1], where f(x) = 0. A contour map of this function is shown in the figure below, along with the solution path to the minimum for a steepest descent implementation starting at the point [-1.9,2]. The optimization was terminated after 1000 iterations, still a considerable distance from the minimum. The black areas are where the method continually zigzags from one side of the valley to another. Note that toward the center of the plot, a number of larger steps are taken when a point lands exactly at the center of the valley.

Figure 5-1, Steepest Descent Method on Rosenbrock's Function

Level curves of the Rosenbrock function are close to the parabola y = x^2

This function, also known as the banana function, is notorious in unconstrained examples because of the way the curvature bends around the origin. Rosenbrock's function is used throughout this section to illustrate the use of a variety of optimization techniques. The contours have been plotted in exponential increments because of the steepness of the slope surrounding the U-shaped valley.

For a more complete description of this figure, including scripts that generate the iterative points, see Banana Function Minimization.
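For readers who want to experiment, the short MATLAB sketch below reproduces the qualitative behavior described above using a steepest descent iteration with a crude backtracking step size. The starting point matches the figure, but the step-size rule and other details are arbitrary choices for this illustration, not the script used to produce the figure.

f     = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;          % Rosenbrock's function, Equation 5
gradf = @(x) [-400*x(1)*(x(2) - x(1)^2) - 2*(1 - x(1)); ...
               200*(x(2) - x(1)^2)];                         % its gradient
x = [-1.9; 2];                                               % starting point used in the figure
for k = 1:1000
    g = gradf(x);
    alpha = 1;
    while f(x - alpha*g) >= f(x) && alpha > 1e-12            % crude backtracking step size
        alpha = alpha/2;
    end
    x = x - alpha*g;                                         % steepest descent step
end
% After 1000 iterations x is typically still far from the minimizer [1; 1],
% reflecting the zigzagging progress along the narrow valley.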

Quasi-Newton Methods

Of the methods that use gradient information, the most favored are the quasi-Newton methods. These methods build up curvature information at each iteration to formulate a quadratic model problem of the form
$$\min_x \; \frac{1}{2}x^T H x + c^T x + b, \qquad (6)$$

where the Hessian matrix, H, is a positive definite symmetric matrix, c is a constant vector, and b is a constant. The optimal solution for this problem occurs when the partial derivatives of x go to zero, i.e.,

$$\nabla f(x^*) = H x^* + c = 0. \qquad (7)$$

The optimal solution point, x*, can be written as

$$x^* = -H^{-1}c. \qquad (8)$$
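As a small numerical check of Equation 7 and Equation 8 (the matrix and vector below are arbitrary example data, not part of any particular problem):

H = [3 1; 1 2];  c = [-1; 4];            % an arbitrary positive definite H and constant vector c
xstar = -H\c;                            % Equation 8, computed without forming inv(H) explicitly
residual = H*xstar + c;                  % Equation 7: the gradient at xstar is zero (to rounding)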

Newton-type methods (as opposed to quasi-Newton methods) calculate H directly and proceed in a direction of descent to locate the minimum after a number of iterations. Calculating H numerically involves a large amount of computation. Quasi-Newton methods avoid this by using the observed behavior of f(x) and ∇f(x) to build up curvature information to make an approximation to H using an appropriate updating technique.

A large number of Hessian updating methods have been developed. However, the formula of Broyden [3], Fletcher [12], Goldfarb [20], and Shanno [37] (BFGS) is thought to be the most effective for use in a general purpose method.

The formula given by BFGS is
$$H_{k+1} = H_k + \frac{q_k q_k^T}{q_k^T s_k} - \frac{H_k s_k s_k^T H_k^T}{s_k^T H_k s_k}, \qquad (9)$$

where

$$s_k = x_{k+1} - x_k, \qquad q_k = \nabla f\left(x_{k+1}\right) - \nabla f\left(x_k\right).$$

As a starting point, H0 can be set to any symmetric positive definite matrix, for example, the identity matrix I. To avoid the inversion of the Hessian H, you can derive an updating method that avoids the direct inversion of H by using a formula that makes an approximation of the inverse Hessian H⁻¹ at each update. A well-known procedure is the DFP formula of Davidon [7], Fletcher, and Powell [14]. This uses the same formula as the BFGS method (Equation 9) except that qk is substituted for sk.
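Expressed in code, one BFGS update of the Hessian approximation in Equation 9 looks like the following sketch. The variable names (xOld, xNew, gOld, gNew, Hk) are assumptions for this example, and the curvature test qk'*sk > 0 is a common safeguard rather than part of Equation 9 itself.

sk = xNew - xOld;                         % sk = x(k+1) - x(k)
qk = gNew - gOld;                         % qk = grad f(x(k+1)) - grad f(x(k))
if qk.'*sk > 0                            % skip the update if the curvature condition fails
    Hk = Hk + (qk*qk.')/(qk.'*sk) - (Hk*(sk*sk.')*Hk.')/(sk.'*Hk*sk);
end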

The gradient information is either supplied through analytically calculated gradients, or derived by partial derivatives using a numerical differentiation method via finite differences. This involves perturbing each of the design variables, x, in turn and calculating the rate of change in the objective function.

At each major iteration, k, a line search is performed in the direction

$$d = -H_k^{-1} \cdot \nabla f\left(x_k\right). \qquad (10)$$

The quasi-Newton method is illustrated by the solution path on Rosenbrock's function in Figure 5-2, BFGS Method on Rosenbrock's Function. The method is able to follow the shape of the valley and converges to the minimum after 140 function evaluations using only finite difference gradients.

Figure 5-2, BFGS Method on Rosenbrock's Function

Level curves of the Rosenbrock function are close to the parabola y = x^2. The iterative steps go from upper-left, down around the parabola, to upper-right.

For a more complete description of this figure, including scripts that generate the iterative points, see Banana Function Minimization.
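A run of this kind can be reproduced with fminunc and its quasi-newton algorithm, as in the sketch below; the options shown are one reasonable choice, and the iteration and function-evaluation counts you observe may differ from those quoted for the figure.

fun = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;   % Rosenbrock's function
x0  = [-1.9, 2];
options = optimoptions('fminunc', 'Algorithm', 'quasi-newton', 'Display', 'iter');
[x, fval, exitflag, output] = fminunc(fun, x0, options);
% output.funcCount reports the number of function evaluations used.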

Line Search

Line search is a search method that is used as part of a larger optimization algorithm. At each step of the main algorithm, the line-search method searches along the line containing the current point, xk, parallel to the search direction, which is a vector determined by the main algorithm. That is, the method finds the next iterate xk+1 of the form

$$x_{k+1} = x_k + \alpha^* d_k, \qquad (11)$$

where xk denotes the current iterate, dk is the search direction, and α* is a scalar step length parameter.
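As a simplified illustration of Equation 11, the sketch below chooses α by plain backtracking with a sufficient-decrease test rather than by the polynomial interpolation models described next; the names f, gradf, xk, and dk are assumed to be supplied by the surrounding algorithm.

alpha = 1;  c1 = 1e-4;  g = gradf(xk);
while f(xk + alpha*dk) > f(xk) + c1*alpha*(g.'*dk) && alpha > 1e-12
    alpha = alpha/2;                      % backtrack until sufficient decrease holds
end
xk1 = xk + alpha*dk;                      % next iterate, as in Equation 11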

The line search method attempts to decrease the objective function along the line xk + α*dk by repeatedly minimizing polynomial interpolation models of the objective function. The line search procedure has two main steps: