Accelerated scaled memoryless BFGS preconditioned conjugate gradient algorithm for unconstrained optimization
Introduction
In this paper, we consider the following unconstrained optimization problem: min f(x), x ∈ R^n, (1.1) where f: R^n → R is continuously differentiable and its gradient g(x) = ∇f(x) is available. We are interested in elaborating an algorithm for solving large-scale cases for which the Hessian of f is either not available or requires a large amount of storage and computational cost. Plenty of conjugate gradient methods are known, and an excellent survey of these methods, with special attention to their global convergence, is given by Hager and Zhang [24]. Different conjugate gradient algorithms correspond to different choices for the scalar parameter β_k [8], [16], [21], [36], [37]. Line search in conjugate gradient algorithms is often based on the standard Wolfe conditions. A numerical comparison of conjugate gradient algorithms with Wolfe line search, for different formulae of β_k computation, including the Dolan and Moré performance profile, is given in [8].
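As a concrete illustration of the classical setting described above (not the paper's algorithm), a minimal nonlinear conjugate gradient iteration with the Polak–Ribière–Polyak choice of β_k and a simple Armijo backtracking line search might look as follows. The toy quadratic objective and all parameter values are illustrative assumptions:

```python
# Minimal sketch of a nonlinear conjugate gradient method with the
# Polak-Ribiere-Polyak (PRP+) beta and an Armijo backtracking line search.
# The toy quadratic f and all parameter values are illustrative assumptions,
# not taken from the paper.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def f(x):
    # ill-conditioned toy quadratic: f(x) = 0.5*(x1^2 + 10*x2^2)
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)

def grad(x):
    return [x[0], 10.0 * x[1]]

def cg_prp(x, tol=1e-8, max_iter=500):
    g = grad(x)
    d = [-gi for gi in g]                      # d0 = -g0
    for _ in range(max_iter):
        if dot(g, g) <= tol ** 2:
            break
        gd = dot(g, d)
        if gd >= 0.0:                          # safeguard: restart if d is not a descent direction
            d = [-gi for gi in g]
            gd = dot(g, d)
        alpha, fx = 1.0, f(x)
        while f([xi + alpha * di for xi, di in zip(x, d)]) > fx + 1e-4 * alpha * gd:
            alpha *= 0.5                       # Armijo sufficient-decrease backtracking
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        y = [a - b for a, b in zip(g_new, g)]
        beta = max(dot(g_new, y) / dot(g, g), 0.0)   # PRP+ nonnegativity safeguard
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        x, g = x_new, g_new
    return x
```

The PRP+ safeguard (β_k clipped at zero) and the steepest-descent restart are common practical defenses; a production line search would enforce the full Wolfe conditions rather than Armijo alone.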
The paper presents a conjugate gradient algorithm based on a combination of the scaled memoryless BFGS method and the preconditioning technique [3], [4], [5], [6]. For general nonlinear functions a good preconditioner is any matrix that approximates the inverse Hessian ∇²f(x*)⁻¹, where x* is a local solution of (1.1). In this algorithm the preconditioner is a scaled memoryless BFGS matrix which is reset when the Powell restart criterion holds. The scaling factor in the preconditioner is selected as a spectral gradient step length [38].
The algorithm uses the conjugate gradient direction d_{k+1} = -g_{k+1} + β_k s_k, where the parameter β_k is obtained by equating the conjugate gradient direction with the direction corresponding to the Newton method. Thus, we get a general formula for the direction computation, which can be particularized to include the Polak–Ribière [32] and Polyak [33] and the Fletcher and Reeves [20] conjugate gradient algorithms, the spectral conjugate gradient (SCG) of Birgin and Martínez [11], or the algorithm of Dai and Liao [14] for a particular value of its parameter. This direction is then modified in a canonical manner, as considered earlier by Oren and Luenberger [29], Oren and Spedicato [30], Perry [31] and Shanno [39], [40], [41], by means of a scaled memoryless BFGS preconditioner placed into the Beale–Powell restart technique. This is the reason we call this a scaled memoryless BFGS preconditioned conjugate gradient algorithm. The scaling factor is computed in a spectral manner based on the inverse Rayleigh quotient, as suggested by Raydan [38]. The method can be considered an extension of the spectral conjugate gradient (SCG) of Birgin and Martínez [11], or of a variant of the conjugate gradient algorithm of Dai and Liao [14] with the modification suggested to overcome the lack of positive definiteness of the matrix defining their search direction.
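The core ingredient, a direction obtained from a scaled memoryless BFGS update, can be sketched as follows. The code applies the standard BFGS inverse-Hessian update to θI using only the latest pair s_k = x_{k+1} - x_k, y_k = g_{k+1} - g_k, with the spectral scaling θ = sᵀs/(yᵀs) mentioned in the text; all variable names are mine, and this is a sketch of the general technique, not the paper's exact preconditioner:

```python
# Sketch of a search direction d = -H*g, where H is the memoryless BFGS update
# of theta*I built from the latest pair s = x_{k+1}-x_k, y = g_{k+1}-g_k:
#   H = theta*I - theta*(s y^T + y s^T)/(y^T s)
#       + (1 + theta*(y^T y)/(y^T s)) * (s s^T)/(y^T s)
# The spectral scaling theta = s^T s / (y^T s) follows the inverse Rayleigh
# quotient idea; names are illustrative, not the paper's notation.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def memoryless_bfgs_direction(g, s, y):
    ys = dot(y, s)
    if ys <= 0.0:                 # curvature condition failed: restart
        return [-gi for gi in g]  # with steepest descent
    theta = dot(s, s) / ys        # spectral (inverse Rayleigh quotient) scaling
    sg, yg = dot(s, g), dot(y, g)
    coef_s = theta * yg / ys - (1.0 + theta * dot(y, y) / ys) * sg / ys
    coef_y = theta * sg / ys
    return [-theta * gi + coef_s * si + coef_y * yi
            for gi, si, yi in zip(g, s, y)]
```

When yᵀs > 0 the matrix H is symmetric positive definite, so the returned direction is guaranteed to be a descent direction; when s = y the update collapses to H = I and the direction reduces to steepest descent.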
In [28] Jorge Nocedal articulated a number of open problems in conjugate gradient algorithms. One of them focuses on the step length. Intensive numerical experiments with conjugate gradient algorithms show that the step length may differ from 1 by up to two orders of magnitude, being larger or smaller than 1 depending on how the problem is scaled. Moreover, the sizes of the step length tend to vary in a totally unpredictable way. This is in sharp contrast with the Newton and quasi-Newton methods, as well as with the limited memory quasi-Newton methods, which usually admit the unit step length for most of the iterations and require only very few function evaluations for step length determination. Therefore, in this paper we take advantage of this behavior of the step lengths in conjugate gradient algorithms and present an acceleration scheme, which modifies the step length in such a manner as to improve the reduction in function values.
The paper is organized as follows. In Section 2 we present the BFGS-preconditioned scaled conjugate gradient algorithm. The algorithm performs two types of steps: a standard one, in which a double quasi-Newton updating scheme is used, and a restart one, where the current information is used to define the search direction. The convergence of the algorithm for strongly convex functions is proved in Section 3. In Section 4 we present an acceleration scheme for the algorithm. The idea of this computational scheme is to take advantage of the fact that the step lengths in conjugate gradient algorithms are very different from 1. Therefore, we modify the step length in such a manner as to improve the reduction of the function values along the iterations. In Section 5 we present the ASCALCG algorithm and prove that for uniformly convex functions the rate of convergence of the accelerated algorithm is still linear, but the reduction in function values is significantly improved. Finally, in Section 6 we present computational results on a set of 750 unconstrained optimization problems from the CUTE [12] collection, along with some other large-scale unconstrained optimization problems presented in [1]. The Dolan–Moré [19] performance profiles of ASCALCG versus some known conjugate gradient algorithms, including Hestenes and Stiefel [25], Polak–Ribière–Polyak [32], [33], Dai and Yuan [17], hybrid Dai and Yuan [17], SCALCG by Andrei [3], [4], [5], [6], CONMIN by Shanno and Phua [42], [43], CG_DESCENT by Hager and Zhang [22], [23], the limited memory quasi-Newton method L-BFGS by Liu and Nocedal [26] and the truncated Newton method TN by Nash [27], show that ASCALCG is a top performer among these algorithms.
Scaled conjugate gradient method
The algorithm generates a sequence x_k of approximations to the minimum x* of f, in which x_{k+1} = x_k + α_k d_k and d_{k+1} = -θ_{k+1} g_{k+1} + β_k s_k, where α_k is selected by line search to approximately minimize f along the search direction d_k, β_k is a scalar parameter, and θ_{k+1} is a scalar parameter or a matrix to be determined. Here g_k = ∇f(x_k) and s_k = x_{k+1} - x_k. The iterative process is initialized with an initial point x_0 and d_0 = -g_0.
Observe that if θ_{k+1} = 1, then we get the classical conjugate gradient algorithms according to the value of the scalar parameter β_k.
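For reference, the classical choices of β_k mentioned here reduce to simple scalar formulas. A sketch using the standard definitions (the variable names g_old, g_new, d and y = g_new - g_old are mine):

```python
# Classical conjugate gradient beta formulas (standard textbook definitions).
# g_old, g_new are consecutive gradients, d is the current search direction,
# and y = g_new - g_old.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def betas(g_old, g_new, d):
    y = [a - b for a, b in zip(g_new, g_old)]
    return {
        "FR":  dot(g_new, g_new) / dot(g_old, g_old),   # Fletcher-Reeves
        "PRP": dot(g_new, y) / dot(g_old, g_old),       # Polak-Ribiere-Polyak
        "HS":  dot(g_new, y) / dot(d, y),               # Hestenes-Stiefel
        "DY":  dot(g_new, g_new) / dot(d, y),           # Dai-Yuan
    }
```

Note that all four coincide on quadratic functions with exact line search, but behave quite differently on general nonlinear problems, which is why hybrid schemes exist.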
Convergence analysis for strongly convex functions
Throughout this section we assume that f is strongly convex and its gradient is Lipschitz continuous on the level set S = {x ∈ R^n : f(x) ≤ f(x_0)}. That is, there exist constants μ > 0 and L such that (∇f(x) - ∇f(y))^T (x - y) ≥ μ ‖x - y‖² and ‖∇f(x) - ∇f(y)‖ ≤ L ‖x - y‖ for all x and y from S. For the convenience of the reader we include here the following lemma (see [22]). Lemma 3.1 Assume that d_k is a descent direction and ∇f satisfies the Lipschitz condition ‖∇f(x) - ∇f(x_k)‖ ≤ L ‖x - x_k‖ for every x on the line segment connecting x_k and x_{k+1}, where L is the Lipschitz constant.
Acceleration of the algorithm
It is common to see that in conjugate gradient algorithms the search directions tend to be poorly scaled, and as a consequence the line search must perform more function evaluations in order to obtain a suitable step length α_k. In order to improve the performance of conjugate gradient algorithms, efforts have been directed toward designing procedures for direction computation based on second-order information. For example, CONMIN [42] and SCALCG [3], [4], [5], [6] take this idea of BFGS
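The acceleration idea can be sketched as a one-dimensional secant correction of the step length: writing φ(η) = f(x_k + η α_k d_k), the quantities a = α_k g_kᵀd_k = φ'(0) and b = α_k (∇f(x_k + α_k d_k) - g_k)ᵀd_k = φ'(1) - φ'(0) determine the zero η = -a/b of the linear model of φ', and a positive b indicates positive curvature along d_k. This is a reconstruction in the spirit of the scheme described here, not the paper's exact formulas:

```python
# Sketch of a step-length acceleration: after the line search returns alpha,
# fit the directional derivative of f along d with a secant and rescale the
# step. Reconstructed in the spirit of the scheme in the text; the paper's
# exact formulas are not shown in this excerpt.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def accelerate(x, d, alpha, grad):
    # phi(eta) = f(x + eta*alpha*d); phi'(0) = alpha*g^T d, phi'(1) = alpha*gz^T d
    g = grad(x)
    z = [xi + alpha * di for xi, di in zip(x, d)]
    gz = grad(z)
    a = alpha * dot(g, d)                                  # phi'(0), negative for a descent d
    b = alpha * dot([p - q for p, q in zip(gz, g)], d)     # phi'(1) - phi'(0)
    if b > 0.0:                # positive curvature along d: secant step is safe
        eta = -a / b           # zero of the linear model of phi'
        return [xi + eta * alpha * di for xi, di in zip(x, d)]
    return z                   # otherwise keep the line-search point
```

On a one-dimensional quadratic this correction lands exactly on the minimizer regardless of the α returned by the line search, which illustrates why the scheme pays off when conjugate gradient step lengths are far from their ideal values.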
ASCALCG algorithm
Having in view the above developments and the definitions of the quantities involved, as well as the selection procedure for the scaling parameter, the following accelerated scaled conjugate gradient algorithm can be presented.
- Step 1.
Initialization. Select the initial point x_0 and the parameters 0 < ρ < σ < 1 of the Wolfe conditions. Compute f(x_0) and g_0 = ∇f(x_0). Set d_0 = -g_0 and α_0 = 1/‖g_0‖. Set k = 0.
- Step 2.
Line search. Compute α_k satisfying the Wolfe conditions (2.11), (2.12). Update the variables: x_{k+1} = x_k + α_k d_k. Compute f(x_{k+1}) and g_{k+1}.
- Step 3.
Test for convergence of the iterations.
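Equations (2.11)–(2.12) invoked in Step 2 are not reproduced in this excerpt; in their standard form, with ρ and σ the line-search parameters from Step 1, the Wolfe conditions read:

```latex
% Standard (weak) Wolfe conditions, with 0 < \rho < \sigma < 1:
f(x_k + \alpha_k d_k) \le f(x_k) + \rho\,\alpha_k\, g_k^{T} d_k, \qquad
\nabla f(x_k + \alpha_k d_k)^{T} d_k \ge \sigma\, g_k^{T} d_k.
```

The first inequality enforces sufficient decrease; the second rules out unacceptably short steps by bounding the directional derivative at the new point.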
Computational results and comparisons
In this section, we present the performance of a Fortran implementation of the ASCALCG – accelerated scaled conjugate gradient algorithm on a set of 750 unconstrained optimization test problems. At the same time, we compare the performance of ASCALCG with some conjugate gradient algorithms including SCALCG [3], [4], [5], [6], CONMIN [42], Hestenes–Stiefel (HS) [25], Polak–Ribière–Polyak (PRP) [32], [33], Dai–Yuan (DY) [17], Dai–Liao (DL) [14], conjugate gradient with sufficient descent CGSD [7]
Conclusion
We have presented a new conjugate gradient algorithm which is mainly an acceleration of SCALCG, the scaled BFGS preconditioned conjugate gradient algorithm [3], [4], [5], [6]. The acceleration scheme is simple and proved to be robust in numerical experiments. Under very mild conditions we proved that the algorithm is globally convergent. For uniformly convex functions the convergence of the accelerated algorithm is still linear, but the reduction in the function values is significantly improved. For
References (46)
- A scaled BFGS preconditioned conjugate gradient algorithm for unconstrained optimization, Applied Mathematics Letters (2007)
- A Dai–Yuan conjugate gradient algorithm with sufficient descent and conjugacy conditions for unconstrained optimization, Applied Mathematics Letters (2008)
- Acceleration of conjugate gradient algorithms for unconstrained optimization, Applied Mathematics and Computation (2009)
- The conjugate gradient method in extreme problems, USSR Computational Mathematics and Mathematical Physics (1969)
- An unconstrained optimization test functions collection, Advanced Modeling and Optimization. An Electronic International Journal (2008)
- An acceleration of gradient descent algorithm with backtracking for unconstrained optimization, Numerical Algorithms (2006)
- Scaled conjugate gradient algorithms for unconstrained optimization, Computational Optimization and Applications (2007)
- Scaled memoryless BFGS preconditioned conjugate gradient algorithm for unconstrained optimization, Optimization Methods and Software (2007)
- A scaled nonlinear conjugate gradient algorithm for unconstrained optimization, Optimization. A Journal of Mathematical Programming and Operations Research (2008)
- Numerical comparison of conjugate gradient algorithms for unconstrained optimization, Studies in Informatics and Control (2007)
- A spectral conjugate gradient method for unconstrained optimization, Applied Mathematics and Optimization
- CUTE: Constrained and unconstrained testing environments, ACM Transactions on Mathematical Software
- New properties of a nonlinear conjugate gradient method, Numerische Mathematik
- New conjugate conditions and related nonlinear conjugate gradient methods, Applied Mathematics and Optimization
- Convergence properties of the Beale–Powell restart algorithm, Science in China (Series A)
- Global convergence of the method of shortest residuals, Numerische Mathematik
- An efficient hybrid conjugate gradient method for unconstrained optimization, Annals of Operations Research
- The conjugate gradient method for linear and nonlinear operator equations, SIAM Journal on Numerical Analysis
- Benchmarking optimization software with performance profiles, Mathematical Programming
- Function minimization by conjugate gradients, Computer Journal
- Global convergence properties of conjugate gradient methods for optimization, SIAM Journal on Optimization
- A new conjugate gradient method with guaranteed descent and an efficient line search, SIAM Journal on Optimization
Dr. Neculai Andrei is a member of the Academy of Romanian Scientists, Splaiul Independenţiei Nr. 54, Sector 5, Bucharest, Romania.