Continuous Optimization
Accelerated scaled memoryless BFGS preconditioned conjugate gradient algorithm for unconstrained optimization

https://doi.org/10.1016/j.ejor.2009.11.030

Abstract

An accelerated scaled memoryless BFGS preconditioned conjugate gradient algorithm for solving unconstrained optimization problems is presented. The basic idea is to combine the scaled memoryless BFGS method and the preconditioning technique in the frame of the conjugate gradient method. The preconditioner, which is also a scaled memoryless BFGS matrix, is reset when the Beale–Powell restart criterion holds. The parameter scaling the gradient is selected as a spectral gradient. For the step length computation, the method exploits the fact that in conjugate gradient algorithms the step lengths may differ from 1 by up to two orders of magnitude and tend to vary unpredictably. Thus, we suggest an acceleration scheme able to improve the efficiency of the algorithm. Under common assumptions, the method is proved to be globally convergent. It is shown that for uniformly convex functions the convergence of the accelerated algorithm is still linear, but the reduction in the function values is significantly improved. Under mild conditions the algorithm is globally convergent for strongly convex functions. Computational results for a set of 750 unconstrained optimization test problems show that this new accelerated scaled conjugate gradient algorithm substantially outperforms known conjugate gradient methods: SCALCG [3], [4], [5], [6], CONMIN by Shanno and Phua (1976, 1978) [42], [43], Hestenes and Stiefel (1952) [25], Polak–Ribière–Polyak (1969) [32], [33], Dai and Yuan (2001) [17], Dai and Liao (2001) (t=1) [14], conjugate gradient with sufficient descent condition [7], hybrid Dai and Yuan (2001) [17], hybrid Dai and Yuan zero (2001) [17], CG_DESCENT by Hager and Zhang (2005, 2006) [22], [23], as well as the quasi-Newton LBFGS method [26] and the truncated Newton method by Nash (1985) [27].

Introduction

In this paper, we consider the unconstrained optimization problem
$$\min_{x \in \mathbb{R}^n} f(x), \tag{1.1}$$
where $f:\mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and its gradient is available. We are interested in elaborating an algorithm for solving large-scale cases, for which the Hessian of $f$ is either not available or requires a large amount of storage and computational cost. Plenty of conjugate gradient methods are known, and an excellent survey of these methods, with special attention to their global convergence, is given by Hager and Zhang [24]. Different conjugate gradient algorithms correspond to different choices of the scalar parameter $\beta_k$ [8], [16], [21], [36], [37]. Line search in conjugate gradient algorithms is often based on the standard Wolfe conditions. A numerical comparison of conjugate gradient algorithms with Wolfe line search, for different formulae for computing the parameter $\beta_k$, including the Dolan and Moré performance profile, is given in [8].
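For concreteness, a typical large-scale instance of (1.1) is the extended Rosenbrock function; the snippet below (an illustrative Python setup, not part of the paper's experiments) shows how such a problem and its analytic gradient can be coded:

```python
import numpy as np

def ext_rosenbrock(x):
    """Extended Rosenbrock function, a standard large-scale test problem."""
    xo, xe = x[0::2], x[1::2]                      # odd/even components (0-based)
    return np.sum(100.0 * (xe - xo**2)**2 + (1.0 - xo)**2)

def ext_rosenbrock_grad(x):
    """Analytic gradient of the extended Rosenbrock function."""
    xo, xe = x[0::2], x[1::2]
    g = np.zeros_like(x)
    g[0::2] = -400.0 * xo * (xe - xo**2) - 2.0 * (1.0 - xo)
    g[1::2] = 200.0 * (xe - xo**2)
    return g

# Standard starting point (-1.2, 1, -1.2, 1, ...) with n = 10000
x0 = np.empty(10000)
x0[0::2], x0[1::2] = -1.2, 1.0
print(ext_rosenbrock(x0), np.linalg.norm(ext_rosenbrock_grad(x0)))
```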

The paper presents a conjugate gradient algorithm based on a combination of the scaled memoryless BFGS method and the preconditioning technique [3], [4], [5], [6]. For general nonlinear functions a good preconditioner is any matrix that approximates $\nabla^2 f(x^*)^{-1}$, where $x^*$ is a local solution of (1.1). In this algorithm the preconditioner is a scaled memoryless BFGS matrix which is reset when the Powell restart criterion holds. The scaling factor in the preconditioner is selected as a spectral gradient [38].

The algorithm uses the conjugate gradient direction, where the parameter $\beta_k$ is obtained by equating the conjugate gradient direction with the direction corresponding to the Newton method. Thus, we get a general formula for the direction computation, which can be particularized to include the Polak–Ribière [32] and Polyak [33] and the Fletcher and Reeves [20] conjugate gradient algorithms, the spectral conjugate gradient (SCG) by Birgin and Martínez [11], or the algorithm of Dai and Liao [14] for t=1. This direction is then modified in a canonical manner, as considered earlier by Oren and Luenberger [29], Oren and Spedicato [30], Perry [31] and Shanno [39], [40], [41], by means of a scaled memoryless BFGS preconditioner embedded in the Beale–Powell restart technique. This is the reason we call this a scaled memoryless BFGS preconditioned conjugate gradient algorithm. The scaling factor is computed in a spectral manner based on the inverse Rayleigh quotient, as suggested by Raydan [38]. The method can be considered as an extension of the spectral conjugate gradient (SCG) of Birgin and Martínez [11] or of a variant of the conjugate gradient algorithm of Dai and Liao [14] (for t=1), suggested to overcome the lack of positive definiteness of the matrix defining their search direction.
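To indicate how such a formula arises (a derivation sketch only; the exact formula used in this paper is developed in Section 2), equate the direction $d_{k+1} = -\theta_{k+1} g_{k+1} + \beta_k s_k$ with the Newton direction $-\nabla^2 f(x_{k+1})^{-1} g_{k+1}$, multiply both sides by $s_k^T \nabla^2 f(x_{k+1})$, and use the secant approximation $\nabla^2 f(x_{k+1}) s_k \approx y_k$ with $y_k = g_{k+1} - g_k$:
$$\beta_k \,(y_k^T s_k) \approx (\theta_{k+1} y_k - s_k)^T g_{k+1}, \qquad\text{i.e.}\qquad \beta_k \approx \frac{(\theta_{k+1} y_k - s_k)^T g_{k+1}}{y_k^T s_k}.$$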

In [28] Jorge Nocedal articulated a number of open problems in conjugate gradient algorithms. One of them concerns the step length. Intensive numerical experiments with conjugate gradient algorithms have shown that the step length may differ from 1 by up to two orders of magnitude, being larger or smaller than 1 depending on how the problem is scaled. Moreover, the sizes of the step lengths tend to vary in a totally unpredictable way. This is in sharp contrast with the Newton and quasi-Newton methods, as well as with the limited memory quasi-Newton methods, which usually admit the unit step length for most of the iterations and require only very few function evaluations for step length determination. Therefore, in this paper we take advantage of this behavior of the step lengths in conjugate gradient algorithms and present an acceleration scheme which modifies the step length in such a manner as to improve the reduction in the function values.

The paper is organized as follows. In Section 2 we present the BFGS-preconditioned scaled conjugate gradient algorithm. The algorithm performs two types of steps: a standard one, in which a double quasi-Newton updating scheme is used, and a restart one, where the current information is used to define the search direction. The convergence of the algorithm for strongly convex functions is proved in Section 3. In Section 4 we present an acceleration scheme for the algorithm. The idea of this computational scheme is to take advantage of the fact that the step lengths $\alpha_k$ in conjugate gradient algorithms are often very different from 1. Therefore, we modify $\alpha_k$ in such a manner as to improve the reduction of the function values along the iterations. In Section 5 we present the ASCALCG algorithm and prove that for uniformly convex functions the rate of convergence of the accelerated algorithm is still linear, but the reduction in function values is significantly improved. Finally, in Section 6 we present computational results on a set of 750 unconstrained optimization problems from the CUTE [12] collection, along with some other large-scale unconstrained optimization problems presented in [1]. The Dolan–Moré [19] performance profiles of ASCALCG versus some known conjugate gradient algorithms, including Hestenes and Stiefel [25], Polak–Ribière–Polyak [32], [33], Dai and Yuan [17], hybrid Dai and Yuan [17], SCALCG by Andrei [3], [4], [5], [6], CONMIN by Shanno and Phua [42], [43], CG_DESCENT by Hager and Zhang [22], [23], the limited memory quasi-Newton method LBFGS by Liu and Nocedal [26] and the truncated Newton method TN by Nash [27], show that ASCALCG is a top performer among these algorithms.

Section snippets

Scaled conjugate gradient method

The algorithm generates a sequence $\{x_k\}$ of approximations to the minimum $x^*$ of $f$, in which
$$x_{k+1} = x_k + \alpha_k d_k, \qquad d_{k+1} = -\theta_{k+1} g_{k+1} + \beta_k s_k,$$
where $g_k = \nabla f(x_k)$, $\alpha_k$ is selected to minimize $f(x)$ along the search direction $d_k$, $\beta_k$ is a scalar parameter, $s_k = x_{k+1} - x_k$, and $\theta_{k+1}$ is a scalar parameter or a matrix to be determined. The iterative process is initialized with an initial point $x_0$ and $d_0 = -g_0$.

Observe that if $\theta_{k+1} = 1$, then we get the classical conjugate gradient algorithms according to the value of the scalar parameter $\beta_k$.
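For reference, with $y_k = g_{k+1} - g_k$, the classical choices of $\beta_k$ for the update $d_{k+1} = -g_{k+1} + \beta_k d_k$ include (standard formulas, quoted here for orientation rather than taken from this paper):
$$\beta_k^{HS} = \frac{g_{k+1}^T y_k}{y_k^T d_k}, \qquad \beta_k^{PRP} = \frac{g_{k+1}^T y_k}{\|g_k\|^2}, \qquad \beta_k^{FR} = \frac{\|g_{k+1}\|^2}{\|g_k\|^2}, \qquad \beta_k^{DY} = \frac{\|g_{k+1}\|^2}{y_k^T d_k}.$$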

Convergence analysis for strongly convex functions

Throughout this section we assume that $f$ is strongly convex and $\nabla f$ is Lipschitz continuous on the level set
$$S = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}.$$
That is, there exist constants $\mu > 0$ and $L$ such that
$$(\nabla f(x) - \nabla f(y))^T (x - y) \ge \mu \|x - y\|^2 \quad\text{and}\quad \|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|,$$
for all $x$ and $y$ from $S$. For the convenience of the reader we include here the following lemma (see [22]).

Lemma 3.1

Assume that $d_k$ is a descent direction and $\nabla f$ satisfies the Lipschitz condition
$$\|\nabla f(x) - \nabla f(x_k)\| \le L \|x - x_k\|,$$
for every $x$ on the line segment connecting $x_k$ and $x_{k+1}$, where $L$

Acceleration of the algorithm

It is common to see that in conjugate gradient algorithms the search directions tend to be poorly scaled, and as a consequence the line search must perform more function evaluations in order to obtain a suitable step length $\alpha_k$. In order to improve the performance of conjugate gradient algorithms, efforts have been directed at designing procedures for direction computation based on second-order information. For example, CONMIN [42] and SCALCG [3], [4], [5], [6] take this idea of BFGS
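The acceleration idea can be sketched as follows (a simplified illustration of the principle, assuming it amounts to a one-dimensional quadratic model along the accepted step; the paper's Section 4 gives the exact scheme): after the Wolfe line search returns $\alpha_k$, one extra gradient evaluation at $z = x_k + \alpha_k d_k$ is used to rescale the step.

```python
import numpy as np

def accelerated_step(x, d, alpha, g, grad, eps=1e-12):
    """Rescale the Wolfe step along d using a quadratic model (illustrative sketch).

    x, d   : current point and search direction
    alpha  : step length accepted by the Wolfe line search
    g      : gradient at x
    grad   : callable returning the gradient of f
    """
    z = x + alpha * d
    gz = grad(z)                       # one additional gradient evaluation
    ak = alpha * (g @ d)               # phi'(0) for phi(t) = f(x + t*alpha*d)
    bk = alpha * ((gz - g) @ d)        # finite-difference estimate of phi''(0)
    if bk > eps:                       # positive curvature: quadratic model is safe
        xi = -ak / bk                  # minimizer of the quadratic model in t
        return x + xi * alpha * d      # accelerated point
    return z                           # otherwise keep the ordinary CG point
```

When the computed factor is close to 1 the accelerated point coincides with the ordinary conjugate gradient point, so the modification matters precisely when the Wolfe step is badly scaled.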

ASCALCG algorithm

In view of the above developments and the definitions of $g_k$, $s_k$ and $y_k$, as well as the selection procedure for computing $\theta_{k+1}$, the following accelerated scaled conjugate gradient algorithm can be presented (a simplified code sketch follows the step list below).

  • Step 1.

    Initialization. Select $x_0 \in \mathbb{R}^n$ and the parameters $0 < \rho < \sigma < 1$. Compute $f(x_0)$ and $g_0 = \nabla f(x_0)$. Set $d_0 = -g_0$ and $\alpha_0 = 1/\|g_0\|$. Set $k = 0$.

  • Step 2.

    Line search. Compute $\alpha_k$ satisfying the Wolfe conditions (2.11) and (2.12). Update the variables: $x_{k+1} = x_k + \alpha_k d_k$. Compute $f(x_{k+1})$, $g_{k+1}$, and $s_k = x_{k+1} - x_k$, $y_k = g_{k+1} - g_k$.

  • Step 3.

    Test for
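The listed steps (together with the restart and direction-update steps not shown in this snippet) suggest the following simplified skeleton. It is a sketch only: the Wolfe line search is taken from SciPy, the acceleration reuses the `accelerated_step` sketch given for Section 4 above, the direction and spectral scaling are simplified stand-ins for the scaled memoryless BFGS preconditioned direction of the paper, and the Powell restart test and double quasi-Newton update are omitted.

```python
import numpy as np
from scipy.optimize import line_search   # Wolfe line search

def ascalcg_sketch(f, grad, x0, tol=1e-6, max_iter=10000):
    """Simplified skeleton of an accelerated scaled CG iteration (illustrative only)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                      # Step 1: d0 = -g0
    for k in range(max_iter):
        if np.linalg.norm(g, np.inf) <= tol:    # assumed stopping test (Step 3)
            break
        # Step 2: Wolfe line search, then the acceleration sketched in Section 4
        alpha = line_search(f, grad, x, d, gfk=g)[0]
        if alpha is None:                       # line search failure: restart
            d, alpha = -g, 1.0 / np.linalg.norm(g)
        x_new = accelerated_step(x, d, alpha, g, grad)
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        if s @ y <= 0:                          # safeguard: steepest descent restart
            d = -g_new
        else:
            # Spectral scaling (inverse Rayleigh quotient) and a scaled CG
            # direction; simplified stand-ins for the preconditioned direction
            # and restart logic of the full algorithm
            theta = (s @ s) / (s @ y)
            beta = ((theta * y - s) @ g_new) / (y @ s)
            d = -theta * g_new + beta * s
        x, g = x_new, g_new
    return x
```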

Computational results and comparisons

In this section we present the performance of a Fortran implementation of ASCALCG, the accelerated scaled conjugate gradient algorithm, on a set of 750 unconstrained optimization test problems. At the same time, we compare the performance of ASCALCG with that of some conjugate gradient algorithms, including SCALCG [3], [4], [5], [6], CONMIN [42], Hestenes–Stiefel (HS) [25], Polak–Ribière–Polyak (PRP) [32], [33], Dai–Yuan (DY) [17], Dai–Liao (DL) [14], the conjugate gradient algorithm with sufficient descent CGSD [7]
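The comparisons are reported as Dolan–Moré performance profiles [19]. As a reminder of how such profiles are built (a minimal sketch, independent of the paper's actual data), given a table of per-problem metrics such as CPU time or iteration counts, with `np.inf` marking failures:

```python
import numpy as np

def performance_profile(T):
    """Dolan-More performance profile.

    T : (n_problems, n_solvers) array of positive metrics (np.inf = failure),
        assuming at least one solver succeeds on each problem.
    Returns the grid of factors tau and, for each solver s, the fraction
    rho_s(tau) of problems solved within a factor tau of the best solver.
    """
    best = T.min(axis=1, keepdims=True)            # best metric on each problem
    ratios = T / best                              # performance ratios r_{p,s}
    tau = np.unique(ratios[np.isfinite(ratios)])   # evaluation points
    rho = np.array([[np.mean(ratios[:, s] <= t) for t in tau]
                    for s in range(T.shape[1])])
    return tau, rho
```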

Conclusion

We have presented a new conjugate gradient algorithm which is mainly an acceleration of SCALCG, the scaled BFGS preconditioned conjugate gradient algorithm [3], [4], [5], [6]. The acceleration scheme is simple and proved to be robust in numerical experiments. Under very mild conditions we proved that the algorithm is globally convergent. For uniformly convex functions the convergence of the accelerated algorithm is still linear, but the reduction in the function values is significantly improved. For

References (46)

  • N. Andrei, Accelerated hybrid conjugate gradient algorithm with modified secant condition for unconstrained...
  • E. Birgin et al., A spectral conjugate gradient method for unconstrained optimization, Applied Mathematics and Optimization (2001).
  • I. Bongartz et al., CUTE: Constrained and unconstrained testing environments, ACM Transactions on Mathematical Software (1995).
  • Y.H. Dai, New properties of a nonlinear conjugate gradient method, Numerische Mathematik (2001).
  • Y.H. Dai et al., New conjugate conditions and related nonlinear conjugate gradient methods, Applied Mathematics and Optimization (2001).
  • Y.H. Dai et al., Convergence properties of the Beale–Powell restart algorithm, Science in China (Series A) (1998).
  • Y.H. Dai et al., Global convergence of the method of shortest residuals, Numerische Mathematik (1999).
  • Y.H. Dai et al., An efficient hybrid conjugate gradient method for unconstrained optimization, Annals of Operations Research (2001).
  • J.W. Daniel, The conjugate gradient method for linear and nonlinear operator equations, SIAM Journal on Numerical Analysis (1967).
  • E.D. Dolan et al., Benchmarking optimization software with performance profiles, Mathematical Programming (2002).
  • R. Fletcher et al., Function minimization by conjugate gradients, Computer Journal (1964).
  • J.Ch. Gilbert et al., Global convergence properties of conjugate gradient methods for optimization, SIAM Journal on Optimization (1992).
  • W.W. Hager et al., A new conjugate gradient method with guaranteed descent and an efficient line search, SIAM Journal on Optimization (2005).
Dr. Neculai Andrei is a member of the Academy of Romanian Scientists, Splaiul Independenţiei Nr. 54, Sector 5, Bucharest, Romania.