A global convergence theory for general trust-region-based algorithms for equality constrained optimization

This work presents a global convergence theory for a broad class of trust-region algorithms for the smooth nonlinear programming problem with equality constraints. The main result generalizes Powell's 1975 result for unconstrained trust-region algorithms. 
The trial step is characterized by very mild conditions on its normal and tangential components. The normal component need not be computed accurately. The theory requires a quasi-normal component to satisfy a fraction of Cauchy decrease condition on the quadratic model of the linearized constraints. The tangential component then must satisfy a fraction of Cauchy decrease condition on a quadratic model of the Lagrangian function in the translated tangent space of the constraints determined by the quasi-normal component. Estimates of the Lagrange multipliers and the Hessians are assumed only to be bounded. 
The other main characteristic of this class of algorithms is that the step is evaluated by using the augmented Lagrangian as a merit function with the penalty parameter updated using the El-Alem scheme. The properties of the step and the way that the penalty parameter is chosen are sufficient to establish global convergence. 
As an example, an algorithm is presented that can be viewed as a generalization of the Steihaug--Toint dogleg algorithm for the unconstrained case. It is based on a quadratic programming algorithm that uses a step in a quasi-normal direction to the tangent space of the constraints and then takes feasible conjugate reduced-gradient steps to solve the reduced quadratic program. This algorithm should cope quite well with large problems for which effective preconditioners are known.

1. Introduction. This work is concerned with the development of a global convergence theory for a broad class of algorithms for the equality constrained minimization problem
$$\text{(EQC)} \qquad \text{minimize } f(x) \quad \text{subject to } C(x) = 0.$$
The functions $f : \mathbb{R}^n \to \mathbb{R}$ and $C : \mathbb{R}^n \to \mathbb{R}^m$ are at least twice continuously differentiable, where $C(x) = (c_1(x), \ldots, c_m(x))^T$ and $m < n$.
Our purpose is to generalize to constrained problems a powerful theorem given in 1975 by Powell for unconstrained problems.
The global convergence theory that we establish in this work holds for a class of nonlinear programming algorithms for (EQC) characterized by the features summarized in the abstract: the quasi-normal and tangential components of the trial step need only satisfy fraction of Cauchy decrease conditions, the multiplier and Hessian estimates need only be bounded, and the step is evaluated with the augmented Lagrangian merit function under the El-Alem penalty parameter update. Lemma 2.1 states the fraction of Cauchy decrease property for the unconstrained quadratic model $m_c$: the Cauchy step $s_c$ satisfies
$$m_c(0) - m_c(s_c) \ge \frac{1}{2}\,\|\nabla f_c\|\,\min\Big\{\frac{\|\nabla f_c\|}{\|G_c\|},\ \delta_c\Big\}. \qquad (2.2)$$
Proof. See Powell [22]. We end this section by stating Powell's powerful theorem for unconstrained trust-region algorithms. The proof can be found in Powell [22]. More details about the convergence theory for trust-region algorithms for unconstrained optimization can be found in Fletcher [14], Moré [18], Moré and Sorensen [19], and Sorensen [26].
Theorem 2.2. Let $f : \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable and bounded below on the level set $\{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$. Assume that the sequence $\{G_k\}$ is uniformly bounded. If $\{x_k\}$ is the sequence generated by any trust-region algorithm that satisfies (2.1) or (2.2), then
$$\liminf_{k \to \infty} \|\nabla f(x_k)\| = 0.$$
Notice that this theorem does not prove convergence to a solution of the unconstrained problem; rather, it proves a "weak" first order convergence. However, we do not see that as the point of this theorem, nor is it surprising given the weak assumptions on the sequence of local models. In other words, this theorem is not about convergence conditions on a quasi-Newton method. Such a theorem would be expected to be based on analyzing some way of estimating the Hessian, and we all know how important the method for estimating the Hessian is to the practical performance of a trust-region algorithm. In the unconstrained case, the version of Powell's theorem that says that the sequence of gradients converges to zero requires the additional hypothesis that the gradient is uniformly continuous. The algorithms here would probably require a uniformly continuous reduced gradient, a strengthening of the assumptions used here. The related algorithms mentioned earlier also prove weak first order stationary convergence, as do we.
The point of this line of research is an analysis of the local quadratic-model/trust-region paradigm for unconstrained optimization. In that context, this theorem says that the power of using a trust-region globalization is that if the first order information is correct, then little is required of the second order information. Specifically, the sequence of model Hessians need only be bounded.
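To make the fraction of Cauchy decrease property concrete, the following sketch (our own notation and made-up data, not taken from the paper) computes the Cauchy point of a quadratic model inside a trust region and checks a Powell-type lower bound of the form of (2.2):

```python
# Illustrative sketch: the Cauchy point for the quadratic model
# m(s) = f + g^T s + 0.5 s^T G s inside ||s|| <= delta, and a check of a
# fraction-of-Cauchy-decrease bound m(0) - m(s) >= 0.5*||g||*min(||g||/||G||, delta).

def norm(v):
    return sum(x * x for x in v) ** 0.5

def matvec(G, v):
    return [sum(G[i][j] * v[j] for j in range(len(v))) for i in range(len(G))]

def cauchy_point(g, G, delta):
    """Minimize the model along -g, truncated at the trust-region boundary."""
    gTGg = sum(gi * wi for gi, wi in zip(g, matvec(G, g)))
    gTg = sum(gi * gi for gi in g)
    if gTGg > 0:
        t = min(gTg / gTGg, delta / norm(g))   # interior minimizer or boundary
    else:
        t = delta / norm(g)                    # nonconvex along -g: go to the boundary
    return [-t * gi for gi in g]

def model_decrease(g, G, s):
    """m(0) - m(s) for the quadratic model."""
    return -(sum(gi * si for gi, si in zip(g, s))
             + 0.5 * sum(si * wi for si, wi in zip(s, matvec(G, s))))

g = [1.0, -2.0]
G = [[2.0, 0.0], [0.0, 1.0]]
delta = 0.5
s_cp = cauchy_point(g, G, delta)
dec = model_decrease(g, G, s_cp)
# The Frobenius norm overestimates ||G||, so this is a (weaker) valid bound.
G_norm = sum(G[i][j] ** 2 for i in range(2) for j in range(2)) ** 0.5
bound = 0.5 * norm(g) * min(norm(g) / G_norm, delta)
assert dec >= bound - 1e-12
```

Here the gradient is large relative to the radius, so the Cauchy point lands on the trust-region boundary and the bound holds with room to spare.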
Our theory is analogous for problem (EQC). In this case, the local model of the problem is generally taken to be a linear model of the constraints and a quadratic model of the Lagrangian. The information in the local model depends on the Lagrange multiplier estimates as well as second order information. In this paper, we identify a way to extend the unconstrained paradigm to problem (EQC) for which the only requirement is boundedness of the sequences of model Lagrange multipliers and Hessians.
The above discussion summarizes the point of this paper, which is not to give a convergence proof for a specific SQP approach using a specific Lagrange multiplier estimation technique and perhaps an exact merit function.

3. The SQP algorithm. The Lagrangian function $\ell : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ associated with problem (EQC) is the function $\ell(x, \lambda) = f(x) + \lambda^T C(x)$, where $\lambda = (\lambda_1, \ldots, \lambda_m)^T$ is a Lagrange multiplier vector estimate.
A common algorithm for solving problem (EQC) is the successive quadratic programming (SQP) algorithm. It is an iterative procedure. At each iteration, a step $s^{QP}$ and an associated Lagrange multiplier $\lambda^{QP}$ are obtained by solving the following quadratic program:
$$\text{(QP)} \qquad \text{minimize } q_c(s) = \tfrac{1}{2} s^T H_c s + \nabla_x \ell_c^T s + \ell_c \quad \text{subject to } \nabla C_c^T s + C_c = 0,$$
where the matrix $H_c$ is the Hessian of the Lagrangian at $(x_c, \lambda_c)$ or an approximation to it.
Unfortunately, the SQP algorithm cannot be guaranteed to work without modification. There is a fundamental difficulty in the definition of the SQP step because the second-order sufficiency condition need not hold at each iteration. By this we mean that the matrix $H_c$ need not be positive definite on the null space of $\nabla C_c^T$; hence the QP subproblem may not have a solution, or may not have a unique solution. This difficulty will not arise near a solution of problem (EQC) if the standard assumptions for Newton's method hold at the solution. For this reason, the SQP algorithm usually performs very well locally. See Tapia [28] for more details.
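For a concrete picture of one SQP subproblem, the following minimal sketch (our own toy data, not the paper's algorithm) solves an equality-constrained QP through its KKT system, assuming the model Hessian is positive definite on the null space of the constraint gradients:

```python
# Sketch: solve  minimize 0.5 s^T H s + g^T s  subject to  a^T s + c = 0
# via the KKT system  [H  a; a^T  0] [s; mu] = [-g; -c]  (n = 2, m = 1).

def solve(M, b):
    """Dense Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def sqp_step(H, g, a, c):
    """One linearized constraint a^T s + c = 0; returns the step and multiplier."""
    K = [[H[0][0], H[0][1], a[0]],
         [H[1][0], H[1][1], a[1]],
         [a[0],    a[1],    0.0]]
    s0, s1, mu = solve(K, [-g[0], -g[1], -c])
    return (s0, s1), mu

H = [[2.0, 0.0], [0.0, 2.0]]     # model Hessian (positive definite here)
g = [1.0, 1.0]                   # gradient of the Lagrangian model
a = [1.0, -1.0]                  # constraint gradient
c = 0.5                          # linearized constraint value
s, mu = sqp_step(H, g, a, c)
# The step must satisfy the linearized constraint exactly.
assert abs(a[0] * s[0] + a[1] * s[1] + c) < 1e-12
```

When $H_c$ loses positive definiteness on the null space, this linear system can become singular or yield a saddle point, which is precisely the difficulty discussed above.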
An effective modification that deals with the lack of positive definiteness on the null space is to use a trust-region globalization strategy. This takes us to the following section.

4. Existing trust-region algorithms for (EQC). A straightforward way to extend the trust-region idea to problem (EQC) is to add a trust-region constraint to the QP subproblem; however, for a small radius the linearized constraints and the trust-region constraint may be inconsistent. One approach restricts a parameter to a bounded strict subinterval like $(0.7, 0.9)$. This subinterval corresponds to stopping criteria for a trust-region algorithm used to solve for $s^n_c$. See Moré [18], Moré and Sorensen [19], or Dennis and Schnabel [7].

4.2. The full-space approach. The other approach to overcoming the problem of inconsistency is the full-space approach. Algorithms based on this approach compute $s_c$ at once in the whole $\mathbb{R}^n$ space instead of considering a decomposition of the trial step. This has the advantage of avoiding the computation of a Moore-Penrose pseudoinverse solution.
The first example we know of this category of trust-region subproblems is the CDT subproblem proposed by Celis, Dennis, and Tapia [4]. Instead of imposing the linearized constraints $\nabla C_c^T s + C_c = 0$, they replace them by a particular inequality: $\|\nabla C_c^T s + C_c\| \le \theta_c$, where $\theta_c \in \mathbb{R}$. The CDT subproblem can be written as follows:
$$\text{minimize } q_c(s) \quad \text{subject to } \|\nabla C_c^T s + C_c\| \le \theta_c, \quad \|s\| \le \delta_c.$$
The key to the CDT subproblem (and its variants) is the choice of $\theta_c$. For more details, see Williamson [33]. Celis, Dennis, and Tapia [4] choose $\theta_c$ from the decrease attained by the Cauchy step on the linearized constraints. Note that in this case the CDT subproblem minimizes the quadratic model of $\ell$ over the set of steps inside the trust region that give at least $r_1$ times as much decrease in the $\ell_2$-norm of the residual of the linearized constraints as does the Cauchy step.
In order to prevent the feasible set of the subproblem from reducing to a single point, and thus to obtain a meaningful trust-region subproblem, it is suggested that $r < 1$, for instance $r = 0.8$.

5. A general trust-region algorithm. In this section we describe a very inclusive class of trust-region algorithms.
The typical form of trust-region algorithms for solving (EQC) is basically as follows. At the current point $x_c$ with associated multiplier estimate $\lambda_c$, a step $s_c$ is computed by solving some trust-region subproblems, and a Lagrange multiplier estimate $\lambda_+$ is obtained by using some scheme. The point $x_+ = x_c + s_c$ is tested using some merit function to decide whether it is a better approximation to a solution $x_*$. Such merit functions often involve a penalty parameter, which is updated using some scheme. The trust-region radius is then adjusted and a new quadratic model is formed.
In our requirements on the trust-region algorithm, the way of computing the trial steps is replaced by some conditions the steps must satisfy, and the estimates of the Lagrange multiplier vectors and the Hessian matrices need only be uniformly bounded. This allows the inclusion of a wide variety of trust-region algorithms, and it is exactly in the spirit of Powell's Theorem 2.2 for unconstrained trust-region methods. In Section 9, we will present an example algorithm that satisfies these mild conditions.

5.1. Computing the trial steps. We first write the trial step as $s_c = s^t_c + s^n_c$, where $s^t_c$ and $s^n_c$ are respectively the tangential and the quasi-normal components. We do not require that $s^n_c$ be normal to the tangent space.
We will require that the components $s^n_c$ and $s^t_c$ satisfy a fraction of Cauchy decrease condition on appropriate model functions. At the current iterate, if $C_c \ne 0$, then we will require that the quasi-normal component give at least as much decrease as $s^{cp}_c = -t^{ncp}_c\,\nabla C_c C_c$ on the quadratic model of the linearized constraints in a trust region of radius $r\delta_c$, where the step length $t^{ncp}_c$ is given by
$$t^{ncp}_c = \min\Big\{ \frac{\|\nabla C_c C_c\|^2}{\|\nabla C_c^T \nabla C_c C_c\|^2},\ \frac{\hat\delta_c}{\|\nabla C_c C_c\|} \Big\},$$
where $\hat\delta_c = r\delta_c$ and $0 < r < 1$. In words, the step $s^n_c$ is chosen from the set of steps that satisfy a fraction of Cauchy decrease condition on the quadratic model of the linearized constraints inside $\|s\| \le \hat\delta_c$. Equivalently, $s^n_c$ lies in the set
$$S_c = \{s : \|s\| \le \hat\delta_c\} \cap \{s : \|\nabla C_c^T s + C_c\|^2 \le (\theta^{fcd}_c)^2\},$$
where $(\theta^{fcd}_c)^2$ is given by (4.1). Because the quasi-normal component $s^n_c$ is not required to be normal to the tangent space, a condition on the step is needed to ensure global convergence. In particular, the following condition is required:
$$\|s^n_c\| \le K_1 \|C_c\|, \qquad (5.1)$$
where $K_1$ is some positive constant independent of the iteration. If $s^n_c$ is normal to the tangent space, this condition holds (see Lemma 7.1) as long as $K_1$ is greater than a uniform bound on the norm of the right inverse of $\nabla C(x)^T$.
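The quasi-normal Cauchy step described above can be sketched as follows for a single constraint (the data are our own; the step length is the standard Cauchy recipe for the model of the linearized constraints, truncated at $r\delta_c$):

```python
# Sketch: quasi-normal Cauchy step for one constraint c(x) = 0. We minimize the
# model ||grad_c^T s + c||^2 along d = -c * grad_c, truncated at delta_hat = r*delta.

def norm(v):
    return sum(x * x for x in v) ** 0.5

def quasi_normal_cauchy(grad_c, c, delta, r=0.8):
    d = [-c * gi for gi in grad_c]                      # Cauchy direction
    slope = sum(gi * di for gi, di in zip(grad_c, d))   # grad_c^T d
    t_star = -(c * slope) / slope ** 2                  # 1-D minimizer of (c + t*slope)^2
    delta_hat = r * delta
    t = min(t_star, delta_hat / norm(d))                # respect ||s|| <= r * delta
    return [t * di for di in d]

grad_c, c, delta = [1.0, 2.0], 0.5, 1.0
s_n = quasi_normal_cauchy(grad_c, c, delta)
residual = c + sum(gi * si for gi, si in zip(grad_c, s_n))
# Truncation is inactive here, so the linearized residual is driven to zero.
assert abs(residual) < 1e-12
# Condition (5.1) holds naturally with K1 = 1/||grad_c|| for this normal step.
assert norm(s_n) <= c / norm(grad_c) + 1e-12
```

The last assertion illustrates the remark following (5.1): for a truly normal component, $\|s^n_c\| \le K_1\|C_c\|$ holds automatically.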
When $s^n_c$ is not normal to the tangent space, we do not suggest choosing $K_1$ and enforcing (5.1). Rather, we suggest (as in Section 9) that (5.1) be enforced naturally by any reasonable algorithm for computing a linearly feasible point. We will deal with the quasi-normal components of the trial steps assuming that they satisfy (5.1). We are indebted to Robert Michael Lewis for informing us of the effectiveness of this feature in the algorithm which he has implemented to solve a PDE inverse problem [6]. Specifically, this allows special linear algebra developed for simulation constraints to be used in place of prohibitively large least-squares solutions. Now we use the quasi-normal component to pick a linear manifold $M_c$, parallel to the null space of the constraints, in which we will select the tangential component.
Let $M_c = \{s : \nabla C_c^T s = \nabla C_c^T s^n_c\}$. Thus, $M_c \cap \{s = s^t + s^n_c : \|s\| \le \delta_c\} \ne \emptyset$.
Observe that, in the set $S_c$, we are taking a fraction $r$ of $\delta_c$ in order to forestall the case that $M_c$ lies too close to the boundary of the trust region of radius $\delta_c$.
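Within the manifold $M_c$, a tangential step can be sketched as a reduced-gradient (projected steepest-descent) Cauchy step; the following toy example uses our own data and null-space basis $W$, not anything from the paper:

```python
# Sketch: from the quasi-normal point s_n, move along -W W^T grad q(s_n),
# where the columns of W span the null space of grad_c^T (here n=2, m=1).

def q(s, H, g):
    return 0.5 * sum(s[i] * sum(H[i][j] * s[j] for j in range(2)) for i in range(2)) \
           + sum(g[i] * s[i] for i in range(2))

def tangential_cauchy(H, g, s_n, W, delta):
    grad_q = [sum(H[i][j] * s_n[j] for j in range(2)) + g[i] for i in range(2)]
    red = sum(W[i] * grad_q[i] for i in range(2))      # reduced gradient W^T grad q
    d = [-W[i] * red for i in range(2)]                # tangential descent direction
    dHd = sum(d[i] * sum(H[i][j] * d[j] for j in range(2)) for i in range(2))
    slope = sum(grad_q[i] * d[i] for i in range(2))
    t = -slope / dHd if dHd > 0 else 1.0               # 1-D minimizer (convex case)
    s = [s_n[i] + t * d[i] for i in range(2)]
    while sum(x * x for x in s) ** 0.5 > delta:        # crude truncation: halve t
        t *= 0.5
        s = [s_n[i] + t * d[i] for i in range(2)]
    return s

H, g = [[2.0, 0.0], [0.0, 2.0]], [1.0, 0.0]
grad_c = [1.0, 2.0]
W = [2.0, -1.0]                   # basis for the null space of grad_c^T
s_n = [-0.1, -0.2]
s = tangential_cauchy(H, g, s_n, W, delta=1.0)
# The tangential move stays in M_c: the linearized constraint value is unchanged...
assert abs(sum(grad_c[i] * (s[i] - s_n[i]) for i in range(2))) < 1e-12
# ...and the quadratic model of the Lagrangian decreases.
assert q(s, H, g) < q(s_n, H, g)
```

The two assertions mirror the two requirements on the tangential component: it lives in $M_c$ and it produces a decrease in $q_c$ from $s^n_c$.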
On the manifold $M_c$, we consider a quadratic model $q_c(s)$ of the Lagrangian function associated with problem (EQC). Then, when $W_c^T \nabla q_c(s^n_c) \ne 0$, we ask the tangential component to satisfy a fraction of Cauchy decrease condition from $s^n_c$ on $q_c(s)$ restricted to $M_c$. That is, $s_c = s^t_c + s^n_c \in G_c \cap M_c$, where
$$G_c = \{ s = s^t + s^n_c : \|s\| \le \delta_c,\ \ q_c(s) - q_c(s^n_c) \le r\,[\,q_c(s^n_c - t^{cp}_c W_c W_c^T \nabla q_c(s^n_c)) - q_c(s^n_c)\,] \}.$$

5.2. Updating the model Lagrange multiplier and the model Hessian.
The method for estimating the multiplier $\lambda_c$ is left unspecified. We only require that the sequence of estimates $\{\lambda_k\}$ be bounded. Any approximation to the Lagrange multiplier vector that produces a bounded sequence can be used. For example, setting $\lambda_k$ to a fixed vector (or even the zero vector) for all $k$ is valid. Similarly, we require only boundedness of the sequence $\{H_k\}$ of approximate Hessians. Thus $H_k = 0$ for all $k$ is allowed. Note that, here, we are not addressing the question of the choice of the Lagrange multiplier and Hessian estimates that produce an efficient algorithm. We are addressing some weak assumptions on those estimates $\{\lambda_k\}$ and $\{H_k\}$ that produce a globally convergent algorithm. For example, our theory applies to a form of successive linear programming.
5.3. The choice of the merit function. Let $x_c$ be the current iterate. We need to decide if a trial step chosen to satisfy $s^n_c \in S_c$ and $s_c = s^n_c + s^t_c \in G_c \cap M_c$ is a good step, that is, if the step $s_c$ gives a new iterate $x_+$ that is a better approximation than $x_c$ to a solution, say $x_*$, of (EQC). In constrained optimization, the meaning of better approximation should consider improvement not only in $f$ but also in the constraint violation $\|C\|_2$. The evaluation of the trial step requires the choice of a merit function, which usually involves the objective function and the constraint violations.
Here, we use the augmented Lagrangian as a merit function:
$$L(x, \lambda; \rho) = f(x) + \lambda^T C(x) + \rho\, C(x)^T C(x), \qquad \rho > 0. \qquad (5.4)$$
This function has also been used as a merit function in trust-region algorithms by Celis, Dennis, and Tapia [4], El-Alem [9], [10], and Powell and Yuan [23].
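A minimal sketch of (5.4), with a toy problem of our own choosing (one constraint, so $\lambda$ and $C$ are scalars):

```python
# The augmented Lagrangian merit function (5.4) for a single constraint:
# L(x, lam; rho) = f(x) + lam*c(x) + rho*c(x)**2.

def f(x):
    return x[0] ** 2 + x[1] ** 2

def c(x):
    return x[0] + x[1] - 1.0

def merit(x, lam, rho):
    return f(x) + lam * c(x) + rho * c(x) ** 2

x_infeasible = [0.0, 0.0]      # c = -1
x_solution = [0.5, 0.5]        # minimizer of f on c(x) = 0
# Increasing rho penalizes constraint violation more heavily.
assert merit(x_infeasible, 0.0, 10.0) > merit(x_infeasible, 0.0, 1.0)
# At a feasible point the multiplier and penalty terms vanish, so L reduces to f.
assert merit(x_solution, -1.0, 5.0) == f(x_solution)
```

With $\lambda = 0$ this is exactly the $\ell_2$ penalty function mentioned below, which the theory also allows.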
El-Alem [10] and Powell and Yuan [23] used the formula $\lambda(x) = -(\nabla C(x)^T \nabla C(x))^{-1} \nabla C(x)^T \nabla f(x)$ for updating the Lagrange multiplier. For this particular choice of the multiplier, $\lambda$ is a function of $x$ and (5.4) is an exact penalty function. This means that if $\rho$ is sufficiently large, then the solution to problem (EQC) will be an unconstrained minimizer of the penalty function. See Fletcher [12], [13].
Celis, Dennis, and Tapia [4] and El-Alem [9], on the other hand, with a particular choice of the multiplier, have treated the multiplier as an independent parameter that enters only in the merit function for accepting the step and updating the other parameters of the algorithm. In other words, one never explicitly uses the merit function in computing the optimization step; it is used only for evaluating the steps. The effect of the multiplier estimates on the trial step computation is in the tangential component, through the estimate of the Hessian of the Lagrangian. This is a major difference between merit function roles in trust-region algorithms and in line-search algorithms.
In the context of a line-search globalization strategy, Gill, Murray, Saunders, and Wright [15] and Schittkowski [24] have considered the augmented Lagrangian as a merit function, but also as an objective function for choosing the step along the direction of search. They have treated the multiplier as an independent variable and proved global convergence for their algorithms.
In summary, we believe that having an exact penalty function as a merit function is, of course, a desirable property, especially in line-search algorithms. On the other hand, in practice one never really knows that the penalty constant has been chosen so that the exactness property holds. In [8], [9], global convergence for a particular trust-region method is shown with no assumption of exactness.
In this work, the choice of the multiplier estimate is left open, and $\lambda = 0$ is allowed, in which case one is using the $\ell_2$ penalty function as a merit function.
5.4. Evaluating the trial step. Let $s_c$ be a trial step chosen to satisfy the conditions of Section 5.1. We will accept it if sufficient improvement is produced in the merit function. To measure this improvement we compare the actual reduction and the predicted reduction in the merit function from the current iterate $x_c$ to the new one $x_+ = x_c + s_c$. The actual reduction is defined by
$$Ared_c(s_c; \rho_c) = \ell(x_c, \lambda_c) - \ell(x_+, \lambda_+) + \rho_c\,(\|C_c\|^2 - \|C_+\|^2).$$
We will accept the step and set $x_+ = x_c + s_c$ if $Ared_c/Pred_c \ge \eta_1$, where $\eta_1 \in (0, 1)$ is a fixed constant. A typical value for $\eta_1$ might be $10^{-4}$.
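The following sketch evaluates a trial step on our toy problem with the multiplier held fixed (the theory allows any bounded multiplier estimate, including a constant one, so the $\Delta\lambda$ term in the predicted reduction vanishes):

```python
# Sketch of step evaluation: accept when Ared/Pred >= eta1. Toy data, lam fixed.

def f(x):      return x[0] ** 2 + x[1] ** 2
def gradf(x):  return [2.0 * x[0], 2.0 * x[1]]
def c(x):      return x[0] + x[1] - 1.0
grad_c = [1.0, 1.0]

def merit(x, lam, rho):
    return f(x) + lam * c(x) + rho * c(x) ** 2

def ared(x, s, lam, rho):
    xp = [x[i] + s[i] for i in range(2)]
    return merit(x, lam, rho) - merit(xp, lam, rho)

def pred(x, s, lam, rho):
    # Quadratic model of the Lagrangian (exact Hessian 2*I here, fixed lam)
    # plus the penalty term on the linearized constraint.
    qs = sum(s[i] * s[i] for i in range(2)) \
         + sum((gradf(x)[i] + lam * grad_c[i]) * s[i] for i in range(2))
    lin = c(x) + sum(grad_c[i] * s[i] for i in range(2))
    return -qs + rho * (c(x) ** 2 - lin ** 2)

eta1 = 1e-4
x, lam, rho = [0.0, 0.0], 0.0, 1.0
s = [0.4, 0.4]                      # a trial step toward feasibility
ratio = ared(x, s, lam, rho) / pred(x, s, lam, rho)
accept = ratio >= eta1
assert accept
```

Because $f$ is quadratic and $c$ is linear, the model is exact and the ratio is essentially 1; on a genuinely nonlinear problem the ratio measures how trustworthy the model was over the step.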
5.5. Updating the trust-region radius. The strategy that we follow for updating the trust-region radius is based on the standard rules for the unconstrained case. More details can be found in Dennis and Schnabel [7] or Fletcher [14]. However, for our global convergence theory, we use a modification, due to Zhang, Kim, and Lasdon [34] (see also El Hallabi and Tapia [11]), of the strategy for updating the trust-region radius. The reader will see that this modification is of no importance in practice; it is merely an analytic formality. At the beginning we set constants $\delta_{max} \ge \delta_{min} > 0$, and each time we find an acceptable step, we start the next iteration with a value $\delta_+ \ge \delta_{min}$. In short, $\delta_c$ can be reduced below $\delta_{min}$ while seeking an acceptable step, but $\delta_+ \ge \delta_{min}$ must hold at the beginning of the next iteration after finding an acceptable step. The following is the scheme for evaluating the step and updating the trust-region radius.

Algorithm 5.1. Evaluating the step and updating the trust-region radius.
If $Ared_c/Pred_c < \eta_1$ then
Reject the step and reduce the radius: set $\delta_c = \alpha_1 \|s_c\|$.
Else
Accept the step and set
$$\delta_+ = \min\{\max\{\alpha_2\,\delta_c,\ \delta_{min}\},\ \delta_{max}\} \ \text{ if } Ared_c/Pred_c \ge \eta_2, \qquad \delta_+ = \min\{\max\{\delta_c,\ \delta_{min}\},\ \delta_{max}\} \ \text{ otherwise}. \qquad (5.7)$$
End if
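The radius update can be sketched as follows; the specific constants are our own illustrative choices, and the key feature shown is the reset to at least $\delta_{min}$ after an acceptable step:

```python
# Sketch of the radius update: shrink on rejection; after an acceptable step,
# restart the next iteration with a radius of at least delta_min.

def update_radius(delta, ratio, step_norm, eta1=1e-4, eta2=0.75,
                  alpha1=0.5, alpha2=2.0, delta_min=1e-2, delta_max=10.0):
    accepted = ratio >= eta1
    if not accepted:
        delta = alpha1 * step_norm                      # may fall below delta_min
    else:
        if ratio >= eta2:
            delta = alpha2 * delta                      # very successful: expand
        delta = min(max(delta, delta_min), delta_max)   # delta_min <= delta+ <= delta_max
    return accepted, delta

# Rejected step: the radius shrinks below delta_min while seeking acceptance.
assert update_radius(1.0, 0.0, 0.004) == (False, 0.002)
# Accepted step after shrinking: the next iteration starts at delta_min again.
assert update_radius(0.002, 0.5, 0.002) == (True, 0.01)
```

The second assertion is precisely the Zhang-Kim-Lasdon modification: the radius may dip below $\delta_{min}$ inside an iteration, but never at the start of a new one.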
It is worth noting that in practice one might have another branch in which some $\eta_3 \in (\eta_1, \eta_2)$ is used to reduce the trust-region radius if $\eta_1 \le Ared_c/Pred_c \le \eta_3$. A typical value for $\eta_3$ is $0.1$, and the motivation is to try to avoid the expense of a next unacceptable trial step. Another modification sometimes used in practice is to allow internal doubling. This can be viewed loosely as letting $\alpha_2$ in (5.7) depend on $Ared_c/Pred_c$. See Dennis and Schnabel [7, page 144]. The present analysis would allow these niceties, but to avoid further complication, we do not include them here. Observe that in (5.5) and (5.6) we have expressed the quantities $Ared$ and $Pred$ as functions of $\rho$. Thus, although $\rho_c$ does not affect the choice of the trial step $s_c$, we need to determine $\rho_c$ before deciding the acceptance of the step $s_c$. The right choice of the penalty parameter is one of the most important issues for algorithms that use the augmented Lagrangian as a merit function. This takes us to the following section.

5.6. Updating the penalty parameter. Most schemes require that the sequence $\{\rho_k\}$ be nondecreasing. El-Alem [8] requires that $\rho$ be chosen so that the predicted decrease in the merit function is at least as much as the decrease in $\|\nabla C_c^T s + C_c\|^2$.
We consider, as an update formula for the penalty parameter, El-Alem's scheme given in [9], since it ensures that the merit function is predicted to decrease at each iteration by at least a fraction of the Cauchy decrease in the quadratic model of the constraints. This indicates compatibility with the fraction of Cauchy decrease conditions imposed on the trial steps. In addition, good performance was reported when this scheme was implemented. See Williamson [33]. It can be stated as follows.

Algorithm 5.2. Updating the penalty parameter.
1. Initialization: Set $\rho_{-1} = 1$ and choose a small constant $\bar\rho > 0$.

2. At the current iterate $x_c$, after $s_c$ has been chosen:
Compute
$$Pred_c(s_c; \rho^-) = q_c(0) - q_c(s_c) - \Delta\lambda_c^T (C_c + \nabla C_c^T s_c) + \rho^-\,[\,\|C_c\|^2 - \|\nabla C_c^T s_c + C_c\|^2\,].$$
If
$$Pred_c(s_c; \rho^-) \ge \frac{\rho^-}{2}\,[\,\|C_c\|^2 - \|\nabla C_c^T s_c + C_c\|^2\,],$$
then set $\rho_c = \rho^-$; else set $\rho_c = \tilde\rho_c + \bar\rho$, where
$$\tilde\rho_c = \frac{2\,[\,q_c(s_c) - q_c(0) + \Delta\lambda_c^T (C_c + \nabla C_c^T s_c)\,]}{\|C_c\|^2 - \|\nabla C_c^T s_c + C_c\|^2}$$
and $\Delta\lambda_c = \lambda_+ - \lambda_c$.

End if
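Algorithm 5.2 can be sketched numerically as follows; the quantities are scalars from our toy setting, and with a fixed multiplier the $\Delta\lambda$ term vanishes:

```python
# Sketch of the El-Alem penalty update: keep rho if the predicted decrease with
# the old rho is at least (rho/2) times the decrease in the constraint model;
# otherwise raise rho so that inequality (5.8) holds.

def update_penalty(q0, qs, dlam_term, c_norm2, lin_norm2, rho_old, rho_bar=1.0):
    vpred = c_norm2 - lin_norm2                     # ||C||^2 - ||C + gradC^T s||^2
    pred = q0 - qs - dlam_term + rho_old * vpred
    if pred >= 0.5 * rho_old * vpred:
        rho = rho_old
    else:
        rho = 2.0 * (qs - q0 + dlam_term) / vpred + rho_bar
    pred_new = q0 - qs - dlam_term + rho * vpred
    assert pred_new >= 0.5 * rho * vpred - 1e-12    # inequality (5.8)
    return rho

# The Lagrangian model increases along this step (qs > q0), so rho = 1 is too
# small and must be raised.
rho = update_penalty(q0=0.0, qs=0.6, dlam_term=0.0,
                     c_norm2=1.0, lin_norm2=0.04, rho_old=1.0)
assert rho > 1.0
```

By construction, $\tilde\rho_c$ makes the kept-or-raised penalty satisfy (5.8) with equality before the safety margin $\bar\rho$ is added, which is why the internal assertion always passes.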
The initial choice of the penalty parameter $\rho_{-1}$ is arbitrary. However, it should be chosen consistent with the scale of the problem. Here, we take $\rho_{-1} = 1$ for convenience.
An immediate consequence of the above algorithm is that, at the current iteration, we have
$$Pred_c(s_c; \rho_c) \ge \frac{\rho_c}{2}\,[\,\|C_c\|^2 - \|C_c + \nabla C_c^T s_c\|^2\,]. \qquad (5.8)$$

5.7. Termination of the algorithm. We use first order necessary conditions for problem (EQC) to terminate the algorithm. The algorithm is terminated if
$$\|W_c^T \nabla_x \ell_c\| + \|C_c\| \le \varepsilon_{tol},$$
where $\varepsilon_{tol} > 0$ is a pre-specified constant and $W_c$ is a matrix whose columns form a basis for the null space of $\nabla C_c^T$. We require that $\{W_k\}$ be uniformly bounded in norm for all $k$.
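The termination test of Section 5.7 can be sketched on our toy problem (one constraint, $W$ a null-space basis of our choosing):

```python
# Sketch of the first-order termination test: stop when
# ||W^T grad_x l|| + ||C|| <= eps_tol, for f = x1^2 + x2^2, c = x1 + x2 - 1.

def terminate(x, lam, eps_tol=1e-6):
    grad_l = [2.0 * x[0] + lam, 2.0 * x[1] + lam]    # grad f + lam * grad c
    W = [1.0, -1.0]                                  # spans the null space of [1, 1]^T
    reduced = abs(W[0] * grad_l[0] + W[1] * grad_l[1])
    violation = abs(x[0] + x[1] - 1.0)
    return reduced + violation <= eps_tol

assert not terminate([0.0, 0.0], 0.0)     # infeasible start: ||C|| = 1
assert terminate([0.5, 0.5], -1.0)        # KKT point of the toy problem
```

Both the reduced gradient and the constraint violation must be small, which is exactly the first-order necessary condition for (EQC) expressed in the null-space basis.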
6. Statement of the algorithm. We present a formal description of our class of nonlinear programming algorithms.

Algorithm 6.1. The NLP algorithm.
step 0. (Initialization) Given $x_0$ and $\lambda_0$, compute $W_0$. Choose $\delta_0$, $\delta_{min}$, $\delta_{max}$, and $\varepsilon_{tol} > 0$. Set $\rho_{-1} = 1$ and $\bar\rho > 0$.
step 1. (Test for termination) If $\|W_c^T \nabla_x \ell_c\| + \|C_c\| \le \varepsilon_{tol}$, stop.
step 2. (Compute the trial step) Compute $s_c = s^n_c + s^t_c$ satisfying the conditions of Section 5.1.
step 3. (Update the Lagrange multiplier) Compute $\lambda_+$.
step 4. (Update the penalty parameter) Update $\rho^-$ to obtain $\rho_c$ by using Algorithm 5.2.
step 5. (Evaluate the step) Compute
$$Ared_c(s_c; \rho_c) = \ell(x_c, \lambda_c) - \ell(x_+, \lambda_+) + \rho_c\,(\|C_c\|^2 - \|C_+\|^2).$$
Evaluate the step and update the trust-region radius by using Algorithm 5.1. If the step is accepted then update $H_c$ and go to step 1, else go to step 2.

End if
The above represents a typical trust-region algorithm for solving problem (EQC). We leave the way of computing the trial steps undefined. This will allow the inclusion of a wide variety of trial step calculation techniques. For similar reasons, we leave the way of updating the Lagrange multiplier vector and the Hessian matrix undefined.
In the next two sections we prove global convergence of the above algorithm class.
7. The global convergence theory. Before beginning our global convergence theory, let us give an overview of the steps that comprise it. The trial step is chosen to satisfy a sufficient predicted decrease condition, the fraction of Cauchy decrease. Note that in our algorithm, we assume that the tangential and the quasi-normal components of any trial step each satisfy this condition. In Lemma 7.2, we will express this in a technical form similar to inequality (2.2).
The definition of predicted reduction is shown to give an approximation to the actual reduction that is accurate to within the square of the trial step length times the penalty parameter. This is proved in Lemma 7.5. However, we emphasize again that the step is not chosen to maximize the predicted decrease.
We introduce some notation for the quantities computed during the trial steps. We have not introduced this notation up to now because it obscures the simplicity of the algorithm. However, in the analysis that follows we need to show some properties of every trial step, not just the successful steps $\{s_k\}$. Therefore, let $\delta^i_k$, $s^i_k$, and $\rho^i_k$ denote the quantities set by Algorithm 6.1 as it searches for an acceptable step. Thus, $\delta^0_k = \delta_k$ at the first trial step of the $k$th iteration, $s^0_k$ is set by the first time through step 2, and $\rho^0_k$ is set using $\rho^{-1}_k = \rho_{k-1}$ the first time through step 4. If the trial step $s^i_k$ is acceptable, then $s_k = s^i_k$, $\rho_k = \rho^i_k$, and $\delta^i_k$ is updated to become $\delta_{k+1}$. In short, the algorithm is simpler to explain and code if one counts only successful steps. However, for the analysis, one needs a way to refer unambiguously to all the trial steps.
The model Lagrange multipliers also may depend on $i$. However, to keep the notation as simple as possible, we do not make this dependence explicit.
The penalty parameters $\rho^i_k$ are shown to be bounded for $\varepsilon_{tol} > 0$ as long as the algorithm does not terminate. The technique is to prove that, at any iteration $k$ at which the penalty parameter is increased: the product of the penalty parameter $\rho^i_k$ and the trust-region radius $\delta^i_k$ is bounded by a constant that does not depend on $k$ or $i$ (this is done in Lemma 7.10); and the sequence of trust-region radii $\delta^i_k$ is shown to be bounded away from zero (this is shown in Lemma 7.11). The proof of this lemma shows the crucial role that is played by setting the trust region to be no smaller than $\delta_{min}$ after every acceptable step. See Section 5.5. Finally, under the assumption that the algorithm does not terminate, the penalty parameter $\rho_k$ is shown to be bounded. The proof is given in Lemma 7.12.
The algorithm is shown to be well defined in the sense that at a given iterate, it either terminates or finds an acceptable step after finitely many trials. This result is proved in Theorem 8.1. Using the above results and Theorem 8.1, the trust-region radius is shown to be bounded away from zero. The proof is given in Lemma 8.2.
Finally, in Theorem 8.4, it is shown that for any $\varepsilon_{tol} > 0$ the algorithm always terminates, i.e., the termination condition of the algorithm will be met after finitely many iterations.
An immediate consequence of Assumptions A4 and A5 is the existence of a constant $\nu_7 > 0$ that does not depend on $k$ such that $\|H_k\| \le \nu_7$ and $\|W_k^T H_k\| \le \nu_7$. Assumption A6 means that there exists a constant $\nu_8 > 0$ that does not depend on $k$ such that $\|\lambda_k\| \le \nu_8$ for all $x \in \Omega$.
The following three subsections are devoted to presenting the lemmas needed to prove global convergence.

7.2. Properties of the trial step. The following lemma shows that condition (5.1) holds for the normal component $s^{i,n}_k$ of $s^i_k$ when it is truly normal to the tangent space.
Lemma 7.1. At the current iterate $x_k$, let the trial step component $s^{i,n}_k$ actually be normal to the tangent space. Then, under the problem assumptions, there exists a constant $K_1 > 0$, independent of the iterates, such that $\|s^{i,n}_k\| \le K_1 \|C_k\|$.
Proof. Because $s^{i,n}_k$ is actually normal to the tangent space, it lies in the range of $\nabla C_k$, so $s^{i,n}_k = \nabla C_k (\nabla C_k^T \nabla C_k)^{-1} \nabla C_k^T s^{i,n}_k$. The rest follows from the problem assumptions.
The following lemma expresses in a workable form the pair of fraction of Cauchy decrease conditions imposed on the trial steps.
Lemma 7.2. Let the trial steps satisfy the conditions given in step 2 of Algorithm 6.1. Then, under the problem assumptions, there exist positive constants $K_2$, $K_3$, and $K_4$, independent of the iterates, such that
$$\|C_k\|^2 - \|\nabla C_k^T s^i_k + C_k\|^2 \ge K_2 \|C_k\| \min\{K_3 \|C_k\|,\ \delta^i_k\} \qquad (7.2)$$
and
$$q_k(s^{i,n}_k) - q_k(s^i_k) \ge K_4 \|W_k^T \nabla q_k(s^{i,n}_k)\| \min\{\|W_k^T \nabla q_k(s^{i,n}_k)\|,\ \delta^i_k\}. \qquad (7.3)$$
Proof. The proof is an application of Lemma 2.1 to the two subproblems, followed by a use of the problem assumptions and (5.3).
Now we deal with the trial steps assuming that they satisfy inequalities (7.2) and (7.3). In what follows, we will use implicitly that $\nabla C_k^T s^{i,n}_k = \nabla C_k^T s^i_k$.
Lemma 7.3. Under the problem assumptions, there exists a constant $K_5 > 0$, independent of the iterates, such that
$$q_k(0) - q_k(s^{i,n}_k) - \Delta\lambda_k^T (C_k + \nabla C_k^T s^{i,n}_k) \ge -K_5 \|C_k\|.$$
Proof. We have
$$q_k(0) - q_k(s^{i,n}_k) \ge -\big(\|\nabla_x \ell_k\| + \tfrac{1}{2}\|H_k\|\,\|s^{i,n}_k\|\big)\,\|s^{i,n}_k\|.$$
Using (5.1), the fact that $\|s^{i,n}_k\| < \delta_{max}$, the boundedness of the multiplier estimates, the bound $\|C_k + \nabla C_k^T s^i_k\| \le \|C_k\|$, and the problem assumptions, we have
$$q_k(0) - q_k(s^{i,n}_k) - \Delta\lambda_k^T (C_k + \nabla C_k^T s^i_k) \ge -K_5 \|C_k\|,$$
and we obtain the desired result.
The following lemma gives an upper bound on the difference between the actual reduction and the predicted reduction.
Lemma 7.4. Under the problem assumptions, there exist positive constants $K_6$, $K_7$, and $K_8$, independent of $k$, such that
$$|Ared_k(s^i_k; \rho^i_k) - Pred_k(s^i_k; \rho^i_k)| \le K_6 \|s^i_k\|^2 + K_7\,\rho^i_k \|s^i_k\|^3 + K_8\,\rho^i_k \|s^i_k\|^2 \|C_k\|.$$
Proof. The proof follows directly from El-Alem [9].
If the penalty parameter were uniformly bounded, the next lemma would show that the predicted reduction provides an approximation to the actual reduction of the merit function that is accurate to within the square of the step length.
Lemma 7.5. Under the problem assumptions, there exists a constant $K_9 > 0$ that does not depend on $k$, such that
$$|Ared_k(s^i_k; \rho^i_k) - Pred_k(s^i_k; \rho^i_k)| \le K_9\,\rho^i_k \|s^i_k\|^2. \qquad (7.6)$$
Proof. The proof follows directly from the above lemma and the fact that $\|s^i_k\|$ and $\|C_k\|$ are bounded.
7.3. The decrease in the model. This section deals with the predicted decrease in the merit function produced by the trial step. We start with a lemma.
Lemma 7.6. Let $s^i_k$ be generated by Algorithm 6.1. Then, under the problem assumptions, for any positive $\rho$, the predicted decrease in the merit function satisfies
$$Pred_k(s^i_k; \rho) \ge K_4 \|W_k^T \nabla q_k(s^{i,n}_k)\| \min\{\|W_k^T \nabla q_k(s^{i,n}_k)\|,\ \delta^i_k\} - K_5 \|C_k\| + \rho\,[\,\|C_k\|^2 - \|C_k + \nabla C_k^T s^i_k\|^2\,],$$
where $K_5$ is as in Lemma 7.3.
Proof. We have
$$Pred_k(s^i_k; \rho) = q_k(0) - q_k(s^i_k) - \Delta\lambda_k^T (C_k + \nabla C_k^T s^i_k) + \rho\,[\,\|C_k\|^2 - \|C_k + \nabla C_k^T s^i_k\|^2\,].$$
From (7.3) and Lemma 7.3, the result is established.
If $x_k$ is feasible, then the predicted reduction does not depend on $\rho$, so we take as $\rho_k$ the penalty parameter from the previous iteration. The question now is how near to feasibility an iterate must be in order that the penalty parameter need not be increased. The answer is given by the following lemma.
Lemma 7.7. There exists a constant $\alpha > 0$ with $\alpha \le \varepsilon_{tol}/(3\delta_{max})$, which depends on $\varepsilon_{tol}$ but not on $k$ or $i$, such that if the algorithm does not terminate at $x_k$ and $\|C_k\| \le \alpha\,\delta^i_k$, then for any $\rho > 0$,
$$Pred_k(s^i_k; \rho) \ge \frac{K_4\,\varepsilon_{tol}}{6}\,\min\Big\{\frac{\varepsilon_{tol}}{3},\ \delta^i_k\Big\} + \rho\,[\,\|C_k\|^2 - \|C_k + \nabla C_k^T s^i_k\|^2\,]. \qquad (7.9)$$
Proof. If the algorithm does not terminate at $x_k$, then $\|W_k^T \nabla_x \ell_k\| + \|C_k\| > \varepsilon_{tol}$, and since $\|C_k\| \le \alpha\,\delta^i_k$ with $\alpha \le \varepsilon_{tol}/(3\delta_{max})$, we have $\|C_k\| \le \varepsilon_{tol}/3$, and the reduced gradient satisfies $\|W_k^T \nabla_x \ell_k\| > \frac{2}{3}\varepsilon_{tol}$. It follows that $\|W_k^T \nabla q_k(s^{i,n}_k)\| \ge \varepsilon_{tol}/3$ for $\alpha$ sufficiently small. From Lemma 7.6, taking $\alpha$ smaller if necessary so that the term $K_5\|C_k\|$ is absorbed, we obtain (7.9). This completes the proof.
Inequality (7.9) with $\rho = \rho^{i-1}_k$ guarantees that if the algorithm does not terminate and if $\|C_k\| \le \alpha\,\delta^i_k$, then the penalty parameter at the current trial step does not need to be increased in step 4 of Algorithm 6.1. This is equivalent to saying that the possible increases in the penalty parameter can occur only when $\|C_k\| > \alpha\,\delta^i_k$.
Lemma 7.8. Given $\varepsilon_{tol} > 0$, there exists $K_{10} > 0$, which depends on $\varepsilon_{tol}$ but not on $k$ or $i$, such that at any trial step $s^i_k$ of an iteration $k$ at which the algorithm does not terminate and $\|C_k\| \le \alpha\,\delta^i_k$, where $\alpha$ is as in Lemma 7.7, the following inequality holds:
$$Pred_k(s^i_k; \rho^i_k) \ge K_{10}\,\delta^i_k. \qquad (7.10)$$
Proof. Since the algorithm does not terminate and $\|C_k\| \le \alpha\,\delta^i_k$, where $\alpha$ is as in (7.8), then from (7.9), and using an argument similar to that of Lemma 7.7, we can write $Pred_k(s^i_k; \rho^i_k) \ge K_{10}\,\delta^i_k$ for a constant $K_{10}$ depending only on $\varepsilon_{tol}$, and this is the desired result.
In the next section we will discuss the role of the penalty parameter in the global convergence of the nonlinear programming algorithm.

7.4. The behavior of the penalty parameter. In this section we discuss the behavior of the penalty parameter. The crucial result here is that the sequence $\{\delta^i_k\}$ of trust-region radii is bounded away from zero at those iterations at which the penalty parameter is increased at some trial step. This will allow us to conclude that the sequence $\{\rho^i_k\}$ of penalty parameters is bounded.
According to the rule for updating the penalty parameter, we use the penalty parameter from the previous trial step if the amount of predicted decrease with the old penalty parameter is at least a fraction of the decrease in the quadratic model of the linearized constraints; that is, if
$$Pred_k(s^i_k; \rho^{i-1}_k) \ge \frac{\rho^{i-1}_k}{2}\,[\,\|C_k\|^2 - \|C_k + \nabla C_k^T s^i_k\|^2\,], \qquad (7.11)$$
then $\rho^i_k = \rho^{i-1}_k$. Otherwise, we use $\rho^i_k = \tilde\rho^i_k + \bar\rho$, which enforces (5.8). See Section 5.6.
Lemma 7.9. Let $\{\rho^i_k\}$ be the sequence of penalty parameters generated by the algorithm. Then:
1. $\{\rho^i_k\}$ forms a nondecreasing sequence.
2. If the penalty parameter is increased, it increases by at least $\bar\rho$.
3. If the penalty parameter is not increased, then inequality (7.11) holds.
Proof. The proof is straightforward.
Lemma 7.10. Let $k, i$ be any pair of indices such that $\rho^i_k$ is increased at the $i$th trial step of the $k$th iteration. If the algorithm does not terminate at $x_k$, then there exists $K_{11} > 0$, which depends on $\varepsilon_{tol}$ but does not depend on $k$ or $i$, such that for every $j \ge i$,
$$\rho^j_k\,\delta^j_k \le K_{11}. \qquad (7.12)$$
Proof. If $\rho^i_k$ is increased at the $i$th trial step of the $k$th iteration, then it is updated by the rule
$$\rho^i_k = \frac{2\,[\,q_k(s^i_k) - q_k(0) + \Delta\lambda_k^T (C_k + \nabla C_k^T s^i_k)\,]}{\|C_k\|^2 - \|\nabla C_k^T s^i_k + C_k\|^2} + \bar\rho.$$
Applying (7.2) to the denominator, and (7.3) and Lemma 7.3 to the numerator, we obtain the bound (7.12).
The following lemma gives a lower bound on the sequence $\{\delta^i_k\}$ for those iterates at which the algorithm does not terminate and the penalty parameter is increased.
In the next section, we will be able to do away with the assumption that the penalty parameter is increased.
Lemma 7.11. Let the penalty parameter be increased at the $i$th trial step of the $k$th iteration. Then, under the problem assumptions, if the algorithm does not terminate, there exists $\tilde\delta > 0$, which depends on $\varepsilon_{tol}$ but does not depend on the iterates, such that
$$\delta^i_k \ge \tilde\delta. \qquad (7.13)$$
Proof. To begin, we note that if $i = 0$, i.e., we are at the first trial step of iteration $k$, then by Algorithm 5.1, $\delta_k$ cannot have gotten smaller than $\delta_{min}$ during the course of the iteration. Thus, we can restrict our attention to the case where $i \ge 1$.
Our proof will consist in showing the existence of $\tilde\delta$ such that $\delta_k^i \ge \tilde\delta$ whether or not $s_k^i$ is acceptable. Remember that for all the rejected trial steps we have $\delta_k^{j+1} = \alpha_1\|s_k^j\|$.
We consider two cases:
i) $\|C_k\| > \beta\,\delta_k^j$ for all $j = 0, \ldots, i$.
ii) $\|C_k\| > \beta\,\delta_k^j$ does not hold for some $j$ between $0$ and $i$.
i) Consider the case where the constraint violation satisfies $\|C_k\| > \beta\,\delta_k^j$ for all $j = 0, \ldots, i$. In this case, (7.16) yields the bound $\delta_k^i \ge K_{12}$.
ii) If $\|C_k\| > \beta\,\delta_k^j$ does not hold for all $j = 0, \ldots, i$, then there exists a largest index $l$, $0 \le l < i$, such that $\|C_k\| \le \beta\,\delta_k^l$ holds.
If $i = l + 1$ then, from the way of updating the trust-region radius, $\delta_k^i = \alpha_1\|s_k^l\|$. On the other hand, if $i \ne l + 1$, since $\|C_k\| > \beta\,\delta_k^j$ for all $j = l+1, \ldots, i$, then from (7.16) we have
$$\|s_k^j\| \ge \frac{(1-\eta_1)K_2\min\{K_3, \bar r\}}{2K_9}\,\|C_k\|, \qquad j = l+1, \ldots, i-1.$$
Now, because $s_k^{i-1}$ and $s_k^{l+1}$ are rejected trial steps, and using $\|C_k\| > \beta\,\delta_k^{l+1}$, we obtain
$$\delta_k^i \ge K_{13}\,\|s_k^l\|. \qquad (7.20)$$
From (7.5) we have
$$|Ared_k(s_k^l; \rho_k^l) - Pred_k(s_k^l; \rho_k^l)| \le [\,K_6 + (K_7 + K_8)\rho_k^l\|s_k^l\|\,]\,\|s_k^l\|\,\delta_k^l.$$
Therefore,
$$|Ared_k(s_k^l; \rho_k^l) - Pred_k(s_k^l; \rho_k^l)| \le [\,K_6 + (K_7 + K_8)K_{14}\,]\,\|s_k^l\|\,\delta_k^l. \qquad (7.21)$$
Also, since $\|C_k\| \le \beta\,\delta_k^l$, then from Lemma 7.8 we have
$$Pred_k(s_k^l; \rho_k^l) \ge K_{10}\,\delta_k^l. \qquad (7.22)$$
Using (7.21), (7.22), and the fact that $s_k^l$ is rejected, we obtain
$$1 - \eta_1 < \left|\frac{Ared_k(s_k^l; \rho_k^l)}{Pred_k(s_k^l; \rho_k^l)} - 1\right| \le \frac{[\,K_6 + K_7K_{14} + K_8K_{14}\,]\,\|s_k^l\|}{K_{10}}.$$
Hence
$$\|s_k^l\| \ge \frac{(1-\eta_1)K_{10}}{K_6 + K_7K_{14} + K_8K_{14}}. \qquad (7.23)$$
Now, using (7.20) and (7.23), we obtain the bound
$$\delta_k^i \ge K_{13}\,\frac{(1-\eta_1)K_{10}}{K_6 + K_7K_{14} + K_8K_{14}} = K_{15}.$$
Defining $\tilde\delta = \min\{\delta_{\min}, K_{12}, K_{15}\}$, we obtain the desired bound.

Now we can show that the nondecreasing sequence of penalty parameters generated by the nonlinear programming Algorithm 6.1 is bounded.
Lemma 7.12. Under the problem assumptions, if the algorithm does not terminate, then there is some $\rho_*$, which depends on $\varepsilon_{tol}$, for which
$$\lim_{k\to\infty}\rho_k = \rho_* < \infty.$$
Furthermore, there exists some index $k_*$ such that $\rho_k = \rho_*$ for every $k \ge k_*$.
Proof. We need to show that $\rho_k^i$ is bounded for all pairs $k, i$. Clearly, it suffices to consider the sequence of $\rho_k^i$ over different $k$'s, where the double index $k, i$ means that the penalty constant was increased to $\rho_k^i$ at the $i$th trial step of the $k$th iteration. Thus, there may be no terms or more than one term for a given $k$. Then from Lemma 7.10 and Lemma 7.11, we have
$$\rho_k^i \le \frac{K_{11}}{\tilde\delta}.$$
Therefore $\{\rho_k\}$ is a bounded sequence, and since it is nondecreasing, there exists $\rho_* < \infty$ such that $\lim_{k\to\infty}\rho_k = \rho_*$. Now, since the existence of $\rho_*$ ensures that $\rho_k$ is bounded, and since we know that when it is increased it is increased by at least $\bar\rho$, there can be at most finitely many increases, and the proof is complete.
This last result and the following one will play crucial roles in the proof of the global convergence of Algorithm 6.1.
Lemma 7.13. Under the problem assumptions, if the algorithm does not terminate, then the augmented Lagrangian is bounded on $\Omega$.

Proof. The proof is immediate from the boundedness of the penalty constant and the problem assumptions.
8. The main global convergence results. This section is devoted to presenting our main global convergence results. We start with the finite termination theorem, where we show that the general nonlinear programming algorithm is well-defined. We then present more properties of the trust-region radius sequence generated by the algorithm under the assumption that it does not terminate, and finally we prove global convergence of our algorithm.
8.1. The finite termination theorem. The following theorem shows that the nonlinear programming Algorithm 6.1 is well-defined in the sense that at each iteration we can find an acceptable step after a finite number of trial step computations or, equivalently, trust-region reductions. This will allow us to drop the consideration of trial steps and consider only "successful trial steps," $\{s_k\}$.

Theorem 8.1. Under the problem assumptions, unless some iterate $x_k$ satisfies the termination condition of Algorithm 6.1, an acceptable step from $x_k$ will be found after finitely many trial steps.
Proof. The proof follows from Theorem 5.1 of El-Alem [9].

Lemma 8.2. Under the problem assumptions, assume that the algorithm does not terminate. Then there exists $\delta_* > 0$, which depends on $\varepsilon_{tol}$ but does not depend on the iterates, such that $\delta_k^i \ge \delta_*$ for all $k, i$.

Proof. The proof is very similar to the proof of Lemma 7.11.
To begin, we note that if the first trial step is acceptable, then by Algorithm 5.1, $\delta_k$ cannot have become smaller than $\delta_{\min}$ during the course of the iteration. Thus, we can restrict our attention to the case where there is at least one unsuccessful trial step. Let us assume then that we have $j$ unsuccessful steps. Our proof will consist in showing the existence of $\tilde\delta$ such that $\delta_k^j \ge \tilde\delta$ whether or not $s_k^j$ is acceptable, i.e., is $s_k$. Remember that for all the rejected trial steps we have $\delta_k^{j+1} = \alpha_1\|s_k^j\| < \delta_k^j$. We consider two cases:
i) $\|C_k\| > \beta\,\delta_k^i$ for all $i = 0, \ldots, j$.
ii) $\|C_k\| > \beta\,\delta_k^i$ does not hold for some $i$ with $0 < i \le j$.
The proof of (i) is exactly the same as in the proof of Lemma 7.11, so let us proceed to (ii).
ii) Suppose $\|C_k\| > \beta\,\delta_k^i$ does not hold for all $i = 0, \ldots, j$. As in Lemma 7.11, we let $l$ be the largest index such that $\|C_k\| \le \beta\,\delta_k^l$ holds. Now, since $\|C_k\| \le \beta\,\delta_k^i$ for all $i \le l$, it follows from Lemma 7.8 that for all such $i$, $Pred_k(s_k^i; \rho_k^i) \ge K_{10}\,\delta_k^i$. Furthermore, from Lemma 7.5, $|Ared_k(s_k^i; \rho_k^i) - Pred_k(s_k^i; \rho_k^i)| \le K_9\,\rho_k^i\,\|s_k^i\|^2$, and because the step $s_k^i$ is unacceptable, these two bounds imply that, for all $i \le l$,
$$\delta_k^i \ge \|s_k^i\| \ge \frac{(1-\eta_1)K_{10}}{K_9\,\rho_*}.$$
For all $i > l$, we have from (7.20) and the above inequality
$$\delta_k^i \ge K_{13}\,\|s_k^l\| \ge K_{13}\,\frac{(1-\eta_1)K_{10}}{K_9\,\rho_*}.$$
It remains only to collect the constants as in Lemma 7.11.
8.2. The global convergence results. Now we present our main global convergence result: under the problem assumptions, the general nonlinear programming algorithm generates a sequence of iterates $\{x_k\}$ that has at least a subsequence converging to a stationary point of problem (EQC). We start by proving that if the algorithm does not terminate, it converges to a feasible point.
Theorem 8.3. Under the problem assumptions, if there exists $\varepsilon_{tol} > 0$ such that $\|W_k^T\nabla_x\ell_k\| + \|C_k\| > \varepsilon_{tol}$ for all $k$, then
$$\lim_{k\to\infty}\|C_k\| = 0. \qquad (8.2)$$

Proof. We prove (8.2) by contradiction. We begin by assuming that there exists an infinite sequence of indices $\{k_j\}$ such that $\|C_k\|$ is bounded away from zero for all $k \in \{k_j\}$. This implies that there exists $\epsilon > 0$ such that $\|C_k\| \ge \epsilon$ for all $k \in \{k_j\}$. Now, for each $k_j \ge k_*$, where $k_*$ is as in Lemma 7.12, we have from (5.8) and (7.2) that
$$Pred_{k_j} \ge \frac{\rho_*}{2}\,[\,\|C_{k_j}\|^2 - \|C_{k_j} + \nabla C_{k_j}^T s_{k_j}\|^2\,] \ge \frac{\rho_* K_2\,\epsilon}{2}\min\{K_3\,\epsilon,\ \bar r\,\delta_*\} = K_{16} > 0.$$
Remember that we are only looking at successful steps at this point in the analysis, so
$$L_{k_j} - L_{k_j+1} = Ared_{k_j} \ge \eta_1\,Pred_{k_j} \ge \eta_1 K_{16} > 0.$$
Since $\{L_k\}$ is bounded below, a contradiction arises if we let $k_j$ go to infinity.
Theorem 8.4. Under the problem assumptions, given any $\varepsilon_{tol} > 0$, the algorithm terminates because
$$\|W_k^T\nabla_x\ell_k\| + \|C_k\| < \varepsilon_{tol}$$
for some $k$.

Proof. Notice that if we suppose that the algorithm does not terminate and that some subsequence of $\{\|W_k^T\nabla_x\ell_k\|\}$ converges to zero, then nontermination is immediately contradicted by Theorem 8.3. So let us suppose that $\|W_k^T\nabla_x\ell_k\| \ge \epsilon_1$ for some $\epsilon_1 > 0$. Since $\|C_k\|$ goes to zero by Theorem 8.3, and the sequence of trust-region radii is bounded below by $\delta_*$, there exists an index $N_1 \ge k_*$ such that for all $k \ge N_1$, $\|C_k\| \le \beta\,\delta_k$, with $\beta$ as in (7.8). Therefore, by Lemma 7.8 with the index $i$ taken so that $s_k^i = s_k$ was the successful step, and by Lemma 8.2, we again have an infinite sequence of steps in which the actual decrease in $L$ is at least $\eta_1 K_{10}\,\delta_*$. This contradicts the boundedness of $L$ and completes the proof.
9. An example algorithm. In this section we propose, as an example, a particular step choice algorithm for step 2 of Algorithm 6.1. We include different ways of computing $s_c^n$ according to the dimension of the problem. We will then state the complete algorithm for finding the trial step. Finally, in Sections 9.5 and 9.6 we will show that the trial step generated by this algorithm satisfies the pair of fraction of Cauchy decrease conditions and (5.1).
The step choice algorithm we propose in this section is based on a conjugate directions method. It can be viewed as a generalization of the Steihaug-Toint dogleg algorithm for the unconstrained problem. This algorithm is much like a trust-region version of an algorithm due to Nash [20].

9.1. The Steihaug-Toint dogleg algorithm. This section is devoted to describing the generalized dogleg algorithm introduced by Steihaug [27] and Toint [30] for approximating the solution of problem (TRS) (see Section 2). This algorithm, stated below as Algorithm 9.1, is based on the linear conjugate gradient method. The Steihaug-Toint dogleg algorithm is well known for being suitable for large-scale unconstrained problems. It can be used in the framework of any general trust-region algorithm for solving problem (UCMIN).

9.2. Computing a quasi-normal component. We start our proposed step choice algorithm by finding a quasi-normal component $s_c^n$ of the trial step. This step must satisfy a fraction of Cauchy decrease condition on the constraint norm inside the inner trust region. It determines for us which translate of the null space of the constraint Jacobian will be the one in which we choose the next iterate.
We repeat, because it is so important, that we do not require that $s_c^n$ be normal to the tangent space, just that it satisfy (5.1). In fact, below we will see that one way we might choose the quasi-normal component is by finding a linearly feasible point and just scaling it back onto the inner trust region.

9.2.1. Via Craig's algorithm. First we note that we can solve for a linearly feasible point by using Craig's algorithm on the underdetermined linear system $\nabla C_c^T s + C_c = 0$ (see [5]). Craig's algorithm consists of making the transformation $s = \nabla C_c\,y$ and applying the standard conjugate gradient algorithm to the following $m \times m$ linear system:
$$\nabla C_c^T\nabla C_c\,y + C_c = 0.$$
This implies that
$$s_c^{craig} = s_c^{mn} = -\nabla C_c\,(\nabla C_c^T\nabla C_c)^{-1}C_c.$$
Thus the result is the Moore-Penrose pseudoinverse (minimum-norm) solution of the linearized constraints, and it requires no more than $m$ iterations. Preconditioning is very important, of course, but how to do it certainly will depend on the particular application.
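In outline, and under the full row rank assumption, Craig's method amounts to running plain conjugate gradients on the small SPD system above. The following sketch (our own naming, not the paper's implementation) returns the minimum-norm solution of the linearized constraints.

```python
import numpy as np

def craig_step(J, c, tol=1e-10, max_iter=None):
    """Minimum-norm solution of J s = -c via Craig's method.

    J : (m, n) Jacobian  grad C(x)^T  with full row rank, m < n.
    Substituting s = J^T y turns the system into the m x m SPD system
        (J J^T) y = -c,
    solved here by plain conjugate gradients; the result is the
    Moore-Penrose step  s = -J^T (J J^T)^{-1} c  in at most m iterations.
    """
    m, n = J.shape
    max_iter = max_iter or m
    y = np.zeros(m)
    r = -c - J @ (J.T @ y)       # residual of (J J^T) y = -c
    d = r.copy()
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:
            break
        Ad = J @ (J.T @ d)
        alpha = (r @ r) / (d @ Ad)
        y = y + alpha * d
        r_new = r - alpha * Ad
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d
        r = r_new
    return J.T @ y               # s = J^T y
```

Each intermediate iterate `J.T @ y` is one of the points $s_j^{craig}$ used by the inner Steihaug-Toint variant described next.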
Therefore, we can find the step $s_c^n$ by a Steihaug-Toint version of Craig's algorithm in the inner trust region of radius $r\delta_c$. In this algorithm, iterates will be generated until we find the desired constraint normal $s_c^{mn}$ with $\|s_c^{mn}\| \le r\delta_c$, or until $s_j^{craig}$ and $s_{j+1}^{craig}$ straddle the $r\delta_c$ trust-region boundary. In the first case, we set $s_c^n = s_c^{mn}$.
In the second case, we choose the dogleg step
$$s_c^{dog} \in [s_j^{craig}, s_{j+1}^{craig}] \cap \{s : \|s\| = r\delta_c\}$$
and set $s_c^n = s_c^{dog}$. It is not difficult to prove that each Craig iterate is the $\ell_2$ projection of the origin onto the subspace of the tangent space spanned by the steps up to that point, and that each $s_j^{craig}$ satisfies (5.1). Now, the Craig steps may not have monotonically increasing $\ell_2$ length, so a more aggressive strategy that works perfectly well with our theory is to take the last pair of Craig iterates that straddle the trust-region boundary. In either case, by convexity, $s_c^{dog}$ also satisfies (5.1). Furthermore, it is clear that $s_c^n = s_c^{dog}$ satisfies the fraction of Cauchy decrease condition required by step 2 of Algorithm 6.1.

9.2.2. Via a linearly feasible point. There are some problems for which Craig's method might be too slow and too hard to precondition to use the "inner Steihaug-Toint" algorithm given above. Or, for reasons too technical to be of much interest here, someone might prefer an implementation that computes a linearly feasible point $s_c^{lf}$ either by Craig's method or by some special application-dependent method. The point of this subsection is that, when this is the case, $s_c^n$ can be taken to be the projection of $s_c^{lf}$ back onto the inner trust region. If $s_c^{lf}$ satisfies (5.1), then so does $s_c^n$. Suppose we have any linearly feasible point $s_c^{lf}$ that satisfies (5.1). If it is inside the inner trust region, we can take $s_c^n$ to be that point, and it clearly satisfies the fraction of Cauchy decrease condition required by step 2 of Algorithm 6.1. If $\|s_c^{lf}\| \ge r\delta_c$, then we take
$$s_c^n = \frac{r\delta_c}{\|s_c^{lf}\|}\,s_c^{lf}.$$
A classical mathematical programming way to compute a linearly feasible point, one that encompasses some special purpose methods we have seen for some inverse problems, is as follows. In some way, divide $s$ into so-called basic and nonbasic components.
Let us assume that we have done so, and using column pivoting, we write $\nabla C^T$ as $\nabla C^T = [B \mid N]$, where $B$ is a nonsingular matrix corresponding to the basic components of $s$. This corresponds to
$$W_c = \begin{bmatrix} -B^{-1}N \\ I_{n-m} \end{bmatrix}.$$
As long as $\{\|B_k^{-1}\|\}$ is uniformly bounded, $s_c^{lf}$ satisfies (5.1), where the constant there depends on that bound. This is a standard assumption for important classes of discretized optimal control problems, though it is stronger than our assumption that $[\nabla C(x_c)^T\nabla C(x_c)]^{-1}$ is uniformly bounded.

9.3. Computing the tangential component. We now assume that we have the quasi-normal component $s_c^n$. We start the process of computing the tangent space component $s_c^t$ by forming the basis matrix $W_c \in \Re^{n\times(n-m)}$, whose columns form a basis of the null space $N(\nabla C_c^T)$ of the constraint Jacobian. We then transform the constrained problem into a trust-region problem of dimension $n - m$ of the following form:
$$\text{minimize } \tfrac{1}{2}\,\bar s^{t\,T}\bar H_c\,\bar s^t + \nabla q_c(s_c^n)^T W_c\,\bar s^t + q_c(s_c^n) \quad \text{subject to } \|W_c\,\bar s^t + s_c^n\| \le \delta_c,$$
where $\bar s^t \in \Re^{n-m}$, and we set $s_c^t = W_c\,\bar s_c^t$. The step $s_c^t$ is the component in the tangent space of the constraints, and the matrix $\bar H_c = W_c^T H_c W_c \in \Re^{(n-m)\times(n-m)}$ is the reduced Hessian matrix. Now we use the Steihaug-Toint algorithm to determine $\bar s_c^t$ such that $\|W_c\,\bar s_c^t + s_c^n\| \le \delta_c$.
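The basic/nonbasic construction and the scaling back onto the inner trust region can be sketched as follows. This is only an illustration under the stated assumptions: the column split is assumed to have been chosen already so that $B$ is nonsingular, and the function names are ours.

```python
import numpy as np

def feasible_point_and_reducer(JT, c):
    """Linearly feasible point and reducer from a basic/nonbasic split.

    JT : (m, n) matrix  grad C^T = [B | N]  with B (m, m) nonsingular.
    Returns s_lf = (-B^{-1} c, 0), which satisfies JT @ s_lf = -c, and
    W = [[-B^{-1} N], [I]], whose columns span the null space of JT.
    """
    m, n = JT.shape
    B, N = JT[:, :m], JT[:, m:]
    s_lf = np.concatenate([-np.linalg.solve(B, c), np.zeros(n - m)])
    W = np.vstack([-np.linalg.solve(B, N), np.eye(n - m)])
    return s_lf, W

def quasi_normal(s_lf, inner_radius):
    """Scale the feasible point back onto the inner trust region."""
    nrm = np.linalg.norm(s_lf)
    return s_lf if nrm <= inner_radius else (inner_radius / nrm) * s_lf
```

As shown in the scaling case of Lemma 9.4, truncating $s_c^{lf}$ this way still yields at least an $\alpha_c$ fraction of the full decrease $\|C_c\|^2$ in the linearized constraint model.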
The complete algorithm for finding the trial step is presented in Section 9.4.

9.4. Conjugate reduced-gradient algorithm for EQC. Here we write, in more detail, the example algorithm for computing a trial step; see Algorithm 9.2. It is worth noting that this way of computing the tangential step does not have the property that, once a step goes outside the trust region, it could not come back in were the CG iteration continued. This means that the relaxed SQP step might lie inside the trust region, but the algorithm might not return this more desirable step if the gradient scale and trust-region scale are inconsistent.
It would be better otherwise, of course, but the steps given here will lead to convergence, and we hope that near the solution, when it becomes important to take SQP steps, the trust region will be large enough to compensate for the difference in shape. If the implementer wanted to be more aggressive, there are various ways that fit our theory to deal with this situation. For example, we could take the dogleg step based on the last time the CG iteration leaves the trust region rather than the first. Our concern here is to prove convergence theorems for the weakest conditions on the algorithm, and to show that reasonable algorithms satisfy those conditions, not to advocate particular implementation details of no consequence to the theory.

9.5. Sufficient decrease by the steps. In this section we show that the conjugate reduced-gradient algorithm produces steps that satisfy the conditions we impose on the steps in step 2 of Algorithm 6.1. In particular, we show that both the quasi-normal and the tangential components of the trial steps satisfy their respective fraction of Cauchy decrease conditions.
The following lemma gives a bound on the reducer matrix $W_c$. The proof is straightforward, so we will omit it.

Lemma 9.3. Under the problem assumptions, if there is a uniform bound on the matrix $B(x)^{-1}$, then the reducer matrix
$$W(x) = \begin{bmatrix} -B(x)^{-1}N(x) \\ I_{n-m} \end{bmatrix}$$
is bounded for all $x \in \Omega$.
The following lemma, Lemma 9.4, shows that the quasi-normal component $s_c^n$ satisfies a fraction of Cauchy decrease condition on the quadratic model of the linearized constraints,
$$\|C_c\|^2 - \|C_c + \nabla C_c^T s_c^n\|^2 \ge K_2\,\|C_c\|\min\{r\delta_c,\ K_3\|C_c\|\},$$
where $K_2$ and $K_3$ are constants independent of the iterates.
Proof. Suppose that we are applying Craig's algorithm to find $s_c^n$. Let $\{s_1, s_2, \ldots\}$ be the sequence of iterates generated by the algorithm; hence the required decrease holds for all $i$. The desired result will follow from the definition of $s_c^{lf}$ and Lemma 9.3.
The following lemma shows that the null-space component $s_c^t$ satisfies a fraction of Cauchy decrease condition on the quadratic model of the Lagrangian.
Lemma 9.5. Let $s_c$ be a trial step generated by the algorithm. Then, under the problem assumptions, there exists a positive constant $K_4$, which does not depend on $x_c$, such that
$$q_c(s_c^n) - q_c(s_c) \ge \frac{1}{2}\,\|W_c^T\nabla q_c(s_c^n)\|\min\left\{K_4\,\|W_c^T\nabla q_c(s_c^n)\|,\ \frac{(1-r)}{6}\,\delta_c\right\}.$$

Proof. Since we are solving the reduced problem
$$\text{minimize } \tfrac{1}{2}\,\bar s^{t\,T}\bar H_c\,\bar s^t + \nabla q_c(s_c^n)^T W_c\,\bar s^t + q_c(s_c^n) \quad \text{subject to } \|W_c\,\bar s^t + s_c^n\| \le \delta_c,$$
which is an unconstrained trust-region subproblem, the proof is immediate from Theorem 2.5 of Steihaug [27], followed by the use of the problem assumptions and Lemma 9.3.
We state the following lemma here for completeness.

Lemma 9.6. The quasi-normal component computed by our proposed step choice algorithm satisfies
$$\|s_c^n\| \le K_1\,\|C_c\|,$$
where $K_1$ is a positive constant independent of the iterate $x_c$.
Proof. The proof is given with the discussion of how to compute a quasi-normal component. See Section 9.2.
10. Discussion and concluding remarks. We have established a global convergence theory for a broad class of nonlinear programming algorithms for the smooth problem with equality constraints. The class includes algorithms based on the full-space approach and the tangent-space approach. The family is characterized by generating steps that satisfy very mild conditions on the normal and tangential components. The normal component satisfies a fraction of Cauchy decrease condition on the quadratic model of the linearized constraints, and the tangential component satisfies a fraction of Cauchy decrease condition on the quadratic model of the Lagrangian function associated with the problem, reduced to the tangent space of the constraints. Of course the step, which is the sum of these components, satisfies both conditions.
The augmented Lagrangian was chosen as a merit function. The scheme for updating the penalty parameter is the one proposed by El-Alem [9], since it guarantees that the predicted decrease in the merit function at each iteration is at least a fraction of the Cauchy decrease on the quadratic model of the linearized constraints. This indicates compatibility with the fraction of Cauchy decrease conditions imposed on the trial steps.
In presenting the algorithm, we have left open the way of computing the trial steps to satisfy the double fraction of Cauchy decrease condition. This allows the inclusion of a wide variety of trial step calculation techniques. For the same reason, we have left unspecified the way of approximating the Lagrange multiplier vector and the Hessian matrix.
With respect to the trial steps, we have suggested an algorithm of the class that should work quite well for large problems. The algorithm is a generalization of the Steihaug-Toint dogleg algorithm for the unconstrained case. This algorithm was one we had in mind as motivation for the convergence theory.
The least-squares or projection formula can be used as a scheme for estimating the multiplier, since it fits the condition imposed on the multiplier updating scheme; namely, under the standard assumptions, it produces bounded multipliers for the local models. For large problems, $\lambda = -B^{-T}\nabla_B f$ is likely to be a much preferable formula because of the cost of the least-squares solution. Furthermore, this will match better with the reducer matrix $W$, especially for problems where $B$ can be easily identified. See Dennis and Lewis [6]. In either case, the uniform boundedness of $\{\lambda_k\}$ follows from the problem assumptions.
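For concreteness, both multiplier estimates can be sketched as follows (the naming is ours; we write the basic formula with $B^{-T}$, which is the form that zeros the basic components of $\nabla f + \nabla C\,\lambda$).

```python
import numpy as np

def multiplier_least_squares(gradf, gradC):
    """Least-squares multiplier:  argmin_l || gradf + gradC @ l ||_2,
    with gradC the (n, m) matrix grad C(x)."""
    return -np.linalg.lstsq(gradC, gradf, rcond=None)[0]

def multiplier_basic(gradf, JT):
    """Cheaper estimate from the basic block:  l = -B^{-T} grad_B f,
    with JT = grad C^T = [B | N]; only an (m, m) solve is needed."""
    m = JT.shape[0]
    B = JT[:, :m]
    return -np.linalg.solve(B.T, gradf[:m])
```

The least-squares estimate makes the residual $\nabla f + \nabla C\,\lambda$ orthogonal to the range of $\nabla C$, while the basic estimate makes its first $m$ (basic) components vanish exactly.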
The exact Hessian matrix perhaps can be obtained by using automatic differentiation or an adjoint integration approach; see Bischof et al. [1]. However, an approximation to the Hessian of the Lagrangian can be used. Also, for example, setting $H_k$ to a fixed matrix (e.g., $H_k = 0$) for all $k$ is valid. The question of how to use a secant approximation of the Hessian of the Lagrangian in order to produce a more efficient algorithm is a research topic. We believe that Tapia [29] will be of considerable value here.
A related question that has to be looked at is the search for preconditioners to produce more efficient algorithms. We believe that the reducer matrix $W$ should play a role in that search. See Dennis and Lewis [6]. This theory is developed for the equality constrained case, but it can be applied to the general case by one of the strategies known as EQP and IQP. Here, we mean that in the EQP strategy the choice of the active set is made outside the algorithm that determines the step, while in the IQP strategy that choice is made inside the procedure that determines the step. Since the active set may change at each iteration, the choice of the submatrix $B$ will be strongly affected. Certainly, this is an important topic that deserves to be investigated.

11. Acknowledgements. We wish to thank Richard Byrd for many helpful comments and the referees for pointing out many unclear points, which we hope have been clarified. We thank Robert Michael Lewis and the referees for pointing out the importance of dealing with the quasi-normal component. We especially thank Luis Vicente for his careful and insightful reading.
choose $\rho_c$ based on a fraction of Cauchy decrease condition on $\|\nabla C_c^T s + C_c\|^2$. They ask the step to satisfy, for some $r_1 \in (0, 1]$,
$$\|C_c\|^2 - \|C_c + \nabla C_c^T s\|^2 \ge r_1\,\{\|C_c\|^2 - \|\nabla C_c^T s_c^{cp} + C_c\|^2\}.$$
The actual reduction is defined to be
$$Ared_c(s_c; \rho_c) = L(x_c, \lambda_c; \rho_c) - L(x_+, \lambda_+; \rho_c) = \ell(x_c, \lambda_c) - \ell(x_+, \lambda_+) + \rho_c\,(\|C_c\|^2 - \|C_+\|^2), \qquad (5.5)$$
and the predicted reduction is defined to be
$$Pred_c(s_c; \rho_c) = L(x_c, \lambda_c; \rho_c) - Q(s_c; \lambda_c; \rho_c), \qquad (5.6)$$
where
$$Q(s_c; \lambda_c; \rho_c) = \ell(x_c, \lambda_c) + \nabla_x\ell(x_c, \lambda_c)^T s_c + \tfrac{1}{2}s_c^T H_c s_c + (\Delta\lambda_c)^T(C_c + \nabla C_c^T s_c) + \rho_c\,\|C_c + \nabla C_c^T s_c\|^2.$$

5.6. The penalty parameter. Numerical experience with nonlinear programming algorithms that use the augmented Lagrangian as a merit function has shown that good performance of the algorithm depends on keeping the penalty parameter as small as possible. See Gill, Murray, Saunders, and Wright [16]. On the other hand, global convergence theories developed by El-Alem [8], [9] and Powell and Yuan [23],

step 1. (Test for convergence) If $\|W_c^T\nabla_x\ell(x_c)\| + \|C(x_c)\| \le \varepsilon_{tol}$, then terminate. End if
step 2. (Compute a trial step)
If $x_c$ is feasible then
 a) Find a step $s_c^t$ that satisfies a fraction of Cauchy decrease condition on the quadratic model $q_c(s)$ of the Lagrangian around $x_c$. (This might be done by solving a trust-region subproblem, since $s_c^n = 0$ is available. See Section 5.1.)
 b) Set $s_c = s_c^t$.
else (* $C(x_c) \ne 0$ *)
 a) Compute a quasi-normal step $s_c^n$ that satisfies a fraction of Cauchy decrease condition on the square-norm quadratic model of the linearized constraints. (See Section 5.1.)
 b) If $W_c^T\nabla q(s_c^n) = 0$ then set $s_c^t = 0$; else find $s_c^t$ that satisfies a fraction of Cauchy decrease condition on the quadratic model $q_c(s_c^n + s)$ from $s_c^n$. (Perhaps not by solving a specific trust-region subproblem. See Section 5.1.) End if
 c) Set $s_c = s_c^n + s_c^t$.
End if
step 3. (Update $\lambda_c$) Choose an estimate $\lambda_+$ of the Lagrange multiplier vector. Set $\Delta\lambda_c = \lambda_+ - \lambda_c$.
step 4. (Update the penalty parameter)
we set $K_{13} = \min\{\alpha_1, \ldots\}$

Algorithm 9.1. Steihaug-Toint dogleg algorithm for (TRS). Given $x_c$, $\delta_c$, and $\epsilon_c < 1$:
step 0: (Initialization) Set $\hat s_0 = 0$. Set $r_0 = -(G_c\hat s_0 + \nabla f_c)$. Set $d_0 = r_0$. Set $i = 0$.
step 1: Compute $\gamma_i = d_i^T G_c d_i$. If $\gamma_i > 0$, go to step 2. Otherwise (* $d_i$ is a direction of negative or zero curvature *) compute $\tau > 0$ such that $\|\hat s_i + \tau d_i\| = \delta_c$, set $s_c = \hat s_i + \tau d_i$, and terminate.
step 2: Compute $\alpha_i = \|r_i\|^2/\gamma_i$. Set $\hat s_{i+1} = \hat s_i + \alpha_i d_i$. If $\|\hat s_{i+1}\| < \delta_c$, go to step 3. Otherwise (* the step is too long, take the dogleg step *) compute $\tau > 0$ such that $\|\hat s_i + \tau d_i\| = \delta_c$, set $s_c = \hat s_i + \tau d_i$, and terminate.
step 3: Compute $r_{i+1} = r_i - \alpha_i G_c d_i$. If $\|r_{i+1}\|/\|r_0\| \le \epsilon_c$, set $s_c = \hat s_{i+1}$ and terminate.
step 4: Compute $\beta_i = \|r_{i+1}\|^2/\|r_i\|^2$. Set $d_{i+1} = r_{i+1} + \beta_i d_i$. Set $i = i + 1$ and go to step 1.
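A minimal sketch of Algorithm 9.1 in code, with our own names and the standard CG formulas for $\alpha_i$ and $\beta_i$:

```python
import numpy as np

def steihaug_toint(G, g, delta, eps=1e-8, max_iter=None):
    """Steihaug-Toint truncated CG for  min 0.5 s'Gs + g's,  ||s|| <= delta.

    Follows Algorithm 9.1: stop on the trust-region boundary along the
    current direction on negative curvature or when the step grows too
    long; otherwise iterate CG until the residual is reduced by eps.
    """
    n = g.size
    max_iter = max_iter or 2 * n
    s = np.zeros(n)
    r = -(G @ s + g)          # steepest-descent residual at s = 0
    d = r.copy()
    r0_norm = np.linalg.norm(r)
    for _ in range(max_iter):
        gamma = d @ (G @ d)
        if gamma <= 0.0:      # negative or zero curvature: go to the boundary
            return s + _to_boundary(s, d, delta) * d
        alpha = (r @ r) / gamma
        s_next = s + alpha * d
        if np.linalg.norm(s_next) >= delta:   # too long: dogleg to the boundary
            return s + _to_boundary(s, d, delta) * d
        r_next = r - alpha * (G @ d)
        if np.linalg.norm(r_next) <= eps * r0_norm:
            return s_next
        beta = (r_next @ r_next) / (r @ r)
        d = r_next + beta * d
        s, r = s_next, r_next
    return s

def _to_boundary(s, d, delta):
    """Positive tau with ||s + tau d|| = delta."""
    a, b, c = d @ d, 2 * s @ d, s @ s - delta ** 2
    return (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
```

Started from $\hat s_0 = 0$, the iterates have monotonically increasing norm, so stopping at the first boundary crossing loses nothing; this is the property that fails when the iteration is started from $s_c^n \ne 0$ in Section 9.4.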
$\ldots = C_c + B_c s_B + N_c s_N$; and then, if we choose $s_N = 0$ and $s_B = -B_c^{-1}C_c$, a feasible point will be
$$s_c^{lf} = (s_B, s_N)^T = (-B_c^{-1}C_c,\ 0)^T.$$

Algorithm 9.2. The CRG step choice algorithm. Given $x_c \in \Re^n$, $\delta_c > 0$, and $\epsilon_c < 1$:
I. FEASIBILITY:
 1) If $x_c$ is feasible, go to II.
 2) Determine $s_c^n$. (* Use, for example, $s_c^n = s_c^{dog}$, or $s_c^n = \frac{r\delta_c}{\|s_c^{lf}\|}\,s_c^{lf}$ with $s_c^{lf} = (-B_c^{-1}C_c, 0)^T$. *)
II. MINIMIZATION: (* Find $s_c$ by applying the CRG/Steihaug-Toint algorithm, starting from $s = s_c^n$. *)
step 0: (Initialization) Set $\hat s_0 = s_c^n$. Set $r_0 = -W_c^T(H_c s_c^n + \nabla_x\ell_c)$. Set $d_0 = r_0$. Set $i = 0$.
step 1: Compute $\gamma_i = d_i^T W_c^T H_c W_c\,d_i$. If $\gamma_i > 0$, go to step 2; otherwise (* $d_i$ is a direction of negative or zero curvature *) compute $\tau > 0$ such that $\|\hat s_i + \tau W_c d_i\| = \delta_c$, set $s_c = \hat s_i + \tau W_c d_i$, and terminate.
step 2: Compute $\alpha_i = \|r_i\|^2/\gamma_i$. Set $\hat s_{i+1} = \hat s_i + \alpha_i W_c d_i$. If $\|\hat s_{i+1}\| < \delta_c$, go to step 3; otherwise (* the step is too long, take the dogleg step *) compute $\tau > 0$ such that $\|\hat s_i + \tau W_c d_i\| = \delta_c$, set $s_c = \hat s_i + \tau W_c d_i$, and terminate.
step 3: Compute $r_{i+1} = r_i - \alpha_i W_c^T H_c W_c\,d_i$. If $\|r_{i+1}\| \le \epsilon_c\,\|r_0\|$, set $s_c = \hat s_{i+1}$ and terminate.
step 4: Compute $\beta_i = \|r_{i+1}\|^2/\|r_i\|^2$. Set $d_{i+1} = r_{i+1} + \beta_i d_i$. Set $i = i + 1$ and go to step 1.
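The minimization phase of the CRG algorithm can be sketched by running truncated CG in the reduced variable while tracking the full-space step $s = s_c^n + W t$. The names are ours, and the reduced products are formed explicitly only for clarity; a practical implementation would apply $W$ and $H$ as operators.

```python
import numpy as np

def crg_step(H, grad_l, s_n, W, delta, eps=1e-8, max_iter=None):
    """Conjugate reduced-gradient tangential step, a sketch of Algorithm 9.2 II.

    Minimizes the quadratic model over the translated tangent space
    s = s_n + W t subject to ||s|| <= delta, by truncated CG in the
    reduced variable t with Steihaug-Toint boundary handling.
    """
    p = W.shape[1]
    max_iter = max_iter or 2 * p
    s = s_n.copy()
    r = -W.T @ (H @ s + grad_l)      # reduced residual r_0
    d = r.copy()
    r0 = np.linalg.norm(r)
    if r0 == 0.0:
        return s
    for _ in range(max_iter):
        Wd = W @ d                   # lifted direction
        gamma = Wd @ (H @ Wd)        # reduced curvature d' W'HW d
        if gamma <= 0.0:             # negative curvature: stop on the boundary
            return s + _tau(s, Wd, delta) * Wd
        alpha = (r @ r) / gamma
        s_next = s + alpha * Wd
        if np.linalg.norm(s_next) >= delta:   # dogleg to the boundary
            return s + _tau(s, Wd, delta) * Wd
        r_next = r - alpha * (W.T @ (H @ Wd))
        if np.linalg.norm(r_next) <= eps * r0:
            return s_next
        d = r_next + ((r_next @ r_next) / (r @ r)) * d
        s, r = s_next, r_next
    return s

def _tau(s, d, delta):
    """Positive tau with ||s + tau d|| = delta."""
    a, b, c = d @ d, 2 * s @ d, s @ s - delta ** 2
    return (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
```

Because the iteration starts from $s_c^n \ne 0$, the norms $\|\hat s_i\|$ need not grow monotonically, which is exactly the caveat discussed in Section 9.4.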

Lemma 9.4. Let $s_c$ be a step generated by Algorithm 9.2 at the current iterate. Then $s_c$ satisfies a fraction of Cauchy decrease condition on the quadratic model of the linearized constraints, i.e.,
$$\|C_c\|^2 - \|C_c + \nabla C_c^T s_c\|^2 \ge K_2\,\|C_c\|\min\{r\delta_c,\ K_3\|C_c\|\}.$$

Proof. Each Craig iterate satisfies
$$s_i = \arg\min\{\|\nabla C_c^T s + C_c\| : s \in \mathrm{span}\{p_1, \ldots, p_i\}\}.$$
Assume that $\|s_i\| \le r\delta_c$ and $\|s_{i+1}\| \ge r\delta_c$. Therefore $s_c^{dog} = \theta s_i + (1-\theta)s_{i+1}$ with $\theta \in [0, 1]$. It is easy to see that
$$\|\nabla C_c^T s_i + C_c\| \le \|\nabla C_c^T s_c^{cp} + C_c\| \quad\text{and}\quad \|\nabla C_c^T s_{i+1} + C_c\| \le \|\nabla C_c^T s_c^{cp} + C_c\|.$$
By convexity,
$$\|\nabla C_c^T s_c^{dog} + C_c\| \le \|\nabla C_c^T s_c^{cp} + C_c\|.$$
Thus
$$\|C_c\|^2 - \|C_c + \nabla C_c^T s_c^{dog}\|^2 \ge \|C_c\|^2 - \|C_c + \nabla C_c^T s_c^{cp}\|^2,$$
and we can apply Lemma 2.1. Now suppose that $s_c^n$ is given by $s_c^n = \alpha_c\,s_c^{lf}$ with $\alpha_c = r\delta_c/\|s_c^{lf}\|$ when $\|s_c^{lf}\| > r\delta_c$ and $\alpha_c = 1$ otherwise. When $\alpha_c = 1$, we have
$$\|C_c\|^2 - \|\nabla C_c^T s_c^n + C_c\|^2 = \|C_c\|^2 - \|\nabla C_c^T s_c^{lf} + C_c\|^2 = \|C_c\|^2.$$
When $\alpha_c < 1$, we have
$$\|C_c\|^2 - \|C_c + \nabla C_c^T s_c^n\|^2 = \|C_c\|^2 - \|C_c + \alpha_c\nabla C_c^T s_c^{lf}\|^2 \ge \|C_c\|^2 - [\,(1-\alpha_c)\|C_c\| + \alpha_c\|C_c + \nabla C_c^T s_c^{lf}\|\,]^2 = [\,1 - (1-\alpha_c)^2\,]\,\|C_c\|^2 \ge \alpha_c\,\|C_c\|^2.$$

Lemma 7.7. Assume that the algorithm does not terminate at the current iterate.