This thesis proposes and analyzes a new parallel-in-time gradient-type method for time-dependent optimal control problems. When the classical gradient method is applied to time-dependent optimal control problems, each iteration requires the forward solution of the state equations followed by the backward solution of the adjoint equations before the gradient can be computed and the controls can be updated. The solution of the state equations and the adjoint equations is computationally expensive, is carried out sequentially in the time dimension, and consumes most of the computation time. The proposed new parallel-in-time gradient-type method introduces parallelism by splitting the time domain into N subdomains and executes the forward and backward computation in each time subdomain in parallel using state and adjoint variables at time subdomain boundaries from the last optimization iteration as initial values. Ignoring communication cost and assuming computation load balance, N parallel-in-time gradient-type iterations can be executed in the computing time required by a single classical gradient iteration. The basic proposed parallel-in-time gradient-type method is also generalized to allow for different time domain partitions for forward and backward computations and overlapping time subdomains.
Due to the time domain decomposition, the state and the adjoint equations are not satisfied, since states and adjoints exhibit jump discontinuities at the time subdomain boundaries. Therefore the control update direction in the parallel gradient-type method is not a negative gradient direction and existing convergence theories for the gradient method do not apply. I provide convergence analyses for the new parallel-in-time gradient-type method applied to discrete-time optimal control problems. For linear-quadratic discrete-time optimal control problems, I show that the new parallel-in-time gradient-type method can be interpreted as a multiple-part splitting iteration scheme where control update in one iteration is determined by control variable iterates from multiple previous iterations, and I prove convergence of the new method for sufficiently small fixed step size by showing that the spectral radius of a corresponding implicitly constructed iteration matrix is less than one. For general non-linear discrete-time optimal control problems the parallel-in-time gradient-type method is combined with metric projection onto a closed convex set to handle simple control constraints. Convergence is proved for sufficiently small step sizes. Convergence theorems are given with different assumptions on the problem, such as convex objective function or compact control constraints. For linear-quadratic optimal control problems, I also interpret the parallel-in-time gradient-type method using a multiple shooting reformulation of the optimization problem. The new method can be seen as using a gradient-type iteration to solve the optimality saddle point system in the multiple shooting formulation. An alternative convergence proof is given for linear-quadratic problems by the multiple shooting point of view.
The new parallel-in-time gradient-type method is applied to linear-quadratic optimal control problems governed by linear advection-diffusion partial differential equations (PDEs), and to a well-rate optimization problem governed by a system of highly non-linear PDEs that models two-phase immiscible incompressible subsurface flow. For linear-quadratic optimal control problem, each iteration of the parallel gradient-type method with N time subdomains takes roughly 1/N-th of the time required for one iteration of the classical gradient method. For moderate N (up to N=50 in one example) time subdomains, the parallel gradient-type method converges in approximately the same number of iterations as the classical gradient method and thus exhibits excellent scaling. For larger N, the parallel gradient-type method may use significantly more iterations than the classical gradient method, which negatively impacts scaling for large N. The parallelization in time is on top of parallelization already used to solve the state and adjoint equations (e.g., through parallel linear solvers/preconditioners). This is exploited for the larger and more complex well-rate optimization problem. If existing parallelism in space scales well up to K processors, the addition of time domain decomposition scales well up to K * N processors for small to moderate number N of time subdomains.