Generalizations of the Alternating Direction Method of Multipliers for Large-Scale and Distributed Optimization
Doctor of Philosophy
Due to the dramatically increasing demand for dealing with "Big Data", efficient and scalable computational methods are highly desirable to cope with the size of the data. The alternating direction method of multipliers (ADMM), as a versatile algorithmic tool, has proven to be very effective at solving many large-scale and structured optimization problems, particularly arising from the areas of compressive sensing, signal and image processing, machine learning and applied statistics. Moreover, the algorithm can be implemented in a fully parallel and distributed manner to process huge datasets. These benefits have mainly contributed to the recent renaissance of ADMM for modern applications. This thesis makes important generalizations to ADMM to improve its flexibility and efficiency, as well as extending its convergence theory. Firstly, we allow more options of solving the subproblems either exactly or approximately, such as linearizing the subproblems, taking one gradient descent step, and approximating the Hessian. Often, when subproblems are expensive to solve exactly, it is much cheaper to compute approximate solutions to the subproblems which are still good enough to guarantee convergence. Although it may take more iterations to converge due to less accurate subproblems, the entire algorithm runs faster since each iteration takes much less time. Secondly, we establish the global convergence of these generalizations of ADMM. We further show the linear convergence rate under a variety of scenarios, which cover a wide range of applications in practice. Among these scenarios, we require that at least one of the two objective functions is strictly convex and has Lipschitz continuous gradient, along with certain full rank conditions on the constraint coefficient matrices. The derived rate of convergence also provides some theoretical guidance for optimizing the parameters of the algorithm. In addition, we introduce a simple technique to improve an existing convergence rate from O(1/k) to o(1/k). Thirdly, we introduce a parallel and multi-block extension to ADMM for solving convex separable problems with N blocks of variables. The algorithm decomposes the original problem into N smaller subproblems and solves them in parallel at each iteration. It is well suited to distributed computing and is particularly attractive for solving certain large-scale problems. We show that extending ADMM straightforwardly from the classic Gauss-Seidel setting to the Jacobi setting, from 2 blocks to N blocks, will preserve convergence if the constraint coefficient matrices are mutually near-orthogonal and have full column-rank. For general cases, we propose to add proximal terms of different kinds to the N subproblems so that they can be solved in flexible and efficient ways and the algorithm converges globally at a rate of o(1/k). We introduce a strategy for dynamically tuning the parameters of the algorithm, often leading to substantial acceleration of the convergence in practice. Numerical results are presented to demonstrate the efficiency of the proposed algorithm in comparison with several existing parallel algorithms. We also implemented our algorithm on Amazon EC2, an on-demand public computing cloud, and report its performance on very large-scale basis pursuit problems with distributed data.
Alternating direction method of multipliers; Convergence rate; Parallel and distributed optimization