The alternating direction method of multipliers (ADMM) has been revived in recent years due to its effectiveness at solving many large-scale and distributed optimization problems, particularly arising from the areas of compressive sensing, signal and image processing, machine learning and statistics. This thesis makes important generalizations to ADMM as well as extending its convergence theory. We propose a generalized ADMM framework that allows more options of solving the subproblems, either exactly or approximately. Such generalization is of great practical importance because it brings more flexibility in solving the subproblems efficiently, thereby making the entire algorithm run faster and scale better for large problems. We establish its global convergence and further show its linear convergence under a variety of scenarios, which cover a wide range of applications. The derived rate of convergence also provides some theoretical guidance for optimizing the parameters of the algorithm. In addition, we introduce a simple technique to improve an existing convergence rate from O(1/k) to o(1/k). Moreover, we introduce a parallel and multi-block extension to ADMM for solving convex separable problems with multiple blocks of variables. The algorithm decomposes the original problem into smaller subproblems and solves them in parallel at each iteration. It can be implemented in a fully distributed manner and is particularly attractive for solving certain large-scale problems. We show that extending ADMM straightforwardly from the classic Gauss-Seidel setting to the Jacobi setting, from two blocks to multiple blocks, will preserve convergence if the constraint coefficient matrices are mutually near-orthogonal and have full column-rank. For general cases, we propose to add proximal terms of different kinds to the subproblems, so that they can be solved in flexible and efficient ways and the algorithm converges globally at a rate of o(1/k). We introduce a strategy for dynamically tuning the parameters of the algorithm, often leading to faster convergence in practice. Numerical results are presented to demonstrate the efficiency of the proposed algorithm in comparison with several existing parallel algorithms. We also implemented our algorithm on Amazon EC2, an on-demand public computing cloud, and report its performance on very large-scale basis pursuit problems with distributed data.