Sequential quantization is a constrained quantization method in which the elements of a real-valued vector are sequentially mapped (quantized) onto a finite set. These quantization systems are constrained in that they are not allowed to process their data jointly; information is, however, shared to varying degrees among the quantizers. This thesis focuses on the design of these systems when the quantized data are used as the input to a classifier. The performance criteria are the f-divergences, whose connections to optimal classification are well known. Priority is placed on understanding how a sequential quantizer's structure affects performance, estimation strategies, and computational complexity. Structure serves as a unifying concept that can help assess the benefits, if any, of inter-quantizer communication in sequential systems.

Four nonparametric estimation strategies are proposed and analyzed. The conditional estimation strategy mirrors the operation of sequential quantization structures and successively maximizes conditional divergences. The local estimation strategy optimizes the marginal divergences associated with the outputs of the component quantizers. Both strategies decompose into simpler optimization problems with known solutions and, though generally suboptimal, can produce optimal estimates. Conditional and local estimates also automatically satisfy a sequential quantizer's structural constraints and are highly scalable. The joint estimation strategy is based on a uniform fine-resolution partition, simulated annealing, and mechanisms that enforce a sequential quantizer's structural constraints. Compared with the conditional and local strategies, the joint method produces superior estimates but is more computationally demanding. This computational burden is, however, tempered by the sequential quantizer's structure, making the joint strategy a practical design method in some scenarios.
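To make the structural constraint concrete, the following is a minimal sketch (not taken from the thesis) of a two-stage sequential quantizer in Python. The function name, thresholds, and the one-index message passed between stages are all illustrative assumptions: the first quantizer sees only the first coordinate, and the second quantizer sees only the second coordinate plus the first quantizer's output, so the pair is never processed jointly.

```python
import numpy as np

def sequential_quantize(x, thresholds1, thresholds2_by_msg):
    """Hypothetical two-stage sequential quantizer (illustration only).

    The first quantizer maps x[0] to a cell index using its own
    thresholds; that index is the only information communicated to
    the second quantizer, which selects its thresholds accordingly.
    No joint processing of (x[0], x[1]) occurs.
    """
    # First quantizer: scalar quantization of the first coordinate.
    i1 = int(np.searchsorted(thresholds1, x[0]))
    # Second quantizer: thresholds depend only on the message i1.
    i2 = int(np.searchsorted(thresholds2_by_msg[i1], x[1]))
    return (i1, i2)

# Example: the first quantizer has two cells; the second adapts its
# threshold to the message received from the first.
t1 = [0.0]
t2 = {0: [-0.5], 1: [0.5]}
print(sequential_quantize([0.3, 0.7], t1, t2))  # -> (1, 1)
```

Richer inter-quantizer communication corresponds to passing more information than the single index `i1`; removing the communication entirely reduces the scheme to independent scalar quantizers.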
Finally, we construct an empirical estimator using empirical risk minimization. It is shown that the estimation loss, that is, the divergence loss incurred by using an empirical estimate relative to the optimal sequential quantizer, decays no worse than n^{-1/2}, where n denotes the number of observations from each distribution. It is also shown that rates as fast as n^{-1} are possible under a particular assumption on the underlying distributions.