Robust Discriminant Analysis and Clustering by a Partial Minimum Integrated Squared Error Criterion
Adler, Yeshaya Adam
Scott, David W.
Doctor of Philosophy
In parametric supervised classification and unsupervised clustering, traditional methods are often inadequate when the data depart from normality assumptions. Basu et al. (1998) introduced a class of density power divergences to alleviate these problems. This class of estimators is indexed by a parameter α that balances efficiency against robustness. It includes maximum likelihood as the limiting case α ↓ 0, and the special case α = 1, known as L2E (Scott, 2001), which has been studied for its robustness properties. In this thesis, we develop two methods that use L2E estimation to perform discriminant analysis and modal clustering. Robust versions of discriminant analysis built on the Bayesian model usually replace the maximum likelihood estimates with robust alternatives plugged into the discriminant rule. We develop a robust discriminant analysis that does not rely on multiple plug-in estimates but instead estimates the model parameters jointly. We apply these methods to simulated and applied data and show them to be robust to departures from normality. In the second application, we explore the problem of obtaining all possible modes of a kernel density estimate. We introduce a clustering method based on the stochastic mode tree, originally developed in an unpublished manuscript of Scott and Szewczyk (2000). This method applies the multivariate partial density component L2E estimator of Scott (2004) to locally probe the data and find all potential modes of a density. We provide an efficient implementation of the stochastic mode tree, which is repurposed to cluster the data according to its modal hierarchy. We explore the behavior of this clustering method with simulations and applied data. We develop an interactive exploratory visualization tool that relates the modal clustering of a density to the optimal weights of individual partial density components. We show how this tool can be used to interactively prune the stochastic mode tree to obtain a desired cluster hierarchy. Finally, we show that our hierarchical mode clustering is useful in image thresholding and segmentation.
Statistics; Discriminant Analysis; Mode Finding