A STUDY OF PROJECTION PURSUIT METHODS (MULTIVARIATE STATISTICS, DIMENSION REDUCTION, DENSITY ESTIMATION, GRAPHICS, ENTROPY)
JEE, JAMES RODNEY
Doctor of Philosophy
A standard method for analyzing high-dimensional multivariate data is to view scatter plots of 2-dimensional projections of the data. Since not all projections are equally informative, and the number of significantly different 2-dimensional projections in a high-dimensional space can be large, there is a need for computer algorithms that automatically determine the most informative projections for viewing. When the data are assumed to be a sample from a population density, it is natural to measure the information content of a projection by evaluating the Shannon entropy or the Fisher information of the marginal density corresponding to the projection. Because the population density is unknown, the techniques of nonparametric probability density estimation can be employed to estimate it, thereby providing a means of extracting a well-known measure of information from a projection of a sample. A theoretical study of algorithms based on these ideas suggests that Fisher information is a slightly better measure of information for use in projection pursuit. In the data-based algorithms, both the Shannon entropy and the Fisher information measures are calculated from computationally efficient oversmoothed histograms. Application of the algorithms to real data sets indicates that these methods are very promising.
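The idea of scoring a 2-dimensional projection by a histogram-based entropy estimate can be illustrated with a minimal sketch. It is not the algorithm developed in the dissertation: the function names, the fixed 16 x 16 grid standing in for an oversmoothed histogram, the random search over planes, and the synthetic data are all assumptions made for illustration. The sketch also assumes that the data are sphered first and that, at fixed covariance, a lower Shannon entropy marks a more structured projection, since the Gaussian density has maximal entropy for a given covariance. The Fisher information criterion is not shown.

```python
import numpy as np


def make_demo_data(n=2000, seed=0):
    """Hypothetical test data: coordinate 0 is bimodal, coordinates 1-3 are Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, 4))
    x[:, 0] = 0.5 * x[:, 0] + rng.choice([-3.0, 3.0], size=n)
    return x


def sphere(x):
    """Center and whiten the data so every projection has identity sample covariance."""
    xc = x - x.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(xc, rowvar=False))
    whitener = (vecs * vals ** -0.5) @ vecs.T  # symmetric inverse square root
    return xc @ whitener


def histogram_entropy(z, bins=16, half_width=4.0):
    """Plug-in Shannon entropy estimate of a 2-D sample from a coarse fixed-grid histogram.

    The grid is an assumed stand-in for an oversmoothed histogram; points
    falling outside [-half_width, half_width]^2 are ignored.
    """
    edges = np.linspace(-half_width, half_width, bins + 1)
    counts, _, _ = np.histogram2d(z[:, 0], z[:, 1], bins=[edges, edges])
    p = counts[counts > 0] / counts.sum()        # occupied-cell probabilities
    cell_area = (2.0 * half_width / bins) ** 2
    # Density estimate in a cell is p / cell_area, so H = -sum p * log(p / cell_area).
    return float(-(p * np.log(p / cell_area)).sum())


def pursue(x, n_trials=200, seed=0):
    """Random search for the plane whose projection has the lowest estimated entropy."""
    rng = np.random.default_rng(seed)
    z = sphere(x)
    best_plane, best_h = None, np.inf
    for _ in range(n_trials):
        # Orthonormal basis of a random plane via the reduced QR decomposition.
        plane, _r = np.linalg.qr(rng.standard_normal((z.shape[1], 2)))
        h = histogram_entropy(z @ plane)
        if h < best_h:
            best_plane, best_h = plane, h
    return best_plane, best_h
```

Calling pursue(make_demo_data()) returns a 4 x 2 orthonormal basis together with its entropy score. A practical algorithm would replace the random search with a numerical optimizer and choose the bin width by an oversmoothing rule rather than a fixed grid.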