On the operating characteristics of some non-parametric methodologies for the classification of distributions by tail behavior
Ott, Richard Charles
Doctor of Philosophy
New methods for classifying the tails of probability distributions from data are proposed. Some of the methods apply the nonparametric tail classification theories of Rojo and Schuster and differ from classical extreme value theory and other well-established methods. All of the methods use the extreme spacing of the data, the difference between the largest and second-largest observations. The power of the new methods is then compared with that of the classical Points Over Threshold approach based on the Generalized Pareto Distribution (GPD). The thesis is organized as follows.
Chapter 1. Review of classical extreme value theory and discussion of the class of medium-tailed distributions.
Chapter 2. Review of the tail classification schemes of Parzen, Schuster, and Rojo; the latter two suggest the Extreme Spacing (ES) as a possible classification tool. Additional subcategorizations are also provided for the schemes of Schuster and Rojo.
Chapter 3. Review of methods for estimating the Points Over Threshold GPD parameters for classification purposes, together with a Monte Carlo study in which the tails of many common distributions are classified by fitting the GPD by maximum likelihood.
Chapter 4. Three classification tests based on the ES. The first tests whether a sample originates from a completely specified distribution, such as Exp(1). The second tests whether the data originate from an exponential distribution with unknown parameter. The third classifies the underlying distribution as short-, medium-, or long-tailed. The potential benefit of blocking the data before applying these tests is also discussed.
Chapter 5. Classification of specific data sets using the new methods.
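To make the extreme spacing concrete, a minimal Python sketch follows; it is an illustration, not the test developed in the thesis. It relies on a standard fact: for an i.i.d. Exp(1) sample of any size n >= 2, the extreme spacing X(n) - X(n-1) is itself exactly Exp(1) distributed (a consequence of Rényi's representation of exponential order statistics). A one-sided p-value for a completely specified Exp(1) null hypothesis is therefore exp(-ES). The function names and the one-sided form of the test are assumptions made for illustration only.

```python
import math
import random


def extreme_spacing(sample):
    """Return the extreme spacing X(n) - X(n-1) of a sample.

    X(n) and X(n-1) are the largest and second-largest observations.
    """
    if len(sample) < 2:
        raise ValueError("the extreme spacing needs at least two observations")
    second_largest, largest = sorted(sample)[-2:]
    return largest - second_largest


def exp1_es_pvalue(sample):
    """Illustrative one-sided p-value for H0: the sample is i.i.d. Exp(1).

    Under H0 the extreme spacing is exactly Exp(1), so P(ES >= es) = exp(-es).
    A small p-value means the gap between the two largest observations is
    unusually large for Exp(1), as would be expected from a longer tail.
    """
    return math.exp(-extreme_spacing(sample))


def simulated_mean_es(n, reps, seed=0):
    """Monte Carlo average of the ES over `reps` Exp(1) samples of size n.

    Because the ES is Exp(1) under H0 for every n, this should be close to 1.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        total += extreme_spacing(sample)
    return total / reps


if __name__ == "__main__":
    data = [0.5, 2.0, 1.25, 4.5]
    print(extreme_spacing(data))  # 2.5
    print(exp1_es_pvalue(data))   # exp(-2.5), about 0.0821
    print(simulated_mean_es(n=50, reps=5000))
```

For example, extreme_spacing([0.5, 2.0, 1.25, 4.5]) returns 2.5, giving a p-value of exp(-2.5), roughly 0.082. The simulation helper checks numerically that the ES has mean 1 under the Exp(1) null regardless of the sample size.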
Some of the new ES methods can be applied when classical methods cannot, for example when the numerical algorithm for GPD maximum likelihood estimation fails to converge to a shape parameter estimate, or when the variance of the shape parameter estimator cannot be estimated because the estimate lies close to an endpoint of the parameter space. Even when classical methods are applicable, these tests can give a more thorough understanding of the tail behavior of the underlying distribution.