Dyadic decision trees
Nowak, Robert D.
Doctor of Philosophy
This thesis introduces a new family of classifiers called dyadic decision trees (DDTs) and develops their theoretical properties within the framework of statistical learning theory. First, we show that DDTs achieve optimal rates of convergence for a broad range of classification problems and are adaptive in three important respects: They automatically (1) adapt to favorable conditions near the Bayes decision boundary; (2) focus on data distributed on lower dimensional manifolds; and (3) reject irrelevant features. DDTs are selected by penalized empirical risk minimization using a new data-dependent penalty and may be computed exactly and efficiently. DDTs are the first practical classifier known to achieve optimal rates for the diverse class of distributions studied here. This is also the first study (of which we are aware) to consider rates for adaptation to data dimension and relevant features.

Second, we develop the theory of statistical learning using the Neyman-Pearson (NP) criterion. It is shown that concepts from learning with a Bayes error criterion have counterparts in the NP context. Thus, we consider constrained versions of empirical risk minimization and structural risk minimization (NP-SRM), proving performance guarantees for both. We also provide a general condition under which NP-SRM leads to strong universal consistency. Finally, we apply NP-SRM to dyadic decision trees, deriving rates of convergence and providing an explicit algorithm to implement NP-SRM in this setting.

Third, we study the problem of pruning a binary tree by minimizing an objective function that sums an additive cost with a non-additive penalty depending only on tree size. We focus on sub-additive penalties, which are motivated by theoretical results for dyadic and other decision trees. Consider the family of optimal prunings generated by varying the scalar multiplier of a sub-additive penalty. We show this family is a subset of the analogous family produced by an additive penalty.
This implies (by known results for additive penalties) that the trees generated by a sub-additive penalty (1) are nested; (2) are unique; and (3) can be computed efficiently. It also implies that an additive penalty is preferable when using cross-validation to select from the family of possible prunings.
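To make the additive case concrete: when the penalty is additive (a fixed charge lam per leaf, as in classical cost-complexity pruning), the optimal pruning can be found by a single bottom-up dynamic program over the tree. The sketch below is illustrative only and is not the thesis's algorithm; the `Node` structure and the per-node `cost` field (e.g., empirical misclassification count if the node were made a leaf) are assumptions for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Node:
    cost: float                     # additive cost if this node is made a leaf
    left: Optional["Node"] = None   # None for both children => already a leaf
    right: Optional["Node"] = None

def prune(node: Node, lam: float) -> Tuple[float, Node]:
    """Optimal pruning for the objective: sum of leaf costs + lam * (number of leaves).

    Returns (optimal objective value of the subtree, pruned copy of the subtree).
    Runs in one bottom-up pass, O(tree size).
    """
    leaf_value = node.cost + lam          # option 1: collapse this subtree to a leaf
    if node.left is None and node.right is None:
        return leaf_value, Node(node.cost)
    lv, lt = prune(node.left, lam)        # option 2: keep the split and recurse
    rv, rt = prune(node.right, lam)
    if lv + rv < leaf_value:
        return lv + rv, Node(node.cost, lt, rt)
    return leaf_value, Node(node.cost)
```

Sweeping lam from 0 upward yields the nested family of optimal prunings; the subset result above says a sub-additive penalty's optimal prunings all appear in this family, which is why they inherit nestedness, uniqueness, and efficient computability.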
Electronics; Electrical engineering