CORT: Classification Or Regression Trees
Nowak, Robert David
classification; multiscale; risk; CART
In this paper we challenge three of the underlying principles of CART, a well know approach to the construction of classification and regression trees. Our primary concern is with the penalization strategy employed to prune back an initial, overgrown tree. We reason, based on both intuitive and theoretical arguments, that the pruning rule for classification should be different from that used for regression (unlike CART). We also argue that growing a treestructured partition that is specifically fitted to the data is unnecessary. Instead, our approach to tree modeling begins with a nonadapted (fixed) dyadic tree structure and partition, much like that underlying multiscale wavelet analysis. We show that dyadic trees provide sufficient flexibility, are easy to construct, and produce near-optimal results when properly pruned. Finally, we advocate the use of a negative log-likelihood measure of empirical risk. This is a more appropriate empirical risk for non-Gaussian regression problems, in contrast to the sum-of-squared errors criterion used in CART regression.