In this Thesis we study issues related to learning small tree and graph formed classifiers. First, we study reduced error pruning of decision trees and branching programs. We analyze the behavior of a reduced error pruning algorithm for decision trees under various probabilistic assumptions on the pruning data. As a result we get, e.g., new upper bounds for the probability of replacing a tree that fits random noise by a leaf. In the case of branching programs we show that the existence of an efficient approximation algorithm for reduced error pruning would imply P$=$NP. This indicates that reduced error pruning of branching programs is most likely impossible in practice, even though the corresponding problem for decision trees is easily solvable in linear time.

The latter part of the Thesis is concerned with generalization error analysis, more particularly on Rademacher penalization applied to small or otherwise restricted decision trees. We develop a progressive sampling method based on Rademacher penalization that yields reasonable data dependent sample complexity estimates for learning two-level decision trees. Next, we propose a new scheme for deriving generalization error bounds for prunings of induced decision trees. The method for computing these bounds efficiently relies on the reduced error pruning algorithm studied in the first part of this Thesis. Our empirical experiments indicate that the obtained training set bounds may be almost tight enough to be useful in practice.