Robust objective functions and stable learning algorithms

Matthew James Holland (1561025)


In this thesis, we look at designing algorithms for statistical inference which satisfy much stronger reliability requirements than are typically sought. Modern machine learning has a large and diverse user base, including many statistical laymen. When checking formal model assumptions is uneconomical or simply infeasible, a potential solution is to design highly robust algorithms that adapt to the data in such a way that they effectively automate the checking of assumptions on the back end. Our goal here is to provide both theoretical and empirical foundations for a methodology that can realize such robust performance.

We begin by introducing key concepts and an overarching model for evaluating algorithm performance. We then turn to the technical content, starting with the moment estimation task, in which the data may be sub-Gaussian or heavy-tailed, and the learning algorithm has no a priori knowledge of which is the case. We carry out theoretical and numerical analysis of a useful class of M-estimators, deriving practical routines and testing them in comprehensive numerical simulations that emulate the non-parametric setting of interest.
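To make the estimation task concrete, the following is a minimal illustrative sketch of an M-estimator of the mean, using a Catoni-style bounded influence function and a heuristic scale choice; the function names and tuning below are placeholders for illustration, not the specific routines developed in the thesis.

```python
import numpy as np

def psi(u):
    # Bounded, non-decreasing influence function of Catoni type:
    # behaves like u near zero, grows only logarithmically in the tails.
    return np.sign(u) * np.log1p(np.abs(u) + 0.5 * u ** 2)

def m_estimate_mean(x, scale=None, n_iter=50):
    # Fixed-point iteration for the root of (1/n) * sum_i psi((x_i - theta) / s) = 0.
    # With psi(u) = u this reduces to the sample mean in a single step.
    x = np.asarray(x, dtype=float)
    if scale is None:
        scale = np.std(x) * np.sqrt(len(x)) + 1e-12  # crude heuristic scale
    theta = np.median(x)  # robust initialization
    for _ in range(n_iter):
        theta = theta + scale * np.mean(psi((x - theta) / scale))
    return theta

# Example: a heavy-tailed sample on which the empirical mean is unstable.
rng = np.random.default_rng(0)
sample = rng.standard_t(df=2.1, size=500)
print(np.mean(sample), m_estimate_mean(sample))
```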

Moving forward, we gradually raise the level of complexity, turning to function approximation under a linear model with additive stochastic noise. Results of the previous chapter are used to construct a robust loss to be minimized in general regression tasks, and practical computational algorithms are proposed. Novel formal analysis and comprehensive numerical testing of the robust loss minimizers are carried out. The off-sample prediction error of the proposed approach is shown to dominate that of standard benchmarks, both classical and modern, uniformly over a wide variety of data classes.
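As a rough illustration of this style of procedure (not the specific loss constructed in the thesis), one can minimize a smooth, bounded-influence loss of scaled residuals by gradient descent; the pseudo-Huber derivative, fixed scale, and step size below are placeholder choices.

```python
import numpy as np

def rho_grad(u):
    # Derivative of a smooth pseudo-Huber-type loss: close to u for small
    # residuals, bounded for large ones, so outliers have limited influence.
    return u / np.sqrt(1.0 + u ** 2)

def robust_linear_regression(X, y, scale=1.0, lr=0.5, n_iter=1000):
    # Plain gradient descent on (1/n) * sum_i rho((x_i @ w - y_i) / scale).
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        r = (X @ w - y) / scale
        grad = X.T @ rho_grad(r) / (n * scale)
        w -= lr * grad
    return w
```

In practice the residual scale and step size would be set from the data rather than fixed in advance.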

Finally, we treat high-dimensional sparse regression. Rather than non-convex robust loss minimizers, here we consider l1-constrained empirical quasi-risk minimizers. Theoretical groundwork for this class of algorithms is provided, efficient new algorithms are derived, and their performance is evaluated in a series of simulations. Strong performance is demonstrated not only in terms of prediction error, but also with respect to accurate signal recovery and variable selection.
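For intuition only, an l1-constrained estimator can be computed by projected gradient descent onto the l1-ball; the squared error below merely stands in for the quasi-risk treated in the thesis, and the projection follows the standard sorting-based construction.

```python
import numpy as np

def project_l1_ball(v, radius):
    # Euclidean projection onto {w : ||w||_1 <= radius} via sort-and-threshold.
    u = np.abs(v)
    if u.sum() <= radius:
        return v
    s = np.sort(u)[::-1]
    cumsum = np.cumsum(s) - radius
    k = np.arange(1, len(s) + 1)
    rho = np.nonzero(s - cumsum / k > 0)[0][-1]
    tau = cumsum[rho] / (rho + 1.0)
    return np.sign(v) * np.maximum(u - tau, 0.0)

def l1_constrained_regression(X, y, radius, n_iter=500):
    # Projected gradient descent; squared loss is a placeholder objective.
    n, d = X.shape
    lr = n / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = project_l1_ball(w - lr * grad, radius)
    return w
```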