Title: : Algorithms for robust and heavy-tailed statistics -- theory and experiment
Abstract: Algorithms for statistics on corrupted or heavy-tailed data have seen a flurry activity in the last few years. (Indeed, even the last few months!) I will survey some recent developments, and then zoom in on joint work with Yihe Dong and Jerry Li in which we focus on translating the progress in polynomial-time algorithms into something practical. In particular, we obtain the first nearly-linear time algorithm for robust mean estimation in high dimensions, where the goal is to estimate the mean of a random vector from independent samples of which a constant fraction have been maliciously corrupted. Our algorithm is sufficiently practical that our implementation scales to thousands of dimensions and tens/hundreds of thousands of samples on laptop hardware; I will discuss some experimental validations of our theoretical results.
Based on "Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection," to appear in NeurIPS 2019. https://arxiv.org/pdf/1906.11366.pdf