- When: Friday, October 26, 2018 from 11:00 AM to 12:00 PM
- Speakers: Martin Slawski
- Location: ENGR 4201
Abstract
A tacit assumption in regression is that labels and predictors are in correspondence with each other. However, this need not be true, for example when labels and predictors have been collected separately and merged into a single data set in the presence of linkage errors. Motivated by this setting, we study linear regression under an unknown permutation that perturbs the correspondence between labels and predictors for a subset of the given data. We present and analyze two-stage methods that first estimate the regression parameter by treating permuted data as data contamination, and then try to recover the unknown permutation by solving a linear assignment problem. This approach is shown to be both computationally attractive and statistically optimal in certain regimes.
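The two-stage idea in the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions, not the speaker's method: the abstract does not name the estimator, so stage one uses Huber regression fitted by iteratively reweighted least squares as a stand-in robust estimator, and stage two matches labels to fitted values with SciPy's `linear_sum_assignment`. The function names, the tuning constant 1.345, and the simulated data are all hypothetical choices made for this sketch.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def simulate(n=200, frac=0.2, sigma=0.01, seed=0):
    """Hypothetical test data: a linear model whose labels are shuffled on a subset."""
    rng = np.random.default_rng(seed)
    beta_true = np.array([1.0, -2.0, 0.5])
    X = rng.normal(size=(n, beta_true.size))
    # Permute the labels only within a random subset of the records.
    idx = rng.choice(n, size=int(frac * n), replace=False)
    pi = np.arange(n)
    pi[idx] = rng.permutation(idx)
    y = (X @ beta_true)[pi] + sigma * rng.normal(size=n)
    return X, y, beta_true, pi


def huber_irls(X, y, delta=1.345, n_iter=50):
    """Stage 1 (assumed stand-in): Huber regression via iteratively
    reweighted least squares, treating mismatched pairs as outliers."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust scale estimate from the median absolute deviation.
        scale = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        u = np.abs(r) / scale
        # Huber weights: 1 for small residuals, delta / u for large ones.
        w = np.where(u <= delta, 1.0, delta / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta


def recover_permutation(X, y, beta):
    """Stage 2: match each label to a predictor row by solving a
    linear assignment problem with squared-error costs."""
    fitted = X @ beta
    # cost[i, j] = squared error of pairing label i with predictor row j.
    cost = (y[:, None] - fitted[None, :]) ** 2
    _, cols = linear_sum_assignment(cost)
    return cols  # cols[i] is the row of X matched to y[i]
```

Calling `simulate()`, then `huber_irls(X, y)`, then `recover_permutation(X, y, beta_hat)` gives an estimated coefficient vector and a matching `match`, where `match[i]` is the predictor row assigned to label `y[i]`. With noisy labels the matching need not equal the true permutation exactly.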
Bio
Martin Slawski joined the Department of Statistics at George Mason University as an assistant professor in Fall 2016. Prior to that, he spent two years as a postdoc at Rutgers University, hosted by both the Department of Statistics and the Department of Computer Science. He received his PhD in Germany with a thesis in the field of statistical machine learning. His main research interests include structured and compressed representations of high-dimensional data, record linkage and data integration, and optimization in statistical settings.