Model-X knockoffs for controlled variable selection in high dimensional nonlinear regression

Start: 2018-11-16T11:00:00-05:00
End: 2018-11-16T12:00:00-05:00
Location: E18-304

November 16, 2018 @ 11:00 am - 12:00 pm

Lucas Janson (Harvard University)


Abstract: Many contemporary large-scale applications, from genomics to advertising, involve linking a response of interest to a large set of potential explanatory variables in a nonlinear fashion, such as when the response is binary. Although this modeling problem has been extensively studied, it remains unclear how to effectively select important variables while controlling the fraction of false discoveries, even in high-dimensional logistic regression, not to mention general high-dimensional nonlinear models. To address such a practical problem, we propose a new framework of model-X knockoffs, which reinterprets the knockoff procedure (Barber and Candès, 2015), originally designed for controlling the false discovery rate in low-dimensional linear models. Model-X knockoffs can deal with arbitrary (and unknown) conditional models and any dimension, including when the number of explanatory variables p exceeds the sample size n. Our approach requires that the design matrix be random (with independent and identically distributed rows) and that the distribution of the explanatory variables be known, although we show preliminary evidence that our procedure is robust to unknown or estimated distributions. As we require no knowledge of, or assumptions about, the conditional distribution of the response, we effectively shift the burden of knowledge from the response to the explanatory variables, in contrast to the canonical model-based approach, which assumes a parametric model for the response but very little about the explanatory variables. To our knowledge, no other procedure solves the controlled variable selection problem in such generality, but in the restricted settings where competitors exist, we demonstrate the superior power of knockoffs through simulations. We also apply our procedure to data from a case-control study of Crohn’s disease in the United Kingdom, making twice as many discoveries as the original analysis of the same data.
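
For readers unfamiliar with the procedure described in the abstract, the following is a minimal sketch of a model-X knockoff filter in Python, not the speaker's implementation. It rests on several assumptions that are not in the abstract: the explanatory variables are Gaussian with mean zero and a known correlation matrix `Sigma`, the knockoffs use the equicorrelated construction, and the importance statistic is a simple marginal-correlation difference chosen for brevity (other statistics, such as lasso-based ones, are possible and may be more powerful). All function names are hypothetical.

```python
import numpy as np

def gaussian_knockoffs(X, Sigma, rng):
    """Sample knockoffs for rows X_i ~ N(0, Sigma) (assumed known correlation matrix)."""
    p = Sigma.shape[0]
    lam_min = np.linalg.eigvalsh(Sigma)[0]        # smallest eigenvalue
    s = 0.99 * min(2.0 * lam_min, 1.0)            # equicorrelated choice, shrunk for stability
    D = s * np.eye(p)
    Sigma_inv_D = np.linalg.solve(Sigma, D)       # Sigma^{-1} D
    mean = X - X @ Sigma_inv_D                    # conditional mean of knockoffs given X
    cov = 2.0 * D - D @ Sigma_inv_D               # conditional covariance
    L = np.linalg.cholesky(cov)
    return mean + rng.standard_normal(X.shape) @ L.T

def knockoff_statistics(X, X_knock, y):
    """W_j > 0 suggests X_j is more associated with y than its knockoff is."""
    yc = y - y.mean()
    return np.abs(X.T @ yc) - np.abs(X_knock.T @ yc)

def knockoff_plus_threshold(W, q):
    """Smallest t whose estimated false discovery proportion is at most q."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf                                  # no threshold qualifies: select nothing

def knockoff_filter(X, y, Sigma, q=0.1, seed=0):
    """Return indices selected at target FDR level q, plus the statistics W."""
    rng = np.random.default_rng(seed)
    X_knock = gaussian_knockoffs(X, Sigma, rng)
    W = knockoff_statistics(X, X_knock, y)
    tau = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= tau), W
```

Note that nothing in the sketch models the response: `y` may be binary or continuous, and only the distribution of `X` is used to build the knockoffs, which mirrors the shift of assumptions from the response to the explanatory variables described above.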

Biography: Lucas Janson is an Assistant Professor in the Department of Statistics at Harvard University, where he works on high-dimensional inference, autonomous robotic motion planning, and statistical machine learning. Prior to Harvard, he was a PhD student in Statistics at Stanford University advised by Professor Emmanuel Candès.

MIT Statistics + Data Science Center
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764
