Stochastics and Statistics Seminar

On nearly assumption-free tests of nominal confidence interval coverage for causal parameters estimated by machine learning

March 12, 2021 @ 11:00 am - 12:00 pm

James Robins, Harvard


Abstract: For many causal effect parameters of interest, doubly robust machine learning (DRML) estimators $\hat{\psi}_1$ are the state-of-the-art, incorporating the good prediction performance of machine learning; the decreased bias of doubly robust estimators; and the analytic tractability and bias reduction of sample splitting with cross fitting. Nonetheless, even in the absence of confounding by unmeasured factors, the nominal $(1 - \alpha)$ Wald confidence interval $\hat{\psi}_1 \pm z_{\alpha/2}\,\widehat{\mathrm{s.e.}}[\hat{\psi}_1]$ may still undercover even in large samples, because the bias of $\hat{\psi}_1$ may be of the same or even larger order than its standard error of order $n^{-1/2}$.

In this paper, we introduce essentially assumption-free tests that (i) can falsify the null hypothesis that the bias of $\hat{\psi}_1$ is of smaller order than its standard error, (ii) can provide an upper confidence bound on the true coverage of the Wald interval, and (iii) are valid under the null under no smoothness/sparsity assumptions on the nuisance parameters. The tests, which we refer to as Assumption Free Empirical Coverage Tests (AFECTs), are based on a U-statistic that estimates part of the bias of $\hat{\psi}_1$.

Our claims need to be tempered in several important ways. First, no test, including ours, of the null hypothesis that the ratio of the bias to its standard error is smaller than some threshold δ can be consistent [without additional assumptions (e.g. smoothness or sparsity) that may be incorrect]. Second, the above claims only apply to certain parameters in a particular class. For most of the others, our results are unavoidably less sharp.
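To make the undercoverage concern concrete, the following minimal sketch computes the large-sample coverage of a nominal Wald interval as a function of the ratio δ of bias to standard error. It is an illustration only, not the AFECT procedure from the talk: it assumes the standardized estimator is exactly normal with unit variance and mean shift δ, and the function name `wald_coverage` is hypothetical.

```python
from statistics import NormalDist


def wald_coverage(delta: float, alpha: float = 0.05) -> float:
    """Coverage of the nominal (1 - alpha) Wald interval when
    (estimate - truth) / se is assumed to be distributed N(delta, 1),
    where delta = bias / se. This normality is an assumption of the sketch."""
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    z = std_normal.inv_cdf(1 - alpha / 2)
    # The interval covers the truth iff |Z + delta| <= z for Z ~ N(0, 1),
    # i.e. iff -z - delta <= Z <= z - delta.
    return std_normal.cdf(z - delta) - std_normal.cdf(-z - delta)


if __name__ == "__main__":
    for delta in (0.0, 0.5, 1.0, 2.0):
        print(f"bias/se = {delta:.1f}: coverage = {wald_coverage(delta):.3f}")
```

Under these assumptions the coverage equals the nominal level only when δ = 0 and decreases as |δ| grows, which is why a bias of the same order as the standard error is enough to cause undercoverage even in large samples.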

Work with Lin Liu and Rajarshi Mukherjee


The principal focus of Dr. Robins’ research has been the development of analytic methods appropriate for drawing causal inferences from complex observational and randomized studies with time-varying exposures or treatments. The new methods are to a large extent based on the estimation of the parameters of a new class of causal models – the structural nested models – using a new class of estimators – the G estimators. The usual approach to the estimation of the effect of a time-varying treatment or exposure on time to disease is to model the hazard incidence of failure at time t as a function of past treatment history using a time-dependent Cox proportional hazards model.


MIT Statistics + Data Science Center
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307