Flexible Perturbation Models for Robustness to Misspecification

Jeffrey Miller (Harvard University)
E18-304

Abstract: In many applications, there are natural statistical models with interpretable parameters that provide insight into questions of interest. While useful, these models are almost always wrong in the sense that they only approximate the true data generating process. In some cases, it is important to account for this model error when quantifying uncertainty in the parameters. We propose to model the distribution of the observed data as a perturbation of an idealized model of interest by using a nonparametric…


Inferring the Evolutionary History of Tumors

Simon Tavaré (Columbia University)
E18-304

Abstract: Bulk sequencing of tumor DNA is a popular strategy for uncovering information about the spectrum of mutations arising in the tumor, and is often supplemented by multi-region sequencing, which provides a view of tumor heterogeneity. The statistical issues arise from the fact that bulk sequencing makes the determination of sub-clonal frequencies, and other quantities of interest, difficult. In this talk I will discuss this problem, beginning with its setting in population genetics. The data provide an estimate of the…


The Statistical Finite Element Method

Mark Girolami (University of Cambridge)
E18-304

Abstract: The finite element method (FEM) is one of the great triumphs of modern-day applied mathematics, numerical analysis and software development. Every area of the sciences and engineering has been positively impacted by the ability to model and study complex physical and natural systems described by systems of partial differential equations (PDE) via the FEM. In parallel, recent developments in sensor, measurement, and signalling technologies enable the phenomenological study of systems as diverse as protein signalling in the…


Gaussian Differential Privacy, with Applications to Deep Learning

Weijie Su (University of Pennsylvania)
E18-304

Abstract: Privacy-preserving data analysis has been put on a firm mathematical foundation since the introduction of differential privacy (DP) in 2006. This privacy definition, however, has some well-known weaknesses: notably, it does not tightly handle composition. This weakness has inspired several recent relaxations of differential privacy based on the Rényi divergences. We propose an alternative relaxation we term "f-DP", which has a number of nice properties and avoids some of the difficulties associated with divergence-based relaxations. First, f-DP preserves…


Webinar: Inside the MITx MicroMasters Program in Statistics and Data Science

Devavrat Shah, Karene Chu
Online

Interested in starting your data science journey? Register for this special free virtual event: https://event.on24.com/eventRegistration/EventLobbyServlet?target=reg20.jsp&referrer=&eventid=2170691&sessionid=1&key=02F897D60682F202E261E07985F9CB92&regTag=&sourcepage=register. You'll receive a confirmation e-mail with further details about the webinar.

Demand for professionals skilled in data, analytics, and machine learning is exploding. A recent report by IBM and Burning Glass states that there will be 364K new job openings in data-driven professions this year in the US alone. Data scientists bring value to organizations across industries because they are able…


Diffusion K-means Clustering on Manifolds: provable exact recovery via semidefinite relaxations

Xiaohui Chen (University of Illinois at Urbana-Champaign)
E18-304

Abstract: We introduce the diffusion K-means clustering method on Riemannian submanifolds, which maximizes the within-cluster connectedness based on the diffusion distance. The diffusion K-means constructs a random walk on the similarity graph with vertices as data points randomly sampled on the manifolds and edges as similarities given by a kernel that captures the local geometry of manifolds. Thus the diffusion K-means is a multi-scale clustering tool that is suitable for data with non-linear and non-Euclidean geometric features in mixed dimensions. Given…


Predictive Inference with the Jackknife+

Rina Foygel Barber (University of Chicago)
E18-304

Abstract: We introduce the jackknife+, a novel method for constructing predictive confidence intervals that is robust to the distribution of the data. The jackknife+ modifies the well-known jackknife (leave-one-out cross-validation) to account for the variability in the fitted regression function when we subsample the training data. Assuming exchangeable training samples, we prove that the jackknife+ permits rigorous coverage guarantees regardless of the distribution of the data points, for any algorithm that treats the training points symmetrically (in contrast, such guarantees…


Tales of Random Projections

Kavita Ramanan (Brown University)
E18-304

Abstract: Properties of random projections of high-dimensional probability measures are of interest in a variety of fields, including asymptotic convex geometry, and potential applications to high-dimensional statistics and data analysis. A particular question of interest is to identify what properties of the high-dimensional measure are captured by its lower-dimensional projections. While fluctuations of these projections have been well studied over the past decade, we describe more recent work on the tail behavior of such projections, and various implications. This talk is based on…


Does Revolution Work? Evidence from Nepal

Rohini Pande (Yale University)
E18-304

The last half century has seen the adoption of democratic institutions in much of the developing world. However, the conditions under which de jure democratization leads to the representation of historically disadvantaged groups remain debated, as do the implications of descriptive representation for policy inclusion. Using detailed administrative and survey data from Nepal, we examine political selection in a new democracy, the implications for policy inclusion and the role of conflict in affecting political transformation. I situate these findings in the context…



MIT Statistics + Data Science Center
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764