Stochastics and Statistics Seminar


Testing degree corrections in Stochastic Block Models

Subhabrata Sen (Microsoft)
E18-304

Abstract: The community detection problem has attracted significant attention in recent years, and it has been studied extensively under the framework of a Stochastic Block Model (SBM). However, it is well known that SBMs fit real data very poorly, and various extensions have been suggested to replicate characteristics of real data. The recovered community assignments are often sensitive to the model used, which naturally raises the following question: given a network with community structure, how to decide whether…
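
As a concrete illustration of the baseline model, the sketch below simulates a two-community SBM with networkx and inspects within-community degree dispersion, the kind of statistic a test for degree corrections might examine. All parameters are illustrative; this is not the test from the talk.

```python
# Hypothetical sketch: under a vanilla SBM, within-community degrees are
# Binomial, so their variance should be close to their mean; a large excess
# hints that degree corrections (heterogeneity) are needed.
import networkx as nx
import numpy as np

sizes = [100, 100]                        # two communities of 100 nodes each
probs = [[0.10, 0.02], [0.02, 0.10]]      # within/between edge probabilities
G = nx.stochastic_block_model(sizes, probs, seed=0)

degrees = np.array([d for _, d in G.degree()])
block1 = degrees[:100]                    # nodes 0..99 form community 1
print("mean:", block1.mean(), "variance:", block1.var())
```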


Inference, Computation, and Visualization for Convex Clustering and Biclustering

Genevera Allen (Rice)
E18-304

Abstract: Hierarchical clustering enjoys wide popularity because of its fast computation, ease of interpretation, and appealing visualizations via the dendrogram and cluster heatmap. Recently, several authors have proposed and studied convex clustering and biclustering which, similar in spirit to hierarchical clustering, achieve cluster merges via convex fusion penalties. While these techniques enjoy superior statistical performance, they suffer from slower computation and are not generally conducive to representation as a dendrogram. In the first part of the talk, we present new convex…
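
For concreteness, here is a minimal convex-clustering sketch using cvxpy (illustrative only, not the algorithms from the talk): it minimizes a squared-error fit plus pairwise fusion penalties, so rows of the solution fuse into shared centroids as the penalty grows.

```python
# Convex clustering: minimize 0.5 * ||X - U||_F^2 + lam * sum_{i<j} ||u_i - u_j||_2.
# Rows of the optimal U that coincide (up to tolerance) share a cluster.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (5, 2)),    # cluster near (0, 0)
               rng.normal(2.0, 0.1, (5, 2))])   # cluster near (2, 2)

U = cp.Variable(X.shape)
lam = 0.5
fusion = sum(cp.norm(U[i] - U[j], 2)
             for i in range(10) for j in range(i + 1, 10))
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(X - U) + lam * fusion)).solve()
print(np.round(U.value, 2))
```

Sweeping lam from 0 upward traces a full merge path, the convex analogue of a dendrogram.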


Size-Independent Sample Complexity of Neural Networks

Ohad Shamir (Weizmann Institute)
E18-304

Abstract: I'll describe new bounds on the sample complexity of deep neural networks, based on the norms of the parameter matrices at each layer. In particular, we show how certain norms lead to the first explicit bounds which are fully independent of the network size (both depth and width), and are therefore applicable to arbitrarily large neural networks. These results are derived using some novel techniques, which may be of independent interest. Joint work with Noah Golowich (Harvard) and Alexander…
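
Bounds of the kind described here scale with products of per-layer norms rather than with parameter counts. The toy computation below (architecture and constants are illustrative; the exact form of the bound is omitted) evaluates such a norm product for a small random network:

```python
# Norm-based capacity proxy: product of per-layer Frobenius norms, divided by
# sqrt(sample size). Note it never references depth or width directly.
import numpy as np

rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 32)),
          rng.normal(size=(32, 32)),
          rng.normal(size=(32, 1))]
frob_product = np.prod([np.linalg.norm(W) for W in layers])  # Frobenius norms

n = 10_000                                # number of training samples
print("complexity proxy ~", frob_product / np.sqrt(n))
```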


Dynamic Incentive-aware Learning: Robust Pricing in Contextual Auctions

Adel Javanmard (USC)
E18-304

Abstract: Motivated by pricing in ad exchange markets, we consider the problem of robust learning of reserve prices against strategic buyers in repeated contextual second-price auctions. Buyers’ valuations for an item depend on the context that describes the item. However, the seller is not aware of the relationship between the context and buyers’ valuations, i.e., buyers’ preferences. The seller’s goal is to design a learning policy that sets reserve prices by observing past sales data, and her objective is…
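
The following toy simulation sketches the setting (a hypothetical linear valuation model and a truthful, non-strategic buyer; the paper's contribution is precisely the harder strategic case): the seller fits the context-to-valuation map from observed bids and uses the plug-in estimate as a reserve price.

```python
# Repeated contextual auctions with plug-in reserve prices (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
d, T = 5, 500
theta = rng.normal(size=d)                # unknown buyer preference vector

X = rng.normal(size=(T, d))               # contexts describing the items
values = X @ theta + 0.1 * rng.normal(size=T)

theta_hat = np.zeros(d)
revenue = 0.0
for t in range(T):
    reserve = float(X[t] @ theta_hat)     # plug-in reserve price
    if values[t] >= reserve:              # truthful buyer bids their value
        revenue += max(reserve, 0.0)      # single bidder: pays the reserve
    # seller observes the bid and refits by least squares
    theta_hat = np.linalg.lstsq(X[:t + 1], values[:t + 1], rcond=None)[0]
print("total revenue:", round(revenue, 1))
```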


Fitting a putative manifold to noisy data

Hariharan Narayanan (Tata Institute of Fundamental Research, Mumbai)
E18-304

Abstract: We give a solution to the following question from manifold learning. Suppose data belonging to a high-dimensional Euclidean space is drawn independently, identically distributed from a measure supported on a low-dimensional, twice-differentiable embedded compact manifold M, and is corrupted by a small amount of i.i.d. Gaussian noise. How can we produce a manifold M̂ whose Hausdorff distance to M is small and whose reach (normal injectivity radius) is not much smaller than the reach of M? This…
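
A minimal sketch of the problem setup (not the paper's construction): noisy samples from a one-dimensional manifold, a circle in R^3, are denoised by projecting each point onto a line fit by local PCA to its nearest neighbours.

```python
# Local-PCA denoising of noisy manifold samples (illustrative baseline).
import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 500)
M = np.c_[np.cos(t), np.sin(t), np.zeros_like(t)]   # points on the manifold
Y = M + 0.05 * rng.normal(size=M.shape)             # i.i.d. Gaussian noise

def denoise(y, Y, k=25):
    # fit a tangent line through the k nearest neighbours of y, then project
    nbrs = Y[np.argsort(np.linalg.norm(Y - y, axis=1))[:k]]
    mu = nbrs.mean(axis=0)
    u = np.linalg.svd(nbrs - mu, full_matrices=False)[2][0]  # top direction
    return mu + ((y - mu) @ u) * u

M_hat = np.array([denoise(y, Y) for y in Y])
print("mean error before:", np.linalg.norm(Y - M, axis=1).mean())
print("mean error after: ", np.linalg.norm(M_hat - M, axis=1).mean())
```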


Variational problems on random structures and their continuum limits

Dejan Slepčev (Carnegie Mellon University)
E18-304

Abstract: We will discuss variational problems arising in machine learning and their limits as the number of data points goes to infinity. Consider point clouds obtained as random samples of an underlying "ground-truth" measure. A graph representing the point cloud is obtained by assigning weights to edges based on the distance between the points. Many machine learning tasks, such as clustering and semi-supervised learning, can be posed as minimizing functionals on such graphs. We consider functionals involving graph cuts, graph Laplacians…
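
The discrete objects in question can be built in a few lines. The sketch below (bandwidth and sample sizes are illustrative) constructs a weighted graph on a random point cloud, forms its unnormalized graph Laplacian, and clusters by the sign of the second eigenvector, the standard spectral relaxation of a graph-cut functional:

```python
# Point cloud -> weighted graph -> graph Laplacian -> spectral bipartition.
import numpy as np

rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(-2, 0.5, (50, 2)),
                 rng.normal(+2, 0.5, (50, 2))])

eps = 1.0                                                  # kernel bandwidth
D2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)    # pairwise sq. dists
W = np.exp(-D2 / eps ** 2)
np.fill_diagonal(W, 0)                                     # no self-loops
L = np.diag(W.sum(1)) - W                                  # graph Laplacian

vals, vecs = np.linalg.eigh(L)
labels = (vecs[:, 1] > 0).astype(int)    # Fiedler vector sign ~ 2-way cut
print(labels)
```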


An Information-Geometric View of Learning in High Dimensions

Gregory Wornell (MIT)
32-155

Abstract: We consider the problem of data feature selection prior to inference task specification, which is central to high-dimensional learning. Introducing natural notions of universality for such problems, we show a local equivalence among them. Our analysis is naturally expressed via information geometry, and represents a conceptually and practically useful learning methodology. The development reveals the key roles of the singular value decomposition, Hirschfeld-Gebelein-Rényi maximal correlation, canonical correlation and principal component analyses, Tishby's information bottleneck, Wyner's common information, Ky Fan…
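
One of the listed connections admits a compact worked example. For discrete (X, Y), the Hirschfeld-Gebelein-Rényi maximal correlation equals the second singular value of the matrix with entries P(x, y) / sqrt(P(x) P(y)); the sketch below computes it for an illustrative joint distribution:

```python
# HGR maximal correlation via the SVD (the joint pmf below is made up).
import numpy as np

P = np.array([[0.30, 0.10],
              [0.05, 0.55]])             # joint pmf of (X, Y); rows index x
px, py = P.sum(1), P.sum(0)
Q = P / np.sqrt(np.outer(px, py))

svals = np.linalg.svd(Q, compute_uv=False)
print("singular values:", svals)         # the top singular value is always 1
print("HGR maximal correlation:", svals[1])
```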


Reverse hypercontractivity beats measure concentration for information theoretic converses

Jingbo Liu (MIT)
E18-304

Abstract: Concentration of measure refers to a collection of tools and results from analysis and probability theory that have been used in many areas of pure and applied mathematics. Arguably, the first data science application of measure concentration (under the name ‘‘blowing-up lemma’’) is the proof of strong converses in multiuser information theory by Ahlswede, Gács and Körner in 1976. Since then, measure concentration has found applications in many other information theoretic problems, most notably the converse (impossibility) results in…
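
The blowing-up phenomenon itself is easy to see numerically. In the sketch below (the set is taken to be a Hamming ball so that its blow-up is again a ball; parameters are illustrative), a set of small fixed measure on the n-cube, enlarged by Hamming distance t on the order of sqrt(n), has measure close to 1:

```python
# Blowing-up of a Hamming ball on the n-dimensional hypercube.
from scipy.stats import binom

n = 1000
k = 475          # ball B(0, k) has measure P(Bin(n, 1/2) <= k), a few percent
t = 50           # blow-up radius, about 1.6 * sqrt(n)

measure = binom.cdf(k, n, 0.5)           # measure of the ball
blown_up = binom.cdf(k + t, n, 0.5)      # its t-blowup is the ball B(0, k + t)
print(f"measure before: {measure:.3f}, after t-blowup: {blown_up:.3f}")
```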


Efficient Algorithms for the Graph Matching Problem in Correlated Random Graphs

Tselil Schramm (Harvard University)
E18-304

Abstract: The Graph Matching problem is a robust version of the Graph Isomorphism problem: given two not-necessarily-isomorphic graphs, the goal is to find a permutation of the vertices which maximizes the number of common edges. We study a popular average-case variant; we deviate from the common heuristic strategy and give the first quasi-polynomial time algorithm, where previously only sub-exponential time algorithms were known. Based on joint work with Boaz Barak, Chi-Ning Chou, Zhixian Lei, and Yueqi Sheng. Biography: Tselil Schramm is a postdoc in theoretical…
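
The average-case model is easy to simulate: two graphs are obtained by independently subsampling the edges of a common parent, and one is scrambled by a secret permutation. The sketch below pairs it with a naive degree-rank baseline (emphatically not the quasi-polynomial algorithm from the talk), whose poor accuracy is exactly why stronger algorithms are needed.

```python
# Correlated Erdos-Renyi graph matching with a degree-rank baseline.
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 200, 0.2, 0.9                  # parent density p, edge-keep prob. s

def sym_mask(q):
    # symmetric boolean adjacency mask with entrywise probability q
    M = np.triu(rng.random((n, n)) < q, 1)
    return M | M.T

parent = sym_mask(p)
A = parent & sym_mask(s)                 # child 1: subsample parent edges
B0 = parent & sym_mask(s)                # child 2: independent subsample
pi = rng.permutation(n)                  # secret relabeling
B = B0[np.ix_(pi, pi)]

rank_A = np.argsort(np.argsort(A.sum(1)))
guess = np.argsort(B.sum(1))[rank_A]     # match vertex u in A to guess[u] in B
print("fraction correctly matched:", (pi[guess] == np.arange(n)).mean())
```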


Locally private estimation, learning, inference, and optimality

John Duchi (Stanford University)
E18-304

Abstract: In this talk, we investigate statistical learning and estimation under local privacy constraints, where data providers do not trust the collector of the data and so privatize their data before it is even collected. We identify fundamental tradeoffs between statistical utility and privacy in such local models of privacy, providing instance-specific bounds for private estimation and learning problems by developing local minimax risks. In contrast to approaches based on worst-case (minimax) error, which are conservative, this allows us to…
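
The canonical example of the local model is randomized response, sketched below (parameters illustrative; these are not the estimators from the talk): each user flips their private bit with a calibrated probability, achieving eps-local differential privacy, and the collector debiases the aggregate.

```python
# Randomized response: eps-locally-private estimation of a Bernoulli mean.
import numpy as np

rng = np.random.default_rng(0)
eps = 1.0
bits = rng.random(100_000) < 0.3          # private data; true mean is 0.3

p_keep = np.exp(eps) / (np.exp(eps) + 1)  # keep bit w.p. p, flip w.p. 1 - p
flip = rng.random(bits.size) > p_keep
reports = np.where(flip, ~bits, bits)     # each user privatizes locally

# E[report] = p * mu + (1 - p) * (1 - mu), so invert for mu:
mu_hat = (reports.mean() - (1 - p_keep)) / (2 * p_keep - 1)
print("locally private estimate of the mean:", round(mu_hat, 3))
```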



MIT Statistics + Data Science Center
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764