Stochastics and Statistics Seminar Series

  • The Planted Matching Problem

    E18-304, United States

    Abstract: What happens when an optimization problem has a good solution built into it, but which is partly obscured by randomness? Here we revisit a classic polynomial-time problem, the minimum perfect matching problem on bipartite graphs. If the edges have random weights in , Mézard and Parisi — and then Aldous, rigorously — showed that…

  • Towards Robust Statistical Learning Theory

    E18-304, United States

    Abstract: Real-world data typically do not fit statistical models or satisfy assumptions underlying the theory exactly, hence reducing the number and strictness of these assumptions helps to lessen the gap between the “mathematical” world and the “real” world. The concept of robustness, in particular, robustness to outliers, plays the central role in understanding this gap. The goal…

  • Accurate Simulation-Based Parametric Inference in High Dimensional Settings

    E18-304, United States

    Abstract: Accurate estimation and inference in finite samples are important for decision making in many experimental and social fields, especially when the available data are complex, for example when they include mixed types of measurements, are dependent in several ways, or contain missing data or outliers. Indeed, the more complex the data (hence the models),…

  • Communicating uncertainty about facts, numbers and science

    32-D643

    Abstract: The claim of a ‘post-truth’ society, in which emotional responses trump balanced consideration of evidence, presents a strong challenge to those who value quantitative and scientific evidence: how can we communicate risks and unavoidable scientific uncertainty in a transparent and trustworthy way? Communication of quantifiable risks has been well-studied, leading to recommendations for using an…

  • SDP Relaxation for Learning Discrete Structures: Optimal Rates, Hidden Integrality, and Semirandom Robustness

    E18-304, United States

    Abstract: We consider the problems of learning discrete structures from network data under statistical settings. Popular examples include various block models, Z2 synchronization and mixture models. Semidefinite programming (SDP) relaxation has emerged as a versatile and robust approach to these problems. We show that despite being a relaxation, SDP achieves the optimal Bayes error rate…

  • Understanding machine learning with statistical physics

    E18-304, United States

    Abstract: The affinity between statistical physics and machine learning has a long history; this is reflected even in machine learning terminology, which is in part adopted from physics. Current theoretical challenges and open questions about deep learning and statistical learning call for a unified account of the following three ingredients: (a) the dynamics of the learning algorithm,…

  • Automated Data Summarization for Scalability in Bayesian Inference

    E18-304, United States

    Abstract: Many algorithms take prohibitively long to run on modern, large data sets. But even in complex data sets, many data points may be at least partially redundant for some task of interest. So one might instead construct and use a weighted subset of the data (called a “coreset”) that is much smaller than the…

  • Inferring the Evolutionary History of Tumors

    E18-304, United States

    Abstract: Bulk sequencing of tumor DNA is a popular strategy for uncovering information about the spectrum of mutations arising in the tumor, and is often supplemented by multi-region sequencing, which provides a view of tumor heterogeneity. The statistical issues arise from the fact that bulk sequencing makes the determination of sub-clonal frequencies, and other quantities…

  • Gaussian Differential Privacy, with Applications to Deep Learning

    E18-304, United States

    Abstract: Privacy-preserving data analysis has been put on a firm mathematical foundation since the introduction of differential privacy (DP) in 2006. This privacy definition, however, has some well-known weaknesses: notably, it does not tightly handle composition. This weakness has inspired several recent relaxations of differential privacy based on the Rényi divergences. We propose an alternative…

  • Diffusion K-means Clustering on Manifolds: provable exact recovery via semidefinite relaxations

    E18-304, United States

    Abstract: We introduce the diffusion K-means clustering method on Riemannian submanifolds, which maximizes the within-cluster connectedness based on the diffusion distance. The diffusion K-means constructs a random walk on the similarity graph with vertices as data points randomly sampled on the manifolds and edges as similarities given by a kernel that captures the local geometry of…


© MIT Institute for Data, Systems, and Society | 77 Massachusetts Avenue | Cambridge, MA 02139-4307 | 617-253-1764 |