BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//MIT Statistics and Data Science Center - ECPv5.14.2.1//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:MIT Statistics and Data Science Center
X-ORIGINAL-URL:https://stat.mit.edu
X-WR-CALDESC:Events for MIT Statistics and Data Science Center
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:20200308T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:20201101T020000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/New_York:20201106T110000
DTEND;TZID=America/New_York:20201106T120000
DTSTAMP:20220528T131130Z
CREATED:20200901T173420Z
LAST-MODIFIED:20201028T134524Z
UID:4301-1604660400-1604664000@stat.mit.edu
SUMMARY:Valid hypothesis testing after hierarchical clustering
DESCRIPTION:Abstract: As datasets continue to grow in size\, in many settings the focus of data collection has shifted away from testing pre-specified hypotheses\, and towards hypothesis generation. Researchers are often interested in performing an exploratory data analysis in order to generate hypotheses\, and then testing those hypotheses on the same data\; I will refer to this as ‘double dipping’. Unfortunately\, double dipping can lead to highly inflated Type 1 error rates. In this talk\, I will consider the special case of hierarchical clustering. First\, I will show that sample splitting does not solve the ‘double dipping’ problem for clustering. Then\, I will propose a test for a difference in means between estimated clusters that accounts for the cluster estimation process\, using a selective inference framework. I will also show an application of this approach to single-cell RNA-sequencing data. This is joint work with Lucy Gao (University of Waterloo) and Jacob Bien (University of Southern California).\n\n–\nAbout the Speaker: Daniela Witten is a professor of Statistics and Biostatistics at University of Washington\, and the Dorothy Gilford Endowed Chair in Mathematical Statistics. She develops statistical machine learning methods for high-dimensional data\, with a focus on unsupervised learning. Daniela is the recipient of an NIH Director’s Early Independence Award\, a Sloan Research Fellowship\, an NSF CAREER Award\, a Simons Investigator Award in Mathematical Modeling of Living Systems\, a David Byar Award\, a Gertrude Cox Scholarship\, and an NDSEG Research Fellowship. She is also the recipient of the Spiegelman Award from the American Public Health Association for a statistician under age 40 who has made outstanding contributions to statistics for public health\, as well as the Leo Breiman Award for contributions to the field of statistical machine learning. She is a Fellow of the American Statistical Association\, and an Elected Member of the International Statistical Institute. Daniela is a co-author of the very popular textbook “Introduction to Statistical Learning”. She completed a BS in Math and Biology with Honors and Distinction at Stanford University in 2005\, and a PhD in Statistics at Stanford University in 2010.
URL:https://stat.mit.edu/calendar/witten/
LOCATION:online
CATEGORIES:Stochastics and Statistics Seminar
END:VEVENT
END:VCALENDAR