Inference, Computation, and Visualization for Convex Clustering and Biclustering
April 27 @ 11:00 am - 12:00 pm
Genevera Allen (Rice)
Abstract: Hierarchical clustering enjoys wide popularity because of its fast computation, ease of interpretation, and appealing visualizations via the dendrogram and cluster heatmap. Recently, several authors have proposed and studied convex clustering and biclustering, which, similar in spirit to hierarchical clustering, achieve cluster merges via convex fusion penalties. While these techniques enjoy superior statistical performance, they suffer from slower computation and are not generally conducive to representation as a dendrogram. In the first part of the talk, we present new convex (bi)clustering methods and fast algorithms that inherit all of the advantages of hierarchical clustering. Specifically, we develop a new fast approximation and variation of the convex (bi)clustering solution path that can be represented as a dendrogram or cluster heatmap. Also, because a single tuning parameter indexes the sequence of convex (bi)clustering solutions, we can use this sequence to develop interactive and dynamic visualization strategies that allow one to watch data form groups as the tuning parameter varies. In the second part of the talk, we consider how to conduct inference for convex clustering solutions that addresses questions such as: Are there clusters in my data set? Should two clusters be merged into one? To achieve this, we develop a new geometric representation of Hotelling's T^2 test that allows us to use the selective inference paradigm to test multivariate hypotheses for the first time. We can use this approach to test hypotheses and calculate confidence ellipsoids on the cluster means resulting from convex clustering. We apply these techniques to examples from text mining and cancer genomics. This is joint work with John Nagorski, Michael Weylandt, and Frederick Campbell.
Biography: Genevera Allen is an Associate Professor of Statistics, Computer Science, and Electrical and Computer Engineering at Rice University. She is also a member of the Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital and Baylor College of Medicine, where she holds a joint appointment. Dr. Allen received her PhD in statistics from Stanford University (2010), under the mentorship of Prof. Robert Tibshirani, and her bachelor's degree, also in statistics, from Rice University (2006).
Dr. Allen's research focuses on developing statistical methods to help scientists make sense of their 'Big Data' in applications such as high-throughput genomics and neuroimaging. Her work lies in the areas of modern multivariate analysis, graphical models, statistical machine learning, and data integration or data fusion. She is the recipient of several honors, including a National Science Foundation CAREER award, the International Biometric Society's Young Statistician Showcase award, and the George R. Brown School of Engineering's Research and Teaching Excellence Award at Rice University. In 2013 and 2014, she represented the American Statistical Association (ASA) at the Coalition for National Science Funding on Capitol Hill, and her research has been highlighted on the House floor in a speech by Congressman McNerney (D-CA). In 2014, Dr. Allen was named to the "Forbes 30 Under 30: Science and Healthcare" list. Dr. Allen currently serves as an Associate Editor for Biometrics, the Secretary/Treasurer for the ASA Section on Statistical Computing, and the Program Chair for the ASA Section on Statistical Learning and Data Science.
Outside of work, Dr. Allen is a patron of the Houston Symphony and Houston Grand Opera and is involved with several arts organizations throughout Houston. She also enjoys traveling, Texas craft beers, and playing viola.