Emmanuel Candes, The Barnum-Simons Chair in Mathematics and Statistics, Stanford University
Around the Reproducibility of Scientific Research in the Big Data Era: What Statistics Can Offer
Wednesday, 16 Mar 2016, 2.00p.m. – 3.30p.m.
Faculty of Science, Lecture Theatre 31, Block S16 Level 3
The big data era has created a new scientific paradigm: collect data first, ask questions later. When the universe of scientific hypotheses being examined simultaneously is not taken into account, inferences are likely to be false. The consequence is that follow-up studies are unlikely to reproduce earlier reported findings or discoveries. This reproducibility failure bears a substantial cost, and this talk is about new statistical tools to address the issue. In the last two decades, statisticians have developed many techniques for addressing this look-everywhere effect, whose proper use would help alleviate the problems discussed above. This lecture will discuss some of these proposed solutions, including the Benjamini-Hochberg procedure for false discovery rate (FDR) control and the knockoff filter, a method which reliably selects which of the many potentially explanatory variables of interest (e.g. the presence or absence of a mutation) are truly associated with the response under study (e.g. the log fold increase in HIV-drug resistance).
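For readers unfamiliar with the Benjamini-Hochberg procedure mentioned in the abstract, the following is a minimal sketch of the standard step-up rule, not material from the lecture itself. The function name `benjamini_hochberg`, the parameter `q` for the target FDR level, and the example p-values are all illustrative assumptions. Given m p-values, the rule sorts them, finds the largest k such that the k-th smallest p-value is at most q·k/m, and rejects the k hypotheses with the smallest p-values.

```python
import numpy as np


def benjamini_hochberg(p_values, q=0.1):
    """Return a boolean mask of rejected hypotheses (hypothetical helper).

    Standard step-up rule: reject the k smallest p-values, where k is the
    largest index with p_(k) <= q * k / m.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices of p-values, ascending
    thresholds = q * np.arange(1, m + 1) / m   # q * k / m for k = 1..m
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])       # last sorted position under its threshold
        rejected[order[: k + 1]] = True        # step-up: reject everything up to k
    return rejected


# Illustrative p-values (made up for this sketch, not data from the talk).
example_mask = benjamini_hochberg([0.01, 0.03, 0.03, 0.9], q=0.05)
print(example_mask)
```

Note the step-up behaviour: a p-value that misses its own threshold is still rejected if a larger p-value further down the sorted list meets its threshold.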
Activities Held in Conjunction with Oppenheim Lecture
Modern Optimization Meets Physics: Recent Progress on the Phase Retrieval Problem
Thursday, 17 Mar 2016, 3.00p.m. – 4.00p.m.
Department of Mathematics, Seminar Room 1, Block 17 Level 4
In many imaging problems such as X-ray crystallography, detectors can only record the intensity or magnitude of a diffracted wave as opposed to measuring its phase. This means that we need to solve quadratic equations (a problem that is notoriously NP-hard) as opposed to linear ones. While we present recently introduced effective convex relaxations to the phase retrieval problem with rather spectacular theoretical guarantees, the focus is on a class of novel non-convex algorithms, which can be provably exact. This class of algorithms, dubbed Wirtinger flows, finds the solution to randomized quadratic systems from a number of equations (samples) and flops that are both optimal. At a high level, the algorithm can be interpreted as a sort of stochastic gradient scheme, starting from a guess obtained by means of a spectral method. We demonstrate that the Wirtinger flow reconstruction degrades gracefully as the signal-to-noise ratio decreases. The empirical performance will be demonstrated on phase retrieval problems currently at the center of significant research efforts collectively known under the name of coherent diffraction imaging; among other things, these efforts are aimed at determining the 3D structure of large protein complexes.
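The two-stage structure described in the abstract (a spectral initial guess followed by gradient-type updates) can be illustrated with a small sketch. This is a simplified toy version under stated assumptions, not the speaker's implementation: it uses real-valued Gaussian measurement vectors instead of complex ones, full gradient steps instead of a stochastic scheme, and a constant step size `mu`. The function names, the problem sizes, and the value of `mu` are all hypothetical choices made for illustration.

```python
import numpy as np


def spectral_init(A, y):
    """Initial guess: leading eigenvector of (1/m) * sum_i y_i a_i a_i^T,
    rescaled to the estimated signal norm (E[y_i] = ||x||^2 for standard
    real Gaussian a_i)."""
    m = A.shape[0]
    Y = (A.T * y) @ A / m
    _, vecs = np.linalg.eigh(Y)          # eigenvalues returned in ascending order
    return np.sqrt(np.mean(y)) * vecs[:, -1]


def wirtinger_flow(A, y, n_iter=500, mu=0.1):
    """Gradient descent on f(z) = (1/(4m)) * sum_i ((a_i^T z)^2 - y_i)^2,
    started from the spectral guess. Constant step size is an assumption."""
    m = A.shape[0]
    z = spectral_init(A, y)
    step = mu / np.dot(z, z)             # step scaled by the squared norm of the guess
    for _ in range(n_iter):
        Az = A @ z
        grad = A.T @ ((Az ** 2 - y) * Az) / m
        z = z - step * grad
    return z


def relative_error(z, x):
    """Error up to the global sign, which magnitudes cannot determine."""
    return min(np.linalg.norm(z - x), np.linalg.norm(z + x)) / np.linalg.norm(x)


# Toy problem: n unknowns, m = 20 * n phaseless measurements (sizes are arbitrary).
rng = np.random.default_rng(0)
n, m = 16, 320
x_true = rng.standard_normal(n)
A = rng.standard_normal((m, n))
y = (A @ x_true) ** 2                    # only squared magnitudes are observed

z_hat = wirtinger_flow(A, y)
print("relative error:", relative_error(z_hat, x_true))
```

Because only magnitudes are measured, the signal is recoverable at best up to a global sign (a global phase in the complex case), which is why the error is computed against both `x_true` and `-x_true`.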