I have chosen two examples, contingency table analysis and causal inference, but I could have written similarly about the evolution of statistical thinking associated with time series analysis (again going back to the early contributions of Yule), or about spatial statistics or meta-analysis. Over time, we hope to provide comprehensive reviews and discussions that cover a broad spectrum of topics within the modern statistical landscape.
Reproducible research has long been a concern of statistical scientists under various descriptors, and this term has recently become something of a buzzword in science, gaining prominence from a series of Nature articles on the topic.
- Stodden’s review identifies factors that contribute to statistical results not being replicable and suggests several remedies. Some problems arise simply from overlooking well-known principles of statistical design and inference, but the computational complexity of modern data preparation and analysis adds a new dimension and raises new statistical issues, which are being addressed in various ways, often under the banner of data science.
Although principles of randomization and design strengthen claims of reproducibility, many, indeed most, real-world data analyses involve observational studies.
- Rosenbaum describes quasi-experimental devices that borrow from the theory of design to improve inference in observational studies.
- Kaizar discusses statistical ideas behind combining inference from experiments and from observational studies for cases in which they pertain to the same outcome of interest.
Integration of data from multiple sources is a key component of the new data world.
- Li describes integrating phylogenetic data into metagenomics, and
- Newton & Wang describe integration of genome-wide association data with established knowledge on individual genes.
- Budavári &amp; Laredo describe recent research on integrating astronomical data from different sky surveys. A survey methodologist would call such integration record linkage; in astronomy it is known as cross-matching or cross-identification.
- Schafer’s review describes new statistical methods for estimation of key cosmological parameters using rapidly growing astronomical data sets.
From the very large to the very small:
- Aspelmeier et al. review statistical methods for superresolution microscopy, arguing that Gaussian and Poisson models are not suitable for these quantum-level effects. A different aspect of small-scale analysis is that of very rare events.
- Davison & Huser review developments in the theory of multivariate extremes and describe their application to extreme weather events.
- Multivariate order statistics are the theme of Weller & Eddy’s review of Tukey’s halfspace depth.
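The idea behind Tukey’s halfspace depth is simple to state: the depth of a point is the smallest fraction of the data contained in any closed halfspace through that point, so central points have high depth and outlying points have low depth. A minimal two-dimensional sketch follows; the function name and the direction-grid approximation are illustrative choices of mine, not anything from Weller &amp; Eddy’s review.

```python
import numpy as np

def halfspace_depth(x, points, n_directions=360):
    """Approximate Tukey's halfspace depth of x relative to a 2-D point cloud:
    the minimum, over directions u, of the fraction of points in the closed
    halfspace {p : u . (p - x) >= 0}."""
    angles = np.linspace(0, 2 * np.pi, n_directions, endpoint=False)
    dirs = np.column_stack([np.cos(angles), np.sin(angles)])
    # For each direction, the fraction of points on the non-negative side of x.
    fractions = ((points - x) @ dirs.T >= 0).mean(axis=0)
    return fractions.min()

# Four corners of the unit square: the center is deepest, a vertex is shallow.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
center_depth = halfspace_depth(np.array([0.5, 0.5]), pts)
corner_depth = halfspace_depth(np.array([0.0, 0.0]), pts)
```

Scanning a finite grid of directions is the simplest approximation; exact algorithms order the data points by angle around x instead, which is the kind of multivariate order-statistics machinery the review surveys.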
A phrase common to many of the papers in this volume is inferential challenges: the need for statistical theory and methods continues to be outpaced by technological and computational advances.
- Heard et al. describe agent-based models, a popular simulation tool, but one requiring new statistical theory.
- Dawid tackles the difficult issue of causality, using a decision-theoretic approach.
- Causality is also a theme in Thompson’s review of longitudinal methods and complex survey data.
Advances in computation are of course also a great benefit to statistical inference.
- Morris gives a comprehensive review of the development of functional regression, an area demanded by new technology and made feasible by increasing computational power.
- Computational power is pushed to its limits in the field of deep learning, reviewed by Salakhutdinov. These algorithms estimate millions of parameters across very deep layers and use them to model high-level features for purposes such as object recognition and information retrieval.