Kavli Affiliate: Brian Caffo, Joshua Vogelstein
| Authors: Eric W. Bridgeford, Michael Powell, Gregory Kiar, Stephanie Noble, Jaewon Chung, Sambit Panda, Ross Lawrence, Ting Xu, Michael Milham, Brian Caffo and Joshua T. Vogelstein
| Summary:
Batch effects, undesirable sources of variance across multiple experiments, present a substantial hurdle for scientific and clinical discoveries. Specifically, batch effects can both create spurious discoveries and hide veridical signals, contributing to the ongoing reproducibility crisis. Typical approaches to dealing with batch effects conceptualize them as associational effects rather than causal effects, even though the sources of variance that comprise the batch (potentially including experimental design and population demographics) causally impact downstream inferences. Our key insight is that batch effects can be modeled as causal, rather than associational, effects. We develop a simple strategy for augmenting existing batch effect correction techniques and use it to demonstrate that modeling batch effects as causal, rather than associational, leads to disparate downstream conclusions across a range of simulated and real data experiments. Our work therefore introduces a conceptual framing for thinking about whether and how to combine data across batches, and about the potential limitations of doing so.