Kavli Affiliate: Brian Caffo, Joshua Vogelstein
| Authors: Eric W. Bridgeford, Michael Powell, Gregory Kiar, Stephanie Noble, Jaewon Chung, Sambit Panda, Ross Lawrence, Ting Xu, Michael Milham, Brian Caffo and Joshua T. Vogelstein
| Summary:
Batch effects, undesirable sources of variance across multiple experiments, present significant challenges for scientific and clinical discoveries. Specifically, batch effects can introduce spurious findings and obscure genuine signals, contributing to the ongoing reproducibility crisis. Typically, batch effects are treated as associational or conditional effects, despite their potential to causally impact downstream inferences due to variations in experimental design and population demographics. In this study, we propose a novel framework to formalize batch effects as causal effects. Motivated by this perspective, we develop straightforward procedures to enhance existing approaches for batch effect detection and correction. We illustrate via simulation the utility of this perspective, finding that causal augmentations of existing approaches yield sufficient removal of batch effects in intuitively simple settings where conditional approaches struggle. By applying our approaches to a large neuroimaging study, we show that modeling batch effects as causal, rather than associational, effects leads to disparate downstream scientific conclusions. Together, we believe that this work provides a framework for, and highlights potential limitations of, the collection, harmonization, and subsequent analysis of multi-site scientific mega-studies.