Kavli Affiliate: Angela Wu
| Authors: Jia Zhao, Gefei Wang, Jingsi Ming, Zhixiang Lin, Yang Wang, The Tabula Microcebus Consortium, Angela Ruohao Wu and Can Yang
| Summary:
The rapid emergence of large-scale, atlas-level single-cell RNA-seq datasets presents remarkable opportunities for broad and deep biological investigations through integrative analyses. However, harmonizing such datasets requires integration approaches that are not only computationally scalable but also capable of preserving a wide range of fine-grained cell populations. We created Portal, a unified framework of adversarial domain translation that learns harmonized representations of datasets. Through innovations in model and algorithm design, Portal achieves superior performance in preserving biological variation during integration, while integrating millions of cells in minutes with low memory consumption. We show that Portal is widely applicable to integrating datasets across samples, platforms and data types (including scRNA-seq, snRNA-seq and scATAC-seq). Finally, we demonstrate the power of Portal by applying it to the integration of cross-species datasets with limited shared information among them, yielding biological insights into the similarities and divergences in the spermatogenesis process among mouse, macaque and human.