Kavli Affiliate: Xin Duan
| Authors: Junyi Chen, Danqing Yin, Harris Y. H. Wong, Xin Duan, Ken H. O. Yu and Joshua W. K. Ho
| Summary:
The rapidly growing collection of public single-cell sequencing data have become a valuable resource for molecular, cellular and microbial discovery. Previous studies mostly overlooked detecting pathogens in human single-cell sequencing data. Moreover, existing bioinformatics tools lack the scalability to deal with big public data. We introduce Vulture, a scalable cloud-based pipeline that performs microbial calling for single-cell RNA sequencing (scRNA-seq) data, enabling meta-analysis of host-microbial studies from the public domain. In our scalability benchmarking experiments, Vulture can outperform the state-of-the-art cloud-based pipeline Cumulus with a 40% and 80% reduction of runtime and cost, respectively. Furthermore, Vulture is 2-10 times faster than PathogenTrack and Venus, while generating comparable results. We applied Vulture to two COVID-19, three hepatocellular carcinoma (HCC), and two gastric cancer human patient cohorts with public sequencing reads data from scRNA-seq experiments and discovered cell-type specific enrichment of SARS-CoV2, hepatitis B virus (HBV), and H. pylori positive cells, respectively. In the HCC analysis, all cohorts showed hepatocyte-only enrichment of HBV, with cell subtype-associated HBV enrichment based on inferred copy number variations. In summary, Vulture presents a scalable and economical framework to mine unknown host-microbial interactions from large-scale public scRNA-seq data. Vulture is available via an open-source license at https://github.com/holab-hku/Vulture.