Inferring cellular and molecular processes in single-cell data with non-negative matrix factorization using Python, R, and GenePattern Notebook implementations of CoGAPS

Kavli Affiliate: Loyal Goff

| Authors: Jeanette Anna Irene Johnson, Ashley Tsang, Jacob T Mitchell, Emily F Davis-Marcisak, Thomas Sherman, Ted Liefeld, Melanie Loth, Loyal Goff, Jacquelyn Zimmerman, Ben Kinny-Köster, Elizabeth Jaffee, Pablo Tamayo, Jill Mesirov, Michael Reich, Elana J Fertig and Genevieve L Stein-O’Brien

| Summary:

Non-negative matrix factorization (NMF) is an unsupervised learning method well suited to high-throughput biology. Still, inferring biological processes requires additional post hoc statistics and annotation for interpretation of features learned from software packages developed for NMF implementation. Here, we aim to introduce a suite of computational tools that implement NMF and provide methods for accurate, clear biological interpretation and analysis. A generalized discussion of NMF covering its benefits, limitations, and open questions in the field is followed by three vignettes for the Bayesian NMF algorithm CoGAPS (Coordinated Gene Activity across Pattern Subsets). Each vignette will demonstrate NMF analysis to quantify cell state transitions in public domain single-cell RNA-sequencing (scRNA-seq) data of malignant epithelial cells in 25 pancreatic ductal adenocarcinoma (PDAC) tumors and 11 control samples. The first uses PyCoGAPS, our new Python interface for CoGAPS that we developed to enhance runtime of Bayesian NMF for large datasets. The second vignette steps through the same analysis using our R CoGAPS interface, and the third introduces two new cloud-based, plug-and-play options for running CoGAPS using GenePattern Notebook and Docker. By providing Python support, cloud-based computing options, and relevant example workflows, we facilitate user-friendly interpretation and implementation of NMF for single-cell analyses.

Read More