Sphae: An automated toolkit for predicting phage therapy candidates from sequencing data

Kavli Affiliate: Robert Edwards

| Authors: Bhavya Papudeshi, Michael J. Roach, Vijini Mallawaarachchi, George Bouras, Susanna R Grigson, Sarah K Giles, Clarice M Harker, Abbey L.K Hutton, Anita Tarasenko, Laura K Inglis, Alejandro A Vega, Cole Souza, Lance Boling, Hamza Hajama, Ana Georgina Cobian-Guemes, Anca Segall, Elizabeth A Dinsdale and Robert A Edwards

| Summary:

Motivation: Phage therapy is a viable alternative for treating bacterial infections amidst the escalating threat of antimicrobial resistance. However, its therapeutic success depends on selecting safe and effective phage candidates. While experimental methods focus on isolating phages and determining their lifecycle and host range, comprehensive genomic screening is critical for identifying markers that indicate potential risks, such as toxins, antimicrobial resistance genes, or temperate lifecycle traits. These analyses are often labor-intensive and time-consuming, limiting the rapid deployment of phages in clinical settings.

Results: We developed Sphae, an automated bioinformatics pipeline designed to assess the therapeutic potential of a phage in under ten minutes. Using the Snakemake workflow manager, Sphae integrates tools for quality control, assembly, genome assessment, and annotation tailored specifically for phage biology. Sphae automates the detection of key genomic markers that could preclude therapeutic use, including virulence factors, antimicrobial resistance genes, and lysogeny indicators such as integrase, recombinase, and transposase. Benchmarked on 65 phage sequences, 28 phage samples showed therapeutic potential, 8 failed during assembly due to low sequencing depth, 22 samples included prophage or virulence markers, and the remaining 23 samples included multiple phage genomes per sample. The workflow outputs a comprehensive report, enabling rapid assessment of phage safety and suitability for phage therapy under these criteria. Sphae is scalable and portable, facilitating efficient deployment across most high-performance computing (HPC) and cloud platforms and expediting the genomic evaluation process.

Availability: Sphae's source code is freely available at https://github.com/linsalrob/sphae, with installation supported via Conda, PyPI, and Docker containers.
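
The marker screening described in the Results can be illustrated with a minimal sketch. This is not Sphae's implementation: the `Gene` record, its `category` field, and the functions `screen_genes` and `is_candidate` are hypothetical names introduced here, and a simple keyword match on annotation text is an assumed stand-in for the pipeline's actual annotation tools. Only the three lysogeny keywords come from the abstract.

```python
from dataclasses import dataclass

# Lysogeny indicators named in the abstract.
LYSOGENY_KEYWORDS = ("integrase", "recombinase", "transposase")


@dataclass
class Gene:
    """Hypothetical annotation record (not Sphae's data model)."""
    gene_id: str
    product: str        # free-text product annotation
    category: str = ""  # assumed label: "virulence", "amr", or empty


def screen_genes(genes):
    """Group gene IDs by the risk marker they match."""
    flags = {"lysogeny": [], "virulence": [], "amr": []}
    for gene in genes:
        product = gene.product.lower()
        # Case-insensitive substring match on the product description.
        if any(keyword in product for keyword in LYSOGENY_KEYWORDS):
            flags["lysogeny"].append(gene.gene_id)
        # Virulence and AMR hits are assumed to be pre-labelled upstream.
        if gene.category in ("virulence", "amr"):
            flags[gene.category].append(gene.gene_id)
    return flags


def is_candidate(flags):
    """A genome passes this sketch's screen only if no marker was flagged."""
    return not any(flags.values())


genes = [
    Gene("g1", "major capsid protein"),
    Gene("g2", "Tyrosine integrase"),
    Gene("g3", "beta-lactamase", category="amr"),
]
flags = screen_genes(genes)
print(flags)
print("therapy candidate:", is_candidate(flags))
```

A real screen would also need the assembly checks mentioned above (sequencing depth, number of phage genomes per sample), which this sketch leaves out.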
