Kavli Affiliate: Robert Edwards
| Authors: George Bouras, Susanna R Grigson, Milot Mirdita, Michael Heinzinger, Bhavya Papudeshi, Vijini Mallawaarachchi, Renee Green, Rachel Seongeun Kim, Victor Mihalia, Alkis James Psaltis, Peter-John Wormald, Sarah Vreugde, Martin Steinegger and Robert A Edwards
| Summary:
Bacteriophage (phage) genome annotation is essential for understanding the functional potential of phages and their suitability for use as therapeutic agents. Here we introduce Phold, an annotation framework utilising protein structural information that combines the ProstT5 protein language model with the structural alignment tool Foldseek. Phold assigns annotations using a database of over 1.36 million predicted phage protein structures with high-quality functional labels. Benchmarking reveals that Phold outperforms existing sequence-based homology approaches in functional annotation sensitivity whilst maintaining speed, consistency and scalability. Applying Phold to diverse cultured and metagenomic phage genomes shows that it consistently annotates over 50% of genes in an average phage genome and 40% in an average archaeal virus genome. Comparisons of phage protein structures with other protein structures across the tree of life reveal that phage proteins commonly share structural homology with proteins found throughout it, particularly those with nucleic acid metabolism and enzymatic functions. Phold is available as free and open-source software at https://github.com/gbouras13/phold.