Kavli Affiliate: Robert Edwards
| Authors: Susanna R Grigson, George Bouras, Bhavya Papudeshi, Vijini Mallawaarachchi, Michael J Roach, Przemyslaw Decewicz and Robert A Edwards
| Summary:
Accurate genome annotation is fundamental to decoding viral diversity and understanding bacteriophage biology, yet the majority of bacteriophage genes remain functionally uncharacterised. Bacteriophage genomes often exhibit conserved gene order, or synteny, that reflects underlying constraints in genome architecture and expression. Here, we present Phynteny, a genome-scale deep learning framework that leverages gene synteny to predict the function of unknown bacteriophage genes. Phynteny integrates protein language model embeddings with positional encoding, bidirectional long short-term memory layers, and transformer encoders featuring circular attention to learn genome-wide organisational patterns. Trained on a dereplicated dataset of over 280,000 bacteriophage genomes, Phynteny achieves high predictive performance (AUC > 0.84) across the nine PHROG functional categories and confidently assigns putative functions, increasing the number of annotated genes in phage isolate genomes by 14%. To assess the validity of these predictions, we compared them with annotations derived independently from protein structural information, which revealed broad functional concordance and lends additional confidence to Phynteny predictions. By incorporating genomic context into functional annotation, Phynteny offers a novel approach to illuminating the functional landscape of viral dark matter and is available at https://github.com/susiegriggo/Phynteny_transformer.
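
The summary does not say how circular attention is implemented, so the following is only a minimal sketch of the general idea it suggests: on a circular genome, the first and last genes are neighbours, so the distance between two gene positions should wrap around the ends. The function names and the `window` parameter are hypothetical and are not taken from the Phynteny code base.

```python
from typing import List


def circular_distance(i: int, j: int, n_genes: int) -> int:
    """Shortest number of gene steps between positions i and j on a circular genome."""
    d = abs(i - j) % n_genes
    # Going the other way around the circle may be shorter.
    return min(d, n_genes - d)


def circular_distance_matrix(n_genes: int) -> List[List[int]]:
    """Pairwise wrap-around distances between all gene positions."""
    return [
        [circular_distance(i, j, n_genes) for j in range(n_genes)]
        for i in range(n_genes)
    ]


def circular_window_mask(n_genes: int, window: int) -> List[List[bool]]:
    """True where gene j lies within `window` steps of gene i, wrapping around the ends.

    Assumption for illustration only: a mask like this could restrict attention
    to nearby genes; the summary does not state that Phynteny does so.
    """
    return [[d <= window for d in row] for row in circular_distance_matrix(n_genes)]
```

For example, in a genome of 10 genes, `circular_distance(0, 9, 10)` is 1 rather than 9, so the first and last genes are treated as adjacent. A linear positional scheme would instead place them at opposite ends of the sequence.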