Fold first, ask later: structure-informed function annotation of Pseudomonas phage proteins

Kavli Affiliate: Robert Edwards

| Authors: Hannelore Longin, George Bouras, Susanna R. Grigson, Robert A. Edwards, Hanne Hendrix, Rob Lavigne and Vera van Noort

| Summary:

Phages, the viruses of bacteria, harbor an incredibly diverse repertoire of proteins capable of manipulating their bacterial hosts, inspiring many medical and biotechnological applications. However, to date, only a limited subset of that repertoire can be exploited, because the functions of most of these proteins remain difficult to elucidate. In this study, we investigated several structure-informed approaches to annotate hypothetical proteins from Pseudomonas-infecting phages. We curated a representative dataset of over 10,000 proteins derived from NCBI, for which we predicted protein structures with ColabFold and assessed structural similarity via FoldSeek against the PDB, AlphaFold, and Phold databases. We evaluated multiple annotation strategies, including sequence-based (Pharokka) and structure-based (FoldSeek, Phold) methods. Our results show that up to 43% of truly unannotated proteins can be functionally annotated when combining structure-informed approaches with UniProt-derived annotations. We highlight the complementarity of different databases and the importance of annotation quality filtering. This work provides a valuable resource of predicted structures and annotations, and offers insights into optimizing structure-based annotation pipelines for viral proteins, paving the way for deeper exploration of phage biology and its applications.
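To make the idea of combining databases with annotation quality filtering concrete, the following is a minimal Python sketch and not the authors' pipeline. It assumes that search hits have already been parsed into simple records; the `Hit` class, the E-value cutoff of 1e-3, the list of uninformative terms, and the toy data are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical list of descriptions that carry no functional information.
UNINFORMATIVE = ("hypothetical protein", "uncharacterized", "unknown function")


@dataclass(frozen=True)
class Hit:
    """One parsed search hit (assumed record layout, not a tool's format)."""
    protein: str      # query protein identifier
    database: str     # e.g. "PDB", "AFDB", "Phold"
    description: str  # annotation text attached to the hit
    evalue: float     # significance of the match; lower is better


def is_informative(description: str) -> bool:
    """Return False if the description only says that nothing is known."""
    text = description.lower()
    return not any(term in text for term in UNINFORMATIVE)


def best_annotations(hits, max_evalue=1e-3):
    """Keep, per protein, the informative hit with the lowest E-value."""
    best = {}
    for hit in hits:
        # Quality filter: drop weak matches and uninformative descriptions.
        if hit.evalue > max_evalue or not is_informative(hit.description):
            continue
        current = best.get(hit.protein)
        if current is None or hit.evalue < current.evalue:
            best[hit.protein] = hit
    return best


def annotated_fraction(proteins, hits, max_evalue=1e-3):
    """Fraction of proteins that receive at least one retained annotation."""
    if not proteins:
        return 0.0
    best = best_annotations(hits, max_evalue)
    return sum(1 for p in proteins if p in best) / len(proteins)


# Toy data, invented for illustration only.
proteins = ["p1", "p2", "p3", "p4"]
hits = [
    Hit("p1", "PDB", "DNA polymerase", 1e-10),
    Hit("p1", "AFDB", "hypothetical protein", 1e-30),
    Hit("p2", "Phold", "uncharacterized protein", 1e-20),
    Hit("p3", "AFDB", "tail fiber protein", 0.5),
    Hit("p4", "Phold", "endolysin", 1e-5),
]
best = best_annotations(hits)
fraction = annotated_fraction(proteins, hits)
```

In this toy example, the strongest match for `p1` is discarded because its description is uninformative, so a weaker hit from another database supplies the annotation. This is one simple way in which databases can complement each other once quality filtering is applied.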
