Bidirectional Representations Augmented Autoregressive Biological Sequence Generation: Application in De Novo Peptide Sequencing

Kavli Affiliate: Xiang Zhang

| First 5 Authors: Xiang Zhang

| Summary:

Autoregressive (AR) models, common in sequence generation, are limited in
many biological tasks such as de novo peptide sequencing and protein modeling
by their unidirectional nature, failing to capture crucial global bidirectional
token dependencies. Non-Autoregressive (NAR) models offer holistic,
bidirectional representations but face challenges with generative coherence and
scalability. To overcome these limitations, we propose a hybrid framework that enhances AR
generation by dynamically integrating rich contextual information from
non-autoregressive mechanisms. Our approach couples a shared input encoder with
two decoders: a non-autoregressive one learning latent bidirectional biological
features, and an AR decoder synthesizing the biological sequence by leveraging
these bidirectional features. A novel cross-decoder attention module enables
the AR decoder to iteratively query and integrate these bidirectional features,
enriching its predictions. This synergy is cultivated via a tailored training
strategy with importance annealing for balanced objectives and cross-decoder
gradient blocking for stable, focused learning. Evaluations on a demanding
nine-species benchmark of de novo peptide sequencing show that our model
substantially surpasses AR and NAR baselines. It uniquely harmonizes AR
stability with NAR contextual awareness, delivering robust, superior
performance on diverse downstream data. This research advances biological
sequence modeling techniques and contributes a novel architectural paradigm for
augmenting AR models with enhanced bidirectional understanding for complex
sequence generation. Code is available at https://github.com/BEAM-Labs/denovo.
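
The abstract gives no implementation details, so the following is only a minimal, framework-free sketch of two of the ideas it names: cross-decoder attention (AR decoder states querying NAR features) and an annealed weight that balances the two training objectives. All function names, the single-head attention without learned projections, the residual addition, and the linear schedule with its start and end values are assumptions made for illustration; they are not taken from the paper or its repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_decoder_attention(ar_states, nar_features):
    """Let each AR decoder state attend over all NAR decoder features.

    Assumed simplification: one head, no learned projections.
    In an autograd framework, the NAR features would be detached before
    this call (e.g. torch.Tensor.detach() or jax.lax.stop_gradient) so the
    AR loss does not update the NAR decoder. That is our reading of
    "cross-decoder gradient blocking", not a confirmed detail.
    """
    dim = len(nar_features[0])
    enriched = []
    for query in ar_states:
        # Scaled dot-product score between the query and every NAR position.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in nar_features
        ]
        weights = softmax(scores)
        # Weighted sum of NAR features: the bidirectional context vector.
        context = [
            sum(w * value[d] for w, value in zip(weights, nar_features))
            for d in range(dim)
        ]
        # Residual connection: keep the AR state and add the context to it.
        enriched.append([q + c for q, c in zip(query, context)])
    return enriched


def annealed_nar_weight(step, total_steps, start=1.0, end=0.1):
    """Linearly move the NAR loss weight from start to end (assumed schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def combined_loss(ar_loss, nar_loss, step, total_steps):
    """Total objective: AR loss plus the annealed NAR loss."""
    return ar_loss + annealed_nar_weight(step, total_steps) * nar_loss
```

Under these assumptions, the NAR objective dominates early training and is gradually down-weighted so that the AR decoder, which produces the final sequence, becomes the main focus. The actual schedule and its direction may differ in the authors' code.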

| Search Query: ArXiv Query: search_query=au:"Xiang Zhang"&id_list=&start=0&max_results=3
