Kavli Affiliate: Xiang Zhang | First 5 Authors: Xiang Zhang, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan | Summary: The Transformer architecture excels in a variety of language modeling tasks, outperforming traditional neural architectures such as RNNs and LSTMs. This is partially due to its elimination of recurrent connections, which allows for parallel training and […]
Continue reading: Autoregressive + Chain of Thought $\simeq$ Recurrent: Recurrence's Role in Language Models' Computability and a Revisit of Recurrent Transformer
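As a rough illustration of the point made in the summary, the sketch below contrasts an RNN's sequential recurrence (each hidden state depends on the previous one) with self-attention, where all positions are updated in a single batched computation. This is a minimal NumPy sketch under assumed toy dimensions (`seq_len`, `d_model`) and random weights; it is not code from the paper.

```python
# Minimal sketch (not from the paper): sequential RNN recurrence vs. position-parallel
# self-attention. All names and shapes (seq_len, d_model, weight matrices) are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 6, 8
x = rng.standard_normal((seq_len, d_model))           # token embeddings

# RNN: h_t depends on h_{t-1}, so the time loop cannot be parallelized.
W_xh = rng.standard_normal((d_model, d_model)) * 0.1
W_hh = rng.standard_normal((d_model, d_model)) * 0.1
h = np.zeros(d_model)
rnn_states = []
for t in range(seq_len):                              # inherently sequential in t
    h = np.tanh(x[t] @ W_xh + h @ W_hh)
    rnn_states.append(h)

# Self-attention: every position attends to every other in one batched step.
W_q = rng.standard_normal((d_model, d_model)) * 0.1
W_k = rng.standard_normal((d_model, d_model)) * 0.1
W_v = rng.standard_normal((d_model, d_model)) * 0.1
Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / np.sqrt(d_model)                   # (seq_len, seq_len), computed at once
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
attn_out = weights @ V                                # all positions updated in parallel

print(np.array(rnn_states).shape, attn_out.shape)     # both (6, 8)
```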