STAR: An Efficient Softmax Engine for Attention Model with RRAM Crossbar

Kavli Affiliate: Jing Wang

| First 5 Authors: Yifeng Zhai, Bing Li, Bonan Yan, Jing Wang,

| Summary:

RRAM crossbars have been studied as building blocks of in-memory accelerators for
neural network applications due to their in-situ computing capability. However,
prior RRAM-based accelerators show efficiency degradation when executing popular
attention models. We observe that the frequent softmax operations emerge as the
efficiency bottleneck and are also insensitive to computing precision. Thus, we
propose STAR, which boosts computing efficiency with an efficient RRAM-based
softmax engine and a fine-grained global pipeline for attention models.
Specifically, STAR exploits the versatility and flexibility of RRAM crossbars to
trade off model accuracy against hardware efficiency. Experimental results on
several datasets show that STAR achieves up to 30.63x and 1.31x computing-efficiency
improvements over a GPU and state-of-the-art RRAM-based attention accelerators,
respectively.
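For context on why softmax operations become so frequent: in scaled dot-product attention, every query row of every head in every layer requires a softmax over the key dimension. The sketch below is a minimal NumPy illustration of where that softmax sits; the names, shapes, and code are generic assumptions for exposition, not code from the STAR paper.

```python
# Illustrative only: where softmax sits in scaled dot-product attention.
# Names and shapes are generic assumptions, not taken from the STAR paper.
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the chosen axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Q, K, V: (seq_len, d_k). One softmax per query row, per head, per layer,
    # which is why softmax calls dominate as sequence length and depth grow.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
out = attention(Q, K, V)  # shape (8, 16)
```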

| Search Query: ArXiv Query: search_query=au:"Jing Wang"&id_list=&start=0&max_results=3
