Real-time Neural Network Inference on Extremely Weak Devices: Agile Offloading with Explainable AI

Kavli Affiliate: Wei Gao

| First 5 Authors: Kai Huang, Wei Gao

| Summary:

With the wide adoption of AI applications, there is a pressing need to enable
real-time neural network (NN) inference on small embedded devices, but
deploying NNs and achieving high inference performance on these small devices
is challenging due to their extremely weak capabilities. Although NN
partitioning and offloading can contribute to such deployment, they are
incapable of minimizing the local costs at embedded devices. Instead, we
suggest addressing this challenge via agile NN offloading, which migrates the
required computations in NN offloading from online inference to offline
learning. In this paper, we present AgileNN, a new NN offloading technique
that achieves real-time NN inference on weak embedded devices by leveraging
eXplainable AI (XAI) techniques to explicitly enforce feature sparsity during
the training phase, thereby minimizing the online computation and
communication costs. Experimental results show that AgileNN’s inference
latency is >6x lower than that of existing schemes, ensuring that sensory
data on embedded devices is consumed in a timely manner. It also reduces the
local device’s resource consumption by >8x without impairing the inference
accuracy.
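
To make the summary's split-and-offload idea concrete, below is a minimal,
hypothetical sketch, not the authors' implementation: feature importance
scores are assumed to have been learned offline via an XAI attribution
method, the top-k most important features stay on the device for local
inference, and the low-importance remainder is coarsely quantized before
transmission. The function names, the top-k split, and the 2-bit quantizer
are illustrative assumptions.

```python
import numpy as np

def split_features(features, importance, k):
    """Split a feature vector into a small 'important' subset (kept for
    local inference) and the remainder (compressed and offloaded).
    `importance` is assumed to be learned offline, e.g., via an XAI
    attribution method applied during training."""
    order = np.argsort(importance)[::-1]   # rank features, most important first
    local_idx = order[:k]                  # top-k features stay on-device
    remote_idx = order[k:]                 # the rest are offloaded
    return features[local_idx], features[remote_idx]

def quantize(x, bits=2):
    """Coarsely quantize low-importance features before transmission;
    aggressive compression is tolerable here because these features
    contribute little to the final prediction."""
    lo, hi = x.min(), x.max()
    levels = 2 ** bits - 1
    q = np.round((x - lo) / max(hi - lo, 1e-8) * levels)
    return q.astype(np.uint8), lo, hi      # lo/hi let the server dequantize

# Toy usage: 64 extracted features; keep the 13 most important (~20%) locally.
rng = np.random.default_rng(0)
features = rng.normal(size=64).astype(np.float32)
importance = rng.random(64)                # stand-in for learned scores
local_feats, remote_feats = split_features(features, importance, k=13)
q, lo, hi = quantize(remote_feats, bits=2)
print(f"kept {local_feats.size} features locally, "
      f"offloaded {q.size} features at 2 bits each "
      f"(~{q.size * 2 / 8:.0f} bytes on the wire)")
```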

| Search Query: ArXiv Query: search_query=au:"Wei Gao"&id_list=&start=0&max_results=3
