Kavli Affiliate: Fuchun Zhang
| First 5 Authors: Zongnan Ma, Fuchun Zhang, Zhixiong Nan, Yao Ge,
| Summary:
Anticipating human intention from videos has broad applications, such as
autonomous driving, robotic assistance, and virtual reality. This study
addresses the problem of intention action anticipation using egocentric video
sequences to estimate actions that indicate human intention. We propose a
Hierarchical Complete-Recent (HCR) information fusion model that makes full use
of the features of the entire video sequence (i.e., complete features) and the
features of the video tail sequence (i.e., recent features). The HCR model has
two primary mechanisms. The Guide-Feedback Loop (GFL) mechanism is proposed to
model the relation between one recent feature and one complete feature. Based
on GFL, the MultiComplete-Recent Feature Aggregation (MCRFA) module is proposed
to model the relation of one recent feature with multiscale complete features.
Based on GFL and MCRFA, the HCR model can hierarchically explore the rich
interrelationships between multiscale complete features and multiscale recent
features. Through comparative and ablation experiments, we validate the
effectiveness of our model on two well-known public datasets: EPIC-Kitchens and
EGTEA Gaze+.
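
The abstract does not give implementation details, but the Guide-Feedback Loop it describes (relating one recent feature to one complete feature) can be illustrated with a minimal, hypothetical attention-style sketch. All names and the fusion scheme below are assumptions for illustration only, not the authors' actual method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def guide_feedback_loop(recent, complete):
    """Hypothetical sketch: the recent feature attends over the
    complete-sequence features (guide), and the attended summary is
    fused back into the recent feature (feedback).
    Shapes: recent (d,), complete (T, d)."""
    # Guide: attention weights of the recent feature over complete features
    scores = complete @ recent / np.sqrt(recent.shape[0])  # (T,)
    weights = softmax(scores)
    guided = weights @ complete                            # (d,)
    # Feedback: fuse the guided summary back into the recent feature
    refined = recent + guided
    return refined

rng = np.random.default_rng(0)
recent = rng.standard_normal(8)        # one "recent" (video-tail) feature
complete = rng.standard_normal((16, 8))  # "complete" features over 16 steps
out = guide_feedback_loop(recent, complete)
print(out.shape)  # (8,)
```

In the paper's hierarchy, the MCRFA module would presumably apply such an interaction between one recent feature and complete features at multiple scales, then aggregate the results; the sketch above covers only the single-pair case described for GFL.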
| Search Query: ArXiv Query: search_query=au:"Fuchun Zhang"&id_list=&start=0&max_results=3