TY - JOUR
T1 - Rotation Axis Focused Attention Network (RAFA-Net) for Estimating Head Pose
AU - Behera, Ardhendu
AU - Wharton, Zachary
AU - Galbokka Hewage, Pradeep Ruwan Padmasiri
AU - Kumar, Swagat
PY - 2020/9/18
Y1 - 2020/9/18
N2 - Head pose is a vital indicator of human attention and behavior. Therefore, automatic estimation of head pose from images is key to many real-world applications. In this paper, we propose a novel approach for head pose estimation from a single RGB image. Many existing approaches predict head pose by localizing facial landmarks and then solving the 2D-to-3D correspondence problem with a mean head model. Such approaches rely entirely on landmark detection accuracy, an ad hoc alignment step, and an extraneous head model. To address this drawback, we present an end-to-end deep network that explores an innovative rotation-axis (yaw, pitch, and roll) focused attention mechanism to capture subtle changes in images. The mechanism uses attentional spatial pooling from a self-attention layer, learns the importance of fine-grained to coarse spatial structures, and combines them to capture rich semantic information for a given rotation axis. Experimental evaluation of our approach on three benchmark datasets shows that it is highly competitive with state-of-the-art methods, both landmark-based and landmark-free.
KW - Deep Regression
KW - Attention Network
KW - Attentional pooling
KW - CNN
KW - Head pose estimation
KW - Vanilla deep regression
KW - Self-attention
KW - Coarse-to-fine pooling
UR - https://link.springer.com/conference/accv
UR - http://accv2020.kyoto/
M3 - Conference proceeding article (ISSN)
SN - 0302-9743
JO - Asian Conference on Computer Vision - ACCV 2020
JF - Asian Conference on Computer Vision - ACCV 2020
ER -