A CNN model for head pose recognition using wholes and regions

Ardhendu Behera, Andrew G. Gidney, Zachary Wharton, Daniel Robinson, Keiron Quinn

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding (ISBN) › Research › peer-review

Abstract

Head pose recognition and monitoring is key to many real-world applications, since it is a vital indicator of human attention and behavior. Currently, head pose is often computed by localizing landmarks on a targeted face and solving the 2D-to-3D correspondence problem with a mean head model. Recent research has shown that this is a brittle approach, since it relies entirely on the accuracy of landmark detection, the extraneous head model, and an ad-hoc alignment step. Recent work has also shown that the best-performing methods often combine multiple low-level image features with high-level contextual cues. In this paper, we present a novel end-to-end deep network, which is inspired by these ideas and explores regions within an image to capture topological changes due to changes in viewpoint. We adapt existing state-of-the-art deep CNNs to use more than one region for accurate head pose recognition. Our regions consist of one or more consecutive cells and are adapted from the strategy used to compute the HOG descriptor. Extensive experimental results on head pose recognition using four different large-scale datasets demonstrate that the proposed approach outperforms many state-of-the-art deep CNN models. We also compare our pose recognition performance with the latest OpenFace 2.0 facial behavior analysis toolkit. In addition, we contribute head pose annotations to a large-scale dataset (VGGFace2).
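The abstract's region strategy (one or more consecutive cells, adapted from the HOG descriptor) can be illustrated with a minimal sketch. This is not the authors' implementation; the cell size, region size, and stride are assumptions for illustration only, mirroring how HOG groups cells into overlapping blocks:

```python
import numpy as np

def make_regions(image, cell_size=16, cells_per_region=2):
    """Split an image into a grid of cells (as in HOG) and group runs of
    consecutive cells into overlapping square regions.

    Returns a list of (row, col, crop) tuples; the whole image would be
    fed to the CNN as an additional 'global' input alongside the crops.
    """
    h, w = image.shape[:2]
    rows, cols = h // cell_size, w // cell_size
    region_px = cell_size * cells_per_region
    regions = []
    # Slide over the cell grid one cell at a time, so neighboring
    # regions overlap, as HOG blocks do.
    for r in range(rows - cells_per_region + 1):
        for c in range(cols - cells_per_region + 1):
            y, x = r * cell_size, c * cell_size
            regions.append((r, c, image[y:y + region_px, x:x + region_px]))
    return regions

img = np.zeros((64, 64, 3), dtype=np.uint8)  # 4 x 4 grid of 16-px cells
crops = make_regions(img)
print(len(crops))  # 3 x 3 = 9 overlapping 32 x 32 regions
```

Each crop (plus the whole image) would then be passed through the shared CNN backbone; how the per-region features are fused is described in the paper itself.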

Original language: English
Title of host publication: Proceedings - 14th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728100890
DOI: 10.1109/FG.2019.8756536
Publication status: Published - 1 May 2019
Event: 14th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2019 - Lille, France
Duration: 14 May 2019 - 18 May 2019

Publication series

Name: Proceedings - 14th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2019

Conference

Conference: 14th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2019
Country: France
City: Lille
Period: 14/05/19 - 18/05/19

Cite this

Behera, A., Gidney, A. G., Wharton, Z., Robinson, D., & Quinn, K. (2019). A CNN model for head pose recognition using wholes and regions. In Proceedings - 14th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2019 [8756536] (Proceedings - 14th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2019). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/FG.2019.8756536
@inproceedings{91580e17a6d74718b60bf0f25610f17f,
title = "A CNN model for head pose recognition using wholes and regions",
abstract = "Head pose recognition and monitoring is key to many real-world applications, since it is a vital indicator of human attention and behavior. Currently, head pose is often computed by localizing landmarks on a targeted face and solving the 2D-to-3D correspondence problem with a mean head model. Recent research has shown that this is a brittle approach, since it relies entirely on the accuracy of landmark detection, the extraneous head model, and an ad-hoc alignment step. Recent work has also shown that the best-performing methods often combine multiple low-level image features with high-level contextual cues. In this paper, we present a novel end-to-end deep network, which is inspired by these ideas and explores regions within an image to capture topological changes due to changes in viewpoint. We adapt existing state-of-the-art deep CNNs to use more than one region for accurate head pose recognition. Our regions consist of one or more consecutive cells and are adapted from the strategy used to compute the HOG descriptor. Extensive experimental results on head pose recognition using four different large-scale datasets demonstrate that the proposed approach outperforms many state-of-the-art deep CNN models. We also compare our pose recognition performance with the latest OpenFace 2.0 facial behavior analysis toolkit. In addition, we contribute head pose annotations to a large-scale dataset (VGGFace2).",
author = "Ardhendu Behera and Gidney, {Andrew G.} and Zachary Wharton and Daniel Robinson and Keiron Quinn",
year = "2019",
month = "5",
day = "1",
doi = "10.1109/FG.2019.8756536",
language = "English",
series = "Proceedings - 14th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2019",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
booktitle = "Proceedings - 14th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2019",
address = "United States",
}


