Adapting MobileNets for mobile based upper body pose estimation

Bappaditya Debnath, Mary O'Brien, Motonori Yamaguchi, Ardhendu Behera

Research output: Contribution to journal › Article › Research › peer-review


Abstract

Human pose estimation through deep learning has achieved very high accuracy on a wide range of difficult poses. However, these models are computationally expensive and are often unsuitable for mobile-based systems. In this paper, we investigate the use of MobileNets, a well-known light-weight and efficient CNN architecture for mobile and embedded vision applications. We adapt MobileNets for pose estimation, taking inspiration from the hourglass network. We introduce a novel split-stream architecture at the final two layers of the MobileNets. This approach reduces over-fitting, improving accuracy while reducing the number of parameters. We also show that, by retaining part of the original network, we can improve accuracy by transferring the features learned by the ImageNet-pretrained MobileNets. The adapted model is evaluated on the FLIC dataset. Our network outperforms the default MobileNets for pose estimation and achieves performance comparable to state-of-the-art results while significantly reducing inference time.
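The paper's implementation is not reproduced on this page, but the adaptation the abstract describes (an ImageNet-pretrained MobileNet backbone, truncated and finished with two parallel split streams that regress joint heatmaps, in the spirit of the hourglass network) can be sketched roughly as below. This is a minimal Keras sketch under assumptions: the truncation point (`conv_pw_11_relu`), the filter sizes, the upsampling depth and the joint count are illustrative guesses, not the authors' exact configuration.

```python
# Hedged sketch (not the authors' code): adapting an ImageNet-pretrained
# MobileNet into a heatmap-regression pose network with two parallel
# split streams near the output. Split point, filter counts and joint
# count are assumptions for illustration only.
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_JOINTS = 7          # assumed number of upper-body joints; adjust to the dataset
INPUT_SIZE = 224        # standard MobileNet input resolution

# Keep most of the pretrained MobileNet so the ImageNet features transfer.
backbone = tf.keras.applications.MobileNet(
    input_shape=(INPUT_SIZE, INPUT_SIZE, 3),
    include_top=False,
    weights="imagenet",
)
# Truncate before the last depthwise blocks (the exact layer is an assumption).
features = backbone.get_layer("conv_pw_11_relu").output

def stream(x, name):
    """One of the two split streams replacing the final MobileNet layers."""
    x = layers.SeparableConv2D(256, 3, padding="same", activation="relu",
                               name=f"{name}_sepconv")(x)
    # Upsample back towards input resolution, hourglass-style.
    x = layers.Conv2DTranspose(128, 4, strides=2, padding="same",
                               activation="relu", name=f"{name}_up")(x)
    return x

# Concatenate the two streams and regress one heatmap per joint.
merged = layers.Concatenate(name="split_stream_merge")(
    [stream(features, "stream_a"), stream(features, "stream_b")]
)
heatmaps = layers.Conv2D(NUM_JOINTS, 1, activation=None,
                         name="joint_heatmaps")(merged)

model = Model(backbone.input, heatmaps, name="mobilenet_pose_sketch")
model.compile(optimizer="adam", loss="mse")  # per-pixel heatmap regression
```

Training would then minimise a per-pixel loss (e.g. MSE, as compiled above) between the predicted and ground-truth Gaussian joint heatmaps; freezing the early pretrained layers for the first few epochs is one common way to exploit the transferred ImageNet features.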
Original language: English
Journal: IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS)
Early online date: 14 Feb 2019
DOI: 10.1109/AVSS.2018.8639378
Publication status: E-pub ahead of print - 14 Feb 2019

Fingerprint

Deep learning

Cite this

@article{c46c723d6d60470bb1516a4b3f620680,
title = "Adapting MobileNets for mobile based upper body pose estimation",
abstract = "Human pose estimation through deep learning has achieved very high accuracy over various difficult poses. However, these are computationally expensive and are often not suitable for mobile based systems. In this paper, we investigate the use of MobileNets, which is well-known to be a light-weight and efficient CNN architecture for mobile and embedded vision applications. We adapt MobileNets for pose estimation inspired by the hourglass network. We introduce a novel split stream architecture at the final two layers of the MobileNets. This approach reduces over-fitting, resulting in improvement in accuracy and reduction in parameter size. We also show that by maintaining part of the original network we are able to improve accuracy by transferring the learned features from ImageNet pre-trained MobileNets. The adapted model is evaluated on the FLIC dataset. Our network out-performed the default MobileNets for pose estimation, as well as achieved performance comparable to the state of the art results while reducing inference time significantly.",
author = "Bappaditya Debnath and Mary O'Brien and Motonori Yamaguchi and Ardhendu Behera",
note = "15th IEEE International Conference on Advanced Video and Signal-based Surveillance (AVSS), 27th-30th November 2018, Auckland, New Zealand",
year = "2019",
month = "2",
day = "14",
doi = "10.1109/AVSS.2018.8639378",
language = "English",
journal = "IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS)",

}

Adapting MobileNets for mobile based upper body pose estimation. / Debnath, Bappaditya; O'Brien, Mary; Yamaguchi, Motonori; Behera, Ardhendu.

In: IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS), 14.02.2019.

Research output: Contribution to journal › Article › Research › peer-review

TY - JOUR

T1 - Adapting MobileNets for mobile based upper body pose estimation

AU - Debnath, Bappaditya

AU - O'Brien, Mary

AU - Yamaguchi, Motonori

AU - Behera, Ardhendu

N1 - 15th IEEE International Conference on Advanced Video and Signal-based Surveillance (AVSS), 27th-30th November 2018, Auckland, New Zealand

PY - 2019/2/14

Y1 - 2019/2/14

N2 - Human pose estimation through deep learning has achieved very high accuracy over various difficult poses. However, these are computationally expensive and are often not suitable for mobile based systems. In this paper, we investigate the use of MobileNets, which is well-known to be a light-weight and efficient CNN architecture for mobile and embedded vision applications. We adapt MobileNets for pose estimation inspired by the hourglass network. We introduce a novel split stream architecture at the final two layers of the MobileNets. This approach reduces over-fitting, resulting in improvement in accuracy and reduction in parameter size. We also show that by maintaining part of the original network we are able to improve accuracy by transferring the learned features from ImageNet pre-trained MobileNets. The adapted model is evaluated on the FLIC dataset. Our network out-performed the default MobileNets for pose estimation, as well as achieved performance comparable to the state of the art results while reducing inference time significantly.

AB - Human pose estimation through deep learning has achieved very high accuracy over various difficult poses. However, these are computationally expensive and are often not suitable for mobile based systems. In this paper, we investigate the use of MobileNets, which is well-known to be a light-weight and efficient CNN architecture for mobile and embedded vision applications. We adapt MobileNets for pose estimation inspired by the hourglass network. We introduce a novel split stream architecture at the final two layers of the MobileNets. This approach reduces over-fitting, resulting in improvement in accuracy and reduction in parameter size. We also show that by maintaining part of the original network we are able to improve accuracy by transferring the learned features from ImageNet pre-trained MobileNets. The adapted model is evaluated on the FLIC dataset. Our network out-performed the default MobileNets for pose estimation, as well as achieved performance comparable to the state of the art results while reducing inference time significantly.

UR - https://avss2018.org

UR - https://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1001307

U2 - 10.1109/AVSS.2018.8639378

DO - 10.1109/AVSS.2018.8639378

M3 - Article

JO - IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS)

JF - IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS)

ER -