Human pose estimation through deep learning has achieved very high accuracy across a variety of difficult poses. However, these models are computationally expensive and are often unsuitable for mobile-based systems. In this paper, we investigate the use of MobileNets, a well-known lightweight and efficient CNN architecture for mobile and embedded vision applications. We adapt MobileNets for pose estimation, taking inspiration from the hourglass network. We introduce a novel split-stream architecture at the final two layers of MobileNets. This approach reduces over-fitting, which improves accuracy and reduces the number of parameters. We also show that, by retaining part of the original network, we can improve accuracy by transferring the features learned by ImageNet pre-trained MobileNets. The adapted model is evaluated on the FLIC dataset. Our network outperforms the default MobileNets for pose estimation and achieves performance comparable to state-of-the-art results while significantly reducing inference time.
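As background for why MobileNets is regarded as lightweight, the sketch below compares the weight count of a standard convolution with that of the depthwise separable convolution that MobileNets is built on. It is an illustration only and does not reproduce the split-stream architecture described above; the function names and the example layer size (3x3 kernel, 512 input and output channels) are assumptions chosen for the example, and biases are ignored.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise separable convolution (biases ignored).

    Depthwise step: one kernel x kernel filter per input channel.
    Pointwise step: a 1x1 convolution mixing c_in channels into c_out.
    """
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer size, chosen only for illustration.
    k, c_in, c_out = 3, 512, 512
    std = standard_conv_params(k, c_in, c_out)   # 2,359,296 weights
    sep = separable_conv_params(k, c_in, c_out)  # 266,752 weights
    # The ratio equals 1/c_out + 1/k**2, roughly 0.113 for this layer.
    print(std, sep, sep / std)
```

For this layer size the separable form needs roughly one ninth of the weights, which is the kind of saving that makes the architecture attractive for mobile and embedded use.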
|Journal||IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS)|
|Early online date||14 Feb 2019|
|Publication status||E-pub ahead of print - 14 Feb 2019|