Human activity recognition (HAR) is one of the most important and challenging problems in the computer vision. It has critical application in wide variety of tasks including gaming, human-robot interaction, rehabilitation, sports, health monitoring, video surveillance, and robotics. HAR is challenging due to the complex posture made by the human and multiple people interaction. Various artefacts that commonly appears in the scene such as illuminations variations, clutter, occlusions, background diversity further adds the complexity to HAR. Sensors for multiple modalities could be used to overcome some of these inherent challenges. Such sensors could include an RGB-D camera, infrared sensors, thermal cameras, inertial sensors, etc. This article introduces a comprehensive review of different multimodal human activity recognition methods where different types of sensors being used along with their analytical approaches and fusion methods. Further, this article presents classification and discussion of existing work within seven rational aspects: (a) what are the applications of HAR; (b) what are the single and multi-modality sensing for HAR; (c) what are different vision based approaches for HAR; (d) what and how wearable sensors based system contributes to the HAR; (e) what are different multimodal HAR methods; (f) how a combination of vision and wearable inertial sensors based system contributes to the HAR; and (g) challenges and future directions in HAR. With a more and comprehensive understanding of multimodal human activity recognition, more research in this direction can be motivated and refined.
- Activity Recognition
- Computer vison
- Wearable sensors
- Fusion of vision and inertial sensors