Abstract
Automatic recognition and prediction of in-vehicle human activity is significant for the next generation of driver-assistance systems and intelligent autonomous vehicles. In this paper, we present a novel single-image driver action recognition algorithm inspired by human perception, which often focuses selectively on parts of an image to acquire information at the locations most relevant to a given task. Unlike existing approaches, we argue that human activity is a combination of pose and semantic contextual cues. Specifically, we model the configuration of body joints and represent their interaction with objects as pairwise relations to capture structural information. The resulting body-pose and body-object interaction representation is semantically rich and meaningful, and remains highly discriminative even when coupled with a basic linear SVM classifier. We also propose a Multi-stream Deep Fusion Network (MDFN) for combining high-level semantics with CNN features. Our experimental results demonstrate that the proposed approach significantly improves driver action recognition accuracy on two challenging datasets.
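To illustrate the fusion idea described in the abstract, the following is a minimal sketch of a two-stream network that concatenates a high-level semantic descriptor (e.g., body-pose and body-object relation features) with CNN appearance features before classification. The class name `MultiStreamFusion`, the layer sizes, and the feature dimensions are assumptions for illustration only, not the paper's MDFN architecture.

```python
import torch
import torch.nn as nn

class MultiStreamFusion(nn.Module):
    """Hypothetical two-stream fusion sketch: one branch embeds a
    semantic descriptor (pose / body-object relations), the other
    embeds CNN features; the two are concatenated and classified."""
    def __init__(self, semantic_dim=64, cnn_dim=2048, num_actions=10):
        super().__init__()
        self.semantic_branch = nn.Sequential(
            nn.Linear(semantic_dim, 128), nn.ReLU())
        self.cnn_branch = nn.Sequential(
            nn.Linear(cnn_dim, 256), nn.ReLU())
        self.classifier = nn.Linear(128 + 256, num_actions)

    def forward(self, semantic_feat, cnn_feat):
        # Fuse the two streams by concatenation, then predict the action.
        fused = torch.cat([self.semantic_branch(semantic_feat),
                           self.cnn_branch(cnn_feat)], dim=1)
        return self.classifier(fused)

# Example usage with random stand-in features (dimensions assumed).
model = MultiStreamFusion()
sem = torch.randn(4, 64)     # semantic pose / relation descriptor
cnn = torch.randn(4, 2048)   # backbone CNN features
logits = model(sem, cnn)     # shape: (4, num_actions)
```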
| Original language | English |
|---|---|
| Pages (from-to) | 1-8 |
| Number of pages | 8 |
| Journal | IEEE Transactions on Intelligent Transportation Systems |
| Early online date | 12 Oct 2020 |
| DOIs | |
| Publication status | Published - 12 Oct 2020 |
Keywords
- Transfer Learning
- Intelligent Vehicles
- Deep Learning
- CNN
- Body pose
- Autonomous Vehicles
- In-vehicle Activity Monitoring
- Neural network-based fusion
Research Centres
- Centre for Intelligent Visual Computing Research
- Data Science STEM Research Centre
- Data and Complex Systems Research Centre