In this paper, we present a novel method to explore semantically meaningful visual information and identify the discriminative spatiotemporal relationships within it for real-time activity recognition. Our approach infers human activities from continuous egocentric (first-person-view) videos of object manipulations in an industrial setup. To achieve this goal, we propose a random forest that unifies randomization, discriminative relationship mining and a Markov temporal structure. Discriminative relationship mining helps us model the relations that distinguish different activities, while randomization allows us to handle the large feature space and prevents over-fitting. The Markov temporal structure provides temporally consistent decisions during testing. The proposed random forest uses discriminative Markov decision trees, in which every nonterminal node is a discriminative classifier and the Markov structure is applied at the leaf nodes. The proposed approach outperforms state-of-the-art methods on a new, challenging video dataset of pump-system assembly.
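To illustrate how a Markov temporal structure can yield temporally consistent decisions, the following is a minimal sketch and not the authors' implementation. It applies a generic first-order Markov forward filter to per-frame class scores. The function name `markov_smooth`, the `stay_prob` parameter, the uniform switching model, and the idea that the scores come from averaged leaf votes are all assumptions made for illustration.

```python
from typing import List, Sequence


def markov_smooth(
    frame_probs: Sequence[Sequence[float]],
    stay_prob: float = 0.9,
) -> List[int]:
    """Forward-filter per-frame class scores with a first-order Markov prior.

    frame_probs[t][k] is a hypothetical score for activity k at frame t
    (for example, averaged leaf votes of a random forest).
    stay_prob is an assumed probability of remaining in the same activity
    between consecutive frames; the remainder is spread evenly over the
    other activities. Returns one activity index per frame, computed online.
    """
    if not frame_probs:
        return []
    n_classes = len(frame_probs[0])
    if n_classes < 2:
        raise ValueError("need at least two classes")
    switch_prob = (1.0 - stay_prob) / (n_classes - 1)

    belief = [1.0 / n_classes] * n_classes  # uniform prior over activities
    labels: List[int] = []
    for probs in frame_probs:
        total = sum(belief)
        # Predict: propagate the previous belief through the transition model.
        predicted = [
            stay_prob * belief[k] + switch_prob * (total - belief[k])
            for k in range(n_classes)
        ]
        # Update: weight by the current frame's evidence, then normalise.
        updated = [predicted[k] * probs[k] for k in range(n_classes)]
        norm = sum(updated)
        belief = [u / norm for u in updated] if norm > 0 else predicted
        labels.append(max(range(n_classes), key=belief.__getitem__))
    return labels


# A single noisy frame (index 2) would flip a per-frame argmax decision.
noisy = [[0.8, 0.2], [0.8, 0.2], [0.4, 0.6], [0.8, 0.2]]
smoothed = markov_smooth(noisy)
```

With these example scores, the isolated flip at the third frame is suppressed, while a sustained change in the evidence still switches the predicted activity after a frame or so. Because only past frames are used, a filter of this kind is compatible with real-time operation.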
|Publication status||Published - 1 Sept 2014|
|Event||25th British Machine Vision Conference - Nottingham, United Kingdom (Duration: 1 Sept 2014 → 5 Sept 2014)|