A Multi-Stream Sequence Learning Framework for Human Interaction Recognition

Umair Haroon, Amin Ullah, Tanveer Hussain, Waseem Ullah, Muhammad Sajjad, Khan Muhammad, Mi Young Lee, Sung Wook Baik

Research output: Contribution to journal › Article › peer-review

17 Citations (Scopus)

Abstract

Human interaction recognition (HIR) is challenging because multiple humans are involved and mutually interact within a single frame, with the interactions arising from their movements. Mainstream literature is based on three-dimensional (3-D) convolutional neural networks (CNNs) that process only visual frames, whereas human joint data play a vital role in accurate interaction recognition. Therefore, this article proposes a multi-stream network for HIR that intelligently learns from skeleton key points and spatiotemporal visual representations. The first stream localises the joints of the human body using a pose estimation model and passes them to a 1-D CNN and a bidirectional long short-term memory (BiLSTM) network to efficiently extract features of the dynamic movements of each human skeleton. The second stream feeds the series of visual frames to a 3-D CNN to extract discriminative spatiotemporal features. Finally, the outputs of both streams are integrated via fully connected layers that precisely classify the ongoing interactions between humans. To validate the performance of the proposed network, we conducted a comprehensive set of experiments on two benchmark datasets, UT-Interaction and TV Human Interaction, and observed accuracy improvements of 1.15% and 10.0%, respectively.
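The abstract describes a two-stream design: a skeleton stream (pose key points → 1-D CNN → BiLSTM) and a visual stream (frame sequence → 3-D CNN), fused with fully connected layers for classification. The following is a minimal PyTorch sketch of that overall structure only; the joint count, layer widths, frame resolution, and class count are illustrative assumptions and not the authors' exact configuration.

```python
import torch
import torch.nn as nn


class SkeletonStream(nn.Module):
    """Skeleton stream: 1-D CNN over per-frame joint vectors, then a BiLSTM over time."""
    def __init__(self, num_joints=18, hidden=128):  # 18 joints assumed for illustration
        super().__init__()
        # Each frame is a flattened (x, y) vector of the detected joints.
        self.conv = nn.Sequential(
            nn.Conv1d(num_joints * 2, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.bilstm = nn.LSTM(128, hidden, batch_first=True, bidirectional=True)

    def forward(self, x):  # x: (batch, frames, num_joints * 2)
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)   # temporal 1-D convolution
        _, (h, _) = self.bilstm(x)                          # final hidden states
        return torch.cat([h[-2], h[-1]], dim=1)             # (batch, 2 * hidden)


class VisualStream(nn.Module):
    """Visual stream: a small 3-D CNN over the raw frame sequence (placeholder depth)."""
    def __init__(self, out_dim=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.fc = nn.Linear(64, out_dim)

    def forward(self, clip):  # clip: (batch, 3, frames, height, width)
        return self.fc(self.features(clip).flatten(1))


class MultiStreamHIR(nn.Module):
    """Fuse both streams with fully connected layers to classify the interaction."""
    def __init__(self, num_classes=6):  # e.g., the 6 UT-Interaction classes
        super().__init__()
        self.skeleton = SkeletonStream()
        self.visual = VisualStream()
        self.classifier = nn.Sequential(
            nn.Linear(2 * 128 + 256, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, joints, clip):
        fused = torch.cat([self.skeleton(joints), self.visual(clip)], dim=1)
        return self.classifier(fused)
```

A forward pass takes a batch of joint sequences together with the corresponding video clip, e.g. `MultiStreamHIR()(torch.rand(2, 16, 36), torch.rand(2, 3, 16, 112, 112))`; the pose estimator that produces the joint coordinates is external to this sketch.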
Original language: English
Pages (from-to): 435-444
Number of pages: 10
Journal: IEEE Transactions on Human-Machine Systems
Volume: 52
Issue number: 3
DOIs
Publication status: Published - 27 Jan 2022
