TY - JOUR
T1 - ARFDNet: An Efficient Activity Recognition & Fall Detection System using Latent Feature Pooling
AU - Pandey, Hari Mohan
AU - Yadav, Santosh Kumar
AU - Luthra, Achleshwar
AU - Tiwari, Kamlesh
AU - Akbar, Shaik Ali
PY - 2021/12/29
Y1 - 2021/12/29
N2 - This paper presents ARFDNet, an efficient activity recognition and fall detection system. Raw RGB videos are passed to a pose estimation network to extract skeleton features of the user. The skeleton coordinates are then pre-processed and fed, in a sliding-window fashion, to specially designed convolutional neural networks (CNNs) followed by gated recurrent units (GRUs) to learn the spatiotemporal dynamics present in the data. The output of the GRUs is passed to fully connected layers for classification. The proposed model is tested on two datasets: ADLF (activities of daily living and fall) and the UP-Fall detection dataset. ADLF is an in-house dataset collected from 12 participants with a single web camera. It consists of 119 videos (740,375 frames) with a total duration of 29,077 seconds. The UP-Fall detection dataset is a publicly available large-scale dataset for ADL (activities of daily living) monitoring and fall detection. In this work, we consider only the vision-based portion of the UP-Fall detection dataset, which uses 2 cameras to record 6 ADLs and 5 types of falls performed by 17 individuals. It comprises 277 GB of vision data with a total of 589,418 images. The results show that the proposed system achieves (a) an accuracy of 89.05% and 89.64% before and after pooling, respectively, on the ADLF dataset; and (b) an accuracy of 96.7% on the UP-Fall detection dataset. These results indicate that the proposed system outperforms the most recent state-of-the-art work.
AB - This paper presents ARFDNet, an efficient activity recognition and fall detection system. Raw RGB videos are passed to a pose estimation network to extract skeleton features of the user. The skeleton coordinates are then pre-processed and fed, in a sliding-window fashion, to specially designed convolutional neural networks (CNNs) followed by gated recurrent units (GRUs) to learn the spatiotemporal dynamics present in the data. The output of the GRUs is passed to fully connected layers for classification. The proposed model is tested on two datasets: ADLF (activities of daily living and fall) and the UP-Fall detection dataset. ADLF is an in-house dataset collected from 12 participants with a single web camera. It consists of 119 videos (740,375 frames) with a total duration of 29,077 seconds. The UP-Fall detection dataset is a publicly available large-scale dataset for ADL (activities of daily living) monitoring and fall detection. In this work, we consider only the vision-based portion of the UP-Fall detection dataset, which uses 2 cameras to record 6 ADLs and 5 types of falls performed by 17 individuals. It comprises 277 GB of vision data with a total of 589,418 images. The results show that the proposed system achieves (a) an accuracy of 89.05% and 89.64% before and after pooling, respectively, on the ADLF dataset; and (b) an accuracy of 96.7% on the UP-Fall detection dataset. These results indicate that the proposed system outperforms the most recent state-of-the-art work.
KW - Action Recognition
KW - Elderly Monitoring
KW - CNNs and GRUs
KW - Pose Recognition
U2 - 10.1016/j.knosys.2021.107948
DO - 10.1016/j.knosys.2021.107948
M3 - Article (journal)
SN - 0950-7051
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 107948
ER -