A Vision-Based Approach For Assisting Functional Assessment Involving Activities of Daily Living


Student thesis: Doctoral Thesis


This thesis intends to contribute towards Computer Vision (CV)-based functional assessment of
physically impaired persons involving Activities of Daily Living (ADL). Patients rehabilitating from
conditions such as stroke, spinal cord injury and Parkinson’s disease, among others, experience
difficulty in physical movement, which limits their functional ability and independence. Such patients
usually undergo physical rehabilitation programmes that require constant monitoring of their ADL
to record their progress. This monitoring is currently done by healthcare professionals, which is not
only labour-intensive and expensive but also error-prone. The study aims to address this problem
by proposing CV-based methods for detecting ADL that can be used for the functional assessment of
patients automatically from recorded videos. This has the potential to lessen the labour-intensive
manual annotation and continuous human observation, reducing the overall cost of
rehabilitation for such patients. While the current CV literature is replete with methods for human
activity or ADL recognition, there are very few that aim to detect impairment-specific versions
of ADL executed or exhibited by physically impaired persons. Part of the problem lies in the
unavailability of labelled datasets for such activities, making it difficult for researchers to develop
the necessary methods to detect and recognise them. In recent years, the field of CV has seen
increasing use of Deep Learning (DL) methods for ADL recognition. However, DL-based models
are almost exclusively data-driven and require very large datasets, often containing thousands of
human activity videos, to be trained and validated successfully. The current study attempts to address
this issue by developing and contributing a novel multi-label dataset that includes labelled videos of
several categories of normal and impairment-specific executions of ADL, similar to those exhibited
by normal persons and physically impaired persons, respectively. This dataset has been developed
under the guidance of an Occupational Therapist, which lends the necessary credibility to the entire
exercise. This is inter-disciplinary research involving CV, Artificial Intelligence, and Health and
Social Care.
One of the key focuses of this thesis is to contribute towards the advancement of research in DL.
To this end, the thesis presents three novel human activity recognition models based on DL. The
first model uses a learnable pooling method based on Fisher Vectors (FV) as a better alternative
to the standard statistical pooling method known as Global Average Pooling (GAP). In this model,
FV with an activity-aware pooling method is integrated within the DL model to semantically cluster,
in a novel manner, the structural information contained in Attention-focused hidden LSTM states.
This leads the network to pool more relevant information than commonly used statistical pooling
methods do. The model achieves better performance than state-of-the-art
video-based models. The second activity recognition model introduces a novel 3D human body-pose
encoding method. The body-pose encoding algorithm learns the spatial arrangement between various
body joints to present enriched pose information to the network for improved performance. The
algorithm also encodes the frame-wise positions of body joints and presents a temporally enriched
representation for each joint individually. The pose encoding algorithm, coupled with an Attention
mechanism, is presented as part of a combined video- and pose-based activity recognition model
that achieves state-of-the-art results on three challenging benchmark datasets. The third is a
lightweight DL model based purely on human body pose and built on Temporal Convolutional Networks
(TCN). This spatial-temporal two-stream model takes advantage of the pose encoding algorithm and the
learnable pooling method introduced earlier to improve model performance. The model is
not only able to recognise an ADL but also able to discriminate between the normal and different physical
impairment-specific variations of the same ADL when evaluated on the multi-label dataset. Thus,
it fulfils the main research aim. To the best of my knowledge, this is unique inter-disciplinary
research that attempts to recognise physical impairment-specific ADL through multi-label video
analysis and recognition. In addition to the three activity recognition models, the thesis presents a
mobile-based DL approach for human pose estimation. The model introduces a novel Split-Stream
architecture as an alternative to the standard GAP layer used towards the end of many DL
models. The thesis also presents a critical review of existing research on CV-based rehabilitation
and assessment. The review proposes its own taxonomy and analyses articles from a CV perspective,
in contrast to other reviews, which mainly focus on the clinical perspective. The literature review,
the mobile-based human body-pose estimation model, the multi-label dataset and the three human
activity recognition models are the major contributions of this inter-disciplinary research.
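The difference between GAP and Fisher Vector pooling can be illustrated with the classic (non-learnable) improved Fisher Vector encoding of a feature sequence under a diagonal-covariance Gaussian Mixture Model (GMM). This is a minimal sketch assuming fixed, hand-set GMM parameters and toy two-dimensional per-frame features; the thesis instead integrates a learnable, activity-aware FV pooling layer over Attention-focused LSTM states, which is not reproduced here. The helper names `gap` and `fisher_vector` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def gap(frames: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical helper. Global Average Pooling over time: per-dimension mean."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]


def _log_gauss(x: Sequence[float], mu: Sequence[float], sigma: Sequence[float]) -> float:
    """Log-density of x under a diagonal Gaussian with means mu and std devs sigma."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * s * s) + ((xi - m) / s) ** 2)
        for xi, m, s in zip(x, mu, sigma)
    )


def fisher_vector(frames, weights, means, sigmas) -> Vector:
    """Hypothetical helper. Improved Fisher Vector of a sequence under a FIXED diagonal GMM.

    Assumption: GMM parameters are given, not learned (unlike the thesis's FV layer).
    Keeps first- and second-order statistics for each of the K components, so the
    output has 2 * K * D values, whereas GAP keeps only D means.
    """
    n, k_count, dim = len(frames), len(weights), len(frames[0])
    g_mu = [[0.0] * dim for _ in range(k_count)]
    g_sigma = [[0.0] * dim for _ in range(k_count)]
    for x in frames:
        # Soft assignment (posterior) of this frame to each component, via log-sum-exp.
        logs = [math.log(w) + _log_gauss(x, m, s) for w, m, s in zip(weights, means, sigmas)]
        top = max(logs)
        probs = [math.exp(lg - top) for lg in logs]
        total = sum(probs)
        for k in range(k_count):
            gamma = probs[k] / total
            for d in range(dim):
                z = (x[d] - means[k][d]) / sigmas[k][d]
                g_mu[k][d] += gamma * z
                g_sigma[k][d] += gamma * (z * z - 1.0)
    fv: Vector = []
    for k in range(k_count):
        fv += [v / (n * math.sqrt(weights[k])) for v in g_mu[k]]
        fv += [v / (n * math.sqrt(2.0 * weights[k])) for v in g_sigma[k]]
    # Power (signed square-root) normalisation, then L2 normalisation.
    fv = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    norm = math.sqrt(sum(v * v for v in fv)) or 1.0
    return [v / norm for v in fv]


# Toy data: T = 3 frames, D = 2 features; GMM with K = 2 components (illustrative values).
gmm = dict(weights=[0.5, 0.5], means=[[0.0, 1.0], [4.0, 5.0]], sigmas=[[1.0, 1.0], [1.0, 1.0]])
frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
static = [[2.0, 3.0]] * 3  # same temporal mean as `frames`, but no change over time
pooled = gap(frames)
encoded = fisher_vector(frames, **gmm)
print(pooled, len(encoded))  # [2.0, 3.0] 8
print(gap(static) == pooled)  # True: GAP cannot tell the two sequences apart
print(fisher_vector(static, **gmm) == encoded)  # False: the Fisher Vector can
```

Because GAP keeps only the mean, two sequences with the same average but different dynamics collapse to the same vector, while the Fisher Vector's first- and second-order statistics keep them apart. In a learnable variant such as the one the abstract describes, the mixture parameters would presumably be trained with the rest of the network rather than fixed as they are here.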
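The abstract does not specify how the 3D body-pose encoding is learned, so the sketch below only illustrates the two kinds of information it is described as capturing, using simple hand-crafted stand-ins: pairwise joint distances within a frame for the spatial arrangement, and each joint's displacement from its first-frame position for the per-joint temporal representation. Both are assumptions for illustration, not the thesis's learned encoding, and `spatial_encoding` and `temporal_encoding` are hypothetical names.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) position of one 3D body joint
Frame = Sequence[Joint]             # all J joints of the skeleton in one frame


def spatial_encoding(frame: Frame) -> List[float]:
    """Hypothetical stand-in: Euclidean distance between every pair of joints.

    Produces J * (J - 1) / 2 values that do not change if the whole body is translated.
    """
    return [math.dist(a, b) for a, b in combinations(frame, 2)]


def temporal_encoding(sequence: Sequence[Frame]) -> List[List[Tuple[float, ...]]]:
    """Hypothetical stand-in: per-joint displacement from its first-frame position."""
    first = sequence[0]
    return [
        [tuple(c - o for c, o in zip(frame[j], first[j])) for frame in sequence]
        for j in range(len(first))
    ]


# Toy skeleton with J = 3 joints over T = 2 frames; in frame 1 the body moves +1 along x.
seq = [
    [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 2.0)],
    [(1.0, 0.0, 0.0), (4.0, 4.0, 0.0), (1.0, 0.0, 2.0)],
]
print(spatial_encoding(seq[0])[:2])  # [5.0, 2.0] (joint 0-1 and joint 0-2 distances)
print(spatial_encoding(seq[0]) == spatial_encoding(seq[1]))  # True: body shape unchanged
print(temporal_encoding(seq)[0])  # [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
```

The two views are complementary: the distance features stay identical when the whole body translates, while the per-joint trajectories record exactly that motion, one joint at a time.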
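Temporal Convolutional Networks are built from causal, dilated 1D convolutions stacked with growing dilation, which is what lets a lightweight model cover long pose sequences with few layers. A minimal single-channel sketch of that building block follows, assuming a kernel size of 2 and dilations 1, 2 and 4 chosen purely for illustration; the thesis's actual two-stream architecture, channel counts, residual blocks and non-linearities are not given in the abstract and are not reproduced here. The function names are hypothetical.

```python
from typing import List, Sequence


def causal_dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Hypothetical helper. Single-channel causal dilated 1D convolution.

    Output at time t combines x[t], x[t - dilation], x[t - 2 * dilation], ...
    (never future frames); positions before t = 0 are treated as zero padding,
    so the output has the same length as the input.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - i * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Frames (including the current one) that one output of the stacked layers can see."""
    return 1 + (kernel_size - 1) * sum(dilations)


# An impulse at t = 0 passed through three layers with dilations 1, 2 and 4.
signal = [1.0] + [0.0] * 9
for d in (1, 2, 4):
    signal = causal_dilated_conv1d(signal, kernel=[1.0, 1.0], dilation=d)
print(signal)  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
print(receptive_field(kernel_size=2, dilations=[1, 2, 4]))  # 8
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth while the number of weights grows only linearly, which is one reason TCNs suit lightweight sequence models.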
Date of Award: 21 Jun 2021
Original language: English
Awarding Institution: Edge Hill University
Supervisor: ARDHENDU BEHERA (Director of Studies), MARY O'BRIEN (Supervisor) & SWAGAT KUMAR (Supervisor)


Keywords
  • Computer Vision
  • Deep Learning
  • Physical Rehabilitation
  • CNN
  • TCN
  • Fisher Vectors
  • Human Activity Recognition
  • Human Pose Estimation
