Vision transformer attention with multi-reservoir echo state network for anomaly recognition

W. Ullah, T. Hussain, S.W. Baik

Research output: Contribution to journal › Article (journal) › peer-review

21 Citations (Scopus)

Abstract

Anomalous event recognition requires an instant response to reduce the loss of human life and property; however, existing automated systems show limited performance because they concentrate on the temporal domain of the videos and ignore the significant role of spatial information. Furthermore, although current surveillance systems can detect anomalous events, they require human intervention to recognise their nature and to select appropriate countermeasures, as there are no fully automatic surveillance techniques that can simultaneously detect and interpret anomalous events. Therefore, we present a framework called Vision Transformer Anomaly Recognition (ViT-ARN) that can detect and interpret anomalies in smart city surveillance videos. The framework consists of two stages: the first involves online anomaly detection, for which a customised, lightweight, one-class deep neural network is developed to detect anomalies in a surveillance environment, while in the second stage, the detected anomaly is further classified into the corresponding class. The size of our anomaly detection model is compressed using a filter pruning strategy based on a geometric median, with the aim of easy adaptability for resource-constrained devices. Anomaly classification is based on vision transformer features and is followed by a bottleneck attention mechanism to enhance the representation. The refined features are passed to a multi-reservoir echo state network for a detailed analysis of real-world anomalies such as vandalism and road accidents. A total of 858 and 1600 videos from two datasets are used to train the proposed model, and extensive experiments on the LAD-2000 and UCF-Crime datasets comprising 290 and 400 testing videos reveal that our framework can recognise anomalies more effectively, outperforming other state-of-the-art approaches with increases in accuracy of 10.14% and 3% on the LAD-2000 and UCF-Crime datasets, respectively.
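
The abstract mentions filter pruning based on a geometric median but gives no implementation details. As a rough illustration only, the sketch below uses a common approximation of that idea: the filters whose summed Euclidean distance to all other filters in a layer is smallest lie closest to the geometric median, so they are treated as the most redundant and are selected for pruning. The function names, the flat-list filter representation, and the toy weights are all assumptions made for this example and are not taken from the paper.

```python
import math


def distance_sums(filters):
    """Return, for each filter, the sum of its Euclidean distances to all filters.

    Each filter is assumed to be a flat list of weights of equal length.
    """
    sums = []
    for f in filters:
        total = 0.0
        for g in filters:
            total += math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)))
        sums.append(total)
    return sums


def select_filters_to_prune(filters, n_prune):
    """Pick the indices of the n_prune filters nearest the geometric median.

    Assumption: a small distance sum means the filter can be represented
    by the remaining ones, so it is a candidate for removal.
    """
    sums = distance_sums(filters)
    order = sorted(range(len(filters)), key=lambda i: sums[i])
    return sorted(order[:n_prune])


# Hypothetical toy layer with five two-weight filters.
toy_filters = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.4, 0.4],  # sits in the middle of the cluster
    [5.0, 5.0],  # far from every other filter
]

pruned = select_filters_to_prune(toy_filters, n_prune=1)
print(pruned)
```

In this toy layer the filter at index 3 has the smallest distance sum and is the one selected, while the outlying filter at index 4 is kept. In a real network the selected filters would then be removed or zeroed layer by layer, usually followed by fine-tuning; how the authors handle those steps is not stated in the abstract.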
Original language: English
Article number: 103289
Pages (from-to): 1-17
Journal: Information Processing and Management
Volume: 60
Issue number: 3
Early online date: 6 Feb 2023
DOIs
Publication status: Published - 6 Feb 2023

Keywords

  • Anomaly detection and recognition
  • Surveillance system
  • Artificial intelligence
  • Spatio-temporal
  • ESN
  • Weakly supervised learning
  • Attention mechanism
