TransCNN: Hybrid CNN and transformer mechanism for surveillance anomaly detection

W. Ullah, T. Hussain, F.U.M. Ullah, M.Y. Lee, S.W. Baik

Research output: Contribution to journal, Article (journal), peer-review

31 Citations (Scopus)

Abstract

Surveillance video anomaly detection (SVAD) is a challenging task due to variations in object scale, discrimination, and unexpected events, the impact of the background, and the wide range of definitions of anomalous events in different surveillance contexts. In this work, we introduce an end-to-end hybrid convolutional neural network (CNN) and vision transformer-based framework for anomaly detection. The proposed framework uses spatial and temporal information from a surveillance video to detect anomalous events and operates in two steps: in the first step, an efficient backbone CNN model extracts spatial features, while in the second step, these features are passed to a transformer-based model that learns the long-term temporal relationships between various complex surveillance events. Specifically, the features from the backbone model are fed to a sequential learning model in which temporal self-attention is used to generate an attention map; this allows the proposed framework to learn the spatiotemporal features effectively and to detect anomalous events. Our experimental results on various benchmark video anomaly detection (VAD) datasets support the validity of the proposed framework, which outperforms other state-of-the-art approaches by achieving AUC values of 94.6%, 98.4%, and 89.6% on the ShanghaiTech, UCSD Ped2, and CUHK Avenue datasets, respectively.
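The temporal self-attention step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the per-frame feature vectors have already been produced by some CNN backbone, uses the features directly as queries, keys, and values (no learned projections, a single head), and the function names `softmax` and `temporal_self_attention` as well as the toy feature values are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(frame_features):
    """Scaled dot-product self-attention across the time axis.

    frame_features: list of T per-frame feature vectors (each a list of
    D floats), standing in for the output of a CNN backbone.
    Assumption: Q = K = V = the features themselves (no learned weights).
    Returns (attended_features, attention_map); attention_map is T x T.
    """
    d = len(frame_features[0])
    scale = math.sqrt(d)

    # Row i holds the attention weights of frame i over all frames.
    attention_map = []
    for q in frame_features:
        scores = [sum(a * b for a, b in zip(q, k)) / scale
                  for k in frame_features]
        attention_map.append(softmax(scores))

    # Each output vector is a weighted average of all frame features.
    attended = [
        [sum(w * v[j] for w, v in zip(weights, frame_features))
         for j in range(d)]
        for weights in attention_map
    ]
    return attended, attention_map


# Toy input: 4 frames with 3-dimensional features (hypothetical values).
features = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
attended, attention_map = temporal_self_attention(features)
```

In this toy input the first two frames have similar features, so the first frame places more attention weight on the second frame than on the third or fourth. A full model of the kind the abstract describes would additionally need learned projections, positional information, and an anomaly-scoring head, none of which are specified here.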
Original language: English
Article number: 106173
Pages (from-to): 1-11
Journal: Engineering Applications of Artificial Intelligence
Volume: 123
Early online date: 1 Apr 2023
Publication status: Published - Aug 2023

Keywords

  • Anomaly recognition
  • Artificial intelligence
  • Vision transformer
  • Big data
  • Surveillance videos
  • Machine learning
