A modified vision transformer architecture with scratch learning capabilities for effective fire detection

Hikmat Yar, Zulfiqar Ahmad Khan, Tanveer Hussain, Sung Wook Baik*

*Corresponding author for this work

Research output: Contribution to journal › Article (journal) › peer-review

5 Citations (Scopus)

Abstract

Fire is a major cause of fatalities, property damage, and economic and ecological disruption. Deep learning models have been widely adopted for early fire detection from vision-sensor data, both to overcome the limitations of conventional methods and to prevent or reduce the damage that fires cause. However, mainstream convolutional neural network (CNN) models generalise poorly to unseen scenarios and struggle to achieve a good trade-off among accuracy, inference speed, and model size. Vision transformers (ViT) currently outperform conventional CNN models; however, they are computationally expensive and require more training data. Their performance is limited on small and medium-sized datasets, which are very common in the fire scene classification domain. In this work, we propose a novel ViT architecture that combines shifted patch tokenisation and local self-attention modules for efficient fire scene classification, enabling the model to learn from scratch even on small and medium-sized datasets. Furthermore, to make the model suitable for real-time inference, we modify the transformer encoder, reducing both the number of floating-point operations and the model size. Additionally, a medium-scale fire dataset containing complex real-world scenarios is developed. Our model is assessed on three benchmark datasets and one self-created dataset using several evaluation metrics, including a novel cross-corpus evaluation metric and a robustness evaluation metric. The experimental results indicate that our model substantially outperforms existing methods in terms of accuracy and model complexity.
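The two components named in the abstract, shifted patch tokenisation and local self-attention, can be illustrated with a small NumPy sketch. This is not the authors' implementation; it only shows the general idea as these modules are usually described for training ViTs on small datasets. The function names, the half-patch shift, the zero padding, and the fixed (rather than learnable) temperature are all assumptions made for illustration.

```python
import numpy as np


def shift_image(img, dy, dx):
    """Shift an (H, W, C) image by (dy, dx) pixels, zero-filling the vacated border."""
    h, w, _ = img.shape
    out = np.zeros_like(img)
    # Overlapping destination and source windows after the shift.
    ys_dst = slice(max(dy, 0), h + min(dy, 0))
    xs_dst = slice(max(dx, 0), w + min(dx, 0))
    ys_src = slice(max(-dy, 0), h + min(-dy, 0))
    xs_src = slice(max(-dx, 0), w + min(-dx, 0))
    out[ys_dst, xs_dst] = img[ys_src, xs_src]
    return out


def shifted_patch_tokens(img, patch):
    """Stack the image with four diagonal shifts, then flatten non-overlapping patches."""
    s = patch // 2  # assumed shift: half a patch in each diagonal direction
    shifts = [(-s, -s), (-s, s), (s, -s), (s, s)]
    stacked = np.concatenate(
        [img] + [shift_image(img, dy, dx) for dy, dx in shifts], axis=-1
    )
    h, w, c = stacked.shape
    gh, gw = h // patch, w // patch
    grid = stacked[: gh * patch, : gw * patch].reshape(gh, patch, gw, patch, c)
    # One row per patch; a real model would follow this with layer
    # normalisation and a learned linear projection.
    return grid.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * c)


def local_self_attention(q, k, v, temperature):
    """Single-head attention with a temperature scale and the diagonal masked out."""
    scores = (q @ k.T) / temperature
    np.fill_diagonal(scores, -np.inf)  # a token may not attend to itself
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
tokens = shifted_patch_tokens(image, patch=8)  # 16 tokens, each 8 * 8 * 15 values
x = tokens[:, :4]  # toy stand-in for projected queries, keys, and values
attended, attn = local_self_attention(x, x, x, temperature=2.0)
```

In this sketch each token carries pixels from the original patch and from four shifted views, so it sees slightly beyond its own patch boundary, while the diagonal mask forces every token to draw on other tokens rather than on itself. In a trainable model the temperature would typically be a learned parameter rather than a constant.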
Original language: English
Article number: 123935
Pages (from-to): 1-13
Number of pages: 13
Journal: Expert Systems with Applications
Volume: 252
Issue number: A
Early online date: 20 Apr 2024
Publication status: E-pub ahead of print - 20 Apr 2024

Keywords

  • Convolutional neural network
  • Deep learning
  • Disaster management
  • Fire detection
  • Forest fire
  • Machine learning
  • Surveillance system
  • Vehicle fire
  • Vision transformer

Research Centres

  • Centre for Intelligent Visual Computing Research
