TY - JOUR
T1 - A modified vision transformer architecture with scratch learning capabilities for effective fire detection
AU - Yar, Hikmat
AU - Khan, Zulfiqar Ahmad
AU - Hussain, Tanveer
AU - Baik, Sung Wook
N1 - Publisher Copyright:
© 2024
PY - 2024/10/15
Y1 - 2024/10/15
N2 - Fire is one of the major hazards causing fatalities, property damage, and economic and ecological disruption. To detect fire early from vision-sensor data and prevent or reduce the damage it causes, deep learning models have been widely adopted to overcome the limitations of conventional methods. However, mainstream convolutional neural network (CNN) models have limited generalisation abilities in unseen scenarios and struggle to achieve a good trade-off among accuracy, inference speed, and model size. Vision transformers (ViTs) currently outperform conventional CNN models; however, they are computationally expensive and require more training data, delivering limited performance on the small and medium-sized datasets that are common in the fire scene classification domain. In this work, we develop a novel ViT architecture that combines shifted patch tokenisation and local self-attention modules for efficient fire scene classification, enabling the model to learn from scratch even on small and medium-sized datasets. Furthermore, to make the model suitable for real-time inference, we modify the transformer encoder, reducing both the number of floating-point operations and the model size. Additionally, a medium-scale fire dataset containing complex real-world scenarios is developed. Our model is assessed on three benchmark datasets and the self-created dataset using several evaluation metrics, including a novel cross-corpus evaluation and a robustness evaluation. The experimental results indicate that our model achieves substantially better performance than existing methods in terms of accuracy and model complexity.
AB - Fire is one of the major hazards causing fatalities, property damage, and economic and ecological disruption. To detect fire early from vision-sensor data and prevent or reduce the damage it causes, deep learning models have been widely adopted to overcome the limitations of conventional methods. However, mainstream convolutional neural network (CNN) models have limited generalisation abilities in unseen scenarios and struggle to achieve a good trade-off among accuracy, inference speed, and model size. Vision transformers (ViTs) currently outperform conventional CNN models; however, they are computationally expensive and require more training data, delivering limited performance on the small and medium-sized datasets that are common in the fire scene classification domain. In this work, we develop a novel ViT architecture that combines shifted patch tokenisation and local self-attention modules for efficient fire scene classification, enabling the model to learn from scratch even on small and medium-sized datasets. Furthermore, to make the model suitable for real-time inference, we modify the transformer encoder, reducing both the number of floating-point operations and the model size. Additionally, a medium-scale fire dataset containing complex real-world scenarios is developed. Our model is assessed on three benchmark datasets and the self-created dataset using several evaluation metrics, including a novel cross-corpus evaluation and a robustness evaluation. The experimental results indicate that our model achieves substantially better performance than existing methods in terms of accuracy and model complexity.
KW - Convolutional neural network
KW - Deep learning
KW - Disaster management
KW - Fire detection
KW - Forest fire
KW - Machine learning
KW - Surveillance system
KW - Vehicle fire
KW - Vision transformer
UR - http://www.scopus.com/inward/record.url?scp=85192184425&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85192184425&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2024.123935
DO - 10.1016/j.eswa.2024.123935
M3 - Article (journal)
SN - 0957-4174
VL - 252
SP - 1
EP - 13
JO - Expert Systems with Applications
JF - Expert Systems with Applications
IS - A
M1 - 123935
ER -