TY - JOUR
T1 - Cloud-Assisted Multi-View Video Summarization using CNN and Bi-Directional LSTM
AU - Hussain, Tanveer
AU - Muhammad, Khan
AU - Ullah, Amin
AU - Cao, Zehong
AU - Baik, Sung Wook
AU - Albuquerque, Victor Hugo C. De
PY - 2019
Y1 - 2019/07/17
N2 - The massive amount of video data produced by surveillance networks in industries instigates various challenges in exploring these videos for many applications, such as video summarization (VS), analysis, indexing, and retrieval. The task of multiview video summarization (MVS) is very challenging due to the gigantic size of data, redundancy, overlapping in views, light variations, and inter-view correlations. To address these challenges, various low-level features and clustering-based soft computing techniques have been proposed, but they cannot fully exploit MVS. In this article, we achieve MVS by integrating deep-neural-network-based soft computing techniques in a two-tier framework. The first online tier performs target-appearance-based shot segmentation and stores the shots in a lookup table that is transmitted to the cloud for further processing. The second tier extracts deep features from each frame of a sequence in the lookup table and passes them to a deep bidirectional long short-term memory (DB-LSTM) network to acquire probabilities of informativeness and generate a summary. Experimental evaluation on a benchmark dataset and industrial surveillance data from YouTube confirms the better performance of our system compared to the state-of-the-art MVS methods.
AB - The massive amount of video data produced by surveillance networks in industries instigates various challenges in exploring these videos for many applications, such as video summarization (VS), analysis, indexing, and retrieval. The task of multiview video summarization (MVS) is very challenging due to the gigantic size of data, redundancy, overlapping in views, light variations, and inter-view correlations. To address these challenges, various low-level features and clustering-based soft computing techniques have been proposed, but they cannot fully exploit MVS. In this article, we achieve MVS by integrating deep-neural-network-based soft computing techniques in a two-tier framework. The first online tier performs target-appearance-based shot segmentation and stores the shots in a lookup table that is transmitted to the cloud for further processing. The second tier extracts deep features from each frame of a sequence in the lookup table and passes them to a deep bidirectional long short-term memory (DB-LSTM) network to acquire probabilities of informativeness and generate a summary. Experimental evaluation on a benchmark dataset and industrial surveillance data from YouTube confirms the better performance of our system compared to the state-of-the-art MVS methods.
UR - http://dx.doi.org/10.1109/tii.2019.2929228
U2 - 10.1109/tii.2019.2929228
DO - 10.1109/tii.2019.2929228
M3 - Article (journal)
SN - 1941-0050
VL - 16
SP - 77
EP - 86
JO - IEEE Transactions on Industrial Informatics
JF - IEEE Transactions on Industrial Informatics
IS - 1
ER -