TY - JOUR
T1 - Cost-Effective Video Summarization Using Deep CNN With Hierarchical Weighted Fusion for IoT Surveillance Networks
AU - Muhammad, Khan
AU - Hussain, Tanveer
AU - Tanveer, M.
AU - Sannino, Giovanna
AU - de Albuquerque, Victor Hugo C.
PY - 2019/11/1
Y1 - 2019/11/1
N2 - Video summarization (VS) has recently attracted intense attention due to its numerous applications in computer vision domains such as video retrieval, indexing, and browsing. Traditional VS research mostly targets the effectiveness of VS algorithms, introducing high-quality features and clustering to select representative visual elements. With the increasing density of vision sensor networks, there is a tradeoff between the processing time of VS methods and the representative quality of the generated summaries. Producing a meaningful video summary while meeting the constrained resources of Internet of Things (IoT) surveillance networks is therefore a challenging task. This article addresses this problem by proposing a new, computationally efficient solution: a deep CNN framework with hierarchical weighted fusion for summarizing surveillance videos captured in IoT settings. In the first stage, the framework extracts discriminative, rich features from deep CNNs for shot segmentation. Second, image memorability predicted by a fine-tuned CNN model is employed, along with aesthetic and entropy features, to maintain the interestingness and diversity of the summary. Third, a hierarchical weighted fusion mechanism is proposed to produce an aggregated score from the extracted features. Finally, an attention curve is constructed from the aggregated score to select outstanding keyframes for the final video summary. Experiments on benchmark data sets validate the importance and effectiveness of our framework, which outperforms state-of-the-art schemes.
UR - http://dx.doi.org/10.1109/jiot.2019.2950469
U2 - 10.1109/jiot.2019.2950469
DO - 10.1109/jiot.2019.2950469
M3 - Article (journal)
SN - 2327-4662
VL - 7
SP - 4455
EP - 4463
JO - IEEE Internet of Things Journal
JF - IEEE Internet of Things Journal
IS - 5
ER -