Abstract
This paper presents a pilot study for a personalized media service that aims to create intelligent, sentiment-aware, and language-independent access to large archives of audiovisual documents, providing equal services to both mainstream and marginalized users. The proposed multi-modal framework analyzes aural, visual, and human descriptions, integrating them into an automatic content analyzer. First, text is extracted from the aural stream and mapped to American Sign Language (ASL), translating conventional video into content accessible to deaf users. Next, sentiment is estimated from the textual, aural, and visual content using two deep convolutional neural networks (CNNs), which extract discriminative features from each modality and produce predictions for two broad classes: positive and negative sentiment. Preliminary results indicate that the proposed approach can accurately estimate the sentiment of multimedia content, an important step toward personalized and intelligent media services.
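The abstract does not detail the network architecture. Purely as an illustration, the sketch below shows one way a two-branch CNN with late fusion could map an aural input (e.g., a log-mel spectrogram) and a visual input (e.g., a video frame) to the two broad classes, positive and negative. It assumes PyTorch; all module names, layer sizes, and input shapes are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: two modality-specific CNNs fused by concatenation,
# followed by a linear classifier over {negative, positive}. Shapes are made up.
import torch
import torch.nn as nn


class ModalityCNN(nn.Module):
    """Small CNN mapping a single-channel 2-D input (spectrogram or frame)
    to a fixed-size feature vector."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),          # global pooling -> (B, 32, 1, 1)
        )
        self.proj = nn.Linear(32, feat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.features(x).flatten(1))


class TwoBranchSentimentNet(nn.Module):
    """Late fusion of an aural branch and a visual branch into a 2-class output."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.aural_cnn = ModalityCNN(feat_dim)
        self.visual_cnn = ModalityCNN(feat_dim)
        self.classifier = nn.Linear(2 * feat_dim, 2)

    def forward(self, aural: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.aural_cnn(aural), self.visual_cnn(visual)], dim=1)
        return self.classifier(fused)         # logits for {negative, positive}


if __name__ == "__main__":
    net = TwoBranchSentimentNet()
    spectrogram = torch.randn(4, 1, 64, 128)  # hypothetical log-mel batch
    frame = torch.randn(4, 1, 96, 96)         # hypothetical grayscale frame batch
    print(net(spectrogram, frame).shape)      # torch.Size([4, 2])
```

In a setup like this, each branch would typically be trained (or fine-tuned) on its own modality and the fused classifier trained with a standard cross-entropy loss; the actual training procedure used in the paper may differ.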
| Field | Value |
|---|---|
| Original language | English |
| Title of host publication | Proceedings of the IEEE International Conference on Multimedia and Expo Workshops (ICMEW) 2017 |
| Subtitle of host publication | 10-14 July 2017 |
| Place of publication | Hong Kong, China |
| Publisher | IEEE |
| Pages | 220-225 |
| Number of pages | 6 |
| Volume | ICMEW 2017 |
| ISBN (Electronic) | 978-1-5386-0560-8 |
| ISBN (Print) | 978-1-5386-0561-5 |
| DOIs | |
| Publication status | Published - 7 Sept 2017 |
Keywords
- Deep learning
- Sentiment analysis
- Marginalized users
- Deaf
- Sign language