
Multiple Emotion Tagging for Multimedia Data by Exploiting High-Order Dependencies Among Emotions

Published: 01 December 2015

Abstract

In this paper, a novel approach to multiple-emotion tagging of multimedia data is proposed, which explicitly models higher-order relations among emotions. First, multimedia features are extracted from the data. Second, a traditional multi-label classifier is used to obtain measurements of the multi-emotion labels. Then, we propose a three-layer restricted Boltzmann machine (TRBM) model to capture the higher-order relations among emotion labels, as well as the relations between labels and measurements. Finally, the TRBM model is used to infer a sample's multi-emotion labels by combining the emotion measurements with the dependencies among emotions. Experimental results on four databases demonstrate that our method is more effective than both feature-driven methods and current model-based methods, which capture only pairwise relations among labels with a Bayesian network (BN). Furthermore, a comparison of BN models with the proposed TRBM model verifies that the patterns captured by the latent units of the TRBM contain not only all the dependencies captured by the BN but also many dependencies that the BN cannot capture.
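
To make the pipeline described in the abstract concrete, the sketch below illustrates the general idea of an energy-based model over binary emotion labels, per-emotion classifier measurements, and a hidden layer that captures higher-order label dependencies. This is only an illustration under simplifying assumptions (binary labels, hidden units marginalized in closed form, exhaustive enumeration of label configurations, and hypothetical parameter names such as W, U, b, c); it is not the authors' exact TRBM formulation, learning procedure, or inference algorithm.

```python
"""Illustrative sketch (not the paper's exact model): a small energy-based
model with a label layer y (binary emotion tags), a measurement layer m
(per-emotion classifier scores), and a hidden layer h that captures
higher-order dependencies among labels."""
import itertools
import numpy as np


class TRBMSketch:
    def __init__(self, n_labels, n_hidden, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        # Hypothetical parameters; real training would fit these to data.
        self.W = 0.1 * rng.standard_normal((n_labels, n_hidden))  # label-hidden weights
        self.U = np.ones(n_labels)   # coupling between measurements and labels
        self.b = np.zeros(n_labels)  # label biases
        self.c = np.zeros(n_hidden)  # hidden-unit biases

    def free_energy(self, y, m):
        """Negative unnormalized log-probability of labels y given measurements m,
        with the binary hidden units summed out via the standard RBM identity."""
        pull = -y @ (self.U * m) - y @ self.b
        hidden_terms = -np.sum(np.logaddexp(0.0, y @ self.W + self.c))
        return pull + hidden_terms

    def infer_labels(self, m):
        """Return the binary label vector with the lowest free energy, i.e. the
        most probable joint emotion tagging under this toy model."""
        n = len(self.b)
        best_y, best_f = None, np.inf
        for bits in itertools.product([0, 1], repeat=n):
            y = np.asarray(bits, dtype=float)
            f = self.free_energy(y, m)
            if f < best_f:
                best_y, best_f = y, f
        return best_y


if __name__ == "__main__":
    model = TRBMSketch(n_labels=4, n_hidden=8)
    measurements = np.array([0.9, 0.2, 0.7, 0.1])  # per-emotion classifier scores
    print(model.infer_labels(measurements))
```

Because multimedia emotion tag sets are typically small, enumerating all 2^n label configurations at inference time is tractable in this toy setting; the paper's actual learning and inference procedures are not reproduced here.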

Cited By

  • (2023) Beyond Word Embeddings: Heterogeneous Prior Knowledge Driven Multi-Label Image Classification. IEEE Transactions on Multimedia, 25, 4013–4025. DOI: 10.1109/TMM.2022.3171095. Online publication date: 1 Jan. 2023.
  • (2021) Joint Input and Output Space Learning for Multi-Label Image Classification. IEEE Transactions on Multimedia, 23, 1696–1707. DOI: 10.1109/TMM.2020.3002185. Online publication date: 1 Jan. 2021.
  • (2021) Capturing Emotion Distribution for Multimedia Emotion Tagging. IEEE Transactions on Affective Computing, 12(4), 821–831. DOI: 10.1109/TAFFC.2019.2900240. Online publication date: 1 Oct. 2021.
  • (2017) Inferring Emotional Tags From Social Images With User Demographics. IEEE Transactions on Multimedia, 19(7), 1670–1684. DOI: 10.1109/TMM.2017.2655881. Online publication date: 15 Jun. 2017.

Published In

IEEE Transactions on Multimedia, Volume 17, Issue 12
Dec. 2015
243 pages

Publisher

IEEE Press

Qualifiers

  • Research-article
