
Personalization in multimedia retrieval: A survey

Published: 01 January 2011

Abstract

With the explosive spread of multimedia (text documents, images, video, etc.) in everyday life, annotating, searching, indexing, browsing, and relating various forms of information efficiently is becoming increasingly important. Tying these tasks to user preference and customization complicates the matter further. This survey gives an overview of the current state of the research areas involved in annotating, relating, and presenting content to a user according to preference. It presents current models and techniques for modeling ontology, preference, context, and presentation, and brings them together in a chain of ideas that leads from raw, unannotated data to a usable interface that adapts to user preference and customization.



    Published In

    Multimedia Tools and Applications  Volume 51, Issue 1
    January 2011
    391 pages

    Publisher

    Kluwer Academic Publishers

    United States


    Author Tags

    1. Information Access
    2. Multimedia
    3. Personalization

    Qualifiers

    • Article


    Cited By

    • (2022) Requirements and Concepts for Interactive Media Retrieval User Interfaces. Nordic Human-Computer Interaction Conference, pp 1-10. doi:10.1145/3546155.3546701. Online publication date: 8-Oct-2022
    • (2022) The Model May Fit You: User-Generalized Cross-Modal Retrieval. IEEE Transactions on Multimedia 24:2998-3012. doi:10.1109/TMM.2021.3091888. Online publication date: 1-Jan-2022
    • (2022) Graph Jigsaw Learning for Cartoon Face Recognition. IEEE Transactions on Image Processing 31:3961-3972. doi:10.1109/TIP.2022.3177952. Online publication date: 1-Jan-2022
    • (2019) Multimedia information retrieval in big data using OpenCV python. Proceedings of the 25th Brazilian Symposium on Multimedia and the Web, pp 25-27. doi:10.1145/3323503.3345030. Online publication date: 29-Oct-2019
    • (2018) A new multimodal deep-learning model to video scene segmentation. Proceedings of the 24th Brazilian Symposium on Multimedia and the Web, pp 205-212. doi:10.1145/3243082.3243108. Online publication date: 16-Oct-2018
    • (2015) Shot-HR. Proceedings of the 30th Annual ACM Symposium on Applied Computing, pp 1257-1262. doi:10.1145/2695664.2695841. Online publication date: 13-Apr-2015
    • (2014) Hybrid video emotional tagging using users' EEG and video content. Multimedia Tools and Applications 72(2):1257-1283. doi:10.1007/s11042-013-1450-8. Online publication date: 1-Sep-2014
    • (2013) Video scene segmentation by improved visual shot coherence. Proceedings of the 19th Brazilian Symposium on Multimedia and the Web, pp 23-30. doi:10.1145/2526188.2526206. Online publication date: 5-Nov-2013
    • (2013) Exploiting content relevance and social relevance for personalized ad recommendation on internet TV. ACM Transactions on Multimedia Computing, Communications, and Applications 9(4):1-23. doi:10.1145/2501643.2501648. Online publication date: 19-Aug-2013
