
Towards large-scale multimedia retrieval enriched by knowledge about human interpretation

Published: 01 January 2016

Abstract

Recent Large-Scale Multimedia Retrieval (LSMR) methods seem to rely heavily on analysing a large amount of data using high-performance machines. This paper aims to warn against this research trend. We argue that the above methods are useful only for recognising certain primitive meanings, and that knowledge about human interpretation is necessary to derive high-level meanings from primitive ones. We emphasise this by conducting a retrospective survey of machine-based methods, which build classifiers based on features, and human-based methods, which exploit user annotation and interaction. Our survey reveals that, because recent methods prioritise generality and scalability for large-scale data, they leave out knowledge about human interpretation, whereas classical methods fully used it. Thus, we defend the importance of human-machine cooperation, which incorporates the above knowledge into LSMR. In particular, we define three future directions for it (cognition-based, ontology-based and adaptive learning) depending on the type of knowledge, and suggest exploring each direction by considering its relation to the others.

Cited By

  • (2021) Pattern analysis based acoustic signal processing: a survey of the state-of-art. International Journal of Speech Technology 24:4 (913-955). DOI: 10.1007/s10772-020-09681-3. Online publication date: 1-Dec-2021
  • (2018) CNN-RNN. Multimedia Tools and Applications 77:8 (10251-10271). DOI: 10.1007/s11042-017-5443-x. Online publication date: 1-Apr-2018
  • (2016) Improving object classification robustness in RGB-D using adaptive SVMs. Multimedia Tools and Applications 75:12 (6829-6847). DOI: 10.1007/s11042-015-2612-7. Online publication date: 1-Jun-2016
    Information & Contributors

    Information

    Published In

    Multimedia Tools and Applications  Volume 75, Issue 1
    January 2016
    667 pages

    Publisher

    Kluwer Academic Publishers

    United States

    Author Tags

    1. Human-based methods
    2. Human-machine cooperation
    3. Large-scale multimedia retrieval
    4. Machine-based methods

    Qualifiers

    • Article
