Abstract
Clickthrough data is a source of information that can be used to automatically build concept detectors for image retrieval. Previous studies, however, have shown that in many cases the resulting training sets suffer from severe label noise, which has a significant impact on the performance of SVM concept detectors. This paper proposes and evaluates a set of strategies for automatically building effective concept detectors from clickthrough data. These strategies focus on: (1) automatic training set generation; (2) assignment of label confidence weights to the training samples; and (3) use of these weights at the classifier level to improve concept detector effectiveness. Three Information Retrieval (IR) models are examined, both for training set selection and for assigning weights to individual training samples: vector space models, BM25 and language models. Three SVM variants that take sample importance into account at the classifier level are evaluated and compared to the standard SVM: the Fuzzy SVM, the Power SVM, and the Bilateral-weighted Fuzzy SVM. Experiments conducted on the MM Grand Challenge dataset (consisting of 1M images and 82.3M unique clicks) for 40 concepts demonstrate that (1) on average, all weighted SVM variants are more effective than the standard SVM; (2) the vector space model produces the best training sets and the best weights; (3) the Bilateral-weighted Fuzzy SVM produces the best results but is very sensitive to weight assignment; and (4) the Fuzzy SVM is the most robust training approach for varying levels of label noise.
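As an illustration of strategies (2) and (3), the following is a minimal sketch, not the authors' implementation. It assumes that each image is described by the tokenised text of the queries that led to clicks on it, that the concept name serves as the query, and that label confidence is the BM25 score divided by the maximum score. The function names, the toy data, the BM25 parameters `k1` and `b`, the smoothed idf and the normalisation are all assumptions made for this example. The weights then scale each sample's hinge loss, which is the primal objective of a Fuzzy SVM.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenised document against the query with Okapi BM25.
    k1 and b are common defaults, not values taken from the paper."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency of each query term.
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            # Smoothed idf that stays non-negative (an assumed variant).
            idf = math.log(1.0 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1.0) / norm
        scores.append(score)
    return scores

def to_weights(scores):
    """Map scores to [0, 1] by dividing by the maximum (assumed normalisation)."""
    top = max(scores)
    return [s / top if top > 0 else 0.0 for s in scores]

def fuzzy_svm_objective(w, b, X, y, s, C=1.0):
    """Fuzzy SVM primal: 0.5 * ||w||^2 + C * sum_i s_i * max(0, 1 - y_i f(x_i))."""
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = 0.0
    for xi, yi, si in zip(X, y, s):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        loss += si * max(0.0, 1.0 - margin)
    return reg + C * loss

# Hypothetical click text for three candidate positives of the concept "car".
click_text = [["red", "car", "car"], ["car", "race"], ["cat", "photo"]]
weights = to_weights(bm25_scores(["car"], click_text))

# Toy 2-D features; the third image is a noisy positive on the wrong side.
X = [[2.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
y = [1, 1, 1]
w, bias = [1.0, 0.0], 0.0

uniform_obj = fuzzy_svm_objective(w, bias, X, y, [1.0, 1.0, 1.0])  # standard SVM
weighted_obj = fuzzy_svm_objective(w, bias, X, y, weights)         # Fuzzy SVM
```

With uniform weights the objective reduces to that of the standard soft-margin SVM, so the noisy third sample contributes its full hinge loss. With the BM25-derived weights its contribution shrinks in proportion to its low label confidence, which is the mechanism the weighted variants rely on to limit the effect of label noise.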
Cite this article
Sarafis, I., Diou, C. & Delopoulos, A. Building effective SVM concept detectors from clickthrough data for large-scale image retrieval. Int J Multimed Info Retr 4, 129–142 (2015). https://doi.org/10.1007/s13735-015-0080-5
DOI: https://doi.org/10.1007/s13735-015-0080-5