MapReduce-based clustering for near-duplicate image identification

Published: 01 November 2017

Abstract

In this paper, we develop an algorithm for identifying near-duplicate images in large-scale image sets. The LLC (locality-constrained linear coding) method is integrated with the maxIDF cut model to obtain more discriminative image representations. Image clustering and pairwise merging are carried out on the MapReduce framework, so that near duplicates can be identified effectively in large-scale image sets. An intuitive strategy is also introduced to guide parameter selection. Our experimental results on large-scale image sets show that the algorithm significantly improves both accuracy and computational efficiency compared with the baseline methods.
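
The abstract names the pipeline stages but not their details. The following is a minimal Python sketch of the general pattern only (map images to buckets, find similar pairs within each bucket in the reduce step, then merge pairs into clusters). It is not the authors' implementation. The bucketing rule (index of the largest code entry), the cosine similarity, the threshold value, and all function names are assumptions made for illustration; the paper's LLC coding and maxIDF cut model are not reproduced here, and the phases run in a single process rather than on a MapReduce cluster.

```python
from collections import defaultdict
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def map_phase(images):
    """Emit (bucket_key, image_id). Assumed rule: key is the index of the largest code entry."""
    for image_id, code in images.items():
        key = max(range(len(code)), key=lambda i: code[i])
        yield key, image_id


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce runtime would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, images, threshold):
    """Within each bucket, emit image pairs whose similarity reaches the threshold."""
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            if cosine(images[a], images[b]) >= threshold:
                yield a, b


def merge_pairs(image_ids, pairs):
    """Pairwise merging with union-find: linked images end up in one cluster."""
    parent = {i: i for i in image_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    clusters = defaultdict(set)
    for i in image_ids:
        clusters[find(i)].add(i)
    return sorted(sorted(c) for c in clusters.values())


def near_duplicate_clusters(images, threshold=0.95):
    """Run map, shuffle, reduce, and merge in-process on {image_id: feature_vector}."""
    groups = shuffle(map_phase(images))
    pairs = reduce_phase(groups, images, threshold)
    return merge_pairs(images.keys(), pairs)
```

With toy feature vectors such as `{"a": [0.9, 0.1, 0.0], "b": [0.88, 0.12, 0.0], "e": [0.1, 0.9, 0.0]}`, images `a` and `b` fall into the same bucket and are merged, while `e` remains a singleton cluster. Bucketing before comparison is what keeps the pairwise step from touching every pair in the collection; the price is that near duplicates assigned to different buckets are missed by this simplified rule.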

Published In

Multimedia Tools and Applications, Volume 76, Issue 22
Nov 2017
1415 pages

Publisher

Kluwer Academic Publishers
United States

Author Tags

1. Image clustering
2. Large-scale photos
3. MapReduce
4. Near-duplicate identification
5. Representative image