MapReduce-based clustering for near-duplicate image identification

Authors:

Jianping Fan

Multimedia Tools and Applications, Volume 76, Issue 22

Pages 23291 - 23307

https://doi.org/10.1007/s11042-016-4060-4

Published: 01 November 2017

Abstract

In this paper, an effective algorithm is developed for near-duplicate image identification in large-scale image sets. The LLC (locality-constrained linear coding) method is integrated with the maxIDF cut model to obtain more discriminative image representations. By incorporating the MapReduce framework for image clustering and pairwise merging, near-duplicate images can be identified effectively from large-scale image sets. An intuitive strategy is also introduced to guide parameter selection. Experimental results on large-scale image sets show that the algorithm achieves significant improvements in both accuracy and computational efficiency compared with baseline methods.
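To make the general idea of map/reduce candidate generation followed by pairwise merging concrete, here is a minimal single-process sketch. It is not the paper's algorithm: LLC coding and the maxIDF cut model are not reproduced, each image is assumed to be already reduced to a set of hypothetical visual-word ids, and Jaccard similarity with a fixed threshold stands in for whatever similarity measure and parameters the authors use. All function names are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(images):
    """Map step: emit one (visual_word, image_id) pair per word in each image."""
    for image_id, words in images.items():
        for word in words:
            yield word, image_id


def reduce_phase(pairs):
    """Reduce step: group image ids by word and emit candidate image pairs."""
    by_word = defaultdict(set)
    for word, image_id in pairs:
        by_word[word].add(image_id)
    candidates = set()
    for ids in by_word.values():
        # Any two images sharing at least one word become a candidate pair.
        candidates.update(combinations(sorted(ids), 2))
    return candidates


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets (an assumed stand-in measure)."""
    return len(a & b) / len(a | b)


def merge_pairs(images, candidates, threshold):
    """Pairwise merging: union-find over candidate pairs that pass the threshold."""
    parent = {image_id: image_id for image_id in images}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in candidates:
        if jaccard(images[a], images[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for image_id in images:
        clusters[find(image_id)].add(image_id)
    return sorted(sorted(c) for c in clusters.values())


def near_duplicate_clusters(images, threshold=0.5):
    """Run the toy pipeline: map, reduce to candidates, then merge."""
    candidates = reduce_phase(map_phase(images))
    return merge_pairs(images, candidates, threshold)
```

For example, with `images = {"a": {1, 2, 3, 4}, "b": {1, 2, 3, 5}, "c": {7, 8, 9}, "d": {7, 8, 9, 10}, "e": {4, 20}}` and the default threshold, images `a` and `b` are merged, `c` and `d` are merged, and `e` stays alone even though it shares one word with `a`. In a real MapReduce deployment the map and reduce steps would run as distributed jobs; here they are plain functions so the data flow is easy to follow.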

Index Terms

MapReduce-based clustering for near-duplicate image identification
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis
2. Information systems
  1. Information retrieval
  2. Information systems applications
    1. Data mining
      1. Clustering

Index terms have been assigned to the content through auto-classification.

Published In

Multimedia Tools and Applications Volume 76, Issue 22

Nov 2017

1415 pages

ISSN: 1380-7501

Copyright © 2017 Springer Science+Business Media, LLC.

Publisher

Kluwer Academic Publishers

United States
