[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2009916.2009950acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
research-article

Composite hashing with multiple information sources

Published: 24 July 2011 Publication History

Abstract

Similarity search applications with a large amount of text and image data demands an efficient and effective solution. One useful strategy is to represent the examples in databases as compact binary codes through semantic hashing, which has attracted much attention due to its fast query/search speed and drastically reduced storage requirement. All of the current semantic hashing methods only deal with the case when each example is represented by one type of features. However, examples are often described from several different information sources in many real world applications. For example, the characteristics of a webpage can be derived from both its content part and its associated links.
To address the problem of learning good hashing codes in this scenario, we propose a novel research problem -- Composite Hashing with Multiple Information Sources (CHMIS). The focus of the new research problem is to design an algorithm for incorporating the features from different information sources into the binary hashing codes efficiently and effectively. In particular, we propose an algorithm CHMIS-AW (CHMIS with Adjusted Weights) for learning the codes. The proposed algorithm integrates information from several different sources into the binary hashing codes by adjusting the weights on each individual source for maximizing the coding performance, and enables fast conversion from query examples to their binary hashing codes. Experimental results on five different datasets demonstrate the superior performance of the proposed method against several other state-of-the-art semantic hashing techniques.

References

[1]
A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In FOCS, pages 459--468, 2006.
[2]
R. Baeza-Yates, C. Castillo, F. Junqueira, V. Plachouras, and F. Silvestri. Challenges in distributed information retrieval. In ICDE, 2007.
[3]
R. Baeza-Yates, B. Ribeiro-Neto, et al. Modern information retrieval, volume 463. ACM press New York, 1999.
[4]
S. Baluja and M. Covell. Learning to hash: forgiving hash functions and applications. Data Mining and Knowledge Discovery, 17(3):402--430, 2008.
[5]
Y. Bengio, O. Delalleau, N. Roux, J. Paiement, P. Vincent, and M. Ouimet. Learning eigenfunctions links spectral embedding and kernel PCA. Neural Computation, 16(10):2197--2219, 2004.
[6]
C. Bishop et al. Pattern recognition and machine learning. Springer New York, 2006.
[7]
S. Boyd and L. Vandenberghe. Convex optimization. Cambridge Univ Pr, 2004.
[8]
A. Z. Broder. On the resemblance and containment of documents. In CCS, pages 21--29. IEEE Computer Society, 1997.
[9]
O. Chum, M. Perdoch, and J. Matas. Geometric min-hashing: Finding a (thick) needle in a haystack. In CVPR, pages 17--24, 2009.
[10]
O. Chum, J. Philbin, and A. Zisserman. Near duplicate image detection: min-hash and tf-idf weighting. In BMVC, volume 3, page 4, 2008.
[11]
F. Chung and N. Biggs. Spectral graph theory. American Mathematical Society Providence, RI, 1997.
[12]
T. Cormen, C. Leiserson, R. Rivest, and C. Stein. Introduction to algorithms. The MIT press, 2001.
[13]
J. S. Culpepper and A. Moffat. Efficient set intersection for inverted indexing. ACM Trans. Inf. Syst., 29:111--125, December 2010.
[14]
M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In SCG, pages 253--262, 2004.
[15]
S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. JASIS, 41(6):391--407, 1990.
[16]
J. Farquhar, D. Hardoon, H. Meng, J. Shawe-Taylor, and S. Szedmak. Two view learning: SVM-2K, theory and practice. NIPS, 18:355, 2006.
[17]
A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB, pages 518--529, 1999.
[18]
G. E. Hinton, S. Osindero, and Y.-W. Teh. A Fast Learning Algorithm for Deep Belief Nets. Neural Comp., 18(7):1527--1554, July 2006.
[19]
G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks. Science, 313(5786):504--507, July 2006.
[20]
T. Hofmann. Probabilistic latent semantic indexing. In SIGIR, pages 50--57. ACM, 1999.
[21]
T. Joachims and N. Cristianini. Composite kernels for hypertext categorisation. In ICML, pages 250--257. Morgan Kaufmann Publishers, 2001.
[22]
B. Kulis and T. Darrell. Learning to hash with binary reconstructive embeddings. NIPS, 2009.
[23]
D. D. Lewis, Y. Yang, T. G. Rose, and F. Li. Rcv1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361--397, 2004.
[24]
G. Li, S. C. H. Hoi, and K. Chang. Two-view transductive support vector machines. In SDM, pages 235--244, 2010.
[25]
J. Lin. Brute force and indexed approaches to pairwise document similarity comparisons with mapreduce. In SIGIR, pages 155--162. ACM, 2009.
[26]
R. Lin, D. Ross, and J. Yagnik. SPEC hashing: Similarity preserving algorithm for entropy-based coding. In CVPR, pages 848--854, 2010.
[27]
D. Lowe. Object recognition from local scale-invariant features. In ICCV, page 1150, 1999.
[28]
C. Manning, P. Raghavan, and H. Schutze. An introduction to information retrieval. Cambridge University Press, 2008.
[29]
A. K. Mccallum, K. Nigam, J. Rennie, and K. Seymore. Automating the construction of internet portals with machine learning. Information Retrieval, 3:127--163, 1999.
[30]
Y. Mu, J. Shen, and S. Yan. Weakly-supervised hashing in kernel space. In CVPR, pages 3344--3351, 2010.
[31]
R.E.Schapire. The boosting approach to machine learning: An overview. Nonlinear Estimation and Classification, pages 149--172, 2003.
[32]
D. Rosenberg, V. Sindhwani, P. Bartlett, and P. Niyogi. A Kernel for Semi-Supervised Learning With Multi-View Point Cloud Regularization. IEEE Signal Processing Magazine, 2009.
[33]
R. Salakhutdinov and G. Hinton. Semantic hashing. International Journal of Approximate Reasoning, 50(7):969--978, 2009.
[34]
G. Salton. Developments in automatic text retrieval. Science, 253(5023):974--980, 1991.
[35]
G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. Information processing and management, 24(5):513--523, 1988.
[36]
B. Scholkopf and A. Smola. Learning with kernels. MIT press Cambridge, Mass, 2002.
[37]
G. Shakhnarovich, P. A. Viola, and T. Darrell. Fast pose estimation with parameter-sensitive hashing. In ICCV, pages 750--759, 2003.
[38]
J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888--905, 2002.
[39]
F. Silvestri and R. Venturini. Vsencoding: efficient coding and fast decoding of integer lists via dynamic programming. In CIKM, pages 1219--1228, 2010.
[40]
V. Sindhwani and P. Niyogi. A co-regularized approach to semi-supervised learning with multiple views. In ICML Workshop on Learning with Multiple Views, 2005.
[41]
B. Stein. Principles of hash-based text retrieval. In SIGIR, page 534. ACM, 2007.
[42]
V. Vapnik. The nature of statistical learning theory. Springer Verlag, 2000.
[43]
J. Wang, O. Kumar, and S.-F. Chang. Semi-supervised hashing for scalable image retrieval. In CVPR, pages 3424--3431, 2010.
[44]
J. Wang, S. Kumar, and S.-F. Chang. Sequential projection learning for hashing with compact codes. In ICML, pages 1127--1134, 2010.
[45]
X. Wang, L. Zhang, F. Jing, and W. Ma. Annosearch: Image auto-annotation by search. In CVPR, 2006.
[46]
R. Weber, H. Schek, and S. Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In VLDB, pages 194--205.
[47]
Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. NIPS, 21:1753--1760, 2009.
[48]
I. H. Witten, I. H. Witten, A. Moffat, and T. C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann, May 1999.
[49]
L. Zelnik-Manor and P. Perona. Self-tuning spectral clustering. NIPS, 17:1601--1608, 2004.
[50]
D. Zhang, F. Wang, C. Zhang, and T. Li. Multi-view local learning. In AAAI, pages 752--757, 2008.
[51]
D. Zhang, J. Wang, D. Cai, and J. Lu. Laplacian co-hashing of terms and documents. Advances in Information Retrieval, pages 577--580, 2010.
[52]
D. Zhang, J. Wang, D. Cai, and J. Lu. Self-taught hashing for fast similarity search. In SIGIR, pages 18--25. ACM, 2010.
[53]
T. Zhang, A. Popescul, and B. Dom. Linear prediction models with graph regularization for web-page categorization. In SIGKDD, pages 821--826. ACM, 2006.
[54]
J. Zobel and A. Moffat. Inverted files for text search engines. ACM Comput. Surv., 38(2), 2006.

Cited By

View all
  • (2024)Multi-Facet Weighted Asymmetric Multi-Modal Hashing Based on Latent Semantic DistributionIEEE Transactions on Multimedia10.1109/TMM.2024.336366426(7307-7320)Online publication date: 2024
  • (2024)Multi-Modal Hashing for Efficient Multimedia Retrieval: A SurveyIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.328292136:1(239-260)Online publication date: Jan-2024
  • (2024)A Multi-View Double Alignment Hashing Network with Weighted Contrastive Learning2024 IEEE International Conference on Multimedia and Expo (ICME)10.1109/ICME57554.2024.10687739(1-6)Online publication date: 15-Jul-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
July 2011
1374 pages
ISBN:9781450307574
DOI:10.1145/2009916
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 July 2011

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. composite hashing
  2. multiple information sources
  3. semantic hashing

Qualifiers

  • Research-article

Conference

SIGIR '11
Sponsor:

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)14
  • Downloads (Last 6 weeks)3
Reflects downloads up to 13 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Multi-Facet Weighted Asymmetric Multi-Modal Hashing Based on Latent Semantic DistributionIEEE Transactions on Multimedia10.1109/TMM.2024.336366426(7307-7320)Online publication date: 2024
  • (2024)Multi-Modal Hashing for Efficient Multimedia Retrieval: A SurveyIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.328292136:1(239-260)Online publication date: Jan-2024
  • (2024)A Multi-View Double Alignment Hashing Network with Weighted Contrastive Learning2024 IEEE International Conference on Multimedia and Expo (ICME)10.1109/ICME57554.2024.10687739(1-6)Online publication date: 15-Jul-2024
  • (2023)Supervised Discrete Multiple-Length Hashing for Image RetrievalIEEE Transactions on Big Data10.1109/TBDATA.2022.31619059:1(312-327)Online publication date: 1-Feb-2023
  • (2023)Locality Preserving Multiview Graph Hashing For Large Scale Remote Sensing Image SearchICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)10.1109/ICASSP49357.2023.10096369(1-5)Online publication date: 4-Jun-2023
  • (2023)Binary multi-view clustering with spectral embeddingNeurocomputing10.1016/j.neucom.2023.126733557:COnline publication date: 7-Nov-2023
  • (2023)Research on Multimodal Retrieval Method Based on Commodity ImagesProceedings of the 9th International Conference on Advanced Intelligent Systems and Informatics 202310.1007/978-3-031-43247-7_46(532-541)Online publication date: 18-Sep-2023
  • (2023)Composite Multi-modal HashingMulti-modal Hash Learning10.1007/978-3-031-37291-9_4(91-144)Online publication date: 5-Aug-2023
  • (2022)Efficient modal-aware feature learning with application in multimodal hashingIntelligent Data Analysis10.3233/IDA-21578026:2(345-360)Online publication date: 14-Mar-2022
  • (2022)The Model May Fit You: User-Generalized Cross-Modal RetrievalIEEE Transactions on Multimedia10.1109/TMM.2021.309188824(2998-3012)Online publication date: 1-Jan-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media