DOI: 10.1145/3397271.3401220
short-paper

Unsupervised Semantic Hashing with Pairwise Reconstruction

Published: 25 July 2020

Abstract

Semantic Hashing is a popular family of methods for efficient similarity search in large-scale datasets. In Semantic Hashing, documents are encoded as short binary vectors (i.e., hash codes), such that semantic similarity can be efficiently computed using the Hamming distance. Recent state-of-the-art approaches have used weak supervision to train better-performing hashing models. Inspired by this, we present Semantic Hashing with Pairwise Reconstruction (PairRec), a hashing model based on a discrete variational autoencoder. PairRec first encodes weakly supervised training pairs (a query document and a semantically similar document) into two hash codes, and then learns to reconstruct the same query document from both of these hash codes (i.e., pairwise reconstruction). This pairwise reconstruction enables our model to encode local neighbourhood structures within the hash code directly through the decoder. We experimentally compare PairRec to traditional and state-of-the-art approaches, and obtain significant performance improvements in the task of document similarity search.
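The abstract combines two mechanisms: ranking documents by the Hamming distance between their binary codes, and training by reconstructing a query document from both its own code and the code of a similar document. The Python sketch below illustrates both, but only as a toy under stated assumptions. The linear encoder, the softmax word decoder, and every function name (`bits_to_int`, `hamming`, `top_k`, `encode`, `log_likelihood`, `pairwise_reconstruction_loss`) are hypothetical stand-ins, not PairRec's actual architecture. The sketch also omits the variational (KL) term and the gradient estimator (e.g., a straight-through estimator) that a trainable discrete VAE would need, because the hard threshold in `encode` provides no useful gradient.

```python
import math
from typing import List, Sequence

# --- Similarity search over binary hash codes -------------------------------

def bits_to_int(bits: Sequence[int]) -> int:
    """Pack a list of 0/1 bits into an int so XOR compares whole codes at once."""
    code = 0
    for bit in bits:
        code = (code << 1) | bit
    return code

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two packed hash codes differ."""
    return bin(a ^ b).count("1")

def top_k(query: int, corpus: Sequence[int], k: int) -> List[int]:
    """Indices of the k corpus codes nearest to `query` (ties broken by index)."""
    return sorted(range(len(corpus)), key=lambda i: (hamming(query, corpus[i]), i))[:k]

# --- Toy pairwise reconstruction objective (hypothetical model) --------------

def encode(doc: Sequence[float], W: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical linear encoder: bit j is 1 iff sigmoid(W[j] . doc) >= 0.5,
    i.e. iff W[j] . doc >= 0 (deterministic, test-time style thresholding)."""
    return [1 if sum(w * x for w, x in zip(row, doc)) >= 0 else 0 for row in W]

def log_likelihood(doc: Sequence[float], bits: Sequence[int],
                   E: Sequence[Sequence[float]], b: Sequence[float]) -> float:
    """log p(doc | z) for bag-of-words counts `doc` under a hypothetical
    softmax word decoder with logits_w = E[w] . z + b[w]."""
    logits = [sum(e * z for e, z in zip(E[w], bits)) + b[w] for w in range(len(doc))]
    m = max(logits)  # log-sum-exp shift for numerical stability
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return sum(count * (logits[w] - log_norm) for w, count in enumerate(doc))

def pairwise_reconstruction_loss(query_doc, similar_doc, W, E, b) -> float:
    """Negative log-likelihood of reconstructing the *query* document both
    from its own code and from the code of a weakly supervised similar document."""
    z_query = encode(query_doc, W)
    z_similar = encode(similar_doc, W)
    return -(log_likelihood(query_doc, z_query, E, b)
             + log_likelihood(query_doc, z_similar, E, b))

if __name__ == "__main__":
    corpus = [bits_to_int(c) for c in ([1, 0, 1, 1], [0, 0, 1, 0], [1, 0, 1, 0])]
    print(top_k(bits_to_int([1, 0, 1, 1]), corpus, k=2))  # → [0, 2]
```

Packing codes into integers turns each distance computation into one XOR plus a bit count, which is why Hamming-based search stays cheap on large collections. In this toy, the pairwise part is the second `log_likelihood` call: the similar document's code is scored on how well it reconstructs the query, so minimizing the loss would push neighbouring documents toward codes that carry the same information.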

Supplementary Material

MP4 File (3397271.3401220.mp4)




      Information

      Published In

      ACM Conferences
      SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
      July 2020
      2548 pages
      ISBN:9781450380164
      DOI:10.1145/3397271
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from the publisher (Association for Computing Machinery).

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 25 July 2020

      Author Tags

      1. pairwise reconstruction
      2. semantic hashing
      3. variational

      Qualifiers

      • Short-paper

      Funding Sources

      • DABAI

      Conference

      SIGIR '20

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 24
      • Downloads (last 6 weeks): 1
      Reflects downloads up to 12 Dec 2024.

      Citations

      Cited By

      • (2024) Multi-Facet Weighted Asymmetric Multi-Modal Hashing Based on Latent Semantic Distribution. IEEE Transactions on Multimedia, vol. 26, pp. 7307-7320. DOI: 10.1109/TMM.2024.3363664. Online publication date: 2024.
      • (2023) An Efficient and Robust Semantic Hashing Framework for Similar Text Search. ACM Transactions on Information Systems, 41:4, pp. 1-31. DOI: 10.1145/3570725. Online publication date: 22-Mar-2023.
      • (2023) Exploiting Multiple Features for Hash Codes Learning with Semantic-Alignment-Promoting Variational Auto-encoder. Natural Language Processing and Chinese Computing, pp. 563-575. DOI: 10.1007/978-3-031-44693-1_44. Online publication date: 8-Oct-2023.
      • (2022) SignRFF. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 17802-17817. DOI: 10.5555/3600270.3601564. Online publication date: 28-Nov-2022.
      • (2022) Asymmetric similarity-preserving discrete hashing for image retrieval. Applied Intelligence, 53:10, pp. 12114-12131. DOI: 10.1007/s10489-022-04167-y. Online publication date: 21-Sep-2022.
      • (2021) Efficient Multi-modal Hashing with Online Query Adaption for Multimedia Retrieval. ACM Transactions on Information Systems, 40:2, pp. 1-36. DOI: 10.1145/3477180. Online publication date: 27-Sep-2021.
      • (2021) Unsupervised Multi-Index Semantic Hashing. Proceedings of the Web Conference 2021, pp. 2879-2889. DOI: 10.1145/3442381.3450014. Online publication date: 19-Apr-2021.
      • (2021) Projected Hamming Dissimilarity for Bit-Level Importance Coding in Collaborative Filtering. Proceedings of the Web Conference 2021, pp. 261-269. DOI: 10.1145/3442381.3450011. Online publication date: 19-Apr-2021.
      • (2021) Long-Tail Hashing. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1328-1338. DOI: 10.1145/3404835.3462888. Online publication date: 11-Jul-2021.
      • (2021) LASH: Large-scale Academic Deep Semantic Hashing. IEEE Transactions on Knowledge and Data Engineering, pp. 1-1. DOI: 10.1109/TKDE.2021.3109433. Online publication date: 2021.
