Document Hashing by Exploiting Noisy Neighborhood Information with Fault-Tolerant Mutual-Information-Preserving VAE

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14851)

Abstract

Semantic hashing is an effective technique for large-scale information retrieval. Several recent methods learn high-quality binary hash codes for documents by leveraging both document content and neighborhood information. However, the provided neighborhood information often contains erroneous connections, which these models do not take into account. To alleviate the negative impact of such connections on hash code learning, we first build a basic generative model that jointly models document content and neighborhood. We then show that this basic generative model can be placed under a more general framework, dubbed the mutual-information (MI) preserving variational auto-encoder (VAE). Capitalizing on this connection, we further develop a new hashing method that tolerates noise in the neighborhood information by proposing a novel fault-tolerant lower bound for MI. Extensive experiments on six real-world datasets show significant performance gains over current state-of-the-art models.
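To make the retrieval setting concrete, the following is a minimal sketch of how binary hash codes are typically used at query time: documents are ranked by the Hamming distance between their codes and the query code. It illustrates only the retrieval step, not the model proposed in the paper. The 8-bit codes and the function names `hamming_distance` and `retrieve` are hypothetical and chosen purely for illustration.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def retrieve(query_code: int, db_codes: list[int], k: int) -> list[int]:
    """Return the indices of the k database codes nearest to the query.

    Ranking uses Hamming distance; sorted() is stable, so ties keep
    their original database order.
    """
    ranked = sorted(
        range(len(db_codes)),
        key=lambda i: hamming_distance(query_code, db_codes[i]),
    )
    return ranked[:k]


# Hypothetical 8-bit hash codes for four documents (illustrative values only).
db_codes = [0b00001111, 0b00001110, 0b11110000, 0b01010101]
query_code = 0b00001111

distances = [hamming_distance(query_code, c) for c in db_codes]
top2 = retrieve(query_code, db_codes, k=2)
```

Because the comparison reduces to an XOR followed by a bit count, it stays cheap even for large collections, which is the usual motivation for using binary codes rather than real-valued embeddings.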

Notes

  1. Datasets published by VDSH [2]. 20Newsgroups is abbreviated as 20NG.

  2. STEM papers in Arxiv from 2018 to 2022 collected by Kaggle.

  3. Datasets with citation networks published by [10].

References

  1. Bengio, Y., Léonard, N., Courville, A.: Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432 (2013)

  2. Chaidaroon, S., Fang, Y.: Variational deep semantic hashing for text documents. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 75–84 (2017)

  3. Dong, W., Su, Q., Shen, D., Chen, C.: Document hashing with mixture-prior generative models. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. pp. 5226–5235 (Nov 2019)

  4. Hansen, C., Hansen, C., Simonsen, J.G., Alstrup, S., Lioma, C.: Unsupervised neural generative semantic hashing. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (2019)

  5. Hansen, C., Hansen, C., Simonsen, J.G., Alstrup, S., Lioma, C.: Unsupervised semantic hashing with pairwise reconstruction. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (2020)

  6. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015)

  7. Kingma, D.P., Welling, M.: Auto-Encoding Variational Bayes. In: 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings (2014)

  8. van den Oord, A., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. CoRR abs/1807.03748 (2018)

  9. Ou, Z., Su, Q., Yu, J., Liu, B., Wang, J., Zhao, R., Chen, C., Zheng, Y.: Integrating semantics and neighborhood information with graph-driven generative models for document retrieval. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). pp. 2238–2249 (Aug 2021)

  10. Sen, P., Namata, G.M., Bilgic, M., Getoor, L., Gallagher, B., Eliassi-Rad, T.: Collective classification in network data. AI Magazine 29(3), 93–106 (2008)

  11. Shen, D., Su, Q., Chapfuwa, P., Wang, W., Wang, G., Henao, R., Carin, L.: NASH: Toward end-to-end neural architecture for generative semantic hashing. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. pp. 2041–2050 (Jul 2018)

  12. Stratos, K., Wiseman, S.: Learning discrete structured representations by adversarially maximizing mutual information. In: Proceedings of the 37th International Conference on Machine Learning. pp. 9144–9154 (2020)

  13. Wang, S., Cao, L., Wang, Y., Sheng, Q.Z., Orgun, M.A., Lian, D.: A survey on session-based recommender systems. ACM Comput. Surv. 54(7) (Jul 2021)

  14. Zheng, L., Su, Q., Shen, D., Chen, C.: Generative semantic hashing enhanced via Boltzmann machines. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. pp. 777–788 (Jul 2020)

Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 62276280, U1811264), the Guangzhou Science and Technology Planning Project (No. 2024A04J9967), and the Fundamental Research Funds of the Central Universities, Sun Yat-Sen University (No. 23ptpy78). Qinliang Su is the corresponding author.

Corresponding author

Correspondence to Qinliang Su.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 362 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Chen, J., Su, Q., Li, Z., Wan, H., Lian, D. (2025). Document Hashing by Exploiting Noisy Neighborhood Information with Fault-Tolerant Mutual-Information-Preserving VAE. In: Onizuka, M., et al. Database Systems for Advanced Applications. DASFAA 2024. Lecture Notes in Computer Science, vol 14851. Springer, Singapore. https://doi.org/10.1007/978-981-97-5779-4_32

  • DOI: https://doi.org/10.1007/978-981-97-5779-4_32

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5778-7

  • Online ISBN: 978-981-97-5779-4

  • eBook Packages: Computer Science, Computer Science (R0)
