Document Hashing by Exploiting Noisy Neighborhood Information with Fault-Tolerant Mutual-Information-Preserving VAE

Jiayang Chen, Qinliang Su, Zetong Li, Hai Wan & Defu Lian

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14851))

Included in the following conference series:

International Conference on Database Systems for Advanced Applications


Abstract

Semantic hashing is an effective technique for large-scale information retrieval. Several recent methods learn high-quality binary hash codes for documents by leveraging both document content and neighborhood information. However, the provided neighborhood information often contains erroneous connections, which these models have not taken into account. To alleviate the negative impact of such connections on hash code learning, we first build a basic generative model that jointly models the document content and its neighborhood. We then show that this basic generative model can be placed under a more general framework, dubbed the mutual-information (MI) preserving variational auto-encoder (VAE). Capitalizing on this connection, we further develop a new hashing method that can tolerate noisy neighborhood information by proposing a novel fault-tolerant lower bound for MI. Extensive experiments on six real-world datasets show significant performance gains over current state-of-the-art models.
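For readers new to the retrieval setting the abstract refers to, the following is a minimal sketch of how binary hash codes are typically used at query time: documents are ranked by Hamming distance to the query code. It is not the paper's model and does not show how the codes are learned. The function names and the toy 4-bit codes are assumptions made up for illustration.

```python
from typing import List


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    # XOR sets a bit exactly where the codes disagree; count those bits.
    return bin(a ^ b).count("1")


def hamming_rank(query: int, database: List[int]) -> List[int]:
    """Return database indices ordered from nearest to farthest code.

    sorted() is stable, so documents at equal distance keep their
    original relative order.
    """
    return sorted(
        range(len(database)),
        key=lambda i: hamming_distance(query, database[i]),
    )


# Toy 4-bit codes, invented for this example (not taken from the paper).
database = [0b0000, 0b1011, 0b1010, 0b0101]
query = 0b1011

ranking = hamming_rank(query, database)
distances = [hamming_distance(query, database[i]) for i in ranking]
print(ranking)    # indices of documents, nearest first
print(distances)  # their Hamming distances to the query
```

Because the comparison reduces to an XOR and a bit count, ranking stays cheap even for large collections, which is the usual motivation for learning compact binary codes.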

Notes

1. Datasets published by VDSH [2]. 20Newsgroups is abbreviated as 20NG.
2. STEM papers on arXiv from 2018 to 2022, collected by Kaggle.
3. Datasets with citation networks published by [10].

References

1. Bengio, Y., Léonard, N., Courville, A.: Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432 (2013)
2. Chaidaroon, S., Fang, Y.: Variational deep semantic hashing for text documents. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 75–84 (2017)
3. Dong, W., Su, Q., Shen, D., Chen, C.: Document hashing with mixture-prior generative models. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. pp. 5226–5235 (Nov 2019)
4. Hansen, C., Hansen, C., Simonsen, J.G., Alstrup, S., Lioma, C.: Unsupervised neural generative semantic hashing. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (2019)
5. Hansen, C., Hansen, C., Simonsen, J.G., Alstrup, S., Lioma, C.: Unsupervised semantic hashing with pairwise reconstruction. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (2020)
6. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015)
7. Kingma, D.P., Welling, M.: Auto-Encoding Variational Bayes. In: 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings (2014)
8. van den Oord, A., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. CoRR abs/1807.03748 (2018)
9. Ou, Z., Su, Q., Yu, J., Liu, B., Wang, J., Zhao, R., Chen, C., Zheng, Y.: Integrating semantics and neighborhood information with graph-driven generative models for document retrieval. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). pp. 2238–2249 (Aug 2021)
10. Sen, P., Namata, G.M., Bilgic, M., Getoor, L., Gallagher, B., Eliassi-Rad, T.: Collective classification in network data. AI Magazine 29(3), 93–106 (2008)
11. Shen, D., Su, Q., Chapfuwa, P., Wang, W., Wang, G., Henao, R., Carin, L.: NASH: Toward end-to-end neural architecture for generative semantic hashing. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. pp. 2041–2050 (Jul 2018)
12. Stratos, K., Wiseman, S.: Learning discrete structured representations by adversarially maximizing mutual information. In: Proceedings of the 37th International Conference on Machine Learning. pp. 9144–9154 (2020)
13. Wang, S., Cao, L., Wang, Y., Sheng, Q.Z., Orgun, M.A., Lian, D.: A survey on session-based recommender systems. ACM Comput. Surv. 54(7) (Jul 2021)
14. Zheng, L., Su, Q., Shen, D., Chen, C.: Generative semantic hashing enhanced via Boltzmann machines. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. pp. 777–788 (Jul 2020)

Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 62276280, U1811264), the Guangzhou Science and Technology Planning Project (No. 2024A04J9967), and the Fundamental Research Funds of the Central Universities, Sun Yat-sen University (No. 23ptpy78). Qinliang Su is the corresponding author.

Author information

Authors and Affiliations

School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
Jiayang Chen, Qinliang Su, Zetong Li & Hai Wan
Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou, China
Qinliang Su
State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China, Hefei, China
Defu Lian

Corresponding author

Correspondence to Qinliang Su.

Editor information

Editors and Affiliations

Osaka University, Suita, Osaka, Japan
Makoto Onizuka
KAIST, Daejeon, Korea (Republic of)
Jae-Gil Lee
Beihang University, Beijing, China
Yongxin Tong
Osaka University, Osaka, Japan
Chuan Xiao
Nagoya University, Nagoya, Japan
Yoshiharu Ishikawa
University of Grenoble Alpes, Saint-Martin d’Hères, France
Sihem Amer-Yahia
University of Michigan, Ann Arbor, MI, USA
H. V. Jagadish
Nagoya University, Nagoya, Japan
Kejing Lu

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 362 KB)

About this paper

Cite this paper

Chen, J., Su, Q., Li, Z., Wan, H., Lian, D. (2025). Document Hashing by Exploiting Noisy Neighborhood Information with Fault-Tolerant Mutual-Information-Preserving VAE. In: Onizuka, M., et al. Database Systems for Advanced Applications. DASFAA 2024. Lecture Notes in Computer Science, vol 14851. Springer, Singapore. https://doi.org/10.1007/978-981-97-5779-4_32

DOI: https://doi.org/10.1007/978-981-97-5779-4_32
Published: 11 January 2025
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5778-7
Online ISBN: 978-981-97-5779-4
eBook Packages: Computer Science, Computer Science (R0)
