Abstract
Automatic mainstream hashtag recommendation aims to provide users with concise and popular topical hashtags before publication. The task faces two main challenges: comprehending newly posted tweets that respond to new topics, and accurately identifying mainstream hashtags beyond mere semantic correctness. Previous retrieval-based methods, built on a fixed predefined mainstream hashtag list, excel at producing mainstream hashtags but fail to follow the constant flow of up-to-date information. Conversely, generation-based methods comprehend newly posted tweets well, but without additional features their capacity to identify mainstream hashtags is limited. Inspired by the recent success of retrieval-augmented techniques, in this work we adopt this framework to combine the advantages of both approaches. Meanwhile, with the help of the generator component, we can rethink how to further improve the quality of the retriever component at low cost. We therefore propose the RetrIeval-augmented Generative mainstream HashTag recommender (RIGHT), which consists of three components: (i) a retriever seeks relevant hashtags from the entire tweet-hashtag set; (ii) a selector enhances mainstream identification by introducing global signals; and (iii) a generator incorporates input tweets and selected hashtags to directly generate the desired hashtags. Experimental results show that our method achieves significant improvements over state-of-the-art baselines. Moreover, RIGHT can be easily integrated into large language models, improving the performance of ChatGPT by more than 10%. Code will be released at: https://github.com/ict-bigdatalab/RIGHT.
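The retrieve-select-generate pipeline described above can be sketched as follows. This is a minimal illustrative stand-in, not the paper's implementation: the toy corpus, the token-overlap retriever (in place of BM25 or a dense retriever), the frequency-based selector, and the generator stub (in place of a fine-tuned seq2seq model) are all assumptions made for the sketch.

```python
from collections import Counter

# Hypothetical tweet-hashtag corpus; contents are illustrative only.
CORPUS = [
    ("new iphone camera review", ["#iphone", "#tech"]),
    ("iphone battery tips", ["#iphone"]),
    ("world cup final tonight", ["#worldcup", "#football"]),
]

def retrieve(tweet, corpus, k=2):
    """Retriever: rank stored tweets by token overlap with the input
    (a crude stand-in for BM25 or dense retrieval)."""
    q = set(tweet.lower().split())
    scored = sorted(corpus, key=lambda p: -len(q & set(p[0].lower().split())))
    return scored[:k]

def select(candidates):
    """Selector: prefer hashtags that recur across retrieved candidates,
    a simple proxy for a global 'mainstream' signal."""
    counts = Counter(h for _, tags in candidates for h in tags)
    return [h for h, _ in counts.most_common()]

def generate(tweet, hashtags):
    """Generator stub: a real system feeds the tweet plus the selected
    hashtags into a seq2seq model; here we surface the top selection."""
    return hashtags[0] if hashtags else "#" + tweet.split()[0]

tweet = "is the new iphone worth it"
cands = retrieve(tweet, CORPUS)
print(generate(tweet, select(cands)))  # → #iphone
```

The division of labour mirrors the paper's motivation: the retriever supplies up-to-date candidates, the selector filters for popularity, and the generator can still fall back on its own comprehension of the tweet when retrieval returns nothing useful.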
Acknowledgements
This work was funded by the National Natural Science Foundation of China (NSFC) under Grants No. 62372431 and 62006218, the Youth Innovation Promotion Association CAS under Grant No. 2021100, the projects under Grants No. 2023YFA1011602, JCKY2022130C039 and 2021QY1701, and the Lenovo-CAS Joint Lab Youth Scientist Project. All content represents the opinion of the authors, which is not necessarily shared or endorsed by their respective employers and/or sponsors.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Fan, RZ., Fan, Y., Chen, J., Guo, J., Zhang, R., Cheng, X. (2024). RIGHT: Retrieval-Augmented Generation for Mainstream Hashtag Recommendation. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14608. Springer, Cham. https://doi.org/10.1007/978-3-031-56027-9_3
DOI: https://doi.org/10.1007/978-3-031-56027-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56026-2
Online ISBN: 978-3-031-56027-9
eBook Packages: Computer Science (R0)