
RIGHT: Retrieval-Augmented Generation for Mainstream Hashtag Recommendation

  • Conference paper
  • Published in: Advances in Information Retrieval (ECIR 2024)

Abstract

Automatic mainstream hashtag recommendation aims to provide users with accurate, concise, and popular topical hashtags before publication. The task faces two main challenges: comprehending newly posted tweets that respond to new topics, and accurately identifying mainstream hashtags beyond mere semantic correctness. Retrieval-based methods, which rely on a fixed predefined list of mainstream hashtags, excel at producing mainstream hashtags but fail to keep up with the constant flow of up-to-date information. Conversely, generation-based methods comprehend newly posted tweets well, but without additional signals their capacity to identify mainstream hashtags is limited. Inspired by the recent success of retrieval-augmented techniques, in this work we adopt this framework to combine the advantages of both approaches. Meanwhile, with the help of the generator component, we can rethink how to further improve the quality of the retriever component at low cost. We therefore propose the Retrieval-augmented Generative mainstream HashTag recommender (RIGHT), which consists of three components: (i) a retriever that seeks relevant hashtags from the entire tweet-hashtag set; (ii) a selector that enhances mainstream identification by introducing global signals; and (iii) a generator that incorporates the input tweet and the selected hashtags to directly generate the desired hashtags. Experimental results show that our method achieves significant improvements over state-of-the-art baselines. Moreover, RIGHT can be easily integrated into large language models, improving the performance of ChatGPT by more than 10%. Code will be released at: https://github.com/ict-bigdatalab/RIGHT.
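The three-stage pipeline described in the abstract can be illustrated with a deliberately simplified sketch. Everything below is hypothetical: the toy corpus, the token-overlap retriever, the frequency-based selector, and the stub generator are stand-ins for the paper's actual dense retriever, global-signal selector, and seq2seq generator, chosen only to show how the components compose.

```python
from collections import Counter

# Toy (tweet, hashtag) pairs standing in for the tweet-hashtag set.
CORPUS = [
    ("new phone camera review", "#tech"),
    ("camera tips for night photos", "#photography"),
    ("phone battery drains fast", "#tech"),
    ("stunning sunset photos tonight", "#photography"),
    ("new laptop unboxing video", "#tech"),
]

def retrieve(tweet, corpus, k=3):
    """Retriever: rank stored tweets by token overlap with the input
    tweet and return the hashtags of the top-k matches."""
    q = set(tweet.lower().split())
    scored = sorted(corpus,
                    key=lambda pair: len(q & set(pair[0].lower().split())),
                    reverse=True)
    return [tag for _, tag in scored[:k]]

def select(candidates, corpus):
    """Selector: re-rank candidate hashtags by a global signal -- here,
    corpus-wide hashtag frequency as a crude proxy for 'mainstream'."""
    freq = Counter(tag for _, tag in corpus)
    return sorted(set(candidates), key=lambda t: freq[t], reverse=True)

def generate(tweet, selected):
    """Generator stub: a real system would condition a seq2seq model on
    the tweet plus the selected hashtags; here we emit the top one."""
    return selected[0] if selected else "#unknown"

def right_pipeline(tweet):
    candidates = retrieve(tweet, CORPUS)
    selected = select(candidates, CORPUS)
    return generate(tweet, selected)

print(right_pipeline("my new phone takes great photos"))  # -> #tech
```

The key design point the sketch preserves is the division of labour: the retriever handles relevance to the new tweet, the selector injects a global popularity signal the generator alone cannot see, and the generator produces the final hashtag conditioned on both.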



Acknowledgements

This work was funded by the National Natural Science Foundation of China (NSFC) under Grants No. 62372431 and 62006218, the Youth Innovation Promotion Association CAS under Grant No. 2021100, the projects under Grants No. 2023YFA1011602, JCKY2022130C039, and 2021QY1701, and the Lenovo-CAS Joint Lab Youth Scientist Project. All content represents the opinion of the authors, which is not necessarily shared or endorsed by their respective employers and/or sponsors.

Author information

Corresponding author

Correspondence to Jiafeng Guo.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Fan, RZ., Fan, Y., Chen, J., Guo, J., Zhang, R., Cheng, X. (2024). RIGHT: Retrieval-Augmented Generation for Mainstream Hashtag Recommendation. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14608. Springer, Cham. https://doi.org/10.1007/978-3-031-56027-9_3


  • DOI: https://doi.org/10.1007/978-3-031-56027-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56026-2

  • Online ISBN: 978-3-031-56027-9

  • eBook Packages: Computer Science, Computer Science (R0)
