DOI: 10.1145/3437963.3441667
Tutorial
Open access

Pretrained Transformers for Text Ranking: BERT and Beyond

Published: 08 March 2021

Abstract

The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This tutorial, based on a forthcoming book, provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has, without exaggeration, revolutionized the fields of natural language processing (NLP), information retrieval (IR), and beyond. We provide a synthesis of existing work as a single point of entry for both researchers and practitioners. Our coverage is grouped into two categories: transformer models that perform reranking in multi-stage ranking architectures and learned dense representations that perform ranking directly. Two themes pervade our treatment: techniques for handling long documents and techniques for addressing the tradeoff between effectiveness (result quality) and efficiency (query latency). Although transformer architectures and pretraining techniques are recent innovations, many aspects of their application are well understood. Nevertheless, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, we also attempt to prognosticate the future.
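
To make the tutorial's two categories concrete, the sketch below shows both patterns in Python using the Hugging Face transformers library: (1) a cross-encoder that reranks candidates from a first-stage retriever such as BM25 by scoring each query-passage pair jointly, and (2) a bi-encoder that encodes queries and passages independently into dense vectors so that ranking reduces to an inner-product (nearest-neighbor) search. This is a minimal illustration, not the authors' implementation; the checkpoint names, the toy candidate list, and the mean-pooling choice are assumptions made here for the example.

# Illustrative sketch only: checkpoint names, the toy candidate list, and the
# pooling strategy are assumptions, not prescriptions from the tutorial.
import torch
from transformers import (AutoModel, AutoModelForSequenceClassification,
                          AutoTokenizer)

query = "what is text ranking"
candidates = [  # in practice, the top-k results of a first-stage retriever such as BM25
    "Text ranking produces an ordered list of texts in response to a query.",
    "BERT is a pretrained bidirectional transformer encoder.",
]

# (1) Reranking in a multi-stage architecture: a cross-encoder reads the
# concatenated (query, passage) pair and emits a relevance score.
ce_name = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # any pointwise reranker checkpoint
ce_tok = AutoTokenizer.from_pretrained(ce_name)
ce_model = AutoModelForSequenceClassification.from_pretrained(ce_name).eval()
pairs = ce_tok([query] * len(candidates), candidates,
               padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    rerank_scores = ce_model(**pairs).logits.squeeze(-1)

# (2) Ranking with learned dense representations: a bi-encoder maps queries and
# passages to vectors independently; ranking is a (possibly approximate)
# nearest-neighbor search over precomputed passage vectors.
be_name = "sentence-transformers/msmarco-distilbert-base-v4"
be_tok = AutoTokenizer.from_pretrained(be_name)
be_model = AutoModel.from_pretrained(be_name).eval()

def encode(texts):
    batch = be_tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = be_model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean pooling over tokens

dense_scores = encode([query]) @ encode(candidates).T  # shape: (1, num_candidates)

print(rerank_scores.tolist(), dense_scores.squeeze(0).tolist())

In a deployed system the cross-encoder typically rescores only the top few hundred candidates per query, while the bi-encoder's passage vectors are computed offline and indexed for approximate nearest-neighbor search; this division of labor is one way the effectiveness-efficiency tradeoff discussed above plays out in practice.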

    Published In

    WSDM '21: Proceedings of the 14th ACM International Conference on Web Search and Data Mining
    March 2021
    1192 pages
    ISBN: 9781450382977
    DOI: 10.1145/3437963
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 March 2021

    Author Tags

    1. text ranking
    2. transformers

    Qualifiers

    • Tutorial

    Funding Sources

    • Natural Sciences and Engineering Research Council of Canada
    • Canada First Research Excellence Fund

    Conference

    WSDM '21

    Acceptance Rates

    Overall Acceptance Rate 498 of 2,863 submissions, 17%

    Article Metrics

    • Downloads (Last 12 months): 683
    • Downloads (Last 6 weeks): 109
    Reflects downloads up to 15 Jan 2025

    Cited By

    • (2024) Extracting Political Interest Model from Interaction Data Based on Novel Word-level Bias Assignment. ACM Transactions on Intelligent Systems and Technology 16(1), 1-21. DOI: 10.1145/3702649. Online publication date: 31-Oct-2024.
    • (2024) Neural Retrievers are Biased Towards LLM-Generated Content. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 526-537. DOI: 10.1145/3637528.3671882. Online publication date: 25-Aug-2024.
    • (2024) Enhancing Asymmetric Web Search through Question-Answer Generation and Ranking. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 6127-6136. DOI: 10.1145/3637528.3671517. Online publication date: 25-Aug-2024.
    • (2024) Semantic Ranking for Automated Adversarial Technique Annotation in Security Text. Proceedings of the 19th ACM Asia Conference on Computer and Communications Security, 49-62. DOI: 10.1145/3634737.3645000. Online publication date: 1-Jul-2024.
    • (2024) Custom Architecture for Effective Semantic App Search: A Systematic Approach. 2024 8th International Conference on Computational System and Information Technology for Sustainable Solutions (CSITSS), 1-6. DOI: 10.1109/CSITSS64042.2024.10816842. Online publication date: 7-Nov-2024.
    • (2024) Identifying Learning Leaders in Online Social Networks Based on Community of Practice Theoretical Framework and Information Entropy. IEEE Access 12, 116622-116636. DOI: 10.1109/ACCESS.2024.3446454. Online publication date: 2024.
    • (2024) Enhancing Parameter Efficiency in Model Inference Using an Ultralight Inter-Transformer Linear Structure. IEEE Access 12, 43734-43746. DOI: 10.1109/ACCESS.2024.3378518. Online publication date: 2024.
    • (2024) Efficient Classification of Malicious URLs: M-BERT—A Modified BERT Variant for Enhanced Semantic Understanding. IEEE Access 12, 13453-13468. DOI: 10.1109/ACCESS.2024.3357095. Online publication date: 2024.
    • (2024) Mono-lingual text reuse detection for the Urdu language at lexical level. Engineering Applications of Artificial Intelligence 136, 109003. DOI: 10.1016/j.engappai.2024.109003. Online publication date: Oct-2024.
    • (2024) Integrating Social Environment in Machine Learning Model for Debiased Recommendation. Mobile and Ubiquitous Systems: Computing, Networking and Services, 219-230. DOI: 10.1007/978-3-031-63992-0_14. Online publication date: 19-Jul-2024.
    • Show More Cited By
