DOI: 10.1145/3292500.3330677
research-article
Open access

TF-Ranking: Scalable TensorFlow Library for Learning-to-Rank

Published: 25 July 2019

Abstract

Learning-to-Rank deals with maximizing the utility of a list of examples presented to the user, with items of higher relevance being prioritized. It has several practical applications, such as large-scale search, recommender systems, document summarization and question answering. While there is widespread support for classification- and regression-based learning, support for learning-to-rank in deep learning has been limited. We introduce TensorFlow Ranking, the first open-source library for solving large-scale ranking problems in a deep learning framework. It is highly configurable and provides easy-to-use APIs to support different scoring mechanisms, loss functions and evaluation metrics in the learning-to-rank setting. Our library is developed on top of TensorFlow and can thus fully leverage the advantages of this platform. TensorFlow Ranking has been deployed in production systems within Google; it is highly scalable, both in training and in inference, and can be used to learn ranking models over massive amounts of user activity data, which can include heterogeneous dense and sparse features. We empirically demonstrate the effectiveness of our library in learning ranking functions for large-scale search and recommendation applications in Gmail and Google Drive. We also show that ranking models built using our library scale well for distributed training, without significant impact on metrics. The proposed library is available to the open-source community, with the hope that it facilitates further academic research and industrial applications in the field of learning-to-rank.
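
To make the terms "loss function" and "evaluation metric" in the abstract concrete, the following is a minimal sketch in plain Python. It does not use the TensorFlow Ranking API; the function names (dcg, ndcg, pairwise_logistic_loss) are illustrative assumptions, and the gain and discount used for NDCG are one common convention (gain 2^label - 1, discount log2(rank + 1)), not necessarily the library's defaults.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of graded labels given in ranked order."""
    # enumerate() is 0-based, so position 1 gets the discount log2(2) = 1.
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))


def ndcg(scores, labels):
    """NDCG of the ordering obtained by sorting items by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked_labels = [labels[i] for i in order]
    ideal_dcg = dcg(sorted(labels, reverse=True))
    return dcg(ranked_labels) / ideal_dcg if ideal_dcg > 0 else 0.0


def pairwise_logistic_loss(scores, labels):
    """Mean of log(1 + exp(-(s_i - s_j))) over pairs where label_i > label_j."""
    losses = [math.log1p(math.exp(-(scores[i] - scores[j])))
              for i in range(len(scores))
              for j in range(len(scores))
              if labels[i] > labels[j]]
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    labels = [3, 2, 0]               # hypothetical graded relevance for one query
    good_scores = [0.9, 0.5, 0.1]    # ranks the most relevant item first
    bad_scores = [0.1, 0.5, 0.9]     # ranks the least relevant item first
    print(ndcg(good_scores, labels), ndcg(bad_scores, labels))
    print(pairwise_logistic_loss(good_scores, labels),
          pairwise_logistic_loss(bad_scores, labels))
```

The metric rewards orderings that place highly relevant items near the top, while the pairwise loss gives a differentiable training signal that penalizes each mis-ordered pair; in this toy example the better ordering has both higher NDCG and lower loss.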

Supplementary Material

MP4 File (p2970-kumar.mp4)



    Published In

    KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
    July 2019
    3305 pages
    ISBN:9781450362016
    DOI:10.1145/3292500
    This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. information retrieval
    2. learning-to-rank
    3. machine learning
    4. recommender systems

    Qualifiers

    • Research-article

    Conference

    KDD '19

    Acceptance Rates

    KDD '19 Paper Acceptance Rate 110 of 1,200 submissions, 9%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Cited By

    • (2024) Towards a Technical Debt for AI-based Recommender System. Proceedings of the 7th ACM/IEEE International Conference on Technical Debt, 36-39. DOI: 10.1145/3644384.3648574. Online publication date: 14-Apr-2024
    • (2024) A Self-boosted Framework for Calibrated Ranking. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 6226-6235. DOI: 10.1145/3637528.3671570. Online publication date: 25-Aug-2024
    • (2024) Unbiased Learning to Rank Meets Reality: Lessons from Baidu's Large-Scale Search Dataset. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1546-1556. DOI: 10.1145/3626772.3657892. Online publication date: 10-Jul-2024
    • (2024) PlayerRank: Leveraging Learning-to-Rank AI for Player Positioning in Cricket. IEEE Access 12, 177504-177519. DOI: 10.1109/ACCESS.2024.3495528. Online publication date: 2024
    • (2024) Engineering Features From Advanced Medical Technology Initiative Submissions to Enable Predictive Modeling for Proposal Success. Military Medicine 189:Supplement_3, 149-155. DOI: 10.1093/milmed/usae063. Online publication date: 19-Aug-2024
    • (2024) Feature engineering in learning-to-rank for community question answering task. International Journal of Computers and Applications 46:8, 555-566. DOI: 10.1080/1206212X.2024.2380647. Online publication date: 7-Aug-2024
    • (2024) Learning bivariate scoring functions for ranking. Discover Computing 27:1. DOI: 10.1007/s10791-024-09444-7. Online publication date: 27-Sep-2024
    • (2024) Zero-shot Automated Class Imbalanced Learning. Pattern Recognition, 140-155. DOI: 10.1007/978-3-031-78383-8_10. Online publication date: 2-Dec-2024
    • (2024) Learning-to-Rank with Nested Feedback. Advances in Information Retrieval, 306-315. DOI: 10.1007/978-3-031-56063-7_22. Online publication date: 24-Mar-2024
    • (2023) RD-Suite. Proceedings of the 37th International Conference on Neural Information Processing Systems, 35748-35760. DOI: 10.5555/3666122.3667673. Online publication date: 10-Dec-2023
