DOI: 10.1145/3077136.3080725

X-DART: Blending Dropout and Pruning for Efficient Learning to Rank

Published: 07 August 2017

Abstract

In this paper we propose X-DART, a new learning-to-rank algorithm focused on training robust and compact ranking models. Motivated by the observation that the last trees of MART models affect the predictions of only a few instances of the training set, we borrow from the DART algorithm the dropout strategy, which temporarily drops some of the trees from the ensemble while new weak learners are trained. Unlike DART, however, we drop these trees permanently, on the basis of choices driven by the accuracy measured on the validation set. Experiments conducted on publicly available datasets show that X-DART outperforms DART, training models that provide the same effectiveness with up to 40% fewer trees.
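At a high level, the training loop described above is standard gradient boosting with a DART-style dropout step before each new tree is fitted, followed by a permanent, validation-driven pruning step. The following is a minimal sketch of that idea, not the authors' implementation: scikit-learn regression trees stand in for the MART weak learners, validation mean squared error stands in for the rank-based quality metric (e.g. NDCG) used in the paper, DART's weight normalization after dropout is omitted, and names such as x_dart_fit are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def x_dart_fit(X_train, y_train, X_valid, y_valid,
               n_rounds=100, learning_rate=0.1, dropout_rate=0.1, seed=0):
    """Gradient boosting with DART-style dropout and permanent pruning (sketch)."""
    rng = np.random.default_rng(seed)
    trees, weights = [], []

    def predict(X, keep_mask=None):
        # Sum the (weighted) predictions of the trees kept by keep_mask.
        pred = np.zeros(X.shape[0])
        for i, (tree, w) in enumerate(zip(trees, weights)):
            if keep_mask is None or keep_mask[i]:
                pred += w * tree.predict(X)
        return pred

    for _ in range(n_rounds):
        # 1. Dropout (as in DART): temporarily ignore a random subset of the
        #    trees and compute residuals against the reduced ensemble.
        keep = rng.random(len(trees)) >= dropout_rate
        residuals = y_train - predict(X_train, keep)

        # 2. Fit a new weak learner on those residuals and add it.
        tree = DecisionTreeRegressor(max_depth=4, random_state=seed)
        tree.fit(X_train, residuals)
        trees.append(tree)
        weights.append(learning_rate)

        # 3. Pruning (the X-DART twist): permanently drop a tree whenever its
        #    removal does not hurt the score on the validation set.
        valid_err = np.mean((y_valid - predict(X_valid)) ** 2)
        for i in range(len(trees)):
            mask = np.ones(len(trees), dtype=bool)
            mask[i] = False
            err_without_i = np.mean((y_valid - predict(X_valid, mask)) ** 2)
            if err_without_i <= valid_err and len(trees) > 1:
                del trees[i], weights[i]
                break

    return trees, weights
```

On real learning-to-rank data the validation check would use a per-query metric such as NDCG rather than MSE.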




Published In

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
August 2017
1476 pages
ISBN: 9781450350228
DOI: 10.1145/3077136
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. dropout
  2. multiple additive regression trees
  3. pruning

Qualifiers

  • Short paper

Funding Sources

  • European Commission

Conference

SIGIR '17

Acceptance Rates

SIGIR '17 paper acceptance rate: 78 of 362 submissions, 22%
Overall acceptance rate: 792 of 3,983 submissions, 20%

