DOI: 10.1145/3331184.3331331
Short paper · Open access

Block-distributed Gradient Boosted Trees

Published: 18 July 2019

Abstract

The Gradient Boosted Tree (GBT) algorithm is one of the most popular machine learning algorithms in production use, for tasks such as Click-Through Rate (CTR) prediction and learning-to-rank. To deal with the massive datasets available today, many distributed GBT methods have been proposed. However, they all assume a row-distributed dataset, which addresses scalability only with respect to the number of data points, not the number of features, and increases the communication cost for high-dimensional data. To allow for scalability across both the data point and feature dimensions, and to reduce communication cost, we propose block-distributed GBTs. We achieve communication efficiency by making full use of the data sparsity and by adapting the QuickScorer algorithm to the block-distributed setting. We evaluate our approach using datasets with millions of features, and demonstrate that we achieve multiple orders of magnitude reduction in communication cost for sparse data, with no loss in accuracy, while providing a more scalable design. As a result, we reduce the training time for high-dimensional data, and allow more cost-effective scale-out without the need for expensive network communication.
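To make the block-distributed layout concrete, the sketch below partitions a sparse feature matrix into a grid of row-and-column blocks and computes a per-block gradient histogram, the quantity workers would exchange during split finding. This is a minimal illustration under stated assumptions, not the paper's implementation: the helper names (`make_blocks`, `local_histogram`), the equal-width binning, and the toy data are all invented for exposition.

```python
import numpy as np
from scipy import sparse

def make_blocks(X, row_blocks, col_blocks):
    """Partition a sparse matrix into a grid of CSR blocks.

    Each worker in a block-distributed setup would hold one block,
    so scale-out happens along *both* the data-point (row) and
    feature (column) dimensions, not rows alone.
    """
    n, d = X.shape
    row_bounds = np.linspace(0, n, row_blocks + 1, dtype=int)
    col_bounds = np.linspace(0, d, col_blocks + 1, dtype=int)
    return [
        [X[row_bounds[i]:row_bounds[i + 1],
           col_bounds[j]:col_bounds[j + 1]].tocsr()
         for j in range(col_blocks)]
        for i in range(row_blocks)
    ]

def local_histogram(block, grad, n_bins=4):
    """Per-block gradient histogram for split finding.

    Only the non-zero entries of the sparse block are touched;
    this is where the savings for sparse data come from, since
    empty bins need not be communicated at all.
    """
    hist = np.zeros((block.shape[1], n_bins))
    coo = block.tocoo()
    for i, j, v in zip(coo.row, coo.col, coo.data):
        b = min(int(v * n_bins), n_bins - 1)  # toy equal-width binning over [0, 1)
        hist[j, b] += grad[i]
    return hist

# Toy data: 8 points, 6 features, ~80% sparse, values in [0, 1).
X = sparse.random(8, 6, density=0.2, format="csr", random_state=0)
grad = np.random.default_rng(0).normal(size=8)  # per-point gradients of the loss

blocks = make_blocks(X, row_blocks=2, col_blocks=3)
rows_per_block = 4  # 8 points split across 2 row blocks

# Histograms for the same feature block are summed across row blocks
# (an aggregation over far fewer values than a dense layout would need).
for j in range(3):
    col_hist = sum(local_histogram(blocks[i][j],
                                   grad[i * rows_per_block:(i + 1) * rows_per_block])
                   for i in range(2))
    print(f"feature block {j}: non-empty histogram bins = {int((col_hist != 0).sum())}")
```

In a real system the per-block histograms would be pushed to a parameter server or aggregated with a sparse collective rather than summed locally; the point of the sketch is only that each worker's message size scales with its block's non-zeros, not with the full feature dimension.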

References

[1] Peter Bühlmann and Torsten Hothorn. 2007. Boosting Algorithms: Regularization, Prediction and Model Fitting. Statist. Sci., Vol. 22 (2007), 477--505.
[2] Tianqi Chen, Ignacio Cano, and Tianyi Zhou. 2015. RABIT: A Reliable Allreduce and Broadcast Interface. Technical Report. University of Washington.
[3] Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). 785--794.
[4] Domenico Dato, Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Nicola Tonellotto, and Rossano Venturini. 2016. Fast Ranking with Additive Ensembles of Oblivious and Non-Oblivious Regression Trees. ACM Trans. Inf. Syst., Vol. 35, 2 (2016), 15:1--15:31.
[5] Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers, and Joaquin Quiñonero Candela. 2014. Practical Lessons from Predicting Clicks on Ads at Facebook. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising (ADKDD'14). 5:1--5:9.
[6] Jiawei Jiang, Bin Cui, Ce Zhang, and Fangcheng Fu. 2018. DimBoost: Boosting Gradient Boosting Decision Tree to Higher Dimensions. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18). 1363--1376.
[7] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Advances in Neural Information Processing Systems 30. 3146--3154.
[8] Mu Li, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, and Bor-Yiing Su. 2014. Scaling Distributed Machine Learning with the Parameter Server. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14). 583--598.
[9] Mu Li, David G. Andersen, Alexander J. Smola, and Kai Yu. 2014. Communication Efficient Distributed Machine Learning with the Parameter Server. In Advances in Neural Information Processing Systems 27. 19--27.
[10] Ping Li, Qiang Wu, and Christopher J. Burges. 2008. McRank: Learning to Rank Using Multiple Classification and Gradient Boosting. In Advances in Neural Information Processing Systems 20. 897--904.
[11] Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Nicola Tonellotto, and Rossano Venturini. 2015. QuickScorer: A Fast Algorithm to Rank Documents with Additive Ensembles of Regression Trees. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '15). 73--82.
[12] Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, and Tie-Yan Liu. 2016. A Communication-Efficient Parallel Algorithm for Decision Tree. In Advances in Neural Information Processing Systems 29. 1279--1287.
[13] Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, and Andrey Gulin. 2018. CatBoost: Unbiased Boosting with Categorical Features. In Advances in Neural Information Processing Systems 31. 6638--6648.
[14] Rajeev Thakur, Rolf Rabenseifner, and William Gropp. 2005. Optimization of Collective Communication Operations in MPICH. Int. J. High Perform. Comput. Appl., Vol. 19, 1 (2005), 49--66.

Published In

SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2019
1512 pages
ISBN: 978-1-4503-6172-9
DOI: 10.1145/3331184
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. communication efficiency
  2. distributed systems
  3. gradient boosted trees
  4. scalability

Qualifiers

  • Short-paper

Funding Sources

  • Swedish Foundation for Strategic Research

Conference

SIGIR '19

Acceptance Rates

SIGIR '19 paper acceptance rate: 84 of 426 submissions (20%).
Overall acceptance rate: 792 of 3,983 submissions (20%).
