DOI: 10.1109/BigData.2015.7363760
Article

Machine learning at the limit

Published: 29 October 2015

Abstract

Many systems have been developed for machine learning at scale. Performance has steadily improved, but there has been relatively little work on explicitly defining or approaching the limits of performance. In this paper we describe the application of roofline design, an approach borrowed from computer architecture, to large-scale machine learning. In roofline design, one exposes ALU, memory, and network limits, and the constraints they imply for algorithms. Using roofline design, we have developed a system called BIDMach which has demonstrated the highest performance to date for many ML problems. On one GPU-accelerated node, it generally outperforms other single-machine toolkits and cluster toolkits running on hundreds of nodes. This performance level is enabled by a relatively small number of rooflined matrix primitives. Such performance implies a dramatic reduction in the energy used to perform these calculations. Beyond matrix kernels, roofline design can be applied to the end-to-end design of machine learning algorithms that minimize memory usage to optimize speed. This approach offers a further 2x to 3x gain in performance. Roofline design can also be applied to network primitives. We describe recent work on a sparse allreduce primitive called Kylix. We have shown that Kylix approaches the practical network throughput limit for allreduce, a basic primitive for distributed machine learning. Using Kylix, we describe an efficient transformation from model-parallel to data-parallel calculations. This transformation uses a secondary storage roofline, with similar parameters to the network. Finally, we describe several deployments of these techniques on real-world problems in two large internet companies. Once again, single-node roofline design demonstrated substantial gains over alternatives on either single nodes or clusters.
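
As a rough illustration of the roofline idea the abstract refers to, the sketch below computes the standard roofline bound: attainable throughput is the smaller of the ALU peak and the memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte moved). It is not taken from the paper or from BIDMach. The function name `roofline_bound` and all hardware figures are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def roofline_bound(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Upper bound on attainable FLOP/s under the basic roofline model.

    peak_flops: ALU limit in FLOP/s
    mem_bandwidth: memory limit in bytes/s
    arithmetic_intensity: FLOPs performed per byte moved by the kernel
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


# Hypothetical accelerator figures, for illustration only.
PEAK_FLOPS = 4.0e12      # 4 TFLOP/s ALU peak (assumed)
MEM_BANDWIDTH = 2.0e11   # 200 GB/s memory bandwidth (assumed)

# A low-intensity kernel (e.g. a sparse operation) hits the memory roof.
sparse_bound = roofline_bound(PEAK_FLOPS, MEM_BANDWIDTH, 0.25)

# A high-intensity kernel (e.g. a large dense multiply) hits the ALU roof.
dense_bound = roofline_bound(PEAK_FLOPS, MEM_BANDWIDTH, 64.0)

# The ridge point is the intensity at which the two roofs meet.
ridge_intensity = PEAK_FLOPS / MEM_BANDWIDTH

print(f"sparse kernel bound: {sparse_bound:.2e} FLOP/s (memory-bound)")
print(f"dense kernel bound:  {dense_bound:.2e} FLOP/s (compute-bound)")
print(f"ridge point:         {ridge_intensity:.1f} FLOP/byte")
```

Under these assumed numbers, any kernel below 20 FLOPs per byte is limited by memory rather than by the ALUs, which is why reducing memory traffic can raise speed. The same `min` construction applies to a network or storage roof by substituting the corresponding bandwidth.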

Published In

BIG DATA '15: Proceedings of the 2015 IEEE International Conference on Big Data (Big Data)
October 2015
3094 pages
ISBN: 9781479999262

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Cited By

  • (2024) BIZA: Design of Self-Governing Block-Interface ZNS AFA for Endurance and Performance. Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, pp. 313-329. DOI: 10.1145/3694715.3695953. Online publication date: 4-Nov-2024.
  • (2024) Explorations and Exploitation for Parity-based RAIDs with Ultra-fast SSDs. ACM Transactions on Storage 20:1, pp. 1-32. DOI: 10.1145/3627992. Online publication date: 30-Jan-2024.
  • (2024) ScaleCache: A Scalable Page Cache for Multiple Solid-State Drives. Proceedings of the Nineteenth European Conference on Computer Systems, pp. 641-656. DOI: 10.1145/3627703.3629588. Online publication date: 22-Apr-2024.
  • (2023) SPADE: A Flexible and Scalable Accelerator for SpMM and SDDMM. Proceedings of the 50th Annual International Symposium on Computer Architecture, pp. 1-15. DOI: 10.1145/3579371.3589054. Online publication date: 17-Jun-2023.
  • (2021) FULL-W2V. Proceedings of the 35th ACM International Conference on Supercomputing, pp. 455-466. DOI: 10.1145/3447818.3460373. Online publication date: 3-Jun-2021.
  • (2018) Matrix Factorization on GPUs with Memory Optimization and Approximate Computing. Proceedings of the 47th International Conference on Parallel Processing, pp. 1-10. DOI: 10.1145/3225058.3225096. Online publication date: 13-Aug-2018.
  • (2017) Optimizing Word2Vec Performance on Multicore Systems. Proceedings of the Seventh Workshop on Irregular Applications: Architectures and Algorithms, pp. 1-9. DOI: 10.1145/3149704.3149768. Online publication date: 12-Nov-2017.
