
Flare & Lantern: Efficiently Swapping Horses Midstream

Published: 01 August 2019

Abstract

Running machine learning (ML) workloads at scale is as much a data management problem as a model engineering problem. Significant performance challenges arise when data management systems invoke ML classifiers as user-defined functions (UDFs), or when stand-alone ML frameworks interact with data stores for data loading and pre-processing (ETL). In particular, UDFs may be precompiled or otherwise opaque to the data management system, and their data layout may differ completely from the system's native layout, which adds overhead at the boundaries. In this demo, we show how bottlenecks between existing systems can be eliminated when their engines are designed around runtime compilation and native code generation, as is the case for many state-of-the-art relational engines and ML frameworks. We demonstrate an integration of Flare (an accelerator for Spark SQL) and Lantern (an accelerator for TensorFlow and PyTorch) that results in a highly optimized end-to-end compiled data path, switching between SQL and ML processing with negligible overhead.
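To make the boundary overhead concrete, here is a minimal toy sketch in Python. It is not the Flare or Lantern API; the table, the linear classifier, and every name in it (`ROWS`, `classifier_udf`, `compile_fused_pipeline`, and so on) are hypothetical and chosen only for illustration. The first path filters rows, converts them to the layout the classifier expects, and calls an opaque UDF once per row. The second path generates source code at runtime for a single loop that evaluates the predicate and the classifier together, which is the general idea behind a compiled end-to-end data path.

```python
# Toy illustration only. This is NOT the Flare or Lantern API; all names
# and data below are hypothetical.

ROWS = [
    {"id": 1, "age": 25, "income": 40.0},
    {"id": 2, "age": 52, "income": 90.0},
    {"id": 3, "age": 37, "income": 65.0},
    {"id": 4, "age": 61, "income": 30.0},
]

# A tiny linear classifier standing in for an ML model.
WEIGHTS = {"age": 0.02, "income": 0.01}
BIAS = -1.5


def classifier_udf(features):
    """Opaque UDF: expects a list in [age, income] order."""
    score = BIAS + WEIGHTS["age"] * features[0] + WEIGHTS["income"] * features[1]
    return score > 0.0


def run_with_black_box_udf(rows, min_age):
    """Filter, convert the layout at the boundary, then call the UDF per row."""
    # Step 1: relational filter over the row-oriented data.
    selected = [r for r in rows if r["age"] >= min_age]
    # Step 2: layout conversion, because the UDF does not read dict rows.
    batch = [[r["age"], r["income"]] for r in selected]
    # Step 3: one opaque call per row; the engine cannot optimize across it.
    return [r["id"] for r, f in zip(selected, batch) if classifier_udf(f)]


def compile_fused_pipeline(min_age, weights, bias):
    """Generate and compile one loop that fuses the filter and the classifier."""
    terms = " + ".join(f"{w!r} * row[{name!r}]" for name, w in weights.items())
    source = (
        "def fused(rows):\n"
        "    out = []\n"
        "    for row in rows:\n"
        f"        if row['age'] >= {min_age!r} and {bias!r} + {terms} > 0.0:\n"
        "            out.append(row['id'])\n"
        "    return out\n"
    )
    namespace = {}
    exec(source, namespace)  # runtime code generation, specialized to this query
    return namespace["fused"], source


if __name__ == "__main__":
    fused, generated_source = compile_fused_pipeline(30, WEIGHTS, BIAS)
    print(generated_source)
    # Both paths select the same row ids; only the execution strategy differs.
    print(run_with_black_box_udf(ROWS, 30) == fused(ROWS))
```

In the fused version the constants, the predicate, and the model arithmetic are all visible in one generated function, so there is no intermediate batch and no per-row call across a system boundary. Real engines generate native code rather than Python source, so this sketch conveys only the structure of the approach, not its performance.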



Published In

Proceedings of the VLDB Endowment, Volume 12, Issue 12
August 2019
547 pages

Publisher

VLDB Endowment

Qualifiers

Research-article
