Research article
DOI: 10.1145/2486767.2486771

Scalable I/O-bound parallel incremental gradient descent for big data analytics in GLADE

Published: 23 June 2013

Abstract

Incremental gradient descent is a general technique for solving a large class of convex optimization problems that arise in many machine learning tasks. GLADE is a parallel infrastructure for big data analytics that provides a generic task specification interface. In this paper, we present a scalable and efficient parallel solution for incremental gradient descent in GLADE. We provide empirical evidence that our solution is limited only by the physical hardware characteristics, uses the available resources effectively, and achieves maximum scalability. When deployed in the cloud, our solution has the potential to dramatically reduce the cost of complex analytics over massive datasets.
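The abstract does not spell out how incremental gradient descent is mapped onto a generic task interface. As an illustration only, the sketch below shows one common way to do it: each data partition runs sequential incremental updates starting from the current global model, the partial models are merged, and the merged model seeds the next pass over the data. The method names `init`, `accumulate`, `merge` and `terminate`, the class `IGDAggregate`, the helper `parallel_igd`, the least-squares objective, and the choice of model averaging in `merge` are all assumptions made for this sketch; they are not GLADE's actual API and may differ from the solution described in the paper.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Example = Tuple[List[float], float]  # (feature vector, label)


@dataclass
class IGDAggregate:
    """Hypothetical aggregate-style state for incremental gradient descent
    on least-squares linear regression (an assumed example task)."""
    step: float = 0.01
    weights: List[float] = field(default_factory=list)
    count: int = 0

    def init(self, start: Sequence[float]) -> None:
        # Every partition starts the pass from the same global model.
        self.weights = list(start)
        self.count = 0

    def accumulate(self, x: Sequence[float], y: float) -> None:
        # One incremental step on a single example:
        # w <- w - step * (w.x - y) * x, the gradient of 0.5 * (w.x - y)^2.
        err = sum(w * xi for w, xi in zip(self.weights, x)) - y
        self.weights = [w - self.step * err * xi
                        for w, xi in zip(self.weights, x)]
        self.count += 1

    def merge(self, other: "IGDAggregate") -> None:
        # Combine two partial models by averaging them, weighted by the
        # number of examples each one processed (model averaging).
        total = self.count + other.count
        if total == 0:
            return
        self.weights = [(a * self.count + b * other.count) / total
                        for a, b in zip(self.weights, other.weights)]
        self.count = total

    def terminate(self) -> List[float]:
        # The merged model becomes the starting point of the next pass.
        return list(self.weights)


def parallel_igd(partitions: List[List[Example]], dim: int,
                 epochs: int, step: float) -> List[float]:
    model = [0.0] * dim
    for _ in range(epochs):
        partials = []
        # A parallel engine would process the partitions concurrently;
        # this sketch runs them one after another for simplicity.
        for part in partitions:
            agg = IGDAggregate(step=step)
            agg.init(model)
            for x, y in part:
                agg.accumulate(x, y)
            partials.append(agg)
        # Fold all partial models into one.
        result = partials[0]
        for other in partials[1:]:
            result.merge(other)
        model = result.terminate()
    return model
```

For example, on noise-free data generated from y = 1 + 2x (features `[1.0, x]` for 40 evenly spaced x in [-1, 1]) and split round-robin into four partitions, 100 passes with a step size of 0.1 should drive the model close to `[1.0, 2.0]`. Each pass is a full scan of every partition, which is why the cost of such a scheme tends to be dominated by I/O once the per-example update is cheap.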



Published In

DanaC '13: Proceedings of the Second Workshop on Data Analytics in the Cloud
June 2013
49 pages
ISBN: 9781450322027
DOI: 10.1145/2486767
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States


Conference

SIGMOD/PODS'13

Acceptance Rates

DanaC '13 paper acceptance rate: 9 of 16 submissions (56%)
Overall acceptance rate: 19 of 34 submissions (56%)
