research-article

Scalable I/O-bound parallel incremental gradient descent for big data analytics in GLADE

Authors:

Florin Rusu

DanaC '13: Proceedings of the Second Workshop on Data Analytics in the Cloud

Pages 16 - 20

https://doi.org/10.1145/2486767.2486771

Published: 23 June 2013

Abstract

Incremental gradient descent is a general technique to solve a large class of convex optimization problems arising in many machine learning tasks. GLADE is a parallel infrastructure for big data analytics providing a generic task specification interface. In this paper, we present a scalable and efficient parallel solution for incremental gradient descent in GLADE. We provide empirical evidence that our solution is limited only by the physical hardware characteristics, effectively uses the available resources, and achieves maximum scalability. When deployed in the cloud, our solution has the potential to dramatically reduce the cost of complex analytics over massive datasets.
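For readers unfamiliar with the technique named in the abstract, the following is a minimal single-threaded sketch of incremental gradient descent on a toy least-squares problem. It is not the GLADE implementation and does not reflect the paper's parallel design; the function name, step size, epoch count, and data are all assumptions chosen for illustration.

```python
import random


def incremental_gradient_descent(examples, dim, step=0.5, epochs=200, seed=0):
    """Minimize sum_i 0.5 * (w . x_i - y_i)^2, updating on one example at a time.

    Illustrative only: all names and default values here are hypothetical.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(examples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a fresh random order each pass
        for i in order:
            x, y = examples[i]
            # Gradient of the single-example loss is residual * x.
            residual = sum(wj * xj for wj, xj in zip(w, x)) - y
            w = [wj - step * residual * xj for wj, xj in zip(w, x)]
    return w


# Toy data drawn from y = 2 * x0 + 1; the second feature is a constant bias term.
data = [([x / 10.0, 1.0], 2.0 * (x / 10.0) + 1.0) for x in range(10)]
model = incremental_gradient_descent(data, dim=2)
print(model)
```

The model is updated after every example rather than after a full pass over the data, which is what distinguishes the incremental variant from batch gradient descent.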

Index Terms

Scalable I/O-bound parallel incremental gradient descent for big data analytics in GLADE
1. Information systems
  1. Data management systems

Information & Contributors

Information

Published In

DanaC '13: Proceedings of the Second Workshop on Data Analytics in the Cloud

June 2013

49 pages

ISBN:9781450322027

DOI:10.1145/2486767

Conference Chairs: Kostas Tzoumas (Technische Universität Berlin), Shivnath Babu (Duke University)

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGMOD/PODS'13

Sponsor:

SIGMOD

SIGMOD/PODS'13: International Conference on Management of Data

June 23, 2013

New York, New York

Acceptance Rates

DanaC '13 Paper Acceptance Rate 9 of 16 submissions, 56%;

Overall Acceptance Rate 19 of 34 submissions, 56%

Bibliometrics

Total Citations: 13
Total Downloads: 218
Downloads (Last 12 months): 1
Downloads (Last 6 weeks): 0

Reflects downloads up to 05 Mar 2025
Citations

Cited By

Marz S, Zanden B, Gao W (2019). Reducing Event Latency and Power Consumption in Mobile Devices by Using a Kernel-Level Display Server. IEEE Transactions on Mobile Computing 18:5 (1174-1187). Online publication date: 1-May-2019.
https://doi.org/10.1109/TMC.2018.2857809
Ma Y, Rusu F, Torres M (2019). Stochastic Gradient Descent on Modern Hardware: Multi-core CPU or GPU? Synchronous or Asynchronous? 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (1063-1072). Online publication date: May-2019.
https://doi.org/10.1109/IPDPS.2019.00113
Jain Y, Yogesh (2019). A Survey on Railway Assets: A Potential Domain for Big Data. 2019 International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT) (1-6). Online publication date: Sep-2019.
https://doi.org/10.1109/ICICT46931.2019.8977714
Jain P, Sharma S (2019). Prioritizing Factors Used in Designing of Test Cases: An ISM-MICMAC Based Analysis. 2019 International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT) (1-4). Online publication date: Sep-2019.
https://doi.org/10.1109/ICICT46931.2019.8977643
Qin C, Rusu F, Choudhary A, Wu K, Dong B (2017). Dot-Product Join. Proceedings of the 29th International Conference on Scientific and Statistical Database Management (1-12). Online publication date: 27-Jun-2017.
https://dl.acm.org/doi/10.1145/3085504.3085512
Jiang J, Cui B, Zhang C, Yu L, Chirkova R, Yang J, Suciu D (2017). Heterogeneity-aware Distributed Parameter Servers. Proceedings of the 2017 ACM International Conference on Management of Data (463-478). Online publication date: 9-May-2017.
https://dl.acm.org/doi/10.1145/3035918.3035933
Sparks E, Venkataraman S, Kaftan T, Franklin M, Recht B (2017). KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics. 2017 IEEE 33rd International Conference on Data Engineering (ICDE) (535-546). Online publication date: Apr-2017.
https://doi.org/10.1109/ICDE.2017.109
Schleich M, Olteanu D, Ciucanu R, Özcan F, Koutrika G, Madden S (2016). Learning Linear Regression Models over Factorized Joins. Proceedings of the 2016 International Conference on Management of Data (3-18). Online publication date: 26-Jun-2016.
https://dl.acm.org/doi/10.1145/2882903.2882939
Wang B, Torres M, Li D, Zhao J, Rusu F (2016). Performance Implications of Processing-in-Memory Designs on Data-Intensive Applications. 2016 45th International Conference on Parallel Processing Workshops (ICPPW) (115-122). Online publication date: Aug-2016.
https://doi.org/10.1109/ICPPW.2016.31
Qin C, Rusu F (2015). Speculative Approximations for Terascale Distributed Gradient Descent Optimization. Proceedings of the Fourth Workshop on Data analytics in the Cloud (1-10). Online publication date: 31-May-2015.
https://dl.acm.org/doi/10.1145/2799562.2799563
