
DimmWitted: a study of main-memory statistical analytics

Published: 01 August 2014

Abstract

We perform the first study of the tradeoff space of access methods and replication to support statistical analytics using first-order methods executed in the main memory of a Non-Uniform Memory Access (NUMA) machine. Statistical analytics systems differ from conventional SQL-analytics in the amount and types of memory incoherence that they can tolerate. Our goal is to understand tradeoffs in accessing the data in row- or column-order and at what granularity one should share the model and data for a statistical task. We study this new tradeoff space and discover that there are tradeoffs between hardware and statistical efficiency. We argue that our tradeoff study may provide valuable information for designers of analytics engines: for each system we consider, our prototype engine can run at least one popular task at least 100× faster. We conduct our study across five architectures using popular models, including SVMs, logistic regression, Gibbs sampling, and neural networks.
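
To make the tradeoff concrete, the sketch below (Python/NumPy, written for this summary and not part of the paper or the DimmWitted engine; the function names sgd_row_access, scd_column_access, and replicated_sgd are hypothetical) contrasts the two access methods for a first-order method and gives a coarse stand-in for model replication. Row-order access reads one example (row) per step and updates the whole model, as in stochastic gradient descent; column-order access reads one feature (column) per step and updates a single model entry, as in stochastic coordinate descent; and per-socket model replication on a NUMA machine is imitated by giving each worker a private model replica that is averaged into a shared model between rounds. A least-squares objective stands in for the SVM and logistic-regression tasks the paper studies.

import numpy as np

def sgd_row_access(X, y, epochs=5, lr=0.01):
    """Row order: each step reads one row (example) and writes every model entry."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in np.random.permutation(n):
            grad = (X[i] @ w - y[i]) * X[i]   # least-squares gradient for one example
            w -= lr * grad
    return w

def scd_column_access(X, y, epochs=5):
    """Column order: each step reads one column (feature) and writes one model entry."""
    n, d = X.shape
    w = np.zeros(d)
    r = y - X @ w                              # residual, maintained incrementally
    col_sq = (X ** 2).sum(axis=0) + 1e-12
    for _ in range(epochs):
        for j in np.random.permutation(d):
            delta = X[:, j] @ r / col_sq[j]    # exact minimizer along coordinate j
            w[j] += delta
            r -= delta * X[:, j]
    return w

def replicated_sgd(X, y, workers=4, rounds=5, lr=0.01):
    """Coarse stand-in for per-socket replication: each worker runs SGD on its
    shard against a private replica; replicas are averaged between rounds
    instead of all workers writing one shared model."""
    n, d = X.shape
    shards = np.array_split(np.arange(n), workers)
    w = np.zeros(d)
    for _ in range(rounds):
        replicas = []
        for idx in shards:
            w_local = w.copy()
            for i in idx:
                w_local -= lr * (X[i] @ w_local - y[i]) * X[i]
            replicas.append(w_local)
        w = np.mean(replicas, axis=0)          # periodic model averaging
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 20))
    w_true = rng.standard_normal(20)
    y = X @ w_true + 0.01 * rng.standard_normal(500)
    for name, w in [("row-order SGD", sgd_row_access(X, y)),
                    ("column-order SCD", scd_column_access(X, y)),
                    ("replicated SGD", replicated_sgd(X, y))]:
        print(f"{name:>17}: ||w - w_true|| = {np.linalg.norm(w - w_true):.4f}")

The sketch is about data-movement patterns rather than statistics: row order scans examples sequentially but every step writes the entire model, column order localizes writes to one coordinate, and replicating the model per worker trades memory and some statistical efficiency for far less shared-model write traffic, which is the hardware-versus-statistical-efficiency tension described above.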

Published In

Proceedings of the VLDB Endowment, Volume 7, Issue 12
August 2014
296 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 August 2014
Published in PVLDB Volume 7, Issue 12

Qualifiers

  • Research-article


Article Metrics

  • Downloads (last 12 months): 4
  • Downloads (last 6 weeks): 0

Reflects downloads up to 14 Dec 2024

Cited By

  • (2024) Preparing for Future Heterogeneous Systems Using Migrating Threads. Proceedings of the 3rd International Workshop on Extreme Heterogeneity Solutions, pages 15-22. DOI: 10.1145/3642961.3643801. Online publication date: 2-Mar-2024.
  • (2023) SDPipe: A Semi-Decentralized Framework for Heterogeneity-Aware Pipeline-parallel Training. Proceedings of the VLDB Endowment, 16(9), pages 2354-2363. DOI: 10.14778/3598581.3598604. Online publication date: 10-Jul-2023.
  • (2023) Elastic Averaging for Efficient Pipelined DNN Training. Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pages 380-391. DOI: 10.1145/3572848.3577484. Online publication date: 25-Feb-2023.
  • (2022) Distributed learning of fully connected neural networks using independent subnet training. Proceedings of the VLDB Endowment, 15(8), pages 1581-1590. DOI: 10.14778/3529337.3529343. Online publication date: 1-Apr-2022.
  • (2022) In-Database Machine Learning with CorgiPile: Stochastic Gradient Descent without Full Data Shuffle. Proceedings of the 2022 International Conference on Management of Data, pages 1286-1300. DOI: 10.1145/3514221.3526150. Online publication date: 10-Jun-2022.
  • (2021) Database technology for the masses. Proceedings of the VLDB Endowment, 14(11), pages 2483-2490. DOI: 10.14778/3476249.3476296. Online publication date: 27-Oct-2021.
  • (2021) ParaX. Proceedings of the VLDB Endowment, 14(6), pages 864-877. DOI: 10.14778/3447689.3447692. Online publication date: 12-Apr-2021.
  • (2021) Model averaging in distributed machine learning: a case study with Apache Spark. The VLDB Journal, 30(4), pages 693-712. DOI: 10.1007/s00778-021-00664-7. Online publication date: 15-Apr-2021.
  • (2020) Prague: High-Performance Heterogeneity-Aware Asynchronous Decentralized Training. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 401-416. DOI: 10.1145/3373376.3378499. Online publication date: 9-Mar-2020.
  • (2020) A Unified Multi-view Clustering Algorithm Using Multi-objective Optimization Coupled with Generative Model. ACM Transactions on Knowledge Discovery from Data, 14(1), pages 1-31. DOI: 10.1145/3365673. Online publication date: 3-Feb-2020.
