[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

ESTIMA: extrapolating scalability of in-memory applications

Published: 27 February 2016 Publication History

Abstract

This paper presents ESTIMA, an easy-to-use tool for extrapolating the scalability of in-memory applications. ESTIMA is designed to perform a simple, yet important task: given the performance of an application on a small machine with a handful of cores, ESTIMA extrapolates its scalability to a larger machine with more cores, while requiring minimum input from the user. The key idea underlying ESTIMA is the use of stalled cycles (e.g. cycles that the processor spends waiting for various events, such as cache misses or waiting on a lock). ESTIMA measures stalled cycles on a few cores and extrapolates them to more cores, estimating the amount of waiting in the system. ESTIMA can be effectively used to predict the scalability of in-memory applications. For instance, using measurements of memcached and SQLite on a desktop machine, we obtain accurate predictions of their scalability on a server. Our extensive evaluation on a large number of in-memory benchmarks shows that ESTIMA has generally low prediction errors.

References

[1]
N. I. Akhiezer. Theory of Approximation. Dover Publications, 1992.
[2]
A. R. Alameldeen and D. A. Wood. Ipc considered harmful for multiprocessor workloads. IEEE Micro, 26(4), 2006.
[3]
AMD. BIOS and Kernel Developer's Guide (BKDG) For AMD Family 10h Processors, 2010.
[4]
B. J. Barnes, B. Rountree, D. K. Lowenthal, J. Reeves, B. de Supinski, and M. Schulz. A regression-based approach to scalability prediction. ICS '08. ACM, 2008.
[5]
C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC benchmark suite: characterization and architectural implications. In PACT, 2008.
[6]
L. Carrington, A. Snavely, and N. Wolter. A performance prediction framework for scientific applications. Future Generation Computer Systems, 22(3), 2006.
[7]
C. Coarfa, J. Mellor-Crummey, N. Froyd, and Y. Dotsenko. Scalability analysis of spmd codes using expectations. ICS '07. ACM, 2007.
[8]
M. E. Crovella and T. J. LeBlanc. Parallel performance prediction using lost cycles analysis. IEEE Computer Society Press, 1994.
[9]
C. Diaconu, C. Freedman, E. Ismert, P.-A. Larson, P. Mittal, R. Stonecipher, N. Verma, and M. Zwilling. Hekaton: Sql server's memory-optimized oltp engine. SIGMOD '13. ACM, 2013.
[10]
A. Dragojević, P. Felber, V. Gramoli, and R. Guerraoui. Why stm can be more than a research toy. Commun. ACM, 54(4), Apr. 2011.
[11]
A. Dragojević, R. Guerraoui, and M. Kapalka. Stretching transactional memory. In PLDI '09, New York, NY, USA, 2009. ACM.
[12]
B. Fan, D. G. Andersen, and M. Kaminsky. Memc3: Compact and concurrent memcache with dumber caching and smarter hashing. NSDI'13. USENIX Association, 2013.
[13]
M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee, D. Jevdjic, C. Kaynak, A. D. Popescu, A. Ailamaki, and B. Falsafi. Clearing the clouds: a study of emerging scale-out workloads on modern hardware. In ASPLOS '12. ACM, 2012.
[14]
B. Haagdorens, T. Vermeiren, and M. Goossens. Improving the performance of signature-based network intrusion detection sensors by multi-threading. In WISA'04, 2005.
[15]
A. Heindl and G. Pokam. An analytic framework for performance modeling of software transactional memory. Computer Networks, 53(8), 2009.
[16]
A. Heindl and G. Pokam. An analytic model for optimistic stm with lazy locking. In Analytical and Stochastic Modeling Techniques and Applications. Springer, 2009.
[17]
A. Heindl, G. Pokam, and A.-R. Adl-Tabatabai. An analytic model of optimistic software transactional memory. In ISPASS 2009. IEEE, 2009.
[18]
M. Herlihy and J. E. B. Moss. Transactional memory: Architectural support for lock-free data structures. In ISCA '93. ACM, 1993.
[19]
A. Hoisie, O. Lubeck, and H. Wasserman. Performance and scalability analysis of teraflop-scale parallel architectures using multidimensional wavefront applications. International Journal of High Performance Computing Applications, 14(4), 2000.
[20]
I. Intel. and ia-32 architectures software developers manual volume 3b: System programming guide. Part, 1, 2007.
[21]
E. Ipek, B. R. De Supinski, M. Schulz, and S. A. McKee. An approach to performance prediction for parallel applications. In Euro-Par 2005 Parallel Processing. Springer, 2005.
[22]
R. Jain. The Art of Computer Systems Performance Analysis: techniques for experimental design, measurement, simulation, and modeling. Wiley, 1991.
[23]
V. Jiménez, F. J. Cazorla, R. Gioiosa, M. Valero, C. Boneti, E. Kursun, C.-Y. Cher, C. Isci, A. Buyuktosunoglu, and P. Bose. Power and thermal characterization of power6 system. In PACT '10. ACM, 2010.
[24]
D. J. Kerbyson, H. J. Alme, A. Hoisie, F. Petrini, H. J. Wasserman, and M. Gittings. Predictive performance and scalability modeling of a large-scale application. In SC2001. ACM, 2001.
[25]
B. C. Lee, D. M. Brooks, B. R. De Supinski, M. Schulz, K. Singh, and S. A. McKee. Methods of inference and learning for performance modeling of parallel applications. In PPoPP '07. ACM, 2007.
[26]
H. Lim, D. Han, D. G. Andersen, and M. Kaminsky. Mica: A holistic approach to fast in-memory key-value storage. NSDI'14. USENIX Association, 2014.
[27]
G. Marin and J. Mellor-Crummey. Cross-architecture performance predictions for scientific applications using parameterized models. In ACM SIGMETRICS Performance Evaluation Review, volume 32. ACM, 2004.
[28]
C. C. Minh, J. Chung, C. Kozyrakis, and K. Olukotun. STAMP: Stanford transactional applications for multi-processing. In IISWC, 2008.
[29]
R. Nishtala, H. Fugal, S. Grimm, M. Kwiatkowski, H. Lee, H. C. Li, R. McElroy, M. Paleczny, D. Peek, P. Saab, D. Stafford, T. Tung, and V. Venkataramani. Scaling memcache at facebook. NSDI'13. USENIX Association, 2013.
[30]
G. R. Nudd, D. J. Kerbyson, E. Papaefstathiou, S. C. Perry, J. S. Harper, and D. V. Wilcox. PACEa toolset for the performance prediction of parallel and distributed systems. International Journal of High Performance Computing Applications, 14(3), 2000.
[31]
C. R. M. Olschanowsky. Hpc Application Address Stream Compression, Replay and Scaling. PhD thesis, La Jolla, CA, USA, 2011.
[32]
C. A. Petri. Communication with automata, new york: Griffiss air force base. Technical report, Tech. Rep. RADC-TR-65-377, 1966.
[33]
J. R. Phillips. Zunzun.com. http://www.zunzun.com.
[34]
D. E. Porter and E. Witchel. Understanding transactional memory performance. In ISPASS 2010. IEEE, 2010.
[35]
J. Ruppert. A delaunay refinement algorithm for quality 2-dimensional mesh generation. Journal of algorithms, 18(3), 1995.
[36]
N. Shavit and D. Touitou. Software transactional memory. In PODC '95. ACM, 1995.
[37]
K. Singh, M. Bhadauria, and S. A. McKee. Real time power estimation and thread scheduling via performance counters. SIGARCH Comput. Archit. News, 37(2), July 2009.
[38]
J. Torrellas, Y. Solihin, and V. Lam. Scal-tool: Pinpointing and quantifying scalability bottlenecks in dsm multiprocessors. In Supercomputing, ACM/IEEE 1999 Conference, Nov 1999.
[39]
T. Usui, R. Behrends, J. Evans, and Y. Smaragdakis. Adaptive locks: Combining transactions and locks for efficient concurrency. In PACT '09. IEEE Computer Society, 2009.
[40]
A. Vega, A. Buyuktosunoglu, and P. Bose. Smt-centric power-aware thread placement in chip multiprocessors. In PACT '13. IEEE Press, 2013.
[41]
R. West, P. Zaroo, C. A. Waldspurger, and X. Zhang. Online cache modeling for commodity multicore processors. In PACT '10. ACM, 2010.
[42]
L. T. Yang, X. Ma, and F. Mueller. Cross-platform performance prediction of parallel applications using partial execution. In SC2005. IEEE, 2005.
[43]
A. Yasin. A top-down method for performance analysis and counters architecture. In Performance Analysis of Systems and Software (ISPASS), 2014 IEEE International Symposium on, March 2014.
[44]
J. Zhai, W. Chen, and W. Zheng. PHANTOM: predicting performance of parallel applications on large-scale parallel machines using a single node. In ACM Sigplan Notices, volume 45. ACM, 2010.

Cited By

View all
  • (2017)Following the Blind Seer – Creating Better Performance Models Using Less InformationEuro-Par 2017: Parallel Processing10.1007/978-3-319-64203-1_8(106-118)Online publication date: 1-Aug-2017
  • (2024)A Detailed Historical and Statistical Analysis of the Influence of Hardware Artifacts on SPEC Integer Benchmark PerformanceIEEE Transactions on Computers10.1109/TC.2024.336594173:5(1262-1274)Online publication date: May-2024
  • (2023)Evaluating Task-Level CPU Efficiency for Distributed Stream Processing SystemsBig Data and Cognitive Computing10.3390/bdcc70100497:1(49)Online publication date: 10-Mar-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 51, Issue 8
PPoPP '16
August 2016
405 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3016078
Issue’s Table of Contents
  • cover image ACM Conferences
    PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
    February 2016
    420 pages
    ISBN:9781450340922
    DOI:10.1145/2851141
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 February 2016
Published in SIGPLAN Volume 51, Issue 8

Check for updates

Qualifiers

  • Research-article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)12
  • Downloads (Last 6 weeks)1
Reflects downloads up to 13 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2017)Following the Blind Seer – Creating Better Performance Models Using Less InformationEuro-Par 2017: Parallel Processing10.1007/978-3-319-64203-1_8(106-118)Online publication date: 1-Aug-2017
  • (2024)A Detailed Historical and Statistical Analysis of the Influence of Hardware Artifacts on SPEC Integer Benchmark PerformanceIEEE Transactions on Computers10.1109/TC.2024.336594173:5(1262-1274)Online publication date: May-2024
  • (2023)Evaluating Task-Level CPU Efficiency for Distributed Stream Processing SystemsBig Data and Cognitive Computing10.3390/bdcc70100497:1(49)Online publication date: 10-Mar-2023
  • (2022)MAPPER: Managing Application Performance via Parallel Efficiency Regulation*ACM Transactions on Architecture and Code Optimization10.1145/350176719:2(1-26)Online publication date: 24-Mar-2022
  • (2019)Using Empirical Data for Scalability Analysis of Parallel ApplicationsParallel Computational Technologies10.1007/978-3-030-28163-2_5(58-73)Online publication date: 2-Aug-2019
  • (2017)The Dangers and Complexities of SQLite BenchmarkingProceedings of the 8th Asia-Pacific Workshop on Systems10.1145/3124680.3124719(1-6)Online publication date: 2-Sep-2017
  • (2017)PandiaProceedings of the Twelfth European Conference on Computer Systems10.1145/3064176.3064177(254-269)Online publication date: 23-Apr-2017

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media