research-article

Towards software performance engineering for multicore and manycore systems

Published: 10 January 2014

Abstract

In the era of multicore and manycore processors, a systematic engineering approach to software performance is becoming increasingly crucial to the success of modern software systems. This article argues for more software performance engineering research specifically for multicore and manycore systems, which will have a profound impact on software engineering practices.

Published In

ACM SIGMETRICS Performance Evaluation Review, Volume 41, Issue 3
December 2013
111 pages
ISSN: 0163-5999
DOI: 10.1145/2567529

Publisher

Association for Computing Machinery

New York, NY, United States

Cited By

  • (2023) A systematic mapping study of software performance research. Software: Practice and Experience 53:5 (1249-1270). DOI: 10.1002/spe.3185. Online publication date: 2-Jan-2023
  • (2020) High Performance Computing (HPC) Applications in Industry 4.0 (I4.0) for the betterment of humanity. 2020 IEEE 8th R10 Humanitarian Technology Conference (R10-HTC) (1-6). DOI: 10.1109/R10-HTC49770.2020.9356990. Online publication date: 1-Dec-2020
  • (2019) Discovering Fuzzy Rules with Parallelized Linguistic Variable Elimination. 2019 11th International Conference on Knowledge and Smart Technology (KST) (1-5). DOI: 10.1109/KST.2019.8687600. Online publication date: Jan-2019
  • (2019) Priority-grouping method for parallel multi-scheduling in Grid. Journal of Computer and System Sciences 81:6 (943-957). DOI: 10.1016/j.jcss.2014.12.009. Online publication date: 1-Jan-2019
  • (2018) Multi-Objective Optimization of Deployment Topologies for Distributed Applications. ACM Transactions on Internet Technology 18:2 (1-21). DOI: 10.1145/3106158. Online publication date: 20-Jan-2018
  • (2017) Parallelization, Modeling, and Performance Prediction in the Multi-/Many Core Area: A Systematic Literature Review. 2017 IEEE 7th International Symposium on Cloud and Service Computing (SC2) (48-55). DOI: 10.1109/SC2.2017.15. Online publication date: Nov-2017
  • (2017) Online Learning of Run-Time Models for Performance and Resource Management in Data Centers. Self-Aware Computing Systems (507-528). DOI: 10.1007/978-3-319-47474-8_17. Online publication date: 24-Jan-2017
  • (2016) Optimization of Deployment Topologies for Distributed Enterprise Applications. 2016 12th International ACM SIGSOFT Conference on Quality of Software Architectures (QoSA) (106-115). DOI: 10.1109/QoSA.2016.11. Online publication date: Apr-2016
