The impact of system design parameters on application noise sensitivity

Authors:

Kurt B. Ferreira,

Patrick G. Bridges,

Ron Brightwell,

Kevin T. Pedretti

Cluster Computing, Volume 16, Issue 1

Pages 117 - 129

https://doi.org/10.1007/s10586-011-0178-3

Published: 01 March 2013

Abstract

Operating system (OS) noise, or jitter, is a key limiter of application scalability in high end computing systems. Several studies have attempted to quantify the sources and effects of system interference, though few of these studies show the influence that architectural and system characteristics have on the impact of noise at scale. In this paper, we examine the impact of three such system properties: platform balance, noisy node distribution, and the choice of collective algorithm. Using a previously-developed noise injection tool, we explore how the impact of noise varies with these platform characteristics. We provide detailed performance results that indicate that a system with relatively less network bandwidth is able to absorb more noise than a system with more network bandwidth. Our results also show that application performance can be significantly degraded by only a subset of noisy nodes. Furthermore, the placement of the noisy nodes is also important, especially for applications that make substantial use of tree-based collective communication operations. Lastly, performance results indicate that non-blocking collective operations have the ability to greatly mitigate the impact of OS interference. When combined, these results show that the impact of OS noise is not solely a property of application communication behavior, but is also influenced by other properties of the system architecture and system software environment.
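
The claim that only a subset of noisy nodes can degrade the whole application can be illustrated with a toy bulk-synchronous model. This is a minimal sketch under stated assumptions, not the authors' noise injection tool: the function name `simulate_bsp` and all of its parameters are hypothetical, every step is assumed to end in a blocking collective, and each noise event is assumed to add a fixed delay to the interrupted node.

```python
import random


def simulate_bsp(num_nodes, noisy_nodes, iterations,
                 work=1.0, noise_len=0.25, noise_prob=0.1, seed=0):
    """Toy model (hypothetical): total runtime of a bulk-synchronous loop.

    Each iteration, every node computes for `work` time units. The first
    `noisy_nodes` nodes are additionally delayed by `noise_len` with
    probability `noise_prob`. A blocking collective then forces all nodes
    to wait for the slowest one.
    """
    if not 0 <= noisy_nodes <= num_nodes:
        raise ValueError("noisy_nodes must be between 0 and num_nodes")
    rng = random.Random(seed)  # seeded so repeated runs are reproducible
    total = 0.0
    for _ in range(iterations):
        step_times = []
        for node in range(num_nodes):
            # Only the first `noisy_nodes` nodes are ever interrupted.
            interrupted = node < noisy_nodes and rng.random() < noise_prob
            step_times.append(work + (noise_len if interrupted else 0.0))
        # The collective completes only when the slowest node arrives.
        total += max(step_times)
    return total


if __name__ == "__main__":
    quiet = simulate_bsp(64, 0, 100)
    one_noisy = simulate_bsp(64, 1, 100, noise_prob=1.0)
    print("no noisy nodes:", quiet)
    print("one always-noisy node out of 64:", one_noisy)
```

In this model a single node that is interrupted on every step slows all 64 nodes by the full noise duration, because each step costs the maximum over nodes rather than the average. The model ignores network bandwidth, node placement, and non-blocking collectives, which the paper studies separately.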

Published In

Cluster Computing Volume 16, Issue 1

March 2013

196 pages

ISSN: 1386-7857

Copyright © 2013 Springer Science+Business Media New York.

Publisher

Kluwer Academic Publishers

United States

Cited By

Corbin, G., Daoud, N., Mohr, B., de Morais, G., Wolf, F.: Are Noise-Resilient Logical Timers Useful for Performance Analysis? In: Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 1519-1530 (2024). Online publication date: 17-Nov-2024.
https://dl.acm.org/doi/10.1109/SCW63240.2024.00192

You, X., Xuan, Z., Yang, H., Luan, Z., Liu, Y., Qian, D.: GVARP: Detecting Performance Variance on Large-Scale Heterogeneous Systems. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pp. 1-16 (2024). Online publication date: 17-Nov-2024.
https://dl.acm.org/doi/10.1109/SC41406.2024.00063

Prichard, R., Strasser, W.: When Fewer Cores Is Faster: A Parametric Study of Undersubscription in High-Performance Computing. Cluster Computing 27(7), 9123-9136 (2024). Online publication date: 1-Oct-2024.
https://dl.acm.org/doi/10.1007/s10586-024-04353-2

Zheng, L., Zhai, J., Tang, X., Wang, H., Yu, T., Jin, Y., Song, S., Chen, W., Lee, J., Agrawal, K., Spear, M.: Vapro. In: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 150-162 (2022). Online publication date: 2-Apr-2022.
https://dl.acm.org/doi/10.1145/3503221.3508411

Utrera, G., Farreras, M., Fornes, J.: Task Packing. Journal of Parallel and Distributed Computing 134(C), 37-49 (2019). Online publication date: 1-Dec-2019.
https://dl.acm.org/doi/10.1016/j.jpdc.2019.08.003

Tang, X., Zhai, J., Qian, X., He, B., Xue, W., Chen, W.: vSensor. ACM SIGPLAN Notices 53(1), 124-136 (2018). Online publication date: 10-Feb-2018.
https://dl.acm.org/doi/10.1145/3200691.3178497

Tian, B., Huang, J., Mozafari, B., Schoenebeck, G.: Contention-aware lock scheduling for transactional databases. Proceedings of the VLDB Endowment 11(5), 648-662 (2018). Online publication date: 1-Jan-2018.
https://dl.acm.org/doi/10.1145/3187009.3177740

Tang, X., Zhai, J., Qian, X., He, B., Xue, W., Chen, W., Krall, A., Gross, T.: vSensor. In: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 124-136 (2018). Online publication date: 10-Feb-2018.
https://dl.acm.org/doi/10.1145/3178487.3178497

Tian, B., Huang, J., Mozafari, B., Schoenebeck, G.: Contention-aware lock scheduling for transactional databases. Proceedings of the VLDB Endowment 11(5), 648-662 (2018). Online publication date: 5-Oct-2018.
https://dl.acm.org/doi/10.1145/3177732.3177740

Papadopoulou, N., Goumas, G., Koziris, N.: Predictive communication modeling for HPC applications. Cluster Computing 20(3), 2725-2747 (2017). Online publication date: 1-Sep-2017.
https://dl.acm.org/doi/10.1007/s10586-017-0821-8
