research-article

Slices: Provisioning Heterogeneous HPC Systems

Authors:

Alexander Merritt,

Naila Farooqui,

Magdalena Slawinska,

Ada Gavrilovska,

Karsten Schwan,

Vishakha Gupta

XSEDE '14: Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment

Article No.: 46, Pages 1-8

https://doi.org/10.1145/2616498.2616531

Published: 13 July 2014

Abstract

High-end computing systems are becoming increasingly heterogeneous, with nodes composed of multiple CPUs and accelerators such as GPGPUs, and with potential additional heterogeneity in memory configurations and network connectivity. Further, as we move to exascale systems, their anticipated use is one in which simulations co-run with online analytics or visualization methods, or in which a high-fidelity simulation co-runs with lower-order methods and/or with programs performing uncertainty quantification. To explore and understand the challenges that arise when multiple applications are mapped to heterogeneous machine resources, our research has developed methods that make it easy to construct 'virtual hardware platforms' composed of sets of CPUs and GPGPUs, custom-configured for applications when and as required. Specifically, the 'slicing' runtime presented in this paper manages a set of resources for each application, and at any one time, multiple such slices operate on shared underlying hardware. This paper describes the slicing abstraction and its ability to configure cluster hardware resources. It experiments with application scale-out, focusing on the applications' computationally intensive GPGPU-based computations, and it evaluates cluster-level resource sharing across multiple slices on the Keeneland machine, an XSEDE resource.
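
The abstract describes slices only at a high level: one resource set per application, with several slices mapped onto the same physical hardware at once. The following is a minimal Python sketch of that idea under explicit assumptions. The names `SliceManager`, `Slice`, and `GPU`, the least-loaded placement policy, and the per-GPU sharing limit `max_tenants` are all hypothetical. They are not the paper's runtime or its API, and the sketch ignores CPUs, memory, and network placement entirely.

```python
from dataclasses import dataclass, field


@dataclass
class GPU:
    """One physical GPU on a cluster node (hypothetical model)."""
    node: str
    index: int
    tenants: int = 0  # number of slices currently mapped onto this GPU


@dataclass
class Slice:
    """A per-application set of GPUs drawn from the shared cluster pool."""
    app: str
    gpus: list = field(default_factory=list)


class SliceManager:
    """Toy slice allocator (illustrative only, not the paper's runtime).

    Assumptions: a GPU may be shared by up to `max_tenants` slices, and
    GPUs are picked least-loaded first regardless of node, so a single
    slice may span several nodes (scale-out).
    """

    def __init__(self, nodes: dict, max_tenants: int = 2):
        # `nodes` maps a node name to its GPU count, e.g. {"n0": 3}.
        self.gpus = [GPU(name, i) for name, count in nodes.items() for i in range(count)]
        self.max_tenants = max_tenants
        self.slices = {}

    def create(self, app: str, num_gpus: int) -> Slice:
        """Build a slice of `num_gpus` distinct GPUs for `app`."""
        if app in self.slices:
            raise ValueError(f"slice for {app!r} already exists")
        # Only GPUs below the sharing limit are eligible; prefer idle ones.
        candidates = sorted(
            (g for g in self.gpus if g.tenants < self.max_tenants),
            key=lambda g: (g.tenants, g.node, g.index),
        )
        if len(candidates) < num_gpus:
            raise RuntimeError(f"not enough shareable GPUs for {app!r}")
        chosen = candidates[:num_gpus]
        for g in chosen:
            g.tenants += 1
        s = Slice(app, chosen)
        self.slices[app] = s
        return s

    def release(self, app: str) -> None:
        """Tear down `app`'s slice; raises KeyError if it does not exist."""
        for g in self.slices.pop(app).gpus:
            g.tenants -= 1
```

For example, on two hypothetical nodes `n0` and `n1` with three GPUs each, a four-GPU "simulation" slice spans both nodes. A later three-GPU "analytics" slice first takes the two idle GPUs on `n1` and then shares GPU 0 on `n0` with the simulation. This matches the abstract's picture of co-running applications whose slices overlap on shared hardware.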

Cited By

N. Farooqui, D. Kaeli, and J. Cavazos. A systems perspective on GPU computing. In Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, pages 72-81, 2016. DOI: 10.1145/2884045.2884057. Online publication date: 12 March 2016.
https://dl.acm.org/doi/10.1145/2884045.2884057

Index Terms

Slices: Provisioning Heterogeneous HPC Systems
1. Computer systems organization
  1. Architectures
    1. Distributed architectures

Information

Published In

ACM Other conferences

XSEDE '14: Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment

July 2014

445 pages

ISBN:9781450328937

DOI:10.1145/2616498

General Chair: Scott Lathrop, National Center for Supercomputing Applications

Program Chair: Jay Alameda, National Center for Supercomputing Applications

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

NSF: National Science Foundation
Drexel University
Indiana University

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

XSEDE '14

XSEDE '14: Annual Conference of the Extreme Science and Engineering Discovery Environment

July 13-18, 2014

Atlanta, GA, USA

Acceptance Rates

XSEDE '14 paper acceptance rate: 80 of 120 submissions (67%)

Overall acceptance rate: 129 of 190 submissions (68%)

Bibliometrics

Article Metrics

Total Citations: 1
Total Downloads: 145
Downloads (Last 12 months): 4
Downloads (Last 6 weeks): 0

Reflects downloads up to 28 Dec 2024
