Abstract
A preemptive gang scheduler is developed and evaluated. The gang scheduler, called SCore-D, is implemented on top of a UNIX operating system and runs on workstation and PC clusters connected by Myrinet, a gigabit-class, high-performance network.
To provide high-performance user-level communication and a multi-user environment at the same time, we propose network preemption, which saves and restores the network context as well as the process contexts when switching distributed processes. We also developed a high-performance, user-level communication library, PM. PM and SCore-D collaborate to perform the network preemption. When user processes are gang-scheduled, communication messages are first flushed; then the messages pending in the receive and send buffers are saved and restored. Unlike the CM-5's All-Fall-Down mechanism, our gang-scheduling scheme is implemented entirely in software; no special hardware support is assumed. There is also no limitation on network topology or partitioning.
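The flush, save, and restore sequence described above can be sketched as a toy model. This is not PM's or SCore-D's actual interface: `NetworkContext`, `ToyNode`, `ToyCluster`, and every method name are hypothetical, and the "network" is just a Python list of in-flight messages. The sketch only illustrates the ordering: drain the wire first, then swap each node's buffers for those of the incoming job.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class NetworkContext:
    """Hypothetical per-node network state belonging to one parallel job."""
    send_buf: deque = field(default_factory=deque)  # queued (dst, payload) pairs
    recv_buf: deque = field(default_factory=deque)  # delivered payloads


class ToyNode:
    def __init__(self, rank):
        self.rank = rank
        self.active = NetworkContext()  # context of the job currently running
        self.saved = {}                 # job id -> saved NetworkContext


class ToyCluster:
    """Toy stand-in for a cluster; not a model of Myrinet or PM internals."""

    def __init__(self, n_nodes):
        self.nodes = [ToyNode(r) for r in range(n_nodes)]
        self.in_flight = []  # (dst, payload) messages currently "on the wire"

    def send(self, src, dst, payload):
        self.nodes[src].active.send_buf.append((dst, payload))

    def pump(self):
        """Move at most one queued message per node onto the wire."""
        for node in self.nodes:
            if node.active.send_buf:
                self.in_flight.append(node.active.send_buf.popleft())

    def flush(self):
        """Deliver every in-flight message so the wire is empty."""
        for dst, payload in self.in_flight:
            self.nodes[dst].active.recv_buf.append(payload)
        self.in_flight.clear()

    def gang_switch(self, old_job, new_job):
        """Flush first, then save the old job's buffers and restore the new job's."""
        self.flush()
        for node in self.nodes:
            node.saved[old_job] = node.active
            node.active = node.saved.pop(new_job, NetworkContext())


cluster = ToyCluster(2)
cluster.send(0, 1, "m1")
cluster.pump()                 # "m1" is now in flight
cluster.send(0, 1, "m2")       # "m2" is still pending in the send buffer
cluster.gang_switch("A", "B")  # job B starts with empty buffers
cluster.gang_switch("B", "A")  # job A gets its buffers back
```

In this toy, flushing before the swap is what guarantees that a message sent by job A can never be delivered into job B's receive buffer.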
The overhead of the gang scheduler is measured on our new PC cluster, which consists of 64 Pentium Pros connected by Myrinet. NAS Parallel Benchmark programs are used for the evaluation. We found that the message flushing time and the network preemption time depend on the communication patterns of the application programs. We also found that saving and restoring the network context accounts for more than two thirds of the gang scheduling time. The evaluation shows that the slowdown of user program execution due to gang scheduling is less than 9% when the time slice is 100 msec.
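To give a feel for how a slowdown figure relates to the time slice, the following is a minimal sketch under a simplifying assumption that is not taken from the paper: each time slice of length T contains exactly one gang switch of cost c, so only T - c of the slice does useful work and the slowdown is T / (T - c) - 1. The function names are hypothetical, and the paper's own definition of slowdown may differ.

```python
def slowdown(time_slice_ms, switch_cost_ms):
    """Slowdown under the assumed model: one switch of fixed cost per slice."""
    if not 0 <= switch_cost_ms < time_slice_ms:
        raise ValueError("switch cost must be non-negative and below the time slice")
    return time_slice_ms / (time_slice_ms - switch_cost_ms) - 1.0


def max_switch_cost(time_slice_ms, slowdown_bound):
    """Largest per-slice switch cost that keeps the slowdown within the bound."""
    return time_slice_ms * slowdown_bound / (1.0 + slowdown_bound)


# Under this assumed model only, a 9% bound at a 100 msec time slice
# corresponds to a per-slice switch cost of a little over 8 msec.
budget_ms = max_switch_cost(100.0, 0.09)
print(f"assumed-model switch budget: {budget_ms:.2f} msec")
```

The same model shows why a longer time slice lowers the slowdown for a fixed switch cost: the cost is amortized over more useful work per slice.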
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hori, A., Tezuka, H., Ishikawa, Y. (1998). Overhead analysis of preemptive gang scheduling. In: Feitelson, D.G., Rudolph, L. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 1998. Lecture Notes in Computer Science, vol 1459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0053989
DOI: https://doi.org/10.1007/BFb0053989
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64825-3
Online ISBN: 978-3-540-68536-4
eBook Packages: Springer Book Archive