Abstract
Networked systems provide a cost-effective platform for parallel computing, but the applications have to deal with the changing availability of computation and communication resources. Network-awareness is a recent attempt to bridge the gap between the realities of networks and the demands of applications. Network-aware applications obtain information about their execution environment and dynamically adapt to enhance their performance. Adaptation is especially important for synchronous parallel applications since a single busy communication link can become the bottleneck and degrade overall performance dramatically. This paper presents Remos, a uniform API that allows applications to obtain relevant network information, and reports on the development of parallel applications in this environment. The challenges in defining a uniform interface include network heterogeneity, diversity and variability in network traffic, and resource sharing in the network and even inside an application. The first implementation of the Remos system is hosted on an IP-based network testbed. The paper reports on our methodology for developing adaptive parallel applications for high-speed networks with Remos, and presents results that highlight the importance and effectiveness of adaptive parallel computing.
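The bottleneck effect described above can be illustrated with a small sketch. This is not the Remos API: the function names, the `measured` table, and the bandwidth figures are all hypothetical, standing in for whatever a network query interface would report. The sketch assumes a synchronous application in which every node exchanges data with every other node, so the slowest link among the chosen nodes bounds each communication step.

```python
from itertools import combinations

def bottleneck_bandwidth(nodes, bandwidth):
    """Smallest available bandwidth (Mbit/s) over all node pairs in `nodes`.

    `bandwidth` maps frozenset({u, v}) to the measured bandwidth of that pair.
    A single node has no links, so the result is infinity in that case.
    """
    return min(
        (bandwidth[frozenset(pair)] for pair in combinations(nodes, 2)),
        default=float("inf"),
    )

def select_nodes(candidates, bandwidth, k):
    """Pick the k-node subset whose slowest link is fastest.

    Exhaustive search, so only suitable for small candidate sets.
    """
    return max(
        combinations(sorted(candidates), k),
        key=lambda subset: bottleneck_bandwidth(subset, bandwidth),
    )

# Hypothetical measurements: the a-d link is busy, all others are lightly loaded.
measured = {
    frozenset(("a", "b")): 90.0,
    frozenset(("a", "c")): 85.0,
    frozenset(("b", "c")): 95.0,
    frozenset(("a", "d")): 10.0,
    frozenset(("b", "d")): 92.0,
    frozenset(("c", "d")): 88.0,
}

chosen = select_nodes("abcd", measured, 3)
print(chosen, bottleneck_bandwidth(chosen, measured))
```

With these made-up numbers, any subset containing both `a` and `d` is limited to 10 Mbit/s by one busy link, even though every other link in the subset is fast. An adaptive application would avoid that pair.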
Effort sponsored by the Advanced Research Projects Agency and Rome Laboratory, Air Force Materiel Command, USAF, under agreement number F30602-96-1-0287. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lowekamp, B., Miller, N., Sutherland, D., Gross, T., Steenkiste, P., Subhlok, J. (1999). Network-Aware Parallel Computing with Remos. In: Chatterjee, S., et al. Languages and Compilers for Parallel Computing. LCPC 1998. Lecture Notes in Computer Science, vol 1656. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48319-5_7
DOI: https://doi.org/10.1007/3-540-48319-5_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66426-0
Online ISBN: 978-3-540-48319-9
eBook Packages: Springer Book Archive