Active messages: a mechanism for integrated communication and computation

Published: 01 April 1992

Abstract

The design challenge for large-scale multiprocessors is (1) to minimize communication overhead, (2) to allow communication to overlap computation, and (3) to coordinate the two without sacrificing processor cost/performance. We show that existing message-passing multiprocessors have unnecessarily high communication costs. Research prototypes of message-driven machines demonstrate low communication overhead, but poor processor cost/performance. We introduce a simple communication mechanism, Active Messages, and show that it is intrinsic to both architectures, allows cost-effective use of the hardware, and offers tremendous flexibility. Implementations on nCUBE/2 and CM-5 are described and evaluated using Split-C, a split-phase shared-memory extension to C. We further show that active messages are sufficient to implement the dynamically scheduled languages for which message-driven machines were designed. With this mechanism, latency tolerance becomes a programming/compiling concern. Hardware support for active messages is desirable, and we outline a range of enhancements to mainstream processors.
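The abstract does not spell out the mechanism itself. As Active Messages are usually described, each message carries at its head the address of a user-level handler, and that handler runs on arrival to pull the data out of the network and integrate it into the ongoing computation, with no intermediate buffering or thread scheduling. The C sketch below illustrates this with a split-phase remote read in the spirit of Split-C. It is a single-process simulation under stated assumptions: all "nodes" share one address space, the network is one in-memory FIFO, polling drains messages for every node, and the names `am_send`, `am_poll`, `get_async`, `sync_gets` and `example_get` are hypothetical, not the nCUBE/2, CM-5 or Split-C interfaces.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NODES 2     /* simulated nodes */
#define WORDS 4     /* words of local memory per node */
#define NET_CAP 64  /* capacity of the simulated network (assumption) */

typedef struct am_msg am_msg;

/* A handler runs on the destination node as soon as its message is taken
   off the network. It should be short and must never block. */
typedef void (*am_handler)(int self, const am_msg *m);

struct am_msg {
    am_handler handler; /* head of the message: the handler to invoke */
    int dest;           /* destination node */
    int src;            /* sending node, so a handler can reply */
    intptr_t arg0;      /* small fixed-size payload */
    intptr_t arg1;
};

/* Simulated network: one FIFO ring buffer shared by all nodes. */
static am_msg net[NET_CAP];
static int net_head = 0, net_count = 0;

static void am_send(int src, int dest, am_handler h, intptr_t a0, intptr_t a1) {
    if (net_count == NET_CAP) {
        fprintf(stderr, "simulated network full\n");
        exit(EXIT_FAILURE);
    }
    net[(net_head + net_count) % NET_CAP] = (am_msg){h, dest, src, a0, a1};
    net_count++;
}

/* Drain the network, calling each message's handler directly. The message
   is copied out first because a handler may send a reply, which can reuse
   the slot that was just freed. In this single-process simulation, polling
   runs handlers for every node, not just the caller. */
static void am_poll(void) {
    while (net_count > 0) {
        am_msg m = net[net_head];
        net_head = (net_head + 1) % NET_CAP;
        net_count--;
        m.handler(m.dest, &m);
    }
}

static int memory[NODES][WORDS]; /* each node's local memory */
static int outstanding[NODES];   /* split-phase gets not yet completed */

/* Runs on the requester: store the fetched word, then mark the get done. */
static void get_reply(int self, const am_msg *m) {
    *(int *)m->arg1 = (int)m->arg0;
    outstanding[self]--;
}

/* Runs on the owner of the data: read the word and send it back. */
static void get_request(int self, const am_msg *m) {
    assert(m->arg0 >= 0 && m->arg0 < WORDS);
    am_send(self, m->src, get_reply, memory[self][m->arg0], m->arg1);
}

/* Split-phase get: issue the request and return without waiting.
   `dest` must stay valid until sync_gets(self) returns. */
static void get_async(int self, int node, int index, int *dest) {
    outstanding[self]++;
    am_send(self, node, get_request, index, (intptr_t)dest);
}

/* Wait until every get issued by `self` has completed. */
static void sync_gets(int self) {
    while (outstanding[self] > 0)
        am_poll();
}

/* Hypothetical demo: node 0 reads word 2 of node 1 while summing 1..100. */
int example_get(long *local_sum) {
    int x = 0;
    memory[1][2] = 42;
    get_async(0, 1, 2, &x);
    long sum = 0;
    for (int i = 1; i <= 100; i++) /* local work overlapped with the get */
        sum += i;
    sync_gets(0);
    *local_sum = sum;
    return x;
}
```

Called from a `main` function, `example_get(&sum)` returns 42 and leaves `sum` at 5050: the loop runs while the request is conceptually in flight, and `sync_gets` waits only for what is still outstanding. Keeping the handlers this small reflects the low-overhead goal stated in the abstract; in a real system each node would poll only its own network interface.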

Published In

ACM SIGARCH Computer Architecture News, Volume 20, Issue 2
Special Issue: Proceedings of the 19th annual international symposium on Computer architecture (ISCA '92)
May 1992
429 pages
ISSN:0163-5964
DOI:10.1145/146628
  • ISCA '92: Proceedings of the 19th annual international symposium on Computer architecture
    May 1992
    439 pages
    ISBN:0897915097
    DOI:10.1145/139669

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 April 1992
Published in SIGARCH Volume 20, Issue 2


Qualifiers

  • Article

Cited By

  • (2024) Runtime support for CPU-GPU high-performance computing on distributed memory platforms. Frontiers in High Performance Computing, vol. 2. DOI: 10.3389/fhpcp.2024.1417040. Online publication date: 19-Jul-2024.
  • (2024) Network-Offloaded Bandwidth-Optimal Broadcast and Allgather for Distributed AI. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pp. 1-17. DOI: 10.1109/SC41406.2024.00109. Online publication date: 17-Nov-2024.
  • (2024) Azul: An Accelerator for Sparse Iterative Solvers Leveraging Distributed On-Chip Memory. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 643-656. DOI: 10.1109/MICRO61859.2024.00054. Online publication date: 2-Nov-2024.
  • (2024) The C++ Standard Library for Parallelism and Concurrency (HPX). Parallel C++, pp. 49-55. DOI: 10.1007/978-3-031-54369-2_6. Online publication date: 3-Feb-2024.
  • (2024) Some Remarks on MPI+OpenMP and HPX. Parallel C++, pp. 179-182. DOI: 10.1007/978-3-031-54369-2_16. Online publication date: 3-Feb-2024.
  • (2024) Distributed Programming Using HPX. Parallel C++, pp. 147-162. DOI: 10.1007/978-3-031-54369-2_14. Online publication date: 3-Feb-2024.
  • (2023) Code Integrity and Confidentiality: An Active Data Approach for Active and Healthy Ageing. Sensors, 23:10 (4794). DOI: 10.3390/s23104794. Online publication date: 16-May-2023.
  • (2023) Enhancing Distributed Graph Matching Algorithm with MPI RMA based Active Messages. 2023 9th International Conference on Computer and Communications (ICCC), pp. 1952-1961. DOI: 10.1109/ICCC59590.2023.10507290. Online publication date: 8-Dec-2023.
  • (2022) FSHMEM: Supporting Partitioned Global Address Space on FPGAs for Large-Scale Hardware Acceleration Infrastructure. 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL), pp. 218-224. DOI: 10.1109/FPL57034.2022.00042. Online publication date: Aug-2022.
  • (2022) Model-based selection of optimal MPI broadcast algorithms for multi-core clusters. Journal of Parallel and Distributed Computing, 165, pp. 1-16. DOI: 10.1016/j.jpdc.2022.03.012. Online publication date: Jul-2022.