DOI: 10.5555/645533.656340
Article

Hardware Versus Software Implementation of COMA

Published: 11 August 1997

Abstract

Traditionally, cache coherence in multiprocessors has been maintained in hardware. However, the cost-effectiveness of hardwired protocols is questionable. Virtual Shared Memory systems have highlighted the many advantages of software-implemented protocols, albeit at a performance price. The performance gap is narrowed by hybrid systems with the addition of hardware support for fine-grain sharing. We have developed a software protocol for a COMA (Cache-Only Memory Architecture). We call the system SC-COMA for Software-Controlled COMA, to emphasize that the protocol engine is emulated by software executed on the main processor. Contrary to user-level protocols, the software handling coherence events in SC-COMA runs in sub-kernel mode, transparently providing the same services to applications as a hardware counterpart. The software emulation layer has been written and we compare SC-COMA to an idealized hardware COMA through detailed simulations. Our results show that SC-COMA is competitive. On systems with 32 processors, it achieves a slowdown of 11-56% with respect to its hardware counterpart, across a range of applications and memory pressures. SC-COMA scales well, up to 32 nodes. A study on the impact of faster processors on SC-COMA's relative performance indicates a consistent improvement, but with a limitation due to the loosely-integrated design. We conclude that SC-COMA is a viable solution to easily transform networks of workstations into powerful multiprocessors.

Published In

ICPP '97: Proceedings of the International Conference on Parallel Processing
August 1997
334 pages
ISBN: 081868108X

Publisher

IEEE Computer Society

United States

Author Tags

  1. COMA
  2. distributed shared memory
  3. networks of workstations
  4. performance evaluation
  5. software cache coherence

Cited By

  • (2005) Moving Address Translation Closer to Memory in Distributed Shared-Memory Multiprocessors. IEEE Transactions on Parallel and Distributed Systems 16:7 (612-623). DOI: 10.1109/TPDS.2005.84. Online publication date: 1-Jul-2005
  • (2004) Tolerating Late Memory Traps in Dynamically Scheduled Processors. IEEE Transactions on Computers 53:6 (732-743). DOI: 10.1109/TC.2004.18. Online publication date: 1-Jun-2004
  • (1999) Tolerating late memory traps in ILP processors. ACM SIGARCH Computer Architecture News 27:2 (76-87). DOI: 10.1145/307338.300986. Online publication date: 1-May-1999
  • (1999) Tolerating late memory traps in ILP processors. Proceedings of the 26th annual international symposium on Computer architecture (76-87). DOI: 10.1145/300979.300986. Online publication date: 2-May-1999
  • (1998) Options for dynamic address translation in COMAs. ACM SIGARCH Computer Architecture News 26:3 (214-225). DOI: 10.1145/279361.279390. Online publication date: 16-Apr-1998
  • (1998) Options for dynamic address translation in COMAs. Proceedings of the 25th annual international symposium on Computer architecture (214-225). DOI: 10.1145/279358.279390. Online publication date: 16-Apr-1998
