DOI: 10.1145/1995896.1995948
Research article

The elephant and the mice: the role of non-strict fine-grain synchronization for modern many-core architectures

Published: 31 May 2011

Abstract

The Cray XMT architecture has incited curiosity among computer architects and system software designers for its architectural support of fine-grain in-memory synchronization. Although such discussions go back thirty years, there is a lack of practical experimental platforms that can evaluate major technological trends, such as fine-grain in-memory synchronization. The need for these platforms becomes apparent when dealing with new massive many-core designs and applications.
This paper studies the feasibility, usefulness and trade-offs of fine-grain in-memory synchronization support in a real-world large-scale many-core chip (IBM Cyclops-64). We extended the original Cyclops-64 architecture design at gate level to support the fine-grain in-memory synchronization feature. We performed an in-depth study of a well-known kernel code: the wavefront computation. Several versions of the kernel were used to test the effects of different synchronization constructs using our chip emulation framework. Furthermore, we tested selected OpenMP kernel loops against existing software-based synchronization approaches.
In our wavefront benchmark study, the combination of fine-grain dataflow-like in-memory synchronization with non-strict scheduling methods yields a thirty percent improvement over the best optimized traditional synchronization method provided by the original Cyclops-64 design. For the OpenMP kernel loops, we achieved speedups of three to fourteen times over software-based synchronization methods.
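To make the idea of per-element, dataflow-like synchronization on a wavefront concrete, the following is a minimal software sketch. It is not the paper's kernel and not the Cyclops-64 hardware mechanism: the recurrence (each cell is the sum of its north, west, and north-west neighbours), the function names `wavefront_sequential` and `wavefront_fine_grain`, and the use of one `threading.Event` per cell as a stand-in for an in-memory "full" flag are all assumptions chosen for illustration. Each row runs in its own thread and advances as soon as the individual cells it needs have been written, rather than waiting at a barrier after every anti-diagonal.

```python
import threading


def wavefront_sequential(n, m):
    """Reference result computed in plain row-major order."""
    g = [[1] * m for _ in range(n)]  # boundary row and column hold 1
    for i in range(1, n):
        for j in range(1, m):
            g[i][j] = g[i - 1][j] + g[i][j - 1] + g[i - 1][j - 1]
    return g


def wavefront_fine_grain(n, m):
    """One thread per row; every cell has its own 'full' flag (an Event)."""
    g = [[0] * m for _ in range(n)]
    full = [[threading.Event() for _ in range(m)] for _ in range(n)]

    def write(i, j, value):
        g[i][j] = value
        full[i][j].set()  # mark the cell full, releasing any waiting reader

    def read(i, j):
        full[i][j].wait()  # block until the producer has filled this cell
        return g[i][j]

    def row_worker(i):
        for j in range(m):
            if i == 0 or j == 0:
                write(i, j, 1)  # boundary cells have no dependencies
            else:
                # The west neighbour was written by this same thread, so only
                # the north and north-west cells need synchronization.
                north = read(i - 1, j)
                north_west = read(i - 1, j - 1)
                write(i, j, north + g[i][j - 1] + north_west)

    threads = [threading.Thread(target=row_worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return g


if __name__ == "__main__":
    assert wavefront_fine_grain(6, 7) == wavefront_sequential(6, 7)
```

In this sketch the synchronization cost is paid in software on every cell access, which is the kind of overhead that hardware support for in-memory synchronization is meant to reduce; the sketch says nothing about the performance figures reported above.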


    Published In

    ICS '11: Proceedings of the international conference on Supercomputing
    May 2011
    398 pages
    ISBN:9781450301022
    DOI:10.1145/1995896
    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. emulation
    2. fine-grain synchronization
    3. many-core architectures

    Qualifiers

    • Research-article

    Conference

    ICS '11: International Conference on Supercomputing
    May 31 - June 4, 2011
    Tucson, Arizona, USA

    Acceptance Rates

    Overall Acceptance Rate 629 of 2,180 submissions, 29%
