More Web Proxy on the site http://driver.im/

tutorial

Programming a Multicore Architecture without Coherency and Atomic Operations

Authors:

Jochem H. Rutgers,

Marco J. G. Bekooij,

Gerard J. M. SmitAuthors Info & Claims

PMAM'14: Proceedings of Programming Models and Applications on Multicores and Manycores

Pages 29 - 38

https://doi.org/10.1145/2560683.2560697

Published: 16 October 2018 Publication History

Abstract

It is hard to reason about the state of a multicore system-on-chip, because operations on memory need multiple cycles to complete, since cores communicate via an interconnect like a network-on-chip. To simplify programming, atomicity is required, by means of atomic read-modify-write (RMW) operations, a strong memory model, and hardware cache coherency. As a result, multicore architectures are very complex, but this stems from the fact that they are designed with an imperative programming paradigm in mind, i.e. based on threads that communicate via shared memory.

In this paper, we show the impact on a multicore architecture, when the programming paradigm is changed and a λ-calculus-based (functional) language is used instead. Ordering requirements of memory operations are more relaxed and synchronization is simplified, because λ-calculus does not have a notion of state or memory, and therefore does not impose ordering requirements on the platform. We implemented a functional language for multicores with a weak memory model, without the need of hardware cache coherency, any atomic RMW operation, or mutex---the execution is atomic-free. Experiments show that even on a system with (transparently applied) software cache coherency, execution scales properly up to 32 cores. This shows that concurrent hardware complexity can be reduced by making different choices in the software layers on top.

References

[1]

G. Blake, R. Dreslinski, and T. Mudge, "A survey of multicore processors," Signal Processing Magazine, IEEE, vol. 26, pp. 26--37, 2009.

[2]

B. Choi, R. Komuravelli, H. Sung, R. Smolinski, N. Honarmand, S. Adve, V. Adve, N. Carter, and C.-T. Chou, "DeNovo: Rethinking the memory hierarchy for disciplined parallelism," in Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on, 2011, pp. 155--166.

Digital Library

[3]

F. Petrot, A. Greiner, and P. Gomez, "On cache coherency and memory consistency issues in NoC based shared memory multiprocessor SoC architectures," in 9th EUROMICRO Conference on Digital System Design: Architectures, Methods and Tools (DSD), 2006, pp. 53--60.

Digital Library

[4]

J. Howard, S. Dighe, Y. Hoskote et al., "A 48-core IA-32 message-passing processor with DVFS in 45nm CMOS," in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2010 IEEE International, 2010, pp. 108--109.

[5]

S. V. Adve and H.-J. Boehm, "Memory models: a case for rethinking parallel languages and hardware," Commun. ACM, vol. 53, pp. 90--101, 2010.

Digital Library

[6]

E. Lee, "The problem with threads," Computer, vol. 39, 2006.

Digital Library

[7]

Clojure, 2007. {Online}. Available: http://clojure.org

[8]

C. Grelck, "Shared memory multiprocessor support for sac," in Implementation of Functional Languages, ser. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 1999, vol. 1595, pp. 38--53.

Digital Library

[9]

S. Marlow, S. Peyton Jones, and S. Singh, "Runtime support for multicore Haskell," SIGPLAN Not., vol. 44, pp. 65--78, 2009.

Digital Library

[10]

S. P. Jones, A. Gordon, and S. Finne, "Concurrent Haskell," in Proceedings of the 23 rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages, vol. 21, no. 24, 1996, pp. 295--308.

Digital Library

[11]

M. M. T. Chakravarty, R. Leshchinskiy, S. Peyton Jones, G. Keller, and S. Marlow, "Data parallel Haskell: a status report," in Proceedings of the 2007 workshop on Declarative aspects of multicore programming, ser. DAMP '07, 2007, pp. 10--18.

Digital Library

[12]

House, 2006. {Online}. Available: http://programatica.cs.pdx.edu/House/

[13]

J. Armstrong, R. Virding, C. Wikström, and M. Williams, "Concurrent programming in ERLANG," 1993.

Digital Library

[14]

R. Loogen, Y. Ortega-Mallén, and R. Peña-Marí, "Parallel functional programming in Eden," Journal of Functional Programming, vol. 15, pp. 431--475, 2005.

Digital Library

[15]

K. C. Sivaramakrishnan, L. Ziarek, R. Prasad, and S. Jagannathan, "Lightweight asynchrony using parasitic threads," in Proceedings of the 5th ACM SIGPLAN workshop on Declarative aspects of multicore programming, ser. DAMP '10, 2010, pp. 63--72.

Digital Library

[16]

D. Culler, J. P. Singh, and A. Gupta, Parallel Computer Architecture: A Hardware/Software Approach, 1998, ISBN 1558603433.

Digital Library

[17]

A. Boeijink, P. K. F. Hölzenspies, and J. Kuper, "Introducing the PilGRIM: A processor for executing lazy functional languages," in Implementation and Application of Functional Languages, ser. Lecture Notes in Computer Science. Springer, 2011, vol. 6647, pp. 54--71.

Digital Library

[18]

A. Bhattacharjee, G. Contreras, and M. Martonosi, "Parallelization libraries: Characterizing and reducing overheads," ACM Trans. Archit. Code Optim., vol. 8, pp. 5:1--5:29, 2011. {Online}. Available: http://doi.acm.org/10.1145/1952998.1953003

Digital Library

[19]

J. J. Tithi, D. Matani, G. Menghani, and R. A. Chowdhury, "Avoiding locks and atomic instructions in shared-memory parallel BFS using optimistic parallelization," in Proceedings of the Workshop on Multithreaded Architectures and Applications (MTAAP 2013), 2013, pp. 1628--1637.

Digital Library

[20]

R. Nasre, M. Burtscher, and K. Pingali, "Atomic-free irregular computations on GPUs," in Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, ser. GPGPU-6, 2013, pp. 96--107.

Digital Library

[21]

J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Commun. ACM, vol. 51, pp. 107--113, 2008.

Digital Library

[22]

A. Church and J. B. Rosser, "Some properties of conversion," Transactions of the American Mathematical Society, vol. 39, no. 3, pp. 472--482, 1936.

[23]

T. Johnsson, "Lambda lifting: Transforming programs to recursive equations," in Functional Programming Languages and Computer Architecture, ser. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 1985, vol. 201, pp. 190--203.

Digital Library

[24]

T. A. Anderson, "Optimizations in a private nursery-based garbage collector," in Proceedings of the 2010 international symposium on Memory management, ser. ISMM '10, 2010, pp. 21--30.

Digital Library

[25]

R. Jones, A. Hosking, and E. Moss, The Garbage Collection Handbook: The Art of Automatic Memory Management. Chapman and Hall, Jan. 2012, iSBN 978-1-4200-8279-1.

Digital Library

[26]

F. Henderson, "Accurate garbage collection in an uncooperative environment," in Proceedings of the 3rd international symposium on Memory management, ser. ISMM '02, 2002, pp. 150--156.

Digital Library

[27]

L. Lamport, "A new solution of Dijkstra's concurrent programming problem," Commun. ACM, vol. 17, pp. 453--455, 1974.

Digital Library

[28]

J. H. Rutgers, M. J. G. Bekooij, and G. J. M. Smit, "Portable Memory Consistency for software managed distributed memory in many-core SoC," in 20th Reconfigurable Architectures Workshop (RAW 2013), 2013, pp. 212--221.

Digital Library

[29]

M. Herlihy, "A methodology for implementing highly concurrent data objects," ACM Trans. Program. Lang. Syst., vol. 15, pp. 745--770, 1993.

Digital Library

[30]

Intel, "Intel 64 architecture memory ordering white paper," SKU 318147-001, 2007.

[31]

B. Bershad, M. Zekauskas, and W. Sawdon, "The Midway distributed shared memory system," in Compcon Spring '93, Digest of Papers., 1993, pp. 528--537.

[32]

J. H. Rutgers, M. J. G. Bekooij, and G. J. M. Smit, "Evaluation of a connectionless NoC for a real-time distributed shared memory manycore system," in Proceedings of the 15th Euromicro Conference on Digital System Design, DSD 2012, 2012, pp. 727--730.

Digital Library

[33]

NoFib Haskell Benchmark Suite, 2010. {Online}. Available: http://darcs.haskell.org/nofib/

Index Terms

Programming a Multicore Architecture without Coherency and Atomic Operations
1. Computer systems organization
  1. Architectures
    1. Other architectures
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Functional languages

Recommendations

Programming a Multicore Architecture without Coherency and Atomic Operations
PMAM'14: Proceedings of Programming Models and Applications on Multicores and Manycores

It is hard to reason about the state of a multicore system-on-chip, because operations on memory need multiple cycles to complete, since cores communicate via an interconnect like a network-on-chip. To simplify programming, atomicity is required, by ...
Simplicity Versus Accuracy in a Model of Cache Coherency Overhead

The important factors building a model of coherency overhead for a single-bus, shared memory multiprocessor are analyzed. Three architectural features are examined: the size of the coherency block, the cache size, and the type of bus operation used to ...
Design and performance evaluation of an expendable modular directory scheme for maintaining cache coherency in multiprocessor systems
SSST '95: Proceedings of the 27th Southeastern Symposium on System Theory (SSST'95)

In multiprocessor systems with private caches, inconsistencies between blocks contained within separate catches can occur. A scheme must be devised to eliminate data inconsistencies and maintain coherency within the caches. One category of schemes for ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

PMAM'14: Proceedings of Programming Models and Applications on Multicores and Manycores

February 2014

156 pages

ISBN:9781450326575

DOI:10.1145/2578948

Conference Chairs:
Pavan Balaji
Argonne National Laboratory, USA
,
Minyi Guo
Shanghai Jiao Tong, University, China
,
Zhiyi Huang
University of Otago, New Zealand

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 October 2018

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Tutorial
Refereed limited

Conference

PPoPP '14

Sponsor:

SIGPLAN

PPoPP '14: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 15 - 19, 2014

FL, Orlando, USA

Acceptance Rates

Overall Acceptance Rate 53 of 97 submissions, 55%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
27
Total Downloads

Downloads (Last 12 months)3
Downloads (Last 6 weeks)0

Reflects downloads up to 12 Dec 2024

Other Metrics

View Author Metrics

Citations

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents