[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2560683.2560697acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections
tutorial

Programming a Multicore Architecture without Coherency and Atomic Operations

Published: 16 October 2018 Publication History

Abstract

It is hard to reason about the state of a multicore system-on-chip, because operations on memory need multiple cycles to complete, since cores communicate via an interconnect like a network-on-chip. To simplify programming, atomicity is required, by means of atomic read-modify-write (RMW) operations, a strong memory model, and hardware cache coherency. As a result, multicore architectures are very complex, but this stems from the fact that they are designed with an imperative programming paradigm in mind, i.e. based on threads that communicate via shared memory.
In this paper, we show the impact on a multicore architecture, when the programming paradigm is changed and a λ-calculus-based (functional) language is used instead. Ordering requirements of memory operations are more relaxed and synchronization is simplified, because λ-calculus does not have a notion of state or memory, and therefore does not impose ordering requirements on the platform. We implemented a functional language for multicores with a weak memory model, without the need of hardware cache coherency, any atomic RMW operation, or mutex---the execution is atomic-free. Experiments show that even on a system with (transparently applied) software cache coherency, execution scales properly up to 32 cores. This shows that concurrent hardware complexity can be reduced by making different choices in the software layers on top.

References

[1]
G. Blake, R. Dreslinski, and T. Mudge, "A survey of multicore processors," Signal Processing Magazine, IEEE, vol. 26, pp. 26--37, 2009.
[2]
B. Choi, R. Komuravelli, H. Sung, R. Smolinski, N. Honarmand, S. Adve, V. Adve, N. Carter, and C.-T. Chou, "DeNovo: Rethinking the memory hierarchy for disciplined parallelism," in Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on, 2011, pp. 155--166.
[3]
F. Petrot, A. Greiner, and P. Gomez, "On cache coherency and memory consistency issues in NoC based shared memory multiprocessor SoC architectures," in 9th EUROMICRO Conference on Digital System Design: Architectures, Methods and Tools (DSD), 2006, pp. 53--60.
[4]
J. Howard, S. Dighe, Y. Hoskote et al., "A 48-core IA-32 message-passing processor with DVFS in 45nm CMOS," in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2010 IEEE International, 2010, pp. 108--109.
[5]
S. V. Adve and H.-J. Boehm, "Memory models: a case for rethinking parallel languages and hardware," Commun. ACM, vol. 53, pp. 90--101, 2010.
[6]
E. Lee, "The problem with threads," Computer, vol. 39, 2006.
[7]
Clojure, 2007. {Online}. Available: http://clojure.org
[8]
C. Grelck, "Shared memory multiprocessor support for sac," in Implementation of Functional Languages, ser. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 1999, vol. 1595, pp. 38--53.
[9]
S. Marlow, S. Peyton Jones, and S. Singh, "Runtime support for multicore Haskell," SIGPLAN Not., vol. 44, pp. 65--78, 2009.
[10]
S. P. Jones, A. Gordon, and S. Finne, "Concurrent Haskell," in Proceedings of the 23 rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages, vol. 21, no. 24, 1996, pp. 295--308.
[11]
M. M. T. Chakravarty, R. Leshchinskiy, S. Peyton Jones, G. Keller, and S. Marlow, "Data parallel Haskell: a status report," in Proceedings of the 2007 workshop on Declarative aspects of multicore programming, ser. DAMP '07, 2007, pp. 10--18.
[12]
House, 2006. {Online}. Available: http://programatica.cs.pdx.edu/House/
[13]
J. Armstrong, R. Virding, C. Wikström, and M. Williams, "Concurrent programming in ERLANG," 1993.
[14]
R. Loogen, Y. Ortega-Mallén, and R. Peña-Marí, "Parallel functional programming in Eden," Journal of Functional Programming, vol. 15, pp. 431--475, 2005.
[15]
K. C. Sivaramakrishnan, L. Ziarek, R. Prasad, and S. Jagannathan, "Lightweight asynchrony using parasitic threads," in Proceedings of the 5th ACM SIGPLAN workshop on Declarative aspects of multicore programming, ser. DAMP '10, 2010, pp. 63--72.
[16]
D. Culler, J. P. Singh, and A. Gupta, Parallel Computer Architecture: A Hardware/Software Approach, 1998, ISBN 1558603433.
[17]
A. Boeijink, P. K. F. Hölzenspies, and J. Kuper, "Introducing the PilGRIM: A processor for executing lazy functional languages," in Implementation and Application of Functional Languages, ser. Lecture Notes in Computer Science. Springer, 2011, vol. 6647, pp. 54--71.
[18]
A. Bhattacharjee, G. Contreras, and M. Martonosi, "Parallelization libraries: Characterizing and reducing overheads," ACM Trans. Archit. Code Optim., vol. 8, pp. 5:1--5:29, 2011. {Online}. Available: http://doi.acm.org/10.1145/1952998.1953003
[19]
J. J. Tithi, D. Matani, G. Menghani, and R. A. Chowdhury, "Avoiding locks and atomic instructions in shared-memory parallel BFS using optimistic parallelization," in Proceedings of the Workshop on Multithreaded Architectures and Applications (MTAAP 2013), 2013, pp. 1628--1637.
[20]
R. Nasre, M. Burtscher, and K. Pingali, "Atomic-free irregular computations on GPUs," in Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, ser. GPGPU-6, 2013, pp. 96--107.
[21]
J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Commun. ACM, vol. 51, pp. 107--113, 2008.
[22]
A. Church and J. B. Rosser, "Some properties of conversion," Transactions of the American Mathematical Society, vol. 39, no. 3, pp. 472--482, 1936.
[23]
T. Johnsson, "Lambda lifting: Transforming programs to recursive equations," in Functional Programming Languages and Computer Architecture, ser. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 1985, vol. 201, pp. 190--203.
[24]
T. A. Anderson, "Optimizations in a private nursery-based garbage collector," in Proceedings of the 2010 international symposium on Memory management, ser. ISMM '10, 2010, pp. 21--30.
[25]
R. Jones, A. Hosking, and E. Moss, The Garbage Collection Handbook: The Art of Automatic Memory Management. Chapman and Hall, Jan. 2012, iSBN 978-1-4200-8279-1.
[26]
F. Henderson, "Accurate garbage collection in an uncooperative environment," in Proceedings of the 3rd international symposium on Memory management, ser. ISMM '02, 2002, pp. 150--156.
[27]
L. Lamport, "A new solution of Dijkstra's concurrent programming problem," Commun. ACM, vol. 17, pp. 453--455, 1974.
[28]
J. H. Rutgers, M. J. G. Bekooij, and G. J. M. Smit, "Portable Memory Consistency for software managed distributed memory in many-core SoC," in 20th Reconfigurable Architectures Workshop (RAW 2013), 2013, pp. 212--221.
[29]
M. Herlihy, "A methodology for implementing highly concurrent data objects," ACM Trans. Program. Lang. Syst., vol. 15, pp. 745--770, 1993.
[30]
Intel, "Intel 64 architecture memory ordering white paper," SKU 318147-001, 2007.
[31]
B. Bershad, M. Zekauskas, and W. Sawdon, "The Midway distributed shared memory system," in Compcon Spring '93, Digest of Papers., 1993, pp. 528--537.
[32]
J. H. Rutgers, M. J. G. Bekooij, and G. J. M. Smit, "Evaluation of a connectionless NoC for a real-time distributed shared memory manycore system," in Proceedings of the 15th Euromicro Conference on Digital System Design, DSD 2012, 2012, pp. 727--730.
[33]
NoFib Haskell Benchmark Suite, 2010. {Online}. Available: http://darcs.haskell.org/nofib/

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PMAM'14: Proceedings of Programming Models and Applications on Multicores and Manycores
February 2014
156 pages
ISBN:9781450326575
DOI:10.1145/2578948

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 October 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. cache coherency
  2. distributed shared memory
  3. embedded system
  4. functional language
  5. memory model

Qualifiers

  • Tutorial
  • Refereed limited

Conference

PPoPP '14
Sponsor:

Acceptance Rates

Overall Acceptance Rate 53 of 97 submissions, 55%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 27
    Total Downloads
  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 12 Dec 2024

Other Metrics

Citations

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media