
Modeling and Analyzing Dataflow Applications on NoC-Based Many-Core Architectures

Published: 21 April 2015

Abstract

The advent of chip-level parallel architectures has prompted renewed interest in dataflow process networks. The trend is to model an application independently of the architecture and then transform the model to best fit the target architecture. One often-overlooked aspect is the mapping of communications onto the on-chip topology. The cost of such communications often dominates the cost of computations.
This article establishes a dataflow process network called the K-periodically Routed Graph (KRG), which represents the various routing decisions made while transforming an original application into an architecture-aware version of that application.


Cited By

  • (2016) A formal approach to the mapping of tasks on an heterogenous multicore, energy-aware architecture. Proceedings of the 14th ACM-IEEE International Conference on Formal Methods and Models for System Design, 153-162. DOI: 10.5555/3343414.3343436. Online publication date: 18-Nov-2016.
  • (2016) Notifying memories. Proceedings of the 53rd Annual Design Automation Conference, 1-6. DOI: 10.1145/2897937.2898051. Online publication date: 5-Jun-2016.
  • (2016) A formal approach to the mapping of tasks on an heterogenous multicore, energy-aware architecture. 2016 ACM/IEEE International Conference on Formal Methods and Models for System Design (MEMOCODE), 153-162. DOI: 10.1109/MEMCOD.2016.7797760. Online publication date: Nov-2016.

      Published In

      ACM Transactions on Embedded Computing Systems, Volume 14, Issue 3
      Special Issue on Embedded Platforms for Crypto and Regular Papers
      May 2015
      515 pages
      ISSN: 1539-9087
      EISSN: 1558-3465
      DOI: 10.1145/2764962

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 21 April 2015
      Accepted: 01 November 2014
      Revised: 01 November 2014
      Received: 01 April 2014
      Published in TECS Volume 14, Issue 3

      Author Tags

      1. Adequation algorithm architecture
      2. dataflow process network
      3. network-on-chip
      4. routing

      Qualifiers

      • Research-article
      • Research
      • Refereed
