ACOTES Project: Advanced Compiler Technologies for Embedded Streaming

Harm Munk¹,
Eduard Ayguadé⁵,
Cédric Bastoul³,
Paul Carpenter⁵,
Zbigniew Chamski¹,
Albert Cohen³,
Marco Cornero⁴,
Philippe Dumont¹,³,
Marc Duranton¹,
Mohammed Fellahi³,
Roger Ferrer⁵,
Razya Ladelsky²,
Menno Lindwer⁶,
Xavier Martorell⁵,
Cupertino Miranda³,
Dorit Nuzman²,
Andrea Ornstein⁴,
Antoniu Pop⁷,
Sebastian Pop⁸,
Louis-Noël Pouchet³,
Alex Ramírez⁵,
David Ródenas⁵,
Erven Rohou⁴,
Ira Rosen²,
Uzi Shvadron²,
Konrad Trifunović³ &
Ayal Zaks²

Abstract

Streaming applications are built of data-driven computational components that consume and produce unbounded data streams. Streaming-oriented systems have become dominant in a wide range of domains, including embedded applications and DSPs. However, programming efficiently for streaming architectures is challenging: the programmer must carefully partition the computation and map it to processes in a way that best matches the underlying streaming architecture, taking into account the distributed resources (memory, processing, real-time requirements) and communication overheads (processing and delay). These challenges have led to a number of proposed solutions that aim to improve the programmer’s productivity in developing applications that process massive streams of data on programmable, parallel embedded architectures. StreamIt is one such example. Another, more recent approach is the one developed by the ACOTES project (Advanced Compiler Technologies for Embedded Streaming). The ACOTES approach to streaming applications consists of compiler-assisted mapping of streaming tasks to highly parallel systems in order to maximize cost-effectiveness, in terms of both energy and design effort. The analysis and transformation techniques automate large parts of the partitioning and mapping process, based on the properties of the application domain, on quantitative information about the target systems, and on programmer directives. This paper presents the outcomes of the ACOTES project, a 3-year collaboration between industrial (NXP, ST, IBM, Silicon Hive, NOKIA) and academic (UPC, INRIA, MINES ParisTech) partners, and advocates the use of the advanced compiler technologies that we developed to support embedded streaming.

Author information

Authors and Affiliations

NXP Semiconductors, Eindhoven, The Netherlands
Harm Munk, Zbigniew Chamski, Philippe Dumont & Marc Duranton
IBM Haifa Research Laboratories, Haifa, Israel
Razya Ladelsky, Dorit Nuzman, Ira Rosen, Uzi Shvadron & Ayal Zaks
Alchemy Group, INRIA Saclay and LRI, Paris-Sud 11 University, Paris, France
Cédric Bastoul, Albert Cohen, Philippe Dumont, Mohammed Fellahi, Cupertino Miranda, Louis-Noël Pouchet & Konrad Trifunović
STMicroelectronics, Cornaredo, MI, Italy
Marco Cornero, Andrea Ornstein & Erven Rohou
Universitat Politècnica de Catalunya, Barcelona, Spain
Eduard Ayguadé, Paul Carpenter, Roger Ferrer, Xavier Martorell, Alex Ramírez & David Ródenas
Silicon Hive, Eindhoven, The Netherlands
Menno Lindwer
Centre de Recherche en Informatique, MINES ParisTech, Paris, France
Antoniu Pop
Compiler Performance Engineering, Advanced Micro Devices, Austin, TX, USA
Sebastian Pop

Corresponding author

Correspondence to Harm Munk.

About this article

Cite this article

Munk, H., Ayguadé, E., Bastoul, C. et al. ACOTES Project: Advanced Compiler Technologies for Embedded Streaming. Int J Parallel Prog 39, 397–450 (2011). https://doi.org/10.1007/s10766-010-0132-7

Received: 08 February 2009
Accepted: 23 February 2010
Published: 20 April 2010
Issue Date: June 2011
DOI: https://doi.org/10.1007/s10766-010-0132-7
