Abstract
The majority of applications, ranging from the low complexity to very multifaceted entities requiring dedicated hardware accelerators, are very well suited for Multiprocessor Systems-on-Chips (MPSoCs). It is critical to understand the general characteristics of a given embedded application: its behavior and its requirements in terms of MPSoC resources. This paper presents a complete method to study the important aspect of memory characteristic of an application. This method spans the theoretical, architecture-independent memory characterization to the quasi optimal static memory allocation of an application on a real shared-memory MPSoCs. The application is modeled as an Synchronous Dataflow (SDF) graph which is used to derive a Memory Exclusion Graph (MEG) essential for the analysis and allocation techniques. Practical considerations, such as cache coherence and memory broadcasting, are extensively treated. Memory footprint optimization is demonstrated using the example of a stereo matching algorithm from the computer vision domain. Experimental results show a reduction of the memory footprint by up to 43 % compared to a state-of-the-art minimization technique, a throughput improvement of 33 % over dynamic allocation, and the introduction of a tradeoff between multicore scheduling flexibility and memory footprint.
Similar content being viewed by others
References
Arndt, O., Becker, D., Banz, C., Blume, H. (2013). Parallel implementation of real-time semi-global matching on embedded multi-core architectures. In Embedded computer systems: architectures, modeling, and simulation (SAMOS XIII).
Benazouz, M., Marchetti, O., Munier-Kordon, A., Urard, P. (2010). A new approach for minimizing buffer capacities with throughput constraint for embedded system design. In Computer systems and applications (AICCSA), 2010 IEEE/ACS.
Bodin, B., Munier-Kordon, A., de Dinechin, B. (2012). K-periodic schedules for evaluating the maximum throughput of a synchronous dataflow graph. In Embedded computer systems (SAMOS).
Bouchard, M., Angalović, M., Hertz, A. About equivalent interval colorings of weighted graphs. Discrete Appl. Math. doi:10.1016/j.dam.2009.04.015.
Boutellier, J. (2009). Quasi-static scheduling for fine-grained embedded multiprocessing. Ph.D.thesis.
Desnos, K., Pelcat, M., Nezan, J., Aridhi, S. (2012). Memory bounds for the distributed execution of a hierarchical synchronous data-flow graph. In International conference on embedded computer systems (SAMOS).
Desnos, K., Pelcat, M., Nezan, J.F., Aridhi, S. (2013). Pre-and post-scheduling memory allocation strategies on mpsocs. In Electronic system level synthesis conference (ESLsyn).
Desnos, K., & Zhang, J. (2013). Preesm project - stereo matching. svn://svn.code.sf.net/p/preesm/code/trunk/tests/stereo.
El Assad, S., & Noura, H. (2013). Generator of chaotic sequences and corresponding generating system. EP Patent App. EP20,110,720,313. http://www.google.com/patents/EP2553567A1?cl=en.
Electronic Systems Group TU Eindhoven (2013). Sdf for free (sdf3). http://www.es.ele.tue.nl/sdf3/.
Embedded Vision Alliance (2013). Embedded vision alliance. http://www.embedded-vision.com.
Fabri, J. (1979). Automatic storage optimization. Courant Institute of Mathematical Sciences, New York University.
Fischaber, S., Woods, R., McAllister, J. (2007). Soc memory hierarchy derivation from dataflow graphs. In IEEE workshop on signal processing systems (pp. 469–474). doi: 10.1109/SIPS.2007.4387593 10.1109/SIPS.2007.4387593.
Greef, E.D., Catthoor, F., Man, H.D. (1997). Array placement for storage size reduction in embedded multimedia systems. ASAP.
Intel (2013). i7-3610qm processor product page. http://ark.intel.com/products/64899/.
Johnson, D.S. (1973). Near-optimal bin packing algorithms. Ph.D. thesis, Massachusetts Institute of Technology.
Kalray (2013). Many-core processors – dataflow. http://www.kalray.eu/technology/dataflow/.
Lee, E., & Messerschmitt, D. (1987). Synchronous data flow. Proceedings of the IEEE, 75(9), 1235–1245. doi:10.1109/PROC.1987.13876
Lee, E.A., & Parks, T.M. (1995). Dataflow process networks. Proceedings of the IEEE, 83(5), 773–801.
Malamas, E.N., Petrakis, E.G., Zervakis, M., Petit, L., Legat, J.D. (2003). A survey on industrial vision systems, applications and tools. Image and Vision Computing, 21(2), 171–188.
Murthy, P., & Bhattacharyya, S. (2000). Shared memory implementations of synchronous dataflow specifications. In Proceedings of the design, automation and test in Europe conference and exhibition.
Murthy, P.K.,& Bhattacharyya, S.S. (2010). Memory management for synthesis of DSP software. CRC Press.
Östergård, P.R.J. (2001). A new algorithm for the maximum-weight clique problem. Nordic Journal of Computing, 8(4), 424–436.
Parks, T.M. (1995). Bounded scheduling of process networks. Ph.D. thesis, University of California.
Pelcat, M., Aridhi, S., Piat, J., Nezan, J.F. (2012). Physical layer multi-core prototyping: a dataflow-based approach for LTE eNodeB. Springer.
Pelcat, M., Nezan, J.F., Piat, J., Croizer, J., Aridhi, S. (2009). A System-Level architecture model for rapid prototyping of heterogeneous multicore embedded systems. DASIP.
Roy, S. (1999). Stereo without epipolar lines: a maximum-flow formulation. International Journal of Computer Vision, 34(2–3), 147–161.
Sriram, S., & Bhattacharyya, S.S. (2009). Embedded multiprocessors: scheduling and synchronization, 2nd Edn. Boca Raton, FL: CRC Press, Inc.
Stuijk, S., Geilen, M., Basten, T. (2006). Exploring trade-offs in buffer requirements and throughput constraints for synchronous dataflow graphs. In Proceedings of the 43rd annual design automation conference.
Szeliski, R., & Zabih, R. (2000). An experimental comparison of stereo algorithms:Vision algorithms: theory and practice. In Vision algorithms: theory and practice, pp. 1–19. Springer.
Szymanek, R., & Kuchcinski, K. (2001). A constructive algorithm for memory-aware task assignment and scheduling. In CODES Proceedings.
Texas Instruments. Tms320c6678 product page. http://www.ti.com/product/tms320c6678.
Urban, F., Raulet,M., Nezan, J.F., Déforges, O. (2006). Automatic dsp cache memory management and fast prototyping for multi-processor image applications. In 14th European signal processing conference. Eusipco.
Wagner, D. (2007). Handheld augmented reality. Ph.D.thesis.
Wulf, W.A., & McKee, S.A. (1995). Hitting the memory wall: implications of the obvious. ACM SIGARCH Computer Architecture News, 23(1), 20–24.
Yamaguchi, K., & Masuda, S. (2008). A new exact algorithm for the maximum weight clique problem. In 23rd international conference on circuit/systems, computers and communications (ITC-CSCC’08).
Zhang, J., Nezan, J.F., Pelcat, M., Cousin, J.G. (2013). Real-time gpu-based local stereo matching method. In IEEE conference on design and architectures for signal and image processing (DASIP).
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Desnos, K., Pelcat, M., Nezan, JF. et al. Memory Analysis and Optimized Allocation of Dataflow Applications on Shared-Memory MPSoCs. J Sign Process Syst 80, 19–37 (2015). https://doi.org/10.1007/s11265-014-0952-6
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11265-014-0952-6