Memory Analysis and Optimized Allocation of Dataflow Applications on Shared-Memory MPSoCs

In-Depth Study of a Computer Vision Application

Journal of Signal Processing Systems

Abstract

Most applications, ranging from low-complexity ones to highly multifaceted entities that require dedicated hardware accelerators, are well suited to Multiprocessor Systems-on-Chips (MPSoCs). It is critical to understand the general characteristics of a given embedded application: its behavior and its requirements in terms of MPSoC resources. This paper presents a complete method for studying one important aspect of an application: its memory characteristics. The method spans from a theoretical, architecture-independent memory characterization to a quasi-optimal static memory allocation of the application on a real shared-memory MPSoC. The application is modeled as a Synchronous Dataflow (SDF) graph, from which a Memory Exclusion Graph (MEG), essential to the analysis and allocation techniques, is derived. Practical considerations, such as cache coherence and memory broadcasting, are treated extensively. Memory footprint optimization is demonstrated on a stereo matching algorithm from the computer vision domain. Experimental results show a reduction of the memory footprint by up to 43 % compared to a state-of-the-art minimization technique, a throughput improvement of 33 % over dynamic allocation, and the introduction of a tradeoff between multicore scheduling flexibility and memory footprint.
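The abstract names the Memory Exclusion Graph but does not define it. The following is a minimal sketch of the general idea under assumptions of our own, not the paper's exact algorithms. Each buffer of a hypothetical single-rate dataflow graph is a memory object. Two objects are assumed to exclude each other (they may be alive at the same time) unless a precedence path guarantees that one is consumed strictly before the other is produced. The maximum-weight clique of the MEG then gives a lower bound on the memory footprint, and a simple first-fit heuristic assigns offsets in a shared buffer. The graph, the buffer sizes, the lifetime rule, the brute-force clique search and the first-fit ordering are all illustrative choices.

```python
from itertools import combinations

# Hypothetical single-rate dataflow graph (a DAG of actors A..E).
# Each buffer: name -> (producer actor, consumer actor, size in bytes).
BUFFERS = {
    "AB": ("A", "B", 100),
    "AC": ("A", "C", 50),
    "BD": ("B", "D", 80),
    "CD": ("C", "D", 60),
    "DE": ("D", "E", 120),
}


def ancestors(buffers):
    """Map each actor to the set of actors that strictly precede it (DAG assumed)."""
    preds = {}
    for prod, cons, _ in buffers.values():
        preds.setdefault(prod, set())
        preds.setdefault(cons, set()).add(prod)
    result = {}

    def visit(actor):
        if actor not in result:
            acc = set()
            for p in preds[actor]:
                acc |= {p} | visit(p)
            result[actor] = acc
        return result[actor]

    for actor in preds:
        visit(actor)
    return result


def build_meg(buffers):
    """Exclusion edge between two buffers unless one is consumed strictly
    before the other is produced (assumed lifetime rule)."""
    anc = ancestors(buffers)
    edges = set()
    for (n1, (p1, c1, _)), (n2, (p2, c2, _)) in combinations(buffers.items(), 2):
        disjoint = c1 in anc[p2] or c2 in anc[p1]
        if not disjoint:
            edges.add(frozenset((n1, n2)))
    return edges


def max_weight_clique(buffers, edges):
    """Brute-force maximum-weight clique: a lower bound on the footprint."""
    names = list(buffers)
    best = 0
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            if all(frozenset(pair) in edges for pair in combinations(subset, 2)):
                best = max(best, sum(buffers[n][2] for n in subset))
    return best


def first_fit_allocate(buffers, edges):
    """Place buffers by decreasing size at the lowest offset that does not
    overlap any already-placed buffer they exclude."""
    offsets = {}
    for name in sorted(buffers, key=lambda n: -buffers[n][2]):
        size = buffers[name][2]
        busy = sorted(
            (offsets[o], offsets[o] + buffers[o][2])
            for o in offsets
            if frozenset((name, o)) in edges
        )
        start = 0
        for lo, hi in busy:
            if start + size <= lo:
                break  # the gap before this interval is large enough
            start = max(start, hi)
        offsets[name] = start
    footprint = max(offsets[n] + buffers[n][2] for n in offsets)
    return offsets, footprint


if __name__ == "__main__":
    meg = build_meg(BUFFERS)
    total = sum(size for _, _, size in BUFFERS.values())
    bound = max_weight_clique(BUFFERS, meg)
    offsets, footprint = first_fit_allocate(BUFFERS, meg)
    print(f"no reuse: {total}, clique lower bound: {bound}, first-fit: {footprint}")
```

On this toy graph the script reports 410 bytes without reuse, a clique lower bound of 290 and a first-fit footprint of 310. A placement reaching 290 does exist here, which illustrates why comparing a heuristic allocation against the clique bound is informative. The exponential clique search is only practical for tiny graphs; dedicated maximum-weight clique algorithms would be needed at realistic sizes.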






Author information


Corresponding author

Correspondence to Maxime Pelcat.


About this article


Cite this article

Desnos, K., Pelcat, M., Nezan, JF. et al. Memory Analysis and Optimized Allocation of Dataflow Applications on Shared-Memory MPSoCs. J Sign Process Syst 80, 19–37 (2015). https://doi.org/10.1007/s11265-014-0952-6


  • DOI: https://doi.org/10.1007/s11265-014-0952-6
