The Impact of Cache and Dynamic Memory Management in Static Dataflow Applications


Abstract

Dataflow is a parallel and generic model of computation that is agnostic of the underlying multi/many-core architecture executing it. State-of-the-art frameworks allow fast development of dataflow applications, providing memory, communication, and computation optimizations through design-time exploration. However, these frameworks usually do not consider cache memory behavior when generating code. A generally accepted idea is that bigger and multi-level caches improve application performance. This work evaluates this hypothesis in a broad experimental campaign covering different multi-core configurations in terms of the number of cores and cache parameters (size, sharing, controllers). The results show that bigger is not always better: the foreseen future of more cores and bigger caches does not guarantee better performance for dataflow applications without software changes. Additionally, this work investigates two memory management strategies for dataflow applications: Copy-on-Write (CoW) and Non-Temporal Memory transfers (NTM). Experimental results on state-of-the-art applications show that NTM and CoW can reduce execution time by 5.3% and 15.8%, respectively. CoW, in particular, reduces energy consumption by up to 21.8%, with an average reduction of 16.8% across 22 different cache configurations.
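The abstract does not detail how NTM transfers are implemented. As a minimal sketch, assuming NTM refers to x86 non-temporal (streaming) stores such as `MOVNTDQ`, described in Intel's Software Developer's Manual, a token copy between actor buffers could write its destination with a non-temporal hint so that a large transfer pollutes the cache less. The helper name `ntm_copy` is hypothetical, and the code falls back to a plain `memcpy` when SSE2 is unavailable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/* Hypothetical helper: copy n bytes from src to dst (buffers must not overlap)
 * using non-temporal stores for the destination when SSE2 is available. */
void ntm_copy(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
#if defined(__SSE2__)
    unsigned char *d = dst;
    const unsigned char *s = src;
    /* Head: byte copies until d is 16-byte aligned, as MOVNTDQ requires. */
    while (n > 0 && ((uintptr_t)d & 15u) != 0) {
        *d++ = *s++;
        n--;
    }
    /* Body: unaligned 16-byte loads, streaming stores with a non-temporal hint. */
    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);
        d += 16;
        s += 16;
        n -= 16;
    }
    /* Streaming stores are weakly ordered: fence before other threads read dst. */
    _mm_sfence();
    /* Tail: fewer than 16 bytes left, ordinary copy. */
    memcpy(d, s, n);
#else
    /* Portable fallback: regular (cached) copy. */
    memcpy(dst, src, n);
#endif
}
```

Whether bypassing the cache helps depends on whether the consumer actor reads the destination soon after the copy; if it does, an ordinary cached copy may be the better choice.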


Availability of Data and Material

Not applicable.

Code Availability

Not applicable.

Notes

SIGSEGV is a synchronously generated signal and is guaranteed to be delivered to the POSIX thread that caused it [22].
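Guaranteed delivery to the faulting thread is what makes SIGSEGV usable for detecting writes at user level. The following is a minimal sketch, assuming Linux, `mmap`/`mprotect`, and a single page-aligned shared buffer, of how a Copy-on-Write scheme could rely on this property; it is not necessarily the authors' implementation. After the producer fills the buffer, it is made read-only; the first write by a consumer raises SIGSEGV in the writing thread, whose handler copies the original contents to a snapshot for the other readers before re-enabling writes. All names (`cow_init`, `cow_seal`, `cow_reader_view`, `cow_shared`) are hypothetical.

```c
#define _DEFAULT_SOURCE /* MAP_ANONYMOUS and POSIX APIs on glibc */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical single-buffer CoW state (one writer, other readers). */
static unsigned char *cow_shared;        /* buffer the writing actor modifies in place */
static unsigned char *cow_snapshot;      /* pre-allocated copy kept for the other readers */
static size_t cow_len;                   /* buffer length rounded up to whole pages */
static volatile sig_atomic_t cow_copied; /* set once the snapshot has been taken */

static void cow_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)cow_shared;
    if (!cow_copied && addr >= base && addr < base + cow_len) {
        /* Runs in the faulting (writing) thread: copy the original data first... */
        memcpy(cow_snapshot, cow_shared, cow_len);
        cow_copied = 1;
        /* ...then allow writes; the faulting store is re-executed on return. */
        mprotect(cow_shared, cow_len, PROT_READ | PROT_WRITE);
        return;
    }
    /* Not a CoW fault: fall back to the default action (terminate). */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Map a page-aligned buffer of at least len bytes and install the handler. */
int cow_init(size_t len)
{
    assert(len > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    cow_len = (len + page - 1) / page * page;
    void *p = mmap(NULL, cow_len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    cow_shared = p;
    cow_snapshot = malloc(cow_len);
    if (cow_snapshot == NULL)
        return -1;
    cow_copied = 0;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = cow_handler;
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Producer done: make the buffer read-only so the first later write faults. */
int cow_seal(void)
{
    return mprotect(cow_shared, cow_len, PROT_READ);
}

/* Readers see the shared buffer until a write happened, then the snapshot. */
const unsigned char *cow_reader_view(void)
{
    return cow_copied ? cow_snapshot : cow_shared;
}
```

The copy is paid only if some actor actually writes, which is the point of Copy-on-Write. Calling `mprotect` inside a signal handler is not on POSIX's async-signal-safe list, although it is a common technique on Linux, and some systems (for example macOS) report such protection faults as SIGBUS rather than SIGSEGV.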

References

[1] Furtunato, A. F. A., Georgiou, K., Eder, K., & Xavier-De-Souza, S. (2020). When parallel speedups hit the memory wall. IEEE Access, 8, 79225–79238. https://doi.org/10.1109/ACCESS.2020.2990418
[2] Pelcat, M., Desnos, K., Heulot, J., Guy, C., Nezan, J., & Aridhi, S. (2014). PREESM: A dataflow-based rapid prototyping framework for simplifying multicore DSP programming. In: European Embedded Design in Education and Research Conference (EDERC), pp. 36–40. https://doi.org/10.1109/EDERC.2014.6924354
[3] Carlson, T. E., Heirman, W., & Eeckhout, L. (2011). Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation. In: International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp. 1–12. https://doi.org/10.1145/2063384.2063454
[4] Slingerland, N., & Smith, A. (2001). Cache Performance for Multimedia Applications. In: International Conference on Supercomputing (ICS '01), pp. 204–217. ACM, New York. https://doi.org/10.1145/377792.377833
[5] Alves, M. A. Z., Freitas, H. C., & Navaux, P. O. A. (2009). Investigation of shared L2 cache on many-core processors. In: International Conference on Architecture of Computing Systems, pp. 1–10.
[6] Garcia, V., Gomez-Luna, J., Grass, T., Rico, A., Ayguade, E., & Pena, A. (2016). Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications. In: IEEE International Symposium on Workload Characterization (IISWC), pp. 1–10. IEEE, New York. https://doi.org/10.1109/IISWC.2016.7581277
[7] Domagala, L., van Amstel, D., & Rastello, F. (2016). Generalized Cache Tiling for Dataflow Programs. In: SIGPLAN/SIGBED LCTES, pp. 52–61. ACM, New York. https://doi.org/10.1145/2907950.2907960
[8] Maghazeh, A., Chattopadhyay, S., Eles, P., & Peng, Z. (2019). Cache-Aware Kernel Tiling: An Approach for System-Level Performance Optimization of GPU-Based Applications. In: Design, Automation, and Test in Europe (DATE), pp. 570–575. IEEE, Florence. https://doi.org/10.23919/DATE.2019.8714861
[9] Stoutchinin, A., & Benini, L. (2019). StreamDrive: A dynamic dataflow framework for clustered embedded architectures. Journal of Signal Processing Systems, 91(3–4), 275–301. https://doi.org/10.1007/s11265-018-1351-1
[10] Fraguela, B. B., & Andrade, D. (2021). A software cache autotuning strategy for dataflow computing with UPC++ DepSpawn. Computational and Mathematical Methods, 1(1), 1–14. https://doi.org/10.1002/cmm4.1148
[11] Bovet, D. P., & Cesati, M. (2006). Understanding the Linux Kernel, 3rd edn., chap. 10, p. 295. O'Reilly.
[12] Intel Corporation. (2020). Intel® 64 and IA-32 Architectures Software Developer's Manual, Combined Volumes. Intel Corporation.
[13] Le, Q. T., Stern, J., & Brenner, S. (2020). Fast memcpy with SPDK and Intel® I/OAT DMA Engine. Retrieved March 15, 2021. https://software.intel.com/content/www/us/en/develop/articles/fast-memcpy-using-spdk-and-ioat-dma-engine.html
[14] Desnos, K., Pelcat, M., Nezan, J. F., & Aridhi, S. (2016). On memory reuse between inputs and outputs of dataflow actors. ACM Transactions on Embedded Computing Systems, 15(2). https://doi.org/10.1145/2871744
[15] Kurd, N., Mosalikanti, P., Neidengard, M., Douglas, J., & Kumar, R. (2009). Next generation Intel Core micro-architecture (Nehalem) clocking. IEEE Journal of Solid-State Circuits, 44(4), 1121–1129. https://doi.org/10.1109/JSSC.2009.2014023
[16] Kim, T., Sun, Z., Chen, H., Wang, H., & Tan, S. X. (2017). Energy and lifetime optimizations for dark silicon manycore microprocessor considering both hard and soft errors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25(9), 2561–2574. https://doi.org/10.1109/TVLSI.2017.2707401
[17] Rathore, V., Chaturvedi, V., Singh, A., Srikanthan, T., & Shafique, M. (2020). Longevity framework: Leveraging online integrated aging-aware hierarchical mapping and VF-selection for lifetime reliability optimization in manycore processors. IEEE Transactions on Computers, pp. 1–1. https://doi.org/10.1109/TC.2020.3006571
[18] PREESM. (2021). PREESM Applications Repository. https://github.com/preesm/preesm-apps
[19] Hamzah, R., & Ibrahim, H. (2015). Literature Survey on Stereo Vision Disparity Map Algorithms. Journal of Sensors, 16(1), 1–23. https://doi.org/10.1155/2016/8742920
[20] Lowe, D. G. (1999). Object recognition from local scale-invariant features. In: IEEE International Conference on Computer Vision (ICCV), vol. 2, pp. 1150–1157. https://doi.org/10.1109/ICCV.1999.790410
[21] Li, S., Ahn, J. H., Strong, R. D., Brockman, J. B., Tullsen, D. M., & Jouppi, N. P. (2009). McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures. In: International Symposium on Microarchitecture (MICRO), pp. 469–480. IEEE, New York, NY, USA.
[22] IEEE. (2017). IEEE Standard for Information Technology–Portable Operating System Interface (POSIX(R)) Base Specifications, Issue 7. IEEE Std 1003.1-2017, 1(1), 1–3951. https://doi.org/10.1109/IEEESTD.2018.8277153


Funding

This work is supported by the Agence Nationale de la Recherche under Grant No. ANR-17-CE24-0018. We would like to give special thanks to the PREESM and Sniper communities for actively developing the tools that provide a solid foundation for this work.

Author information

Authors and Affiliations

Univ. Bretagne-Sud, UMR CNRS 6285, Lab-STICC, Lorient, France
Alemeh Ghasemi, Marcelo Ruaro, Rodrigo Cataldo & Kevin J. M. Martin
IRL 2010, CROSSING, Adelaide, Australia
Jean-Philippe Diguet


Corresponding author

Correspondence to Marcelo Ruaro.

Ethics declarations

Conflicts of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Ghasemi, A., Ruaro, M., Cataldo, R. et al. The Impact of Cache and Dynamic Memory Management in Static Dataflow Applications. J Sign Process Syst 94, 721–738 (2022). https://doi.org/10.1007/s11265-021-01730-7


Received: 18 April 2021
Revised: 28 September 2021
Accepted: 30 November 2021
Published: 24 February 2022
Issue Date: July 2022
DOI: https://doi.org/10.1007/s11265-021-01730-7
