Towards improved parallelism through order reduction of accessing data in nD matrices

Behnam Rahnama¹

91 Accesses
Explore all metrics

Abstract

This paper encompasses the presentation of an enhanced approach with the capacity to reduce the time complexity of accessing nodes in m-dimensional matrices from \(O(n^m)\) to \(O(n\log n)\). The accomplishment of this process is attained by the serialization of nD (nD) matrices to single-dimensional arrays followed by the access of nodes accordingly. Linear representation of nD matrix data structure induces a superior parallelism of matrix calculations over dense, parallel core micro-architecture computers, including NVIDIA GPGPU Supercomputing and Intel Xeon Phi processing boards. This approach is feasibly implemented as the core of matrix data representation in Math software such as Matlab, Mathematica and Maple, in IDEs for more optimized code generation and in Parallel Computing Libraries such as CUBLAS and Magma.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Institutional subscriptions

Single Matrix Block Shift (SMBS) Dense Matrix Multiplication Algorithm

The Optimization and Parallelization of Two-Dimensional Zigzag Scanning on the Matrix

Optimized Hybrid Execution of Dense Matrix-Matrix Multiplication on Clusters of Heterogeneous Multicore and Many-Core Platforms

References

NVidia (2012) Kepler K20\(\times \) application performance technical brief. [online]. http://www.nvidia.com/docs/IO/122874/K20-and-K20X-application-performance-technical-brief.pdf
NVidia CUDA (2013) The NVIDIA CUDA basic linear algebra subroutines. [online]. https://developer.nvidia.com/cublas
Innovative Computing Laboratory (ICL) (2013) Matrix algebra on GPU and multicore architectures. [online]. http://icl.cs.utk.edu/magma/
Church PC et al (2011) Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers. In: Engineering in medicine and biology society. EMBC. IEEE, Boston, pp 924–927
Nath R, Tomov S, Dong T, Dongarra J (2011) Optimizing symmetric dense matrix-vector multiplication. High performance computing, networking, storage and analysis (SC). IEEE, Seatle, pp 1–10
Google Scholar
Brady JT, Finney DW, Hartung, Ko MA, Mendelsohn NR, Menon JM, Nowlen DR, Howard M (1998) Dynamic memory allocation that enables efficient use of buffer pool memory segments, 5784698
Kitchener DPS, Waterloo SA (2010) System and method for dynamic memory allocation. US 7685396 B2
Yusuf H, Wei-Mei C, Morris C, Bashar MG (2010) Upper bounds for dynamic memory allocation. IEEE Trans Comput 59(4):468–477
Article MathSciNet Google Scholar
Kirk D, Strosnider J (1990) SMART (strategic memory allocation for real-time) cache design using the MIPS R3000. In: Proceeding of the real-time systems symposium. IEEE, Florida, pp 322–330
Yang C-T, Wang K-C, Cheng H-Y, Kuo C-T, Chu W-C (2011) Green power management with dynamic resource allocation for cloud. In: High performance computing and communications (HPCC). IEEE, Canada, pp 726–733
Darte A, Schreiber R, Villard G (2005) Lattice-based memory allocation. IEEE Trans Comput 54(10):1242–1257
Article Google Scholar
Udayakumaran S, Barua R (2003) Compiler-decided dynamic memory allocation for scratch-pad based embedded systems. In: CASES ’03 Proceedings of the 2003 international conference on compilers, architecture and synthesis for embedded systems. ACM, California, pp 276–286
Zorn B, Grunwald D (1994) Evaluating models of memory allocation. ACM Trans Model Comput Simul (TOMACS) 4(1):107–131
Article MATH Google Scholar
McKellar AC, Coffman EG (1969) Organizing matrices and matrix operations for paged memory systems. Commun ACM 12(3):153–165
Article MATH Google Scholar
Maged MM (2004) Scalable lock-free dynamic memory allocation. Programming language design and implementation. ACM, Washington, pp 35–46
Google Scholar
Xiaohuang H, Christopher IR, Stephen J, Ian B, Wen-mei H (2010) Xmalloc: a scalable lock-free dynamic memory allocator for many-core machines. Computer and information technology. IEEE, Bradford, pp 1134–1139
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Engineering, Okan University, Istanbul, Turkey
Behnam Rahnama

Authors

Behnam Rahnama
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Behnam Rahnama.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Rahnama, B. Towards improved parallelism through order reduction of accessing data in nD matrices. J Supercomput 70, 977–986 (2014). https://doi.org/10.1007/s11227-014-1271-1

Download citation

Published: 10 September 2014
Issue Date: November 2014
DOI: https://doi.org/10.1007/s11227-014-1271-1

Towards improved parallelism through order reduction of accessing data in nD matrices

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Single Matrix Block Shift (SMBS) Dense Matrix Multiplication Algorithm

The Optimization and Parallelization of Two-Dimensional Zigzag Scanning on the Matrix

Optimized Hybrid Execution of Dense Matrix-Matrix Multiplication on Clusters of Heterogeneous Multicore and Many-Core Platforms

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Towards improved parallelism through order reduction of accessing data in nD matrices

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Single Matrix Block Shift (SMBS) Dense Matrix Multiplication Algorithm

The Optimization and Parallelization of Two-Dimensional Zigzag Scanning on the Matrix

Optimized Hybrid Execution of Dense Matrix-Matrix Multiplication on Clusters of Heterogeneous Multicore and Many-Core Platforms

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation