Abstract
This paper encompasses the presentation of an enhanced approach with the capacity to reduce the time complexity of accessing nodes in m-dimensional matrices from \(O(n^m)\) to \(O(n\log n)\). The accomplishment of this process is attained by the serialization of nD (nD) matrices to single-dimensional arrays followed by the access of nodes accordingly. Linear representation of nD matrix data structure induces a superior parallelism of matrix calculations over dense, parallel core micro-architecture computers, including NVIDIA GPGPU Supercomputing and Intel Xeon Phi processing boards. This approach is feasibly implemented as the core of matrix data representation in Math software such as Matlab, Mathematica and Maple, in IDEs for more optimized code generation and in Parallel Computing Libraries such as CUBLAS and Magma.
Similar content being viewed by others
References
NVidia (2012) Kepler K20\(\times \) application performance technical brief. [online]. http://www.nvidia.com/docs/IO/122874/K20-and-K20X-application-performance-technical-brief.pdf
NVidia CUDA (2013) The NVIDIA CUDA basic linear algebra subroutines. [online]. https://developer.nvidia.com/cublas
Innovative Computing Laboratory (ICL) (2013) Matrix algebra on GPU and multicore architectures. [online]. http://icl.cs.utk.edu/magma/
Church PC et al (2011) Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers. In: Engineering in medicine and biology society. EMBC. IEEE, Boston, pp 924–927
Nath R, Tomov S, Dong T, Dongarra J (2011) Optimizing symmetric dense matrix-vector multiplication. High performance computing, networking, storage and analysis (SC). IEEE, Seatle, pp 1–10
Brady JT, Finney DW, Hartung, Ko MA, Mendelsohn NR, Menon JM, Nowlen DR, Howard M (1998) Dynamic memory allocation that enables efficient use of buffer pool memory segments, 5784698
Kitchener DPS, Waterloo SA (2010) System and method for dynamic memory allocation. US 7685396 B2
Yusuf H, Wei-Mei C, Morris C, Bashar MG (2010) Upper bounds for dynamic memory allocation. IEEE Trans Comput 59(4):468–477
Kirk D, Strosnider J (1990) SMART (strategic memory allocation for real-time) cache design using the MIPS R3000. In: Proceeding of the real-time systems symposium. IEEE, Florida, pp 322–330
Yang C-T, Wang K-C, Cheng H-Y, Kuo C-T, Chu W-C (2011) Green power management with dynamic resource allocation for cloud. In: High performance computing and communications (HPCC). IEEE, Canada, pp 726–733
Darte A, Schreiber R, Villard G (2005) Lattice-based memory allocation. IEEE Trans Comput 54(10):1242–1257
Udayakumaran S, Barua R (2003) Compiler-decided dynamic memory allocation for scratch-pad based embedded systems. In: CASES ’03 Proceedings of the 2003 international conference on compilers, architecture and synthesis for embedded systems. ACM, California, pp 276–286
Zorn B, Grunwald D (1994) Evaluating models of memory allocation. ACM Trans Model Comput Simul (TOMACS) 4(1):107–131
McKellar AC, Coffman EG (1969) Organizing matrices and matrix operations for paged memory systems. Commun ACM 12(3):153–165
Maged MM (2004) Scalable lock-free dynamic memory allocation. Programming language design and implementation. ACM, Washington, pp 35–46
Xiaohuang H, Christopher IR, Stephen J, Ian B, Wen-mei H (2010) Xmalloc: a scalable lock-free dynamic memory allocator for many-core machines. Computer and information technology. IEEE, Bradford, pp 1134–1139
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Rahnama, B. Towards improved parallelism through order reduction of accessing data in nD matrices. J Supercomput 70, 977–986 (2014). https://doi.org/10.1007/s11227-014-1271-1
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-014-1271-1