[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ Skip to main content
Log in

Towards improved parallelism through order reduction of accessing data in nD matrices

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

This paper encompasses the presentation of an enhanced approach with the capacity to reduce the time complexity of accessing nodes in m-dimensional matrices from \(O(n^m)\) to \(O(n\log n)\). The accomplishment of this process is attained by the serialization of nD (nD) matrices to single-dimensional arrays followed by the access of nodes accordingly. Linear representation of nD matrix data structure induces a superior parallelism of matrix calculations over dense, parallel core micro-architecture computers, including NVIDIA GPGPU Supercomputing and Intel Xeon Phi processing boards. This approach is feasibly implemented as the core of matrix data representation in Math software such as Matlab, Mathematica and Maple, in IDEs for more optimized code generation and in Parallel Computing Libraries such as CUBLAS and Magma.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3

Similar content being viewed by others

References

  1. NVidia (2012) Kepler K20\(\times \) application performance technical brief. [online]. http://www.nvidia.com/docs/IO/122874/K20-and-K20X-application-performance-technical-brief.pdf

  2. NVidia CUDA (2013) The NVIDIA CUDA basic linear algebra subroutines. [online]. https://developer.nvidia.com/cublas

  3. Innovative Computing Laboratory (ICL) (2013) Matrix algebra on GPU and multicore architectures. [online]. http://icl.cs.utk.edu/magma/

  4. Church PC et al (2011) Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers. In: Engineering in medicine and biology society. EMBC. IEEE, Boston, pp 924–927

  5. Nath R, Tomov S, Dong T, Dongarra J (2011) Optimizing symmetric dense matrix-vector multiplication. High performance computing, networking, storage and analysis (SC). IEEE, Seatle, pp 1–10

    Google Scholar 

  6. Brady JT, Finney DW, Hartung, Ko MA, Mendelsohn NR, Menon JM, Nowlen DR, Howard M (1998) Dynamic memory allocation that enables efficient use of buffer pool memory segments, 5784698

  7. Kitchener DPS, Waterloo SA (2010) System and method for dynamic memory allocation. US 7685396 B2

  8. Yusuf H, Wei-Mei C, Morris C, Bashar MG (2010) Upper bounds for dynamic memory allocation. IEEE Trans Comput 59(4):468–477

    Article  MathSciNet  Google Scholar 

  9. Kirk D, Strosnider J (1990) SMART (strategic memory allocation for real-time) cache design using the MIPS R3000. In: Proceeding of the real-time systems symposium. IEEE, Florida, pp 322–330

  10. Yang C-T, Wang K-C, Cheng H-Y, Kuo C-T, Chu W-C (2011) Green power management with dynamic resource allocation for cloud. In: High performance computing and communications (HPCC). IEEE, Canada, pp 726–733

  11. Darte A, Schreiber R, Villard G (2005) Lattice-based memory allocation. IEEE Trans Comput 54(10):1242–1257

    Article  Google Scholar 

  12. Udayakumaran S, Barua R (2003) Compiler-decided dynamic memory allocation for scratch-pad based embedded systems. In: CASES ’03 Proceedings of the 2003 international conference on compilers, architecture and synthesis for embedded systems. ACM, California, pp 276–286

  13. Zorn B, Grunwald D (1994) Evaluating models of memory allocation. ACM Trans Model Comput Simul (TOMACS) 4(1):107–131

    Article  MATH  Google Scholar 

  14. McKellar AC, Coffman EG (1969) Organizing matrices and matrix operations for paged memory systems. Commun ACM 12(3):153–165

    Article  MATH  Google Scholar 

  15. Maged MM (2004) Scalable lock-free dynamic memory allocation. Programming language design and implementation. ACM, Washington, pp 35–46

    Google Scholar 

  16. Xiaohuang H, Christopher IR, Stephen J, Ian B, Wen-mei H (2010) Xmalloc: a scalable lock-free dynamic memory allocator for many-core machines. Computer and information technology. IEEE, Bradford, pp 1134–1139

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Behnam Rahnama.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Rahnama, B. Towards improved parallelism through order reduction of accessing data in nD matrices. J Supercomput 70, 977–986 (2014). https://doi.org/10.1007/s11227-014-1271-1

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-014-1271-1

Keywords

Navigation