[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1054943.1054962acmotherconferencesArticle/Chapter ViewAbstractPublication PageswmpiConference Proceedingsconference-collections
Article

The Opie compiler from row-major source to Morton-ordered matrices

Published: 20 June 2004 Publication History

Abstract

The Opie Project aims to develop a compiler to transform C codes written for row-major matrix representation into equivalent codes for Morton-order matrix representation, and to apply its techniques to other languages. Accepting a possible reduction in performance we seek to compile a library of usable code to support future development of new algorithms better suited to Morton-ordered matrices.This paper reports the formalism behind the OPIE compiler for C, its status: now compiling several standard Level-2 and Level-3 linear algebra operations, and a demonstration of a breakthrough reflected in a huge reduction of L1, L2, TLB misses. Overall performance improves on the Intel Xeon architecture.

References

[1]
Adamczyk, S., Spicer, J., and Vandevoorde, D. Edison Design Group: Compiler front ends for the OEM market, 2002. Visited Nov. 2002. http://www.edg.com/]]
[2]
Ahmed, N., and Pingali, K. Automatic generation of block-recursive codes. In Euro-Par 2000 Parallel Processing, A. Bode, T. Ludwig, W. Karl, and R. Wismüller, Eds., vol. 1900 of Lecture Notes in Comput. Sci. Springer, Heidelberg, 2000, pp. 368--378. http://springerlink.metapress.com/link.asp7id=1lv21jg7k16mj591]]
[3]
Allen, F. E., Cocke, J., and Kennedy, K. Reduction of operator strength. In Program Flow Analysis: Theory and Applications, S. W. Muchnick and N. D. Jones, Eds. Prentice-Hall, Englewood Cliffs, NJ, 1981, ch. 3.2, pp. 79--101.]]
[4]
Andersen, B. S., Wa&sacutelniewski, J., and Gustavson, F. G. A recursive formulation of Cholesky factorization of a matrix in packed storage. ACM Trans. Math. Softw. 27, 2 (June 2001), 214--244. http://doi.acm.org/10.1145/383738.383741]]
[5]
Bilmes, J., Asanović, K., Chin, C.-W., and Demmel, J. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology. In Proc. 1997 Int. Conf. Supercomputing. ACM Press, New York, July 1997, pp. 340--347. http://doi.acm.org/10.1145/263580.263662]]
[6]
Chatterjee, S., Jain, V. V., Lebeck, A. R., Mundhra, S., and Thottethodi, M. Nonlinear array layouts for hierarchical memory systems. In Proc. 13th Int. Conf. Supercomputing. ACM Press, New York, June 1999, pp. 444--453. http://doi.acm.org/10.1145/305138.305231]]
[7]
Chatterjee, S., Lebeck, A. R., Patnala, P. K., and Thottenthodi, M. Recursive array layouts and fast parallel matrix multiplication. IEEE Trans. Parallel Distrib. Syst. 13, 11 (Nov. 2002), 1105--1123. http://www.computer.org/tpds/td2002/1l105abs.htm]]
[8]
Cragon, H. G. A historical note on binary tree. SIGARCH Comput. Archit. News 18, 4 (Dec. 1990), 3.]]
[9]
Culler, D., Karp, R., Patterson, D., Sahay, A., Schauser, K. E., Santos, E., Subramonian, R., and Von Eicken, T. LogP: a practical model of parallel computation. Commun. ACM 39, 11 (Nov. 1996), 78--85. http://doi.acm.org/10.1145/240455.240477]]
[10]
Elmroth, E., and Gustavson, F. Applying recursion to serial and parallel QR factorization leads to better performance. IBM J. Res. Develop. 44, 4 (July 2000), 605--624. http://www.research.ibm.com/journal/rd/444/elmroth.html]]
[11]
Elmroth, E., Gustavson, F., Jonsson, I., and Kgström, B. Recursive blocked algorithms and hybrid data structures for dense matrix library software. SIAM Rev. 46, 1 (Mar. 2004), 3--45. http://epubs.siam.org/sam-bin/dbq/article/42869]]
[12]
Feldman, S. I., Gay, D. M., Maimone, M. W., and Schryer, N. L. Availability of f2c---a FORTRAN to C converter. SIGPLAN Fortran Forum 10, 2 (July 1991), 14--15. http://doi.acm.org/10.1145/122006.122007]]
[13]
Frens, J. D., and Wise, D. S. Auto-blocking matrix multiplication, or tracking BLAS3 performance from source code. Proc. 6th ACM SIGPLAN Symp. on Principles and Practice of Parallel Program., SIGPLAN Not. 32, 7 (July 1997), 206--216. http://doi.acm.org/10.1145/263764.263789]]
[14]
Frens, J. D., and Wise, D. S. QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism. Proc. 9th ACM SIGPLAN Symp. on Principles and Practice of Parallel Program, SIGPLAN Not. 38, 10 (Oct. 2003), 144--154. http://doi.acm.org/10.1145/781498.781525]]
[15]
Golub, G. H., and Van Loan, C. F. Matrix Computations, third ed. The Johns Hopkins Univ. Press, Baltimore, 1996.]]
[16]
Goto, K., and Van De Geijn, R. On reducing TLB misses in matrix multiplication. FLAME Working Note 9, Univ. of Texas, Austin, Nov. 2002. http://www.cs.utexas.edu/users/flame/pubs/GOTO.ps.gz]]
[17]
INTEL CORP. Intel Math Kernel Library. Santa Clara, CA, 2003. http://www.intel.com/software/products/mk1/]]
[18]
Morton, G. M. A computer oriented geodetic data base and a new technique in file sequencing. Tech. rep., IBM Ltd., Ottawa, Ontario, Mar. 1966.]]
[19]
Orenstein, J. A., and Merrett, T. H. A class of data structures for associative searching. In Proc. 3rd ACM SIGACT-SIGMOD Symp. on Principles of Database Systems. ACM Press, New York, 1984, pp. 181--190.]]
[20]
Samet, H. The Design and Analysis of Spatial Data Structures. Addison-Wesley, Reading, MA, 1990, section 2.7.]]
[21]
Schrack, G. Finding neighbors of equal size in linear quadtrees and octrees in constant time. CVGIP: Image Underst. 55, 3 (May 1992), 221--230.]]
[22]
Tocher, K. D. The application of automatic computers to sampling experiments. J. Roy. Statist. Soc. Ser. B 16, 1 (1954), 39--61. See pp. 53--55.]]
[23]
Whaley, R. C., and Dongarra, J. J. Automatically tuned linear algebra software. In Proc. Supercomputing '98. IEEE Computer Soc. Press, Washington, DC, 1998, pp. 1--27. http://portal.acm.org./citation.cfm?id=509096]]
[24]
Wise, D. S. Ahnentafel indexing into Morton-ordered arrays, or matrix locality for free. In Euro-Par 2000 - Parallel Processing, A. Bode, T. Ludwig, W. Karl, and R. Wismüller, Eds., vol. 1900 of Lecture Notes in Comput. Sci. Springer, Heidelberg, 2000, pp. 774--883. http://www.springerlink.com/link.asp?id=0pc0e9gfk4x9j5fa]]
[25]
Wise, D. S., Citro, C., and Rainey, M. Parallel programming with Morton-ordered matrices. In preparation, 2003.]]
[26]
Wise, D. S., Frens, J. D., Gu, Y., and Alexander, G. A. Language support for Morton-order matrices. Proc. 8th ACM SIGPLAN Symp. on Principles and Practice of Parallel Program., SIGPLAN Not. 36, 7 (July 2001), 24--33. http://doi.acm.org/10.1145/379539.379559]]

Cited By

View all
  • (2024)Using Evolutionary Algorithms to Find Cache-Friendly Generalized Morton Layouts for ArraysProceedings of the 15th ACM/SPEC International Conference on Performance Engineering10.1145/3629526.3645034(83-94)Online publication date: 7-May-2024
  • (2017)Tile/line access cache memory based on a multi‐level Z‐order tiling data layoutConcurrency and Computation: Practice and Experience10.1002/cpe.437530:9Online publication date: 17-Dec-2017
  • (2007)Evaluating ISA support and hardware support for recursive data layoutsProceedings of the 14th international conference on High performance computing10.5555/1782174.1782191(95-106)Online publication date: 18-Dec-2007
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
WMPI '04: Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
June 2004
146 pages
ISBN:159593040X
DOI:10.1145/1054943
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 June 2004

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. cache
  2. paging
  3. quadtrees
  4. scientific computing

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 21 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Using Evolutionary Algorithms to Find Cache-Friendly Generalized Morton Layouts for ArraysProceedings of the 15th ACM/SPEC International Conference on Performance Engineering10.1145/3629526.3645034(83-94)Online publication date: 7-May-2024
  • (2017)Tile/line access cache memory based on a multi‐level Z‐order tiling data layoutConcurrency and Computation: Practice and Experience10.1002/cpe.437530:9Online publication date: 17-Dec-2017
  • (2007)Evaluating ISA support and hardware support for recursive data layoutsProceedings of the 14th international conference on High performance computing10.5555/1782174.1782191(95-106)Online publication date: 18-Dec-2007
  • (2007)Analyzing block locality in Morton-order and Morton-hybrid matricesACM SIGARCH Computer Architecture News10.1145/1327312.132731535:4(6-12)Online publication date: 1-Sep-2007
  • (2007)Evaluating ISA Support and Hardware Support for Recursive Data LayoutsHigh Performance Computing – HiPC 200710.1007/978-3-540-77220-0_13(95-106)Online publication date: 2007
  • (2006)Analyzing block locality in Morton-order and Morton-hybrid matricesProceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures10.1145/1166133.1166134(5-12)Online publication date: 16-Sep-2006
  • (2006)Fast additions on masked integersACM SIGPLAN Notices10.1145/1149982.114998741:5(39-45)Online publication date: 1-May-2006

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media