Abstract
No abstract available.
Cited By
- Mohanty S, Bhasi V, Son M, Kandemir M and Das C FAAStloop: Optimizing Loop-Based Applications for Serverless Computing Proceedings of the 2024 ACM Symposium on Cloud Computing, (943-960)
- Liu S, Zhang Z and Wu W (2023). DHTS: A Dynamic Hybrid Tiling Strategy for Optimizing Stencil Computation on GPUs, IEEE Transactions on Computers, 72:10, (2795-2807), Online publication date: 1-Oct-2023.
- Reber B, Gould M, Kneipp A, Liu F, Prechtl I, Ding C, Chen L and Patru D (2023). Cache Programming for Scientific Loops Using Leases, ACM Transactions on Architecture and Code Optimization, 20:3, (1-25), Online publication date: 30-Sep-2023.
- Gondimalla A, Liu J, Thottethodi M and Vijaykumar T (2023). Occam: Optimal Data Reuse for Convolutional Neural Networks, ACM Transactions on Architecture and Code Optimization, 20:1, (1-25), Online publication date: 31-Mar-2023.
- Saulo H, Balakrishnan N and Vila R (2023). On a quantile autoregressive conditional duration model, Mathematics and Computers in Simulation, 203:C, (425-448), Online publication date: 1-Jan-2023.
- Susungi A and Tadonki C (2021). Intermediate Representations for Explicitly Parallel Programs, ACM Computing Surveys, 54:5, (1-24), Online publication date: 30-Jun-2022.
- Yuan L, Cao H, Zhang Y, Li K, Lu P and Yue Y Temporal vectorization for stencils Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, (1-13)
- Olivry A, Iooss G, Tollenaere N, Rountev A, Sadayappan P and Rastello F IOOpt: automatic derivation of I/O complexity bounds for affine programs Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, (1187-1202)
- Şuşu A (2020). A Vector-Length Agnostic Compiler for the Connex-S Accelerator with Scratchpad Memory, ACM Transactions on Embedded Computing Systems, 19:6, (1-30), Online publication date: 30-Nov-2020.
- Wu M, Liu Y, Cui H, Wei Q, Li Q, Li L, Lv F, Xue J and Feng X Bandwidth-Aware Loop Tiling for DMA-Supported Scratchpad Memory Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, (97-109)
- Palkowski M and Bielecki W (2022). Parallel tiled cache and energy efficient codes for O ( n 4 ) RNA folding algorithms, Journal of Parallel and Distributed Computing, 137:C, (252-258), Online publication date: 1-Mar-2020.
- Zarei Zefreh E, Lotfi S, Mohammad Khanli L and Karimpour J (2019). Topology and computational-power aware tile mapping of perfectly nested loops with dependencies on distributed systems, Journal of Parallel and Distributed Computing, 129:C, (14-35), Online publication date: 1-Jul-2019.
- Bielecki W and Skotnicki P (2019). Insight into tiles generated by means of a correction technique, The Journal of Supercomputing, 75:5, (2665-2690), Online publication date: 1-May-2019.
- Zhang F and Xue J (2018). Poker, ACM Transactions on Architecture and Code Optimization, 15:4, (1-28), Online publication date: 8-Jan-2019.
- Su X, Liao X, Jiang H, Yang C and Xue J (2018). SCP, ACM Transactions on Architecture and Code Optimization, 15:4, (1-21), Online publication date: 31-Dec-2019.
- Seyfari Y, Lotfi S and Karimpour J (2018). Optimizing inter-nest data locality in imperfect stencils based on loop blocking, The Journal of Supercomputing, 74:10, (5432-5460), Online publication date: 1-Oct-2018.
- Seyfari Y, Lotfi S and Karimpour J (2018). Optimizing inter-nest data locality in imperfect stencils based on loop blocking, The Journal of Supercomputing, 74:10, (5432-5460), Online publication date: 1-Oct-2018.
- Zhao J, Cui H, Zhang Y, Xue J and Feng X Revisiting Loop Tiling for Datacenters Proceedings of the 2018 International Conference on Supercomputing, (328-340)
- Bielecki W, Palkowski M and Skotnicki P (2018). Generation of parallel synchronization-free tiled code, Computing, 100:3, (277-302), Online publication date: 1-Mar-2018.
- Zhang F and Xue J Poker: permutation-based SIMD execution of intensive tree search by path encoding Proceedings of the 2018 International Symposium on Code Generation and Optimization, (87-99)
- Cui Y, Liu S, Zou N and Wu W A Dynamic Parallel Strategy for DOACROSS Loops Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, (108-115)
- Sampaio D, Pouchet L and Rastello F Simplification and runtime resolution of data dependence constraints for loop transformations Proceedings of the International Conference on Supercomputing, (1-11)
- Su X, Liao X and Xue J Automatic generation of fast BLAS3-GEMM: a portable compiler approach Proceedings of the 2017 International Symposium on Code Generation and Optimization, (122-133)
- Vanderbruggen T, Cavazos J, Liao C and Quinlan D Directive-based tile abstraction to distribute loops on accelerators Proceedings of the General Purpose GPUs, (53-62)
- Zhou H and Xue J (2016). A Compiler Approach for Exploiting Partial SIMD Parallelism, ACM Transactions on Architecture and Code Optimization, 13:1, (1-26), Online publication date: 5-Apr-2016.
- Darte A, Isoard A and Yuki T Extended lattice-based memory allocation Proceedings of the 25th International Conference on Compiler Construction, (218-228)
- Pananilath I, Acharya A, Vasista V and Bondhugula U (2015). An Optimizing Code Generator for a Class of Lattice-Boltzmann Computations, ACM Transactions on Architecture and Code Optimization, 12:2, (1-23), Online publication date: 8-Jul-2015.
- Zou Y and Rajopadhye S Automatic Energy Efficient Parallelization of Uniform Dependence Computations Proceedings of the 29th ACM on International Conference on Supercomputing, (373-382)
- Mullapudi R, Vasista V and Bondhugula U (2015). PolyMage, ACM SIGARCH Computer Architecture News, 43:1, (429-443), Online publication date: 29-May-2015.
- Mullapudi R, Vasista V and Bondhugula U (2015). PolyMage, ACM SIGPLAN Notices, 50:4, (429-443), Online publication date: 12-May-2015.
- Mullapudi R, Vasista V and Bondhugula U PolyMage Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, (429-443)
- Gheolbănoiu A, Petrică L and Coţofană S Hybrid adaptive clock management for FPGA processor acceleration Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, (1359-1364)
- Feld D, Soddemann T, Jünger M and Mallach S Hardware-Aware Automatic Code-Transformation to Support Compilers in Exploiting the Multi-Level Parallel Potential of Modern CPUs Proceedings of the 2015 International Workshop on Code Optimisation for Multi and Many Cores, (1-10)
- Bondhugula U, Bandishti V, Cohen A, Potron G and Vasilache N Tiling and optimizing time-iterated computations on periodic domains Proceedings of the 23rd international conference on Parallel architectures and compilation, (39-50)
- Hannig F, Lari V, Boppu S, Tanase A and Reiche O (2014). Invasive Tightly-Coupled Processor Arrays, ACM Transactions on Embedded Computing Systems, 13:4s, (1-29), Online publication date: 1-Jul-2014.
- Bondhugula U Compiling affine loop nests for distributed-memory parallel architectures Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, (1-12)
- Alias C, Darte A and Plesco A Optimizing remote accesses for offloaded kernels Proceedings of the Conference on Design, Automation and Test in Europe, (575-580)
- Bathen L, Ahn Y, Pasricha S and Dutt N (2013). MultiMaKe, ACM Transactions on Embedded Computing Systems, 12:1s, (1-25), Online publication date: 1-Mar-2013.
- Bao B and Ding C Defensive loop tiling for shared cache Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), (1-11)
- Cui H, Yi Q, Xue J and Feng X (2013). Layout-oblivious compiler optimization for matrix computations, ACM Transactions on Architecture and Code Optimization, 9:4, (1-20), Online publication date: 1-Jan-2013.
- Bandishti V, Pananilath I and Bondhugula U Tiling stencil computations to maximize parallelism Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, (1-11)
- Cui H, Yi Q, Xue J and Feng X Layout-oblivious optimization for matrix computations Proceedings of the 21st international conference on Parallel architectures and compilation techniques, (429-430)
- Alias C, Darte A and Plesco A (2012). Optimizing remote accesses for offloaded kernels, ACM SIGPLAN Notices, 47:8, (285-286), Online publication date: 11-Sep-2012.
- Cui H, Xue J, Wang L, Yang Y, Feng X and Fan D (2012). Extendable pattern-oriented optimization directives, ACM Transactions on Architecture and Code Optimization, 9:3, (1-37), Online publication date: 1-Sep-2012.
- Tadonki C, Lacassagne L, Dadi E and El Daoudi M Accelerator-Based implementation of the harris algorithm Proceedings of the 5th international conference on Image and Signal Processing, (485-492)
- Renganarayanan L, Kim D, Strout M and Rajopadhye S (2012). Parameterized loop tiling, ACM Transactions on Programming Languages and Systems, 34:1, (1-41), Online publication date: 1-Apr-2012.
- Shirako J, Sharma K, Fauzia N, Pouchet L, Ramanujam J, Sadayappan P and Sarkar V Analytical bounds for optimal tile size selection Proceedings of the 21st international conference on Compiler Construction, (101-121)
- Alias C, Darte A and Plesco A Optimizing remote accesses for offloaded kernels Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, (285-286)
- Di P and Xue J Model-driven tile size selection for DOACROSS loops on GPUs Proceedings of the 17th international conference on Parallel processing - Volume Part II, (401-412)
- Cui H, Xue J, Wang L, Yang Y, Feng X and Fan D Extendable pattern-oriented optimization directives Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization, (107-118)
- Alias C, Pasca B and Plesco A Automatic generation of fpga-specific pipelined accelerators Proceedings of the 7th international conference on Reconfigurable computing: architectures, tools and applications, (53-66)
- Li L, Xue J and Knoop J (2011). Scratchpad memory allocation for data aggregates via interval coloring in superperfect graphs, ACM Transactions on Embedded Computing Systems, 10:2, (1-42), Online publication date: 1-Dec-2010.
- Corvino R, Gamatié A and Boulet P Architecture exploration for efficient data transfer and storage in data-parallel applications Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I, (101-116)
- Smyk A and Tudruj M Streaming model computation of the FDTD problem Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume Part I, (184-192)
- Baskaran M, Hartono A, Tavarageri S, Henretty T, Ramanujam J and Sadayappan P Parameterized tiling revisited Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization, (200-209)
- Yuki T, Renganarayanan L, Rajopadhye S, Anderson C, Eichenberger A and O'Brien K Automatic creation of tile size selection models Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization, (190-199)
- Renganarayana L, Bondhugula U, Derisavi S, Eichenberger A and O'Brien K Compact multi-dimensional kernel extraction for register tiling Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, (1-12)
- Zhang Y, Kandemir M, Pitsianis N and Sun X Exploring parallelization strategies for NUFFT data translation Proceedings of the seventh ACM international conference on Embedded software, (187-196)
- Kim D and Rajopadhye S Efficient tiled loop generation Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing, (293-307)
- Smyk A and Tudruj M Optimization of FDTD computations in a streaming model architecture Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I, (547-556)
- Hartono A, Baskaran M, Bastoul C, Cohen A, Krishnamoorthy S, Norris B, Ramanujam J and Sadayappan P Parametric multi-level tiling of imperfectly nested loops Proceedings of the 23rd international conference on Supercomputing, (147-157)
- Renganarayana L and Rajopadhye S Positivity, posynomials and tile size selection Proceedings of the 2008 ACM/IEEE conference on Supercomputing, (1-12)
- Andronikos T, Ciorba F, Theodoropoulos P, Kamenopoulos D and Papakonstantinou G (2008). Cronus, Journal of Systems and Software, 81:8, (1389-1405), Online publication date: 1-Aug-2008.
- Wang L, Yang X, Xue J, Deng Y, Yan X, Tang T and Nguyen Q (2008). Optimizing scientific application loops on stream processors, ACM SIGPLAN Notices, 43:7, (161-170), Online publication date: 27-Jun-2008.
- Wang L, Yang X, Xue J, Deng Y, Yan X, Tang T and Nguyen Q Optimizing scientific application loops on stream processors Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems, (161-170)
- Bondhugula U, Hartono A, Ramanujam J and Sadayappan P A practical automatic polyhedral parallelizer and locality optimizer Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, (101-113)
- Bondhugula U, Hartono A, Ramanujam J and Sadayappan P (2008). A practical automatic polyhedral parallelizer and locality optimizer, ACM SIGPLAN Notices, 43:6, (101-113), Online publication date: 30-May-2008.
- Vera X, Lisper B and Xue J (2007). Data cache locking for tight timing calculations, ACM Transactions on Embedded Computing Systems, 7:1, (1-38), Online publication date: 1-Dec-2007.
- Kim D, Renganarayanan L, Rostron D, Rajopadhye S and Strout M Multi-level tiling Proceedings of the 2007 ACM/IEEE conference on Supercomputing, (1-12)
- Bouchebaba Y, Girodias B, Nicolescu G, Aboulhamid E, Lavigueur B and Paulin P (2007). MPSoC memory optimization using program transformation, ACM Transactions on Design Automation of Electronic Systems, 12:4, (43-es), Online publication date: 1-Sep-2007.
- Du J, Yang X, Wang G, Tang T and Zeng K Architecture-based optimization for mapping scientific applications to imagine Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications, (32-43)
- Li L, Wu H, Feng H and Xue J Towards data tiling for whole programs in scratchpad memory allocation Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture, (63-74)
- Renganarayanan L, Kim D, Rajopadhye S and Strout M Parameterized tiled loops for free Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation, (405-414)
- Renganarayanan L, Kim D, Rajopadhye S and Strout M (2007). Parameterized tiled loops for free, ACM SIGPLAN Notices, 42:6, (405-414), Online publication date: 10-Jun-2007.
- Yang X, Du J, Yan X and Deng Y Matrix-Based programming optimization for improving memory hierarchy performance on imagine Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications, (782-793)
- Hu Z, del Cuvillo J, Zhu W and Gao G Optimization of dense matrix multiplication on IBM cyclops-64 Proceedings of the 12th international conference on Parallel Processing, (134-144)
- Ciorba F, Andronikos T, Riakiotakis I, Chronopoulos A and Papakonstantinou G Dynamic multi phase scheduling for heterogeneous cluste Proceedings of the 20th international conference on Parallel and distributed processing, (72-72)
- Pan L, Lai M, Dillencourt M and Bic L Mobile pipelines Proceedings of the 12th international conference on High Performance Computing, (201-212)
- Xue J Aggressive loop fusion for improving locality and parallelism Proceedings of the Third international conference on Parallel and Distributed Processing and Applications, (224-238)
- Renganarayana L, Ramakrishna U and Rajopadhye S Combined ILP and register tiling Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing, (244-258)
- Zhang C and Kurdahi F On combining iteration space tiling with data space tiling for scratch-pad memory systems Proceedings of the 2005 Asia and South Pacific Design Automation Conference, (973-976)
- Renganarayana L and Rajopadhye S A Geometric Programming Framework for Optimal Multi-Level Tiling Proceedings of the 2004 ACM/IEEE conference on Supercomputing
- Xue J and Vera X (2004). Efficient and Accurate Analytical Modeling of Whole-Program Data Cache Behavior, IEEE Transactions on Computers, 53:5, (547-566), Online publication date: 1-May-2004.
Index Terms
- Loop tiling for parallelism
Please enable JavaScript to view thecomments powered by Disqus.
Recommendations
Tiling imperfectly-nested loop nests
SC '00: Proceedings of the 2000 ACM/IEEE conference on SupercomputingTiling is one of the more important transformations for enhancing loca lity of reference in programs. Intuitively, tiling a set of loops achieves the effect of interleaving iterations of these loops. Tiling of perfectly-nested loop nests (which are loop ...
Parameterized loop tiling
Loop tiling is a widely used program optimization that improves data locality and enables coarse-grained parallelism. Parameterized tiled loops, where the tile sizes remain symbolic parameters until runtime, are quite useful for iterative compilers and ...