High performance on modern computing platforms requires programs to be parallel, distributed, and able to run on heterogeneous hardware. However, programming such architectures is extremely difficult because the application must be implemented using multiple programming models that are then combined in ad hoc ways. High-level programming frameworks based on parallel patterns have recently become a popular way to raise the level of abstraction and provide implicitly parallel execution on a variety of architectures. Portable performance is often still difficult to achieve, however, because these systems cannot optimize programs across data structure abstractions and nested parallelism. In this dissertation, I introduce the Delite Multiloop Language (DMLL), a new intermediate language based on common parallel patterns that captures the semantic knowledge necessary to efficiently target distributed heterogeneous architectures. Combined with a straightforward array-based data structure model, the language semantics naturally capture a set of powerful transformations over nested parallel patterns that restructure computation to enable distribution and optimize for heterogeneous devices. These transformations enable improved single-threaded performance, greater parallel scalability, smaller memory footprints, transparent targeting of distributed memory architectures, and automated data movement and distribution. I also present experimental results for a range of applications spanning multiple domains and demonstrate highly efficient execution compared to manually optimized counterparts in alternative systems.
Index Terms
- Have Abstraction and Eat Performance Too: Optimized Heterogeneous Computing with Parallel Patterns