research-article

An efficient heuristic for instruction scheduling on clustered VLIW processors

Authors:

Jingling Xue

CASES '11: Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems

Pages 35 - 44

https://doi.org/10.1145/2038698.2038707

Published: 09 October 2011

Abstract

Clustering is a well-known technique for improving the scalability of classical VLIW processors. A clustered VLIW processor consists of multiple clusters, each of which has its own register file and functional units. This paper presents a novel phase-coupled, priority-based heuristic for scheduling the instructions of a basic block on a clustered VLIW processor. Our heuristic converts the instruction scheduling problem into the problem of scheduling a set of instructions with a common deadline. The priority of each instruction v_i is its l_max(v_i)-successor-tree-consistent deadline, which is an upper bound on the latest completion time of v_i in any feasible schedule for a relaxed problem in which the precedence-latency constraints between v_i and all its successors, as well as the resource constraints, are considered. We have simulated our heuristic, the UAS heuristic, and the Integrated heuristic on 808 basic blocks taken from the MediaBench II benchmark suite using six processor models. On average, our heuristic improves on the UAS heuristic by 25%, 25%, 33%, 23%, 26%, and 27% for the six processor models, respectively, and on the Integrated heuristic by 15%, 16%, 15%, 9%, 20%, and 8%, respectively.
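The general idea of using deadlines as priorities in a phase-coupled list scheduler can be illustrated with a small Python sketch. It is not the algorithm from the paper: the deadline computed here is a plain precedence-and-latency backward pass, not the l_max(v_i)-successor-tree-consistent deadline, which also accounts for resource constraints. The machine model (identical, fully pipelined functional units, a fixed `copy_delay` for operands that cross clusters, first-fit cluster choice) and all function and parameter names are assumptions made for illustration. Instruction names are assumed to be strings and the dependence graph acyclic.

```python
def build_graph(latency, edges):
    """Return predecessor and successor lists for a dependence DAG."""
    preds = {v: [] for v in latency}
    succs = {v: [] for v in latency}
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)
    return preds, succs


def deadlines(latency, preds, succs):
    """Latest completion time of each instruction under a common deadline.

    Assumption: only precedence and latency are considered here; the
    deadline described in the abstract is tighter.
    """
    common = sum(latency.values())  # any common bound works; only the order matters
    indeg = {v: len(preds[v]) for v in latency}
    stack = [v for v in latency if indeg[v] == 0]
    order = []
    while stack:  # Kahn's algorithm for a topological order
        v = stack.pop()
        order.append(v)
        for s in succs[v]:
            indeg[s] -= 1
            if indeg[s] == 0:
                stack.append(s)
    d = {}
    for v in reversed(order):  # successors are finalized before v
        d[v] = min([common] + [d[s] - latency[s] for s in succs[v]])
    return d


def schedule(latency, edges, clusters, units, copy_delay):
    """Cycle-by-cycle list scheduling; returns {instr: (cluster, start_cycle)}."""
    preds, succs = build_graph(latency, edges)
    d = deadlines(latency, preds, succs)
    placed, remaining, cycle = {}, set(latency), 0
    while remaining:
        free = [units] * clusters  # issue slots left per cluster in this cycle
        ready = sorted(
            (v for v in remaining if all(p in placed for p in preds[v])),
            key=lambda v: (d[v], v),  # earliest deadline first, name breaks ties
        )
        for v in ready:
            for c in range(clusters):
                if free[c] == 0:
                    continue
                # Operands produced on another cluster arrive copy_delay cycles later.
                arrivals = [
                    placed[p][1] + latency[p] + (copy_delay if placed[p][0] != c else 0)
                    for p in preds[v]
                ]
                if max([0] + arrivals) <= cycle:
                    # Cluster and cycle are chosen together (phase coupling).
                    placed[v] = (c, cycle)
                    free[c] -= 1
                    remaining.discard(v)
                    break
        cycle += 1
    return placed


def makespan(latency, placed):
    """Completion time of the last instruction in the schedule."""
    return max(start + latency[v] for v, (_, start) in placed.items())


# Hypothetical four-instruction basic block on two single-issue clusters.
latency = {"a": 1, "b": 1, "c": 2, "d": 1}
edges = [("a", "c"), ("b", "c"), ("c", "d")]
placed = schedule(latency, edges, clusters=2, units=1, copy_delay=1)
print(placed, makespan(latency, placed))
```

In this toy model, `a` and `b` issue in cycle 0 on different clusters, so `c` has to wait one extra cycle for the operand that crosses clusters. Trade-offs of this kind between parallelism and inter-cluster communication are why cluster assignment and cycle selection are decided together in a phase-coupled scheduler.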

Cited By

Deng C, Chen Z, Shi Y, Ma Y, Wen M, Luo L (2024). Optimizing VLIW Instruction Scheduling via a Two-Dimensional Constrained Dynamic Programming. ACM Transactions on Design Automation of Electronic Systems 29:5 (1-20). DOI: 10.1145/3643135. Online publication date: 25-Jan-2024.
https://dl.acm.org/doi/10.1145/3643135
Stuckmann F, Payá-Vayá G (2024). A Graph Neural Network Approach to Improve List Scheduling Heuristics Under Register-Pressure. 2024 13th International Conference on Modern Circuits and Systems Technologies (MOCAST) (01-06). DOI: 10.1109/MOCAST61810.2024.10615463. Online publication date: 26-Jun-2024.
https://doi.org/10.1109/MOCAST61810.2024.10615463
Su X, Wu H, Xue J (2017). An Efficient WCET-Aware Instruction Scheduling and Register Allocation Approach for Clustered VLIW Processors. ACM Transactions on Embedded Computing Systems 16:5s (1-21). DOI: 10.1145/3126524. Online publication date: 27-Sep-2017.
https://dl.acm.org/doi/10.1145/3126524

Information & Contributors

Information

Published In

CASES '11: Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems

October 2011

250 pages

ISBN:9781450307130

DOI:10.1145/2038698

Program Chairs:
Rajesh Gupta, University of California at San Diego
Vincent Mooney, Georgia Tech. & Nanyang Tech. U.

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 October 2011

Qualifiers

Research-article

Conference

ESWeek '11

Sponsor:

ESWeek '11: Seventh Embedded Systems Week

October 9 - 14, 2011

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 52 of 230 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 11
Total Downloads: 169
Downloads (Last 12 months): 3
Downloads (Last 6 weeks): 0

Reflects downloads up to 03 Mar 2025

Citations

Cited By

He H, Yang X, Zhang Y (2017). On Improving Performance and Energy Efficiency for Register-File Connected Clustered VLIW Architectures for Embedded System Usage. The Computer Journal. DOI: 10.1093/comjnl/bxx001. Online publication date: 22-Jan-2017.
https://doi.org/10.1093/comjnl/bxx001
Zhang X, Wu H, Sun H, Xue J, Fettweis G, Nebel W (2014). Lifetime holes aware register allocation for clustered VLIW processors. Proceedings of the conference on Design, Automation & Test in Europe (1-4). DOI: 10.5555/2616606.2616716. Online publication date: 24-Mar-2014.
https://dl.acm.org/doi/10.5555/2616606.2616716
Porpodas V, Cintra M, Rabbah R, Raghunathan A (2013). CAeSaR. Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems (1-10). DOI: 10.5555/2555729.2555738. Online publication date: 29-Sep-2013.
https://dl.acm.org/doi/10.5555/2555729.2555738
Tang H, Yang X, Wang S, Zhang Y (2013). Optimizing Instruction Scheduling and Register Allocation for Register-File-Connected Clustered VLIW Architectures. The Scientific World Journal 2013:1. DOI: 10.1155/2013/913038. Online publication date: 18-Jul-2013.
https://doi.org/10.1155/2013/913038
Porpodas V, Cintra M (2013). LUCAS. ACM SIGPLAN Notices 48:5 (45-54). DOI: 10.1145/2499369.2465565. Online publication date: 20-Jun-2013.
https://dl.acm.org/doi/10.1145/2499369.2465565
Porpodas V, Cintra M, Franke B, Xue J (2013). LUCAS. Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems (45-54). DOI: 10.1145/2491899.2465565. Online publication date: 20-Jun-2013.
https://dl.acm.org/doi/10.1145/2491899.2465565
Porpodas V, Cintra M, Franke B, Xue J (2013). LUCAS. Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems (45-54). DOI: 10.1145/2465554.2465565. Online publication date: 20-Jun-2013.
https://dl.acm.org/doi/10.1145/2465554.2465565
