[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

A general constraint-centric scheduling framework for spatial architectures

Published: 16 June 2013 Publication History

Abstract

Specialized execution using spatial architectures provides energy efficient computation, but requires effective algorithms for spatially scheduling the computation. Generally, this has been solved with architecture-specific heuristics, an approach which suffers from poor compiler/architect productivity, lack of insight on optimality, and inhibits migration of techniques between architectures.
Our goal is to develop a scheduling framework usable for all spatial architectures. To this end, we expresses spatial scheduling as a constraint satisfaction problem using Integer Linear Programming (ILP). We observe that architecture primitives and scheduler responsibilities can be related through five abstractions: placement of computation, routing of data, managing event timing, managing resource utilization, and forming the optimization objectives. We encode these responsibilities as 20 general ILP constraints, which are used to create schedulers for the disparate TRIPS, DySER, and PLUG architectures. Our results show that a general declarative approach using ILP is implementable, practical, and typically matches or outperforms specialized schedulers.

References

[1]
Trips toolchain, http://www.cs.utexas.edu/ trips/dist/.
[2]
A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools.
[3]
S. Amarasinghe, D. R. Karger, W. Lee, and V. S. Mirrokni. A theoretical and practical approach to instruction scheduling on spatial architectures. Technical report, MIT, 2002.
[4]
S. Amellal and B. Kaminska. Functional synthesis of digital systems with tass. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, 13(5):537--552, 1994.
[5]
C. Ancourt and F. Irigoin. Scanning polyhedra with do loops. In PPOPP 1991.
[6]
O. Azizi, A. Mahesri, B. C. Lee, S. J. Patel, and M. Horowitz. Energyperformance tradeoffs in processor architecture and circuit design: a marginal cost analysis. In ISCA 2010.
[7]
S. S. Battacharyya, E. A. Lee, and P. K. Murthy. Software Synthesis from Dataflow Graphs. Kluwer Academic Publishers, 1996.
[8]
S. Borkar and A. A. Chien. The future of microprocessors. Commun. ACM, 54(5):67--77, 2011.
[9]
D. Burger, S. W. Keckler, K. S. McKinley, M. Dahlin, L. K. John, C. Lin, C. R. Moore, J. Burrill, R. G. McDonald, W. Yoder, and the TRIPS Team. Scaling to the end of silicon with EDGE architectures. IEEE Computer, 37(7):44--55, 2004.
[10]
N. Clark, M. Kudlur, H. Park, S. Mahlke, and K. Flautner. Applicationspecific processing on a general-purpose core via transparent instruction set customization. In MICRO 2004.
[11]
J. Cong, K. Gururaj, G. Han, and W. Jiang. Synthesis algorithm for application-specific homogeneous processor networks. IEEE Trans. Very Large Scale Integr. Syst., 17(9), Sept. 2009.
[12]
K. Coons, X. Chen, S. Kushwaha, K. S. McKinley, and D. Burger. A Spatial Path Scheduling Algorithm for EDGE Architectures. In ASPLOS 2006.
[13]
L. De Carli, Y. Pan, A. Kumar, C. Estan, and K. Sankaralingam. Plug: Flexible lookup modules for rapid deployment of new protocols in high-speed routers. In SIGCOMM 2009.
[14]
L. de Moura and N. Bjørner. Z3: An efficient SMT solver. In TACAS, 2008.
[15]
A. Deb, J. M. Codina, and A. Gonzales. Softhv: A hw/sw co-designed processor with horizontal and vertical fusion. In International Conference on Computing Frontiers 2011.
[16]
A. E. Eichenberger and E. S. Davidson. Efficient formulation for optimal modulo schedulers. In PLDI 1997.
[17]
J. R. Ellis. Bulldog: a compiler for vliw architectures. PhD thesis, 1985.
[18]
D. W. Engels, J. Feldman, D. R. Karger, and M. Ruhl. Parallel processor scheduling with delay constraints. In SODA 2001.
[19]
H. Esmaeilzadeh, E. Blem, R. S. Amant, K. Sankaralingam, and D. Burger. Dark Silicon and the End of Multicore Scaling. In ISCA 2011.
[20]
H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger. Neural acceleration for general-purpose approximate programs. In MICRO 2012.
[21]
K. Fan, H. h. Park, M. Kudlur, and S. o. Mahlke. Modulo scheduling for highly customized datapaths to increase hardware reusability. In CGO 2008.
[22]
P. Feautrier. Some efficient solutions to the affine scheduling problem. International Journal of Parallel Programming, 21:313--347, 1992.
[23]
M. Gebhart, B. A. Maher, K. E. Coons, J. Diamond, P. Gratz, M. Marino, N. Ranganathan, B. Robatmili, A. Smith, J. Burrill, S. W. Keckler, D. Burger, and K. S. McKinley. An evaluation of the trips computer system. In ASPLOS 2009.
[24]
G. J. Gordon, S. A. Hong, and M. Dudík. First-order mixed integer linear programming. In UAI 2009.
[25]
V. Govindaraju, C.-H. Ho, T. Nowatzki, J. Chhugani, N. Satish, K. Sankaralingam, and C. Kim. Dyser: Unifying functionality and parallelism specialization for energy efficient computing. IEEE Micro, 33(5), 2012.
[26]
V. Govindaraju, C.-H. Ho, and K. Sankaralingam. Dynamically specialized datapaths for energy efficient computing. In HPCA 2011.
[27]
S. Gupta, S. Feng, A. Ansari, S. Mahlke, and D. August. Bundled execution of recurring traces for energy-efficient general purpose processing. In MICRO 2011.
[28]
N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki. Toward dark silicon in servers. IEEE Micro, 31(4):6--15, 2011.
[29]
J. N. Hooker. Logic, optimization and constraint programming. INFORMS Journal on Computing, 14:295--321, 2002.
[30]
J. N. Hooker and M. A. Osorio. Mixed logical-linear programming. Discrete Appl. Math., 96-97(1), Oct. 1999.
[31]
Z. Huang, S. Malik, N. Moreano, and G. Araujo. The design of dynamically reconfigurable datapath coprocessors. ACM Trans. Embed. Comput. Syst., 3(2):361--384, May 2004.
[32]
R. Joshi, G. Nelson, and K. Randall. Denali: a goal-directed superoptimizer. In PLDI 2002.
[33]
K. Kailas and A. Agrawala. Cars: A new code generation framework for clustered ilp processors. In HPCA 2001.
[34]
M. Kudlur and S. Mahlke. Orchestrating the execution of stream programs on multicore platforms. In PLDI 2008.
[35]
A. Kumar, L. De Carli, S. J. Kim, M. de Kruijf, K. Sankaralingam, C. Estan, and S. Jha. Design and implementation of the plug architecture for programmable and efficient network lookups. In PACT 2010.
[36]
W. Lee, R. Barua, M. Frank, D. Srikrishna, J. Babb, V. Sarkar, and S. Amarasinghe. Space-time scheduling of instruction-level parallelism on a raw machine. In ASPLOS 1998.
[37]
M. Mercaldi, S. Swanson, A. Petersen, A. Putnam, A. Schwerin, M. Oskin, and S. J. Eggers. Instruction scheduling for a tiled dataflow architecture. In ASPLOS 2006.
[38]
M. Mercaldi, S. Swanson, A. Petersen, A. Putnam, A. Schwerin, M. Oskin, and S. J. Eggers. Modeling instruction placement on a spatial architecture. In SPAA 2006.
[39]
M. Mishra, T. J. Callahan, T. Chelcea, G. Venkataramani, M. Budiu, and S. C. Goldstein. Tartan: Evaluating spatial computation for whole program execution. In ASPLOS 2006.
[40]
R. Nagarajan, S. K. Kushwaha, D. Burger, K. S. McKinley, C. Lin, and S. W. Keckler. Static placement, dynamic issue (spdi) scheduling for edge architectures. In PACT 2004.
[41]
E. Özer, S. Banerjia, and T. M. Conte. Unified assign and schedule: a new approach to scheduling for clustered register file microarchitectures. In MICRO 31.
[42]
J. Palsberg and M. Naik. Ilp-based resource-aware compilation, 2004.
[43]
H. Park, K. Fan, S. A. Mahlke, T. Oh, H. Kim, and H.-s. Kim. Edgecentric modulo scheduling for coarse-grained reconfigurable architectures. In PACT 2008.
[44]
W. Pugh. The omega test: a fast and practical integer programming algorithm for dependence analysis. In Supercomputing 1991.
[45]
N. Satish, K. Ravindran, and K. Keutzer. A decomposition-based constraint optimization approach for statically scheduling task graphs with communication delays to multiprocessors. In DATE 2007.
[46]
S. Swanson, K. Michelson, A. Schwerin, and M. Oskin. Wavescalar. In MICRO 2003.
[47]
M. Thuresson, M. Sjalander, M. Bjork, L. Svensson, P. Larsson- Edefors, and P. Stenstrom. Flexcore: Utilizing exposed datapath control for efficient computing. In IC-SAMOS 2007.
[48]
G. Venkatesh, J. Sampson, N. Goulding, S. Garcia, V. Bryksin, J. Lugo-Martinez, S. Swanson, and M. B. Taylor. Conservation cores: reducing the energy of mature computations. In ASPLOS 2010.
[49]
H. M. Wagner. An integer linear-programming model for machine scheduling. Naval Research Logistics Quarterly, 6(2):131--140, 1959.
[50]
E. Waingold, M. Taylor, D. Srikrishna, V. Sarkar, W. Lee, V. Lee, J. Kim, M. Frank, P. Finch, R. Barua, J. Babb, S. Amarasinghe, and A. Agarwal. Baring It All to Software: RAW Machines. Computer, 30(9):86--93, 1997.
[51]
M. Watkins, M. Cianchetti, and D. Albonesi. Shared reconfigurable architectures for cmps. In FPGA 2008.
[52]
L. A. Wolsey and G. L. Nemhauser. Integer and Combinatorial Optimization.

Cited By

View all
  • (2024)Cascade: An Application Pipelining Toolkit for Coarse-Grained Reconfigurable ArraysIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2024.339054243:10(3055-3067)Online publication date: 1-Oct-2024
  • (2024)Rubick: A Unified Infrastructure for Analyzing, Exploring, and Implementing Spatial Architectures via Dataflow DecompositionIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2023.333720843:4(1177-1190)Online publication date: 1-Apr-2024
  • (2023)ImaGen: A General Framework for Generating Memory- and Power-Efficient Image Processing AcceleratorsProceedings of the 50th Annual International Symposium on Computer Architecture10.1145/3579371.3589076(1-13)Online publication date: 17-Jun-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 48, Issue 6
PLDI '13
June 2013
515 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2499370
Issue’s Table of Contents
  • cover image ACM Conferences
    PLDI '13: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation
    June 2013
    546 pages
    ISBN:9781450320146
    DOI:10.1145/2491956
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 June 2013
Published in SIGPLAN Volume 48, Issue 6

Check for updates

Author Tags

  1. integer linear programming
  2. spatial architecture scheduling
  3. spatial architectures

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)91
  • Downloads (Last 6 weeks)10
Reflects downloads up to 01 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Cascade: An Application Pipelining Toolkit for Coarse-Grained Reconfigurable ArraysIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2024.339054243:10(3055-3067)Online publication date: 1-Oct-2024
  • (2024)Rubick: A Unified Infrastructure for Analyzing, Exploring, and Implementing Spatial Architectures via Dataflow DecompositionIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2023.333720843:4(1177-1190)Online publication date: 1-Apr-2024
  • (2023)ImaGen: A General Framework for Generating Memory- and Power-Efficient Image Processing AcceleratorsProceedings of the 50th Annual International Symposium on Computer Architecture10.1145/3579371.3589076(1-13)Online publication date: 17-Jun-2023
  • (2023)DFGC: DFG-aware NoC Control based on Time Stamp Prediction for Dataflow Architecture2023 IEEE 41st International Conference on Computer Design (ICCD)10.1109/ICCD58817.2023.00071(432-439)Online publication date: 6-Nov-2023
  • (2022)Exploring the Approaches to Data Flow ComputingComputers, Materials & Continua10.32604/cmc.2022.02062371:2(2333-2346)Online publication date: 2022
  • (2022)CaSMapProceedings of the 49th Annual International Symposium on Computer Architecture10.1145/3470496.3527426(259-273)Online publication date: 18-Jun-2022
  • (2022)Twenty Years of Automated Methods for Mapping Applications on CGRA2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW55747.2022.00118(679-686)Online publication date: May-2022
  • (2022)Accelerating Data Transfer in Dataflow Architectures Through a Look-Ahead Acknowledgment MechanismJournal of Computer Science and Technology10.1007/s11390-020-0555-637:4(942-959)Online publication date: 1-Jul-2022
  • (2022)Mapping and virtual neuron assignment algorithms for MAERI acceleratorThe Journal of Supercomputing10.1007/s11227-021-03893-378:1(238-257)Online publication date: 1-Jan-2022
  • (2022)Compilation SystemSoftware Defined Chips10.1007/978-981-19-6994-2_4(197-311)Online publication date: 21-Oct-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media