DOI: 10.5555/998680.1006736
Article

The Vector-Thread Architecture

Published: 02 March 2004

Abstract

The vector-thread (VT) architectural paradigm unifies the vector and multithreaded compute models. The VT abstraction provides the programmer with a control processor and a vector of virtual processors (VPs). The control processor can use vector-fetch commands to broadcast instructions to all the VPs, or each VP can use thread-fetches to direct its own control flow. A seamless intermixing of the vector and threaded control mechanisms allows a VT architecture to flexibly and compactly encode application parallelism and locality, and a VT machine exploits these to improve performance and efficiency. We present SCALE, an instantiation of the VT architecture designed for low-power and high-performance embedded systems. We evaluate the SCALE prototype design using detailed simulation of a broad range of embedded applications and show that its performance is competitive with larger and more complex processors.
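
The two control mechanisms named in the abstract can be illustrated with a small software model. The sketch below is only a toy under stated assumptions, not the SCALE instruction set or its simulator: the class names `ControlProcessor` and `VirtualProcessor`, the `regs` dictionary, and the convention that a block returns the next block to thread-fetch are all hypothetical. It also runs the VPs one after another, whereas a real VT machine would execute them in parallel.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a fetched group of instructions: a block runs on
# one VP and returns the next block to thread-fetch, or None to stop.
Block = Callable[["VirtualProcessor"], Optional[Callable]]


class VirtualProcessor:
    """One virtual processor (VP) with its own private registers."""

    def __init__(self, index: int) -> None:
        self.index = index
        self.regs: Dict[str, int] = {}

    def run(self, block: Block) -> None:
        # Execute the fetched block, then keep following the thread-fetches
        # that the VP issues for itself until it stops.
        nxt = block(self)
        while nxt is not None:
            nxt = nxt(self)


class ControlProcessor:
    """Control processor that manages a vector of VPs."""

    def __init__(self, num_vps: int) -> None:
        self.vps = [VirtualProcessor(i) for i in range(num_vps)]

    def vector_fetch(self, block: Block) -> None:
        # Broadcast the same block to every VP (modeled sequentially here).
        for vp in self.vps:
            vp.run(block)


def collatz_step(vp: VirtualProcessor) -> Optional[Callable]:
    """Loop body with a data-dependent trip count, one Collatz step per call."""
    n = vp.regs["n"]
    if n == 1:
        return None  # no further thread-fetch: this VP is done
    vp.regs["n"] = n // 2 if n % 2 == 0 else 3 * n + 1
    vp.regs["steps"] += 1
    return collatz_step  # thread-fetch: this VP directs its own control flow


def collatz_steps(data: List[int]) -> List[int]:
    """Count Collatz steps per element, one element per VP."""

    def init(vp: VirtualProcessor) -> Optional[Callable]:
        # Vector-fetched block: each VP loads its own element, then branches
        # into the per-VP loop.
        vp.regs["n"] = data[vp.index]
        vp.regs["steps"] = 0
        return collatz_step

    cp = ControlProcessor(len(data))
    cp.vector_fetch(init)
    return [vp.regs["steps"] for vp in cp.vps]
```

In this model a single vector-fetch starts every VP on the same code, and each VP then iterates a different number of times through its own thread-fetches, which is the kind of intermixing the abstract describes.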

Published In

ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture
June 2004
373 pages
ISBN: 0769521436
  • ACM SIGARCH Computer Architecture News, Volume 32, Issue 2
    ISCA 2004
    March 2004
    373 pages
    ISSN: 0163-5964
    DOI: 10.1145/1028176

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

ISCA04

Acceptance Rates

ISCA '04 Paper Acceptance Rate: 31 of 217 submissions, 14%
Overall Acceptance Rate: 543 of 3,203 submissions, 17%

Article Metrics

  • Downloads (last 12 months): 40
  • Downloads (last 6 weeks): 4
Reflects downloads up to 21 Jan 2025

Cited By

  • (2023) Occamy: Elastically Sharing a SIMD Co-processor across Multiple CPU Cores. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, pages 483-497. DOI: 10.1145/3582016.3582046. Online publication date: 25-Mar-2023
  • (2021) Snafu. Proceedings of the 48th Annual International Symposium on Computer Architecture, pages 1027-1040. DOI: 10.1109/ISCA52012.2021.00084. Online publication date: 14-Jun-2021
  • (2019) Towards General Purpose Acceleration by Exploiting Common Data-Dependence Forms. Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, pages 924-939. DOI: 10.1145/3352460.3358276. Online publication date: 12-Oct-2019
  • (2017) Stream-Dataflow Acceleration. ACM SIGARCH Computer Architecture News, 45:2, pages 416-429. DOI: 10.1145/3140659.3080255. Online publication date: 24-Jun-2017
  • (2017) Using intra-core loop-task accelerators to improve the productivity and performance of task-based parallel programs. Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, pages 759-773. DOI: 10.1145/3123939.3136952. Online publication date: 14-Oct-2017
  • (2017) Stream-Dataflow Acceleration. Proceedings of the 44th Annual International Symposium on Computer Architecture, pages 416-429. DOI: 10.1145/3079856.3080255. Online publication date: 24-Jun-2017
  • (2017) An Integrated Vector-Scalar Design on an In-Order ARM Core. ACM Transactions on Architecture and Code Optimization, 14:2, pages 1-26. DOI: 10.1145/3075618. Online publication date: 26-May-2017
  • (2016) FlexVec: auto-vectorization for irregular loops. ACM SIGPLAN Notices, 51:6, pages 697-710. DOI: 10.1145/2980983.2908111. Online publication date: 2-Jun-2016
  • (2016) FlexVec: auto-vectorization for irregular loops. Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 697-710. DOI: 10.1145/2908080.2908111. Online publication date: 2-Jun-2016
  • (2016) Towards low-power embedded vector processor. Proceedings of the ACM International Conference on Computing Frontiers, pages 339-342. DOI: 10.1145/2903150.2903485. Online publication date: 16-May-2016
