Article

VICTORIA: VMX indirect compute technology oriented towards in-line acceleration

Authors:

Jeff H. Derby,

Robert K. Montoye,

José Moreira

CF '06: Proceedings of the 3rd conference on Computing frontiers

Pages 303 - 312

https://doi.org/10.1145/1128022.1128062

Published: 03 May 2006

Abstract

There is increasing interest in the use of accelerators in computer systems. Accelerators are processor-attached hardware units that can perform certain functions faster than a conventional general-purpose processor. In this paper, we describe the VICTORIA PowerPC architecture, which is based on the iVMX accelerator technology. The iVMX accelerator extends the existing VMX architecture with indirect register addressing. This approach greatly extends the architected register space and opens the door to highly optimized vector algorithms that can sustain very high processing rates. The large register space is directly controlled by the executing code and offers enough storage to hold sizeable intermediate results, which helps reduce the negative effects of limited memory bandwidth and high memory latency. The iVMX accelerator is an example of an in-line accelerator; that is, the instructions that drive the accelerator are part of the same stream that drives the main processor. Compared to off-line accelerators, which execute their own instruction stream, in-line accelerators present a much more convenient programming model.
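
The abstract does not spell out how indirect register addressing works in iVMX, so the following is only a generic toy model of the idea, not the paper's design. It assumes a small table of map entries (the values an instruction's register field can name) that point into a much larger physical register file. The class name `ToyRegisterFile`, the sizes 1024 and 32, and the `add` helper are all hypothetical and chosen for illustration.

```python
# Toy model of indirect register addressing.
# Hypothetical sketch: names and sizes are assumptions, not the iVMX design.


class ToyRegisterFile:
    """A large register file reached through a small table of map entries."""

    def __init__(self, num_physical=1024, num_map_entries=32):
        self.num_physical = num_physical
        self.regs = [0] * num_physical
        # Identity mapping at start: map entry i names physical register i.
        self.reg_map = list(range(num_map_entries))

    def set_map(self, entry, physical):
        """Point one map entry at a different physical register."""
        if not 0 <= physical < self.num_physical:
            raise ValueError("physical register out of range")
        self.reg_map[entry] = physical

    def read(self, entry):
        return self.regs[self.reg_map[entry]]

    def write(self, entry, value):
        self.regs[self.reg_map[entry]] = value


def add(rf, dst, a, b):
    """One 'instruction': operands are map entries, not physical registers."""
    rf.write(dst, rf.read(a) + rf.read(b))


rf = ToyRegisterFile()
rf.regs[500], rf.regs[501] = 7, 35   # data held deep in the large file
rf.set_map(1, 500)                   # entry 1 now names physical register 500
rf.set_map(2, 501)
rf.set_map(0, 900)
add(rf, 0, 1, 2)                     # same encoding, remapped operands
print(rf.regs[900])                  # 42
```

In this sketch the same `add(rf, 0, 1, 2)` encoding can reach any part of the large file once the map entries are changed, which is one way a program could keep sizeable intermediate results in registers instead of memory.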



Index Terms

VICTORIA: VMX indirect compute technology oriented towards in-line acceleration
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Single instruction, multiple data



Information & Contributors

Information

Published In

CF '06: Proceedings of the 3rd conference on Computing frontiers

May 2006

430 pages

ISBN:1595933026

DOI:10.1145/1128022

General Chairs: Monica Alderighi (IASF - INAF), Valentina Salapura (IBM)

Program Chair: Sally A. McKee (Cornell University)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Article

Conference

CF06

Sponsor:

CF06: Computing Frontiers Conference

May 3 - 5, 2006

Ischia, Italy

Acceptance Rates

Overall Acceptance Rate 273 of 785 submissions, 35%


Bibliometrics & Citations

Article Metrics

Total Citations: 9
Total Downloads: 341
Downloads (Last 12 months): 6
Downloads (Last 6 weeks): 0

Reflects downloads up to 06 Jan 2025

Cited By

Ciobanu C, Gaydadjiev G, Pilato C, Sciuto D (2018). The Case for Polymorphic Registers in Dataflow Computing. International Journal of Parallel Programming 46:6 (1185-1219). DOI: 10.1007/s10766-017-0494-1. Online publication date: 1-Dec-2018
https://dl.acm.org/doi/10.1007/s10766-017-0494-1
Cui J, Zhang W, Huang F, Wu L (2015). Hierarchical Adaptive Recovery Algorithm in Mobile ALM. Frontiers in Internet Technologies (95-105). DOI: 10.1007/978-3-662-46826-5_8. Online publication date: 18-Apr-2015
https://doi.org/10.1007/978-3-662-46826-5_8
Sreedhar D, Derby J, Montoye R, Johnson C (2014). Matrix-matrix multiplication on a large register file architecture with indirection. 2014 21st International Conference on High Performance Computing (HiPC) (1-10). DOI: 10.1109/HiPC.2014.7116709. Online publication date: Dec-2014
https://doi.org/10.1109/HiPC.2014.7116709
Raghavan P, Catthoor F (2012). Storage Allocation for Streaming-Based Register File. Energy-Aware Memory Management for Embedded Multimedia Systems (151-194). DOI: 10.1201/b11418-6. Online publication date: 4-Jan-2012
https://doi.org/10.1201/b11418-6
Vega A, Bose P, Buyuktosunoglu A, Derby J, Franceschini M, Johnson C, Montoye R (2012). Architectural perspectives of future wireless base stations based on the IBM PowerEN processor. Proceedings of the 2012 IEEE 18th International Symposium on High-Performance Computer Architecture (1-10). DOI: 10.1109/HPCA.2012.6169045. Online publication date: 25-Feb-2012
https://dl.acm.org/doi/10.1109/HPCA.2012.6169045
Rico A, Derby J, Montoye R, Heil T, Cher C, Bose P, Amato N, Franke H, Kelly P (2010). Performance and power evaluation of an in-line accelerator. Proceedings of the 7th ACM international conference on Computing frontiers (81-82). DOI: 10.1145/1787275.1787293. Online publication date: 17-May-2010
https://dl.acm.org/doi/10.1145/1787275.1787293
Catthoor F, Raghavan P, Lambrechts A, Jayapala M, Kritikakou A, Absar J (2010). An Asymmetrical Register File: The VWR. Ultra-Low Energy Domain-Specific Instruction-Set Processors (199-222). DOI: 10.1007/978-90-481-9528-2_8. Online publication date: 3-Jul-2010
https://doi.org/10.1007/978-90-481-9528-2_8
Raghavan P, Catthoor F, Rosenstiel W, Wakabayashi K (2009). SARA. Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis (41-50). DOI: 10.1145/1629435.1629442. Online publication date: 11-Oct-2009
https://dl.acm.org/doi/10.1145/1629435.1629442
Nuzman D, Namolaru M, Zaks A, Derby J, Ramirez A, Biliardi G, Gschwind M (2008). Compiling for an indirect vector register architecture. Proceedings of the 5th conference on Computing frontiers (199-208). DOI: 10.1145/1366230.1366266. Online publication date: 5-May-2008
https://dl.acm.org/doi/10.1145/1366230.1366266
