DOI: 10.5555/320080.320124

Optimizations and oracle parallelism with dynamic translation

Published: 16 November 1999

Abstract

We describe several optimizations that can be employed in a dynamic binary translation (DBT) system, where low compilation/translation overhead is essential. These optimizations achieve a high degree of instruction-level parallelism (ILP), sometimes even surpassing a static compiler that employs more sophisticated and more time-consuming algorithms [9]. We present results obtained by employing these optimizations in a dynamic binary translation system capable of computing oracle parallelism.

References

[1]
A.V. Aho, R. Sethi and J.D. Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley Publishers, Reading, MA, 1986.
[2]
J.L. Baer and D.P. Bovet, Compilation of Arithmetic Expressions for Parallel Computations, Proceedings of IFIP Congress, North-Holland, Amsterdam, pp. 340-346, 1968.
[3]
R. Brent, The Parallel Evaluation of General Arithmetic Expressions, Journal of the ACM, Vol. 21, No. 2, pp. 201-206, April 1974.
[4]
R. Brent and R. Towle, On the Time Required to Parse an Arithmetic Expression for Parallel Processing, International Conference on Parallel Processing, edited by P.H. Enslow, pp. 254, IEEE, August 1976.
[5]
A. Chernoff, M. Herdeg, R. Hookway, C. Reeve, N. Rubin, T. Tye, S. B. Yadavalli, J. Yates, FX!32-A Profile-Directed Binary Translator, IEEE Micro, Vol. 18, No. 2, pp. 56-64, March 1998.
[6]
J. Cocke and J.T. Schwartz, Programming Languages and Their Compilers: Preliminary Notes, Technical Report, Courant Institute of Mathematical Sciences, New York University, 1970.
[7]
K. Ebcioğlu, Some Design Ideas for a VLIW Architecture for Sequential-Natured Software, In Parallel Processing (Proceedings of IFIP WG 10.3 Working Conference on Parallel Processing), edited by M. Cosnard et al., pp. 3-21, North-Holland, 1988.
[8]
K. Ebcioğlu and T. Nakatani, A New Compilation Technique for Parallelizing Loops with Unpredictable Branches on a VLIW Architecture, In Languages and Compilers for Parallel Computing, D. Gelernter, A. Nicolau, and D. Padua (eds.), Research Monographs in Parallel and Distributed Computing, pp. 213-224, MIT Press, 1990.
[9]
K. Ebcioğlu and E. Altman, DAISY: Dynamic Compilation for 100% Architectural Compatibility, Report No. RC 20538, IBM T.J. Watson Research Center, Yorktown Heights, NY, 1996, http://www.research.ibm.com/vliw/pubs.html
[10]
K. Ebcioğlu and E. Altman, DAISY: Dynamic Compilation for 100% Architectural Compatibility, Proceedings of ISCA-24, pp. 26-37, Denver, CO, June 1997.
[11]
K. Ebcioğlu, E. Altman, S. Sathaye, and M. Gschwind, Execution-based Scheduling for VLIW Architectures, to appear in Proceedings of Europar-99, Toulouse, France, August/September 1999.
[12]
M. Emami, R. Ghiya, and L.J. Hendren, Context-sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers, Proceedings of SIGPLAN PLDI, pp. 242-256, Orlando, FL, June 1994.
[13]
L.J. Hendren, J. Hummel, and A. Nicolau, Abstractions for Recursive Pointer Data Structures: Improving the Analysis and Transformation of Imperative Programs, Proceedings of SIGPLAN PLDI, pp. 249-260, San Francisco, CA, June 1992.
[14]
IBM and Motorola, The PowerPC Microprocessor Family: The Programming Environments Manual for 32-Bit Microprocessors, www.mot.com/SPS/PowerPC/teksupport/teklibrary/manuals/pem32b, pc
[15]
M. S. Lam and R. P. Wilson, Limits of Control Flow on Parallelism, Proceedings of ISCA-19, pp. 46-57, Gold Coast, Australia, May 1992.
[16]
L. Lamport, How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, IEEE Transactions on Computers, Vol. 28, No. 9, pp. 690-691, September 1979.
[17]
M.H. Lipasti and J.P. Shen, Exceeding the Dataflow Limit via Value Prediction, Proceedings of Micro-29, Paris, France, December 1996.
[18]
C. May, MIMIC: A Fast System/370 Simulator, Proceedings of SIGPLAN'87 Symposium on Interpreters and Interpretive Techniques, pp. 1-13, St. Paul, MN, June 1987.
[19]
A. Moshovos and G. Sohi, Streamlining Inter-operation Memory Communication via Data Dependence Prediction, Proceedings of Micro-30, Research Triangle Park, NC, December 1997.
[20]
M. Moudgill and J. Moreno, Run-time Detection and Recovery from Incorrectly Ordered Memory Operations, Report No. RC 20857, IBM T.J. Watson Research Center, Yorktown Heights, NY, 1997, http://www.research.ibm.com/vliw/pubs.html
[21]
T. Nakatani and K. Ebcioğlu, Combining as a Compilation Technique for VLIW Architectures, Proceedings of Micro-22, pp. 43-57, Dublin, Ireland, August 1989.
[22]
A. Nicolau, Percolation Scheduling: A Parallel Compilation Technique, TR 85-678, Department of Computer Science, Cornell University, 1985.
[23]
A. Nicolau and R. Potasman, Incremental Tree Height Reduction for High Level Synthesis, Proceedings of the 28th ACM/IEEE Design Automation Conference, pp. 770-774, San Francisco, CA, June 1991.
[24]
B.R. Rau and C.D. Glaeser, Some Scheduling Techniques and an Easily Schedulable Horizontal Architecture for High Performance Scientific Computing, Proceedings of Micro-14, pp. 183-198, 1981.
[25]
G.M. Silberman and K. Ebcioğlu, An Architectural Framework for Supporting Heterogeneous Instruction-Set Architectures, IEEE Computer, Vol. 26, No. 6, pp. 39-56, June 1993.
[26]
J.E. Smith, S. Sastry, T. Heil, T. Bezenek, M. Zhong, and V. Iyengar, Achieving High Performance via Co-Designed Virtual Machines, http://www.ece.wisc.edu/~es/pitches/vms.ps, November 5, 1998.
[27]
Sun Microsystems, The Java Hotspot Performance Engine Architecture, http://java.sun.com/products/hotspot/whitepaper.html, April 27, 1999.
[28]
K.B. Theobald, G.R. Gao and L.J. Hendren, On the Limits of Program Parallelism and its Smoothability, Proceedings of Micro-25, pp. 10-19, Portland, OR, December 1992.
[29]
D.W. Wall, Limits of Instruction-Level Parallelism, Proceedings of ASPLOS-IV, pp. 176-188, Santa Clara, CA, April 1991.
[30]
E. Witchel and M. Rosenblum, Embra: Fast and Flexible Machine Simulation, Proceedings of ACM SIGMETRICS'96, pp. 68-79, Philadelphia, PA, May 1996.

Published In

MICRO 32: Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
November 1999
299 pages
ISBN: 076950437X

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

MICRO99
Acceptance Rates

MICRO 32 paper acceptance rate: 27 of 131 submissions (21%)
Overall acceptance rate: 484 of 2,242 submissions (22%)

Cited By

  • (2022) Highly Parallel Multi-FPGA System Compilation from Sequential C/C++ Code in the AWS Cloud. ACM Transactions on Reconfigurable Technology and Systems 15:4 (1-42). DOI: 10.1145/3507698. Online publication date: 8-Aug-2022
  • (2014) Just-In-Time Software Pipelining. Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization (11-22). DOI: 10.1145/2581122.2544148. Online publication date: 15-Feb-2014
  • (2014) Just-In-Time Software Pipelining. Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization (11-22). DOI: 10.1145/2544137.2544148. Online publication date: 15-Feb-2014
  • (2010) Trace execution automata in dynamic binary translation. Proceedings of the 2010 international conference on Computer Architecture (99-116). DOI: 10.1007/978-3-642-24322-6_10. Online publication date: 19-Jun-2010
  • (2007) Evaluation of bus based interconnect mechanisms in clustered VLIW architectures. International Journal of Parallel Programming 35:6 (507-527). DOI: 10.1007/s10766-007-0045-2. Online publication date: 1-Dec-2007
  • (2000) Understanding the backward slices of performance degrading instructions. ACM SIGARCH Computer Architecture News 28:2 (172-181). DOI: 10.1145/342001.339676. Online publication date: 1-May-2000
  • (2000) Understanding the backward slices of performance degrading instructions. Proceedings of the 27th annual international symposium on Computer architecture (172-181). DOI: 10.1145/339647.339676. Online publication date: 10-Jun-2000
  • (2000) Binary translation and architecture convergence issues for IBM system/390. Proceedings of the 14th international conference on Supercomputing (336-347). DOI: 10.1145/335231.335264. Online publication date: 8-May-2000
