More Web Proxy on the site http://driver.im/

research-article

Open access

Mostly static program partitioning of binary executables

Authors:

Michael FranzAuthors Info & Claims

ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 31, Issue 5

Article No.: 17, Pages 1 - 46

https://doi.org/10.1145/1538917.1538918

Published: 03 July 2009 Publication History

Abstract

We have built a runtime compilation system that takes unmodified sequential binaries and improves their performance on off-the-shelf multiprocessors using dynamic vectorization and loop-level parallelization techniques. Our system, Azure, is purely software based and requires no specific hardware support for speculative thread execution, yet it is able to break even in most cases; that is, the achieved speedup exceeds the cost of runtime monitoring and compilation, often by significant amounts.

Key to this remarkable performance is an offline preprocessing step that extracts a mostly correct control flow graph (CFG) from the binary program ahead of time. This statically obtained CFG is incomplete in that it may be missing some edges corresponding to computed branches. We describe how such additional control flow edges are discovered and handled at runtime, so that an incomplete static analysis never leads to an incorrect optimization result.

The availability of a mostly correct CFG enables us to statically partition a binary executable into single-entry multiple-exit regions and to identify potential parallelization candidates ahead of execution. Program regions that are not candidates for parallelization can thereby be excluded completely from runtime monitoring and dynamic recompilation. Azure's extremely low overhead is a direct consequence of this design.

References

[1]

]]Akkary, H. and Driscoll, M. A. 1998. A dynamic multithreading processor. In Proceedings of the 31st Annual International Symposium on Microarchitecture. ACM Press, 226--236.

Digital Library

[2]

]]Bala, V., Duesterwald, E., and Banerjia, S. 2000. Dynamo: A transparent dynamic optimization system. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. 1--12.

Digital Library

[3]

]]Balakrishnan, G. and Reps, T. 2004. Analyzing memory accesses in x86 executables. In Proceedings of the Conference on Compiler Construction. Lecture Notes in Computer Science, vol. 2985, Springer Verlag, 5--23.

[4]

]]Baraz, L., Devor, T., Etzion, O., Goldenberg, S., Skaletsky, A., Wang, Y., and Zemach, Y. 2003. IA-32 Execution Layer: A two-phase dynamic translator designed to support IA-32 applications on Itanium-based systems. In Proceedings of the 36th International Symposium on Micro-architecture. IEEE, 191--201.

Digital Library

[5]

]]Buck, B. and Hollingsworth, J. K. 2000. An API for runtime code patching. Int. J. High Perform. Comput. Applic. 14, 4, 317--329.

Digital Library

[6]

]]Byrd, G. T. and Holliday, M. A. 1995. Multithreaded processor architecture. IEEE Spectrum 32, 8, 38--46.

Digital Library

[7]

]]Carlisle, M. C., Rogers, A., Reppy, J. H., and Hendren, L. J. 1994. Early experiences with Olden. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing. Springer-Verlag, 1--20.

Digital Library

[8]

]]Chambers, C. 2002. Staged compilation. In Proceedings of the ACM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation. ACM Press, New York, NY, 1--8.

Digital Library

[9]

]]Chernoff, A., Herdeg, M., Hookway, R., Reeve, C., Rubin, N., Tye, T., Yadavalli, S. B., and Yates, J. 1998. FX&excl;32: A profile-directed binary translator. IEEE Micro 18, 2, 56--64.

Digital Library

[10]

]]Cifuentes, C. and Gough, K. J. 1995. Decompilation of binary programs. Softw. Pract. Exper. 25, 7, 811--829.

Digital Library

[11]

]]Cintra, M. and Llanos, D. R. 2003. Toward efficient and robust software speculative parallelization on multiprocessors. In Proceedings of the 9th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM Press, New York. 13--24.

Digital Library

[12]

]]de Alba, M. and Kaeli, D. 2001. Runtime predictability of loops. In Proceedings of the 4th Annual IEEE International Workshop on Workload Characterization.

Digital Library

[13]

]]Ebcioğlu, K. and Altman, E. R. 1997. DAISY: Dynamic compilation for 100&percnt; architectural compatibility. In Proceedings of the 24th Annual International Symposium on Computer Architecture. 26--37.

Digital Library

[14]

]]Ebcioğlu, K., Fritts, J., Kosonocky, S., Gschwind, M., Altman, E., Kailas, K., and Bright, T. 1998. An eight-issue tree VLIW processor for dynamic binary translation. In Proceedings of the International Conference on Computer Design.

Digital Library

[15]

]]Fisher, J. A. 1981. Trace scheduling: A technique for global micro-code compaction. IEEE Trans. Comput. 30, 7, 478--490.

Digital Library

[16]

]]Grant, B., Philipose, M., Mock, M., Chambers, C., and Eggers, S. J. 1999. An evaluation of staged run time optimizations in DyC. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM Press, 293--304.

Digital Library

[17]

]]Hammond, L., Hubbert, B. A., Siu, M., Prabhu, M. K., Chen, M., and Olukotun, K. 2000. The Stanford Hydra CMP. IEEE Micro 20, 2, 71--84.

Digital Library

[18]

]]Hwu, W., Mahlke, S., Chen, W., Chang, P., Warter, N., Bringmann, R., Ouellette, R., Hank, R., Kiyohara, T., Haab, G., Holm, J., and Lavery, D. 1993. The superblock: An effective technique for vliw and superscalar compilation. J. Supercomput. 7, 1--2, 229--248.

Digital Library

[19]

]]Kagan, M., Gochman, S., Orenstien, D., and Lin, D. 1997. MMX micro-architecture of Pentium processors with MMX technology and Pentium II micro-processors. Intel Techn. J. 8.

[20]

]]Kistler, T. and Franz, M. 2001. Continuous program optimization: Design and evaluation. IEEE Trans. Comput. 50, 6, 549--566.

Digital Library

[21]

]]Kistler, T. and Franz, M. 2003. Continuous program optimization: A case study. ACM Trans. Program. Lang. Syst. 25, 4, 500--548.

Digital Library

[22]

]]Klaiber, A. 2000. The technology behind Crusoe processors. White Paper, Transmeta Corp.

[23]

]]Krishnan, V. and Torrellas, J. 1998. Hardware and software support for speculative execution of sequential binaries on a chip-multiprocessor. In Proceedings of the International Conference on Supercomputing. 85--92.

Digital Library

[24]

]]Krishnan, V. and Torrellas, J. 1999. A chip-multiprocessor architecture with speculative multithreading. IEEE Trans. Comput. 48, 9, 866--880.

Digital Library

[25]

]]Krishnan, V. S. 1998. Speculative multithreading architectures. Tech. rep. UIUCDCS-R-98-2048, UIUC.

Digital Library

[26]

]]Larus, J. R. and Ball, T. 1994. Rewriting executable files to measure program behavior. Softw. Pract. Exper. 24, 2, 197--218.

Digital Library

[27]

]]Leung, A. and George, L. 1999. Static single assignment form for machine code. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM Press, 204--214.

Digital Library

[28]

]]Lo, J. 1998. Exploiting thread-level parallelism on simultaneous multithreaded processors. Ph.D. thesis, University of Washington.

Digital Library

[29]

]]Nguyen, H. and John, L. K. 1999. Exploiting SIMD parallelism in DSP and multimedia algorithms using the altivec technology. In Proceedings of the International Conference on Supercomputing. 11--20.

Digital Library

[30]

]]Oberman, S., Favor, G., and Weber, F. 1999. AMD 3DNow&excl; technology: Architecture and implementations. IEEE Micro 19, 2 (Mar./Apr.), 37--48.

Digital Library

[31]

]]Olukotun, K., Nayfeh, B. A., Hammond, L., Wilson, K., and Chang, K. 1996. The case for a single-chip multiprocessor. In Proceedings of the Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VII). 2--11.

Digital Library

[32]

]]Quinones, C. G., Madriles, C., Sanchez, J., Marcuello, P., Gonzalez, A., and Tullsen, D. M. 2005. Mitosis compiler: An infrastructure for speculative threading based on pre-computation slices. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM Press, New York. 269--279.

Digital Library

[33]

]]Rauchwerger, L. and Padua, D. A. 1999. The LRPD test: Speculative run time parallelization of loops with privatization and reduction parallelization. IEEE Trans. Parall. Distrib. Syst. 10, 2, 160--180.

Digital Library

[34]

]]Rotenberg, E., Jacobson, Q., Sazeides, Y., and Smith, J. 1997. Trace processors. In Proceedings of the International Symposium on Microarchitecture. 138--148.

Digital Library

[35]

]]Skoglund, J. and Felsberg, M. 2005. Fast image processing using SSE2. In Proceedings of the SSBA Symposium on Image Analysis.

[36]

]]Sohi, G. S., Breach, S. E., and Vijaykumar, T. N. 1998. Multi-scalar processors. In 25 Years of ISCA: Retrospectives and Reprints. 521--532.

Digital Library

[37]

]]Srivastava, A. and Eustace, A. 2004. Atom: A system for building customized program analysis tools. SIGPLAN Not. 39, 4, 528--539.

Digital Library

[38]

]]Thakkar, S. T. and Huff, T. 1999. The Internet Streaming SIMD Extensions. Intel Tech. J., 8.

[39]

]]Tsai, J.-Y., Huang, J., Amlo, C., Lilja, D. J., and Yew, P.-C. 1999. The superthreaded processor architecture. IEEE Trans. Comput. 48, 9, 881--902.

Digital Library

[40]

]]Tsai, J.-Y. and Yew, P.-C. 1996. The superthreaded architecture: Thread pipelining with run time data dependence checking and control speculation. In Proceedings of Parallel Architectures and Compilation Techniques. 35--46.

Digital Library

[41]

]]Tullsen, D. M., Eggers, S., and Levy, H. M. 1995. Simultaneous multithreading: Maximizing on-chip parallelism. In Proceedings of the 22th Annual International Symposium on Computer Architecture. 392--403.

Digital Library

[42]

]]Voss, M. J. and Eigenmann, R. 2000. Adapt: Automated de-coupled adaptive program transformation. In Proceedings of the International Conference on Parallel Processing. IEEE Computer Society, Washington, 163.

Digital Library

[43]

]]Voss, M. J. and Eigenmann, R. 2001. High-level adaptive program optimization with ADAPT. ACM SIGPLAN Not. 36, 7, 93--102.

Digital Library

[44]

]]Yardimci, E. 2006. Exploiting parallelism to improve the performance of sequential binary executables. Ph.D. thesis, University of California, Irvine.

Digital Library

[45]

]]Yardimci, E. and Franz, M. 2006. Dynamic parallelization and mapping of binary executables on hierarchical platforms. In Proceedings of the 3rd Conference on Computing Frontiers. ACM Press, New York. 127--138.

Digital Library

[46]

]]Zilles, C. and Sohi, G. 2001. A programmable co-processor for profiling. In Proceedings of the 7th International Symposium on High-Performance Computer Architecture.

Digital Library

Cited By

Gobran Tde Carvalho JBarton C(2023)DASS: Dynamic Adaptive Sub-Target Specialization2023 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW)10.1109/SBAC-PADW60351.2023.00016(36-45)Online publication date: 17-Oct-2023
https://doi.org/10.1109/SBAC-PADW60351.2023.00016
Altinay ANash JKroes TRajasekaran PZhou DDabrowski AGens DNa YVolckaert SGiuffrida CBos HFranz MBilas AMagoutis KMarkatos EKostic DSeltzer M(2020)BinRecProceedings of the Fifteenth European Conference on Computer Systems10.1145/3342195.3387550(1-16)Online publication date: 15-Apr-2020
https://dl.acm.org/doi/10.1145/3342195.3387550
Vavala BNeves NSteenkiste P(2016)Secure Identification of Actively Executed Code on a Generic Trusted Component2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)10.1109/DSN.2016.45(419-430)Online publication date: Jun-2016
https://doi.org/10.1109/DSN.2016.45
Show More Cited By

Index Terms

Mostly static program partitioning of binary executables
1. Computing methodologies
  1. Concurrent computing methodologies
    1. Concurrent programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Incremental compilers
      2. Runtime environments
    2. General programming languages
      1. Language types
        Concurrent programming languages

Recommendations

Low overhead dynamic binary translation on ARM
PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation

The ARMv8 architecture introduced AArch64, a 64-bit execution mode with a new instruction set, while retaining binary compatibility with previous versions of the ARM architecture through AArch32, a 32-bit execution mode. Most hardware implementations ...
Low overhead dynamic binary translation on ARM
PLDI '17

The ARMv8 architecture introduced AArch64, a 64-bit execution mode with a new instruction set, while retaining binary compatibility with previous versions of the ARM architecture through AArch32, a 32-bit execution mode. Most hardware implementations ...
A platform for secure static binary instrumentation
VEE '14: Proceedings of the 10th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments

Program instrumentation techniques form the basis of many recent software security defenses, including defenses against common exploits and security policy enforcement. As compared to source-code instrumentation, binary instrumentation is easier to use ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Programming Languages and Systems

ACM Transactions on Programming Languages and Systems Volume 31, Issue 5

June 2009

152 pages

ISSN:0164-0925

EISSN:1558-4593

DOI:10.1145/1538917

Issue’s Table of Contents

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 July 2009

Accepted: 01 November 2008

Revised: 01 December 2007

Received: 01 August 2006

Published in TOPLAS Volume 31, Issue 5

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

4
Total Citations
View Citations
725
Total Downloads

Downloads (Last 12 months)64
Downloads (Last 6 weeks)6

Reflects downloads up to 31 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Gobran Tde Carvalho JBarton C(2023)DASS: Dynamic Adaptive Sub-Target Specialization2023 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW)10.1109/SBAC-PADW60351.2023.00016(36-45)Online publication date: 17-Oct-2023
https://doi.org/10.1109/SBAC-PADW60351.2023.00016
Altinay ANash JKroes TRajasekaran PZhou DDabrowski AGens DNa YVolckaert SGiuffrida CBos HFranz MBilas AMagoutis KMarkatos EKostic DSeltzer M(2020)BinRecProceedings of the Fifteenth European Conference on Computer Systems10.1145/3342195.3387550(1-16)Online publication date: 15-Apr-2020
https://dl.acm.org/doi/10.1145/3342195.3387550
Vavala BNeves NSteenkiste P(2016)Secure Identification of Actively Executed Code on a Generic Trusted Component2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)10.1109/DSN.2016.45(419-430)Online publication date: Jun-2016
https://doi.org/10.1109/DSN.2016.45
Nuzman DEres RDyshel SZalmanovici MCastanos J(2013)JIT technology with C/C++ACM Transactions on Architecture and Code Optimization10.1145/2541228.255531510:4(1-25)Online publication date: 1-Dec-2013
https://dl.acm.org/doi/10.1145/2541228.2555315

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents