[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies

Published: 16 March 2013 Publication History

Abstract

Transactional memory (TM) has been proposed to alleviate some key programmability problems in chip multiprocessors. Most TMs optimistically allow concurrent transactions, detecting read-write or write-write conflicts. Upon conflicts, existing hardware TMs (HTMs) use one of three conflict-resolution policies: (1) always-abort, (2) always-wait for some conflicting transactions to complete, or (3) always-go past conflicts and resolve acyclic conflicts at commit or abort upon cyclic dependencies. While each policy has advantages, the policies degrade performance under contention by limiting concurrency (always-abort, always-wait) or incurring late aborts due to cyclic dependencies (always-go). Thus, while always-go avoids acyclic aborts, no policy avoids cyclic aborts. We propose Wait-n-GoTM (WnGTM) to increase concurrency while avoiding cyclic aborts. We observe that most cyclic dependencies are caused by threads interleaving multiple accesses to a few heavily-read-write-shared delinquent data cache blocks. These accesses occur in code sections called cycle inducer sections (CISTs). Accordingly, we propose Wait-n-Go (WnG) conflict-resolution to avoid many cyclic aborts by predicting and serializing the CISTs. To support the WnG policy, we extend previous HTMs to (1) allow multiple readers and writers, (2) scalably identify dependencies, and (3) detect cyclic dependencies via new mechanisms, namely, conflict transactional state, order-capture, and hardware timestamps, respectively. In 16-core simulations of STAMP, WnGTM achieves average speedups of 46% for higher-contention benchmarks and 28% for all benchmarks over always-abort (TokenTM) with low-contention benchmarks remaining unchanged, compared to always-go (DATM) and always-wait (LogTM-SE), which perform worse than and 6% better than TokenTM, respectively.

References

[1]
A.-R. Adl-Tabatabai, B. T. Lewis, V. Menon, B. R. Murphy, B. Saha, and T. Shpeisman. Compiler and runtime support for efficient software transactional memory. In Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation, pages 26--37. 2006.
[2]
A. R. Alameldeen and D. A. Wood. Variability in architectural simulations of multi-threaded workloads. In Proceedings of the 9th International Symposium on High-Performance Computer Architecture, page 7. 2003.
[3]
C. S. Ananian, K. Asanovic, B. C. Kuszmaul, C. E. Leiserson, and S. Lie. Unbounded transactional memory. In Proceedings of the Eleventh International Symposium on High-Performance Computer Architecture, pages 316--327. 2005.
[4]
L. Baugh, N. Neelakantam, and C. Zilles. Using hardware memory protection to build a high-performance, strongly-atomic hybrid transactional memory. In Proceedings of the 35th Annual International Symposium on Computer Architecture, pages 115--126, 2008.
[5]
G. Blake, R. G. Dreslinski, and T. Mudge. Proactive transaction scheduling for contention management. In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, pages 156--167,2009.
[6]
C. Blundell, J. Devietti, E. C. Lewis, and M. M. K. Martin. Making the fast case common and the uncommon case simple in unbounded transactional memory. In Proceedings of the 34th Annual International Symposium on Computer Architecture, pages 24--34, 2007.
[7]
C. Blundell, A. Raghavan, and M. M. Martin. RETCON: transactional repair without replay. In Proceedings of the 37th Annual International Symposium on Computer Architecture, pages 258--269, 2010.
[8]
J. Bobba, N. Goyal, M. D. Hill, M. M. Swift, and D. A. Wood. TokenTM: Efficient execution of large transactions with hardware transactional memory. In Proceedings of the 35th Annual International Symposium on Computer Architecture, pages 127--138, 2008.
[9]
J. Bobba, K. E. Moore, H. Volos, L. Yen, M. D. Hill, M. M. Swift, and D. A. Wood. Performance pathologies in hardware transactional memory. In Proceedings of the 34th Annual International Symposium on Computer Architecture, pages 81--91. 2007.
[10]
C. Cao Minh, J. Chung, C. Kozyrakis, and K. Olukotun. STAMP: Stanford transactional applications for multi-processing. In Proceedings of The IEEE International Symposium on Workload Characterization, 2008.
[11]
W. Chuang, S. Narayanasamy, G. Venkatesh, J. Sampson, M. Van Biesbrouck, G. Pokam, B. Calder, and O. Colavin. Unbounded page-based transactional memory. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 347--358, 2006.
[12]
J. Chung, L. Yen, S. Diestelhorst, M. Pohlack, M. Hohmuth, D. Christie, and D. Grossman. ASF: AMD64 extension for lock-free data structures and transactional memory. In Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, pages 39--50, 2010.
[13]
C. Click. Azul's experiences with hardware transactional memory. In Transactional Memory Workshop, 2009.
[14]
Intel Corporation. Intel architecture instruction set extensions programming reference, 319433-012a edition. 2012.
[15]
P. Damron, A. Fedorova, Y. Lev, V. Luchangco, M. Moir, and D. Nussbaum. Hybrid transactional memory. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 336--346, 2006.
[16]
D. Dice, Y. Lev, M. Moir, and D. Nussbaum. Early experience with a commercial hardware transactional memory implementation. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 157--168, 2009.
[17]
L. Hammond, V. Wong, M. Chen, B. D. Carlstrom, J. D. Davis, B. Hertzberg, M. K. Prabhu, H. Wijaya, C. Kozyrakis, and K. Olukotun. Transactional memory coherence and consistency. In Proceedings of the 31st Annual International Symposium on Computer Architecture, pages 102--113, 2004.
[18]
M. Herlihy and J. E. B. Moss. Transactional memory: Architectural support for lock-free data structures. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 289--300. 1993.
[19]
O. S. Hofmann, C. J. Rossbach, and E. Witchel. Maximum benefit from a minimal HTM. In Proceeding of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 145--156. 2009.
[20]
P. S. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, F. Larsson, A. Moestedt, and B. Werner. Simics: A full system simulation platform. IEEE Computer, 35(2):50--58, Feb. 2002.
[21]
M. M. K. Martin, D. J. Sorin, B. M. Beckmann, M. R. Marty, M. Xu, A. R. Alameldeen, K. E. Moore, M. D. Hill, and D. A. Wood. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset. SIGARCH Comput. Archit. News, 33(4):92--99, 2005.
[22]
C. C. Minh, M. Trautmann, J. Chung, A. McDonald, N. Bronson, J. Casper, C. Kozyrakis, and K. Olukotun. An effective hybrid transactional memory system with strong isolation guarantees. In Proceedings of the 34th Annual International Symposium on Computer Architecture, pages 69--80, 2007.
[23]
K. E. Moore, J. Bobba, M. J. Moravan, M. D. Hill, and D. A. Wood. LogTM: Log-based transactional memory. In Proceedings of the 12th International Symposium on High-Performance Computer Architecture, pages 254--265. 2006.
[24]
A. Moshovos, S. E. Breach, T. N. Vijaykumar, and G. S. Sohi. Dynamic speculation and synchronization of data dependences. In Proceedings of the 24th Annual International Symposium on Computer Architecture, pages 181--193, 1997.
[25]
R. Rajwar, M. Herlihy, and K. Lai. Virtualizing transactional memory. In Proceedings of the 32nd Annual International Symposium on Computer Architecture, pages 494--505, 2005.
[26]
H. E. Ramadan, C. J. Rossbach, and E. Witchel. Dependence-aware transactional memory for increased concurrency. In Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture, pages 246--257, 2008.
[27]
N. Shavit and D. Touitou. Software transactional memory. In Proceedings of the 14th ACM Symposium on Principles of Distributed Computing, pages 204--213. 1995.
[28]
A. Shriraman, S. Dwarkadas, and M. L. Scott. Flexible decoupled transactional memory support. In Proceedings of the 35th International Symposium on Computer Architecture, pages 139--150. 2008.
[29]
A. Sodani. Race to exascale: Opportunities and challenges. In Keynote at the Annual IEEE/ACM 44th Annual International Symposium on Microarchitecture, 2011.
[30]
G. Voskuilen, F. Ahmad, and T. N. Vijaykumar. Timetraveler: exploiting acyclic races for optimizing memory race recording. In Proceedings of the 37th Annual International Symposium on Computer Architecture, pages 198--209, 2010.
[31]
M. Waliullah and P. Stenstrom. Intermediate checkpointing with conflicting access prediction in transactional memory systems. In Proceedings of the 2008 IEEE International Symposium on Parallel and Distributed Processing, pages 1--11, 2008.
[32]
A. Wang, M. Gaudet, P. Wu, J. N. Amaral, M. Ohmacht, C. Barton, R. Silvera, and M. Michael. Evaluation of Blue Gene/Q hardware support for transactional memories. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, pages 127--136, 2012.
[33]
L. Yen, J. Bobba, M. M. Marty, K. E. Moore, H. Volos, M. D. Hill, M. M. Swift, and D. A. Wood. LogTM-SE: Decoupling hardware transactional memory from caches. In Proceedings of the 13th International Symposium on High-Performance Computer Architecture, pages 261--272. 2007.

Cited By

View all
  • (2015)Efficient Correction of Anomalies in Snapshot Isolation TransactionsACM Transactions on Architecture and Code Optimization10.1145/269326011:4(1-24)Online publication date: 9-Jan-2015
  • (2024)LockillerTM: Enhancing Performance Lower Bounds in Best-Effort Hardware Transactional Memory2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS57955.2024.00081(865-875)Online publication date: 27-May-2024
  • (2020)PeacenikProceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3373376.3378485(317-333)Online publication date: 9-Mar-2020
  • Show More Cited By

Index Terms

  1. Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM SIGARCH Computer Architecture News
    ACM SIGARCH Computer Architecture News  Volume 41, Issue 1
    ASPLOS '13
    March 2013
    540 pages
    ISSN:0163-5964
    DOI:10.1145/2490301
    Issue’s Table of Contents
    • cover image ACM Conferences
      ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
      March 2013
      574 pages
      ISBN:9781450318709
      DOI:10.1145/2451116
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 16 March 2013
    Published in SIGARCH Volume 41, Issue 1

    Check for updates

    Author Tags

    1. cyclic dependencies
    2. serializing transactions
    3. transactional conflicts
    4. transactional memory

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)10
    • Downloads (Last 6 weeks)2
    Reflects downloads up to 04 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2015)Efficient Correction of Anomalies in Snapshot Isolation TransactionsACM Transactions on Architecture and Code Optimization10.1145/269326011:4(1-24)Online publication date: 9-Jan-2015
    • (2024)LockillerTM: Enhancing Performance Lower Bounds in Best-Effort Hardware Transactional Memory2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS57955.2024.00081(865-875)Online publication date: 27-May-2024
    • (2020)PeacenikProceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3373376.3378485(317-333)Online publication date: 9-Mar-2020
    • (2020)T4Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture10.1109/ISCA45697.2020.00024(159-172)Online publication date: 30-May-2020
    • (2019)Multiversioned Page Overlays: Enabling Faster Serializable Hardware Transactional Memory2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT)10.1109/PACT.2019.00038(395-408)Online publication date: Sep-2019
    • (2018)Harmonizing speculative and non-speculative execution in architectures for ordered parallelismProceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture10.1109/MICRO.2018.00026(217-230)Online publication date: 20-Oct-2018
    • (2017)Accelerating GPU Hardware Transactional Memory with Snapshot IsolationACM SIGARCH Computer Architecture News10.1145/3140659.308020445:2(282-294)Online publication date: 24-Jun-2017
    • (2017)Accelerating GPU Hardware Transactional Memory with Snapshot IsolationProceedings of the 44th Annual International Symposium on Computer Architecture10.1145/3079856.3080204(282-294)Online publication date: 24-Jun-2017
    • (2017)SAM: Optimizing Multithreaded Cores for Speculative Parallelism2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT)10.1109/PACT.2017.37(64-78)Online publication date: Sep-2017
    • (2016)Exploiting semantic commutativity in hardware speculationThe 49th Annual IEEE/ACM International Symposium on Microarchitecture10.5555/3195638.3195679(1-12)Online publication date: 15-Oct-2016
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media