research-article
Public Access

Accelerating GPU Hardware Transactional Memory with Snapshot Isolation

Published: 24 June 2017

Abstract

Snapshot Isolation (SI) is an established model in the database community, which permits write-read conflicts to pass and aborts transactions only on write-write conflicts. With the Write Skew anomaly correctly eliminated, SI can reduce the occurrence of aborts, save the work done by transactions, and greatly benefit long transactions involving complex data structures.
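The commit rule described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: `SnapshotStore`, `Transaction`, and all of their methods are hypothetical names, and the store is a plain Python dictionary of version lists. Each transaction reads from the snapshot taken at its start, buffers its writes, and aborts at commit time only if another transaction has already committed a newer version of an address it wrote.

```python
class SnapshotStore:
    """Toy multi-version store (hypothetical, for illustration only)."""

    def __init__(self, initial):
        self.clock = 0  # global commit timestamp
        # address -> list of (commit_timestamp, value), oldest first
        self.versions = {addr: [(0, val)] for addr, val in initial.items()}

    def begin(self):
        return Transaction(self, self.clock)

    def read_at(self, addr, ts):
        # Return the newest version committed at or before timestamp ts.
        for commit_ts, value in reversed(self.versions[addr]):
            if commit_ts <= ts:
                return value
        raise KeyError(addr)


class Transaction:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}  # buffered writes, invisible to others until commit

    def read(self, addr):
        if addr in self.writes:
            return self.writes[addr]
        return self.store.read_at(addr, self.start_ts)

    def write(self, addr, value):
        self.writes[addr] = value

    def commit(self):
        # Write-write conflict: a newer version of an address we wrote
        # was committed after our snapshot was taken, so we abort.
        for addr in self.writes:
            if self.store.versions[addr][-1][0] > self.start_ts:
                return False
        self.store.clock += 1
        for addr, value in self.writes.items():
            self.store.versions[addr].append((self.store.clock, value))
        return True


store = SnapshotStore({"x": 1, "y": 1})

# Write-read conflict: t1 reads y while t2 writes y. Both commit under SI.
t1, t2 = store.begin(), store.begin()
t1.write("x", t1.read("y") + 10)
t2.write("y", 5)
rw_results = (t2.commit(), t1.commit())  # (True, True)

# Write-write conflict: t3 and t4 both write x. The later committer aborts.
t3, t4 = store.begin(), store.begin()
t3.write("x", 100)
t4.write("x", 200)
ww_results = (t3.commit(), t4.commit())  # (True, False)
```

In this model the reader `t1` never sees `t2`'s update because it keeps reading its own snapshot, which is why the write-read conflict does not force an abort.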
GPUs are evolving into general-purpose computing devices with growing support for irregular workloads, including transactional memory. Applying snapshot isolation to transactional memory has been shown to benefit performance substantially. In this paper, we propose a multi-versioned memory subsystem for hardware-based transactional memory on the GPU, together with a method for eliminating the Write Skew anomaly on the fly, and we incorporate Snapshot Isolation into this system.
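To make the Write Skew anomaly concrete, the sketch below models two transactions that each read both `x` and `y`, check the invariant `x + y >= 1`, and then clear a different variable. Under plain snapshot isolation both commit, because their write sets are disjoint, and the invariant breaks. The abstract does not describe the paper's elimination method, so the remedy shown here is an assumption: read promotion, a generic technique in which a selected read is treated as a write for conflict detection. `Store`, `Txn`, and `run_pair` are hypothetical names.

```python
class Store:
    """Toy multi-version memory (hypothetical, for illustration only)."""

    def __init__(self, initial):
        self.clock = 0
        # address -> list of (commit_timestamp, value), oldest first
        self.versions = {a: [(0, v)] for a, v in initial.items()}

    def snapshot_read(self, addr, ts):
        # Newest version committed at or before timestamp ts.
        return next(v for t, v in reversed(self.versions[addr]) if t <= ts)


class Txn:
    def __init__(self, store):
        self.store, self.start_ts = store, store.clock
        self.writes, self.promoted = {}, set()

    def read(self, addr, promote=False):
        if promote:
            self.promoted.add(addr)  # treat this read as a write at commit
        if addr in self.writes:
            return self.writes[addr]
        return self.store.snapshot_read(addr, self.start_ts)

    def write(self, addr, value):
        self.writes[addr] = value

    def commit(self):
        # Conflict detection covers written and promoted addresses.
        checked = set(self.writes) | self.promoted
        if any(self.store.versions[a][-1][0] > self.start_ts for a in checked):
            return False
        self.store.clock += 1
        for a in checked:
            # A promoted read re-commits the current value as a new version.
            value = self.writes.get(a, self.store.versions[a][-1][1])
            self.store.versions[a].append((self.store.clock, value))
        return True


def run_pair(promote):
    """Run the two skewed transactions; return commit outcomes and x + y."""
    store = Store({"x": 1, "y": 1})
    t1, t2 = Txn(store), Txn(store)
    if t1.read("x") + t1.read("y", promote) >= 2:
        t1.write("x", 0)
    if t2.read("x", promote) + t2.read("y") >= 2:
        t2.write("y", 0)
    outcomes = (t1.commit(), t2.commit())
    now = store.clock
    return outcomes, store.snapshot_read("x", now) + store.snapshot_read("y", now)


plain_si = run_pair(promote=False)   # ((True, True), 0): invariant violated
promoted = run_pair(promote=True)    # ((True, False), 1): second txn aborts
```

Promotion trades some of SI's abort savings for safety, since every promoted read can now cause an abort, so it would normally be applied only to the reads that can participate in a skew.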
The results show that snapshot isolation can effectively boost the performance of dynamically sized data structures such as linked lists, binary trees and red-black trees, sometimes by as much as 4.5x, which results in improved overall performance of benchmarks utilizing these data structures.

Published In

ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture
June 2017
736 pages
ISBN: 9781450348928
DOI: 10.1145/3079856
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GPU
  2. Snapshot Isolation
  3. Transactional Memory

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ISCA '17

Acceptance Rates

ISCA '17 paper acceptance rate: 54 of 322 submissions (17%)
Overall acceptance rate: 543 of 3,203 submissions (17%)

Cited By

  • (2023) Boosting Performance and QoS for Concurrent GPU B+trees by Combining-Based Synchronization. Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pp. 1-13. DOI: 10.1145/3572848.3577474. Online publication date: 25-Feb-2023
  • (2022) Adaptive Contention Management for Fine-Grained Synchronization on Commodity GPUs. ACM Transactions on Architecture and Code Optimization 19:4, pp. 1-21. DOI: 10.1145/3547301. Online publication date: 11-Jul-2022
  • (2021) KVCG. Proceedings of the 14th ACM International Conference on Systems and Storage, pp. 1-12. DOI: 10.1145/3456727.3463779. Online publication date: 14-Jun-2021
  • (2020) Architectural Support for NVRAM Persistence in GPUs. IEEE Transactions on Parallel and Distributed Systems 31:5, pp. 1107-1120. DOI: 10.1109/TPDS.2019.2960233. Online publication date: 1-May-2020
  • (2020) Don't forget about synchronization! Guidelines for using locks on graphics processing units. Concurrency and Computation: Practice and Experience 34:2. DOI: 10.1002/cpe.5757. Online publication date: 13-Apr-2020
  • (2019) FPGA-Accelerated Optimistic Concurrency Control for Transactional Memory. Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 911-923. DOI: 10.1145/3352460.3358270. Online publication date: 12-Oct-2019
  • (2019) Efficient GPU NVRAM Persistence with Helper Warps. Proceedings of the 56th Annual Design Automation Conference 2019, pp. 1-6. DOI: 10.1145/3316781.3317810. Online publication date: 2-Jun-2019
  • (2019) Don't Forget About Synchronization! Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores, pp. 11-20. DOI: 10.1145/3303084.3309488. Online publication date: 17-Feb-2019
  • (2019) Fast Fine-Grained Global Synchronization on GPUs. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 793-806. DOI: 10.1145/3297858.3304055. Online publication date: 4-Apr-2019
  • (2019) Stretching the capacity of hardware transactional memory in IBM POWER architectures. Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming, pp. 107-119. DOI: 10.1145/3293883.3295714. Online publication date: 16-Feb-2019
