[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

A wait-free queue as fast as fetch-and-add

Published: 27 February 2016 Publication History

Abstract

Concurrent data structures that have fast and predictable performance are of critical importance for harnessing the power of multicore processors, which are now ubiquitous. Although wait-free objects, whose operations complete in a bounded number of steps, were devised more than two decades ago, wait-free objects that can deliver scalable high performance are still rare.
In this paper, we present the first wait-free FIFO queue based on fetch-and-add (FAA). While compare-and-swap (CAS) based non-blocking algorithms may perform poorly due to work wasted by CAS failures, algorithms that coordinate using FAA, which is guaranteed to succeed, can in principle perform better under high contention. Along with FAA, our queue uses a custom epoch-based scheme to reclaim memory; on x86 architectures, it requires no extra memory fences on our algorithm's typical execution path. An empirical study of our new FAA-based wait-free FIFO queue under high contention on four different architectures with many hardware threads shows that it outperforms prior queue designs that lack a wait-free progress guarantee. Surprisingly, at the highest level of contention, the throughput of our queue is often as high as that of a microbenchmark that only performs FAA. As a result, our fast wait-free queue implementation is useful in practice on most multi-core systems today. We believe that our design can serve as an example of how to construct other fast wait-free objects.

Supplementary Material

Supplemental material. (a16-yang.zip)

References

[1]
J. Alemany and E. W. Felten. Performance Issues in Non-blocking Synchronization on Shared-memory Multiprocessors. In Proceedings of the Eleventh Annual ACM Symposium on Principles of Distributed Computing, PODC '92, pages 125--134, New York, NY, USA, 1992. ACM.
[2]
J. Anderson and M. Moir. Universal constructions for large objects. In J.-M. Hlary and M. Raynal, editors, Distributed Algorithms, volume 972 of Lecture Notes in Computer Science, pages 168--182. Springer Berlin Heidelberg, 1995.
[3]
T. A. Brown. Reclaiming Memory for Lock-Free Data Structures: There Has to Be a Better Way. In Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing, PODC '15, pages 261--270, New York, NY, USA, 2015. ACM. 2767436.
[4]
P. Chuong, F. Ellen, and V. Ramachandran. A Universal Construction for Wait-free Transaction Friendly Data Structures. In Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '10, pages 335--344, New York, NY, USA, 2010. ACM.
[5]
E. W. Dijkstra. Solution of a problem in concurrent programming control. Commun. ACM, 8(9):569--, Sept. 1965. 365617.
[6]
J. Evans. Scalable memory allocation using jemalloc. http://on.fb.me/1KmCoyj, January 2011.
[7]
P. Fatourou and N. D. Kallimanis. A Highly-efficient Wait-free Universal Construction. In Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '11, pages 325--334, New York, NY, USA, 2011. ACM. 1989549.
[8]
P. Fatourou and N. D. Kallimanis. Revisiting the Combining Synchronization Technique. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, pages 257--266, New York, NY, USA, 2012. ACM.
[9]
A. Georges, D. Buytaert, and L. Eeckhout. Statistically Rigorous Java Performance Evaluation. In Proceedings of the 22nd Annual ACM SIGPLAN Conference on Object-oriented Programming Systems and Applications, OOPSLA '07, pages 57--76, New York, NY, USA, 2007. ACM.
[10]
T. L. Harris. A Pragmatic Implementation of Non-blocking Linked-Lists. In Proceedings of the 15th International Conference on Distributed Computing, DISC '01, pages 300--314, London, UK, UK, 2001. Springer-Verlag.
[11]
M. Herlihy. Wait-free Synchronization. ACM Trans. Program. Lang. Syst., 13(1):124--149, Jan. 1991.
[12]
M. P. Herlihy and J. M. Wing. Linearizability: A Correctness Condition for Concurrent Objects. ACM Trans. Program. Lang. Syst., 12(3): 463--492, July 1990.
[13]
A. Kogan and E. Petrank. Wait-free Queues with Multiple Enqueuers and Dequeuers. In Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, PPoPP '11, pages 223--234, New York, NY, USA, 2011. ACM. 1941585.
[14]
A. Kogan and E. Petrank. A Methodology for Creating Fast Wait-free Data Structures. SIGPLAN Not., 47(8):141--150, Feb. 2012.
[15]
L. Lamport. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs. IEEE Trans. Comput., 28(9):690--691, Sept. 1979.
[16]
M. M. Michael. Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects. IEEE Trans. Parallel Distrib. Syst., 15(6):491--504, June 2004.
[17]
M. M. Michael and M. L. Scott. Simple, Fast, and Practical Non-blocking and Blocking Concurrent Queue Algorithms. In Proceedings of the 15th Annual ACM Symposium on Principles of Distributed Computing, PODC '96, pages 267--275, New York, NY, USA, 1996. ACM.
[18]
M. Moir. Laziness pays! Using lazy synchronization mechanisms to improve non-blocking constructions. Distributed Computing, 14(4): 193--204, 2001.
[19]
A. Morrison and Y. Afek. Fast Concurrent Queues for x86 Processors. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '13, pages 103--112, New York, NY, USA, 2013. ACM.
[20]
TAU Multicore Computing Group. Fast concurrent queues for x86 processors. http://mcg.cs.tau.ac.il/projects/lcrq/.
[21]
S. Timnat and E. Petrank. A Practical Wait-free Simulation for Lock-free Data Structures. In Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '14, pages 357--368, New York, NY, USA, 2014. ACM.

Cited By

View all
  • (2024)Memory Bounds for Concurrent Bounded QueuesProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638497(188-199)Online publication date: 2-Mar-2024
  • (2023)CQS: A Formally-Verified Framework for Fair and Abortable SynchronizationProceedings of the ACM on Programming Languages10.1145/35912307:PLDI(244-266)Online publication date: 6-Jun-2023
  • (2023)The State-of-the-Art LCRQ Concurrent Queue Algorithm Does NOT Require CAS2Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3572848.3577485(14-26)Online publication date: 25-Feb-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 51, Issue 8
PPoPP '16
August 2016
405 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3016078
Issue’s Table of Contents
  • cover image ACM Conferences
    PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
    February 2016
    420 pages
    ISBN:9781450340922
    DOI:10.1145/2851141
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 February 2016
Published in SIGPLAN Volume 51, Issue 8

Check for updates

Author Tags

  1. fast-path-slow-path
  2. non-blocking queue
  3. wait-free

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)117
  • Downloads (Last 6 weeks)10
Reflects downloads up to 13 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Memory Bounds for Concurrent Bounded QueuesProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638497(188-199)Online publication date: 2-Mar-2024
  • (2023)CQS: A Formally-Verified Framework for Fair and Abortable SynchronizationProceedings of the ACM on Programming Languages10.1145/35912307:PLDI(244-266)Online publication date: 6-Jun-2023
  • (2023)The State-of-the-Art LCRQ Concurrent Queue Algorithm Does NOT Require CAS2Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3572848.3577485(14-26)Online publication date: 25-Feb-2023
  • (2023)Fast and Scalable Channels in Kotlin CoroutinesProceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3572848.3577481(107-118)Online publication date: 25-Feb-2023
  • (2020)Scaling concurrent queues by using HTM to profit from failed atomic operationsProceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/3332466.3374511(89-101)Online publication date: 19-Feb-2020
  • (2020)Restricted memory-friendly lock-free bounded queuesProceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/3332466.3374508(433-434)Online publication date: 19-Feb-2020
  • (2020)Entropy Measurement of Concurrent DisorderQuantitative Evaluation of Systems10.1007/978-3-030-59854-9_18(239-257)Online publication date: 3-Nov-2020
  • (2018)Relaxed Schedulers Can Efficiently Parallelize Iterative AlgorithmsProceedings of the 2018 ACM Symposium on Principles of Distributed Computing10.1145/3212734.3212756(377-386)Online publication date: 23-Jul-2018
  • (2024)A Family of Fast and Memory Efficient Lock- and Wait-Free ReclamationProceedings of the ACM on Programming Languages10.1145/36588518:PLDI(2174-2198)Online publication date: 20-Jun-2024
  • (2024)Read/write fence-free work-stealing with multiplicityJournal of Parallel and Distributed Computing10.1016/j.jpdc.2023.104816186:COnline publication date: 1-Apr-2024
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media