More Web Proxy on the site http://driver.im/

article

Free access

Counting networks

Authors:

Maurice Herlihy,

Nir ShavitAuthors Info & Claims

Journal of the ACM (JACM), Volume 41, Issue 5

Pages 1020 - 1048

https://doi.org/10.1145/185675.185815

Published: 01 September 1994 Publication History

Abstract

Many fundamental multi-processor coordination problems can be expressed as counting problems: Processes must cooperate to assign successive values from a given range, such as addresses in memory or destinations on an interconnection network. Conventional solutions to these problems perform poorly because of synchronization bottlenecks and high memory contention.

Motivated by observations on the behavior of sorting networks, we offer a new approach to solving such problems, by introducing counting networks, a new class of networks that can be used to count. We give two counting network constructions, one of depth log n(1 + log n)/2 using n log (1 + log n)/4 “gates,” and a second of depth log² n using n log² n/2 gates. These networks avoid the sequential bottlenecks inherent to earlier solutions and substantially lower the memory contention.

Finally, to show that counting networks are not merely mathematical creatures, we provide experimental evidence that they outperform conventional synchronization techniques under a variety of circumstances.

References

[1]

~AGARWAL, A, AND CHERIAN, M 1989. Adaptive backoff synchronization techniques. In Pro- ~c eedblgS of tt~e 16th Symposium ol~ Comp,terArchitecture (June). IEEE Computer Society Press, ~Los Alamitos, Calil., pp. 396-406.

[2]

~AGARWAL, A., CHAIKEN, D., D'SOUZA, G., JOHNSON, K., KRANZ, D., KUBIATOWICZ, J., KURIHARA, ~K., Elm, B.-H., M^A, G., NUSSBAUM, D., PARKIN, M., AND YOUNG, D. 1991. The MIT alewife ~machine: A large-scale distributed-memory multiprocessor. In Proceedings of Workshop on ~Scalable Shared Memoly Muhiprocessors. Kluwer Academic Publishers. (An extended version of ~this paper has been submitted for publication, and appears as MIT/LCS Memo TM-454, 1991.)

[3]

~AHARONSON, E., AND ATTIYA, H. 1992. Counting networks with arbitrary fan-out. In Proceed- ~ings of the 3rd Symposium on Discrete Algorithms (Orlando, Fla., Jan. 27-29). ACM-SIAM, New ~York, pp. 104-113.

[4]

~AJTAI, M., KOMLOS, J., AND SZEMERI2DI, E. 1983. An O(n log n) sorting network. In Proceed- ~trigs of the 15th ACM Symposium on the Theory of Computing. (Boston, Mass., Apr. 25-27). ~ACM, New York, pp. 1-9.

[5]

~ANDERSON, T.E. 1989. The performance implications of spin-waiting alternatives for shared- ~memory multiprocessors. Tech. Rep. 89-04-03. Univ. Washington, Seattle, Wash.

[6]

~ASPNES, J., HERLIHY, M. P., AND SHAVIT, N. 1991. Counting networks and multi-processor ~coordination. In Proceedings of the 23rd Annual Symposium on Theory of Compuung, New ~Orleans, La., May 6-8). ACM, New York, pp. 348-358.

[7]

~BATCttER, K.E. 1968. Sorting networks and their applications. In Proceedings of AF1PS Joint ~Computer Conference 32, 338-334.

[8]

~CORMEN, T. H., LEISERSON, C. E., AND RIVEST, R. L. 1990. Introduction to Algorithms. MIT ~Press, Cambridge, Mass.

[9]

~DOWD, M., PERL, Y., RUDOLPH, L., AND SAKS, M. 1989. The periodic balanced sorting network. ~J. ACM 36, 4 (Oct), 738-757.

[10]

~ELLIS, C. S., AND OLSON, T.J. 1988. Algorithms for parallel memory allocation. J. Parallel ~Progr. 17, 4 (Aug.) 303 345.

[11]

~FELq'ON, E. W., L^MARC^, A., AND LADNER, R. 1993. Building counting networks from larger ~balancers. Tech. Rep. 93-(/4-09. Univ. Washington, Seattle, Wash.

[12]

~FREUDENTHAL, E., AND GOTTL1EB, A. 1991. Process coordination with fetch-and-increment. In ~Proceedings of the 4th International Conference on Architecture Support Jor Progl:amming Lan- ~gttages and Opovtting Systenzs, (Santa Clara, Calif., Apr.).

[13]

~GAWHCK, D. 1985. Processing "hot spots" in high performance systems. In Proceedings of ~COMPCON'~S5. IEEE, Los Alamitos, Cahf., pp. 249-251.

[14]

~GOODM~N, J., VERNON, M., AND WOEST, P. 1989. A set of efficient synchronization primitives ~for a large-scale shared-memory multiprocessor. In Proceedings o)~ the 3rd International Confer- ~ence on Architectural Suppot;t Jor Programming Languages and OI)eratmg Systems (Apr.). ACM, ~New York, pp. 64 77.

[15]

~GOTTLIEB, A., GRISHMAN, R., KRUSK^L, C. P., MCAULIFFE, K. P., RUDOLPH, L., AND SNIR, M. ~1984. The NYU ultracomputer Dcsignmg an mimd parallel computer. IEEE Trans. Conlpttt- ~ers C-32, 2 (Fcb.), 175-189.

[16]

~GOTTI.IEB, A., LUBACHEVSKS, B. D., AND RUDLOPH, L. 1983. Basic techniques for the efficient ~coordination of very large numbers of cooperating sequential processors. ACM Trans. Prog. ~Lang. &'st. 5, 2 (Apr.), 164-189.

[17]

~HARDAVEI,LAS, N., KARAKOS, D., AND MAYRONICOL^S, M. 1993. Notes on sorting and counting ~nctworks. In Proceedings of WDAG'93, to appear.

[18]

~HENSGEN, D., FINKEL, R., AND MANGER, U. 1988. Two algorithms for barrier synchronization. ~hit. J. Para. Prog. 17, 1, 1117.

[19]

~HERLIHY, M. P., LIM, B. H. AND SHAVIT, N. 1992. Low contention load balancing on largc-scale ~multiproccssors. In Proceedblgs of the 4th Annual ACM Symposium on Parallel Algorithms and ~,4rclntectures, (San Diego, Calif., June 29-July 11. ACM, New York, pp. 219 222.

[20]

~HERLIHY, M. P., SHAVIT, N., AND WAARTS, O. 1991. Low-contention linearizable counting. In ~Proceedings' of the 32th 1EEE Symposium on Foundations of Conlpztler Science (Oct.) IEEE, New ~York, pp. 526 535.

[21]

~KR^NZ, D., HALSTEAD, R., AND MOHR, E. 1989. MuI-T, A high-performance parallel L~sp. In ~Proceedtngs of the ACM SIGPLAN '89 Conference on Programming Language Design and ~hnplemeutatlon, (Portland, Ore., June 21-231. ACM, New York, pp. 81-90.

[22]

~KRUSKAL, C. P., RUDOLPH, L., AND SNIR, U. 1986. Efficient synchronization on multiproccssors ~with shared memory. In FroceedDtgx of the 5{h AC3{ ${OACT-$IOOPS S)'mpoMum on Frmciples ~oJ Dtstnbnted Computing, ACM. New York, pp. 218-228.

[23]

~KLUGERMAN, M. AND PLAXTON, C.G. 1992. Small-depth counting networks. In Proceedings of ~the 24th Annual Symposium on the Theory of Computing. (Victoria, B.C., Canada, May 4-6). ~ACM, New York, pp. 417 428.

[24]

~LYNCH, N. A., aND TU'fTLIS, M. R. 1987. Hierarchical correctness proofs for distributed ~algorithms. In Proceedbhqs of the 6th AC3,1 Symposutm on Principles of Dtstrtbuted Computing ~(Vancouver, B.C., Canada, Aug. 10-12). ACM, New York, pp. 137-151. (Full version available ~as MIT Tech. Rep. MIT/LCS/TR-387.)

[25]

~MULLOR-CRUMMEY, J. M., ~ND SCOTT, M.L. 1990. Algorithms for scalable synchronization on ~shared-memory multlprocessors. Tech. Rep. 342. Umv. Rochester, Rochester, N.Y. (Apr.).

[26]

~RUDOLPIL L. 1983. Decentrahzed cache scheme for an MIMD parallel processor, In Proceed- ~m~~ of the Ilth Annual Computing Architecture Conference. pp. 340-347.

[27]

~MELLOR-CRUMMEY, J. M., AND SCOTI', g.L. 1991. Synchronization without contention. In ~Prc)ceedztzg, s' of the 4th lnternatzonal Conference on Architecture Support for Programming Lan- ~guages attd Operating Systems (Santa Clara, Calif., Apr.) ACM, New York, pp. 269 278.

[28]

~PELEG, D., AND UPFAL, E. 1986. The token distribution problem. In Proeeedmgs of the 27th ~IEEE Sympostum on Foundaaons of Computer Sctence (Oct.). IEEE, New York.

[29]

~PF1STER, G. H., ET AL. 1985. The IBM research parallel processor prototype (RP3): Introduc-tion and architecture. In Proceedings of the International Conference on Parallel Processing.

[30]

~PFIS'rER, G. H., AND NORTON, A. 1985. 'Hot spot' contention and combining m multistage ~mterconnectlon networks. IEEE ;trans. Comput. C-34, 11 (Nov.), 933 938.

[31]

~STONE, H. S. 1984. Database applications of the fetch-and-add instruction. IEEE Trans ~Comput. C-33, 7 (July), 604-612.

[32]

~ViSHKIN, U. 1984. A parallel-deslgn distributed-implementation (PDDI) general purpose com- ~puter. Theoret. Compul. Sci. 32, 157 172.

Cited By

Roh YWei YRuppert EFatourou PJayanti SShun J(2025)Aggregating Funnels for Faster Fetch&Add and QueuesProceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3710848.3710873(99-114)Online publication date: 28-Feb-2025
https://dl.acm.org/doi/10.1145/3710848.3710873
Sela GPetrank E(2022)Concurrent sizeProceedings of the ACM on Programming Languages10.1145/35633006:OOPSLA2(345-372)Online publication date: 31-Oct-2022
https://dl.acm.org/doi/10.1145/3563300
Gonzalez-Guerrero PBautista MLyles DMichelogiannakis GFalsafi BFerdman MLu SWenisch T(2022)Temporal and SFQ pulse-streams encoding for area-efficient superconducting acceleratorsProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3503222.3507765(963-976)Online publication date: 28-Feb-2022
https://dl.acm.org/doi/10.1145/3503222.3507765
Show More Cited By

Index Terms

Counting networks

Recommendations

Scalable concurrent counting

The notion of counting is central to a number of basic multiprocessor coordination problems, such as dynamic load balancing, barrier synchronization, and concurrent data structure design. We investigate the scalability of a variety of counting techniques ...
A combinatorial treatment of balancing networks

Balancing networks, originally introduced by Aspnes et al. (Proceedings of the 23rd Annual ACM Symposium on Theory of Computing, pp. 348-358, May 1991), represent a new class of distributed, low-contention data structures suitable for solving many ...
New Self-Routing Permutation Networks

This contribution is focused on self-routing permutation networks capable of routing all n! permutations of its n inputs to its n outputs without internal conflict. First, a self-routing permutation network named BNB SRPN is described. The network ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Journal of the ACM

Journal of the ACM Volume 41, Issue 5

Sept. 1994

230 pages

ISSN:0004-5411

EISSN:1557-735X

DOI:10.1145/185675

Issue’s Table of Contents

Copyright © 1994 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 1994

Published in JACM Volume 41, Issue 5

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

131
Total Citations
View Citations
1,245
Total Downloads

Downloads (Last 12 months)126
Downloads (Last 6 weeks)12

Reflects downloads up to 08 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Roh YWei YRuppert EFatourou PJayanti SShun J(2025)Aggregating Funnels for Faster Fetch&Add and QueuesProceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3710848.3710873(99-114)Online publication date: 28-Feb-2025
https://dl.acm.org/doi/10.1145/3710848.3710873
Sela GPetrank E(2022)Concurrent sizeProceedings of the ACM on Programming Languages10.1145/35633006:OOPSLA2(345-372)Online publication date: 31-Oct-2022
https://dl.acm.org/doi/10.1145/3563300
Gonzalez-Guerrero PBautista MLyles DMichelogiannakis GFalsafi BFerdman MLu SWenisch T(2022)Temporal and SFQ pulse-streams encoding for area-efficient superconducting acceleratorsProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3503222.3507765(963-976)Online publication date: 28-Feb-2022
https://dl.acm.org/doi/10.1145/3503222.3507765
Cook VPeterson CPainter ZDechev D(2022)Quantifiability: a concurrent correctness condition modeled in vector spaceComputing10.1007/s00607-022-01092-3105:5(955-978)Online publication date: 7-Jun-2022
https://dl.acm.org/doi/10.1007/s00607-022-01092-3
Baig MHendler DMilani ATravers C(2022)Long-lived counters with polylogarithmic amortized step complexityDistributed Computing10.1007/s00446-022-00439-536:1(29-43)Online publication date: 29-Nov-2022
https://doi.org/10.1007/s00446-022-00439-5
Cook VPeterson CPainter ZDechev D(2021)Quantifiability: Correctness of Concurrent Programs in Vector Space2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)10.1109/PDP52278.2021.00023(94-98)Online publication date: Mar-2021
https://doi.org/10.1109/PDP52278.2021.00023
Michelogiannakis GLyles DGonzalez-Guerrero PBautista MVasudevan DButko A(2021)SRNoC: A Statically-Scheduled Circuit-Switched Superconducting Race Logic NoC2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS49936.2021.00113(1046-1055)Online publication date: May-2021
https://doi.org/10.1109/IPDPS49936.2021.00113
(2021)BibliographyThe Art of Multiprocessor Programming10.1016/B978-0-12-415950-1.00033-1(533-540)Online publication date: 2021
https://doi.org/10.1016/B978-0-12-415950-1.00033-1
Herlihy MShavit NLuchangco VSpear M(2021)Counting, sorting, and distributed coordinationThe Art of Multiprocessor Programming10.1016/B978-0-12-415950-1.00022-7(265-303)Online publication date: 2021
https://doi.org/10.1016/B978-0-12-415950-1.00022-7
Herlihy MShavit NLuchangco VSpear M(2021)Concurrent objectsThe Art of Multiprocessor Programming10.1016/B978-0-12-415950-1.00012-4(49-74)Online publication date: 2021
https://doi.org/10.1016/B978-0-12-415950-1.00012-4
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Figures

Tables

Media

View Issue’s Table of Contents