[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article
Free access

Counting networks

Published: 01 September 1994 Publication History

Abstract

Many fundamental multi-processor coordination problems can be expressed as counting problems: Processes must cooperate to assign successive values from a given range, such as addresses in memory or destinations on an interconnection network. Conventional solutions to these problems perform poorly because of synchronization bottlenecks and high memory contention.
Motivated by observations on the behavior of sorting networks, we offer a new approach to solving such problems, by introducing counting networks, a new class of networks that can be used to count. We give two counting network constructions, one of depth log n(1 + log n)/2 using n log (1 + log n)/4 “gates,” and a second of depth log2 n using n log2 n/2 gates. These networks avoid the sequential bottlenecks inherent to earlier solutions and substantially lower the memory contention.
Finally, to show that counting networks are not merely mathematical creatures, we provide experimental evidence that they outperform conventional synchronization techniques under a variety of circumstances.

References

[1]
~AGARWAL, A, AND CHERIAN, M 1989. Adaptive backoff synchronization techniques. In Pro- ~c eedblgS of tt~e 16th Symposium ol~ Comp,terArchitecture (June). IEEE Computer Society Press, ~Los Alamitos, Calil., pp. 396-406.
[2]
~AGARWAL, A., CHAIKEN, D., D'SOUZA, G., JOHNSON, K., KRANZ, D., KUBIATOWICZ, J., KURIHARA, ~K., Elm, B.-H., M^A, G., NUSSBAUM, D., PARKIN, M., AND YOUNG, D. 1991. The MIT alewife ~machine: A large-scale distributed-memory multiprocessor. In Proceedings of Workshop on ~Scalable Shared Memoly Muhiprocessors. Kluwer Academic Publishers. (An extended version of ~this paper has been submitted for publication, and appears as MIT/LCS Memo TM-454, 1991.)
[3]
~AHARONSON, E., AND ATTIYA, H. 1992. Counting networks with arbitrary fan-out. In Proceed- ~ings of the 3rd Symposium on Discrete Algorithms (Orlando, Fla., Jan. 27-29). ACM-SIAM, New ~York, pp. 104-113.
[4]
~AJTAI, M., KOMLOS, J., AND SZEMERI2DI, E. 1983. An O(n log n) sorting network. In Proceed- ~trigs of the 15th ACM Symposium on the Theory of Computing. (Boston, Mass., Apr. 25-27). ~ACM, New York, pp. 1-9.
[5]
~ANDERSON, T.E. 1989. The performance implications of spin-waiting alternatives for shared- ~memory multiprocessors. Tech. Rep. 89-04-03. Univ. Washington, Seattle, Wash.
[6]
~ASPNES, J., HERLIHY, M. P., AND SHAVIT, N. 1991. Counting networks and multi-processor ~coordination. In Proceedings of the 23rd Annual Symposium on Theory of Compuung, New ~Orleans, La., May 6-8). ACM, New York, pp. 348-358.
[7]
~BATCttER, K.E. 1968. Sorting networks and their applications. In Proceedings of AF1PS Joint ~Computer Conference 32, 338-334.
[8]
~CORMEN, T. H., LEISERSON, C. E., AND RIVEST, R. L. 1990. Introduction to Algorithms. MIT ~Press, Cambridge, Mass.
[9]
~DOWD, M., PERL, Y., RUDOLPH, L., AND SAKS, M. 1989. The periodic balanced sorting network. ~J. ACM 36, 4 (Oct), 738-757.
[10]
~ELLIS, C. S., AND OLSON, T.J. 1988. Algorithms for parallel memory allocation. J. Parallel ~Progr. 17, 4 (Aug.) 303 345.
[11]
~FELq'ON, E. W., L^MARC^, A., AND LADNER, R. 1993. Building counting networks from larger ~balancers. Tech. Rep. 93-(/4-09. Univ. Washington, Seattle, Wash.
[12]
~FREUDENTHAL, E., AND GOTTL1EB, A. 1991. Process coordination with fetch-and-increment. In ~Proceedings of the 4th International Conference on Architecture Support Jor Progl:amming Lan- ~gttages and Opovtting Systenzs, (Santa Clara, Calif., Apr.).
[13]
~GAWHCK, D. 1985. Processing "hot spots" in high performance systems. In Proceedings of ~COMPCON'~S5. IEEE, Los Alamitos, Cahf., pp. 249-251.
[14]
~GOODM~N, J., VERNON, M., AND WOEST, P. 1989. A set of efficient synchronization primitives ~for a large-scale shared-memory multiprocessor. In Proceedings o)~ the 3rd International Confer- ~ence on Architectural Suppot;t Jor Programming Languages and OI)eratmg Systems (Apr.). ACM, ~New York, pp. 64 77.
[15]
~GOTTLIEB, A., GRISHMAN, R., KRUSK^L, C. P., MCAULIFFE, K. P., RUDOLPH, L., AND SNIR, M. ~1984. The NYU ultracomputer Dcsignmg an mimd parallel computer. IEEE Trans. Conlpttt- ~ers C-32, 2 (Fcb.), 175-189.
[16]
~GOTTI.IEB, A., LUBACHEVSKS, B. D., AND RUDLOPH, L. 1983. Basic techniques for the efficient ~coordination of very large numbers of cooperating sequential processors. ACM Trans. Prog. ~Lang. &'st. 5, 2 (Apr.), 164-189.
[17]
~HARDAVEI,LAS, N., KARAKOS, D., AND MAYRONICOL^S, M. 1993. Notes on sorting and counting ~nctworks. In Proceedings of WDAG'93, to appear.
[18]
~HENSGEN, D., FINKEL, R., AND MANGER, U. 1988. Two algorithms for barrier synchronization. ~hit. J. Para. Prog. 17, 1, 1117.
[19]
~HERLIHY, M. P., LIM, B. H. AND SHAVIT, N. 1992. Low contention load balancing on largc-scale ~multiproccssors. In Proceedblgs of the 4th Annual ACM Symposium on Parallel Algorithms and ~,4rclntectures, (San Diego, Calif., June 29-July 11. ACM, New York, pp. 219 222.
[20]
~HERLIHY, M. P., SHAVIT, N., AND WAARTS, O. 1991. Low-contention linearizable counting. In ~Proceedings' of the 32th 1EEE Symposium on Foundations of Conlpztler Science (Oct.) IEEE, New ~York, pp. 526 535.
[21]
~KR^NZ, D., HALSTEAD, R., AND MOHR, E. 1989. MuI-T, A high-performance parallel L~sp. In ~Proceedtngs of the ACM SIGPLAN '89 Conference on Programming Language Design and ~hnplemeutatlon, (Portland, Ore., June 21-231. ACM, New York, pp. 81-90.
[22]
~KRUSKAL, C. P., RUDOLPH, L., AND SNIR, U. 1986. Efficient synchronization on multiproccssors ~with shared memory. In FroceedDtgx of the 5{h AC3{ ${OACT-$IOOPS S)'mpoMum on Frmciples ~oJ Dtstnbnted Computing, ACM. New York, pp. 218-228.
[23]
~KLUGERMAN, M. AND PLAXTON, C.G. 1992. Small-depth counting networks. In Proceedings of ~the 24th Annual Symposium on the Theory of Computing. (Victoria, B.C., Canada, May 4-6). ~ACM, New York, pp. 417 428.
[24]
~LYNCH, N. A., aND TU'fTLIS, M. R. 1987. Hierarchical correctness proofs for distributed ~algorithms. In Proceedbhqs of the 6th AC3,1 Symposutm on Principles of Dtstrtbuted Computing ~(Vancouver, B.C., Canada, Aug. 10-12). ACM, New York, pp. 137-151. (Full version available ~as MIT Tech. Rep. MIT/LCS/TR-387.)
[25]
~MULLOR-CRUMMEY, J. M., ~ND SCOTT, M.L. 1990. Algorithms for scalable synchronization on ~shared-memory multlprocessors. Tech. Rep. 342. Umv. Rochester, Rochester, N.Y. (Apr.).
[26]
~RUDOLPIL L. 1983. Decentrahzed cache scheme for an MIMD parallel processor, In Proceed- ~m~~ of the Ilth Annual Computing Architecture Conference. pp. 340-347.
[27]
~MELLOR-CRUMMEY, J. M., AND SCOTI', g.L. 1991. Synchronization without contention. In ~Prc)ceedztzg, s' of the 4th lnternatzonal Conference on Architecture Support for Programming Lan- ~guages attd Operating Systems (Santa Clara, Calif., Apr.) ACM, New York, pp. 269 278.
[28]
~PELEG, D., AND UPFAL, E. 1986. The token distribution problem. In Proeeedmgs of the 27th ~IEEE Sympostum on Foundaaons of Computer Sctence (Oct.). IEEE, New York.
[29]
~PF1STER, G. H., ET AL. 1985. The IBM research parallel processor prototype (RP3): Introduc-tion and architecture. In Proceedings of the International Conference on Parallel Processing.
[30]
~PFIS'rER, G. H., AND NORTON, A. 1985. 'Hot spot' contention and combining m multistage ~mterconnectlon networks. IEEE ;trans. Comput. C-34, 11 (Nov.), 933 938.
[31]
~STONE, H. S. 1984. Database applications of the fetch-and-add instruction. IEEE Trans ~Comput. C-33, 7 (July), 604-612.
[32]
~ViSHKIN, U. 1984. A parallel-deslgn distributed-implementation (PDDI) general purpose com- ~puter. Theoret. Compul. Sci. 32, 157 172.

Cited By

View all
  • (2025)Aggregating Funnels for Faster Fetch&Add and QueuesProceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3710848.3710873(99-114)Online publication date: 28-Feb-2025
  • (2022)Concurrent sizeProceedings of the ACM on Programming Languages10.1145/35633006:OOPSLA2(345-372)Online publication date: 31-Oct-2022
  • (2022)Temporal and SFQ pulse-streams encoding for area-efficient superconducting acceleratorsProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3503222.3507765(963-976)Online publication date: 28-Feb-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Journal of the ACM
Journal of the ACM  Volume 41, Issue 5
Sept. 1994
230 pages
ISSN:0004-5411
EISSN:1557-735X
DOI:10.1145/185675
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 1994
Published in JACM Volume 41, Issue 5

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. counting networks
  2. hot-sports
  3. network routing
  4. parallel processing

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)126
  • Downloads (Last 6 weeks)12
Reflects downloads up to 08 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2025)Aggregating Funnels for Faster Fetch&Add and QueuesProceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3710848.3710873(99-114)Online publication date: 28-Feb-2025
  • (2022)Concurrent sizeProceedings of the ACM on Programming Languages10.1145/35633006:OOPSLA2(345-372)Online publication date: 31-Oct-2022
  • (2022)Temporal and SFQ pulse-streams encoding for area-efficient superconducting acceleratorsProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3503222.3507765(963-976)Online publication date: 28-Feb-2022
  • (2022)Quantifiability: a concurrent correctness condition modeled in vector spaceComputing10.1007/s00607-022-01092-3105:5(955-978)Online publication date: 7-Jun-2022
  • (2022)Long-lived counters with polylogarithmic amortized step complexityDistributed Computing10.1007/s00446-022-00439-536:1(29-43)Online publication date: 29-Nov-2022
  • (2021)Quantifiability: Correctness of Concurrent Programs in Vector Space2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)10.1109/PDP52278.2021.00023(94-98)Online publication date: Mar-2021
  • (2021)SRNoC: A Statically-Scheduled Circuit-Switched Superconducting Race Logic NoC2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS49936.2021.00113(1046-1055)Online publication date: May-2021
  • (2021)BibliographyThe Art of Multiprocessor Programming10.1016/B978-0-12-415950-1.00033-1(533-540)Online publication date: 2021
  • (2021)Counting, sorting, and distributed coordinationThe Art of Multiprocessor Programming10.1016/B978-0-12-415950-1.00022-7(265-303)Online publication date: 2021
  • (2021)Concurrent objectsThe Art of Multiprocessor Programming10.1016/B978-0-12-415950-1.00012-4(49-74)Online publication date: 2021
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media