RBPCCM: Relax Blocking Parallel Collective Communication Mechanism Base on Hardware with Scalability

Xiu-jiang Ren¹⁴,
Zhou Zhou¹⁴,
Qing Peng¹⁴ &
…
Xiang-hui Xie¹⁵

Part of the book series: Communications in Computer and Information Science (CCIS, volume 600)

Included in the following conference series:

CCF National Conference on Computer Engineering and Technology

Abstract

As parallel computing has developed, high-performance computing systems have grown dramatically in scale, and collective communication has become a bottleneck. Hardware-supported collective communication offers relatively high performance, but its scalability remains a crucial problem because the number of participating nodes is not fixed. This paper proposes the Relax Blocking Parallel Collective Communication Mechanism (RBPCCM) to improve the performance of collective communication in parallel computation. The mechanism, in which hardware and software cooperate, achieves scalable collective communication by distributing collective resource allocation numbers. RBPCCM also supports implementations across endpoints of various scales and is not constrained by the interconnect network topology. A functional simulation model based on the Sunway TaihuLight system is built to verify the correctness and scalability of the proposed method. An RBPCCM prototype is implemented on the network interface, and an FPGA platform is constructed for performance testing. The tests show that RBPCCM improves delay performance by 2.4 to 37 times compared with software-based point-to-point communication.
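For context on the software point-to-point baseline mentioned in the abstract, the following is a minimal sketch of a collective built only from pairwise sends. It is not taken from the paper: the abstract does not state which software algorithm was measured, and the binomial-tree reduction and the function name `binomial_tree_reduce` are assumptions chosen for illustration. The sketch simulates the ranks in a single process and counts the sequential communication rounds, which grow as the ceiling of log2 of the number of nodes.

```python
def binomial_tree_reduce(values):
    """Simulate a sum-reduction to rank 0 using only point-to-point sends.

    Hypothetical illustration, not the paper's method. Returns
    (result, rounds), where rounds is the number of sequential
    communication steps on the critical path.
    """
    partial = list(values)  # partial[r] is the running sum held by rank r
    n = len(partial)
    rounds = 0
    step = 1
    while step < n:
        # In this round, rank (receiver + step) sends its partial sum
        # to rank receiver, for every receiver that is a multiple of 2*step.
        for receiver in range(0, n, 2 * step):
            sender = receiver + step
            if sender < n:
                partial[receiver] += partial[sender]
        step *= 2
        rounds += 1
    return partial[0], rounds


if __name__ == "__main__":
    for nodes in (2, 8, 64, 1024):
        total, rounds = binomial_tree_reduce([1] * nodes)
        print(f"{nodes} nodes: sum={total}, rounds={rounds}")
```

Each round in such a scheme pays a full software send and receive cost, so the latency of the collective rises with the node count. That growth is the kind of overhead that hardware-supported collectives aim to reduce.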


Acknowledgements

This research is supported by the National Science and Technology Major Project under Grant No. 2013ZX0102-8001-001-001.

Author information

Authors and Affiliations

Jiangnan Institute of Computing Technology, Wuxi, 214083, China
Xiu-jiang Ren, Zhou Zhou & Qing Peng
State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi, 214125, China
Xiang-hui Xie

Corresponding author

Correspondence to Xiang-hui Xie.

Editor information

Editors and Affiliations

National University of Defense Technology, Changsha, China
Weixia Xu
National University of Defense Technology, Changsha, China
Liquan Xiao
School of Computer Science, National University of Defense Technology, Changsha, China
Jinwen Li
National University of Defense Technology, Changsha, China
Chengyi Zhang
National University of Defense Technology, Changsha, China
Zhenzhen Zhu

About this paper

Cite this paper

Ren, Xj., Zhou, Z., Peng, Q., Xie, Xh. (2018). RBPCCM: Relax Blocking Parallel Collective Communication Mechanism Base on Hardware with Scalability. In: Xu, W., Xiao, L., Li, J., Zhang, C., Zhu, Z. (eds) Computer Engineering and Technology. NCCET 2017. Communications in Computer and Information Science, vol 600. Springer, Singapore. https://doi.org/10.1007/978-981-10-7844-6_7

DOI: https://doi.org/10.1007/978-981-10-7844-6_7
Published: 03 January 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7843-9
Online ISBN: 978-981-10-7844-6
eBook Packages: Computer Science, Computer Science (R0)
