Abstract
The prefix sums algorithm is a fundamental parallel programming building block used to solve significant problems in engineering, mathematical software, and big data analytics. In this paper, we present a generalization of the work-efficient prefix sums algorithm introduced by Blelloch, which in its original form is particularly well-performing on highly parallel architectures. However, the algorithm works only with arrays whose size is a power of 2. While various solutions have been developed to alleviate this limitation, we propose a canonical extension of the classical algorithm, which preserves its original form and maintains the performance characteristics of the work-efficient algorithm.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Intel Developer Cloud. https://devcloud.intel.com/. Accessed 30 Oct 2022
Aananthakrishnan, S., et al.: Piuma: programmable integrated unified memory architecture. arXiv preprint arXiv:2010.06277 (2020)
Akl, S.G.: Parallel Computation: Models and Methods. Prentice-Hall, Inc., Hoboken (1997)
Bell, N., Garland, M.: Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: Proceedings of the ACM/IEEE Conference on High Performance Computing, SC 2009, November 14–20, 2009, Portland, Oregon, USA. ACM (2009). https://doi.org/10.1145/1654059.1654078
Bilgic, B., Horn, B.K.P., Masaki, I.: Efficient integral image computation on the GPU. In: IEEE Intelligent Vehicles Symposium (IV), 2010, La Jolla, CA, USA, June 21–24, 2010, pp. 528–533. IEEE (2010). https://doi.org/10.1109/IVS.2010.5548142
Blelloch, G.E.: Scans as primitive parallel operations. IEEE Trans. Comput. 38(11), 1526–1538 (1989). https://doi.org/10.1109/12.42122
Blelloch, G.E.: Prefix sums and their applications. Technical report. CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, November 1990
Blelloch, G.E., Little, J.J.: Parallel solutions to geometric problems in the scan model of computation. J. Comput. Syst. Sci. 48(1), 90–115 (1994). https://doi.org/10.1016/S0022-0000(05)80023-6
Chatterjee, S., Blelloch, G.E., Zagha, M.: Scan primitives for vector computers. In: Martin, J.L., Pryor, D.V., Montry, G.R. (eds.) Proceedings Supercomputing ’90, New York, NY, USA, November 12–16, 1990, pp. 666–675. IEEE Computer Society (1990). https://doi.org/10.1109/SUPERC.1990.130084
Cole, R., Vishkin, U.: Faster optimal parallel prefix sums and list ranking. Inf. Comput. 81(3), 334–352 (1989). https://doi.org/10.1016/0890-5401(89)90036-9
Elliott, C.: Generic functional parallel algorithms: scan and FFT. Proc. ACM Program. Lang. 1(ICFP), 7:1–7:25 (2017). https://doi.org/10.1145/3110251
Fisher, A.L., Ghuloum, A.M.: Parallelizing complex scans and reductions. In: Sarkar, V., Ryder, B.G., Soffa, M.L. (eds.) Proceedings of the ACM SIGPLAN 1994 Conference on Programming Language Design and Implementation (PLDI), Orlando, Florida, USA, June 20–24, 1994, pp. 135–146. ACM (1994). https://doi.org/10.1145/178243.178255
Harris, M., Sengupta, S., Owens, J.D.: Chapter 39. parallel prefix sum (scan) with CUDA (2007). https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-computing/chapter-39-parallel-prefix-sum-scan-cuda
Hillis, W.D., Jr., G.L.S.: Data parallel algorithms. Commun. ACM. 29(12), 1170–1183 (1986). https://doi.org/10.1145/7902.7903
Iso/iec 14882:2020(en) programming languages - c++. Standard, International Organization for Standardization, Geneva, Switzerland, December 2020
Ladner, R.E., Fischer, M.J.: Parallel prefix computation. J. ACM 27(4), 831–838 (1980). https://doi.org/10.1145/322217.322232
Lakhotia, K., Petrini, F., Kannan, R., Prasanna, V.K.: Accelerating all reduce with in-network reduction on intel PIUMA. IEEE Micro 42(2), 44–52 (2022)
Lakshmivarahan, S., Dhall, S.K.: Parallel Computing Using the Prefix Problem. Oxford University Press, Oxford (1994)
Merrill, D., Garland, M., Grimshaw, A.S.: High-performance and scalable GPU graph traversal. ACM Trans. Parallel Comput. 1(2), 14:1–14:30 (2015). https://doi.org/10.1145/2717511
Morihata, A.: Lambda calculus with algebraic simplification for reduction parallelisation: extended study. J. Funct. Program. 31, e7 (2021). https://doi.org/10.1017/S0956796821000058
Reinders, J., Ashbaugh, B., Brodman, J., Kinsner, M., Pennycook, J., Tian, X.: Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems Using C++ and SYCL. Springer Nature, CA (2021). https://doi.org/10.1007/978-1-4842-5574-2
Sanders, P., Träff, J.L.: Parallel prefix (scan) algorithms for MPI. In: Mohr, B., Träff, J.L., Worringen, J., Dongarra, J.J. (eds.) Recent Advances in Parallel Virtual Machine and Message Passing Interface, 13th European PVM/MPI User’s Group Meeting, Bonn, Germany, September 17–20, 2006, Proceedings. LNCS, vol. 4192, pp. 49–57. Springer, Cham (2006). https://doi.org/10.1007/11846802_15
Satish, N., Harris, M.J., Garland, M.: Designing efficient sorting algorithms for manycore GPUs. In: 23rd IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2009, Rome, Italy, May 23–29, 2009, pp. 1–10. IEEE (2009). https://doi.org/10.1109/IPDPS.2009.5161005
Sengupta, S., Lefohn, A., Owens, J.D.: A work-efficient step-efficient prefix sum algorithm (2006)
Sycl 2020 specification (revision 5). Standard, Khronos Group, Beaverton, OR, USA, May 2022
Zhang, N.: A novel parallel prefix sum algorithm and its implementation on multi-core platforms. In: 2010 2nd International Conference on Computer Engineering and Technology, vol. 2, pp. V2–66-V2-70. IEEE (2010). https://doi.org/10.1109/ICCET.2010.5485315
Zhang, W., Wang, Y., Ross, K.A.: Parallel prefix sum with SIMD. In: Bordawekar, R., Lahiri, T. (eds.) International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures, ADMS@VLDB 2020, Tokyo, Japan, August 31, 2020. pp. 1–11 (2020)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sikorski, A., Wrosz, I., Lewandowski, M. (2023). A Generalized Parallel Prefix Sums Algorithm for Arbitrary Size Arrays. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2022. Lecture Notes in Computer Science, vol 13826. Springer, Cham. https://doi.org/10.1007/978-3-031-30442-2_3
Download citation
DOI: https://doi.org/10.1007/978-3-031-30442-2_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30441-5
Online ISBN: 978-3-031-30442-2
eBook Packages: Computer ScienceComputer Science (R0)