research-article

HyLAC: Hybrid linear assignment solver in CUDA

Published: 16 May 2024

Abstract

The Linear Assignment Problem (LAP) is a fundamental combinatorial optimization problem with a wide range of applications. Over the years, significant progress has been made in developing efficient algorithms to solve the LAP, particularly in the realm of high-performance computing, leading to remarkable reductions in computation time. In recent years, hardware improvements in General Purpose Graphics Processing Units (GPGPUs) have shown promise in meeting the ever-increasing compute bandwidth requirements. This has attracted researchers to develop GPU-accelerated algorithms to solve the LAP.
Recent work in the GPU domain has uncovered parallelism available in the problem structure to achieve significant performance improvements. However, each solution presented so far targets either sparse or dense instances of the problem and leaves room for improvement. The Hungarian algorithm is one of the best-known approaches to solving the LAP in polynomial time. It has a classical $O(N^4)$ implementation (Munkres') and a tree-based $O(N^3)$ implementation (Lawler's). It is well established that Munkres' implementation is faster for sparse LAP instances, while Lawler's implementation is faster for dense instances. In this work, we blend the GPU implementations of Munkres' and Lawler's variants to develop a hybrid GPU-accelerated solver for the LAP that switches between the two implementations based on the available sparsity. We also improve the existing GPU implementations to reduce memory contention, minimize CPU-GPU synchronizations, and coalesce memory accesses. The resulting solver (HyLAC) runs faster than existing CPU/GPU LAP solvers for sparse as well as dense problem instances. HyLAC achieves a speedup of up to 6.14× over the existing state-of-the-art GPU implementation when run on the same hardware. We also develop an implementation to solve a list of small LAPs (tiled LAP), which is particularly useful in the optimization domain. This tiled LAP solver performs 22.59× faster than the existing implementation.
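
For readers unfamiliar with the problem, the LAP referred to in the abstract is commonly written as the integer program below. This is the standard textbook formulation and is not quoted from the paper; the symbols $c_{ij}$ (cost of assigning row $i$ to column $j$) and $x_{ij}$ (binary assignment variable) are notational assumptions introduced here.

```latex
\begin{aligned}
\min_{x} \quad & \sum_{i=1}^{N} \sum_{j=1}^{N} c_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_{j=1}^{N} x_{ij} = 1, \qquad i = 1, \dots, N, \\
& \sum_{i=1}^{N} x_{ij} = 1, \qquad j = 1, \dots, N, \\
& x_{ij} \in \{0, 1\}, \qquad i, j = 1, \dots, N.
\end{aligned}
```

Each row is assigned to exactly one column and each column to exactly one row, so a feasible solution corresponds to a permutation of $\{1, \dots, N\}$.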

Highlights

The fastest Linear Assignment Problem (LAP) solver that uses GPUs.
Improved implementation of classical and tree variants of Hungarian algorithm.
Solves a stream of small LAPs 22.59× faster than the existing solution.
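
To make the idea of a "list of small LAPs" concrete, the following is a minimal Python sketch that solves each small instance independently by exhaustive search. It is not HyLAC and does not use the Hungarian algorithm or a GPU; it only illustrates the input and output of a tiled LAP and could serve as a correctness reference for tiny instances. The function names `solve_lap_bruteforce` and `solve_tiled_lap` and the example cost matrices are hypothetical.

```python
from itertools import permutations


def solve_lap_bruteforce(cost):
    """Return (min_cost, assignment) for a square cost matrix.

    assignment[i] is the column assigned to row i. Exhaustive search over
    all N! permutations, so this is only practical for very small N.
    """
    n = len(cost)
    best_cost, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_cost is None or total < best_cost:
            best_cost, best_perm = total, perm
    return best_cost, list(best_perm)


def solve_tiled_lap(tiles):
    """Solve a list of independent small LAPs, one result per tile."""
    return [solve_lap_bruteforce(tile) for tile in tiles]


# Two hypothetical small instances (a 3x3 tile and a 2x2 tile).
tiles = [
    [[4, 1, 3],
     [2, 0, 5],
     [3, 2, 2]],
    [[1, 2],
     [2, 1]],
]
results = solve_tiled_lap(tiles)
for min_cost, assignment in results:
    print(min_cost, assignment)
```

Because the tiles are independent, they can be processed in any order or concurrently, which is the property a batched solver can exploit.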



Published In

Journal of Parallel and Distributed Computing, Volume 187, Issue C
May 2024
181 pages

Publisher

Academic Press, Inc.

United States


Author Tags

  1. Hungarian algorithm
  2. Linear assignment problem
  3. GPU-accelerated graph algorithms
  4. High-performance computing
  5. Parallel algorithm

Qualifiers

  • Research-article
