[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Dominant Point-Based Sequential and Parallel Algorithms for the Multiple Sequential Substring Constrained-LCS Problem

Published: 08 November 2024 Publication History

Abstract

The Longest Common Subsequence (LCS) problem is a well-known and studied problem in computer science and bioinformatics. It consists in finding the longest subsequence that is common to two or more given sequences. In this article, we address the problem of finding all LCS for the Sequential Substring Constrained-LCS (SSCLCS) problem, called the Multiple SSCLCS problem. To solve this problem, we first propose a dominant point-based sequential algorithm, designed on a new Leveled Direct Acyclic Graph (DAG) that gives the correct evaluation order of subproblems to avoid redundancy due to overlap. Depending on whether the constraints may overlap or not, it requires \(O(S|\Sigma |^{K+1}+r+n|\Sigma |)\) and \(O(S|\Sigma |^{K+1}+n|\Sigma |)\) time with \(O(Max\_level+n|\Sigma |)\) space. S is the number of partial SSCLCS in a node, K is the number of DAG levels, n is the length of sequences, r is the total length of constraints, \(Max\_level\) is the number of nodes in the largest level of the DAG, and \(|\Sigma |\) is the length of the alphabet. Then, we derive a coarse-grained multicomputer parallel solution requiring \(O(\tfrac{S|\Sigma |^{K+1}+r+n|\Sigma |}{p})\) and \(O(\tfrac{S|\Sigma |^{K+1}+n|\Sigma |}{p})\) execution time, \(O(Max\_level+n|\Sigma |)\) memory space and \(O(K)\) communication rounds. p is the number of processors. Experimental results showed that the parallel algorithm is, respectively, 14.43× and 19.19× faster than the sequential algorithm on 32 and 64 processors.

References

[1]
Hsing-Yen Ann, Chang-Biau Yang, Yung-Hsing Peng, and Bern-Cherng Liaw. 2010. Efficient algorithms for the block edit problems. Information and Computation 208, 3 (2010), 221–229.
[2]
Hsing-Yen Ann, Chang-Biau Yang, and Chiou-Ting Tseng. 2014. Efficient polynomial-time algorithms for the constrained LCS problem with strings exclusion. J. Comb. Optim. 28, 4 (2014), 800–813.
[3]
Alberto Apostolico and Concettina Guerra. 1987. The longest common subsequence problem revisited. Algorithmica 2 (1987), 316–336.
[4]
Karl Bringmann and Bhaskar Ray Chaudhury. 2018. Sketching, streaming, and fine-grained complexity of (weighted) LCS. CoRR abs/1810.01238 (2018). arXiv:1810.01238http://arxiv.org/abs/1810.01238
[5]
Yi-Ching Chen and Kun-Mao Chao. 2011. On the generalized constrained longest common subsequence problems. Journal of Combinatorial Optimization 21, 3 (2011), 383–392.
[6]
Yixin Chen, Andrew Wan, and Wei Liu. 2006. A fast parallel algorithm for finding the longest common sequence of multiple biosequences. BMC Bioinform. 7, S-4 (2006).
[7]
Yi-Ching Chen. 2010. Algorithms for the hybrid constrained longest common subsequence problem. In the 27th Workshop on Combinatorial Mathematics and Computation Theory (Taichung, Taiwan). 32–37.
[8]
Francis Y. L. Chin, Alfredo De Santis, Anna Lisa Ferrara, Ngai Lam Ho, and S. K. Kim. 2004. A simple algorithm for the constrained sequence problems. Inf. Process. Lett. 90, 4 (2004), 175–179.
[9]
Maxime Crochemore, Christophe Hancart, and Thierry Lecroq. 2007. Algorithms on Strings. Cambridge University Press.
[10]
Université de Picardie Jules Verne. 2023. Plateforme MatriCS. https://www.matrics.u-picardie.fr/. [Online; accessed 21-August-2023].
[11]
Frank K. H. A. Dehne, Andreas Fabri, and Andrew Rau-Chaplin. 1993. Scalable parallel geometric algorithms for coarse grained multicomputers. In the 9th Annual Symposium on Computational Geometry(San Diego, CA, May 19–21, 1993). ACM, New York, 298–307.
[12]
Frank K. H. A. Dehne, Andreas Fabri, and Andrew Rau-Chaplin. 1996. Scalable parallel computational geometry for coarse grained multicomputers. International Journal of Computational Geometry & Applications 6, 3 (1996), 379–400.
[13]
Frank K. H. A. Dehne, Afonso Ferreira, Edson Cáceres, Siang W. Song, and Alessandro Roncato. 2002. Efficient parallel graph algorithms for coarse-grained multicomputers and BSP. Algorithmica 33, 2 (2002), 183–200.
[14]
Edgar Gabriel, Graham E. Fagg, George Bosilca, Thara Angskun, Jack J. Dongarra, Jeffrey M. Squyres, Vishal Sahay, Prabhanjan Kambadur, Brian Barrett, Andrew Lumsdaine, Ralph H. Castain, David J. Daniel, Richard L. Graham, and Timothy S. Woodall. 2004. Open MPI: Goals, concept, and design of a next generation MPI implementation. In Recent Advances in Parallel Virtual Machine and Message Passing Interface, 11th European PVM/MPI Users’ Group Meeting, Budapest, Hungary, September 19–22, 2004, Proceedings (Lecture Notes in Computer Science), Dieter Kranzlmüller, Péter Kacsuk, and Jack J. Dongarra (Eds.), Vol. 3241. Springer, 97–104.
[15]
Thierry Garcia, Jean Frédéric Myoupo, and David Semé. 2003. A coarse-grained multicomputer algorithm for the longest common subsequence problem. In 11th Euromicro Workshop on Parallel, Distributed and Network-Based Processing (PDP 2003), 5–7 February 2003, Genova, Italy. IEEE Computer Society, 349–356.
[16]
Thierry Garcia and David Semé. 2005. A coarse-grained multicomputer algorithm for the detection of repetitions. Inf. Process. Lett. 93, 6 (2005), 307–313.
[17]
Thierry Garcia and David Semé. 2006. A load balancing technique for some coarse-grained multicomputer algorithms. In 21st International Conference on Computers and Their Applications (CATA-2006) (Seattle, Washington, March 23–25, 2006). ISCA, 301–306.
[18]
Dan Gusfield. 1997. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. Cambridge University Press.
[19]
Daniel S. Hirschberg. 1977. Algorithms for the longest common subsequence problem. J. ACM 24, 4 (1977), 664–675.
[20]
W. J. Hsu and M. W. Du. 1984. Computing a longest common subsequence for a set of strings. BIT 24, 1 (1984), 45–59.
[21]
Kuo-Si Huang, Chang-Biau Yang, Kuo-Tsung Tseng, Yung-Hsing Peng, and Hsing-Yen Ann. 2007. Dynamic programming algorithms for the mosaic longest common subsequence problem. Inform. Process. Lett. 102, 2–3 (2007), 99–103.
[22]
Donald E. Knuth, James H. Morris Jr., and Vaughan R. Pratt. 1977. Fast pattern matching in strings. SIAM J. Comput. 6, 2 (1977), 323–350.
[23]
Yanni Li, Yuping Wang, and Liang Bao. 2012. FACC: A novel finite automaton based on cloud computing for the multiple longest common subsequences search. Mathematical Problems in Engineering 2012 (2012).
[24]
Roy Lowrance and Robert A. Wagner. 1975. An extension of the string-to-string correction problem. J. ACM 22, 2 (1975), 177–183.
[25]
Mi Lu and Hua Lin. 1994. Parallel algorithms for the longest common subsequence problem. IEEE Trans. Parallel Distributed Syst. 5, 8 (1994), 835–848.
[26]
David Maier. 1978. The complexity of some problems on subsequences and supersequences. J. ACM 25, 2 (1978), 322–336.
[27]
Jean Frédéric Myoupo, Armel Nkonjoh Ngomade, and Vianney Kengne Tchendji. 2018. Coarse-grained multicomputer based-parallel algorithms for the longest common subsequence problem with a string-exclusion constraint. In 2018 International Conference on High Performance Computing and Simulation (HPCS 2018), Orleans, France, July 16–20, 2018. IEEE, 1038–1044.
[28]
Armel Nkonjoh Ngomade, Jean Frédéric Myoupo, and Vianney Kengne Tchendji. 2020. A dominant point-based parallel algorithm that finds all longest common subsequences for a constrained-MLCS problem. Journal of Computational Science 40 (2020), 101070.
[29]
U.S. National Library of Medicine. 2019. National Center for Biotechnology Information. https://www.ncbi.nlm.nih.gov/genbank/. [Online; accessed 06-Febrary-2020].
[30]
Zhan Peng and Yuping Wang. 2017. A novel efficient graph model for the multiple longest common subsequences (MLCS) problem. Frontiers in Genetics 8 (2017), 104.
[31]
Vianney Kengne Tchendji, Armel Nkonjoh Ngomade, Jerry Lacmou Zeutouo, and Jean Frédéric Myoupo. 2020. Efficient CGM-based parallel algorithms for the longest common subsequence problem with multiple substring-exclusion constraints. Parallel Comput. 91 (2020).
[32]
Vianney Kengne Tchendji, Hermann Bogning Tepiele, Mathias Akong Onabid, Jean Frédéric Myoupo, and Jerry Lacmou Zeutouo. 2022. A coarse-grained multicomputer parallel algorithm for the sequential substring constrained longest common subsequence problem. Parallel Comput. 111 (2022), 102927.
[33]
Yin-Te Tsai. 2003. The constrained longest common subsequence problem. Inform. Process. Lett. 88, 4 (2003), 173–176.
[34]
Chiou-Ting Tseng, Chang-Biau Yang, and Hsing-Yen Ann. 2013. Efficient algorithms for the longest common subsequence problem with sequential substring constraints. J. Complexity 29, 1 (2013), 44–52.
[35]
Soroush Vahidi, Baruch Schieber, Zhihui Du, and David A. Bader. 2023. Parallel longest common subsequence analysis in chapel. CoRR abs/2309.09072 (2023). arXiv:2309.09072
[36]
Leslie G. Valiant. 1990. A bridging model for parallel computation. Commun. ACM 33, 8 (1990), 103–111.
[37]
Robert A. Wagner and Michael J. Fischer. 1974. The string-to-string correction problem. J. ACM 21, 1 (1974), 168–173.
[38]
Lei Wang, Xiaodong Wang, Yingjie Wu, and Daxin Zhu. 2013. An efficient dynamic programming algorithm for the generalized LCS problem with multiple substring exclusion constrains. CoRR abs/1303.1872 (2013). arXiv:1303.1872http://arxiv.org/abs/1303.1872
[39]
Qingguo Wang, Dmitry Korkin, and Yi Shang. 2011. A fast multiple longest common subsequence (MLCS) algorithm. IEEE Trans. Knowl. Data Eng. 23, 3 (2011), 321–334.
[40]
Xiaodong Wang, Yingjie Wu, and Daxin Zhu. 2016. A polynomial time algorithm for a generalized longest common subsequence problem. In Green, Pervasive, and Cloud Computing. Springer, Cham, 18–29.
[41]
Jiaoyun Yang, Yun Xu, Yi Shang, and Guoliang Chen. 2014. A space-bounded anytime algorithm for the multiple longest common subsequence problem. IEEE Trans. Knowl. Data Eng. 26, 11 (2014), 2599–2609.
[42]
Jiaoyun Yang, Yun Xu, Guangzhong Sun, and Yi Shang. 2013. A new progressive algorithm for a multiple longest common subsequences problem and its efficient parallelization. IEEE Trans. Parallel Distributed Syst. 24, 5 (2013), 862–870.
[43]
Daxin Zhu, Lei Wang, Yingjie Wu, and Xiaodong Wang. 2015. An efficient dynamic programming algorithm for the generalized LCS problem with multiple substring inclusive constraints. CoRR abs/1505.06529 (2015). arXiv:1505.06529http://arxiv.org/abs/1505.06529

Index Terms

  1. Dominant Point-Based Sequential and Parallel Algorithms for the Multiple Sequential Substring Constrained-LCS Problem

          Recommendations

          Comments

          Please enable JavaScript to view thecomments powered by Disqus.

          Information & Contributors

          Information

          Published In

          cover image ACM Transactions on Parallel Computing
          ACM Transactions on Parallel Computing  Volume 11, Issue 4
          December 2024
          120 pages
          EISSN:2329-4957
          DOI:10.1145/3613655
          Issue’s Table of Contents

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          Published: 08 November 2024
          Online AM: 23 September 2024
          Accepted: 16 September 2024
          Revised: 15 September 2024
          Received: 31 August 2023
          Published in TOPC Volume 11, Issue 4

          Check for updates

          Author Tags

          1. Constrained longest common subsequence
          2. dominant point
          3. coarse-grained multicomputer
          4. multi-level directed acyclic graph

          Qualifiers

          • Research-article

          Contributors

          Other Metrics

          Bibliometrics & Citations

          Bibliometrics

          Article Metrics

          • 0
            Total Citations
          • 73
            Total Downloads
          • Downloads (Last 12 months)73
          • Downloads (Last 6 weeks)14
          Reflects downloads up to 14 Jan 2025

          Other Metrics

          Citations

          View Options

          Login options

          Full Access

          View options

          PDF

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader

          Full Text

          View this article in Full Text.

          Full Text

          Media

          Figures

          Other

          Tables

          Share

          Share

          Share this Publication link

          Share on social media