Abstract
Haplotype information plays an important role in many genetic analyses. However, the identification of haplotypes based on sequencing methods is both expensive and time consuming. Current sequencing methods are only efficient to determine conflated data of haplotypes, that is, genotypes. This raises the need to develop computational methods to infer haplotypes from genotypes.
Haplotype inference by pure parsimony is an NP-hard problem and still remains a challenging task in bioinformatics. In this paper, we propose an efficient ant colony optimization (ACO) heuristic method, named ACOHAP, to solve the problem. The main idea is based on the construction of a binary tree structure through which ants can travel and resolve conflated data of all haplotypes from site to site. Experiments with both small and large data sets show that ACOHAP outperforms other state-of-the-art heuristic methods. ACOHAP is as good as the currently best exact method, RPoly, on small data sets. However, it is much better than RPoly on large data sets. These results demonstrate the efficiency of the ACOHAP algorithm to solve the haplotype inference by pure parsimony problem for both small and large data sets.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Notes
Even though Table 3 of Benedettini et al. (2008) presents results of ACO-HI+ on the four SU data sets, only results concerning SU2 can be used for our comparisons. This is because of two reasons. First, results of ACO-HI+ for the SU1 data set are incorrectly reported in Benedettini et al. (2008); in fact, the sum of optimal solution values for the SU1 data set is 11961, which is much higher than 2453 as reported in Table 3 of Benedettini et al. (2008). Second, Table 3 of Benedettini et al. (2008) does not report the sum of the solution values of ACO-HI+ for all problem instances of the SU-100kb and SU3 data sets; this makes a comparison with our results impossible.
References
Benedettini, S., Roli, A., & Di Gaspero, L. (2008). Two-level ACO for haplotype inference under pure parsimony. In M. Dorigo, M. Birattari, C. Blum, M. Clerc, T. Stützle, & A. Winfield (Eds.), Lecture notes in computer science: Vol. 5217. Ant colony optimization and swarm intelligence (pp. 179–190). The 6th international workshop, ANTS 2008. Berlin/Germany: Springer.
Brown, D. G., & Harrower, I. M. (2006). Integer programming approaches to haplotype inference by pure parsimony. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 3(2), 141–154.
Clark, S. (1990). Inference of haplotypes from PCR-amplified samples of diploid populations. Molecular Biology and Evolution, 7, 111–122.
Daly, M. J., Rioux, J. D., Schaffner, S. F., Hudson, T. J., & Lander, E. S. (2001). High-resolution haplotype structure in the Human genome. Nature Genetics, 29(2), 229–232.
Di Gaspero, L., & Roli, A. (2008). Stochastic local search for large-scale instances of the haplotype inference problem by pure parsimony. Journal of Algorithms, 63(1–3), 55–69.
Do, D. D., Dinh, Q. H., & Hoang, X. H. (2008). On the pheromone update rules of ant colony optimization approaches for the job shop scheduling problem. In T. D. Bui, T. V. Ho, & Q. T. Ha (Eds.), Lecture notes in computer science: Vol. 5357. The 11th pacific rim international conference on multi-agents: intelligent agents and multi-agent systems (pp. 153–160). Heidelberg: Springer.
Dorigo, M., & Stützle, T. (2004). Ant colony optimization. Cambridge: MIT Press.
Graça, A., Marques-silva, J., Lynce, I., & Oliveira, A. L. (2007). Efficient haplotype inference with pseudo-Boolean optimization. In H. Anai, K. Horimoto, & T. Kutsia (Eds.), Lecture notes in computer science: Vol. 4545. Algebraic biology 2007 (pp. 125–139). Heidelberg: Springer.
Graça, A., Marques-silva, J., Lynce, I., & Oliveira, A. L. (2008). Efficient haplotype inference with combined CP and OR techniques. In L. Perron & M. Trick (Eds.), Lecture notes in computer science: Vol. 5015. The 5th international conference on integration of AI and OR techniques in constraint programming for combinatorial optimization problems, CPAIOR 2008 (pp. 308–312). Heidelberg: Springer.
Graça, A., Lynce, I., Marques-Silva, J., & Oliveira, A. L. (2010). Haplotype inference by pure parsimony: a survey. Journal of Computational Biology, 17(8), 969–992.
Gusfield, D. (2001). Inference of haplotypes from samples of diploid populations: complexity and algorithms. Journal of Computational Biology, 8(3), 305–323.
Gusfield, D. (2003). Haplotype inference by pure parsimony. In R. Baeza-Yates, E. Chávez, & M. Crochemore (Eds.), Lecture notes in computer science: Vol. 2676. Combinatorial pattern matching (pp. 144–155). 14th Annual symposium on combinatorial pattern matching, CPM 2003. Heidelberg: Springer.
Gusfield, D., & Orzack, S. H. (2005). Haplotype inference. In S. Aluru (Ed.), Handbook of computational molecular biology (pp. 1–28). Boca Raton: CRC Press.
Hutter, F., Hoos, H., & Stützle, T. (2007). Automatic algorithm configuration based on local search. In H. Robert & A. Howe (Eds.), The twenty-second conference on artificial intelligence (AAAI ’07) (pp. 1152–1157). Menlo Park: AAAI Press.
Istrail, S. (2004). Computational methods for SNPs and haplotype inference. Berlin/Heidelberg: Springer.
Li, Z., Zhou, W., Zhang, X., & Chen, L. (2005). A parsimonious tree-grow method for haplotype inference. Bioinformatics, 21(17), 3475–3481.
Lynce, I., & Marques-Silva, J. (2008). Haplotype inference with Boolean satisfiability. International Journal Artificial Intelligence Tools, 17, 355–387.
Marchini, J., Cutler, D., Patterson, N., Stephens, M., Eskin, E., Halperin, E., Lin, S., Qin, Z., Munro, H., Abecasis, G., & Donnelly, P. (2006). A comparison of phasing algorithms for trios and unrelated individuals. The American Journal of Human Genetics, 78(3), 437–450.
Rosa, S., & Guimarães, S. (2010). Insights on haplotype inference on large genotype datasets. In E. Ferreira, S. Miyano, & P. Stadler (Eds.), Lecture notes in computer science: Vol. 6268. Advances in bioinformatics and computational biology (pp. 47–58). 5th Brazilian conference on bioinformatics. Heidelberg: Springer.
Stützle, T., & Hoos, H. (2000). MAX-MIN ant system. Future Generation Computer Systems, 16(8), 889–914.
The International Hapmap Consortium (2007). A second generation human haplotype map of over 3.1 million SNPs. Nature, 449, 851–861.
Tininini, L., Bertolazzi, P., Godi, A., & Lancia, G. (2010). CollHaps: a heuristic approach to haplotype inference by parsimony. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7(3), 511–523.
Acknowledgements
We thank Gavin Band, Quang Le, and Khoi Le for comments and careful proof reading. We appreciate support from Graça and Tininini for providing us their programs. We also thank anonymous referees and editors for helpful comments and corrections. This work is partly supported by Vietnam National Science and Technology Fund (Nafosted: 102.01-2011.21).
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Do, D.D., Le, S.V. & Hoang, X.H. ACOHAP: an efficient ant colony optimization for the haplotype inference by pure parsimony problem. Swarm Intell 7, 63–77 (2013). https://doi.org/10.1007/s11721-013-0077-8
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11721-013-0077-8