Abstract
The genome median problem is an important problem in phylogenetic reconstruction under rearrangement models. It can be stated as follows: Given three genomes, find a fourth that minimizes the sum of the pairwise rearrangement distances between it and the three input genomes. In this paper, we model genomes as matrices and study the matrix median problem using the rank distance. It is known that, for any metric distance, at least one of the corners is a \(\frac{4}{3}\)-approximation of the median. Our results allow us to compute up to three additional matrix median candidates, all of them with approximation ratios at least as good as the best corner, when the input matrices come from genomes. We also show a class of instances where our candidates are optimal. From the application point of view, it is usually more interesting to locate medians farther from the corners, and therefore, these new candidates are potentially more useful. In addition to the approximation algorithm, we suggest a heuristic to get a genome from an arbitrary square matrix. This is useful to translate the results of our median approximation algorithm back to genomes, and it has good results in our tests. To assess the relevance of our approach in the biological context, we ran simulated evolution tests and compared our solutions to those of an exact DCJ median solver. The results show that our method is capable of producing very good candidates.
Similar content being viewed by others
References
Bader M (2009) On reversal and transposition medians. Int Sci Index 30 3(6):504–512
Berestycki N, Durrett R (2006) A phase transition in the random transposition random walk. Probab Theory Relat F 136(2):203–233
Biller P, Feijao P, Meidanis J (2013) Rearrangement-based phylogeny using the single-cut-or-join operation. IEEE/ACM Trans Comput Biol Bioinform (TCBB) 10(1):122–134
Biller P, Guéguen L, Tannier E (2015) Moments of genome evolution by double cut-and-join. BMC Bioinform 16(Suppl 14):S7
Bourque G, Pevzner PA (2002) Genome-scale evolution: reconstructing gene orders in the ancestral species. Genome Res 12(1):26–36
Bryant D (1998) The complexity of the breakpoint median problem. Technical repot CRM-2579, Centre de recherches mathematiques, Universite de Montreal
Caprara A (1999) Formulations and hardness of multiple sorting by reversals. In: Proceedings of the third annual international conference on computational molecular biology (RECOMB), ACM, pp 84–93
Caprara A (2002) Additive bounding, worst-case analysis, and the breakpoint median problem. SIAM J Opt 13(2):508–519
Caprara A (2003) The reversal median problem. INFORMS J Comput 15(1):93–113. doi:10.1287/ijoc.15.1.93.15155
Cosner M, Jansen R, Moret B (2000) An empirical comparison of phylogenetic methods on chloroplast gene order data in campanulaceae. Kluwer, Dordrecht
Delsarte P (1978) Bilinear forms over a finite field, with applications to coding theory. J Comb Theory Ser A 25(3):226–241
Feijao P, Meidanis J (2011) SCJ: a breakpoint-like distance that simplifies several rearrangement problems. IEEE IEEE/ACM Trans Comput Biol Bioinform 8:1318–1329. doi:10.1109/TCBB.2011.34
Feijao P, Meidanis J (2012) Extending the algebraic formalism for genome rearrangements to include linear chromosomes. In: de Souto M, Kann M (eds) BSB 2012, LNBI 7409. Springer, Berlin, pp 13–24
Fertin G, Labarre A, Rusu I, Tannier E, Vialette S (2009) Combinatorics of genome rearrangements. MIT Press, Cambridge
Gao N (2014) Using genetic algorithm to solve median problem and phylogenetic inference. PhD thesis, University of South Carolina
Hagberg AA, Schult DA, Swart PJ (2008) Exploring network structure, dynamics, and function using networkX. In: Proceedings of the 7th python in science conference (SciPy2008), Pasadena, CA USA, pp 11–15
Haghighi M, Sankoff D (2012) Medians seek the corners, and other conjectures. BMC bioinformatics 13(Suppl 19):S5
Jamshidpey A, Sankoff D (2013) Phase change for the accuracy of the median value in estimating divergence time. BMC bioinformatics 14(Suppl 15):S7
Meyer CD (2000) Matrix analysis and applied linear algebra. Society for Industrial and Applied Mathematics, Philadelphia
Moret BM, Wang LS, Warnow T, Wyman SK (2001) New approaches for reconstructing phylogenies from gene order data. Bioinformatics 17(suppl 1):S165–S173
Pe’er I, Shamir R (1998) The median problems for breakpoints are NP-Complete. In: Electronic colloquium on computational complexity, vol 71. Hasso Plattner Institute, Potsdam
Pe’er I, Shamir R (2000) Approximation algorithms for the median problem in the breakpoint model. Comparative genomics. Springer, Berlin, pp 225–241
Rajan V, Xu AW, Lin Y, Swenson KM, Moret BM (2010) Heuristics for the inversion median problem. BMC bioinformatics 11(Suppl 1):S30
Sankoff D, Blanchette M (1998) Multiple genome rearrangement and breakpoint phylogeny. J Comput Biol 5(3):555–570
Sankoff D, Sundaram G, Kececioglu J (1996) Steiner points in the space of genome rearrangements. Int J Found Comput Sci 7(01):1–9
Tannier E, Zheng C, Sankoff D (2009) Multichromosomal median and halving problems under different genomic distances. BMC Bioinform 10:120. doi:10.1186/1471-2105-10-120
Xu AW (2009a) DCJ median problems on linear multichromosomal genomes: graph representation and fast exact solutions. Comp genomics. Springer, Berlin, pp 70–83
Xu AW (2009b) A fast and exact algorithm for the median of three problem: a graph decomposition approach. J Comput Biol 16(10):1369–1381
Yancopoulos S, Attie O, Friedberg R (2005) Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics 21(16):3340–3346. doi:10.1093/bioinformatics/bti535
Zanetti JPP, Biller P, Meidanis J (2013) On the matrix median problem. In: Darling A, Stoye J (eds) Algorithms in bioinformatics., Lecture notes in computer scienceSpringer, Berlin, pp 230–243. doi:10.1007/978-3-642-40453-5_18
Zhang M, Arndt W, Tang J (2009) An exact solver for the DCJ median problem. In: Pacific symposium on biocomputing, vol 14, pp 138–149
Acknowledgments
We thank funding agency FAPESP (Brazil) for financial support (Grants 2012/13865-7 and 2012/14104-0).
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Zanetti, J.P.P., Biller, P. & Meidanis, J. Median Approximations for Genomes Modeled as Matrices. Bull Math Biol 78, 786–814 (2016). https://doi.org/10.1007/s11538-016-0162-4
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11538-016-0162-4