Abstract
Multiple RNA interaction can be modeled as a problem in combinatorial optimization, where the “optimal” structure is driven by an energy-minimization-like algorithm. However, the actual structure may not be optimal in this computational sense. Moreover, it is not necessarily unique. Therefore, alternative sub-optimal solutions are needed to cover the biological ground.
We extend a recent combinatorial formulation for the Multiple RNA Interaction problem with approximation algorithms to handle more elaborate interaction patterns, which when combined with Gibbs sampling and MCMC (Markov Chain Monte Carlo), can efficiently generate a reasonable number of optimal and sub-optimal solutions. When viable structures are far from an optimal solution, exploring dependence among different parts of the interaction can increase their score and boost their candidacy for the sampling algorithm. By clustering the solutions, we identify few representatives that are distinct enough to suggest possible alternative structures.
Supported by a Research Starter Award in Informatics from the PhRMA Foundation www.phrmafoundation.org.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
Circular interactions with odd cycles (where the interaction graph G is not restricted to being bipartite) can be achieved by allowing inverted windows in which the interaction given by \(w(l_1,l_2,i,j,u,v)\) occurs between bases \([i-u+1,i]\) on RNA \(l_1\) and bases \([j,j-v+1]\) (inverted sequence) on RNA \(l_2\), but we do not explore this direction here.
- 2.
For ease of notation, we are thinking of \(A_l\) as a sequence and a set at the same time.
- 3.
We use “first representative” because many solutions can represent the same candidate; for instance, a window can split in different ways, but we still refer to it as a window split.
- 4.
Since a single non-symmetric window may also represent a split, our percentage hit for window splits is lower than it should be with the no filtering option.
References
Ahmed, S.A., Mneimneh, S.: Multiple RNA interaction with sub-optimal solutions. In: Basu, M., Pan, Y., Wang, J. (eds.) ISBRA 2014. LNCS, vol. 8492, pp. 149–162. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08171-7_14
Ahmed, S.A., Mneimneh, S., Greenbaum, N.L.: A combinatorial approach for multiple RNA interaction: formulations, approximations, and heuristics. In: Du, D.-Z., Zhang, G. (eds.) COCOON 2013. LNCS, vol. 7936, pp. 421–433. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38768-5_38
Alkan, C., Karakoc, E., Nadeau, J.H., Sahinalp, S.C., Zhang, K.: RNA-RNA interaction prediction and antisense RNA target search. J. Comput. Biol. 13(2), 267–282 (2006)
Alkan, F., et al.: RIsearch2: suffix array-based large-scale prediction of RNA-RNA interactions and siRNA off-targets. Nucleic Acids Res. 45(8), e60 (2017)
Andronescu, M., Zhang, Z.C., Condon, A.: Secondary structure prediction of interacting RNA molecules. J. Mol. Biol. 345(5), 987–1001 (2005)
Antonov, I., Marakhonov, A., Zamkova, M., Medvedeva, Y.: ASSA: fast identification of statistically significant interactions between long RNAs. J. Bioinform. Comput. Biol. 16(01), 1840001 (2018)
Cao, S., Chen, S.-J.: Free energy landscapes of RNA/RNA complexes: with applications to snRNA complexes in spliceosomes. J. Mol. Biol. 357(1), 292–312 (2006)
Chekuri, C., Kumar, A.: Maximum coverage problem with group budget constraints and applications. In: Jansen, K., Khanna, S., Rolim, J.D.P., Ron, D. (eds.) APPROX/RANDOM - 2004. LNCS, vol. 3122, pp. 72–83. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-27821-4_7
Chen, H.-L., Condon, A., Jabbari, H.: An \(o(n^5)\) algorithm for MFE prediction of kissing hairpins and 4-chains in nucleic acids. J. Comput. Biol. 16(6), 803–815 (2009)
Chitsaz, H., Backofen, R., Sahinalp, S.C.: biRNA: fast RNA-RNA binding sites prediction. In: Salzberg, S.L., Warnow, T. (eds.) WABI 2009. LNCS, vol. 5724, pp. 25–36. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04241-6_3
Chitsaz, H., Salari, R., Sahinalp, S.C., Backofen, R.: A partition function algorithm for interacting nucleic acid strands. Bioinformatics 25(12), i365–i373 (2009)
Ding, Y., Lawrence, C.E.: A statistical sampling algorithm for RNA secondary structure prediction. Nucleic Acids Res. 31(24), 7280–7301 (2003)
Dirks, R.M., Bois, J.S., Schaeffer, J.M., Winfree, E., Pierce, N.A.: Thermodynamic analysis of interacting nucleic acid strands. SIAM Rev. 49(1), 65–88 (2007)
Durbin, R., Eddy, S.R., Krogh, A., Mitchison, G.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Chap. 11. Cambridge University Press, Cambridge (1998)
Fukunaga, T., Hamada, M.: RIblast: an ultrafast RNA-RNA interaction prediction system based on a seed-and-extension approach. Bioinformatics 33(17), 2666–2674 (2017)
Gallager, R.G.: Discrete Stochastic Processes, Chap. 4. SECS, vol. 321. Springer, Boston (2012). https://doi.org/10.1007/978-1-4615-2329-1
Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721–741 (1984)
Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1), 97–109 (1970)
Huang, F.W., Qin, J., Reidys, C.M., Stadler, P.F.: Partition function and base pairing probabilities for RNA-RNA interaction prediction. Bioinformatics 25(20), 2646–2654 (2009)
Huang, F.W., Qin, J., Reidys, C.M., Stadler, P.F.: Target prediction and a statistical sampling algorithm for RNA-RNA interaction. Bioinformatics 26(2), 175–181 (2010)
Jaccard, P.: Etude comparative de la distribution florale dans une portion des Alpes et du Jura. Impr, Corbaz (1901)
Jankowsky, E., Schwenzer, B.: Oligonucleotide facilitators may inhibit or activate a hammerhead ribozyme. Nucleic Acids Res. 24(3), 423–429 (1996)
Jiang, T., Li, M.: On the approximation of shortest common supersequences and longest common subsequences. SIAM J. Comput. 24(5), 1122–1139 (1995)
Kolb, F.A., et al.: Progression of a loop-loop complex to a four-way junction is crucial for the activity of a regulatory antisense RNA. EMBO J. 19(21), 5905–5915 (2000)
Kolb, F.A., et al.: An unusual structure formed by antisense-target RNA binding involves an extended kissing complex with a four-way junction and a side-by-side helical alignment. RNA 6(3), 311–324 (2000)
Li, A.X., Marz, M., Qin, J., Reidys, C.M.: RNA-RNA interaction prediction based on multiple sequence alignments. Bioinformatics 27(4), 456–463 (2011)
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. 21(6), 1087–1092 (1953)
Metzler, D., Nebel, M.E.: Predicting RNA secondary structures with pseudoknots by MCMC sampling. J. Math. Biol. 56(1–2), 161–181 (2008)
Meyer, I.M.: Predicting novel RNA-RNA interactions. Curr. Opin. Struct. Biol. 18(3), 387–393 (2008)
Mneimneh, S.: On the approximation of optimal structures for RNA-RNA interaction. IEEE/ACM Trans. Comput. Biol. Bioinform. (TCBB) 6(4), 682–688 (2009)
Mneimneh, S., Ahmed, S.A.: Multiple RNA interaction: beyond two. IEEE Trans. Nanobiosci. 14(2), 210–219 (2015)
Mneimneh, S., Ahmed, S.A.: Gibbs/MCMC sampling for multiple RNA interaction with sub-optimal solutions. In: Botón-Fernández, M., Martín-Vide, C., Santander-Jiménez, S., Vega-Rodríguez, M. (eds.) AlCoB 2016. LNCS, vol. 9702, pp. 78–90. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-38827-4_7
Mückstein, U., Tafer, H., Hackermüller, J., Bernhart, S.H., Stadler, P.F., Hofacker, I.L.: Thermodynamics of RNA-RNA binding. Bioinformatics 22(10), 1177–1182 (2006)
Newby, M.I., Greenbaum, N.L.: A conserved pseudouridine modification in eukaryotic U2 snRNA induces a change in branch-site architecture. RNA 7(06), 833–845 (2001)
Pervouchine, D.D.: IRIS: intermolecular RNA interaction search. Genome Inform. Ser. 15(2), 92 (2004)
Pinard, R., et al.: Functional involvement of G8 in the hairpin ribozyme cleavage mechanism. EMBO J. 20(22), 6434–6442 (2001)
Salari, R., Backofen, R., Sahinalp, S.C.: Fast prediction of RNA-RNA interaction. Algorithms Mol. Biol. 5(5), 5 (2010)
Sashital, D.G., Cornilescu, G., Butcher, S.E.: U2–U6 RNA folding reveals a group II intron-like domain and a four-helix junction. Nat. Struct. Mol. Biol. 11(12), 1237–1242 (2004)
Schmidt, C., Welz, R., Müller, S.: RNA double cleavage by a hairpin-derived twin ribozyme. Nucleic Acids Res. 28(4), 886–894 (2000)
Sun, J.-S., Manley, J.L.: A novel U2–U6 snRNA structure is necessary for mammalian mRNA splicing. Genes Dev. 9(7), 843–854 (1995)
Tong, W., Goebel, R., Liu, T., Lin, G.: Approximating the maximum multiple RNA interaction problem. Theoret. Comput. Sci. 556, 63–70 (2014)
Wei, D., Alpert, L.V., Lawrence, C.E.: RNAG: a new Gibbs sampler for predicting RNA secondary structure for unaligned sequences. Bioinformatics 27(18), 2486–2493 (2011)
Zhao, C., et al.: Conformational heterogeneity of the protein-free human spliceosomal U2–U6 snRNA complex. RNA 19(4), 561–573 (2013)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Appendix
Appendix
Given a solution S, define |S| as the number of windows in S, and let
be the |S| windows in the order defined by the partial order relation follow (from Sect. 2) extended to a total order in a deterministic way.
Each of these windows, say \(w(l, l', i, j, u, v)\), defines the two intervals, \([i-u+1,i]\) in level l and \([j-v+1,j]\) in level \(l'\). Consider the set of interaction intervals \(I(S)=\sum _l I_l(S)\) to be ordered accordingly. Therefore,
is an ordered set of 2|S| intervals. Let \(L(S)=\{(l_1,l_1'),\ldots ,(l_{|S|},l_{|S|}')\}\) be an ordered set of |S| pairs, where \((l_i,l_i')\) is the pair defining the \(i^{\text {th}}\) window. Therefore, L(S) means that we have the following set of pairwise interactions (not necessarily unique in terms of RNAs): RNA \(l_1\) with RNA \(l_1'\), RNA \(l_2\) with RNA \(l_2'\), \(\ldots \), RNA \(l_{|S|}\) with RNA \(l_{|S|}'\). Two solutions that do not agree on this set are considered completely dissimilar; otherwise, their distance is given by the amount of overlap in their interaction intervals (as in the Jaccard metric [21]), hence the following definition of distance:
Given two solutions \(S_1\) with \(I(S_1)=\{I_1,I_2,\ldots \}\) and \(S_2\) with \(I(S_2)=\{T_1,T_2,\ldots \}\), the distance between \(S_1\) and \(S_2\) is
where \(\cap \) and \(\cup \) represent the standard intersection and union operations on sets respectively, and intervals are treated as sets of integers.
Recall that a symmetric window \(w(l_1,l_2,i,j,u,v)\) satisfies \(u=v\) (and typically consists of u base pairs). When applying the distance function, a non-symmetric window is first converted to consecutive symmetric windows by maximizing the number of base pairs (but otherwise is still reported as a non-symmetric window in a given solution).
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Ahmed, S.A., Farhat, S., Mneimneh, S. (2018). Making Multiple RNA Interaction Practical. In: Kim, D., Uma, R., Zelikovsky, A. (eds) Combinatorial Optimization and Applications. COCOA 2018. Lecture Notes in Computer Science(), vol 11346. Springer, Cham. https://doi.org/10.1007/978-3-030-04651-4_44
Download citation
DOI: https://doi.org/10.1007/978-3-030-04651-4_44
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04650-7
Online ISBN: 978-3-030-04651-4
eBook Packages: Computer ScienceComputer Science (R0)