DOI: 10.1145/3388440.3412415
research-article
Public Access

Joint Grid Discretization for Biological Pattern Discovery

Published: 10 November 2020

Abstract

The complexity, dynamics, and scale of data acquired by modern biotechnology increasingly favor model-free computational methods that make minimal assumptions about underlying biological mechanisms. For example, single-cell transcriptome and proteome data have a throughput several orders of magnitude higher than that of bulk methods. Many model-free statistical methods for pattern discovery, such as mutual information and chi-squared tests, however, require discrete data. Most discretization methods minimize squared errors for each variable independently and do not necessarily retain joint patterns. To address this issue, we present a joint grid discretization algorithm that preserves clusters in the original data. We evaluated this algorithm on simulated data to show its advantage over other methods in maintaining clusters, as measured by the adjusted Rand index. We also show that it promotes global functional patterns over independent patterns. On single-cell proteome and transcriptome data from leukemia and healthy blood, joint grid discretization captured known protein-to-RNA regulatory relationships while revealing previously unknown interactions. As such, joint grid discretization is applicable as a data transformation step in associative, functional, and causal inference of molecular interactions fundamental to systems biology. The developed software is publicly available at https://cran.r-project.org/package=GridOnClusters.
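
To make the idea of a cluster-preserving grid concrete, the following is a minimal Python sketch, not the GridOnClusters implementation. It assumes cluster labels are already given, and it uses a simplified rule that is an assumption of this sketch: along each variable, a grid line is placed in every gap between non-overlapping projected cluster ranges. The function names (`cluster_cuts`, `equal_width_cuts`, `discretize`, `adjusted_rand_index`) and the toy data are hypothetical. The sketch compares this joint grid with independent equal-width binning by the adjusted Rand index between the original clusters and the resulting grid cells.

```python
from bisect import bisect_right
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same points."""
    n = len(a)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case: both labelings are trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def cluster_cuts(values, labels):
    """Simplified rule (an assumption of this sketch): put a grid line in
    each gap between projected cluster ranges along one variable."""
    ranges = {}
    for v, k in zip(values, labels):
        lo, hi = ranges.get(k, (v, v))
        ranges[k] = (min(lo, v), max(hi, v))
    cuts, reach = [], None
    for lo, hi in sorted(ranges.values()):
        if reach is not None and lo > reach:  # gap: clusters separable here
            cuts.append((reach + lo) / 2)
        reach = hi if reach is None else max(reach, hi)
    return cuts


def equal_width_cuts(values, bins):
    """Baseline: equal-width bins chosen for one variable in isolation."""
    lo, hi = min(values), max(values)
    return [lo + (hi - lo) * i / bins for i in range(1, bins)]


def discretize(values, cuts):
    """Map each value to the index of the interval it falls into."""
    return [bisect_right(cuts, v) for v in values]


# Toy data: three clusters; cluster 2 is wide in x and overlaps cluster 1 in y.
xs = [0.8, 1.0, 1.2, 1.1, 1.8, 2.0, 2.2, 2.1, 4.0, 6.0, 8.0, 9.2]
ys = [0.9, 1.1, 1.0, 0.8, 4.9, 5.1, 5.0, 4.8, 5.0, 5.6, 5.2, 5.4]
labels = [0] * 4 + [1] * 4 + [2] * 4

# Joint grid: lines for each variable are derived from the shared clusters.
x_cuts, y_cuts = cluster_cuts(xs, labels), cluster_cuts(ys, labels)
joint_cells = list(zip(discretize(xs, x_cuts), discretize(ys, y_cuts)))

# Baseline with the same number of bins per variable, chosen independently.
eq_cells = list(zip(
    discretize(xs, equal_width_cuts(xs, len(x_cuts) + 1)),
    discretize(ys, equal_width_cuts(ys, len(y_cuts) + 1)),
))

ari_joint = adjusted_rand_index(labels, joint_cells)
ari_equal = adjusted_rand_index(labels, eq_cells)
print(f"ARI joint grid: {ari_joint:.3f}, ARI equal width: {ari_equal:.3f}")
```

On this toy data the cluster-derived grid keeps each cluster in its own cell, whereas equal-width binning splits the wide cluster across two cells and so scores a lower adjusted Rand index. Real data with overlapping clusters would need an error-minimizing choice of grid lines rather than the gap rule assumed here.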

Cited By

  • (2022) Detecting genetic epistasis by differential departure from independence. Molecular Genetics and Genomics 297:4 (911-924). DOI: 10.1007/s00438-022-01893-3. Online publication date: 23-May-2022
  • (2020) GridOnClusters: Cluster-Preserving Multivariate Joint Grid Discretization. CRAN: Contributed Packages. DOI: 10.32614/CRAN.package.GridOnClusters. Online publication date: 20-Mar-2020

Published In

BCB '20: Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
September 2020
193 pages
ISBN:9781450379649
DOI:10.1145/3388440
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Clustering
  2. Functional chisquared statistics
  3. Grid discretization
  4. Pattern discovery
  5. Single-cell sequencing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

BCB '20
Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%
