GPD: A Graph Pattern Diffusion Kernel for Accurate Graph Classification with Applications in Cheminformatics
Graph data mining is an active research area. Graphs are general modeling tools to organize information from heterogeneous sources and have been applied in many scientific, engineering, and business fields. With the fast accumulation of graph data, ...
Molecular Function Prediction Using Neighborhood Features
The recent advent of high-throughput methods has generated large amounts of gene interaction data. This has allowed the construction of genomewide networks. A significant number of genes in such networks remain uncharacterized and predicting the ...
A Metric on the Space of Reduced Phylogenetic Networks
Phylogenetic networks are leaf-labeled, rooted, acyclic, and directed graphs that are used to model reticulate evolutionary histories. Several measures for quantifying the topological dissimilarity between two phylogenetic networks have been devised, ...
Automated Hierarchical Density Shaving: A Robust Automated Clustering and Visualization Framework for Large Biological Data Sets
A key application of clustering data obtained from sources such as microarrays, protein mass spectroscopy, and phylogenetic profiles is the detection of functionally related genes. Typically, only a small number of functionally related genes cluster ...
Automated Isolation of Translational Efficiency Bias That Resists the Confounding Effect of GC(AT)-Content
Genomic sequencing projects are an abundant source of information for biological studies ranging from the molecular to the ecological in scale; however, much of the information present may yet be hidden from casual analysis. One such information domain, ...
Identification of Full and Partial Class Relevant Genes
Multiclass cancer classification on microarray data has made it feasible to diagnose all of the common malignancies in parallel. Using multiclass cancer feature selection approaches, it is now possible to identify genes relevant ...
Model Composition for Macromolecular Regulatory Networks
Models of regulatory networks become more difficult to construct and understand as they grow in size and complexity. Large models are usually built up from smaller models, representing subsets of reactions within the larger network. To assist modelers ...
Reassortment Networks for Investigating the Evolution of Segmented Viruses
Many viruses of interest, such as influenza A, have distinct segments in their genome. The evolution of these viruses involves mutation and reassortment, where segments are interchanged between viruses that coinfect a host. Phylogenetic trees can be ...
Signal Quality Measurements for cDNA Microarray Data
Concerns about the reliability of expression data from microarrays inspire ongoing research into measurement error in these experiments. Error arises at both the technical level within the laboratory and the experimental level. In this paper, we will ...
Approximation Algorithms for Predicting RNA Secondary Structures with Arbitrary Pseudoknots
We study three closely related problems motivated by the prediction of RNA secondary structures with arbitrary pseudoknots: the problem 2-Interval Pattern proposed by Vialette, the problem Maximum Base Pair Stackings proposed by ...
Fast Hinge Detection Algorithms for Flexible Protein Structures
Analysis of conformational changes is one of the keys to the understanding of protein functions and interactions. For the analysis, we often compare two protein structures, taking flexible regions like hinge regions into consideration. The Root Mean ...
Sorting Genomes by Reciprocal Translocations, Insertions, and Deletions
The problem of sorting by reciprocal translocations (abbreviated as SBT) arises in comparative genomics: the goal is to find a shortest sequence of reciprocal translocations that transforms one genome Π into another genome Γ, with the ...
Linear Separability of Gene Expression Data Sets
We study simple geometric properties of gene expression data sets, where samples are taken from two distinct classes (e.g., two types of cancer). Specifically, the problem of linear separability for pairs of genes is investigated. If a pair of genes ...