Mining protein family specific residue packing patterns from protein structure graphs
Proceedings of the eighth annual international conference on Research in …, 2004, dl.acm.org
Finding recurring residue packing patterns, or spatial motifs, that characterize protein structural families is an important problem in bioinformatics. We apply a novel frequent subgraph mining algorithm to three graph representations of protein three-dimensional (3D) structure. In each protein graph, a vertex represents an amino acid. Residue vertices are connected by edges using one of three approaches: first, a simple distance threshold between contacting residues; second, the Delaunay tessellation from computational geometry; and third, the recently developed almost-Delaunay tessellation. Applying the frequent subgraph mining algorithm to a set of graphs representing a protein family from the Structural Classification of Proteins (SCOP) database, we typically identify several hundred common subgraphs, corresponding to packing motifs found in the majority of proteins in the family. We also use the counts of motifs extracted from proteins in two different SCOP families as input variables in a binary classification experiment. The resulting models predict protein family membership with an accuracy exceeding 90 percent. Our results indicate that graphs based on both almost-Delaunay and Delaunay tessellations are sparser than contact distance graphs, yet they are robust and efficient for mining protein spatial motifs.
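Of the three edge definitions above, the distance-threshold graph is the simplest to illustrate. The sketch below builds such a graph from one representative coordinate per residue. The choice of representative atom (for example, the C-alpha atom), the 8.0 Å default cutoff, and the `contact_graph` helper are hypothetical assumptions for illustration, not parameters taken from the paper. The Delaunay and almost-Delaunay variants are not shown.

```python
import math
from itertools import combinations


def contact_graph(residues, cutoff=8.0):
    """Build an undirected residue contact graph (hypothetical helper).

    residues: list of (label, (x, y, z)) tuples, one representative
    coordinate per amino acid (assumed, e.g. the C-alpha atom).
    cutoff: assumed distance threshold in angstroms.
    Returns (labels, edges), where edges is a set of index pairs (i, j), i < j.
    """
    labels = [label for label, _ in residues]
    edges = set()
    # Compare every unordered pair of residues once.
    for (i, (_, a)), (j, (_, b)) in combinations(enumerate(residues), 2):
        if math.dist(a, b) <= cutoff:  # Euclidean distance (Python 3.8+)
            edges.add((i, j))
    return labels, edges
```

Because every pair within the cutoff is connected, this representation tends to be denser than a tessellation-based graph over the same residues, which matches the sparsity comparison in the abstract.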
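The notions of a motif "found in the majority of proteins in the family" and of per-protein motif counts can be illustrated with the smallest possible pattern, a single labeled edge. The sketch below is a deliberately simplified stand-in: it is not the authors' frequent subgraph mining algorithm, which enumerates larger connected subgraphs. The helper names and the `min_support=0.5` threshold (a strict majority) are hypothetical. Each graph is a `(labels, edges)` pair, as produced by a contact-graph builder.

```python
from collections import Counter


def edge_pattern(labels, i, j):
    """Order-independent residue-label pair for an undirected edge."""
    a, b = labels[i], labels[j]
    return (a, b) if a <= b else (b, a)


def frequent_edge_patterns(graphs, min_support=0.5):
    """Return {pattern: support} for single-edge patterns present in more
    than min_support of the graphs (support = fraction of graphs)."""
    presence = Counter()
    for labels, edges in graphs:
        # Count each pattern at most once per graph when measuring support.
        presence.update({edge_pattern(labels, i, j) for i, j in edges})
    n = len(graphs)
    return {p: c / n for p, c in presence.items() if c / n > min_support}


def motif_counts(graph, patterns):
    """Feature vector: number of occurrences of each pattern in one graph."""
    labels, edges = graph
    counts = Counter(edge_pattern(labels, i, j) for i, j in edges)
    return [counts[p] for p in patterns]
```

The count vectors returned by `motif_counts` have the same form as the classification inputs described in the abstract, although the real motifs are larger subgraphs.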
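The abstract does not name the model used in the binary classification experiment. The sketch below therefore substitutes a nearest-centroid classifier over motif-count vectors, purely to show how such counts can serve as input variables. The function names and toy data are hypothetical.

```python
import math


def train_centroids(vectors, labels):
    """Average the motif-count vectors of each class (e.g. each SCOP family)."""
    sums, counts = {}, {}
    for v, y in zip(vectors, labels):
        acc = sums.setdefault(y, [0.0] * len(v))
        for k, x in enumerate(v):
            acc[k] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, v):
    """Assign v to the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(v, centroids[y]))
```

With two classes this reduces to a linear decision boundary. Any standard classifier could take its place, and accuracy should be measured on proteins held out from training.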