Discovering Frequent Graph Patterns Using Disjoint Paths
Whereas data mining in structured data focuses on frequent data values, in semistructured and graph data mining, the issue is frequent labels and common specific topologies. Here, the structure of the data is just as important as its content. We study ...
Some Effective Techniques for Naive Bayes Text Classification
While naive Bayes is quite effective in various data mining tasks, it shows a disappointing result in the automatic text classification problem. Based on the observation of naive Bayes for the natural language text, we found a serious problem in the ...
Discovering Frequent Closed Partial Orders from Strings
Mining knowledge about ordering from sequence data is an important problem with many applications, such as bioinformatics, Web mining, network management, and intrusion detection. For example, if many customers follow a partial order in their purchases ...
Learning Contextual Dependency Network Models for Link-Based Classification
Links among objects contain rich semantics that can be very helpful in classifying the objects. However, many irrelevant links can be found in real-world link data such as Web pages. Often, these noisy and irrelevant links do not provide useful and ...
On Mining Instance-Centric Classification Rules
Many studies have shown that rule-based classifiers perform well in classifying categorical and sparse high-dimensional databases. However, a fundamental limitation with many rule-based classifiers is that they find the rules by employing various ...
Access Structures for Angular Similarity Queries
Angular similarity measures have been utilized by several database applications to define semantic similarity between various data types such as text documents, time-series, images, and scientific data. Although similarity searches based on Euclidean ...
Design and Performance Evaluation of Broadcast Algorithms for Time-Constrained Data Retrieval
We use "time-constrained services" to refer to those requests that have to be replied to within a certain client-expected time duration. If the answer cannot reach the client within this expected time, the value of the information may seriously degrade or even ...
Hierarchical Indexing Structure for Efficient Similarity Search in Video Retrieval
With the rapid increase in both centralized video archives and distributed WWW video resources, content-based video retrieval is gaining importance. To support such applications efficiently, content-based video indexing must be addressed. Typically, ...
Incremental Processing of Continual Range Queries over Moving Objects
Efficient processing of continual range queries over moving objects is critically important in providing location-aware services and applications. A set of continual range queries, each defining the geographical region of interest, can be periodically (...
Decentralized Assignment Reasoning Using Collaborative Local Mediation
The collaborative linear assignment problem (CLAP) is a recent framework being developed to provide an intellectual basis for investigating uncluttered agent-based solutions for a fundamental class of combinatorial assignment (or allocation) ...