An empirical study of query expansion and cluster-based retrieval in language modeling approach
Term mismatch is a critical problem in information retrieval, and several techniques, such as query expansion, cluster-based retrieval and dimensionality reduction, have been developed to resolve it. Of these techniques, this paper ...
Document reranking by term distribution and maximal marginal relevance for Chinese information retrieval
In this paper, we propose a document reranking method for Chinese information retrieval. The method is based on a term weighting scheme, which integrates local and global distribution of terms as well as document frequency, document positions and term ...
Cross-document event clustering using knowledge mining from co-reference chains
Unifying terminology usage, which captures more term semantics, is useful for event clustering. This paper proposes a metric of normalized chain edit distance to mine, incrementally, controlled vocabulary from cross-document coreference chains. ...
Contextual feature selection for text classification
We present a simple approach for the classification of "noisy" documents using bigrams and named entities. The approach combines conventional feature selection with a contextual approach to filter out passages around selected features. Originally ...
Answer extraction and ranking strategies for definitional question answering using linguistic features and definition terminology
We propose answer extraction and ranking strategies for definitional question answering using linguistic features and definition terminology. A passage expansion technique based on simple anaphora resolution is introduced to retrieve more informative ...
Use of place information for improved event tracking
The main purpose of topic detection and tracking (TDT) is to detect, group, and organize newspaper articles reporting on the same event. Since an event is a reported occurrence at a specific time and place, together with its unavoidable consequences, TDT can ...
A hybrid generative/discriminative approach to text classification with additional information
This paper presents a classifier for text data samples consisting of main text and additional components, such as Web pages and technical papers. We focus on multiclass and single-labeled text classification problems and design the classifier based on a ...
Efficient implementation of associative classifiers for document classification
In practical text classification tasks, the ability to interpret the classification result is as important as the ability to classify exactly. Associative classifiers have many favorable characteristics such as rapid training, good classification ...
Employing web mining and data fusion to improve weak ad hoc retrieval
When a user issues a reasonable query to a retrieval system and obtains no relevant documents, he or she is bound to feel frustrated. We call these weak queries and retrievals. Improving their effectiveness is an important issue for ad hoc retrieval and ...
A reliable FAQ retrieval system using a query log classification technique based on latent semantic analysis
To obtain high performance, previous work on FAQ retrieval used high-level knowledge bases or handcrafted rules. However, constructing these knowledge bases and rules is time-consuming and laborious whenever the application domain changes. To ...
Supervised categorization of JavaScript™ using program analysis features
Web pages often embed scripts for a variety of purposes, including advertising and dynamic interaction. Understanding embedded scripts and their purpose can often help to interpret or provide crucial information about the web page. We have developed a ...
Topic distillation via sub-site retrieval
Topic distillation is one of the main information needs when users search the Web. Previous approaches to topic distillation treat a single page as the basic search unit, which does not fully utilize the structural information of the Web. In this paper,...
Color image retrieval technique based on color features and image bitmap
The field of color image retrieval has been an important research area for several decades. To effectively retrieve more similar images from digital image databases, this paper uses the color distributions, the mean value and the ...
A probabilistic music recommender considering user opinions and audio features
A recommender system has an obvious appeal in an environment where the amount of on-line information vastly outstrips any individual's capability to survey. Music recommendation is considered a popular application area. In order to make personalized ...
Integrating textual and visual information for cross-language image retrieval: a trans-media dictionary approach
This paper explores the integration of textual and visual information for cross-language image retrieval. An approach which automatically transforms textual queries into visual representations is proposed. First, we mine the relationships between text ...
Semantic categorization of digital home photo using photographic region templates
In this paper, a semantic categorization method for generic home photos is proposed. In recent years, the semantic categorization of images has been a challenge due to the proliferation of digital home photos. Our approach is to detect semantically ...
Object identification and retrieval from efficient image matching: Snap2Tell with the STOIC dataset
Traditional content-based image retrieval attempts to retrieve images using syntactic features for a query image. Annotated image banks and Google allow the use of text to retrieve images. In this paper, we study the task of using the content of an ...
On the reliability of information retrieval metrics based on graded relevance
This paper compares 14 information retrieval metrics based on graded relevance, together with 10 traditional metrics based on binary relevance, in terms of stability, sensitivity and resemblance of system rankings. More specifically, we compare these ...