Building a distributed full-text index for the web
We identify crucial design issues in building a distributed inverted index for a large collection of Web pages. We introduce a novel pipelining technique for structuring the core index-building system that substantially reduces the index construction ...
Scaling question answering to the web
The wealth of information on the web makes it an attractive resource for seeking quick answers to simple, factual questions such as “who was the first American in space?” or “what is the second tallest mountain in the world?” Yet today's most advanced ...
WebQuilt: A proxy-based approach to remote web usability testing
WebQuilt is a web logging and visualization system that helps web design teams run usability tests (both local and remote) and analyze the collected data. Logging is done through a proxy, overcoming many of the problems with server-side and client-side ...
On the design of a learning crawler for topical resource discovery
In recent years, the World Wide Web has shown enormous growth in size. Vast repositories of information are available on practically every possible topic. In such cases, it is valuable to perform topical resource discovery effectively. Consequently, ...
A highly scalable and effective method for metasearch
A metasearch engine is a system that supports unified access to multiple local search engines. Database selection is one of the main challenges in building a large-scale metasearch engine. The problem is to efficiently and accurately determine a small ...