DOI: 10.1007/978-3-642-00958-7_55
Article

A Study of the Impact of Index Updates on Distributed Query Processing for Web Search

Published: 18 April 2009

Abstract

Query processing in Web search engines today is performed mainly within a single site or data center, which must scale as the Web grows and as users expect fast answers to their queries. Constraints on the size and cost of data centers, however, may limit the scalability of search engines. Multi-site search engines that perform distributed query processing are one way to overcome such constraints. Each site processes as many queries as possible locally, keeping latency low by not contacting remote sites. Whether a query is forwarded to a remote site depends on that site's document collection. Multi-site search engines pose several new challenges. When a site updates its index, it has to inform the other sites. These updates, however, are not instantaneous, because of the volume of data exchanged or possible network failures. While indexes are inconsistent across sites, queries may not be forwarded optimally. In this work, we investigate the impact of index inconsistencies on a distributed query processing algorithm in the presence of index updates. We observe that delayed propagation of index information reduces the effectiveness of query processing, because queries are less likely to be routed optimally.
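
The effect described in the abstract can be illustrated with a small sketch. It is not the algorithm studied in the paper: the `Site` class, the per-term document-count summaries, and the rule "forward only to sites whose last known summary contains every query term" are all assumptions made for illustration. The sketch shows only that a site holding a stale view of a remote index can fail to forward a query that the remote site could now answer.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical search site: a local index plus cached views of remote sites."""
    name: str
    # Local inverted index: term -> ids of local documents containing it.
    index: dict[str, set[str]] = field(default_factory=dict)
    # Last summary received from each remote site: site name -> term -> doc count.
    remote_view: dict[str, dict[str, int]] = field(default_factory=dict)

    def add_document(self, doc_id: str, terms: list[str]) -> None:
        """Index a new document locally (an index update)."""
        for term in terms:
            self.index.setdefault(term, set()).add(doc_id)

    def summary(self) -> dict[str, int]:
        """Per-term document counts this site would advertise to other sites."""
        return {term: len(docs) for term, docs in self.index.items()}

    def receive_summary(self, site_name: str, summary: dict[str, int]) -> None:
        """Store a (possibly delayed) summary from a remote site."""
        self.remote_view[site_name] = dict(summary)

    def sites_to_forward(self, query_terms: list[str]) -> list[str]:
        """Assumed rule: forward to remote sites known to hold every query term."""
        return sorted(
            name
            for name, known in self.remote_view.items()
            if all(known.get(term, 0) > 0 for term in query_terms)
        )


site_a = Site("A")
site_b = Site("B")

site_b.add_document("d1", ["web", "search"])
site_a.receive_summary("B", site_b.summary())

# Site B updates its index, but the update has not yet reached site A.
site_b.add_document("d2", ["index", "update"])
stale = site_a.sites_to_forward(["index", "update"])

# Once the update propagates, site A routes the same query to site B.
site_a.receive_summary("B", site_b.summary())
fresh = site_a.sites_to_forward(["index", "update"])

print(stale, fresh)
```

Between the update at site B and the arrival of its new summary at site A, the query is answered locally only, which is the kind of non-optimal routing the abstract attributes to delayed propagation.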

Cited By

  • (2011) Document assignment in multi-site search engines. Proceedings of the fourth ACM international conference on Web search and data mining, pp. 575-584. DOI: 10.1145/1935826.1935907. Online publication date: 9-Feb-2011
  • (2009) Quantifying performance and quality gains in distributed web search engines. Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pp. 411-418. DOI: 10.1145/1571941.1572013. Online publication date: 19-Jul-2009

Published In

ECIR '09: Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval
April 2009
817 pages
ISBN:9783642009570
Editors: Mohand Boughanem, Catherine Berrut, Josiane Mothe, Chantal Soule-Dupuy

Publisher

Springer-Verlag

Berlin, Heidelberg
