[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/775152.775192acmconferencesArticle/Chapter ViewAbstractPublication PagesthewebconfConference Proceedingsconference-collections
Article

Adaptive on-line page importance computation

Published: 20 May 2003 Publication History

Abstract

The computation of page importance in a huge dynamic graph has recently attracted a lot of attention because of the web. Page importance, or page rank is defined as the fixpoint of a matrix equation. Previous algorithms compute it off-line and require the use of a lot of extra CPU as well as disk resources (e.g. to store, maintain and read the link matrix). We introduce a new algorithm OPIC that works on-line, and uses much less resources. In particular, it does not require storing the link matrix. It is on-line in that it continuously refines its estimate of page importance while the web/graph is visited. Thus it can be used to focus crawling to the most interesting pages. We prove the correctness of OPIC. We present Adaptive OPIC that also works on-line but adapts dynamically to changes of the web. A variant of this algorithm is now used by Xyleme.We report on experiments with synthetic data. In particular, we study the convergence and adaptiveness of the algorithms for various scheduling strategies for the pages to visit. We also report on experiments based on crawls of significant portions of the web.

References

[1]
S. Abiteboul, G. Cobena, J. Masanes, and G. Sedrati. A first experience in archiving the french web. ECDL, 2002.]]
[2]
S. Abiteboul, M. Preda, and G. Cobena. Computing web page importance without storing the graph of the web (extended abstract). IEEE-CS Data Engineering Bulletin, Volume 25, 2002.]]
[3]
Alta vista. http://www.altavista.com/.]]
[4]
Andrei Z. Broder and al. Graph structure in the web. WWW9/Computer Networks, 2000.]]
[5]
Steve Chien, Cynthia Dwork, Ravi Kumar, Dan Simon, and D. Sivakumar. Link evolution: Analysis and algorithms. In Workshop on Algorithms and Models for the Web Graph (WAW), 2002.]]
[6]
Junghoo Cho, Hector García-Molina, and Lawrence Page. Efficient crawling through URL ordering. Computer Networks and ISDN Systems, 30(1-7):161--172, 1998.]]
[7]
Kai Lai Chung. Markov chains with stationary transition probabilities. Springer, 1967.]]
[8]
Colin Cooper and Alan M. Frieze. A general model of undirected web graphs. In European Symposium on Algorithms, pages 500--511, 2001.]]
[9]
Y. Dong D. Zhang. An efficient algorithm to rank web resources. 9 th International World Wide Web Conference, 2000.]]
[10]
F. R. Gantmacher. Applications of the theory of matrices. In Interscience Publishers, pages 64--79, 1959.]]
[11]
Google. http://www.google.com/.]]
[12]
T. Haveliwala. Efficient computation of pagerank. Technical report, Stanford University, 1999.]]
[13]
H. Garcia-Molina J. Cho. Synchronizing a database to improve freshness. SIGMOD, 2000.]]
[14]
M.R. Henzinger J. Dean. Finding related pages in the world wide web. 8th International World Wide Web Conference, 1999.]]
[15]
A. Broder K. Bharat. Estimating the relative size and overlap of public web search engines. 7th International World Wide Web Conference (WWW7), 1998.]]
[16]
Jon M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604--632, 1999.]]
[17]
G.V. Meghabghab. Google's web page ranking applied to different topological web graph structures. JASIS 52(9), 2001.]]
[18]
Rajeev Motwani and Prabhakar Raghavan. Randomized algorithms. ACM Computing Surveys, 28(1):33--37, 1996.]]
[19]
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: Bringing order to the web, 1998.]]
[20]
M. Preda. Data acquisition for an xml warehouse. DEA thesis Paris 7 University, 2000.]]
[21]
L. Page S. Brin. The anatomy of a large-scale hypertextual web search engine. WWW7 Conference, Computer Networks 30(1--7), 1998.]]
[22]
B. Dom S. Chakrabarti, M. van den Berg. Focused crawling: a new approach to topic-specific web resource discovery. 8th World Wide Web Conference, 1999.]]
[23]
L. Giles S. Lawrence. Accessibility and distribution of information on the web. Nature, 1999.]]
[24]
Search-engine watch. www.searchenginewatch.com/.]]
[25]
S. Toledo. Improving the memory-system performance of sparse-matrix vector multiplication. IBM Journal of Research and Development, 41(6):711--??, 1997.]]
[26]
C.J. van Rijsbergen. Information retrieval. London, Butterworths, 1979.]]
[27]
Lucie Xyleme. A dynamic warehouse for xml data of the web. IEEE Data Engineering Bulletin, 2001.]]
[28]
Xyleme. www.xyleme.com.]]

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
WWW '03: Proceedings of the 12th international conference on World Wide Web
May 2003
772 pages
ISBN:1581136803
DOI:10.1145/775152
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 May 2003

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. hyperlink
  2. page importance
  3. web graph

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)11
  • Downloads (Last 6 weeks)3
Reflects downloads up to 22 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2022)New ways of solving large Markov chainsQueueing Systems10.1007/s11134-022-09821-3100:3-4(217-219)Online publication date: 28-Apr-2022
  • (2022)Red Light Green Light Method for Solving Large Markov ChainsJournal of Scientific Computing10.1007/s10915-022-01976-893:1Online publication date: 1-Oct-2022
  • (2021)MCS+: An Efficient Algorithm for Crawling the Community Structure in Multiplex NetworksACM Transactions on Knowledge Discovery from Data10.1145/345152716:1(1-32)Online publication date: 20-Jul-2021
  • (2020)Resource Efficient Algorithms for Message Sampling in Online Social Networks2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS)10.1109/SNAMS52053.2020.9336530(1-8)Online publication date: 14-Dec-2020
  • (2019)Semantics Based Web Ranking Using a Robust Weight SchemeInternational Journal of Web Portals10.4018/IJWP.201901010411:1(56-72)Online publication date: Jan-2019
  • (2019)Collecting Influencers: A Comparative Study of Online Network Crawlers2019 Ivannikov Ispras Open Conference (ISPRAS)10.1109/ISPRAS47671.2019.00012(42-48)Online publication date: Dec-2019
  • (2018)TGNetProceedings of the 27th ACM International Conference on Information and Knowledge Management10.1145/3269206.3271698(97-106)Online publication date: 17-Oct-2018
  • (2018)Guidelines for Online Network CrawlingProceedings of the 10th ACM Conference on Web Science10.1145/3201064.3201066(57-66)Online publication date: 15-May-2018
  • (2018)Unsupervised Domain Ranking in Large-Scale Web CrawlsACM Transactions on the Web10.1145/318218012:4(1-29)Online publication date: 27-Sep-2018
  • (2018)Syncretic matchingProceedings of the ACM India Joint International Conference on Data Science and Management of Data10.1145/3152494.3152508(146-156)Online publication date: 11-Jan-2018
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media