
Distributions of surfers’ paths through the World Wide Web: Empirical characterizations

Published: 15 January 1999

Abstract

Surfing the World Wide Web (WWW) involves traversing hyperlink connections among documents. The ability to predict surfing patterns could solve many problems facing producers and consumers of WWW content. We analyzed WWW server logs for a WWW site, collected over ten days, to compare different path reconstruction methods and to investigate how past surfing behavior predicts future surfing choices. Since log files do not explicitly contain user paths, various methods have evolved to reconstruct user paths. Session times, number of clicks per visit, and Levenshtein Distance analyses were performed to show the impact of various reconstruction methods. Different methods for measuring surfing patterns were also compared. Markov model approximations were used to model the probability of users choosing links conditional on past surfing paths. Information-theoretic (entropy) measurements suggest that information is gained by using longer paths to estimate the conditional probability of link choice given surf path. The improvements diminish, however, as one increases the length of path beyond one. Information-theoretic (total divergence to the average entropy) measurements suggest that the conditional probabilities of link choice given surf path are more stable over time for shorter paths than longer paths. Direct examination of the accuracy of the conditional probability models in predicting test data also suggests that shorter paths yield more stable models and can be estimated reliably with less data than longer paths.
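The two kinds of measurement named in the abstract can be illustrated with a short sketch. It is not the authors' implementation: the function names `conditional_entropy` and `levenshtein`, the representation of a path as a list of page labels, and the toy paths in the usage note are all assumptions made for illustration. The first function estimates the entropy of the next page given the previous k pages from raw counts (a k-th order Markov approximation); the second computes the Levenshtein distance between two reconstructed paths.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(paths, k):
    """Estimate H(next page | previous k pages), in bits, from observed paths.

    `paths` is a list of page-label sequences (an assumed representation).
    """
    context_counts = defaultdict(Counter)
    for path in paths:
        for i in range(k, len(path)):
            context = tuple(path[i - k:i])  # the k pages visited before page i
            context_counts[context][path[i]] += 1

    total = sum(sum(counts.values()) for counts in context_counts.values())
    if total == 0:
        return 0.0  # no path is long enough to supply a context of length k

    entropy = 0.0
    for counts in context_counts.values():
        n_context = sum(counts.values())
        for n in counts.values():
            # p(context, next) * -log2 p(next | context), from relative counts
            entropy -= (n / total) * log2(n / n_context)
    return entropy


def levenshtein(a, b):
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute x with y if they differ
            ))
        previous = current
    return previous[-1]
```

On the hypothetical paths `[["A","B","C"], ["A","B","C"], ["E","B","D"], ["E","B","D"]]`, `conditional_entropy` returns 0.5 bits for k = 1 and 0 bits for k = 2, because the page visited before B fully determines the page visited after it. Real logs are far noisier, and longer contexts are observed less often, so estimates for larger k rest on less data; this is consistent with the abstract's point that shorter paths can be estimated reliably with less data.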



Published In

World Wide Web, Volume 2, Issue 1-2 (1999), 95 pages

Publisher

Kluwer Academic Publishers, United States
