A Bayesian Method for Comparing Hypotheses About Human Trails

Published: 23 June 2017

Abstract

When users interact with the Web today, they leave sequential digital trails on a massive scale. Examples of such human trails include Web navigation, sequences of online restaurant reviews, and online music playlists. Understanding the factors that drive the production of these trails can be useful, for example, for improving underlying network structures, predicting user clicks, or enhancing recommendations. In this work, we present a method called HypTrails for comparing a set of hypotheses about human trails on the Web, where hypotheses represent beliefs about transitions between states. Our method uses Markov chain models with Bayesian inference. The main idea is to incorporate hypotheses as informative Dirichlet priors and to calculate the evidence of the data under them. For eliciting Dirichlet priors from hypotheses, we present an adaptation of the so-called (trial) roulette method, and to compare the relative plausibility of hypotheses, we employ Bayes factors. We demonstrate the general mechanics and applicability of HypTrails by performing experiments with (i) synthetic trails for which we control the mechanisms that have produced them and (ii) empirical trails stemming from different domains, including Web site navigation, business reviews, and online music listening. Our work expands the repertoire of methods available for studying human trails.
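
The core computation described in the abstract can be illustrated with a minimal sketch, assuming a first-order Markov chain over integer-coded states. The evidence is the standard closed-form Dirichlet-multinomial marginal likelihood, computed per row of the transition matrix. The function names, the toy trail, and the concentration parameter `k` are hypothetical, and `elicit_prior` is a simplified proportional allocation of pseudo-counts that stands in for the (trial) roulette adaptation, not a reproduction of it.

```python
import math


def transition_counts(trails, n_states):
    """Count observed transitions n[i][j] from state i to state j."""
    counts = [[0] * n_states for _ in range(n_states)]
    for trail in trails:
        for src, dst in zip(trail, trail[1:]):
            counts[src][dst] += 1
    return counts


def elicit_prior(belief, k):
    """Turn a hypothesis (belief matrix) into Dirichlet pseudo-counts.

    Assumption: each row gets a uniform base count of 1 plus k extra
    pseudo-counts spread in proportion to the row's beliefs. This is a
    simplified stand-in for the roulette-style elicitation.
    """
    alpha = []
    for row in belief:
        total = sum(row)
        if total > 0:
            alpha.append([1.0 + k * b / total for b in row])
        else:
            # No belief expressed for this state: fall back to a flat prior.
            alpha.append([1.0] * len(row))
    return alpha


def log_evidence(counts, alpha):
    """Log marginal likelihood of the counts under row-wise Dirichlet priors."""
    total = 0.0
    for n_row, a_row in zip(counts, alpha):
        # log of Gamma(sum alpha) / prod Gamma(alpha)
        total += math.lgamma(sum(a_row)) - sum(math.lgamma(a) for a in a_row)
        # log of prod Gamma(n + alpha) / Gamma(sum (n + alpha))
        total += sum(math.lgamma(n + a) for n, a in zip(n_row, a_row))
        total -= math.lgamma(sum(n_row) + sum(a_row))
    return total


# Toy data: one trail that cycles 0 -> 1 -> 2 -> 0.
trails = [[0, 1, 2, 0, 1, 2, 0]]
counts = transition_counts(trails, 3)

# Two competing hypotheses about transitions between the three states.
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
self_loop = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Log Bayes factor: positive values favor the first hypothesis.
log_bf = (log_evidence(counts, elicit_prior(cycle, 5))
          - log_evidence(counts, elicit_prior(self_loop, 5)))
print(f"log Bayes factor (cycle vs. self-loop): {log_bf:.3f}")
```

Working in log space with `math.lgamma` avoids overflow of the Gamma function for realistic count sizes. Because the Bayes factor depends on how strongly the prior is concentrated, a comparison like this would normally be repeated over several values of `k`.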


Cited By

  • (2020) Navigation leads for exploratory search and navigation in digital libraries. Knowledge and Information Systems. DOI: 10.1007/s10115-019-01434-2. Online publication date: 31-Jan-2020.
  • (2018) Do Spatial Abilities Have an Impact on Route Learning in Hypertexts? Spatial Cognition XI, 211-227. DOI: 10.1007/978-3-319-96385-3_15. Online publication date: 24-Jul-2018.

Published In

ACM Transactions on the Web, Volume 11, Issue 3
August 2017
209 pages
ISSN: 1559-1131
EISSN: 1559-114X
DOI: 10.1145/3113174
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 June 2017
Accepted: 01 January 2017
Revised: 01 November 2016
Received: 01 February 2016
Published in TWEB Volume 11, Issue 3

Author Tags

  1. Bayesian statistics
  2. Human trails
  3. Markov chain
  4. Web
  5. hypotheses
  6. paths
  7. sequences
  8. sequential human behavior

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • FWF Austrian Science Fund research project “Navigability of Decentralized Information Networks”
  • DFG German Science Fund research project “PoSTs II”
