DOI: 10.1145/3342220.3343648
research-article

Role of the Website Structure in the Diversity of Browsing Behaviors

Published: 12 September 2019

Abstract

The quantitative measurement of the diversity of information consumption has emerged as a prominent tool in the examination of relevant phenomena such as filter bubbles. This paper proposes an analysis of the diversity of users' navigation inside a website through the analysis of server log files. The methodology, guided and illustrated by a case study but easily applicable to other cases, establishes relations between types of user behavior, site structure, and diversity of web browsing. Using the navigation paths of sessions reconstructed from the log file, the proposed methodology offers three main insights: 1) it reveals diversification patterns associated with the page network structure, 2) it relates human browsing characteristics (such as multi-tabbing or click frequency) to the degree of diversity, and 3) it helps identify diversification patterns specific to subsets of users. These results are in turn useful in the analysis of recommender systems and in the design of websites when there are diversity-related goals or constraints.
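The abstract does not state how sessions are reconstructed or which diversity measure is used, so the following is only a minimal sketch of the general idea under stated assumptions: hits are hypothetical `(user_id, timestamp_in_seconds, page_category)` tuples, a new session starts after 30 minutes of inactivity, and diversity is taken to be the Shannon entropy of the page categories visited in a session. The function names, the timeout, and the choice of entropy are illustrative and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

# Assumed inactivity threshold in seconds; an illustrative choice, not the paper's.
SESSION_TIMEOUT = 30 * 60


def reconstruct_sessions(hits, timeout=SESSION_TIMEOUT):
    """Group (user_id, timestamp, page_category) hits into per-user sessions.

    A new session starts whenever the gap between two consecutive hits
    of the same user exceeds `timeout` seconds.
    """
    by_user = defaultdict(list)
    for user_id, timestamp, category in hits:
        by_user[user_id].append((timestamp, category))

    sessions = []
    for user_id, events in by_user.items():
        events.sort()  # chronological order within each user
        current = [events[0][1]]
        for (prev_ts, _), (ts, category) in zip(events, events[1:]):
            if ts - prev_ts > timeout:
                sessions.append((user_id, current))
                current = []
            current.append(category)
        sessions.append((user_id, current))
    return sessions


def shannon_diversity(categories):
    """Shannon entropy (in bits) of the category distribution of one session."""
    counts = Counter(categories)
    total = sum(counts.values())
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical log extract: two users, three sessions after splitting.
example_hits = [
    ("u1", 0, "news"),
    ("u1", 60, "sports"),
    ("u1", 4000, "news"),
    ("u2", 10, "news"),
    ("u2", 20, "news"),
]

for user_id, path in reconstruct_sessions(example_hits):
    print(user_id, path, shannon_diversity(path))
```

A session that stays within a single category scores zero, while one split evenly across two categories scores one bit; other measures (for example, ones that also account for how different the categories are) could be swapped in without changing the session-reconstruction step.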

    Published In

    HT '19: Proceedings of the 30th ACM Conference on Hypertext and Social Media
    September 2019
    326 pages
    ISBN: 9781450368858
    DOI: 10.1145/3342220
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. diversity
    2. filter bubbles
    3. log analysis
    4. web-browsing patterns

    Qualifiers

    • Research-article

    Conference

    HT '19
    Acceptance Rates

    HT '19 Paper Acceptance Rate 20 of 68 submissions, 29%;
    Overall Acceptance Rate 378 of 1,158 submissions, 33%

    Article Metrics

    • Downloads (last 12 months): 29
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 11 Dec 2024

    Cited By

    • (2024) American politics in 3D: measuring multidimensional issue alignment in social media using social graphs and text data. Applied Network Science 9:1. DOI: 10.1007/s41109-023-00608-w. Online publication date: 10-Jan-2024
    • (2023) Malicious website identification using design attribute learning. International Journal of Information Security 22:5 (1207-1217). DOI: 10.1007/s10207-023-00686-y. Online publication date: 24-Mar-2023
    • (2023) Multidimensional Online American Politics: Mining Emergent Social Cleavages in Social Graphs. Complex Networks and Their Applications XI (176-189). DOI: 10.1007/978-3-031-21127-0_15. Online publication date: 4-Jan-2023
    • (2021) Reinforcement Learning Page Prediction for Hierarchically Ordered Municipal Websites. Information 12:6 (231). DOI: 10.3390/info12060231. Online publication date: 28-May-2021
    • (2021) Measuring Diversity in Heterogeneous Information Networks. Theoretical Computer Science. DOI: 10.1016/j.tcs.2021.01.013. Online publication date: Jan-2021
    • (2021) A query expansion method based on topic modeling and DBpedia features. International Journal of Information Management Data Insights 1:2 (100043). DOI: 10.1016/j.jjimei.2021.100043. Online publication date: Nov-2021
