
Identifying clusters of user behavior in intranet search engine log files

Published: 01 December 2008

Abstract

When studying how ordinary Web users interact with Web search engines, researchers tend to either treat the users as a homogeneous group or group them according to search experience. Neither approach is sufficient, we argue, to capture the variety in behavior that is known to exist among searchers. By applying an automatic clustering technique based on self-organizing maps to search engine log files from a corporate intranet, we show that users can be usefully separated into distinguishable segments based on their actual search behavior. Based on these segments, future tools for information seeking and retrieval can be targeted to specific segments rather than just made to fit “the average user.” The exact number of clusters, and to some extent their characteristics, can be expected to vary between intranets, but our results indicate that some more generic groups may exist. In our study, a large group of users appeared to be “fact seekers” who would benefit from higher precision, a smaller group of users were more holistically oriented and would likely benefit from higher recall, and a third category of users seemed to constitute the knowledgeable users. These three groups may raise different design implications for search-tool developers. © 2008 Wiley Periodicals, Inc.
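As a rough illustration of the kind of technique the abstract names, the sketch below trains a small self-organizing map on per-user search features with plain NumPy. It is not the authors' implementation: the three feature columns (queries per session, terms per query, result pages viewed), the synthetic user groups, the 4x4 map size, and the learning-rate and neighborhood schedules are all assumptions chosen only to make the example self-contained.

```python
import numpy as np


def train_som(data, rows=4, cols=4, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    # Initialise each map unit from a randomly chosen sample.
    weights = data[rng.choice(n, rows * cols, replace=True)].astype(float)
    # Grid coordinates of every unit, used by the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                     # learning rate decays to 0
            sigma = sigma0 * (1.0 - frac) + 0.5 * frac  # radius shrinks to 0.5
            x = data[i]
            # Best-matching unit: the unit whose weight vector is closest to x.
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            # Gaussian neighborhood around the BMU on the map grid.
            grid_dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull the BMU and its grid neighbors toward the sample.
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def best_matching_units(data, weights):
    """Index of the closest map unit for every row of data."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


# Synthetic stand-in for per-user features derived from a search log.
# Columns (hypothetical): queries per session, terms per query, result pages viewed.
rng = np.random.default_rng(42)
raw = np.vstack([
    rng.normal([2.0, 1.5, 1.0], 0.3, size=(80, 3)),
    rng.normal([6.0, 3.0, 4.0], 0.5, size=(40, 3)),
    rng.normal([3.0, 4.0, 2.0], 0.4, size=(30, 3)),
])
# Standardise so that no single feature dominates the distance computation.
features = (raw - raw.mean(axis=0)) / raw.std(axis=0)

weights = train_som(features)
bmus = best_matching_units(features, weights)

# Quantization error: mean squared distance from each user to its map unit,
# compared against the error of representing everyone by the global mean.
qe = ((features - weights[bmus]) ** 2).sum(axis=1).mean()
baseline = ((features - features.mean(axis=0)) ** 2).sum(axis=1).mean()
```

Each user ends up assigned to one map unit in `bmus`. Grouping neighboring units into a handful of user segments would be a further step that this sketch does not attempt.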




Published In

Journal of the American Society for Information Science and Technology, Volume 59, Issue 14
December 2008
158 pages

Publisher

John Wiley & Sons, Inc.

United States


Author Tags

  1. end users
  2. individual differences
  3. search behavior
  4. user attributes
  5. user models

Qualifiers

  • Article

