DOI: 10.1145/1779599.1779603
research-article

Extracting user profiles from large scale data

Published: 26 April 2010

Abstract

In this work we present the details of a large-scale user profiling framework that we developed at IBM on top of Apache Hadoop. We address the problem of extracting and maintaining a very large number of user profiles from large-scale data. We first describe an efficient user profiling framework that offers strong guarantees on the quality of the resulting profiles. We then describe a scalable implementation of the proposed framework in Apache Hadoop and discuss the challenges it raises.
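The abstract does not say how individual profiles are computed, so the following is only an illustrative sketch, not the framework described in this paper. It simulates a MapReduce-style job in a single Python process: a map step emits ((user, term), 1) pairs from hypothetical (user_id, text) log records, a shuffle-and-reduce step sums them, and a final step keeps each user's top-k terms. The term score p_u(t) * log(p_u(t) / p_bg(t)), a term's contribution to the Kullback-Leibler divergence between the user's term distribution and the background distribution over all users, is an assumed weighting chosen for illustration. The record format, the function names `map_phase`, `shuffle_and_reduce` and `build_profiles`, and the parameter `k` are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input: (user_id, text) records, e.g. lines from a query log.
Record = Tuple[str, str]
Key = Tuple[str, str]  # (user_id, term)


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[Key, int]]:
    """Mapper stand-in: emit ((user, term), 1) for every whitespace-separated term."""
    for user, text in records:
        for term in text.lower().split():
            yield (user, term), 1


def shuffle_and_reduce(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Shuffle + reducer stand-in: group pairs by key and sum their counts."""
    counts: Dict[Key, int] = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def build_profiles(counts: Dict[Key, int], k: int = 3) -> Dict[str, List[Tuple[str, float]]]:
    """Keep each user's top-k terms, scored by the (assumed) KL-divergence contribution
    p_u(t) * log(p_u(t) / p_bg(t)) against the background over all users."""
    background: Counter = Counter()
    per_user: Dict[str, Counter] = defaultdict(Counter)
    for (user, term), c in counts.items():
        background[term] += c
        per_user[user][term] += c
    total_bg = sum(background.values())

    profiles: Dict[str, List[Tuple[str, float]]] = {}
    for user, tf in per_user.items():
        total_u = sum(tf.values())
        scored = []
        for term, c in tf.items():
            p_u = c / total_u
            p_bg = background[term] / total_bg  # > 0: the user's own counts are included
            scored.append((term, p_u * log(p_u / p_bg)))
        # Highest score first; ties broken alphabetically for deterministic output.
        scored.sort(key=lambda item: (-item[1], item[0]))
        profiles[user] = scored[:k]
    return profiles


if __name__ == "__main__":
    logs = [
        ("alice", "hadoop mapreduce hadoop"),
        ("alice", "hdfs cluster"),
        ("bob", "netflix movies"),
        ("bob", "movies recommendations cluster"),
    ]
    profiles = build_profiles(shuffle_and_reduce(map_phase(logs)), k=3)
    for user, terms in sorted(profiles.items()):
        print(user, [term for term, _ in terms])
```

Running the script prints `alice ['hadoop', 'hdfs', 'mapreduce']` and `bob ['movies', 'netflix', 'recommendations']`; the shared term `cluster` scores 0 for both users because each uses it exactly as often, relatively, as the background does. On a real Hadoop cluster the map and reduce functions would run as separate distributed tasks and the background distribution would need its own aggregation pass; keeping everything in one process here only makes the data flow easy to follow.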

Information

Published In

ACM Other conferences
MDAC '10: Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud
April 2010
53 pages
ISBN: 9781605589916
DOI: 10.1145/1779599
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. Hadoop
2. large scale
3. user profile

Conference

MDAC '10

Bibliometrics

Article Metrics

• Downloads (last 12 months): 3
• Downloads (last 6 weeks): 0
Reflects downloads up to 22 Dec 2024

Cited By

• (2021) Microbloggers’ interest inference using a subgraph stream. Intelligent Data Analysis, 25:2, pp. 397-417. DOI: 10.3233/IDA-195042. Online publication date: 4-Mar-2021.
• (2021) Reliably Calibrated Isotonic Regression. Advances in Knowledge Discovery and Data Mining, pp. 578-589. DOI: 10.1007/978-3-030-75762-5_46. Online publication date: 9-May-2021.
• (2018) Exploring the Universe of Egregious Conversations in Chatbots. Companion Proceedings of the 23rd International Conference on Intelligent User Interfaces, pp. 1-2. DOI: 10.1145/3180308.3180324. Online publication date: 5-Mar-2018.
• (2017) Business Graphing for Internet-Enabled Enterprises. 2017 IEEE International Congress on Big Data (BigData Congress), pp. 549-556. DOI: 10.1109/BigDataCongress.2017.84. Online publication date: Jun-2017.
• (2016) Characterizing Users in an Online Classified Ad Network. Proceedings of the 6th International Conference on Web Intelligence, Mining and Semantics, pp. 1-9. DOI: 10.1145/2912845.2912849. Online publication date: 13-Jun-2016.
• (2015) What's the big deal about big data? Big Data and Information Analytics, 1:1, pp. 31-79. DOI: 10.3934/bdia.2016.1.31. Online publication date: Sep-2015.
• (2015) Scalable Learning of k-dependence Bayesian Classifiers under MapReduce. Proceedings of the 2015 IEEE Trustcom/BigDataSE/ISPA, Volume 02, pp. 25-32. DOI: 10.1109/Trustcom.2015.558. Online publication date: 20-Aug-2015.
• (2014) An author-reader influence model for detecting topic-based influencers in social media. Proceedings of the 25th ACM conference on Hypertext and social media, pp. 46-55. DOI: 10.1145/2631775.2631804. Online publication date: 1-Sep-2014.
• (2013) A statistical approach to mining customers' conversational data from social media. IBM Journal of Research and Development, 57:3-4, p. 14. DOI: 10.1147/JRD.2013.2251833. Online publication date: 1-May-2013.
• (2013) Modeling the uniqueness of the user preferences for recommendation systems. Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, pp. 777-780. DOI: 10.1145/2484028.2484102. Online publication date: 28-Jul-2013.