Research article
DOI: 10.1145/1718487.1718524

Learning similarity metrics for event identification in social media

Published: 04 February 2010

Abstract

Social media sites (e.g., Flickr, YouTube, and Facebook) are a popular distribution outlet for users looking to share their experiences and interests on the Web. These sites host substantial amounts of user-contributed material (e.g., photographs, videos, and textual content) for a wide variety of real-world events of different types and scales. Automatically identifying these events and their associated user-contributed social media documents is the focus of this paper; doing so can enable event browsing and search in state-of-the-art search engines. To address this problem, we exploit the rich "context" associated with social media content, including user-provided annotations (e.g., title, tags) and automatically generated information (e.g., content creation time). Using this context, which includes both textual and non-textual features, we can define appropriate document similarity metrics to enable online clustering of media to events. As a key contribution of this paper, we explore a variety of techniques for learning multi-feature similarity metrics for social media documents in a principled manner. We evaluate our techniques on large-scale, real-world datasets of event images from Flickr. Our evaluation results suggest that our approach identifies events, and their associated social media documents, more effectively than the state-of-the-art strategies on which we build.
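The abstract describes two steps: combining textual and non-textual context into a learned similarity metric, and using that metric to cluster documents online. The Python sketch below illustrates both steps in miniature; it is not the authors' implementation, and every name in it is hypothetical. It assumes each document carries only a tag list and a creation time in hours (the `Doc` record), scores text with term-count cosine similarity and time with a linear decay over an assumed 24-hour window, learns the combination by fitting logistic regression on labeled document pairs (one possible learning technique, chosen here for brevity), and clusters with a single-pass, threshold-based scheme that compares each incoming document against cluster summaries. Training on document pairs and then scoring document-to-cluster comparisons is a simplification made for this sketch.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical document record: user tags plus creation time in hours."""
    tags: list[str]
    hour: float


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def time_sim(h1: float, h2: float, window: float = 24.0) -> float:
    """Linear time proximity in [0, 1]; 0 once items are `window` hours apart (assumed)."""
    return max(0.0, 1.0 - abs(h1 - h2) / window)


def features(tags: Counter, hour: float, other_tags: Counter, other_hour: float) -> list[float]:
    """Per-feature similarity vector: [text similarity, time similarity]."""
    return [cosine(tags, other_tags), time_sim(hour, other_hour)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: list[float], b: float, x: list[float]) -> float:
    """Combined similarity: estimated probability that x describes a same-event pair."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def make_pairs(docs: list[Doc], labels: list[str]) -> tuple[list[list[float]], list[int]]:
    """Training pairs: similarity vector, label 1 if both docs share an event label."""
    xs, ys = [], []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            d1, d2 = docs[i], docs[j]
            xs.append(features(Counter(d1.tags), d1.hour, Counter(d2.tags), d2.hour))
            ys.append(1 if labels[i] == labels[j] else 0)
    return xs, ys


def train_logistic(xs: list[list[float]], ys: list[int], lr: float = 0.5, epochs: int = 500):
    """Assumed learner: logistic regression fit by stochastic gradient descent on log-loss."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = predict(w, b, x) - y  # d(log-loss)/d(logit)
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def single_pass_cluster(docs: list[Doc], w: list[float], b: float,
                        threshold: float = 0.5) -> list[list[int]]:
    """Online clustering: each doc joins its best-scoring cluster above `threshold`,
    otherwise it opens a new one. A cluster is summarized by summed tags and mean hour."""
    clusters = []
    for i, d in enumerate(docs):
        tags = Counter(d.tags)
        best, best_score = None, threshold
        for c in clusters:
            mean_hour = sum(c["hours"]) / len(c["hours"])
            score = predict(w, b, features(tags, d.hour, c["tags"], mean_hour))
            if score > best_score:
                best, best_score = c, score
        if best is None:
            clusters.append({"tags": Counter(tags), "hours": [d.hour], "members": [i]})
        else:
            best["tags"].update(tags)  # Counter.update adds counts
            best["hours"].append(d.hour)
            best["members"].append(i)
    return [c["members"] for c in clusters]


if __name__ == "__main__":
    train = [Doc(["concert", "band", "live"], 20.0), Doc(["concert", "band"], 21.0),
             Doc(["marathon", "run"], 9.0), Doc(["marathon", "runners", "run"], 10.0)]
    w, b = train_logistic(*make_pairs(train, ["A", "A", "B", "B"]))
    stream = [Doc(["concert", "band", "live"], 20.0), Doc(["marathon", "run"], 9.0),
              Doc(["concert", "band", "live"], 21.0), Doc(["marathon", "run"], 10.0)]
    print(single_pass_cluster(stream, w, b))  # [[0, 2], [1, 3]]
```

Fitting the weights on labeled pairs is what makes the combined metric "learned" in this sketch. Extending it to another context feature, such as title similarity, would only change `features` and the length of the weight vector; the 0.5 threshold and the 24-hour window are illustrative values that would likely need tuning on held-out labeled events.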

    Published In

    WSDM '10: Proceedings of the third ACM international conference on Web search and data mining
    February 2010
    468 pages
    ISBN: 9781605588896
    DOI: 10.1145/1718487
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. event identification
    2. similarity metric learning
    3. social media

    Acceptance Rates

    Overall Acceptance Rate 498 of 2,863 submissions, 17%

    Article Metrics

    • Downloads (last 12 months): 75
    • Downloads (last 6 weeks): 6
    Reflects downloads up to 21 Dec 2024
