research-article

TQEL: framework for query-driven linking of top-k entities in social media blogs

Published: 01 July 2021

Abstract

Social media analysis over blogs (such as tweets) often requires determining the top-k mentions of a certain category (e.g., movies) in a collection (e.g., tweets collected over a given day). Such queries require executing an entity linking (EL) function, which is often expensive. We propose TQEL, a framework that minimizes the joint cost of EL calls and top-k query processing. The paper presents two variants, TQEL-exact and TQEL-approximate, which retrieve the exact and approximate top-k results, respectively. TQEL-approximate uses a weaker stopping condition and achieves significantly better performance at a fraction of the cost of TQEL-exact (over 2 orders of magnitude fewer EL calls at a 95% confidence threshold) while still providing strong probabilistic guarantees. TQEL-exact is itself orders of magnitude better than a naive approach that calls EL functions on the entire dataset.
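
To make the cost trade-off concrete, here is a minimal sketch of the general idea of stopping early once the top-k set can no longer change. It is not the TQEL algorithm, whose details the abstract does not give. It assumes a hypothetical cheap step that supplies, for each tweet, a set of candidate entities guaranteed to contain the true one (if any), and a hypothetical expensive `link` function. The names `naive_top_k`, `bounded_top_k`, and `candidates` are illustrative only.

```python
from collections import Counter
from typing import Callable, Hashable, List, Optional, Sequence, Set, Tuple

Entity = str
Linker = Callable[[Hashable], Optional[Entity]]  # the expensive EL call (assumed)


def naive_top_k(tweets: Sequence[Hashable], link: Linker,
                k: int) -> Tuple[List[Entity], int]:
    """Baseline: call the linker on every tweet, then rank by mention count."""
    counts: Counter = Counter()
    for tweet in tweets:
        entity = link(tweet)
        if entity is not None:
            counts[entity] += 1
    ranked = sorted(counts, key=lambda e: (-counts[e], e))
    return ranked[:k], len(tweets)


def bounded_top_k(tweets: Sequence[Hashable],
                  candidates: Sequence[Set[Entity]],
                  link: Linker, k: int) -> Tuple[List[Entity], int]:
    """Stop linking once count bounds prove the top-k set is fixed.

    Assumption: candidates[i] contains the true entity of tweets[i], if any.
    lower[e] counts confirmed mentions of e; upper[e] adds the unresolved
    tweets in which e is still a candidate.
    """
    lower: Counter = Counter()
    upper: Counter = Counter()
    for cands in candidates:
        for entity in cands:
            upper[entity] += 1

    def ranked_by_lower() -> List[Entity]:
        return sorted(upper, key=lambda e: (-lower[e], e))

    def settled() -> bool:
        ranked = ranked_by_lower()
        top, rest = ranked[:k], ranked[k:]
        if len(top) < k:
            return False
        threshold = min(lower[e] for e in top)
        # Every outsider must be strictly unable to reach the k-th confirmed count.
        return threshold > 0 and all(upper[e] < threshold for e in rest)

    # Heuristic order: resolve tweets whose candidates look most popular first.
    order = sorted(range(len(tweets)),
                   key=lambda i: -max((upper[e] for e in candidates[i]), default=0))
    calls = 0
    for i in order:
        if settled():
            break
        true_entity = link(tweets[i])
        calls += 1
        for entity in candidates[i]:
            if entity != true_entity:
                upper[entity] -= 1  # this tweet can no longer count for entity
        if true_entity is not None:
            lower[true_entity] += 1

    return [e for e in ranked_by_lower()[:k] if lower[e] > 0], calls
```

The strict inequality in the stopping test keeps the returned set exact even when counts could tie. An approximate variant in the spirit of the abstract would relax that test to hold only with some probability, but how TQEL-approximate does so is not specified here. The order of the returned entities reflects confirmed counts at stopping time, not final counts.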

Published In

Proceedings of the VLDB Endowment, Volume 14, Issue 11
July 2021
732 pages
ISSN:2150-8097
Publisher

VLDB Endowment
