Research article, KDD Conference Proceedings. DOI: 10.1145/2783258.2783339

Modeling Truth Existence in Truth Discovery

Published: 10 August 2015

Abstract

When integrating information from multiple sources, it is common to encounter conflicting answers to the same question. Truth discovery aims to infer the most accurate and complete integrated answers from conflicting sources. For some questions, the true answer is absent from the candidate answers provided by all sources. Without prior knowledge, these questions, named no-truth questions, are difficult to distinguish from questions that have true answers, named has-truth questions. In particular, no-truth questions degrade the precision of the answer integration system. We address this challenge by introducing source quality, which is made up of three fine-grained measures: silent rate, false spoken rate and true spoken rate. By incorporating these three measures, we propose a probabilistic graphical model that simultaneously infers truth and source quality without any a priori training involving ground truth answers. Moreover, since inference in this graphical model requires tuning the prior of truth, we propose an initialization scheme based on a quantity named truth existence score, which synthesizes two indicators, namely participation rate and consistency rate. Compared with existing methods, our method effectively filters out no-truth questions, which results in more accurate source quality estimation. Consequently, our method provides more accurate and complete answers to both has-truth and no-truth questions. Experiments on three real-world datasets illustrate the notable advantage of our method over existing state-of-the-art truth discovery methods.
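To make the three source quality measures concrete, the following is a minimal sketch under stated assumptions, not the paper's model. The abstract does not give formal definitions, so the sketch assumes that, over has-truth questions, the silent rate is the fraction a source leaves unanswered, the false spoken rate is the fraction it answers incorrectly, and the true spoken rate is the fraction it answers correctly. The paper infers these quantities without ground truth; here a small labelled toy dataset is used purely for illustration, and the function name `source_quality` and the data are hypothetical.

```python
from collections import Counter


def source_quality(claims, truths):
    """Estimate three per-source rates from labelled has-truth questions.

    claims: dict mapping question -> answer given by one source
            (a missing key or None means the source stayed silent).
    truths: dict mapping question -> known true answer
            (has-truth questions only; an assumption of this sketch).
    """
    if not truths:
        raise ValueError("need at least one has-truth question")

    counts = Counter()
    for question, truth in truths.items():
        answer = claims.get(question)
        if answer is None:
            counts["silent"] += 1        # source gave no answer
        elif answer == truth:
            counts["true_spoken"] += 1   # source gave the true answer
        else:
            counts["false_spoken"] += 1  # source gave a wrong answer

    total = len(truths)
    return {
        name: counts[name] / total
        for name in ("silent", "false_spoken", "true_spoken")
    }


# Hypothetical toy data: four has-truth questions and one source.
truths = {"q1": "Paris", "q2": "1969", "q3": "Au", "q4": "7"}
claims = {"q1": "Paris", "q2": "1968", "q4": "7"}  # silent on q3

rates = source_quality(claims, truths)
print(rates)
```

Under these assumed definitions the three rates sum to one for each source, which is why a source that rarely answers and a source that often answers wrongly can be told apart even when both have a low true spoken rate.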

    Published In

    KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2015
    2378 pages
    ISBN:9781450336642
    DOI:10.1145/2783258

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. information extraction
    2. information integration system
    3. knowledge base
    4. knowledge graph
    5. probabilistic graphical model
    6. source quality
    7. truth discovery
    8. truth existence
    9. truth finding

    Qualifiers

    • Research-article

    Conference

    KDD '15

    Acceptance Rates

    KDD '15 Paper Acceptance Rate 160 of 819 submissions, 20%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Article Metrics

    • Downloads (last 12 months): 14
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 26 Dec 2024

    Cited By

    • (2024) Hypergraph-based Truth Discovery for Sparse Data in Mobile Crowdsensing. ACM Transactions on Sensor Networks 20:3 (1-23). DOI: 10.1145/3649894. Online publication date: 28-Feb-2024
    • (2024) Claim polarity analysis from conflicting sources. International Journal of Data Science and Analytics. DOI: 10.1007/s41060-024-00634-6. Online publication date: 7-Oct-2024
    • (2024) Generalizing truth discovery by incorporating multi-truth features. Computing 106:5 (1557-1583). DOI: 10.1007/s00607-024-01288-9. Online publication date: 22-Apr-2024
    • (2023) PrivTDSI: A Local Differentially Private Approach for Truth Discovery via Sampling and Inference. IEEE Transactions on Big Data 9:2 (471-484). DOI: 10.1109/TBDATA.2022.3186175. Online publication date: 1-Apr-2023
    • (2022) Enabling Efficient and Strong Privacy-Preserving Truth Discovery in Mobile Crowdsensing. IEEE Transactions on Information Forensics and Security 17 (3569-3581). DOI: 10.1109/TIFS.2022.3207905. Online publication date: 2022
    • (2022) Towards an axiomatic approach to truth discovery. Autonomous Agents and Multi-Agent Systems 36:2. DOI: 10.1007/s10458-022-09569-3. Online publication date: 1-Oct-2022
    • (2022) Truth validation with evidence. Knowledge and Information Systems 64:5 (1187-1209). DOI: 10.1007/s10115-022-01663-y. Online publication date: 15-Mar-2022
    • (2022) Privacy-Preserving Truth Discovery with Task Hiding. In Privacy-Preserving in Mobile Crowdsensing (169-192). DOI: 10.1007/978-981-19-8315-3_7. Online publication date: 21-Dec-2022
    • (2022) Introduction. In Knowledge Discovery from Multi-Sourced Data (1-11). DOI: 10.1007/978-981-19-1879-7_1. Online publication date: 14-Jun-2022
    • (2021) Expertise-Aware Truth Analysis and Task Allocation in Mobile Crowdsourcing. IEEE Transactions on Mobile Computing 20:3 (1001-1016). DOI: 10.1109/TMC.2019.2955688. Online publication date: 1-Mar-2021
