Computational Fact Checking through Query Perturbations

Authors:

Pankaj K. Agarwal,

Cong Yu

ACM Transactions on Database Systems (TODS), Volume 42, Issue 1

Article No.: 4, Pages 1 - 41

https://doi.org/10.1145/2996453

Published: 09 January 2017

Abstract

Our media is saturated with claims of “facts” made from data. Database research has in the past focused on how to answer queries, but has not devoted much attention to discerning more subtle qualities of the resulting claims, for example, is a claim “cherry-picking”? This article proposes a framework that models claims based on structured data as parameterized queries. Intuitively, with its choice of the parameter setting, a claim presents a particular (and potentially biased) view of the underlying data. A key insight is that we can learn a lot about a claim by “perturbing” its parameters and seeing how its conclusion changes. For example, a claim is not robust if small perturbations to its parameters can change its conclusions significantly. This framework allows us to formulate practical fact-checking tasks—reverse-engineering vague claims, and countering questionable claims—as computational problems. Along with the modeling framework, we develop an algorithmic framework that enables efficient instantiations of “meta” algorithms by supplying appropriate algorithmic building blocks. We present real-world examples and experiments that demonstrate the power of our model, efficiency of our algorithms, and usefulness of their results.
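The perturbation idea in the abstract can be illustrated with a toy sketch. This is not the article's algorithm: the query (`window_change`), the helper names, and the yearly counts are all hypothetical, chosen only to show how a claim modeled as a parameterized query might be re-evaluated at nearby parameter settings to see whether its conclusion holds.

```python
from itertools import product


def window_change(series, end, width):
    """Hypothetical parameterized query: total of the `width` points ending at
    index `end`, minus the total of the `width` points immediately before."""
    cur = sum(series[end - width + 1 : end + 1])
    prev = sum(series[end - 2 * width + 1 : end - width + 1])
    return cur - prev


def perturb(series, end, width, max_shift=1):
    """Evaluate the same query at every parameter setting within `max_shift`
    of the original (the original setting itself is included)."""
    results = {}
    for d_end, d_width in product(range(-max_shift, max_shift + 1), repeat=2):
        e, w = end + d_end, width + d_width
        # Keep only settings where both windows fit inside the data.
        if w >= 1 and e < len(series) and e - 2 * w + 1 >= 0:
            results[(e, w)] = window_change(series, e, w)
    return results


def fraction_agreeing(series, end, width, max_shift=1):
    """Share of perturbed settings whose result has the same sign as the
    original claim; a low share suggests the claim is not robust."""
    original = window_change(series, end, width)
    results = perturb(series, end, width, max_shift)
    same = sum(1 for v in results.values() if (v > 0) == (original > 0))
    return same / len(results)


# Made-up yearly counts, for illustration only.
counts = [10, 12, 9, 11, 8, 14, 9, 10]
original = window_change(counts, end=5, width=2)    # the claim as stated
support = fraction_agreeing(counts, end=5, width=2)  # how often neighbors agree
print(original, support)
```

With these made-up numbers the stated setting shows an increase, yet only half of the neighboring settings agree, which is the kind of signal the abstract describes as a claim that is "not robust". A real system would need a principled choice of perturbation space and a measure of how far each perturbed setting is from the original; both are simplified away here.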

Index Terms

Computational Fact Checking through Query Perturbations
1. Information systems
  1. Information systems applications

Published In

ACM Transactions on Database Systems Volume 42, Issue 1

Invited Paper from ICDT 2014, Invited Paper from EDBT 2015, Regular Papers and Technical Correspondence

March 2017

263 pages

ISSN: 0362-5915

EISSN: 1557-4644

DOI: 10.1145/3015779

Editor:
Christian S. Jensen
Aalborg University, Denmark

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 January 2017

Accepted: 01 September 2016

Revised: 01 May 2016

Received: 01 June 2015

Published in TODS Volume 42, Issue 1

Qualifiers

Research-article
Research
Refereed
