research-article

Online data fusion

Published: 01 August 2011

Abstract

The Web contains a significant volume of structured data in various domains, but a lot of data are dirty and erroneous, and they can be propagated through copying. While data integration techniques allow querying structured data on the Web, they take the union of the answers retrieved from different sources and can thus return conflicting information. Data fusion techniques, on the other hand, aim to find the true values, but are designed for offline data aggregation and can take a long time.
This paper proposes Solaris, the first online data fusion system. It starts with returning answers from the first probed source, and refreshes the answers as it probes more sources and applies fusion techniques on the retrieved data. For each returned answer, it shows the likelihood that the answer is correct, and stops retrieving data for it after gaining enough confidence that data from the unprocessed sources are unlikely to change the answer. We address key problems in building such a system and show empirically that the system can start returning correct answers quickly and terminate fast without sacrificing the quality of the answers.
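The probe, refresh, and stop loop described above can be illustrated with a minimal sketch. This is not the Solaris algorithm: it assumes independent sources with known accuracies above 0.5, ignores copying between sources, and uses plain accuracy-weighted voting. The names `vote_weight`, `online_fusion`, and the parameter `n_false` (the assumed number of wrong values per data item) are hypothetical.

```python
import math


def vote_weight(accuracy, n_false=10):
    # Vote weight of one source (assumption: n_false equally likely wrong values).
    return math.log(n_false * accuracy / (1.0 - accuracy))


def online_fusion(sources, n_false=10):
    """Probe sources in order and refresh the answer after each probe.

    sources: list of (name, accuracy, value) tuples, already in probe order.
    Yields (name, best_value, probability, done) after every probe.
    """
    scores = {}  # value -> accumulated vote weight
    remaining = sum(vote_weight(acc, n_false) for _, acc, _ in sources)
    for name, acc, value in sources:
        w = vote_weight(acc, n_false)
        remaining -= w
        scores[value] = scores.get(value, 0.0) + w

        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        best, best_score = ranked[0]
        # An unseen value has score 0, so the runner-up is never below 0.
        runner_up = ranked[1][1] if len(ranked) > 1 else 0.0

        # Likelihood that the current best value is correct; every value
        # not yet observed contributes exp(0) = 1 to the normalizer.
        unseen = max(0, n_false + 1 - len(scores))
        total = sum(math.exp(s) for s in scores.values()) + unseen
        prob = math.exp(best_score) / total

        # Stop once the unprobed sources cannot overturn the leader,
        # even if all of them voted for the runner-up.
        done = best_score - runner_up > remaining
        yield name, best, prob, done
        if done:
            break


if __name__ == "__main__":
    demo = [("s1", 0.9, "A"), ("s2", 0.9, "A"), ("s3", 0.6, "B"), ("s4", 0.6, "B")]
    for name, best, prob, done in online_fusion(demo):
        print(f"after {name}: answer={best} likelihood={prob:.4f} done={done}")
```

In this toy run the two accurate sources agree, so the loop stops before probing the two weaker ones. The stopping test here is a worst-case bound; the abstract describes a probabilistic criterion ("unlikely to change the answer"), which would generally terminate earlier.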


Published In

Proceedings of the VLDB Endowment, Volume 4, Issue 11
August 2011
520 pages

Publisher

VLDB Endowment
