DOI: 10.1145/2463676.2463680
demonstration

GeoDeepDive: statistical inference using familiar data-processing languages

Published: 22 June 2013

Abstract

We describe our proposed demonstration of GeoDeepDive, a system that helps geoscientists discover information and knowledge buried in the text, tables, and figures of geology journal articles. This requires solving a host of classical data management challenges including data acquisition (e.g., from scanned documents), data extraction, and data integration. SIGMOD attendees will see demonstrations of three aspects of our system: (1) an end-to-end system that is of a high enough quality to perform novel geological science, but is written by a small enough team so that each aspect can be manageably explained; (2) a simple feature engineering system that allows a user to write in familiar SQL or Python; and (3) the effect of different sources of feedback on result quality including expert labeling, distant supervision, traditional rules, and crowd-sourced data.
Our prototype builds on our work integrating statistical inference and learning tools into traditional database systems. If successful, our demonstration will allow attendees to see that data processing systems that use machine learning contain many familiar data processing problems such as efficient querying, indexing, and supporting tools for database-backed websites, none of which are machine-learning problems, per se.
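To make the second and third demonstration points concrete, here is a minimal Python sketch of what a user-written feature function and a distant-supervision rule could look like. It is illustrative only and is not GeoDeepDive's actual interface: the `Candidate` type, the function names `words_between` and `distant_label`, and the example sentence and knowledge-base pair are all hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    """A candidate relation mention (hypothetical structure)."""
    tokens: list[str]  # tokenized sentence
    e1: int            # token index of the first entity mention
    e2: int            # token index of the second entity mention


def words_between(c: Candidate) -> list[str]:
    """Feature function: one feature per word strictly between the two mentions."""
    lo, hi = sorted((c.e1, c.e2))
    return [f"BETWEEN_{w.lower()}" for w in c.tokens[lo + 1:hi]]


def distant_label(c: Candidate, known_pairs: set[tuple[str, str]]) -> Optional[bool]:
    """Distant supervision: mark a candidate positive if its entity pair
    already appears in an existing knowledge base; otherwise leave it
    unlabeled (None) so that inference decides."""
    pair = (c.tokens[c.e1], c.tokens[c.e2])
    return True if pair in known_pairs else None


# Made-up example sentence and knowledge-base entry, for illustration only.
cand = Candidate("Barnett consists of shale".split(), e1=0, e2=3)
features = words_between(cand)
label = distant_label(cand, {("Barnett", "shale")})
```

In this sketch the knowledge base supplies labels automatically, which is the idea behind distant supervision; expert labels, rules, or crowd-sourced answers could feed the same label slot.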

    Published In

    SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
    June 2013
    1322 pages
    ISBN: 9781450320375
    DOI: 10.1145/2463676
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. demonstration
    2. geoscience
    3. statistical inference

    Qualifiers

    • Demonstration

    Conference

    SIGMOD/PODS'13

    Acceptance Rates

    SIGMOD '13 Paper Acceptance Rate 76 of 372 submissions, 20%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%
