DOI: 10.1145/2998476.2998480
research-article

Efficient Multi-depth Querying on Provenance of Relational Queries Using Graph Database

Published: 21 October 2016

Abstract

Data provenance is the history associated with a piece of data: its origin, creation, processing, and archiving. In today's Internet era, it has gained significant importance for database analytics. Most provenance models store provenance information in relational databases for further querying and analysis. Although querying provenance in relational databases is very efficient for small data sets, it becomes inefficient as the provenance data grows and the traversal depth of the provenance query increases. This is mainly due to the increase in the number of join operations needed to search the entire provenance data. Graph databases provide an alternative to RDBMSs for storing and analyzing provenance data, as they can scale to billions of nodes and at the same time traverse thousands of relationships efficiently. In this paper, we propose efficient multi-depth querying of provenance data using graph databases. The proposed solution allows efficient querying of the provenance of current as well as historical queries. A comparison between relational and graph databases is presented for varying provenance data sizes and traversal depths. Graph databases are found to scale well with increasing depth of provenance queries, whereas in relational databases the querying time increases exponentially.
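
The contrast the abstract draws between join-based and traversal-based lookups can be illustrated with a toy sketch. This is not the authors' implementation and does not reproduce their measurements: the edge list, item names, and both functions are hypothetical, and the "join" version deliberately assumes an unindexed edge table that is scanned once per level, which a real RDBMS with indexes would often avoid.

```python
from collections import defaultdict

# Hypothetical provenance edges: (derived_item, source_item).
EDGES = [
    ("report", "view_a"),
    ("report", "view_b"),
    ("view_a", "orders"),
    ("view_b", "orders"),
    ("view_b", "customers"),
    ("orders", "raw_feed"),
]


def sources_by_joins(edges, start, depth):
    """Relational-style lookup: one pass over the whole edge table per
    level, mimicking one self-join per unit of traversal depth."""
    frontier, found, rows_scanned = {start}, set(), 0
    for _ in range(depth):
        next_frontier = set()
        for derived, source in edges:  # assumed full scan, no index
            rows_scanned += 1
            if derived in frontier:
                next_frontier.add(source)
        found |= next_frontier
        frontier = next_frontier
    return found, rows_scanned


def build_adjacency(edges):
    """Group sources by derived item so neighbors can be read directly."""
    adjacency = defaultdict(list)
    for derived, source in edges:
        adjacency[derived].append(source)
    return adjacency


def sources_by_traversal(adjacency, start, depth):
    """Graph-style lookup: follow only edges leaving the current frontier."""
    frontier, found, edges_followed = {start}, set(), 0
    for _ in range(depth):
        next_frontier = set()
        for node in frontier:
            for source in adjacency.get(node, ()):
                edges_followed += 1
                next_frontier.add(source)
        found |= next_frontier
        frontier = next_frontier
    return found, edges_followed


if __name__ == "__main__":
    by_joins, scanned = sources_by_joins(EDGES, "report", 3)
    by_walk, followed = sources_by_traversal(build_adjacency(EDGES), "report", 3)
    print(sorted(by_joins), scanned)
    print(sorted(by_walk), followed)
```

Both functions return the same set of sources for a given depth; only the amount of work differs. In this sketch the join-style cost grows with table size times depth, while the traversal cost depends only on the edges actually reachable from the starting item.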

Information & Contributors

Information

Published In

COMPUTE '16: Proceedings of the 9th Annual ACM India Conference
October 2016
178 pages
ISBN:9781450348089
DOI:10.1145/2998476
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DPHQ
  2. Data Provenance
  3. Graph Database
  4. Neo4j
  5. Provenance Querying
  6. Query Inversion
  7. Relational Database
  8. TPC-H
  9. ZILD
  10. improved DPHQ

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ACM COMPUTE '16: Ninth Annual ACM India Conference
October 21 - 23, 2016
Gandhinagar, India

Acceptance Rates

COMPUTE '16 Paper Acceptance Rate 22 of 117 submissions, 19%;
Overall Acceptance Rate 114 of 622 submissions, 18%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 18
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 19 Dec 2024

Citations

Cited By

  • (2023) Finding a Second Wind: Speeding Up Graph Traversal Queries in RDBMSs Using Column-Oriented Processing. Model and Data Engineering, pages 186-199. DOI: 10.1007/978-3-031-49333-1_14. Online publication date: 22-Dec-2023
  • (2022) Social data provenance framework based on zero-information loss graph database. Social Network Analysis and Mining, 12:1. DOI: 10.1007/s13278-022-00889-6. Online publication date: 3-Jul-2022
  • (2021) Provenance Framework for Twitter Data using Zero-Information Loss Graph Database. Proceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data (8th ACM IKDD CODS & 26th COMAD), pages 74-82. DOI: 10.1145/3430984.3431014. Online publication date: 2-Jan-2021
  • (2021) Big social data provenance framework for Zero-Information Loss Key-Value Pair (KVP) Database. International Journal of Data Science and Analytics, 14:1, pages 65-87. DOI: 10.1007/s41060-021-00287-9. Online publication date: 9-Nov-2021
  • (2021) Twitter Data Modelling and Provenance Support for Key-Value Pair Databases. Databases Theory and Applications, pages 87-98. DOI: 10.1007/978-3-030-69377-0_8. Online publication date: 10-Feb-2021
  • (2020) Food Safety Network for Detecting Adulteration in Unsealed Food Products Using Topological Ordering. Intelligent Information and Database Systems, pages 451-463. DOI: 10.1007/978-3-030-42058-1_38. Online publication date: 4-Mar-2020
