DOI:10.1145/2744680.2744691
short-paper

Provenance-Driven Data Curation Workflow Analysis

Published: 31 May 2015

Abstract

Manually designed workflows can be error-prone and inefficient. Workflow provenance contains fine-grained data processing information that can be used to detect workflow design problems. In this paper, we propose a provenance-driven workflow analysis framework that exploits both prospective and retrospective provenance. We show how provenance information can help the user gain a deeper understanding of a workflow and provide the user with insights into how to improve workflow design.
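
To make the distinction between the two kinds of provenance concrete, here is a minimal sketch, not the framework proposed in the paper. It assumes prospective provenance can be represented as a set of declared connections between workflow steps, and retrospective provenance as a trace of which data items actually flowed between steps in one run. The step names, the `Edge` type, and the functions `edge_usage` and `unused_edges` are all hypothetical. Comparing the two views surfaces declared connections that never carried data, which may point to a design problem worth inspecting.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A declared dataflow connection between two workflow steps."""
    src: str
    dst: str


# Prospective provenance (assumed representation): the workflow as designed.
declared = {
    Edge("read_records", "validate_names"),
    Edge("validate_names", "validate_dates"),
    Edge("validate_dates", "write_records"),
    Edge("read_records", "write_records"),  # a shortcut the designer added
}

# Retrospective provenance (assumed representation): one run's trace,
# recorded as (producing step, data item id, consuming step).
trace = [
    ("read_records", "r1", "validate_names"),
    ("read_records", "r2", "validate_names"),
    ("validate_names", "r1", "validate_dates"),
    ("validate_names", "r2", "validate_dates"),
    ("validate_dates", "r1", "write_records"),
    ("validate_dates", "r2", "write_records"),
]


def edge_usage(declared, trace):
    """Count how many data items travelled along each declared edge."""
    counts = Counter(Edge(src, dst) for src, _item, dst in trace)
    return {edge: counts.get(edge, 0) for edge in declared}


def unused_edges(declared, trace):
    """Return declared edges that carried no data in the recorded run."""
    return {edge for edge, n in edge_usage(declared, trace).items() if n == 0}


if __name__ == "__main__":
    for edge in sorted(unused_edges(declared, trace), key=lambda e: (e.src, e.dst)):
        print(f"declared but never used: {edge.src} -> {edge.dst}")
```

An unused edge is only a hint: a single run may simply not exercise every path, so a real analysis would likely aggregate over many runs before suggesting a design change.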

Published In

SIGMOD '15 PhD Symposium: Proceedings of the 2015 ACM SIGMOD on PhD Symposium
May 2015
62 pages
ISBN:9781450335294
DOI:10.1145/2744680
Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. data curation
  2. workflow analysis
  3. workflow design

Qualifiers

  • Short-paper

Conference

SIGMOD/PODS'15: International Conference on Management of Data
May 31, 2015
Melbourne, Victoria, Australia

Acceptance Rates

SIGMOD '15 PhD Symposium Paper Acceptance Rate 9 of 11 submissions, 82%;
Overall Acceptance Rate 40 of 60 submissions, 67%

