research-article

Using a suite of ontologies for preserving workflow-centric research objects

Published: 01 May 2015

Abstract

Scientific workflows are a popular mechanism for specifying and automating data-driven in silico experiments. A significant aspect of their value lies in their potential to be reused. Once shared, workflows become useful building blocks that can be combined or modified for developing new experiments. However, previous studies have shown that storing workflow specifications alone is not sufficient to ensure that they can be successfully reused, without being able to understand what the workflows aim to achieve or to re-enact them. To gain an understanding of the workflow, and how it may be used and repurposed for their needs, scientists require access to additional resources such as annotations describing the workflow, datasets used and produced by the workflow, and provenance traces recording workflow executions.

In this article, we present a novel approach to the preservation of scientific workflows through the application of research objects, that is, aggregations of data and metadata that enrich the workflow specifications. Our approach is realised as a suite of ontologies that support the creation of workflow-centric research objects. Their design was guided by requirements elicited from previous empirical analyses of workflow decay and repair. The ontologies developed make use of and extend existing well-known ontologies, namely the Object Reuse and Exchange (ORE) vocabulary, the Annotation Ontology (AO) and the W3C PROV ontology (PROV-O). We illustrate the application of the ontologies for building Workflow Research Objects with a case study that investigates Huntington's disease, performed in collaboration with a team from the Leiden University Medical Centre (HG-LUMC). Finally, we present a number of tools developed for creating and managing workflow-centric research objects.
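As a rough illustration of what aggregating a workflow with its data and provenance might look like, the sketch below builds a handful of RDF triples using the ORE and PROV-O namespaces named in the abstract. It is not the authors' tooling or their ontology suite: the helper functions, the example.org URIs and the file names are all hypothetical, and annotations (AO) are left out for brevity.

```python
# Illustrative sketch only. The ORE and PROV-O namespace URIs are the published
# ones; every example.org resource and every helper name here is hypothetical.
ORE = "http://www.openarchives.org/ore/terms/"
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/ro/hd-study/"  # hypothetical research object URI


def build_research_object(base, resources, run, used, generated):
    """Return a set of (subject, predicate, object) triples for one aggregation."""
    triples = {(base, RDF_TYPE, ORE + "Aggregation")}
    # The research object aggregates the workflow and its data files.
    for name in resources:
        triples.add((base, ORE + "aggregates", base + name))
    # A workflow run is recorded as a PROV activity with inputs and outputs.
    run_uri = base + run
    triples.add((run_uri, RDF_TYPE, PROV + "Activity"))
    for name in used:
        triples.add((run_uri, PROV + "used", base + name))
    for name in generated:
        triples.add((base + name, PROV + "wasGeneratedBy", run_uri))
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines in sorted order."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))


ro_triples = build_research_object(
    BASE,
    resources=["workflow.t2flow", "input.csv", "output.csv"],
    run="run-1",
    used=["input.csv"],
    generated=["output.csv"],
)
print(to_ntriples(ro_triples))
```

Keeping the aggregation and the provenance statements in one graph is what lets a later reader ask both "what belongs to this experiment?" and "which run produced this file?" from the same data.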

Reviews

Lalit P Saxena

Scientific workflows are gaining popularity in the scientific community because of features such as reusability and ease of modification when shared. These workflows depend on research objects, which are based on ontologies such as the Object Reuse and Exchange vocabulary, the Annotation Ontology, and the W3C PROV ontology. This paper develops tools for creating and managing workflow-centric research objects (WRO). The authors incorporate four ontologies in their proposed models. They further develop a suite of interoperable tools, such as a research object manager for creating, annotating, publishing, and managing WRO, and a research object digital library for dealing with collaboration, versioning, evolution, and quality management of WRO. The authors add an extension to myExperiment, a popular virtual research environment, which allows end users to create, share, publish, and curate research objects.

As a case study, the authors present a workflow-based experiment investigating the epigenetic mechanisms involved in Huntington's disease (HD). They consider epigenetic datasets from CpG islands and chromatin marks for analyzing HD gene expression data from three different brain regions. They further present two analyses and three workflows for gene interpretation. This aids in creating WRO containing information such as the original hypothesis, example inputs, workflow definitions, metadata descriptions, and implementation details.

For their current project, the authors would like to collaborate with other communities, such as the EU Scape project or Timbus project, the EU BioVel project, GigaScience, FigShare, and Dataverse. They further describe the potential of their work in preserving scientific workflows, reusing existing workflows, and promoting and encouraging data citation and sharing, making this paper an interesting read.

Online Computing Reviews Service

Information & Contributors

Information

Published In

Web Semantics: Science, Services and Agents on the World Wide Web, Volume 32, Issue C
May 2015
54 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Annotation
  2. Ontologies
  3. Preservation
  4. Provenance
  5. Research object
  6. Scientific workflow

Qualifiers

  • Research-article

Cited By

  • (2023) Data Provenance in Security and Privacy. ACM Computing Surveys 55:14s (1-35). DOI: 10.1145/3593294. Online publication date: 22-Apr-2023
  • (2022) Methods included. Communications of the ACM 65:6 (54-63). DOI: 10.1145/3486897. Online publication date: 20-May-2022
  • (2022) FAIROs: Towards FAIR Assessment in Research Objects. Linking Theory and Practice of Digital Libraries (68-80). DOI: 10.1007/978-3-031-16802-4_6. Online publication date: 20-Sep-2022
  • (2021) User-friendly Composition of FAIR Workflows in a Notebook Environment. Proceedings of the 11th Knowledge Capture Conference (1-8). DOI: 10.1145/3460210.3493546. Online publication date: 2-Dec-2021
  • (2020) Findable and reusable workflow data products. Semantic Web 11:5 (751-763). DOI: 10.3233/SW-200374. Online publication date: 1-Jan-2020
  • (2020) Evidence Graphs: Supporting Transparent and FAIR Computation, with Defeasible Reasoning on Data, Methods, and Results. Provenance and Annotation of Data and Processes (39-50). DOI: 10.1007/978-3-030-80960-7_3. Online publication date: 22-Jun-2020
  • (2020) ProvONE+: A Provenance Model for Scientific Workflows. Web Information Systems Engineering – WISE 2020 (431-444). DOI: 10.1007/978-3-030-62008-0_30. Online publication date: 20-Oct-2020
  • (2019) Modeling digital humanities collections as research objects. Proceedings of the 18th Joint Conference on Digital Libraries (138-147). DOI: 10.1109/JCDL.2019.00029. Online publication date: 2-Jun-2019
  • (2019) BNO—An ontology for understanding the transittability of complex biomolecular networks. Web Semantics: Science, Services and Agents on the World Wide Web 57:C. DOI: 10.1016/j.websem.2019.01.002. Online publication date: 1-Aug-2019
  • (2019) Enabling FAIR research in Earth Science through research objects. Future Generation Computer Systems 98:C (550-564). DOI: 10.1016/j.future.2019.03.046. Online publication date: 1-Sep-2019
