DOI: 10.1145/2753524.2753532
research-article

Data Centric Discovery with a Data-Oriented Architecture

Published: 16 June 2015

Abstract

Increasingly, scientific discovery is driven by the analysis, manipulation, organization, annotation, sharing, and reuse of high-value scientific data. While great attention has been given to the specifics of analyzing and mining data, we find that there are almost no tools or systematic infrastructure to facilitate the process of discovery from data. We argue that a more systematic perspective is required and, in particular, propose a data-centric approach in which discovery stands on a foundation of data and data collections rather than on fleeting transformations and operations. To address the challenges of data-centric discovery, we introduce a Data-Oriented Architecture and contrast it with the prevalent Service-Oriented Architecture. We describe an instance of the Data-Oriented Architecture and show how it has been applied in a variety of use cases.
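
The contrast drawn in the abstract can be illustrated with a small sketch. It is not taken from the paper: the names `Asset`, `Collection`, `derive`, and `normalize`, and the metadata keys, are hypothetical and chosen only for illustration. The sketch assumes that "data-oriented" means the outputs of a transformation are kept as named, annotated members of a collection, whereas a service-style call simply returns a value that the caller may discard.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A named, persistent data item with metadata (hypothetical model)."""
    asset_id: str
    payload: list
    metadata: dict = field(default_factory=dict)
    annotations: list = field(default_factory=list)


class Collection:
    """A data collection: the durable unit that discovery is organized around."""

    def __init__(self, name):
        self.name = name
        self._assets = {}

    def add(self, asset):
        self._assets[asset.asset_id] = asset
        return asset

    def get(self, asset_id):
        return self._assets[asset_id]

    def annotate(self, asset_id, note):
        self._assets[asset_id].annotations.append(note)

    def find(self, **criteria):
        """Return assets whose metadata matches every key/value in criteria."""
        return [a for a in self._assets.values()
                if all(a.metadata.get(k) == v for k, v in criteria.items())]


def normalize(values):
    """A transformation in the service style: input in, output out, nothing kept."""
    peak = max(values)
    return [v / peak for v in values]


def derive(collection, source_id, new_id, transform):
    """Data-oriented style (as assumed here): the result of a transformation
    is stored as a new asset that records which asset it came from."""
    source = collection.get(source_id)
    derived = Asset(new_id, transform(source.payload),
                    metadata={"kind": "derived", "derived_from": source_id})
    return collection.add(derived)


scans = Collection("scans")
scans.add(Asset("raw-1", [2.0, 4.0, 8.0], metadata={"kind": "raw"}))
derive(scans, "raw-1", "norm-1", normalize)
scans.annotate("norm-1", "scaled to peak value")
print([a.asset_id for a in scans.find(derived_from="raw-1")])
```

In this sketch the transformation itself is unchanged; what differs is that its result can later be found, annotated, and traced back to its source through the collection.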



Published In

SCREAM '15: Proceedings of the 1st Workshop on The Science of Cyberinfrastructure: Research, Experience, Applications and Models
June 2015
82 pages
ISBN:9781450335669
DOI:10.1145/2753524
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data driven discovery
  2. data management
  3. data-centric discovery
  4. data-oriented architecture
  5. digital asset management

Qualifiers

  • Research-article

Funding Sources

  • NIH

Conference

HPDC'15
Acceptance Rates

SCREAM '15 Paper Acceptance Rate 8 of 12 submissions, 67%;
Overall Acceptance Rate 8 of 12 submissions, 67%

Article Metrics

  • Downloads (last 12 months): 12
  • Downloads (last 6 weeks): 1

Reflects downloads up to 02 Mar 2025

Cited By

  • (2024) Self-sustaining Software Systems (S4): Towards Improved Interpretability and Adaptation. Proceedings of the 1st International Workshop on New Trends in Software Architecture, 5-9. DOI: 10.1145/3643657.3643910. Online publication date: 14-Apr-2024
  • (2024) Deriva-ML: A Continuous FAIRness Approach to Reproducible Machine Learning Models. 2024 IEEE 20th International Conference on e-Science (e-Science), 1-10. DOI: 10.1109/e-Science62913.2024.10678671. Online publication date: 16-Sep-2024
  • (2023) Let's Put the Science in eScience. 2023 IEEE 19th International Conference on e-Science (e-Science), 1-3. DOI: 10.1109/e-Science58273.2023.10254914. Online publication date: 9-Oct-2023
  • (2020) Towards Co-Evolution of Data-Centric Ecosystems. Proceedings of the 32nd International Conference on Scientific and Statistical Database Management, 1-12. DOI: 10.1145/3400903.3400908. Online publication date: 7-Jul-2020
  • (2018) Dac-Man. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 1-13. DOI: 10.5555/3291656.3291753. Online publication date: 11-Nov-2018
  • (2018) Dac-Man. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 1-13. DOI: 10.1109/SC.2018.00075. Online publication date: 11-Nov-2018
  • (2017) Toward scalable monitoring on large-scale storage for software defined cyberinfrastructure. Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems, 49-54. DOI: 10.1145/3149393.3149402. Online publication date: 12-Nov-2017
  • (2017) Ripple: Home Automation for Research Data Management. 2017 IEEE 37th International Conference on Distributed Computing Systems Workshops (ICDCSW), 389-394. DOI: 10.1109/ICDCSW.2017.30. Online publication date: Jun-2017
  • (2017) Software Defined Cyberinfrastructure. 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), 1808-1814. DOI: 10.1109/ICDCS.2017.333. Online publication date: Jun-2017
  • (2016) The Discovery Cloud: Accelerating and Democratizing Research on a Global Scale. 2016 IEEE International Conference on Cloud Engineering (IC2E), 68-77. DOI: 10.1109/IC2E.2016.46. Online publication date: Apr-2016
