DOI: 10.1145/2485732.2485754

Examining extended and scientific metadata for scalable index designs

Published: 30 June 2013

Abstract

While file system metadata is well characterized by a variety of workload studies, scientific metadata is much less well understood. We characterize scientific metadata, in order to better understand the implications for index design. Based on our findings, existing solutions for either file system or scientific search will not suffice for indexing a large scientific file system. We describe the problems with existing solutions, and suggest column stores as an alternative approach.

Cited By

  • (2021) Like a rainbow in the dark: metadata annotation for HPC applications in the age of dark data. The Journal of Supercomputing. DOI: 10.1007/s11227-020-03602-6. Online publication date: 1-Feb-2021
  • (2017) Challenges of Research Data Management for High Performance Computing. Research and Advanced Technology for Digital Libraries, pages 140-151. DOI: 10.1007/978-3-319-67008-9_12. Online publication date: 2-Sep-2017

Published In

SYSTOR '13: Proceedings of the 6th International Systems and Storage Conference
June 2013
198 pages
ISBN: 9781450321167
DOI: 10.1145/2485732

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. file systems
  2. index design
  3. metadata
  4. scientific data
  5. search

Qualifiers

  • Research-article

Conference

SYSTOR '13
Sponsor:
  • INTEL
  • Riverbed
  • Technion
  • SIGOPS
  • EMC²
  • AXCIENT
  • USENIX Assoc
  • IBM
  • HP

Acceptance Rates

SYSTOR '13 Paper Acceptance Rate 20 of 49 submissions, 41%;
Overall Acceptance Rate 108 of 323 submissions, 33%
