research-article

Enhancing hybrid parallel file system through performance and space-aware data layout

Published: 01 November 2016

Abstract

Hybrid parallel file systems (PFSs), which consist of solid-state drive servers (SServers) and hard disk drive servers (HServers), have recently attracted growing attention. Compared with a traditional HServer, an SServer consistently delivers higher storage performance but offers less storage space. However, most current data layout schemes do not consider these differences in performance and space between heterogeneous servers, which may significantly degrade the performance of hybrid PFSs. In this article, we propose the performance and space-aware (PSA) scheme, a novel data layout scheme that maximizes the performance of hybrid PFSs by applying adaptive, varied-size file stripes. PSA distributes data across heterogeneous file servers based not only on storage performance but also on storage space. We have implemented PSA within OrangeFS, a popular PFS in the high-performance computing domain. Extensive experiments with representative benchmarks, including IOR, HPIO, MPI-TILE-IO, and BTIO, show that PSA delivers higher I/O throughput than both the default and the performance-aware file data layout schemes.
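
The abstract does not give PSA's stripe-sizing rule. The following is therefore only a minimal sketch of the general idea of performance- and space-aware striping, under our own assumptions. Each server is described by a hypothetical sustained bandwidth and free capacity. Stripe sizes within one striping round start out proportional to bandwidth, and any server that cannot hold its share of the file is capped at its capacity, with the remainder re-split among the other servers. The `Server` type, `capped_shares`, `stripe_sizes`, and all numbers are illustrative. They are not PSA's cost model or an OrangeFS interface.

```python
from dataclasses import dataclass

# Hypothetical sketch of bandwidth-proportional, space-capped stripe sizing.
# This is NOT PSA's actual algorithm; all names and numbers are assumptions.


@dataclass
class Server:
    name: str
    bandwidth: float   # assumed sustained I/O bandwidth, MB/s
    free_space: float  # assumed free capacity, MB


def capped_shares(servers, file_size):
    """Return the fraction of a file_size-MB file placed on each server.

    Shares start proportional to bandwidth, so every server would finish its
    part at about the same time. A server whose share would not fit in its
    free space is pinned at its capacity, and the remainder is re-split among
    the other servers by bandwidth.
    """
    shares = {}
    remaining = 1.0
    active = list(servers)
    while active:
        total_bw = sum(s.bandwidth for s in active)
        over = [s for s in active
                if remaining * s.bandwidth / total_bw * file_size > s.free_space]
        if not over:
            for s in active:
                shares[s.name] = remaining * s.bandwidth / total_bw
            return shares
        for s in over:
            shares[s.name] = s.free_space / file_size
            remaining -= shares[s.name]
            active.remove(s)
    raise ValueError("servers lack enough free space for this file")


def stripe_sizes(servers, file_size, round_kb=1024):
    """Split one striping round of round_kb KB into per-server stripe sizes (KB)."""
    shares = capped_shares(servers, file_size)
    sizes = {name: int(share * round_kb) for name, share in shares.items()}
    # Hand the rounding remainder to the roomiest server so a round sums to round_kb.
    roomiest = max(servers, key=lambda s: s.free_space).name
    sizes[roomiest] += round_kb - sum(sizes.values())
    return sizes


servers = [
    Server("ssd1", bandwidth=400, free_space=2_000),
    Server("ssd2", bandwidth=400, free_space=2_000),
    Server("hdd1", bandwidth=100, free_space=100_000),
    Server("hdd2", bandwidth=100, free_space=100_000),
]
print(stripe_sizes(servers, file_size=10_000))
# {'ssd1': 204, 'ssd2': 204, 'hdd1': 309, 'hdd2': 307}
```

With ample space, each SServer here would receive roughly four times the HServer stripe size, matching the assumed bandwidth ratio. Because a 10,000 MB file would overflow the SServers, they are instead capped at 20% of the file each, and the HServers absorb the rest. The abstract does not state whether PSA resolves the performance/space trade-off in this way.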

Cited By

  • (2020) A Holistic Heterogeneity-Aware Data Placement Scheme for Hybrid Parallel I/O Systems. IEEE Transactions on Parallel and Distributed Systems 31(4): 830-842. DOI: 10.1109/TPDS.2019.2948901. Online publication date: 16 January 2020.

Published In

International Journal of High Performance Computing Applications, Volume 30, Issue 4
November 2016
130 pages

Publisher

Sage Publications, Inc.

United States

Author Tags

  1. Parallel I/O system
  2. data layout
  3. hybrid I/O system
  4. parallel file system
  5. solid-state drive
