
High Performance Design for HDFS with Byte-Addressability of NVM and RDMA

Published: 01 June 2016

Abstract

Non-Volatile Memory (NVM) offers byte-addressability and persistence with DRAM-like performance. NVMs therefore provide the opportunity to build high-throughput storage systems for data-intensive applications. HDFS (Hadoop Distributed File System) is the primary storage engine for MapReduce, Spark, and HBase. Even though HDFS was initially designed for commodity hardware, it is increasingly being used on HPC (High Performance Computing) clusters. The stringent performance requirements of HPC systems make the I/O bottlenecks of HDFS a critical issue and motivate rethinking its storage architecture over NVMs. In this paper, we present a novel design for HDFS to leverage the byte-addressability of NVM for RDMA (Remote Direct Memory Access)-based communication. We analyze the performance potential of using NVM for HDFS and re-design HDFS I/O with memory semantics to fully exploit the byte-addressability. We call this design NVFS (NVM- and RDMA-aware HDFS). We also present cost-effective acceleration techniques for HBase and Spark to utilize the NVM-based design of HDFS by storing only the HBase Write-Ahead Logs and Spark job outputs in NVM, respectively. We also propose enhancements to use the NVFS design as a burst buffer for running Spark jobs on top of parallel file systems like Lustre. Performance evaluations show that our design can improve the write and read throughputs of HDFS by up to 4x and 2x, respectively. The execution times of data generation benchmarks are reduced by up to 45%. The proposed design also reduces the overall execution time of the SWIM workload by up to 18% over HDFS, with a maximum benefit of 37% for job-38. For Spark TeraSort, our proposed scheme yields a performance gain of up to 11%. The performance of HBase insert, update, and read operations is improved by 21%, 16%, and 26%, respectively. Our NVM-based burst buffer can improve the I/O performance of Spark PageRank by up to 24% over Lustre. To the best of our knowledge, this paper is the first attempt to incorporate NVM with RDMA for HDFS.
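
The central design point above is replacing HDFS's block-oriented write path with load/store (memory) semantics over byte-addressable NVM. The sketch below is a minimal Java illustration of that idea, assuming an NVM device exposed as a DAX-mounted file; the class name, path, and sizes are hypothetical and are not taken from the NVFS implementation, which additionally couples NVM storage with RDMA-based communication.

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;

    // Illustrative sketch: store an incoming packet into a memory-mapped region
    // standing in for byte-addressable NVM, instead of appending it through the
    // block-I/O path. Path and sizes are hypothetical.
    public class NvmWriteSketch {
        private static final String NVM_PATH = "/mnt/pmem/hdfs_block_fragment"; // hypothetical DAX-mounted NVM file
        private static final int REGION_SIZE = 4 * 1024 * 1024;                 // 4 MB mapped region

        public static void main(String[] args) throws Exception {
            try (RandomAccessFile file = new RandomAccessFile(NVM_PATH, "rw");
                 FileChannel channel = file.getChannel()) {
                // Map the NVM-backed file so writes become plain stores into memory.
                MappedByteBuffer region = channel.map(FileChannel.MapMode.READ_WRITE, 0, REGION_SIZE);

                byte[] packet = "incoming HDFS packet payload".getBytes(StandardCharsets.UTF_8);
                region.put(packet);   // byte-addressable store; no read-modify-write of a full block
                region.force();       // flush the mapped region so the data reaches persistent media
            }
        }
    }

On a conventional disk-backed DataNode, the same packet would be appended to a block file through buffered stream I/O; the byte-addressable path avoids that copy and block-granularity overhead, which is the kind of cost the NVFS design aims to eliminate.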

Published In

ICS '16: Proceedings of the 2016 International Conference on Supercomputing
June 2016
547 pages
ISBN: 9781450343619
DOI: 10.1145/2925426

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 2016

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICS '16

Acceptance Rates

Overall Acceptance Rate: 629 of 2,180 submissions (29%)


Article Metrics

  • Downloads (last 12 months): 216
  • Downloads (last 6 weeks): 32
Reflects downloads up to 26 Dec 2024

Cited By

  • (2024) Turbo: Efficient Communication Framework for Large-scale Data Processing Cluster. Proceedings of the ACM SIGCOMM 2024 Conference, 10.1145/3651890.3672241, pages 540-553. Online publication date: 4-Aug-2024.
  • (2023) Anchor: A Library for Building Secure Persistent Memory Systems. Proceedings of the ACM on Management of Data, 10.1145/3626718, 1(4):1-31. Online publication date: 12-Dec-2023.
  • (2023) Accelerating I/O in Distributed Data Processing Systems with Apache Arrow CHFS. 2023 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), 10.1109/CLUSTERWorkshops61457.2023.00009, pages 1-4. Online publication date: 31-Oct-2023.
  • (2022) A Survey of Storage Systems in the RDMA Era. IEEE Transactions on Parallel and Distributed Systems, 10.1109/TPDS.2022.3188656, 33(12):4395-4409. Online publication date: 1-Dec-2022.
  • (2022) PostMan: Rapidly Mitigating Bursty Traffic via On-Demand Offloading of Packet Processing. IEEE Transactions on Parallel and Distributed Systems, 10.1109/TPDS.2021.3092266, 33(2):374-387. Online publication date: 1-Feb-2022.
  • (2022) A cache sharing mechanism based on RDMA. 2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), 10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00073, pages 319-326. Online publication date: Dec-2022.
  • (2022) Identifying Challenges and Opportunities of In-Memory Computing on Large HPC Systems. Journal of Parallel and Distributed Computing, 10.1016/j.jpdc.2022.02.002. Online publication date: Mar-2022.
  • (2022) A write-friendly approach to manage namespace of Hadoop distributed file system by utilizing nonvolatile memory. The Journal of Supercomputing, 10.1007/s11227-019-02876-9, 75(10):6632-6662. Online publication date: 11-Mar-2022.
  • (2021) An Empirical Performance Evaluation of Multiple Intel Optane Solid-State Drives. Electronics, 10.3390/electronics10111325, 10(11):1325. Online publication date: 31-May-2021.
  • (2021) Exploring Efficient Architectures on Remote In-Memory NVM over RDMA. ACM Transactions on Embedded Computing Systems, 10.1145/3477004, 20(5s):1-20. Online publication date: 22-Sep-2021.
