DOI: 10.1145/3297858.3304035
research-article
Public Access

SOML Read: Rethinking the Read Operation Granularity of 3D NAND SSDs

Published: 04 April 2019

Abstract

NAND-based solid-state disks (SSDs) are known for their superior random read/write performance, owing to the high degree of multi-chip parallelism they exhibit. As 3D NAND chip density increases dramatically, fewer chips are needed to build an SSD of a given capacity than with previous-generation chips, so SSDs can be made more compact. However, this decrease in chip count also reduces overall throughput, which hinders the wide adoption of high-density 3D NAND SSDs. We analyzed 600 storage workloads, and our analysis revealed that small read operations suffer significant performance degradation due to the reduced chip-level parallelism in newer 3D NAND SSDs. The main question is whether some of the inter-chip parallelism lost in these new SSDs (due to the reduced chip count) can be won back by enhancing intra-chip parallelism. Motivated by this question, we propose a novel SOML (Single-Operation-Multiple-Location) read operation, which can perform several small intra-chip reads to different locations simultaneously, so that multiple requests can be serviced in parallel, thereby mitigating parallelism-related bottlenecks. We also propose a corresponding SOML read scheduling algorithm to fully utilize the SOML read. Our experimental results with various storage workloads indicate that the SOML read-based SSD with 8 chips can outperform the baseline SSD with 16 chips.
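The abstract describes SOML read only at a high level, so the following is a toy model, not the authors' scheduling algorithm. It assumes that one SOML-style operation can serve up to a hypothetical `max_locations` small reads on the same chip as long as their combined size fits in one page (`page_kb`), that every chip-level read costs a fixed, hypothetical `t_read_us` no matter how many locations it serves, and that no request exceeds one page. Each chip's queue is packed greedily in arrival order and compared against a baseline that issues one read per request.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadRequest:
    req_id: int
    chip: int     # target flash chip
    size_kb: int  # requested size in KB; assumed to be at most one page


def schedule_conventional(reqs):
    """Baseline: every request becomes its own chip-level read operation."""
    ops = defaultdict(list)
    for r in reqs:
        ops[r.chip].append([r])
    return dict(ops)


def schedule_soml(reqs, page_kb=16, max_locations=4):
    """Greedy SOML-style packing (toy model, not the paper's algorithm).

    Hypothetical constraints: one operation serves at most `max_locations`
    requests on the same chip, and their total size must fit in `page_kb`.
    Each chip's queue is packed in arrival order.
    """
    per_chip = defaultdict(list)
    for r in reqs:
        per_chip[r.chip].append(r)

    ops = {}
    for chip, queue in per_chip.items():
        batches, current, used = [], [], 0
        for r in queue:
            # Close the current batch if adding this request would break a constraint.
            if current and (len(current) == max_locations or used + r.size_kb > page_kb):
                batches.append(current)
                current, used = [], 0
            current.append(r)
            used += r.size_kb
        if current:
            batches.append(current)
        ops[chip] = batches
    return ops


def makespan_us(ops, t_read_us=60.0):
    """Time to drain all queues: chips run in parallel, each operation costs t_read_us.

    t_read_us is a hypothetical placeholder for the array-read latency.
    """
    return max((len(batches) * t_read_us for batches in ops.values()), default=0.0)


reqs16 = [ReadRequest(i, chip=i % 16, size_kb=4) for i in range(32)]
reqs8 = [ReadRequest(i, chip=i % 8, size_kb=4) for i in range(32)]
print(makespan_us(schedule_conventional(reqs16)))  # 120.0
print(makespan_us(schedule_conventional(reqs8)))   # 240.0
print(makespan_us(schedule_soml(reqs8)))           # 60.0
```

With 32 small 4 KB reads spread evenly, this toy model gives 120 µs for the 16-chip baseline, 240 µs for an 8-chip baseline, and 60 µs for 8 chips with SOML-style packing. Because it ignores channel transfer time, any extra latency of a multi-location read, and request arrival timing, it only illustrates why intra-chip parallelism can compensate for fewer chips; it does not reproduce the paper's evaluation.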


    Information

    Published In

    ACM Conferences
    ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems
    April 2019
    1126 pages
    ISBN: 9781450362405
    DOI: 10.1145/3297858
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. 3D NAND
    2. SSD
    3. parallelism
    4. request scheduling

    Qualifiers

    • Research-article

    Conference

    ASPLOS '19

    Acceptance Rates

    ASPLOS '19 paper acceptance rate: 74 of 351 submissions, 21%
    Overall acceptance rate: 535 of 2,713 submissions, 20%

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 366
    • Downloads (last 6 weeks): 53
    Reflects downloads up to 24 Jan 2025

    Cited By

    • (2025) RDA: A Read-Request Driven Adaptive Allocation Scheme for Improving SSD Performance. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 44:2 (416-429). DOI: 10.1109/TCAD.2024.3435681. Online publication date: Feb-2025.
    • (2024) Minato: A Read-Disturb-Aware Dynamic Buffer Management Scheme for NAND Flash Memory. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43:7 (1930-1943). DOI: 10.1109/TCAD.2024.3364109. Online publication date: Jul-2024.
    • (2024) CDS: Coupled Data Storage to Enhance Read Performance of 3D TLC NAND Flash Memory. IEEE Transactions on Computers 73:3 (694-707). DOI: 10.1109/TC.2023.3338474. Online publication date: Mar-2024.
    • (2023) Realizing Strong Determinism Contract on Log-Structured Merge Key-Value Stores. ACM Transactions on Storage 19:2 (1-29). DOI: 10.1145/3582695. Online publication date: 25-Mar-2023.
    • (2023) Venice: Improving Solid-State Drive Parallelism at Low Cost via Conflict-Free Accesses. Proceedings of the 50th Annual International Symposium on Computer Architecture (1-16). DOI: 10.1145/3579371.3589071. Online publication date: 17-Jun-2023.
    • (2023) FSPDA: A Full Sequence Program Data Allocation Scheme for Boosting 3-D NAND Flash Read Performance. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:12 (4336-4349). DOI: 10.1109/TCAD.2023.3294452. Online publication date: Dec-2023.
    • (2023) Access Characteristic Guided Partition for NAND Flash-Based High-Density SSDs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:12 (4643-4656). DOI: 10.1109/TCAD.2023.3282175. Online publication date: Dec-2023.
    • (2023) Improving 3-D NAND SSD Read Performance by Parallelizing Read-Retry. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:3 (768-780). DOI: 10.1109/TCAD.2022.3191256. Online publication date: Mar-2023.
    • (2023) MGC: Multiple-Gray-Code for 3D NAND Flash based High-Density SSDs. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (122-136). DOI: 10.1109/HPCA56546.2023.10070946. Online publication date: Feb-2023.
    • (2022) A joint management middleware to improve training performance of deep recommendation systems with SSDs. Proceedings of the 59th ACM/IEEE Design Automation Conference (157-162). DOI: 10.1145/3489517.3530426. Online publication date: 10-Jul-2022.
