DOI: 10.1145/3627703.3650090
research-article
Open access

Volley: Accelerating Write-Read Orders in Disaggregated Storage

Published: 22 April 2024

Abstract

Modern data centers deploy disaggregated storage systems (e.g., NVMe over Fabrics, NVMe-oF) for fine-grained resource elasticity and high resource utilization. A client-side writeback cache absorbs writes and buffers frequently accessed data, thereby eliminating unnecessary remote storage accesses and improving performance. Yet a miss on a full cache triggers an evict-and-fetch operation, which evicts old entries before new data blocks are fetched. Existing systems perform evict-and-fetch by executing the write and read I/O operations sequentially, which reduces concurrency and makes it hard to fully utilize fast network and storage devices.
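The sequential evict-and-fetch path described above can be sketched as a toy model. This is a minimal illustration, not code from the paper: `RemoteStore`, `WritebackCache`, the LRU policy, and the round-trip counter are all hypothetical names and choices made for the example.

```python
from collections import OrderedDict


class RemoteStore:
    """Hypothetical stand-in for remote storage; counts blocking round trips."""

    def __init__(self):
        self.blocks = {}
        self.round_trips = 0

    def write(self, lba, data):
        self.round_trips += 1
        self.blocks[lba] = data

    def read(self, lba):
        self.round_trips += 1
        return self.blocks.get(lba, b"\0")


class WritebackCache:
    """Toy client-side writeback cache with LRU eviction (an assumed policy)."""

    def __init__(self, store, capacity):
        self.store = store
        self.capacity = capacity
        self.entries = OrderedDict()  # lba -> (data, dirty), oldest first

    def _make_room(self):
        if len(self.entries) >= self.capacity:
            victim, (data, dirty) = self.entries.popitem(last=False)
            if dirty:
                # The eviction write must finish before the caller continues.
                self.store.write(victim, data)

    def read(self, lba):
        if lba in self.entries:
            self.entries.move_to_end(lba)
            return self.entries[lba][0]
        self._make_room()
        data = self.store.read(lba)  # the fetch starts only after the eviction
        self.entries[lba] = (data, False)
        return data

    def write(self, lba, data):
        if lba not in self.entries:
            self._make_room()
        self.entries[lba] = (data, True)
        self.entries.move_to_end(lba)


store = RemoteStore()
cache = WritebackCache(store, capacity=2)
cache.write(1, b"a")
cache.write(2, b"b")
cache.read(3)  # miss on a full cache: write back block 1, then fetch block 3
print(store.round_trips)
```

In this model a single miss on a full cache costs two back-to-back round trips, one for the eviction write and one for the fetch, which is the serialization the abstract identifies as the bottleneck.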
To overcome this problem, we propose Volley, a network storage protocol that guarantees the execution order of the write and read I/O operations, enabling the writeback cache to issue eviction and fetch operations simultaneously. We implement the Volley protocol by extending NVMe-oF atop commodity network and storage hardware, and further introduce the ACK free-rides and notification acceleration techniques to reduce the network overhead. We integrate Volley into two writeback caching systems, V-Cache for virtual machine storage and V-TriCache for out-of-core computing. Evaluations show that V-Cache outperforms the Linux page cache and SPDK OCF (a state-of-the-art caching system) by up to 6.84× and 3.01×, respectively. Experiments on Mixgraph, a production workload from Facebook, show that V-TriCache reduces the total running time by up to 16.7% compared with TriCache, the state-of-the-art out-of-core computing system.
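The idea of issuing the eviction write and the fetch read together while the storage side preserves their order can be illustrated with a small simulation. The abstract does not say how Volley enforces ordering, so the sequence-number scheme below is purely an assumption, and `OrderedTarget`, `submit`, and `evict_and_fetch` are hypothetical names. The read targets the same block as the write only so that the ordering is observable.

```python
import asyncio


class OrderedTarget:
    """Toy storage target that applies commands in sequence-number order.

    The sequence-number mechanism is an assumption for illustration, not
    the mechanism described in the paper.
    """

    def __init__(self):
        self.blocks = {}
        self.next_seq = 0
        self.turn = asyncio.Condition()

    async def submit(self, seq, op, lba, data=None, net_delay=0.0):
        await asyncio.sleep(net_delay)  # commands may arrive out of order
        async with self.turn:
            # Hold the command until every earlier command has executed.
            await self.turn.wait_for(lambda: self.next_seq == seq)
            if op == "write":
                self.blocks[lba] = data
                result = None
            else:
                result = self.blocks.get(lba)
            self.next_seq += 1
            self.turn.notify_all()
            return result


async def evict_and_fetch(target, victim_lba, victim_data, fetch_lba):
    # Both commands are in flight at once; the target enforces write-then-read.
    _, fetched = await asyncio.gather(
        target.submit(0, "write", victim_lba, victim_data, net_delay=0.02),
        target.submit(1, "read", fetch_lba, net_delay=0.0),
    )
    return fetched


async def main():
    target = OrderedTarget()
    # The read arrives first (no delay) but still observes the earlier write.
    return await evict_and_fetch(target, 7, b"dirty", 7)


fetched = asyncio.run(main())
print(fetched)
```

Because the client no longer waits for the write to complete before sending the read, the two network transfers overlap, while the result matches what sequential execution would produce.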

Cited By

  • (2024) Performance Characterization of SmartNIC NVMe-over-Fabrics Target Offloading. Proceedings of the 17th ACM International Systems and Storage Conference, pages 14-24. DOI: 10.1145/3688351.3689154. Online publication date: 16-Sep-2024

    Published In

    EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems
    April 2024
    1245 pages
    ISBN: 9798400704376
    DOI: 10.1145/3627703
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Cache
    2. Disaggregated Storage
    3. File System
    4. NVMe over Fabrics
    5. SSD
    6. Storage Order

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • National Key R&D Program of China
    • National Natural Science Foundation of China

    Conference

    EuroSys '24

    Acceptance Rates

    Overall Acceptance Rate 241 of 1,308 submissions, 18%
