DOI: 10.5555/3307441.3307444

PASTE: a network programming interface for non-volatile main memory

Published: 09 April 2018

Abstract

Non-Volatile Main Memory (NVMM) devices have been integrated into general-purpose operating systems through familiar file-based interfaces, providing efficient byte-granularity access by bypassing page caches. To leverage the unique advantages of these high-performance media, the storage stack is migrating from the kernel into user-space. However, application performance remains fundamentally limited unless network stacks explicitly integrate these new storage media and follow the migration of storage stacks into user-space. Moreover, we argue that the storage and the network stacks must be considered together when being designed for NVMM. This requires a thoroughly new network stack design, including low-level buffer management and APIs.
We propose PASTE, a new network programming interface for NVMM. It supports familiar abstractions, including busy-polling, blocking, protection, and run-to-completion, with standard network protocols such as TCP and UDP. By operating directly on NVMM, it can be closely integrated with the persistence layer of applications. Once data is DMA'ed from a network interface card to host memory (NVMM), it never needs to be copied again, even for persistence. We demonstrate the general applicability of PASTE by implementing two popular persistent data structures: a write-ahead log and a B+ tree. We further apply PASTE to three applications: Redis, a popular persistent key-value store; pKVS, our HTTP-based key-value store; and the logging component of a software switch. These demonstrate that PASTE not only accelerates networked storage but also enables conventional networking functions to support new features.
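
The copy-free idea described above can be illustrated with a minimal sketch. It is not PASTE's API: every name here (`BufferPool`, `RefLog`, `SLOT_SIZE`, the entry layout) is hypothetical, an ordinary `bytearray` stands in for NVMM-backed packet buffers, and the slice assignment in `receive` only simulates the NIC's DMA write. The point it shows is that a write-ahead log can record a small reference (slot, offset, length) to the payload where it already sits, instead of copying the payload into the log.

```python
import struct

# Hypothetical names and layout; an illustration only, not PASTE's interface.
SLOT_SIZE = 2048                 # assumed size of one packet buffer
ENTRY = struct.Struct("<IHH")    # log entry: slot index, payload offset, payload length


class BufferPool:
    """Stands in for a pool of packet buffers that would live in NVMM."""

    def __init__(self, slots):
        self.mem = bytearray(slots * SLOT_SIZE)
        self.slots = slots
        self.next_slot = 0

    def receive(self, frame):
        """Simulate the NIC placing one frame into the next free slot."""
        if self.next_slot >= self.slots:
            raise MemoryError("buffer pool exhausted")
        if len(frame) > SLOT_SIZE:
            raise ValueError("frame larger than one slot")
        slot = self.next_slot
        base = slot * SLOT_SIZE
        self.mem[base:base + len(frame)] = frame  # stands in for the DMA write
        self.next_slot += 1
        return slot

    def view(self, slot, offset, length):
        """Return a view of the payload in place; no bytes are copied."""
        base = slot * SLOT_SIZE + offset
        return memoryview(self.mem)[base:base + length]


class RefLog:
    """Write-ahead log whose entries point at payloads instead of holding them."""

    def __init__(self):
        self.entries = bytearray()

    def append(self, slot, offset, length):
        # Only the 8-byte reference is written; the payload stays in its buffer.
        # A real NVMM log would also need to flush these bytes to persistence,
        # which this sketch omits.
        self.entries += ENTRY.pack(slot, offset, length)

    def replay(self, pool):
        """Yield each logged payload as a view into the buffer pool."""
        for slot, offset, length in ENTRY.iter_unpack(self.entries):
            yield pool.view(slot, offset, length)


# Usage: one frame with a 4-byte made-up header followed by a request.
pool = BufferPool(slots=4)
log = RefLog()
header, request = b"HDR:", b"SET k v"
slot = pool.receive(header + request)
log.append(slot, len(header), len(request))
records = [bytes(v) for v in log.replay(pool)]
```

In this sketch the log grows by a fixed 8 bytes per request regardless of payload size, and the buffer holding the payload must stay allocated for as long as a log entry refers to it.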

    Published In

    NSDI'18: Proceedings of the 15th USENIX Conference on Networked Systems Design and Implementation
    April 2018
    623 pages
    ISBN: 9781931971430

    Sponsors

    • NetApp
    • Google Inc.
    • NSF
    • Microsoft

    Publisher

    USENIX Association

    United States

    Qualifiers

    • Article

    Cited By

    • (2023) Anchor: A Library for Building Secure Persistent Memory Systems. Proceedings of the ACM on Management of Data 1:4 (1-31). DOI: 10.1145/3626718. Online publication date: 12-Dec-2023
    • (2022) Understanding modern storage APIs. Proceedings of the 15th ACM International Conference on Systems and Storage (120-127). DOI: 10.1145/3534056.3534945. Online publication date: 6-Jun-2022
    • (2021) rkt-io. Proceedings of the Sixteenth European Conference on Computer Systems (490-506). DOI: 10.1145/3447786.3456255. Online publication date: 21-Apr-2021
    • (2020) FileMR. Proceedings of the 17th USENIX Conference on Networked Systems Design and Implementation (111-126). DOI: 10.5555/3388242.3388251. Online publication date: 25-Feb-2020
    • (2019) Speicher. Proceedings of the 17th USENIX Conference on File and Storage Technologies (173-190). DOI: 10.5555/3323298.3323315. Online publication date: 25-Feb-2019
    • (2019) Flowblaze. Proceedings of the 16th USENIX Conference on Networked Systems Design and Implementation (531-547). DOI: 10.5555/3323234.3323278. Online publication date: 26-Feb-2019
    • (2019) Software Data Planes. Proceedings of the ACM Symposium on Cloud Computing (337-350). DOI: 10.1145/3357223.3362737. Online publication date: 20-Nov-2019
    • (2019) I/O Is Faster Than the CPU. Proceedings of the Workshop on Hot Topics in Operating Systems (81-87). DOI: 10.1145/3317550.3321426. Online publication date: 13-May-2019
    • (2019) I'm Not Dead Yet! Proceedings of the Workshop on Hot Topics in Operating Systems (73-80). DOI: 10.1145/3317550.3321422. Online publication date: 13-May-2019