
A low-latency storage stack for fast storage devices

Published: 01 September 2017

Abstract

Modern storage systems face the challenge of making the best use of fast storage devices. Although the underlying devices keep improving, the traditional storage stack cannot exploit their characteristics because it was optimized for hard disk drives. In this article, we optimize the storage stack to maximize the benefit of the low latency that fast storage devices provide. Our approach simplifies the I/O path from the application to the fast storage device by removing inefficient layers and the conventional block I/O. The proposed stack consists of three layers: an optimized device driver, a low-latency file system called L2FS, and a simplified VFS. The device driver exposes a simple file I/O API to the file system in place of the existing block I/O API. L2FS, a variant of EXT4, performs low-latency I/O operations through this file I/O API. We implement our storage stack on Linux 3.14.3 and evaluate it with multiple benchmarks. The results show that, compared to the existing storage stack on fast storage, our system improves throughput by up to 6.6 times and reduces latency by 54% on average.
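One way to picture the difference between a block I/O interface and a file I/O interface at the driver boundary is the toy model below. It is only an illustrative sketch, not the authors' implementation: the abstract does not specify the driver API, so `ToyDevice`, `write_block`, `read_block`, `write_range`, the 4096-byte block size, and the read-modify-write strategy are all assumptions made for the example. It counts how many requests reach the driver when a small, unaligned write is issued through each style of interface.

```python
BLOCK_SIZE = 4096  # assumed block size, chosen only for this illustration


class ToyDevice:
    """In-memory stand-in for a fast storage device (hypothetical)."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.requests = 0  # number of requests the driver has received

    # Block-style entry points: one whole, aligned block per request.
    def read_block(self, block_no):
        self.requests += 1
        start = block_no * BLOCK_SIZE
        return bytes(self.data[start:start + BLOCK_SIZE])

    def write_block(self, block_no, block):
        assert len(block) == BLOCK_SIZE
        self.requests += 1
        start = block_no * BLOCK_SIZE
        self.data[start:start + BLOCK_SIZE] = block

    # File-I/O-style entry point: an arbitrary byte range in one request.
    def write_range(self, offset, payload):
        self.requests += 1
        self.data[offset:offset + len(payload)] = payload


def write_via_blocks(dev, offset, payload):
    """Block-style path: read-modify-write every block the range touches.

    Assumes a non-empty payload.
    """
    end = offset + len(payload)
    first = offset // BLOCK_SIZE
    last = (end - 1) // BLOCK_SIZE
    for block_no in range(first, last + 1):
        block_start = block_no * BLOCK_SIZE
        block = bytearray(dev.read_block(block_no))
        lo = max(offset, block_start)
        hi = min(end, block_start + BLOCK_SIZE)
        block[lo - block_start:hi - block_start] = payload[lo - offset:hi - offset]
        dev.write_block(block_no, bytes(block))


def write_via_file_api(dev, offset, payload):
    """File-I/O-style path: hand the byte range to the driver directly."""
    dev.write_range(offset, payload)


if __name__ == "__main__":
    blocky, direct = ToyDevice(4 * BLOCK_SIZE), ToyDevice(4 * BLOCK_SIZE)
    payload = bytes(range(100))
    write_via_blocks(blocky, 4090, payload)    # range straddles blocks 0 and 1
    write_via_file_api(direct, 4090, payload)
    print(blocky.requests, direct.requests, blocky.data == direct.data)
```

In this model both paths leave identical bytes on the device, but the block-style path needs a read and a write for each touched block while the byte-range path issues a single request. Real stacks differ in many ways (caching, request merging, alignment handling), so the request counts here illustrate the idea only and say nothing about the measured results reported in the article.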


Cited By

  • (2022) Efficient hybrid polling for ultra-low latency storage devices. Journal of Systems Architecture: the EUROMICRO Journal 122:C. doi:10.1016/j.sysarc.2021.102338. Online publication date: 1-Jan-2022
  • (2019) High-performance internet file system based on multi-download for convergence computing in mobile communication systems. Cluster Computing 22:4 (1057-1071). doi:10.1007/s10586-018-2885-5. Online publication date: 1-Dec-2019

Published In

Cluster Computing, Volume 20, Issue 3
September 2017
926 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Fast storage device
  2. I/O stack
  3. Linux

Qualifiers

  • Article
