DOI: 10.5555/3189759.3189763

WAFL iron: repairing live enterprise file systems

Published: 12 February 2018

Abstract

Consistent and timely access to an arbitrarily damaged file system is an important requirement of enterprise-class systems. Repairing file system inconsistencies is accomplished most simply when file system access is limited to the repair tool. Checking and repairing a file system while it is open for general access present unique challenges. In this paper, we explore these challenges, present our online repair tool for the NetApp® WAFL® file system, and show how it achieves the same results as offline repair even while client access is enabled. We present some implementation details and evaluate its performance. To the best of our knowledge, this publication is the first to describe a fully functional online repair tool.

Cited By

  • (2019) Flexgroup volumes. Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference, pages 135-148. DOI: 10.5555/3358807.3358820. Online publication date: 10-Jul-2019.
  • (2018) Efficient Search for Free Blocks in the WAFL File System. Proceedings of the 47th International Conference on Parallel Processing, pages 1-10. DOI: 10.1145/3225058.3225072. Online publication date: 13-Aug-2018.

Published In

FAST'18: Proceedings of the 16th USENIX Conference on File and Storage Technologies
February 2018
339 pages
ISBN: 9781931971423

Sponsors

  • VMware
  • NetApp
  • Google Inc.
  • IBMR: IBM Research
  • NSF

Publisher

USENIX Association

United States
