
The Zebra striped network file system

Published: 01 August 1995

Abstract

Zebra is a network file system that increases throughput by striping the file data across multiple servers. Rather than striping each file separately, Zebra forms all the new data from each client into a single stream, which it then stripes using an approach similar to a log-structured file system. This provides high performance for writes of small files as well as for reads and writes of large files. Zebra also writes parity information in each stripe in the style of RAID disk arrays; this increases storage costs slightly, but allows the system to continue operation while a single storage server is unavailable. A prototype implementation of Zebra, built in the Sprite operating system, provides 4–5 times the throughput of the standard Sprite file system or NFS for large files and a 15–300% improvement for writing small files.
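The abstract's central mechanism is to gather a client's new data into one log, cut that log into fixed-size fragments, and add a parity fragment to each stripe. The Python sketch below illustrates this under stated assumptions and is not Zebra's implementation. The names `stripe_log`, `parity`, and `reconstruct`, the tiny `FRAGMENT_SIZE`, and the zero-padding of the last stripe are hypothetical simplifications chosen for readability. Byte-wise XOR is assumed as the RAID-style parity function.

```python
from functools import reduce

# Hypothetical fragment size, kept tiny so the example is easy to trace.
FRAGMENT_SIZE = 8


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(fragments: list[bytes]) -> bytes:
    """RAID-style parity: the XOR of every fragment passed in."""
    return reduce(xor_bytes, fragments)


def stripe_log(writes: list[bytes], num_servers: int,
               frag_size: int = FRAGMENT_SIZE) -> list[list[bytes]]:
    """Concatenate one client's writes into a single log and cut it into stripes.

    Each stripe holds num_servers - 1 data fragments followed by one parity
    fragment, so its fragments can be sent to num_servers different servers.
    """
    log = b"".join(writes)  # small and large writes alike join one stream
    data_per_stripe = frag_size * (num_servers - 1)
    remainder = len(log) % data_per_stripe
    if remainder:
        # Simplification (assumption): zero-pad the tail so the last stripe is full.
        log += b"\0" * (data_per_stripe - remainder)
    stripes = []
    for off in range(0, len(log), data_per_stripe):
        chunk = log[off:off + data_per_stripe]
        frags = [chunk[i:i + frag_size] for i in range(0, data_per_stripe, frag_size)]
        stripes.append(frags + [parity(frags)])
    return stripes


def reconstruct(stripe: list[bytes], lost_index: int) -> bytes:
    """Rebuild the fragment of an unavailable server by XOR-ing the survivors."""
    return parity([f for i, f in enumerate(stripe) if i != lost_index])


if __name__ == "__main__":
    stripes = stripe_log([b"small file A", b"tiny B", b"another small file C"],
                         num_servers=4)
    print(len(stripes), "stripes of", len(stripes[0]), "fragments")  # prints: 2 stripes of 4 fragments
    assert reconstruct(stripes[0], 1) == stripes[0][1]
```

Three small writes are batched into full stripes instead of being striped file by file, which is why small-file writes benefit. Any single fragment of a stripe, including the parity fragment itself, can be rebuilt by XOR-ing the survivors. That is what lets the system keep running while one storage server is unavailable. With N servers, one fragment in every N holds parity.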

References

[1]
ANDERSON, T. E., CULLER, D. E., AND PATTERSON, D. A. 1995. A case for NOW (Networks of Workstations). IEEE Micro 15, 1 (Feb.), 54-64.
[2]
BAKER, M. AND SULLIVAN, M. 1992. The Recovery Box: Using fast recovery to provide high availability in the UNIX environment. In Proceedings of the Summer 1992 USENIX Conference (June). USENIX Assoc., Berkeley, Calif., 31-43.
[3]
BAKER, M., ASAMI, S., DEPRIT, E., AND OUSTERHOUT, J. 1992. Non-volatile memory for fast, reliable file systems. In Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (Boston, Mass., Oct.). ACM, New York, 10-22.
[4]
BAKER, M. G., HARTMAN, J. H., KUPFER, M. D., SHIRRIFF, K. W., AND OUSTERHOUT, J. K. 1991. Measurements of a distributed file system. In Proceedings of the 13th Symposium on Operating Systems Principles (SOSP) (Asilomar, Calif., Oct.). ACM SIGOPS Oper. Syst. Rev. 25, 5, 198-212.
[5]
BERNSTEIN, P. A. AND GOODMAN, N. 1981. Concurrency control in distributed database systems. ACM Comput. Surv. 13, 2 (June), 185-222.
[6]
BIRRELL, A. D., LEVIN, R., NEEDHAM, R. M., AND SCHROEDER, M. D. 1982. Grapevine: An exercise in distributed computing. Commun. ACM 25, 4 (Apr.), 260-274.
[7]
CABRERA, L.-F. AND LONG, D. D. E. 1991. Swift: Using distributed disk striping to provide high I/O data rates. Comput. Syst. 4, 4 (Fall), 405-436.
[8]
CAO, P., LIM, S. B., VENKATARAMAN, S., AND WILKES, J. 1993. The TickerTAIP parallel RAID architecture. In Proceedings of the 20th Annual International Symposium of Computer Architecture (May). ACM/IEEE, New York, 52-63.
[9]
CHEN, P. M. AND PATTERSON, D. A. 1990. Maximizing performance in a striped disk array. In Proceedings of the 17th Annual International Symposium of Computer Architecture (May). ACM/IEEE, New York, 322-331.
[10]
CHUTANI, S., ANDERSON, O. T., KAZAR, M. L., LEVERETT, B. W., MASON, W. A., AND SIDEBOTHAM, R. N. 1992. The Episode File System. In Proceedings of the Winter 1992 USENIX Conference (Jan.). USENIX Assoc., Berkeley, Calif., 43-60.
[11]
DIBBLE, P. C., SCOTT, M. L., AND ELLIS, C. S. 1988. Bridge: A high-performance file system for parallel processors. In Proceedings of the 8th International Conference on Distributed Computing Systems (ICDCS). IEEE, New York, 154-161.
[12]
DRAPEAU, A. L., SHIRRIFF, K., HARTMAN, J. H., MILLER, E. L., SESHAN, S., KATZ, R. H., LUTZ, K., PATTERSON, D. A., LEE, E. K., CHEN, P. M., AND GIBSON, G. A. 1994. RAID-II: A high-bandwidth network file server. In Proceedings of the 21st Annual International Symposium of Computer Architecture (Apr.). ACM/IEEE, New York, 234-244.
[13]
FLOYD, R. A. AND ELLIS, C. S. 1989. Directory reference patterns in hierarchical file systems. IEEE Trans. Knowl. Data Eng. 1, 2 (June), 238-247.
[14]
FREEH, V. W., LOWENTHAL, D. K., AND ANDREWS, G. R. 1994. Distributed filaments: Efficient fine-grain parallelism on a cluster of workstations. In Proceedings of the 1st USENIX Symposium on Operating Systems Design and Implementation (OSDI) (Nov.). USENIX Assoc., Berkeley, Calif., 201-213.
[15]
GUY, R. G., HEIDEMANN, J. S., MAK, W., PAGE, T. W., JR., POPEK, G. J., AND ROTHMEIER, D. 1990. Implementation of the Ficus replicated file system. In Proceedings of the Summer 1990 USENIX Conference (Anaheim, Calif., June). USENIX Assoc., Berkeley, Calif., 63-71.
[16]
HAGMANN, R. 1987. Reimplementing the Cedar file system using logging and group commit. In Proceedings of the 11th Symposium on Operating Systems Principles (SOSP) (Nov.). ACM SIGOPS Oper. Syst. Rev. 21, 5, 155-162.
[17]
HARTMAN, J. H. AND OUSTERHOUT, J. K. 1993. Letter to the editor. ACM SIGOPS Oper. Syst. Rev. 27, 1 (Jan.), 7-10.
[18]
HISGEN, A., BIRRELL, A., MANN, T., SCHROEDER, M., AND SWART, G. 1989. Availability and consistency tradeoffs in the Echo distributed file system. In Proceedings of the 2nd Workshop on Workstation Operating Systems (Sept.). IEEE, New York, 49-54.
[19]
HOWARD, J. H., KAZAR, M. L., MENEES, S. G., NICHOLS, D. A., SATYANARAYANAN, M., SIDEBOTHAM, R. N., AND WEST, M. J. 1988. Scale and performance in a distributed file system. ACM Trans. Comput. Syst. 6, 1 (Feb.), 51-81.
[20]
LISKOV, B., GHEMAWAT, S., GRUBER, R., JOHNSON, P., SHRIRA, L., AND WILLIAMS, M. 1991. Replication in the Harp file system. In Proceedings of the 13th Symposium on Operating Systems Principles (SOSP) (Asilomar, Calif., Oct.). ACM SIGOPS Oper. Syst. Rev. 25, 5, 226-238.
[21]
LONG, D. D. E., MONTAGUE, B. R., AND CABRERA, L.-F. 1994. Swift/RAID: A distributed RAID system. Comput. Syst. 7, 3 (Summer), 333-359.
[22]
LO VERSO, S. J., ISMAN, M., NANOPOULOS, A., NESHEIM, W., MILNE, E. D., AND WHEELER, R. 1993. sfs: A parallel file system for the CM-5. In Proceedings of the Summer 1993 USENIX Conference (Cincinnati, Ohio, June). USENIX Assoc., Berkeley, Calif., 291-305.
[23]
McKUSICK, M. K., JOY, W. N., LEFFLER, S. J., AND FABRY, R. S. 1984. A fast file system for UNIX. ACM Trans. Comput. Syst. 2, 3 (Aug.), 181-197.
[24]
McVOY, L. W. AND KLEIMAN, S. R. 1991. Extent-like performance from a UNIX file system. In Proceedings of the Winter 1991 USENIX Conference (Dallas, Tex., Jan.). USENIX Assoc., Berkeley, Calif., 33-43.
[25]
NELSON, M. N., WELCH, B. B., AND OUSTERHOUT, J. K. 1988. Caching in the Sprite network file system. ACM Trans. Comput. Syst. 6, 1 (Feb.), 134-154.
[26]
OUSTERHOUT, J. 1995. A critique of Seltzer's 1993 USENIX paper. Available as http://www.smli.com/~ouster/seltzer93.html.
[27]
OUSTERHOUT, J., CHERENSON, A., DOUGLIS, F., NELSON, M., AND WELCH, B. 1988. The Sprite network operating system. IEEE Comput. 21, 2 (Feb.), 23-36.
[28]
PATTERSON, D. A., GIBSON, G., AND KATZ, R. H. 1988. A case for redundant arrays of inexpensive disks (RAID). In Proceedings of the 1988 ACM Conference on Management of Data (SIGMOD) (Chicago, Ill., June). ACM, New York, 109-116.
[29]
PIERCE, P. 1989. A concurrent file system for a highly parallel mass storage subsystem. In Proceedings of the 4th Conference on Hypercubes (Monterey, Calif., Mar.). ACM, New York, 155-160.
[30]
ROSENBLUM, M. AND OUSTERHOUT, J. K. 1991. The design and implementation of a log-structured file system. In Proceedings of the 13th Symposium on Operating Systems Principles (SOSP) (Asilomar, Calif., Oct.). ACM SIGOPS Oper. Syst. Rev. 25, 5, 1-15.
[31]
SATYANARAYANAN, M., KISTLER, J. J., KUMAR, P., OKASAKI, M. E., SIEGEL, E. H., AND STEERE, D. C. 1990. Coda: A highly available file system for a distributed workstation environment. IEEE Trans. Comput. 39, 4 (Apr.), 447-459.
[32]
SCHLOSS, G. A. AND STONEBRAKER, M. 1990. Highly redundant management of distributed data. In Proceedings of the IEEE Workshop on the Management of Replicated Data (Nov.). IEEE, New York, 91-95.
[33]
SELTZER, M., BOSTIC, K., McKUSICK, M. K., AND STAELIN, C. 1993. An implementation of a log-structured file system for UNIX. In Proceedings of the Winter 1993 USENIX Conference (San Diego, Calif., Jan.). USENIX Assoc., Berkeley, Calif., 307-326.
[34]
SELTZER, M., SMITH, K. A., BALAKRISHNAN, H., CHANG, J., McMAINS, S., AND PADMANABHAN, V. 1995. File system logging versus clustering: A performance comparison. In Proceedings of the Winter 1995 USENIX Conference (Jan.). USENIX Assoc., Berkeley, Calif., 249-264.
[35]
SHELTZER, A. B., LINDELL, R., AND POPEK, G. J. 1986. Name service locality and cache design in a distributed operating system. In Proceedings of the 6th International Conference on Distributed Computing Systems (ICDCS) (May). IEEE, New York, 515-522.
[36]
SHIRRIFF, K. AND OUSTERHOUT, J. 1992. A trace-driven analysis of name and attribute caching in a distributed file system. In Proceedings of the Winter 1992 USENIX Conference (Jan.). USENIX Assoc., Berkeley, Calif., 315-331.
[37]
SIEGEL, A., BIRMAN, K., AND MARZULLO, K. 1990. Deceit: A flexible distributed file system. In Proceedings of the Summer 1990 USENIX Conference (Anaheim, Calif., June). USENIX Assoc., Berkeley, Calif., 51-61.
[38]
WALKER, B., POPEK, G., ENGLISH, R., KLINE, C., AND THIEL, G. 1983. The LOCUS distributed operating system. In Proceedings of the 9th Symposium on Operating Systems Principles (SOSP) (Nov.). ACM SIGOPS Oper. Syst. Rev. 17, 5, 49-70.
[39]
WILKES, J. 1992. DataMesh research project, phase 1. In Proceedings of the USENIX File Systems Workshop (May). USENIX Assoc., Berkeley, Calif., 63-69.

Reviews

David Michael Bowen

Sometimes computer science moves forward in leaps as new ideas change the discipline. At other times it moves ahead in smaller steps, as ideas that have worked in one area are applied to others. The Zebra striped network file system is the result of one of these smaller steps. It combines two recent ideas, striped storage (the basis of RAID technology) and log-structured file systems, into a new approach for network file systems that promises to relieve many of the bottlenecks found in present designs.

In its usual implementation, disk striping has two disadvantages. As the degree of striping increases, the transfer size at which the array is most efficient grows larger than the average file size, and the aggregate transfer rate of the striped disks exceeds the bandwidth of a single server. Zebra avoids the second problem by spreading each stripe across multiple servers, while keeping the RAID practice of storing parity information for each stripe. It avoids the first problem by having each client gather all of its changes into a single in-memory log and then write that log to the servers in the large transfers that a striped system handles most efficiently. The penalty for this log-based approach is that the log must be cleaned as data in older stripes becomes obsolete.

The paper describes each component of the system: clients, storage servers, the file manager, and the stripe cleaner. It also compares Zebra's performance with that of the Berkeley Sprite file system and a standard NFS system running on the same hardware, and suggests some further improvements. I find the ideas presented here, and in Anderson et al. [1], the next step in this line of development, interesting, and I suspect that they may represent the future of distributed file systems. If you are interested in new developments and general trends in file systems, this paper is required reading.
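The cleaning cost mentioned above arises because overwriting or deleting data in a log leaves dead bytes behind in older stripes. The Python sketch below shows one possible cleaning policy. It assumes a greedy rule that reclaims any stripe whose live fraction falls below a threshold. This is a hypothetical policy, not necessarily the one Zebra's stripe cleaner uses, and `Stripe`, `StripeCleaner`, `record_write`, and `clean` are illustrative names. For brevity, the sketch ignores the capacity of the destination stripe as well as the parity and server I/O a real cleaner must perform.

```python
from dataclasses import dataclass, field


@dataclass
class Stripe:
    stripe_id: int
    capacity: int                                        # data bytes the stripe can hold
    live: dict[str, int] = field(default_factory=dict)   # block id -> live bytes

    def utilization(self) -> float:
        return sum(self.live.values()) / self.capacity


class StripeCleaner:
    """Hypothetical cleaner: tracks where the current copy of each block lives and
    reclaims stripes whose live fraction has dropped below a threshold."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.stripes: dict[int, Stripe] = {}
        self.block_map: dict[str, int] = {}   # block id -> stripe holding the current copy

    def record_write(self, block_id: str, size: int, stripe_id: int) -> None:
        """A new copy of block_id was appended to the log in stripe_id."""
        old = self.block_map.get(block_id)
        if old is not None and old in self.stripes:
            self.stripes[old].live.pop(block_id, None)   # the older copy is now garbage
        stripe = self.stripes.setdefault(stripe_id, Stripe(stripe_id, self.capacity))
        stripe.live[block_id] = size
        self.block_map[block_id] = stripe_id

    def clean(self, threshold: float, new_stripe_id: int) -> list[int]:
        """Rewrite the live blocks of underused stripes into new_stripe_id (the log
        head) and return the ids of the stripes that may now be reused."""
        victims = sorted(
            (s for s in self.stripes.values()
             if s.stripe_id != new_stripe_id and s.utilization() < threshold),
            key=Stripe.utilization,                      # emptiest stripes first
        )
        freed = []
        for s in victims:
            for block_id, size in list(s.live.items()):
                self.record_write(block_id, size, new_stripe_id)
            del self.stripes[s.stripe_id]
            freed.append(s.stripe_id)
        return freed


if __name__ == "__main__":
    cleaner = StripeCleaner(capacity=100)
    cleaner.record_write("a", 40, stripe_id=0)
    cleaner.record_write("b", 40, stripe_id=0)
    cleaner.record_write("c", 80, stripe_id=1)
    cleaner.record_write("b", 40, stripe_id=2)   # overwriting b leaves stripe 0 at 40% live
    print(cleaner.clean(threshold=0.5, new_stripe_id=2))  # prints [0]
```

Taking the emptiest stripes first minimizes the live data copied per reclaimed stripe. Log-structured file systems have also used policies that weigh a segment's age against its utilization, and a real cleaner could substitute such a rule in `clean`.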

Information

Published In

ACM Transactions on Computer Systems, Volume 13, Issue 3
Aug. 1995
106 pages
ISSN:0734-2071
EISSN:1557-7333
DOI:10.1145/210126

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in TOCS Volume 13, Issue 3

Author Tags

  1. RAID
  2. log-based striping
  3. log-structured file system
  4. parity computation

Qualifiers

  • Article

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 118
  • Downloads (last 6 weeks): 24
Reflects downloads up to 11 Dec 2024.

Citations

Cited By

View all
  • (2023) Oasis: Controlling Data Migration in Expansion of Object-based Storage Systems. ACM Transactions on Storage 19:1 (1-22). DOI: 10.1145/3568424. Online publication date: 19-Jan-2023
  • (2022) Bibliography. Storage Systems (641-693). DOI: 10.1016/B978-0-32-390796-5.00023-1. Online publication date: 2022
  • (2022) Redundant Arrays of Independent Disks - RAID. Storage Systems (269-336). DOI: 10.1016/B978-0-32-390796-5.00014-0. Online publication date: 2022
  • (2021) Toward Uncensorable, Anonymous and Private Access Over Satoshi Blockchains. Proceedings on Privacy Enhancing Technologies 2022:1 (207-226). DOI: 10.2478/popets-2022-0011. Online publication date: 20-Nov-2021
  • (2020) MAPX. Proceedings of the 18th USENIX Conference on File and Storage Technologies (1-12). DOI: 10.5555/3386691.3386693. Online publication date: 24-Feb-2020
  • (2019) Harmonia. Proceedings of the VLDB Endowment 13:3 (376-389). DOI: 10.14778/3368289.3368301. Online publication date: 1-Nov-2019
  • (2019) URSA. Proceedings of the Fourteenth EuroSys Conference 2019 (1-17). DOI: 10.1145/3302424.3303967. Online publication date: 25-Mar-2019
  • (2019) Leveraging Glocality for Fast Failure Recovery in Distributed RAM Storage. ACM Transactions on Storage 15:1 (1-24). DOI: 10.1145/3289604. Online publication date: 18-Feb-2019
  • (2018) RoVEr: Robust and Verifiable Erasure Code for Hadoop Distributed File Systems. 2018 27th International Conference on Computer Communication and Networks (ICCCN) (1-9). DOI: 10.1109/ICCCN.2018.8487406. Online publication date: Jul-2018
  • (2018) Block Placement in Distributed File Systems Based on Block Access Frequency. IEEE Access 6 (38411-38420). DOI: 10.1109/ACCESS.2018.2851571. Online publication date: 2018