DOI: 10.1145/1048935.1050203
Article

An Efficient Data Location Protocol for Self-organizing Storage Clusters

Published: 15 November 2003

Abstract

Component additions and failures are common for large-scale storage clusters in production environments. To improve availability and manageability, we investigate and compare data location schemes for a large self-organizing storage cluster that can quickly adapt to the additions or departures of storage nodes. We further present an efficient location scheme that differentiates between small and large file blocks for reduced management overhead compared to uniform strategies. In our protocol, small blocks, which are typically in large quantities, are placed through consistent hashing. Large blocks, much fewer in practice, are placed through a usage-based policy, and their locations are tracked by Bloom filters. The proposed scheme results in improved storage utilization even with non-uniform cluster nodes. To achieve high scalability and fault resilience, this protocol is fully distributed, relies only on soft states, and supports data replication. We demonstrate the effectiveness and efficiency of this protocol through trace-driven simulation.
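
The split between small and large blocks described above can be illustrated with a minimal Python sketch. It is not the authors' protocol: the names `SMALL_BLOCK_LIMIT`, `HashRing`, `BloomFilter`, and `Cluster` are hypothetical, the 64 KiB threshold is an arbitrary assumption, and "pick the least-utilized node" is only a stand-in for the usage-based policy the abstract mentions. Replication and soft-state exchange between nodes are left out.

```python
import bisect
import hashlib

SMALL_BLOCK_LIMIT = 64 * 1024  # assumed size threshold, not taken from the paper


def _hash(key: str, seed: int = 0) -> int:
    """Return a 64-bit hash of key; different seeds give independent hashes."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent hashing ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        self._points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    def locate(self, block_id: str) -> str:
        # The owner is the first ring point clockwise from the block's hash.
        i = bisect.bisect_right(self._keys, _hash(block_id)) % len(self._keys)
        return self._points[i][1]


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, bits=8192, hashes=4):
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray(bits // 8)

    def _positions(self, key: str):
        return [_hash(key, seed) % self.bits for seed in range(1, self.hashes + 1)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.array[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.array[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class Cluster:
    """Places small blocks by hashing and large blocks by current utilization."""

    def __init__(self, capacities):
        self.capacity = dict(capacities)
        self.used = {node: 0 for node in self.capacity}
        self.ring = HashRing(list(self.capacity))
        self.filters = {node: BloomFilter() for node in self.capacity}

    def place(self, block_id: str, size: int) -> str:
        if size <= SMALL_BLOCK_LIMIT:
            node = self.ring.locate(block_id)
        else:
            # Stand-in usage-based policy: choose the least-utilized node,
            # then record the block in that node's Bloom filter.
            node = min(self.used, key=lambda n: self.used[n] / self.capacity[n])
            self.filters[node].add(block_id)
        self.used[node] += size
        return node

    def lookup_small(self, block_id: str) -> str:
        return self.ring.locate(block_id)

    def lookup_large(self, block_id: str):
        # Candidate nodes only; a false positive must be resolved by asking the node.
        return [node for node, f in self.filters.items() if block_id in f]
```

In this sketch a small block needs no per-block location state, because any node can recompute its owner from the ring. A large block costs a few bits in one filter, and a lookup returns a short candidate list that always contains the true holder.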

Information

Published In

SC '03: Proceedings of the 2003 ACM/IEEE conference on Supercomputing
November 2003
859 pages
ISBN:1581136951
DOI:10.1145/1048935
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

SC '03

Acceptance Rates

SC '03 Paper Acceptance Rate 60 of 207 submissions, 29%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 3
  • Downloads (Last 6 weeks): 0

Reflects downloads up to 02 Mar 2025

Cited By

  • (2024) An Improved Consistent Hashing-Based Data Indexing Method for Distributed Photovoltaic Stations on Highways. 2024 IEEE 22nd International Conference on Industrial Informatics (INDIN), pages 1-7. DOI: 10.1109/INDIN58382.2024.10774463. Online publication date: 18-Aug-2024
  • (2024) A Scalable, Fault Resilient and Balanced Storage Architecture for Cyber-Physical Systems. 2024 IEEE 19th Conference on Industrial Electronics and Applications (ICIEA), pages 1-6. DOI: 10.1109/ICIEA61579.2024.10665062. Online publication date: 5-Aug-2024
  • (2022) Whether Cache Advertisement is Necessary for Content Providers in CDNs? Model and Analysis. 2022 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), pages 708-714. DOI: 10.1109/ISPA-BDCloud-SocialCom-SustainCom57177.2022.00096. Online publication date: Dec-2022
  • (2014) A Lightweight Data Location Service for Nondeterministic Exascale Storage Systems. ACM Transactions on Storage 10:3, pages 1-22. DOI: 10.1145/2629451. Online publication date: 7-Aug-2014
  • (2013) ALDM: Adaptive Loading Data Migration in Distributed File Systems. IEEE Transactions on Magnetics 49:6, pages 2645-2652. DOI: 10.1109/TMAG.2013.2251616. Online publication date: Jun-2013
  • (2013) Write bandwidth optimization of online Erasure Code based cluster file system. 2013 IEEE International Conference on Cluster Computing (CLUSTER), pages 1-8. DOI: 10.1109/CLUSTER.2013.6702661. Online publication date: Sep-2013
  • (2009) Distributed Metadata Management Based on Hierarchical Bloom Filters in Data Grid. Proceedings of the 2009 Fourth ChinaGrid Annual Conference, pages 95-101. DOI: 10.1109/ChinaGrid.2009.15. Online publication date: 21-Aug-2009
  • (2008) HBA. IEEE Transactions on Parallel and Distributed Systems 19:6, pages 750-763. DOI: 10.1109/TPDS.2007.70788. Online publication date: 1-Jun-2008
  • (2008) RACE. IEEE Transactions on Computers 57:1, pages 25-40. DOI: 10.1109/TC.2007.70788. Online publication date: 1-Jan-2008
  • (2008) A Novel Network Storage Scheme: Intelligent Network Disk Storage Cluster. 2008 IEEE International Conference on Networking, Sensing and Control, pages 142-147. DOI: 10.1109/ICNSC.2008.4525199. Online publication date: Apr-2008
