This dissertation presents and evaluates several new algorithms that improve the performance of distributed file systems by migrating or copying files as necessary between system nodes. The results are based on analysis of file reference patterns from three commercial installations, modeling, and trace driven simulation.
The first part of the dissertation is an exploratory analysis of how shared user files are referenced. We find that although few files are shared, they are opened frequently, and account for a large fraction of the I/O traffic to user files. The reference pattern to shared files is not easily characterized, and varies widely among files. A batch Poisson process with geometric batch size is determined to be the most appropriate model.
Based on the exploratory analysis, we developed several algorithms for file migration and replication. The algorithms evaluated include those based on our file reference pattern analysis, as well as simple strategies such as static placement and movement on reference, and optimal look-ahead migration and placement. We found that only a few files should be migrated or replicated, but replication or migration can substantially reduce the network traffic (up to 63% for replication and 36% for migration, relative to static placement).
A policy based on a batch Poisson process with geometric batch size has the best performance when replication is not allowed. It uses as decision variables the fraction of a file accessed per open, the number of references from a user, and the number of changes in locality.
By replicating files, the network traffic can be reduced further compared to migration alone (up to 42%). Whether the additional copies should be invalidated or updated when the file is updated depends on the installation and the rules for placing users at nodes. The algorithms with the best performance use the average reference rate, the number of consecutive opens in update mode, and the time since the node started using the file as the decision variables. By comparing our realizable algorithms with optimal unrealizable algorithms, we show that it is unlikely that other migration or replication algorithms can achieve a substantially better performance.
Cited By
- Hurley R and Yeap S (1996). File Migration and File Replication, IEEE Transactions on Parallel and Distributed Systems, 7:6, (578-586), Online publication date: 1-Jun-1996.
- Ramakrishnan K, Biswas P and Karedla R Analysis of file I/O traces in commercial computing environments Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems, (78-90)
- Sandhu H and Zhou S Cluster-based file replication in large-scale distributed systems Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems, (91-102)
- Ramakrishnan K, Biswas P and Karedla R (2019). Analysis of file I/O traces in commercial computing environments, ACM SIGMETRICS Performance Evaluation Review, 20:1, (78-90), Online publication date: 1-Jun-1992.
- Sandhu H and Zhou S (2019). Cluster-based file replication in large-scale distributed systems, ACM SIGMETRICS Performance Evaluation Review, 20:1, (91-102), Online publication date: 1-Jun-1992.
Recommendations
File Migration and File Replication: A Symbiotic Relationship
Much of the past research on file migration and file replication has examined these two resource management strategies in isolation or in an environment where they do not work together. We establish through simulation that these two strategies can be ...
A Distributed Algorithm for Performance Improvement Through File Replication, File Migration, and Process Migration
The author presents a distributed algorithm that considers the number of read and write accesses to files for every process type, the number of processes and their demands on system resources, the utilization of bottlenecks on all machines, and file ...