research-article

Recoverable Distributed Shared Virtual Memory

Published: 01 April 1990

Abstract

This paper examines rollback recovery in distributed shared virtual memory environments, in which the shared memory is implemented in software on a loosely coupled distributed multicomputer system. It presents a user-transparent checkpointing recovery scheme and a new twin-page disk storage management technique for implementing recoverable distributed shared virtual memory. The checkpointing scheme can be integrated with the memory coherence protocol that manages the shared virtual memory. The twin-page disk design allows checkpointing to proceed incrementally, without an explicit undo at the time of recovery. Recoverable distributed shared virtual memory allows the system to restart computation from a checkpoint without a global restart.
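The twin-page idea can be illustrated with a small, single-node sketch. It rests on assumptions of our own, not necessarily the paper's exact design: each virtual page owns two disk slots, each tagged with the checkpoint interval in which it was written; a page-out never overwrites the twin that belongs to the last committed checkpoint; and committing a checkpoint amounts to atomically advancing one counter once the dirty pages have been flushed. The names `TwinPageDisk`, `write_page`, `checkpoint`, and `recover` are hypothetical, and the coherence-protocol integration and multi-node coordination are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Slot:
    epoch: int = -1                # interval in which this copy was written; -1 = empty
    data: Optional[bytes] = None


@dataclass
class TwinPageDisk:
    """Hypothetical twin-page store: two disk slots per virtual page."""
    committed: int = 0             # number of the last committed checkpoint
    slots: Dict[int, List[Slot]] = field(default_factory=dict)

    def _stable(self, page: int) -> Optional[int]:
        """Index of the newest copy that belongs to the committed checkpoint."""
        best = None
        for i, s in enumerate(self.slots.get(page, [])):
            if s.data is not None and s.epoch <= self.committed:
                if best is None or s.epoch > self.slots[page][best].epoch:
                    best = i
        return best

    def write_page(self, page: int, data: bytes) -> None:
        """Page-out during the current interval; the stable twin is never touched."""
        twins = self.slots.setdefault(page, [Slot(), Slot()])
        stable = self._stable(page)
        target = 0 if stable is None else 1 - stable
        twins[target] = Slot(self.committed + 1, data)

    def checkpoint(self) -> None:
        """Commit point. Assumes the caller has already flushed dirty pages via
        write_page; advancing this single counter is the atomic switch."""
        self.committed += 1

    def recover(self) -> Dict[int, bytes]:
        """Roll back to the last committed checkpoint without undoing any write:
        uncommitted twins are discarded and the stable twins are read back."""
        state: Dict[int, bytes] = {}
        for page, twins in self.slots.items():
            for i, s in enumerate(twins):
                if s.epoch > self.committed:
                    twins[i] = Slot()      # drop copy from the unfinished interval
            stable = self._stable(page)
            if stable is not None:
                state[page] = twins[stable].data
        return state


if __name__ == "__main__":
    disk = TwinPageDisk()
    disk.write_page(7, b"v1")
    disk.checkpoint()              # checkpoint 1 holds page 7 = b"v1"
    disk.write_page(7, b"v2")      # lands in the other twin
    disk.write_page(8, b"new")     # page 8 was never checkpointed
    print(disk.recover())          # failure: roll back to checkpoint 1
```

Because the committed copy of every page is never overwritten between checkpoints, recovery only throws away the uncommitted twins rather than restoring old data, which mirrors the "no explicit undo" property described above. Discarding those twins during `recover` also matters: otherwise a later checkpoint could promote a copy written before the failure.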

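Cited By

  • (2017) Classification of Resilience Techniques Against Functional Errors at Higher Abstraction Layers of Digital Systems. ACM Computing Surveys 50:4 (1-38). DOI: 10.1145/3092699. Online publication date: 4-Oct-2017.
  • (2004) Quantifying rollback propagation in distributed checkpointing. Journal of Parallel and Distributed Computing 64:3 (370-384). DOI: 10.1016/j.jpdc.2004.01.003. Online publication date: 1-Mar-2004.
  • (2002) A survey of rollback-recovery protocols in message-passing systems. ACM Computing Surveys 34:3 (375-408). DOI: 10.1145/568522.568525. Online publication date: 1-Sep-2002.
  • (2002) An efficient causal logging scheme for recoverable distributed shared memory systems. Parallel Computing 28:11 (1549-1572). DOI: 10.1016/S0167-8191(02)00165-5. Online publication date: 1-Nov-2002.
  • (2000) An Efficient and Scalable Approach for Implementing Fault-Tolerant DSM Architectures. IEEE Transactions on Computers 49:5 (414-430). DOI: 10.1109/12.859537. Online publication date: 1-May-2000.
  • (2000) A Low Overhead Logging Scheme for Fast Recovery in Distributed Shared Memory Systems. The Journal of Supercomputing 15:3 (295-320). DOI: 10.1023/A:1008116511402. Online publication date: 1-Mar-2000.
  • (1999) Logging and Recovery in Adaptive Software Distributed Shared Memory Systems. Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems. DOI: 10.5555/829524.831044. Online publication date: 18-Oct-1999.
  • (1997) A Survey of Recoverable Distributed Shared Virtual Memory Systems. IEEE Transactions on Parallel and Distributed Systems 8:9 (959-969). DOI: 10.1109/71.615441. Online publication date: 1-Sep-1997.
  • (1997) Consistent Global Checkpoints that Contain a Given Set of Local Checkpoints. IEEE Transactions on Computers 46:4 (456-468). DOI: 10.1109/12.588059. Online publication date: 1-Apr-1997.
  • (1996) Transparent fault tolerance for parallel applications on networks of workstations. Proceedings of the 1996 Annual Conference on USENIX Annual Technical Conference (27). DOI: 10.5555/1268299.1268326. Online publication date: 22-Jan-1996.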

Information

Published In

IEEE Transactions on Computers, Volume 39, Issue 4
April 1990
188 pages
ISSN:0018-9340

Publisher

IEEE Computer Society

United States

Author Tags

  1. distributed processing
  2. distributed shared virtual environments
  3. loosely coupled distributed multicomputer system
  4. memory coherence protocol
  5. rollback recovery
  6. storage management
  7. twin-page disk storage management technique
  8. user-transparent checkpointing recovery scheme
  9. virtual memory
  10. virtual storage
