Copyright © 1995 Published by Elsevier Science Inc.
Snooping fault-tolerant distributed shared memories
Available online 4 February 2000.
Abstract
Distributed-memory systems that have no physically shared memory can be programmed with a shared-memory model by simulating a virtual shared address space. These distributed shared memory (DSM) systems can be easier to program than systems that use a message-passing model. However, as the number of processors implementing the DSM grows, the probability that at least one of them fails also grows. This article presents a set of fault-tolerant DSM (FTDSM) algorithms. The algorithms are derived from a system consisting of a set of processors connected by a broadcast network with one special processor, the snooper, that monitors network traffic. The snooper keeps a backup of the current state of the shared memory and can respond on behalf of failed processors. Adding the snooper improves the reliability of the DSM at a lower cost than making each of the other processors fault tolerant. It is also shown how different degrees of application-level fault tolerance can be incorporated into the FTDSM protocol to allow the recovery of application processes. The snooping FTDSM is extended to include multiple snoopers; then an integrated snooper FTDSM is developed that is not restricted to a broadcast network.
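The snooper idea described in the abstract can be illustrated with a small sketch. This is not the article's algorithm: the abstract gives no protocol details, so the class names (`BroadcastNetwork`, `Processor`, `Snooper`), the page-granularity writes, and the `dsm_read` fallback rule are all assumptions made for illustration. The sketch only shows the general shape of the design: every write crosses a broadcast medium, a snooper mirrors what it observes, and a read falls back to the snooper's backup when the owning processor has failed.

```python
from dataclasses import dataclass


@dataclass
class WriteMsg:
    """Hypothetical broadcast message announcing a write to a shared page."""
    page: int
    value: str
    sender: str


class BroadcastNetwork:
    """Toy broadcast medium: every attached listener sees every message."""

    def __init__(self):
        self.listeners = []

    def attach(self, listener):
        self.listeners.append(listener)

    def broadcast(self, msg):
        for listener in self.listeners:
            listener.observe(msg)


class Snooper:
    """Assumed snooper: mirrors shared-memory state from observed traffic."""

    def __init__(self):
        self.backup = {}

    def observe(self, msg):
        # Keep the most recent value seen for each page.
        self.backup[msg.page] = msg.value

    def respond_for_failed(self, page):
        # Answer a request on behalf of a failed processor.
        return self.backup.get(page)


class Processor:
    """Ordinary (non-fault-tolerant) processor holding some shared pages."""

    def __init__(self, name, network):
        self.name = name
        self.network = network
        self.pages = {}
        self.failed = False

    def write(self, page, value):
        self.pages[page] = value
        # The write is visible on the network, so the snooper can record it.
        self.network.broadcast(WriteMsg(page, value, self.name))

    def read(self, page):
        if self.failed:
            raise RuntimeError(f"processor {self.name} has failed")
        return self.pages[page]


def dsm_read(owner, snooper, page):
    """Assumed fallback rule: ask the owner, else the snooper's backup."""
    if owner.failed:
        return snooper.respond_for_failed(page)
    return owner.read(page)


def demo():
    network = BroadcastNetwork()
    snooper = Snooper()
    network.attach(snooper)

    p0 = Processor("p0", network)
    p0.write(7, "hello")

    before = dsm_read(p0, snooper, 7)  # served by the owner
    p0.failed = True                   # simulate a crash of p0
    after = dsm_read(p0, snooper, 7)   # served from the snooper's backup
    return before, after
```

In this sketch only the snooper needs to be reliable for the page to survive the crash of `p0`, which mirrors the abstract's cost argument. Failure detection, consistency between concurrent writers, and the multiple-snooper and non-broadcast variants mentioned in the abstract are deliberately left out.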












