Elsevier

Information Sciences

Volume 170, Issues 2–4, 25 February 2005, Pages 397-407

Reaching fault diagnosis agreement on an unreliable general network

https://doi.org/10.1016/j.ins.2004.03.011

Abstract

Recently, Siu et al. proposed the FDAMIX protocol for the fault diagnosis agreement (FDA) problem with mixed faults on the processors in a general network; however, FDAMIX cannot solve the FDA problem when the mixed faults are on the links. Therefore, in this study, a new protocol, the FDAL protocol, is introduced to solve the FDA problem with mixed faults on the links. FDAL detects and locates faulty links so that the unreliable general network can be reconfigured into a reliable one, which improves system performance and strengthens network integrity.

Introduction

Distributed system reliability has become more important with the rapid growth of the Internet. When designing a fault-tolerant distributed system, Byzantine agreement (BA) [2], [4], [5], [6], [7], [10], [11], [12], [13], [14] is one of the major considerations. In many situations, the fault-free processors in a distributed system must reach agreement before performing certain tasks [1], [2], [3]. For example, in a well-known form of this problem, the transaction commit problem [3], all data processors participating in a particular transaction must agree on whether to record the transaction results in the database or to discard them.

A similar issue, the fault diagnosis agreement (FDA) problem [5], [12], aims to ensure that every fault-free processor is able to detect/locate all faulty components within the network. That is, once FDA is achieved, each fault-free processor can identify all faulty components in the network and ignore their influence. The performance and integrity of the distributed system are thus guaranteed.

In one of many previous studies [11], the BA problem was discussed for a general network [5], [7], [10], [11] under a mixed fault model [5], [7], [10], [11], in which arbitrary faults and dormant faults may exist on both processors and links. For the FDA problem with mixed faults on the processors, the FDAMIX protocol [5] can be applied. However, FDAMIX cannot solve the FDA problem in a general network with mixed faults on the links.

The reliability of the network connection state is another important consideration in distributed system design. When the connection state is reliable, messages are transmitted correctly and on time, free of fault influences. Conversely, if the connection state is unreliable, message transmission is likely to be affected by faulty links: a message may be altered in transit or may fail to arrive on time.

Faults on links can be classified into three categories according to their symptoms: crash faults, stuck-at faults and arbitrary faults (also called Byzantine faults) [14]. A crash fault appears when a link is broken. A stuck-at fault exists when the message received over a link always holds a constant value. An arbitrary fault, the worst case, causes a link to exhibit unrestricted and arbitrary behavior. A fault-free link, by contrast, transmits messages correctly and on time. Crash and stuck-at faults are easy to detect when the protocol encodes each message with either the Non-Return-to-Zero code or the Manchester code [9], [14] before transmission; these faults are therefore called dormant faults. With an arbitrary fault, however, the behavior of the link is unpredictable.
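To make the distinction between dormant and arbitrary faults concrete, the following is a minimal sketch of how a receiver might use Manchester coding to notice a crashed or stuck-at link. It is an illustration only, not the encoding procedure of the paper: the function names, the list-of-levels signal representation, and the bit-to-level mapping (one common convention) are all assumptions.

```python
def manchester_encode(bits):
    """Encode each bit as two half-bit signal levels.

    Assumed convention: 0 -> (1, 0), 1 -> (0, 1), so every bit period
    contains exactly one mid-bit transition.
    """
    levels = []
    for bit in bits:
        levels.extend((0, 1) if bit else (1, 0))
    return levels


def manchester_decode(levels):
    """Return the decoded bits, or None if a dormant fault is suspected."""
    # Nothing arrived, or the frame is truncated: treat as a crash fault.
    if len(levels) == 0 or len(levels) % 2 != 0:
        return None
    bits = []
    for i in range(0, len(levels), 2):
        first, second = levels[i], levels[i + 1]
        # A constant level across a bit period has no transition:
        # treat as a stuck-at fault.
        if first == second:
            return None
        bits.append(1 if (first, second) == (0, 1) else 0)
    return bits


sent = manchester_encode([1, 0, 1, 1])
print(manchester_decode(sent))       # fault-free link
print(manchester_decode([1] * 8))    # stuck-at link
print(manchester_decode([]))         # crashed link
print(manchester_decode([1, 0]))     # arbitrary fault: valid code, wrong bit
```

The last call shows why arbitrary faults are harder: a link that rewrites a message into another well-formed code word passes this check, so encoding alone cannot expose it.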

To solve the FDA problem with mixed faults on the links, two new protocols, the fault diagnosis agreement on link (FDAL) and the virtual relay fault-tolerance channel (VRFC), are introduced in this study. They are designed to satisfy the following constraints:

  • Agreement:

    All fault-free processors must identify the common set of faulty links in the consensus reaching process.

  • Fairness:

    No faulty link is falsely detected as fault-free by any fault-free processor, and no fault-free link is falsely detected as faulty by any fault-free processor.
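The two constraints can be read as checks on the diagnoses produced by the fault-free processors. The sketch below is a hypothetical checker, not part of FDAL: the function `satisfies_fda`, the `link` helper, and the representation of a diagnosis as a set of links are assumptions made for illustration.

```python
def link(a, b):
    """Represent an undirected link as an unordered pair of processor ids."""
    return frozenset((a, b))


def satisfies_fda(diagnoses, truly_faulty):
    """Check the Agreement and Fairness constraints.

    diagnoses:    dict mapping each fault-free processor id to the set of
                  links it reports as faulty.
    truly_faulty: the set of links that are actually faulty.
    Returns the pair (agreement, fairness).
    """
    reported = list(diagnoses.values())
    # Agreement: every fault-free processor reports the same set.
    agreement = all(r == reported[0] for r in reported)
    # Fairness: no faulty link is missed (truly_faulty <= r) and no
    # fault-free link is blamed (r <= truly_faulty).
    fairness = all(truly_faulty <= r and r <= truly_faulty for r in reported)
    return agreement, fairness


faulty = {link(1, 2), link(2, 5)}
consistent = {p: set(faulty) for p in range(1, 8)}
incomplete = {p: {link(1, 2)} for p in range(1, 8)}

print(satisfies_fda(consistent, faulty))   # both constraints hold
print(satisfies_fda(incomplete, faulty))   # agreement without fairness
```

Under this reading, Fairness holding at every processor implies Agreement, while the converse does not hold: all processors can agree on a set that omits a faulty link.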


In the rest of this paper, the new protocols, FDAL and VRFC, will be introduced in detail in Section 2. An example executing these two new protocols is given in Section 3. Section 4 gives an analysis of the correctness of these protocols. The conclusion and discussion for our future work are presented in Section 5.

The proposed protocols

This section describes the proposed protocols, FDAL and VRFC, which solve the FDA problem with mixed faults on the links of a general network. The parameters used and the assumptions made are as follows:

  • Processors used in the underlying network are assumed to be fault-free. (This can be achieved using the FDAMIX protocol [5].)

  • Let þ be the set of all processors in the general network, and |þ|=n.

  • Each processor in the network can be uniquely identified.

  • Each processor in the

An example of executing VRFC and FDAL

The following is an example of executing FDAL and VRFC. Fig. 4(a) shows a general network whose processors have been made fault-free using FDAMIX. There are seven processors and four connections in the network. An arbitrary faulty link lies between processors P1 and P2, and a dormant faulty link lies between processors P2 and P5.

The initial value of each fault-free processor is the agreed-upon value obtained from GPBA (assumed here to be 1), as illustrated in Fig. 4(b).

In the messages

The correctness of the proposed protocols

The following corollaries and properties are used to establish the correctness of FDAL, which can detect/locate the arbitrary faulty links, La, and the dormant faulty links, Ld, in the general network, where c > 2La + Ld:
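The bound c > 2La + Ld can be checked with simple arithmetic. The helper below is a small sketch under the assumption that c, La and Ld are plain non-negative integers (c is not defined in this excerpt; it is taken here as a given parameter of the network, and La and Ld as the numbers of arbitrary and dormant faulty links). The function name and the sample values are hypothetical.

```python
def within_fault_bound(c, arbitrary_links, dormant_links):
    """Return True when c > 2*La + Ld holds for the given counts."""
    return c > 2 * arbitrary_links + dormant_links


# Hypothetical values: with c = 4, one arbitrary and one dormant
# faulty link satisfy the bound (4 > 2*1 + 1), but two arbitrary
# faulty links do not (4 > 2*2 + 0 is false).
print(within_fault_bound(4, 1, 1))
print(within_fault_bound(4, 2, 0))
```

The factor of 2 reflects that, in this bound, each arbitrary faulty link counts twice as much against c as a dormant one.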

Property 1

Any fault-free processor Pi can detect the dormant faulty links that are connected to Pi, where 1 ⩽ i ⩽ n.

Proof

The fault-free processor can detect dormant faults, if the protocol appropriately encodes a transmitted message using either the Non-Return-to-Zero code or the

Conclusion

With the recent popularity of distributed systems, their reliability has become increasingly important in research. At the same time, fault diagnosis has also become an important topic. For example, Siu et al. [11] proposed the GPBA protocol, which reaches Byzantine agreement in a general network with mixed faults on both processors and links. Another protocol proposed by the same authors, the FDAMIX, was later introduced to

References (14)

  • M. Fischer et al.

    A lower bound for the time to assure interactive consistency

    Information Processing Letters

    (1982)
  • K.Q. Yan et al.

    Consensus under unreliable transmission

    Information Processing Letters

    (1999)
  • A. Bar-Noy et al., Shifting gears: changing algorithms on the fly to expedite Byzantine agreement, in: Proc. Symposium...
  • M. Barborak et al.

    The consensus problem in fault-tolerant computing

    ACM Computing Surveys

    (1993)
  • D. Skeen et al.

    A formal model of crash recovery in a distributed system

    IEEE Transactions on Software Engineering

    (1983)
  • H.S. Hsiao et al.

    Reaching fault diagnosis agreement under a hybrid fault model

    IEEE Transactions on Computers

    (2000)
  • L. Lamport et al.

    The Byzantine generals problem

    ACM Transactions on Programming Languages and Systems

    (1982)
There are more references available in the full text version of this article.
