Abstract

A sequenced process of fault detection followed by dissemination of the decision made at each node characterizes the sustained operation of a fault-tolerant wireless image sensor network (WISN). This paper presents a distributed self-fault diagnosis model for WISNs in which fault diagnosis is achieved by disseminating the decision made at each node. An architecture for fault-tolerant wireless image sensor nodes is also presented. Simulation results show that sensor nodes with hard and soft faults are identified with high accuracy over a wide range of fault rates. Both the time and message complexity of the proposed algorithm are 𝑂(𝑛) for an 𝑛-node WISN.

1. Introduction

WISNs are emerging as a promising solution for a variety of remote sensing applications such as battlefield surveillance, environmental monitoring, intruder detection, intelligent infrastructure monitoring, and scientific data collection [1]. Irrespective of their purpose, all sensor networks share requirements for energy efficiency, scalability, and fault tolerance, and these requirements are particularly crucial in image sensor networks. Two issues must be addressed for the sustained operation of a WISN: (1) image sensor nodes may be deployed in unattended and possibly hostile environments, which increases the probability of node failure, and (2) unlike conventional sensor nodes, image sensor nodes generate large amounts of data that are routed to the sink node. Erroneous data generated by faulty sensor nodes must therefore be prevented from entering the network so that bandwidth and energy are used effectively. These issues motivate the exploration of distributed self-fault diagnosis processes for WISNs.

In this work, a distributed diagnosis algorithm is proposed that detects both hard and soft faults in the network. Each sensor node makes a decision based on a comparison between its own reading and the readings of its 1-hop neighbors. A sensor node is detected as fault-free if its reading agrees with the readings of more than $T_h$ neighbors, where $T_h$ is a threshold. A timeout mechanism detects hard faults: a neighbor that does not report is declared hard faulty. All local diagnostic information is finally disseminated in the network so that each node has a global view of the network fault status, that is, each fault-free node correctly diagnoses the state of all nodes in the system. Local diagnostics are disseminated over a spanning tree (ST) that spans all fault-free sensor nodes.

The proposed image sensor node architecture (refer to Figure 1) is simple and can be implemented with limited additional hardware complexity by extending the architectures proposed in [2, 3]. Each block is subject to failure, which in turn results in system failure. A node is detected as soft faulty when its CMOS camera, image processing module, or embedded processor is faulty. A node is detected as hard faulty for any of the following reasons: (i) its communication subsystem is faulty, (ii) its battery is drained, or (iii) the node is completely damaged.

The process of local detection and global diagnosis from a given fault instance is a multifaceted problem. The main contributions of this paper are as follows.
(1) It proposes an architecture for image sensor nodes in a fault-tolerant WISN.
(2) It identifies sensor nodes with hard and soft faults with high accuracy over a wide range of fault rates while maintaining low time and message complexity.

The remainder of the paper is organized as follows. Section 2 presents related work. Section 3 presents the system model. The distributed diagnosis scheme is described in Section 4. The performance of the proposed work is evaluated in Section 5, and Section 6 concludes the paper and outlines future work.

2. Related Work

System-level fault diagnosis was introduced by Preparata et al. in 1967 [4] as a technique for diagnosing faults in wired interconnected systems. Comparison-based diagnosis is an effective approach to system-level fault diagnosis. The first comparison-based models, proposed by Malek [5] (the asymmetric comparison model) and by Chwa and Hakimi [6] (the symmetric comparison model), assume the existence of a central arbiter that gathers the comparison outcomes. This comparison syndrome is then used to diagnose the system. Previously developed distributed diagnosis algorithms were designed for wired networks [4–10] and are hence not well suited for wireless networks.

The problem of fault detection and diagnosis in wireless sensor networks has been studied extensively in the literature [11–17]. The problem of identifying faulty (crashed) nodes in WSNs is studied in [11], which proposes the WINdiag diagnosis protocol; WINdiag creates an ST for the dissemination of diagnostic information. The authors of [12] propose a fault-tolerant detection scheme that explicitly introduces the sensor fault probability into the optimal event detection process, where the optimal detection error decreases exponentially as the neighborhood size increases. Elhadef et al. propose a distributed fault identification protocol called Dynamic-DSDP for MANETs, which uses an ST and a gossip-style dissemination strategy [13]. In [14], a localized fault diagnosis model for WSNs is proposed that executes in tree-like networks; it is based on local comparisons of sensed data and dissemination of the test results to the remaining sensors.

In [15], the authors present a distributed fault detection model for wireless sensor networks in which each sensor node identifies its own state based on local comparisons of sensed data against thresholds and on dissemination of the test results. Krishnamachari and Iyengar present a Bayesian fault recognition model to solve the fault-event disambiguation problem in sensor networks [16]. A distributed fault detection scheme for sensor networks is proposed in [17]. It uses local comparisons with a modified majority voting, where each sensor node makes a decision based on comparisons between its own sensed data and its neighbors' data while considering the confidence levels of its neighbors.

Most of the existing literature addresses the fault detection and diagnosis problem in WSNs by considering sensor nodes such as temperature, humidity, or pressure sensors. To the authors' knowledge, there has been little work on the design of a fault diagnosis model for WISNs. Although there is a considerable amount of research on fault detection and diagnosis in WSNs, current approaches may not be suitable for WISNs because of the associated processing and communication costs. Czarlinska and Kundur [18] have investigated the event acquisition properties of WISNs; the techniques they consider include lightweight image processing, decisions from 𝑛 sensors with or without cluster head fault, and attack detection. In [19], the authors investigate the problem of image transport over error-prone wireless sensor networks, using a two-state Markov model of node transitions between an on and an off state; however, that work does not investigate any node failure detection scheme. In [20], an improved distributed fault detection scheme is proposed that achieves better detection accuracy but requires more message exchanges and is thus less energy efficient. In [21], the authors propose FIND, a method to detect nodes with data faults. In that work, nodes are ranked based on their sensor readings as well as their physical distances from the event, and a node is considered faulty if there is a significant mismatch between its sensor data rank and its distance rank.

It is worth discussing why a fault detection model for image sensor nodes is indispensable. First, image data require transmission bandwidth that is orders of magnitude higher than that supported by currently available sensors. Second, image compression models require complex hardware and make the energy consumed by computation comparable to the energy dissipated in communication. If a faulty image sensor node is allowed to participate in network activity, the data it generates will be routed to the sink node, and every intermediate node will dissipate energy relaying this faulty information. At a high rate of node failure, this severely shortens network lifetime and wastes network bandwidth.

3. System Model

The proposed model considers a densely deployed wireless sensor network that includes camera-equipped nodes. It is assumed that 𝑛 sensor nodes are nonuniformly distributed in a square area of side 𝐿, which is much larger than the communication range of the sensors. Every camera-equipped node is a full-function device (FFD). A node responds to an image query by capturing a raw image within its sensing area, compressing it, and applying a forward error correcting (FEC) code before transmitting it; this is the general process of image transport in a WISN.

The proposed model considers both hard and soft faults [22]. In a hard-fault situation, the sensor node is unable to communicate with the rest of the network, whereas a node with a soft fault continues to operate and communicate, but with altered behavior. These malfunctioning (soft-faulty) sensors could participate in network activities since they are still capable of routing information. The proposed model assumes that the sensor fault probability $p$ is uncorrelated and symmetric, that is,

$$P(S = x \mid A = \neg x) = P(S = \neg x \mid A = x) = p, \qquad (1)$$

where $S$ is the image data sensed by the sensor node and $A$ is the actual image data.

3.1. Architecture of Proposed Wireless Image Sensor Nodes

In this section, the architecture of the proposed image sensor nodes (Figure 1) is described in detail.

CMOS image sensors have received increasing attention over the last few decades because their performance is very promising compared with that of CCDs [2, 3]. However, remote and dangerous environments put more stress on the image sensing system (from radiation, heat, or pressure), possibly leading to pixel failures while making the replacement of faulty systems difficult. The fault-tolerant architecture for CMOS cameras in [23], which effectively combines hardware redundancy in the active pixel sensor (APS) cells with software correction techniques, can be adapted for this purpose. However, this architecture can tolerate pixel failures only up to a certain pixel failure rate (PFrate), beyond which the quality reduction (QR) of the corrected image may not be tolerable and the CMOS camera may be detected as faulty.

Uncompressed raw image data require excessive bandwidth in a multihop wireless environment. Conventional image compression models [24] are not suitable for resource-constrained wireless sensor networks because they require complex hardware and make the energy consumed by computation comparable to the energy dissipated in communication. The proposed architecture therefore uses the compression technique suggested in [25].

Forward error correction coding is required to achieve reliable transmission. The proposed architecture uses Reed-Solomon (RS) codes to detect and correct transmission errors; the coding redundancy determines the error correction capability of an RS code. A self-checking RS encoder [26] is used in the proposed architecture. As suggested in [3], wireless connections to other motes in the network can be established through a Texas Instruments CC2420 2.4 GHz IEEE 802.15.4/ZigBee-ready RF transceiver. Each ZigBee device maintains information about the devices located within its transmission range in a table called the neighbor table, $N(i)$. As suggested by the authors of [2], Samsung's S3C44B0X is adopted as the embedded processor of the image sensor node.

4. Distributed Fault Diagnosis Scheme

This section describes the novel model for energy-efficient diagnosis of WISNs. The proposed diagnosis scheme has two main phases: (i) detection phase and (ii) dissemination phase.

4.1. The Detection Phase

In this phase, the node enters normal mode (the S3C44B0X has four main modes: normal, slow, idle, and stop). Normal mode supplies clocks to the CPU as well as to all peripherals of the S3C44B0X. The CPU wakes up the image sensor and the image processing module from power-down mode, and the image sensor starts to capture an image. Despite the fault-tolerant architecture described in Section 3.1, an image produced by the image sensor may not be acceptable if the pixel failure rate is high. Thus, the CPU calculates the quality reduction ($QR$) of the corrected image using the methods suggested in [23] and decides whether or not to discard the image reading by comparing $QR$ with a threshold $I_{th}$. The embedded processor sets $F_{state}$ to soft faulty if $QR \geq I_{th}$. The fault status of the RS encoder in the proposed architecture is mapped as

$$RS_{status} = \begin{cases} 0 & \text{if } PC_{out} = 00 \text{ or } 11, \\ 1 & \text{otherwise}, \end{cases} \qquad (2)$$

where $PC_{out}$ is the output of the parity checker [26]. Using (2), the embedded processor sets $F_{state}$ to soft faulty when $RS_{status} = 1$.
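
The two local self-tests above can be expressed compactly. The following is a minimal Python sketch, assuming $PC_{out}$ is available as a two-character bit string and that $QR$ and $I_{th}$ are given on the same scale (here as fractions); the function names `rs_status` and `local_self_test` are hypothetical, and the example values are illustrative only.

```python
def rs_status(pc_out: str) -> int:
    """Map the parity-checker output PC_out to RS_status, following (2)."""
    if pc_out not in ("00", "01", "10", "11"):
        raise ValueError(f"unexpected parity-checker output: {pc_out!r}")
    # Per (2): outputs 00 and 11 give RS_status = 0; 01 and 10 give RS_status = 1.
    return 0 if pc_out in ("00", "11") else 1


def local_self_test(qr: float, i_th: float, pc_out: str) -> str:
    """Self-assessed state from the image-quality and RS-encoder checks."""
    if qr >= i_th or rs_status(pc_out) == 1:
        return "soft faulty"
    return "fault-free"


print(local_self_test(qr=0.12, i_th=0.30, pc_out="11"))  # fault-free
print(local_self_test(qr=0.12, i_th=0.30, pc_out="10"))  # soft faulty
```

Either check alone is enough to mark the node soft faulty, which mirrors the "or" structure of step 10 in Algorithm 1.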

The image processing module fetches the 8×8 test image stored in shared memory. The test image is processed in the processing module, and the generated coded bit stream is sent to the embedded processor. The processed image is then packed into the diagnosis packet format required by the network protocol. The CPU configures the CC2420 into transmission mode, the packets are broadcast by the CC2420, and the node returns to the receive state. For each fault-free sensor node, its neighboring fault-free sensor nodes broadcast similar coded information. Let $v_i$ be a neighbor of $v_j$, and let $C_i$ denote the coded information at node $v_i$. Node $v_i$ agrees with $v_j$ only when the Hamming distance $H_{ij}$ between $C_i$ and $C_j$ satisfies $H_{ij} \leq \delta$, where $H_{ij}$ is the number of ones in $C_i \oplus C_j$ and $\delta$ is the error correction capability of the Reed-Solomon decoder; for RS$(n, r)$ with $s$-bit symbols, $\delta = \lfloor (n - r)/2 \rfloor$. An arbitrary node $v_i$ receives the sensor readings from its neighboring nodes and forms a set $\{E\} \subset N(i)$ of nodes with a similar reading $S$. Node $v_i$ then compares this with its own reading $S_i$ and makes a decision on the basis of agreement and disagreement. In this phase, each sensor node decides whether or not to discard its own sensor reading in the face of the evidence $|\{E\}|$, $QR$, and $PC_{out}$, using the threshold $T_h = 0.5(N - 1)$, where $N$ is the number of 1-hop neighbors of $v_i$ (see the Appendix). A formal description of this phase is presented in Algorithm 1.
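
The agreement test can be sketched as follows, assuming the coded test images are byte strings of equal length. The helper names are hypothetical, and `agreeing_neighbors` takes $\{E\}$ to be the neighbors whose streams agree with $v_i$'s own stream, which is one reading of the description above. The value `delta=4` in the example is purely illustrative.

```python
def hamming_distance(c_i: bytes, c_j: bytes) -> int:
    """H_ij: number of ones in C_i XOR C_j."""
    if len(c_i) != len(c_j):
        raise ValueError("coded streams must have equal length")
    return sum(bin(a ^ b).count("1") for a, b in zip(c_i, c_j))


def rs_capability(n: int, r: int) -> int:
    """delta = floor((n - r) / 2) for an RS(n, r) code."""
    return (n - r) // 2


def agreeing_neighbors(own: bytes, received: dict, delta: int) -> set:
    """Neighbors v_j whose coded stream satisfies H_ij <= delta."""
    return {v for v, c in received.items() if hamming_distance(own, c) <= delta}


own = bytes([0b0000_0000, 0b1111_0000])
received = {"v1": bytes([0b0000_0001, 0b1111_0000]),  # 1 bit differs
            "v2": bytes([0b1111_1111, 0b0000_1111])}  # 16 bits differ
print(agreeing_neighbors(own, received, delta=4))  # {'v1'}
```

Comparing bit streams with XOR and a population count keeps the per-neighbor test cheap, which matters on a node that must repeat it for every entry of $N(i)$.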

Algorithm 1: Detection phase at node $v_i$.

(1) Obtain the sensor reading (image).
(2) Evaluate $QR_i$ and $RS_{status,i}$.
(3) Broadcast the coded test image $S_i$.
(4) Set timer $T_{out}$.
(5) Obtain the sensor readings of the 1-hop neighbors $N(i)$.
(6) if $T_{out}$ = true then
(7)     Declare the unreported nodes $v_j \in N(i)$ as hard faulty, i.e., $F_{state,j} \leftarrow$ hard faulty.
(8) end if
(9) Determine $\{E\}$, the set of 1-hop neighbors that report identical sensed data $S$.
(10) if ($S = S_i$ and $|\{E\}| \leq T_h$) or $QR_i \geq I_{th}$ or $RS_{status,i} = 1$ then
(11)     $F_{state,i} \leftarrow$ soft faulty.
(12) end if
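
Putting the pieces together, Algorithm 1 at a single node might look like the Python sketch below. It assumes that the reports received before $T_{out}$ expires are already collected in a dictionary, that $N$ in $T_h = 0.5(N - 1)$ counts every entry of the neighbor table, and that step 10 means "too few neighbors agree with $v_i$'s own reading". The names are hypothetical, and radio and timer handling are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    own_state: str                                  # "fault-free" or "soft faulty"
    hard_faulty: set = field(default_factory=set)   # neighbors that never reported


def hamming(a: bytes, b: bytes) -> int:
    # Assumes equal-length coded streams.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def detection_phase(own: bytes, neighbor_table: set, received: dict,
                    qr: float, i_th: float, rs_status: int, delta: int) -> Decision:
    """Sketch of Algorithm 1 at node v_i (steps 6-12).

    neighbor_table: IDs in N(i).
    received: neighbor ID -> coded test image that arrived before T_out expired.
    """
    # Steps 6-8: neighbors with no report when T_out fires are hard faulty.
    hard = {v for v in neighbor_table if v not in received}

    # Step 9: neighbors whose coded stream agrees with ours (H_ij <= delta).
    agreeing = {v for v, c in received.items() if hamming(own, c) <= delta}

    # T_h = 0.5 (N - 1), with N taken as the size of the neighbor table.
    t_h = 0.5 * (len(neighbor_table) - 1)

    # Steps 10-12: too little agreement, poor image quality, or an encoder fault.
    if len(agreeing) <= t_h or qr >= i_th or rs_status == 1:
        return Decision("soft faulty", hard)
    return Decision("fault-free", hard)


d = detection_phase(b"\x00", {"a", "b", "c"}, {"a": b"\x00", "b": b"\x01"},
                    qr=0.1, i_th=0.3, rs_status=0, delta=1)
print(d.own_state, sorted(d.hard_faulty))  # fault-free ['c']
```

Here two of three neighbors agree, which exceeds $T_h = 1$, so the node keeps itself fault-free while recording the silent neighbor `c` as hard faulty.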

The detection algorithm uses a timeout mechanism to detect hard-faulty nodes. Node $v_i$ declares node $v_j \in N(i)$ hard faulty if $v_i$ does not receive the sensor reading from $v_j$ before $T_{out}$ expires. Node $v_j$ cannot report to $v_i$ if its transceiver is faulty, its battery is drained, or the node is completely damaged. At the end of the detection phase, every fault-free node in the network has its local diagnostic view.

4.2. Dissemination Phase

The local diagnostic snapshots are disseminated to obtain a global diagnostic view of the network. They are disseminated using an ST that is constructed immediately after the deployment of the network. This work uses the UDG-NNT algorithm [27] to construct the ST, in which each node is assigned a rank and the sink node has the highest rank in the network. Each node $v_i$ other than the sink selects the nearest node $v_j$ among its neighbors such that $rank(v_i) < rank(v_j)$ and sends a connect message to $v_j$ to inform it that $(v_i, v_j)$ is an edge of the ST. To keep the ST connected, nodes check immediately after the detection phase whether they are still connected to it. If a node notices that its parent is faulty, it sends a connect message to the nearest fault-free neighbor with a higher rank.
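
A minimal sketch of the parent selection rule, assuming node positions and ranks are known and that two nodes are neighbors when their Euclidean distance is within a common radio range (a unit disk graph). The function names are hypothetical. Nodes with no higher-ranked fault-free neighbor in range, including the sink, get `None`; this sketch simply leaves such non-sink nodes unattached rather than modeling any further recovery.

```python
import math


def choose_parent(node, positions, ranks, radio_range, faulty=frozenset()):
    """Nearest fault-free neighbor with a higher rank, or None if there is none."""
    x0, y0 = positions[node]
    best, best_dist = None, math.inf
    for other, (x, y) in positions.items():
        if other == node or other in faulty or ranks[other] <= ranks[node]:
            continue
        dist = math.hypot(x - x0, y - y0)
        if dist <= radio_range and dist < best_dist:   # unit-disk neighbors only
            best, best_dist = other, dist
    return best


def build_tree(positions, ranks, radio_range, faulty=frozenset()):
    """Parent map over fault-free nodes; the sink (highest rank) maps to None."""
    return {v: choose_parent(v, positions, ranks, radio_range, faulty)
            for v in positions if v not in faulty}


positions = {"sink": (0, 0), "a": (1, 0), "b": (2, 0), "c": (1.5, 0.5)}
ranks = {"sink": 4, "a": 3, "c": 2, "b": 1}
print(build_tree(positions, ranks, 1.5))
print(build_tree(positions, ranks, 1.5, faulty={"c"}))  # b re-attaches to a
```

Rerunning the same rule with the faulty set excluded is one simple way to model the reconnection step: node `b` loses parent `c` and attaches to the next nearest higher-ranked neighbor.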

All leaves of the ST send their local diagnostic views to their parents. Each parent waits until it has collected the diagnostics from all of its children, combines them with its own local diagnostic, and updates its fault table. It then transmits the aggregated diagnostic message to its own parent in the ST, and the process continues until the sink node has collected all the local diagnostics. Once the sink node has the global diagnostic view, it disseminates it down the tree to all nodes. The proposed model can now identify the set of faulty nodes $\{v_i\}_{i \in F_T}$ present in the network, where $F_T$ is the true set of faulty nodes present in the network at time $T$. Let $\hat{F}_T$ denote the set of faulty nodes inferred by the model. The difference between $F_T$ and $\hat{F}_T$, that is, $F_T - \hat{F}_T$, is the diagnosis error.
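
The upward aggregation and the diagnosis error can be sketched in Python as follows. The sketch assumes a single sink (the only node whose parent is `None`), represents each local view as a dictionary from node ID to state, and lets entries received from a child overwrite conflicting entries already held by the parent; these are illustrative choices, and the names are hypothetical. The downward broadcast from the sink is not modeled.

```python
def convergecast(parent, local_views):
    """Merge local fault tables bottom-up; return the sink's global view.

    parent: node -> parent node (None for the sink).
    local_views: node -> {node_id: state}; nodes without a view may be omitted.
    """
    children = {v: [] for v in parent}
    sink = None
    for v, p in parent.items():
        if p is None:
            sink = v
        else:
            children[p].append(v)

    # Pre-order from the sink: every parent appears before its children.
    order, stack = [], [sink]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])

    merged = {v: dict(local_views.get(v, {})) for v in parent}
    for v in reversed(order):          # children are folded in before their parent
        if parent[v] is not None:
            merged[parent[v]].update(merged[v])
    return merged[sink]


def diagnosis_error(true_faulty, global_view):
    """F_T minus the set of nodes the global view marks as faulty."""
    inferred = {v for v, state in global_view.items() if state != "fault-free"}
    return set(true_faulty) - inferred


parent = {"sink": None, "a": "sink", "b": "a", "c": "sink"}
views = {"a": {"x": "hard faulty"}, "b": {"y": "soft faulty"}, "c": {"c": "fault-free"}}
global_view = convergecast(parent, views)
print(diagnosis_error({"x", "y", "z"}, global_view))  # {'z'}
```

Processing nodes in reverse pre-order is an iterative stand-in for "wait until all children have reported", and it avoids deep recursion on long, chain-like trees.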

5. Performance Evaluation

Four performance metrics, namely diagnosis latency, message complexity, detection accuracy (DA), and false detection rate (FDR), are used to evaluate the proposed algorithm. DA is defined as the ratio of the number of faulty sensor nodes detected as faulty to the total number of faulty sensor nodes in the network. FDR is defined as the ratio of the number of fault-free sensor nodes detected as faulty to the total number of fault-free nodes in the network. The upper bound on time complexity is expressed in terms of the following bounds:
(i) 𝑇𝑝: an upper bound on the time needed to propagate a message between sensor nodes;
(ii) 𝑇dip: an upper bound on the time required to encode (compress and RS-encode) the image.
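
Both accuracy metrics can be computed directly from the true and detected faulty sets, as in this sketch; the function names are hypothetical, and the handling of empty denominators is an assumed convention that the text does not specify.

```python
def detection_accuracy(true_faulty, detected_faulty):
    """DA = |faulty nodes detected as faulty| / |faulty nodes|."""
    true_faulty = set(true_faulty)
    if not true_faulty:
        return 1.0  # assumed convention when there is nothing to detect
    return len(true_faulty & set(detected_faulty)) / len(true_faulty)


def false_detection_rate(all_nodes, true_faulty, detected_faulty):
    """FDR = |fault-free nodes detected as faulty| / |fault-free nodes|."""
    fault_free = set(all_nodes) - set(true_faulty)
    if not fault_free:
        return 0.0  # assumed convention when every node is faulty
    return len(fault_free & set(detected_faulty)) / len(fault_free)


nodes = range(10)
true_f, detected = {0, 1, 2, 3}, {0, 1, 2, 5}
print(detection_accuracy(true_f, detected))           # 0.75
print(false_detection_rate(nodes, true_f, detected))  # 1 of 6 fault-free nodes flagged
```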

Lemma 1. The proposed diagnosis model terminates before time 𝑇dip+(2𝑑st+3)𝑇𝑝+𝑇out, where 𝑑st is the depth of the spanning tree.

Proof. The detection phase takes at most $T_{dip} + T_{out}$ time for a node to determine its own status and obtain the IDs of its hard-faulty 1-hop neighbors. In the ST maintenance phase, a node with a faulty parent needs at most $3T_p$ time to reconnect to the ST. Within at most $d_{st}T_p$, the sink node obtains the global diagnostic view of the network, and the view it disseminates reaches the farthest node within at most another $d_{st}T_p$. In the worst case, $d_{st} = n - 1$. The upper bound on time complexity can therefore be expressed as

$$T_{cost} = T_{dip} + (2d_{st} + 3)T_p + T_{out} = O(n). \qquad (3)$$
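
The bound of Lemma 1 is easy to evaluate numerically. The sketch below uses hypothetical values for $T_p$ and $T_{out}$ (the text does not fix them) together with an assumed $T_{dip}$ of 5.10 ms, and shows that, with $d_{st} = n - 1$, each additional node adds $2T_p$ to the bound.

```python
def diagnosis_time_bound(t_dip, t_p, t_out, d_st):
    """Lemma 1: T_dip + (2 d_st + 3) T_p + T_out."""
    return t_dip + (2 * d_st + 3) * t_p + t_out


def worst_case_bound(t_dip, t_p, t_out, n):
    """Worst case of (3), where the ST is a path and d_st = n - 1."""
    return diagnosis_time_bound(t_dip, t_p, t_out, d_st=n - 1)


# Hypothetical timings in ms (T_p and T_out are not given in the text).
for n in (10, 100, 1000):
    print(n, worst_case_bound(t_dip=5.10, t_p=1.0, t_out=10.0, n=n))
```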

The total number of messages exchanged by nodes to establish a complete and correct diagnosis is termed the message complexity.

Lemma 2. The proposed model has a worst-case message complexity 𝑂(𝑛) in the network.

Proof. The diagnosis starts at each node by sending the coded message to its neighbors, costing one message per node, that is, $n$ messages in the network. In the ST maintenance phase, a node with a faulty parent needs three message exchanges to reconnect to the ST. In the worst case, all nodes except the sink need to find a new parent, so $3(n - 1)$ messages are exchanged to maintain the ST. Each node except the sink sends one local diagnostic message, and each node except the leaves sends one global diagnostic message; in the worst case the depth of the ST is $n - 1$, so both counts equal $n - 1$. Thus, the message cost of disseminating diagnostic messages is $2(n - 1)$, and the total number of exchanged messages is

$$M_{cost} = n + 3(n - 1) + 2(n - 1) = 6n - 5 = O(n). \qquad (4)$$
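
The counting argument can be checked mechanically. The Python sketch below counts messages for an arbitrary parent map (a hypothetical representation of the ST) and a set of nodes that had to re-parent; on the worst-case path-shaped tree with every non-sink node re-parenting, it reproduces $6n - 5$.

```python
def message_cost(parent, reparented):
    """Messages counted in the proof of Lemma 2.

    parent: node -> parent in the ST (None for the sink).
    reparented: nodes that had to find a new parent after detection.
    """
    n = len(parent)
    detection = n                                    # one coded broadcast per node
    maintenance = 3 * len(reparented)                # three messages per re-parenting node
    upward = sum(p is not None for p in parent.values())           # all but the sink
    downward = len({p for p in parent.values() if p is not None})  # non-leaf nodes
    return detection + maintenance + upward + downward


def path_tree(n):
    """Worst-case ST: a path of depth n - 1 rooted at the sink, node 0."""
    return {0: None, **{v: v - 1 for v in range(1, n)}}


for n in (2, 10, 50):
    print(n, message_cost(path_tree(n), reparented=set(range(1, n))), 6 * n - 5)
```

For shallower trees the downward term shrinks, since only non-leaf nodes forward the global view, so $6n - 5$ is indeed an upper bound rather than the typical cost.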

5.1. Simulation Results

This section presents simulation results for the proposed scheme. OMNeT++ is used as the simulation tool, and all simulations are conducted on networks using IEEE 802.15.4 at the MAC layer. The free-space physical layer model is adopted, in which all nodes within the transmission range of a transmitting node receive its packet after a very short propagation delay. The simulation parameters are summarized in Table 1.

The RS code is used with $m = 8$ bits per symbol, $n = 255$, and $r = 223$. The RS encoder takes 1.02 ms to encode the bit stream of an 8×8 image, and compression takes 4.08 ms [25] for the 8×8 test image. The threshold value is $I_{th}$ = 30% pixel failure rate. The test image is an 8×8 block of the Lena image. Every result shown is the average of 100 experiments, each using a different randomly generated topology.
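
For reference, these parameters give the following values. An RS(255, 223) code corrects up to 16 symbol errors. Assuming that $T_{dip}$ in Lemma 1 is simply the sum of the reported compression and RS encoding times for the 8×8 test image (an inference, not a value stated in the text), the per-node encoding bound is about 5.10 ms:

$$\delta = \left\lfloor \frac{n - r}{2} \right\rfloor = \left\lfloor \frac{255 - 223}{2} \right\rfloor = 16, \qquad T_{dip} \approx 4.08\ \text{ms} + 1.02\ \text{ms} = 5.10\ \text{ms}.$$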

5.1.1. Experiment 1

In this experiment, two performance metrics of the proposed work, DA and FDR, are compared with those of the schemes proposed in [15, 16] for varying node failure rates and average numbers of neighbor nodes ($d$). Sensor nodes are assumed to be faulty with probabilities of 0.05, 0.10, 0.15, 0.20, 0.25, and 0.30. Both hard- and soft-faulty nodes are randomly distributed in the network. The simulation result for a low average number of neighbor nodes, $d \approx 4$, is shown in Figure 2.

The main reason for not achieving very high performance is that, for low $d$, fault-free sensor nodes are unlikely to pass the threshold test. The detection accuracy of the proposed work exceeds that of the scheme proposed in [16], while the scheme of [15] shows a marginal improvement over ours. The reason is that, for $T_h = 0.5(N - 1)$, there is some probability that a faulty node with more than $0.5(N - 1)$ faulty neighbors is detected as fault-free. The scheme proposed in [15] uses $T_h \approx d$, and the probability that a node has $d$ faulty neighbors is very small. However, that scheme needs $n$ more message exchanges in the network to achieve this marginal improvement, and the proposed work performs better in terms of FDR. In context, since the proposed scheme targets WISNs, which are known to be resource constrained, it is preferable for the scheme to maintain a lower FDR and to be communication efficient. In other words, it is better to achieve high network reliability while maintaining a high level (>95%) of detection accuracy, which is what the proposed work aims to achieve.

DA and FDR for 𝑑=8 and 𝑑=12 are plotted in Figures 3 and 4, respectively. The key conclusion from these plots is that the performance of the detection model improves as 𝑑 increases. For 𝑑=12, the DA of the proposed work is very close to that of the scheme of [15] while maintaining a low FDR. Given the high node degree expected in densely deployed wireless sensor networks, the proposed fault diagnosis scheme is therefore expected to be robust.

5.1.2. Experiment 2

In this experiment, the average and worst-case latencies of isolating unhealthy nodes are analyzed for varying node failure rates with 𝑑=12. Figures 5 and 6 show the diagnosis latency of the proposed work. Lemma 1 shows that the dissemination of diagnostics (the 2𝑑st𝑇𝑝 term) accounts for most of the change in diagnosis latency with respect to node density. The depth of the ST, over which the diagnostics are disseminated, determines the variation in diagnosis latency. Thus, as expected and as depicted in Figure 6, the time required to diagnose the WISN remains almost constant as the fault rate changes.

6. Conclusions and Future Work

This paper presents a distributed model that addresses the fundamental problem of identifying faulty (soft and hard) nodes in a WISN. The model is simple and detects faulty sensor nodes with high accuracy over a wide range of fault probabilities while maintaining low message overhead. Both the message and time complexity of the proposed model are 𝑂(𝑛), which is low compared with present state-of-the-art approaches. Owing to this low message and time complexity, the model could be integrated into error-resilient image transport protocols for wireless sensor networks. A natural extension of the model is to address transient and intermittent faults; work is currently under way on a model that identifies transient and intermittent faults with lower message cost and the same or lower latency.

Appendix

In this section, we formulate the threshold $T_h$.

Theorem 3. The optimum value of $T_h$, which minimizes the detection error, is $0.5(N - 1)$.

Proof. The proof closely follows a similar proof in [16]. The real situation at a sensor node is modeled by two variables, $S$ and $A$, where $S$ represents the sensor reading and $A$ represents the actual reading. Let $E(x, l)$ be the event that $l$ out of the $N$ 1-hop neighbors of a node $v_i$ report the similar sensor reading $x$. The objective is to determine the fault detection estimate (DE) after obtaining information about the sensor readings of the neighboring nodes. The possible values of DE are fault-free (FF) and faulty (F). The probability that the detection estimate is fault-free, given that $k$ of the neighboring sensors report the same reading as node $v_i$, is defined as

$$P_k = P\left(DE = FF \mid S = x, E(x, k)\right). \qquad (A.1)$$

Let $f_k$ be the probability that $k$ out of the $N$ neighbors of node $v_i$ are fault-free. This probability is

$$f_k = \binom{N}{k} P(S_i = x \mid A_i = x)^{k} \cdot P(S_i = x \mid A_i = \neg x)^{N-k} = \binom{N}{k} (1 - p)^{k} p^{N-k}. \qquad (A.2)$$
The correctness of the proposed algorithm can be analyzed through the conditional probabilities corresponding to combinations of DE, $S$, and $A$. From these combinations, we can calculate the probability that the algorithm estimates a node as faulty although its sensed and actual readings are the same. Using marginal probability, this is derived as

$$\begin{aligned} P(DE = F \mid S = x, A = x) &= 1 - P(DE = FF \mid S = x, A = x) \\ &= 1 - \sum_{k=0}^{N} P\left(DE = FF, E(x, k) \mid S = x, A = x\right) \\ &= 1 - \sum_{k=0}^{N} P\left(DE = FF \mid S = x, A = x, E(x, k)\right) \cdot P\left(E(x, k) \mid S = x, A = x\right) \\ &= 1 - \sum_{k=0}^{N} P_k f_k. \end{aligned} \qquad (A.3)$$

In a similar manner, we can calculate the probability that the algorithm estimates a node as fault-free although its sensor reading does not agree with the actual reading:

$$\begin{aligned} P(DE = FF \mid S = \neg x, A = x) &= \sum_{k=0}^{N} P\left(DE = FF, E(x, N - k) \mid S = \neg x, A = x\right) \\ &= \sum_{k=0}^{N} P\left(DE = FF \mid S = \neg x, A = x, E(x, N - k)\right) \cdot P\left(E(x, N - k) \mid S = \neg x, A = x\right) \\ &= \sum_{k=0}^{N} P\left(DE = FF \mid S = \neg x, A = x, E(\neg x, k)\right) \cdot P\left(E(x, N - k) \mid S = \neg x, A = x\right) \\ &= \sum_{k=0}^{N} P_k f_{N-k}. \end{aligned} \qquad (A.4)$$
Since each block in the proposed architecture is assumed to fail or function independently of the other blocks, it follows that the node failure probability $p$ is the same as the individual block failure probability $p_f$. The probability that at least one block is faulty when the source encoder is detected as fault-free is

$$p_{sg} = 2p^2 - p^3. \qquad (A.5)$$

The probability that at least one block is faulty when the source encoder is detected as faulty is

$$p_{sf} = p^3 - 3p^2 + 2p. \qquad (A.6)$$
Equations (A.3) and (A.5) suffice to calculate the probability that the detection algorithm declares a fault-free node as faulty:

$$\begin{aligned} P_{gf} &= P(DE = F, S = x \mid A = x) \cdot p_{sg} \\ &= P(DE = F \mid S = x, A = x) \cdot P(S = x \mid A = x) \cdot p_{sg} \\ &= \left(1 - \sum_{k=0}^{N} P_k f_k\right) (1 - p)\, p_{sg}. \end{aligned} \qquad (A.7)$$
Similarly, (A.4) and (A.6) suffice to derive the probability that the detection algorithm declares a faulty node as fault-free:

$$\begin{aligned} P_{fg} &= P(DE = FF, S = \neg x \mid A = x) \cdot p_{sf} \\ &= P(DE = FF \mid S = \neg x, A = x) \cdot P(S = \neg x \mid A = x) \cdot p_{sf} \\ &= \left(\sum_{k=0}^{N} P_k f_{N-k}\right) p\, p_{sf}. \end{aligned} \qquad (A.8)$$

In the proposed algorithm, the detection estimate is fault-free only when $k > k_{min}$, where $k_{min}$ is the threshold $T_h$ of Algorithm 1. Thus, (A.1) can be rewritten as

$$P_k = \begin{cases} 1 & \text{if } k > k_{min}, \\ 0 & \text{otherwise}. \end{cases} \qquad (A.9)$$
Thus, the error probability of the proposed algorithm in detecting the status of a node is

$$P_e = P_{gf} + P_{fg} = (1 - p)(2p^2 - p^3) - \sum_{k_{min} < k \leq N} \left[ (1 - p)(2p^2 - p^3) f_k - p\,(p^3 - 3p^2 + 2p) f_{N-k} \right]. \qquad (A.10)$$
Substituting $f_k$ from (A.2) into (A.10), the summand of (A.10) can be written as

$$\begin{aligned} &\binom{N}{k} \left[ (1 - p)^{k+1} p^{N-k} (2p^2 - p^3) - p^{k+1} (1 - p)^{N-k} (p^3 - 3p^2 + 2p) \right] \\ &\quad = \binom{N}{k} (1 - p)^{k+1} p^{k+1} \left[ p^{N-2k-1} (2p^2 - p^3) - (1 - p)^{N-2k-1} (p^3 - 3p^2 + 2p) \right]. \end{aligned} \qquad (A.11)$$
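Because $2p^2 - p^3 = p^2(2 - p)$ and $p^3 - 3p^2 + 2p = p(1 - p)(2 - p)$, the bracketed factor in (A.11) equals $p(2 - p)\left[p^{N-2k} - (1 - p)^{N-2k}\right]$. For $p < 0.5$, (A.11) is therefore negative for $k < N/2$, zero for $k = N/2$, and positive for $k > N/2$. Decreasing $k_{min}$ one step at a time from $N$ adds terms to the sum in (A.10) that lower $P_e$ as long as the added $k$ exceeds $N/2$, add nothing at $k = N/2$, and raise $P_e$ once $k < N/2$. Hence $P_e$ attains its minimum when the sum includes every $k > N/2$ and no $k < N/2$, which is achieved by $k_{min} = 0.5(N - 1)$.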
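
As a numerical sanity check of Theorem 3, the sketch below evaluates $P_e = P_{gf} + P_{fg}$ directly from (A.2) and (A.5) to (A.9) and compares the rule $k > 0.5(N - 1)$ against an exhaustive search over integer thresholds. It is an illustrative check under the same assumptions (independent, symmetric faults with $p < 0.5$), not part of the original analysis; the function names are hypothetical.

```python
from math import comb


def f(k, n_nb, p):
    """(A.2): probability that k of the N neighbors are fault-free."""
    return comb(n_nb, k) * (1 - p) ** k * p ** (n_nb - k)


def error_probability(k_min, n_nb, p):
    """P_e = P_gf + P_fg from (A.7)-(A.9): a node is fault-free iff k > k_min."""
    p_sg = 2 * p**2 - p**3                 # (A.5)
    p_sf = p**3 - 3 * p**2 + 2 * p         # (A.6)
    passing = [k for k in range(n_nb + 1) if k > k_min]   # P_k = 1 here
    p_gf = (1 - sum(f(k, n_nb, p) for k in passing)) * (1 - p) * p_sg   # (A.7)
    p_fg = sum(f(n_nb - k, n_nb, p) for k in passing) * p * p_sf        # (A.8)
    return p_gf + p_fg


def best_integer_threshold(n_nb, p):
    """Exhaustive search over t = -1..N, which covers every integer rule 'k > t'."""
    return min(range(-1, n_nb + 1), key=lambda t: error_probability(t, n_nb, p))


for n_nb in (3, 4, 7, 8):
    p = 0.2
    print(n_nb, best_integer_threshold(n_nb, p),
          error_probability(0.5 * (n_nb - 1), n_nb, p),
          error_probability(best_integer_threshold(n_nb, p), n_nb, p))
```

For odd $N$ the search returns $(N - 1)/2$. For even $N$ it may return $N/2 - 1$ or $N/2$, which tie because the $k = N/2$ term contributes nothing; in both cases the error equals that of the rule $k > 0.5(N - 1)$.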