Fault tolerance in distributed systems using fused state machines

Abstract

Replication is a standard technique for fault tolerance in distributed systems modeled as deterministic finite state machines (DFSMs or machines). To correct \(f\) crash or \(\lfloor f/2 \rfloor \) Byzantine faults among \(n\) different machines, replication requires \(nf\) backup machines. We present a solution called fusion that requires just \(f\) backup machines. First, we build a framework for fault tolerance in DFSMs based on the notion of Hamming distances. We introduce the concept of an (\(f\), \(m\))-fusion, which is a set of \(m\) backup machines that can correct \(f\) crash faults or \(\lfloor f/2 \rfloor \) Byzantine faults among a given set of machines. Second, we present an algorithm to generate an (\(f\), \(f\))-fusion for a given set of machines. We ensure that our backups are efficient in terms of the size of their state and event sets. Third, we use locality sensitive hashing for the detection and correction of faults, which incurs almost the same overhead as replication. We detect Byzantine faults with time complexity \(O(n f)\) on average, and we correct crash and Byzantine faults with time complexity \(O(n \rho f)\) with high probability, where \(\rho \) is the average state reduction achieved by fusion. Finally, our evaluation of fusion on the widely used MCNC’91 benchmarks for DFSMs shows that the average state space savings of fusion (over replication) is 38 % (range 0–99 %). To demonstrate the practical use of fusion, we describe its potential application to two areas: sensor networks and the MapReduce framework. In the case of sensor networks, a fusion-based solution can require significantly fewer sensor nodes than a replication-based solution. For the MapReduce framework, fusion can reduce the number of map tasks compared to replication. Hence, fusion results in considerable savings in state space and other resources, such as the power needed to run the backups.


Notes

  1. In “Appendix A”, we present the concept of the event-based decomposition of machines to replace a given machine \(A\) with a set of machines that contain fewer events than \(\Sigma _A\).


Author information


Correspondence to Bharath Balasubramanian.

Additional information

This research was supported in part by NSF Grants CNS-0718990, CNS-0509024, and CNS-1115808, and by the Cullen Trust for Higher Education Endowed Professorship.

Appendices

Appendix A: Event-based decomposition of machines

In this section, we ask a question that is fundamental to the understanding of DFSMs, independent of fault tolerance: given a machine \(M\), can it be replaced by two or more machines executing in parallel, each containing fewer events than \(M\)? In other words, given the states of these fewer-event machines, can we uniquely determine the state of \(M\)? In Fig. 12, the 2-event machine \(M\) (its event set contains the events 0 and 1) checks the parity of 0s and 1s. \(M\) can be replaced by two 1-event machines \(P\) and \(Q\) that check the parity of just 1s or just 0s, respectively. Given the states of \(P\) and \(Q\), we can determine the state of \(M\) (a small sketch follows Fig. 12 below). We present an algorithm to generate such event-reduced machines with time complexity polynomial in the size of \(M\). This is important for applications that limit the number of events each individual process running a DFSM can service. We first define the notion of event-based decomposition.

Fig. 12: Event-based decomposition of a machine
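To make the parity example concrete, here is a minimal Python sketch; since Fig. 12 is not reproduced, representing \(M\) as the four-state machine that tracks both parities is our assumption. Machines self-loop on events outside their event set, which is the convention used for event-reduced machines throughout this appendix.

```python
# Each DFSM is a dict: state -> {event: next_state}. Events missing from a
# state's row are outside the machine's event set and leave it unchanged.
M = {  # states are pairs (parity of 0s, parity of 1s); initial state (0, 0)
    (0, 0): {'0': (1, 0), '1': (0, 1)},
    (1, 0): {'0': (0, 0), '1': (1, 1)},
    (0, 1): {'0': (1, 1), '1': (0, 0)},
    (1, 1): {'0': (0, 1), '1': (1, 0)},
}
P = {0: {'1': 1}, 1: {'1': 0}}  # 1-event machine: parity of 1s
Q = {0: {'0': 1}, 1: {'0': 0}}  # 1-event machine: parity of 0s

def run(machine, start, events):
    state = start
    for e in events:
        state = machine[state].get(e, state)  # self-loop on ignored events
    return state

events = '0110100'
m = run(M, (0, 0), events)
p, q = run(P, 0, events), run(Q, 0, events)
assert m == (q, p)  # the states of P and Q together determine the state of M
```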

Definition 5

A (\(k\), \(e\))-event decomposition of a machine \(M\;(X_M,\; \alpha _M,\; \Sigma _M,\; m^0)\) is a set of \(k\) machines \(\mathcal {E}\), each less than \(M\), such that \(d_{min}(M,\mathcal {E})>0\) and \(\forall P\,(X_P,\alpha _P,\Sigma _P,p^0)\in \mathcal {E}\), \(|\Sigma _P| \le |\Sigma _M|-e\).

As \(d_{min}(M,\mathcal {E})>0\), the state of \(M\) can be determined from the states of the machines in \(\mathcal {E}\). So the machines in \(\mathcal {E}\), each containing at most \(|\Sigma _M|-e\) events, can effectively replace \(M\). In Fig. 13, we present the eventDecompose algorithm, which takes as input a machine \(M\) and a parameter \(e\), and returns a (\(k\), \(e\))-event decomposition of \(M\) (if one exists) for some \(k \le |X_M|^2\).

Fig. 13: Algorithm for the event-based decomposition of a machine

In each iteration, Loop 1 generates machines that contain at least one event fewer than the machines of the previous iteration. So, starting with \(M\) in the first iteration, at the end of \(e\) iterations \(\mathcal {M}\) contains the set of largest machines less than \(M\), each containing at most \(|\Sigma _M|-e\) events.

Loop 2 iterates through each machine \(P\) generated in the previous iteration and uses the reduceEvent algorithm (the same as the algorithm presented in Fig. 4) to generate the set of largest machines less than \(P\) containing at least one event fewer than \(\Sigma _P\). To generate a machine less than \(P\) that does not contain an event \(\sigma \) in its event set, the reduceEvent algorithm combines states so that they loop onto themselves on \(\sigma \), and then constructs the largest machine that contains these states in their combined form. This machine, in effect, ignores \(\sigma \). The procedure is repeated for every event in \(\Sigma _P\), and the largest incomparable machines among the results are returned (a sketch of this state-combining step appears immediately below). Loop 3 constructs an event decomposition \(\mathcal {E}\) of \(M\) by iteratively adding at least one machine from \(\mathcal {M}\) to separate each pair of states in \(M\), thereby ensuring that \(d_{min}(\mathcal {E})> 0\). Since each machine added to \(\mathcal {E}\) can separate more than one pair of states, an efficient way to implement Loop 3 is to check, in each iteration, which pairs still need to be separated and to add machines until no such pair remains (a greedy sketch of Loop 3 follows the worked example below).
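The state-combining step admits a compact illustration. The following Python function is a sketch under our own naming (eliminate_event), not the authors' reduceEvent of Fig. 4, which is not reproduced here: to eliminate an event \(\sigma \), it merges every state with its \(\sigma \)-successor and propagates the merges until the partition is closed, i.e., states sharing a block transition to a common block on every event. The resulting block machine self-loops on \(\sigma \) (so \(\sigma \) can be dropped from its event set).

```python
from itertools import combinations

def eliminate_event(states, events, delta, sigma):
    """Merge states so that sigma becomes a self-loop, then close the
    partition under all events. delta: dict (state, event) -> state.
    Returns the set of blocks and the block-level transition function."""
    parent = {s: s for s in states}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[ra] = rb
        return True

    for s in states:                  # seed: s merges with its sigma-successor
        union(s, delta[(s, sigma)])
    changed = True
    while changed:                    # close: merged states must agree on every event
        changed = False
        for s, t in combinations(states, 2):
            if find(s) == find(t):
                for e in events:
                    changed |= union(delta[(s, e)], delta[(t, e)])
    blocks = {}
    for s in states:
        blocks.setdefault(find(s), set()).add(s)
    block_of = {s: frozenset(b) for b in blocks.values() for s in b}
    new_delta = {(block_of[s], e): block_of[delta[(s, e)]]
                 for s in states for e in events if e != sigma}
    return set(block_of.values()), new_delta
```

Because the partition is closed, the block-level transition function is well defined, and the block machine is exactly the machine that "ignores" \(\sigma \) in the sense above.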

Let the 4-event machine \(M\) shown in Fig. 13 be the input to the eventDecompose algorithm with \(e=1\). In the first and only iteration of Loop 1, \(P=M\), and the reduceEvent algorithm generates the set of largest 3-event machines less than \(M\) by successively eliminating each event. To eliminate event 0, since \(m^0\) transitions to \(m^3\) on event 0, these two states are combined. This is repeated for all states, and the largest machine containing all the combined states self-looping on event 0 is \(M_1\). Similarly, the largest machines not acting on events 3, 1, and 2 are \(M_2\), \(M_3\), and \(M_\bot \), respectively. The reduceEvent algorithm returns \(M_1\) and \(M_2\) as the only largest incomparable machines in this set. The eventDecompose algorithm returns \(\mathcal {E}=\{M_1,\; M_2\}\), since each pair of states in \(M\) is separated by \(M_1\) or \(M_2\). Hence, the 4-event machine \(M\) can be replaced by the 3-event machines \(M_1\) and \(M_2\), i.e., \(\mathcal {E}=\{M_1,M_2\}\) is a (2,1)-event decomposition of \(M\).
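Loop 3 itself is a simple greedy cover over pairs of states. A minimal sketch, in which each candidate machine from \(\mathcal {M}\) is represented only by the map from states of \(M\) to the blocks it merges them into (the block maps in the usage lines are hypothetical, not those of Fig. 13):

```python
from itertools import combinations

def choose_separators(m_states, candidates):
    """Greedily pick candidate block maps until every pair of states of M is
    separated (mapped to different blocks by some chosen candidate).
    Returns None if some pair cannot be separated, i.e., no decomposition."""
    pairs = set(combinations(sorted(m_states), 2))
    chosen = []
    for cand in candidates:
        if not pairs:
            break
        hit = {(a, b) for (a, b) in pairs if cand[a] != cand[b]}
        if hit:
            chosen.append(cand)
            pairs -= hit
    return chosen if not pairs else None

# Two hypothetical 2-block candidates that together separate all six pairs:
sep = choose_separators(['m0', 'm1', 'm2', 'm3'],
                        [{'m0': 0, 'm1': 0, 'm2': 1, 'm3': 1},
                         {'m0': 0, 'm1': 1, 'm2': 0, 'm3': 1}])
assert sep is not None and len(sep) == 2
```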

Theorem 1

Given machine \(M\;(X_M,\alpha _M,\Sigma _M,m^0)\), the eventDecompose algorithm generates a (\(k\),\(e\))-event decomposition of \(M\) (if it exists) for some \(k \le |X_M|^2\).

Proof

The reduceEvent algorithm exhaustively generates the largest incomparable machines that ignore at least one event in \(\Sigma _M\). After \(e\) such event reductions, Loop 3 selects one machine (if one exists) from \(\mathcal {M}\) to separate each pair of states in \(X_M\). This ensures that at the end of Loop 3, either \(d_{min}(\mathcal {E})>0\) or the algorithm has returned \(\{\}\) (no (\(k\), \(e\))-event decomposition exists). Since there are at most \(|X_M|^2\) pairs of states in \(X_M\), there are at most \(|X_M|^2\) iterations of Loop 3, in each of which we pick one machine. Hence, \(k \le |X_M|^2\).\(\square \)

The reduceEvent algorithm visits each state of machine \(M\) to create blocks of states that loop to the same block on event \(\sigma \in \Sigma _M\). This has time complexity \(O(|X_M|)\) per event. The cost of generating the largest closed partition corresponding to such a block is \(O(|X_M| |\Sigma _M|)\) per event. Since this must be done for every event in \(\Sigma _M\), the time complexity to reduce at least one event is \(O(|X_M| |\Sigma _M|^2)\). In the eventDecompose algorithm, the first iteration generates at most \(|\Sigma _M|\) machines, the second at most \(|\Sigma _M|^2\) machines, and the \(e\mathrm{th}\) iteration \(O(|\Sigma _M|^e)\) machines. The rest of the analysis is similar to that presented in Sect. 4.2, and the total time complexity of Loop 1 is \(O(|X_M| |\Sigma _M|^{e+1})\).

To generate the (\(k\), \(e\))-event decomposition from the set of machines in \(\mathcal {M}\), we find a machine in \(\mathcal {M}\) to separate each pair of states in \(X_M\). Since there are \(O(|X_M|^2)\) such pairs, the number of iterations of Loop 3 is \(O(|X_M|^2)\). In each iteration of Loop 3, we find a machine among the \(O(|\Sigma _M|^{e})\) machines of \(\mathcal {M}\) that separates a pair \(m_i,m_j \in X_M\). Checking whether a machine separates a pair of states takes only \(O(|X_M|)\) time. Hence, the time complexity of Loop 3 is \(O(|X_M|^3|\Sigma _M|^{e})\). So the overall time complexity of the eventDecompose algorithm is the sum of the time complexities of Loops 1 and 3, which is \(O(|X_M||\Sigma _M|^{e+1}+|X_M|^3|\Sigma _M|^{e})\).

Appendix B: Incremental approach to generate fusions

In Fig. 14, we present an incremental approach to generate the fusions, referred to as the incFusion algorithm, in which we may never have to reduce the \({RCP}\) of all the primaries. In each iteration, we generate the fusion corresponding to a new primary and the \({RCP}\) of the (possibly small) fusions generated for the set of primaries in the previous iteration.

Fig. 14: Incremental fusion algorithm
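Since Fig. 14 is not reproduced here, the following Python sketch captures only the loop structure of incFusion as described in the text. The routines gen_fusion and rcp are assumed: an (\(f\), \(f\))-fusion generator such as genFusion of Fig. 4, and a reachable-cross-product constructor, respectively.

```python
def inc_fusion(primaries, f, gen_fusion, rcp):
    """Incrementally build an (f, f)-fusion of all primaries, never forming
    the RCP of the full primary set. gen_fusion(machines, f) and rcp(machines)
    are assumed implementations of genFusion and the RCP construction."""
    fusion = gen_fusion(primaries[:2], f)       # fusion of {P1, P2}
    for p in primaries[2:]:
        # Fuse the next primary with the RCP of the current (small) fusions.
        fusion = gen_fusion([p, rcp(fusion)], f)
    return fusion
```

For the example of Fig. 15, this amounts to gen_fusion([A, B], f) to obtain \(F'\), followed by gen_fusion([C, rcp([F'])], f) to obtain \(F\).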

In Fig. 15, rather than generate a fusion by reducing the 8-state \({RCP}\) of \(\{A,B,C\}\), we can reduce the 4-state \({RCP}\) of \(\{A,B\}\) to generate fusion \(F'\) and then reduce the 4-state \({RCP}\) of \(\{C,F'\}\) to generate fusion \(F\). In the following paragraphs, we prove the correctness of the incremental approach and show that it is \(O(\rho ^n)\) times faster than the genFusion algorithm, where \(\rho \) is the average state reduction achieved by fusion.

Fig. 15: Incremental approach: first generate \(F'\) and then \(F\)

Theorem 2

Given a set of \(n\) machines \(\mathcal {P}\), the incFusion algorithm generates an (\(f\), \(f\))-fusion of \(\mathcal {P}\).

Proof

We prove the theorem by induction on the variable \(i\) in the algorithm. For the base case, i.e., \(i=2\), \(\mathcal {N}=\{P_1,P_2\}\) (since \({RCP}(\{P_1\})=P_1\)). Let the (\(f\), \(f\))-fusion generated by the genFusion algorithm for \(\mathcal {N}=\{P_1,P_2\}\) be denoted \(\mathcal {F}^1\). For \(i=3\), let the (\(f\), \(f\))-fusion generated for \(\mathcal {N}=\{P_3, {RCP}(\mathcal {F}^1)\}\) be denoted \(\mathcal {F}^2\). We show that \(\mathcal {F}^2\) is an (\(f\), \(f\))-fusion of \(\{P_1,P_2,P_3\}\). Assume \(f\) crash faults among \(\{P_1,P_2,P_3\} \cup \mathcal {F}^2\). Clearly, at most \(f\) machines in \(\{P_3\} \cup \mathcal {F}^2\) have crashed. Since \(\mathcal {F}^2\) is an (\(f\), \(f\))-fusion of \(\{P_3, {RCP}(\mathcal {F}^1)\}\), we can generate the state of all the machines in \({RCP}(\mathcal {F}^1)\) and the state of the crashed machines among \(\{P_3\} \cup \mathcal {F}^2\). Similarly, at most \(f\) machines have crashed among \(\{P_1,P_2\}\). Hence, using the state of the available machines among \(\{P_1,P_2\}\) and the states of all the machines in \(\mathcal {F}^1\), we can generate the state of the crashed machines among \(\{P_1,P_2\}\).

Induction hypothesis: assume that the set of machines \(\mathcal {F}^i\), generated in iteration \(i\), is an (\(f\), \(f\))-fusion of \(\{P_1 \ldots P_{i+1}\}\). Let the (\(f\), \(f\))-fusion of \(\{P_{i+2}, {RCP}(\mathcal {F}^i)\}\) generated in iteration \(i+1\) be denoted \(\mathcal {F}^{i+1}\). We must show that \(\mathcal {F}^{i+1}\) is an (\(f\), \(f\))-fusion of \(\{P_1 \ldots P_{i+2}\}\). The argument is similar to the base case: using the state of the available machines in \(\{P_{i+2}\} \cup \mathcal {F}^{i+1}\), we can generate the state of all the machines in \(\mathcal {F}^{i}\) and of the crashed machines in \(\{P_{i+2}\} \cup \mathcal {F}^{i+1}\). Subsequently, we can generate the state of the crashed machines in \(\{P_1 \ldots P_{i+1}\}\).\(\square \)

From Observation 1, the genFusion algorithm has time complexity \(O(fN^4|\Sigma |+fN^5)\) (assuming \(\Delta s=0\) and \(\Delta e=0\) for simplicity). Hence, if the size of \(\mathcal {N}\) in the \(i\mathrm{th}\) iteration of the incFusion algorithm is denoted by \(N_i\), then the time complexity of the incFusion algorithm, \(T_{inc}\), is given by the expression \(\sum _{i=2}^{n}O(fN_i^4|\Sigma |+fN_i^5)\).

Let the number of states in each primary be \(s\). For \(i=2\), the primaries are \(\{P_1,P_2\}\) and \(N_1=O(s^2)\). For \(i=3\), the primaries are \(\{{RCP}(\mathcal {F}^1),P_3\}\). Note that \({RCP}(\mathcal {F}^1)\) is also a fusion machine. Since we assume an average state reduction of \(\rho \) (the size of the \({RCP}\) of the primaries divided by the average size of each fusion), the number of states in \({RCP}(\mathcal {F}^1)\) is \(O(s^2/\rho )\). So, \(N_2=O(s^3/\rho )\). Similarly, \(N_3=O(s^4/\rho ^2)\) and, in general, \(N_{i}=O(s^{i+1}/\rho ^{i-1})\). So,

$$\begin{aligned} T_{inc}&= O\left( |\Sigma |f\sum _{i=2}^{n}s^{4i+4}/\rho ^{4i-4}+f\sum _{i=2}^{n}s^{5i+5}/\rho ^{5i-5}\right) \\&= O\left( |\Sigma |fs^4\rho ^4\sum _{i=2}^{n}(s/\rho )^{4i}+fs^5\rho ^5\sum _{i=2}^{n}(s/\rho )^{5i}\right) \end{aligned}$$

This is the sum of a geometric progression and hence,

$$\begin{aligned} T_{inc} = O(|\Sigma |fs^4\rho ^4(s/\rho )^{4n}+fs^5\rho ^5(s/\rho )^{5n}) \end{aligned}$$

Assuming \(\rho \) and \(s\) are constants, \(T_{inc}=O(f|\Sigma |s^n/\rho ^n+fs^n/\rho ^n)\). Note that the time complexity of the genFusion algorithm in Fig. 4 is \(O(f|\Sigma |s^n+ fs^n)\). Hence, the incFusion algorithm achieves an \(O(\rho ^n)\) saving in time complexity over the genFusion algorithm.
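The growth of the \(N_i\) above can be summarized in a single worked step: each iteration crosses one new \(s\)-state primary with the \({RCP}\) of the previous fusions, which is a factor \(\rho \) smaller than the previous \({RCP}\). Hence,

$$\begin{aligned} N_i = O\left( \frac{s \cdot N_{i-1}}{\rho }\right) ,\quad N_1 = O(s^2) \;\Rightarrow \; N_i = O\left( \frac{s^{i+1}}{\rho ^{i-1}}\right) \end{aligned}$$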

Appendix C: Discussion: backups outside the closed partition set

So far in this paper, we have only considered machines that belong to the closed partition set. In other words, given a set of primaries \(\mathcal {P}\), our search for backup machines was restricted to those that are less than the \({RCP}\) of \(\mathcal {P}\), denoted by \(R\). However, it is possible that efficient backup machines exist outside the lattice, i.e., among machines that are not less than or equal to \(R\). In this section, we present a technique to detect whether a machine outside the closed partition set of \(R\) can correct faults among the primaries. Given a set of machines \(\mathcal {F}\), each less than or equal to \(R\), we can determine whether \(\mathcal {P} \cup \mathcal {F}\) can correct faults based on the \(d_{min}\) of \(\mathcal {P} \cup \mathcal {F}\) (Sect. 3.3). To find \(d_{min}\), we first determine the mapping between the states of \(R\) and the states of each of the machines in \(\mathcal {F}\). However, given a set of machines \(\mathcal {G}\) that are not less than or equal to \(R\), how do we generate this mapping?

To determine the mapping between the states of \(R\) and the states of the machines in \(\mathcal {G}\), we first generate the \({RCP}\) of \(\{R\} \cup \mathcal {G}\), denoted \(B\), which is greater than all the machines in \(\{R\} \cup \mathcal {G}\). Hence, we can determine the mapping between the states of \(B\) and the states of all the machines in \(\{R\} \cup \mathcal {G}\). Given this mapping, we can determine the (non-unique) mapping between the states of \(R\) and the states of the machines in \(\mathcal {G}\). This enables us to determine \(d_{min}(R, \{R\} \cup \mathcal {G})\). If this \(d_{min}\) is greater than \(f\), then \(\mathcal {G}\) can correct \(f\) crash or \(\lfloor f/2 \rfloor \) Byzantine faults among the machines in \(\mathcal {P}\).
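A small sketch of this construction (under our own machine representation; the machines of Fig. 16 are not reproduced, so the value in the final comment is illustrative): build \(B\) as the reachable product of \(R\) and \(G\), then read off, for each state of \(G\), the states of \(R\) it co-occurs with.

```python
def map_r_to_g(R, G, events):
    """R and G are (initial_state, dict state -> {event: next_state}) pairs;
    missing events self-loop. Returns, for each state of G, the set of states
    of R that share a reachable product state with it."""
    (r0, dr), (g0, dg) = R, G
    seen, stack = {(r0, g0)}, [(r0, g0)]
    while stack:                           # reachable states of the RCP B
        r, g = stack.pop()
        for e in events:
            nxt = (dr[r].get(e, r), dg[g].get(e, g))
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    mapping = {}
    for r, g in seen:
        mapping.setdefault(g, set()).add(r)
    return mapping                         # e.g. mapping[g0] == {r0, r2} in Fig. 16
```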

Consider the example shown in Fig. 16. Given the set of primaries \(\{A,B,C\}\) shown in Fig. 1, we want to determine whether \(G\) can correct one crash fault among \(\{A,B,C\}\). Since \(G\) is outside the closed partition set of \(R\), we first construct \(B\), the \({RCP}\) of \(G\) and \(R\). Since \(B\) is greater than both \(R\) and \(G\), we can determine how its states are mapped to the states of \(R\) and \(G\) (similar to Fig. 2). For example, \(b^0\) and \(b^8\) are mapped to \(r^0\) in \(R\), while \(b^0\) and \(b^9\) are mapped to \(g^0\) in \(G\). Using this information, we can determine the mapping between the states of \(R\) and \(G\). For example, since \(b^0\) and \(b^9\) are mapped to \(r^0\) and \(r^2\), respectively, \(g^0=\{r^0, r^2\}\). Extending this idea, we get:

$$\begin{aligned} g^1=\{r^1, r^3\};\quad g^2=\{r^6, r^7\};\quad g^3=\{r^4, r^5\};\quad g^4=\{r^0, r^2\} \end{aligned}$$

In Fig. 3 (ii), the weakest edges of \(G(\{A,B,C\})\) are \((r^0,r^1)\) and \((r^2,r^3)\) (the other weakest edges are not shown). Since \(G\) separates all these edges, it can correct one crash fault among the machines in \(\{A,B,C\}\). Note, however, that the machines in \(\{A,B,C\}\) cannot correct a fault in \(G\). For example, if \(G\) crashes and \(R\) is in state \(r^0\), we cannot determine whether \(G\) was in state \(g^0\) or \(g^4\). This is clearly different from the case of the fusion machines presented in this paper, where faults can be corrected among both primaries and backups.

Fig. 16: Machine outside the closed partition set of \(R\) in Fig. 2

Cite this article

Balasubramanian, B., Garg, V.K. Fault tolerance in distributed systems using fused state machines. Distrib. Comput. 27, 287–311 (2014). https://doi.org/10.1007/s00446-014-0209-4
