doi:10.1016/j.sysarc.2005.08.001
Copyright © 2005 Elsevier B.V. All rights reserved.
The algorithm of pipelined gossiping
University of Antwerp, Department of Mathematics and Computer Science, Performance Analysis of Telecommunication Systems Group, Middelheimlaan 1, 2020 Antwerp, Belgium
Interdisciplinary Institute for BroadBand Technology, Crommenlaan 8, 9050 Ghent-Ledeberg, Belgium
Received 11 February 2002;
revised 27 January 2005;
accepted 9 August 2005.
Available online 25 October 2005.
Abstract
A family of gossiping algorithms depending on a parameter permutation is introduced, formalized, and discussed. Several of its members are analyzed and their asymptotic behaviour is revealed, including a member whose model and performance closely follow those of hardware pipelined processors. This similarity is exposed. Finally, an optimizing algorithm is proposed and discussed as a general strategy to increase the performance of the base algorithms.
Keywords: Algorithms; Combinatorics; Gossiping; Inter-process communication; Finite-state automata
Fig. 1. The state diagram of the FSA run by processor i. The first row consists of i couples (WR, R). The parameter of the algorithm is a permutation of the N integers 0, …, i − 1, i + 1, …, N. The last row contains N − i couples (WR, R).
Fig. 2. Processor i − 1 blocks processor i only if 2i − 1 < N. A transmission, i.e., two used slots, is represented by an arrow. Dotted arrows denote transmissions sent by processor i − 1; solid arrows denote transmissions sent by processor i. Note the cluster of i − 1 columns, each containing two concurrent transmissions (adding up to four used slots).
Fig. 3. For any processor i > (N + 1)/2, there exists only one cluster of N − i columns with four used slots inside.
Fig. 4. A graphical representation of run-table 20 when the permutation is the identity permutation. Light gray pixels represent wasted slots, gray pixels represent R actions, and black pixels represent sending actions. Note the black “blocks”, which represent the clusters mentioned in Fig. 2 and Fig. 3.
Fig. 5. Comparison between lengths in the case of the identity permutation (dotted parabola) and that of the random permutation (piecewise line), 1 ≤ N ≤ 160. The lowest curve (λ = 0.71N² − 3.88N + 88.91) is the parabola that best fits the piecewise line, which suggests a quadratic execution time as in the case of the identity permutation.
Fig. 6. Comparison between values of μ in the case of a pseudo-random permutation (piecewise line) and that of the identity permutation (dotted curve), 1 ≤ N ≤ 160. Note how the former lies strictly above the latter. Note also how μ seems to tend to a value just above 2.6 for the identity permutation, as claimed by Proposition 17.
Fig. 7. Comparison between values of ε in the case of the random permutation (piecewise line) and that of the identity permutation (dotted curve), 1 ≤ N ≤ 160. In this graph, too, the former lies strictly above the latter, though they get closer to each other and to zero as N increases, as proven for the identity permutation in Proposition 16.
Fig. 8. A graphical representation of run-table 20 when the permutation is a pseudo-random permutation.
Fig. 10. Comparison between values of μ derived from the identity permutation (dotted parabola) and those from permutation (17) for 1 ≤ N ≤ 10.
Fig. 11. Comparison of efficiencies when the permutation is the identity permutation and in the case of permutation (17), for 1 ≤ N ≤ 160.
Fig. 12. A graphical representation of run-table 30 when the permutation is permutation (17).
Fig. 14. Values of μ for the three cases of Fig. 13.
Fig. 15. Values of ε for the three cases of Fig. 13. Peak values are in Table 8.
Fig. 16. Comparison of lengths when the permutation is a pseudo-random permutation (dots) and with the addition of Algorithm 2 (piecewise line), 1 ≤ N ≤ 160.
Fig. 17. Comparison of the values of μ in the two cases of Fig. 16.
Fig. 18. Comparison of the values of ε in the two cases of Fig. 16.
Fig. 19. Values of λ for 1 ≤ N ≤ 40 when the permutation is (17), with (piecewise line) and without (dotted line) the optimization of Algorithm 2.
Fig. 20. Values of μ in the two cases of Fig. 19.
Fig. 21. Values of ε in the two cases of Fig. 19.
Fig. 22. A restoring organ [12], i.e., an N-modular redundant system with N voters, when N = 3. Note that a de-multiplexer is required to produce the single final output.
Fig. 23. Structure of the EFTOS VF for N = 3.
Fig. 24. A Hopfield Neural Network.
Table 1.
A run (N = 4), with the permutation equal to the identity permutation

The step row represents time steps. Ids identify processors.
is the utilization string (see Definition 13). In this case μ, the average utilization, is 2.22 slots out of 5, with an efficiency ε = 44.44% and a length λ = 18. Note that, if the slot is used, then entry
of this matrix represents relation
.
Table 2.
Run-table 7 for the permutation equal to the identity permutation

Average utilization is 2.38 slots out of 8, or an efficiency of 29.79%.
Table 3.
Run-table 5 when the permutation is chosen pseudo-randomly

μ is 2.5 slots out of 6, which implies an efficiency of 41.67%.
Table 4.
Run-table of a run for N = 9 using the permutation of Eq. (17)

In this case μ, the average utilization, is 6.67 slots out of 10, with an efficiency ε = 66.67% and a length λ = 27. Note that the utilization string is in this case a palindrome, i.e., as is well known [24], a string like “21012” that reads the same from left to right and from right to left.
Table 5.
Run-table of a run for N = 8 using the permutation of Eq. (17)

μ is equal to six slots out of nine, with an efficiency ε = 66.67% and a length λ = 24. Note how the utilization string is a palindrome.
Table 6.
The algorithm is modified so that multiple gossiping sessions take place

The central, best-performing area is consequently prolonged. Therein ε is equal to N/(N + 1). Note how within that area there are consecutive “zones” of 10 columns each, within which five gossiping sessions reach their conclusion. For instance, one such zone is the region between columns 7 and 16: therein, at entries (4, 7), (0, 9), (1, 10), (2, 11), and (3, 12), a processor gets the last value of a broadcast and can perform some work on a full set of values. This leads to a throughput of t/2, where t is the duration of a slot.
Table 7.
Run-table 7 for the permutation equal to the identity permutation, modified by Algorithm 2 (μ = 5.89, ε = 73.68%)

Table 8.
ε values for different values of N = 2i − 1

Table 9.
Run-table 4 in pipelined gossiping mode and applying Algorithm 2

μ = 3.33 slots out of five, or an efficiency of 66.67%. In other words, Algorithm 2 changed the run-table, in particular the ending order, without producing any improvement.
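The values of μ, ε and λ quoted in the table notes can be cross-checked with a short calculation. The sketch below is not taken from the article: it assumes that each of the N + 1 processors sends its value to each of the other N processors, that every transmission occupies two slots (as stated in the caption of Fig. 2), and that ε is the fraction of used slots over all (N + 1) · λ slots. The function name `utilization_metrics` is hypothetical.

```python
from typing import Tuple


def utilization_metrics(n: int, length: int) -> Tuple[float, float]:
    """Return (mu, epsilon) for a run with processors 0..n lasting `length` steps.

    Assumption (inferred from the captions, not quoted from the article):
    every one of the n + 1 processors sends to each of the other n,
    and each transmission uses two slots.
    """
    processors = n + 1
    used_slots = 2 * processors * n   # two slots per transmission
    mu = used_slots / length          # average number of used slots per time step
    epsilon = mu / processors         # fraction of all slots that are used
    return mu, epsilon


# (N, lambda) pairs as quoted in the notes of Tables 1, 4 and 5.
for label, n, length in [("Table 1", 4, 18), ("Table 4", 9, 27), ("Table 5", 8, 24)]:
    mu, eps = utilization_metrics(n, length)
    print(f"{label}: mu = {mu:.2f}, epsilon = {eps:.2%}")
```

Under these assumptions the script reproduces the figures quoted for Tables 1, 4 and 5 (μ = 2.22, 6.67 and 6.00; ε = 44.44%, 66.67% and 66.67%), which suggests that the reading ε = μ/(N + 1) = 2N/λ matches the usage in the captions.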