Abstract
We present a fault tolerant algorithm for matrix factorization in the presence of multiple hardware faults which can be used for solving the linear system Ax = b without determining the correct ZU decomposition of A. Here Z is either L for ordinary Gaussian decomposition with partial pivoting, X for pairwise or neighbor pivoting (motivated by the Gentleman-Kung systolic array structure), or Q for the usual QR decomposition. Our algorithm generalizes that of Luk and Park, whose method allows for the correction of a single error in a single iterate of the matrix U. Using ideas from the theory of error correcting codes, we prove that the algorithm of Luk and Park can in fact tolerate multiple errors in multiple iterates of U, provided these are all confined to a single column. We then generalize the algorithm to one that tolerates multiple errors in multiple iterates of U, provided they are confined to two columns. Our procedure for identifying the erroneous columns is based on the extended Euclidean algorithm and is analogous to the decoding algorithms for BCH codes. We indicate how our methods may be adapted to apply to any number of columns, and finally we show how to compute a correct factorization of A.
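The single-column case can be illustrated with a weighted checksum scheme in the spirit of the Luk and Park approach mentioned above. The following is a minimal sketch under stated assumptions, not the algorithm of the paper: the function names, the weights 1, 2, ..., n, the fault-injection format, the omission of pivoting, and the use of exact rational arithmetic are all choices made here for illustration. Two checksum columns, Ae and Aw, are appended to A. Because row operations act on every column alike, errors confined to column j leave syndromes s1 and s2 satisfying s2 = w_j s1, so the ratio identifies the column. The two-column procedure based on the extended Euclidean algorithm is not shown.

```python
from fractions import Fraction


def eliminate_with_checksums(A, faults=None):
    """Gaussian elimination without pivoting on [A | A e | A w].

    `faults` is a hypothetical injection format: it maps an elimination
    step k to a list of (row, col, delta); delta is added to the working
    matrix entry just before step k runs, simulating a transient fault.
    Weights w = (1, 2, ..., n) are an assumption of this sketch.
    """
    n = len(A)
    weights = [Fraction(j + 1) for j in range(n)]
    M = [[Fraction(x) for x in row] for row in A]
    for row in M:
        data = row[:n]
        row.append(sum(data))                                  # A e
        row.append(sum(w * x for w, x in zip(weights, data)))  # A w
    faults = faults or {}
    for k in range(n - 1):
        for i, j, delta in faults.get(k, []):
            M[i][j] += delta
        if M[k][k] == 0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            # Update the whole row, checksum columns included, so each
            # step is an exact left multiplication of the augmented matrix.
            for c in range(n + 2):
                M[i][c] -= m * M[k][c]
    return M, weights


def locate_faulty_column(M, weights):
    """Return the 0-based index of the erroneous column, or None if clean."""
    n = len(M)
    s1 = [sum(row[:n]) - row[n] for row in M]
    s2 = [sum(w * x for w, x in zip(weights, row[:n])) - row[n + 1]
          for row in M]
    if all(a == 0 for a in s1) and all(b == 0 for b in s2):
        return None
    ratios = {b / a for a, b in zip(s1, s2) if a != 0}
    if len(ratios) != 1 or next(iter(ratios)) not in weights:
        raise ValueError("errors are not confined to a single column")
    return weights.index(next(iter(ratios)))


if __name__ == "__main__":
    A = [[4, 1, 2], [1, 5, 1], [2, 1, 6]]
    # Two faults in two different iterates, both in column index 1.
    faults = {0: [(2, 1, 3)], 1: [(1, 1, 2)]}
    M, w = eliminate_with_checksums(A, faults)
    print(locate_faulty_column(M, w))
```

Exact rationals are used so that the syndrome ratio matches a weight exactly; a floating-point version would need a tolerance when comparing the ratio against the weights.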
References
J.M. Speiser and H.J. Whitehouse, Signal processing computational needs: An update, in Mathematics in Signal Processing (J.G. McWhirter, ed.), OUP, 1990, pp. 633–664.
F.T. Luk and H. Park, Fault tolerant matrix triangularization on systolic arrays, IEEE Trans. Comp., Vol. C-37 (1988) pp. 1434–1438.
W.M. Gentleman and H.T. Kung, Matrix triangularization by systolic arrays, in Real Time Signal Processing IV, Proc. SPIE (T.F. Tao, ed.), Vol. 298 (1981) pp. 19–26.
K.H. Huang and J.A. Abraham, Algorithm-based fault tolerance for matrix operations, IEEE Trans. Comp., Vol. C-33 (1984) pp. 518–528.
J-Y. Jou and J.A. Abraham, Fault-tolerant matrix arithmetic and signal processing on highly concurrent computing structures, Proc. IEEE, Vol. 74 (1986) pp. 732–741.
P. Fitzpatrick and C.C. Murphy, Solution of linear systems of equations in the presence of two transient hardware faults, IEE Proc. Pt. E., Vol. 140 (1993) pp. 247–254.
C.J. Anfinson and F.T. Luk, A linear algebraic model of algorithm-based fault tolerance, IEEE Trans. Comp., Vol. C-37 (1988) pp. 1599–1604.
R.P. Brent, F.T. Luk, and C.J. Anfinson, Checksum schemes for fault tolerant systolic computing, in Mathematics in Signal Processing (J.G. McWhirter, ed.), OUP, 1990, pp. 791–804.
D.L. Boley, R.P. Brent, G.H. Golub, and F.T. Luk, Error correction via the Lanczos process, Tech. Rep. EE-CEG-91-1, School of Elect. Eng., Cornell Univ. (Jan. 1991).
T. Fuja, C. Heegard, and R. Goodman, Linear sum codes for random access memories, IEEE Trans. Comp., Vol. C-37 (1988) pp. 1030–1042.
B.W. Johnson, Design and Analysis of Fault Tolerant Digital Systems, Addison-Wesley, 1989.
T.R.N. Rao and E. Fujiwara, Error-Control Coding for Computer Systems, Prentice-Hall, 1989.
R.J. McEliece, The Theory of Information and Coding, Addison-Wesley, Reading, Mass., 1977.
G.H. Golub and C.F. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, Maryland, 1983.
G. Pólya and G. Szegö, Aufgaben und Lehrsätze aus der Analysis, Vol. 2 (second edition), Springer, Berlin, 1954.
Additional information
The author gratefully acknowledges financial support from the International Fund for Ireland. © Controller, HMSO, London, 1991.
Cite this article
Fitzpatrick, P. On fault tolerant matrix decomposition. Journal of VLSI Signal Processing 8, 293–303 (1994). https://doi.org/10.1007/BF02106453
DOI: https://doi.org/10.1007/BF02106453