Parameter estimation in wireless sensor networks with faulty transducers: A distributed EM approach
Introduction
Wireless sensor networks (WSNs) consist of many small, spatially distributed autonomous nodes, equipped with one or more on-board sensors to collect information from the surrounding environment, and which collaborate to jointly perform a variety of inference and information processing tasks. Applications include environmental and healthcare monitoring, event detection, target classification, and industrial automation [1], [2]. Distributed processing, by which computations are carried out within the network in order to avoid raw data transmission to a fusion center, is a desirable feature of WSNs since it usually results in energy savings and improved robustness [3], [4]. In particular, distributed estimation of unknown parameters in WSNs is an important problem which has been extensively considered over the past few years [5], [6], [7], [8], [9], [10], [11].
In practice, estimation performance may be severely degraded when the information collected by the nodes becomes unreliable due to sensor malfunction [12], [13], [14], [15], and therefore it is important to efficiently identify faulty nodes [16], [17]. Given that nodes are typically deployed in outdoor, potentially harsh environments, sensor malfunction effects should not be lightly dismissed. We consider the problem of distributed estimation of a vector-valued parameter from the observations collected by a WSN where some nodes may be subject to random transducer faults, so that their reports contain only noise [13], [18]. In the presence of such unreliable observations, one possibility is to run a node classification stage prior to the estimation stage [19]; however, this entails increased computational complexity and communication cost. In contrast to algorithms based on prior detection of faulty nodes, the Mixed Detection and Estimation (MDE) scheme in [18] performs the node classification and estimation tasks jointly and in a distributed manner. However, since MDE classifies nodes based on hard decisions, it is prone to decision errors whenever the signal-to-noise ratio (SNR) is not sufficiently high. To avoid this problem, we adopt an approach in which a soft classification of the data is performed by means of the expectation-maximization (EM) algorithm, a well-known method for computing the maximum likelihood (ML) estimate in the presence of hidden variables [20], [21]. The EM algorithm implicitly and iteratively produces estimates of the class probabilities, alternating between an expectation step (E-step), where access to the whole network dataset is required, and a maximization step (M-step), where updated estimates are obtained.
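As a rough illustration of this soft classification, the following Python sketch runs a centralized EM iteration under an assumed observation model that is not spelled out at this point of the text: y_i = a_i h_i^T x + w_i, with known regressors h_i, known noise variance, and a_i drawn from a Bernoulli distribution with unknown probability p. The function name `em_faulty_sensors` and the symbols `H`, `gamma`, `p` and `sigma2` are hypothetical choices made for this example; it is a sketch under those assumptions, not the authors' implementation.

```python
import numpy as np

def em_faulty_sensors(y, H, sigma2, x0, p0=0.5, n_iter=100):
    """Centralized EM sketch for the ASSUMED model y_i = a_i * h_i^T x + w_i.

    y      : (N,) observations, one per node
    H      : (N, L) matrix whose rows are the (assumed known) regressors h_i
    sigma2 : noise variance, assumed known in this sketch
    x0, p0 : initial guesses for the parameter vector and for P(a_i = 1)
    """
    x = np.asarray(x0, dtype=float).copy()
    p = float(p0)
    gamma = np.full(len(y), p)
    for _ in range(n_iter):
        # E-step: posterior probability that node i actually sensed the signal
        # (soft classification instead of a hard faulty/non-faulty decision).
        log_on = np.log(p) - (y - H @ x) ** 2 / (2.0 * sigma2)
        log_off = np.log1p(-p) - y ** 2 / (2.0 * sigma2)
        diff = np.clip(log_off - log_on, -700.0, 700.0)  # avoid overflow in exp
        gamma = 1.0 / (1.0 + np.exp(diff))

        # M-step: weighted least squares for x, sample mean of gamma for p.
        Hw = H * gamma[:, None]
        x = np.linalg.solve(Hw.T @ H, Hw.T @ y)
        p = float(np.clip(gamma.mean(), 1e-6, 1.0 - 1e-6))
    return x, p, gamma
```

Both M-step quantities are averages over all nodes, which is why the E-step/M-step pair needs network-wide information in a centralized setting. Starting from `x0 = 0` makes the first M-step an ordinary least squares fit, which is one simple way to initialize the iteration.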
Distributed implementations of the EM algorithm for Gaussian mixture density estimation and clustering have been previously proposed. For example, in incremental approaches [22], [23], [24], [25], computations involving global network information at the E-step are addressed via aggregation strategies, assigning routing paths or junction trees within the network. This problem is avoided in [26], [27], [29], which apply full-blown gossip- or consensus-based schemes at each E-step so that all nodes arrive at an agreement about every intermediate estimate. The main drawback of these methods, however, is the need to exchange a large amount of information among neighbor nodes, with the consequent penalty in energy efficiency. In [28] a distributed EM algorithm based on the alternating direction method of multipliers (ADMM) is proposed for clustering. In this scheme the communication overhead is reduced but at the cost of significantly increasing the computational cost since each node has to solve a convex optimization problem via, e.g., interior point methods at each iteration. A potential way to overcome these problems is the use of diffusion strategies [11], by which nodes exchange local information only once per EM iteration and perform averaging over the values in their neighborhoods [30], [31], [32] (see [33] for an extension to general mixture models). Convergence analyses of these schemes either assume that an infinite amount of data is available at each node [30], [32], or adopt a stochastic framework under an independence assumption [31].
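The neighborhood averaging that consensus- and diffusion-based schemes rely on can be sketched as follows. This Python example builds a Metropolis weight matrix (one common choice of combination weights) from a symmetric adjacency matrix and repeatedly averages node values with it. The function names `metropolis_weights` and `consensus_average` are hypothetical, and the sketch shows plain consensus averaging only, not the DA-DEM update itself.

```python
import numpy as np

def metropolis_weights(adj):
    """Metropolis weights for an undirected graph.

    adj : (n, n) symmetric 0/1 adjacency matrix without self-loops (assumed).
    Returns a symmetric matrix whose rows each sum to one.
    """
    adj = np.asarray(adj, dtype=bool)
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in np.flatnonzero(adj[i]):
            W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        # The self-weight absorbs whatever is left so that the row sums to one.
        W[i, i] = 1.0 - W[i].sum()
    return W

def consensus_average(values, W, n_iter):
    """Each node repeatedly replaces its value by a weighted neighborhood average."""
    z = np.asarray(values, dtype=float).copy()
    for _ in range(n_iter):
        z = W @ z
    return z
```

On a connected graph every entry of the result approaches the network-wide mean of the initial values. Running such an inner loop to convergence at every E-step is what makes fully consensus-based EM costly in communications, whereas diffusion strategies exchange information only once per EM iteration.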
The algorithm proposed in this paper is based on a different diffusion-based approach [34], [35], in which the propagation of information throughout the network is embedded in the iterative parameter update. This is done by appropriately combining two terms for information diffusion and information averaging (consensus) in the update equations. The resulting iteration, termed diffusion-averaging distributed Expectation-Maximization (DA-DEM), is reminiscent of so-called consensus+innovations (C+I) algorithms for distributed estimation in linear models [36], whose updates combine a consensus term and a local innovation term; nevertheless, several important differences should be highlighted. First, the model underlying C+I schemes is linear, but in our setting this property does not hold due to the potential presence of faulty nodes. Second, C+I schemes are usually designed for on-line adaptation, i.e., sensors keep acquiring new observations as time progresses, whereas the DA-DEM algorithm is of batch type, with a single measurement available to each sensor. Thus, in our setting, the “innovation” provided by the diffusion term does not correspond to information provided by new measurements, but rather to that provided by the iterative refinement of the estimates. Third, in contrast with [18], [34], [35], [36], where the diffusion and averaging terms have different asymptotic decay rates, thus leading to mixed time-scale recursions, in DA-DEM both terms have the same rate. In contrast with [30], [31], [32], this feature allows for the development of a local convergence analysis under a deterministic setting with a finite amount of data, showing that any convergent point of the centralized EM iteration, and therefore a (possibly local) maximum of the likelihood function, must be an asymptotically convergent point of DA-DEM. Numerical examples show that the DA-DEM estimator asymptotically attains the performance of centralized EM in terms of mean square error.
In addition to the aforementioned convergence analysis, further contributions with respect to [35] include handling an unknown a priori probability of sensor fault and considering a vector-valued parameter. In contrast with incremental strategies, DA-DEM does not require the computation and management of routing paths through the network, resulting in a sizable reduction in convergence time and thus leading to energy savings.
The paper is organized as follows. Section 2 describes the signal model, and Section 3 presents the centralized EM-based estimator, the starting point for the development of the distributed implementation in Section 4. The convergence analysis of DA-DEM is developed in Section 5. Finally, simulation results and conclusions are presented in Sections 6 and 7 respectively.
Notation: We use lowercase, bold lowercase, and bold uppercase symbols to denote scalars, vectors, and matrices, respectively. The transpose and inverse of a matrix A are denoted by A^T and A^{-1}, respectively. The 2-norm of a vector v is denoted by ‖v‖, whereas for a matrix A, ‖A‖_F denotes its Frobenius norm, ‖A‖ its spectral norm (i.e., its largest singular value) and, for A square, ρ(A) its spectral radius (the largest modulus among its eigenvalues). For an n × n symmetric matrix S, the vector obtained by stacking the entries of the upper triangular part of S has size n(n+1)/2. The composition of two functions f and g is denoted by f○g, so that (f○g)(x) = f(g(x)), and E{·} denotes statistical expectation.
Problem statement
We consider the problem of estimating a parameter vector based on a set of N ≫ L independent observations given by where are assumed known ∀i, {w_i, ∀i} are independent, identically distributed (i.i.d.) zero-mean Gaussian random variables with variance σ^2, modeling the observation noise, and {a_i, ∀i} are i.i.d. Bernoulli random variables with independent of w_j, ∀{i, j}. A value of indicates that node i has actually sensed the
Centralized EM estimator
Starting from an initial estimate, the EM algorithm alternates between an E-step, where the expected log-likelihood function (LLF) of the observations is computed using the current estimates, and an M-step, where the parameters maximizing the expected LLF are obtained; under mild conditions, the EM will converge to a maximum, possibly local, of the LLF [20], [21]. Consider the observation vector in (2) with pdf given by (6). We regard y as the incomplete observation and {y, a} as the complete
A diffusion-averaging distributed EM estimator
The proposed distributed implementation of the EM estimator hinges on the fact that in the centralized version the information from the different nodes is aggregated by means of averages, as can be seen in (13)–(17). This property is similar to that used in [38] for distributed computation of a Least Squares estimate. However, in contrast with [38], in our estimation problem not all of the quantities to be averaged are available at the nodes from the very beginning; rather, they depend on the
Local convergence analysis
We now analyze the convergence properties of the DA-DEM algorithm derived in Section 4. Recall from (33) that the step-size sequence α_k governs the diffusion/consensus process, gradually switching from one to the other as long as this sequence converges to zero. The use of vanishing step-sizes is common in stochastic approximation [46] and is also found in consensus applications with noisy signals [41], [42]. In particular, we consider the following choice: Note that
Simulation results
The theoretical results from Section 5 are supported here with computer simulations of a network composed of nodes randomly deployed over a unit square with connectivity radius . The nodes sense a unit-norm parameter vector with randomly generated and fixed throughout the simulation. Each node has access to one measurement with x is assumed sensed with probability and w is taken as a Metropolis weight matrix [38]. In each run, the
Conclusion
We have proposed a diffusion-averaging distributed EM algorithm for estimation of a vector-valued parameter with a wireless sensor network in the presence of noisy observations and with potentially faulty transducers. The DA-DEM recursion combines an initial period where the process of information diffusion is gradually switched off at the same time as an information averaging process is gradually switched on. The switching mechanism is controlled by proper choice of vanishing step-size
Acknowledgments
This work was supported by the Ministerio de Economía y Competitividad of the Spanish Government, ERDF funds [TEC2013-41315-R, TEC2015-69648-REDC, TEC2016-75067-C4-2-R, TEC2013-47020-C2-1-R, TEC2016-76409-C2-2-R]; and the Galician Government [Agrupación Estratéxica Consolidada de Galicia accreditation 2016–2019, Red Temática RedTEIC 2017–2018].
References (46)
- A survey for design methods for failure detection in dynamic systems. Automatica (1976)
- F. Zhao, L. Guibas, Wireless sensor networks: an information processing approach, Morgan Kaufmann, San Mateo, CA,...
- et al. Wireless Sensor Networks (2010)
- et al. Gossip algorithms for distributed signal processing. Proc. IEEE (2010)
- et al. Consensus and cooperation in networked multi-agent systems. Proc. IEEE (2007)
- et al. Distributed compression-estimation using wireless sensor networks. IEEE Signal Process. Mag. (2006)
- et al. Randomized gossip algorithms. IEEE Trans. Info. Theory (2006)
- et al. Decentralized maximum-likelihood estimation for sensor networks composed of nonlinearly coupled dynamical systems. IEEE Trans. Signal Process. (2007)
- et al. Information-driven distributed maximum likelihood estimation based on Gauss-Newton method in wireless sensor networks. IEEE Trans. Signal Process. (2007)
- et al. Consensus in ad hoc WSNs with noisy links - part I: distributed estimation of deterministic signals. IEEE Trans. Signal Process. (2008)
- Decentralized parameter estimation by consensus based stochastic approximation. IEEE Trans. Autom. Control
- Diffusion adaptation over networks
- Detection and diagnosis of sensor and actuator failures using IMM estimator. IEEE Trans. Aerosp. Electron. Syst.
- Sensor network data fault types. ACM Trans. Sensor Netw.
- A collaborative sensor-fault detection scheme for robust distributed estimation in sensor networks. IEEE Trans. Commun.
- Fault diagnosis in wireless sensor networks: a survey. IEEE Commun. Surv. Tuts.
- Defective sensor identification for WSNs involving generic local outlier detection tests. IEEE Trans. Signal Inf. Process. Netw.
- Distributed estimation in sensor networks with imperfect model information: an adaptive learning-based approach. Proc. IEEE Int. Conf. Acoust. Speech Signal Process.
- Distributed fault detection in sensor networks via clustering and consensus. Proc. IEEE Conf. Decis. Control
- Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B
- The expectation-maximization algorithm. IEEE Signal Process. Mag.
- Distributed EM algorithms for density estimation and clustering in sensor networks. IEEE Trans. Signal Process.
- Fully distributed EM for very large datasets. Proc. Int. Conf. Machine Learning
1 Present address: CNS Group, Universitat Pompeu Fabra, C/Ramon Trias Fargas, 25–27, 08005 Barcelona, Spain.
2 EURASIP Member.