Abstract
A two-stage approach to speaker localization with a
microphone array is addressed in this paper. In the first stage,
the time difference of arrival (TDOA) of the speech signal is
estimated for each pair of microphones. In the second stage,
which is the focus of this paper, these estimates are combined to
obtain the source location. We propose
to exploit the smoothness of the speaker's trajectory to improve the
current position estimate. Three localization schemes that use
this temporal information are presented. The first is a recursive
form of the Gauss method. The other two are extensions of the
Kalman filter to the nonlinear problem at hand, namely, the
extended Kalman filter and the unscented Kalman
filter. These methods are compared with algorithms that
do not use temporal information. An extensive
experimental study demonstrates the advantage of using the
spatial-temporal methods. To gain insight into the attainable
performance of the localization algorithm, an approximate
analytical evaluation, verified by an experimental study, is
conducted. This study shows that in common TDOA-based localization
scenarios, where the microphone array has a small interelement
spread relative to the source distance, the elevation and azimuth
angles can be accurately estimated, whereas the Cartesian
coordinates and the range are poorly estimated.