Elsevier

Signal Processing

Volume 87, Issue 3, March 2007, Pages 374-407

Temporal and time-frequency correlation-based blind source separation methods. Part I: Determined and underdetermined linear instantaneous mixtures

https://doi.org/10.1016/j.sigpro.2006.05.012

Abstract

We propose two types of correlation-based blind source separation (BSS) methods: a time-domain approach, and extensions which use time-frequency (TF) signal representations and thus apply under much more general conditions. Unlike various previously reported TF-BSS methods, our basic TF methods only require each source to be isolated in a tiny TF area, i.e. they set very limited constraints on source sparsity and overlap. Our approaches consist of identifying the columns of the (scaled, permuted) mixing matrix in the TF areas where these methods detect that a source is isolated. Both the detection and identification stages of these approaches use local correlation parameters of the TF transforms of the observed signals. Two such Linear Instantaneous TIme-Frequency CORRelation-based BSS methods are proposed, using Centered or Non-Centered TF transforms. These methods, respectively called LI-TIFCORR-C and LI-TIFCORR-NC, are especially suited to non-stationary sources. We assess their performance through many tests performed with mixtures of speech signals. These tests show that their output signal-to-interference ratios (SIRs) have a low sensitivity to the values of their TF parameters and are quite high, typically 60 to 80 dB, whereas the SIRs of all tested classical methods range from about 0 to 40 dB. We also extend these approaches to achieve partial BSS for underdetermined mixtures and to operate when some sources are not isolated in any TF area.

Introduction

Blind source separation (BSS) methods aim at restoring a set of unknown source signals from a set of observed signals which are mixtures of these source signals [1], [2], [3]. Most of the approaches developed to this end concern linear instantaneous mixtures and are based on independent component analysis (ICA). They assume the sources to be random, stationary, statistically independent signals, and they recombine the observed signals so as to obtain statistically independent output signals. These outputs are then equal to the sources, up to some indeterminacies and under some conditions (in particular, if no additional constraints are set on the sources, at most one source may be Gaussian for such methods to be applicable).
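The linear instantaneous model and the indeterminacies mentioned above can be illustrated with a minimal NumPy sketch. The sources, the mixing matrix A, the permutation P and the scaling D are arbitrary illustrative choices, not values taken from the paper; the point is only that any separating matrix of the form B = P D A^-1 gives outputs that are permuted, rescaled copies of the sources.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 2, 1000
S = rng.standard_normal((N, T))           # illustrative sources, one per row
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                # illustrative mixing matrix
X = A @ S                                 # x_i(t) = sum_j a_ij s_j(t)

# Any separating matrix B = P D A^{-1} (P: permutation, D: invertible diagonal)
# yields outputs that are permuted, rescaled copies of the sources,
# so BSS can only recover the sources up to these indeterminacies.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
D = np.diag([2.0, -0.5])
B = P @ D @ np.linalg.inv(A)
Y = B @ X

print(np.allclose(Y[0], -0.5 * S[1]), np.allclose(Y[1], 2.0 * S[0]))  # True True
```

Here the first output is the second source scaled by -0.5 and the second output is the first source scaled by 2, which is why BSS performance is judged up to scale and order.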

In addition to ICA, a few other general concepts have been used for achieving BSS. These include the class of approaches based on time-frequency (TF) analysis, which is the main framework considered in this paper. TF tools have been used in various ways in the BSS methods reported so far, as may be seen e.g. in [4], [5], [6], [7], [8], [9], [10], [11]. Among the trends which emerge from these papers, the following ones should especially be mentioned. A first set of methods is composed of approaches based on ratios of TF transforms of observed signals [4], [5], [6]. Some of these methods require the sources to have no overlap in the TF domain [4], which is quite restrictive. In contrast, the type of methods that we introduced and extended in [5], [6] only requires slight differences in the TF representations of the sources. Another general concept that has been proposed for achieving BSS consists of exploiting the sparsity of the sources in an adequate representation of these signals. The representation used in some of these approaches is based on a TF transform of the signals. This yields a second set of TF-BSS methods, which especially contains the approaches proposed in [7]. This second set of methods is related to the first one, in the sense that the different constraints on the TF overlap between sources in the first set may be considered as various conditions on the degree of sparsity of these transformed sources. All the approaches presented in [4], [5], [6], [7], which compose the above two sets, use the same TF transform, i.e. the short-time Fourier transform (STFT), which is a linear transform. In contrast, other approaches use quadratic TF transforms (see e.g. [8], [9], [10], [11]), thus forming a third set of methods. This third set especially includes TF-BSS methods which are closely related to classical BSS approaches, as they consist of TF adaptations of previously developed joint-diagonalization methods, with subsequent modifications. Note that, unlike classical ICA-based BSS methods, TF-based BSS approaches are intrinsically well suited to non-stationary signals (and set no restrictions on the Gaussianity of the sources). They are therefore especially attractive for, e.g., speech sources.
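The principle behind the first set of methods (ratios of TF transforms) can be written out in a few lines. Because the STFT is linear, the mixing model x_i(t) = sum_j a_ij s_j(t) carries over to every TF point. The notation X_i(t,f) and S_j(t,f) for the STFTs of the observations and sources is ours. If only source s_k is non-zero at a point (t,f), the ratio of two observed transforms no longer depends on the signals:

```latex
X_i(t,f) = \sum_{j=1}^{N} a_{ij}\, S_j(t,f), \qquad i = 1,\dots,P
\\[4pt]
S_j(t,f) = 0 \;\; \forall j \neq k
\;\;\Longrightarrow\;\;
\frac{X_i(t,f)}{X_1(t,f)}
= \frac{a_{ik}\, S_k(t,f)}{a_{1k}\, S_k(t,f)}
= \frac{a_{ik}}{a_{1k}}
```

Collecting these ratios for i = 1, ..., P yields column k of the mixing matrix up to the scale factor 1/a_1k (assuming a_1k and S_k(t,f) are non-zero), which is why such methods look for TF points or areas where these ratios stay constant.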

This first part of our paper mainly describes original linear TF-BSS approaches applicable to linear instantaneous mixtures. Like the above-mentioned two sets of linear methods, these approaches use STFTs, but they rely on other types of parameters, based on the local correlations of the observed signals in the TF domain. Before describing these TF-BSS methods, we present a purely temporal version of such approaches, which applies only under more restrictive conditions, as shown below.
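The detection and identification principle based on local TF correlations can be sketched with NumPy. This is a minimal sketch under simplifying assumptions, not the authors' implementation: the STFT uses non-overlapping rectangular frames, each TF analysis zone is a block of `zone_len` adjacent frames at one frequency bin, a zone is declared single-source when every correlation-coefficient magnitude |rho_i1| exceeds 1 - tol, and the helpers `stft_frames`, `single_source_columns` and `group_columns` with their thresholds are hypothetical. As in a non-centered variant, the local correlations are R_ij = mean over the zone of X_i X_j*; if only source k is present in a zone, R_i1 / R_11 = a_ik / a_1k.

```python
import numpy as np

def stft_frames(x, frame_len):
    """Simplified STFT: non-overlapping rectangular frames (an assumption).
    x: (P, T) real observations -> (P, n_frames, frame_len // 2 + 1) complex."""
    P, T = x.shape
    n_frames = T // frame_len
    frames = x[:, : n_frames * frame_len].reshape(P, n_frames, frame_len)
    return np.fft.rfft(frames, axis=-1)

def single_source_columns(X, zone_len, tol=1e-6, energy_floor=1e-10):
    """Scan TF zones (zone_len adjacent frames at one frequency bin), keep those
    where every |rho_i1| is ~1, and return R[:, 0] / R[0, 0] for each of them."""
    P, n_frames, n_bins = X.shape
    ref = np.mean(np.abs(X) ** 2, axis=(1, 2))       # average power per observation
    cols = []
    for f in range(n_bins):
        for start in range(0, n_frames - zone_len + 1, zone_len):
            Z = X[:, start:start + zone_len, f]      # (P, zone_len) TF values
            R = Z @ Z.conj().T / zone_len            # R[i, j] = mean of X_i X_j^*
            diag = R.diagonal().real
            if np.any(diag <= energy_floor * ref):
                continue                             # skip (near-)silent zones
            rho = np.abs(R[:, 0]) / np.sqrt(diag * diag[0])
            if np.all(rho >= 1.0 - tol):             # single-source zone detected
                cols.append(R[:, 0] / R[0, 0])       # column k scaled by 1 / a_1k
    return cols

def group_columns(cols, tol=1e-3):
    """Greedy grouping of the estimates; returns one averaged column per group."""
    centers, members = [], []
    for c in cols:
        for k, center in enumerate(centers):
            if np.linalg.norm(c - center) < tol:
                members[k].append(c)
                break
        else:
            centers.append(c)
            members.append([c])
    return [np.mean(m, axis=0) for m in members]

# Synthetic demo: s1 alone, then s2 alone, then both sources overlapping.
rng = np.random.default_rng(1)
frame_len, seg = 64, 16 * 64
s1 = np.concatenate([rng.standard_normal(seg), np.zeros(seg), rng.standard_normal(seg)])
s2 = np.concatenate([np.zeros(seg), rng.standard_normal(seg), rng.standard_normal(seg)])
A = np.array([[1.0, 0.7],
              [0.5, 1.0]])
X = stft_frames(A @ np.vstack([s1, s2]), frame_len)
est = group_columns(single_source_columns(X, zone_len=4))
```

In this demo the zones that fall in the overlap segment are rejected, and `est` holds two columns close to [1, 0.5] and [1, 1/0.7], i.e. the columns of A divided by their first entries. A centered variant would subtract the mean of each row of `Z` over the zone before computing `R`.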

The remainder of this first part of our paper is therefore organized as follows. In Section 2 we define the first configuration that we consider, based on determined linear instantaneous mixtures, and the resulting goal of our investigation. We then present the associated temporal BSS method in Section 3 and introduce its TF extensions in Section 4. Section 5 is devoted to the versions of these approaches intended for more complex configurations, especially underdetermined mixtures. Section 6 reports on a detailed analysis of the experimental performance achieved by all our temporal and TF approaches for artificial mixtures of real speech sources. It also compares this performance to that of various BSS methods from the literature and of a somewhat related TF approach that we proposed in a previous paper. Section 7 discusses the features of the proposed methods, as compared to classical BSS approaches. This section also presents the conclusions drawn from this first part of our overall investigation and outlines extensions of the proposed methods. In addition, specific topics are detailed in the appendices.


Problem statement

Let us first introduce the features of the considered configuration which apply to the BSS approaches proposed in Sections 3 and 4. We assume that N unknown, possibly complex-valued, source signals sj(t) are mixed in a linear instantaneous way, thus providing a set of N observed signals xi(t). In other words, as in most papers dealing with BSS, we consider determined mixtures, i.e. the number of sources is here assumed to be known and equal to the number of available observations (the case when

Assumptions and definitions

In this section, we consider random signals and introduce a statistical temporal BSS approach. We first present the only assumptions that we make with respect to the sources in this approach, and the associated definitions.

Definition 1

A source is said to be “isolated” in a time area if only this source (among all considered mixed sources) has a non-zero variance in this time area.
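One way to see why Definition 1 is useful is to write out the centered correlations in such a time area. Suppose only source s_k has non-zero variance there, so that each other source s_j (j ≠ k) reduces to a constant m_j in that area; the symbols m_j, sigma_k^2, R_i1 and rho_i1 are our notation, and this derivation is an illustration rather than a transcription of the criterion used in Section 3. Centering removes the constants, which is what a centered ("-C") approach relies on:

```latex
x_i(t) = a_{ik}\, s_k(t) + \sum_{j \neq k} a_{ij}\, m_j
\;\;\Longrightarrow\;\;
x_i(t) - E[x_i(t)] = a_{ik}\,\bigl(s_k(t) - E[s_k(t)]\bigr)
\\[4pt]
R_{i1} = E\bigl[(x_i - E[x_i])\,(x_1 - E[x_1])^{*}\bigr] = a_{ik}\, a_{1k}^{*}\, \sigma_k^2,
\qquad R_{11} = |a_{1k}|^2\, \sigma_k^2
\\[4pt]
|\rho_{i1}| = \frac{|R_{i1}|}{\sqrt{R_{ii}\, R_{11}}} = 1,
\qquad
\frac{R_{i1}}{R_{11}} = \frac{a_{ik}}{a_{1k}}
```

When several sources have non-zero variance in the area, |rho_i1| is generally below 1, so this quantity can flag isolated-source areas, and R_i1 / R_11 then gives column k of the mixing matrix up to the scale factor 1/a_1k.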

This definition corresponds to the theoretical point of view. From a practical point of view, this means that the variances of

Motivations and basic principles

The approach that we introduced in the previous section is attractive because of its simplicity. However, it may be considered of limited practical applicability, because it assumes all sources to be isolated in associated time areas, which is a restrictive condition. It nevertheless opens the way to much more powerful methods if we now take into account the TF distributions of the signals, instead of their plain time distributions considered up to this point. Indeed, the TF extension of the above

Extensions of proposed TF and temporal approaches

Up to now we only considered the configuration based on the following assumptions:

  • 1.

    the number P of observations is equal to the number N of sources,

  • 2.

    all sources are “accessible” (in the above-defined senses).
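Once the columns of the mixing matrix have been identified up to scale and order, separation amounts to applying an inverse of the estimated matrix. The NumPy sketch below illustrates this final step under our own assumptions: it uses the Moore-Penrose pseudo-inverse, which reduces to the ordinary inverse when P = N and also applies when P > N; the matrix `A`, its estimate `A_hat` and the sizes are illustrative, and this is not presented as the procedure of the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 500
S = rng.standard_normal((2, T))          # illustrative sources (N = 2)
A = np.array([[1.0, 0.5],
              [0.3, 1.0],
              [0.8, -0.6]])              # illustrative 3 x 2 mixing matrix (P = 3)
X = A @ S

# Suppose the columns were identified only up to scale and order:
# here each column is divided by its first entry, then the columns are swapped.
A_hat = (A / A[0, :])[:, ::-1]
Y = np.linalg.pinv(A_hat) @ X            # outputs = permuted, rescaled sources

print(np.allclose(Y[0], 0.5 * S[1]), np.allclose(Y[1], S[0]))  # True True
```

The outputs inherit the scale and permutation of `A_hat`: the first output is the second source multiplied by a_12 = 0.5, and the second output is the first source multiplied by a_11 = 1.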

The proposed BSS methods may be extended beyond this “standard” configuration as follows. Their first extensions concern situations where the number P of observations differs from the number N of sources. The overdetermined case, i.e. P>N, is known to be handled easily in the framework of

Performance of proposed TF methods for a fixed mixing matrix

In Sections 6.1 and 6.2, we present a large number of tests performed in the following conditions:

  • we start from various real English speech signals sampled at 20 kHz,

  • we derive various artificial linear instantaneous mixtures of these sources,

  • we process these mixed signals with the main BSS methods proposed in this paper, i.e. the standard versions of LI-TIFCORR-C and LI-TIFCORR-NC that we defined in Section 4.
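The tests listed above score each separation with an output signal-to-interference ratio (SIR). As an illustration of how such a score can be computed, here is a sketch of one common overall SIR, which may differ from the paper's exact definition. It assumes unit-power, uncorrelated sources, forms the global matrix C = B A (separating matrix times mixing matrix), takes the largest-magnitude entry of each row of C as that output's target source, and averages the per-output SIRs in dB. The function name `overall_sir_db` and these choices are our assumptions.

```python
import numpy as np

def overall_sir_db(C):
    """Mean output SIR in dB for a global matrix C = B @ A, assuming unit-power,
    uncorrelated sources. Each output's target is its largest-magnitude entry.
    A perfectly separated output (zero interference) would give +inf."""
    power = np.abs(np.asarray(C, dtype=complex)) ** 2
    rows = np.arange(power.shape[0])
    target_idx = power.argmax(axis=1)
    target = power[rows, target_idx]              # power of the dominant source
    leak = power.copy()
    leak[rows, target_idx] = 0.0
    interference = leak.sum(axis=1)               # power leaking from the others
    return float(np.mean(10.0 * np.log10(target / interference)))

C = np.array([[1.0, 1e-3],
              [1e-3, 1.0]])
print(round(overall_sir_db(C), 6))   # 60.0
```

With a residual cross-talk amplitude of 1e-3 in each output, this measure gives 60 dB, the lower end of the range reported in the abstract for the proposed methods.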

The performance achieved in each test is measured by the overall signal-to-interference

Discussion and conclusions

In this paper, we proposed two types of CORRelation-based BSS approaches for Linear Instantaneous mixtures. The first approach operates in the TEMPoral domain, on the Centered version of the signals, and is therefore called LI-TEMPCORR-C. It was introduced in a statistical framework. It can therefore be situated as follows within the taxonomy of statistical methods for BSS and ICA that may be defined in connection with [18]:

  • 1.

    The most classical class in this taxonomy consists of methods intended for

Acknowledgments

The authors would like to thank the five anonymous reviewers for their very detailed and helpful comments.

References (19)


Cited by (50)

  • Null space component analysis for noisy blind source separation

    2015, Signal Processing
    Citation Excerpt :

    Initially, linear instantaneous (memoryless) mixing models were used [3], followed by linear convolution mixing models [4]. More recently, nonlinear mixing models [5–7], bounded component analysis [8,9], and the sparsity-based approach [10,11] have been exploited. Now, the blind source separation problem is a fundamental issue in applications of biomedical engineering, signal processing, and communications.

  • Blind spatial unmixing of multispectral images: New methods combining sparse component analysis, clustering and non-negativity constraints

    2012, Pattern Recognition
    Citation Excerpt :

    More recently, other methods have been proposed for solving the BSS problem. This especially includes methods based on sparse component analysis (SCA) [1,7,8,19], which exploit the sparsity properties of sources in different representation domains. Most approaches based on ICA (resp.

  • Multi-source TDOA estimation in reverberant audio using angular spectra and clustering

    2012, Signal Processing
    Citation Excerpt :

    We design and conduct a large-scale evaluation of angular spectrum-based and clustering-based methods on 1482 different configurations and investigate the use of the former for the initialization of the latter. In addition, we introduce and evaluate five new TDOA estimation methods inspired from signal-to-noise ratio (SNR) weighting or probabilistic modeling techniques that have been successful for anechoic TDOA estimation [19–21], histogram-based reverberant TDOA estimation [10] or audio source separation [22,23], but have not yet been explored for angular spectrum-based or clustering-based reverberant TDOA estimation. The proposed methods account for the presence of diffuse noise or interfering sources in each time–frequency bin and rely prioritarily on the time–frequency bins resulting from the direct sound of a single source.
