Neurocomputing

Volume 70, Issues 7–9, March 2007, Pages 1424–1438

Generalized locally recurrent probabilistic neural networks with application to text-independent speaker verification

https://doi.org/10.1016/j.neucom.2006.05.012

Abstract

An extension of the well-known probabilistic neural network (PNN) to the generalized locally recurrent PNN (GLR PNN) is introduced. The GLR PNN is derived from the original PNN by incorporating a fully connected recurrent layer between the pattern and output layers. This extension renders the GLR PNN sensitive to the context in which events occur and, therefore, capable of identifying temporal and spatial correlations. In the present work, this capability is exploited to improve speaker verification performance. A fast three-step method for training GLR PNNs is proposed. The first two steps are identical to the training of the original PNN, while the third step is based on the differential evolution (DE) optimization method.

Introduction

Following the introduction of the probabilistic neural network (PNN) by Specht [27], numerous enhancements, extensions, and generalizations of the original model have been proposed. These efforts aim at improving either the learning capability [33], [3] or the classification accuracy [6] of PNNs, or alternatively at optimizing the network size, thereby reducing the memory requirements and complexity of the model and achieving lower operational times [29], [30]. An architecture referred to as the modified PNN (MPNN) was introduced in [35] for equalization of a non-linear frequency-modulated communication channel. The MPNN, which is closely related to the PNN, represents a vector-quantized form of Specht's general regression neural network (GRNN) [28]. Improved versions of the MPNN were employed in numerous signal processing, pattern recognition [34], and financial prediction applications [15]. A temporal updating technique for tracking changes in a sequence of images, based on periodic supervised and unsupervised updates of the PNN, has also been developed [32]. A locally recurrent global-feedforward PNN-based classifier, combining the desirable features of both feedforward and recurrent neural networks, was introduced in [10]. Specifically, by incorporating a new hidden layer comprised of recurrent neurons, we extended the original PNN architecture to the locally recurrent PNN (LR PNN) [10], in order to capture the inter-frame correlations present in a speech signal.

Initially, the locally recurrent global-feedforward architecture was proposed by Back and Tsoi [1], who considered an extension of the multi-layer perceptron neural network (MLP NN) to exploit contextual information. In their work, they introduced the infinite impulse response (IIR) and finite impulse response (FIR) synapses, which are able to utilize time dependencies in the input data. The FIR synapse has connections to its own current and delayed inputs, while the IIR synapse also has connections to its past outputs. Ku and Lee [16] proposed diagonal recurrent neural networks (DRNN) for the task of system identification in real-time control applications. Their approach is based on the assumption that a single feedback from the neuron's own output is sufficient; thus, they simplify the fully connected neural network to render training easier. A comprehensive study of several MLP-based locally recurrent neural networks is available in Campolucci et al. [5]. The authors of [5] introduced a unifying framework for the gradient calculation techniques, called causal recursive back-propagation. All the aforementioned approaches consider gradient-based training techniques for neural networks, which, as is well known, require differentiable transfer functions.
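
To make the distinction concrete, the following Python sketch contrasts the two synapse types under simplifying assumptions (a scalar input sequence and illustrative tap names b and a): the FIR synapse combines only current and delayed inputs, whereas the IIR synapse additionally feeds back its own past outputs.

```python
import numpy as np

def fir_synapse(x, b):
    """FIR synapse: the output depends on the current and L delayed inputs.
    x: input sequence of length T; b: L+1 feedforward taps (illustrative names)."""
    L = len(b) - 1
    y = np.zeros(len(x))
    for t in range(len(x)):
        for l in range(L + 1):
            if t - l >= 0:
                y[t] += b[l] * x[t - l]
    return y

def iir_synapse(x, b, a):
    """IIR synapse: in addition to the FIR part, it feeds back its own
    N past outputs through the feedback taps a."""
    L, N = len(b) - 1, len(a)
    y = np.zeros(len(x))
    for t in range(len(x)):
        for l in range(L + 1):
            if t - l >= 0:
                y[t] += b[l] * x[t - l]
        for n in range(1, N + 1):
            if t - n >= 0:
                y[t] += a[n - 1] * y[t - n]
    return y
```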

The work presented here draws on the concept of the locally recurrent global-feedforward architecture, and the recurrent layer we propose is similar to the IIR synapse introduced in [1] and the DRNN defined by Ku and Lee [16]. Our approach differs from those mentioned previously, primarily because we consider PNNs instead of MLP NNs. Most importantly, however, in the architecture proposed here each neuron in the recurrent layer receives as input not only the current and past values of its inputs, but also the N previous outputs of all neurons in that layer. This can be considered a generalization of the locally recurrent global-feedforward PNN architecture [10], obtained by adding time-lagged values of the inputs to the recurrent-layer linkage of the LR PNN. Thus, in the GLR PNN, the neurons of the recurrent layer are linked to all current and L past inputs, and to the N past outputs of all neurons of the same layer, in contrast to the LR PNN, where connections to the past inputs were not implemented.
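
The following Python sketch illustrates the recurrent-layer update described above, under stated assumptions: a sigmoid activation and the weight array names W_in and W_fb are illustrative, not the paper's notation. Each of the K neurons combines the current and L past inputs of the layer with the N past post-activation outputs of all K neurons.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation assumed for the recurrent-layer neurons."""
    return 1.0 / (1.0 + np.exp(-z))

def glr_recurrent_layer(X, W_in, W_fb):
    """Hedged sketch of the fully connected recurrent layer of the GLR PNN.
    X:    (T, K) sequence of inputs to the layer (one component per class).
    W_in: (K, K, L+1) weights for the current and L past inputs.
    W_fb: (K, K, N) weights for the N past outputs of all K neurons.
    Returns Y: (T, K) post-activation outputs, which are also fed back."""
    T, K = X.shape
    L = W_in.shape[2] - 1
    N = W_fb.shape[2]
    Y = np.zeros((T, K))
    for t in range(T):
        z = np.zeros(K)
        for l in range(L + 1):              # current and L past inputs
            if t - l >= 0:
                z += W_in[:, :, l] @ X[t - l]
        for n in range(1, N + 1):           # N past outputs of all neurons
            if t - n >= 0:
                z += W_fb[:, :, n - 1] @ Y[t - n]
        Y[t] = sigmoid(z)                   # feedback is taken after the activation
    return Y
```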

In comparison to [11], the present contribution updates the GLR PNN architecture and its training procedure, and provides comprehensive results. Specifically, all feedbacks originating from past outputs of neurons belonging to the recurrent layer now embrace the activation functions of these neurons. This facilitates the training of the recurrent-layer weights and contributes to improved convergence of the training process, owing to the reduced dynamic range of the values that the weight coefficients take. More importantly, this modification of the GLR PNN architecture transforms the dynamic range of the weight coefficients of the recurrent-layer feedbacks, making it comparable to that of the weight coefficients of the inputs to that layer. In turn, this leads to a lower sensitivity to rounding errors, a more economical hardware implementation, and, most importantly, improved overall robustness of GLR PNNs.

Besides this improvement, in the present work the GLR PNN architecture evolves further by allowing the number of past inputs L and the number of past outputs N considered in the hidden recurrent-layer linkage to be chosen independently. Thus, the earlier works [10], [11] can be considered two special cases of the generalized architecture, for L=0 and L=N, respectively. This generalization of the GLR PNN architecture adds a new degree of freedom for researchers, and therefore contributes to improved flexibility and applicability to a wider range of classification tasks.

Another important development that we bring forward in the present work, when compared to [11], is the amended error function, which is the object of minimization during the recurrent-layer training. The error function was modified to seek a specific, predefined balance of training among the classes, which guarantees better steering of the learning rate for each class and better customization of the individual classes.
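
As a rough illustration of this idea, and not the paper's exact formulation, the sketch below combines per-class errors with predefined weights so that no single class dominates the recurrent-layer training; the squared-error form and all names are assumptions.

```python
import numpy as np

def balanced_error(outputs, targets, class_ids, class_weights):
    """Hedged sketch of a class-balanced error: per-class mean squared
    errors are combined with predefined weights (class_weights maps a
    class id to its weight), steering how much each class contributes
    to the overall training error."""
    total = 0.0
    for k, w_k in class_weights.items():
        mask = (class_ids == k)
        if np.any(mask):
            err_k = np.mean((outputs[mask] - targets[mask]) ** 2)
            total += w_k * err_k
    return total
```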

The layout of the present article is as follows: the theoretical foundations of the original PNN are briefly discussed in Section 2. In Section 3, we present the updated architecture of the GLR PNN, and in Section 4, a modification of the training method is proposed. A brief description of the speaker verification (SV) task and the specific form of the GLR PNN when applied to this task is given in Section 5. In addition, Section 5 outlines our SV system, referred to as WCL-1, and discusses some measures for assessing SV performance. Section 6 describes the PolyCost speaker recognition database. Next, in Section 7, the experimental setup is discussed and comparative results on the task of text-independent SV are presented. Specifically, the efficiency of various differential evolution (DE) operators for training the recurrent layer of the GLR PNN is studied first. Then, the ability of the GLR PNN to exploit correlations in the input data, for several values of the recurrence depth N, is investigated. The performance of the GLR PNN is then compared with that of the LR PNN and of other locally recurrent structures, such as the DRNN and the IIR and FIR synapses. Finally, results from a comparative evaluation of the GLR PNN with respect to the original PNN, as well as to a Gaussian mixture model (GMM)-based classifier, are reported. The article ends with concluding remarks.

Section snippets

Theoretical foundations of the PNN

In Fig. 1, the general structure of a PNN for classification into K classes is illustrated. As presented in the figure, the first layer of the PNN, designated as the input layer, accepts the input vectors to be classified. The nodes of the second layer, which is designated as the pattern layer, are grouped into K groups according to the class ki to which they belong. These nodes, also referred to as pattern units or kernels, are connected to all inputs of the first layer. Although numerous probability density
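
For reference, a minimal Python sketch of the original PNN decision rule (Specht [27]) is given below, assuming Gaussian kernels with a single smoothing parameter sigma: each training vector acts as a pattern unit, the summation units average the kernel responses per class, and the output layer selects the class with the largest estimate.

```python
import numpy as np

def pnn_classify(x, train_X, train_y, sigma=0.5):
    """Minimal sketch of the original PNN decision.
    x: test vector (d,); train_X: training vectors (n, d); train_y: class labels (n,).
    sigma is the kernel smoothing parameter (a single assumed value here)."""
    classes = np.unique(train_y)
    scores = np.zeros(len(classes))
    for i, k in enumerate(classes):
        Xk = train_X[train_y == k]                         # pattern units of class k
        d2 = np.sum((Xk - x) ** 2, axis=1)                 # squared distances to x
        scores[i] = np.mean(np.exp(-d2 / (2 * sigma**2)))  # summation unit of class k
    return classes[np.argmax(scores)]                      # competitive output layer
```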

The generalized locally recurrent PNN architecture

Although numerous improved versions of the original PNN exist, which are either more economical or exhibit significantly superior performance, for simplicity of exposition we adopt the original PNN as the starting point for introducing the GLR PNN architecture. The development of the PNN architecture that we propose in the present section does not interfere with the aforementioned enhancements, and it can therefore be applied straightforwardly to the more advanced PNNs.

The GLR PNN is

The GLR PNN training

A three-step training procedure for the GLR PNN is proposed. By analogy to the original PNN, the first training step creates the actual topology of the network. Specifically, in the first hidden layer, a pattern unit for each training vector is created by setting its weight vector equal to the corresponding training vector. The outputs of the pattern units associated with the class ki are then connected to the corresponding summation units of the second hidden layer neurons. The number of
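
The third step, which adjusts the recurrent-layer weights, relies on differential evolution (see the abstract). A minimal DE/rand/1/bin sketch is given below; the population size, mutation factor F, crossover rate CR, and weight bounds are illustrative assumptions, and `objective` stands for the training error evaluated for a candidate (flattened) weight vector.

```python
import numpy as np

def differential_evolution(objective, dim, pop_size=30, F=0.5, CR=0.9,
                           generations=200, bounds=(-1.0, 1.0), seed=0):
    """Minimal DE/rand/1/bin sketch for optimizing a weight vector of length dim.
    objective: callable mapping a candidate vector to a scalar error (assumed)."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(bounds[0], bounds[1], (pop_size, dim))
    fitness = np.array([objective(ind) for ind in pop])
    for _ in range(generations):
        for i in range(pop_size):
            r1, r2, r3 = rng.choice([j for j in range(pop_size) if j != i],
                                    size=3, replace=False)
            mutant = pop[r1] + F * (pop[r2] - pop[r3])    # differential mutation
            cross = rng.random(dim) < CR                  # binomial crossover mask
            cross[rng.integers(dim)] = True               # keep at least one mutant gene
            trial = np.where(cross, mutant, pop[i])
            f_trial = objective(trial)
            if f_trial <= fitness[i]:                     # greedy selection
                pop[i], fitness[i] = trial, f_trial
    return pop[np.argmin(fitness)]                        # best weight vector found
```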

The SV task

The SV process, based on an identity claim and a sample of the speaker's voice, provides an answer to the unambiguous question: “Is the present speaker the one s/he claims to be, or not?” The output of the verification process is a binary decision, “Yes, s/he is!” or “No, s/he is not!”. The actual decision depends on the degree of similarity between the speech sample and a predefined model for the enrolled user whose identity the speaker claims. When an enrolled user claims his own identity, we
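
Two measures commonly used to assess SV performance are the false acceptance rate (FAR) over impostor trials and the false rejection rate (FRR) over genuine trials; the short sketch below computes both for a given decision threshold (the function and argument names are illustrative).

```python
import numpy as np

def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR: fraction of impostor trials wrongly accepted.
    FRR: fraction of genuine trials wrongly rejected."""
    genuine = np.asarray(genuine_scores)
    impostor = np.asarray(impostor_scores)
    far = np.mean(impostor >= threshold)   # impostors scoring above the threshold
    frr = np.mean(genuine < threshold)     # enrolled users scoring below it
    return far, frr
```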

Speaker recognition database

Our experiments (see Section 7) are based on the well-known PolyCost speaker recognition corpus [14]. In the present work, we use version v1.0 with bugs 1 through 5 fixed.

PolyCost is comprised of real-world telephone-quality speech recordings (English spoken by non-native speakers) collected across the international land-based telephone networks of Europe. The speech data are suitable for research related to telephone-driven voice services and automated call-centers.

The database

Experiments and results

The WCL-1 system outlined in Section 5.2 is used as a platform to compare the performance of various classifiers. Initially, we study several variation operators for training the GLR PNN and the influence of the recurrence depth N on the SV performance. Next, a comparison between the distributions of output scores for the GLR PNN and the original PNN is performed. Subsequently, the performance of the GLR PNN is contrasted with that obtained by substituting the fully connected recurrent layer with

Conclusion

Introducing the generalized locally recurrent PNN, we extended the traditional PNN architecture to exploit the temporal correlation among the features extracted from successive speech frames. Compared to earlier work, a further development of the GLR PNN architecture and a revised training method were presented. Comparative experimental results for text-independent speaker verification (SV) confirmed the practical value of the proposed GLR PNN. For both male and female speakers, a better

References (35)

  • M. Berthold et al., Constructive training of probabilistic neural networks, Neurocomputing (1998)
  • J. Hennebert et al., Polycost: a telephone-speech database for speaker recognition, Speech Commun. (2000)
  • D.F. Specht, Probabilistic neural networks, Neural Networks (1990)
  • A.D. Back et al., FIR and IIR synapses, a new neural network architecture for time series modeling, Neural Comput. (1991)
  • T. Bayes, An essay towards solving a problem in the doctrine of chances, Philos. Trans. R. Soc. London (1763)
  • T. Cacoullos, Estimation of multivariate density, Ann. Inst. Statist. Math. (1966)
  • P. Campolucci et al., On-line learning algorithms for locally recurrent neural networks, IEEE Trans. Neural Networks (1999)
  • B.J. Cain, Improved probabilistic neural networks and its performance relative to the other models, in: Proceedings...
  • T. Ganchev, N. Fakotakis, G. Kokkinakis, Text-independent speaker verification based on probabilistic neural networks,...
  • T. Ganchev, N. Fakotakis, G. Kokkinakis, Comparative evaluation of various MFCC implementations on the speaker...
  • T. Ganchev et al., Text-independent speaker verification for real fast-varying noisy environments, Int. J. Speech Technol. (2004)
  • T. Ganchev, D.K. Tasoulis, M.N. Vrahatis, N. Fakotakis, Locally recurrent probabilistic neural networks for text...
  • T. Ganchev, D.K. Tasoulis, M.N. Vrahatis, N. Fakotakis, Generalized locally recurrent probabilistic neural networks for...
  • V.L. Georgiou, N.G. Pavlidis, K.E. Parsopoulos, Ph.D. Alevizos, M.N. Vrahatis, Optimizing the performance of...
  • J.A. Hartigan et al., A k-means clustering algorithm, Appl. Statist. (1979)
  • T. Jan et al., Financial prediction using modified probabilistic learning network with embedded local linear model
  • C.C. Ku et al., Diagonal recurrent neural networks for dynamic system control, IEEE Trans. Neural Networks (1995)
Todor D. Ganchev received his Diploma Engineer degree in Electrical Engineering from the Technical University of Varna, Bulgaria, in 1993. From February 1994 to August 2000, he was with the Technical University of Varna, where he successively occupied engineering, research, and teaching staff positions. During that period, his research activities were mainly in the area of low-bit-rate speech coding. Since September 2000, he has been with the Wire Communications Laboratory, University of Patras, Greece. In 2005 he received his Ph.D. degree in the area of Speaker Recognition. Presently, he is a post-doctoral researcher at the same laboratory. His current research interests include Speech Processing, Neural Networks, and Differential Evolution.

Dimitris K. Tasoulis received his Degree in Mathematics from the Department of Mathematics, University of Patras, Greece, in 2000. He is currently a post-graduate student in the course “Mathematics of Computers and Decision Making”, from which he was awarded a postgraduate fellowship. His research activities focus on Unsupervised Clustering, Neural Networks, Data Mining, and Applications. He was a Visiting Research Fellow at INRIA, Sophia-Antipolis, France, in 2003. He is co-author of more than 50 publications (12 of which are published in international refereed journals). His research publications have received more than 40 citations.

Michael N. Vrahatis has been a Professor at the Department of Mathematics, University of Patras, Greece, since August 2000. He received the Diploma and Ph.D. degrees in Mathematics from the University of Patras, in 1978 and 1982, respectively. He is the author or co-author of more than 270 publications (more than 120 of which are published in international refereed journals) in his research areas, including computational mathematics, optimization, neural networks, evolutionary algorithms, data mining, and artificial intelligence. His research publications have received more than 1000 citations. He has been a principal investigator of several research grants from the European Union, the Hellenic Ministry of Education and Religious Affairs, and the Hellenic Ministry of Industry, Energy, and Technology. He is among the founders of the University of Patras Artificial Intelligence Research Center (UPAIRC), established in 1997, where he currently serves as director. He is the founder of the Computational Intelligence Laboratory (CI Lab), established in 2004 at the Department of Mathematics of the University of Patras, where he currently serves as director.

Nikos D. Fakotakis received the B.Sc. degree in Electronics from the University of London (UK) in 1978, the M.Sc. degree in Electronics from the University of Wales (UK), and the Ph.D. degree in Speech Processing from the University of Patras, Greece, in 1986. From 1986 to 1992 he was a lecturer in the Electrical and Computer Engineering Department of the University of Patras, from 1992 to 1999 an Assistant Professor, from 2000 to 2003 an Associate Professor, and since 2004 he has been a Professor in the area of Speech and Natural Language Processing and Head of the Speech and Language Processing Group at the Wire Communications Laboratory. He is the author of over 200 publications in the area of Signal, Speech, and Natural Language Processing. His current research interests include Speech Recognition/Understanding, Speaker Recognition, Speech Modeling, Spoken Dialogue Processing, Natural Language Processing, and Optical Character Recognition. Dr. Fakotakis is a member of the Executive Board of ELSNET (European Language and Speech Network of Excellence) and Editor-in-Chief of the European Student Journal on Language and Speech, WEB-SLS.
