Prototype generation method using a growing self-organizing map applied to the banking sector

Ruiz-Moreno, Sara; Núñez-Reyes, Amparo; García-Cantalapiedra, Adrián; Pavón, Fernando

doi:10.1007/s00521-023-08630-w


Original Article
Open access
Published: 12 May 2023

Volume 35, pages 17579–17597 (2023)

Neural Computing and Applications


Sara Ruiz-Moreno¹ (ORCID: orcid.org/0000-0001-7632-8211), Amparo Núñez-Reyes¹, Adrián García-Cantalapiedra² & Fernando Pavón²


Abstract

In fields like security risk analysis, Fast Moving Consumer Goods, Internet of Things, or the banking sector, it is necessary to deal with large datasets containing a large number of variables. In these situations, the analysis becomes intricate and computationally expensive, so data reduction techniques play an important role. Prototype generation methods provide a reduced dataset that preserves the properties of the original. GSOMs (growing self-organizing maps) reduce the data size without the need to fix in advance the number of neurons needed to represent the input space. To the best of the authors’ knowledge, this is the first time that the GSOM has been applied to the reduction and generation of prototypes, which is an advantage over its predecessor, the SOM (self-organizing map), which lacks the automatic growth feature. This work addresses the use of a GSOM to reduce the number of prototypes used in a 1-NN (1-nearest-neighbor) classifier. The proposed methodology is applied to an income dataset for testing and to a large bank dataset, both of which contain classifications into two different groups. The 1-NN classifier is used to obtain predictions using the nodes of the GSOM as prototypes. By comparing the classifications obtained on the bank dataset, this article shows that GSOMs obtain nearly the same validation results as SOMs in significantly less time. The results show data reductions of more than 99%, and accuracies greater than 80% for the income dataset and 74% for the bank dataset.


1 Introduction

Nowadays, the development of new technologies and computing systems gives humans the ability to collect large amounts of data. The more data we have on a system, the better representation we obtain of it, and this will give us greater capacity to extract information and describe it [1]. The problem arises when, in real-world applications, the amount of data needed is too large. For instance, in banking applications, where there are millions of clients, with many different behaviors and operations, the computational resources needed are unaffordable [2]. In this paper, the results for a bank with more than 2 million customers will be presented.

One of the most interesting problems in the banking sector is the classification of customers to detect unusual behaviors. The k-nearest neighbors (k-NN) technique is widely used for recognition tasks. It relies on a representative dataset formed by prototypes whose classes are known. When new data are fed to the algorithm, each sample is assigned the class of its k nearest neighbors in the prototype space. It is one of the most useful algorithms in data mining, but it has the drawbacks of high storage requirements, low efficiency, and low noise tolerance, especially in the case of 1-NN, since it gives importance to all data [3]. One way to improve the performance of the k-NN classifier is to use data reduction techniques, more specifically, prototype reduction, which can be achieved by either prototype selection or generation.

Prototype selection consists of choosing some representative instances from the dataset to use them as prototypes. The objective is to obtain a training set of smaller size than the original one, but with the same characteristics, achieving similar or higher classification accuracy for new data [3]. On the other hand, prototype generation consists of creating new training data that represent the previous one with the same objective as prototype selection.

The self-organizing map, also called Kohonen map after the first person who described it as a neural network, is an unsupervised method that organizes data and also provides a low-dimensional representation employing competitive learning [4,5,6,7]. This algorithm consists of a non-linear, ordered, and smooth assignment of high-dimensional input data to a low-dimensional matrix (typically 2D) by assigning weights to each node of its topology. Since a SOM organizes the data in groups, it allows for a better understanding of the input space and is very useful in applications like data mining or pattern recognition. A SOM can be understood in two different ways: taking the low-dimensional map as the output gives a dimensionality reduction and can be used as a clustering technique; taking the weights as the output gives a data reduction in which each weight vector forms a new prototype. A drawback of SOMs is that the number of neurons must be specified before the training phase. A low number of nodes leads to general information and a relatively straightforward explanation of the data, whereas more nodes allow for more detailed information. The growing self-organizing map (GSOM) addresses this issue by creating new neurons during the training process so that it can adapt its structure [8]. This, along with a large reduction in training times [9], is a great advantage over the SOM. This work addresses the problem of data reduction for k-NN using a GSOM.

The main contribution of this work is the implementation of a GSOM for prototype generation in large datasets. To the best of the authors’ knowledge, this is a novel approach for prototype generation that has not been previously published. The main objective of the work is to reduce the number of instances in large datasets without compromising the accuracy. Moreover, a real dataset with 350000 clients was used. The GSOM is applied to the Census Income dataset and to a bank dataset to obtain a representative space of the data, which is then used in a 1-NN classification problem. This way, it is easier to distinguish patterns of behavior in clients and make predictions. The advantages of this method are the decrease in the time needed to apply the 1-NN algorithm and the use of a machine learning technique that allows a visual understanding of the problem. To enhance the adaptation of the network, we have added the use of several iterations of the algorithm in each of its phases. The number of iterations is pre-selected.

This document is organized as follows. Section two provides a literature review and discusses some related work. In section three, the proposed methodology is defined and the techniques are described, as well as the evaluation and visualization metrics. The experimental setup is outlined in section four, showing the results obtained for two different datasets. Next, a discussion is provided in section five and, finally, section six draws some conclusions from the work.

2 Literature overview

2.1 k-nearest neighbors

The k-NN classifier is an algorithm that assigns a class to an input vector based on the classes of the k nearest prototypes in the input space. It is a well-known technique used in many different applications. For example, Konieczny and Stojek [10] use a k-NN classifier to classify the wear condition of a pump, Santos Ruiz et al. [11] apply the k-NN algorithm to locate leaks in water distribution networks and Tharwat et al. [12] classify human activities with information from smart devices.
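As a minimal sketch of the classification rule described above, the following Python function assigns a class by majority vote among the k nearest prototypes. The function name `knn_predict`, the toy prototypes, and the use of the Euclidean distance are illustrative assumptions, not taken from the article.

```python
import math
from collections import Counter

def knn_predict(x, prototypes, labels, k=1):
    """Return the majority class among the k prototypes closest to x."""
    # Sort prototype indices by Euclidean distance to the query vector.
    order = sorted(range(len(prototypes)), key=lambda i: math.dist(x, prototypes[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy prototype set (hypothetical values, for illustration only).
prototypes = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
labels = ["a", "b", "a"]

one_nn = knn_predict((0.9, 0.8), prototypes, labels, k=1)
three_nn = knn_predict((0.9, 0.8), prototypes, labels, k=3)
```

With k = 1 the query takes the class of the single closest prototype, which is the setting used later in this work; with larger k the vote can change, as the k = 3 call shows.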

Despite their wide use and ease of application, k-NN classifiers have low efficiency, low noise tolerance, and high data storage requirements. To overcome these problems, there are different approaches [13]. Some of them are based on modifying the parameters and equations of the k-NN, for example, by selecting the optimal value of k [14] or doing it adaptively [15], combining with genetic algorithms in the selection of neighbors [16], designing distance functions [17] or using frameworks prepared for big data applications [18]. Other techniques include reducing the number of prototypes, either by prototype selection or by prototype generation.

2.2 Prototype reduction

There are many approaches to prototype selection. Rosero-Montalvo et al. [19] present an analysis of neighborhood criterion using the condensed neighbor algorithm to eliminate redundant data. The work by Suyal and Singh [20] approaches the problem of prototype selection by using multi-label k-NN. Gurumoorthy et al. [21] propose a framework for prototype selection based on optimal transport and compare it with other methods evaluating it with a 1-NN classifier on several datasets. The study by Kasemtaweechok and Suwannik [22] proposes a technique based on geometric median and compares it with different methods that show high accuracy with low times.

Prototype generation, on the other hand, also has various applications in literature. Triguero et al. [23] classify and compare different types of prototype generation algorithms applied to k-NN classifiers. Ougiaroglou et al. [24] use an algorithm based on reduction through homogeneous clustering for multilabel k-NN classification. Elkano et al. [25] propose the use of a new distributed MapReduce method called CHI-PG for big data classification problems. Few works in the literature address the problem of prototype reduction by using a SOM. For instance, Lechevallier and Ciampi [26] integrate a SOM with other clustering methods and apply it to nutritional data and, with regard to financial applications, Sarlin and Peltonen [27] use a SOM for data and dimensionality reduction to monitor vulnerabilities and map the state of financial stability.

2.3 Self-organizing maps

The SOM and all its variations have great potential in applications where large amounts of data must be analyzed and the analysis becomes extremely expensive computationally. For that reason, they can be used in many preprocessing tasks in areas such as data mining: data and dimensionality reduction, master and multiple curves approximation, clustering, and classification. Self-organizing maps have also been widely used in cybersecurity, starting about ten years after the appearance of the algorithm [28]. Ichimura et al. [29] use a SOM and automatically defined groups to obtain the distribution of spam email and a classification that would serve to adjust email filtering. Sarkar et al. [30] use a k-means algorithm combined with a SOM to extract patterns in accidents at work. Christyawan et al. [31] propose the use of a type of GSOM with a clustering reference vector for an intrusion detection system. Since the 1990s, SOMs have been used in financial applications [32]. Regarding performance analysis, Shanmuganathan [33] states that the SOM is a useful tool to examine the returns and measures implemented in financial sectors. Concerning financial crisis monitoring, López Iturriaga and Pastor Sanz [34] use neural networks based on SOMs to compare macroeconomic imbalances in European countries. In these fields, SOMs allow a visual representation of the data and their relationships, which leads to a better understanding of the triggering factors. Barman and Chowdhury [35] use a SOM and a minimum spanning tree for customer segmentation. In the field of fraud detection, Quah and Sriganesh [36] analyze the behavior of credit card customers in real time to find hidden patterns without the need for previous information, and Balasupramanian et al. [37] propose a fraud detection and prevention method that uses a SOM to detect patterns. Ganegedara and Alahakoon [38] address the use of parallel GSOMs and propose a method to reduce redundant neurons. Studies like the work by Kuo et al. [39] show that GSOMs perform computationally better than SOMs, in this case, combined with bee colony optimization.

3 Techniques applied

The outline of this work is presented in Fig. 1. The first step is to train the neural network with a three-phase process [9] on a training set. After that, different error measures are calculated, as well as the U-matrix, which gives a visual account of the relationships between neurons. Finally, we use the GSOM nodes as prototypes for a 1-NN classifier in order to classify the clients and assess the neural network’s capacity to represent the input space.

3.1 GSOM

This subsection describes the GSOM and the metrics used in this paper to evaluate it. A self-organizing map is a network with a given topology that adapts the weights of its nodes as it receives input data, without losing its topological properties. A neighborhood function governs this adaptation, so that each new input during training produces the largest adaptation in the neurons whose weights are closest to it. A two-class dataset with 700 vectors is used for illustration, yielding a GSOM with 88 nodes.

The growing self-organizing map includes the possibility of automatically modifying the size of the network, which allows the use of the number of neurons necessary for each application without the need for it to be fixed beforehand.

The GSOM training process is as follows [9]. Initially, there is a small net, generally composed of four nodes or neurons, each with a weight vector of the same dimension as the input data, so that it lies in the input space. In SOMs and GSOMs, as new data arrive, the neuron with the greatest activation (generally, the one with the lowest Euclidean distance between its weights and the input) is elected as the winner, or best matching unit (BMU), as shown in Fig. 2. Its weight vector is then modified to resemble the input, with a learning rate that decreases over time. The decay of the learning rate can be linear, power-law, or inverse-time. Similarly, the nodes nearest to the BMU also adapt, weighted by a space-time neighborhood function, most commonly a Gaussian. The adaptation is usually expressed as:

$$\begin{aligned} w_j(k+1) = w_j(k) + {LR(k)} \epsilon (k)\left( v(k) - w_j(k) \right) \end{aligned}$$

(1)

where k denotes the current learning epoch, j is the node, LR(k) is the learning rate, $\epsilon (k)$ is the neighborhood, and v(k) represents each element of the input data. This learning procedure leads to an ordered topological mapping of the represented input data.
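The update in Eq. 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `update_weights` and the sample values are hypothetical, and the neighborhood values are passed in as a precomputed list with one entry per node.

```python
def update_weights(weights, v, lr, neighborhood):
    """Apply Eq. 1 to every node: w_j <- w_j + lr * eps_j * (v - w_j).

    weights      -- list of weight vectors, one per node
    v            -- current input vector
    lr           -- learning rate LR(k)
    neighborhood -- list of neighborhood values eps_j, one per node
    """
    return [
        [w + lr * eps * (x - w) for w, x in zip(w_j, v)]
        for w_j, eps in zip(weights, neighborhood)
    ]

# Two nodes in 2-D; the first one is treated as the winner (eps = 1).
weights = [[0.0, 0.0], [1.0, 1.0]]
updated = update_weights(weights, v=[1.0, 0.0], lr=0.5, neighborhood=[1.0, 0.5])
```

Each node moves toward the input by a fraction LR(k)·ε(k) of its distance to it, so nodes with a larger neighborhood value adapt more strongly.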

The learning rate LR is calculated from the following equation, where $\alpha$ and R are parameters that must be pre-selected and n(k) is the number of nodes.

$$\begin{aligned} LR(k) = LR (k-1) \cdot \alpha \cdot \left( 1 - \frac{R}{n(k)}\right) \end{aligned}$$

(2)
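A small sketch of the recursion in Eq. 2 follows. The value R = 3.8 is the one reported later in the text, whereas α = 0.9 and the previous learning rate of 0.5 are arbitrary values chosen here only for illustration.

```python
def next_learning_rate(lr_prev, alpha, r, n_nodes):
    """Eq. 2: LR(k) = LR(k-1) * alpha * (1 - R / n(k))."""
    return lr_prev * alpha * (1.0 - r / n_nodes)

# With the initial four-node map the factor (1 - R/n) is small ...
lr_small_map = next_learning_rate(0.5, alpha=0.9, r=3.8, n_nodes=4)
# ... and it approaches 1 as the map grows (88 nodes here).
lr_large_map = next_learning_rate(0.5, alpha=0.9, r=3.8, n_nodes=88)
```

Under these assumed values, the factor (1 − R/n(k)) shrinks the learning rate sharply while the map has few nodes and has little effect once the map is large.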

In a GSOM [40], the accumulated error of each node must be tracked so that, when a given threshold is reached, the number of nodes is expanded. The growth threshold GT is controlled by the spread factor SF as given by the following equation, where D is the dimension of the data.

$$\begin{aligned} GT = - D \cdot \ln (SF) \end{aligned}$$

(3)
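As a quick numerical illustration of Eq. 3, the sketch below evaluates the growth threshold for two- and three-dimensional data with SF = 0.5, the spread factor used in the illustrative examples of this work. The function name is a hypothetical choice.

```python
import math

def growth_threshold(dim, spread_factor):
    """Eq. 3: GT = -D * ln(SF), with 0 < SF < 1."""
    return -dim * math.log(spread_factor)

gt_2d = growth_threshold(2, 0.5)   # about 1.386
gt_3d = growth_threshold(3, 0.5)   # about 2.079
```

Since ln(SF) is negative for 0 < SF < 1, GT is positive, and a spread factor closer to 1 gives a lower threshold and therefore more growth.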

As the network receives new data, the maximum error of the nodes is calculated. If it reaches GT, new nodes will be added around the winner. When a node not belonging to the frontier is selected for growing, the error of its neighbors increases by the distribution factor FD, as in Eq. 5, with the computation of the BMU given by Eq. 4.

$$\begin{aligned} BMU(k) = \arg \min _j \Vert v(k)-w_j(k) \Vert , \quad j = 1,\ldots ,n(k) \end{aligned}$$

(4)

$$\begin{aligned} E = \begin{cases} GT/2 & \text{if node is BMU} \\ E(1+FD) & \text{if node is neighbor of BMU} \end{cases} \end{aligned}$$

(5)
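The BMU search of Eq. 4 and the error redistribution of Eq. 5 can be sketched as follows. The function names, the list-based data layout, and the sample values are assumptions made for illustration; the sketch covers only the redistribution step, not the insertion of new nodes.

```python
import math

def find_bmu(weights, v):
    """Eq. 4: index of the node whose weights are closest to the input v."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], v))

def distribute_error(errors, bmu, neighbors, gt, fd):
    """Eq. 5: reset the BMU error to GT/2 and scale its neighbors' errors by (1 + FD)."""
    new_errors = list(errors)
    new_errors[bmu] = gt / 2.0
    for j in neighbors:
        new_errors[j] = errors[j] * (1.0 + fd)
    return new_errors

weights = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bmu = find_bmu(weights, [0.9, 0.1])
# Hypothetical accumulated errors; node 0 is taken as the only neighbor of the BMU.
errors = distribute_error([1.0, 1.5, 0.4], bmu, neighbors=[0], gt=1.4, fd=0.1)
```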

The neighborhood function is given by Eq. 6, where $d_t$ is the topological distance between each neuron j and the winner BMU, and $\sigma$ controls the selection of neighbors and decreases each epoch by a given value $\Delta \sigma$.

$$\begin{aligned} \epsilon (k) = e^{-\frac{d_t(j,BMU(k))}{2\sigma ^2}} \end{aligned}$$

(6)
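A minimal sketch of the neighborhood function follows, reading Eq. 6 as an exponential decay in the topological distance with width σ. The function name is hypothetical, and the exponent is taken exactly as printed (distance not squared), which is an assumption about the intended form.

```python
import math

def neighborhood(topological_distance, sigma):
    """Eq. 6 as read here: exp(-d_t / (2 * sigma^2))."""
    return math.exp(-topological_distance / (2.0 * sigma ** 2))

at_winner = neighborhood(0, sigma=1.0)     # the BMU itself
one_step = neighborhood(1, sigma=1.0)      # a direct neighbor
one_step_narrow = neighborhood(1, sigma=0.5)
```

The value is 1 at the winner and decays with distance; decreasing σ narrows the neighborhood, so fewer nodes are adapted appreciably in later epochs.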

For finer control over the neighborhood, we added the possibility of setting the evolution of $\sigma$ through a number of iterations. The decrease $\Delta \sigma$ is generally selected directly, but this is not intuitive when it comes to parameter tuning. This work instead fixes it through the number of iterations in each phase of the algorithm, so that the decrement equals the initial value of $\sigma$ minus its minimum value, divided by the total number of iterations N.

$$\begin{aligned} \Delta \sigma = \frac{\sigma _{ini}-\sigma _{min}}{N} \end{aligned}$$

(7)
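The schedule implied by Eq. 7 can be illustrated with a short sketch. The initial and minimum values of 1 and 0.1 are the ones reported later in the text; N = 3 is an arbitrary choice to keep the example small, and the function name is hypothetical.

```python
def sigma_schedule(sigma_ini, sigma_min, n_iterations):
    """Return sigma for iterations 0..N, decreasing by Eq. 7 each step."""
    delta = (sigma_ini - sigma_min) / n_iterations
    return [sigma_ini - k * delta for k in range(n_iterations + 1)]

schedule = sigma_schedule(1.0, 0.1, 3)   # approximately [1.0, 0.7, 0.4, 0.1]
```

Choosing N instead of Δσ guarantees that σ reaches its minimum value exactly at the end of the phase.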

This process is accomplished in three phases: initialization, growing, and smoothing. The last one consists of refining the weights of the nodes once the net has reached its final size. This is detailed in Algorithm 1 [9].

Several tests with different network topologies (square and hexagonal) have been conducted. In square networks, a neuron can have up to four neighbors, while in hexagonal topologies it can have up to six. Thus, when new nodes are added to the net, the growth is more intense in hexagonal topologies.
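The difference between the two topologies can be made concrete with a small sketch that lists the free grid positions around a node, which is where new nodes could be placed. The axial coordinate convention for the hexagonal grid and all names below are assumptions made for illustration, not the article's implementation.

```python
# Neighbor offsets: four in a square grid, six in a hexagonal grid (axial coordinates).
SQUARE = [(1, 0), (-1, 0), (0, 1), (0, -1)]
HEX_AXIAL = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def free_neighbor_positions(node, occupied, offsets):
    """Return the grid positions adjacent to `node` that hold no neuron yet."""
    q, r = node
    return [(q + dq, r + dr) for dq, dr in offsets if (q + dq, r + dr) not in occupied]

occupied = {(0, 0), (1, 0)}
free_square = free_neighbor_positions((0, 0), occupied, SQUARE)
free_hex = free_neighbor_positions((0, 0), occupied, HEX_AXIAL)
```

For the same boundary node, the hexagonal grid offers more free positions than the square grid, which is why growth is more intense in that case.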

3.1.1 U-matrix

The U-matrix (Unified Distance Matrix) is a two-dimensional representation of the neurons that visualizes the distances between them in the input space. For each neuron, the mean distance between its weight vector and those of its neighbors is calculated, and a color is associated with it. Nodes whose colors correspond to lower values have weights more similar to those of their neighbors [41].

Self-organizing maps allow a two-dimensional representation of the data space that can be visualized through the U-matrix. This work represents the GSOM with a U-matrix to make it easier to understand and to support the subsequent analysis.
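The U-matrix values described above can be computed with a short sketch for a square topology. Storing the map as a dictionary from grid position to weight vector is an assumption that suits the irregular shape of a grown map; the function name and sample weights are hypothetical.

```python
import math

def u_matrix(weights):
    """Mean distance from each neuron's weights to those of its grid neighbors.

    weights -- dict mapping grid position (row, col) to a weight vector
    """
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    u = {}
    for (r, c), w in weights.items():
        dists = [
            math.dist(w, weights[(r + dr, c + dc)])
            for dr, dc in offsets
            if (r + dr, c + dc) in weights
        ]
        # An isolated neuron has no neighbors; 0.0 is used as a placeholder.
        u[(r, c)] = sum(dists) / len(dists) if dists else 0.0
    return u

grid = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 0.0], (1, 0): [0.0, 3.0]}
u = u_matrix(grid)
```

Mapping each value to a color then gives the picture described in the text: low values mark neurons that are similar to their neighbors, and high values mark boundaries.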

3.1.2 Evaluation

Three measures have been employed to compare the performance of the various networks obtained: standard deviation, quantization error, and simplicity. The first one (stdNodes, in Eq. 8) is the mean, over the nodes, of the standard deviation of the distances between the input data and their associated BMUs. The quantization error or heterogeneity (quantE, in Eq. 9) measures how well the neurons fit the training vectors, using the mean distance from each vector to its winning node [42]. Simplicity [43], in Eq. 10, is the accumulated neighborhood distance. It measures the network expansion as the sum of the distances between the BMUs and their neighbors. The percentage of prototype reduction is also computed.

$$\begin{aligned} stdNodes = \frac{1}{n(N)} \sum _{j=1}^{n(N)} { s_{X(j)}(||w_j(N) - v(N) ||)} \end{aligned}$$

(8)

where s is the standard deviation and $X(j) = \lbrace x: j = BMU(x) \rbrace$ is the set of inputs whose BMU is node j

$$\begin{aligned} quantE= & {} \frac{1}{N_p} \sum _{i=1}^{N_p}{||v(i) - w_{BMU(v(i))}||} \end{aligned}$$

(9)

$$\begin{aligned} simplicity= & {} \sum _{i: X(i)\ne \emptyset }{\sum _{j: j\; neighbor\; of\; i}{||w_i-w_j||} } \end{aligned}$$

(10)
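The three measures in Eqs. 8 to 10 can be sketched together in Python. Two choices below are assumptions not stated in the text: the population standard deviation is used, and nodes with no assigned inputs contribute zero to stdNodes. All names and sample values are illustrative.

```python
import math
from statistics import pstdev

def evaluate(weights, neighbors, data):
    """Return (stdNodes, quantE, simplicity) for a trained map.

    weights   -- list of weight vectors, one per node
    neighbors -- list of neighbor-index lists, one per node
    data      -- list of input vectors
    """
    hits = [[] for _ in weights]  # distances of the inputs, grouped by their BMU
    for v in data:
        dists = [math.dist(v, w) for w in weights]
        bmu = dists.index(min(dists))
        hits[bmu].append(dists[bmu])
    quant_e = sum(sum(h) for h in hits) / len(data)                    # Eq. 9
    std_nodes = sum(pstdev(h) if h else 0.0 for h in hits) / len(weights)  # Eq. 8
    simplicity = sum(                                                  # Eq. 10
        math.dist(weights[i], weights[j])
        for i, h in enumerate(hits) if h   # only nodes that won at least one input
        for j in neighbors[i]
    )
    return std_nodes, quant_e, simplicity

weights = [[0.0, 0.0], [4.0, 0.0], [0.0, 10.0]]
neighbors = [[1, 2], [0], [0]]
data = [[1.0, 0.0], [0.0, 0.0], [4.0, 1.0]]
std_nodes, quant_e, simplicity = evaluate(weights, neighbors, data)
```

In this toy case the third node wins no input, so it is excluded from the simplicity sum, matching the condition X(i) ≠ ∅ in Eq. 10.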

3.1.3 Illustrative examples

The GSOM alone can be used for classification and for generating new neurons that preserve the initial topology. To validate the proposed methodology and visually check the adaptation of the net, two preliminary experiments are presented (a ring and a letter G). The GSOM features shown in these examples will be exploited as a prototype generation method for a k-NN classifier in the results section.

The use of the U-matrix for classification is exemplified in Figs. 3 and 4. They show, respectively, the weight vectors and the U-matrix of a GSOM applied to a 700-vector dataset with two features (x, y) in which two classes (z-axis) are distinguished. Neurons in blue tones have small distances to neighboring nodes, and reddish tones indicate large distances. The U-matrix allows one to visually distinguish both classes, since the orange neurons form a frontier between them.

The second example was conducted on a set of three-dimensional data distributed in space and forming a figure. The dataset is composed of randomly scattered points that form a letter G. It has been divided into three subsets (s1, s2 and s3) that contain 1000, 10000 and 20000 points, respectively. The variables are x, y and z. This example shows the advantages of GSOMs, since they can generate the number of nodes necessary to adapt to datasets of different sizes without the need to know it beforehand. A set with more data does not necessarily require a larger map.

Table 1 shows the parameters used for training the different GSOMs. Id is the GSOM identifier (which starts with the subset identifier, where x stands for 1, 2 or 3), type is the shape of the net (GRID means square network and HEX means hexagonal network), $LR_{ini}^g$ is the initial learning rate in the growth phase and $LR_{ini}^s$ is the initial learning rate in the smoothing phase. In all cases, we used an initial network of four nodes, a spread factor $SF=0.5$, a distribution factor $FD=0.1$, a value of $R=3.8$ [9], and initial and final $\sigma$ values of 1 and 0.1, respectively. Some of the GSOMs have been trained by setting $\Delta \sigma$ directly and others by fixing the number of iterations in each phase.

Table 1 Training parameters for each letter G GSOM, where sx indicates subsets 1, 2 or 3
