Credit risk modeling using Bayesian network with a latent variable
Introduction
Credit risk strongly affects the banking system and can contribute to economic stagnation worldwide (Nkusu, 2011). The 2007 sub-prime mortgage crisis, known as the credit crisis, had a significant effect on the economy, as it was a main trigger of the global financial crisis of 2008 (Longstaff, 2010). To control credit risk, banks use both qualitative and quantitative methods to minimize households’ payment defaults. To this end, many credit scoring procedures have been adopted to evaluate and analyze credit risk. According to Thomas, Edelman, and Crook (2002), credit scoring is a set of decision models and techniques that allow lenders to select their customers appropriately. In this context, many methodologies have been developed (García, Marqués, & Sánchez, 2015), such as statistical methods (Hand & Henley, 1997) and artificial intelligence methods (Lessmann, Baesens, Seow, & Thomas, 2015; Louzada, Ara, & Fernandes, 2016). The statistical methods include linear discriminant analysis (Altman, 1968) and logistic regression (Abid, Masmoudi, & Zouari-Ghorbel, 2016), which are popular credit scoring techniques thanks to their accuracy and ease of implementation (Lessmann et al., 2015). Artificial intelligence techniques include support vector machines (Harris, 2015; Tomczak & Zieba, 2015), artificial neural networks (Zhao et al., 2015), decision trees (Bijak & Thomas, 2012) and Bayesian networks (Pearl, 1988).
A Bayesian network (BN) is a graphical representation of a probabilistic model that encodes a set of conditional independence relationships (Ghribi & Masmoudi, 2013; Pearl, 1988). It has become a popular tool for decision-making systems in various fields such as biology (Hassen, Masmoudi, & Rebai, 2008), computer science (Bouchaala, Masmoudi, Gargouri, & Rebai, 2010) and finance (Abid, Zaghdene, Masmoudi, & Ghorbel, 2017). Indeed, BNs are among the most comprehensive and consistent formalisms for the acquisition and modeling of complex systems, and they have been reported to outperform logistic regression in diagnostic prediction (Gevaert et al., 2006).
Abid et al. (2017) used a discrete BN model, based on creditworthiness, for the prediction and classification of personal loans. They set up the conditional relationships between the factors affecting credit risk and used the calibrated conditional probability tables to analyze the causes and effects of payment default.
In this paper, we introduced a new discrete BN model containing a latent variable that affects all the observable variables. While the BN structure models the probabilistic relationships between the factors leading to payment default, the latent variable makes it possible to represent different classes of probability distributions. A full procedure for learning this model was proposed, relying on a customized Expectation Maximization (EM) algorithm (Dempster, Laird, & Rubin, 1977). The proposed model was used to evaluate credit risk and to cluster loan subscribers, enabling a deeper analysis of customers’ payment defaults.
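To make the role of EM with a latent class concrete, the following is a minimal sketch of the standard EM algorithm (Dempster, Laird, & Rubin, 1977) for a simplified latent class model. It is not the customized procedure proposed in the paper: it assumes that the observed variables are conditionally independent given the latent class, whereas the proposed BN also contains edges between observed variables. The function name `em_latent_class` and all of its parameters are hypothetical.

```python
import random

def em_latent_class(data, n_classes, n_values, n_iter=50, seed=0):
    """Standard EM for a latent class model (simplifying assumption: observed
    variables are conditionally independent given the latent class C).

    data: list of tuples of ints; n_values[j] is the number of states of variable j.
    Returns (prior, theta, resp), where theta[c][j][v] estimates P(X_j = v | C = c)
    and resp holds the class posteriors computed in the last E-step.
    """
    rng = random.Random(seed)
    d = len(n_values)
    prior = [1.0 / n_classes] * n_classes
    # Random positive initialisation, normalised per (class, variable).
    theta = []
    for _ in range(n_classes):
        tables = []
        for j in range(d):
            w = [rng.random() + 0.5 for _ in range(n_values[j])]
            s = sum(w)
            tables.append([x / s for x in w])
        theta.append(tables)

    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each record.
        resp = []
        for row in data:
            w = []
            for c in range(n_classes):
                p = prior[c]
                for j, v in enumerate(row):
                    p *= theta[c][j][v]
                w.append(p)
            s = sum(w)
            resp.append([x / s for x in w])
        # M-step: re-estimate parameters from expected counts (lightly smoothed).
        for c in range(n_classes):
            nc = sum(r[c] for r in resp)
            prior[c] = nc / len(data)
            for j in range(d):
                counts = [1e-6] * n_values[j]
                for row, r in zip(data, resp):
                    counts[row[j]] += r[c]
                total = sum(counts)
                theta[c][j] = [x / total for x in counts]
    return prior, theta, resp
```

Each row of `resp` gives a soft assignment of one record to the latent classes, which is the quantity a clustering of loan subscribers would be read from under these assumptions.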
The remainder of this paper is structured as follows: Section 2 reviews previous studies related to the topic. Section 3 describes the discrete BN with a latent variable and the proposed procedure for learning this class of BNs using a customized EM algorithm. The proposed method is applied to loan classification and credit risk evaluation in Section 4. Finally, our main conclusions are drawn in the final section.
Section snippets
Related work
Credit risk and bankruptcy prediction have been studied extensively in recent years. Various models and techniques have been employed in these studies for risk evaluation and debtor classification. For instance, Danenas and Garsva (2015) introduced a new approach based on linear SVM combined with external evaluation and sliding window testing. Their method addresses the class imbalance issue and is suitable for large data sets. They showed that their method provides equivalent
Discrete Bayesian network with a latent variable
A BN consists of a directed acyclic graph (DAG) and a set of associated conditional probability distributions. The DAG reflects a set of conditional independence relationships between a set of variables (nodes). A finite discrete BN is a BN whose nodes are discrete random variables taking a finite number of values. In this paper, the following notation was used:
- d denotes the number of variables in the BN.
- Each node Xi takes ri possible values encoded as .
- The parents
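As an illustration of the definition above, the sketch below encodes a toy discrete BN as a parent map plus conditional probability tables and evaluates the joint probability as the product of each node's table entry given its parents. The network (a latent class C pointing to Job and Default, with Job also pointing to Default), the variable names and all probabilities are hypothetical and are not taken from the paper.

```python
from itertools import product

# Hypothetical toy network: C -> Job, C -> Default, Job -> Default.
# C plays the role of a latent class that is a parent of every observed node.
parents = {"C": (), "Job": ("C",), "Default": ("C", "Job")}

# Conditional probability tables: cpt[node][parent_values][value].
# All numbers are illustrative only.
cpt = {
    "C": {(): {0: 0.6, 1: 0.4}},
    "Job": {(0,): {0: 0.7, 1: 0.3}, (1,): {0: 0.2, 1: 0.8}},
    "Default": {
        (0, 0): {0: 0.95, 1: 0.05},
        (0, 1): {0: 0.90, 1: 0.10},
        (1, 0): {0: 0.70, 1: 0.30},
        (1, 1): {0: 0.50, 1: 0.50},
    },
}

def joint_probability(assignment):
    """P(assignment) as the product of each node's CPT entry given its parents."""
    p = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[q] for q in pa)
        p *= cpt[node][pa_values][assignment[node]]
    return p

def marginal_observed(job, default):
    """Sum the latent class out: P(Job, Default) = sum over c of P(c, Job, Default)."""
    return sum(
        joint_probability({"C": c, "Job": job, "Default": default}) for c in (0, 1)
    )

def total_mass():
    """Sanity check: the joint distribution sums to one over all assignments."""
    return sum(
        joint_probability({"C": c, "Job": j, "Default": d})
        for c, j, d in product((0, 1), repeat=3)
    )
```

Because C is never observed, only `marginal_observed` can be compared with data; this is why the parameters of such a model are typically estimated with an iterative scheme such as EM rather than by simple counting.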
Application: loans subscriber classification
In this section, the proposed BN, described in Section 3, was used to model credit risk and to cluster loan subscribers. This model evaluates the credit default probability taking into account several explanatory variables and a classifying latent variable C. The resulting model yields a default probability distribution with several regimes or classes. These regimes correspond, for example, to different market conditions (depending on the economic and political environment). They can also
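One way to read the "several regimes" statement is as a mixture: the default probability for a client profile x is the average of the regime-specific default probabilities, weighted by the posterior probability of each class C given x. The sketch below shows this computation under that reading. The function names and all numbers are hypothetical, and the sketch takes the class-conditional quantities as given rather than deriving them from the calibrated BN.

```python
def class_posterior(prior, likelihood):
    """Bayes' rule: P(C = c | x) is proportional to P(C = c) * P(x | C = c)."""
    weights = [p * l for p, l in zip(prior, likelihood)]
    total = sum(weights)
    return [w / total for w in weights]

def default_probability(prior, likelihood, default_given_class):
    """Mixture over regimes: P(default | x) = sum over c of P(c | x) * P(default | x, c)."""
    post = class_posterior(prior, likelihood)
    return sum(q * pd for q, pd in zip(post, default_given_class))

# Hypothetical two-regime example (illustrative numbers only).
prior = [0.6, 0.4]                  # P(C = c)
likelihood = [0.2, 0.6]             # P(x | C = c) for one client profile x
default_given_class = [0.05, 0.30]  # P(default | x, C = c)

posterior = class_posterior(prior, likelihood)
p_default = default_probability(prior, likelihood, default_given_class)
```

In this example the profile x is twice as likely to belong to the second regime as to the first, so the mixture default probability is pulled toward the higher-risk regime.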
Discussion and conclusion
In this paper, we described the BN with a latent variable and proposed a procedure for its calibration. This model was used to evaluate the payment default probability of loan subscribers. The calibrated model takes into account several risk factors, including client attributes (age, job, ...) and contract characteristics (amount, duration, ...). It also describes the relationships between these factors through their probabilistic conditional dependencies. Finally, the loan contracts can be
References (37)
- et al. Innovative default prediction approach. Expert Systems with Applications (2015)
- et al. Does segmentation always improve model performance in credit scoring? Expert Systems with Applications (2012)
- et al. Improving algorithms for structure learning in Bayesian networks using a new implicit score. Expert Systems with Applications (2010)
- et al. Financial distress prediction using the hybrid associative memory with translation. Applied Soft Computing (2016)
- et al. Selection of support vector machines based classifiers for credit risk domain. Expert Systems with Applications (2015)
- et al. Predicting corporate bankruptcy using a self-organizing map: An empirical study to improve the forecasting horizon of a financial failure model. Decision Support Systems (2011)
- et al. Prediction of financial distress: An empirical study of listed Chinese companies using data mining. European Journal of Operational Research (2015)
- et al. A compound Poisson model for learning discrete Bayesian networks. Acta Mathematica Scientia (2013)
- Credit scoring using the clustered support vector machine. Expert Systems with Applications (2015)
- et al. Ranking evaluation of institutions based on a Bayesian network having a latent variable. Knowledge-Based Systems (2013)
- Using hidden nodes in Bayesian networks. Artificial Intelligence
- Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research
- The subprime credit crisis and contagion in financial markets. Journal of Financial Economics
- A deep learning approach for credit scoring using credit default swaps. Engineering Applications of Artificial Intelligence
- Financial distress prediction: The case of French small and medium-sized firms. International Review of Financial Analysis
- A new dynamic modeling framework for credit risk assessment. Expert Systems with Applications
- An artificial neural network and Bayesian network model for liquidity risk assessment in banking. Neurocomputing
- Classification restricted Boltzmann machine for comprehensible credit scoring model. Expert Systems with Applications