Elsevier

Expert Systems with Applications

Volume 127, 1 August 2019, Pages 157-166

Credit risk modeling using Bayesian network with a latent variable

https://doi.org/10.1016/j.eswa.2019.03.014

Highlights

  • A discrete Bayesian network with a latent variable is introduced.

  • A full procedure for parameter and structure learning is provided.

  • Credit risk is modeled using the proposed Bayesian network.

Abstract

Credit risk assessment is an important task for the implementation of bank policies and commercial strategies. In this paper, we use a discrete Bayesian network with a latent variable to model the payment default of loan subscribers. The proposed Bayesian network includes a built-in clustering feature. A full procedure for learning its parameters, based on a customized Expectation-Maximization algorithm, is provided. The model evaluates the payment default probability while taking several factors into account and handling a multi-class situation. Relying on a real data set describing loan contracts, we calibrated the model and performed several analyses. The results highlight a regime switching of the default probability distribution: two classes were identified, showing a change in credit risk profiles.

Introduction

The banking system is critically exposed to credit risk, which may lead to economic stagnation worldwide (Nkusu, 2011). Known as the credit crisis, the 2007 subprime mortgage crisis had a significant effect on the economy, as it largely triggered the global financial crisis of 2008 (Longstaff, 2010). To control credit risk, banks have used both qualitative and quantitative methods in order to minimize households’ payment defaults. To this end, many credit scoring procedures have been adopted to evaluate and analyze credit risk. According to Thomas, Edelman, and Crook (2002), credit scoring is a set of decision models and techniques that allow lenders to select their customers appropriately. In this context, many methodologies have been developed (García, Marqués, & Sánchez, 2015), such as statistical methods (Hand & Henley, 1997) and artificial intelligence methods (Lessmann, Baesens, Seow, & Thomas, 2015; Louzada, Ara, & Fernandes, 2016). The statistical methods include linear discriminant analysis (Altman, 1968) and logistic regression (Abid, Masmoudi, & Zouari-Ghorbel, 2016), which are popular credit scoring techniques thanks to their accuracy and ease of implementation (Lessmann et al., 2015). Artificial intelligence techniques include support vector machines (Harris, 2015; Tomczak & Zieba, 2015), artificial neural networks (Zhao et al., 2015), decision trees (Bijak & Thomas, 2012) and Bayesian networks (Pearl, 1988).

A Bayesian Network (BN) is a graphical representation of a probabilistic model that encodes a set of conditional independence relationships (Ghribi & Masmoudi, 2013; Pearl). It has become a popular tool for decision-making systems in various fields such as biology (Hassen, Masmoudi, & Rebai, 2008), computer science (Bouchaala, Masmoudi, Gargouri, & Rebai, 2010) and finance (Abid, Zaghdene, Masmoudi, & Ghorbel, 2017). Indeed, BNs are among the most comprehensive and consistent formalisms for the acquisition and modeling of complex systems, and they have been reported to outperform logistic regression in terms of diagnostic prediction (Gevaert et al., 2006).
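As a brief reminder in standard notation, the conditional independence relationships encoded by a BN over variables $X_1, \ldots, X_d$ correspond to a factorization of the joint distribution into one local term per node. The symbol $\mathrm{Pa}(X_i)$, denoting the set of parents of node $X_i$ in the graph, is an assumed notation used here for illustration only:

```latex
P(X_1, \ldots, X_d) \;=\; \prod_{i=1}^{d} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

For a discrete BN, each factor on the right-hand side is a conditional probability table.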

Based on creditworthiness, Abid et al. (2017) used a discrete BN model for personal loan prediction and classification. They set up the conditional relationships between the factors affecting credit risk and used the calibrated conditional probability tables to analyze the causes and effects of payment default.

In this paper, we introduce a new discrete BN model containing a latent variable that affects all the other observable variables. While the BN structure models the probabilistic relationships between the factors leading to payment default, the latent variable makes it possible to represent different classes of probability distributions. A full procedure for learning this model is proposed, relying on a customized Expectation-Maximization (EM) algorithm (Dempster, Laird, & Rubin, 1977). The proposed model is used to evaluate credit risk and to cluster loan subscribers, enabling a deeper analysis of customers’ payment defaults.
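The role of the latent variable can be illustrated with a minimal sketch of EM for a latent class model. This is not the customized EM procedure of the paper: it assumes a simplified structure in which the hidden class C is the only parent of every observed variable (no edges among observed variables). The function name `em_latent_class`, the 0-based value encoding, the smoothing constant, and the toy data are all hypothetical choices made for illustration.

```python
import math
import random


def em_latent_class(data, n_classes, n_values, n_iter=50, seed=0):
    """Plain EM for a latent class model (assumed structure: C -> each X_i).

    data: list of rows; row[i] is an int in {0, ..., n_values[i] - 1}
          (0-based encoding, an assumption of this sketch).
    Returns (prior, cpts, resp) with
      prior[c]      = P(C = c)
      cpts[i][c][v] = P(X_i = v | C = c)
      resp[n][c]    = P(C = c | row n)
    """
    rng = random.Random(seed)
    d = len(n_values)
    prior = [1.0 / n_classes] * n_classes
    # Random, strictly positive initial conditional probability tables.
    cpts = []
    for i in range(d):
        tables = []
        for _ in range(n_classes):
            w = [rng.random() + 0.5 for _ in range(n_values[i])]
            total = sum(w)
            tables.append([x / total for x in w])
        cpts.append(tables)

    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each row.
        resp = []
        for row in data:
            joint = [
                prior[c] * math.prod(cpts[i][c][row[i]] for i in range(d))
                for c in range(n_classes)
            ]
            z = sum(joint)
            resp.append([j / z for j in joint])
        # M-step: re-estimate parameters from the expected counts.
        n = len(data)
        prior = [sum(r[c] for r in resp) / n for c in range(n_classes)]
        for i in range(d):
            for c in range(n_classes):
                counts = [1e-6] * n_values[i]  # tiny smoothing avoids zeros
                for row, r in zip(data, resp):
                    counts[row[i]] += r[c]
                total = sum(counts)
                cpts[i][c] = [x / total for x in counts]
    return prior, cpts, resp


# Hypothetical toy data: three binary variables, two rough groups of rows.
toy = [[0, 0, 0]] * 20 + [[1, 1, 1]] * 20 + [[0, 0, 1]] * 3 + [[1, 1, 0]] * 3
prior, cpts, resp = em_latent_class(toy, n_classes=2, n_values=[2, 2, 2])
```

Each row of `resp` gives the class membership probabilities of one record, which is the sense in which a latent class variable provides a built-in clustering. Handling a general DAG among the observed variables would require conditioning each table on the other parents as well, which this sketch deliberately leaves out.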

The remainder of this paper is structured as follows. Section 2 reviews previous studies related to the topic. Section 3 describes the discrete BN with a latent variable and the proposed procedure for learning this class of BNs using a customized EM algorithm. The proposed method is applied to loan classification and credit risk evaluation in Section 4. Finally, our main conclusions are drawn in the final section.

Section snippets

Related work

Credit risk and bankruptcy prediction have been extensively studied in recent years. Various models and techniques have been employed in these studies in the context of risk evaluation and debtor classification. For instance, Danenas and Garsva (2015) introduced a new approach based on linear SVM combined with external evaluation and sliding window testing. Their method addresses the class imbalance issue and is suitable for large data sets. They showed that their method provides equivalent

Discrete Bayesian network with a latent variable

A BN consists of a directed acyclic graph (DAG) and a set of associated conditional probability distributions. The DAG reflects a set of conditional independence relationships between a set of variables (nodes). A finite discrete BN $B = (G, P)$ is a BN whose nodes are discrete random variables $(X_1, \ldots, X_d)$ taking a finite number of values. In this paper, the following notation is used:

  • $d$ denotes the number of variables in the BN.

  • Each node $X_i$ takes $r_i$ possible values encoded as $1, 2, \ldots, r_i$.

  • The parents

Application: loans subscriber classification

In this section, the proposed BN, described in Section 3, is used to model credit risk and to cluster the loan subscribers. This model evaluates the credit default probability, taking into account several explanatory variables and a classifying latent variable C. The resulting model presents a default probability distribution with several regimes or classes. These regimes correspond, for example, to different market conditions (depending on the economic and political environment). They can also

Discussion and conclusion

In this paper, we described the BN with a latent variable and proposed a procedure for its calibration. This model was used to evaluate the payment default probability of loan subscribers. The calibrated model takes into account several risk factors, including client attributes (age, job, ...) and contract characteristics (amount, duration, ...). It also describes the relationships between these factors, relying on their probabilistic conditional dependencies. Finally, the loan contracts can be

References (37)

  • C.-K. Kwoh et al.

    Using hidden nodes in Bayesian networks

    Artificial Intelligence

    (1996)
  • S. Lessmann et al.

    Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research

    European Journal of Operational Research

    (2015)
  • F.A. Longstaff

    The subprime credit crisis and contagion in financial markets

    Journal of Financial Economics

    (2010)
  • C. Luo et al.

    A deep learning approach for credit scoring using credit default swaps

    Engineering Applications of Artificial Intelligence

    (2017)
  • N. Mselmi et al.

    Financial distress prediction: The case of French small and medium-sized firms

    International Review of Financial Analysis

    (2017)
  • M.R. Sousa et al.

    A new dynamic modeling framework for credit risk assessment

    Expert Systems with Applications

    (2016)
  • M. Tavana et al.

    An artificial neural network and Bayesian network model for liquidity risk assessment in banking

    Neurocomputing

    (2018)
  • J.M. Tomczak et al.

    Classification restricted Boltzmann machine for comprehensible credit scoring model

    Expert Systems with Applications

    (2015)
