doi:10.1016/j.csda.2007.02.011
Copyright © 2007 Elsevier B.V. All rights reserved.
Using differential evolution to improve the accuracy of bank rating systems
a EVALife group, Department of Computer Science, University of Aarhus, Aabogade 34, 8200 Aarhus N, Denmark
b CEFIN Centro Studi Banca e Finanza, Department of Political Economics, University of Modena and Reggio E., viale Berengario 51, 41100 Modena, Italy
c IEMIF Istituto di Economia dei Mercati e degli Intermediari Finanziari, Bocconi University, Viale Isonzo 25, 20135 Milan, Italy
Available online 24 February 2007.
Abstract
Credit rating is the evaluation of the likelihood that an obligor will default on a loan. Each obligor in the bank's credit portfolio is assigned to a certain rating class, or PD (probability of default) bucket; all obligors in a PD bucket then receive the same “pooled” PD, based on which a capital charge against credit risk must be computed. The only analytical approach to this problem is based on k-means and has some limitations in practice. An error minimization approach to credit rating using differential evolution (DE) is introduced. The performance of DE and of other common search heuristics is compared using credit rating data of a major Italian bank. Empirical results show that DE is clearly superior to a genetic algorithm (GA), particle swarm optimization (PSO), random search (RS) and two naïve partitioning approaches. Moreover, the proposed approach obtained better results than k-means in much less runtime for a simplified instance of the problem where within-groups variances can be used for clustering.
Keywords: Credit rating; PD bucket; Differential evolution; Clustering; Probability of default
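As an illustration of the error minimization idea in the abstract, the following is a minimal sketch of a textbook DE/rand/1/bin loop applied to the simplified within-groups variance criterion. It is not the authors' implementation. The encoding of a candidate solution as g-1 PD thresholds, the function names, the toy data and all parameter values are assumptions made for this example; only the parameter names (population size, number of generations, scaling factor f, crossover factor CF) follow the article.

```python
import random

def within_groups_variance(pds, cuts):
    """Sum of squared deviations of each PD from its bucket mean.
    `cuts` holds g-1 thresholds (assumed encoding); they are sorted before use."""
    cuts = sorted(cuts)
    buckets = [[] for _ in range(len(cuts) + 1)]
    for p in pds:
        k = sum(p > c for c in cuts)  # index of the bucket that p falls into
        buckets[k].append(p)
    total = 0.0
    for b in buckets:
        if b:  # empty buckets contribute nothing
            m = sum(b) / len(b)
            total += sum((p - m) ** 2 for p in b)
    return total

def differential_evolution(fitness, dim, lo, hi,
                           pop_size=30, num_gen=100, f=0.5, cf=0.9, seed=0):
    """Textbook DE/rand/1/bin minimizer; parameter values are illustrative."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [fitness(x) for x in pop]
    for _ in range(num_gen):
        for i in range(pop_size):
            # three distinct individuals, all different from the target i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees one mutated coordinate
            trial = []
            for j in range(dim):
                if rng.random() < cf or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clamp to the search range
                else:
                    v = pop[i][j]
                trial.append(v)
            t_fit = fitness(trial)
            if t_fit <= fit[i]:  # greedy one-to-one replacement
                pop[i], fit[i] = trial, t_fit
    best = min(range(pop_size), key=fit.__getitem__)
    return sorted(pop[best]), fit[best]

if __name__ == "__main__":
    # Toy PDs (made up for illustration), partitioned into g = 3 buckets.
    toy_pds = [0.010, 0.011, 0.012, 0.050, 0.051, 0.052, 0.20, 0.21, 0.22]
    cuts, value = differential_evolution(
        lambda x: within_groups_variance(toy_pds, x), dim=2, lo=0.0, hi=0.25)
    print(cuts, value)
```

The expected loss and regulatory capital criteria, and the constraints mentioned in the table captions, would replace `within_groups_variance` as the fitness function; they are not sketched here because the page does not define them.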
Fig. 1. Sample distribution of the probability of default for the data set.
Fig. 2. Distribution of exposure relative frequency across probability of default deciles.
Fig. 3. Fitness landscape for 2 of the 6 dimensions of the within-groups variance error minimization problem for g=7 buckets using our bank client data near the optimal solution.
Fig. A1. DE, GA and PSO rate of convergence: number of generations versus mean best fitness value in 30 runs. g=15, fitness=expected loss.
Fig. A2. DE, GA and PSO rate of convergence: number of generations versus mean best fitness value in 30 runs. g=15, fitness=regulatory capital.
Table 1.
Parameters of the genetic algorithm (GA), particle swarm optimization (PSO) and differential evolution (DE)

Pop.Size refers to the population size and Num.Gen. to the number of generations. For the GA, pc: probability of crossover; pm: probability of mutation; σm: variance factor of mutation; elite size: number of the best candidate solutions in the population that are kept untouched by the GA operators. For the PSO, w: inertia weight; fmin, fmax: lower and upper bounds of the random weights. For the DE, CF: crossover factor; f: scaling factor.
Table 2.
Summary statistics for 30 repetitions with within-groups variance criterion (without constraints) for differential evolution (DE), genetic algorithms (GA), particle swarm optimization (PSO), random search (RS), k-means, naïve approach with equal number of objects per bucket (Naïve_EqObj), naïve approach with equal sum of exposure per bucket (Naïve_EqExp) for number of buckets g=7, 10 and 15

Best results in bold. Table columns from left to right report the names of the algorithms, the best values (the minimum values obtained in 30 repetitions), the mean values, the worst values (the maximum values obtained in 30 repetitions), the standard deviations, the 75th percentiles, the 90th percentiles, the number of times the best values occur in 30 repetitions.
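The columns described in this note can be computed from the final fitness values of the repeated runs. The sketch below is one way to do so and is not taken from the article. The function names, the nearest-rank percentile definition, the use of the sample standard deviation, and the tolerance used to count how often the best value recurs are all assumptions.

```python
import math
import statistics

def percentile(values, q):
    """Nearest-rank percentile (assumed definition; the article does not state one)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize(values, tol=1e-9):
    """Summary of final fitness values from repeated runs of a minimizer."""
    best = min(values)  # lower fitness is better
    return {
        "best": best,
        "mean": statistics.mean(values),
        "worst": max(values),
        "std": statistics.stdev(values),  # sample standard deviation (assumption)
        "p75": percentile(values, 75),
        "p90": percentile(values, 90),
        # number of runs that reached the best value, up to a tolerance
        "n_best": sum(abs(v - best) <= tol for v in values),
    }

if __name__ == "__main__":
    # Made-up fitness values standing in for the results of four runs.
    print(summarize([1.0, 1.0, 2.0, 3.0]))
```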
Table 3.
Summary statistics for 30 repetitions with expected loss criterion for differential evolution (DE), genetic algorithms (GA), particle swarm optimization (PSO), random search (RS), k-means, naïve approach with equal number of objects per bucket (Naïve_EqObj), naïve approach with equal sum of exposure per bucket (Naïve_EqExp) for number of buckets g=7, 10 and 15

Best results in bold. Table columns from left to right report the names of the algorithms, the best values (the minimum values obtained in 30 repetitions), the mean values, the worst values (the maximum values obtained in 30 repetitions), the standard deviations, the 75th percentiles, the 90th percentiles, the number of times the best values occur in 30 repetitions.
Table 4.
Summary statistics for 30 repetitions with regulatory capital criterion for differential evolution (DE), genetic algorithms (GA), particle swarm optimization (PSO), random search (RS), k-means, naïve approach with equal number of objects per bucket (Naïve_EqObj), naïve approach with equal sum of exposure per bucket (Naïve_EqExp) for number of buckets g=7, 10 and 15

Best results in bold. Table columns from left to right report the names of the algorithms, the best values (the minimum values obtained in 30 repetitions), the mean values, the worst values (the maximum values obtained in 30 repetitions), the standard deviations, the 75th percentiles, the 90th percentiles, the number of times the best values occur in 30 repetitions.
Table A1.
PSO parameter tuning

Summary statistics for PSO in 10 repetitions for within-groups variance (without constraints) for g=7. Table columns from left to right report the population size, the inertia weight w, the best value, the mean value, the standard deviation, the 90th percentile, the number of times the best values occur in 30 repetitions.
Table A2.
GA parameter tuning

(a): pc versus pm, (b): pc versus σm, (c): pm versus σm. Each cell reports the number of cases in which the GA converged to the best known value in all runs. g=7, objective function: within-groups variance (without constraints).
Table A3.
GA parameter tuning

Summary statistics for GA (pm=0.1, pc=0.1, σm=0.01) in 10 repetitions for expected loss for g=7. Column 1 reports the population size, Column 2 the best value, Column 3 the mean value, Column 4 the standard deviation, Column 5 the 90th percentile, and Column 6 the number of times the best value occurs in the 10 repetitions.
Table A4.
Average number (over 30 runs) and standard deviation of the candidate solutions that had to be tried to obtain an initial population of 100 candidate solutions satisfying constraint 1 when g=7, 10 and 15, for a small and a large data set
