Pairwise independence and its impact on Estimation of Distribution Algorithms
Introduction
Nearly-decomposable systems were conceptually interpreted by Holland as the rationale for the performance of simple Genetic Algorithms (GAs) [1], [2]. From such ideas the schema theory and the Building-Block (BB) hypothesis were proposed and further extended by Goldberg [3]. In a general sense, Holland's and Goldberg's theories rely on reductionist ideas, in which complex optimization problems would be composed of interacting substructures that could increase the efficiency of the search if effectively exploited by GAs [1], [2].
The schema theory formalized such ideas and provided an explanation for the effectiveness of GAs [2], [3], [4]. It argues that certain common features shared by high-quality solutions are privileged during the selection step. Therefore, in order to guarantee an efficient search, such features should not be frequently destroyed during reproduction [2]. These ideas grounded the so-called BB hypothesis, which has been extensively investigated and has also raised many controversies [4], [5].
Estimation of Distribution Algorithms (EDAs) emerged in that context as an alternative to traditional evolutionary algorithms, in which reproduction operators could rely on explicit probability distributions estimated from the population in an attempt to provide a more efficient search [6], [7], [8]. The relative success of early EDAs promoted a more intense interaction between machine learning and evolutionary computation [9], which resulted in EDAs of increasing complexity.
The field progressed under the assumption that the effectiveness of EDAs would mainly depend on how accurately the probabilistic models employed could encode the information contained in the population. Therefore, simple models based on univariate factorizations of the joint probability distribution [10], [11], [12] were put aside in favor of more complex bivariate [13], [14] and multivariate factorizations [15], [16], [17], [18]. However, due to the high computational cost associated with building (linkage learning) and sampling multivariate probabilistic graphical models, their appropriate use in the context of EDAs is still an intense area of research [19].
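To make the univariate end of this spectrum concrete, the following sketch implements a minimal UMDA-style EDA on the OneMax function (maximize the number of ones). The function name `umda_onemax`, the parameter values, and the clamping of the marginals are illustrative choices for this sketch, not taken from the paper.

```python
import random

def umda_onemax(n=20, pop_size=100, elite=50, gens=60, seed=1):
    """Minimal UMDA-style univariate EDA on OneMax (illustrative sketch)."""
    rng = random.Random(seed)
    p = [0.5] * n                      # one independent marginal per variable
    best = 0
    for _ in range(gens):
        # Sample a population from the current univariate model
        pop = [[1 if rng.random() < p[i] else 0 for i in range(n)]
               for _ in range(pop_size)]
        pop.sort(key=sum, reverse=True)    # OneMax fitness = number of ones
        best = max(best, sum(pop[0]))
        selected = pop[:elite]
        # Re-estimate each marginal independently from the selected individuals
        p = [sum(ind[i] for ind in selected) / elite for i in range(n)]
        # Keep probabilities away from 0/1 so sampling never collapses
        p = [min(max(q, 0.05), 0.95) for q in p]
    return best
```

Because each marginal is estimated independently, a model of this kind cannot express any interaction between variables, which is precisely the limitation that motivated bivariate and multivariate factorizations.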
In order to obtain efficient model-building procedures, most EDAs restrict the generality of their models and/or employ approximate algorithms. The extended Compact Genetic Algorithm (eCGA), for example, assumes that the joint probability distribution can be factorized as a product of disjoint multivariate factors [18]. More general models, such as Bayesian networks, have also been applied, but since building optimal Bayesian networks is an NP-complete problem [20], EDAs such as the Bayesian Optimization Algorithm (BOA) [17] and the Estimation of Bayesian Network Algorithm (EBNA) [16] rely on greedy approximations of optimal Bayesian networks. Due to these constraints, the limits and difficulties of learning accurate probabilistic models have also become a research focus [21].
The accuracy of the probabilistic models employed by EDAs determines how effectively the search can be performed. Therefore, by understanding the conditions required for accurate learning of probability distributions, the robustness of EDAs can also be assessed. One of the first discussions in this direction was proposed by Coffin et al. [22], [23], who showed that for concatenated parity functions (CPFs) the hierarchical BOA [24] requires an exponential number of fitness evaluations (on average) to find an optimal solution. The studies that followed indicated pairwise independence as the main cause of the difficulty of learning CPFs and of the poor performance of EDAs.
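For reference, a CPF sums the parities of disjoint blocks of bits; over a uniform sample, any strict subset of a block's bits carries no information about the block's contribution, which is what makes the function hard to learn. The sketch below assumes the "reward even parity" variant; the exact formulation varies across the cited studies.

```python
def cpf(x, k=4):
    """Concatenated parity function: the bits are split into disjoint blocks
    of size k, and each block contributes 1 when it contains an even number
    of ones (one common variant of the scheme used in the cited studies)."""
    assert len(x) % k == 0
    blocks = (x[i:i + k] for i in range(0, len(x), k))
    return sum(1 - (sum(b) % 2) for b in blocks)
```

For instance, with k=4 the string 10001100 scores 1: the first block 1000 has odd parity and contributes nothing, while the second block 1100 has even parity.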
Although hard to learn, CPFs are easy to solve: Chen et al. [25], for example, showed that the eCGA [26] can solve them in polynomial time, despite not being able to learn accurate models. Echegoyen et al. [27], on the other hand, showed that optimal Bayesian networks can correctly model the high-order dependences of CPFs, evidencing that such functions do not limit the applicability of EDAs but only challenge current approximate model-building procedures. Iclanzan [28] proposed a high-order learning procedure with which the eCGA could model high-order dependences correctly, and he also investigated more difficult hierarchical functions composed of pairwise independent variables [29]. The main criticism of these results concerns the artificiality of CPFs and the fact that they do not explain the causes of pairwise independence.
Regarding non-separable functions, Echegoyen et al. [30], [31] showed that the degree of interaction between decision variables is also a cause of inaccurate learning. It was shown that as the number of interactions among variables increases, the capacity of the EBNA to learn the correct dependences decreases. At a certain degree of interaction a phase transition occurs, beyond which the models produced stop helping the search. These results clearly indicate limits to the applicability of EDAs in terms of the degree of interaction; however, only artificial functions were employed in the investigation, and it is not evident how likely such extreme situations are to occur in practice.
In an attempt to close the gap between the conclusions drawn for artificial functions and their implications for more practical problems, some studies have been reported in the literature. Liaw and Ting [32], for example, showed that univariate and bivariate EDAs could outperform more complex algorithms on some instances of the NK-model. Martins et al. [33], [34], on the other hand, investigated the accuracy of the linkage-tree models produced by the Linkage-Tree Genetic Algorithm (LTGA) and the performance of other EDAs on the Multidimensional Knapsack Problem (MKP). The authors also compared the performance of the LTGA when employing linkage trees and random linkage trees, with no evidence indicating that linkage-tree learning helped to solve the MKP [35]. Sadowski et al. [36] faced similar difficulties when applying the LTGA to MAX-SAT. According to the literature, such poor results could be caused by a high degree of interaction [30] or by the non-uniformity of fitness contributions associated with subsets of variables [37]. However, few studies have followed in this direction [38], [39].
Martins et al. [40], [41] investigated the connections between multimodality and pairwise independence. Considering additively separable functions, increasing multimodality was shown to increase the difficulty of learning, with negative impacts on the accuracy of the models produced. It was argued that in those functions pairwise independence emerges as a consequence of extreme multimodality. Such results contributed to the understanding of how multimodality affects model-building and EDAs in general, a long-standing issue in the field that many approaches have tried to circumvent [42], [43], [44].
Research on EDAs initially progressed from a proof-of-concept perspective, and a large number of experimental comparisons with traditional evolutionary algorithms were performed to show the validity of EDAs. We believe that, in order to keep the field progressing, it is important to understand the limitations of current EDAs so as to: (1) circumvent those limitations with new proposals, or (2) identify the characteristics of the problems that can be properly solved by current EDAs.
This study focuses on the second alternative and hypothesizes that there is a certain class of optimization problems in which contemporary EDAs excel. Following the results of Echegoyen et al. [30] and Martins et al. [40], [35], we suppose such problems are defined by the degree of interactions among variables and by multimodality. Since the degree of interactions has already been investigated, we focus on the theoretical relation between multimodality and pairwise independence in additively separable functions. Additionally, in order to infer a range for the effectiveness of EDAs, we experimentally investigate the influence of N and K on the emergence of pairwise independence in NK-landscape instances.
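Since the experiments revolve around NK-landscape instances, where each of the N binary variables contributes through a random lookup table over itself and K neighbours, a minimal instance generator is sketched below. The adjacent (circular) neighbourhood and the helper name `make_nk` are illustrative assumptions of this sketch; instances with random neighbourhoods are built analogously.

```python
import random

def make_nk(n, k, seed=0):
    """Build a random NK-landscape instance (adjacent-neighbourhood variant,
    chosen here for simplicity of illustration)."""
    rng = random.Random(seed)
    # One lookup table per variable: 2^(k+1) random contributions in [0, 1)
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(x):
        total = 0.0
        for i in range(n):
            # variable i interacts with its k circular successors
            bits = [x[(i + j) % n] for j in range(k + 1)]
            idx = int("".join(map(str, bits)), 2)
            total += tables[i][idx]
        return total / n          # average contribution, in [0, 1)
    return fitness
```

K = 0 yields an additively separable (indeed univariate) function, while larger K increases both the degree of interaction and the ruggedness (multimodality) of the landscape, which is why K parameterizes both properties in the experiments.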
The paper is organized as follows. Section 2 formally defines EDAs and reviews some concepts from information theory which are used throughout the paper. Section 3 summarizes and updates the assumptions made by Martins and Delbem [40], which define conditions for accurate bivariate learning, and describes how extreme multimodality can induce pairwise independence. Section 4 analyzes some multivariate EDAs and proves that their model-building procedures are not able to learn high-order dependences in the presence of pairwise independence. Section 5 generalizes the results of the previous sections to non-separable NK-landscape instances, in order to estimate the range of K (degree of interactions and multimodality) in which EDAs excel. Section 6 discusses the results and shows that the LTGA performed significantly better than its randomized version only for instances with small K, which indicates that pairwise independence might appear as K increases. Section 7 summarizes the paper and discusses general implications.
Background
This section introduces the methods, concepts, and notation used throughout the paper. Section 2.1 provides a general overview of EDAs and their classification in terms of the probabilistic models they employ, namely univariate, bivariate, and multivariate. Section 2.2 describes concepts from information theory, such as entropy and mutual information, which are fundamental to the analysis reported in Section 3.
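As a concrete companion to the information-theoretic concepts just mentioned, the following sketch gives minimal empirical estimators of Shannon entropy and mutual information (in bits), computed from samples. The helper names are ours, not the paper's.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Shannon entropy (in bits) of the empirical distribution of `samples`."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))
```

For a fair binary variable the entropy is 1 bit; mutual information is 0 for independent variables and equals the shared entropy when one variable determines the other, which is why it is the natural pairwise score for linkage learning.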
Bivariate linkage learning
Most EDAs (except those with univariate models) rely on measurements of bivariate statistics in some step of their model-building procedures. This section describes how an increase in multimodality impacts such measurements, leading to pairwise independence in the extreme cases. The analysis is restricted to additively separable pseudo-Boolean functions and reviews and summarizes some previous results [40].
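The extreme case can be checked by hand on a single parity block: take the population to be uniform over the even-parity 3-bit strings (the optima of one block of a CPF-like separable function). Every pair of bits then looks statistically independent, even though the three bits are jointly dependent. A small sketch, with an ad hoc entropy helper `H`:

```python
from collections import Counter
from itertools import product
from math import log2

def H(samples):
    """Shannon entropy (bits) of the empirical distribution of `samples`."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

# Uniform population over the even-parity 3-bit strings: 000, 011, 101, 110
pop = [x for x in product([0, 1], repeat=3) if sum(x) % 2 == 0]
xs, ys, zs = zip(*pop)

# Every pair of bits looks independent: I(X;Y) = H(X) + H(Y) - H(X,Y) = 0
pairwise_mi = H(xs) + H(ys) - H(list(zip(xs, ys)))

# ...yet the three bits are jointly dependent: H(X,Y,Z) < H(X) + H(Y) + H(Z)
joint = H(pop)
marginal_sum = H(xs) + H(ys) + H(zs)
```

Here the joint entropy is 2 bits while the marginals sum to 3 bits, so the joint dependence is real; it is simply invisible to any pairwise measurement, which is exactly the failure mode discussed in this section.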
Multivariate linkage learning
The limitations of bivariate linkage learning have constantly been reported in the literature as motivation for the use of more complex probabilistic graphical models, in which multivariate dependences can be modeled. However, many studies have shown that some multivariate EDAs, which use Bayesian networks as probabilistic models, have their performance undermined by the presence of pairwise independence.
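To see why pairwise scoring gives such procedures no signal, the sketch below scores every variable pair of a population that is uniform over the even-parity 4-bit strings, the way a greedy merge or edge-addition step would start (an illustrative reconstruction with ad hoc helpers `H` and `mi`, not any specific algorithm's code). Every pairwise mutual information is zero, so a greedy step driven by these scores has nothing to merge, although the four variables form one jointly dependent block.

```python
from collections import Counter
from itertools import combinations, product
from math import log2

def H(samples):
    """Shannon entropy (bits) of the empirical distribution of `samples`."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

def mi(xs, ys):
    """Empirical mutual information I(X;Y) in bits."""
    return H(xs) + H(ys) - H(list(zip(xs, ys)))

# Population: uniform over the 8 even-parity 4-bit strings
pop = [x for x in product([0, 1], repeat=4) if sum(x) % 2 == 0]
cols = list(zip(*pop))

# Pairwise scores, as a greedy model-building step would compute them:
# every I(X_i; X_j) is exactly zero, so no edge or merge is ever preferred
scores = {(i, j): mi(cols[i], cols[j]) for i, j in combinations(range(4), 2)}
```

The joint entropy of this population is 3 bits against 4 bits for the sum of marginals, so the dependence exists at order four only; any procedure that bootstraps its structure from pairwise statistics starts blind on such data.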
This section demonstrates how the approximation algorithms usually employed to build
Empirical analysis
In the previous sections we analyzed two main points. Section 3 showed that, for additively separable functions, multimodality may considerably impair the observation of statistical dependences between random variables that should indeed be seen as dependent. In other words, multimodality may be one of the causes of pairwise independence. Such a characteristic may restrict the usefulness of linkage learning to problems in which the degree of multimodality does not exceed a certain threshold.
Discussions
The experiments aimed at verifying the existence of a range of usefulness for linkage learning, delimited by K, outside which pairwise independence is likely to occur. The results supported our hypothesis: the performance of the LTGA deteriorated, relative to the rLTGA, as K increased. We now have evidence to conclude that linkage learning is more effective in instances with K≈5. Coincidentally, most of the previous studies that evaluated EDAs in NK-landscapes used values of
Conclusions
EDAs were proposed as an alternative to traditional evolutionary algorithms, in which reproduction operators could rely on probabilistic models learned from the population to enable a more effective search. In the pursuit of effective EDAs, many different linkage learning procedures and probabilistic graphical models have been investigated. However, due to the algorithmic complexity associated with the learning of representative models, some concessions had to be made in order to obtain useful
Acknowledgments
The authors are indebted to FAPESP for the financial support provided for this research (2011/07792-4).
References (68)
- et al., Linkage learning through probabilistic expression, Comput. Methods Appl. Mech. Eng. (2000)
- et al., On the performance of linkage-tree genetic algorithms for the multidimensional knapsack problem, Neurocomputing (2014)
- et al., Comprehensive characterization of the behaviors of estimation of distribution algorithms, Theor. Comput. Sci. (2015)
- et al., An introduction and survey of estimation of distribution algorithms, Swarm Evol. Comput. (2011)
- et al., Scalability of the Bayesian optimization algorithm, Int. J. Approx. Reason. (2002)
- et al., Optimal implementations of UPGMA and other common clustering algorithms, Inf. Process. Lett. (2007)
- The Sciences of the Artificial (1996)
- Adaptation in Natural and Artificial Systems (1975)
- Genetic Algorithms in Search, Optimization, and Machine Learning (1989)
- The Design of Innovation: Lessons from and for Competent Genetic Algorithms (2002)
- Genetic Algorithms: Principles and Perspectives. A Guide to GA Theory
- Towards a New Evolutionary Computation: Advances on Estimation of Distribution Algorithms, Studies in Fuzziness and Soft Computing
- Scalable Optimization via Probabilistic Modeling: From Algorithms to Applications
- Evolutionary computation meets machine learning: a survey, IEEE Comput. Intell. Mag.
- The compact genetic algorithm, IEEE Trans. Evol. Comput.
- MIMIC: finding optima by estimating probability densities, Adv. Neural Inf. Process. Syst.
- FDA - a scalable evolutionary algorithm for the optimization of additively decomposed functions, Evol. Comput.
- Linkage problem, distribution estimation, and Bayesian networks, Evol. Comput.
- A review on probabilistic graphical models in evolutionary computation, J. Heuristics
- Research topics in discrete estimation of distribution algorithms based on factorizations, Memet. Comput.