Regular Paper
Pairwise independence and its impact on Estimation of Distribution Algorithms

https://doi.org/10.1016/j.swevo.2015.10.001

Abstract

Estimation of Distribution Algorithms (EDAs) were proposed as an alternative to traditional evolutionary algorithms in which reproduction operators could rely on information extracted from the population to enable a more effective search. Since this information is usually represented as a probabilistic graphical model, the effectiveness of EDAs strongly depends on how accurately such models represent the population. In this sense, models of increasing complexity have been employed by EDAs, with the most successful ones being able to encode multivariate factorizations of joint probability distributions. However, some studies have shown that even multivariate EDAs fail to build accurate models for problems in which there is an intrinsic pairwise independence between variables. This study elucidates how pairwise independence impacts the linkage learning procedures of multivariate EDAs and affects their accuracy. First, the necessary conditions for learning additively separable functions are assessed, from which it is shown that extreme multimodality can induce pairwise independence. Second, it is demonstrated that in the presence of pairwise independence the approximate linkage learning procedures employed by many EDAs are not able to retrieve high-order dependences. Finally, in an attempt to infer how likely pairwise independence is to occur in practical problems, the case of non-separable functions is empirically investigated. For this purpose, the NK-model and the Linkage-Tree Genetic Algorithm (LTGA) were used as a case study, and a range of usefulness for the LTGA was estimated according to N (problem size) and K (degree of interactions among variables and multimodality). The results indicated that LTGA linkage learning is probably more useful for K≤6 on instances with random linkages (this range grows with N), and for K≤9 on instances with nearest-neighbor linkages (this range is stable with N). Outside these ranges, pairwise independence is more likely to occur, which deteriorates model accuracy and impairs LTGA performance.

Introduction

Nearly-decomposable systems were conceptually interpreted by Holland as the rationale for the performance of simple Genetic Algorithms (GAs) [1], [2]. From such ideas, the schema theory and the Building-Block (BB) hypothesis were proposed and further extended by Goldberg [3]. In a general sense, Holland and Goldberg's theories rely on reductionist ideas, in which complex optimization problems would be composed of interacting substructures that could increase the efficiency of the search if effectively exploited by GAs [1], [2].

The schema theory formalized such ideas and provided an explanation for the effectiveness of GAs [2], [3], [4]. It argues that certain common features shared by high-quality solutions are privileged during the selection step. Therefore, in order to guarantee an efficient search, such features should not be frequently destroyed during reproduction [2]. These ideas grounded the so-called BB hypothesis, which has been extensively investigated and has also raised many controversies [4], [5].

Estimation of Distribution Algorithms (EDAs) emerged in that context as an alternative to traditional evolutionary algorithms, in which reproduction operators could rely on explicit probability distributions estimated from the population in an attempt to provide a more efficient search [6], [7], [8]. The relative success of early EDAs promoted a more intense interaction between machine learning and evolutionary computation [9], which resulted in EDAs of increasing complexity.

The field progressed under the assumption that the effectiveness of EDAs would mainly depend on how accurately the probabilistic models employed could encode the information contained in the population. Therefore, simple models based on univariate factorizations of the joint probability distribution [10], [11], [12] were put aside in favor of more complex bivariate [13], [14] and multivariate factorizations [15], [16], [17], [18]. However, due to the high computational cost associated with the building (linkage learning) and sampling of multivariate probabilistic graphical models, their appropriate use in the context of EDAs is still an intense area of research [19].
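In standard notation (a generic illustration of these model classes, not the exact notation of the cited works), these factorizations take the forms

p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i) \quad \text{(univariate)}, \qquad p(\mathbf{x}) = p(x_r) \prod_{i \neq r} p(x_i \mid x_{\pi(i)}) \quad \text{(bivariate chain/tree)}, \qquad p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i \mid \mathbf{x}_{\Pi_i}) \quad \text{(multivariate)},

where \pi(i) denotes the single parent of x_i in a chain or tree and \Pi_i denotes an arbitrary (possibly empty) parent set, as in a Bayesian network.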

In order to obtain efficient model-building procedures, most EDAs restrict the generality of their models and/or employ approximate algorithms. The extended Compact Genetic Algorithm (eCGA), for example, assumes that the joint probability distribution can be factorized as a product of disjoint multivariate factors [18]. More general models, such as Bayesian networks, have also been applied, but since building optimal Bayesian networks is an NP-complete problem [20], EDAs such as the Bayesian Optimization Algorithm (BOA) [17] and the Estimation of Bayesian Network Algorithm (EBNA) [16] rely on greedy approximations of optimal Bayesian networks. Due to these constraints, the limits and difficulties of learning accurate probabilistic models have also become a research focus [21].

The accuracy of the probabilistic models employed by EDAs determines how effectively the search can be performed. Therefore, by understanding the conditions required for accurate learning of probability distributions, the robustness of EDAs can also be assessed. One of the first discussions in this direction was presented by Coffin et al. [22], [23], who showed that for concatenated parity functions (CPFs) the hierarchical BOA [24] requires an exponential number of fitness evaluations (on average) to find an optimal solution. The studies that followed indicated pairwise independence as the main cause of the difficulty of learning CPFs and of the poor performance of EDAs.
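To make this effect concrete, consider a single k-bit even-parity block: among its optimal (even-parity) strings, every pair of bits is statistically independent, although the k bits are jointly dependent (any bit is determined by the remaining ones). The short sketch below (our illustration, not code from the cited works) verifies this by computing the pairwise mutual information over the optimal strings.

# Minimal sketch (our illustration): for a k-bit even-parity block, the set of
# optimal strings makes every pair of bits look independent, even though the
# bits are jointly dependent.
from itertools import product
from collections import Counter
from math import log2

k = 4
optima = [s for s in product((0, 1), repeat=k) if sum(s) % 2 == 0]

def mutual_information(strings, i, j):
    n = len(strings)
    pij = Counter((s[i], s[j]) for s in strings)   # joint counts of bits i and j
    pi = Counter(s[i] for s in strings)            # marginal counts of bit i
    pj = Counter(s[j] for s in strings)            # marginal counts of bit j
    mi = 0.0
    for (a, b), c in pij.items():
        mi += (c / n) * log2((c / n) / ((pi[a] / n) * (pj[b] / n)))
    return mi

for i in range(k):
    for j in range(i + 1, k):
        print(i, j, round(mutual_information(optima, i, j), 6))  # prints 0.0 for every pair

Every pairwise mutual information is exactly zero, so any model-building step driven by bivariate statistics receives no signal about the k-wise dependence.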

Although hard to learn, CPFs are easy to solve: Chen et al. [25], for example, showed that the eCGA [26] can solve them in polynomial time, despite not being able to learn accurate models. Echegoyen et al. [27], on the other hand, showed that optimal Bayesian networks can correctly model the high-order dependences of CPFs, evidencing that such functions do not limit the applicability of EDAs but only challenge current approximate model-building procedures. Iclanzan [28] proposed a high-order learning procedure with which the eCGA could model high-order dependences correctly, and also investigated more difficult hierarchical functions composed of pairwise independent variables [29]. The main criticism of these results concerns the artificiality of CPFs and the fact that they do not explain the causes of pairwise independence.

Regarding non-separable functions, Echegoyen et al. [30], [31] showed that the degree of interactions between decision variables is also a cause of inaccurate learning. It was shown that as the number of interactions among variables increases, the capacity of EBNA to learn the correct dependences decreases. At a certain degree of interactions, a phase transition occurs in which the models produced stop helping the search. These results clearly indicate limits to the applicability of EDAs in terms of the degree of interactions; however, only artificial functions were employed in the investigation, and it is not evident how likely such extreme situations are to occur in practice.

In an attempt to close the gap between the conclusions drawn for artificial functions and their implications when solving more practical problems, some studies have been reported in the literature. Liaw and Ting [32], for example, showed that univariate and bivariate EDAs could outperform more complex algorithms on some instances of the NK-model. Martins et al. [33], [34], on the other hand, investigated the accuracy of the linkage-tree models produced by the Linkage-Tree Genetic Algorithm (LTGA) and the performance of other EDAs on the Multidimensional Knapsack Problem (MKP). The authors also compared the performance of the LTGA when employing linkage-trees and random linkage-trees, with no evidence indicating that linkage-tree learning helped to solve the MKP [35]. Sadowski et al. [36] faced similar difficulties when applying the LTGA to MAX-SAT. According to the literature, such poor results could be caused by a high degree of interactions [30] or by the non-uniformity of fitness contributions associated with subsets of variables [37]. However, few studies have followed this direction [38], [39].

Martins et al. [40], [41] investigated the connections between multimodality and pairwise independence. Considering additively separable functions, the increase in multimodality was shown to increase the difficulty of learning, with negative impacts on the accuracy of the models produced. It was argued that in those functions pairwise independence emerges as a consequence of extreme multimodality. Such results contributed to the understanding of how multimodality affects model-building and EDAs in general, a long-standing issue in the field that many approaches have tried to circumvent [42], [43], [44].

The research on EDAs initially progressed from a proof-of-concept perspective, and a large number of experimental comparisons with traditional evolutionary algorithms were performed to show the validity of EDAs. We believe that, in order to sustain the progress of the field, it is important to understand the limitations of current EDAs so as to: (1) circumvent these limitations with new proposals or (2) identify the characteristics of the problems that can be properly solved by current EDAs.

This study focuses on the second alternative and hypothesizes that there is a certain class of optimization problems in which contemporary EDAs excel. Following the results of Echegoyen et al. [30] and Martins et al. [40], [35], we suppose such problems are defined by the degree of interactions among variables and by multimodality. Since the degree of interactions has already been investigated, we focus on the theoretical relation between multimodality and pairwise independence in additively separable functions. Additionally, in order to infer a range for the effectiveness of EDAs, we experimentally investigate the influence of N and K on the emergence of pairwise independence in NK-landscape instances.

The paper is organized as follows. Section 2 formally defines EDAs and reviews some concepts from information theory which are used throughout the paper. Section 3 summarizes and updates the assumptions made by Martins and Delbem [40], which define conditions for accurate bivariate learning, and describes how extreme multimodality can induce pairwise independence. Section 4 analyzes some multivariate EDAs and proves that their model-building procedures are not able to learn high-order dependences in the presence of pairwise independence. Section 5 generalizes the results of the previous sections to non-separable NK-landscape instances, in order to estimate the range of K (degree of interactions and multimodality) in which EDAs excel. Section 6 discusses the results and shows that the LTGA performed significantly better than its randomized version only for instances with small K, which indicates that pairwise independence might appear as K increases. Section 7 summarizes the paper and discusses general implications.

Section snippets

Background

This section introduces the methods, concepts and notation used throughout the paper. Section 2.1 provides a general overview of EDAs and their classification in terms of the probabilistic models they employ, namely univariate, bivariate and multivariate. Section 2.2 describes concepts from information theory, such as entropy and mutual information, which are fundamental to the analysis reported in Section 3.
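For completeness, the standard definitions used in that analysis are

H(X) = -\sum_{x} p(x)\,\log_2 p(x), \qquad I(X;Y) = \sum_{x,y} p(x,y)\,\log_2 \frac{p(x,y)}{p(x)\,p(y)} = H(X) + H(Y) - H(X,Y),

where I(X;Y) = 0 if and only if X and Y are statistically (pairwise) independent.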

Bivariate linkage learning

Most EDAs (except those with univariate models) rely on measurements of bivariate statistics in some step of their model-building procedures. This section describes how an increase in multimodality affects such measurements, leading to pairwise independence in extreme cases. The analysis is restricted to additively separable pseudo-Boolean functions and reviews and summarizes previous results [40].
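In standard notation, a function in this class can be written as

f(\mathbf{x}) = \sum_{j=1}^{m} f_j(\mathbf{x}_{S_j}), \qquad \mathbf{x} \in \{0,1\}^n,

where the index sets S_1, \ldots, S_m are mutually disjoint and each subfunction f_j depends only on the variables indexed by S_j.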

Multivariate linkage learning

The limitations of bivariate linkage learning have been repeatedly reported in the literature as a motivation for the use of more complex probabilistic graphical models, in which multivariate dependences can be modeled. However, many studies have shown that some multivariate EDAs, which use Bayesian networks as probabilistic models, have their performance undermined by the presence of pairwise independence.
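The approximate procedures differ across algorithms; the sketch below is a generic, simplified illustration (ours, not the authors' code or the exact LTGA/BOA procedure) of a model-building step driven purely by pairwise statistics: agglomerative clustering of variables using average pairwise mutual information, in the spirit of linkage-tree learning.

# Minimal sketch (our simplification): agglomerative clustering driven only by
# average pairwise mutual information.  When every pair of variables looks
# independent (all pairwise MI ~ 0), every candidate merge ties, the merge
# order becomes arbitrary, and the tree cannot recover the true high-order groups.
def build_linkage_tree(mi):                       # mi[i][j]: symmetric pairwise MI matrix
    clusters = [[i] for i in range(len(mi))]      # start from singleton clusters
    tree = list(clusters)
    while len(clusters) > 1:
        best, pair = -1.0, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average pairwise MI between the two candidate clusters
                sim = sum(mi[i][j] for i in clusters[a] for j in clusters[b])
                sim /= len(clusters[a]) * len(clusters[b])
                if sim > best:
                    best, pair = sim, (a, b)
        a, b = pair
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in pair] + [merged]
        tree.append(merged)
    return tree

If all entries of mi are zero, the first pair encountered always wins the tie, so the resulting hierarchy conveys no linkage information; this is the mechanism by which pairwise independence defeats procedures built on bivariate statistics.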

This section demonstrates how the approximation algorithms usually employed to build

Empirical analysis

In the previous sections we analyzed two main points. Section 3 showed that, for additively separable functions, multimodality may considerably hinder the observation of statistical dependences between random variables that are indeed dependent. In other words, multimodality may be one of the causes of pairwise independence. Such a characteristic may limit the usefulness of linkage learning to problems in which the degree of multimodality does not exceed a certain limit.
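For reference, the NK-model used in this empirical analysis can be summarized as follows: each of the N binary variables contributes a subfunction defined over itself and K other variables (its linkage neighbors), and the fitness is the average of these contributions. The sketch below is a minimal illustration of this standard definition (our code, assuming uniformly random subfunction values; it is not the authors' experimental implementation), covering both the random and nearest-neighbor linkage configurations.

# Minimal sketch of the standard NK-model (our illustration of the benchmark).
import random

def make_nk_instance(n, k, neighbors="random", seed=0):
    rng = random.Random(seed)
    if neighbors == "random":
        neigh = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    else:                                    # nearest-neighbor (circular) linkages
        neigh = [[(i + d) % n for d in range(1, k + 1)] for i in range(n)]
    # one lookup table of 2^(k+1) random contributions per position
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]
    return neigh, tables

def nk_fitness(x, neigh, tables):            # x: list of N bits (0/1)
    total = 0.0
    for i, table in enumerate(tables):
        bits = [x[i]] + [x[j] for j in neigh[i]]
        index = int("".join(map(str, bits)), 2)
        total += table[index]
    return total / len(tables)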

Discussions

The experiments aimed at verifying the existence of a range of usefulness for linkage learning, delimited by K, outside which pairwise independence is likely to occur. The results confirmed our hypothesis: the performance of the LTGA deteriorated, relative to the rLTGA, as K increased. We now have evidence to conclude that linkage learning is more effective in instances with K≈5. Coincidentally, most of the previous studies that evaluated EDAs in NK-landscapes used values of

Conclusions

EDAs were proposed as an alternative to traditional evolutionary algorithms in which reproduction operators could rely on probabilistic models learned from the population to enable a more effective search. In the pursuit of effective EDAs, many different linkage learning procedures and probabilistic graphical models have been investigated. However, due to the algorithmic complexity associated with the learning of representative models, some concessions had to be made in order to obtain useful

Acknowledgments

The authors are indebted to the FAPESP for the financial support provided for this research (2011/07792-4).

References (68)

  • C.R. Reeves et al.

Genetic Algorithms: Principles and Perspectives: A Guide to GA Theory

    (2002)
  • P. Larrañaga et al.
    (2002)
  • J.A. Lozano et al.

Towards a New Evolutionary Computation: Advances on Estimation of Distribution Algorithms

    Studies in Fuzziness and Soft Computing

    (2006)
  • M. Pelikan et al.

Scalable Optimization via Probabilistic Modeling: From Algorithms to Applications

    (2006)
  • J. Zhang et al.

Evolutionary computation meets machine learning: a survey

    IEEE Comput. Intell. Mag.

    (2011)
  • H. Mühlenbein, G. Paaß, From recombination of genes to the estimation of distributions I. Binary parameters, in:...
  • S. Baluja, Population-based incremental learning: a method for integrating genetic search based function optimization...
  • G. Harik et al.

    The compact genetic algorithm

    IEEE Trans. Evol. Comput.

    (1999)
  • J. De Bonet et al.

MIMIC: finding optima by estimating probability densities

    Adv. Neural Inf. Process. Syst.

    (1997)
  • M. Pelikan, H. Mühlenbein, The bivariate marginal distribution algorithm, in: Advances in Soft Computing, Springer,...
  • H. Mühlenbein et al.

    FDA—a scalable evolutionary algorithm for the optimization of additively decomposed functions

    Evol. Comput.

    (1999)
  • R. Etxeberria, P. Larrañaga, Global optimization using Bayesian networks, in: Second Symposium on Artificial...
  • M. Pelikan et al.

    Linkage problem, distribution estimation, and Bayesian networks

    Evol. Comput.

    (2000)
  • G.R. Harik, F.G. Lobo, K. Sastry, Linkage learning via probabilistic modeling in the Extended Compact Genetic Algorithm...
  • P. Larrañaga et al.

    A review on probabilistic graphical models in evolutionary computation

    J. Heuristics

    (2012)
  • D. Chickering, Learning Bayesian networks is NP-complete, in: D. Fisher, H.-J. Lenz (Eds.), Learning from Data, Lecture...
  • R. Santana et al.

    Research topics in discrete estimation of distribution algorithms based on factorizations

    Memet. Comput.

    (2009)
  • D.J. Coffin, R.E. Smith, The limitations of distribution sampling for linkage learning, in: Proceedings of the Congress...
  • D. Coffin, R.E. Smith, Linkage learning in estimation of distribution algorithms, in: Linkage in Evolutionary...
  • M. Pelikan, Hierarchical Bayesian optimization algorithm, in: Hierarchical Bayesian Optimization Algorithm, Studies in...
  • S.-C. Chen, T.-L. Yu, Difficulty of linkage learning in estimation of distribution algorithms, in: Proceedings of the...
  • C. Echegoyen, R. Santana, J. Lozano, P. Larrañaga, The impact of exact probabilistic learning algorithms in EDAs based...
  • D. Iclanzan, Higher-order linkage learning in the eCGA, in: Proceedings of the Genetic and Evolutionary Computation...
  • D. Iclanzan, Hierarchical allelic pairwise independent functions, in: Proceedings of the Genetic and Evolutionary...