
A locally convergent rotationally invariant particle swarm optimization algorithm


Abstract

Several well-studied issues in the particle swarm optimization algorithm are outlined and some earlier methods that address these issues are investigated from the theoretical and experimental points of view. These issues are: the stagnation of particles at some points in the search space, the inability to change the value of one or more decision variables, poor performance when the swarm size is small, the lack of a guarantee to converge even to a local optimum (local optimizer), poor performance when the number of dimensions grows, and the sensitivity of the algorithm to the rotation of the search space. The significance of each of these issues is discussed and it is argued that none of the particle swarm optimizers we are aware of can address all of these issues at the same time. To address all of these issues at the same time, a new general form of velocity update rule for the particle swarm optimization algorithm that contains a user-definable function \(f\) is proposed. It is proven that the proposed velocity update rule is guaranteed to address all of these issues if the function \(f\) satisfies the following two conditions: (i) the function \(f\) is designed in such a way that, for any input vector \(\vec {y}\) in the search space, there exists a region \(A\) which contains \(\vec {y}\) and \( f\!\left( {\vec {y}} \right) \) can be located anywhere in \(A\), and (ii) \(f\) is invariant under any affine transformation. An example of such a function \(f\) is provided and its performance is examined through experiments. The experiments confirm that the proposed algorithm (with an appropriate function \(f\)) can effectively address all of these issues at the same time. Also, comparisons with earlier methods show that the overall ability of the proposed method to solve benchmark functions is significantly better.


Notes

  1. Note that the swarm size issue should not be confused with the problem of selecting the best population size for an algorithm. The swarm size issue means that the algorithm should be able to perform well even with a small number of particles; it does not mean that a larger population size has no effect on the performance of the algorithm.

  2. Note that the definition of the “problem scale” issue should not be confused with “large scale optimization.” In large scale optimization the aim is to improve algorithms so that they are able to find quality solutions for problems with a high number of dimensions. In the problem scale issue (discussed in this paper) we are interested in guaranteeing that the algorithm improves the initial solutions when the number of dimensions grows. The presence of the problem scale issue implies that the algorithm is not a good method for large scale optimization; conversely, good performance in large scale optimization by an algorithm implies that there is no problem scale issue in that algorithm.

  3. Alternatively, these two random matrices are often considered as two random vectors. In this case, the multiplication of these random vectors by \(\overrightarrow{PI} \) and \(\overrightarrow{SI} \) is element-wise.

  4. Note that in continuous space, the probability of hitting a particular point is zero. Thus, the probability of \( \vec {p}_{t}^{i} = \vec {g}_{t}\) or \(\vec {x}_{t}^{i} = \vec {g}_{t}\) or any other equality like \(\vec {x}_{t}^{i} = \vec {p}_{t}^{k}\) is zero. However, because of the finite precision of floating-point representation on computers, the probability of this situation is non-zero. In addition, the particles converge to their equilibrium point if their coefficients are set within the convergence boundary. Since at the equilibrium point \(\vec {V}_{t}^{i} = 0\) and \(\vec {x}_{t}^{i} = \vec {p}_{t}^{i} = \vec {g}_{t}\), stagnation happens for particles at the equilibrium point. However, there is no guarantee that the equilibrium point is a local minimizer of the objective function. Also, the convergence to an equilibrium point is independent of the starting position of the particles, which means that restarting particles cannot guarantee avoidance of stagnation.

  5. Note that, as local convergence is a property of an algorithm, it is sufficient if the final solution presented by the algorithm is in the optimality region. As the final solution generated by a PSO algorithm is (usually) the global best vector, a PSO algorithm is locally convergent if the condition \(\lim _{{t \rightarrow \infty }} P\left( {\vec {g}_{t} \in R_{\varepsilon } } \right) = 1\) is satisfied. However, if a PSO method satisfies this condition, there is only a guarantee that one particle (the global best particle) will converge to the optimality region. Thus, the other particles might converge to any other points in the search space that are not in the optimality region.

  6. Note that we represent this function by a lowercase \(f\). This should not be confused with the uppercase \(F\) which represents the objective function.

  7. Note that, in continuous space, it is impossible to hit a point (\(\vec {x}_{t}^{i} \ne f\left( {\vec {p}_{t}^{k} } \right) \) is always true). However, in the computer simulation, because of the limited floating point precision, it is possible that \(\vec {x}_{t}^{i} = f\left( {\vec {p}_{t}^{k} } \right) \).

  8. Note that the function \(f\) is the same for all particles; however, it is different for different neighbors (variance is different for different neighbors). Thus, we refer to these functions as \(f_{kt}(y) \) because they are independent of \(i\).

  9. A Matlab source code for this variant is available as online supplementary material.

  10. Note that, as all of the discussions in this paper are around continuous space, hitting a point is impossible. Hence, whenever we are using “a point” we refer to “arbitrarily close to a point.”

  11. Note that both \( \vec {\lambda }_{{t^{\prime } ,\omega }} \) and \( \vec {\zeta }_{{t^{\prime } ,\psi }} \) are dependent on \(t\) and \(i\) as well; however, for simplicity, these two indices are omitted when writing \( \vec {\lambda }_{{t^{\prime } ,\omega }}\) and \( \vec {\zeta }_{{t^{\prime } ,\psi }}\).

References

  • Bonyadi, M. R., Li, X., & Michalewicz, Z. (2013). A hybrid particle swarm with velocity mutation for constraint optimization problems. In Genetic and evolutionary computation conference (pp. 1–8). New York: ACM. doi:10.1145/2463372.2463378.

  • Bonyadi, M. R., & Michalewicz, Z. (2014). SPSO2011—analysis of stability, local convergence, and rotation sensitivity. In Genetic and evolutionary computation conference (pp. 9–15). Vancouver: ACM. doi:10.1145/2576768.2598263.

  • Bonyadi, M. R., Michalewicz, Z., & Li, X. (2014). An analysis of the velocity updating rule of the particle swarm optimization algorithm. Journal of Heuristics. doi:10.1007/s10732-014-9245-2.

  • Chen, D. B., & Zhao, C. X. (2009). Particle swarm optimization with adaptive population size and its application. Applied Soft Computing, 9(1), 39–48. doi:10.1016/j.asoc.2008.03.001.


  • Cheng, M.-Y., Huang, K.-Y., & Chen, H.-M. (2011). Dynamic guiding particle swarm optimization with embedded chaotic search for solving multidimensional problems. Optimization Letters, 6(6), 719–729. doi:10.1007/s11590-011-0297-z.


  • Clerc, M. (2006). Particle swarm optimization. Chichester: Wiley-ISTE.


  • Clerc, M., & Kennedy, J. (2002). The particle swarm—explosion, stability, and convergence in a multidimensional complex space. IEEE Transactions on Evolutionary Computation, 6(1), 58–73. doi:10.1109/4235.985692.


  • Deb, K., Joshi, D., & Anand, A. (2002). Real-coded evolutionary algorithms with parent-centric recombination. In Congress on evolutionary computation (pp. 61–66). Honolulu: IEEE. doi:10.1109/CEC.2002.1006210.

  • Engelbrecht, A. (2005). Fundamentals of computational swarm intelligence. Hoboken, NJ: Wiley.


  • Engelbrecht, A. (2011). Scalability of a heterogeneous particle swarm optimizer. In Symposium on swarm intelligence (pp. 1–8). Paris: IEEE. doi:10.1109/SIS.2011.5952563.

  • Engelbrecht, A. (2012). Particle swarm optimization: Velocity initialization. In Congress on evolutionary computation (pp. 1–8). Brisbane: IEEE.

  • García-Nieto, J., & Alba, E. (2011). Restart particle swarm optimization with velocity modulation: A scalability test. Soft Computing, 15(13), 2221–2232. doi:10.1007/s00500-010-0648-1.


  • Ghosh, S., Das, S., Kundu, D., Suresh, K., Panigrahi, B. K., & Cui, Z. (2010). An inertia-adaptive particle swarm system with particle mobility factor for improved global optimization. Neural Computing and Applications, 21(4), 237–250. doi:10.1007/s00521-010-0356-x.


  • Gut, A. (2009). An intermediate course in probability. New York: Springer.


  • Hansen, N., Ros, R., Mauny, N., Schoenauer, M., & Auger, A. (2011). Impacts of invariance in search: When CMA-ES and PSO face ill-conditioned and non-separable problems. Applied Soft Computing, 11(10), 5755–5769. doi:10.1016/j.asoc.2011.03.001.


  • Hao, G., & Wenbo, X. (2011). A new particle swarm algorithm and its globally convergent modifications. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 41(7), 1334–1351. doi:10.1109/tsmcb.2011.2144582.


  • Helwig, S., & Wanka, R. (2007). Particle swarm optimization in high-dimensional bounded search spaces. In Swarm intelligence symposium (pp. 198–205). Honolulu: IEEE. doi:10.1109/SIS.2007.368046.

  • Helwig, S., & Wanka, R. (2008). Theoretical analysis of initial particle swarm behavior. In International conference on parallel problem solving from nature (pp. 889–898). Berlin: Springer. doi:10.1007/978-3-540-87700-4_88.

  • Hsieh, S. T., Sun, T. Y., Liu, C. C., & Tsai, S. J. (2009). Efficient population utilization strategy for particle swarm optimizer. IEEE Transactions on Systems Man and Cybernetics Part B: Cybernetics, 39(4), 444–456. doi:10.1109/Tsmcb.2008.2006628.


  • Huang, H., Qin, H., Hao, Z., & Lim, A. (2010). Example-based learning particle swarm optimization for continuous optimization. Information Sciences. doi:10.1016/j.ins.2010.10.018.

  • Hutter, F., Hoos, H. H., Leyton-Brown, K., & Murphy, K. (2010). Time-bounded sequential parameter optimization. In Learning and intelligent optimization (pp. 281–298). Berlin: Springer.

  • Jiang, M., Luo, Y., & Yang, S. (2007a). Particle swarm optimization-stochastic trajectory analysis and parameter selection. In Swarm intelligence: Focus on ant and particle swarm optimization. Wien: I-TECH Education and Publishing.

  • Jiang, M., Luo, Y. P., & Yang, S. Y. (2007b). Stochastic convergence analysis and parameter selection of the standard particle swarm optimization algorithm. Information Processing Letters, 102(1), 8–16. doi:10.1016/j.ipl.2006.10.005.


  • Kennedy, J. (2003). Bare bones particle swarms. In Swarm intelligence symposium (pp. 80–87). doi:10.1109/SIS.2003.1202251.

  • Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. In International conference on neural networks (Vol. 4, pp. 1942–1948). Piscataway: IEEE.

  • Lehre, P. K., & Witt, C. (2013). Finite first hitting time versus stochastic convergence in particle swarm optimisation. In L. Di Gaspero, A. Schaerf, & T. Stützle (Eds.), Advances in metaheuristics. New York: Springer.


  • Li, X., & Yao, X. (2011). Cooperatively coevolving particle swarms for large scale optimization. IEEE Transactions on Evolutionary Computation, 16(4), 210–224.


  • Liang, J. J., Qin, A. K., Suganthan, P. N., & Baskar, S. (2006). Comprehensive learning particle swarm optimizer for global optimization of multimodal functions. IEEE Transactions on Evolutionary Computation, 10(5), 281–295. doi:10.1109/Tevc.2005.857610.


  • Malan, K., & Engelbrecht, A. P. (2008). Algorithm comparisons and the significance of population size. In IEEE World Congress on computational intelligence (pp. 914–920). Hong Kong: IEEE. doi:10.1109/CEC.2008.4630905.

  • Matyas, J. (1965). Random optimization. Automation and Remote Control, 26(4), 246–253.


  • Mendes, R., Kennedy, J., & Neves, J. (2004). The fully informed particle swarm: Simpler, maybe better. IEEE Transactions on Evolutionary Computation, 8(5), 204–210. doi:10.1109/TEVC.2004.826074.


  • Montes de Oca, M. A., Aydın, D., & Stützle, T. (2011). An incremental particle swarm for large-scale continuous optimization problems: An example of tuning-in-the-loop (re) design of optimization algorithms. Soft Computing, 15(13), 2233–2255. doi:10.1007/s00500-010-0649-0.


  • Montes de Oca, M. A., & Stützle, T. (2008). Convergence behavior of the fully informed particle swarm optimization algorithm. In Genetic and evolutionary computation conference (pp. 71–78). New York: ACM. doi:10.1145/1389095.1389106.

  • Montes de Oca, M. A., Stützle, T., Birattari, M., & Dorigo, M. (2009). Frankenstein’s PSO: A composite particle swarm optimization algorithm. IEEE Transactions on Evolutionary Computation, 13(7), 1120–1132. doi:10.1109/Tevc.2009.2021465.


  • Poli, R. (2008). Analysis of the publications on the applications of particle swarm optimisation. Journal of Artificial Evolution and Application, 2008(5), 1–10. doi:10.1155/2008/685175.


  • Poli, R. (2009). Mean and variance of the sampling distribution of particle swarm optimizers during stagnation. IEEE Transactions on Evolutionary Computation, 13(6), 712–721. doi:10.1109/Tevc.2008.2011744.


  • Poli, R., Kennedy, J., & Blackwell, T. (2007). Particle swarm optimization: An overview. Swarm Intelligence, 1(1), 33–57. doi:10.1007/s11721-007-0002-0.


  • Potter, M., & De Jong, K. (1994). A cooperative coevolutionary approach to function optimization. Parallel problem solving from nature (pp. 249–257). Berlin: Springer. doi:10.1007/3-540-58484-6_269.

  • Ratnaweera, A., Halgamuge, S. K., & Watson, H. C. (2004). Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients. IEEE Transactions on Evolutionary Computation, 8(5), 240–255. doi:10.1109/tvec.2004.826071.


  • Rockafellar, R. T. (1996). Convex analysis (Vol. 28). Princeton: Princeton University Press.


  • Schmitt, M., & Wanka, R. (2013). Particle swarm optimization almost surely finds local optima. In Genetic and evolutionary computation conference, Amsterdam, The Netherlands (pp. 1629–1636). New York: ACM. doi:10.1145/2463372.2463563.

  • Shi, Y., & Eberhart, R. (1998a). A modified particle swarm optimizer. In World Congress on computational intelligence (pp. 69–73). Los Alamitos: IEEE. doi:10.1109/icec.1998.699146.

  • Shi, Y., & Eberhart, R. (1998b). Parameter selection in particle swarm optimization. In Evolutionary programming VII (pp. 591–600). Berlin: Springer. doi:10.1007/BFb0040810.

  • Solis, F. J., & Wets, R. J.-B. (1981). Minimization by random search techniques. Mathematics of Operations Research, 6(1), 19–30.


  • Spears, W. M., Green, D. T., & Spears, D. F. (2010). Biases in particle swarm optimization. International Journal of Swarm Intelligence Research, 1(4), 34–57. doi:10.4018/jsir.2010040103.


  • Spiegel, M. R. (1959). Theory and problems of vector analysis: Schaum’s outline series. New York: McGraw-Hill.


  • Suganthan, P. N., Hansen, N., Liang, J. J., Deb, K., Chen, Y., Auger, A., et al. (2005). Problem definitions and evaluation criteria for the CEC 2005 special session on real-parameter optimization. KanGAL Report 2005005, Technical Report.

  • Tang, K., Yao, X., Suganthan, P. N., MacNish, C., Chen, Y. P., Chen, C. M., et al. (2007). Benchmark functions for the CEC’2008 special session and competition on large scale global optimization. Nature Inspired Computation and Applications Laboratory, USTC, China, Technical Report.

  • Trelea, I. C. (2003). The particle swarm optimization algorithm: Convergence analysis and parameter selection. Information Processing Letters, 85(8), 317–325. doi:10.1016/S0020-0190(02)00447-7.


  • Tu, Z., & Lu, Y. (2004). A robust stochastic genetic algorithm (StGA) for global numerical optimization. IEEE Transactions on Evolutionary Computation, 8(7), 456–470. doi:10.1109/TEVC.2004.831258.


  • Van den Bergh, F., & Engelbrecht, A. (2002). A new locally convergent particle swarm optimiser. In Systems, man and cybernetics, Hammamet, Tunisia (Vol. 3, pp. 96–101). Los Alamitos: IEEE.

  • Van den Bergh, F., & Engelbrecht, A. P. (2001). Effects of swarm size on cooperative particle swarm optimisers. In Genetic and evolutionary computation conference, San Fransisco, USA.

  • Van den Bergh, F., & Engelbrecht, A. P. (2004). A cooperative approach to particle swarm optimization. IEEE Transactions on Evolutionary Computation, 8(5), 225–239. doi:10.1109/TEVC.2004.826069.


  • Van den Bergh, F., & Engelbrecht, A. P. (2006). A study of particle swarm optimization particle trajectories. Information Sciences, 176(10), 937–971. doi:10.1016/j.ins.2005.02.003.


  • Van den Bergh, F., & Engelbrecht, A. P. (2010). A convergence proof for the particle swarm optimiser. Fundamenta Informaticae, 105(6), 341–374. doi:10.3233/FI-2010-370.


  • Vesterstrom, J., & Thomsen, R. (2004). A comparative study of differential evolution, particle swarm optimization, and evolutionary algorithms on numerical benchmark problems. In Congress on evolutionary computation (Vol. 2, pp. 1980–1987). Los Alamitos: IEEE. doi:10.1109/CEC.2004.1331139.

  • Wang, Y., Li, B., Weise, T., Wang, J., Yuan, B., & Tian, Q. (2011). Self-adaptive learning based particle swarm optimization. Information Sciences, 181(20), 4515–4538. doi:10.1016/j.ins.2010.07.013.


  • Wilke, D. (2005). Analysis of the particle swarm optimization algorithm. Pretoria: University of Pretoria.


  • Wilke, D. N., Kok, S., & Groenwold, A. A. (2007a). Comparison of linear and classical velocity update rules in particle swarm optimization: Notes on diversity. International Journal for Numerical Methods in Engineering, 70(10), 962–984. doi:10.1002/nme.1867.


  • Wilke, D. N., Kok, S., & Groenwold, A. A. (2007b). Comparison of linear and classical velocity update rules in particle swarm optimization: Notes on scale and frame invariance. International Journal for Numerical Methods in Engineering, 70(10), 985–1008. doi:10.1002/nme.1914.


  • Witt, C. (2009). Why standard particle swarm optimisers elude a theoretical runtime analysis. In Foundations of genetic algorithms, New York, NY, USA (pp. 13–20). New York: ACM. doi:10.1145/1527125.1527128.

  • Xinchao, Z. (2010). A perturbed particle swarm algorithm for numerical optimization. Applied Soft Computing, 10(1), 119–124. doi:10.1016/j.asoc.2009.06.010.


  • Zhao, S., Liang, J., Suganthan, P., & Tasgetiren, M. (2008). Dynamic multi-swarm particle swarm optimizer with local search for large scale global optimization. In IEEE World Congress on computational intelligence (pp. 3845–3852). Los Alamitos: IEEE. doi:10.1109/CEC.2008.4631320.


Acknowledgments

The authors would like to extend their great appreciation to Maris Ozols, Luigi Barone, Frank Neumann, and Markus Wagner for constructive comments and discussions that helped improve the quality of the paper. This work was partially funded by the ARC Discovery Grant DP130104395 and by Grant N N519 5788038 from the Polish Ministry of Science and Higher Education (MNiSW).

Author information

Corresponding author

Correspondence to Mohammad Reza Bonyadi.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (rar 4 KB)

Appendices

Appendix 1

A PSO method is locally convergent if

$$\begin{aligned} \forall i \lim \nolimits _{{t \rightarrow \infty }} P\left( {\vec {p}_{t}^{i} \in R_{\varepsilon } } \right) = 1, \end{aligned}$$
(19)

i.e., the probability that the personal best of each particle \(i\), \(\vec {p}_{t}^{i}\), is in the optimality region \(R_{\varepsilon }\) approaches 1 when the iteration number \(t\) approaches infinity.

In this appendix we prove the following theorem.

Theorem

If the function \(f\) in the velocity update rule of the LcRiPSO (Eq. 12b) satisfies the condition:

$$\begin{aligned} \forall \vec {y} \in S \exists A_{y} \subseteq S \, \forall \vec {z} \in A_{y} \quad \forall \delta > 0 \quad P\left( {\left| {f\left( {\vec {y}} \right) - \vec {z}} \right| < \delta } \right) > 0, \end{aligned}$$
(20)

then the LcRiPSO algorithm is locally convergent. In condition 20, \(\vec {y}\) is an arbitrary point in the search space \(S\), \(A_{y}\) is an open set which contains \(\vec {y}\), \(\vec {z}\) is an arbitrary point in \(A_{y}\), and \(\delta \) is a positive value.

Condition 20 can be explained as follows: for every \({\vec {y}}\) in the search space, there exists an open set \(A_{y}\) in the search space that contains \(\vec {y}\) such that, for every point \(\vec {z}\) in this open set and every real value \(\delta >0\), the point \(f\left( {\vec {y}}\right) \) is closer than \(\delta \) to \(\vec {z}\) with non-zero probability. Informally, condition 20 states that \(f\) can map the point \(\vec {y}\) to any point in the open set \(A_{y}\) with non-zero probability.

If the function \(f\) satisfies condition 20, the personal best of at least one of the particles converges to the optimality region of the objective function with probability one.
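As a concrete illustration of such an \(f\), in line with the normal-distribution-based example analyzed in Appendix 3, a function that returns a sample from an isotropic normal distribution centered at its argument satisfies condition 20, since the Gaussian density is strictly positive on every open neighborhood of \(\vec {y}\). The following minimal sketch (Python; the variable names and the fixed variance are assumptions made only for illustration, not the authors' reference implementation) shows one such candidate:

```python
import numpy as np

def f_gaussian(y, sigma=0.1, rng=None):
    """A candidate f for condition 20: return a sample from the isotropic
    normal distribution N(y, sigma^2 I) centered at the input point y.

    Because the Gaussian density is strictly positive everywhere, for any
    open set A_y containing y and any z in A_y, the returned point lies
    within any delta of z with non-zero probability, as condition 20 asks.
    The value of sigma is an arbitrary illustrative choice."""
    rng = rng or np.random.default_rng()
    y = np.asarray(y, dtype=float)
    return y + sigma * rng.standard_normal(y.shape)

# Example: map a 2-D point to a nearby random point.
print(f_gaussian([1.0, -2.0]))
```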

Before we proceed, let us define a general form for a stochastic algorithm (GSA) (Solis and Wets 1981):

Algorithm GSA:

  1. Initialize \(p_{0}\) from the search space \(S\) and set \(t=1\).

  2. Generate a random sample \(x_t\) from \(S\).

  3. Generate the candidate solution \(p_{t}=D(p_{t-1}, x_{t})\), set \(t=t+1\), and go to step 2,

where \(D(a, b)\) is an operator which selects one of \(a\) or \(b\). There are three important components in GSA: (1) a random sample \(x_{t}\), (2) a candidate solution \(p_{t}\), and (3) an update procedure of \(p_{t}\) (the operator \(D\)). We investigate the local convergence condition (condition 19) for GSA. We introduce two conditions C1 and C2, and we prove that if GSA satisfies C1 and C2, then GSA is locally convergent. We then show that each particle \(i\) in the LcRiPSO is a specific model of GSA. Hence, if C1 and C2 are satisfied for each particle in the LcRiPSO, then the algorithm is locally convergent. Finally, we prove that all particles in the LcRiPSO satisfy C1 and C2. This would complete the proof of local convergence for the LcRiPSO.

Let us start with defining conditions C1 and C2.

Condition C1: GSA satisfies condition C1 if:

$$\begin{aligned} p_t =D( {p_{t-1} ,x_t })=\left\{ \begin{array}{ll} {x_t } &{} {\mathrm{{if}} F( {x_t })<F( {p_{t-1} })-\varepsilon _0 } \\ {p_{t-1} } &{} {\mathrm{{otherwise}}}, \\ \end{array} \right. \end{aligned}$$

where \(\varepsilon _0 \) is a positive value that is smaller than or equal to \(\varepsilon \) (the \(\varepsilon \) in the definition of \(R_{\varepsilon } \)). This means that the new solution \(x_{t}\) must be better than \(p_{t-1}\) by at least the constant \(\varepsilon _0 \) for \(p_{t}\) to be updated. In computer simulations, we can set \(\varepsilon _0 \) to the smallest possible float/double value (Matyas 1965).
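To make the roles of the random sample \(x_t\), the candidate solution \(p_t\), and the operator \(D\) explicit, the following sketch implements the GSA template with the acceptance rule of condition C1. The choice of Python, a box-constrained search space, and uniform sampling as the sample generator are assumptions made only for illustration; this is not the LcRiPSO algorithm itself.

```python
import numpy as np

def gsa(F, lower, upper, eps0=1e-12, iterations=10_000, rng=None):
    """General stochastic algorithm (GSA) with the acceptance rule of
    condition C1: the candidate p_t is replaced by the random sample x_t
    only if x_t improves the objective value by at least eps0."""
    rng = rng or np.random.default_rng()
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    p = rng.uniform(lower, upper)          # step 1: initialize p_0 from S
    for _ in range(iterations):
        x = rng.uniform(lower, upper)      # step 2: a random sample x_t from S
        if F(x) < F(p) - eps0:             # step 3: the operator D per condition C1
            p = x
    return p

# Example usage on the sphere function over [-5, 5]^2.
sphere = lambda v: float(np.sum(v ** 2))
print(gsa(sphere, lower=[-5.0, -5.0], upper=[5.0, 5.0]))
```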

Condition C2: GSA satisfies the condition C2 if:

$$\begin{aligned} \exists \varepsilon >0\; \exists \eta >0\; \exists \delta \in \left( {0,1} \right] \;\forall t\ge 0\; \exists t^{\prime }>0\quad P\left( {F\left( {p_{t+t^{\prime }}} \right) \le F\left( {p_t } \right) -\eta } \right) >\delta \quad \mathrm{{or}} \quad p_t \in R_\varepsilon , \end{aligned}$$

i.e., with probability greater than \(\delta \), \(p_{t+t'}\) is better than \(p_{t}\) by at least \(\eta \) in terms of the objective value \(F\), unless \(p_{t}\) is already in the optimality region. We will prove (Lemma 1) that if both C1 and C2 are satisfied, GSA is locally convergent.

Lemma 1

If GSA satisfies conditions C1 and C2, GSA is locally convergent.

Proof

Let us define

$$\begin{aligned} A( {t, t^{\prime }})=\left\{ \begin{array}{ll} {\mathrm{{true}}} &{} \quad {F( {p_{t+t^{\prime }} })\le F( {p_t })-\eta } \\ {\mathrm{{false}}}&{} \quad {\mathrm{{otherwise}}}. \\ \end{array} \right. \end{aligned}$$

Then the negation of \(A(t, t')\), denoted \(\bar{A}(t, t')\), is:

$$\begin{aligned} \bar{A}\left( {t,t^{\prime }} \right) = \left\{ \begin{array}{ll} {\mathrm{{true}}} &{} \quad {F\left( {p_{{t + t^{\prime }}}} \right) > F\left( {p_{t} } \right) - \eta } \\ {\mathrm{{false}}} &{} \quad {\mathrm{{otherwise}}}. \\ \end{array}\right. \end{aligned}$$
(21)

So, according to C2, for any \(t\) there exists \(t'\) such that the probability of \(p_{t+t'}\) being not better than \(p_{t}\) is smaller than or equal to \(1-\delta \) (the complement of the probability in condition C2):

$$\begin{aligned} P\left( {\bar{A}\left( {t,t^{\prime }} \right) } \right) \le 1 - \delta . \end{aligned}$$

Let us consider now a sequence of \(k\) successive occurrences of \( \bar{A}\left( {t,t^{\prime }}\right) \): \(\bar{A}\left( t + t^{\prime }_{0} ,t^{\prime }_{1} \right) , \bar{A}\left( {t + t^{\prime }_{0} + t^{\prime }_{1} ,t^{\prime }_{2} } \right) , \ldots ,\bar{A}\left( {t + t^{\prime }_{0} + t^{\prime }_{1} + \cdots + t^{\prime }_{k-1} ,t^{\prime }_{k} } \right) \), i.e., \(p_{t+t^{\prime }_0 +t^{\prime }_1 +\cdots +t^{\prime }_{k-1} +t^{\prime }_k } \) being not better than \(p_t \). Note that (based on C1) \(p_{t+1} \) is not worse than \(p_t \); so, if \(p_{t+t^{\prime }_0 +t^{\prime }_1 +\cdots +t^{\prime }_{k-1} +t^{\prime }_k } \) is not better than \(p_t \), then none of the \(p_l \) is better than \(p_t \) for \(l\in \left[ {t, t+t^{\prime }_0 +t^{\prime }_1 +\cdots +t^{\prime }_{k-1} +t^{\prime }_k } \right] \). Hence, for any \(t\) and for any number of steps \(k\), the probability of \(p_{t+t^{\prime }_0 +t^{\prime }_1 +\cdots +t^{\prime }_{k-1} +t^{\prime }_k } \) being not better than \(p_t \) is calculated by:

$$\begin{aligned}&P\left( {\bar{A}\left( {t + t^{\prime }_{0} ,t^{\prime }_{1} } \right) } \right) \cdot P\left( {\bar{A}\left( {t + t^{\prime }_{0} + t^{\prime }_{1} ,t^{\prime }_{2} } \right) } \right) \cdots P\left( {\bar{A}\left( {t + t^{\prime }_{0} + t^{\prime }_{1} + \cdots + t^{\prime }_{k - 1} ,t^{\prime }_{k} } \right) } \right) \\&\quad = \mathop \prod \limits _{{l = 1}}^{k} P\left( {\bar{A}\left( {t + \mathop \sum \limits _{{j = 0}}^{{l - 1}} t^{\prime }_{j} ,t^{\prime }_{l} } \right) } \right) \le \left( {1 - \delta } \right) ^{k}, \end{aligned}$$

where \(t_0^{\prime } = 0\). Therefore, the probability that at least one of \(\{p_{t+1} , p_{t+2} , \ldots ,p_{t+t^{\prime }_0 +t^{\prime }_1 +\cdots +t^{\prime }_{k-1} +t^{\prime }_k } \}\) is better than \(p_t \) satisfies:

$$\begin{aligned} 1 - \mathop \prod \limits _{{l = 1}}^{k} P\left( {\bar{A}\left( {t + \mathop \sum \limits _{{j = 0}}^{{l - 1}} t^{\prime }_{j} ,t^{\prime }_{l} } \right) } \right) \ge 1 - (1 - \delta )^{k}. \end{aligned}$$

As \(k\) grows, the right hand side of the inequality approaches 1. Thus, the probability that at least one of \(\{p_{t+1}, p_{t+2}, \ldots , p_{t+t^{\prime }_0 +t^{\prime }_1 +\cdots +t^{\prime }_{k-1} +t^{\prime }_k } \}\) is better than \(p_t \) by at least \(\eta \) grows to 1. Let us denote by \(A^*(t, t_k^{\prime } )\) the event of \(p_t \) becoming better by at least \(\eta \) in the next \(t^{\prime }_0 +t^{\prime }_1 +\cdots +t^{\prime }_{k-1} +t^{\prime }_k \) iterations. The probability of this event approaches 1 as the number of steps \(k\) grows.

Now, with further iterations of the GSA algorithm (i.e., as \(t\) grows), a sequence of occurrences of \(A^{*}(t, t_k^{\prime } )\) takes place, and every occurrence of \(A^{*}(t, t_k^{\prime } )\) results in an improvement of the candidate solution by at least \(\eta \). As the number of such occurrences of \(A^{*}(t, t_k^{\prime } )\) grows, \(p_{t}\) arrives at the optimality region with probability 1. This completes the proof of Lemma 1. \(\square \)
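For a worked numeric illustration of this bound, take \(\delta = 0.1\) (a value chosen here purely for illustration):

$$\begin{aligned} 1 - (1-\delta )^{k} = 1 - 0.9^{k}, \qquad 1 - 0.9^{10} \approx 0.65, \quad 1 - 0.9^{50} \approx 0.995, \quad 1 - 0.9^{100} \approx 0.99997, \end{aligned}$$

so the probability of observing at least one improvement by \(\eta \) approaches 1 geometrically fast in the number of steps \(k\).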

Clearly, each particle \(i\) in the LcRiPSO is a specific model of the GSA. In fact, the personal best of the particle \(i\), \(\vec {p}_{t}^{i}\), corresponds to a candidate solution \(p_{t}\) in GSA. Further, the personal best of the particle \(i\) is updated by:

$$\begin{aligned} \vec {p}_{t}^{i} = \left\{ \begin{array}{ll} {\vec {x}_{t}^{i} } &{} \quad {\mathrm{{if}} F\left( {\vec {x}_{t}^{i} } \right) < F\left( {\vec {p}_{{t - 1}}^{i} } \right) - \varepsilon _{0} } \\ {\vec {p}_{{t - 1}}^{i} } &{} \quad {\mathrm{{otherwise}}}, \\ \end{array} \right. \end{aligned}$$
(22)

where \(\varepsilon _0 \) is the desired precision of the optimality region (see condition C1), which can be set to the smallest possible value in computer simulations. Thus, the updating procedure 22 in LcRiPSO for the particle \(i\) corresponds to the operator \(D\) in GSA. Also, the current position of the particle \(i\), \(\vec {x}_{t}^{i}\), corresponds to the random sample \(x_{t}\) in GSA. As particle \(i\) contains all three components of GSA, i.e., the current position of a particle corresponds to the random sample, the personal best of a particle corresponds to the candidate solution, and, finally, Eq. 22 for a particle corresponds to the operator \(D\), we conclude that each particle \(i\) in LcRiPSO is a specific model of GSA. Because each particle \(i\) in LcRiPSO is a specific model of GSA, if we prove that each particle \(i\) satisfies C1 and C2, we also prove that LcRiPSO is locally convergent (recall that, based on Eq. 19, if all particles converge to the optimality region, the PSO method is locally convergent).

Before we start the proof of the theorem, let us analyze how the position of the particle \(i\) is updated. The position and velocity update rules in LcRiPSO for the \(i\)th particle at time \(t\) are written as:

$$\begin{aligned}&\vec {x}_{{t + 1}}^{i} = \vec {x}_{t}^{i} + \vec {V}_{{t + 1}}^{i}, \end{aligned}$$
(23)
$$\begin{aligned}&\vec {V}_{{t + 1}}^{i} = \omega \vec {V}_{t}^{i} + \vec {v}_{t}^{i}, \end{aligned}$$
(24)
$$\begin{aligned}&\vec {v}_{t}^{i} = \mathop \sum \limits _{{k \in T_{t}^{i} }} \varphi _{k} r_{{kt}}^{i} \left( {f_{k} (\vec {p}_{t}^{k} ) - \vec {x}_{t}^{i} } \right) , \end{aligned}$$
(25)

where \(T_t^i \) is the set of indices of the particles which contribute to the velocity update rule of the particle \(i\), \(\vec {p}_{t}^{k}\) is the personal best of the \(k\)th particle, and \(\varphi _k \) and \(\omega \) are constants. Note that, based on the definition of \(T_t^i \), we always assume that \(i\in T_t^i \) (i.e., the particle \(i\) always contributes to its own velocity update rule). \(\vec {p}_{t}^{i}\) is updated using Eq. 22. By combining Eqs. 23 and 24, we get

$$\begin{aligned} \vec {x}_{{t + 1}}^{i} = \vec {x}_{t}^{i} + \omega \vec {V}_{t}^{i} + \vec {v}_{t}^{i}. \end{aligned}$$
(26)

Also, according to Eq. 23,

$$\begin{aligned} \vec {V}_{t}^{i} = \vec {x}_{t}^{i} - \vec {x}_{{t - 1}}^{i}. \end{aligned}$$
(27)

By combining Eqs. 26 and 27 we get

$$\begin{aligned} \vec {x}_{{t + 1}}^{i} = \vec {x}_{t}^{i} + \vec {v}_{t}^{i} + \omega (\vec {x}_{t}^{i} - \vec {x}_{{t - 1}}^{i} ). \end{aligned}$$
(28)

Let us pay attention to the first two components of this formula, \(\vec {x}_{t}^{i}\) and \(\vec {v}_{t}^{i}\). Since the calculation of \(\vec {v}_{t}^{i}\) (see 25) involves random values \(r_{kt}^i \), \(\vec {x}_{t}^{i} + \vec {v}_{t}^{i}\) is a random point. Now we introduce a new construct that plays an important role in our proof. For the particle \(i\), we define the convex hull \(M_{t}^{i}\) with \(card( {T_t^i })+1\) vertices: \(\vec {x}_{t}^{i}\) and \(\vec {x}_{t}^{i} + \varphi _{k} \left( {f_{k} (\vec {p}_{t}^{k} ) - \vec {x}_{t}^{i} } \right) \) for all \(k \in T_{t}^{i}\). Figure 8 shows an example of this convex hull for the particle \(i\).

Fig. 8 A particular example of the convex hull \(M_{t}^{i}\) where the size of \(T_{t}^{i}\) is 4. The five points that define the convex hull are shown as black circles. The gray areas show the region \(A_{\vec {p}_{t}^{k} }\) for each \(\vec {p}_{t}^{k}\). Note that \(f(\vec {p}_{t}^{k})\) can be anywhere in \(A_{\vec {p}_{t}^{k} }\)
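To make Eqs. 23–25 and the construction of \(M_{t}^{i}\) concrete, the following sketch (Python; the Gaussian choice of \(f_{k}\), the parameter values, and the two-particle neighborhood are assumptions made only for illustration) performs one velocity and position update for a particle \(i\) and lists the vertices that define its convex hull \(M_{t}^{i}\).

```python
import numpy as np

rng = np.random.default_rng(0)
omega, phi = 0.7, 1.5                                    # omega and phi_k (illustrative values)
f = lambda y: y + 0.1 * rng.standard_normal(y.shape)     # an assumed f satisfying condition 20

def lcripso_step(x_i, v_i, pbest, T_i):
    """One update of particle i following Eqs. 23-25.
    x_i, v_i : current position and velocity of particle i
    pbest    : list of personal best vectors, one per particle
    T_i      : indices of the particles contributing to the update (i is in T_i)"""
    targets = {k: f(pbest[k]) for k in T_i}              # f_k(p_t^k)
    # Eq. 25: stochastic component of the velocity (r_kt^i is a random scalar)
    v_stoch = sum(phi * rng.uniform(0.0, 1.0) * (targets[k] - x_i) for k in T_i)
    v_new = omega * v_i + v_stoch                        # Eq. 24
    x_new = x_i + v_new                                  # Eq. 23
    # Vertices defining the convex hull M_t^i: x_i plus one vertex per k in T_i
    hull_vertices = [x_i] + [x_i + phi * (targets[k] - x_i) for k in T_i]
    return x_new, v_new, hull_vertices

pbest = [np.array([1.0, 1.0]), np.array([-1.0, 2.0])]
x_new, v_new, M_vertices = lcripso_step(np.zeros(2), np.zeros(2), pbest, T_i=[0, 1])
print(x_new, len(M_vertices))
```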

We are now going to introduce another lemma (lemma 2) that is essential for proving the satisfaction of C2 by LcRiPSO.

Lemma 2

For every convex hull \(M\) which is defined by the points \(\{Y_1, Y_1+Y_2,\ldots , Y_1+Y_n\}\) and every point \(m\) inside \(M\), there exist \(\left\{ {r_2 ,\ldots ,r_n } \right\} \), \(r_k \in \left[ {0, 1} \right] \) for \(k=2,\ldots ,n\), such that \(m\) can be represented by \(m=Y_1 +\mathop \sum \nolimits _{k=2}^n r_k Y_k \).

Proof

Let us define \(Z_{1}=Y_{1}\) and \(Z_{k}=Y_{1}+Y_{k}\) for \(k=2,{\ldots },n\). If we translate the origin of the coordinate system to \(Y_{1}\), then the vertices of \(M'\) (the convex hull \(M\) in the new coordinate system) are \(\{Z_{1}-Y_{1},Z_{2}-Y_{1},{\ldots },{Z}_{n}-Y_{1}\}\), that is, \(\{Z^{\prime }_{1}, Z^{\prime }_{2}, {\ldots }, Z^{\prime }_{n}\}\), where \(Z^{\prime }_{1}\) is at the origin of the new coordinate system. Any arbitrary point \(m'\) inside \(M'\) (note that \(m'=m-Y_{1}\), where \(m\) is a point inside \(M\)) can be represented by a convex combination of all vertices of \(M'\) (Rockafellar 1996). In other words:

$$\begin{aligned} \exists \left\{ {r_1 , r_2 ,\ldots ,r_n } \right\} , r_k \in \left[ {0, 1} \right] \text{ for } \text{ all } k=1, 2,\ldots ,n, \quad m^{\prime }=\mathop \sum \limits _{k=1}^n r_k Z^{\prime }_k. \end{aligned}$$

Note that, according to the definition of convex combination, \(\left\{ {r_1 , r_2 ,\ldots ,r_n } \right\} \) have the property \(\mathop \sum \nolimits _{k=1}^n r_k =1\). As \(Z'_{1}\) is the origin, it is clear that \(Z'_{1}=r_{1} Z'_{1}\) for any \(r_{1}\). Thus, we get

$$\begin{aligned} \exists \left\{ {r_2 ,\ldots ,r_n } \right\} , r_k \in \left[ {0, 1} \right] \text{ for } \text{ all } k=2,\ldots ,n, \quad m^{\prime }=Z^{\prime }_1 +\mathop \sum \limits _{k=2}^n r_k Z^{\prime }_k. \end{aligned}$$

By substituting \(m^{\prime }=m-Y_1 \) and \(Z^{\prime }_k =Z_k -Y_1 \), any point \(m\) inside \(M\) can be represented by \(m-Y_1 =Z_1 -Y_1 +\mathop \sum \nolimits _{k=2}^n r_k (Z_k -Y_1 )=\mathop \sum \nolimits _{k=2}^n r_k Y_k \). Hence, there exist \(\left\{ {r_2 ,\ldots ,r_n } \right\} \), \(r_k \in \left[ {0, 1} \right] \) for \(k=2,\ldots ,n\), such that \(m=Y_1 +\mathop \sum \nolimits _{k=2}^n r_k Y_k \). \(\square \)
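The substitution step in this proof can be checked numerically. The sketch below (Python; the dimension, the number of vertices, and the random weights are arbitrary illustrative choices) draws a convex combination of the vertices \(\{Y_1, Y_1+Y_2,\ldots , Y_1+Y_n\}\) and confirms that the resulting point equals \(Y_1 +\mathop \sum \nolimits _{k=2}^n r_k Y_k \), i.e., that the \(r_1\) term contributes nothing beyond \(Y_1\).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
Y = [rng.standard_normal(2) for _ in range(n)]          # Y_1, ..., Y_n
vertices = [Y[0]] + [Y[0] + Yk for Yk in Y[1:]]         # {Y_1, Y_1+Y_2, ..., Y_1+Y_n}

# A random convex combination of the vertices: weights r_k >= 0 summing to 1,
# which gives a point m inside the convex hull M.
r = rng.dirichlet(np.ones(n))
m = sum(rk * v for rk, v in zip(r, vertices))

# Lemma 2 representation of the same point: m = Y_1 + sum_{k>=2} r_k Y_k.
m_lemma = Y[0] + sum(rk * Yk for rk, Yk in zip(r[1:], Y[1:]))

print(np.allclose(m, m_lemma))   # True: the r_1 term contributes only Y_1
```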

According to Lemma 2, for any point \(\vec {m}_{t}^{i}\) in the convex hull \(M_{t}^{i}\), there exist \(r_{kt}^i \) (for all \(k\in T_t^i \)) such that \( \vec {m}_{t}^{i} = \vec {x}_{t}^{i} + \vec {v}_{t}^{i} \) (see Eq. 25). Thus, we can write 28 as

$$\begin{aligned} \vec {x}_{{t + 1}}^{i} = \vec {m}_{t}^{i} + \omega (\vec {x}_{t}^{i} - \vec {x}_{{t - 1}}^{i} ), \end{aligned}$$
(29)

where \(\vec {m}_{t}^{i}\) is a random point in \(M_{t}^{i}\). Therefore, the point \( \vec {x}_{{t + 1}}^{i} \) is a random point in the set \( M_{t}^{i} + \omega (\vec {x}_{t}^{i} - \vec {x}_{{t - 1}}^{i} )\).

We introduce two simple observations that are used in the main proof.

Observation (a) For any point \(\vec {y}\) in the search space \(S\), there exists a hyper-sphere with center \( \vec {y} \) and radius \(\rho \) (we use the notation \( n_{\rho } \left( {\vec {y}} \right) \) for the set of all points in a hyper-sphere with center \( \vec {y} \) and radius \(\rho \)) such that \( f\left( {\vec {y}} \right) \) can be arbitrarily close to any point in that hyper-sphere with non-zero probability. In other words, \( \forall \vec {y} \in S~\exists \rho > 0~\forall \vec {z} \in n_{\rho } \left( {\vec {y}} \right) ~\forall \varepsilon > 0~P\left( {\left| {f\left( {\vec {y}} \right) - \vec {z}} \right| < \varepsilon } \right) > 0 \), where \( n_{\rho } \left( {\vec {y}} \right) \) is the hyper-sphere with radius \(\rho \) centered at \( {\vec {y}}\) (Fig. 9).

Fig. 9 \( f(\vec {y}) \) can be any point in the light gray area. The dark area shows the open set \(A_{y}\)

This is because \(A_{y}\) (in 20) is an open set which contains \( \vec {y} \), and as \( n_{\rho } \left( {\vec {y}} \right) \) is a sphere-neighborhood of \( \vec {y} \), all points in \( n_{\rho } \left( {\vec {y}} \right) \) are in \(A_{y}\) for a sufficiently small \(\rho \). Thus, \( n_{\rho } \left( {\vec {y}} \right) \) is a subset of \(A_{y}\). According to condition 20, \( f(\vec {y}) \) can be arbitrarily close to any point in \(A_{y}\) with non-zero probability. Hence, \( f(\vec {y}) \) has non-zero probability of being arbitrarily close to any point in \( n_{\rho } \left( {\vec {y}} \right) \).

Observation (b) For every particle \(i\) at every iteration \(t\), for every \(\psi >0\), the convex hull \(M_t^i \) has a non-empty intersection with \( n_{\psi } \left( {\vec {x}_{t}^{i} } \right) \). In fact,

$$\begin{aligned} \forall i~\forall t~\forall \psi > 0\quad M_{t}^{i} \cap n_{\psi } \left( {\vec {x}_{t}^{i} } \right) \ne \emptyset . \end{aligned}$$

This is because \(\vec {x}_{t}^{i} \in M_{t}^{i}\) (note that, based on the definition of \(M_{t}^{i}\), \( \vec {x}_{t}^{i} \) is always a point in \(M_{t}^{i}\)), hence \( M_{t}^{i} \cap n_{\psi } \left( {\vec {x}_{t}^{i} } \right) \ne \emptyset \). Also, note that this intersection reduces to the single point \( \vec {x}_{t}^{i} \) when \( M_{t}^{i} = \left\{ {\vec {x}_{t}^{i} } \right\} \), i.e., when \( f_{k} \left( {\vec {p}_{t}^{k} } \right) = \vec {x}_{t}^{i} \) for all \(k \in T_{t}^{i}\). Also, this intersection is a subset of \( n_{\psi } \left( {\vec {x}_{t}^{i} } \right) \) when \( M_{t}^{i} - \left\{ {\vec {x}_{t}^{i} } \right\} \ne \emptyset \). Figure 10 shows this intersection when \( M_{t}^{i} - \left\{ {\vec {x}_{t}^{i} } \right\} \ne \emptyset \).

Fig. 10 The intersection between \( n_{\psi } \left( {\vec {x}_{t}^{i} } \right) \) and \(M_t^i \) is non-empty. The dark area shows the intersection. Note that the shape of \(M_{t}^{i}\) depends on \( f_{k} (\vec {p}_{t}^{k} ) \) and \( \vec {x}_{t}^{i} \), while it is always convex

This observation also implies that, for all \(\psi >0\), there exists a vector \( \vec {\alpha } \) whose length is shorter than \(\psi \) such that \( \vec {x}_{t}^{i} + \vec {\alpha } \) is a member of \(M_t^i \) (note that \( n_{\psi } \left( {\vec {x}_{t}^{i} } \right) \) has a non-empty intersection with \(M_t^i \)). Also, according to Lemma 2, \( \vec {m}_{t}^{i} \) (a random point in \(M_t^i \)) can be any point in \( M_{t}^{i} \cap n_{\psi } \left( {\vec {x}_{t}^{i} } \right) \) with non-zero probability, for all \(\psi \). Note that, in this case, \( \vec {m}_{t}^{i} \) can be represented by \( \vec {x}_{t}^{i} + \vec {\alpha } \), where \( \vec {\alpha } \) is a random vector whose length is less than \(\psi \).

Lemma 3

In the LcRiPSO algorithm, if the function f satisfies the condition 20, then

$$\begin{aligned} \forall \omega \in \left( {0,~1} \right) \exists \beta > 0 \forall i \forall t \exists h > \frac{\beta }{{1 - \omega }} \forall \vec {z} \in n_{h} \left( {\vec {p}_{t}^{i} } \right) P\left( {\vec {z} \in M_{t}^{i} } \right) > 0. \end{aligned}$$

In other words, for every value of \( \omega \) in the interval (0, 1), there exists \(\beta >0\) such that, for every particle \(i\) at every iteration \(t\), there exists \(h > \frac{\beta }{1-\omega }\) such that every point in \( n_{h} \left( {\vec {p}_{t}^{i} } \right) \) has non-zero probability of being in the convex hull \(M_{t}^{i}\).

Proof

According to Observation (a), for every \( \vec {y} \) in the search space, there exists a \(\rho \) such that \(f\left( {\vec {y}} \right) \) can be arbitrarily close to any point in \( n_{\rho } \left( {\vec {y}} \right) \) with non-zero probability. Based on the definition of \(M_{t}^{i}\), \( f(\vec {p}_{t}^{i} ) \) is in \(M_{t}^{i}\). This implies that any point in \( n_{\rho } \left( {\vec {p}_{t}^{i} } \right) \) has non-zero probability of being in \(M_{t}^{i}\). Also, for any \(\rho >0\) and any \(\omega \), there exists a \(\beta \) such that \(\frac{\beta }{1-\omega }=\rho \). Thus, by considering \(h=\rho \), any point in \( n_{h} \left( {\vec {p}_{t}^{i} } \right) \) has non-zero probability of being in \(M_{t}^{i}\). This completes the proof of Lemma 3. \(\square \)

Now, we are ready to prove the main theorem.

Theorem

If the function \(f\) in the velocity update rule of LcRiPSO (Eq. 12b) satisfies condition:

$$\begin{aligned} \forall \vec {y} \in S \exists A_{y} \subseteq S \forall \vec {z} \in A_{y} \forall \delta > 0 P\left( {\left| {f\left( {\vec {y}} \right) - \vec {z}} \right| < \delta } \right) > 0, \end{aligned}$$
(30)

then the algorithm LcRiPSO (the proposed PSO algorithm, which uses Eq. 12b as the velocity update rule and the standard position update rule) is locally convergent.

Proof

We will show that any particle \(i\) in LcRiPSO satisfies conditions C1 and C2. As the personal best of each particle is updated by Eq. 22, the condition C1 is satisfied for each particle. Now, for any iteration \(t\), let us consider \(t'>0\) additional iterations. There are two cases to consider:

$$\begin{aligned}&{\text {case }}\left( 1\right) \quad F\left( {\vec {p}_{{t + t^{\prime }}}^{i} } \right) < F\left( {\vec {p}_{t}^{i} } \right) ,\\&{\text {case }}\left( 2\right) \quad F\left( {\vec {p}_{{t + t^{\prime }}}^{i} } \right) = F\left( {\vec {p}_{t}^{i} } \right) . \end{aligned}$$

Case (1) implies that there exists \(0<\tau \le t'\) such that \( F\left( {\vec {p}_{{t + \tau }}^{i} } \right) < F\left( {\vec {p}_{t}^{i} } \right) \). According to 22, if \( F\left( {\vec {p}_{{t + \tau }}^{i} } \right) \) is better than \( F\left( {\vec {p}_{t}^{i} } \right) \), it is better by at least \(\varepsilon _0 \). By setting \(\varepsilon _0 =\eta \) in condition C2, this condition is satisfied for \(\tau =t'\). This completes the proof for this case. Let us continue with case (2).

Case (2) implies that \(\vec {p}_{{t}}^{i} = \vec {p}_{{t+\tau }}^{i}\) for all \(0<\tau \le t^{\prime }\). We will show that in this case, for all \(0<\omega <1\) and all \(\psi >0\), there exists \(h>0\) such that, for each particle \(i\) at every iteration \(t\), there exists \(t'>0\) such that \( \vec {x}_{t + t^{\prime }}^{i}\) can be any point (see footnote 10) in \( n_{h} \left( {\vec {p}_{t}^{i} } \right) \) with non-zero probability. Hence, if there exists a point in \( n_{h} \left( {\vec {p}_{t}^{i} } \right) \) that is better than \( \vec {p}_{t}^{i} \) by at least \(\eta \), there is a non-zero probability that \( \vec {x}_{{t + t^{\prime } }}^{i} \) is that point, and hence, \( \vec {p}_{{t + t^{\prime } }}^{i} \) is updated to \( \vec {x}_{{t + t^{\prime } }}^{i} \). As \( \vec {p}_{{t + t^{\prime } }}^{i} \) is better than \( \vec {p}_{t}^{i} \) by at least \(\eta \) with non-zero probability, C2 is satisfied. We also show that if such a point does not exist, then \(\vec {p}_{t}^{i}\) is already in the optimality region. This completes the proof for case (2).

According to Observation (b), for all \(\psi \), \( \vec {m}_{t}^{i} = \vec {x}_{t}^{i} + \vec {\alpha }_{0} \) with non-zero probability, where \( \left| {\vec {\alpha }_{0} } \right| < \psi \). From the updating equation for \(\vec {x}_{{t + 1}}^{i}\) (Eq. 29), we get:

$$\begin{aligned} \vec {x}_{{t + 1}}^{i} = \vec {x}_{t}^{i} + \vec {\alpha }_{0} + \omega \left( {\vec {x}_{t}^{i} - \vec {x}_{{t - 1}}^{i} } \right) , \end{aligned}$$
(31)

with non-zero probability. Similarly, for \(\vec {x}_{{t + 2}}^{i}\):

$$\begin{aligned} \vec {x}_{{t + 2}}^{i}&= \vec {m}_{{t + 1}}^{i} + \omega \left( {\vec {x}_{{t + 1}}^{i} - \vec {x}_{t}^{i} } \right) = \vec {x}_{{t + 1}}^{i} + \vec {\alpha }_{1} + \omega \left( {\vec {x}_{t}^{i} + \omega \left( {\vec {x}_{t}^{i} - \vec {x}_{{t - 1}}^{i} } \right) + \vec {\alpha }_{0} - \vec {x}_{t}^{i} } \right) \nonumber \\ {}&= \vec {x}_{{t + 1}}^{i} + \vec {\alpha }_{1} + \omega \left( {\omega \left( {\vec {x}_{t}^{i} - \vec {x}_{{t - 1}}^{i} } \right) + \vec {\alpha }_{0} } \right) = \vec {x}_{{t + 1}}^{i} + \vec {\alpha }_{1} + \omega \vec {\alpha }_{0} + \omega ^{2} \left( {\vec {x}_{t}^{i} - \vec {x}_{{t - 1}}^{i} } \right) ,\nonumber \\ \end{aligned}$$
(32)

with non-zero probability. In general,

$$\begin{aligned} \vec {x}_{{t + t^{\prime }}}^{i} = \vec {m}_{{t + t^{\prime }- 1}}^{i} + \omega ^{{t^{\prime }}} \left( {\vec {x}_{t}^{i} - \vec {x}_{{t - 1}}^{i} } \right) + \mathop \sum \limits _{{l = 0}}^{{t^{\prime } - 1}}\omega ^{{t^{\prime } - 1 - l}} \vec {\alpha }_{l}, \end{aligned}$$
(33)

with non-zero probability, where \(\left| {\vec {\alpha }_{l} } \right| < \psi \) for all \(l\). For simplicity, we re-write the last equation as:

$$\begin{aligned} \vec {x}_{{t + t^{\prime } }}^{i} = \vec {m}_{{t + t^{\prime } - 1}}^{i} + \vec {\lambda }_{{t^{\prime } ,\omega }} + \vec {\zeta }_{{t^{\prime } ,\psi }}, \end{aligned}$$
(34)

where \( \vec {\zeta }_{{t^{\prime } ,\psi }} = \mathop \sum \nolimits _{{l = 0}}^{{t^{\prime } - 1}} \omega ^{l} \vec {\alpha }_{l} \), \( \left| {\vec {\alpha }_{l} } \right| < \psi \), and \( \vec {\lambda }_{{t^{\prime } ,\omega }} = \omega ^{{t^{\prime } }} \left( {\vec {x}_{t}^{i} - \vec {x}_{{t - 1}}^{i} } \right) \) (see footnote 11). At the iteration \(t+t'\), the random sample \( \vec {m}_{{t + t^{\prime } - 1}}^{i} \) is transformed by the vector \( \vec {\lambda }_{{t^{\prime } ,\omega }} + \vec {\zeta }_{{t^{\prime } ,\psi }} \).

Let us analyze all three components of Eq. 34 (i.e., \( \vec {m}_{{t + t^{\prime } - 1}}^{i} \), \( \vec {\lambda }_{{t^{\prime } ,\omega }} \), and \( \vec {\zeta }_{{t^{\prime } ,\psi }} \)).

Analysis of \(\vec {\lambda }_{{t^{\prime } ,\omega }} \): Clearly, \(\omega ^{{t^{\prime }}}\) becomes smaller as \(t^{\prime }\) grows (recall that \(0\le \omega <1\) is essential for PSO to be locally convergent), thus \( \left| {\vec {\lambda }_{{t^{\prime } ,\omega }} } \right| \) becomes closer to zero as \(t^{\prime }\) grows (note that the length of \( \vec {x}_{t}^{i} - \vec {x}_{{t - 1}}^{i} \) is constant).

Analysis of \( \vec {\zeta }_{{t^{\prime } ,\psi }} \): The longest possible vector that can be generated by this term is achieved if, for all \(l\), the \( \vec {\alpha }_{l} \) are in the same direction and their lengths are \(\psi \) (according to Observation (b), the lengths of the \(\vec {\alpha }_{l}\) are at most \(\psi \)). Thus,

$$\begin{aligned} \left| {\vec {\zeta }_{{t^{\prime } ,\psi }} } \right| = \left| {\mathop \sum \limits _{{l = 0}}^{{t^{\prime }- 1}} \omega ^{l} \vec {\alpha }_{l} } \right| < \mathop \sum \limits _{{l = 0}}^{{t^{\prime }- 1}} \omega ^{l} \psi = \psi \mathop \sum \limits _{{l = 0}}^{{t^{\prime }- 1}} \omega ^{{t^{\prime } - 1 - l}}, \end{aligned}$$

that is always smaller than \(\psi \frac{\omega }{1-\omega }\). Hence,

$$\begin{aligned} \forall \omega \in \left( {0, 1} \right) \forall \psi > 0 \forall i \forall t \forall t^{\prime } > 0 \left| {\vec {\zeta }_{{t^{\prime } ,\psi }} } \right| < \psi \frac{\omega }{{1 - \omega }}. \end{aligned}$$

Figure 11 shows the updating steps for \( \vec {x}_{t}^{i} \) in three iterations (it shows the changes of \(\vec {\zeta }_{{t^{\prime } ,\psi }}\) and \(\vec {\lambda }_{{t^{\prime } ,\omega }}\)). It is clear that the length of \(\vec {\lambda }_{{t^{\prime } ,\omega }}\) shrinks (red vectors). Also, the length of \(\vec {\zeta }_{{t^{\prime } ,\psi }}\) (blue vectors) at each iteration is determined by the sum of all previous \(\vec {\alpha }_{l}\), each multiplied by \(\omega ^{t^{\prime }-1-l}\) (\(0\le l\le t^{\prime }-1\)).

Fig. 11 The blue vectors show \(\vec {\zeta }_{{t^{\prime } ,\psi }}\). The gray areas are \( M_{t}^{i} \cap n_{\psi } \left( {\vec {x}_{t}^{i} }\right) \). The length of the blue vectors is always smaller than \(\psi \). The red vectors show the term \( \vec {\lambda }_{{t^{\prime } ,\omega }} \); they become smaller in each iteration

Based on the Analysis of \( \vec {\zeta }_{{t^{\prime } ,\psi }} \), it is clear that \( \left| \vec {\zeta }_{t^{\prime } ,\psi } + \vec {\lambda }_{t^{\prime } ,\omega }\right| \le \left| \vec {\zeta }_{t^{\prime } ,\psi } \right| + \left| \vec {\lambda }_{t^{\prime } ,\omega } \right| < \psi \frac{\omega }{1 - \omega } + \left| \vec {\lambda }_{t^{\prime } ,\omega } \right| \). Because \( \vec {\lambda }_{{t^{\prime } ,\omega }} \) shrinks as \(t^{\prime }\) grows (Analysis of \( {\vec {\lambda }}_{{t^{\prime } ,\omega }}\)), we can say that

$$\begin{aligned} \forall \omega \in \left( {0, 1} \right) \forall \psi > 0 \forall i \forall t \exists t^{\prime } > 0 \left| {\vec {\zeta }_{{t^{\prime } ,\psi }} + \vec {\lambda }_{{t^{\prime } ,\omega }} } \right| < \psi \frac{\omega }{{1 - \omega }}. \end{aligned}$$

Also, as this statement is true for all \(\psi \), it is still true if the value of \(\psi \) is smaller than the value of \(\beta \) in Lemma 3, i.e., there exists \(t'\) such that \(\left| {\vec {\zeta }_{{t^{\prime } ,\beta }} + \vec {\lambda }_{{t^{\prime } ,\omega }} } \right| < \beta \frac{\omega }{{1 - \omega }}\), and, based on Observation (b), \(\vec {m}_{t}^{i}\) can be any point in the intersection \(M_{t}^{i} \cap n_{\psi } \left( {\vec {x}_{t}^{i} } \right) \ne \emptyset \) for all \(\psi \). According to Lemma 3, there exists \(h>\frac{\beta }{1-\omega }=\beta +\beta \frac{\omega }{1-\omega }\); obviously \(h > \beta + \beta \frac{\omega }{{1 - \omega }} > \beta \frac{\omega }{{1 - \omega }} > \left| {\vec {\zeta }_{{t^{\prime } ,\beta }} + \vec {\lambda }_{{t^{\prime } ,\omega }} } \right| \). Hence, for all \( h>\beta +\beta \frac{\omega }{1-\omega }\), it is also true that \(h > \left| {\vec {\zeta }_{{t^{\prime } ,\beta }} + \vec {\lambda }_{{t^{\prime } ,\omega }} } \right| + \beta \). See Fig. 12 for the relation between \(\left| {\vec {\zeta }_{{t^{\prime } ,\beta }} + \vec {\lambda }_{{t^{\prime } ,\omega }} } \right| \), \(\beta \), and \(h\).

Fig. 12 The relation between \(\left| \vec {\zeta }_{t^{\prime },\beta } + \vec {\lambda }_{t^{\prime } ,\omega } \right| \), \(h\), and \(\beta \). The black dot in the middle is \(\vec {p}_{t}^{i}\), the largest circle is \( n_{h} \left( \vec {p}_{t}^{i} \right) \), the dash-dotted circle has radius \( \left| \vec {\zeta }_{t^{\prime } ,\beta } + \vec {\lambda }_{t^{\prime } ,\omega } \right| \), and the dashed circle has radius \(\beta \). Note that \(h > \left| \vec {\zeta }_{t^{\prime },\beta } + \vec {\lambda }_{t^{\prime } ,\omega } \right| + \beta \)

Analysis of \(\vec {m}_{{t + t^{\prime }- 1}}^{i}\): Based on Lemma 3, for every value of \(\omega \) in the interval (0, 1), there exists \(\beta >0\) such that, for every particle \(i\) at every iteration \(t\), there exists \(h>\frac{\beta }{1-\omega }\) such that every point in \( n_{h} \left( {\vec {p}_{t}^{i} } \right) \) has non-zero probability of being in the convex hull \(M_{t}^{i}\). As \(\vec {m}_{{t + t^{\prime } - 1}}^{i}\) can be any point in \(M_{t}^{i}\) with non-zero probability (Lemma 2), it can also be any point in \( n_{h} \left( {\vec {p}_{t}^{i} } \right) \) with non-zero probability.

Based on Eq. 34, \( \vec {m}_{{t + t^{\prime }- 1}}^{i} \) is transformed by the vector \( \vec {\zeta }_{{t^{\prime } ,\beta }} + \vec {\lambda }_{{t^{\prime } ,\omega }}\) to generate \( \vec {x}_{{t + t^{\prime }}}^{i}\). As \( \vec {m}_{{t + t^{\prime }- 1}}^{i}\) is selected from \( n_{h} \left( {\vec {p}_{t}^{i} } \right) \) with non-zero probability (Analysis of \( \vec {m}_{{t + t^{\prime }- 1}}^{i}\)), and because \(h > \left| {\vec {\zeta }_{{t^{\prime } ,\beta }} + \vec {\lambda }_{t^{\prime } ,\omega }} \right| + \beta \), the point \( \vec {x}_{{t + t^{\prime } }}^{i} = \vec {m}_{{t + t^{\prime } - 1}}^{i} + \vec {\zeta }_{{t^{\prime } ,\beta }} + \vec {\lambda }_{{t^{\prime } ,\omega }}\) can also be any point in \(n_{h} \left( {\vec {p}_{t}^{i} } \right) \) with non-zero probability.

There are two possible cases for the points in \( n_{h} \left( {\vec {p}_{t}^{i} } \right) \):

  • there exists a point \({\vec {p}} \) in \( n_{h} \left( {\vec {p}_{t}^{i}} \right) \) that is better than \(\vec {p}_{t}^{i}\) by at least \(\varepsilon _{0} \);

  • there is no point in \( n_{h} \left( {\vec {p}_{t}^{i}} \right) \) that is better than \(\vec {p}_{t}^{i}\) by at least \(\varepsilon _{0} \).

In the first case, \(\vec {x}_{{t + t^{\prime }}}^{i}\) has non-zero probability of being \(\vec {p}\), which satisfies C2 as \(F(\vec {p}) < F(\vec {p}_{t}^{i} ) - \varepsilon _{0}\). In the second case, \( \vec {p}_{t}^{i} \) is already in the optimality region, as no point in \(n_{h} \left( {\vec {p}_{t}^{i} } \right) \) is better than \( \vec {p}_{t}^{i} \) by at least \(\varepsilon _0 \). Thus, the particle \(i\) satisfies C2, which implies that the particle \(i\) converges to the optimality region. As this particle was an arbitrary particle, all particles converge to the optimality region. This implies that the algorithm is locally convergent. \(\square \)

Appendix 2

Theorem

LcRiPSO is rotation, scale, and translation invariant if

$$\begin{aligned} \forall s \in R \forall Orth\left( Q \right) \forall \vec {b}, \vec {y} \in R^{d} sQf(\vec {y}) + \vec {b} = \hat{f}(sQ\vec {y} + \vec {b}). \end{aligned}$$

Proof

It is well known that an algorithm is rotation, scale, and translation invariant (RST invariant) if, for all \(t>0\), \(\hat{x}_{{t + 1}}^{i} = sQ\vec {x}_{{t + 1}}^{i} + \vec {b} \) for any scalar \(s\), orthogonal matrix \(Q\), and vector \( \vec {b} \) (this is called the general RST invariance condition), where \(\hat{x}_{t+1}^i \) is the position vector in the rotated, scaled, and translated space (Wilke et al. 2007b; Spiegel 1959) at iteration \(t+1\).

Let us re-write the position and velocity update rules of the proposed method as follows:

$$\begin{aligned} \vec {x}_{{t + 1}}^{i} = \vec {x}_{t}^{i} + \vec {V}_{{t + 1}}^{i}, \end{aligned}$$
(35)
$$\begin{aligned} \vec {V}_{{t + 1}}^{i} = \omega \vec {V}_{t}^{i} + \vec {v}_{t}^{i} , \end{aligned}$$
(36)
$$\begin{aligned} \vec {v}_{t}^{i} = \mathop \sum \limits _{{k \in T_{t}^{i} }} \varphi _{k} r_{{kt}}^{i} \left( {f_{k} (\vec {p}_{t}^{k} ) - \vec {x}_{t}^{i} } \right) . \end{aligned}$$
(37)

Note that this notation is algebraically the same as the original notation. By calculating the left hand side \((\hat{x}_{t+1}^i )\) and the right hand side (\( sQ\vec {x}_{{t + 1}}^{i} + \vec {b} \)) of the general RST invariance condition for the position update rule of the proposed method (Eq. 35), Eqs. 38 and 39, respectively, emerge:

$$\begin{aligned} \hat{x}_{{t + 1}}^{i} = \hat{x}_{t}^{i} + \hat{V}_{{t + 1}}^{i} = sQ\vec {x}_{t}^{i} + \vec {b} + \hat{V}_{{t + 1}}^{i}, \end{aligned}$$
(38)
$$\begin{aligned} sQ\vec {x}_{{t + 1}}^{i} + \vec {b} = sQ\left( {\vec {x}_{t}^{i} + \vec {V}_{{t + 1}}^{i} } \right) + \vec {b} = sQ\vec {x}_{t}^{i} + \vec {b} + sQ\vec {V}_{{t + 1}}^{i}. \end{aligned}$$
(39)

By comparing Eqs. 38 and 39, it is obvious that the general RST invariant condition is satisfied if \(\hat{V}_{{t + 1}}^{i} = sQ\vec {V}_{{t + 1}}^{i}\). The left hand side and right hand side of this equation for the proposed velocity update rule (Eq. 36) are calculated as Eqs. 40 and 41, respectively.

$$\begin{aligned} \hat{V}_{{t + 1}}^{i} = \omega \hat{V}_{t}^{i} + \hat{v}_{t}^{i} = sQ\omega \vec {V}_{t}^{i} + \hat{v}_{t}^{i}, \end{aligned}$$
(40)
$$\begin{aligned} sQ\vec {V}_{{t + 1}}^{i} = sQ\omega \vec {V}_{t}^{i} + sQ\vec {v}_{t}^{i}. \end{aligned}$$
(41)

Note that if \(\hat{V}_{{t + 1}}^{i} = sQ\vec {V}_{{t + 1}}^{i}\) then \(\hat{V}_{t}^{i} = sQ\vec {V}_{t}^{i}\). By comparing Eqs. 40 and 41, it is obvious that \(\hat{V}_{{t + 1}}^{i} = sQ\vec {V}_{{t + 1}}^{i}\) is satisfied iff \(\hat{v}_{t}^{i} = sQ\vec {v}_{t}^{i}\). Thus, the general RST invariance condition is satisfied in the proposed method iff \(\hat{v}_{t}^{i} = sQ\vec {v}_{t}^{i}\) for all \(Q\) and \(s\), where \(\hat{v}_t^i \) is the stochastic velocity in the transformed (rotated, scaled, and translated) space. This condition is called the RST invariance condition. The left hand side and the right hand side of the RST invariance condition for Eq. 37 are written as

$$\begin{aligned} \hat{v}_{t}^{i} = \mathop \sum \limits _{{k \in T_{t}^{i} }} \varphi _{k} r_{{kt}}^{i} \left( {\hat{f}_{k} (\hat{p}_{t}^{k} ) - \hat{x}_{t}^{i} } \right) = \mathop \sum \limits _{{k \in T_{t}^{i} }} \varphi _{k} r_{{kt}}^{i} \left( {\hat{f}_{k} (sQ\vec {p}_{t}^{k} + \vec {b}) - sQ\vec {x}_{t}^{i} - \vec {b}} \right) , \qquad \quad \end{aligned}$$
(42)
$$\begin{aligned} sQ\vec {v}_{t}^{i} = sQ\left( {\mathop \sum \limits _{{k \in T_{t}^{i} }} \varphi _{k} r_{{kt}}^{i} \left( {f_{k} (\vec {p}_{t}^{k} ) - \vec {x}_{t}^{i} } \right) } \right) = \mathop \sum \limits _{{k \in T_{t}^{i} }} \varphi _{k} r_{{kt}}^{i} \left( {sQf_{k} (\vec {p}_{t}^{k} ) + \vec {b} - sQ\vec {x}_{t}^{i} - \vec {b}} \right) . \nonumber \\ \end{aligned}$$
(43)

Equation 43 has been written using the fact that, because \(r_{kt}^{i}\) is a random scalar, \(Qr = rQ\) where \(Q\) is an orthogonal matrix. By comparing Eqs. 42 and 43, the condition \(\hat{v}_{t}^{i} = sQ\vec {v}_{t}^{i}\) is satisfied if \(sQf(\vec {y}) + \vec {b} = \hat{f}(sQ\vec {y} + \vec {b})\). \(\square \)

Appendix 3

According to Gut (2009), if \(N(\mu ,V)\) (where \(V\) is the covariance matrix and \(\mu \) is the mean vector of the distribution) is rotated by \(Q\), translated by \(b\), and scaled by \(s\), the mean of the new distribution is \(b+sQ\mu \) and its covariance matrix is \(s^{2}QVQ^{T}\), i.e., \(\hat{N}( {\hat{\mu },V})=N( {sQ\mu +b,s^2QVQ^T})\). Also, \(N(\mu ,V)= \mu + N(0,V)\).

To study the condition \(\hat{f}( {\hat{x}})=sQf( x)+b\) for \(f( \mu )=N( {\mu ,V})\), for the left hand side of the condition we have:

$$\begin{aligned} \hat{f}( {\hat{x}})=\hat{N}( {\hat{\mu },V})=N( {sQ\mu +b,s^2QVQ^T})=sQ\mu +b+N( {0,s^2QVQ^TI}). \end{aligned}$$
(44)

For the right hand side of the condition:

$$\begin{aligned} sQf( x)+b&= sQN( {\mu ,V})+b=sQ\mu +b+sQN( {0,V})\nonumber \\ {}&= sQ\mu +b+N( {0,s^2QQ^TVI}). \end{aligned}$$
(45)

By comparing Eqs. 45 and 44, in order to have the condition satisfied, we need \(Q^TV=VQ^T\). This equation is satisfied if \(V=aI\), where \(a\) is a scalar and \(I\) is the identity matrix.
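This conclusion can be checked numerically with a quick Monte Carlo sketch (the particular values of \(s\), \(Q\), \(b\), \(a\), and the sample size below are arbitrary illustrative choices): with an isotropic covariance \(V=aI\), samples of \(sQf(y)+b\) and of \(f(sQy+b)\) have matching mean and covariance, whereas an anisotropic \(V\) would break the equality in general, because \(sQN(0,V)\) has covariance \(s^2QVQ^T\), which generally differs from \(s^2V\).

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 0.6
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])          # an orthogonal (rotation) matrix
s, b = 2.0, np.array([3.0, -1.0])                        # scale and translation
y = np.array([1.0, 2.0])
a = 0.25                                                 # isotropic variance, V = a * I
V = a * np.eye(2)

def f(mu, cov, n):
    """n samples from N(mu, cov)."""
    return rng.multivariate_normal(mu, cov, size=n)

n = 200_000
lhs = s * f(y, V, n) @ Q.T + b            # samples of s*Q*f(y) + b
rhs = f(s * Q @ y + b, s**2 * V, n)       # samples of f-hat(s*Q*y + b) = N(s*Q*y + b, s^2*a*I)

print(np.allclose(lhs.mean(axis=0), rhs.mean(axis=0), atol=0.02))   # means agree
print(np.allclose(np.cov(lhs.T), np.cov(rhs.T), atol=0.02))         # covariances agree since V = a*I
```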

About this article


Cite this article

Bonyadi, M.R., Michalewicz, Z. A locally convergent rotationally invariant particle swarm optimization algorithm. Swarm Intell 8, 159–198 (2014). https://doi.org/10.1007/s11721-014-0095-1

