Abstract
Outlier detection is an essential activity for obtaining the information needed to identify, for example, intrusions, faults, and system failures. This paper proposes a bio-inspired algorithm able to detect anomalous data in distributed systems. Each data object is associated with a mobile agent that follows the well-known bio-inspired flocking algorithm. The agents are randomly disseminated onto a virtual space, where they move autonomously to form one or more flocks. Through a tailored similarity function, agents associated with similar objects join the same flock, whereas agents associated with dissimilar objects do not join any flock. The objects associated with isolated agents, or with agents grouped into a flock whose number of members is lower than a given threshold, are the outliers. Experimental results on synthetic and real data sets confirm the validity of the approach.
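The final labelling rule described above (isolated agents, or agents in a flock below a size threshold, mark outliers) can be illustrated with a minimal sketch. It is not the paper's algorithm: the flocking movement in the virtual space is replaced here by a simple stand-in that links agents whose objects are similar enough, and the names `similarity`, `form_flocks`, `detect_outliers`, `min_similarity`, and `min_flock_size`, as well as the distance-based similarity function itself, are assumptions made for illustration.

```python
from collections import Counter
from math import dist


def similarity(a, b):
    """Hypothetical similarity in (0, 1]: 1 for identical objects, decaying with distance."""
    return 1.0 / (1.0 + dist(a, b))


def form_flocks(objects, min_similarity):
    """Stand-in for the flocking phase (assumption, not the paper's movement rules).

    Agents whose objects are at least `min_similarity` similar are linked,
    directly or through a chain of similar agents, using union-find.
    Returns one flock label per object.
    """
    parent = list(range(len(objects)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(objects)):
        for j in range(i + 1, len(objects)):
            if similarity(objects[i], objects[j]) >= min_similarity:
                parent[find(i)] = find(j)

    return [find(i) for i in range(len(objects))]


def detect_outliers(objects, min_similarity=0.5, min_flock_size=3):
    """Return indices of objects whose agent is isolated or in an undersized flock."""
    labels = form_flocks(objects, min_similarity)
    sizes = Counter(labels)
    return [i for i, label in enumerate(labels) if sizes[label] < min_flock_size]


data = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (0.2, 0.1),  # dense group of four
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.2),              # dense group of three
    (10.0, 0.0),                                     # isolated object
    (-6.0, 7.0), (-6.1, 7.0),                        # pair below the size threshold
]
outliers = detect_outliers(data)
print(outliers)  # [7, 8, 9]
```

With the default threshold of three, the isolated object and the two-member group are reported, while both dense groups are kept. Raising `min_flock_size` to four would also flag the three-member group, which shows how strongly the result depends on the threshold.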
Cite this article
Forestiero, A. Bio-inspired algorithm for outliers detection. Multimed Tools Appl 76, 25659–25677 (2017). https://doi.org/10.1007/s11042-017-4443-1
DOI: https://doi.org/10.1007/s11042-017-4443-1