Abstract
Prior work on trajectory outlier detection considers only individual outliers. In real-world scenarios, however, trajectory outliers often appear in groups. In intelligent transportation, for example, a group of bikes may deviate from the usual trajectory because of street maintenance. This paper considers the Group Trajectory Outlier (GTO) problem and proposes three algorithms. The first two extend the well-known DBSCAN and kNN algorithms, while the third models the GTO problem as a feature selection problem. Two enhancements of the proposed algorithms are also presented. The first, based on ensemble learning and computational intelligence, merges the outputs of the algorithms to potentially improve the final result. The second is a general high-performance computing framework for handling big trajectory databases, which we used for a GPU-based implementation. Experimental results on several real trajectory databases show the scalability of the proposed approaches.
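To make the difference between individual and group outliers concrete, the minimal Python sketch below scores each trajectory by the distance to its k-th nearest neighbour and then links flagged trajectories that lie close to one another into groups. It is an illustration under stated assumptions, not the kNN extension proposed in the paper: trajectories are assumed to be equal-length, time-aligned point sequences, the mean point-wise Euclidean distance stands in for a proper trajectory similarity measure, and the names and parameters (`traj_distance`, `knn_scores`, `group_outliers`, `k`, `score_threshold`, `group_eps`) are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical illustration only; not the algorithm proposed in the paper.
Point = Tuple[float, float]
Trajectory = List[Point]


def traj_distance(a: Trajectory, b: Trajectory) -> float:
    """Mean point-wise Euclidean distance (assumes equal-length, time-aligned trajectories)."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def knn_scores(trajs: List[Trajectory], k: int) -> List[float]:
    """Score each trajectory by the distance to its k-th nearest other trajectory."""
    scores = []
    for i, t in enumerate(trajs):
        dists = sorted(traj_distance(t, u) for j, u in enumerate(trajs) if j != i)
        scores.append(dists[k - 1])
    return scores


def group_outliers(trajs: List[Trajectory], k: int,
                   score_threshold: float, group_eps: float) -> List[List[int]]:
    """Flag trajectories whose kNN score exceeds score_threshold, then merge flagged
    trajectories within group_eps of each other into groups (connected components)."""
    scores = knn_scores(trajs, k)
    flagged = [i for i, s in enumerate(scores) if s > score_threshold]
    groups: List[List[int]] = []
    seen = set()
    for start in flagged:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal over flagged trajectories only
            i = stack.pop()
            component.append(i)
            for j in flagged:
                if j not in seen and traj_distance(trajs[i], trajs[j]) <= group_eps:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(component))
    return groups


# Toy data: five trajectories along y ~ 0 and a group of two detours along y ~ 5.
normal = [[(float(x), 0.1 * i) for x in range(5)] for i in range(5)]
detour = [[(float(x), 5.0 + 0.1 * i) for x in range(5)] for i in range(2)]
trajs = normal + detour

print(group_outliers(trajs, k=3, score_threshold=1.0, group_eps=1.0))  # [[5, 6]]
print(group_outliers(trajs, k=1, score_threshold=1.0, group_eps=1.0))  # []
```

The second call shows why groups are hard for individual detectors: with `k=1`, each detour's nearest neighbour is the other detour, so neither looks anomalous. In this sketch, `k` must be at least the expected group size for the group to stand out.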
Index Terms
- Machine Learning for Identifying Group Trajectory Outliers