ABSTRACT
This paper proposes OutViz, a dual-view framework for representing and filtering multivariate time series data to highlight abnormal patterns in a dataset. The first view is a parallel coordinates chart that lets the user analyze the scores of features extracted from an outlier detection algorithm based on dimensionality reduction and density-based clustering, in order to determine why a particular time series is predicted to be an outlier. The parallel coordinates chart also includes an outlier score rank axis, on which the user can select a range of time series to be filtered and displayed in the second view. The second view is a multi-line chart that shows how each time series variable changes over a range of time. Each time series is drawn as a line, with position on the horizontal axis encoding a point in time and position on the vertical axis encoding the data value. Use cases on real-world multivariate time series data demonstrate the advantages of the proposed framework for data analytics and present some findings uncovered while applying OutViz to life expectancy data from 236 countries between 1960 and 2018 and carbon dioxide emissions data from 210 countries between 1960 and 2016.
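The rank-based filtering described in the abstract can be illustrated with a minimal sketch. It assumes each time series has already been reduced to a small feature vector, and it uses the mean distance to the k nearest neighbors as a simple stand-in for a density-based outlier score; this is not the algorithm used in OutViz. All names (`knn_outlier_scores`, `rank_by_score`, `filter_by_rank`) and the toy data are hypothetical.

```python
from math import dist


def knn_outlier_scores(features, k=2):
    """Score each feature vector by its mean distance to its k nearest neighbors.

    A larger score means the point sits in a sparser region. This is a simple
    stand-in for a density-based score, not the algorithm used in the paper.
    """
    scores = {}
    for name, vec in features.items():
        dists = sorted(
            dist(vec, other)
            for other_name, other in features.items()
            if other_name != name
        )
        scores[name] = sum(dists[:k]) / k
    return scores


def rank_by_score(scores):
    """Return {name: rank}, where rank 1 is the highest outlier score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def filter_by_rank(series, ranks, lo, hi):
    """Keep only the time series whose rank falls in [lo, hi] (inclusive)."""
    return {name: values for name, values in series.items() if lo <= ranks[name] <= hi}


# Hypothetical toy data: per-series feature vectors and the raw series.
features = {
    "A": (0.10, 0.20),
    "B": (0.12, 0.18),
    "C": (0.11, 0.22),
    "D": (0.90, 0.95),  # far from the others in feature space
}
series = {
    "A": [1.0, 1.1, 1.2],
    "B": [1.0, 1.2, 1.3],
    "C": [0.9, 1.1, 1.2],
    "D": [1.0, 0.2, 1.5],
}

scores = knn_outlier_scores(features, k=2)
ranks = rank_by_score(scores)
# Emulate brushing the rank axis so that only the top-ranked series is kept.
selected = filter_by_rank(series, ranks, 1, 1)
```

In a dual-view design of this kind, the dictionary returned by `filter_by_rank` would be what the multi-line chart draws, while the per-feature values would populate the parallel coordinates axes.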
Index Terms
- OutViz: Visualizing the Outliers of Multivariate Time Series