ABSTRACT
Air pollution forecasting has become critical because air pollution directly affects human health and is increasing with rapid industrialization. Machine learning (ML) solutions are being widely explored in this domain because, given access to historical data, they can potentially produce highly accurate results. However, experts in the environmental area are skeptical about adopting ML solutions in real-world applications and policy making due to their black-box nature. In contrast, traditional simulation models (e.g., CMAQ), despite sometimes having low accuracy, are widely used and follow well-defined and transparent equations. Therefore, presenting the knowledge learned by an ML model can make the model transparent as well as comprehensible. In addition, validating the ML model’s learning against existing domain knowledge might aid in addressing experts' skepticism, building appropriate trust, and better utilizing ML models. In collaboration with three experts with an average of five years of research experience in the air pollution domain, we identified the contribution of features (e.g., meteorological features such as wind) to the final forecast as the major information to be verified against domain knowledge. In addition, the accuracy of ML models relative to traditional simulation models, together with raw wind trajectories, is essential for domain experts to validate the feature contributions. Based on the identified information, we designed and developed AQX, a visual analytics system that helps experts validate and verify the ML model’s learning with their domain knowledge. The system includes multiple coordinated views that present the contributions of input features at different levels of aggregation in both the temporal and spatial dimensions. It also provides a performance comparison of ML and traditional models in terms of accuracy and spatial maps, along with an animation of raw wind trajectories for the input period.
We further demonstrated AQX through two case studies and conducted interviews with two domain experts to show its effectiveness and usefulness.
Index Terms
- AQX: Explaining Air Quality Forecast for Verifying Domain Knowledge using Feature Importance Visualization