Abstract
To make full use of unlabeled data in soft-sensor modelling and to handle the many hard-to-measure variables that coexist in a process, this study proposed a novel two-step adaptive heterogeneous co-training multioutput model. First, the unlabeled data with the highest confidence were selected to optimize the model, and Gaussian process regression (GPR) and least squares support vector machine (LSSVM) models were co-trained on two independent sets of labeled data. Second, at each model update, a Kalman filter (KF) worked together with a moving window (MW) so that the model could track process dynamics. Finally, the proposed model was validated on a simulated wastewater treatment platform, BSM1, and on a real sewage treatment plant. The root-mean-square error (RMSE) and root-mean sum of squares of the diagonal (RMSSD) were markedly reduced, and the correlation coefficients (R and RR) reached 0.8 in both case studies. The results suggest that the proposed model can significantly improve prediction performance.
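As a rough illustration of two of the reported metrics, the sketch below computes RMSE and a correlation coefficient for a single output. It assumes that R denotes Pearson's correlation coefficient, which the abstract does not state; RMSSD and RR are left out because their definitions are not given here. The function names and sample values are hypothetical.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(y_true))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient (assumed meaning of R)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / sqrt(var_t * var_p)


# Hypothetical values: the prediction has a constant offset of 0.5.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
error = rmse(measured, predicted)
corr = pearson_r(measured, predicted)
print(error, corr)
```

A constant offset leaves the correlation perfect while the RMSE stays nonzero, which is one reason to report both kinds of metric side by side.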








Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Funding
This work was supported by the National Natural Science Foundation of China (61873096, 62073145), the Guangdong Basic and Applied Basic Research Foundation (2020A1515011057), the Guangdong Technology International Cooperation Project Application (2020A0505100024), the Fundamental Research Funds for the Central Universities, SCUT (2020ZYGXZR034), and the Hainan Provincial Natural Science Foundation of China (618QN254). The authors thank the anonymous referees and others for their help in improving the article.
Author information
Contributions
Dong Li, Daoping Huang, and Yiqi Liu conceived and designed the methodologies and case study. Dong Li performed the experiments and wrote the paper. Yiqi Liu helped review and edit the manuscript. All authors read and approved the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Conflict of interest
The authors declare no competing interest.
Additional information
Responsible editor: Marcus Schulz
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Highlights
• Co-training of Gaussian process regression (GPR) and least squares SVM (LSSVM) models.
• New criteria to select unlabeled data with higher confidence for semisupervised learning.
• Two-step adaptive method to update both the structure and the parameters of the model.
• Multi-output models can produce a simpler model structure and better computational efficiency.
Supplementary Information
ESM 1
(DOCX 20 kb)
About this article
Cite this article
Li, D., Huang, D. & Liu, Y. A novel two-step adaptive multioutput semisupervised soft sensor with applications in wastewater treatment. Environ Sci Pollut Res 28, 29131–29145 (2021). https://doi.org/10.1007/s11356-021-12656-9
DOI: https://doi.org/10.1007/s11356-021-12656-9