Online-Dynamic-Clustering-Based Soft Sensor for Industrial Semi-Supervised Data Streams
Figures:
- Figure 1. Schematic diagram of the adaptive selective ensemble learning framework.
- Figure 2. The overall framework of the ODCSS soft sensor method.
- Figure 3. Flowchart of the TE chemical process.
- Figure 4. Prediction trends of the E component in stream 11 by different soft sensor methods.
- Figure 5. Illustrations of online dynamic clustering for the TE process.
- Figure 6. Comparison of RMSE trends of different soft sensor methods.
- Figure 7. Flowchart of the industrial fed-batch CTC fermentation process.
- Figure 8. Scatter plots of substrate concentration predictions by different soft sensor methods.
- Figure 9. Illustrations of online dynamic clustering for the CTC process.
Abstract
1. Introduction
- (1)
- Rule-based data stream regression algorithms. Shaker et al. [22] proposed a fuzzy rule-learning algorithm called TSK-Streams for adaptive regression on data streams. The method introduces a new TSK fuzzy rule induction strategy that combines the rule induction concept of AMRules with the expressive power of TSK fuzzy rules, thereby enabling adaptive learning from evolving data streams. Yu et al. [23] proposed an online multi-output regression algorithm called MORStreaming, which learns instances through a topological network and learns correlations between outputs through adaptive rules, thus addressing multi-output regression in the data stream environment.
- (2)
- Tree-model-based data stream regression algorithms. Gomes et al. [24] proposed an adaptive random forest algorithm capable of handling data stream regression tasks (ARF-Reg). The algorithm uses the adaptive sliding window drift detection method and applies the Page–Hinkley test inside each FIMT-DD tree to detect and adapt to drift. Zhong et al. [25] proposed an online weight-learning random forest regression (OWL-RFR), which focuses on the sequential-data problem ignored in most studies on online random forests and improves the predictive accuracy of the regression model by exploiting data correlation. Subsequently, Zhong et al. [26] proposed a long short-term memory self-adapting online random forest regression, which designs an adaptive memory activation mechanism to handle stationary data streams as well as non-stationary data streams with different types of concept drift. Furthermore, some researchers have introduced online clustering into data stream regression modeling. Ferdaus et al. [27] proposed PALM, a new type of fuzzy rule based on the concept of hyperplane clustering for data stream regression problems; it can automatically generate, merge, and adjust hyperplane-based fuzzy rules in a single pass, effectively handles concept drift in the data stream, and has the advantages of low memory burden and low computational complexity. Song et al. [28] proposed FUZZ-CARE, a data stream regression method based on fuzzy clustering. The algorithm dynamically identifies, trains, and stores patterns, and the membership matrix obtained by fuzzy C-means clustering indicates the membership of subsequent samples to the corresponding patterns. This method can address concept drift in non-stationary environments and effectively avoids under-training caused by a lack of new data.
- (1)
- An online dynamic clustering method is proposed to enable online identification of process states concealed in data streams. Unlike offline clustering, this method can automatically generate and update clusters in an online manner; a spatio-temporal double-weighting strategy is used to eliminate obsolete samples in clusters, which can effectively capture the time-varying characteristics of process data streams.
- (2)
- An adaptive switching prediction strategy is proposed by combining selective ensemble learning and JITL. If the query sample is judged to be an outlier, JITL is used for prediction; otherwise, selective ensemble learning is used. This strategy handles both gradual and abrupt changes in process characteristics, thus preventing the soft sensor's prediction performance from deteriorating in time-varying environments.
- (3)
- Online semi-supervised learning is introduced to mine both labeled and unlabeled sample information, thus expanding the labeled training set. This strategy can effectively alleviate the problem of insufficient labeled modeling samples and can obtain better prediction performance than supervised learning.
2. Proposed ODCSS Soft Sensor Method for Industrial Semi-Supervised Data Streams
2.1. Problem Definition
- (1)
- Modeling data change and accumulate over time. After a period of operation, a large amount of historical data and a small set of recent process data are available, so it is crucial to coordinate the roles of historical and recent data in soft sensor modeling. If only the latest information is used while valuable historical information is ignored, the generalization performance of the model cannot be guaranteed. Conversely, focusing only on historical information leaves the soft sensor model unable to capture the latest process state.
- (2)
- In most cases, the true value of the quality variable is unknown, and only a few observations are obtained through low-frequency, large-delay analysis. Traditional supervised learning can use only the labeled samples and ignores the information contained in unlabeled samples. In practice, unlabeled samples also carry rich information about the process states. Fully exploiting both labeled and unlabeled samples through a semi-supervised learning framework is therefore an important way to improve soft sensor models.
- (3)
- Stream data usually imply a complex nonlinear relationship between inputs and outputs. Therefore, local learning is often adopted to obtain better prediction performance than a single global model. In addition, as the process runs, the input–output relationship is not constant and often exhibits significant time-varying characteristics. To prevent degradation of prediction performance, a suitable adaptive learning mechanism is needed to update the soft sensor model online.
2.2. Online Dynamic Clustering
2.2.1. Initialization
2.2.2. Updating the Cluster
2.2.3. Generating the New Cluster
2.2.4. Removing Outdated Data in the Cluster
Algorithm 1: Online dynamic clustering (ODC)
INPUT: the data stream samples;
the cluster radius;
the minimum density threshold;
the controlling parameter;
the percentage of removed samples.
PROCESS:
1: Create a cluster structure containing: %% Initialization
2: the cluster center, initialized to the first sample;
3: the stored data;
4: the outlier set, initially empty;
5: Calculate the distances between the current sample and all cluster centers using Equation (3);
6: if the smallest distance does not exceed the cluster radius %% Updating the cluster
7: store the sample in all clusters whose centers lie within the cluster radius;
8: if the update condition (threshold 2/3) is satisfied
9: update the cluster center using Equation (5);
10: end if
11: else %% Generating a new cluster
12: store the sample in the outlier set;
13: calculate the pairwise distances between the samples stored in the outlier set;
14: if these distances are within the cluster radius and the outlier set reaches the minimum density threshold
15: generate a new cluster;
16: store these samples in the new cluster;
17: calculate the new cluster center using Equation (3);
18: end if
19: end if
%% Removing the outdated data in the cluster
20: Calculate the spatio-temporal weights of each sample in the cluster using Equation (6);
21: Sort the weights in ascending order;
22: Delete the samples with the smallest weights, fixing the proportion of deleted samples to the removal percentage;
OUTPUT: Clustering results
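To make the procedure concrete, the following is a minimal Python sketch of Algorithm 1. The class name, the exponential spatio-temporal weight, and the per-update removal schedule are illustrative assumptions, not the authors' implementation; Equations (3), (5), and (6) are approximated here by Euclidean distance, the sample mean, and an exponential product weight, respectively.

```python
import numpy as np

class OnlineDynamicClustering:
    """Minimal sketch of Algorithm 1 (names and weight form are assumptions)."""

    def __init__(self, radius, min_density, lam=0.4, remove_frac=0.5):
        self.radius = radius            # cluster radius
        self.min_density = min_density  # minimum density threshold
        self.lam = lam                  # controlling parameter of the temporal weight
        self.remove_frac = remove_frac  # percentage of removed samples
        self.clusters = []              # each cluster: {"center", "samples": [(t, x)]}
        self.outliers = []              # buffered outliers: [(t, x)]

    def update(self, t, x):
        dists = [np.linalg.norm(x - c["center"]) for c in self.clusters]
        hits = [i for i, d in enumerate(dists) if d <= self.radius]
        if hits:                                  # steps 6-10: update existing clusters
            for i in hits:
                self.clusters[i]["samples"].append((t, x))
                self.clusters[i]["center"] = np.mean(
                    [s for _, s in self.clusters[i]["samples"]], axis=0)
                self._remove_outdated(self.clusters[i], t)
            return "cluster"
        self.outliers.append((t, x))              # steps 11-18: buffer the outlier
        self._try_new_cluster()
        return "outlier"

    def _try_new_cluster(self):
        # spawn a new cluster once enough mutually close outliers have accumulated
        if len(self.outliers) < self.min_density:
            return
        xs = np.array([s for _, s in self.outliers])
        center = xs.mean(axis=0)
        if np.all(np.linalg.norm(xs - center, axis=1) <= self.radius):
            self.clusters.append({"center": center, "samples": list(self.outliers)})
            self.outliers.clear()

    def _remove_outdated(self, cluster, t_now):
        # steps 20-22: spatio-temporal double weighting -- samples far from the
        # center (spatial) and old in time (temporal) get the smallest weights
        ts = np.array([ti for ti, _ in cluster["samples"]], dtype=float)
        xs = np.array([s for _, s in cluster["samples"]])
        spatial = np.exp(-np.linalg.norm(xs - cluster["center"], axis=1) ** 2)
        temporal = np.exp(-self.lam * (t_now - ts))
        order = np.argsort(spatial * temporal)    # ascending weights
        keep = sorted(order[int(self.remove_frac * len(order)):])
        cluster["samples"] = [cluster["samples"][i] for i in keep]
```

For brevity the sketch triggers removal at every update; the paper's removal schedule may instead run periodically or only when a cluster grows.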
2.3. Adaptive Switching Prediction
2.3.1. Adaptive Selective Ensemble Learning for Online Prediction
2.3.2. Just-In-Time Learning for Online Prediction
Algorithm 2: Adaptive switching prediction
INPUT: the data stream samples;
the online clustering results;
the maximal size of the selected clusters (ensemble size).
PROCESS:
1: if the query sample falls within existing clusters %% Adaptive selective ensemble learning for online prediction
2: select the GPR models corresponding to the closest clusters;
3: predict the query sample with each selected model to obtain the individual predictions;
4: average the individual predictions using Equation (7) to obtain the final prediction output;
5: for each selected cluster do
6: if the samples in the cluster have been updated
7: rebuild a new GPR model using the updated samples;
8: else
9: keep the old GPR model;
10: end if
11: end for
12: else if the query sample is judged to be an outlier %% Just-in-time learning for online prediction
13: select the most similar samples to the query from the historical labeled samples as the training set;
14: build a JITGPR model with this training set;
15: predict the query sample using the JITGPR model;
16: obtain the final prediction result;
17: end if
OUTPUT: Prediction result
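A corresponding sketch of Algorithm 2, built on the clustering structure above and scikit-learn's GaussianProcessRegressor, is given below. The ensemble size K, the Euclidean similarity used for JITL, and the local window size L are assumed values for illustration; the per-cluster models in `cluster_models` are assumed to be refit elsewhere whenever a cluster's sample set changes (steps 5–11).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def adaptive_switching_predict(x_q, odc, cluster_models, hist_X, hist_y, K=2, L=50):
    """Sketch of Algorithm 2: ensemble prediction inside clusters, JITL outside."""
    dists = [np.linalg.norm(x_q - c["center"]) for c in odc.clusters]
    if dists and min(dists) <= odc.radius:
        # adaptive selective ensemble: average the K closest clusters' GPR outputs
        nearest = np.argsort(dists)[:K]
        preds = [cluster_models[i].predict(x_q.reshape(1, -1))[0] for i in nearest]
        return float(np.mean(preds))              # simple average as in Equation (7)
    # just-in-time learning: fit a local GPR on the L most similar labeled samples
    sim = np.linalg.norm(hist_X - x_q, axis=1)    # Euclidean similarity (assumed)
    idx = np.argsort(sim)[:L]
    jit_model = GaussianProcessRegressor(normalize_y=True).fit(hist_X[idx], hist_y[idx])
    return float(jit_model.predict(x_q.reshape(1, -1))[0])
```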
2.4. Sample Augmentation and Maintenance
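The confidence evaluation used for sample augmentation can be illustrated with GPR's predictive uncertainty: a pseudo-label is accepted only when the predictive standard deviation falls below a confidence threshold. The thresholding rule below is a hedged sketch; the paper's exact confidence measure and maintenance policy are defined in this section.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def augment_with_pseudo_labels(model, X_unlabeled, threshold=0.1):
    # keep only unlabeled samples whose GPR predictive std is below the threshold
    mu, std = model.predict(X_unlabeled, return_std=True)
    confident = std < threshold
    return X_unlabeled[confident], mu[confident]

# usage sketch: the accepted (x, pseudo-y) pairs are appended to the labeled set
# of the corresponding cluster before its GPR model is rebuilt
```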
2.5. Implementation Procedure of ODCSS
3. Case Studies
3.1. Methods for Comparison
- (i)
- MWGPR: moving window Gaussian process regression model (a minimal sketch is given after this list).
- (ii)
- JITGPR [49]: just-in-time learning Gaussian process regression.
- (iii)
- OSELM [50]: online sequential extreme learning machine, a sequential learning algorithm that can learn training data not only one by one but also block by block (with fixed or varying block length).
- (iv)
- OSVR [51]: online support vector regression, which achieves incremental updating through a moving-window strategy.
- (v)
- PALM [27]: parsimonious learning machine, a data stream regression method that uses new fuzzy rules based on the concept of hyperplane clustering and can automatically generate, merge, and adjust hyperplane-based fuzzy rules. The authors propose type-1 and type-2 PALM models, each with local and global updating strategies. To stay closest to the local modeling idea of this paper, we select the better-performing type-2 PALM with the local updating strategy as the comparison method.
- (vi)
- OSEGPR: online selective ensemble Gaussian process regression. The basic idea is that, assuming a set of GPR models has been established by the current time, whenever a new query sample arrives, a global GPR model is first built from all historical samples obtained so far; the prediction performance of all retained models is then evaluated on an online validation set, and the models with high performance are selected to provide the ensemble prediction. This process is repeated as new query samples arrive.
- (vii)
- SS-OSEGPR: semi-supervised online selective ensemble Gaussian process regression, which introduces unlabeled samples to OSEGPR. Using the confidence evaluation strategy in Section 2.4 of this paper, we select pseudo-labels with high confidence to expand the training set and update the model.
- (viii)
- ODCSSS: a degenerate, fully supervised version of the proposed online-dynamic-clustering-based soft sensor method; that is, the online soft sensor modeling process is completed using only the labeled data streams.
- (ix)
- ODCSS: the proposed online-dynamic-clustering-based soft sensor modeling for industrial semi-supervised data streams.
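For reference, a minimal sketch of the simplest baseline, MWGPR (method (i)), is shown below: a GPR is refit on a fixed-width moving window, so the oldest sample is discarded as each new labeled sample arrives. The window width of 50 follows the TE settings in Section 3.3.2; the class name and kernel choice are assumptions.

```python
import numpy as np
from collections import deque
from sklearn.gaussian_process import GaussianProcessRegressor

class MWGPR:
    """Moving-window GPR baseline sketch: refit on the latest `width` samples."""

    def __init__(self, width=50):
        self.window = deque(maxlen=width)   # the oldest pair drops out automatically
        self.model = None

    def add_sample(self, x, y):
        self.window.append((np.asarray(x, dtype=float), float(y)))
        X = np.array([xi for xi, _ in self.window])
        Y = np.array([yi for _, yi in self.window])
        self.model = GaussianProcessRegressor(normalize_y=True).fit(X, Y)

    def predict(self, x_q):
        # call add_sample at least once before predicting
        return float(self.model.predict(np.asarray(x_q).reshape(1, -1))[0])
```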
3.2. Experimental Setup and Evaluation Metrics
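The tables below report RMSE, MAE, MAPE, and R². Since the paper's metric equations are not reproduced here, the standard definitions are assumed: for N test samples with true values $y_i$, predictions $\hat{y}_i$, and mean $\bar{y}$,

$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i-\hat{y}_i\right)^2},\qquad \mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|y_i-\hat{y}_i\right|,$$

$$\mathrm{MAPE}=\frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{y_i-\hat{y}_i}{y_i}\right|,\qquad R^2=1-\frac{\sum_{i=1}^{N}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i-\bar{y}\right)^2}.$$

Lower RMSE, MAE, and MAPE and higher R² indicate better prediction performance.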
3.3. Tennessee Eastman (TE) Process
3.3.1. Process Description
3.3.2. Parameter Settings
- (i)
- MWGPR: the width of the moving window is set to 50.
- (ii)
- JITGPR: the number of local modeling samples is set to 50, and covariance-weighted similarity is used as the similarity measure (the best-performing choice).
- (iii)
- OSELM: prediction block is set to 1 to provide one prediction value at a time, the number of hidden neurons is set to 45, and the number of initial training samples used in the initial phase is set to 50.
- (iv)
- OSVR: penalty parameter is set to 10, tuning parameter for kernel function is set to 0.01, and precision threshold is set to 0.001.
- (v)
- PALM: the rule-merging mechanism involves four parameters: two are used to calculate the angle and the distance between two interval-valued hyperplanes and are set to 0.02 and 0.01, respectively; the other two are predefined thresholds of the rule-merging conditions and are both set to 0.01. The remaining parameters are set as in the original paper.
- (vi)
- OSEGPR: the number of online validation samples is set to 50, the ensemble size is set to 5, and Manhattan distance similarity is chosen.
- (vii)
- SS-OSEGPR: the number of online validation samples is set to 50, the ensemble size is set to 4, the confidence threshold for selecting pseudo-labels is set to 0.03, and Euclidean distance similarity is chosen.
- (viii)
- ODCSSS: clustering radius is set to 8, minimum density threshold is set to 10, and maximal ensemble size is set to 2.
- (ix)
- ODCSS: clustering radius is set to 9, minimum density threshold is set to 10, controlling parameter is set to 0.4, proportion of deleted data is set to 0.5, confidence threshold is set to 0.1, and maximal ensemble size is set to 2.
3.3.3. Prediction Results and Discussion
3.4. Industrial Fed-Batch Chlortetracycline (CTC) Fermentation Process
3.4.1. Process Description
3.4.2. Parameter Settings
- (i)
- MWGPR: the width of the moving window is set to 43.
- (ii)
- JITGPR: the number of local modeling samples is set to 43, and cosine similarity is used as the similarity measure (the best-performing choice).
- (iii)
- OSELM: prediction block is set to 1 to predict one value at a time, the number of hidden neurons is set to 38, and the number of initial training samples used in the initial phase is set to 43.
- (iv)
- OSVR: penalty parameter is set to 24, tuning parameter for kernel function is set to 0.02, and precision threshold is set to 0.007.
- (v)
- PALM: the optimal parameters are the same as for the TE process; the four rule-merging parameters are set to 0.02, 0.01, 0.01, and 0.01, respectively.
- (vi)
- OSEGPR: the number of online validation sample sets is set to 43, the ensemble size is set to 5, and Manhattan distance similarity is chosen.
- (vii)
- SS-OSEGPR: the number of online validation sample sets is set to 43, the ensemble size is set to 5, the confidence threshold for selecting pseudo-labels is set to 0.1, and Manhattan distance similarity is chosen.
- (viii)
- ODCSSS: clustering radius is set to 4.9, minimum density threshold is set to 12, and maximal ensemble size is set to 2.
- (ix)
- ODCSS: clustering radius is set to 4.8, minimum density threshold is set to 14, controlling parameter is set to 0.6, proportion of deleted data is set to 0.9, confidence threshold is set to 0.1, and maximal ensemble size is set to 3.
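For quick reference, the tuned ODCSS hyperparameters of the two case studies can be collected in a single configuration; the dictionary key names below are illustrative, not the paper's symbols.

```python
# tuned ODCSS hyperparameters from Sections 3.3.2 (TE) and 3.4.2 (CTC);
# key names are illustrative stand-ins for the paper's symbols
ODCSS_PARAMS = {
    "TE":  {"cluster_radius": 9,   "min_density": 10, "controlling_param": 0.4,
            "delete_proportion": 0.5, "confidence_threshold": 0.1, "ensemble_size": 2},
    "CTC": {"cluster_radius": 4.8, "min_density": 14, "controlling_param": 0.6,
            "delete_proportion": 0.9, "confidence_threshold": 0.1, "ensemble_size": 3},
}
```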
3.4.3. Prediction Results and Discussion
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Gaussian Process Regression
Appendix B. Self-Training
Appendix C. Just-In-Time Learning (JITL)
References
- Jiang, Y.; Yin, S.; Dong, J.; Kaynak, O. A review on soft sensors for monitoring, control, and optimization of industrial processes. IEEE Sens. J. 2020, 21, 12868–12881.
- Liu, Y.; Xie, M. Rebooting data-driven soft-sensors in process industries: A review of kernel methods. J. Process Control 2020, 89, 58–73.
- Wang, G.; Zhu, H.; Wu, Z.; Yang, M. A novel random subspace method considering complementarity between unsupervised and supervised deep representation features for soft sensors. Meas. Sci. Technol. 2022, 33, 105119.
- Kadlec, P.; Gabrys, B.; Strandt, S. Data-driven soft sensors in the process industry. Comput. Chem. Eng. 2009, 33, 795–814.
- Deng, H.; Yang, K.; Liu, Y.; Zhang, S.; Yao, Y. Actively exploring informative data for smart modeling of industrial multiphase flow processes. IEEE Trans. Ind. Inform. 2020, 17, 8357–8366.
- Liu, Y.; Yang, C.; Zhang, M.; Dai, Y.; Yao, Y. Development of adversarial transfer learning soft sensor for multigrade processes. Ind. Eng. Chem. Res. 2020, 59, 16330–16345.
- Gao, S.; Dai, Y.; Li, Y.; Jiang, Y.; Liu, Y. Augmented flame image soft sensor for combustion oxygen content prediction. Meas. Sci. Technol. 2022, 34, 015401.
- Du, W.; Fan, Y.; Zhang, Y. Multimode process monitoring based on data-driven method. J. Frankl. Inst. 2017, 354, 2613–2627.
- Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140.
- Freund, Y.; Schapire, R.E. Experiments with a new boosting algorithm. In Proceedings of the International Conference on Machine Learning (ICML), Bari, Italy, 3–6 July 1996; pp. 148–156.
- Ge, Z.; Song, Z. A comparative study of just-in-time-learning based methods for online soft sensor modeling. Chemom. Intell. Lab. Syst. 2010, 104, 306–317.
- Liu, T.; Chen, S.; Liang, S.; Gan, S.; Harris, C.J. Fast adaptive gradient RBF networks for online learning of nonstationary time series. IEEE Trans. Signal Process. 2020, 68, 2015–2030.
- Yang, Z.; Yao, L.; Ge, Z. Streaming parallel variational Bayesian supervised factor analysis for adaptive soft sensor modeling with big process data. J. Process Control 2020, 85, 52–64.
- Mohanta, H.K.; Pani, A.K. Adaptive non-linear soft sensor for quality monitoring in refineries using Just-in-Time Learning—Generalized regression neural network approach. Appl. Soft Comput. 2022, 119, 108546.
- Kanno, Y.; Kaneko, H. Improvement of predictive accuracy in semi-supervised regression analysis by selecting unlabeled chemical structures. Chemom. Intell. Lab. Syst. 2019, 191, 82–87.
- Xu, W.; Tang, J.; Xia, H. A review of semi-supervised learning for industrial process regression modeling. In Proceedings of the 2021 40th Chinese Control Conference (CCC), Shanghai, China, 26–28 July 2021; pp. 1359–1364.
- Yi, H.; Jiang, Q. Graph-based semisupervised learning for icing fault detection of wind turbine blade. Meas. Sci. Technol. 2020, 32, 035117.
- Jin, H.; Li, Z.; Chen, X.; Qian, B.; Yang, B.; Yang, J. Evolutionary optimization based pseudo labeling for semi-supervised soft sensor development of industrial processes. Chem. Eng. Sci. 2021, 237, 116560.
- Babcock, B.; Babu, S.; Datar, M.; Motwani, R.; Widom, J. Models and issues in data stream systems. In Proceedings of the Twenty-First ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Madison, WI, USA, 3–5 June 2002; pp. 1–16.
- Barddal, J.P. Vertical and horizontal partitioning in data stream regression ensembles. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; pp. 1–8.
- Ikonomovska, E.; Gama, J.; Džeroski, S. Learning model trees from evolving data streams. Data Min. Knowl. Discov. 2011, 23, 128–168.
- Shaker, A.; Hüllermeier, E. TSK-Streams: Learning TSK fuzzy systems on data streams. arXiv 2019, arXiv:1911.03951.
- Yu, H.; Lu, J.; Zhang, G. MORStreaming: A multioutput regression system for streaming data. IEEE Trans. Syst. Man Cybern. Syst. 2021, 52, 4862–4874.
- Gomes, H.M.; Barddal, J.P.; Ferreira, L.E.B.; Bifet, A. Adaptive random forests for data stream regression. In Proceedings of the European Symposium on Artificial Neural Networks (ESANN), Bruges, Belgium, 25–27 April 2018.
- Zhong, Y.; Yang, H.; Zhang, Y.; Li, P. Online random forests regression with memories. Knowl. Based Syst. 2020, 201, 106058.
- Zhong, Y.; Yang, H.; Zhang, Y.; Li, P.; Ren, C. Long short-term memory self-adapting online random forests for evolving data stream regression. Neurocomputing 2021, 457, 265–276.
- Ferdaus, M.M.; Pratama, M.; Anavatti, S.G.; Garratt, M.A. PALM: An incremental construction of hyperplanes for data stream regression. IEEE Trans. Fuzzy Syst. 2019, 27, 2115–2129.
- Song, Y.; Lu, J.; Lu, H.; Zhang, G. Fuzzy clustering-based adaptive regression for drifting data streams. IEEE Trans. Fuzzy Syst. 2019, 28, 544–557.
- Zubaroğlu, A.; Atalay, V. Data stream clustering: A review. Artif. Intell. Rev. 2021, 54, 1201–1236.
- Guha, S.; Rastogi, R.; Shim, K. ROCK: A robust clustering algorithm for categorical attributes. Inf. Syst. 2000, 25, 345–366.
- Udommanetanakit, K.; Rakthanmanon, T.; Waiyamai, K. E-Stream: Evolution-based technique for stream clustering. In Proceedings of the International Conference on Advanced Data Mining and Applications, Harbin, China, 6–8 August 2007; pp. 605–615.
- Meesuksabai, W.; Kangkachit, T.; Waiyamai, K. HUE-Stream: Evolution-based clustering technique for heterogeneous data streams with uncertainty. In Proceedings of the International Conference on Advanced Data Mining and Applications, Beijing, China, 17–19 December 2011; pp. 27–40.
- Aggarwal, C.C.; Philip, S.Y.; Han, J.; Wang, J. A framework for clustering evolving data streams. In Proceedings of the 2003 VLDB Conference, Berlin, Germany, 9–12 September 2003; pp. 81–92.
- Ackermann, M.R.; Märtens, M.; Raupach, C.; Swierkot, K.; Lammersen, C.; Sohler, C. StreamKM++: A clustering algorithm for data streams. J. Exp. Algorithmics 2012, 17, 2.1–2.30.
- Puschmann, D.; Barnaghi, P.; Tafazolli, R. Adaptive clustering for dynamic IoT data streams. IEEE Internet Things J. 2016, 4, 64–74.
- Sheikholeslami, G.; Chatterjee, S.; Zhang, A. WaveCluster: A wavelet-based clustering approach for spatial data in very large databases. VLDB J. 2000, 8, 289–304.
- Lu, Y.; Sun, Y.; Xu, G.; Liu, G. A grid-based clustering algorithm for high-dimensional data streams. In Proceedings of the International Conference on Advanced Data Mining and Applications, Wuhan, China, 22–24 July 2005; pp. 824–831.
- Gama, J.; Rodrigues, P.P.; Lopes, L. Clustering distributed sensor data streams using local processing and reduced communication. Intell. Data Anal. 2011, 15, 3–28.
- Ester, M.; Kriegel, H.-P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; pp. 226–231.
- Cao, F.; Estert, M.; Qian, W.; Zhou, A. Density-based clustering over an evolving data stream with noise. In Proceedings of the 2006 SIAM International Conference on Data Mining, Bethesda, MD, USA, 20–22 April 2006; pp. 328–339.
- Hyde, R.; Angelov, P.; MacKenzie, A.R. Fully online clustering of evolving data streams into arbitrarily shaped clusters. Inf. Sci. 2017, 382, 96–114.
- Amini, A.; Saboohi, H.; Herawan, T.; Wah, T.Y. MuDi-Stream: A multi density clustering algorithm for evolving data stream. J. Netw. Comput. Appl. 2016, 59, 370–385.
- Yin, C.; Xia, L.; Zhang, S.; Sun, R.; Wang, J. Improved clustering algorithm based on high-speed network data stream. Soft Comput. 2018, 22, 4185–4195.
- Zhou, A.; Cao, F.; Yan, Y.; Sha, C.; He, X. Distributed data stream clustering: A fast EM-based approach. In Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering, Istanbul, Turkey, 15–20 April 2007; pp. 736–745.
- Dang, X.H.; Lee, V.; Ng, W.K.; Ong, K.L. Incremental and adaptive clustering stream data over sliding window. In Proceedings of the International Conference on Database and Expert Systems Applications, Linz, Austria, 31 August–4 September 2009; pp. 660–674.
- Hyde, R.; Angelov, P. A new online clustering approach for data in arbitrary shaped clusters. In Proceedings of the 2015 IEEE 2nd International Conference on Cybernetics (CYBCONF), Gdynia, Poland, 24–26 June 2015; pp. 228–233.
- Islam, M.K.; Ahmed, M.M.; Zamli, K.Z. A buffer-based online clustering for evolving data stream. Inf. Sci. 2019, 489, 113–135.
- Tareq, M.; Sundararajan, E.A.; Mohd, M.; Sani, N.S. Online clustering of evolving data streams using a density grid-based method. IEEE Access 2020, 8, 166472–166490.
- Jin, H.; Zhang, Y.; Dong, S.; Yang, B.; Qian, B.; Chen, X. Study on semi-supervised ensemble just-in-time learning based soft sensing of Mooney viscosity in rubber mixing process. J. Chem. Eng. Chin. Univ. 2022, 36, 586–596.
- Liang, N.-Y.; Huang, G.-B.; Saratchandran, P.; Sundararajan, N. A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Trans. Neural Netw. 2006, 17, 1411–1423.
- Parrella, F. Online Support Vector Regression. Master's Thesis, Department of Information Science, University of Genoa, Genoa, Italy, 2007; p. 69.
- Rajaee, T.; Khani, S.; Ravansalar, M. Artificial intelligence-based single and hybrid models for prediction of water quality in rivers: A review. Chemom. Intell. Lab. Syst. 2020, 200, 103978.
- Parmar, K.S.; Bhardwaj, R. Water quality management using statistical analysis and time-series prediction model. Appl. Water Sci. 2014, 4, 425–434.
- Wu, N.; Huang, J.; Schmalz, B.; Fohrer, N. Modeling daily chlorophyll a dynamics in a German lowland river using artificial neural networks and multiple linear regression approaches. Limnology 2014, 15, 47–56.
- Downs, J.J.; Vogel, E.F. A plant-wide industrial process control problem. Comput. Chem. Eng. 1993, 17, 245–255.
- Rodríguez-Fdez, I.; Canosa, A.; Mucientes, M.; Bugarín, A. STAC: A web platform for the comparison of algorithms using statistical tests. In Proceedings of the 2015 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Istanbul, Turkey, 2–5 August 2015; pp. 1–8.
- Rasmussen, C.E. Gaussian processes in machine learning. In Advanced Lectures on Machine Learning; ML Summer Schools 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 63–71.
- Chowdhary, K. Natural language processing. In Fundamentals of Artificial Intelligence; Springer: New Delhi, India, 2020; pp. 603–649.
- Wang, Y.; Hou, J.; Hou, X.; Chau, L.-P. A self-training approach for point-supervised object detection and counting in crowds. IEEE Trans. Image Process. 2021, 30, 2876–2887.
- Zhang, X.-Y.; Liu, C.-L.; Suen, C.Y. Towards robust pattern recognition: A review. Proc. IEEE 2020, 108, 894–922.
- Yang, Z.; Ge, Z. Rethinking the value of just-in-time learning in the era of industrial big data. IEEE Trans. Ind. Inform. 2021, 18, 976–985.
- Chen, Z.; Liu, C.; Ding, S.X.; Peng, T.; Yang, C.; Gui, W.; Shardt, Y.A. A just-in-time-learning-aided canonical correlation analysis method for multimode process monitoring and fault detection. IEEE Trans. Ind. Electron. 2020, 68, 5259–5270.
- Su, Q.-L.; Hermanto, M.W.; Braatz, R.D.; Chiu, M.-S. Just-in-time-learning based extended prediction self-adaptive control for batch processes. J. Process Control 2016, 43, 1–9.
- Saptoro, A. State of the art in the development of adaptive soft sensors based on just-in-time models. Procedia Chem. 2014, 9, 226–234.
No. | Variable Description | No. | Variable Description |
---|---|---|---|
1 | Time | 17 | Stripper pressure |
2 | A feed (stream 1) | 18 | Stripper underflow |
3 | D feed (stream 2) | 19 | Stripper temperature |
4 | E feed (stream 3) | 20 | Stripper steam flow |
5 | A and C feed rate | 21 | Compressor work |
6 | Recycle flow rate | 22 | Reactor coolant temperature |
7 | Reactor feed rate | 23 | Separator coolant temperature |
8 | Reactor pressure | 24 | D feed flow (stream 2) |
9 | Reactor level | 25 | E feed flow (stream 3) |
10 | Reactor temperature | 26 | A feed flow (stream 1) |
11 | Purge rate | 27 | A and C feed flow (stream 4) |
12 | Product separator temperature | 28 | Purge valve (stream 9) |
13 | Product separator level | 29 | Separator pot liquid flow (stream 10) |
14 | Product separator pressure | 30 | Stripper liquid product flow (stream 11) |
15 | Product separator underflow | 31 | Reactor cooling water flow |
16 | Stripper level | 32 | Condenser cooling water flow |
Method | RMSE | MAE | MAPE (%) | R² |
---|---|---|---|---|
MWGPR | 0.0174 | 0.0130 | 2.1017 | 0.9660 |
JITGPR | 0.0187 | 0.0138 | 2.2709 | 0.9608 |
OSELM | 0.0271 | 0.0212 | 3.3412 | 0.9174 |
OSVR | 0.0171 | 0.0137 | 2.2020 | 0.9673 |
PALM | 0.0198 | 0.0158 | 2.4966 | 0.9560 |
OSEGPR | 0.0169 | 0.0131 | 2.1106 | 0.9678 |
SS-OSEGPR | 0.0164 | 0.0128 | 2.0488 | 0.9699 |
ODCSSS | 0.0139 | 0.0109 | 1.7397 | 0.9782 |
ODCSS | 0.0134 | 0.0107 | 1.7168 | 0.9799 |
No. | Clustering Radius | Minimum Density Threshold | RMSE | R² |
---|---|---|---|---|
1 | 8 | 10 | 0.0135 | 0.9797 |
2 | 9 | 10 | 0.0134 | 0.9799 |
3 | 10 | 10 | 0.0135 | 0.9796 |
4 | 11 | 10 | 0.0134 | 0.9798 |
5 | 12 | 10 | 0.0138 | 0.9787 |
6 | 13 | 10 | 0.0134 | 0.9797 |
7 | 8 | 12 | 0.0135 | 0.9795 |
8 | 9 | 12 | 0.0134 | 0.9798 |
9 | 10 | 12 | 0.0135 | 0.9797 |
10 | 11 | 12 | 0.0137 | 0.9790 |
11 | 12 | 12 | 0.0136 | 0.9792 |
12 | 13 | 12 | 0.0137 | 0.9791 |
13 | 8 | 14 | 0.0136 | 0.9792 |
14 | 9 | 14 | 0.0134 | 0.9798 |
15 | 10 | 14 | 0.0135 | 0.9795 |
16 | 11 | 14 | 0.0134 | 0.9798 |
17 | 12 | 14 | 0.0136 | 0.9793 |
18 | 13 | 14 | 0.0135 | 0.9795 |
Method | Statistic | p-Value | Result |
---|---|---|---|
MWGPR | 2.77390 | 0.00995 | Reject |
JITGPR | 2.11058 | 0.03907 | Reject |
OSELM | 5.84932 | 0 | Reject |
OSVR | 3.61814 | 0.00067 | Reject |
PALM | 4.46237 | 0.00002 | Reject |
OSEGPR | 2.71360 | 0.00997 | Reject |
SS-OSEGPR | 2.23118 | 0.03288 | Reject |
ODCSSS | 0.12060 | 0.90400 | Accept |
ODCSS (the control method) | - | - | - |
No. | MWGPR | JITGPR | OSELM | OSVR | PALM | OSEGPR | SS-OSEGPR | ODCSSS | ODCSS | ODCSS Rank |
---|---|---|---|---|---|---|---|---|---|---|
1 | 0.0219 | 0.0219 | 0.0269 | 0.0244 | 0.0233 | 0.0237 | 0.0232 | 0.0206 | 0.0205 | 1 |
2 | 0.0195 | 0.0180 | 0.0269 | 0.0229 | 0.0135 | 0.0325 | 0.0297 | 0.0179 | 0.0183 | 4 |
3 | 0.0140 | 0.0148 | 0.0231 | 0.0159 | 0.0344 | 0.0132 | 0.0135 | 0.0132 | 0.0135 | 3 |
4 | 0.0142 | 0.0117 | 0.0243 | 0.0142 | 0.0171 | 0.0130 | 0.0119 | 0.0124 | 0.0122 | 3 |
5 | 0.0098 | 0.0118 | 0.0257 | 0.0131 | 0.0198 | 0.0162 | 0.0160 | 0.0116 | 0.0115 | 2 |
6 | 0.0130 | 0.0171 | 0.0237 | 0.0174 | 0.0287 | 0.0204 | 0.0191 | 0.0148 | 0.0150 | 3 |
7 | 0.0133 | 0.0120 | 0.0265 | 0.0151 | 0.0162 | 0.0124 | 0.0124 | 0.0120 | 0.0120 | 2 |
8 | 0.0119 | 0.0111 | 0.0203 | 0.0115 | 0.0121 | 0.0095 | 0.0095 | 0.0089 | 0.0089 | 2 |
9 | 0.0148 | 0.0148 | 0.0164 | 0.0161 | 0.0168 | 0.0153 | 0.0149 | 0.0133 | 0.0133 | 1 |
10 | 0.0123 | 0.0098 | 0.0190 | 0.0111 | 0.0179 | 0.0135 | 0.0134 | 0.0101 | 0.0101 | 2 |
11 | 0.0168 | 0.0165 | 0.0171 | 0.0118 | 0.0127 | 0.0116 | 0.0120 | 0.0107 | 0.0107 | 1 |
12 | 0.0304 | 0.0431 | 0.0242 | 0.0135 | 0.0148 | 0.0111 | 0.0109 | 0.0092 | 0.0092 | 1 |
13 | 0.0206 | 0.0213 | 0.0337 | 0.0209 | 0.0170 | 0.0155 | 0.0154 | 0.0150 | 0.0150 | 1 |
14 | 0.0195 | 0.0196 | 0.0335 | 0.0179 | 0.0218 | 0.0160 | 0.0166 | 0.0128 | 0.0128 | 1 |
15 | 0.0177 | 0.0130 | 0.0432 | 0.0207 | 0.0181 | 0.0162 | 0.0156 | 0.0183 | 0.0127 | 1 |
Mean | 0.0166 | 0.0171 | 0.0256 | 0.0164 | 0.0189 | 0.0160 | 0.0156 | 0.0134 | 0.0131 | 1 |
No. | Variable Description |
---|---|
1 | Cultivation time (min) |
2 | Temperature (°C) |
3 | pH (−) |
4 | Dissolved oxygen concentration (%) |
5 | Air stream rate () |
6 | Volume of air consumption () |
7 | Substrate feed rate (L/h) |
8 | Volume of substrate consumption (L) |
9 | Volume of ammonia consumption (L) |
Method | RMSE | MAE | MAPE (%) | R² |
---|---|---|---|---|
MWGPR | 0.3174 | 0.2405 | 8.5315 | 0.9347 |
JITGPR | 0.2978 | 0.2266 | 8.0920 | 0.9425 |
OSELM | 0.3363 | 0.2544 | 9.0366 | 0.9276 |
OSVR | 0.3385 | 0.2621 | 9.2811 | 0.9257 |
PALM | 0.2827 | 0.2178 | 7.9556 | 0.9482 |
OSEGPR | 0.2792 | 0.2135 | 7.6421 | 0.9495 |
SS-OSEGPR | 0.2739 | 0.2082 | 7.4001 | 0.9514 |
ODCSSS | 0.2682 | 0.2005 | 7.1162 | 0.9534 |
ODCSS | 0.2495 | 0.1841 | 6.4624 | 0.9597 |
No. | Controlling Parameter | Proportion of Deleted Data | Confidence Threshold | RMSE | R² |
---|---|---|---|---|---|
1 | 0.4 | 0.7 | 0.05 | 0.2558 | 0.9576 |
2 | 0.6 | 0.7 | 0.05 | 0.2573 | 0.9571 |
3 | 0.8 | 0.7 | 0.05 | 0.2687 | 0.9532 |
4 | 0.4 | 0.9 | 0.05 | 0.2540 | 0.9582 |
5 | 0.6 | 0.9 | 0.05 | 0.2551 | 0.9578 |
6 | 0.8 | 0.9 | 0.05 | 0.2535 | 0.9584 |
7 | 0.4 | 0.7 | 0.1 | 0.2533 | 0.9584 |
8 | 0.6 | 0.7 | 0.1 | 0.2543 | 0.9581 |
9 | 0.8 | 0.7 | 0.1 | 0.2596 | 0.9563 |
10 | 0.4 | 0.9 | 0.1 | 0.2591 | 0.9565 |
11 | 0.6 | 0.9 | 0.1 | 0.2495 | 0.9597 |
12 | 0.8 | 0.9 | 0.1 | 0.2531 | 0.9585 |
13 | 0.4 | 0.7 | 0.15 | 0.2568 | 0.9573 |
14 | 0.6 | 0.7 | 0.15 | 0.2535 | 0.9584 |
15 | 0.8 | 0.7 | 0.15 | 0.2659 | 0.9542 |
16 | 0.4 | 0.9 | 0.15 | 0.2512 | 0.9591 |
17 | 0.6 | 0.9 | 0.15 | 0.2545 | 0.9580 |
18 | 0.8 | 0.9 | 0.15 | 0.2608 | 0.9559 |
Method | Statistic | p-Value | Result |
---|---|---|---|
MWGPR | 3.23875 | 0.00270 | Reject |
JITGPR | 2.85010 | 0.00785 | Reject |
OSELM | 3.36830 | 0.00227 | Reject |
OSVR | 3.95128 | 0.00035 | Reject |
PALM | 1.55460 | 0.13400 | Accept |
OSEGPR | 2.07280 | 0.05674 | Accept |
SS-OSEGPR | 1.61938 | 0.13338 | Accept |
ODCSSS | 1.16595 | 0.24363 | Accept |
ODCSS (the control method) | - | - | - |
No. | MWGPR | JITGPR | OSELM | OSVR | PALM | OSEGPR | SS-OSEGPR | ODCSSS | ODCSS | ODCSS Rank |
---|---|---|---|---|---|---|---|---|---|---|
1 | 0.3169 | 0.3148 | 0.5521 | 0.3223 | 0.2872 | 0.3724 | 0.3441 | 0.2726 | 0.3321 | 6 |
2 | 0.4614 | 0.2720 | 0.3908 | 0.4051 | 0.1984 | 0.1823 | 0.1985 | 0.2075 | 0.3117 | 6 |
3 | 0.2173 | 0.2940 | 0.3561 | 0.3170 | 0.2018 | 0.2167 | 0.2160 | 0.2013 | 0.1880 | 1 |
4 | 0.2987 | 0.2978 | 0.2943 | 0.3106 | 0.2051 | 0.2330 | 0.2202 | 0.1875 | 0.1904 | 2 |
5 | 0.2359 | 0.2597 | 0.1869 | 0.2470 | 0.2033 | 0.1598 | 0.1459 | 0.1701 | 0.1506 | 2 |
6 | 0.3406 | 0.3530 | 0.2413 | 0.3909 | 0.2491 | 0.2723 | 0.2751 | 0.2332 | 0.2153 | 1 |
7 | 0.3155 | 0.2437 | 0.3635 | 0.3827 | 0.4303 | 0.3986 | 0.3810 | 0.3950 | 0.2232 | 1 |
8 | 0.3118 | 0.2122 | 0.2686 | 0.3649 | 0.2522 | 0.2684 | 0.2657 | 0.2576 | 0.2006 | 1 |
9 | 0.2677 | 0.3026 | 0.3300 | 0.2594 | 0.2221 | 0.2776 | 0.2755 | 0.3193 | 0.2530 | 2 |
10 | 0.3342 | 0.3045 | 0.3578 | 0.3528 | 0.3296 | 0.2947 | 0.2975 | 0.3166 | 0.2862 | 1 |
11 | 0.3704 | 0.2994 | 0.2662 | 0.3880 | 0.2949 | 0.2539 | 0.2530 | 0.2829 | 0.2968 | 6 |
12 | 0.3189 | 0.3775 | 0.3021 | 0.3616 | 0.2847 | 0.3134 | 0.3146 | 0.2858 | 0.2746 | 1 |
13 | 0.2582 | 0.2739 | 0.2897 | 0.2568 | 0.3902 | 0.2824 | 0.2818 | 0.2669 | 0.2302 | 1 |
Mean | 0.3113 | 0.2927 | 0.3230 | 0.3353 | 0.2730 | 0.2712 | 0.2668 | 0.2613 | 0.2425 | 1 |