Abstract
The selection of training data determines the quality of a chemometric calibration model. To cover the entire parameter space of known influencing parameters, an experimental design is usually created. Nevertheless, even with a carefully prepared design of experiments (DoE), redundant reference analyses are often performed when analyzing agricultural products. Because the number of possible reference analyses is usually very limited, the presented active learning approaches are intended as a tool for better selection of training samples.
Summary
Chemometric calibration models can be used to estimate various quality and ripeness parameters of agricultural products from near-infrared spectra. The training data used determine the quality of the chemometric calibration model. Training therefore requires a data set that contains samples from the entire parameter space. A sampling plan is usually drawn up, but many parameters in the production of agricultural products cannot be controlled. As a result, a large number of samples usually has to be collected, and many of them do not increase the information content of the data set. In addition, the quality and ripeness parameters of the samples in the training data set must be determined by laborious chemical reference analyses. The presented active learning approaches select samples optimally on the basis of near-infrared spectra, which reduces the number of required samples and the associated reference analyses.
Funding source: Deutsche Forschungsgemeinschaft
Award Identifier / Grant number: 390732324
Funding source: Bundesministerium für Bildung und Forschung
Award Identifier / Grant number: 01IS17047
Funding statement: The authors of this work were supported by the Fraunhofer Center for Machine Learning within the Fraunhofer Cluster for Cognitive Internet Technologies (CCIT) and the PhenoRob project, which is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC 2070 – 390732324, and by the German Federal Ministry of Education and Research (BMBF) within the context of the Software Campus project SmartSpectrometer under grant No. 01IS17047.
About the authors
Julius Krause received his M.Sc. degree in physics in 2016 from the Karlsruhe Institute of Technology (KIT). Since 2016, he has conducted his research at the Fraunhofer Institute of Optronics, System Technologies and Image Exploitation IOSB in cooperation with the Vision and Fusion Laboratory at the Karlsruhe Institute of Technology. His research interests are hyperspectral signal processing and imaging for optical inspection and quality control, and machine learning.
Maurice Günder received his M.Sc. degree in Experimental Particle Physics from RWTH Aachen University in Aachen, Germany. Since 2020, he has been a Data Scientist at the Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS in Sankt Augustin, Germany, while pursuing a PhD degree at the University of Bonn, Germany. His research interests comprise time series analysis, knowledge extraction from sensor data, and inclusion of expert knowledge in machine learning processes.
Daniel Schulz studied geography, geology and soil science at the universities of Cologne, Bonn and Gothenburg. After graduating, he began working at the Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS, Sankt Augustin, where he worked as a project manager in various large-scale projects with industry and public clients. His research focuses on Machine Learning (Informed Machine Learning) and Artificial Intelligence. Currently, he heads the office of the Fraunhofer Research Center for Machine Learning.
Robin Gruna obtained his PhD from the Karlsruhe Institute of Technology (KIT) in the field of Machine Vision and Computational Imaging. Currently, he is the research group manager at the Fraunhofer Institute of Optronics, System Technologies and Image Exploitation IOSB in Karlsruhe. His research interests include machine learning, hyperspectral imaging and spectral sensing.
© 2021 Walter de Gruyter GmbH, Berlin/Boston