Abstract
Fish is approximately 40% edible fillet. The remaining 60% can be processed into low-value fertilizer or high-value pharmaceutical-grade omega-3 concentrates. High-value manufacturing options depend on the composition of the biomass, which varies with fish species, tissue type, and season. Fatty acid composition, measured by gas chromatography, is an important measure of marine biomass quality. This technique is accurate and precise, but processing and interpreting the results is time-consuming and requires domain-specific expertise. This paper investigates classification and feature selection algorithms for their ability to automate the processing of gas chromatography data. Experiments found that a support vector machine (SVM) could classify compositionally diverse marine biomass from raw chromatographic fatty acid data. The SVM model is interpretable through visualization, which can highlight the features most important for classification. Experiments also demonstrated that feature selection significantly reduced dimensionality and improved classification performance on high-dimensional, low-sample-size datasets. Based on the achieved reduction rate, feature selection could accelerate the classification system by up to four times.
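To make the idea of combining feature selection with a linear SVM concrete, the following is a minimal sketch under stated assumptions, not the authors' pipeline. The data are synthetic stand-ins for chromatograms. The bin count, peak positions, and peak heights are invented for illustration. The filter score (difference of class means over summed class standard deviations) and the subgradient-descent training loop are simple substitutes for whichever selection methods and SVM solver the paper actually used. All function names are hypothetical.

```python
import random
import statistics

N_BINS = 200               # assumed number of retention-time bins per chromatogram
PEAKS = {+1: 40, -1: 120}  # assumed bins where each class shows its tall peak


def make_sample(label, rng):
    """Synthetic chromatogram: noisy non-negative baseline plus two peaks."""
    x = [abs(rng.gauss(0.0, 0.05)) for _ in range(N_BINS)]
    for cls, peak_bin in PEAKS.items():
        height = 5.0 if cls == label else 1.0
        x[peak_bin] = rng.gauss(height, 0.3)
    return x


def make_dataset(n_per_class, rng):
    data = [(make_sample(y, rng), y) for y in (+1, -1) for _ in range(n_per_class)]
    rng.shuffle(data)
    return data


def filter_scores(data):
    """Score each bin by |difference of class means| / summed class std devs."""
    pos = [x for x, y in data if y == +1]
    neg = [x for x, y in data if y == -1]
    scores = []
    for j in range(N_BINS):
        a = [x[j] for x in pos]
        b = [x[j] for x in neg]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-9
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / spread)
    return scores


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring bins, in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def train_linear_svm(data, features, rng, lam=0.01, lr=0.01, epochs=50):
    """Soft-margin linear SVM fitted by subgradient descent on the hinge loss."""
    w = [0.0] * len(features)
    b = 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            z = [x[j] for j in features]
            margin = y * (sum(wi * zi for wi, zi in zip(w, z)) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]  # L2 regularisation step
            if margin < 1.0:                         # hinge loss is active
                w = [wi + lr * y * zi for wi, zi in zip(w, z)]
                b += lr * y
    return w, b


def predict(x, features, w, b):
    score = sum(wi * x[j] for wi, j in zip(w, features)) + b
    return 1 if score >= 0 else -1


rng = random.Random(0)
train = make_dataset(20, rng)   # few samples, many features
test = make_dataset(20, rng)

selected = select_top_k(filter_scores(train), k=4)
w, b = train_linear_svm(train, selected, rng)
accuracy = sum(predict(x, selected, w, b) == y for x, y in test) / len(test)

print("selected bins:", selected)
print("weights per selected bin:", dict(zip(selected, w)))
print("test accuracy:", accuracy)
```

In this toy setting the classifier only ever reads the 4 selected bins rather than all 200, which is where a speed-up from feature selection would come from. The sign and size of each weight indicate which retention-time bins push a sample toward one class or the other, a simple analogue of the interpretability described above.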
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wood, J., Nguyen, B.H., Xue, B., Zhang, M., Killeen, D. (2022). Automated Fish Classification Using Unprocessed Fatty Acid Chromatographic Data: A Machine Learning Approach. In: Aziz, H., Corrêa, D., French, T. (eds) AI 2022: Advances in Artificial Intelligence. AI 2022. Lecture Notes in Computer Science, vol 13728. Springer, Cham. https://doi.org/10.1007/978-3-031-22695-3_36
DOI: https://doi.org/10.1007/978-3-031-22695-3_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22694-6
Online ISBN: 978-3-031-22695-3
eBook Packages: Computer Science, Computer Science (R0)