
Open Access

Peer-reviewed

Research Article

Predicting microbe organisms using data of living micro forms of life and hybrid microbes classifier

Ali Raza,

Roles Conceptualization, Formal analysis, Writing – original draft

Affiliation Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
Furqan Rustam,

Roles Data curation, Software, Writing – original draft

Affiliation School of Computer Science, University College Dublin, Dublin, Ireland
Hafeez Ur Rehman Siddiqui,

Roles Investigation, Methodology, Project administration

Affiliation Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
Isabel de la Torre Diez ,

Roles Funding acquisition, Resources, Visualization

* E-mail: isator@tel.uva.es (ITD); imranashraf@ynu.ac.kr (IA)

Affiliation Department of Signal Theory and Communications and Telematic Engineering, University of Valladolid, Valladolid, Spain
Imran Ashraf

Roles Supervision, Validation, Writing – review & editing

* E-mail: isator@tel.uva.es (ITD); imranashraf@ynu.ac.kr (IA)

Affiliation Information and Communication Engineering, Yeungnam University, Gyeongsan, Korea

https://orcid.org/0000-0002-8271-6496


Published: April 20, 2023
https://doi.org/10.1371/journal.pone.0284522

Abstract

Microbe organisms make up approximately 60% of the earth’s living matter and the human body is home to millions of microbe organisms. Harmful microbes are a threat to health and may lead to several diseases in humans, such as toxoplasmosis and malaria. Toxoplasmosis is widespread in humans, with a seroprevalence of 3.6–84% in sub-Saharan Africa. This necessitates an automated approach for microbe organism detection. The primary objective of this study is to predict microbe organisms in the human body. A novel hybrid microbes classifier (HMC) is proposed, which combines a decision tree classifier and an extra tree classifier using voting criteria. Experiments involve different machine learning and deep learning models for detecting ten different living microforms of life. Results suggest that the proposed HMC approach achieves a 98% accuracy score, 98% geometric mean score, 97% precision score, and 97% Cohen Kappa score. The proposed model outperforms the other employed models as well as existing state-of-the-art models. Moreover, k-fold cross-validation corroborates the results. The research helps microbiologists identify the type of microbe organisms with high accuracy and can help prevent many diseases through early detection.

Citation: Raza A, Rustam F, Siddiqui HUR, Diez IdlT, Ashraf I (2023) Predicting microbe organisms using data of living micro forms of life and hybrid microbes classifier. PLoS ONE 18(4): e0284522. https://doi.org/10.1371/journal.pone.0284522

Editor: Muhammad Fazal Ijaz, Sejong University, KOREA, REPUBLIC OF

Received: November 20, 2022; Accepted: April 2, 2023; Published: April 20, 2023

Copyright: © 2023 Raza et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting information files.

Funding: This research was supported by the European University of Atlantic. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Microorganisms are living organisms found throughout the earth. They are vital in the medical industry for curing many diseases and for maintaining environmental balance [1]. Microorganisms take many forms; some are beneficial, while others are harmful. Harmful microbes cause many infectious diseases and spoil materials such as food [2]. Microbes are too small to be seen by the naked eye, so a microscope is required to analyze them. Microorganisms live everywhere, including soil, water, and air, and scientists have identified that the human body is home to millions of them. They comprise numerous types and species [3], each with its own purpose. A microorganism can be detected and classified based on its shape, size, and color. Its shape can be a rod, a sphere, or a corkscrew. Common types of microorganisms include fungi [4], viruses, archaea or protists [5], algae, and bacteria [6]. The ten living microforms of life considered in this study are Volvox, Spirogyra, Yeast, Pithophora, Penicillium, Raizopus, Protozoa, Aspergillus sp, Ulothrix, and Diatom. These microorganisms can be identified based on microscopic data.

Microbe organisms cause many infections and diseases such as toxoplasmosis [7] and malaria [8]. According to a 2019 report, toxoplasmosis is widespread in humans, with a seroprevalence of 3.6–84% in sub-Saharan Africa [9]. According to the 2020 report of the World Health Organization (WHO), 241 million malaria cases were recorded worldwide, with 627,000 malaria deaths [10]. In this regard, an automatic tool for microbe organism detection would be very beneficial to save lives through the early detection of microbiological diseases.

Machine learning and deep learning have witnessed widespread use over the past decade. Artificial intelligence-based tools and techniques are widely used to process and analyze massive amounts of medical data [11]. Artificial intelligence supports decision making in bioinformatics for numerous diseases using predictive analysis. Disease prediction and medical image processing [12] are the primary applications of artificial intelligence. Artificial intelligence algorithms provide strong performance on large-scale data such as the data of microorganisms [13]. With their wide deployment and superior performance, machine learning models have been adopted in disease prediction and biomedical data analytics. Previously published studies mostly used classical machine learning models for predicting microorganisms, and their prediction performance is low. Ensemble learning techniques have been applied to enhance the prediction performance. Keeping in view their outstanding results, this study follows a machine learning-based approach and makes the following primary contributions toward the prediction of microbe organisms:

Microbe exploratory data analysis (MEDA) is applied to determine dataset patterns and valuable insights for predicting the microbe organisms. The MEDA is based on graphs and charts representing the relations between dataset features.
A novel hybrid microbes classifier (HMC) is proposed based on a decision tree classifier (DTC) and extra tree classifier (ETC) techniques for predicting microbe organisms. The final prediction is made using the voting criterion. Experiments involve multi-class classification with ten classes including Aspergillus sp, Diatom, Penicillium, Pithophora, Protozoa, Raizopus, Spirogyra, Ulothrix, Volvox, and Yeast.
Ten machine learning and deep learning-based models are applied for comparison with the proposed approach for predicting microbe organisms. The multi-layer perceptron classifier (MLP), DTC, random forest classifier (RFC), logistic regression (LR), k-nearest neighbors (KNN), gradient boosting classifier (GBC), ETC, and support vector machines (SVM) are employed in this regard. Also, long short-term memory (LSTM) and gated recurrent unit (GRU) are used as the deep learning models. The performance is analyzed with respect to accuracy, precision, recall, F1 score, and k-fold cross-validation.

The remainder of this study is organized as follows. Section 2 is based on the related literature analysis. The methodology and proposed approach are discussed in Section 3. Experimental results and discussions are given in Section 4. Finally, the study is concluded in Section 5.

Related work

The identification of microbial contaminants in the pharmaceutical industry using a deep learning-based approach is studied in [14]. The Raman spectroscopy dataset is utilized to build the deep learning model. The target microbial contaminants in the dataset are gram-positive bacteria, gram-negative bacteria, and fungi. The convolutional neural network (CNN) used for the experiments achieves a 95% accuracy score for microbial contaminant prediction. The prediction of personalized antibiograms in microbiology using machine learning is carried out in [15]. The electronic health record data of 8342 infections and 15806 uncomplicated urinary tract infections is utilized for model building. The gradient boosted tree (GBT) shows the best results among the employed machine learning models. The coverage rate of the personalized antibiograms is 90% using the proposed technique.

The generation and classification of microbial colonies images using deep learning-based models is studied in [16]. The synthetic microbial colonies dataset of Petri dishes [17] is utilized. The multi-class data of five different microbial species are utilized for classification. The R-CNN model is employed for generating and detecting microbial colonies. The proposed approach achieved a mean squared error score of 4.49 and a mean average precision accuracy score of 0.520.

The study [18] performs the detection of Candida albicans fluconazole resistance using a machine learning approach. A dataset based on matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is utilized for building machine learning models. The authors leverage linear discriminant analysis (LDA) for the detection task, which yields an 85% accuracy. Similarly, [19] proposed the detection of carbapenem-resistant Klebsiella pneumoniae in microbiology using a supervised machine learning approach. The MALDI-TOF MS data is utilized in this research. The study proposes a modified random forest (RF) technique that achieves an accuracy score of 97% for the detection task.

The prediction of methicillin-resistant Staphylococcus aureus using machine learning methods is studied in [20]. The MALDI-TOF MS spectrum data is utilized with the SVM model. Results show an accuracy of 86% using the SVM. The classification of group B Streptococcus serotypes is studied in [21]. The MALDI-TOF MS data is utilized with SVM and RF models. Results suggest that the RF model performs better, with an accuracy score of 87%.

Skin syndrome detection based on deep neural networks is presented in [22]. The deep learning-based techniques MobileNet and long short-term memory (LSTM) are utilized to classify skin disease in real time. The proposed model achieved 85% accuracy on the HAM10000 dataset. However, it can be further improved by fine-tuning different parameters. The automatic detection of Alzheimer’s disease using a fusion-based approach with a heterogeneous ensemble classifier is proposed in [23]. The proposed framework is utilized to predict Alzheimer’s disease based on multimodal time-series data. The dataset is based on 1371 subjects from the Alzheimer’s disease neuroimaging initiative (ADNI). Experimental results show that the proposed model achieves superior results in comparison with the state-of-the-art technique for Alzheimer’s prediction.

This section examines the related literature on predicting microbe organisms. For each study, the proposed approach, dataset, performance score, and main aim are analyzed. The previously applied state-of-the-art approaches are compared in Table 1.

Table 1. The analysis of related literature in the context of predicting the microbe organisms.

https://doi.org/10.1371/journal.pone.0284522.t001

Study methodology

The methodology of the proposed approach for predicting the microbe organisms in microbiology is visualized in Fig 1. The data of different living microforms of life is utilized for conducting the research experiments. The MEDA is applied to obtain critical insights and patterns for predicting the microbes. The target class in the data is encoded into numeric form so that the labels are machine readable. The preprocessed data is split into train and test portions with a ratio of 0.8 to 0.2. The novel proposed HMC approach is trained with 80% of the data and evaluated using the remaining 20% as unseen test data. The hyperparameters of the proposed HMC approach are tuned to obtain the best results.

Fig 1. The architecture of the proposed approach for predicting microbe organisms.

It involves data collection, exploratory data analysis, model training and testing.

https://doi.org/10.1371/journal.pone.0284522.g001

Microbe organisms data

The research utilizes the data of different living microforms of life that is publicly available at Kaggle [24] and used in a DPhi challenge [25]. The data contains ten different living microforms of life, which are Volvox, Spirogyra, Yeast, Pithophora, Penicillium, Raizopus, Protozoa, Aspergillus sp, Ulothrix, and Diatom. The description of the different features, types, and counts are given in Table 2. The utilized dataset features are based on the 21368 microscopic object images of different living microforms of life. The dataset is based on the 25 microscopic object features which are used to predict microbe organisms in our research study.

Table 2. Description of dataset features.

https://doi.org/10.1371/journal.pone.0284522.t002

Microbe exploratory data analysis

MEDA is applied to the research dataset to determine patterns and valuable insights in predicting microbe organisms. The graph and chart-based MEDA are performed, representing the relations of dataset features.

The bar chart-based frequency analysis of the microorganism target labels is shown in Fig 2. The frequency of each label is represented on the chart’s y-axis. The analysis demonstrates that the target label Ulothrix contains 5194, Volvox contains 3024, Protozoa contains 2721, Aspergillus sp contains 2721, Yeast contains 2520, Raizopus contains 1786, Diatom contains 1273, Pithophora contains 945, Penicillium contains 756, and Spirogyra contains 428 instances. The Ulothrix class thus contains the highest number of instances, and Spirogyra contains the lowest.

Fig 2. The bar chart-based frequency analysis of each microorganism target label showing the number of samples in each class.

https://doi.org/10.1371/journal.pone.0284522.g002
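To make the class imbalance reported for Fig 2 concrete, the counts quoted in the text can be tabulated as shares of the whole dataset. The sketch below uses only the numbers stated above; the variable names are illustrative and not part of the study.

```python
# Class counts as reported in the text accompanying Fig 2.
class_counts = {
    "Ulothrix": 5194,
    "Volvox": 3024,
    "Protozoa": 2721,
    "Aspergillus sp": 2721,
    "Yeast": 2520,
    "Raizopus": 1786,
    "Diatom": 1273,
    "Pithophora": 945,
    "Penicillium": 756,
    "Spirogyra": 428,
}

total = sum(class_counts.values())
shares = {name: count / total for name, count in class_counts.items()}
largest = max(class_counts, key=class_counts.get)
smallest = min(class_counts, key=class_counts.get)
imbalance_ratio = class_counts[largest] / class_counts[smallest]

for name, count in class_counts.items():
    print(f"{name:15s} {count:6d} {shares[name]:6.1%}")
print(f"total = {total}, largest/smallest ratio = {imbalance_ratio:.1f}")
```

The counts sum to the 21368 samples mentioned in the dataset description, and the largest class is roughly twelve times the size of the smallest, which is relevant when interpreting per-class scores.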

The statistical correlation analysis is visualized in Fig 3. The correlation is utilized to determine the linear relationship between two dataset features and analyze their association. This explains how features are related to each other. The analysis demonstrates that the features Extrema, BoundingBox, ConvexHull, and Centroid have high correlation values. The features MajorAxisLength, MinorAxisLength, Perimeter, and ConvexArea also show good correlation. The features Solidity, Extent, and EulerNumber have negative correlation values.

Fig 3. The correlation analysis of employed dataset features indicating the importance of features regarding the target class.

https://doi.org/10.1371/journal.pone.0284522.g003
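The correlation values discussed for Fig 3 measure linear association between pairs of features. A minimal sketch of the Pearson coefficient is given below, assuming that this is the correlation measure used; the toy values are hypothetical and are not taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values, chosen only to show both signs of correlation.
major_axis = [2.0, 4.0, 6.0, 8.0, 10.0]
perimeter = [5.1, 9.8, 15.2, 19.9, 25.0]  # grows with the major axis
solidity = [9.0, 7.5, 6.1, 4.4, 3.0]      # shrinks as the major axis grows

r_pos = pearson(major_axis, perimeter)
r_neg = pearson(major_axis, solidity)
print(f"positive example: {r_pos:.3f}, negative example: {r_neg:.3f}")
```

A value near +1 indicates that two features rise together, while a value near -1 indicates that one falls as the other rises, which is how the negative entries for features such as Solidity would be read.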

The scatter plot-based analysis of different data features is shown in Figs 4 and 5. The scatter plot is primarily utilized to determine the relationships between two dataset features. The dot values in the scatter plot represent the patterns involved in the prediction process. The purpose of the scatter plot is to observe the relation when the values of features change. The scatter plot analysis of features Solidity and Eccentricity along with the target class is visualized in Fig 4. The analysis demonstrates that the microorganisms have the Solidity and Eccentricity feature values in the range of 5 to 30. The analysis shows that the Raizopus microbe is identified when the Solidity values are above 15 and less than 20. All other microbes are identified when the Solidity values are less than 18 and Eccentricity values are above 5. There is a high chance of microorganism detection when the Eccentricity values are above 15 and the Solidity values are above 3.

Fig 4. The scatter plot showing the distribution of features regarding Solidity and Eccentricity along with the target class.

https://doi.org/10.1371/journal.pone.0284522.g004

Fig 5. The scatter plot showing the distribution of features regarding Extent and Orientation along with the target class.

https://doi.org/10.1371/journal.pone.0284522.g005

The scatter plot analysis of features Extent and Orientation along with the target class is visualized in Fig 5. The analysis demonstrates that the microorganisms have Extent feature values in the range of 0 to 20 and Orientation feature values in the range of 0 to 30. The analysis shows that the microorganisms are identified when the Extent values are between 0 and 15. There is a high chance of Raizopus microbe detection when the Extent values are above 10.

Label encoding and data splitting

We transform the dataset target class labels into machine-readable numeric form using the label encoding technique. The label encoder module from scikit-learn is utilized for the encoding process. The module encodes the target labels with values between 0 and the number of classes minus one. Data splitting is a crucial part of machine learning, applied to divide the data into training and testing sets. We split the microbe dataset into training and test sets with an 80–20 ratio.
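The encoding and splitting steps can be sketched with scikit-learn as follows. The toy rows, the variable names, and the `random_state` value are assumptions introduced for illustration; the study does not report a random seed or whether the split was stratified.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy rows; the real dataset has 25 features per object.
features = [[float(i), float(i % 3)] for i in range(20)]
labels = ["Yeast", "Volvox", "Diatom", "Ulothrix"] * 5

# LabelEncoder maps the sorted class names to integers 0 .. n_classes - 1.
encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# 80-20 train-test split; the seed is an assumption for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    features, encoded, test_size=0.2, random_state=42
)

print(list(encoder.classes_))
print(len(X_train), len(X_test))
```

The fitted encoder can later map predictions back to class names with `encoder.inverse_transform`.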

Proposed hybrid classifier

A novel HMC is proposed based on a hybrid of DTC and ETC for predicting microbe organisms. The architecture of the proposed HMC approach is shown in Fig 6. The data of different living microforms of life is input to both the DTC and ETC models. The DTC and ETC are combined to predict the microbe organisms. The class with the majority of votes from the individual predictions is taken as the final prediction, that is, ‘hard’ voting is used.

Fig 6. The architecture of the proposed HMC approach showing the voting process for the hybrid classifier.

https://doi.org/10.1371/journal.pone.0284522.g006

The proposed hybrid classifier is based on the combination of multiple supervised classifiers. The key objective of the proposed ensemble method is to reduce variance and bias, thus enhancing the prediction performance. Ensemble hybrid methods have been shown to perform better where the dataset has a higher number of features. The predictions of each classifier are passed to the voting classifier, which predicts the output class based on majority voting. Voting can improve the prediction performance because an error made by one classifier may be corrected by the other.
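A hard-voting hybrid of this kind can be sketched with scikit-learn's `VotingClassifier`. This is an illustration under stated assumptions, not the authors' implementation: the data is synthetic, the hyperparameters are library defaults rather than the tuned values, and ETC is assumed to correspond to `ExtraTreesClassifier`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 25 features and 10 classes mirror the
# shape of the microbe dataset, but the values are random, not real measurements.
X, y = make_classification(
    n_samples=2000, n_features=25, n_informative=15,
    n_classes=10, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hybrid of a decision tree and an extra-trees ensemble with hard voting.
hmc = VotingClassifier(
    estimators=[
        ("dtc", DecisionTreeClassifier(random_state=0)),
        ("etc", ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ],
    voting="hard",
)
hmc.fit(X_train, y_train)
accuracy = hmc.score(X_test, y_test)
print(f"test accuracy on synthetic data: {accuracy:.3f}")
```

One design point worth noting: with only two voters, any disagreement is a tie, and scikit-learn resolves ties in hard voting by picking the class that comes first in sorted label order. The accuracy printed here reflects the synthetic data and says nothing about the results reported in the study.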

Employed machine learning models

The applied machine learning and deep learning models for predicting microbe organisms in microbiology are analyzed in this section.

The DTC is a supervised machine learning model commonly used to solve classification problems [26]. The DTC follows a tree structure to make a decision on data samples. The leaf nodes in the tree contain the target class labels, the tree branches represent the decision rules, and the internal nodes contain the data attributes. The Gini index is mainly utilized in DTC to select the best data attributes during tree construction, as expressed in Eq 1:

$$\text{Gini} = 1 - \sum_{i=1}^{n} p_i^{2} \tag{1}$$

where p_i represents the probability of class i among the n classes at a node.
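As a small worked example of the Gini index, the sketch below computes the standard Gini impurity, one minus the sum of squared class proportions, for the labels at a single node. The node contents are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_i^2) of the class labels at one tree node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


# Hypothetical node contents.
pure_node = ["Yeast"] * 8
mixed_node = ["Yeast"] * 4 + ["Volvox"] * 4

print(gini_impurity(pure_node))   # 0.0
print(gini_impurity(mixed_node))  # 0.5
```

A split is preferred when it produces child nodes with lower impurity, so a node containing a single class (impurity 0) is the ideal outcome.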

RFC is an ensemble learning model which utilizes decision trees [27]. The RFC model works similarly to the DTC model. In the RFC model, multiple decision trees are created for prediction tasks instead of a single tree. The prediction outcomes from the multiple trees are combined to make the final prediction. RFC helps to improve the prediction accuracy and control model over-fitting.

ETC is also an ensemble learning method widely used for the classification task [28]. The bagged decision trees are constructed in the ETC model for prediction. ETC is similar to the RFC model. The main difference lies in how the trees of the forest are constructed: ETC draws split thresholds at random instead of searching for the best threshold for each candidate feature. The predictions from multiple de-correlated decision trees are aggregated to make the final prediction.

GBC is an ensemble learning model [29]. The GBC model combines multiple weak classifiers into a robust classifier to obtain high accuracy. During training, each new weak classifier is fitted to reduce the errors of the classifiers built before it. The gradient boosting is based on decision trees.

KNN is a non-parametric learning classifier mainly used for classification and regression problems [30]. The KNN model groups data points that have similar properties. The Euclidean distance metric is utilized to measure the similarity between data points. For each data point, the Euclidean distances to the other data points are computed, and the class is determined from the k points nearest to it.
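The nearest-neighbor idea can be illustrated with a short pure-Python sketch. The 2-D points, the labels, and the choice of k = 3 are hypothetical and serve only to show the mechanism.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D toy points, not dataset values.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["Yeast", "Yeast", "Yeast", "Diatom", "Diatom", "Diatom"]

print(knn_predict(points, labels, (1.1, 1.0)))  # Yeast
print(knn_predict(points, labels, (5.1, 5.0)))  # Diatom
```

Because the method stores the training data and defers all computation to prediction time, its training time is very short, which is consistent with KNN having the smallest training time among the compared models.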

LR is another widely used supervised method primarily used to solve classification problems [31]. The LR model determines the relationship between the independent and dependent variables. LR is a statistical method that utilizes a logistic sigmoid function for classification tasks. The logistic sigmoid function yields probability values between zero and one. Eq 2 represents the prediction process of the LR model:

$$y = \frac{1}{1 + e^{-(b_0 + b_1 x)}} \tag{2}$$

where y is the predicted probability of the class, b₀ is the bias term, and b₁ is the coefficient for input x.
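For illustration, the logistic sigmoid applied to a linear score b₀ + b₁x can be evaluated directly. The coefficients below are hypothetical, and the sketch covers only the binary, single-feature case rather than the ten-class setting of the study.

```python
import math


def logistic_probability(x, b0, b1):
    """Sigmoid of the linear score b0 + b1 * x, a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


# Hypothetical coefficients chosen only for illustration.
b0, b1 = -4.0, 2.0

for x in (0.0, 2.0, 4.0):
    p = logistic_probability(x, b0, b1)
    print(f"x={x:.1f}  p={p:.3f}  class={int(p >= 0.5)}")
```

With these coefficients the linear score is zero at x = 2, so the predicted probability there is exactly 0.5, which marks the decision boundary between the two classes.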

SVM is a supervised method that utilizes support vectors to classify the data points [32]. The primary motive of the SVM model is to determine the best-fit decision boundary. The best-fit decision boundary classifies the n-dimensional feature space data into the target labels. The best-fit decision boundary is also known as the hyperplane [33]. The error is minimized by the iterative process of finding the best-fit decision boundary. SVM selects the extreme points, called support vectors, to create the hyperplane. The best-fit hyperplane is represented in Eq 3:

$$w^{T} x + b = 0 \tag{3}$$

where w represents the weight vector, x represents the input features, and b indicates the bias value.
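The role of the hyperplane can be shown by evaluating the signed score w·x + b for a few points: the sign tells which side of the boundary a point lies on. The weights, bias, and points below are hypothetical.

```python
def decision_value(w, x, b):
    """Signed score w . x + b; it is zero exactly on the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical weights and bias, chosen only for illustration.
w = [1.0, -1.0]
b = 0.5

for x in ([2.0, 1.0], [0.0, 3.0], [1.0, 1.5]):
    score = decision_value(w, x, b)
    if score > 0:
        side = "positive side"
    elif score < 0:
        side = "negative side"
    else:
        side = "on the hyperplane"
    print(x, score, side)
```

In a trained SVM, the weights and bias are chosen so that the margin between the hyperplane and the nearest training points of each class is as large as possible.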

MLP is a feedforward artificial neural network-based supervised learning model [34]. The artificial neural network uses many representation layers to process the data. The model layers contain neuron units in the network. The layers have the graph representation between the input and output layers. The backpropagation technique [35] is utilized in the MLP model to train the network.

LSTM model is a recurrent neural network known best for learning long-term sequences [36]. The primary motive behind the LSTM model is to remember the long sequences for a long period. The LSTM model contains three gates for processing: input gate, output gate, and forget gate. The LSTM model has a high number of training parameters that use high memory.

The GRU model is a recurrent neural network [37]. It contains two gates, the update gate and the reset gate, which drive its working mechanism. The GRU model has less complexity than the LSTM model due to a smaller number of gates. The GRU model has fewer training parameters, so it uses less memory and executes faster. Both the GRU and LSTM models help to overcome the vanishing gradient problem.

The hyperparameter tuning and optimization techniques [38] are based on an iterative process of training and evaluating learning models. In the iterative tuning process, the parameters with which the learning model gives the best accuracy scores are considered the best-fit hyperparameters. The best-fit hyperparameters result in higher accuracy scores for predicting the microbe organisms in this study. The final selected hyperparameters for the learning models are given in Table 3.

Table 3. The hyperparameters of employed learning techniques.

https://doi.org/10.1371/journal.pone.0284522.t003
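An iterative tuning loop of the kind described above can be sketched with scikit-learn's `GridSearchCV`. The study does not name the search procedure, so the use of grid search, the parameter grid, and the stand-in Iris data are all assumptions; the values in Table 3 are not reproduced here.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): the Iris dataset, not the microbe dataset.
X, y = load_iris(return_X_y=True)

# Hypothetical search grid for a decision tree.
param_grid = {
    "max_depth": [2, 4, 8, None],
    "criterion": ["gini", "entropy"],
}

# Each parameter combination is trained and scored with 5-fold cross-validation.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

The combination with the highest mean validation accuracy is kept as the best-fit setting, which mirrors the selection rule stated in the text.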

Results and discussions

Results and discussions are presented in this section. The results of all the machine learning and deep learning models are compared. The performance evaluation is based on accuracy, error rate, precision, recall, F1 score, Cohen Kappa, and the geometric mean score.
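Some of these metrics are less familiar than accuracy, so a small pure-Python sketch may help. It assumes the usual definitions: Cohen Kappa as observed agreement corrected for chance agreement, and the multi-class geometric mean as the geometric mean of per-class recall. The study does not spell out its exact formulas, and the label vectors below are hypothetical.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """(p_o - p_e) / (1 - p_e): agreement corrected for chance agreement."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def geometric_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed multi-class definition)."""
    recalls = []
    for c in set(y_true):
        hits = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        recalls.append(hits / y_true.count(c))
    return math.prod(recalls) ** (1 / len(recalls))


# Hypothetical labels for three classes.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 2, 0]

acc = accuracy(y_true, y_pred)
print(f"accuracy       = {acc:.3f}")
print(f"error rate     = {1 - acc:.3f}")
print(f"Cohen Kappa    = {cohen_kappa(y_true, y_pred):.3f}")
print(f"geometric mean = {geometric_mean(y_true, y_pred):.3f}")
```

Unlike plain accuracy, the geometric mean drops sharply if any single class has poor recall, which makes it informative for imbalanced data such as the class distribution reported earlier.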

Experimental setup

The Python 3.0 programming tool [39] is utilized to conduct all experiments. The modules Keras version 2.8.0 and TensorFlow version 2.8.2 are used for building deep learning models. Machine learning models are built using the Scikit-learn module version 1.0.2. The platform with 13GB RAM and a 2.20GHz CPU is used to complete the experiments.

Results of machine learning and deep learning models

Experimental results of all the models are given in Table 4. Results indicate that the proposed approach obtains the best results with 98% accuracy and geometric mean, 97% precision and Cohen Kappa, and 96% recall and F1 scores. Regarding the training time, the proposed approach takes 1.386 seconds, which is higher than only KNN, ETC, and DTC, which take 0.028, 0.522, and 1.242 seconds, respectively.

Table 4. Performance analysis of employed machine and deep learning techniques with the proposed technique.

https://doi.org/10.1371/journal.pone.0284522.t004

The second best accuracy is obtained jointly by DTC, RFC, and ETC, which obtain 97% accuracy, as shown in Fig 7. Machine learning models tend to perform better on average, except for SVM and LR, which obtain 41% and 44% accuracy, respectively. Deep learning models show poor performance and obtain the lowest accuracy scores of 30% and 34% for the LSTM and GRU models, respectively. Due to the small dataset, the deep learning models cannot get a good fit and show poor results.

Fig 7. Comparative analysis of employed machine learning and deep learning models in terms of accuracy and recall.

https://doi.org/10.1371/journal.pone.0284522.g007

The pie chart-based error rate comparative analysis of employed learning techniques is visualized in Fig 8. The analysis demonstrates that the proposed approach has the minimum error rate indicating high-performance accuracy scores for the microbe organism predictions. Based on this analysis, the proposed approach has a 0.7% error rate. The high error rate of 22% is achieved by the LSTM model, which indicates the low accuracy scores. The analysis shows that DTC and RFC have the same error rate of 0.8%, indicating maximum accuracy scores.

Fig 8. Comparative analysis of employed machine learning and deep learning models in terms of prediction error rate.

https://doi.org/10.1371/journal.pone.0284522.g008

The classification report for the individual categories is given in Table 5. The analysis demonstrates that the organism categories Penicillium and Raizopus achieved a 100% score for all performance metrics. The categories Protozoa and Raizopus achieved 100% scores for recall and F1 score measures, respectively. The average performance metric scores across all categories are between 96% and 97%. This analysis validates the proposed model results and demonstrates the high accuracy scores for the microbe organism predictions.

Table 5. Individual class-wise report of the proposed approach.

https://doi.org/10.1371/journal.pone.0284522.t005

Results of k-fold cross-validation

The k-fold cross-validation results of the employed learning techniques are given in Table 6. The 10-fold cross-validation results demonstrate that the proposed approach achieves a high accuracy score of 98%. The standard deviation score of the proposed approach is ±0.0033, which is the minimum compared to other techniques. The lowest accuracy score is achieved by the SVM technique, which is 24% for 10-fold cross-validation. This analysis validates that the proposed model can provide generalized results for predicting microbe organisms.

Table 6. K-fold cross-validation results of employed models.

https://doi.org/10.1371/journal.pone.0284522.t006
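A 10-fold cross-validation of a hard-voting DTC and ETC hybrid could be run along the following lines. The data is synthetic, the hyperparameters are not the tuned values from the study, and the stratified, shuffled folds with a fixed seed are assumptions, since the text specifies only that k-fold cross-validation with ten folds was used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption), shaped like the microbe dataset.
X, y = make_classification(
    n_samples=1000, n_features=25, n_informative=15,
    n_classes=10, n_clusters_per_class=1, random_state=0,
)

model = VotingClassifier(
    estimators=[
        ("dtc", DecisionTreeClassifier(random_state=0)),
        ("etc", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ],
    voting="hard",
)

# Ten folds; each fold serves once as the held-out validation set.
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=folds, scoring="accuracy")

print(f"mean accuracy = {scores.mean():.3f} +/- {scores.std():.4f}")
```

The mean and standard deviation across folds correspond to the two quantities reported per model in Table 6; a small standard deviation suggests that performance does not depend strongly on which samples land in the held-out fold.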

Comparison with state-of-the-art approaches

The comparative performance analysis against other state-of-the-art studies is given in Table 7. The state-of-the-art studies from 2019 to 2022 are considered. These studies employ different models like RF, logit boost, KNN, and GRU. For a fair comparison, the models are implemented on the dataset used in this study. Accuracy, recall, and geometric mean scores are utilized for comparison. The analysis demonstrates that the proposed approach outperforms the state-of-the-art studies with high accuracy for predicting microbe organisms.

Table 7. Performance analysis of the proposed approach with state-of-the-art studies.

https://doi.org/10.1371/journal.pone.0284522.t007

Discussion

The prediction of microbe organisms using the data of different living microforms of life is presented in this study. An ensemble method based on a hybrid of DTC and ETC techniques is used for the prediction task. Experiments are performed using many machine learning and deep learning models for performance comparison, namely DTC, RFC, LR, KNN, GBC, ETC, SVM, MLP, LSTM, and GRU. These models are optimized regarding different hyperparameters to obtain the best results. For performance analysis, Cohen Kappa and geometric mean are used in addition to error rate, accuracy, recall, precision, and F1 score. Moreover, training time is also used to estimate the computational complexity of models. Results reveal that DTC, RFC, and ETC obtain the best results among machine learning models with moderate training time. On the other hand, deep learning models show poor performance and have a higher training time. The proposed approach obtains the best performance compared to both machine learning and deep learning models with 98% accuracy and geometric mean each. In addition, its error rate of 0.024 is also the lowest among all models. K-fold cross-validation supports the robustness of the proposed approach. Similarly, performance comparison with existing state-of-the-art studies shows that the results from the proposed approach are superior. The research study helps microbiologists identify different types of microbe organisms with high accuracy.

Conclusions

The human body contains millions of microbe organisms that carry out both positive and negative activities. Microbe organisms can cause different infections and diseases and their prediction can be vital for the early detection of diseases. This study proposes an automatic approach for the prediction of ten types of microbe organisms, namely Aspergillus sp, Diatom, Penicillium, Pithophora, Protozoa, Raizopus, Spirogyra, Ulothrix, Volvox, and Yeast. The proposed hybrid approach, comprising DTC and ETC, shows better accuracy than the employed machine learning and deep learning models and obtains a 98% accuracy. Similarly, the geometric mean, recall, precision, and F1 scores are the best among all the models and it obtains the lowest error of 0.024. K-fold cross-validation and performance comparison with state-of-the-art methods further validate its superior performance. Owing to the poor performance of deep learning models, we intend to incorporate a larger dataset in the future. Similarly, using transfer learning and multi-class data balancing is also intended.

Supporting information

S1 Dataset.

https://doi.org/10.1371/journal.pone.0284522.s001

(ZIP)

References

1. Horve PF, Lloyd S, Mhuireach GA, Dietz L, Fretz M, MacCrone G, et al. Building upon current knowledge and techniques of indoor microbiology to construct the next era of theory into microorganisms, health, and the built environment. Journal of Exposure Science & Environmental Epidemiology. 2020;30(2):219–235. pmid:31308484
2. Hou J, Pugazhendhi A, Phuong TN, Thanh NC, Brindhadevi K, Velu G, et al. Plant resistance to disease: Using biochar to inhibit harmful microbes and absorb nutrients. Environmental Research. 2022; p. 113883. pmid:35835163
3. D’Abramo F, Neumeyer S. A historical and political epistemology of microbes. Centaurus. 2020;62(2):321–330. pmid:32834061
4. Cao J, Feng Y, Lin X, Wang J. A beneficial role of arbuscular mycorrhizal fungi in influencing the effects of silver nanoparticles on plant-microbe systems in a soil matrix. Environmental Science and Pollution Research. 2020;27(11):11782–11796. pmid:31975001
5. Gawryluk RM, Stairs CW. Diversity of electron transport chains in anaerobic protists. Biochimica et Biophysica Acta (BBA)-Bioenergetics. 2021;1862(1):148334. pmid:33159845
6. Caruana JC, Walper SA. Bacterial membrane vesicles as mediators of microbe–microbe and microbe–host community interactions. Frontiers in microbiology. 2020;11:432. pmid:32265873
7. Fisch D, Yakimovich A, Clough B, Mercer J, Frickel EM. Image-Based Quantitation of Host Cell–Toxoplasma gondii Interplay Using HRMAn: A Host Response to Microbe Analysis Pipeline. In: Toxoplasma gondii. Springer; 2020. p. 411–433.
8. Joice Cordy R. Mining the human host metabolome toward an improved understanding of malaria transmission. Frontiers in Microbiology. 2020;11:164. pmid:32117175
9. Mboera LE, Kishamawe C, Kimario E, Rumisha SF. Mortality patterns of toxoplasmosis and its comorbidities in Tanzania: a 10-year retrospective hospital-based survey. Frontiers in Public Health. 2019;7:25. pmid:30838195
10. Malaria; 2022. Available from: https://www.who.int/news-room/fact-sheets/detail/malaria.
11. Montoya OLQ, Paniagua JG. From artificial intelligence to deep learning in bio-medical applications. In: Deep Learners and Deep Learner Descriptors For Medical Applications. Springer; 2020. p. 253–284.
12. Gore JC. Artificial intelligence in medical imaging; 2020.
13. Zhang Y, Jiang H, Ye T, Juhas M. Deep learning for imaging and detection of microorganisms. Trends in Microbiology. 2021;29(7):569–572. pmid:33531192
14. Maruthamuthu MK, Raffiee AH, De Oliveira DM, Ardekani AM, Verma MS. Raman spectra-based deep learning: A tool to identify microbial contamination. MicrobiologyOpen. 2020;9(11):e1122. pmid:33063423
15. Corbin CK, Sung L, Chattopadhyay A, Noshad M, Chang A, Deresinksi S, et al. Personalized antibiograms for machine learning driven antibiotic selection. Communications medicine. 2022;2(1):1–14. pmid:35603264
16. Pawłowski J, Majchrowska S, Golan T. Generation of microbial colonies dataset with deep learning style transfer. Scientific Reports. 2022;12(1):1–12. pmid:35338253
17. Wei J, Suriawinata A, Ren B, Liu X, Lisovsky M, Vaickus L, et al. A petri dish for histopathology image analysis. In: International Conference on Artificial Intelligence in Medicine. Springer; 2021. p. 11–24.
18. Delavy M, Cerutti L, Croxatto A, Prod’hom G, Sanglard D, Greub G, et al. Machine learning approach for Candida albicans fluconazole resistance detection using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Frontiers in microbiology. 2020;10:3000. pmid:32010083
19. Huang TS, Lee SSJ, Lee CC, Chang FC. Detection of carbapenem-resistant Klebsiella pneumoniae on the basis of matrix-assisted laser desorption ionization time-of-flight mass spectrometry by using supervised machine learning approach. PLoS One. 2020;15(2):e0228459. pmid:32027671
20. Wang HY, Lee TY, Tseng YJ, Liu TP, Huang KY, Chang YT, et al. A new scheme for strain typing of methicillin-resistant Staphylococcus aureus on the basis of matrix-assisted laser desorption ionization time-of-flight mass spectrometry by using machine learning approach. PloS one. 2018;13(3):e0194289. pmid:29534106
21. Wang HY, Li WC, Huang KY, Chung CR, Horng JT, Hsu JF, et al. Rapid classification of group B Streptococcus serotypes based on matrix-assisted laser desorption ionization-time of flight mass spectrometry and machine learning techniques. BMC bioinformatics. 2019;20(19):1–17.
22. Srinivasu PN, SivaSai JG, Ijaz MF, Bhoi AK, Kim W, Kang JJ. Classification of skin disease using deep learning neural networks with MobileNet V2 and LSTM. Sensors. 2021;21(8):2852. pmid:33919583
23. El-Sappagh S, Ali F, Abuhmed T, Singh J, Alonso JM. Automatic detection of Alzheimer’s disease progression: An efficient information fusion approach with heterogeneous ensemble classifiers. Neurocomputing. 2022;512:203–224.
24. SAYAN SAHA. Microbes Dataset | Kaggle; 2022. Available from: https://www.kaggle.com/datasets/sayansh001/microbes-dataset.
25. DPhi. Data sprint 71—Microbes Classification | DPhi; 2022. Available from: https://dphi.tech/challenges/data-sprint-71-microbes-classification/207/overview/about.
26. Mahela OP, Shaik AG, Khan B, Mahla R, Alhelou HH. Recognition of complex power quality disturbances using S-transform based ruled decision tree. IEEE Access. 2020;8:173530–173547.
27. Liu K, Hu X, Zhou H, Tong L, Widanage WD, Marco J. Feature analyses and modeling of lithium-ion battery manufacturing based on random forest classification. IEEE/ASME Transactions on Mechatronics. 2021;26(6):2944–2955.
28. Raza A, Munir K, Almutairi M, Younas F, Fareed MMS. Predicting Employee Attrition Using Machine Learning Approaches. Applied Sciences. 2022;12(13):6424.
29. Khan MSI, Islam N, Uddin J, Islam S, Nasir MK. Water quality prediction and classification based on principal component regression and gradient boosting classifier approach. Journal of King Saud University-Computer and Information Sciences. 2021.
30. Kumbure MM, Luukka P, Collan M. A new fuzzy k-nearest neighbor classifier based on the Bonferroni mean. Pattern Recognition Letters. 2020;140:172–178.
31. Manoharan H, Teekaraman Y, Kirpichnikova I, Kuppusamy R, Nikolovski S, Baghaee HR. Smart grid monitoring by wireless sensors using binary logistic regression. Energies. 2020;13(15):3974.
32. Leong W, Kelani R, Ahmad Z. Prediction of air pollution index (API) using support vector machine (SVM). Journal of Environmental Chemical Engineering. 2020;8(3):103208.
33. Hao PY, Kung CF, Chang CY, Ou JB. Predicting stock price trends based on financial news articles and using a novel twin support vector machine with fuzzy hyperplane. Applied Soft Computing. 2021;98:106806.
34. Zheng H, Wang G, Li X. Swin-MLP: a strawberry appearance quality identification method by Swin Transformer and multi-layer perceptron. Journal of Food Measurement and Characterization. 2022; p. 1–12.
35. Wright LG, Onodera T, Stein MM, Wang T, Schachter DT, Hu Z, et al. Deep physical neural networks trained with backpropagation. Nature. 2022;601(7894):549–555. pmid:35082422
36. Xayasouk T, Lee H, Lee G. Air pollution prediction using long short-term memory (LSTM) and deep autoencoder (DAE) models. Sustainability. 2020;12(6):2570.
37. Que Z, Jin X, Xu Z. Remaining useful life prediction for bearings based on a gated recurrent unit. IEEE Transactions on Instrumentation and Measurement. 2021;70:1–11.
38. Agrawal T. Hyperparameter Optimization in Machine Learning: Make Your Machine Learning and Deep Learning Models More Efficient. Springer; 2021.
39. Chandra Y, Jana A. Sentiment analysis using machine learning and deep learning. In: 2020 7th International Conference on Computing for Sustainable Global Development (INDIACom). IEEE; 2020. p. 1–4.
40. Ryan FJ. Application of machine learning techniques for creating urban microbial fingerprints. Biology direct. 2019;14(1):1–13. pmid:31420049
41. Thompson J, Johansen R, Dunbar J, Munsky B. Machine learning to predict microbial community functions: an analysis of dissolved organic carbon from litter decomposition. PLoS One. 2019;14(7):e0215502. pmid:31260460
42. Bang S, Yoo D, Kim SJ, Jhang S, Cho S, Kim H. Establishment and evaluation of prediction model for multiple disease classification based on gut microbial data. Scientific reports. 2019;9(1):1–9. pmid:31308384
43. Riekeles M, Schirmack J, Schulze-Makuch D. Machine learning algorithms applied to identify microbial species by their motility. Life. 2021;11(1):44. pmid:33445805
44. Shi H, Zhang S. Accurate Prediction of Anti-hypertensive Peptides Based on Convolutional Neural Network and Gated Recurrent unit. Interdisciplinary Sciences: Computational Life Sciences. 2022; p. 1–16. pmid:35474167
45. Singh N, Bhatnagar S. Machine Learning for Prediction of Drug Targets in Microbe Associated Cardiovascular Diseases by Incorporating Host-pathogen Interaction Network Parameters. Molecular Informatics. 2022;41(3):2100115. pmid:34676983
