Abstract
Planktonic organisms including phyto-, zoo-, and mixoplankton are key components of aquatic ecosystems and respond quickly to changes in the environment, therefore their monitoring is vital to follow and understand these changes. Advances in imaging technology have enabled novel possibilities to study plankton populations, but the manual classification of images is time consuming and expert-based, making such an approach unsuitable for large-scale application and urging for automatic solutions for the analysis, especially recognizing the plankton species from images. Despite the extensive research done on automatic plankton recognition, the latest cutting-edge methods have not been widely adopted for operational use. In this paper, a comprehensive survey on existing solutions for automatic plankton recognition is presented. First, we identify the most notable challenges that make the development of plankton recognition systems difficult and restrict the deployment of these systems for operational use. Then, we provide a detailed description of solutions found in plankton recognition literature. Finally, we propose a workflow to identify the specific challenges in new datasets and the recommended approaches to address them. Many important challenges remain unsolved including the following: (1) the domain shift between the datasets hindering the development of an imaging instrument independent plankton recognition system, (2) the difficulty to identify and process the images of previously unseen classes and non-plankton particles, and (3) the uncertainty in expert annotations that affects the training of the machine learning models. To build harmonized instrument and location agnostic methods for operational purposes these challenges should be addressed in future research.
1 Introduction
Plankton, including phytoplankton, mixoplankton and zooplankton, is a fundamental component of aquatic ecosystems (Flynn et al. 2019; Glibert and Mitra 2022; Mitra et al. 2023). They form the basis of the food web and are essential for global biogeochemical cycles (Arrigo 2005; Hays et al. 2005). Plankton comprises a diverse array of life forms, which are associated with a variety of functions and possess strong interspecific associations (De Vargas et al. 2015). Aquatic ecosystems have been subjected to changes forced by climate and anthropogenic drivers, which already led to species loss affecting the provision of critical ecosystem services, such as water quality in some regions and productivity (Worm et al. 2006). In order to improve management practices of aquatic ecosystems, it is essential to understand functioning of planktonic communities, the distribution of different life forms and how those are affected by anthropogenic and climate changes (Rogers et al. 2022).
Phytoplankton blooms are observed when favorable conditions trigger algae growth and accumulation in the environment. Although blooms are part of natural productive cycles in aquatic ecosystems (e.g. the increased production during spring time in temperate systems), harmful blooms that are a nuisance to recreational use and even hazardous also occur (Anderson et al. 2019; Zohdi and Abbaspour 2019). Due to their importance and potential adverse effects, understanding blooms is essential, and efforts towards developing effective observation networks and predictive models have been made (Zhou et al. 2023). Blooms have traditionally been monitored by analyzing fixed samples under the microscope, but despite the detailed taxonomic information it provides, the high cost and time required by the method limit the number of analyzed samples (Zingone et al. 2015). Remote sensing has been employed to increase the coverage of bloom observations, showing that algal blooms have been expanding and intensifying in many coastal areas due to environmental changes (Dai et al. 2023), even though the methods yield limited taxonomic information. To understand plankton communities and to provide more accurate data for model validation, information at a higher temporal resolution is needed. Methods that can identify different species at high speed, such as imaging, can therefore enhance our knowledge of the dynamics of bloom-forming species (Kraft et al. 2021).
Studying and monitoring plankton is hindered by their microscopic size, fast turnover rates and close interaction with the multiscale hydrodynamics (Benfield et al. 2007). Recent advances in plankton imaging systems have led to their popularization and integration into monitoring programs, collectively accumulating information on plankton systems and simultaneously gathering massive amounts of image data (Benfield et al. 2007; Cowen and Guigand 2008; Lombard et al. 2019; Olson and Sosik 2007; Picheral et al. 2010). The major constraint to the use of these datasets lies in the expert annotation of plankton images, which is expensive, time-consuming, and error-prone. To fully benefit from the technological development and to properly explore the gathered information, there is a clear need for automated analysis methods. During recent years, significant research effort has been put into exploring and developing automated methods for performing plankton recognition based on computer vision techniques and machine learning methods (e.g. Lumini and Nanni 2019a; Orenstein and Beijbom 2017).
The research on automatic plankton image recognition has matured from early works based on hand-engineered image features combined with traditional classifiers such as support vector machine (SVM) (Cortes and Vapnik 1995) and random decision forest (RDF) (Ho 1995) (see e.g. Tang et al. 1998; Sosik and Olson 2007) to feature learning-based approaches utilizing deep learning and especially convolutional neural networks (CNNs) (Lee et al. 2016; Orenstein and Beijbom 2017; Lumini and Nanni 2019a; Kloster et al. 2020). Various custom methods and modifications to general-purpose techniques have been proposed to address the special characteristics of plankton image data. However, despite the high recognition accuracies reported in the literature, these methods have not been widely adopted for operational use. Many instrument users do not possess the computational skills and/or the resources required for implementing custom methods for image recognition and often rely on the default methods that come with the instruments, which typically follow rather simple approaches and do not fully exploit the latest advances in computer vision and machine learning. Deploying deep learning-based methods in new environments often requires notable amounts of labeled training data and expert knowledge, whereas publicly available feature engineering-based plankton recognition libraries are accessible to non-experts.
Some survey papers on more general microorganism recognition, as well as utilizing machine learning for marine ecology already exist. Zhang et al. (2022) presented a review of machine learning approaches for microorganism image analysis including history, trends, and applications. The paper covers the segmentation, clustering, and classification of various types of microorganism data. Rani et al. (2021) described and compared existing microorganism recognition methods. While the challenges are briefly discussed, the discussion remains on a general level and does not go deeply into the solutions. Li et al. (2019a) provided a review on microorganism recognition for various different application domains with the focus on traditional feature engineering approaches. The survey by Goodwin et al. (2022) covers an even larger scope by addressing the utilization of deep learning methods in marine research. A similar survey was provided by Mittal et al. (2022), who presented existing methods on underwater image classification including fish, plankton, coral reefs, seagrass, and submarines. Bachimanchi et al. (2023) present a brief survey on deep learning methods for data analysis in plankton ecology including recognition, tracking, and biomass estimation. Irisson et al. (2022) provided a plankton recognition review from the application (aquatic research) point-of-view. They present a rather compact survey of the machine learning methods but provide several insights on utilizing machine learning in solving various application-related research questions. Luo et al. (2021b) considered plankton analysis using imaging flow cytometry. In addition to the different imaging technologies, also automatic image analysis methods are reviewed. 
Those earlier surveys either have a considerably wider scope, covering various machine learning tasks and organisms and therefore not focusing on the challenges specific to plankton recognition, or a narrower scope, concentrating on certain plankton imaging technologies and thus lacking a comprehensive review of plankton recognition in general.
In contrast to earlier surveys, we focus on the challenges that researchers commonly face when developing plankton recognition methods and on existing solutions to them. The main goals of this survey are (1) to provide an extensive guide on the available methods to address the challenging characteristics of plankton image data, and (2) to enumerate the challenges that remain unsolved, and which are the most beneficial directions for the future research on the topic. We identify and list the most notable challenges in automatic plankton recognition and provide detailed descriptions of the solutions found in the plankton recognition literature for each challenge. To the best of our knowledge, this is the first comprehensive survey focusing exclusively on plankton recognition and the specific challenges related to it.
The rest of the paper is organized as follows. In Sect. 2, plankton imaging, i.e., imaging instruments and existing image datasets, is reviewed. In Sect. 3, automatic plankton recognition, including feature engineering and CNNs, is discussed. In Sect. 4, the most notable challenges in plankton recognition are identified. In Sect. 5, the existing solutions for each challenge are described. Finally, the paper concludes with directions for future research in Sect. 6.
2 Plankton imaging
2.1 Imaging instruments
A fundamental understanding of how plankton species composition is regulated requires frequent and sustained observations. As plankton communities are diverse and dynamic, monitoring plankton is challenging. Different types of plankton imaging and analysis systems have been developed to identify and enumerate living (plankton) and non-living particles in natural waters (Benfield et al. 2007). Instruments designed for monitoring plankton communities are briefly discussed next (see review by Lombard et al. (2019) for more detailed information). The specifications of the main imaging instruments are summarized in Table 1.
Microscopy has been widely employed for analysis of plankton, with most of the standard monitoring of plankton organisms based on brightfield microscopy (Zingone et al. 2015). With the possibility of easy magnification change, microscopy can cover the whole size range of plankton. Added to the potential to be combined with other technologies, such as fluorescence, it can provide a flexible array for visualizing planktonic organisms. When combined with a digital camera, it can generate high quality images at relatively low operational costs, although the amount of images is limited in comparison to other devices. Imaging flow cytometry (IFC) combines fluidics, optical characterization and the imaging of cells/colonies. The Imaging FlowCytobot (IFCB) (Olson and Sosik 2007) and the CytoSense/Cytobuoy (Dubelaar et al. 1999), as well as simpler flow systems such as the FlowCam (Sieracki et al. 1998) and the ZooCAM (Colas et al. 2018) are among the imaging devices most frequently used within aquatic research. The IFCB is a fully automated, submersible instrument with built-in design features that routinely operate during deployments imaging each particle triggering the camera. The CytoSense, available either as a bench top or submersible versions, records forward scatter (FSC), side scatter (SSC) and multiple fluorescence signals of each particle, additionally it can image a subset of the analysed particles. Unlike the IFCB and CytoSense, the FlowCam does not have sheath fluid and it is not an automated in situ instrument. Particle detection in IFCB and CytoSense is triggered by one of the optical sensors (scatter or fluorescence), while FlowCam captures images of a field of view at regular intervals where particles can be identified (autotrigger mode). If the FlowCam is equipped with a laser, particle imaging can be triggered by fluorescence properties, such as the presence of chlorophyll-a. 
The imaging resolution of the IFCB and CytoSense is targeted at a size range from approximately larger nanoplankton to smaller mesoplankton. The targeted size range of the FlowCam varies according to the combination of flow cell and objective used. Instrument versions for imaging smaller and larger objects and organisms, FlowCam-Nano and FlowCam-Macro, respectively, are currently available, and their image capture is based on autotrigger. The ZooCAM uses an imaging principle similar to that of the FlowCam autotrigger.
For obtaining quantitative information on plankton larger than 100 μm, larger volumes of water need to be examined than is possible with IFC (Lombard et al. 2019). For imaging larger particles, different types of instruments utilizing slightly distinct techniques have been developed. Many instruments are commercially available, such as the In-situ Ichthyoplankton Imaging System (ISIIS) (Cowen and Guigand 2008), Continuous Plankton Imaging and Classification Sensor (CPICS) (Grossmann et al. 2015), ZooScan (Gorsky et al. 2010), Video Plankton Recorder (VPR) (Davis et al. 2005), Underwater Vision Profiler (UVP) (Picheral et al. 2010), and Lightframe On-sight Keyspecies Investigation (LOKI) (Schulz et al. 2010). These are mostly in situ imaging systems, and their operational principles and capabilities are reviewed by Lombard et al. (2019). Some instruments, such as the ZooCAM and the Prince William Sound Plankton Camera (PWSPC) (Campbell et al. 2020), have been developed for research purposes but are not commercially available.
Some of the more recent imaging instruments include the SPC (Scripps Plankton Camera) system (Orenstein et al. 2020b), a submersible Digital Holographic Camera (DHC) instrument for temporal and spatial plankton measurements (Dyomin et al. 2020, 2019), and its modification, the miniDHC (Dyomin et al. 2021, 2019). HOLOCAM (Nayak et al. 2018), HoloSea (Walcutt et al. 2020; MacNeil et al. 2021), and LISST-Holo are also utilized for underwater microscopy using digital holographic imaging (DHI). SPC utilizes an underwater dark-field imaging microscope combined with an onboard computer that allows real-time processing of the images, while the four latter instruments produce 3-D holograms of the imaged volume. The core principle of DHI is the optical interference phenomenon. A coherent light source, typically a laser, produces an optical interference pattern between the undeviated portion of the beam and the light diffracted by the object. This pattern is recorded on the sensor, and the holograms are then reconstructed with computer-based algorithms that include pre- and post-processing (Watson 2018). The main reasons for the emergence of DHI microscopy are a wide depth-of-field and field-of-view, i.e., a larger sampling volume, and a mechanically simpler optical configuration compared to lens-based devices (Walcutt et al. 2020; Watson 2018).
As the focus of this review is on image recognition, we stress that the instrument list is not exhaustive and focuses only on the most used methods found in the publications surveyed. Instruments not detailed here include underwater microscopes, scanning electron microscopy, and instruments with the capacity to image different fluorescent channels, such as the Amnis ImageStreamX Mk II Imaging Flow Cytometer (Cytek) and environmental high content fluorescence microscopy (Colin et al. 2017).
2.2 Publicly available image datasets
Publicly available image datasets are crucial for the development of automatic plankton recognition methods since the most labor intensive part of the process is to create large labeled training and testing datasets. The available datasets are also important for the traceability and comparability of the developed methods. There are several publicly available datasets to be utilized in the research for developing the machine learning methods of plankton recognition. However, it is not always clear from the reported results if there are differences in the classification performance among the classes, and if so, which classes perform better than others. This becomes relevant to understand if there are potential class-specific biases in classifiers, which could be associated with specific size classes and robustness of the organisms. The details of the publicly available and commonly used datasets are summarized in Table 2, and example images from the datasets are shown in Fig. 1. The most frequently used datasets are ZooScanNet (Elineau et al. 2018), Kaggle-Plankton (PlanktonSet-1.0) (Cowen et al. 2015), WHOI-Plankton (Orenstein et al. 2015; Sosik et al. 2021) and their manifold task specific subsets. They all comprise grayscale images collected with a single plankton imaging instrument. UVP5/MC dataset (Kiko and Simon-Martin 2020) consists of data collected in the EcoTaxa application (Picheral et al. 2017). A part of the UVP5/MC dataset has been annotated by an expert and part with an automated tool. More recently collected datasets include PMID2019 (Li et al. 2019b), miniPPlankton (Sun et al. 2020), DYB-PlanktonNet (Li et al. 2021b), Lake-Zooplankton (Kyathanahally et al. 2021a), and the one collected by Plonus et al. (2021b). They are acquired with modern imaging instruments and characterized by the presence of color and a higher resolution. SYKE-plankton_IFCB_2022 (Kraft et al. 2022c) and SYKE-plankton_IFCB_Utö_2021 (Kraft et al. 2022a) datasets consist of IFCB images of phytoplankton collected from the Baltic Sea. There are also references to some older commonly used plankton datasets that are not available any more. One example is Automatic Diatom Identification And Classification (ADIAC) database (Du Buf et al. 1999).
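The point about class-specific differences in performance can be illustrated with a minimal sketch. It assumes that per-image true and predicted labels are available; the class names and label lists below are hypothetical and not taken from any of the datasets listed above. A single overall accuracy can hide a poorly recognized rare class, whereas per-class recall makes the difference visible.

```python
from collections import Counter


def per_class_recall(true_labels, predicted_labels):
    """Return {class: fraction of its images that were classified correctly}."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {cls: correct[cls] / totals[cls] for cls in totals}


# Hypothetical labels: one common class and one rare class.
true_labels = ["copepod"] * 8 + ["larvacean"] * 2
predicted = ["copepod"] * 8 + ["copepod", "larvacean"]

overall = sum(t == p for t, p in zip(true_labels, predicted)) / len(true_labels)
recalls = per_class_recall(true_labels, predicted)

print(f"overall accuracy: {overall:.2f}")
for cls, value in sorted(recalls.items()):
    print(f"{cls}: recall {value:.2f}")
```

In this toy case the overall accuracy is dominated by the common class, while the rare class is recognized only half of the time.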
3 Automatic plankton recognition
3.1 Feature engineering
A traditional solution for image classification including plankton recognition is to divide the problem into two steps: image feature extraction and classification (Blaschko et al. 2005; Bueno et al. 2017; Ellen et al. 2015; Grosjean et al. 2004; Sosik and Olson 2007; Zetsche et al. 2014; Barsanti et al. 2021). Ideally, image features form a lower-dimensional representation of the image content that contains relevant information for the classification. The main challenge is to design and select good features that are both general and provide good discrimination between the classes. After feature extraction, the obtained feature vectors are used to train a classifier that can then classify unseen images. The most commonly used classifiers for plankton recognition are support vector machine (SVM) (Bernhard et al. 1992; Cortes and Vapnik 1995) and random decision forest (RDF) (Ho 1995). SVM in its simplest form is a binary linear classifier that finds the separating hyperplane in the feature space that maximizes the margin between the two classes. It can be extended to the multi-class case, for example, by utilizing multiple binary classifiers, and to non-linear classification by using the kernel trick. The RDF is a widely used classification method that is based on the observation that combining several classifiers to form an ensemble typically provides better classification performance than any of the individual classifiers. In a typical RDF, a large number of decision tree classifiers are constructed and the final classification is obtained by computing the mode of the individual classifications. This way, the overfitting that is typical of individual decision trees is mitigated.
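The ensemble idea behind the RDF can be sketched in a few lines. The example below is a deliberately simplified illustration and not the method of any cited work: each "tree" is a one-split decision stump trained on a bootstrap sample with a randomly chosen feature, and the final label is the mode of the individual votes. The two features (area, elongation), the class names, and the toy values are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most frequent label (the mode)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(samples, labels, rng):
    """Fit a one-split decision tree on one randomly chosen feature."""
    feature = rng.randrange(len(samples[0]))
    best = None
    for threshold in sorted({s[feature] for s in samples}):
        left = [l for s, l in zip(samples, labels) if s[feature] <= threshold]
        right = [l for s, l in zip(samples, labels) if s[feature] > threshold]
        if not right:  # a split must leave samples on both sides
            continue
        left_label, right_label = majority(left), majority(right)
        errors = (sum(l != left_label for l in left)
                  + sum(l != right_label for l in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # all feature values identical: always predict the mode
        label = majority(labels)
        return (feature, 0.0, label, label)
    return (feature,) + best[1:]


def predict_stump(stump, x):
    feature, threshold, left_label, right_label = stump
    return left_label if x[feature] <= threshold else right_label


def fit_forest(samples, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(samples)) for _ in samples]
        forest.append(fit_stump([samples[i] for i in idx],
                                [labels[i] for i in idx], rng))
    return forest


def predict_forest(forest, x):
    """Final classification is the mode of the individual classifications."""
    return majority(predict_stump(stump, x) for stump in forest)


# Hypothetical feature vectors: (area in pixels, elongation).
samples = [(120, 6.0), (150, 7.5), (110, 5.5), (140, 8.0),
           (300, 1.1), (280, 1.3), (320, 1.0), (310, 1.2)]
labels = ["diatom_chain"] * 4 + ["dinoflagellate"] * 4

forest = fit_forest(samples, labels)
print(predict_forest(forest, (100, 9.0)))
print(predict_forest(forest, (350, 0.9)))
```

A practical RDF grows full trees and considers a random subset of features at every split; the stumps here only keep the voting mechanism visible.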
The first work on automatic plankton image classification was presented by Tang et al. (1998). The image data were produced using a video plankton recorder (VPR) (Davis et al. 1992) and the proposed method combined texture and shape information of plankton images in a descriptor that is the combination of traditional invariant moment features and Fourier boundary descriptors with gray-scale morphological granulometries. It should be noted that some papers on automatic plankton recognition based on non-image data have been published even earlier. For example, Boddy et al. (1994) utilized light scatter and fluorescence data obtained by flow cytometry to train an artificial neural network (ANN) to classify plankton species.
Finding good image features is essential for any plankton classification system (Cheng et al. 2018; Corgnati et al. 2016). Various feature extraction technologies have been proposed and put into practice for different underwater imaging environments (Sosik and Olson 2007; Zetsche et al. 2014). Frequently used plankton features include texture features (e.g. Mosleh et al. 2012), geometric and shape features (e.g. Tan et al. 2014), color features (e.g. Ellen et al. 2015), local features (e.g. Zheng et al. 2017), and model-based features (e.g. Rivas-Villar et al. 2021). Table 4 in Appendix A categorizes and summarizes the various features used for plankton recognition.
The most commonly used image feature type in plankton recognition is shape features (see e.g. Sosik and Olson 2007; Zetsche et al. 2014) that characterize either the contour or binary mask of the object (plankton). In their simplest form geometric features are numerical descriptors of generic geometric aspects such as major and minor axis length, perimeter, equivalent spherical diameter and area of an object computed from a binarized image. Another common approach is to utilize image moments to describe the shape. Both Hu moments (Hu 1962; Thiel et al. 1995; Liu et al. 2021a; Zhao et al. 2005, 2010) and Zernike moments (Khotanzad and Hong 1990; Blaschko et al. 2005) have been proposed for plankton recognition. Also, various advanced features quantifying the shape of the contour have been proposed for plankton data. These include boundary smoothness (e.g. Tang et al. 2006; Liu and Watson 2020), affine curvature descriptors (Liu and Watson 2020), Freeman contour code features (Rodenacker et al. 2006), and elliptical Fourier descriptors (Sánchez et al. 2019a; Beszteri et al. 2018). Further geometric features applied for plankton recognition include symmetry measures [e.g. Hausdorff distance (Guo et al. 2021c; Sosik and Olson 2007)] and granulometries (Kingman 1975) utilizing morphological operations (Luo et al. 2005; Kramer 2005; Tang et al. 2006; Wu and Sheu 1998).
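As an illustration of the simplest geometric and moment-based descriptors, the following sketch computes a few of them from a binary mask using only the standard library. The function name and the returned keys are our own choices. The axis lengths are those of an ellipse with the same second-order central moments as the object, and the equivalent diameter is taken here as the diameter of a circle with the same area, which is an assumption about how that descriptor is defined in a given pipeline.

```python
import math


def shape_features(mask):
    """Compute simple shape descriptors from a binary mask (rows of 0/1)."""
    pixels = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    area = len(pixels)
    if area == 0:
        raise ValueError("mask contains no foreground pixels")

    # Centroid and second-order central moments of the foreground pixels.
    cx = sum(x for x, _ in pixels) / area
    cy = sum(y for _, y in pixels) / area
    mu20 = sum((x - cx) ** 2 for x, _ in pixels)
    mu02 = sum((y - cy) ** 2 for _, y in pixels)
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels)

    # Eigenvalues of the 2 x 2 covariance matrix of pixel coordinates.
    a, b, c = mu20 / area, mu11 / area, mu02 / area
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam_major = (a + c) / 2 + spread
    lam_minor = max((a + c) / 2 - spread, 0.0)

    return {
        "area": area,
        "equivalent_diameter": 2 * math.sqrt(area / math.pi),
        "major_axis": 4 * math.sqrt(lam_major),
        "minor_axis": 4 * math.sqrt(lam_minor),
        # First Hu moment: eta20 + eta02, with eta_pq = mu_pq / area**2.
        "hu1": (mu20 + mu02) / area ** 2,
    }


# A 4 x 2 rectangle placed inside a larger empty image.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
for name, value in shape_features(mask).items():
    print(f"{name}: {value:.4f}")
```

Because central moments are computed relative to the centroid, the first Hu moment does not change when the object is translated within the image, which is one reason moment features are attractive for plankton images.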
Another frequently used type of feature in plankton recognition systems is texture features, which quantify the spatial distribution of intensity or color values in local image regions. While shape features consider only the boundary of plankton, texture features describe the region inside the boundary. The simplest texture features commonly applied in plankton recognition are first-order statistical descriptors that compute simple statistical values directly from the intensity values (see e.g. Lisin 2006; Zetsche et al. 2014; Guo et al. 2021c). These are sometimes called color features and include, for example, mean intensity and variance of intensity, as well as skewness and kurtosis, which quantify the shape of the color or intensity histogram. The first-order statistics only provide information on how the intensity or color values are distributed in the image. To obtain further spatial information on texture, various second-order statistical descriptors have been proposed. The most common second-order statistical descriptor used in plankton recognition is the co-occurrence matrix (Hu and Davis 2005; Liu et al. 2021a; Shan et al. 2020; Wei et al. 2022), which describes the statistics of pixel value pairs occurring at a certain distance from each other in the image. More advanced texture features proposed for plankton recognition include Local Binary Patterns (LBP) (Ojala et al. 2002; Schulze et al. 2013; Chang et al. 2016; Lisin 2006; Yu and Sun 2023), and Gabor descriptors (Idrissa and Acheroy 2002; Sánchez et al. 2019b; Bueno et al. 2017).
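The difference between first-order and second-order texture statistics can be made concrete with a small sketch. It assumes a grayscale image stored as rows of integer gray levels already quantized to a small number of levels; the function names and the choice of contrast as the summary value are illustrative only.

```python
import math


def first_order_stats(image):
    """Mean, variance, skewness and kurtosis of the intensity values."""
    values = [v for row in image for v in row]
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    if std == 0:  # constant image: higher moments are undefined, report 0
        skewness = kurtosis = 0.0
    else:
        skewness = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
        kurtosis = sum((v - mean) ** 4 for v in values) / (n * variance ** 2)
    return {"mean": mean, "variance": variance,
            "skewness": skewness, "kurtosis": kurtosis}


def cooccurrence_matrix(image, levels, dx=1, dy=0):
    """Normalized counts of gray-level pairs separated by the offset (dx, dy)."""
    height, width = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for y in range(height):
        for x in range(width):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                counts[image[y][x]][image[ny][nx]] += 1
                total += 1
    if total == 0:
        raise ValueError("offset is larger than the image")
    return [[c / total for c in row] for row in counts]


def contrast(glcm):
    """Contrast: expected squared gray-level difference of a pixel pair."""
    return sum(p * (i - j) ** 2
               for i, row in enumerate(glcm) for j, p in enumerate(row))


# Two images with identical histograms but different spatial arrangement.
stripes = [[0, 1, 0, 1], [0, 1, 0, 1]]
halves = [[0, 0, 1, 1], [0, 0, 1, 1]]
for name, img in (("stripes", stripes), ("halves", halves)):
    stats = first_order_stats(img)
    value = contrast(cooccurrence_matrix(img, levels=2))
    print(name, stats["mean"], stats["variance"], round(value, 3))
```

Both toy images have the same mean and variance, so first-order statistics cannot separate them, while the co-occurrence contrast differs because it depends on which values are neighbors.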
The third widely utilized group of image features is local features that typically combine the feature detectors and descriptors. Feature detectors search the image for characteristic interest points or regions that contain useful information for the task, i.e. plankton recognition. Local feature descriptors then quantify these regions. General-purpose feature descriptors that have been applied for plankton images include Histogram of Oriented Gradient (HOG) (Dalal and Triggs 2005; Bi et al. 2015; Guo et al. 2021c), Scale Invariant Feature Transform (SIFT) (Lowe 2004; Tsechpenakis et al. 2007), Speeded Up Robust Features (SURF) (Bay et al. 2006; Chang et al. 2016), Inner-Distance shape context (IDSC) (Ling and Jacobs 2007; Zheng et al. 2017), and Phase congruency descriptors (PCD) (Kovesi 2000; Sánchez et al. 2019b; Verikas et al. 2012).
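To give an idea of what a gradient-based descriptor such as HOG measures, the sketch below builds a single orientation histogram for one image cell. It is a heavily simplified illustration: the full HOG descriptor additionally uses multiple cells, interpolation between bins, and block normalization, none of which are reproduced here. The function name and the default of nine bins over 0 to 180 degrees are assumptions for the example.

```python
import math


def orientation_histogram(image, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations."""
    height, width = len(image), len(image[0])
    histogram = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    # Central differences are defined only for interior pixels.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(int(angle / bin_width), n_bins - 1)
            histogram[index] += magnitude
    total = sum(histogram)
    return [v / total for v in histogram] if total > 0 else histogram


# Intensity increases from left to right: all gradients point along x.
horizontal_ramp = [[x for x in range(4)] for _ in range(4)]
# Intensity increases from top to bottom: all gradients point along y.
vertical_ramp = [[y] * 4 for y in range(4)]

print(orientation_histogram(horizontal_ramp))
print(orientation_histogram(vertical_ramp))
```

The two ramps place all of their weight in different bins, which is the kind of edge-direction information that such descriptors pass on to the classifier.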
Feature engineering-based methods for plankton recognition usually combine features from different groups to obtain more representative feature vectors. For example, Zheng et al. (2017) used geometric features (e.g. size and shape measurements, such as area, circularity, elongation, convex rate), color features (e.g. sum, mean, standard deviation of color values), texture features [e.g. Gabor descriptors and Local Binary Pattern (LBP)] and local features (e.g. HOG and SIFT). Sosik and Olson (2007) applied simple geometry features, shape and symmetry features, as well as texture features including co-occurrence matrices for phytoplankton recognition. Wacquet et al. (2018) extracted 26 features including basic shape features, advanced morphological features, and color features.
Typical plankton recognition systems further apply additional feature selection (see e.g. Zheng et al. 2017) or dimensionality reduction steps to construct compact feature representations. In feature selection, the features in the large initial set are ranked based on how representative or informative they are, and the least informative features are discarded. For example, Tang et al. (2006) proposed the normalized multilevel dominant eigenvector estimation (NMDEE) technique to select the best feature set for plankton recognition. In dimensionality reduction, principal component analysis (PCA) or a similar technique is applied to reduce the length of the extracted feature vector while preserving the maximum amount of information. For example, Li et al. (2014) and Chang et al. (2016) utilized PCA as a part of the plankton recognition system.
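The idea of PCA-based reduction can be sketched without external libraries by estimating only the first principal component with power iteration. This is a minimal illustration under our own assumptions, not the procedure of the cited works: practical systems keep several components and typically rely on a numerical library, and power iteration fails if the starting vector happens to be orthogonal to the dominant direction.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return the dominant direction of variance and the projected scores."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]

    # Sample covariance matrix of the centered feature vectors.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    # Power iteration: repeatedly multiply by the covariance and normalize.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # no variance in the data
            break
        v = [x / norm for x in w]

    # Each feature vector is reduced to a single number (its score).
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical two-dimensional features that are perfectly correlated.
features = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
direction, scores = first_principal_component(features)
print([round(x, 4) for x in direction])
print([round(s, 4) for s in scores])
```

For these toy features all variation lies along one line, so a single score per sample preserves the full information of the original two values.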
Although feature-engineering-based techniques have been applied with promising results, they require discrete parts, i.e., feature extraction, selection, and training a classifier. Due to the difficulty of finding general features that provide high classification accuracy over different datasets, feature engineering based plankton recognition methods are often ad-hoc solutions tuned for a single imaging instrument and provide limited accuracy. Moreover, based on previous works (Al-Barazanchi et al. 2015b; Khalid et al. 2014), it typically requires extensive manual work to integrate a new class to the existing system. Each new class requires intensive work to find new features that could represent the new class. Depending on the quality of feature design, providing a suitable framework for the accurate, rapid and simplified classification of plankton species is not always possible.
3.2 Convolutional neural networks
Recently, CNNs have replaced traditional feature engineering techniques in various computer vision applications. The notable difference is that the image features are learnt from the data instead of manually designing them. CNN (LeCun et al. 2015) is a type of neural network model for image processing inspired by the animal visual cortex. The key component of CNNs are the convolutional layers that consist of neurons each processing data only for their receptive field. Due to the shared-weight architecture, these neurons fundamentally perform the convolution operation to the input with a filter defined by the weights of the neurons. This makes it possible to learn the feature extraction filters (weights) through backpropagation. A typical CNN involves repetitions of several convolution layers and a pooling layer, followed by a set of fully connected layers. The convolution and pooling layers perform feature extraction and the fully connected layers perform the higher-level reasoning and map the extracted features into final output. Increasing the amount of convolutional layers (the depth of the network) allows to represent more complex relations between features often leading to a better recognition accuracy while increasing the amount of parameters. An example of CNN structure is shown in Fig. 2. In the recent years CNN-based approaches have become dominant in various image analysis tasks providing state-of-the-art performance, for example, in image classification, object localization, and image segmentation tasks (Teuwen and Moriakov 2020). Fig. 3 illustrates how the popularity of the CNNs and feature engineering based approaches on plankton recognition have changed over the years. It can be seen that the introduction of CNNs clearly boosted the research in the field.
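The feature extraction stage described above, a convolution followed by a nonlinearity and pooling, can be written out explicitly for a single channel. The sketch below uses plain Python lists and a fixed, hand-chosen kernel; in a CNN the kernel weights would be learnt through backpropagation, and practical implementations use optimized tensor libraries. As in most deep learning frameworks, the operation is implemented as cross-correlation, i.e. the kernel is not flipped. The ReLU nonlinearity is a common choice and an assumption here.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]


def relu(feature_map):
    """Element-wise rectified linear unit."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [[max(feature_map[y + i][x + j]
                 for i in range(size) for j in range(size))
             for x in range(0, len(feature_map[0]) - size + 1, size)]
            for y in range(0, len(feature_map) - size + 1, size)]


# A dark-to-bright vertical edge and a kernel that responds to such edges.
image = [[0, 0, 1, 1] for _ in range(4)]
edge_kernel = [[-1, 1],
               [-1, 1]]

feature_map = conv2d(image, edge_kernel)
pooled = max_pool(relu(feature_map), size=2)
print(feature_map)
print(pooled)
```

The same kernel weights are applied at every image position, which is the weight sharing that keeps the number of parameters independent of the image size.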
The first works considering CNN-based classification of plankton images were carried out in 2015. Zheng and Wang (2015) carried out preliminary experiments on applying CNN for automated plankton recognition. A small CNN-model (3-5 layers) was tested on zooplankton data. Similarly, Kuang (2015) used CNN together with data augmentation to solve the recognition task. Al-Barazanchi et al. (2015a) proposed a hybrid solution where CNN was used for plankton image feature extraction and RDF and SVM for classification.
One reason why CNNs have become more popular is that they have been shown to outperform the traditional approach utilizing feature engineering multiple times and the architectural components have been studied with care (Gu et al. 2018). For example, Zheng and Wang (2015) compared a CNN-based plankton image classifier to traditional classifiers such as a multi-layer perceptron (MLP) model utilizing hand-engineered features. The results showed that CNN outperformed the earlier methods. In various experiments (Orenstein et al. 2015; Orenstein and Beijbom 2017; Guo et al. 2021c), CNNs have demonstrated higher plankton recognition accuracy than RDF combined with hand-selected features. The preliminary experiments done by Mitra et al. (2019) on planktonic foraminifera species suggest that CNN can even surpass the human in plankton recognition accuracy for certain cases, in which the taxonomy is nuanced. However, in some special cases, if the computation time is heavily restricted (e.g. embedded systems), feature-engineering based approaches might still be preferable (see e.g. Zimmerman et al. 2020).
3.2.1 CNN architectures
Numerous CNN architectures have been suggested for plankton recognition. These include various common CNNs developed for generic image recognition such as AlexNet (Krizhevsky et al. 2012), VGGNet (Simonyan and Zisserman 2014), GoogLeNet (Szegedy et al. 2015), and ResNet (He et al. 2016). AlexNet was the first deep CNN applied to general image recognition and contains 8 layers. VGGNet uses smaller convolution filters (3 × 3) compared to AlexNet to obtain deeper networks (up to 19 layers) and more nonlinearity while reducing the number of parameters. GoogLeNet and its modifications (e.g. InceptionV1 and InceptionV3) utilize inception modules that apply convolutional filters of different sizes simultaneously to capture information at various scales. ResNet introduced a residual block that uses a shortcut connection. This helps to avoid the vanishing gradient problem while training very deep networks (up to 152 layers).
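Two of the design ideas above can be checked with simple arithmetic. The sketch below first counts convolution weights to show why stacking small filters reduces the number of parameters: two stacked 3 × 3 layers cover the same 5 × 5 receptive field as one 5 × 5 layer but need fewer weights. It then shows the shortcut connection of a residual block, where the input is added to the output of the transformation. Bias terms are ignored, the channel count of 64 is an arbitrary example, and the function names are our own.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in one convolutional layer (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def residual_block(x, transform):
    """Shortcut connection: output = transform(x) + x, element-wise."""
    return [fx + xi for fx, xi in zip(transform(x), x)]


channels = 64  # hypothetical number of input and output channels

# Two stacked 3 x 3 layers have the same 5 x 5 receptive field as one 5 x 5 layer.
stacked_3x3 = 2 * conv_weights(3, channels, channels)
single_5x5 = conv_weights(5, channels, channels)
print(f"two 3x3 layers: {stacked_3x3} weights")
print(f"one 5x5 layer:  {single_5x5} weights")

# If the learnt transformation outputs zeros, the block passes its input through
# unchanged, so extra layers do not have to degrade the signal.
print(residual_block([1.0, -2.0, 0.5], lambda v: [0.0 for _ in v]))
```

The stacked variant also applies a nonlinearity after each small layer, which is the additional nonlinearity referred to in the text.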
Lumini and Nanni (2019a) compared AlexNet, DenseNet (Huang et al. 2017), ResNet, VGGNet, GoogLeNet, and SqueezeNet (Iandola et al. 2016). DenseNet produced the best classification results with the ZooScan, Kaggle-Plankton and WHOI datasets. Liu et al. (2018a) evaluated AlexNet, VGG16 (Simonyan and Zisserman 2014), GoogLeNet, PyramidNet (Han et al. 2017) and ResNet. The results suggest that PyramidNet improved the accuracy on the WHOI-Plankton dataset. Sánchez et al. (2019b) compared ResNet, AlexNet, VGGNet, SqueezeNet, DenseNet, and InceptionV3 (Szegedy et al. 2016) on a dataset consisting of 1085 diatom images of 14 different classes; DenseNet, ResNet and VGGNet provided the highest accuracy. Kloster et al. (2020) extensively tested various CNN architectures. Notably, the relatively shallow VGG16 model outperformed more modern architectures. Table 5 in Appendix A gives a summary of different architectures that have been utilized in plankton recognition.
There are also CNN architectures developed specifically for plankton recognition. Al-Barazanchi et al. (2018) proposed a shallow VGGNet-based architecture for the task. Dai et al. (2016a) proposed a CNN architecture called ZooplanktoNet that was characterized by the ability to capture more general and representative features than previous predefined feature extraction algorithms. It was strongly inspired by AlexNet and VGGNet. A comparative experiment with different CNN architectures including AlexNet, VGGNet and GoogLeNet was carried out, and ZooplanktoNet was found to outperform the other architectures on zooplankton classification. Yan et al. (2017) proposed another light CNN architecture for plankton recognition by utilizing a smaller filter size and fewer fully-connected layers. Li et al. (2019c) proposed a tiny attention network (TANet) consisting of three main parts: a reduction module, a self-attention operation, and group convolution. The reduction module was utilized to reduce the information loss caused by the pooling operation, self-attention was used to improve the feature learning ability, and the group convolution was applied to compress the model size. One of the benefits of the TANet model is its small size, which allows real-time classification on mobile devices. Luo et al. (2021a) presented a custom architecture, MCellNet, derived from MobileNetV2 (Sandler et al. 2018). The model was shown to outperform MobileNetV2 on plankton data in both accuracy and computation time. Xu et al. (2022) developed a CNN for classifying algae based on the ResNet and SENet architectures. Benammar et al. (2021) applied a custom architecture utilizing 3D convolutions to image data collected using environmental high content fluorescence microscopy.
Custom architectures have also been developed for holographic microscopy images as existing image recognition models cannot be directly applied to raw digital holographic microscopy data. A straightforward approach is to first reconstruct images and then utilize any common image recognition architecture (see e.g. Qiao et al. 2021; MacNeil et al. 2021). This, however, leads to long processing times as the reconstruction stage is computationally heavy. It has been shown that by using a custom architecture CNNs can be successfully applied to the raw digital holographic data and the reconstruction step can be avoided (Guo et al. 2021a; Zhang et al. 2021). Also, simulated holograms have been proposed for training and testing simultaneous detection and classification of plankton (Scherrer et al. 2021).
Various works (Orenstein and Beijbom 2017; Rivas-Villar et al. 2022) have suggested using CNNs only for feature extraction and utilizing other classifiers, such as SVM or RDF, for the final classification step. Jindal and Mundra (2015) suggested using the output of the first fully-connected layer of two CNNs (ClassyFireNet and GoogLeNet) as image features and feeding it to an RDF for plankton recognition. A similar approach was evaluated by Orenstein and Beijbom (2017), who utilized AlexNet to extract features for an RDF-based classifier. Sánchez et al. (2019b) compared both approaches: a fine-tuned CNN for classification, and a CNN for feature extraction. In experiments with various CNN architectures, the fine-tuned CNN outperformed the approach where the CNN was used as a feature extractor.
Another commonly used approach is to combine multiple CNNs into an ensemble to improve the accuracy. This so-called ensemble learning is based on the assumption that the limited performance of an individual recognition model can be compensated by utilizing additional models that are more capable of classifying different sets of classes. Kuang (2015) proposed various approaches for building model ensembles. These include averaging softmax probabilities and applying principal component analysis to concatenated CNN features before the softmax classifier. Lumini and Nanni (2019a) and Lumini et al. (2020) proposed an ensemble of classifiers based on score fusion. Various classifier combinations containing different CNN models were evaluated for both plankton and coral classification. Henrichs et al. (2021) proposed an ensemble of 6 CNNs and showed it to outperform an RDF-based classifier. Kyathanahally et al. (2021b) compared various CNN architectures in an ensemble with a multilayer perceptron (MLP) on zooplankton recognition using a mix of feature descriptors and CNN features. Yang et al. (2023) applied an ensemble of CNNs to harmful algae recognition. To avoid false negatives, images were selected for further expert verification if any of the five CNN models classified the image as harmful algae. While ensemble learning has shown slightly improved recognition accuracy, it also increases the computation time and complicates the training process.
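Two of the fusion rules mentioned above, averaging softmax probabilities and flagging an image when any ensemble member predicts a harmful class, can be sketched as follows. This is an illustrative example only: the logits, the number of models and the index of the harmful class are hypothetical values, and the helper names are not taken from any of the cited works.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


def average_probabilities(per_model_probs):
    """Score fusion by averaging the class probabilities of all models."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [sum(p[c] for p in per_model_probs) / n_models
            for c in range(n_classes)]


def flag_for_expert(per_model_probs, harmful_class):
    """True if ANY model's top prediction is the harmful class."""
    return any(argmax(p) == harmful_class for p in per_model_probs)


# Hypothetical outputs of three models for one image (class 2 = harmful).
model_logits = [[2.0, 1.0, 0.1], [1.5, 1.2, 0.3], [0.2, 0.4, 1.9]]
probs = [softmax(logits) for logits in model_logits]

ensemble = average_probabilities(probs)
ensemble_label = argmax(ensemble)
needs_review = flag_for_expert(probs, harmful_class=2)
print(ensemble_label, needs_review)
```

In this toy case the averaged prediction is a non-harmful class, yet the image is still forwarded to an expert because one member voted for the harmful class, which is the trade-off made to reduce false negatives.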
3.2.2 Hybrid methods
Multiple methods that aim to combine the feature engineering approach with CNNs have been proposed. One approach is to utilize a separate classifier (e.g. RDF) as above. This way CNN features can be simply supplemented with selected hand-engineered features before classification (see e.g. Orenstein and Beijbom 2017; Keçeli et al. 2017). Similarly, ensembles of classifiers can be utilized to combine handcrafted feature based classification and CNNs. For example, in the method proposed by Lumini and Nanni (2019a) and Lumini et al. (2020), the individual classifiers utilized in the ensembles included various CNNs applied to both original images and preprocessed (filtered) images. The preprocessing techniques included various filters commonly used to compute local features, such as gradient, LBP and wavelets. Rivas-Villar et al. (2021) combined color and texture features with deep CNN features. Both RDF and SVM were tested for classification. Dai et al. (2016b) proposed a multi-stream CNN for plankton classification, where multiple inputs are processed in parallel as different streams before merging or concatenating the features for the classification. In addition to the original image, a global feature image representing the shape and a local feature image representing the edge information were used as input. A similar approach was proposed by Cui et al. (2018), where the original image, shape image, and texture image were processed in streams before feature concatenation. The concatenated feature maps were processed with one more convolutional and pooling layer, a set of fully connected layers and a softmax layer. A related approach was proposed by Ellen et al. (2019), who utilized non-image information (metadata) in CNN-based plankton classification. Various architectures to fuse metadata with CNN-based image features were proposed, consisting of a set of convolutional and pooling layers for the image and fully-connected layers for the metadata before feature concatenation, followed by common fully-connected layers for the classification. Similar hybrid models were also utilized by Benammar et al. (2021).
Various other modifications to baseline CNN classifiers also exist. Kosov et al. (2018) proposed a conditional random field model to utilize spatial relations among pixel-based CNN classification results and global features for microorganism detection and recognition. Liu et al. (2018b) proposed including a squeeze-and-excitation block (Hu et al. 2018) in a deep pyramidal residual network to increase the plankton recognition accuracy. Luo et al. (2018) took into account the fact that typical plankton images contain a large amount of background pixels without useful information and applied spatially sparse convolutional neural networks originally developed for handwriting recognition (Graham 2014). Cheng et al. (2020) proposed combining two CNNs, one applied to the normal Cartesian coordinate image and one to the same image transformed into a polar representation. This way rotational invariance was obtained in addition to the translation invariance of the baseline CNN.
3.2.3 Transformers
In addition to CNNs, other feature learning approaches have also been proposed for plankton recognition. One of the most promising approaches is the vision transformer (ViT) (Dosovitskiy et al. 2021), which works by dividing the image into patches, resulting in a sequence of vectors (tokens) that are fed to the model. The architecture allows the model to measure relationships between pairs of image patches, making it possible to learn to identify the most informative regions in an image via self-attention. Kyathanahally et al. (2022) applied ensembles of Data-efficient image Transformers (DeiTs) to various ecological image datasets, including four publicly available plankton datasets, and reported state-of-the-art performance. Maracani et al. (2023) evaluated three different transformer variants: ViTs, the Hierarchical Vision Transformer (Swin) (Liu et al. 2021b), and Bidirectional Encoder representation from Image Transformers (BEiT) (Bao et al. 2021).
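The patch tokenization step can be illustrated with a small sketch. It assumes a single-channel image stored as a list of rows and a patch size that divides the image dimensions; the function name is hypothetical. In an actual ViT each flattened patch would additionally be linearly projected and combined with a position embedding before entering the transformer, which is omitted here.

```python
def image_to_patch_tokens(image, patch_size):
    """Split a 2D grayscale image (list of rows) into flattened patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    tokens = []
    # Patches are read in row-major order, as is conventional for ViTs.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            tokens.append(patch)
    return tokens


# Toy 4x4 image with pixel values 0..15, split into four 2x2 patches.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = image_to_patch_tokens(image, patch_size=2)
print(len(tokens), tokens[0])
```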
3.2.4 Plankton detection
Depending on the imaging instrument, there is sometimes a need to first detect the plankton particles in the images (Moniruzzaman et al. 2017; Cai et al. 2022; Chen et al. 2023). Plankton detection can be applied to two main types of images: specimen-focused images showing a single specimen (including colonial forms) and multi-specimen images. Specimen-focused images are automatically centered and cropped to show only one specimen (see Fig. 1). While plankton recognition on such data can be treated as an image classification task, in some cases a detection step might still be needed due to other plankton particles or detritus in the background. Multi-specimen images, such as those obtained using general-purpose microscopes, capture multiple different plankton particles in one frame, and these particles need to be detected and recognized separately. While detection itself is out of the scope of this survey, we briefly review the existing methods that both detect and recognize plankton, focusing mainly on multi-specimen images.
Modern CNN-based object detection methods such as R-CNN (Girshick et al. 2014), YOLO (Redmon et al. 2016), and their variants perform the detection and recognition simultaneously, providing end-to-end methods for plankton recognition. For example, Pedraza et al. (2018) applied R-CNN to detect and classify diatoms in microscopy images, and Soh et al. (2018) used YOLO to detect and recognize plankton. Wang et al. (2022b) compared multiple CNN-based object detection methods including Faster R-CNN (Ren et al. 2017), SSD (Liu et al. 2016), YOLOv3 (Redmon and Farhadi 2018) and YOLOX (Ge et al. 2021) on imaging flow cytometer data. YOLOX achieved the best accuracy. Chen et al. (2023) explored a family of YOLOv5 (Jocher 2020) architectures in the automated video-oriented plankton detection and tracking workflow.
While detection methods are typically applied to multi-specimen images, they have also been proposed for the recognition of specimen-focused images. Li et al. (2021c, 2021d) proposed an improved YOLOv3-based model for plankton detection on IFCB images. The proposed model contains two YOLOv3 networks fused with the DenseNet architecture. Kosov et al. (2018) applied CNN-based image features and conditional random fields to plankton localization and segmentation.
Similarly to modern detection methods, semantic and instance segmentation methods can also be applied to simultaneously detect and recognize plankton. Ruiz-Santaquiteria et al. (2020) compared a semantic segmentation model called SegNet (Badrinarayanan et al. 2017) and an instance segmentation model called Mask R-CNN (He et al. 2017) on algae detection and recognition.
3.2.5 Comparisons
Many papers utilize in-house datasets and most publicly available datasets do not provide standardized evaluation protocol meaning that different papers utilize different train-test splits and performance metrics. This makes comparison of the performance of different solutions challenging before the principles of making the science findable, accessible, interoperable, reusable (FAIR) are fully adopted (Schoening et al. 2022). Table 3 summarizes some published results obtained on publicly available datasets. However, the provided accuracies are not directly comparable due to the reasons mentioned above. One notable comparison of plankton recognition methods is The National Data Science Bowl (Aurelia et al. 2014) from 2015. The winning team used an ensemble of over 40 convolutional neural networks.
4 Challenges in plankton recognition
Based on the literature on automatic plankton recognition various challenges can be identified. The most notable challenges are as follows:
1. The amount of labeled data for training is limited. This challenge can be divided into two subchallenges: (1) expert knowledge is required for data labeling, and (2) certain plankton species are notably less common, producing a small amount of example images. Plankton species are inherently difficult to identify, requiring prior expertise. Labeling image data for training and evaluation purposes must be done by experts (e.g. plankton taxonomists), ruling out crowdsourcing tools such as Amazon Mechanical Turk that are commonly used for labeling large datasets. This makes labeling expensive, limiting the amount of labeled data. It also takes years to accumulate enough data to cover rare species. Collecting a labeled training set is essential for deep learning models. Considering that morphological plasticity can be found for all planktonic organisms, a larger amount of labeled training data increases the model's capacity to generalize to new data, while training a large model with a small number of examples increases the risk of overfitting, i.e. learning the noise in the training data, causing the model to perform poorly on unseen images.
2. There is a large imbalance between classes. Image classification with datasets that suffer from a greatly imbalanced class distribution is a challenging task in the computer vision field. Data of plankton species naturally exhibit an imbalance in their class distribution, with some plankton species occurring naturally more commonly than others. This results in highly biased datasets and makes it difficult to learn to recognize rare species, having a serious impact on the performance of classifiers. Furthermore, with highly unbalanced datasets the overall classification accuracy (i.e. the percentage of images that were correctly classified) provides little information about the classes with a small number of samples, which may bias the evaluation of the classification methods.
3. Visual differences between certain classes are small. Certain plankton species, especially those that are taxonomically close to each other and/or have reduced size, resemble each other visually, which renders the recognition task a fine-grained classification problem. Limitations in the amount of labeled training data make it challenging to ensure that the recognition model learns the subtle differences between the classes, reducing the recognition accuracy.
4. Imaging instruments vary between datasets. If two datasets have been obtained with different imaging instruments producing visually different images (domain shift), the classification model trained on one dataset does not provide sufficient classification accuracy on the other dataset when applied directly. This makes it challenging to develop general-purpose classifiers that could be applied to new datasets, limiting the applicability of the existing publicly available large image datasets. There is a need for approaches that allow the adaptation of the trained models to new imaging instruments.
5. Labeled training sets do not contain all the classes that can be captured. When deploying a recognition model in operational use, it should be able to handle images from the classes that were not present in the training phase. Different datasets often have different sets of plankton species due to, for example, the geographical distance between the imaging locations or the particle size range of the imaging instruments. Moreover, imaging instruments capture images of unknown particles. Typical CNN-based classification models trained on one dataset tend to classify the images from a previously unseen class to one of the known classes, often with high confidence. This not only makes the models incapable of generalizing to new datasets and analyzing noisy data but also makes it difficult to recognize when the model fails. This calls for methods that can identify when the image is from a previously unseen class (species).
6. There are uncertainties in expert labels. Due to limited imaging resolutions and low image quality, recognizing plankton species is often difficult even for an expert. Manually labeling large amounts of images is tedious work, increasing the risk of human errors. Moreover, due to the high costs of labeling work, it is typically not possible to obtain opinions from multiple experts for each image. These reasons cause inaccuracies (uncertainty) in the labels of the training data, decreasing the classification performance of the trained models. Furthermore, this uncertainty is often highly imbalanced since some of the classes are easier to identify than others.
7. Variation in image size and aspect ratio is very large. Most CNN architectures require that the input images have fixed dimensions, and a typical approach in image classification is to first scale the images into a common size. This is not ideal in plankton recognition due to a very large variation in both the size and aspect ratio of plankton. Scaling images into a common size may cause either small details to be lost in the large images (downscaling) or very large and computationally heavy models (upscaling). Furthermore, the size is an important cue for recognizing the plankton species and this information is lost in scaling.
8. Image quality can be low or have extensive variation. Plankton imaging requires high magnification and the (natural) water might contain other particles, cause unwanted optical distortions, as well as limit the visibility. More importantly, due to the limited depth-of-field, automated imaging instruments often fail to capture particles in focus and the focus may drift away from the optimal setting. These factors reduce the quality of images. The low image quality makes both manual labeling (Challenge 6) and automatic classification considerably more challenging. Therefore, there is a need for plankton recognition solutions that are robust to image distortions such as blur and noise.
9. The amount of image data is massive. Modern plankton imaging instruments produce massive amounts of image data, e.g. FlowCam Macro and ISIIS have the ability to take 10,000 images per minute and 64,000 images per hour respectively. Computationally efficient solutions are needed to perform the analysis in real-time (MacLeod et al. 2010; Orenstein et al. 2015).
All the nine challenges are visualized in Fig. 4.
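The evaluation issue raised under Challenge 2 can be made concrete with a small worked example. The sketch below uses a hypothetical test set of 95 images of a common class and 5 of a rare class, together with a degenerate classifier that never predicts the rare class; the class names and counts are invented for illustration. Overall accuracy looks high, while per-class recall and their mean (often called balanced accuracy) expose the failure on the rare class.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of images that were classified correctly."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def per_class_recall(true_labels, predicted_labels):
    """Recall computed separately for every class in the ground truth."""
    recalls = {}
    for cls in sorted(set(true_labels)):
        indices = [i for i, t in enumerate(true_labels) if t == cls]
        hits = sum(predicted_labels[i] == cls for i in indices)
        recalls[cls] = hits / len(indices)
    return recalls


# Hypothetical imbalanced test set and a classifier ignoring the rare class.
true_labels = ["common"] * 95 + ["rare"] * 5
predicted = ["common"] * 100

overall = accuracy(true_labels, predicted)
recalls = per_class_recall(true_labels, predicted)
balanced = sum(recalls.values()) / len(recalls)
print(overall, recalls, balanced)
```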
5 Existing solutions
5.1 Challenge 1: Limited amount of labeled training data
The two main reasons limiting the amount of labeled training data, the requirement of expert knowledge for the very laborious labeling task and rarity of certain plankton species, require different solutions.
Active learning has been utilized to minimize the effort of expensive human experts in labeling plankton image data (Luo et al. 2005; Li et al. 2021a). The basic idea behind active learning is to select only the most informative samples for labeling. A classifier is first trained on a small initial training set and the method iteratively seeks to find the most informative samples from an unlabeled dataset. These samples are then labeled by a human expert and the model is re-trained. A simple active learning technique for plankton images called "breaking ties" was proposed by Luo et al. (2005). The method utilizes probability approximation for an SVM-based classifier and ranks the unlabeled images based on the differences between the largest and the second largest class probabilities (the smaller the difference, the less confident the classifier is). Images with the smallest confidence were labeled by an expert. Drews et al. (2013) studied semi-automatic classification and active learning approaches for microalgae identification. A Gaussian mixture model (GMM) is estimated from the image feature data and three different sampling strategies are used for the active learning. The experimental results show the benefit of using active learning to improve the performance with few labeled samples. Bochinski et al. (2018) proposed Cost-Effective Active Learning (CEAL) (Wang et al. 2016) for plankton recognition. In contrast to traditional active learning where only the manually annotated samples are used in the model training, CEAL also utilizes the unlabeled high-confidence samples for training with class predictions as pseudo labels. Haug et al. (2021a, 2021b) and Haug (2021) proposed the Combined Informative and Representative Active Learning technique (CIRAL) to minimize the human involvement in the plankton image labeling process. The main idea behind the method is to find the images that are misclassified after minimal perturbations, i.e. images close to the decision boundary, and to ignore the images that are far from it. The DeepFool algorithm is used to compute small perturbations to the images. The finding of the representative images is formulated as a min-max facility location problem and solved using a greedy algorithm.
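The "breaking ties" ranking described above can be sketched in a few lines. The example assumes that class probabilities for the unlabeled images are already available from some classifier; the probability values, the labeling budget and the function names are hypothetical.

```python
def breaking_ties_margin(probabilities):
    """Difference between the largest and second largest class probability."""
    top, second = sorted(probabilities, reverse=True)[:2]
    return top - second


def select_for_labeling(unlabeled_probs, budget):
    """Indices of the `budget` images with the smallest margin."""
    ranked = sorted(range(len(unlabeled_probs)),
                    key=lambda i: breaking_ties_margin(unlabeled_probs[i]))
    return ranked[:budget]


# Hypothetical classifier outputs for four unlabeled images.
unlabeled_probs = [
    [0.90, 0.05, 0.05],  # confident prediction, large margin
    [0.40, 0.38, 0.22],  # two classes nearly tied
    [0.55, 0.30, 0.15],
    [0.34, 0.33, 0.33],  # almost uniform, smallest margin
]
query = select_for_labeling(unlabeled_probs, budget=2)
print(query)
```

The selected indices would be passed to the expert for labeling, after which the classifier is re-trained and the ranking is repeated.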
While active learning helps to reduce manual work, it is often still a time-consuming process. Typically, there is a need to obtain more training data in a completely automated manner. A traditional approach to increase the amount of training data is to utilize data augmentation. By augmenting the existing labeled image data with various image manipulations, the diversity of the training data, and therefore, the generalizability and accuracy of the trained model can be improved. The most commonly used data augmentation techniques for plankton image recognition include various geometric transformations (Orenstein and Beijbom 2017; Vallez et al. 2022) including rotation (e.g. Cheng et al. 2019; Correa et al. 2017), shearing (e.g. Dai et al. 2016a; Geraldes et al. 2019), flipping (e.g. Ellen et al. 2019; Geraldes et al. 2019), and rescaling (e.g. Li and Cui 2016; Luo et al. 2018). Also, additional noise (e.g. Correa et al. 2017; Geraldes et al. 2019), blurring (Geraldes et al. 2019), contrast normalisation (Geraldes et al. 2019), as well as adjusting brightness, saturation, contrast, and hue (Dunker et al. 2018) have been utilized. Some works augment images using translation (e.g. Dai et al. 2016a; Li and Cui 2016). However, it should be noted that, unless the translation is used to cut an image, CNNs are invariant to translation by design, and therefore, this is typically unnecessary when CNNs are used for recognition. Augmentation has been shown to increase the plankton recognition accuracy even with relatively large training sets (see e.g. Song et al. 2020). Examples of augmented images are shown in Fig. 5.
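A few of the geometric augmentations listed above can be sketched without any image library. The example assumes a grayscale image stored as a list of rows and assumes that the class label is unaffected by flips and rotations in multiples of 90 degrees; the function names are illustrative, and practical pipelines would normally rely on an image processing library and include arbitrary rotation angles, noise and color adjustments.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [row[:] for row in image[::-1]]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Original image plus two flips and three rotations."""
    rot90 = rotate_90_clockwise(image)
    rot180 = rotate_90_clockwise(rot90)
    rot270 = rotate_90_clockwise(rot180)
    return [image, flip_horizontal(image), flip_vertical(image),
            rot90, rot180, rot270]


image = [[1, 2], [3, 4]]  # toy 2x2 image
variants = augment(image)
print(len(variants), variants[3])
```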
Another commonly used approach to address a small amount of labeled training data is transfer learning. Transfer learning is a machine learning method that transfers knowledge gained from the source domain, where labeled training data are abundant, to the target domain, where labeled training data are scarce (Pan and Yang 2009; Shao et al. 2014; Weiss et al. 2016) (see Fig. 6). In the context of plankton recognition, this typically means that the model is first trained using either a general image dataset (e.g. ImageNet (Deng et al. 2009)) or a large publicly available plankton dataset and then fine-tuned for the target plankton dataset, which typically has a limited number of labeled images. Using general image databases as source data is justified by the fact that the learned low-level image features are often useful regardless of the classification problem. In the simplest case, transfer learning can be done by simply replacing and training the classification layer and keeping the feature extraction layers unchanged (see e.g. Mitra et al. 2019). However, it is often beneficial to use the pre-trained network only for initialization and retrain (or fine-tune) the whole network with the target dataset (Lumini et al. 2020). Existing studies on the WHOI-Plankton dataset suggest that using pre-trained models and fine-tuning them for plankton data (see e.g. Lumini and Nanni 2019a) can achieve significantly higher accuracy than training the models from scratch on plankton data (see e.g. Liu et al. 2018a).
One way to apply transfer learning for plankton images is to use trained CNNs only for feature extraction and utilize general classification methods such as SVM or RDF for the recognition (see e.g. Rodrigues et al. 2018; Rawat et al. 2019). However, the results by Orenstein and Beijbom (2017) suggest that better accuracy is obtained by utilizing end-to-end CNN with classification layers. Lumini and Nanni (2019a); Lumini et al. (2020) evaluated various strategies for transfer learning on plankton images. The first strategy was to initialize the model with ImageNet weights and fine-tune the whole model with plankton data. In the second strategy (two rounds tuning), a second pre-training step utilizing out-of-domain plankton image data was added before the fine-tuning. In the third strategy, ensembles of multiple different models were used. Based on the experiments the two rounds tuning did not provide a notable improvement in accuracy. Similarly, Guo et al. (2021b) explored and compared multiple transfer learning schemes on several biology image datasets from various domains. Various underwater and ecological image datasets are utilized for multistage transfer learning, where ImageNet pretraining is first improved by fine-tuning on an intermediate dataset before, finally, training on the target dataset consisting of plankton images. The experimental results show the potential of cross-domain transfer learning even on the out-of-domain data when the number of samples in the target domain is insufficient.
Large models with more parameters typically require a large amount of data to be trained without overfitting the model. To avoid this and allow the training with a smaller amount of data, shallower CNN architectures have been proposed for plankton recognition. For example, the 18-layer version of ResNet architecture has been shown to achieve a high plankton recognition accuracy on IFCB data (Kraft et al. 2022b). Most custom CNN architectures developed especially for plankton recognition including ClassyFireNet (Jindal and Mundra 2015), TANet (Li et al. 2019c), and ZooplanktoNet (Dai et al. 2016a) are relatively shallow with 8, 8 and 11 layers, respectively. It has been shown that a good classification accuracy could be obtained with a shallow architecture and by using suitable data augmentation methods even with as few as 10 images per class (Kraft et al. 2022b).
In addition to data manipulation and custom recognition models, model training approaches have also been considered to address the limited data amounts. Learning techniques developed for training the classifier with a minimal amount of samples are called few-shot learning methods. Typically, the idea is to utilize some prior knowledge to allow the generalization to new tasks (in this case classification of new plankton species) containing only a few labeled training examples. Common ways to address few-shot learning are generation (Hariharan and Girshick 2017), embedding, and metric learning. The basic idea of embedding and metric learning is to learn such embeddings that the images from the same class are close to each other in the metric space and images from different classes are far apart. This allows performing the plankton recognition using distances to the images with known plankton species. Embedding and metric learning have been successfully applied to plankton recognition (Teigen et al. 2020; Badreldeen Bdawy Mohamed et al. 2022).
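Distance-based recognition in an embedding space can be sketched as follows. The example assumes that an embedding network has already mapped images to vectors; the two-dimensional embeddings and class names are invented, and representing each class by the mean of its labeled embeddings is one possible design choice (a nearest-neighbor search over individual labeled images would be another).

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def build_prototypes(support_set):
    """One prototype (mean embedding) per class from a few labeled images."""
    return {name: mean_vector(embs) for name, embs in support_set.items()}


def classify(embedding, prototypes):
    """Assign the class whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda name: math.dist(embedding, prototypes[name]))


# Hypothetical 2D embeddings of two labeled images per class.
support_set = {
    "diatom": [[0.9, 0.1], [1.1, -0.1]],
    "ciliate": [[-1.0, 0.8], [-0.8, 1.2]],
}
prototypes = build_prototypes(support_set)
print(classify([0.7, 0.2], prototypes), classify([-0.5, 0.9], prototypes))
```

A new class can be added by computing one more prototype from its few labeled images, without re-training the embedding network.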
Schröder et al. (2018) employed a low-shot learning technique called weight imprinting (Qi et al. 2018) for plankton recognition with a limited amount of labeled training data. The main idea of weight imprinting is to divide the set of all classes into base classes with enough training data and smaller low-shot classes. During the representation learning phase, a CNN is trained to distinguish the base classes with a large amount of labeled training data. In the second phase (low-shot learning), the classifier is then updated with calculated weights to distinguish the smaller low-shot classes. This is done by using appropriately scaled class features of the low-shot classes as their weights, directly allowing the inclusion of classes with only one training image. Guo and Guan (2021) addressed the few-shot learning by supplementing the softmax loss with center loss term (Wen et al. 2016) that forces the samples from the same class close to each other in the deep feature space. The loss function is a weighted sum of the two loss terms and a regularization parameter is used to control the weights.
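The core step of weight imprinting can be sketched as follows: the classifier weight of a new low-shot class is set to the normalized mean of the normalized embeddings of its few training images, and classification uses cosine similarity. The embeddings, class names and helper functions below are hypothetical, and the scaling factor applied before the softmax in the original formulation is omitted because it does not change which class scores highest.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


def imprint_weight(embeddings):
    """Weight for a new class from its few (possibly one) embeddings."""
    normalized = [l2_normalize(e) for e in embeddings]
    mean = [sum(dim) / len(normalized) for dim in zip(*normalized)]
    return l2_normalize(mean)


def cosine_scores(embedding, class_weights):
    """Cosine similarity between an embedding and every class weight."""
    unit = l2_normalize(embedding)
    return {name: sum(a * b for a, b in zip(unit, weight))
            for name, weight in class_weights.items()}


# Hypothetical weights of two base classes learned in the first phase.
class_weights = {
    "base_a": l2_normalize([1.0, 0.0]),
    "base_b": l2_normalize([0.0, 1.0]),
}
# Imprint a low-shot class from a single training embedding.
class_weights["novel"] = imprint_weight([[1.0, 1.0]])

scores = cosine_scores([0.9, 1.1], class_weights)
predicted = max(scores, key=scores.get)
print(predicted)
```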
In the extreme case, the labeled training data are completely absent and unsupervised learning methods are required. Image clustering is the most commonly used unsupervised technique for plankton image analysis. Ibrahim (2020) carried out preliminary experiments on common clustering algorithms such as k-means with phytoplankton data. Image features for clustering were extracted using pretrained CNN models. Coltelli et al. (2014) used various handcrafted image features and self-organizing maps (SOM) for plankton image clustering. Schmarje et al. (2021) proposed a framework for handling semi-supervised classifications of fuzzy labels due to experts having different opinions. The approach is based on overclustering to identify substructures in the fuzzy labels and a loss function to improve the overclustering. The performance surpassed the one of a state-of-the-art semi-supervised method on plankton data. Salvesen (2021); Salvesen et al. (2022) studied deep learning for plankton classification without ground truth labels. The improved feature learning was implemented using DeepCluster, a Generative Adversarial Network (GAN) and a rotation-invariant autoencoder. Despite the potential in unsupervised methods, the gap to supervised learning is still significant.
Hierarchical clustering methods are preferred on plankton data as they have the potential to mimic the taxonomic hierarchy of plankton. Dimitrovski et al. (2012), classification of diatom images is considered as a hierarchical multi-label classification problem and solved by constructing predictive clustering trees that can simultaneously predict all different levels in the taxonomic hierarchy. These trees are then used as an ensemble forming a random forest (RF) to improve the predictive performance. Morphocluster (Schröder et al. 2020) utilizes a semi-automated iterative approach and hierarchical density-based HDBSCAN* (Campello et al. 2015) for plankton image data analysis. To compute image features for the clustering a CNN trained with UVP5/EcoTaxa dataset in a supervised manner was used. The method works iteratively in a semi-automated manner so that clusters are validated by an expert. An improved version of Morphocluster was presented by Schröder and Kiko (2022). Multiple CNN-based feature extractors were trained using different labeled datasets to allow the selection of the most suitable feature extractors for the target data. In addition, an unsupervised approach to learn the plankton image features based on the momentum contrast method (He et al. 2020) was proposed. The idea is to use data augmentation to generate two different instances of the same image and use a loss function that forces the model to learn similar feature representations for both instances. Moreover, two custom clustering methods were proposed: (1) shrunken k-Means, and (2) Partially Labeled k-Means. Due to the iterative clustering process of Morphocluster, only part of the images needs to be clustered in each iteration. Shrunken k-Means utilizes distances to cluster centers provided k-means to discard images that are far from the centers. Partially Labeled k-Means utilizes the label information from the earlier iterations to guide the clustering.
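The idea of discarding images that lie far from the k-means cluster centers can be illustrated with a simplified sketch. This is not the published Shrunken k-Means algorithm: the fixed distance threshold, the hand-picked initial centers and the two-dimensional feature vectors are all assumptions made for illustration, and the function names are hypothetical.

```python
import math


def nearest_center(point, centers):
    """Index of and distance to the closest cluster center."""
    distances = [math.dist(point, c) for c in centers]
    index = min(range(len(centers)), key=lambda i: distances[i])
    return index, distances[index]


def k_means(points, centers, iterations=10):
    """Plain k-means starting from the given initial centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            index, _ = nearest_center(p, centers)
            groups[index].append(p)
        # Empty clusters keep their previous center.
        centers = [[sum(dim) / len(g) for dim in zip(*g)] if g else c
                   for g, c in zip(groups, centers)]
    return centers


def discard_far_points(points, centers, max_distance):
    """Keep (point, cluster index) pairs within max_distance of a center."""
    kept = []
    for p in points:
        index, distance = nearest_center(p, centers)
        if distance <= max_distance:
            kept.append((p, index))
    return kept


# Hypothetical 2D image features: two compact groups and one outlier.
features = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
            [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [5.0, 6.0]]
centers = k_means(features, centers=[[0.0, 0.0], [10.0, 10.0]])
kept = discard_far_points(features, centers, max_distance=3.0)
print(len(kept))
```

Only the images kept in this way would be presented as cluster members in the current iteration, while the discarded ones remain available for later iterations.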
Autoencoders have also been proposed for learning plankton image features for clustering without the label information. The basic idea is to utilize an encoder-decoder network architecture where the encoder generates an embedding vector from an image and the decoder tries to reconstruct the original image based on the embedding vector. Such a network can be trained without any labels. Ideally, the encoder learns to compress the essential information from the image into an embedding vector that can then be used for clustering. For example, Salvesen et al. (2020) applied an autoencoder-based approach called Deep Convolutional Embedded Clustering (DCEC) to plankton image data. The method employs the CNN-based autoencoder architecture by Guo et al. (2017) and uses k-means to cluster the obtained embeddings. Alfano et al. (2022) proposed a plankton image clustering technique based on variational autoencoders (VAEs). The method utilizes a pre-trained DenseNet without fine-tuning to extract features. The obtained deep image features are then fed to a VAE to generate latent space representations. Finally, the low-dimensional embeddings are clustered using fuzzy k-means.
Clustering methods are only able to produce unlabeled clusters of images with a similar appearance. Therefore, further analysis is needed to confirm and label the clusters. Schröder et al. (2020) addressed this by introducing an interactive tool where the users revise the obtained clusters, manually correct the hierarchy and annotate the final set of clusters. This semiautomatic approach reduces the manual work needed for data labeling as the expert does not need to annotate every image separately. Goulart et al. (2021) utilized t-distributed stochastic neighbor embedding (t-SNE) to visualize the clusters in two-dimensional space allowing the human expert to quickly see the clusters in the data. Pastore et al. (2020) proposed a full pipeline for environmental monitoring based on plankton image clustering and minimal expert supervision (the expert labels only one image per cluster). CNN was used for image feature extraction and various unsupervised clustering algorithms including K-means, fuzzy K-means, and Gaussian mixture model were compared.
Unsupervised learning has also been applied for pre-training on unlabeled plankton image data. This enables a semi-supervised approach for plankton recognition where an initial set of image features is learned in an unsupervised manner using large volumes of unlabeled data, and the final model is obtained by fine-tuning it on a small amount of expert labeled data. Schanz et al. (2023) proposed to use the SimCLR method (Chen et al. 2020) for unsupervised pre-training. Pastore et al. (2023) applied a customized variational autoencoder for unsupervised feature learning and compression.
In summary, the most common approaches to tackle the problem of a limited amount of labeled plankton image data are data augmentation and transfer learning. Data augmentation is an essential part of practically all modern plankton recognition pipelines based on deep learning, while transfer learning makes it possible to utilize knowledge from another domain to compensate for the lack of labeled training data. In the case of extreme scarcity of labeled training data, further modifications to the model training are needed. Typically this means the adoption of regularization techniques that prevent the model from overfitting to the training data. Weight imprinting, metric learning, and central loss have been found to be useful tools in few-shot plankton recognition. If labeled training data are completely missing, clustering or active learning can be utilized. Clustering allows plankton image datasets to be analyzed in an unsupervised manner, while active learning makes it possible to minimize the amount of expert labeling effort for building a plankton recognition model for future data.
5.2 Challenge 2: High class imbalance
High class imbalance is inherent in many real-world applications and plankton recognition is not an exception. Certain plankton species are considerably more common than others, causing the data in typical plankton datasets to be highly imbalanced. This is problematic when it comes to training plankton classification methods. One of the most notable problems connected to high class imbalance is catastrophic forgetting, where a neural network, while learning new information, completely forgets previously learned information. This typically affects the minority classes that are only rarely seen during the training stage, causing the network to learn only the image features necessary for the majority classes.
Undersampling is a technique to decrease the level of imbalance by discarding images from the majority classes. In the simplest case, undersampling can be done by randomly selecting a subset of images from the majority classes in such a way that the resulting training dataset has an equal number of images in all classes. For example, Lee et al. (2016) reduced the class bias on small-sized plankton classes by randomly sampling images from the classes with more samples than a predefined threshold. Kloster et al. (2020) utilized a similar undersampling technique. More intelligent solutions for undersampling have also been suggested in the plankton literature. Le et al. (2022) utilized undersampling by filtering combined with cost-sensitive learning to obtain a more balanced dataset for training. Ding et al. (2018) proposed an EasyEnsemble.D algorithm for plankton recognition on highly imbalanced datasets. The basic idea is to sample multiple subsets from the majority classes to fully utilize the large data volumes. Each subset is used to train a separate weak classifier with different weights, and the final classification is performed using the ensemble of the weak classifiers. The problem with undersampling is that it reduces the amount of training data, which in the case of plankton recognition is typically already limited. Especially in the presence of rare species, undersampling alone leads to an extremely small training set.
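The simplest, threshold-based form of random undersampling described above can be sketched as follows. This is only an illustrative sketch and not the implementation of any of the cited works; the function name `random_undersample`, the class names, and the threshold value are hypothetical.

```python
import random
from collections import defaultdict


def random_undersample(labels, max_per_class, seed=0):
    """Return sorted sample indices keeping at most `max_per_class` per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    kept = []
    for indices in by_class.values():
        # Only classes above the threshold are randomly thinned.
        if len(indices) > max_per_class:
            indices = rng.sample(indices, max_per_class)
        kept.extend(indices)
    return sorted(kept)


# Hypothetical, highly imbalanced label list.
labels = ["diatom"] * 1000 + ["ciliate"] * 40 + ["rare_dino"] * 5
kept = random_undersample(labels, max_per_class=40)
counts = {c: sum(labels[i] == c for i in kept) for c in set(labels)}
```

Note that the rarest class keeps all five of its images, so the result is still imbalanced unless the threshold is lowered to the size of the smallest class, which illustrates why undersampling alone yields very small training sets.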
Oversampling is another technique to reduce the level of imbalance, in this case by duplicating samples from the minority classes. The oversampling is typically done using data augmentation, i.e., instead of using identical duplicates, manipulated versions are created to obtain more training data for the minority classes. For example, Bochinski et al. (2018) increased the number of training samples of the smaller classes by mirroring the images horizontally and vertically to counter the imbalance during training.
Xiaoyan (2020) proposed a combination of undersampling and oversampling to address the class imbalance in plankton recognition. This is done by utilizing KA-Ensemble algorithm (Ding et al. 2020) that combines oversampling of the minority class via kernel-based adaptive synthetic sampling (Kernel-ADASYN) and random undersampling of the majority class. The experiments showed increased classification accuracy for the minority class. Liu et al. (2021a) proposed to combine borderline-SMOTE oversampling with Fuzzy C-means clustering-based undersampling for plankton image data. The Synthetic Minority Oversampling TEchnique (SMOTE) (Chawla et al. 2002) synthesizes new samples between the minority class and its nearest neighbor in the feature space. Borderline-SMOTE (Han et al. 2005) improves the method by concentrating on the samples near the class boundaries in order to oversample more significant samples for the minority classes. Fuzzy C-means clustering is utilized to preserve the clusters found in the original data during undersampling.
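The interpolation step at the core of SMOTE can be illustrated with a short sketch. It assumes that the minority-class samples are already represented as feature vectors; the function name `smote_sample` and the parameter values are hypothetical, and the borderline selection and Fuzzy C-means steps of the cited methods are not included.

```python
import math
import random


def smote_sample(minority, k=3, n_new=5, seed=0):
    """Create `n_new` synthetic points by interpolating minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority-class neighbours of x (x itself excluded).
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        t = rng.random()  # random position on the segment from x to nb
        synthetic.append([a + t * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy 2-D feature vectors of a minority class (illustrative values only).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sample(minority)
```

Every synthetic point lies on a line segment between two existing minority samples, so the new data stay inside the region already occupied by the class in feature space.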
Another approach among a variety of resampling methods is cost-sensitive learning (Elkan 2001). The method defines a so-called cost-matrix which specifies a reward or a penalty over the classifications of an algorithm. A core idea behind it is similar to resampling but it does not change the prevalence of the training set directly. However, a performance evaluation for an imbalanced plankton set reported by Corrêa et al. (2016) demonstrates only minor improvements for cost-matrix in comparison to SMOTE and resampling.
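As a minimal illustration of how a cost matrix changes a decision, the sketch below selects the class with the lowest expected cost instead of the most probable class. The indexing convention (`cost[i][j]` is the cost of predicting class `j` when the true class is `i`), the function name, and all numeric values are assumptions made for this example.

```python
def min_expected_cost_class(probs, cost):
    """Pick the prediction with the lowest expected cost.

    probs[i]   : estimated probability that the true class is i
    cost[i][j] : cost of predicting class j when the true class is i
    """
    n = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    best = min(range(n), key=expected.__getitem__)
    return best, expected


# Class 0 is a common taxon, class 1 a rare one (hypothetical values).
probs = [0.7, 0.3]
cost = [
    [0.0, 1.0],  # true class 0: a false "rare" prediction costs 1
    [5.0, 0.0],  # true class 1: missing a rare specimen costs 5
]
pred, expected = min_expected_cost_class(probs, cost)
most_probable = max(range(len(probs)), key=probs.__getitem__)
```

Here the most probable class is the common one, but the high penalty for missing a rare specimen makes the rare class the cost-optimal prediction, without changing the class prevalence in the training set.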
Another solution to artificially create more image data for training and to reduce the level of imbalance is to utilize generative models capable of generating realistic images with a certain distribution. GANs (Goodfellow et al. 2014) are deep learning models that can be used to generate photo-realistic artificial images with the same statistics as the data they were trained with. This is done by using two models, a generative model and a discriminative model. The generative model generates candidate images usually from random noise. The discriminative model is an image classifier that is given labeled samples from the real set of images and fake images produced by the generative model. The task of the discriminative model is to distinguish real images from fake ones and the task of the generative model is to fool the discriminative model. These two models are trained simultaneously in such a way that the generative model becomes increasingly better at producing realistic fake images and the discriminative model gets increasingly better at recognizing them. GANs have been shown to be able to generate images that are authentic to human observers.
GANs have also been utilized for reducing the bias caused by class imbalance in plankton recognition. Wang et al. (2017) used a GAN to generate new example images of minority classes. Furthermore, a method was proposed where the CNN-based plankton recognition model shares the weights with the discriminative model. However, only minor improvement was observed over the baseline recognition models trained on the original data without GAN-based data augmentation. Liu et al. (2018b) proposed a GAN-based curriculum learning strategy. The proposed method contains two stages. First, the model is trained using the original data and then with more complex data consisting of GAN-generated images. Li et al. (2021c) utilized CycleGAN (Zhu et al. 2017) for the augmentation of rare taxa, and Khan et al. (2022); Ali et al. (2022) applied DC-GAN (Radford et al. 2015) to augment an algae image dataset. Vallez et al. (2022) compared two augmentation strategies for diatom images: combining two images from the same class using morphing and image registration methods that perform diffeomorphic transformations, and generating synthetic images with a GAN. In this study, mixing images using morphing achieved better results. The fundamental problem of using GANs for image augmentation is that the generated images have the same statistics as the images they were trained with. Therefore, if the GANs are trained using the same data as the recognition model, and the recognition model is able to learn the data distribution from the original data, the generated samples do not necessarily provide additional value for the training. However, some promising results have been obtained on GAN-based augmentation of highly imbalanced datasets (Tanaka and Aranha 2019).
Similarly to the challenge of a limited amount of labeled training data, transfer learning has also been proposed to overcome the class imbalance problem. In a method proposed by Lee et al. (2016), a balanced dataset is first generated using randomized undersampling, the model is pre-trained on the balanced dataset, and finally fine-tuned using the whole unbalanced plankton image dataset. Wang et al. (2018) introduced a transfer parallel model approach for plankton recognition. The main idea is to avoid catastrophic forgetting by training two submodels: (1) a model trained on the whole dataset, and (2) a pre-trained model trained only on small classes. Deep features from both models are concatenated before the softmax layer. The latter submodel adds good image features for the classification of minority classes that the network could otherwise fail to learn.
Modified model architectures have also been proposed to address the class imbalance. These include models with increased generalization ability to minority classes. Liu et al. (2018a) applied the Deep Pyramidal Residual Network (PyramidNet) (Han et al. 2017) to plankton recognition and showed that it improves accuracy on a highly imbalanced dataset. The idea behind PyramidNet is to gradually increase the feature map dimension (the number of channels) from layer to layer. This, combined with ResNet-style skip connections, reduces the chance of overfitting and therefore improves the generalization ability. Kerr et al. (2020) proposed model fusion to address the class imbalance. The results suggest that combining multiple individually trained CNNs with a common softmax layer improves the accuracy of rare species, consequently providing better overall accuracy on imbalanced data.
As a summary, undersampling and oversampling are the simplest and most widely used approaches to address high class imbalance in plankton image data. Oversampling is typically performed using traditional data augmentation, but also generative approaches such as GANs have been proposed to generate completely new plankton images for the minority classes. Moreover, transfer learning, model fusion, and regularization techniques preventing overfitting have been shown to improve plankton recognition accuracy in the case of highly imbalanced training data.
5.3 Challenge 3: Fine-grained nature of the recognition task
In order to obtain high recognition accuracy on classes with high inter-class similarity, such as taxonomically close plankton species, techniques that focus attention on subtle visual differences are needed. The task of recognizing hard-to-distinguish classes from each other is called fine-grained classification. Plankton recognition in most cases can be considered a fine-grained classification task, as the fundamental way to improve the overall accuracy of a recognition model is to make it better at recognizing the challenging cases. Despite this, most of the work on plankton recognition does not tackle the challenge directly but instead focuses on comparing different general model architectures on the task. Related to this viewpoint, it has also been studied whether the recognition should be considered as a flat or hierarchical classification task. Boddy et al. (2000) considered misclassifications of phytoplankton to result from the overlap of feature distributions, and proposed grouping similar species within genera or based on groupings indicated in dendrograms. Similarly, Fernandes et al. (2009) proposed an approach for balancing the trade-off between the classification performance and the number of classes. The model automatically suggests merging of classes based on the statistics evaluated after the classification. The results from taxa recognition of macroinvertebrates by Ärje et al. (2020) showed that humans performed better when a hierarchical classification approach commonly used by human taxonomic experts was used, but when a flat classification approach was used, the CNN was close to human accuracy. To improve the automatic approaches, a few methods focusing especially on the attention mechanism to address the fine-grained nature of the recognition task have been proposed.
Sun et al. (2020) considered fine-grained classification of plankton by proposing an attention mechanism based on Gradient-weighted Class Activation Maps (Grad-CAM) (Selvaraju et al. 2017) to force the CNN to focus on the most informative regions in the image. Grad-CAM was originally developed for visualizing the CNN-based models. It highlights important image regions which correspond to the decision of interest (in this case plankton recognition). Sun et al. (2020) utilized Grad-CAM to detect the regions to focus on, and a feature fusion approach utilizing high-order integration (Cai et al. 2017) is applied to obtain stronger features for those regions. This approach shares similarities with the self-attention module used in the TANet architecture (Li et al. 2019c) for plankton recognition. However, the self-attention module puts larger weights on the important regions, i.e. those regions in the feature map with high activation values. Ito et al. (2023) proposed to use Attention Branch Network model for hierarchical classification of plankton images. This was motivated by the hierarchical structure of the plankton taxonomy. Successful classification at higher levels of the taxonomy simplifies the fine-grained recognition task at the lower levels.
Other approaches for fine-grained plankton recognition have also been proposed. Du et al. (2020) applied a Matrix Power Normalized CO-Variance (MPN-COV) pooling layer for second-order feature extraction. The aim is to model the complex class boundaries more accurately than in traditional pooling (e.g. softmax). There is some evidence (Li et al. 2017) that suggests that higher-order information can improve recognition accuracy in fine-grained tasks. Venkataramanan et al. (2021) proposed an improved pipeline tackling inter-class similarity and intra-class variance. The authors suggested alleviating inter-class similarity with a metric learning-based approach utilizing triplet loss and mitigating intra-class variance with the X-means clustering technique applied to the extracted features. The idea is to split the classes with high intra-class variance into multiple clusters and consider these as separate classes. The authors propose a method to find the optimal number of clusters that minimizes both the intra-class variance and the inter-class similarity, and in this way improves the accuracy of fine-grained plankton recognition. Si et al. (2023) proposed to use a token-selective vision transformer for fine-grained recognition of marine organisms including plankton. The most important tokens are selected layer-by-layer and they focus on distinctive features.
In general, only a few papers directly tackle the fine-grained nature of the plankton recognition task. These are based on attention mechanisms that find the most important regions in the images, allowing the recognition model to focus on the subtle differences between the classes, and on contrastive or metric learning, which allows explicit learning of the image features that separate pairs of classes.
5.4 Challenge 4: Domain shift between datasets
Different imaging instruments cause domain shift between plankton datasets. Domain shift in a wider sense refers to a situation where the distribution of the dataset that is used for training differs from the data where the recognition model is applied. CNN-based models tend to learn image features that are very specific to the distribution of the training data making them notoriously weak at generalizing beyond the domain they were trained on (Gulrajani and Lopez-Paz 2020). This is why most automatic plankton recognition solutions focus on just one imaging instrument. This, however, limits the wider utilization of the methods. Tuning the classification model trained on one dataset to work on another dataset (correcting domain shift between the datasets) is called domain adaptation (Ben-David et al. 2010) and learning a general model that can be applied to any dataset (domain) is called domain generalization (Zhou et al. 2022).
While domain adaptation and generalization have not been widely studied on plankton recognition, there have been works where multiple different plankton image datasets have been utilized to solve the recognition task. Transfer learning and fine-tuning have been utilized as approaches against the differences in datasets. Rodrigues et al. (2018) applied transfer learning using CNNs to obtain a feature extractor that can be used for new datasets. The Kaggle-Plankton dataset was used to train a CNN (source dataset) and an in-house dataset was used as a target dataset to test the suitability of the features. Orenstein and Beijbom (2017) applied a variety of learning schemes to three very different plankton image datasets. The bigger labeled image datasets, IFCB and ISIIS, were used to train CNNs both by fine-tuning and from scratch. Then, the classifiers were used to classify within-domain images directly and as feature extractors for out-of-domain data. Maracani et al. (2023) performed a similar experiment where out-of-domain datasets were used for pretraining and small plankton datasets for fine-tuning. Surprisingly, ImageNet pretraining provided higher accuracy on target datasets than pretraining done on large-scale plankton datasets (e.g. WHOI).
Lumini and Nanni (2019a); Lumini et al. (2020) studied ensembles of different CNN models, fine-tuned on several datasets, with the aim of exploiting their diversity in designing an ensemble of classifiers. The experimental results show that combining several CNNs in an ensemble improves performance compared with a single CNN model.
In Bochinski et al. (2018), two datasets from different biological environments were captured and analyzed. The first dataset was used to analyze the achievable accuracy of the CNN and how the Cost-Effective Active Learning (CEAL) can be used to minimize the number of required annotations. The second dataset was used to examine the generalization ability of the CNN and if the CEAL method can be used to fine-tune the system to adapt to the characteristics of this new data.
Plonus et al. (2021a) suggest using capsule neural networks combined with probability filters to address the dataset shift caused by different plankton imaging instruments. The idea of Capsule neural networks is to form groups of neurons (capsules) that learn the specific properties of the object (e.g. plankton) in the image. The authors argue that the capsule neural networks are less sensitive to the changes in the field conditions and therefore able to adapt to different data distributions. Guo et al. (2022b) proposed a cross-domain few-shot learning model for instrument-agnostic plankton recognition. Similarly to transfer learning, the model is first trained on the source domain with a large amount of labeled training data and then adapted to the target domain using fine-tuning. In addition, graph neural network-based meta-learning is applied to learn a feature distance metric capable of recognizing plankton species in the target dataset with a very limited amount of labeled data.
Domain shifts between the plankton image datasets or imaging instruments have not been widely studied. Most works focus on fine-tuning the recognition models trained on one dataset to new datasets using transfer learning. While the transfer learning reduces the amount of manual labeling needed for new datasets, it does not fully solve the problem of multiple domains. Labeled training data are still needed for all datasets, and the recognition models need to be fine-tuned for each, requiring expertise in machine learning and computing resources. A more general model can be obtained by using ensemble learning with submodels learned on different datasets if labeled training data on each dataset (imaging instrument) is available. More sophisticated approaches to plankton image domain adaptation include the capsule neural networks and meta-learning.
5.5 Challenge 5: Previously unseen classes and unknown particles
Automated plankton imaging instruments capture images of unknown particles and the class (plankton species) composition varies between geographical regions and ecosystems. CNN-based models are known to struggle in open-set settings where the class composition of training data differs from the data for which the trained model is applied. Typical CNN-based classification models tend to classify the images from a new class to one of the known classes often with high confidence, and to include new classes to the models, they need to be retrained. These are major problems for plankton recognition as the plankton species vary between different regions and seasons. Retraining a separate model for each dataset is not feasible. Therefore, there is a need for a recognition model that (1) is able to predict when the image contains a previously unknown plankton species (open-set recognition) and (2) can be generalized to new classes without retraining the whole model.
In the case of plankton recognition, the open-set problem is often formulated as an anomaly detection problem where the model is trained both to correctly classify the known classes and to filter abnormal classes by producing low-entropy (confident) output distributions for the normal classes and high-entropy distributions for the abnormal classes. Pastore et al. (2020) proposed a semi-automatic method to handle the previously unseen plankton classes by utilizing anomaly detection combined with expert verification. Both a one-class SVM and a new neural network-based method called the Delta-Enhanced Class (DEC) detector were considered. The DEC detector utilizes absolute differences between the feature vectors of an input image and random images from a known class as additional input to predict whether the input image is from the known class or an anomaly. Varma et al. (2020) proposed \(L_1\)-norm tensor-conformity curation to remove outliers (non-plankton or misclassified images) from the training data. The idea is to measure the conformity of the images using \(L_1\)-norm subspaces (Tountas et al. 2019). Conradt et al. (2022) brought up the high intra-class and low inter-class variation of plankton morphology, and spatio-temporal changes in the plankton community, as the main causes for the need to frequently validate the results from automatic recognition. The proposed remedy is a dynamic optimization cycle in which the model is updated based on manual-validation results.
Pu et al. (2021) proposed a loss function that contains three loss terms to detect the anomalies and to maintain the classification accuracy for the images belonging to the normal classes by incorporating the expected cross-entropy loss, the expected Kullback-Leibler (KL) divergence, and the Anchor loss. The model was tested on classes of plankton images that also contained bubbles or random suspended particles. Walker and Orenstein (2021) utilized a large background set of images that do not belong to the target classes (classes to be recognized) and hard negative mining to find images that are more likely to cause false negatives. The training set was then complemented with these challenging images to improve the classifier’s ability to recognize when the images are from novel classes. While promising results were obtained on open-set plankton recognition, the method requires that a labeled background set is available, which limits its usability. Pastore et al. (2022) addressed the unseen classes using an anomaly detection method called TailDeTect. The TailDeTect method applies bootstrapping to estimate the mean and standard deviation for each image feature and utilizes this information to analyze whether the input sample is out of boundaries for that particular feature. The sample is considered an anomaly if it is out of boundaries for more than a predefined number of features. This process is applied separately for each known class, similarly to one-class classifiers.
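The per-feature boundary test described for TailDeTect can be sketched roughly as follows. This is a loose illustration of the idea as summarized above and not the authors' implementation: the boundary form (mean ± `z` times the standard deviation), the function names, and all parameter values are assumptions, and the features are synthetic.

```python
import random
import statistics


def bootstrap_feature_stats(class_features, n_boot=200, seed=0):
    """Bootstrap estimates of (mean, std) for each feature of one known class."""
    rng = random.Random(seed)
    stats = []
    for f in range(len(class_features[0])):
        column = [row[f] for row in class_features]
        means, stds = [], []
        for _ in range(n_boot):
            resample = rng.choices(column, k=len(column))  # with replacement
            means.append(statistics.fmean(resample))
            stds.append(statistics.pstdev(resample))
        stats.append((statistics.fmean(means), statistics.fmean(stds)))
    return stats


def is_anomaly(sample, stats, z=3.0, max_violations=1):
    """Flag a sample that is out of bounds for more than `max_violations` features."""
    violations = sum(abs(x - mean) > z * std for x, (mean, std) in zip(sample, stats))
    return violations > max_violations


# Synthetic features for one known class: 200 samples, 4 features each.
rng = random.Random(1)
known = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
stats = bootstrap_feature_stats(known)

typical = is_anomaly([0.0, 0.0, 0.0, 0.0], stats)
outlier = is_anomaly([10.0, 10.0, 10.0, 0.0], stats)
```

In an open-set setting, such a test would be run once per known class, and a sample rejected by every class would be treated as a candidate for an unseen class.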
Another approach to tackle the open-set problem is to utilize similarity metric learning. The aim of metric learning is to obtain image embedding vectors that model the similarity between images. It is commonly utilized in person (Ye et al. 2021) and animal re-identification (Nepovinnykh et al. 2020), as well as content-based image retrieval (Dubey 2021), but has been also successfully applied to plankton classification (Teigen et al. 2020; Badreldeen Bdawy Mohamed et al. 2022). A simple approach to implement a recognition method is to construct a gallery set of known species and use the learned similarity metric to compare query images to the gallery images. The similarity in this context corresponds to the likelihood that the images belong to the same class. This further allows defining a threshold value for similarity enabling open-set classification: if no similar images are found in the gallery set, the query image is predicted to belong to an unknown class. Furthermore, new classes can be added by simply including them in the gallery set as the model does not necessarily need to learn class-specific image features.
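The gallery-based open-set classification described above can be sketched in a few lines. The embeddings are assumed to come from an already trained metric learning model; the cosine similarity measure, the threshold value, and the species names are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def classify_open_set(query, gallery, threshold=0.8):
    """Return the label of the most similar gallery image, or "unknown"."""
    best_label, best_sim = None, -1.0
    for label, embedding in gallery:
        sim = cosine(query, embedding)
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_sim < threshold:
        return "unknown", best_sim
    return best_label, best_sim


# Hypothetical gallery of (label, embedding) pairs for the known species.
gallery = [("species_a", [1.0, 0.0, 0.0]), ("species_b", [0.0, 1.0, 0.0])]

known_label, _ = classify_open_set([0.9, 0.1, 0.0], gallery)
novel_label, _ = classify_open_set([0.0, 0.0, 1.0], gallery)

# A new class is added by extending the gallery; no retraining is involved.
gallery.append(("species_c", [0.0, 0.0, 1.0]))
added_label, _ = classify_open_set([0.0, 0.1, 0.9], gallery)
```

The threshold controls the trade-off between rejecting known species and accepting unknown ones, and in practice it would be tuned on validation data.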
The most common approaches for deep metric learning include triplet-based learning and classification-based metric learning. The first approach learns the metric by sampling image triplets with anchor, positive, and negative examples (Hoffer and Ailon 2015). The loss function is defined in such a way that the distances (similarity) from the embeddings of the anchors to the positive samples are minimized, and the distances from the anchors to the negative samples are maximized. The second approach approximates the classes using learned proxies (Movshovitz-Attias et al. 2017) or class centers (Deng et al. 2019) that provide the global information needed to learn the metric. This makes it possible to formulate the loss function based on the softmax loss and allows to avoid the challenging triplet mining step.
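For a single triplet, the commonly used margin-based form of the triplet loss can be written as below. This sketch assumes Euclidean distances between embeddings and a hypothetical margin value; the exact formulation in the cited works may differ.

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin-based triplet loss for one (anchor, positive, negative) triplet."""
    d_pos = math.dist(anchor, positive)  # should become small
    d_neg = math.dist(anchor, negative)  # should become large
    return max(0.0, d_pos - d_neg + margin)


anchor, positive = [0.0, 0.0], [0.1, 0.0]

# Easy triplet: the negative is already far away, so the loss is zero.
easy = triplet_loss(anchor, positive, [1.0, 0.0])

# Hard triplet: the negative is almost as close as the positive.
hard = triplet_loss(anchor, positive, [0.2, 0.0])
```

Easy triplets contribute nothing to the gradient, which is why triplet mining, i.e., the search for informative hard triplets, is needed and why it is considered the costly step of this approach.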
Teigen et al. (2020) studied the viability of few-shot learners in correctly classifying plankton images. A Siamese network was trained using the triplet loss and used to determine the class of a query image. Two scenarios were tested: the multi-class classification and the novel class detection. A model trained to distinguish between five classes of plankton using five reference images from each class was able to achieve reasonable accuracy. In the novel class detection, however, the model was able to filter out only 57 images out of 500 unknowns.
Badreldeen Bdawy Mohamed et al. (2022) utilized the angular margin loss (ArcFace) (Deng et al. 2019) instead of triplet loss to address the high cost of the triplet mining step. Furthermore, Generalised Mean pooling (GeM) (Radenović et al. 2018) was applied to aggregate the deep activations to rotation and translation invariant representations. ArcFace uses a similarity learning mechanism that allows distance metric learning to be solved in the classification task by introducing the Angular Margin Loss. This allows straightforward training of the model and only adds negligible computational complexity. The metric learning-based method was shown to outperform a model utilizing an OpenMax (Bendale and Boult 2016) layer in open-set classification of plankton. One of the main benefits of the method is that it generalizes well to new classes added to the gallery set without retraining. This makes it straightforward to apply the model to new datasets with only partly overlapping plankton species composition. A similar approach was presented by Yang et al. (2022), who used the supervised contrastive (SupCon) loss instead of the ArcFace loss.
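Generalised Mean pooling reduces each channel of a feature map to a single value with a power mean. A minimal sketch, assuming non-negative activations given as one flattened list per channel and a fixed (not learned) exponent `p`, is:

```python
def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalised Mean pooling over the spatial positions of each channel.

    feature_map: list of channels, each a flat list of non-negative activations.
    """
    pooled = []
    for channel in feature_map:
        clamped = [max(x, eps) for x in channel]  # avoid zero to a fractional power
        pooled.append((sum(x ** p for x in clamped) / len(clamped)) ** (1.0 / p))
    return pooled


channel = [1.0, 2.0, 3.0]
avg_like = gem_pool([channel], p=1.0)[0]   # p = 1 is average pooling
max_like = gem_pool([channel], p=50.0)[0]  # a large p approaches max pooling
shuffled = gem_pool([[3.0, 1.0, 2.0]], p=3.0)[0]
original = gem_pool([channel], p=3.0)[0]
```

Because the result does not depend on the order of the spatial positions, moving or rotating the object within the feature map leaves the pooled descriptor of the same set of activations unchanged.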
Plankton species vary between locations and seasons; thus, it is common that a recognition model needs to be adapted to or retrained for a new situation at some point. Retraining a separate model for each situation is infeasible, and continual or online training of the model would be challenging for online monitoring applications. Therefore, an effective remedy would be to treat it as an open-set recognition problem, solve it with modern methods such as anomaly detection or metric learning, and ensure that the model is capable of generalizing to new data without the need to retrain the whole model.
5.6 Challenge 6: Label uncertainty
The plankton image label uncertainty is caused by the difficulty of manually recognizing the species from low-quality images with limited resolution, human error, and high costs preventing the repetition of the manual annotation by multiple experts. Culverhouse et al. (2003) identified four main reasons for the incorrect labeling of plankton images: (1) the limited short-term memory of humans, (2) fatigue, (3) recency effects, i.e., labeling is biased towards the most recently seen labels, and (4) positivity bias, i.e., labeling is biased by the expert’s expectations of the content of the sample. Labels provided by sixteen human experts (marine ecologists and harmful algal bloom monitoring specialists) on microscopy images of dinoflagellates (6 classes) were analyzed. The results showed only 67–83% self-consistency and 43% consensus between experts. Experts who were routinely labeling the selected classes were able to achieve 84–95% labeling accuracy. Culverhouse (2007) brought up several important points related to labeling algae. The presented performance figures do not represent the state-of-the-art of automatic approaches, but improvements would be beneficial for both alternatives. Human expert judgements would benefit from peer review and inter-expert calibration to remove human bias. To improve the automatic solutions, the errors of both man and machine would require further attention. Global reference databases with validated samples and representative coverage of the morphological and physiological characteristics in nature would be beneficial for training and evaluation purposes. In addition, Solow et al. (2001) noted that the taxonomic counts of classified individuals are biased when there are errors in classification. A straightforward method for correcting the bias was proposed based on the classification probabilities of the classifier.
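A correction of this kind can be illustrated with a small sketch. It assumes that the classification probabilities `confusion[i][j]` (the probability that an individual of true class `i` is classified as class `j`) are known, so that the expected observed counts are a linear function of the true counts that can be inverted. The function names and the numbers are hypothetical, and the details of the cited method may differ.

```python
def solve_linear(a, b):
    """Solve a x = b for a small square system by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def correct_counts(observed, confusion):
    """Estimate true class counts from classifier counts.

    Expected observed count of class j = sum_i true[i] * confusion[i][j].
    """
    n = len(observed)
    transposed = [[confusion[i][j] for i in range(n)] for j in range(n)]
    return solve_linear(transposed, observed)


# Hypothetical example: 100 and 20 true individuals of two taxa.
confusion = [[0.9, 0.1], [0.2, 0.8]]
observed = [100 * 0.9 + 20 * 0.2, 100 * 0.1 + 20 * 0.8]  # [94.0, 26.0]
corrected = correct_counts(observed, confusion)
```

In this example the raw counts overestimate the rare taxon by 30% (26 instead of 20), while the corrected counts recover the true composition. With real data, the classification probabilities themselves are estimates, which adds uncertainty to the corrected counts.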
Image filtering has been proposed to address label uncertainty in plankton image data. The idea is to discard images for which the recognition model is uncertain and, therefore, more likely to produce erroneous labels. For example, Faillettaz et al. (2016) utilized a probabilistic RF for classification, and the obtained class probabilities were used to detect and ignore images for which the classifier is uncertain. Luo et al. (2018), Plonus et al. (2021a), and Kraft et al. (2022b) utilized a similar approach for CNN-based recognition models. Luo et al. (2018) used a separate fully annotated validation set to set class-specific probability thresholds for filtering. Plonus et al. (2021a) proposed a pipeline for tailoring filtering thresholds to the research question of interest by allowing the user to choose between high precision and high recall. Kraft et al. (2022b) evaluated a CNN-based model with class-specific probability thresholds in operational use.
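The filtering step with class-specific thresholds can be sketched as below. The class names, probabilities, and threshold values are hypothetical; in the cited works the thresholds are derived from annotated validation data rather than set by hand.

```python
def filter_predictions(predictions, thresholds, default_threshold=0.5):
    """Keep a prediction only if its top probability reaches the class threshold.

    predictions: list of dicts mapping class name -> predicted probability.
    Returns the predicted class name per image, or None for discarded images.
    """
    kept = []
    for probs in predictions:
        label = max(probs, key=probs.get)  # most probable class
        limit = thresholds.get(label, default_threshold)
        kept.append(label if probs[label] >= limit else None)
    return kept

# Hypothetical thresholds: a stricter limit for an easily confused class.
thresholds = {"dinoflagellate": 0.9, "diatom": 0.6}
predictions = [
    {"dinoflagellate": 0.95, "diatom": 0.05},
    {"dinoflagellate": 0.70, "diatom": 0.30},
    {"dinoflagellate": 0.35, "diatom": 0.65},
]
labels = filter_predictions(predictions, thresholds)
```

Raising a threshold trades recall for precision for that class, which is the choice that has to be matched to the research question.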
Schanz et al. (2023) proposed a novel loss function that measures the Kullback-Leibler divergence between the model’s output distribution over classes and the distribution of expert labels. This allows for training on multiple expert labels that can be conflicting, leading to a model that can estimate the label uncertainty.
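A minimal sketch of such a loss for a single image is given below. It assumes that the expert votes are converted to relative frequencies and that the divergence is taken from the expert distribution to the model output; the direction, the function names, and the toy values are assumptions made for illustration and are not taken from Schanz et al. (2023).

```python
import math

def softmax(logits):
    """Convert raw model outputs to a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expert_label_distribution(votes, num_classes):
    """Relative frequency of each class among the expert votes."""
    counts = [0] * num_classes
    for v in votes:
        counts[v] += 1
    return [c / len(votes) for c in counts]

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted); classes with zero target mass add nothing."""
    return sum(t * math.log(t / max(p, eps))
               for t, p in zip(target, predicted) if t > 0)

votes = [0, 0, 1, 0]                          # four experts, three classes
target = expert_label_distribution(votes, 3)  # [0.75, 0.25, 0.0]
loss_good = kl_divergence(target, softmax([2.0, 0.9, -3.0]))
loss_bad = kl_divergence(target, softmax([-1.0, 3.0, 0.0]))
```

The loss is small when the predicted distribution reproduces the disagreement among the experts and large when the model is confident in a class that most experts did not choose.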
Related to the label uncertainty, quantification methods have been proposed for plankton image data analysis. The basic idea is to estimate the class distribution directly. While mislabeled samples introduce noise into the training data for classification methods, the class distributions are often close to correct. Sosik and Olson (2007) used a quantification method to estimate the abundance of different taxonomic groups of phytoplankton. They utilized a combination of image feature types, including size, shape, symmetry, and texture characteristics, plus orientation invariant moments, diffraction pattern sampling, and co-occurrence matrix statistics. Statistical analysis was used to estimate category-specific misclassification probabilities for accurate abundance estimates and for quantification of uncertainties in abundance estimates. Beijbom et al. (2015) analyzed several quantification methods on a time-series dataset of plankton samples. These included unsupervised and supervised quantification. In unsupervised quantification, the dataset shift is assumed to be a pure class-distribution shift. Alternatively, the dataset shift is assumed to be ‘small’ and the unlabeled set of target samples is used to align the internal feature representation of a machine learning algorithm. In supervised quantification, no explicit assumptions are made on the dataset shift, but it is assumed that a small number of samples are available in the target domain. González et al. (2017) proposed a methodology to assess the efficacy of learned models, which takes into account the fact that the data distribution (the plankton composition of the sample) might vary between the training phase and the testing phase. Their approach used validation-by-sample: the sample, instead of the individual, is used as the basic unit when predicting the abundance of the different plankton groups. Thus, model assessment processes require groups of samples with sufficient variability to provide precise error estimates. González et al. (2019) proposed a transfer learning approach in which deep image features are used as input for the quantification algorithm to estimate the distribution of each class in an unknown water sample. Orenstein et al. (2020a) proposed a semi-automatic pipeline where a small subset of images was manually labeled to estimate the dataset shift, and this information was used to correct the quantification estimate.
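To make the sample-as-unit evaluation concrete, the sketch below computes the class prevalences of each sample from true and predicted labels and reports one error value per sample. The mean absolute error is used here purely as an assumption for illustration; the cited works may use other error measures, and the class names and label lists are hypothetical.

```python
from collections import Counter

def prevalence(labels, classes):
    """Fraction of individuals in a sample that belong to each class."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in classes]

def sample_error(true_labels, predicted_labels, classes):
    """Mean absolute error between true and predicted class prevalences."""
    p_true = prevalence(true_labels, classes)
    p_pred = prevalence(predicted_labels, classes)
    return sum(abs(a - b) for a, b in zip(p_true, p_pred)) / len(classes)

classes = ["copepod", "diatom"]
# Each entry is one water sample: (true labels, predicted labels).
samples = [
    (["copepod"] * 8 + ["diatom"] * 2, ["copepod"] * 7 + ["diatom"] * 3),
    (["copepod"] * 2 + ["diatom"] * 8, ["copepod"] * 4 + ["diatom"] * 6),
]
errors = [sample_error(t, p, classes) for t, p in samples]  # one value per sample
```

Reporting the spread of these per-sample errors, rather than a single accuracy over pooled individuals, shows how the model behaves when the plankton composition changes between samples.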
Supervised machine learning and particularly the performance evaluation of a recognition model relies on the correctness of the class labels. However, visual recognition of a number of plankton species from low-quality images is difficult and using expert panels becomes practically infeasible if the aim is to produce large datasets. The proposed remedies include exclusion of images that have high label uncertainty or focusing on the actual quantity of interest if it is not plankton recognition. Alternative ways to solve this challenge would be to focus on few-shot learning with ground truth validated by an expert panel and pay special attention to model generalisability, or to use generative models.
5.7 Challenge 7: Large image size variation
Most plankton datasets have extreme variation in image size. Fig. 7 shows example images obtained using Imaging FlowCytobot (IFCB). Typical CNN-based image classifiers require the input image to have a predefined size. Therefore, image resizing has been used as a pre-processing step for datasets with varying height and width of images (e.g. Dai et al. 2016b; Kuang 2015).
On a general level, the resizing can be done in two ways: by forgoing the aspect ratio (e.g. Al-Barazanchi et al. 2015a; Sánchez et al. 2019b) or by maintaining the aspect ratio (e.g. Dai et al. 2016a; Correa et al. 2017; González et al. 2019). In the first approach, stretching is needed for images whose aspect ratio does not match the target aspect ratio. This changes the shape of the objects in the image, which may affect the feature extraction or learning. In the second approach, images are typically resized based on the length of their longest side and padded with a single color to make the image size correct. Eerola et al. (2020) evaluated various ways to implement the padding; padding with the mode of the image (the most common color in the image, typically corresponding to the background color) produced the best results on IFCB data. Both approaches (forgoing and maintaining aspect ratio) have been utilized in plankton recognition. However, there exists little comparison between them. Dai et al. (2016a) tested various resizing methods on zooplankton images, and the best accuracy was obtained by maintaining the aspect ratio while scaling. On the other hand, Jindal and Mundra (2015) found little to no difference in performance between the approaches despite images appearing distorted when forgoing aspect ratio.
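The aspect-ratio-preserving variant with mode padding can be sketched as follows for a grayscale image stored as a list of rows. Nearest-neighbour interpolation, centered placement, and the function names are assumptions made to keep the example self-contained; they are not details taken from the cited studies.

```python
from collections import Counter

def resize_nearest(image, new_h, new_w):
    """Nearest-neighbour resize of a grayscale image stored as a list of rows."""
    h, w = len(image), len(image[0])
    return [[image[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]

def resize_with_mode_padding(image, target):
    """Scale the longest side to `target`, then pad to target x target."""
    h, w = len(image), len(image[0])
    scale = target / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    resized = resize_nearest(image, new_h, new_w)
    # The most common pixel value is taken as the background color.
    fill = Counter(v for row in image for v in row).most_common(1)[0][0]
    top, left = (target - new_h) // 2, (target - new_w) // 2
    out = [[fill] * target for _ in range(target)]
    for r in range(new_h):
        out[top + r][left:left + new_w] = resized[r]
    return out

# 2 x 8 toy image: background value 200 with a darker object of value 50.
image = [[200, 200, 50, 50, 50, 200, 200, 200],
         [200, 200, 200, 50, 50, 200, 200, 200]]
padded = resize_with_mode_padding(image, target=4)
```

The object keeps its proportions, and the padded border blends with the background instead of introducing an artificial edge.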
Various other ways to obtain fixed-size images have been proposed. In the method proposed by Ho et al. (2018), a fixed input image size was chosen and the images were either cropped or padded with zeros to adjust them to the correct size. Schröder et al. (2018) proposed to crop the images to their tight bounding box and pad to a square with a minimum edge length of 128 pixels. Images larger than 128 pixels were shrunk to the same size. Ellen et al. (2019) resized images larger than the target size, thus losing some detail. Images smaller than the target size were brought to the target size by padding and, therefore, the object size remained the same. Lumini and Nanni (2019a, 2019b) and Lumini et al. (2020) compared two different strategies: (1) resizing all images to a common size and (2) resizing only images that were larger than the input size and using padding for the smaller images. The results showed that the first method produced a better classification result in most of the datasets and models.
All methods to produce fixed-size images from original plankton images with a large size variation result in some degree of information loss or image distortions. Information on the size of the plankton is lost during the resizing, small details disappear if images are heavily downscaled, and only part of the object is seen if cropping is used. Ellen et al. (2019) partially solved this problem by providing the size information as metadata (additional features) for the classifier while still using resized versions of images as the main input for the CNN. Metadata is used as an input for the network besides image data, and they are processed independently by separate parts of the network. The outputs of both subnetworks are concatenated together and processed by fully connected layers. Results showed that metadata was useful for classification accuracy.
To truly solve the problem with the varying image size and aspect ratio, the CNN architecture needs to be modified so that it can process images with multiple sizes. This can be achieved, e.g. by combining scale-invariant and scale-variant features to devise a multi-scale CNN architecture (Van Noord and Postma 2017). Py et al. (2016) proposed an inception module that allows multiple scaled versions of the original image with different sizes to be used as the input for the CNN. By selecting different strides for each scale, the computed feature maps have the same size for all scales and can be concatenated to a single set of multi-scale features. The proposed method was shown to outperform the method with a single fixed-size input.
Bureš et al. (2021) compared various modifications of the baseline CNN on plankton recognition with high variation in image size. These include Spatial Pyramid Pooling (SPP) (He et al. 2015), using image size as metadata, patch cropping and multi-stream CNNs. SPP allows the training of a single CNN with multiple image sizes in order to obtain higher scale invariance by pooling the features produced by the convolutional layer to a fixed-length vector required by the fully connected layers. The metadata was used as described by Ellen et al. (2019). The patch cropping technique divides images into fixed-size patches that are classified separately. The final recognition is done by averaging the resulting score vectors. Multi-stream CNN utilizes a similar approach but uses multiple different networks trained for different image sizes and aspect ratios. The best plankton recognition accuracy was obtained using a multi-stream network combining two models with different input aspect ratios and patch cropping.
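The pooling idea behind SPP can be illustrated with a single-channel feature map. The sketch below max-pools the map over grids of increasing resolution and concatenates the results, so that the output length depends only on the chosen pyramid levels and not on the input size. The bin boundaries, pyramid levels, and toy feature maps are assumptions made for illustration; real implementations operate on multi-channel tensors inside the network.

```python
import math

def spatial_pyramid_pool(feature_map, levels=(1, 2)):
    """Max-pool a single-channel feature map into a fixed-length vector.

    For each level n the map is divided into an n x n grid of bins. Bin
    boundaries use floor/ceil so that every bin is non-empty whenever the
    map has at least n rows and n columns.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled

small = [[1, 2],
         [3, 4]]
wide = [[1, 0, 0, 0, 0, 9],
        [0, 5, 0, 0, 7, 0],
        [2, 0, 0, 0, 0, 3]]
vec_small = spatial_pyramid_pool(small)  # length 1 + 4 = 5
vec_wide = spatial_pyramid_pool(wide)    # same length despite the larger map
```

Both vectors have five elements, which is what allows the subsequent fully connected layers to accept inputs of different sizes.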
Most plankton datasets have significant variation in image sizes and aspect ratios. Common CNN-based image classifiers require that the input images have a constant size. In this case, image resizing is used and it is necessary to consider what to do with the aspect ratio and whether metadata about the image size provides an advantage when complementing the fixed-size images. However, a more general remedy would be to use a multi-scale CNN with an appropriate architecture as the recognition model.
5.8 Challenge 8: Low or varying image quality
To improve the classification accuracy on low-quality images various preprocessing steps have been proposed. These include discarding bad quality images (Raitoharju et al. 2016), image segmentation (Keçeli et al. 2017), and denoising (Cheng et al. 2019).
Low-quality images can be discarded in different ways. Raitoharju et al. (2016) manually removed low-quality images from the dataset before training the recognition model. Moreover, the remaining images were cropped to remove artifacts mainly appearing close to image borders. Coltelli et al. (2014) filtered out out-of-focus images before the feature extraction. The out-of-focus detection was done by modeling the color histograms with a GMM. If the distribution contained two components (background and plankton), the image was considered to be in focus.
Some studies suggest segmenting the images as a preprocessing step to discard non-plankton pixels from the images. For example, Keçeli et al. (2017) used Otsu’s thresholding method (Otsu 1975) for segmentation, and pixels outside the obtained segmentation map were set to zero.
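A minimal sketch of this kind of preprocessing for an 8-bit grayscale image is shown below. Otsu's method selects the threshold that maximizes the between-class variance of the two pixel groups. Whether the object is darker or brighter than the background depends on the imaging instrument, so the `object_is_dark` flag, the function names, and the toy image are assumptions made for illustration.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold that maximises the between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0  # pixel count and intensity sum of the class <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def mask_background(image, object_is_dark=True):
    """Set pixels outside the thresholded object region to zero."""
    t = otsu_threshold([p for row in image for p in row])
    if object_is_dark:
        return [[p if p <= t else 0 for p in row] for row in image]
    return [[p if p > t else 0 for p in row] for row in image]

# Toy image: bright background with a small dark object in the middle.
image = [[220, 215, 218, 221],
         [219, 60, 55, 217],
         [222, 58, 62, 220]]
threshold = otsu_threshold([p for row in image for p in row])
masked = mask_background(image)
```

Global thresholding of this kind works best when the intensity histogram is clearly bimodal; images with uneven illumination may need a different segmentation method.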
Cheng et al. (2019) applied texture enhancement together with background suppression before the classification step. Enhanced images were shown to produce a slightly higher recognition accuracy than the images without enhancement. Ma et al. (2021) proposed to use modern CNN-based super-resolution techniques to improve the plankton image quality. The EDRN super-resolution architecture (Lim et al. 2017) was combined with the contextual loss (Mechrez et al. 2018), and was shown to produce high-quality images. Guo et al. (2022a) proposed a deep learning-based colorization method to address the loss of the critical color information due to imaging. However, the effect of improved image quality on plankton recognition accuracy was not assessed in either of the studies. Contrast limited adaptive histogram equalization (CLAHE) has also been proposed to improve the contrast of plankton image data (Geronimo et al. 2023). Lang et al. (2022) addressed the image quality issues in holographic imaging via image fusion.
Many real-world computer vision applications have to deal with low-quality images, and plankton recognition is no exception. A wealth of image preprocessing approaches exist and, in the case of plankton images, at least exclusion of bad images, denoising, and image segmentation have been proposed. A more profound way would be to adopt image reconstruction methods, but from the practical perspective of plankton recognition, the simpler methods can be considered sufficient, and data augmentation is commonly used to introduce additional variation to the data.
5.9 Challenge 9: Massive amount of data
While most of the challenges are connected to training and developing plankton recognition models, the modern imaging devices with high output rates also introduce a challenge for the model deployment phase. Massive data volumes obtained by modern imaging instruments motivate the development of computationally efficient solutions that are able to analyse data in real time. However, the computation time is rarely considered in plankton recognition literature. Most works related to the challenge consider lightweight CNN architectures. For example, shallow TANet (Li et al. 2019c) was shown to outperform competing methods in computing time without sacrificing accuracy on the Kaggle dataset.
Zimmerman et al. (2020) proposed an embedded system for in situ deployment of a plankton microscope with a real-time recognition system. Due to the limited computation resources and computation time limitations, CNN-based recognition methods were considered unsuitable, and a faster feature-engineering-based approach was proposed at the cost of reduced recognition accuracy. Yuan et al. (2023) applied edge computing with an AI chip to establish real-time on-site analysis of IFCB data.
The computation time is an especially big issue with holographic imaging, which traditionally relies on computationally heavy reconstruction operations to process the raw data. To address this, end-to-end CNN methods for plankton recognition that take the raw holographic data as input have been proposed (Guo et al. 2021a; Zhang et al. 2021; Barua et al. 2023). This way the reconstruction step can be completely avoided. Guo et al. (2021a) and Zhang et al. (2021) showed that CNNs are able to learn the image features for plankton recognition from the raw data, speeding up the processing significantly.
Online monitoring of plankton with modern imaging equipment produces huge amounts of images. The related image analysis requires either high-performance computing (HPC) resources in the cloud or local (edge) computing with shallow CNN architectures. In most cases, the recognition model training has to be performed in an HPC environment, after which at least the lightweight models can be deployed for local execution.
6 Summary and future directions
In this paper, a comprehensive survey of challenges and existing solutions for automatic plankton recognition was provided. We identified nine challenges that complicate the introduction of automatic plankton recognition methods to operational use: (1) the limited amount of labeled training data for less common species, (2) large class imbalance, (3) fine-grained nature of the recognition task, (4) domain shift between imaging instruments, (5) presence of previously unseen classes and unknown particles, (6) uncertainty in expert labels, (7) large variation in image size, (8) low or varying image quality, and (9) massive data volumes. While most of the considered challenges are common in a wide variety of machine learning applications, plankton recognition has its specific characteristics including highly imbalanced image datasets, extreme variation in image size, limitations in image quality, and a shortage of qualified experts to visually annotate the images.
Figure 8 shows a flowchart summarizing the challenges and approaches to solve them. Given a new plankton image dataset, the flowchart provides a simple pipeline to identify the problems related to the dataset as a series of yes-no questions. Furthermore, references to the sections in this paper providing the detailed descriptions are provided to find the existing techniques to tackle the problems and to automate the analysis of the dataset.
Some of the challenges, especially the limited amount of labeled training data, have been rather extensively studied. While this problem cannot be considered solved, relatively high classification accuracies have been obtained with limited amounts of training images for certain classes. On the other hand, some of the other challenges have not been widely considered in plankton recognition literature. These include the domain shift between different image sets, presence of previously unseen classes and unknown particles, uncertainty in expert labels, and massive data volumes. The reasons for this vary. Most of the research has focused on improving classification accuracy and computation time has not been seen as an issue. Furthermore, the majority of the method development has been done for a fixed set of species and one imaging instrument, thus, there has been no need to address the domain shift or open-set problem.
The large variation in size and appearance of plankton has a notable effect on how challenging the recognition task is depending on what type of plankton is considered. While the type of plankton should be taken into account when designing handcrafted image features for the recognition task, modern feature learning approaches (CNN and ViT) are more general and can often be applied without the need for customized solutions for different plankton types. A notable exception to this are species that are taxonomically close to each other and/or have reduced size, for which fine-grained recognition techniques are needed. The larger size groups are somewhat overrepresented in plankton recognition studies, but the existing literature covers a wide variety of different size groups and plankton types. Table 6 in Appendix A summarizes the prevalence of different plankton types and size groups considered in plankton recognition literature.
One notable problem in plankton recognition is the lack of publicly available general-purpose plankton image datasets with an evaluation protocol making it possible to compare different plankton recognition methods in a fair and reliable manner. The vast majority of the research either has focused on private in-house datasets or is based on custom evaluation protocols and dataset splits on publicly available datasets. This makes it impossible to compare the accuracies between different studies, which in turn makes it challenging to select the best practices for future research and slows down progress in plankton recognition method development. Therefore, there is a need for a publicly available plankton dataset with a predetermined evaluation protocol and preferably multiple subsets captured with different imaging instruments to allow quantitative evaluation of the advances in general (device-agnostic) plankton recognition.
Another important problem limiting the wider utilization of automatic plankton recognition is the difficulty of collecting training images to exhaust all the possible classes. It is not realistic to construct a labeled training set consisting of all the plankton species and non-plankton particles that the imaging instrument is capable of capturing in a certain location. Moreover, varying plankton species composition between different geographical regions and ecosystems limits the possibility to apply traditional recognition models to new locations and datasets. Even a classification model developed and trained for one imaging instrument and one geographic location struggles if new species appear, for example, due to seasonal changes. The remedy for this is open-set recognition together with new class discovery methods. Open-set models are able to identify when the images belong to previously unseen classes and either reject them or process them further by, for example, clustering. Such techniques have potential to enable robust open-world plankton recognition systems. Open-set recognition is an active research topic in machine learning [see, for example, Geng et al. (2020)].
The massive volumes of unlabeled data produced using the modern imaging instruments motivate the use of semi-supervised learning techniques to tackle the challenges related to the limitations in labeled training data. One way to achieve this is to utilize unsupervised and self-supervised learning for pre-training of image features on unlabeled data. In self-supervised learning, the data itself is used to generate the supervisory information to guide the training. Typically, this is done by generating augmented versions of the images to obtain image pairs that have the same label. Image features learned this way can then be fine-tuned for the target dataset with a small amount of labeled training data using transfer learning.
Large variation between plankton image datasets with different species compositions and imaging instruments can be considered not only a challenge but also an opportunity. While it is very difficult to develop one general-purpose algorithm for imaging instrument-agnostic plankton recognition, modern domain adaptation methods have the potential to enable the joint utilization of different datasets. This would allow adapting the classification model to new datasets with a reasonable amount of manual work. Domain adaptation has already been successfully applied to various other machine learning applications, such as general object recognition (Wilson and Cook 2020). Domain adaptation can be considered a special case of transfer learning that mimics the human vision system and applies a model trained in one or more source domains to a different (but related) target domain. Domain adaptation can be utilized to reduce the effect of a large domain shift between different datasets and the lack of labeled training data.
The relatively large pool of different plankton image datasets motivates the further use of domain generalization and meta-learning to obtain an imaging instrument-agnostic recognition model. In meta-learning, multiple datasets and tasks are used to “learn how to learn” the recognition model. The idea is to automate the creation of the entire machine learning pipeline end-to-end including the search for the model architecture, hyperparameters, and learning the model weights. Domain generalization refers to learning domain-independent (in this case imaging instrument-independent) feature representations that can then be applied to any dataset. Domain generalization has a wide variety of different applications and it has become an increasingly studied problem in machine learning [see the recent survey in Wang et al. (2022a)]. Recent progress in such methods has opened novel possibilities to aim towards a universal plankton recognition system that is able to adapt to different environments, with dramatically different plankton populations and varying imaging instruments, promoting the wider utilization of automatic plankton recognition for aquatic research.
Data availability
No new data was collected for this survey.
References
Al-Barazanchi H, Verma A, Wang SX (2018) Intelligent plankton image classification with deep learning. Int J Comput Vision Robot 8(6):561–571
Al-Barazanchi HA, Verma A, Wang S (2015a) Performance evaluation of hybrid CNN for SIPPER plankton image calssification. In: International conference on image information processing (ICIIP), IEEE, pp 551–556
Al-Barazanchi HA, Verma A, Wang S (2015b) Plankton image classification using convolutional neural networks. In: International conference on image processing, computer vision, and pattern recognition (IPCV), pp 455–461
Alfano PD, Rando M, Letizia M, et al (2022) Efficient unsupervised learning for plankton images. arXiv preprint arXiv:2209.06726
Ali S, Khan Z, Hussain A et al (2022) Computer vision based deep learning approach for the detection and classification of algae species using microscopic images. Water 14(14):2219
Anderson CR, Berdalet E, Kudela RM et al (2019) Scaling up from regional case studies to a global harmful algal bloom observing system. Front Marine Sci 6:250
Ärje J, Raitoharju J, Iosifidis A et al (2020) Human experts vs. machines in taxa recognition. Signal Process: Image Commun 87:115917
Arrigo KR (2005) Marine microorganisms and global nutrient cycles. Nature 437:349–355
Aurelia, Luo J, Josette-BoozAllen, et al (2014) National Data Science Bowl. https://kaggle.com/competitions/datasciencebowl
Bachimanchi H, Pinder MI, Robert C, et al (2023) Deep-learning-powered data analysis in plankton ecology. arXiv preprint arXiv:2309.08500
Badreldeen Bdawy Mohamed O, Eerola T, Kraft K, et al. (2022) Open-set plankton recognition using similarity learning. In: International symposium on visual computing (ISVC)
Badrinarayanan V, Kendall A, Cipolla R (2017) Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495
Bao H, Dong L, Piao S, et al (2021) BEiT: BERT pre-training of image transformers. In: International conference on learning representations
Barsanti L, Birindelli L, Gualtieri P (2021) Water monitoring by means of digital microscopy identification and classification of microalgae. Environ Sci: Processes Impacts
Barua R, Sanborn D, Nyman L et al (2023) In situ digital holographic microscopy for rapid detection and monitoring of the harmful dinoflagellate, karenia brevis. Harmful Algae 123:102401
Bay H, Tuytelaars T, Van Gool L (2006) Surf: Speeded up robust features. In: European Conference on Computer Vision (ECCV), Springer, pp 404–417
Beijbom O, Hoffman J, Yao E, et al (2015) Quantification in-the-wild: Data-sets and baselines. arXiv preprint arXiv:1510.04811
Bell JL, Hopcroft RR (2008) Assessment of zooimage as a tool for the classification of zooplankton. J Plankton Res 30(12):1351–1367
Ben-David S, Blitzer J, Crammer K et al (2010) A theory of learning from different domains. Mach Learn 79(1):151–175
Benammar N, Kahil H, Titah A, et al (2021) Improving 3d plankton image classification with c3d2 architecture and context metadata. In: International conference on innovations in bio-inspired computing and applications, Springer, pp 170–182
Bendale A, Boult T (2016) Towards open set deep networks. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp 1563–1572
Benfield MC, Grosjean P, Culverhouse PF et al (2007) Rapid: research on automated plankton identification. Oceanography 20:172–187
Bernhard B, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Workshop on computational learning theory. association for computing machinery, p 144–152
Beszteri B, Allen C, Almandoz GO et al (2018) Quantitative comparison of taxa and taxon concepts in the diatom genus fragilariopsis: a case study on using slide scanning, multiexpert image annotation, and image analysis in taxonomy. J Phycol 54(5):703–719
Bi H, Guo Z, Benfield MC et al (2015) A semi-automated image analysis procedure for in situ plankton imaging systems. PLOS ONE 10:e0127121
Blaschko MB, Holness G, Mattar MA, et al (2005) Automatic in situ identification of plankton. In: Workshops on applications of computer vision (WACV), IEEE, pp 79–86
Bochinski E, Bacha G, Eiselein V, et al (2018) Deep active learning for in situ plankton classification. In: International conference on pattern recognition (ICPR), pp 5–15
Boddy L, Morris C, Wilkins M et al (1994) Neural network analysis of flow cytometric data for 40 marine phytoplankton species. Cytom: J Int Soci Anal Cytol 15(4):283–293
Boddy L, Morris C, Wilkins M et al (2000) Identification of 72 phytoplankton species by radial basis function neural network analysis of flow cytometric data. Mar Ecol Prog Ser 195:47–59
Bueno G, Deniz O, Pedraza A et al (2017) Automated diatom classification (part a): handcrafted feature approaches. Appl Sci 7:753
Bureš J, Eerola T, Lensu L, et al (2021) Plankton recognition in images with varying size. In: International conference on pattern recognition (ICPR) workshops and challenges
Cai H, Shan S, Wang X (2022) Rapid detection for optical micrograph of plankton in ballast water based on neural network. Algal Res 66:102811
Cai S, Zuo W, Zhang L (2017) Higher-order integration of hierarchical convolutional activations for fine-grained visual categorization. In: International conference on computer vision (ICCV), pp 511–520
Campbell RW, Roberts P, Jaffe J (2020) The prince william sound plankton camera: a profiling in situ observatory of plankton and particulates. ICES J Mar Sci 77:1440–1455
Campello RJ, Moulavi D, Zimek A et al (2015) Hierarchical density estimates for data clustering, visualization, and outlier detection. ACM Trans Knowl Discovery Data (TKDD) 10(1):1–51
Chang L, Wang R, Zheng H, et al (2016) Phytoplankton feature extraction from microscopic images based on surf-pca. In: OCEANS Conference, IEEE, pp 1–4
Chawla NV, Bowyer KW, Hall LO et al (2002) Smote: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
Chen T, Kornblith S, Norouzi M, et al (2020) A simple framework for contrastive learning of visual representations. In: International conference on machine learning, pp 1597–1607
Chen Z, Du M, Yang XD et al (2023) Deep-learning-based automated tracking and counting of living plankton in natural aquatic environments. Environ Sci Technol 57:18048–18057
Cheng K, Cheng X, Hao Q (2018) A review of feature extraction technologies for plankton images. In: International conference on information hiding and image processing (IHIP), pp 48–56
Cheng K, Cheng X, Wang Y et al (2019) Enhanced convolutional neural network for plankton identification and enumeration. PLoS ONE 14:e0219570
Cheng X, Ren Y, Cheng K et al (2020) Method for training convolutional neural networks for in situ plankton image recognition and classification based on the mechanisms of the human eye. Sensors 20(9):2592
Chollet F (2017) Xception: Deep learning with depthwise separable convolutions. In: Conference on computer vision and pattern recognition (CVPR), pp 1251–1258
Colas F, Tardivel M, Perchoc J et al (2018) The ZooCAM, a new in-flow imaging system for fast onboard counting, sizing and classification of fish eggs and metazooplankton. Prog Oceanogr 166:54–65
Colin S, Coelho LP, Sunagawa S et al (2017) Quantitative 3d-imaging for cell biology and ecology of environmental microbial eukaryotes. Elife 6:e26066
Coltelli P, Barsanti L, Evangelista V et al (2014) Water monitoring: automated and real time identification and classification of algae using digital microscopy. Environ Sci: Processes Impacts 16(11):2656–2665
Conradt J, Börner G, López-Urrutia Á et al (2022) Automated plankton classification with a dynamic optimization and adaptation cycle. Front Mar Sci 9:868420
Corgnati L, Marini S, Mazzei L et al (2016) Looking inside the ocean: toward an autonomous imaging system for monitoring gelatinous zooplankton. Sensors 16:2124
Corrêa I, Drews P, de Souza MS, et al (2016) Supervised microalgae classification in imbalanced dataset. In: Brazilian conference on intelligent systems (BRACIS), IEEE, pp 49–54
Correa I, Drews P, Botelho S, et al (2017) Deep learning for microalgae classification. In: International conference on machine learning and applications (ICMLA), pp 20–25
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297
Cosgriff R (1960) Identification of shape. Ohio State University Research Foundation, Report 820-11
Cowen R, Sponaugle S, Robinson K, et al (2015) PlanktonSet 1.0: Plankton imagery data collected from F.G. Walton Smith in Straits of Florida from 2014-06-03 to 2014-06-06 and used in the 2015 National Data Science Bowl (NCEI Accession 0127422) (National Centers for Environmental Information). https://doi.org/10.7289/v5d21vjd
Cowen RK, Guigand CM (2008) In situ ichthyoplankton imaging system (ISIIS): system design and preliminary results. Limnol Oceanogr Methods 6(2):126–132
Cui J, Wei B, Wang C, et al (2018) Texture and shape information fusion of convolutional neural network for plankton image classification. In: OCEANS Techno-Oceans (OTO), pp 1–5
Culverhouse PF (2007) Human and machine factors in algae monitoring performance. Eco Inform 2(4):361–366
Culverhouse PF, Williams R, Reguera B et al (2003) Do experts make mistakes? A comparison of human and machine identification of dinoflagellates. Mar Ecol Prog Ser 247:17–25
Dai J, Wang R, Zheng H, et al (2016a) ZooplanktoNet: deep convolutional network for zooplankton classification. In: OCEANS Conference, pp 1–6
Dai J, Yu Z, Zheng H, et al (2016b) A hybrid convolutional neural network for plankton classification. In: Asian conference on computer vision (ACCV), Springer, pp 102–114
Dai Y, Yang S, Zhao D et al (2023) Coastal phytoplankton blooms expand and intensify in the 21st century. Nature 615(7951):280–284
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, pp 886–893
Davis CS, Gallager SM, Solow AR (1992) Microaggregations of oceanic plankton observed by towed video microscopy. Science 257:230–232
Davis CS, Hu Q, Gallager SM et al (2004) Real-time observation of taxa-specific plankton distributions: an optical sampling method. Mar Ecol Prog Ser 284:77–96
Davis CS, Thwaites FT, Gallager SM et al (2005) A three-axis fast-tow digital video plankton recorder for rapid surveys of plankton taxa and hydrography. Limnol Oceanogr Meth 3(2):59–74
De Vargas C, Audic S, Henry N et al (2015) Eukaryotic plankton diversity in the sunlit ocean. Science 348:6237
Deng J, Dong W, Socher R, et al (2009) ImageNet: a large-scale hierarchical image database. In: Conference on computer vision and pattern recognition (CVPR), IEEE, pp 248–255
Deng J, Guo J, Xue N, et al (2019) ArcFace: additive angular margin loss for deep face recognition. In: Conference on computer vision and pattern recognition (CVPR), pp 4690–4699
Dimitrovski I, Kocev D, Loskovska S et al (2012) Hierarchical classification of diatom images using ensembles of predictive clustering trees. Eco Inform 7(1):19–29
Ding H, Wei B, Tang N, et al (2018) Plankton image classification via multi-class imbalanced learning. In: OCEANS Techno-Oceans (OTO), IEEE, pp 1–6
Ding H, Wei B, Gu Z et al (2020) KA-Ensemble: towards imbalanced image classification ensembling under-sampling and over-sampling. Multim Tools Appl 79(21):14871–14888
Dosovitskiy A, Beyer L, Kolesnikov A, et al (2021) An image is worth 16x16 words: Transformers for image recognition at scale. In: International conference on learning representations
Drews P, Colares RG, Machado P et al (2013) Microalgae classification using semi-supervised and active learning based on Gaussian mixture models. J Braz Comput Soc 19(4):411–422
Du A, Gu Z, Yu Z, et al (2020) Plankton image classification using deep convolutional neural networks with second-order features. In: Global oceans 2020: Singapore–US Gulf Coast, IEEE, pp 1–5
Du Buf H, Bayer MM (2002) Automatic diatom identification. World Scientific, Singapore
Du Buf H, Bayer M, Droop S, et al (1999) Diatom identification: a double challenge called ADIAC. In: International conference on image analysis and processing (CAIP), IEEE, pp 734–739
Dubelaar GB, Gerritzen PL, Beeker AE et al (1999) Design and first results of CytoBuoy: a wireless flow cytometer for in situ analysis of marine and fresh waters. Cytometry: J Int Soci Anal Cytol 37(4):247–254
Dubey SR (2021) A decade survey of content based image retrieval using deep learning. IEEE Trans Circuits Syst Video Technol 32(5):2687–2704
Duda RO, Hart PE (1972) Use of the Hough transformation to detect lines and curves in pictures. Commun ACM 15(1):11–15
Dunker S, Boho D, Wäldchen J et al (2018) Combining high-throughput imaging flow cytometry and deep learning for efficient species and life-cycle stage identification of phytoplankton. BMC Ecol 18:51
Dyomin V, Polovtsev I, Davydova AY (2017) Fast recognition of marine particles in underwater digital holography. In: International symposium on atmospheric and ocean optics: atmospheric physics, p 1046627
Dyomin V, Gribenyukov A, Davydova A et al (2019) Holography of particles for diagnostics tasks. Appl Opt 58(34):G300–G310
Dyomin V, Davydova A, Morgalev S et al (2020) Monitoring of plankton spatial and temporal characteristics with the use of a submersible digital holographic camera. Front Mar Sci 7:653
Dyomin V, Davydova A, Polovtsev I et al (2021) Underwater holographic sensor for plankton studies in situ including accompanying measurements. Sensors 21(14):4863
Eerola T, Kraft K, Grönberg O, et al (2020) Towards operational phytoplankton recognition with automated high-throughput imaging and compact convolutional neural networks. Ocean Science Discussions, pp 1–20
Elineau A, Desnos C, Jalabert L, et al (2018) ZooScanNet: plankton images captured with the ZooScan. https://doi.org/10.17882/55741
Elkan C (2001) The foundations of cost-sensitive learning. In: International Joint Conference on Artificial Intelligence. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 973–978
Ellen J, Li H, Ohman MD (2015) Quantifying California Current plankton samples with efficient machine learning techniques. In: OCEANS Conference, pp 1–9
Ellen JS, Graff CA, Ohman MD (2019) Improving plankton image classification using context metadata. Limnol Oceanogr Methods 17:439–461
Ellis R, Simpson R, Culverhouse PF et al (1997) Committees, collectives and individuals: Expert visual classification by neural network. Neural Comput Appl 5(2):99–105
Embleton KV, Gibson C, Heaney S (2003) Automated counting of phytoplankton by pattern recognition: a comparison with a manual counting method. J Plankton Res 25(6):669–681
Faillettaz R, Picheral M, Luo JY et al (2016) Imperfect automatic image classification successfully describes plankton distribution patterns. Meth Oceanogr 15:60–77
Fernandes JA, Irigoien X, Boyra G et al (2009) Optimizing the number of classes in automated zooplankton classification. J Plankton Res 31(1):19–29
Fernández A, Álvarez MX, Bianconi F (2011) Image classification with binary gradient contours. Opt Lasers Eng 49:1177–1184
Fischer S, Šroubek F, Perrinet L et al (2007) Self-invertible 2D log-Gabor wavelets. Int J Comp Vision (IJCV) 75(2):231–246
Flynn KJ, Mitra A, Anestis K et al (2019) Mixotrophic protists and a new paradigm for marine ecology: where does plankton research go now? J Plankton Res 41(4):375–391
Freeman H (1961) On the encoding of arbitrary geometric configurations. IRE Trans Electron Comput 10(2):260–268
Ge Z, Liu S, Wang F, et al (2021) YOLOX: exceeding YOLO series in 2021. arXiv preprint arXiv:2107.08430
Geng C, Huang SJ, Chen S (2020) Recent advances in open set recognition: a survey. IEEE Trans Patt Anal Mach Intell (PAMI) 43(10):3614–3631
Geraldes P, Barbosa J, Martins A, et al (2019) In situ real-time zooplankton detection and classification. In: OCEANS conference, IEEE, pp 1–6
Geronimo JONV, Arguelles ED, Abriol-Santos KJM (2023) Automated classification and identification system for freshwater algae using convolutional neural networks. Phil J Sci 152(1):325–335
Girshick R, Donahue J, Darrell T, et al (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: conference on computer vision and pattern recognition (CVPR), pp 580–587
Glibert PM, Mitra A (2022) From webs, loops, shunts, and pumps to microbial multitasking: evolving concepts of marine microbial ecology, the mixoplankton paradigm, and implications for a future ocean. Limnol Oceanogr 67(3):585–597
González P, Álvarez E, Díez J et al (2017) Validation methods for plankton image classification systems. Limnol Oceanogr Methods 15:221–237
González P, Castaño A, Peacock EE et al (2019) Automatic plankton quantification using deep features. J Plankton Res 41(4):449–463
Goodfellow I, Pouget-Abadie J, Mirza M, et al (2014) Generative adversarial nets. In: Conference on neural information processing systems (NIPS), pp 2672–2680
Goodwin M, Halvorsen KT, Jiao L et al (2022) Unlocking the potential of deep learning for marine ecology: overview, applications, and outlook. ICES J Mar Sci 79(2):319–336
Gorsky G, Guilbert P, Valenta E (1989) The autonomous image analyzer - enumeration, measurement and identification of marine phytoplankton. Mar Ecol Prog Ser 58:133–142
Gorsky G, Ohman MD, Picheral M et al (2010) Digital zooplankton image analysis using the ZooScan integrated system. J Plankton Res 32(3):285–303
Goulart AJH, Morimitsu A, Jacomassi R, et al (2021) Deep learning and t-SNE projection for plankton images clusterization. In: OCEANS 2021: San Diego–Porto, pp 1–4
Graham B (2014) Spatially-sparse convolutional neural networks. arXiv preprint arXiv:1409.6070
Grosjean P, Picheral M, Warembourg C et al (2004) Enumeration, measurement, and identification of net zooplankton samples using the ZooScan digital imaging system. ICES J Mar Sci 61(4):518–525
Grossmann MM, Gallager SM, Mitarai S (2015) Continuous monitoring of near-bottom mesoplankton communities in the East China Sea during a series of typhoons. J Oceanogr 71(1):115–124
Gu J, Wang Z, Kuen J et al (2018) Recent advances in convolutional neural networks. Patt Recogn 77:354–377
Gulrajani I, Lopez-Paz D (2020) In search of lost domain generalization. In: International conference on learning representations
Guo B, Nyman L, Nayak AR et al (2021) Automated plankton classification from holographic imagery with deep convolutional neural networks. Limnol Oceanogr Methods 19(1):21–36
Guo C, Wei B, Yu K (2021) Deep transfer learning for biology cross-domain image classification. J Contr Sci Eng 2021:1–19
Guo G, Lin Q, Chen T, et al (2022a) Colorization for in situ marine plankton images. In: European conference on computer vision, Springer, pp 216–232
Guo J, Guan J (2021) Classification of marine plankton based on few-shot learning. Arab J Sci Eng 46(9):9253–9262
Guo J, Ma Y, Lee JH (2021) Real-time automated identification of algal bloom species for fisheries management in subtropical coastal waters. J Hydro-Environ Res 36:1–32
Guo J, Li W, Guan J, et al (2022b) CDFM: a cross-domain few-shot model for marine plankton classification. IET Computer Vision
Guo X, Liu X, Zhu E, et al (2017) Deep clustering with convolutional autoencoders. In: International conference on neural information processing (ICONIP), pp 373–382
Han D, Kim J, Kim J (2017) Deep pyramidal residual networks. In: Conference on computer vision and pattern recognition (CVPR), pp 5927–5935
Han H, Wang WY, Mao BH (2005) Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In: International conference on intelligent computing, pp 878–887
Haralick RM, Shanmugam K, Dinstein IH (1973) Textural features for image classification. IEEE Trans Syst Man Cybern 3(6):610–621
Hariharan B, Girshick R (2017) Low-shot visual recognition by shrinking and hallucinating features. In: International conference on computer vision (ICCV), pp 3018–3027
Haug ML (2021) Applying active learning techniques in machine learning to minimize labeling effort. Master’s thesis, NTNU
Haug ML, Saad A, Stahl A (2021) CIRAL: a hybrid active learning framework for plankton taxa labeling. IFAC-PapersOnLine 54(16):450–457
Haug ML, Saad A, Stahl A (2021b) A combined informative and representative active learning approach for plankton taxa labeling. In: International conference on digital image processing (ICDIP), SPIE, pp 495–503
Hays GC, Richardson AJ, Robinson C (2005) Climate change and marine plankton. Trends Ecol Evol 20(6):337–344
He K, Zhang X, Ren S et al (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Patt Anal Mach Intell (PAMI) 37(9):1904–1916
He K, Zhang X, Ren S, et al (2016) Deep residual learning for image recognition. In: The Conference on computer vision and pattern recognition (CVPR), pp 770–778
He K, Gkioxari G, Dollár P, et al (2017) Mask R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp 2961–2969
He K, Fan H, Wu Y, et al (2020) Momentum contrast for unsupervised visual representation learning. In: Conference on computer vision and pattern recognition (CVPR), pp 9729–9738
Henrichs DW, Anglès S, Gaonkar CC et al (2021) Application of a convolutional neural network to improve automated early warning of harmful algal blooms. Environ Sci Pollut Res 28(22):28544–28555
Hirata NS, Fernandez MA, Lopes RM (2016) Plankton image classification based on multiple segmentations. In: International Conference on Pattern Recognition (ICPR) Workshops. Computer vision for analysis of underwater imagery (CVAUI), IEEE, pp 55–60
Ho E, Henriquez B, Yeung J (2018) Flagellates classification via transfer learning. Project Report, Course ECE228 Machine learning for physical applications, University of California San Diego, USA, http://noiselab.ucsd.edu/ECE228_2018/Reports/Report14.pdf
Ho TK (1995) Random decision forests. In: International conference on document analysis and recognition (ICDAR), IEEE, pp 278–282
Hoffer E, Ailon N (2015) Deep metric learning using triplet network. In: International workshop on similarity-based pattern recognition, pp 84–92
Howard AG, Zhu M, Chen B, et al (2017) MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Conference on computer vision and pattern recognition (CVPR), pp 7132–7141
Hu MK (1962) Visual pattern recognition by moment invariants. IRE Trans Inform Theory 8(2):179–187
Hu Q, Davis C (2005) Automatic plankton image recognition with co-occurrence matrices and support vector machine. Mar Ecol Prog Ser 295:21–31
Hu Q, Davis C (2006) Accurate automatic quantification of taxa-specific plankton abundance using dual classification with correction. Mar Ecol Prog Ser 306:51–61
Huang G, Liu Z, Van Der Maaten L, et al (2017) Densely connected convolutional networks. In: Conference on computer vision and pattern recognition (CVPR), pp 4700–4708
Iandola FN, Han S, Moskewicz MW, et al (2016) SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360
Ibrahim M (2020) Image clustering for unsupervised analysis of plankton data. Master’s thesis, LUT University, Finland
Idrissa M, Acheroy M (2002) Texture classification using Gabor filters. Patt Recogn Lett 23(9):1095–1102
Irisson JO, Ayata SD, Lindsay DJ et al (2022) Machine learning for the study of plankton and marine snow from images. Ann Rev Mar Sci 14:277–301
Ito K, Miura K, Aoki T, et al (2023) Zooplankton classification using hierarchical attention branch network. In: Asian conference on pattern recognition, Springer, pp 409–419
Jindal P, Mundra R (2015) Plankton classification using hybrid convolutional network-random forests architectures. Technical Report, Stanford University
Jocher G (2020) Ultralytics YOLOv5. https://github.com/ultralytics/yolov5
Julesz B (1962) Visual pattern discrimination. IRE Trans Inform Theory 8(2):84–92
Keçeli AS, Kaya A, Keçeli SU (2017) Classification of radiolarian images with hand-crafted and deep features. Comp Geosci 109:67–74
Kerr T, Clark JR, Fileman ES et al (2020) Collaborative deep learning models to handle class imbalance in FlowCam plankton imagery. IEEE Access 8:170013–170032
Khalid S, Khalil T, Nasreen S (2014) A survey of feature selection and feature extraction techniques in machine learning. In: Science and information conference (SAI), IEEE, pp 372–378
Khan Z, Mumtaz W, Mumtaz AS, et al (2022) Multiclass-classification of algae using DC-GAN and transfer learning. In: International conference on image processing and robotics (ICIPRob), IEEE, pp 1–6
Khotanzad A, Hong YH (1990) Invariant image recognition by Zernike moments. IEEE Trans Patt Anal Mach Intell (PAMI) 12(5):489–497
Kiko R, Simon-Martin S (2020) UVP5 data sorted with EcoTaxa and MorphoCluster. https://doi.org/10.17882/73002
Kingman J, Matheron G (1975) Random sets and integral geometry. Bull Am Math Soci 81(5):844–847
Kloster M, Kauer G, Beszteri B (2014) Sherpa: an image segmentation and outline feature extraction tool for diatoms and other objects. BMC Bioinform 15(1):1–17
Kloster M, Langenkämper D, Zurowietz M et al (2020) Deep learning-based diatom taxonomy on virtual slides. Sci Rep 10(1):1–13
Kosov S, Shirahama K, Li C et al (2018) Environmental microorganism classification using conditional random fields and deep convolutional neural networks. Patt Recogn 77:248–261
Kovesi P (2000) Phase congruency: a low-level image invariant. Psychol Res 64(2):136–148
Kovesi P (2003) Phase congruency detects corners and edges. In: Australian pattern recognition society conference: DICTA
Kraft K, Seppälä J, Hällfors H, et al (2021) First application of IFCB high-frequency imaging-in-flow cytometry to investigate bloom-forming filamentous cyanobacteria in the Baltic Sea. Front Mar Sci, p 282
Kraft K, Haraguchi L, Velhonoja O, et al (2022a) SYKE-phytoplankton_IFCB_Utö_2021. https://doi.org/10.23728/b2share.7c273b6f409c47e98a868d6517be3ae3
Kraft K, Velhonoja O, Eerola T et al (2022b) Towards operational phytoplankton recognition with automated high-throughput imaging, near-real-time data processing, and convolutional neural networks. Front Mar Sci 9:867695
Kraft K, Velhonoja O, Seppälä J, et al (2022c) SYKE-phytoplankton_IFCB_2022. https://doi.org/10.23728/b2share.abf913e5a6ad47e6baa273ae0ed6617a
Kramer KA (2005) Identifying Plankton from Grayscale Silhouette Images. Master’s thesis, University of South Florida
Kramer KA (2010) System for identifying plankton from the sipper instrument platform. University of South Florida
Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Neural Information Processing Systems (NIPS), pp 1097–1105
Kuang Y (2015) Deep neural network for deep sea plankton classification. Project Report, Course CS231n: Convolutional Neural Networks for Visual Recognition, Stanford University, USA, https://pdfs.semanticscholar.org/40fd/606b61e15c28a509a5335b8cf6ffdefc51bc.pdf
Kuhl FP, Giardina CR (1982) Elliptic Fourier features of a closed contour. Comput Graphics Image Process 18(3):236–258
Kyathanahally S, Hardeman T, Merz E, et al (2021a) Data for: Deep learning classification of lake zooplankton. https://opendata.eawag.ch/dataset/deep-learning-classification-of-zooplankton-from-lakes
Kyathanahally SP, Hardeman T, Merz E, et al (2021b) Deep learning classification of lake zooplankton. Front Microbiol, p 3226
Kyathanahally SP, Hardeman T, Reyes M et al (2022) Ensembles of data-efficient vision transformers as a new paradigm for automated classification in ecology. Sci Rep 12(1):18590
Lai QT, Lee KC, Tang AH et al (2016) High-throughput time-stretch imaging flow cytometry for multi-class classification of phytoplankton. Opt Expr 24(25):28170–28184
Lang K, Shan S, Lv W, et al (2022) Image fusion method for improving the accuracy of ocean plankton recognition. In: OCEANS 2022-Chennai, IEEE, pp 1–4
Lauffer M, Genty F, Margueron S et al (2017) Morphological recognition with the addition of multi-band fluorescence excitation of chlorophylls of phytoplankton. Photosynthetica 55(3):434–442
Le KT, Yuan Z, Syed A et al (2022) Benchmarking and Automating the Image Recognition Capability of an In Situ Plankton Imaging System. Front Mar Sci 9:869088
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444
Lee H, Park M, Kim J (2016) Plankton classification on imbalanced large scale database via convolutional neural networks with transfer learning. In: International Conference on Image Processing (ICIP), IEEE, pp 3713–3717
Lendaris GG, Stanley GL (1970) Diffraction-pattern sampling for automatic pattern recognition. Proc IEEE 58(2):198–216
Li C, Wang K, Xu N (2019) A survey for the applications of content-based microscopic image analysis in microorganism classification domains. Artif Intell Rev 51(4):577–646
Li J, Chen T, Yang Z et al (2021) Development of a buoy-borne underwater imaging system for in situ mesoplankton monitoring of coastal waters. IEEE J Oceanic Eng 47(1):88–110
Li J, Yang Z, Chen T (2021b) DYB-PlanktonNet, https://doi.org/10.21227/875n-f104
Li P, Xie J, Wang Q, et al (2017) Is second-order information helpful for large-scale visual recognition? In: International conference on computer vision (ICCV), pp 2070–2078
Li Q, Sun X, Dong J et al (2019) Developing a microscopic image dataset in support of intelligent phytoplankton detection using deep learning. ICES J Mar Sci 77(4):1427–1439
Li X, Cui Z (2016) Deep residual networks for plankton classification. In: OCEANS conference, pp 1–4
Li X, Long R, Yan J et al (2019) TANet: a tiny plankton classification network for mobile devices. Mobile Inform Syst. https://doi.org/10.1155/2019/6536925
Li Y, Guo J, Guo X et al (2021) Plankton detection with adversarial learning and a densely connected deep learning model for class imbalanced distribution. J Marine Sci Eng 9(6):636
Li Y, Guo J, Guo X et al (2021) Toward in situ zooplankton detection with a densely connected YOLOv3 model. Appl Ocean Res 114:102783
Li Z, Zhao F, Liu J et al (2014) Pairwise nonparametric discriminant analysis for binary plankton image recognition. IEEE J Oceanic Eng 39(4):695–701
Libreros J, Bueno G, Trujillo M, et al (2018) Automated identification and classification of diatoms from water resources. In: Iberoamerican Congress on Pattern Recognition (CIARP), Springer, pp 496–503
Lim B, Son S, Kim H, et al (2017) Enhanced deep residual networks for single image super-resolution. In: Conference on computer vision and pattern recognition (CVPR) Workshops, pp 136–144
Ling H, Jacobs DW (2007) Shape classification using the inner-distance. IEEE Trans Patt Anal Mach Intell (PAMI) 29(2):286–299
Lisin DA (2006) Image classification with bags of local features. University of Massachusetts Amherst
Lisin DA, Mattar MA, Blaschko MB, et al (2005) Combining local and global image features for object class recognition. In: Conference on computer vision and pattern recognition (CVPR) workshops, IEEE, pp 47
Liu J, Du A, Wang C, et al (2018a) Deep pyramidal residual networks for plankton image classification. In: OCEANS Techno-Oceans (OTO), IEEE, pp 1–5
Liu J, Du A, Wang C, et al (2018b) Teaching squeeze-and-excitation PyramidNet for imbalanced image classification with GAN-based curriculum learning. In: International conference on pattern recognition (ICPR), IEEE, pp 2444–2449
Liu W, Anguelov D, Erhan D, et al (2016) SSD: Single shot multibox detector. In: European conference on computer vision (ECCV), pp 21–37
Liu Y, Qiao X, Gao R (2021) Plankton classification on imbalanced dataset via hybrid resample method with LightGBM. In: International conference on image, vision and computing (ICIVC), IEEE, pp 191–195
Liu Z, Watson J (2020) Shape-based image classification and identification system for digital holograms of marine particles and plankton. In: Global Oceans 2020: Singapore–U.S. Gulf Coast, pp 1–5
Liu Z, Watson J, Allen A (2017) Efficient affine-invariant Fourier descriptors for identification of marine plankton. In: OCEANS 2017-Aberdeen, IEEE, pp 1–9
Liu Z, Lin Y, Cao Y, et al (2021b) Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 10012–10022
Liu Z, Mao H, Wu CY, et al (2022) A ConvNet for the 2020s. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11976–11986
Lombard F, Boss E, Waite AM et al (2019) Globally consistent quantitative observations of planktonic ecosystems. Front Mar Sci 6:196
Lowe DG (1999) Object recognition from local scale-invariant features. In: International conference on computer vision (ICCV), IEEE, pp 1150–1157
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comp Vision (IJCV) 60(2):91–110
Lumini A, Nanni L (2019) Deep learning and transfer learning features for plankton classification. Eco Inform 51:33–43
Lumini A, Nanni L (2019b) Ocean ecosystems plankton classification. In: Recent advances in computer vision. Springer, pp 261–280
Lumini A, Nanni L, Maguolo G (2020) Deep learning for plankton and coral classification. Appl Comp Inform 19(3/4):265–283
Luo JY, Irisson JO, Graham B et al (2018) Automated plankton image analysis using convolutional neural networks. Limnol Oceanogr Methods 16:814–827
Luo Q, Gao Y, Luo J et al (2011) Automatic identification of diatoms with circular shape using texture analysis. J Software 6(3):428–435
Luo S, Nguyen KT, Nguyen BT et al (2021) Deep learning-enabled imaging flow cytometry for high-speed Cryptosporidium and Giardia detection. Cytometry A 99(11):1123–1133
Luo S, Shi Y, Chin LK et al (2021) Machine-learning-assisted intelligent imaging flow cytometry: a review. Adv Intell Syst 3(11):2100073
Luo T (2005) Scaling up support vector machines with application to plankton recognition. PhD thesis, University of South Florida
Luo T, Kramer K, Goldgof D et al (2003) Learning to recognize plankton. In: International conference on systems, man and cybernetics, IEEE, pp 888–893
Luo T, Kramer K, Goldgof DB et al (2004) Recognizing plankton images from the shadow image particle profiling evaluation recorder. IEEE Trans Syst, Man, Cybernet Part B (Cybernet) 34(4):1753–1762
Luo T, Kramer K, Goldgof DB et al (2005) Active learning to recognize multiple types of plankton. J Mach Learn Res 6(Apr):589–613
Ma N, Zhang X, Zheng HT, et al (2018) ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In: Proceedings of the European conference on computer vision (ECCV), pp 116–131
Ma W, Chen T, Zhang Z, et al (2021) Super-resolution for in situ plankton images. In: International conference on computer vision (ICCV), pp 3683–3692
MacLeod N, Benfield M, Culverhouse P (2010) Time to automate identification. Nature 467(7312):154–155
MacNeil L, Missan S, Luo J et al (2021) Plankton classification with high-throughput submersible holographic microscopy and transfer learning. BMC Ecol Evol 21(1):1–11
Maracani A, Pastore VP, Natale L et al (2023) In-domain versus out-of-domain transfer learning in plankton image classification. Sci Rep 13(1):10443
Mechrez R, Talmi I, Zelnik-Manor L (2018) The contextual loss for image transformation with non-aligned data. In: European conference on computer vision (ECCV), pp 768–783
Mirasbekov Y, Zhumakhanova A, Zhantuyakova A et al (2021) Semi-automated classification of colonial Microcystis by FlowCam imaging flow cytometry in mesocosm experiment reveals high heterogeneity during seasonal bloom. Sci Rep 11(1):1–14
Mitra A, Caron DA, Faure E et al (2023) The mixoplankton database (MDB): Diversity of photo-phago-trophic plankton in form, function, and distribution across the global ocean. J Eukary Microbiol 70(4):e12972
Mitra R, Marchitto T, Ge Q et al (2019) Automated species-level identification of planktic foraminifera using convolutional neural networks, with comparison to human performance. Mar Micropaleontol 147:16–24
Mittal S, Srivastava S, Jayanth JP (2022) A survey of deep learning techniques for underwater image classification. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2022.3143887
Moniruzzaman M, Islam SMS, Bennamoun M, et al (2017) Deep learning on underwater marine object detection: A survey. In: International conference on advanced concepts for intelligent vision systems (ACIVS), Springer, pp 150–160
Mosleh MA, Manssor H, Malek S et al (2012) A preliminary study on automated freshwater algae recognition and classification system. BMC Bioinform 13(Suppl17):S25
Movshovitz-Attias Y, Toshev A, Leung TK, et al (2017) No fuss distance metric learning using proxies. In: International conference on computer vision (ICCV), pp 360–368
Nandini TS, Swethaa S, Bolem S, et al (2022) Real-time classification of plankton species using convolutional neural networks. In: OCEANS 2022-Chennai, IEEE, pp 1–5
Nayak AR, McFarland MN, Sullivan JM et al (2018) Evidence for ubiquitous preferential particle orientation in representative oceanic shear flows. Limnol Oceanogr 63(1):122–143
Nepovinnykh E, Eerola T, Kalviainen H (2020) Siamese network based pelage pattern matching for ringed seal re-identification. In: Winter conference on applications of computer vision (WACV) workshops, pp 25–34
Ojala T, Pietikainen M, Maenpaa T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Patt Anal Mach Intell (PAMI) 24(7):971–987
Olson RJ, Sosik HM (2007) A submersible imaging-in-flow instrument to analyze nano-and microplankton: Imaging FlowCytobot. Limnol Oceanogr Methods 5:195–203
Orenstein EC, Beijbom O (2017) Transfer learning and deep feature extraction for planktonic image data sets. In: Winter conference on applications of computer vision (WACV), IEEE, pp 1082–1088
Orenstein EC, Beijbom O, Peacock EE, et al (2015) WHOI-plankton-a large scale fine grained visual recognition benchmark dataset for plankton classification. arXiv preprint arXiv:1510.00745
Orenstein EC, Kenitz KM, Roberts PL et al (2020) Semi-and fully supervised quantification techniques to improve population estimates from machine classifiers. Limnol Oceanogr Methods 18(12):739–753
Orenstein EC, Ratelle D, Briseño-Avena C et al (2020) The Scripps Plankton Camera system: a framework and platform for in situ microscopy. Limnol Oceanogr Methods 18(11):681–695
Otsu N (1975) A threshold selection method from gray-level histograms. Automatica 11:23–27
Pan SJ, Yang Q (2009) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359
Pardeshi R, Deshmukh PD (2019) Classification of microscopic algae: An observational study with AlexNet. In: International conference on soft computing and signal processing (ICSCSP), Springer, pp 309–316
Pastore VP, Zimmerman TG, Biswas SK et al (2020) Annotation-free learning of plankton for classification and anomaly detection. Sci Rep 10(1):1–15
Pastore VP, Megiddo N, Bianco S (2022) An anomaly detection approach for plankton species discovery. In: International conference on image analysis and processing, Springer, pp 599–609
Pastore VP, Ciranni M, Bianco S et al (2023) Efficient unsupervised learning of biological images with compressed deep features. Image Vis Comput 137:104764
Pedraza A, Bueno G, Deniz O et al (2017) Automated diatom classification (Part B): A deep learning approach. Appl Sci 7:460
Pedraza A, Bueno G, Deniz O, et al (2018) Lights and pitfalls of convolutional neural networks for diatom identification. In: Optics, photonics, and digital technologies for imaging applications V, international society for optics and photonics (SPIE), p 106790G
Picheral M, Guidi L, Stemmann L et al (2010) The underwater vision profiler 5: An advanced instrument for high spatial resolution studies of particle size spectra and zooplankton. Limnol Oceanogr Methods 8(9):462–473
Picheral M, Colin S, Irisson JO (2017) EcoTaxa, a tool for the taxonomic classification of images. https://ecotaxa.obs-vlfr.fr/
Plonus RM, Conradt J, Harmer A et al (2021) Automatic plankton image classification: can capsules and filters help cope with data set shift? Limnol Oceanogr Methods 19(3):176–195
Plonus RM, Conradt J, Harmer A, et al (2021b) Automatic plankton image classification – can capsules and filters help coping with data set shift? (Dataset) https://doi.org/10.5281/zenodo.4431509
Pratt WK (2007) Image feature extraction, vol 16. Wiley, Hoboken, pp 535–577
Pu Y, Feng Z, Wang Z, et al (2021) Anomaly detection for in situ marine plankton images. In: International conference on computer vision (ICCV), pp 3661–3671
Py O, Hong H, Zhongzhi S (2016) Plankton classification with deep convolutional neural networks. In: Information technology, networking, electronic and automation control conference (ITNEC), IEEE, pp 132–136
Qi H, Brown M, Lowe DG (2018) Low-shot learning with imprinted weights. In: Conference on computer vision and pattern recognition (CVPR), pp 5822–5830
Qiao X, Tang M, Tang Z, et al (2021) Classification of phytoplankton digital holograms using transfer learning. In: Symposium on novel photoelectronic detection technology and applications, SPIE, pp 1721–1726
Rachman A, Suwarno AS, Nurdjaman S (2022) Application of deep (machine) learning for phytoplankton identification using microscopy images. In: International conference on biological science (ICBS), Atlantis Press, pp 213–224
Radenović F, Tolias G, Chum O (2018) Fine-tuning CNN image retrieval with no human annotation. IEEE Trans Patt Anal Mach Intell (PAMI) 41(7):1655–1668
Radford A, Metz L, Chintala S (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434
Raitoharju J, Riabchenko E, Meissner K, et al (2016) Data enrichment in fine-grained classification of aquatic macroinvertebrates. In: Workshop on computer vision for analysis of underwater imagery (CVAUI), IEEE, pp 43–48
Rani P, Kotwal S, Manhas J et al (2021) Machine learning and deep learning based computational approaches in automatic microorganisms image recognition: methodologies, challenges, and developments. Arch Computat Meth Eng 29(3):1801–1837
Ravela SS (2003) On multi-scale differential features and their representations for image retrieval and recognition. University of Massachusetts Amherst
Rawat SS, Bisht A, Nijhawan R (2019) A deep learning based CNN framework approach for plankton classification. In: International Conference on Image Information Processing (ICIIP), IEEE, pp 268–273
Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767
Redmon J, Divvala S, Girshick R, et al (2016) You only look once: unified, real-time object detection. In: Conference on computer vision and pattern recognition (CVPR), pp 779–788
Reiss TH (1991) The revised fundamental theorem of moment invariants. IEEE Trans Patt Anal Mach Intell (PAMI) 13(8):830–834
Ren S, He K, Girshick R et al (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Patt Anal Mach Intell (PAMI) 39(6):1137–1149
Rivas-Villar D, Rouco J, Carballeira R et al (2021) Fully automatic detection and classification of phytoplankton specimens in digital microscopy images. Comput Methods Programs Biomed 200:105923
Rivas-Villar D, Morano J, Rouco J, et al (2022) Deep features-based approaches for phytoplankton classification in microscopy images. In: International conference on computer aided systems theory, Springer, pp 419–426
Rodenacker K, Hense B, Jütting U et al (2006) Automatic analysis of aqueous specimens for phytoplankton structure recognition and population estimation. Microsc Res Tech 69(9):708–720
Rodrigues FCM, Hirata NS, Abello AA, et al (2018) Evaluation of transfer learning scenarios in plankton image classification. In: International joint conference on computer vision, imaging and computer graphics theory and applications (VISIGRAPP), pp 359–366
Rogers AD, Appeltans W, Assis J et al (2022) Chapter two - discovering marine biodiversity in the 21st century. Adv Mar Biol 93:23–115
Ruiz-Santaquiteria J, Bueno G, Deniz O et al (2020) Semantic versus instance segmentation in microscopic algae detection. Eng Appl Artif Intell 87:103271
Salvesen E (2021) Unsupervised methods for in-situ classification of plankton taxa. Master’s thesis, NTNU
Salvesen E, Saad A, Stahl A (2020) Robust methods of unsupervised clustering to discover new planktonic species in-situ. In: Global Oceans 2020: Singapore–US Gulf Coast, IEEE, pp 1–9
Salvesen E, Saad A, Stahl A (2022) Robust deep unsupervised learning framework to discover unseen plankton species. In: Fourteenth international conference on machine vision, SPIE, pp 241–250
Sánchez C, Cristóbal G, Bueno G (2019) Diatom identification including life cycle stages through morphological and texture descriptors. PeerJ 7:e6770
Sánchez C, Vállez N, Bueno G, et al (2019b) Diatom classification including morphological adaptations using CNNs. In: Iberian conference on pattern recognition and image analysis (IbPRIA), Springer, pp 317–328
Sandler M, Howard A, Zhu M, et al (2018) MobileNetV2: inverted residuals and linear bottlenecks. In: Conference on computer vision and pattern recognition (CVPR), pp 4510–4520
Schanz T, Möller KO, Rühl S et al (2023) Robust detection of marine life with label-free image feature learning and probability calibration. Mach Learn: Sci Technol 4(3):035007
Scherrer R, Govan R, Quiniou T, et al (2021) Automatic plankton detection and classification on raw hologram with a single deep learning architecture. In: International conference on computational intelligence methods for bioinformatics and biostatistics (CIBB)
Schmarje L, Brünger J, Santarossa M et al (2021) Fuzzy Overclustering: semi-supervised classification of fuzzy labels with overclustering and inverse cross-entropy. Sensors 21(19):6661
Schoening T, Durden JM, Faber C et al (2022) Making marine image data FAIR. Scient Data 9(1):414
Schröder SM, Kiko R (2022) Assessing representation learning and clustering algorithms for computer-assisted image annotation – simulating and benchmarking MorphoCluster. Sensors 22(7):2775
Schröder SM, Kiko R, Irisson JO, et al (2018) Low-shot learning of plankton categories. In: German conference on pattern recognition (GCPR), Springer, pp 391–404
Schröder SM, Kiko R, Koch R (2020) MorphoCluster: efficient annotation of plankton images by clustering. Sensors 20(11):3060
Schulz J, Barz K, Ayon P et al (2010) Imaging of plankton specimens with the lightframe on-sight keyspecies investigation (LOKI) system. J Eur Opt Soc. https://doi.org/10.2971/jeos.2010.10017s
Schulze K, Tillich UM, Dandekar T et al (2013) PlanktoVision – an automated analysis system for the identification of phytoplankton. BMC Bioinform 14(1):1–10
Selvaraju RR, Cogswell M, Das A, et al (2017) Grad-CAM: Visual explanations from deep networks via gradient-based localization. In: International conference on computer vision (ICCV), pp 618–626
Shan S, Zhang W, Wang X et al (2020) Automated red tide algae recognition by the color microscopic image. In: International congress on image and signal processing, biomedical engineering and informatics (CISP-BMEI), IEEE, pp 852–861
Shao L, Zhu F, Li X (2014) Transfer learning for visual categorization: a survey. IEEE Trans Neural Netw Learn Syst 26(5):1019–1034
Si G, Xiao Y, Wei B et al (2023) Token-selective vision transformer for fine-grained image recognition of marine organisms. Front Mar Sci 10:1174347
Sieracki CK, Sieracki ME, Yentsch CS (1998) An imaging-in-flow system for automated analysis of marine microplankton. Mar Ecol Prog Ser 168:285–296
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
Soh Y, Song J, Hae Y (2018) Multiple plankton detection and recognition in microscopic images with homogeneous clumping and heterogeneous interspersion. J Instit Converg Signal Process 19(2):35–41
Solano GA, Gasmen P, Marquez EJ (2018) Radiolarian classification decision support using supervised and unsupervised learning approaches. In: International conference on information, intelligence, systems and applications (IISA), pp 1–6
Solow A, Davis C, Hu Q (2001) Estimating the taxonomic composition of a sample when individuals are classified with error. Mar Ecol Prog Ser 216:309–311
Song H, Mehdi SR, Huang H et al (2020) Classification of freshwater zooplankton by pre-trained convolutional neural network in underwater microscopy. Int J Adv Comput Sci Appl 11(7):1–7
Sosik HM, Olson RJ (2007) Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol Oceanogr Methods 5:204–216
Sosik HM, Peacock EE, Brownlee EF (2021) WHOI-plankton: annotated plankton images - dataset for developing and evaluating classification methods. https://doi.org/10.1575/1912/7341
Sun X, Xv H, Dong J et al (2020) Few-shot learning for domain-specific fine-grained image classification. IEEE Trans Ind Electr 68(4):3588–3598
Szegedy C, Liu W, Jia Y, et al (2015) Going deeper with convolutions. In: Conference on computer vision and pattern recognition (CVPR), pp 1–9
Szegedy C, Vanhoucke V, Ioffe S, et al (2016) Rethinking the inception architecture for computer vision. In: Conference on computer vision and pattern recognition (CVPR), pp 2818–2826
Szegedy C, Ioffe S, Vanhoucke V, et al (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In: AAAI Conference on Artificial Intelligence
Sömek B, Yuksel SE (2023) Plankton classification with deep learning. In: 2023 Signal processing: algorithms, architectures, arrangements, and applications (SPA), pp 118–123
Tan M, Le Q (2019) EfficientNet: rethinking model scaling for convolutional neural networks. In: International conference on machine learning (ICML), pp 6105–6114
Tan S, Zhang F, Huang Q et al (2014) Measuring and calculating geometrical parameters of marine plankton using digital laser holographic imaging. Optik 125:5119–5123
Tanaka FHKdS, Aranha C (2019) Data augmentation using GANs. arXiv preprint arXiv:1904.09135
Tang X, Stewart WK, Huang H et al (1998) Automatic plankton image recognition. Artif Intell Rev 12(1–3):177–199
Tang X, Lin F, Samson S et al (2006) Binary plankton image classification. IEEE J Oceanic Eng 31(3):728–735
Teigen AL, Saad A, Stahl A (2020) Leveraging similarity metrics to in-situ discover planktonic interspecies variations or mutations. In: Global Oceans 2020: Singapore–US Gulf Coast, IEEE, pp 1–8
Teuwen J, Moriakov N (2020) Convolutional neural networks. In: Handbook of medical image computing and computer assisted intervention. Elsevier, pp 481–501
Thiel SU, Wiltshire RJ, Davies LJ (1995) Automated object recognition of blue-green algae for measuring water quality – a preliminary study. Water Res 29(10):2398–2404
Tountas K, Pados DA, Medley MJ (2019) Conformity evaluation and L1-norm principal-component analysis of tensor data. In: Big data: learning, analytics, and applications, pp 190–200
Tsechpenakis G, Guigand CM, Cowen RK (2007) Image analysis techniques to accompany a new in situ ichthyoplankton imaging system. In: OCEANS Conference, IEEE, pp 1–6
Vallez N, Bueno G, Deniz O et al (2022) Diffeomorphic transforms for data augmentation of highly variable shape and texture objects. Comput Methods Programs Biomed 219:106775
Van Noord N, Postma E (2017) Learning scale-variant and scale-invariant features for deep image classification. Patt Recogn 61:583–592
Varma K, Nyman L, Tountas K, et al (2020) Autonomous plankton classification from reconstructed holographic imagery by L1-PCA-assisted convolutional neural networks. In: Global Oceans 2020: Singapore–US Gulf Coast, IEEE, pp 1–6
Venkataramanan A, Laviale M, Figus C, et al (2021) Tackling inter-class similarity and intra-class variance for microscopic image-based classification. In: International conference on computer vision systems (ICVS), Springer, pp 93–103
Verikas A, Gelzinis A, Bacauskiene M et al (2012) Phase congruency-based detection of circular objects applied to analysis of phytoplankton images. Patt Recogn 45:1659–1670
Verikas A, Gelzinis A, Bacauskiene M et al (2015) An integrated approach to analysis of phytoplankton images. IEEE J Oceanic Eng 40(2):315–326
Wacquet G, Lefebvre A, Blondel C, et al (2018) Combination of machine learning methodologies and imaging-in-flow systems for the automated detection of harmful algae. In: Harmful Algae 2018 - From Ecosystems to Socioecosystems: International Conference on Harmful Algae
Walcutt NL, Knörlein B, Cetinić I et al (2020) Assessment of holographic microscopy for quantifying marine particle size and concentration. Limnol Oceanogr Methods 18(9):516–530
Walker JL, Orenstein EC (2021) Improving rare-class recognition of marine plankton with hard negative mining. In: International conference on computer vision (ICCV), pp 3672–3682
Walker RF, Ishikawa K, Kumagai M (2002) Fluorescence-assisted image analysis of freshwater microalgae. J Microbiol Methods 51(2):149–162
Wang C, Yu Z, Zheng H, et al (2017) CGAN-plankton: towards large-scale imbalanced class generation and fine-grained classification. In: International conference on image processing (ICIP), IEEE, pp 855–859
Wang C, Zheng X, Guo C, et al (2018) Transferred parallel convolutional neural network for large imbalanced plankton database classification. In: OCEANS Techno-Oceans (OTO), IEEE, pp 1–5
Wang J, Lan C, Liu C et al (2022) Generalizing to unseen domains: a survey on domain generalization. IEEE Trans Knowl Data Eng. https://doi.org/10.1109/TKDE.2022.3178128
Wang J, Tang C, Li J (2022b) Towards real-time analysis of marine phytoplankton images sampled at high frame rate by a YOLOX-based object detection algorithm. In: OCEANS 2022-Chennai, IEEE, pp 1–9
Wang K, Zhang D, Li Y et al (2016) Cost-effective active learning for deep image classification. IEEE Trans Circuits Syst Video Technol 27(12):2591–2600
Watson J (2018) High-resolution underwater holographic imaging. In: Encyclopedia of modern optics. pp 106–112
Wei L, XiaoPan S, Heydari F (2022) Microalgae classification using improved metaheuristic algorithm. Math Probl Eng. https://doi.org/10.1155/2022/3783977
Weiss K, Khoshgoftaar TM, Wang D (2016) A survey of transfer learning. J Big Data 3(1):9
Wen Y, Zhang K, Li Z, et al (2016) A discriminative feature learning approach for deep face recognition. In: European conference on computer vision (ECCV), pp 499–515
Wilson G, Cook DJ (2020) A survey of unsupervised deep domain adaptation. ACM Trans Intell Syst Technol 11(5):1–46
Worm B, Barbier EB, Beaumont N et al (2006) Impacts of biodiversity loss on ocean ecosystem services. Science 314(5800):787–790
Wu MF, Sheu HT (1998) Representation of 3D surfaces by two-variable Fourier descriptors. IEEE Trans Patt Anal Mach Intell (PAMI) 20(8):858–863
Xiaoyan Q (2020) Research on imbalanced microscopic image classification of harmful algae. IEEE Access 8:125438–125446
Xu L, Xu L, Chen Y, et al (2022) Accurate classification of algae using deep convolutional neural network with a small database. ACS ES&T Water
Yan J, Li X, Cui Z (2017) A more efficient CNN architecture for plankton classification. In: Chinese conference on computer vision (CCCV), Springer, pp 198–208
Yang M, Wang W, Gao Q et al (2023) Automatic identification of harmful algae based on multiple convolutional neural networks and transfer learning. Environ Sci Pollut Res 30(6):15311–15324
Yang Z, Li J, Chen T et al (2022) Contrastive learning-based image retrieval for automatic recognition of in situ marine plankton images. ICES J Mar Sci 79(10):2643–2655
Ye L, Chang CY, Hsieh Ch (2011) Bayesian model for semi-automated zooplankton classification with predictive confidence and rapid category aggregation. Mar Ecol Prog Ser 441:185–196
Ye M, Shen J, Lin G et al (2021) Deep learning for person re-identification: a survey and outlook. IEEE Trans Patt Anal Mach Intell (PAMI) 44(6):2872–2893
Yu K, Sun W (2023) Annular characteristic spectrum extraction for species identification of marine Coscinodiscus from micrographs. J Biotech Res 15:284–294
Yuan A, Wang B, Li J et al (2023) A low-cost edge AI-chip-based system for real-time algae species classification and HAB prediction. Water Res
Zetsche EM, El Mallahi A, Dubois F et al (2014) Imaging-in-flow: Digital holographic microscopy as a novel tool to detect and classify nanoplanktonic organisms. Limnol Oceanogr Methods 12(11):757–775
Zhang J, Li C, Yin Y, et al (2022) Applications of artificial neural networks in microorganism image analysis: a comprehensive review from conventional multilayer perceptron to popular convolutional neural network and potential visual transformer. Artif Intell Rev, pp 1–58
Zhang Y, Lu Y, Wang H et al (2021) Automatic classification of marine plankton with digital holography using convolutional neural network. Optics Laser Technol 139:106979
Zhao F, Tang X, Lin F, et al (2005) Binary plankton image classification using random subspace. In: International conference on image processing (ICIP), IEEE, pp 1–357
Zhao F, Lin F, Seah HS (2009) Bagging based plankton image classification. In: IEEE International conference on image processing (ICIP), IEEE, pp 2081–2084
Zhao F, Lin F, Seah HS (2010) Binary sipper plankton image classification using random subspace. Neurocomputing 73:1853–1860
Zheng A, Wang M (2015) Convolutional neural networks based plankton image classification system. Project Report, Course CSE258 Web Mining and Recommender Systems, University of California San Diego, USA, http://jmcauley.ucsd.edu/cse258/projects/fa15/005.pdf
Zheng H, Wang R, Yu Z et al (2017) Automatic plankton image classification combining multiple view features via multiple kernel learning. BMC Bioinform 18(16):570
Zhou K, Liu Z, Qiao Y et al (2022) Domain generalization: a survey. IEEE Trans Patt Anal Mach Intell 45(4):4396–4415
Zhou X, Rowe M, Liu Q et al (2023) Comparison of Eulerian and Lagrangian transport models for harmful algal bloom forecasts in Lake Erie. Environ Modell Softw 162:105641
Zhu JY, Park T, Isola P, et al (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: International conference on computer vision (ICCV), pp 2223–2232
Zimmerman TG, Pastore VP, Biswas SK, et al (2020) Embedded system to detect, track and classify plankton using a lensless video microscope. arXiv preprint arXiv:2005.13064
Zingone A, Harrison PJ, Kraberg A et al (2015) Increasing the quality, comparability and accessibility of phytoplankton species composition time-series data. Estuar Coast Shelf Sci 162:151–160
Zohdi E, Abbaspour M (2019) Harmful algal blooms (red tide): a review of causes, impacts and approaches to monitoring and prediction. Int J Environ Sci Technol 16:1789–1806
Zoph B, Vasudevan V, Shlens J, et al (2018) Learning transferable architectures for scalable image recognition. In: Conference on computer vision and pattern recognition (CVPR), pp 8697–8710
Funding
The research was carried out in the FASTVISION and FASTVISION-plus projects funded by the Academy of Finland (Decision numbers 321980, 321991, 339612, and 339355). Lumi Haraguchi was supported by OBAMA-NEXT (grant agreement no. 101081642), funded by the European Union under the Horizon Europe program.
Author information
Contributions
TE: Literature search, Writing - prepare original draft, review & editing, Visualization, Supervision. DB: Literature search, Writing - prepare original draft, review & editing, Visualization. NVB: Writing - prepare original draft, Visualization. KK: Writing - review & editing. LH: Writing - review & editing. LL: Writing - review & editing, Supervision. SS: Writing - review & editing, Project administration. JS: Writing - review & editing. TT: Writing - review & editing. HK: Writing - review & editing, Supervision, Project administration.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Consent for publication
All authors consent that the publisher has the author’s permission to publish research findings. All authors guarantee that the research findings have not been previously published.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Eerola, T., Batrakhanov, D., Barazandeh, N.V. et al. Survey of automatic plankton image recognition: challenges, existing solutions and future perspectives. Artif Intell Rev 57, 114 (2024). https://doi.org/10.1007/s10462-024-10745-y
DOI: https://doi.org/10.1007/s10462-024-10745-y