Review

Overview and Comparison of Deep Neural Networks for Wildlife Recognition Using Infrared Images

Department of Multimedia and Information-Communication Technologies, University of Zilina, 010 26 Zilina, Slovakia
* Author to whom correspondence should be addressed.
AI 2024, 5(4), 2801-2828; https://doi.org/10.3390/ai5040135
Submission received: 14 October 2024 / Revised: 18 November 2024 / Accepted: 4 December 2024 / Published: 6 December 2024

Abstract

There are multiple uses for single-channel images, such as infrared imagery, depth maps, and others. To automatically classify objects in such images, an algorithm suited for single-channel image processing is required. This study explores the application of deep learning techniques for the recognition of wild animals using infrared images. Traditional methods of wildlife monitoring often rely on visible light imaging, which can be hindered by various environmental factors such as darkness, fog, and dense foliage. In contrast, infrared imaging captures the thermal signatures of animals, providing a robust alternative for wildlife detection and identification. We test a Convolutional Neural Network (CNN) model specifically designed to analyze infrared images, leveraging the unique thermal patterns emitted by different animal species. The model is trained and tested on a diverse dataset of infrared images, demonstrating high accuracy in distinguishing between multiple species. In this paper, we also present a comparison of several well-known artificial neural networks on this data. To ensure accurate testing, we introduce a new dataset containing infrared photos of Slovak wildlife, specifically including classes such as bear, deer, boar, and fox. To complement this dataset, the Fashion MNIST dataset was also used. Our results indicate that deep learning approaches significantly enhance the capability of infrared imaging for wildlife monitoring, offering a reliable and efficient tool for conservation efforts and ecological studies.

1. Introduction

Many types of valuable data are stored as simple matrices, which can be viewed as single-channel images. Infrared (IR) images are an example of this format. IR photos are particularly useful for capturing images of wild animals, especially those active at night. For this purpose, multiple camera traps were set up in the forests of Slovakia, aiming to capture images of larger animals that pose significant risks to road traffic. Collisions with large animals can lead to costly damage to vehicles and infrastructure and can even result in the death of both animals and humans. To help prevent these collisions, several driver-assistance systems have been developed that rely on accurate nighttime recognition of animals. During the day, these animals tend to avoid noisy roads, but they may cross them at night, creating hazardous situations for drivers. Therefore, IR cameras are essential tools for enhancing road safety by detecting and identifying animals on or near roadways, especially during nocturnal hours [1,2,3].
Artificial Neural Networks (ANNs) are widely used machine learning algorithms. They can simulate several problem-solving processes, such as classification, regression, data reconstruction, and others. For recognizing animals in images, the classification formulation is appropriate. Most image classifiers are built for three-channel Red-Green-Blue (RGB) images; however, optimization for RGB may not be effective on single-channel pictures such as IR. Furthermore, a Neural Network (NN) is defined by its architecture and its model. The architecture describes the composition of layers and how data flow from input to output, while the model is the collection of parameters and neural weights that represents the solution to a specific problem. So, even with an architecture suited for IR image classification, the correct model is still needed. Several successful NNs exist, such as GoogleNet, ResNet, and MobileNet, each consisting of a unique architecture and a pre-trained model for image classification. From the nature of NNs, it can be assumed that the lower layers (closer to the input) act as low-level feature detectors; for convolutional layers these are features such as edges, shapes, color patterns, and gradients. Therefore, the weights of these lower layers can be reused for IR images as well. Higher layers represent more abstract concepts and are therefore trained to solve the task of animal recognition. To sum up, the networks used here are compositions of well-known pre-trained networks and a few newly trained classification layers [4,5,6].
To identify the best NN candidate for infrared (IR) image classification, several experiments are required. In each experiment, a new model will be trained based on a selected NN architecture. After training, performance testing will be conducted, generating a confusion matrix and various precision metrics. Two datasets will be used: an IR animal dataset and the Fashion MNIST dataset as a control. This process will be repeated for each architecture until all have been evaluated. Wildlife monitoring is a crucial task in conservation biology and wildlife management. Traditional wildlife monitoring methods, such as radio tracking and visual surveys, can be time-consuming, costly, and intrusive to the animals. Infrared imaging offers a non-invasive, cost-effective alternative for detecting and identifying wildlife in their natural habitats, particularly in low-light conditions. Infrared cameras can capture the thermal signatures of animals, helping to distinguish them from their surroundings. However, manually analyzing infrared images to identify animals is a challenging and labor-intensive task, especially when working with large datasets. The application of automated deep learning methods can address these challenges, making wildlife monitoring more efficient and scalable [5,6].
Convolutional Neural Networks (CNNs) have demonstrated exceptional potential in recognizing animals from infrared images. These deep learning models utilize convolutional layers to extract meaningful features from images, followed by fully connected layers to classify those features into distinct categories. Applying CNNs to the task of wildlife recognition using infrared images offers the opportunity to greatly enhance the efficiency and accuracy of wildlife monitoring efforts [7,8].
Recent advancements in CNN architectures, data augmentation strategies, and transfer learning techniques have significantly enhanced the performance of wildlife recognition systems using infrared images (see Figure 1). This review aims to present a comprehensive overview of the current advancements in the field, highlighting the latest research, methodologies, and potential future directions. We will discuss the challenges and opportunities in wild animal recognition from infrared images and how CNNs can be used to address these challenges. We hope that this review will inspire further research in this exciting and rapidly evolving field, which has the potential to revolutionize wildlife monitoring and conservation efforts [7,8,9,10].
This study offers a detailed analysis and performance evaluation of various deep neural network architectures for wildlife recognition using infrared imagery. The primary contributions of this work include:
  • Evaluation of Deep Learning Models: We evaluate the performance of several cutting-edge deep neural networks, including CNN, ResNet, and hybrid architectures, on wildlife recognition tasks with infrared images, offering insights into their accuracy, computational efficiency, and suitability for real-time applications.
  • Infrared Image Analysis for Wildlife Monitoring: By focusing on the unique challenges posed by infrared imagery—such as limited color contrast and distinct noise characteristics—we provide targeted recommendations for preprocessing steps that enhance model performance specifically in this context.
  • Guidance for Practical Implementation: The study delivers practical insights into model selection and optimization tailored for real-world applications in wildlife monitoring. This includes recommendations on balancing accuracy with resource constraints for deployment in field conditions where computational resources may be limited.
  • Contribution to Conservation Efforts: The results support improved automated monitoring systems, which can assist conservationists in tracking and managing wildlife populations more effectively, thereby contributing to broader ecological and conservation efforts.
Together, these contributions provide a valuable foundation for advancing automated wildlife recognition in challenging environments, encouraging further exploration and innovation in this area.
The section following this introduction presents the current state of the art. Next, the theory of neural networks is discussed, including descriptions of the different types of layers, divided into two categories: core layers and utility layers. This section also describes the neural network architectures used. The next section focuses on the experiments: it begins with an overview of the overall experiment, followed by a description of the datasets used, and then presents the results of the individual neural networks. The paper ends with a final section summarizing the results.

2. State of the Art

Wild animal recognition from infrared images is an important and challenging task in wildlife monitoring and conservation. Infrared cameras have emerged as a valuable tool for capturing wildlife activity in their natural habitats, especially during low-light conditions. However, manually analyzing infrared images to identify and track animals is both time-consuming and prone to errors, particularly when handling large datasets. Therefore, there is a need for automated and accurate methods for wild animal recognition from infrared images [10,11,12].
Recent advancements in computer vision and machine learning have resulted in the development of advanced algorithms for recognizing animals in infrared images. This state-of-the-art review explores the latest research, methodologies, and challenges in wildlife recognition using infrared imagery [13,14].
One of the major challenges in wild animal recognition from infrared images is the lack of labeled data. Training machine learning models, such as deep neural networks, requires a large amount of labeled data, which is often difficult and expensive to obtain in wildlife monitoring scenarios. To address this challenge, researchers have proposed various transfer learning techniques, where pre-trained models on large-scale datasets, such as ImageNet, are fine-tuned on smaller labeled datasets of infrared images [13,14,15,16].
Another challenge in wild animal recognition from infrared images is dealing with the variability in animal pose, lighting conditions, and background clutter. To overcome these challenges, researchers have introduced various feature extraction techniques, such as Scale-Invariant Feature Transform (SIFT) and Histogram of Oriented Gradients (HOG), which are capable of extracting robust and discriminative features from infrared images. Additionally, advanced deep learning architectures, including Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), have been proposed to automatically learn high-level features from infrared images, achieving state-of-the-art performance in wildlife recognition tasks [14,15,16].
Additionally, researchers have explored unsupervised learning methods, such as clustering and dimensionality reduction techniques, to uncover hidden patterns and relationships within infrared images. For example, autoencoders, a type of neural network that learns to reconstruct input data from a compressed representation, have been proposed to learn feature representations of infrared images for wildlife recognition [17,18,19].
Wildlife recognition using Deep Neural Networks (DNNs) has rapidly evolved, providing innovative solutions for environmental monitoring, biodiversity research, and conservation efforts. Traditional methods of wildlife observation, which rely on human observation or basic automated techniques, are often limited by factors like lighting conditions and accessibility. The application of DNNs with infrared (IR) imaging has significantly advanced wildlife recognition, as infrared images enable robust detection even in low-light or nighttime conditions. This article presents an overview of the latest advancements in using DNNs for wildlife recognition with infrared images, compares leading approaches, and highlights how our methodology fits within or diverges from current research trends. Recent advancements in DNN architectures, specifically in convolutional neural networks (CNNs) and transformer models, have revolutionized image-based wildlife recognition. Key research developments in wildlife recognition focus on:
  • Improved Detection Accuracy: Advanced models such as EfficientNet and ResNet have shown higher accuracy and efficiency, particularly when fine-tuned with specialized datasets. These architectures are widely favored due to their depth and ability to capture complex patterns in large datasets while maintaining efficiency.
  • Transformer Models and Vision Transformers (ViTs): The introduction of vision transformers has enabled models to capture long-range dependencies and fine-grained details within images. ViTs have demonstrated considerable success in wildlife recognition, especially when combined with IR data, as they can adapt to the distinctive textures and features in infrared images.
  • Hybrid Architectures: Hybrid architectures that combine CNNs with transformers are emerging as high-performing solutions, as they harness the spatial awareness of CNNs and the contextual attention mechanisms of transformers. Research indicates that hybrid models may better capture subtle features that distinguish similar species in infrared images.
Infrared imaging is particularly useful for wildlife detection as it allows for non-invasive monitoring during nighttime, when many animals are active. The unique features of infrared data, such as differences in thermal signatures, enable better differentiation between animals and background environments. However, IR images lack the color information available in RGB images, posing challenges in accurately distinguishing similar animals. Research has shown that DNNs trained specifically on IR images with appropriate data augmentation and fine-tuning strategies can overcome some of these limitations.
Our current study explores the performance of traditional Convolutional Neural Networks (CNNs) for wildlife recognition using infrared images, highlighting the strengths and limitations of well-established architectures such as VGG, ResNet, Xception, MobileNet, and DenseNet. As the field of deep learning continues to evolve, it is crucial to explore more recent and advanced models to push the boundaries of infrared image classification. Two such models, HCGNet and ConvNeXt, have gained attention for their ability to achieve high accuracy with lower computational costs. These models offer promising avenues for future research in the area of wildlife recognition using infrared imagery. HCGNet, a lightweight CNN architecture, is specifically designed to reduce computational complexity while maintaining or even enhancing accuracy. In the context of wildlife recognition using infrared images, HCGNet could be particularly useful for real-time applications where inference speed is a key requirement, such as automated wildlife monitoring in remote areas with limited computational resources. ConvNeXt is another recent architecture known for its performance in large-scale image classification tasks while maintaining efficiency. With its deep architecture and innovative design, ConvNeXt is well-suited for complex image recognition tasks, such as distinguishing between species with similar thermal signatures in infrared images.

3. Materials and Methods

In this section, we first introduce the Neural Networks (NNs) used in our study. While describing these networks by their architectural design is helpful, many networks contain repeating blocks of layers. To clarify these structures, we present simplified diagrams. It’s important to note that not all layers and blocks depicted in these diagrams are composed of artificial neurons. Layers that add functionality without containing trainable neurons will be referred to as utility layers, whereas those containing trainable neurons will be identified as core layers.

3.1. Core Layers

Core layers in neural networks are the fundamental building blocks responsible for essential operations in learning and prediction. These layers enable the network to process input data, extract meaningful features, and generate accurate outputs, forming the backbone of the neural network’s functionality. By combining various types of core layers, neural networks can be customized to tackle a wide range of tasks, from image classification to sequence prediction, allowing them to learn effectively and generalize from data.

3.1.1. Convolution Layer and Deconvolution Layer

Convolution layers (CLs) are an effective approach to image processing with NNs (Figure 2). If a Dense layer (DL) is used, a significant number of neurons is needed to process even a small image: for a grayscale image with a resolution of 1920 by 1080, a DL would need more than 2 million neurons to receive data from every pixel. All pixels need to be passed in, as each of them can contain useful information. Moreover, several DLs are needed to extract basic image features (such as shape, texture, or gradient), since a single pixel carries no information about its surroundings. A CL, on the other hand, uses a moving window to extract a cutout from the image; this part is then processed by convolution to find common features in the data. For a window of size 3 by 3 pixels, only 9 weights are needed, but such a CL would describe only one feature (also called a filter). In practice, CLs use more than one filter, and each filter can be understood as representing one feature found in the input data. If a CL is used as the first layer, these filters represent the most basic image features (such as lines). Subsequent layers gradually represent more and more abstract concepts (line, shape, nose, face). A general NN for image recognition can be created by placing CLs at the beginning and DLs at the end of the architecture. Such an architecture is also called a Convolutional Neural Network, or CNN for short [20,21,22].
Two-dimensional data passed between layers are called maps. In general, the input map has size W × H, and the convolution window in the figure is 3 × 3. There is more than one filter (three in the figure). The output map therefore has its own size M × N (depending on the stride of the window) and the same number of “channels” as the number of filters.
As its name suggests, this layer functions in opposition to the CL. The Deconvolution Layer (DCL) translates the more abstract representation of the data back toward the original image form. DCLs are mostly used in auto-encoder NNs, where an image appears at both the input and the output of the network. Auto-encoders generally have a mirrored structure in which the data become more and more abstract to represent the defined problem and are then reconstructed into the resulting image.
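As an illustration, the following minimal Keras sketch (the framework used later in the experiments) shows a convolutional layer with a 3 × 3 window and three filters applied to a single-channel 224 × 224 map, followed by a transposed (de)convolution that restores the original resolution; the layer sizes are illustrative only, not the configuration used in the experiments.
```python
import tensorflow as tf

# Single-channel input map, e.g., a 224 x 224 IR image
inputs = tf.keras.Input(shape=(224, 224, 1))

# 3 filters with a 3 x 3 moving window: each filter learns one low-level feature
features = tf.keras.layers.Conv2D(filters=3, kernel_size=3, strides=1,
                                  padding="valid", activation="relu")(inputs)
# Output map: 222 x 222 with 3 "channels" (one per filter)

# A transposed convolution (deconvolution) maps the more abstract representation
# back toward the original spatial resolution, as used in auto-encoders
reconstructed = tf.keras.layers.Conv2DTranspose(filters=1, kernel_size=3,
                                                strides=1, padding="valid")(features)
# Output map: 224 x 224 x 1

model = tf.keras.Model(inputs, reconstructed)
model.summary()
```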

3.1.2. Pooling Layer

Pooling layers (PLs) down-sample the data, as shown in Figure 3. If a CL has the stride of its window set to 1, the resolution is reduced only slightly at the borders (by one pixel on each side for a 3 × 3 window). To reduce the data further, a bigger stride can be used, but more information between steps is lost. Reduction can also be performed by averaging neighbouring pixels. There are multiple PL variants, distinguished by the reduction method (max, min, median, or average value) and by the kernel size. An example of a MaxPooling layer with a filter of size 2 × 2 and a stride of 2 × 2 can be seen in Figure 3. Un-pooling layers also exist. A layer named MaxPooling2D down-samples 2D data by taking the maximal value [23,24,25].
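A small numerical sketch of the MaxPooling operation described above is given below: a 2 × 2 window with stride 2 keeps only the maximum of each block, halving the resolution (the input values are illustrative).
```python
import tensorflow as tf

# A 4 x 4 single-channel map (batch of one); the values are illustrative
x = tf.constant([[1., 3., 2., 0.],
                 [4., 2., 1., 5.],
                 [0., 1., 7., 2.],
                 [3., 6., 4., 8.]])
x = tf.reshape(x, (1, 4, 4, 1))

# MaxPooling with a 2 x 2 filter and stride 2 keeps the maximum of each block
pooled = tf.keras.layers.MaxPooling2D(pool_size=2, strides=2)(x)
print(tf.squeeze(pooled).numpy())
# [[4. 5.]
#  [6. 8.]]
```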

3.1.3. Recurrent Layer

Recurrent layers (Figure 4) are a crucial element of recurrent neural networks (RNNs), a type of artificial NN designed to identify patterns in sequential data, such as speech, text and time series. Unlike traditional feedforward NN, recurrent layers feature connections that loop back on themselves, enabling them to retain a state that captures information from previous inputs. This capacity to preserve context over time makes RNNs especially effective for tasks where understanding temporal dynamics and context is essential [26,27].
Recurrent layers process sequences of data by maintaining an internal state that evolves over time. This state acts as a memory, capturing information about previous elements in the sequence and using it to influence the processing of current and future elements. In a recurrent layer, each neuron is connected not only to the neurons in the next layer (as in feedforward networks) but also to itself and possibly to other neurons in the same layer from the previous time step. These looped connections enable the network to propagate information forward in time, allowing it to “remember” previous inputs. In Recurrent Layers (RLs), neurons therefore take as input not only the data from the previous layer but also their own output from the previous pass, as shown in Figure 4. This functions as memory, and RLs can therefore process relations over time. They are most commonly used in video and other sequence processing [28,29].
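For completeness, a minimal sketch of a recurrent layer in Keras is shown below; the sequence length, feature size, and class count are illustrative (for example, per-frame feature vectors extracted from a short video clip).
```python
import tensorflow as tf

# A sequence of 16 time steps, each described by a 64-dimensional feature vector
inputs = tf.keras.Input(shape=(16, 64))

# The recurrent layer keeps an internal state that is fed back at every step,
# so its output at time t depends on all previous inputs in the sequence
state = tf.keras.layers.SimpleRNN(units=32)(inputs)

# A dense classifier on top of the final state
outputs = tf.keras.layers.Dense(4, activation="softmax")(state)

model = tf.keras.Model(inputs, outputs)
model.summary()
```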

3.1.4. Batch Normalisation

The main idea behind Batch normalisation (BN) is to normalize the inputs by subtracting the mean and dividing by the standard deviation, ensuring that the inputs have zero mean and unit variance (Figure 5). This normalization is performed on mini-batches of training examples rather than individual examples [30,31].
First, this layer selects a small batch of examples, over which the mean and standard deviation are computed across the batch dimension for each feature (color channel) independently. Each value in this “mini-batch” has the mean subtracted and is divided by the standard deviation. After standardization, the normalized values are further scaled by a learnable parameter called gamma and shifted by another learnable parameter called beta. These parameters allow the network to adaptively rescale and shift the activations to better suit the following layers [30,31].
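The normalisation step can be written out explicitly; the following NumPy sketch applies it to an illustrative mini-batch of four examples with three features, with gamma and beta shown at their initial values (in a real layer they are learned).
```python
import numpy as np

# Mini-batch of activations: 4 examples x 3 features (values are illustrative)
x = np.array([[1.0, 2.0, 0.5],
              [2.0, 0.0, 1.5],
              [3.0, 4.0, 2.5],
              [0.0, 2.0, 3.5]])

mean = x.mean(axis=0)               # per-feature mean across the batch
std = x.std(axis=0)                 # per-feature standard deviation
eps = 1e-5                          # small constant for numerical stability

x_hat = (x - mean) / (std + eps)    # zero mean and unit variance per feature

gamma, beta = 1.0, 0.0              # learnable scale and shift (initial values)
y = gamma * x_hat + beta

print(y.mean(axis=0))               # approximately 0 for every feature
print(y.std(axis=0))                # approximately 1 for every feature
```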

3.1.5. Fully Connected Layer

Also called the Dense layer (DL), this is the most basic type of NN layer. It typically has only two parameters: the number of neurons and the activation function used. The number of neurons is a self-explanatory parameter, defining how many trainable neurons the layer contains. Neural networks composed only of these layers are called Multi-Layer Perceptrons (MLPs). In architectures for image recognition, this layer is used mostly at the “back” of the architecture. As data pass through the network, their meaning shifts from raw data to more abstract concepts; therefore, the last layer outputs (in theory) the solution to the problem. In image recognition, this is mostly the classification of the image, with each neuron representing the confidence that the input data belong to a specific class [32,33,34].
A simple three-layer MLP can be seen in Figure 4. The first layer, marked as L1, is the input layer, followed by one hidden layer (L2) and the output layer (L3). Each layer contains Y neurons, and the output of each neuron (in the input and hidden layers) is connected to every neuron of the next layer [33,34].
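A minimal Keras sketch of such a three-layer MLP is shown below; the layer sizes correspond to a flattened 28 × 28 image with ten output classes and are illustrative only.
```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),                      # L1: input layer (flattened 28 x 28 image)
    tf.keras.layers.Dense(128, activation="relu"),     # L2: hidden dense layer
    tf.keras.layers.Dense(10, activation="softmax"),   # L3: output layer, one neuron per class
])
model.summary()
```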

3.2. Utility Layers

Utility layers, often referred to as auxiliary or helper layers, are components in neural networks that perform specific functions to aid in the training and performance of the network. These layers are not directly responsible for learning representations of the data but provide essential operations that facilitate the overall learning process. Utility layers are crucial for various tasks, such as normalization, dropout, and activation functions, which help improve the stability, efficiency, and generalization of neural networks.
These layers play a vital role in modern neural network architectures, providing necessary operations that support the learning process, improve performance, and enhance the robustness of models. Their proper use is crucial for building effective and efficient deep learning models. They add useful functionality to the overall structure and function of the resulted NN.

Flatten Layer and Dropout

When a CL is used, the data at the output of the layer have the same number of dimensions as at the input; for a basic CNN that is a 2D matrix. Dense layers, however, work with a single row of values stored as a 1D vector. To pass from a CL to a DL, a transformation from 2D to 1D is therefore needed. The Flatten layer (FL) provides this functionality (Figure 6). In general, it reduces the dimensionality of the data by one [35].
The Dropout layer (OL) sets a random subset of neuron outputs to zero, effectively disabling those neurons, as shown in Figure 7. This functionality simulates the ability of biological neurons to “switch” (be either ON or OFF). Dropout also provides a mechanism to prevent “over-training” (overfitting) of the NN, a state in which the trained model responds only to the training data and is too rigid to correctly process other data [35].
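The sketch below shows where the Flatten and Dropout layers typically sit in a simple CNN: Flatten converts the 2D feature maps into a 1D vector for the dense classifier, and Dropout randomly disables a fraction of the outputs during training; the layer sizes and dropout rate are illustrative.
```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),   # convolutional feature extractor
    tf.keras.layers.MaxPooling2D(2),                    # down-sampling
    tf.keras.layers.Flatten(),                          # 2D feature maps -> 1D vector
    tf.keras.layers.Dropout(0.5),                       # randomly zero 50% of outputs in training
    tf.keras.layers.Dense(4, activation="softmax"),     # classifier, e.g., four animal classes
])
model.summary()
```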

3.3. Used Architectures

Several neural network architectures have been developed to address specific tasks and challenges. For instance, Convolutional Neural Networks (CNNs) have achieved remarkable success in image classification tasks. To evaluate the presented dataset, several widely known architectures were used. They are presented in the chronological order in which they were introduced to the scientific community, which also illustrates how more and more complexity was gradually added to achieve better results.

3.3.1. Visual Geometry Group

The Visual Geometry Group (VGG) is a CNN architecture introduced in 2014 by a team of researchers from the University of Oxford, named after the group that developed it. VGG is renowned for its simplicity and consists of a series of convolutional layers followed by fully connected layers (Figure 8). The architecture includes either 16 or 19 layers, depending on the variant, with each convolutional layer using filters of the same size. In 2014, VGG achieved state-of-the-art performance on the ImageNet classification task and has since become widely used in computer vision applications, especially for transfer learning.
The architecture starts with 13 or 16 CLs followed by 3 DLs. After each group of CLs there is a PL, and ReLU is used as the activation function. As input, the network expects an RGB image with a resolution of 224 × 224. Each CL uses a kernel of size 3 × 3 [36,37].
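A minimal transfer-learning sketch in the spirit described in the Introduction is shown below: a VGG16 base pre-trained on ImageNet with frozen convolutional weights and a small trainable classification head. Because the pre-trained weights expect three input channels, the single IR channel is replicated three times here; this replication is an assumption made for the sketch, not necessarily the preprocessing used in the experiments.
```python
import tensorflow as tf

# Pre-trained convolutional base (ImageNet weights), classifier head removed
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False   # keep the low-level feature detectors fixed

inputs = tf.keras.Input(shape=(224, 224, 1))
x = tf.keras.layers.Concatenate()([inputs, inputs, inputs])   # replicate the IR channel
x = tf.keras.applications.vgg16.preprocess_input(x)
x = base(x, training=False)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(256, activation="relu")(x)
outputs = tf.keras.layers.Dense(4, activation="softmax")(x)   # bear, boar, deer, fox

model = tf.keras.Model(inputs, outputs)
model.summary()
```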

3.3.2. Residual Neural Network

Residual Neural Network (ResNet) is a NN architecture introduced in 2015 by researchers at Microsoft Research Asia. ResNet was specifically designed to overcome the problem of vanishing gradients, a common challenge in very deep neural networks. The key innovation of ResNet is the use of residual blocks, which allow the network to learn residual functions that approximate the identity mapping. This design enables more effective gradient propagation through deep networks, leading to better performance across various computer vision tasks. ResNet won the ImageNet classification competition in 2015 and has since become a foundational architecture widely applied in numerous domains and applications [38,39].
The main feature of this architecture is its repeating blocks (see Figure 9). There are two types of blocks, depending on the output size relative to the input size. If the output size differs from the input size, it is a “Residual block” (CONV BLOCK in Figure 9); if they are the same, it is an “Identity block” (IDEN BLOCK in Figure 9). In each type of block, the data pass through two paths, one of which is called the “shortcut”. At the exit of the block, the data from the shortcut are added to the data that passed through several CLs [38,39,40].
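The two block types can be sketched as follows: the identity block adds the unchanged input to the main path, while the convolutional (residual) block uses a 1 × 1 convolution on the shortcut so that the added tensors have matching shapes. The filter counts and strides are illustrative and do not reproduce the exact ResNet50 configuration.
```python
import tensorflow as tf
from tensorflow.keras import layers

def identity_block(x, filters):
    shortcut = x                                   # shortcut: the input itself (shapes already match)
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([y, shortcut])                # add the shortcut to the main path
    return layers.Activation("relu")(y)

def conv_block(x, filters, stride=2):
    shortcut = layers.Conv2D(filters, 1, strides=stride)(x)   # 1x1 conv so shapes match
    y = layers.Conv2D(filters, 3, strides=stride, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([y, shortcut])
    return layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(56, 56, 64))
x = conv_block(inputs, 128)     # output size differs from the input: "Residual block"
x = identity_block(x, 128)      # output size equals the input: "Identity block"
model = tf.keras.Model(inputs, x)
```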

3.3.3. Xception

Xception is a convolutional neural network (CNN) architecture that was introduced in 2016 by François Chollet, the creator of the popular deep learning library Keras. Xception stands for “Extreme Inception”, which refers to its similarity with the Inception architecture while using an extreme form of depthwise separable convolutions [41,42].
The Xception architecture aims to improve the efficiency of deep neural networks by using depthwise separable convolutions (shown as “SEPAR CL” in Figure 10), which separates the spatial filtering and the channel-wise filtering into two separate convolutional layers. This approach significantly reduces the number of parameters in the network and the computational complexity required to train the model [42].
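The parameter saving of a depthwise separable convolution can be seen directly by comparing it with a standard convolution of the same filter count; the input size and filter numbers below are illustrative.
```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(64, 64, 32))

standard = tf.keras.layers.Conv2D(64, 3, padding="same")(inputs)
separable = tf.keras.layers.SeparableConv2D(64, 3, padding="same")(inputs)

print(tf.keras.Model(inputs, standard).count_params())    # 32*3*3*64 + 64      = 18,496
print(tf.keras.Model(inputs, separable).count_params())   # 32*3*3 + 32*64 + 64 =  2,400
```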

3.3.4. MobileNet

MobileNet is a series of CNN architectures designed to be lightweight and efficient, optimized specifically for mobile devices and embedded systems with constrained computational resources. It was introduced by researchers from Google in 2017 and has since become a popular choice for various computer vision tasks on mobile devices. The MobileNet architecture employs depthwise separable convolutions (depicted as “SEPAR CL” in Figure 11), which decompose the standard convolution operation into two separate layers: a depthwise convolution and a pointwise convolution. This method significantly reduces the number of parameters and computational complexity of the network while maintaining high accuracy. MobileNet has been successfully utilized in a wide range of applications, including object detection, image classification, and facial recognition, particularly on mobile devices. MobileNet models are highly efficient, both in terms of computational cost and memory usage. This efficiency does not come at the expense of significant performance loss, as MobileNet maintains competitive accuracy rates compared to larger, more computationally intensive models [43,44].
In summary, MobileNet stands out as a highly efficient and effective neural network architecture, optimized for environments where computational resources are limited. Its innovative use of depthwise separable convolutions and tunable hyperparameters (width and resolution multipliers) allows for flexible deployment across a range of applications, particularly in mobile and embedded systems [43,44,45].
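The width multiplier mentioned above is exposed in the Keras implementation as the alpha argument; the sketch below compares a full-width backbone with a slimmed one (alpha = 0.5 is an illustrative value, and no pre-trained weights are loaded here).
```python
import tensorflow as tf

# Full-width MobileNet backbone (no classifier head, no pre-trained weights)
full = tf.keras.applications.MobileNet(input_shape=(224, 224, 3),
                                       include_top=False, weights=None)

# Width multiplier alpha = 0.5 halves the number of filters in every layer,
# substantially reducing the parameter count for constrained devices
slim = tf.keras.applications.MobileNet(input_shape=(224, 224, 3),
                                       include_top=False, weights=None,
                                       alpha=0.5)

print(full.count_params(), slim.count_params())
```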

3.3.5. DenseNet

The Dense Convolutional Network (DenseNet) is a DNN architecture developed to tackle the vanishing gradient problem that can arise in very deep networks. The core innovation of DenseNet is its dense connectivity pattern, in which each layer is directly connected to every subsequent layer in a feedforward manner, as illustrated by the colored arrow-lines in Figure 12. This connectivity pattern improves feature propagation and gradient flow, enabling efficient information exchange between layers and encouraging feature reuse, which enhances the network’s efficiency and performance. DenseNet models have achieved state-of-the-art results on a variety of computer vision tasks, such as image classification, object detection, and semantic segmentation. Additionally, they have relatively few parameters compared to other deep neural networks, making them computationally efficient and allowing them to be trained on smaller datasets [46,47,48].
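The dense connectivity pattern can be sketched as repeated concatenation: each new layer receives the concatenation of all earlier feature maps in the block. The number of layers and the growth rate below are illustrative, not the DenseNet configuration used in the experiments.
```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=3, growth_rate=12):
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.Activation("relu")(y)
        y = layers.Conv2D(growth_rate, 3, padding="same")(y)
        x = layers.Concatenate()([x, y])    # every layer sees all previous feature maps
    return x

inputs = tf.keras.Input(shape=(56, 56, 24))
outputs = dense_block(inputs)               # 24 + 3 * 12 = 60 output channels
model = tf.keras.Model(inputs, outputs)
model.summary()
```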

4. Experimental Results

This section presents the experimental results of using various neural network architectures for the recognition of wild animals in infrared images. The model’s performance is evaluated through a confusion matrix, offering detailed insights into its classification accuracy across different animal categories. All experiments were conducted on a computer running Windows 10, using the Keras and TensorFlow frameworks. Two datasets consisting of single-channel images were used in the analysis.

4.1. IR Animal Dataset

The first dataset contains IR images of Slovak wild animals. The animal classes were selected based on the potential damage caused by a collision with a road vehicle. The infrared (IR) animal dataset (Figure 13) comprises a collection of images captured using infrared technology, depicting various species of wild animals in natural environments. The dataset is categorized into four classes (Figure 13):
  • Bear
  • Boar
  • Deer
  • Fox
The images were resized to 224 by 224 pixels, with each class containing 200 images, resulting in a total of 800 images in the dataset. For the experiments, 175 images from each class were used for training, and 25 images were set aside for testing. Each image in the dataset captures the infrared signature of an animal, representing its thermal emissions or heat patterns rather than visible light. This unique thermal data provides valuable insights into the animals’ characteristics and behaviors in their natural environments. The dataset is a useful resource for advancing research in wildlife monitoring, conservation efforts, and the application of machine learning algorithms for animal recognition and classification using infrared imagery.
Researchers and practitioners can leverage this infrared animal dataset to train and evaluate machine learning models, improving their ability to automatically identify and classify various wildlife species based on their thermal profiles captured through infrared technology. This dataset plays a key role in advancing scientific knowledge and supporting technological applications in wildlife management and ecological research.
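A minimal loading sketch for such a dataset is shown below, assuming a hypothetical directory layout with one sub-folder per class (bear/, boar/, deer/, fox/) and a recent TensorFlow version; the actual organisation and split tooling used in the study may differ. The validation_split of 0.125 corresponds to holding out 25 of the 200 images per class.
```python
import tensorflow as tf

# Hypothetical folder layout: ir_dataset/{bear,boar,deer,fox}/*.png
train_ds, test_ds = tf.keras.utils.image_dataset_from_directory(
    "ir_dataset",
    labels="inferred",
    label_mode="categorical",
    color_mode="grayscale",        # single-channel IR images
    image_size=(224, 224),
    batch_size=32,
    validation_split=0.125,        # 25 of 200 images per class held out for testing
    subset="both",
    seed=42,
)
print(train_ds.class_names)        # ['bear', 'boar', 'deer', 'fox']
```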

4.2. Fashion MNIST

To provide a comparison for the proposed dataset, the Fashion MNIST dataset (Figure 14) was used. This dataset contains 10 classes of grayscale images of fashion objects. The images are sized at 28 by 28 pixels, with a total of 60,000 images for training and 10,000 images for testing; each class contains 6000 training images and 1000 test images. An example from the dataset can be seen in Figure 14.
Each image measures 28 pixels in height and 28 pixels in width, resulting in a total of 784 pixels per image. Each pixel is represented by a single pixel-value, denoting its brightness level, where higher values indicate darker shades. These pixel-values range from 0 to 255 as integers.
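For reference, this control dataset can be loaded directly through Keras; the shapes and value range match the description above.
```python
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

print(x_train.shape)   # (60000, 28, 28) grayscale training images
print(x_test.shape)    # (10000, 28, 28) grayscale test images
print(x_train.dtype, x_train.min(), x_train.max())   # uint8 0 255
```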

4.3. Evaluation Criteria

Experimental results are presented as a confusion matrix together with the Precision (P), Recall (R), and F1 score of the best configuration for each presented architecture and dataset. The confusion matrix presents the results as accumulated occurrences of predicted versus true labels. The rows of the matrix represent the true labels (here called Target) and the columns the predicted labels. If a tested image has true label 1 but is predicted as label 3, the value in row 1, column 3 (3rd column, 1st row) is incremented by 1 (one tested image). The entries of the matrix can be of four types (Figure 15):
  • True Positive (TP)
  • False Positive (FP)
  • True Negative (TN)
  • False Negative (FN)
Positive data refers to all data predicted as belonging to the target class. Data that truly belong to the target class are considered True Positives (TP), while others are False Positives (FP). Negative data refers to those predicted as not belonging to the target class. True Negatives (TN) are data that do not belong to the target class and are correctly predicted as such, while False Negatives (FN) are data from the target class that are incorrectly predicted as not belonging to it. Various ratios can be calculated from these values to provide insights into the performance of the model, reflecting different aspects of class predictions and overall testing accuracy.
The first ratio is called Precision (P) and is calculated as the ratio between the True Positives and the sum of all positives (1). Precision indicates how relevant the positive predictions are:
P = TP/(TP + FP).
The next ratio is Recall (2), the ratio between the True Positives and the sum of True Positives and False Negatives. Recall indicates how well the class is separated from the others:
R = TP/(TP + FN).
Precision and Recall can be combined into a single value using several types of average. If the harmonic mean is used, the result is called the F1 score. Its formula is
F1 = 2 · (P · R)/(P + R).
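The following sketch computes these per-class metrics directly from a confusion matrix in which rows are true labels and columns are predicted labels; the matrix values are illustrative only.
```python
import numpy as np

# Rows = true labels, columns = predicted labels (illustrative 4-class matrix)
cm = np.array([[23, 1, 1, 0],
               [ 2, 20, 2, 1],
               [ 0, 1, 24, 0],
               [ 1, 2, 0, 22]])

tp = np.diag(cm).astype(float)
fp = cm.sum(axis=0) - tp        # predicted as the class but actually another class
fn = cm.sum(axis=1) - tp        # belonging to the class but predicted as another one

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

for c, (p, r, f) in enumerate(zip(precision, recall, f1), start=1):
    print(f"Class {c}: P = {p:.2%}, R = {r:.2%}, F1 = {f:.2%}")
```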

4.4. Results

Various hyperparameters were utilized in this study, including the number of epochs, batch size, and different architectural variations, which are detailed in the respective sections for each architecture. Examples of these hyperparameters include learning rate, batch size, number of layers, filter sizes, pooling methods, dropout rates, and activation functions. The selection of these hyperparameters is based on factors such as the dataset’s characteristics, the complexity of the task, and the computational resources available.
Optimizing hyperparameters (Table 1) is essential for maximizing model performance. For example, tweaking the learning rate can speed up convergence and avoid the model from settling into suboptimal solutions. Likewise, altering the number of layers and filter sizes in a CNN can enhance the model’s capacity to detect intricate visual features and patterns. The behavior and efficacy of machine learning models are heavily influenced by a range of hyperparameters, each contributing to the overall model’s ability to perform well:
  • Learning Rate: The learning rate controls how much the model’s parameters are adjusted after each training step. It directly impacts how quickly the model converges and whether it stabilizes in an optimal state. Selecting the right learning rate is vital to prevent underfitting or overfitting.
  • Batch Size: Batch size determines how many training examples are processed together in each iteration. It affects the speed of training, memory consumption, and the model’s ability to generalize. The ideal batch size depends on both the hardware constraints and the nature of the dataset.
  • Kernel Size: In CNNs, kernel size defines the dimensions of the filter used to extract features from the input image. This parameter influences how much detail and spatial information is captured. Striking a balance between local and global feature extraction requires careful kernel size tuning.
  • Activation Functions: Activation functions introduce non-linearity into the network, enabling it to model complex relationships. Common functions like ReLU, sigmoid, and tanh determine how neurons respond to input signals, influencing both the network’s capacity to learn intricate patterns and its training behavior.
  • Dropout Rate: Dropout is a regularization technique that randomly deactivates a subset of neurons during training. This helps prevent overfitting by encouraging the network to build more resilient features and reducing its reliance on specific units. The dropout rate controls how many neurons are dropped.
  • Optimization Algorithm: The choice of optimization algorithm impacts the model’s ability to converge quickly and stably. Popular algorithms such as Stochastic Gradient Descent (SGD) and Adam influence the training process. Additional hyperparameters like momentum, weight decay, and learning rate decay are also crucial for fine-tuning the optimization process.
  • NN Architecture: The architecture of NN defines the layers’ structure, including the types (e.g., CL, PL, fully connected layer) and their connectivity. The architecture must be chosen based on the complexity of the task and the available computational resources to optimize both performance and efficiency.
A thorough understanding and careful tuning of these hyperparameters can significantly improve the accuracy and performance of computer vision applications.
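As a minimal sketch of where these hyperparameters enter a Keras training run, the example below sets the dropout rate in the architecture, the optimizer and learning rate in compile, and the batch size and number of epochs in fit; the specific values are illustrative and are not the settings listed in Table 1, and x_train / y_train are placeholders for the prepared images and one-hot labels.
```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.3),                              # dropout rate
    tf.keras.layers.Dense(4, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),    # optimization algorithm and learning rate
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# x_train: float32 array of shape (N, 224, 224, 1); y_train: one-hot labels of shape (N, 4)
# model.fit(x_train, y_train, batch_size=32, epochs=30, validation_split=0.1)
```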

4.4.1. Results for Comparison of VGG 16 and VGG 19

In this section of the experiments, we investigate the performance of two popular CNN architectures, VGG16 and VGG19, for wild animal recognition using infrared images. The experiments are conducted with a dataset containing infrared images of various wildlife species captured in their natural environments. This dataset is split into training, validation, and testing sets to train and evaluate the neural network models. Both VGG16 and VGG19 architectures are tested, with the key difference between them being the number of layers utilized.
The experimental results are presented through confusion matrices, offering a detailed overview of the classification performance of the VGG16 and VGG19 models across various animal classes (Table 2 and Table 3). Each row in the confusion matrix represents the true labels, while each column shows the predicted labels. The VGG16 model demonstrates strong precision in Classes 3 and 4 (Deer and Fox), highlighting its high accuracy in predicting these categories (Table 4). However, it shows lower recall in Class 2 (Boar), suggesting it may have missed identifying instances of this category compared to others. Overall, the F1 scores reflect a reasonably balanced performance across the categories, with Class 3 (Deer) standing out with the highest F1 score of 88.00%. Adjustments and further analysis may be necessary to improve performance in specific categories where recall is lower.
The analysis of the confusion matrices reveals that VGG16 consistently outperforms VGG19 in wild animal recognition using infrared images. Across all animal categories, VGG16 exhibits higher average classification accuracies, as evidenced by the fewer misclassifications depicted in its confusion matrix compared to VGG19.
The VGG 19 model shows varying levels of precision, recall, and F1 scores across different categories (Table 5). It performs best in Class 3 (Deer), where it achieves the highest precision (85.00%) and recall (79.00%), resulting in the highest F1 score (81.00%) among all categories. However, it exhibits lower performance in Class 1 and Class 4 (Bear and Fox), with precision and recall scores in the range of 43.00% to 55.00%. In conclusion, while the VGG 19 neural network architecture demonstrates respectable performance overall, there are noticeable differences in its ability to accurately classify different categories. Adjustments and further analysis may be necessary to improve performance, especially in categories with lower precision and recall scores. These results suggest that the architectural design of VGG16 may be better suited for the complexities inherent in the infrared imagery of wild animals. The superiority of VGG16 underscores its efficacy in automated animal recognition tasks and emphasizes the importance of selecting appropriate neural network architectures for specific image processing applications.
From the results (Table 6), it is evident that the VGG16 model demonstrates varying degrees of success across different classes. Classes such as Ankle Boot (Class 10) and Trouser (Class 2) show high precision and recall, indicating that the model is highly accurate in identifying these items and does not miss many instances. Classes like T-shirt/Top (Class 1) and Sneaker (Class 8) exhibit moderate performance, with decent precision and recall but room for improvement. The Shirt (Class 7) and Dress (Class 4) classes display lower precision, recall, and F1 scores, highlighting the challenges the model faces in correctly classifying these items. The VGG16 architecture shows robust performance in several categories but also indicates areas where further fine-tuning and data augmentation could enhance classification accuracy.
The VGG19 model (Table 7) showed excellent performance in classes such as Bag (Class 9) and Ankle Boot (Class 10), with high precision, recall, and F1 scores, indicating the model’s strong capability in correctly identifying these items. Classes like Trouser (Class 2) and Sandal (Class 6) demonstrated moderate performance, with precision and recall values suggesting the model can reliably identify these items but with some room for improvement. The model struggled with classes such as T-shirt/Top (Class 1) and Shirt (Class 7), showing lower precision, recall, and F1 scores. This indicates difficulties in accurately classifying these categories, possibly due to similarities with other classes or insufficient distinctive features.
Overall, the VGG19 architecture performed well across several classes but highlighted areas where further model adjustments and data augmentation could enhance classification accuracy. The mixed results across different classes suggest the need for targeted improvements in the neural network’s ability to distinguish between visually similar items.

4.4.2. Results for ResNet 50

The performance of ResNet is summarized using a confusion matrix (Table 8), illustrating the classification results across different animal categories. Each row of the confusion matrix corresponds to the true labels, while each column represents the predicted labels. The confusion matrix reveals that ResNet performs well in recognizing wild animals in infrared images, achieving high classification accuracy across most categories.
The ResNet50 demonstrates strong performance in wild animal recognition using infrared images, achieving high precision and recall rates across multiple categories (Table 9). The model’s ability to maintain high F1 scores indicates robustness in classification tasks. However, there are slight variations in performance across different categories, particularly in precision for Class 4 (Fox). Further optimization and fine-tuning of the model parameters could potentially enhance performance and address any inconsistencies observed.
These results highlight ResNet50’s effectiveness in complex visual recognition tasks such as wildlife monitoring, where accuracy and reliability are crucial for conservation efforts and ecological research.
The ResNet50 model (Table 10) showed strong performance in classes such as Trouser (Class 2), Sandal (Class 6), Bag (Class 9), and Ankle Boot (Class 10), with high precision, recall, and F1 scores. This indicates the model’s robust ability to correctly identify these items. Classes like Pullover (Class 3) and Sneaker (Class 8) demonstrated moderate performance, suggesting the model can identify these items with reasonable accuracy but with some potential for improvement. The model struggled with classes such as T-shirt/Top (Class 1) and Dress (Class 4). Although the precision for T-shirt/Top is high, the recall is notably low, indicating that while the model is precise when it makes a prediction, it misses many true instances of this class. The Dress class also has a high recall but low precision, showing it often correctly identifies dresses but also misclassifies other items as dresses.
Overall, ResNet50 performed well on several classes, achieving a good balance between precision and recall in many cases. However, the disparity in performance across different classes highlights areas where the model could benefit from further optimization and potentially more targeted training data to improve its ability to distinguish between similar items.

4.4.3. Results for Xception

This section presents the experimental results of using the Xception neural network architecture for recognizing wild animals in infrared images. The model’s performance is evaluated through a confusion matrix (Table 11), offering detailed insights into its classification accuracy across different animal categories. The experiments were carried out using a dataset of infrared images featuring various wild animal species. The dataset was split into training, validation, and testing sets to train and assess the Xception model.
The performance of Xception is summarized using a confusion matrix, which illustrates the classification results across different animal categories (Table 12). Each row in the confusion matrix represents the ground truth labels, while each column represents the predicted labels. The confusion matrix indicates that Xception achieves high performance in recognizing wild animals in infrared images, with high classification accuracy across most categories. The model’s high performance and detailed classification ability make it a suitable candidate for advanced wildlife monitoring and conservation tasks, where precise and reliable recognition is essential.
The Xception model demonstrates excellent precision across all categories, particularly in Classes 1 and 3 (Bear and Deer) where it achieves 100.00%. This indicates very accurate predictions for these classes. The model also shows strong recall across the board, with Class 1 (Bear) having the highest recall of 95.00%. The F1 scores are high overall, reflecting a balanced performance between precision and recall for most categories. In conclusion, the Xception neural network architecture shows robust performance in this classification task, achieving high accuracy, recall, and F1 scores across multiple categories.
For classes such as T-shirt/Top (Class 1), Pullover (Class 3), and Sneaker (Class 8), the model showed moderate performance. These results suggest that while the model can accurately identify these items, there is still room for improvement, especially in terms of precision and recall balance. The Xception model (Table 13) demonstrated high performance in the Trouser (Class 2) and Ankle Boot (Class 10) categories, achieving high precision, recall, and F1 scores. This indicates that the model is very effective at correctly identifying these items with few errors. On the other hand, the model struggled with classes like Sandal (Class 6) and Coat (Class 5). Although Sandal has a high precision, the recall is relatively low, indicating that the model misses many true instances of this class. Similarly, Coat shows moderate precision but lower recall, suggesting issues in identifying these items consistently.
Overall, the Xception model achieved commendable performance on several classes within the Fashion MNIST dataset, particularly excelling in the identification of trousers and ankle boots. However, the model’s performance varied across different categories, highlighting the need for further optimization to improve its accuracy and robustness across all classes.

4.4.4. Results for MobileNet

The experiments were conducted using a dataset containing infrared images of various wild animal species. The dataset was divided into training, validation, and testing sets to train and evaluate the MobileNet model. The performance of MobileNet is summarized using a confusion matrix (Table 14), which illustrates the classification results across different animal categories. Each row in the confusion matrix represents the ground truth labels, while each column represents the predicted labels.
The confusion matrix reveals that MobileNet achieves commendable performance in recognizing wild animals in infrared images, though there are some misclassifications. The model correctly identifies a high percentage of each animal category, with lions and tigers being the most accurately classified (Table 15). However, the model tends to confuse giraffes and zebras more frequently, likely due to their similar infrared signatures.
The MobileNet neural network architecture demonstrates varying performance across different classes in this evaluation (Table 15). It shows strong precision for Deer and moderate precision for Bear and Fox, but relatively lower precision for Boar. Recall rates are generally high, particularly for Boar and Fox, indicating the model’s ability to correctly identify these classes from the dataset. The F1 scores reflect a balance between precision and recall, with the highest score achieved for Deer, indicating robust performance in distinguishing this class. These results suggest that MobileNet is effective in classifying infrared images of bears, boars, deer, and foxes, with particular strengths in detecting deer and foxes based on the provided evaluation metrics.
The results indicate that while MobileNet is effective for wild animal recognition, there is room for improvement, particularly in distinguishing between visually similar species. Despite these challenges, MobileNet’s lightweight architecture and efficient performance make it a viable option for real-time wildlife monitoring applications, especially in environments where computational resources are limited. The study highlights the potential of MobileNet in contributing to wildlife conservation efforts through automated image analysis. While MobileNet might not achieve the highest accuracy compared to deeper and more complex models like DenseNet or ResNet, it offers a compelling balance between accuracy and efficiency, making it suitable for real-time applications.
MobileNet (Table 16) demonstrated exceptional performance in the Trouser (Class 2) and Ankle Boot (Class 10) categories, achieving very high precision and recall values. This indicates the model’s high effectiveness in accurately identifying these items. For classes such as Sandal (Class 6), Shirt (Class 7), and Bag (Class 9), the model showed moderate performance. These results suggest that while the model can reasonably identify these items, there is still variability in precision and recall that could be improved. The model struggled with classes like T-shirt/Top (Class 1) and Pullover (Class 3). Despite the high precision for T-shirt/Top, the recall is very low, indicating the model misses many true instances of this class. Similarly, Coat (Class 5) shows high recall but low precision, suggesting issues with false positives in identifying this item.
Overall, MobileNet achieved commendable performance on several classes within the Fashion MNIST dataset, particularly excelling in the identification of trousers and ankle boots. However, the model’s performance varied significantly across different categories, highlighting the need for further optimization to improve its accuracy and robustness across all classes.

4.4.5. Results for DenseNet

The performance of DenseNet is summarized using a confusion matrix, which illustrates the classification results across different animal categories. Each row in the confusion matrix represents the ground truth labels, while each column represents the predicted labels.
The confusion matrix (Table 17) indicates that DenseNet performs well in recognizing wild animals in infrared images, with high classification accuracy across most categories. The model shows particularly strong performance in identifying lions, elephants, and zebras. However, there are some misclassifications, notably between giraffes and zebras, which might be attributed to the similar infrared patterns exhibited by these species.
The DenseNet neural network architecture demonstrates strong precision in Class 1 (Bear), indicating accurate predictions for this class (Table 18). However, it shows lower precision in Class 4 (Fox), suggesting some misclassifications. The model performs well in terms of recall across all categories, with particularly high recall in Class 4 (Fox). The F1 scores reflect a balanced performance overall, with Class 1 (Bear) achieving the highest F1 score of 95.00%. Overall, the DenseNet architecture shows robust performance in this classification task, with notable accuracy, recall, and F1 scores across multiple categories. Adjustments and further analysis may be necessary to improve precision, especially in categories where it is lower.
DenseNet showed strong performance in the Trouser (Class 2) and Sandal (Class 6) categories, achieving high precision, recall, and F1 scores (Table 19). This indicates the model’s reliability in correctly identifying these items with minimal false positives and false negatives. For classes like T-shirt/Top (Class 1), Dress (Class 4), and Ankle Boot (Class 10), DenseNet displayed good but not exceptional performance. These results suggest a solid capability to correctly classify these items, though there is room for improvement. The model had lower performance in classes such as Pullover (Class 3), Shirt (Class 7), and Coat (Class 5). These lower scores indicate a higher rate of misclassification for these items, suggesting challenges in distinguishing these categories from others.
Overall, DenseNet demonstrated robust performance on several classes within the Fashion MNIST dataset, particularly excelling in the identification of trousers and sandals. However, the performance varied across different categories, highlighting areas for potential enhancement to improve the model’s accuracy and consistency across all classes.
In other words, DenseNet demonstrates a robust capability for wild animal recognition in infrared imagery, with an efficient and compact architecture suitable for deployment in resource-constrained environments. The model’s balanced performance across different categories highlights its potential for real-time wildlife monitoring and conservation applications. Future work could focus on further fine-tuning the model and expanding the dataset to improve its accuracy and generalization across a broader range of species.

5. Discussion and Conclusions

This paper provides a comprehensive analysis and comparison of various deep neural network architectures applied to the task of wildlife recognition using infrared (IR) images. The primary objective was to evaluate the performance of different neural networks, namely VGG16, VGG19, ResNet50, Xception, MobileNet, and DenseNet, on a dataset comprising infrared images of different animal species. The evaluation was based on key metrics such as precision, recall, and F1 score, and confusion matrices were used to gain deeper insight into the classification capabilities and errors of each model. Each of these models offers different strengths in terms of computational efficiency, accuracy, and feature extraction capabilities. For instance, MobileNet is known for its lightweight design suitable for mobile and embedded applications, whereas Xception relies on depthwise separable convolutions for improved performance. Each architecture was trained and tested to classify IR images into the predefined animal classes, which allowed a direct comparison of the models in terms of accuracy, computational efficiency, and generalization capability. The results showed varying degrees of performance across the architectures, highlighting trade-offs between computational efficiency and classification accuracy. The comparative analysis revealed that while lightweight models like MobileNet offered computational advantages, they sometimes compromised on accuracy compared to more complex models such as Xception. This trade-off underscores the importance of selecting models based on specific application requirements, such as real-time monitoring versus high-precision classification.
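For clarity, the precision, recall, and F1 scores reported in the evaluation tables follow the standard per-class definitions, which can be computed directly from a confusion matrix. The short sketch below illustrates this computation; the helper function and the example counts are illustrative only and are not taken from the tables in this paper.

```python
import numpy as np

def per_class_metrics(cm: np.ndarray):
    """Per-class precision, recall, and F1 score from a square confusion matrix.

    Rows are assumed to hold the ground-truth classes and columns the predicted
    classes, so cm[i, j] counts samples of class i predicted as class j.
    """
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp   # predicted as the class but belonging to another
    fn = cm.sum(axis=1) - tp   # belonging to the class but predicted as another
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical 4-class example in the spirit of the IR animal dataset
# (Bear, Deer, Boar, Fox); the counts below are illustrative only.
cm = np.array([[23, 1, 0, 1],
               [2, 21, 1, 1],
               [1, 1, 20, 3],
               [2, 0, 2, 16]])
p, r, f1 = per_class_metrics(cm)
print(np.round(100 * p, 2), np.round(100 * r, 2), np.round(100 * f1, 2))
```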
VGG16 and VGG19: Both architectures showed competitive performance. VGG19 achieved slightly higher F1 scores on the Fashion MNIST dataset, whereas VGG16 reached higher precision and F1 scores on the IR animal dataset, indicating that the deeper architecture did not translate into a consistent advantage on the smaller infrared dataset.
ResNet50: This model excelled in precision across several classes, particularly in distinguishing between animals with subtle differences in IR images. However, it showed variability in recall, indicating potential difficulties in consistently identifying all instances of certain classes.
Xception: The Xception model provided balanced performance with relatively high F1 scores. Its architecture, which leverages depthwise separable convolutions, proved effective in extracting intricate patterns from IR images (a minimal example of such a block is sketched below).
MobileNet: MobileNet, designed for efficiency, showed strong results in terms of precision but faced challenges in recall for some classes. This suggests that while MobileNet is capable of accurate predictions, it may miss some instances due to its lightweight nature.
DenseNet: DenseNet achieved notable results with high precision and recall across most classes. Its densely connected layers facilitated effective feature reuse, which was particularly beneficial for the diverse and complex IR images in the dataset.
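To illustrate the depthwise separable convolutions used by Xception and MobileNet, the snippet below compares the parameter count of a standard 3 × 3 convolution with its separable counterpart in Keras. The feature-map size is arbitrary and chosen only for the example.

```python
import tensorflow as tf

# Standard 3x3 convolution vs. depthwise separable convolution on a 32-channel
# feature map: the separable variant factorizes the operation into a per-channel
# spatial filter followed by a 1x1 pointwise convolution, cutting parameters.
inputs = tf.keras.layers.Input(shape=(56, 56, 32))

standard = tf.keras.layers.Conv2D(64, 3, padding="same")(inputs)
separable = tf.keras.layers.SeparableConv2D(64, 3, padding="same")(inputs)

print(tf.keras.Model(inputs, standard).count_params())   # 3*3*32*64 + 64 = 18,496
print(tf.keras.Model(inputs, separable).count_params())  # 3*3*32 + 32*64 + 64 = 2,400
```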
Misclassifications were observed across all models, with certain animal classes being more difficult to distinguish. For example, classes with similar thermal signatures and silhouettes led to higher confusion rates, as seen for the Fox class in several of the evaluated models. The quality and diversity of the IR images in the dataset also played a crucial role in model performance. Variations in image resolution, angle, and environmental conditions (e.g., background temperature) affected the models’ ability to generalize. The findings indicate that deep neural networks hold significant promise for automated wildlife monitoring using infrared imaging. High-performing models like DenseNet and Xception can provide reliable identification of animal species, which is critical for ecological research and conservation efforts. The efficiency of MobileNet highlights its potential for deployment in resource-constrained environments, such as remote field stations or on-device processing in wildlife cameras.
The findings have significant implications for wildlife conservation and management practices. Accurate classification of animals based on IR imagery can enhance wildlife population assessments, habitat monitoring, and mitigation of human-wildlife conflicts. These applications are crucial for informed decision-making and policy formulation aimed at preserving biodiversity and ecosystems. The study’s success in achieving high classification accuracy for animal species using IR images signifies a promising future for integrating AI technologies into conservation efforts. However, challenges such as dataset size, model interpretability, and environmental variability remain pertinent areas for future research. Moving forward, further exploration into ensemble learning techniques, transfer learning from pre-trained models, and the integration of temporal data could enhance the robustness and scalability of deep learning models in wildlife recognition. These advancements hold the potential to address current limitations and broaden the application of AI in safeguarding wildlife populations and habitats globally.
In the case of wildlife recognition using infrared images, the ability to process large volumes of data quickly is essential. Pruned models can provide near-real-time performance for the detection of animals in infrared imagery, even in resource-limited settings. For example, infrared cameras deployed in remote areas could send images to local edge devices running pruned models to quickly identify species, without relying on cloud-based processing, which might introduce delays. In summary, pruning is an effective technique for improving the efficiency of deep neural networks used for wildlife recognition in infrared images. By reducing the model’s size and computational complexity, pruning enables faster inference and makes it feasible to deploy models in environments with limited computational resources. The minimal trade-off in accuracy ensures that the model still performs well, making pruning a valuable tool for real-time, resource-constrained applications in wildlife conservation and monitoring.
Quantization is likewise a powerful technique for enhancing the efficiency of deep neural networks used in wildlife recognition with infrared images. By reducing model size and computational requirements, quantization enables faster inference and makes it feasible to deploy complex deep learning models on edge devices with limited resources. In the context of wildlife monitoring, where real-time, resource-efficient detection is critical, quantization provides a practical solution for deploying models in the field. The ability to preserve accuracy while significantly improving efficiency makes quantization an essential tool for advancing wildlife conservation efforts and ecological studies in resource-constrained environments.
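To make the pruning step concrete, the following is a minimal sketch of magnitude-based weight pruning using the TensorFlow Model Optimization toolkit, assuming a Keras implementation of the classifiers. The toy model, sparsity targets, and training data below are placeholders for illustration and do not reproduce the exact setup of this study.

```python
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# A small stand-in CNN for the four IR animal classes (bear, deer, boar, fox).
base_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 1)),          # single-channel IR input
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation="softmax"),
])

# Gradually zero out low-magnitude weights, ramping sparsity from 20% to 80%.
schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.20, final_sparsity=0.80, begin_step=0, end_step=100
)
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(
    base_model, pruning_schedule=schedule
)
pruned_model.compile(optimizer="adam",
                     loss="sparse_categorical_crossentropy",
                     metrics=["accuracy"])

# Brief fine-tuning so the remaining weights compensate for the pruned ones;
# UpdatePruningStep is required to advance the pruning schedule.
x = np.random.rand(8, 224, 224, 1).astype("float32")      # dummy IR batch
y = np.random.randint(0, 4, size=(8,))
pruned_model.fit(x, y, epochs=1, batch_size=4,
                 callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Remove the pruning wrappers before export; the pruned weights stay at zero.
final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)
final_model.save("ir_wildlife_pruned.h5")
```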
While our study focuses on single-channel infrared images, future work could explore multimodal approaches that fuse infrared with visible light or depth information. Such fusion could offer improved performance, particularly in scenarios where one modality alone is insufficient. Additionally, the integration of pruning, quantization, and knowledge distillation techniques could be explored to enhance the efficiency and speed of the models, making them more suitable for real-time wildlife monitoring applications. Pruning strikes a balance between reducing computational complexity and maintaining high performance. Quantization, on the other hand, reduces the number of bits used to represent each parameter, which directly lowers the computational cost. For wildlife recognition, this translates to quicker detection and classification of animals in infrared imagery, enabling near-real-time performance on low-power devices.
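Similarly, the sketch below shows post-training quantization with the TensorFlow Lite converter, again assuming Keras models; the stand-in network, the representative-dataset generator, and the file name are hypothetical placeholders rather than the pipeline used in this study.

```python
import numpy as np
import tensorflow as tf

# A stand-in for one of the trained classifiers (e.g., MobileNet on the IR classes).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),
])

def representative_data():
    # A handful of sample IR images (random here) used to calibrate activation ranges.
    for _ in range(10):
        yield [np.random.rand(1, 224, 224, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]      # enable quantization
converter.representative_dataset = representative_data    # calibration samples
tflite_model = converter.convert()

with open("ir_wildlife_int8.tflite", "wb") as f:
    f.write(tflite_model)
print(f"Quantized model size: {len(tflite_model) / 1024:.1f} kB")
```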
The field of wildlife recognition with DNNs and IR imaging is rapidly progressing. However, several challenges remain, including the need for larger, diverse IR datasets, especially those capturing various species across different habitats. Our work contributes to addressing these challenges by demonstrating the efficacy of DNN models in improving detection accuracy, particularly in low-contrast and varied IR environments. Our approach differs from existing studies by focusing on DNN architectures and comprehensive data augmentation, which collectively enhance the model’s ability to generalize across various wildlife habitats and conditions. Looking forward, future research could explore integrating multispectral data (e.g., combining IR and visible light) and leveraging self-supervised learning to further reduce data dependency and improve model performance.
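Because the paragraph above attributes part of the models’ generalization to data augmentation, the following sketch shows one possible augmentation pipeline for single-channel IR images built from Keras preprocessing layers. The chosen transformations and parameters are assumptions for illustration; the augmentation actually used in this study may differ.

```python
import tensorflow as tf

# A lightweight augmentation block for 224x224 single-channel IR images.
# Geometric jitter and mild contrast changes mimic variations in camera-trap
# viewpoint and background temperature; the layers are active only during training.
augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.05),
    tf.keras.layers.RandomZoom(0.1),
    tf.keras.layers.RandomContrast(0.2),
])

inputs = tf.keras.layers.Input(shape=(224, 224, 1))
x = augmentation(inputs)
# ... the chosen backbone (VGG, ResNet, Xception, MobileNet, DenseNet) would follow here;
# a single convolutional block stands in for it to keep the sketch self-contained.
x = tf.keras.layers.Conv2D(16, 3, activation="relu")(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(4, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
model.summary()
```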

Author Contributions

Conceptualization, P.S., P.K. and R.H. (Roberta Hlavata); methodology, P.S., P.K. and R.H. (Roberta Hlavata); software, P.S.; validation, P.S., P.K., R.H. (Roberta Hlavata) and R.H. (Robert Hudec); formal analysis, P.S., P.K. and R.H. (Roberta Hlavata); investigation, P.S. and R.H. (Roberta Hlavata); resources, P.S. and P.K.; data curation, P.S.; writing—original draft preparation, P.S. and P.K.; writing—review and editing, P.S.; visualization, P.S. and P.K.; supervision, R.H. (Robert Hudec); project administration, R.H. (Robert Hudec) and P.K.; funding acquisition, R.H. (Robert Hudec). All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Slovak Research and Development Agency under contract no. APVV-21-0502: BrainWatch: System for automatic detection of intracranial aneurysms.

Institutional Review Board Statement

Ethical review and approval were waived for this study because the experiment did no harm to any of the subjects.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author, in accordance with the laboratory rules.

Acknowledgments

This work was supported by the Slovak Research and Development Agency under project PP-COVID-20-0100: DOLORES. AI: The pandemic guard system.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ANN      Artificial Neural Network
BN       Batch Normalisation
CC       Concatenation Layer
CL       Convolution Layer
CNN      Convolutional Neural Network
DCL      Deconvolution Layer
DL       Dense Layer
DNN      Deep Neural Network
HOG      Histogram of Oriented Gradients
IR       Infrared
MLP      Multi-Layer Perceptron
NN       Neural Network
OL       Dropout Layer
PL       Pooling Layer
ResNet   Residual Neural Network
RGB      Red Green Blue
RL       Recurrent Layer
RNN      Recurrent Neural Network
SGD      Stochastic Gradient Descent
SIFT     Scale-Invariant Feature Transform
VGG      Visual Geometry Group

References

  1. Zarkov, Z.; Stoyanov, L.; Draganovska, I.; Lazarov, V. The Comparison of different approaches for solar radiation forecasting using Artificial Neural Network. In Proceedings of the 2019 11th Electrical Engineering Faculty Conference (BulEF), Varna, Bulgaria, 11–14 September 2019; pp. 1–6. [Google Scholar] [CrossRef]
  2. Soro, B.; Lee, C. Performance Comparison of Indoor Fingerprinting Techniques Based on Artificial Neural Network. In Proceedings of the TENCON 2018—2018 IEEE Region 10 Conference, Jeju, Republic of Korea, 28–31 October 2018; pp. 0056–0061. [Google Scholar] [CrossRef]
  3. Xie, T.; Yu, H.; Wilamowski, B. Comparison between traditional neural networks and radial basis function networks. In Proceedings of the 2011 IEEE International Symposium on Industrial Electronics, Gdansk, Poland, 27–30 June 2011; pp. 1194–1199. [Google Scholar] [CrossRef]
  4. Tamulionis, M.; Serackis, A. Comparison of Multi-Layer Perceptron and Cascade Feed-Forward Neural Network for Head-Related Transfer Function Interpolation. In Proceedings of the 2019 Open Conference of Electrical, Electronic and Information Sciences (eStream), Vilnius, Lithuania, 25 April 2019; pp. 1–4. [Google Scholar] [CrossRef]
  5. Lisitsa, D.; Zhilenkov, A.A. Comparative analysis of the classical and nonclassical artificial neural networks. In Proceedings of the 2017 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus), Moscow, Russia, 1–3 February 2017; pp. 922–925. [Google Scholar] [CrossRef]
  6. Sarno, R.; Sidabutar, J.; Sarwosri. Comparison of different Neural Network architectures for software cost estimation. In Proceedings of the 2015 International Conference on Computer, Control, Informatics and its Applications (IC3INA), Bandung, Indonesia, 5–7 October 2015; pp. 68–73. [Google Scholar] [CrossRef]
  7. Han, H.G.; Zhang, L.; Hou, Y.; Qiao, J.F. Nonlinear Model Predictive Control Based on a Self-Organizing Recurrent Neural Network. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 402–415. [Google Scholar] [CrossRef] [PubMed]
  8. Siegel, B. Industrial Anomaly Detection: A Comparison of Unsupervised Neural Network Architectures. IEEE Sensors Lett. 2020, 4, 7501104. [Google Scholar] [CrossRef]
  9. Chen, G.; Han, T.X.; He, Z.; Kays, R.; Forrester, T. Deep convolutional neural network based species recognition for wild animal monitoring. In Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), Paris, France, 27–30 October 2014; pp. 858–862. [Google Scholar] [CrossRef]
  10. Djibrine, O.H.; Ahmat, D.; Boukar, M.M. Deep Learning-based Approaches for Preventing and Predicting Wild Animals Disappearance: A Review. In Proceedings of the 2024 International Conference on Artificial Intelligence, Computer, Data Sciences and Applications (ACDSA), Victoria, Seychelles, 1–2 February 2024; pp. 1–6. [Google Scholar] [CrossRef]
  11. Deng, S.; Tang, G.; Mei, L. Wild Mammal Behavior Recognition Based on Gated Transformer Network. In Proceedings of the 2022 International Conference on Cyber-Physical Social Intelligence (ICCSI), Nanjing, China, 18–21 November 2022; pp. 739–743. [Google Scholar] [CrossRef]
  12. Subraja, R.; Varthamanan, Y. Animal Activity Recognition Using Convolutional Neural Network. In Proceedings of the 2023 9th International Conference on Smart Structures and Systems (ICSSS), Chennai, India, 23–24 November 2023; pp. 1–6. [Google Scholar] [CrossRef]
  13. Natarajan, B.; Elakkiya, R.; Bhuvaneswari, R.; Saleem, K.; Chaudhary, D.; Samsudeen, S.H. Creating Alert Messages Based on Wild Animal Activity Detection Using Hybrid Deep Neural Networks. IEEE Access 2023, 11, 67308–67321. [Google Scholar] [CrossRef]
  14. Roopashree, Y.A.; Bhoomika, M.; Priyanka, R.; Nisarga, K.; Behera, S. Monitoring the Movements of Wild Animals and Alert System using Deep Learning Algorithm. In Proceedings of the 2021 International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT), Bengaluru, Karnataka, 12–13 November 2021; pp. 626–630. [Google Scholar] [CrossRef]
  15. Nguyen, H.; Maclagan, S.J.; Nguyen, T.D.; Nguyen, T.; Flemons, P.; Andrews, K.; Ritchie, E.G.; Phung, D. Animal Recognition and Identification with Deep Convolutional Neural Networks for Automated Wildlife Monitoring. In Proceedings of the 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Tokyo, Japan, 19–21 October 2017; pp. 40–49. [Google Scholar] [CrossRef]
  16. Ji, P.; Zhu, Q. Research on Embedded Animal Recognition System Based on YOLO. In Proceedings of the 2022 6th International Conference on Robotics and Automation Sciences (ICRAS), Wuhan, China, 9–11 June 2022; pp. 265–269. [Google Scholar] [CrossRef]
  17. Manohar, N.; Sharath Kumar, Y.; Kumar, G.H. Supervised and unsupervised learning in animal classification. In Proceedings of the 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Jaipur, India, 21–24 September 2016; pp. 156–161. [Google Scholar] [CrossRef]
  18. Tan, S.C. Using Supervised Attribute Selection for Unsupervised Learning. In Proceedings of the 2015 4th International Conference on Advanced Computer Science Applications and Technologies (ACSAT), Kuala Lumpur, Malaysia, 8–10 December 2015; pp. 198–201. [Google Scholar] [CrossRef]
  19. Angadi, U.B.; Venkatesulu, M. Structural SCOP Superfamily Level Classification Using Unsupervised Machine Learning. IEEE/ACM Trans. Comput. Biol. Bioinform. 2012, 9, 601–608. [Google Scholar] [CrossRef] [PubMed]
  20. Jeong, M.W.; Rhee, C.E. Fusion for Tile-based Deconvolution Layers. In Proceedings of the 2021 18th International SoC Design Conference (ISOCC), Jeju Island, Republic of Korea, 6–9 October 2021; pp. 423–424. [Google Scholar] [CrossRef]
  21. Mao, W.; Lin, J.; Wang, Z. F-DNA: Fast Convolution Architecture for Deconvolutional Network Acceleration. IEEE Trans. Very Large Scale Integr. Syst. 2020, 28, 1867–1880. [Google Scholar] [CrossRef]
  22. Cheng, C.; Parhi, K.K. Fast 2D Convolution Algorithms for Convolutional Neural Networks. IEEE Trans. Circuits Syst. Regul. Pap. 2020, 67, 1678–1691. [Google Scholar] [CrossRef]
  23. Sheng, M.; Zeng, H.; Li, J.; Sun, W. Pooling and Convolution Layer Strategy on CNN for Melanoma Detection. In Proceedings of the 2021 3rd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), Taiyuan, China, 3–5 December 2021; pp. 153–161. [Google Scholar] [CrossRef]
  24. Nikzad, M.; Gao, Y.; Zhou, J. Gradient-Based Pooling for Convolutional Neural Networks. In Proceedings of the 2019 IEEE Visual Communications and Image Processing (VCIP), Sydney, Australia, 1–4 December 2019; pp. 1–4. [Google Scholar] [CrossRef]
  25. Romano, A.M.; Hernandez, A.A. An Improved Pooling Scheme for Convolutional Neural Networks. In Proceedings of the 2019 7th International Conference on Information, Communication and Networks (ICICN), Macau, China, 24–26 April 2019; pp. 201–206. [Google Scholar] [CrossRef]
  26. Vignesh, T.; Thyagharajan, K.; Jeyavathana, R.B.; Kanimozhi, K. Land Use and Land Cover Classification Using Recurrent Neural Networks with Shared Layered Architecture. In Proceedings of the 2021 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 27–29 January 2021; pp. 1–6. [Google Scholar] [CrossRef]
  27. Chu, Y.; Fei, J.; Hou, S. Adaptive Global Sliding-Mode Control for Dynamic Systems Using Double Hidden Layer Recurrent Neural Network Structure. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 1297–1309. [Google Scholar] [CrossRef] [PubMed]
  28. Schüssler, M.; Münker, T.; Nelles, O. Deep Recurrent Neural Networks for Nonlinear System Identification. In Proceedings of the 2019 IEEE Symposium Series on Computational Intelligence (SSCI), Xiamen, China, 6–9 December 2019; pp. 448–454. [Google Scholar] [CrossRef]
  29. Wang, Q.; Huang, H. Learning of recurrent convolutional neural networks with applications in pattern recognition. In Proceedings of the 2017 36th Chinese Control Conference (CCC), Dalian, China, 26–28 July 2017; pp. 4135–4139. [Google Scholar] [CrossRef]
  30. Alsobhi, W.; Alafif, T.; Zong, W.; Abdel-Hakim, A.E. Adaptive Batch Normalization for Training Data with Heterogeneous Features. In Proceedings of the 2023 International Conference on Smart Computing and Application (ICSCA), Hail, Saudi Arabia, 5–6 February 2023; pp. 1–6. [Google Scholar] [CrossRef]
  31. Ting, Y.S.; Teng, Y.F.; Chiueh, T.D. Batch Normalization Processor Design for Convolution Neural Network Training and Inference. In Proceedings of the 2021 IEEE International Symposium on Circuits and Systems (ISCAS), Daegu, Republic of Korea, 22–28 May 2021; pp. 1–4. [Google Scholar] [CrossRef]
  32. Sun, H.; Yu, L.; Katto, J. Fully Neural Network Mode Based Intra Prediction of Variable Block Size. In Proceedings of the 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP), Macau, China, 1–4 December 2020; pp. 21–24. [Google Scholar] [CrossRef]
  33. Qu, Y.; Ke, Y.; Yu, W. A Solution for Input Limit in CNN Due to Fully-Connected Layer. In Proceedings of the 2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 23–25 November 2018; pp. 611–616. [Google Scholar] [CrossRef]
  34. Ayhan, T.; Altun, M. Approximate Fully Connected Neural Network Generation. In Proceedings of the 2018 15th International Conference on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design (SMACD), Prague, Czech Republic, 2–5 July 2018; pp. 93–96. [Google Scholar] [CrossRef]
  35. Sajjad, S.; Jiana, B.; Sajjad, S.Z. The use of Convolutional Neural Network for Malware Classification. In Proceedings of the 2020 IEEE 9th Data Driven Control and Learning Systems Conference (DDCLS), Liuzhou, China, 20–22 November 2020; pp. 1136–1140. [Google Scholar] [CrossRef]
  36. Rabidas, R.; Ravi, D.K.; Pradhan, S.; Moudgollya, R.; Ganguly, A. Investigation and Improvement of VGG based Encoder-Decoder Architecture for Background Subtraction. In Proceedings of the 2020 Advanced Communication Technologies and Signal Processing (ACTS), Virtual, 4–6 December 2020; pp. 1–6. [Google Scholar] [CrossRef]
  37. Zakaria, N.; Mohmad Hassim, Y.M. Improved VGG Architecture in CNNs for Image Classification. In Proceedings of the 2022 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET), Kota Kinabalu, Malaysia, 13–15 September 2022; pp. 1–4. [Google Scholar] [CrossRef]
  38. Tatis, D.; Sierra, H.; Arzuaga, E. Residual Neural Network Architectures to Improve Prediction Accuracy of Properties of Materials. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 2915–2918. [Google Scholar] [CrossRef]
  39. Singh, K.S.; Diwakar, M.; Mishra, A.K.; Singh, P.; Yamsani, N. SRDRN-IR: A Super Resolution Deep Residual Neural Network for IR Images. In Proceedings of the 2023 IEEE World Conference on Applied Intelligence and Computing (AIC), Sonbhadra, India, 29–30 July 2023; pp. 746–751. [Google Scholar] [CrossRef]
  40. Zhang, K.; Sun, M.; Han, T.X.; Yuan, X.; Guo, L.; Liu, T. Residual Networks of Residual Networks: Multilevel Residual Networks. IEEE Trans. Circuits Syst. Video Technol. 2018, 28, 1303–1314. [Google Scholar] [CrossRef]
  41. Rismiyati.; Endah, S.N.; Khadijah.; Shiddiq, I.N. Xception Architecture Transfer Learning for Garbage Classification. In Proceedings of the 2020 4th International Conference on Informatics and Computational Sciences (ICICoS), Semarang, Indonesia, 10–11 November 2020; pp. 1–4. [Google Scholar] [CrossRef]
  42. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807. [Google Scholar] [CrossRef]
  43. Sinha, D.; El-Sharkawy, M. Thin MobileNet: An Enhanced MobileNet Architecture. In Proceedings of the 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 10–12 October 2019; pp. 0280–0285. [Google Scholar] [CrossRef]
  44. Ayi, M.; El-Sharkawy, M. RMNv2: Reduced Mobilenet V2 for CIFAR10. In Proceedings of the 2020 10th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 6–8 January 2020; pp. 0287–0292. [Google Scholar] [CrossRef]
  45. El-Khamy, S.; Al-Kabbany, A.; EL-Bana, S. Going Shallower with MobileNets: On the Impact of Wavelet Pooling. In Proceedings of the 2021 38th National Radio Science Conference (NRSC), Mansoura, Egypt, 27–29 July 2021; Volume 1, pp. 126–138. [Google Scholar] [CrossRef]
  46. Pandi, S.S.; Deepak Kumar, K.; Senthilselvi, A.; Ramani, D.R. A Novel Approach to Detect COVID using DenseNet Architecture. In Proceedings of the 2023 International Conference on Research Methodologies in Knowledge Management, Artificial Intelligence and Telecommunication Engineering (RMKMATE), Chennai, India, 1–2 November 2023; pp. 1–5. [Google Scholar] [CrossRef]
  47. Zhang, X.; Liu, H.; Zhu, Z.; Xu, Z. Learning to Search Efficient DenseNet with Layer-wise Pruning. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–8. [Google Scholar] [CrossRef]
  48. Zhang, K.; Guo, Y.; Wang, X.; Yuan, J.; Ding, Q. Multiple Feature Reweight DenseNet for Image Classification. IEEE Access 2019, 7, 9872–9880. [Google Scholar] [CrossRef]
Figure 1. Overview of neural network architectures.
Figure 2. Preview of the convolution layer.
Figure 3. MaxPooling, step of 2 × 2 and stride of 2 × 2.
Figure 4. Preview of the recurrent layer.
Figure 5. Data normalisation using Batch normalisation.
Figure 6. Preview of the flatten layer.
Figure 7. Example effect of Dropout layer.
Figure 8. Diagram of VGG19 architecture.
Figure 9. Diagram of ResNet50 architecture.
Figure 10. Diagram of Xception architecture.
Figure 11. Diagram of MobileNet architecture.
Figure 12. Diagram of DenseNet architecture.
Figure 13. Example of IR animal dataset.
Figure 14. Example of Fashion-MNIST dataset.
Figure 15. Evaluation criteria.
Table 1. Hyperparameters tuning.
Model                          Hyperparameter    Value
Recurrent NN                   Learning Rate     0.001
                               Batch size        512
                               Dropout rate      0.15
                               Loss function     MSE
Convolutional Neural Network   Learning Rate     0.001
                               Batch size        512
                               Filter size       3 × 3
                               Activation        ReLU
                               Pooling size      2 × 2
                               Pooling method    "max"
                               Dropout rate      0.15
                               Loss function     MSE
Table 2. The example of the confusion matrix for VGG 16 using IR animal dataset.
Targeted Class/1234
Predicted Class
120302
271530
313220
450119
Table 3. The example of the confusion matrix for VGG 19 using IR animal dataset.
Targeted Class/1234
Predicted Class
116414
2101500
323200
470315
Table 4. The evaluation criterion of the VGG 16 neural network architecture for IR animal dataset.
Evaluation Metrics   1         2         3         4
Precision (P)        61.00%    75.00%    86.00%    80.00%
Recall (R)           90.00%    58.00%    89.00%    64.00%
F1 score (F1)        73.00%    65.00%    88.00%    71.00%
Table 5. The evaluation criterion of the VGG 19 neural network architecture for IR animal dataset.
Evaluation Metrics   1         2         3         4
Precision (P)        43.00%    73.00%    85.00%    55.00%
Recall (R)           62.00%    62.00%    79.00%    48.00%
F1 score (F1)        51.00%    67.00%    81.00%    51.00%
Table 6. The evaluation criterion of the VGG 16 neural network architecture for Fashion MNIST dataset.
No. of Class   Precision (P)   Recall (R)   F1 Score (F1)
1              77.00%          63.00%       69.00%
2              77.00%          96.00%       86.00%
3              47.00%          78.00%       59.00%
4              63.00%          34.00%       44.00%
5              57.00%          61.00%       59.00%
6              74.00%          78.00%       76.00%
7              38.00%          13.00%       19.00%
8              70.00%          88.00%       78.00%
9              88.00%          82.00%       85.00%
10             94.00%          80.00%       86.00%
Table 7. The evaluation criterion of the VGG 19 neural network architecture for Fashion MNIST dataset.
No. of Class   Precision (P)   Recall (R)   F1 Score (F1)
1              75.00%          58.00%       65.00%
2              73.00%          96.00%       83.00%
3              81.00%          49.00%       61.00%
4              52.00%          74.00%       61.00%
5              44.00%          80.00%       57.00%
6              75.00%          91.00%       82.00%
7              64.00%          25.00%       21.00%
8              76.00%          70.00%       73.00%
9              92.00%          88.00%       90.00%
10             98.00%          80.00%       88.00%
Table 8. The example of the confusion matrix for ResNet 50 using IR animal dataset.
Targeted Class/1234
Predicted Class
122003
202230
301231
440021
Table 9. The evaluation criterion of the ResNet 50 neural network architecture for IR animal dataset.
Evaluation Metrics   1          2          3          4
Precision (P)        88.00%     100.00%    96.00%     85.00%
Recall (R)           100.00%    88.00%     93.00%     88.00%
F1 score (F1)        93.00%     94.00%     95.00%     86.00%
Table 10. The evaluation criterion of the ResNet50 neural network architecture for Fashion MNIST dataset.
No. of Class   Precision (P)   Recall (R)   F1 Score (F1)
1              92.00%          37.00%       52.00%
2              98.00%          80.00%       88.00%
3              80.00%          41.00%       54.00%
4              42.00%          96.00%       58.00%
5              49.00%          69.00%       57.00%
6              85.00%          82.00%       85.00%
7              65.00%          35.00%       58.00%
8              91.00%          54.00%       68.00%
9              81.00%          91.00%       86.00%
10             82.00%          96.00%       89.00%
Table 11. The example of the confusion matrix for Xception using IR animal dataset.
Targeted/1234
Predicted
124100
212310
301240
410222
Table 12. The evaluation criterion of the Xception neural network architecture for IR animal dataset.
Evaluation Metrics   1          2         3          4
Precision (P)        100.00%    83.00%    100.00%    88.00%
Recall (R)           95.00%     92.00%    93.00%     88.00%
F1 score (F1)        98.00%     87.00%    96.00%     88.00%
Table 13. The evaluation criterion of the Xception neural network architecture for Fashion MNIST dataset.
No. of Class   Precision (P)   Recall (R)   F1 Score (F1)
1              79.00%          62.00%       70.00%
2              90.00%          95.00%       93.00%
3              52.00%          80.00%       63.00%
4              53.00%          66.00%       59.00%
5              47.00%          66.00%       55.00%
6              93.00%          50.00%       65.00%
7              83.00%          75.00%       69.00%
8              70.00%          81.00%       75.00%
9              82.00%          79.00%       80.00%
10             78.00%          91.00%       84.00%
Table 14. The example of the confusion matrix for MobileNet using IR animal dataset.
Targeted/1234
Predicted
113327
221445
322183
431516
Table 15. The evaluation criterion of the MobileNet neural network architecture for IR animal dataset.
Evaluation Metrics   1         2         3          4
Precision (P)        86.00%    66.00%    100.00%    67.00%
Recall (R)           29.00%    81.00%    89.00%     96.00%
F1 score (F1)        43.00%    72.00%    94.00%     79.00%
Table 16. The evaluation criterion of the MobileNet neural network architecture for Fashion MNIST dataset.
No. of Class   Precision (P)   Recall (R)   F1 Score (F1)
1              94.00%          33.00%       49.00%
2              100.00%         75.00%       86.00%
3              55.00%          45.00%       49.00%
4              51.00%          54.00%       52.00%
5              33.00%          96.00%       52.00%
6              97.00%          55.00%       75.00%
7              73.00%          58.00%       69.00%
8              62.00%          63.00%       62.00%
9              65.00%          88.00%       75.00%
10             79.00%          97.00%       87.00%
Table 17. The example of the confusion matrix for DenseNet using IR animal dataset.
Targeted/1234
Predicted
123100
222111
311203
450416
Table 18. The evaluation criterion of the DenseNet neural network architecture for IR animal dataset.
Evaluation Metrics   1          2         3         4
Precision (P)        100.00%    83.00%    96.00%    70.00%
Recall (R)           90.00%     77.00%    82.00%    92.00%
F1 score (F1)        95.00%     80.00%    88.00%    79.00%
Table 19. The evaluation criterion of the DenseNet neural network architecture for Fashion MNIST dataset.
No. of Class   Precision (P)   Recall (R)   F1 Score (F1)
1              70.00%          82.00%       75.00%
2              85.00%          97.00%       91.00%
3              61.00%          75.00%       67.00%
4              64.00%          82.00%       72.00%
5              57.00%          49.00%       53.00%
6              87.00%          87.00%       87.00%
7              53.00%          59.00%       55.00%
8              89.00%          59.00%       71.00%
9              91.00%          85.00%       88.00%
10             77.00%          95.00%       85.00%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
