A Review on Multiscale-Deep-Learning Applications
Figure 1. The primary taxonomy of multiscale-deep-learning architectures used in classification and segmentation tasks.
Figure 2. Multiscale receptive fields of deep-feature maps that are used to activate the visual semantics and their contexts. Multiscale representations help to segment objects better by combining low-level and high-level representations.
Figure 3. Multiscale CNN, defined as a network with multiple distinct CNN branches with various contextual input sizes that run concurrently, whereby the outputs are combined at the end of the network to obtain rich multiscale semantic features.
Figure 4. The spatial-pyramid-pooling module extracts information at different scales that varies among different subregions. Using a four-level pyramid, the pooling kernels cover the whole, half, and small portions of the image. A more powerful representation can be obtained by fusing the information from the different subregions within these receptive fields.
Figure 5. Multilevel spatial bin, with the example of bin size 6, in which the resultant feature maps are segmented into 6 × 6 subsets.
Figure 6. In ASPP, the atrous convolution uses a parameter called the dilation rate that adjusts the field of view to allow a wider receptive field for better semantic-segmentation results. By increasing the dilation rate at each block, the spatial resolution can be preserved, and a deeper network can be built by capturing features at multiple scales.
Figure 7. In early fusion, all local attributes (shapes and colors) are retrieved from identical regions and locally concatenated before encoding. In late fusion, image representations are derived independently for each attribute and concatenated afterward.
Figure 8. Feature-pyramid-network (FPN) model that combines low- and high-resolution features via a top-down pathway to enrich semantic features at all levels.
Abstract
1. Introduction
- There is a tradeoff between network complexity and processing speed. Typically, a very deep network may produce great accuracy, but it will not be nearly as fast as a lightweight network. This tradeoff applies to both classification and segmentation models;
- If the amount of training data is limited, then increasing the network complexity, which directly increases the number of parameters that need to be fitted, will likely result in overfitting;
- The backpropagated gradient dissipates as the network becomes deeper, leading to gradient diffusion. This circumstance makes the deep model harder to optimize.
- This is the first comprehensive review of the taxonomy of multiscale-deep-learning architectures;
- This review explains, in detail, the two main categories of multiscale-deep-learning approaches, which are multiscale feature learning and multiscale feature fusion;
- This is a comprehensive review of multiscale-deep-learning usage in various main applications, such as satellite imagery, medical imaging, agriculture, and industrial and manufacturing systems.
2. Multiscale-Deep-Learning Taxonomy
2.1. Multiscale Feature Learning
2.1.1. Multiscale CNN
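For illustration, a minimal sketch of the parallel-branch idea described in Figure 3 is given below: several distinct CNN branches receive the same image at different contextual scales, and their outputs are concatenated before classification. The branch depths, channel widths, scale factors, and classifier head here are our own illustrative assumptions, not a configuration from any of the reviewed works.

```python
# Minimal multiscale-CNN sketch: one branch per input scale, fused at the end.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiscaleCNN(nn.Module):
    def __init__(self, in_ch=3, feat=16, scales=(1.0, 0.5, 0.25), n_classes=10):
        super().__init__()
        self.scales = scales
        # One distinct branch per input scale.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1))           # resolution-invariant summary
            for _ in scales)
        self.classifier = nn.Linear(feat * len(scales), n_classes)

    def forward(self, x):
        outs = []
        for s, branch in zip(self.scales, self.branches):
            xs = x if s == 1.0 else F.interpolate(
                x, scale_factor=s, mode='bilinear', align_corners=False)
            outs.append(branch(xs).flatten(1))     # (N, feat) per branch
        return self.classifier(torch.cat(outs, dim=1))  # fuse all scales

print(MultiscaleCNN()(torch.randn(2, 3, 64, 64)).shape)  # torch.Size([2, 10])
```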
2.1.2. Spatial Pyramid Pooling (SPP)
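A minimal sketch of an SPP layer in the spirit of the pyramid described above follows: the final feature map is pooled over a pyramid of bin sizes and the pooled values are flattened into one fixed-length vector regardless of the input resolution. The bin sizes (1, 2, 3, 6) are illustrative assumptions.

```python
# Minimal SPP sketch: multi-bin pooling yields a fixed-length descriptor.
import torch
import torch.nn as nn

class SPPLayer(nn.Module):
    def __init__(self, bins=(1, 2, 3, 6)):
        super().__init__()
        # Adaptive pooling divides the map into bins x bins subregions.
        self.pools = nn.ModuleList(nn.AdaptiveMaxPool2d(b) for b in bins)

    def forward(self, x):                  # x: (N, C, H, W) for any H, W
        return torch.cat([p(x).flatten(1) for p in self.pools], dim=1)

spp = SPPLayer()
for size in (32, 48):                      # different input resolutions...
    print(spp(torch.randn(1, 64, size, size)).shape)
    # ...same output length: (1, 64 * (1 + 4 + 9 + 36)) = (1, 3200)
```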
2.1.3. Atrous Spatial Pyramid Pooling (ASPP)
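A minimal sketch of an ASPP block following the dilation-rate description above is shown below: parallel 3 × 3 atrous convolutions with increasing dilation rates enlarge the receptive field without downsampling, and the branch outputs are concatenated and projected. The rates (1, 6, 12, 18) echo DeepLab-style settings but are assumptions here.

```python
# Minimal ASPP sketch: parallel atrous convolutions at several dilation rates.
import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                # padding = dilation keeps the spatial size unchanged.
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True))
            for r in rates)
        self.project = nn.Conv2d(len(rates) * out_ch, out_ch, 1)

    def forward(self, x):
        # Every branch preserves resolution, so concatenation is direct.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 256, 32, 32)
print(ASPP(256, 64)(x).shape)  # torch.Size([1, 64, 32, 32])
```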
2.2. Multiscale Feature Fusion
2.2.1. Image-Level Fusion
- Early fusion: This fusion scheme retrieves spatial scales from the same regions and concatenates them locally into one input image prior to encoding. In [33], the authors applied an early-fusion scheme by combining bitemporal remote-sensing images as one input, which was then fed to a modified UNet++ backbone to learn the multiscale semantic levels of the visual feature representations for remote-sensing-based change detection. Alcantarilla et al. [34] combined two bitemporal street-view images into one input image for their early-fusion method before feeding it into an FCN to identify the changes in the street-view images;
- Late fusion: This fusion scheme derives a separate image representation for each feature, and the representations are concatenated afterward. An example of this late-fusion scheme is the work in [35], which addresses the multiscale problem of change detection by designing a feature-difference network that generates feature-difference maps to provide valuable information at different scales and depths for land-cover applications. These learned features are then fed into a feature-fusion network to produce change-detection maps with minimal pixel-wise training samples. A minimal code sketch contrasting the two fusion schemes is given after this list.
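The sketch below contrasts the two image-level fusion schemes on bitemporal inputs of identical size. It is a hedged illustration: the backbone and prediction head are illustrative stand-ins, not the networks used in [33,34,35].

```python
# Early fusion concatenates the images BEFORE encoding; late fusion encodes
# each image independently and concatenates the resulting features.
import torch
import torch.nn as nn

class EarlyFusionNet(nn.Module):
    def __init__(self, in_channels=3, features=16):
        super().__init__()
        # One shared encoder sees the 2*in_channels stacked input.
        self.encoder = nn.Sequential(
            nn.Conv2d(2 * in_channels, features, 3, padding=1),
            nn.ReLU(inplace=True))
        self.head = nn.Conv2d(features, 1, 1)      # change map

    def forward(self, img_t1, img_t2):
        fused = torch.cat([img_t1, img_t2], dim=1)  # fuse at the image level
        return self.head(self.encoder(fused))

class LateFusionNet(nn.Module):
    def __init__(self, in_channels=3, features=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, features, 3, padding=1),
            nn.ReLU(inplace=True))
        self.head = nn.Conv2d(2 * features, 1, 1)

    def forward(self, img_t1, img_t2):
        f1, f2 = self.encoder(img_t1), self.encoder(img_t2)  # shared weights
        return self.head(torch.cat([f1, f2], dim=1))  # fuse the features

x1, x2 = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
print(EarlyFusionNet()(x1, x2).shape)  # torch.Size([1, 1, 64, 64])
print(LateFusionNet()(x1, x2).shape)   # torch.Size([1, 1, 64, 64])
```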
2.2.2. Feature-Level Fusion
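A minimal sketch of feature-level fusion in the FPN style described earlier is given below: high-level (low-resolution) features are upsampled along a top-down pathway and merged with laterally projected lower-level features so that every level carries enriched semantics. The channel widths and number of levels are assumptions.

```python
# Minimal FPN-style top-down fusion over a pyramid of backbone features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFusion(nn.Module):
    def __init__(self, in_channels=(64, 128, 256), out_ch=64):
        super().__init__()
        # 1x1 lateral convolutions align every level to a common width.
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in in_channels)
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_ch, out_ch, 3, padding=1) for _ in in_channels)

    def forward(self, feats):                        # feats: fine -> coarse
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 2, -1, -1):   # top-down pathway
            up = F.interpolate(laterals[i + 1], size=laterals[i].shape[-2:],
                               mode='nearest')
            laterals[i] = laterals[i] + up           # merge coarse semantics in
        return [s(l) for s, l in zip(self.smooth, laterals)]

feats = [torch.randn(1, 64, 64, 64), torch.randn(1, 128, 32, 32),
         torch.randn(1, 256, 16, 16)]
for f in TopDownFusion()(feats):
    print(f.shape)  # (1,64,64,64), (1,64,32,32), (1,64,16,16)
```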
3. Application of Multiscale Deep Learning
3.1. Satellite Imagery
3.2. Medical Imaging
- The number of annotated medical images available for optimally training the model is often limited;
- The regions of interest (ROIs) are generally small in size, and they have imprecise edges that make them appear at unpredictable x, y, and z positions. Furthermore, sometimes only the image-level label is provided, even though the locations of the targeted ROIs are not available;
- The ROIs in medical images often contain visual information with similar patterns that vary in size (scale).
3.3. Agriculture
3.4. Industrial and Manufacturing Systems
3.4.1. Machine-Fault Diagnosis
3.4.2. Predictive Analytics and Defect Prognosis
3.4.3. Surface-Integration Inspection
4. Conclusions and Future Works
- Typically, multiscale networks are constructed by using multiple parallel paths that begin with the coarsest feature map, followed by finer paths that are progressively added to extract information at various scales. The implementation of multiple paths increases the overall network complexity, which directly increases the required computational resources and memory usage. Due to the volume and resolution of the multiscale data, this method is sometimes impractical for certain applications, especially for mobile-based systems;
- For a multiscale-deep-learning network to be successfully implemented, the emphasis of feature extraction must shift from the global to the local scale, allowing the relevance of each connection to be determined at the node level. This creates the challenge of extracting features as efficiently as possible by combining low-resolution and high-resolution features from different sources. Therefore, the architecture needs to be designed optimally, and the ideal paths for combining the feature maps need to be chosen carefully;
- Optimal network-flow configurations in the parallel paths. The most popular technique is a direct and homogeneous flow scheme for all the parallel paths. A unique network flow, such as a waterfall scheme, can be applied to the parallel paths by cascading the input down between the paths: the second parallel path receives its input from the middle of the first path, and consequently, the third path receives its input from the middle of the second path. By applying this network flow, the information variation in the input can be expanded while retaining the crucial features of each scale (a minimal sketch of this waterfall flow is given after this list);
- The optimal placement of multiple multiscale modules at various selected layers. Usually, a single feature-learning module is added right after the encoder part of a classification network or the bottleneck part of a segmentation network. The main reason for this placement is to apply the multiscale module to relatively small feature maps. However, these reduced feature maps have probably lost some of the crucial information during the pooling operations, so the multiscale module ends up being applied to small feature maps that do not carry much information. On the other hand, applying the multiscale module at the initial layers will increase the required number of parameters, which directly enlarges the network size and increases the computational workload. Hence, the multiscale module can be selectively applied in both the initial and later stages, but with fewer parallel paths for each layer; the number of parameters can then be kept small while still allowing the network to extract multiscale features from various layers;
- Combining the downsampling approach of SPP and the upsampling approach of ASPP. Usually, in one implementation of the multiscale module, only either SPP or ASPP is applied in the whole network. SPP works by taking multiple-scale input from the downsampling operations, while ASPP works by enlarging the effective kernel sizes through multiple dilation rates in the atrous convolution. By combining both approaches in one layer, different types of features can be extracted, as the natures of the two modules are different. Therefore, both schemes can be integrated to produce a better multiscale feature extractor, although the computational workload will also certainly increase (a minimal sketch of such a combined block is also given after this list).
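A minimal sketch of the waterfall flow described above follows: each parallel path is a two-stage convolution, and the next path taps the output of the previous path's first (middle) stage instead of the raw input. The stage widths, kernel sizes, and number of paths are illustrative assumptions.

```python
# Waterfall flow: path k+1 receives its input from the middle of path k.
import torch
import torch.nn as nn

class WaterfallPaths(nn.Module):
    def __init__(self, channels=32, n_paths=3):
        super().__init__()
        def stage():
            return nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True))
        # Each path has a "first half" and a "second half".
        self.first = nn.ModuleList(stage() for _ in range(n_paths))
        self.second = nn.ModuleList(stage() for _ in range(n_paths))
        self.fuse = nn.Conv2d(n_paths * channels, channels, 1)

    def forward(self, x):
        outs, inp = [], x
        for first, second in zip(self.first, self.second):
            mid = first(inp)           # middle of the current path
            outs.append(second(mid))   # end of the current path
            inp = mid                  # cascade: the next path starts here
        return self.fuse(torch.cat(outs, dim=1))

x = torch.randn(1, 32, 16, 16)
print(WaterfallPaths()(x).shape)  # torch.Size([1, 32, 16, 16])
```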
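The combined SPP + ASPP idea proposed above could be sketched as a single block in which SPP branches pool the feature map at several grid sizes while ASPP branches apply atrous convolutions at several dilation rates, with all branch outputs concatenated and projected. All hyperparameters (bin sizes, rates, channel widths) are illustrative assumptions, not a published design.

```python
# Hypothetical block combining SPP-style pooling and ASPP-style atrous
# convolutions in one layer, as suggested in the conclusions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPPASPPBlock(nn.Module):
    def __init__(self, in_ch, out_ch, bins=(1, 2, 4), rates=(6, 12, 18)):
        super().__init__()
        # SPP side: adaptive average pooling over multiple bin sizes.
        self.spp = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_ch, out_ch, 1, bias=False),
                          nn.ReLU(inplace=True))
            for b in bins)
        # ASPP side: 3x3 atrous convolutions with increasing dilation.
        self.aspp = nn.ModuleList(
            nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r,
                                    bias=False),
                          nn.ReLU(inplace=True))
            for r in rates)
        self.project = nn.Conv2d((len(bins) + len(rates)) * out_ch, out_ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        # Upsample each pooled SPP branch back to the input resolution.
        feats = [F.interpolate(m(x), size=(h, w), mode='bilinear',
                               align_corners=False) for m in self.spp]
        feats += [m(x) for m in self.aspp]
        return self.project(torch.cat(feats, dim=1))

x = torch.randn(1, 64, 32, 32)
print(SPPASPPBlock(64, 32)(x).shape)  # torch.Size([1, 32, 32, 32])
```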
Author Contributions
Funding
Conflicts of Interest
Abbreviations
CNN | Convolutional Neural Network |
ReLU | Rectified Linear Unit |
FCN | Fully Convolutional Network |
GTSRB | German Traffic Sign Recognition Benchmark |
SPP | Spatial Pyramid Pooling |
PSPNet | Pyramid Scene Parsing Network |
ASPP | Atrous Spatial Pyramid Pooling |
FPN | Feature Pyramid Network |
ResNet | Residual Neural Network |
LSTM | Long Short-Term Memory |
Bi-LSTM | Bidirectional Long Short-Term Memory |
HSI | Hyperspectral Imaging |
2D | Two-Dimensional |
3D | Three-Dimensional |
SSCEM | Spectral-Spatial-Feature Cross-Extraction Module |
RPN | Region Proposal Network |
R-CNN | Region-Based Convolutional Neural Network |
AID | Aerial Image Dataset |
ROI | Region of Interest |
ACDC | Automated Cardiac Diagnosis Challenge |
MICCAI | Medical Image Computing and Computer-Assisted Intervention |
ED | End-Diastolic |
ES | End-Systolic |
DRN | Dilated Residual Network |
HPPN | Hybrid Pyramid-Pooling Network |
CT | Computed Tomography |
RGB | Red Green Blue |
SAR | Synthetic-Aperture Radar |
PA | Precision Agriculture |
GAN | Generative Adversarial Network |
mIoU | Mean Intersection over Union |
GA | Global Accuracy |
UAVSAR | Uninhabited Aerial Vehicle Synthetic Aperture Radar |
YOLOv3 | You Only Look Once version 3 |
SSD | Single Shot Detector |
IoT | Internet of Things |
EMD | Empirical Mode Decomposition |
RUL | Remaining Useful Life |
WPE | Wavelet Packet Energy |
ConvNet | Convolutional Network |
MAE | Mean Absolute Error |
RMSE | Root Mean Squared Error |
TFR | Time-Frequency Representation |
MSCNN | Multiscale Convolutional Neural Network |
DBN | Deep Belief Network |
MODBNE | Multi-Objective Deep Belief Network Ensemble |
C-MAPSS | Commercial Modular Aero-Propulsion System Simulation |
MS-DCNN | Multiscale Deep Convolutional Neural Network |
References
1. Gao, K.; Niu, S.; Ji, Z.; Wu, M.; Chen, Q.; Xu, R.; Yuan, S.; Fan, W.; Chen, Y.; Dong, J. Double-Branched and Area-Constraint Fully Convolutional Networks for Automated Serous Retinal Detachment Segmentation in SD-OCT Images. Comput. Methods Programs Biomed. 2019, 176, 69–80.
2. Teng, L.; Li, H.; Karim, S. DMCNN: A Deep Multiscale Convolutional Neural Network Model for Medical Image Segmentation. J. Healthc. Eng. 2019, 2019, 8597606.
3. Sermanet, P.; Lecun, Y. Traffic Sign Recognition with Multi-Scale Convolutional Networks. In Proceedings of the International Joint Conference on Neural Networks, San Jose, CA, USA, 31 July–5 August 2011; pp. 2809–2813.
4. Buyssens, P.; Elmoataz, A.; Lézoray, O. Multiscale Convolutional Neural Networks for Vision-Based Classification of Cells. In Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; Volume 7725 LNCS, pp. 342–352.
5. Zamri, N.F.M.; Tahir, N.M.; Ali, M.S.A.M.; Ashar, N.D.K.; Al-misreb, A.A. Mini-Review of Street Crime Prediction and Classification Methods. J. Kejuruter. 2021, 33, 391.
6. Abdani, S.R.; Zulkifley, M.A.; Zulkifley, N.H. Analysis of Spatial Pyramid Pooling Variations in Semantic Segmentation for Satellite Image Applications. In Proceedings of the 2021 International Conference on Decision Aid Sciences and Application, DASA 2021, Online, 7–8 December 2021; pp. 397–401.
7. Mohamed, N.A.; Zulkifley, M.A.; Kamari, N.A.M.; Kadim, Z. Symmetrically Stacked Long Short-Term Memory Networks for Fall Event Recognition Using Compact Convolutional Neural Networks-Based Tracker. Symmetry 2022, 14, 293.
8. LeCun, Y.; Kavukcuoglu, K.; Farabet, C. Convolutional Networks and Applications in Vision. In Proceedings of the ISCAS 2010–2010 IEEE International Symposium on Circuits and Systems: Nano-Bio Circuit Fabrics and Systems, Paris, France, 30 May–2 June 2010; pp. 253–256.
9. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 10 December 2015; pp. 770–778.
10. Lu, D.; Popuri, K.; Ding, G.W.; Balachandar, R.; Beg, M.F.; Weiner, M.; Aisen, P.; Petersen, R.; Jack, C.; Jagust, W.; et al. Multimodal and Multiscale Deep Neural Networks for the Early Diagnosis of Alzheimer’s Disease Using Structural MR and FDG-PET Images. Sci. Rep. 2018, 8, 5697.
11. Suh, S.; Lukowicz, P.; Lee, Y.O. Generalized Multiscale Feature Extraction for Remaining Useful Life Prediction of Bearings with Generative Adversarial Networks. Knowl. Based Syst. 2022, 237, 107866.
12. Farabet, C.; Couprie, C.; Najman, L.; LeCun, Y. Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers. In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, UK, 26 June–1 July 2012; Volume 1, pp. 575–582.
13. Zhou, W.; Lin, X.; Lei, J.; Yu, L.; Hwang, J.N. MFFENet: Multiscale Feature Fusion and Enhancement Network for RGB-Thermal Urban Road Scene Parsing. IEEE Trans. Multimed. 2022, 24, 2526–2538.
14. Zhang, R.; Chen, J.; Feng, L.; Li, S.; Yang, W.; Guo, D. A Refined Pyramid Scene Parsing Network for Polarimetric SAR Image Semantic Segmentation in Agricultural Areas. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
15. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2018; Volume 11211 LNCS, pp. 833–851.
16. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
17. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239.
18. Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
19. Yue, J.; Mao, S.; Li, M. A Deep Learning Framework for Hyperspectral Image Classification Using Spatial Pyramid Pooling. Remote Sens. Lett. 2016, 7, 875–884.
20. Sriram, S.; Vinayakumar, R.; Sowmya, V.; Alazab, M.; Soman, K.P. Multi-Scale Learning Based Malware Variant Detection Using Spatial Pyramid Pooling Network. In Proceedings of the IEEE INFOCOM 2020–IEEE Conference on Computer Communications Workshops, INFOCOM WKSHPS 2020, Toronto, ON, Canada, 6–9 July 2020; pp. 740–745.
21. Tan, Y.S.; Lim, K.M.; Tee, C.; Lee, C.P.; Low, C.Y. Convolutional Neural Network with Spatial Pyramid Pooling for Hand Gesture Recognition. Neural Comput. Appl. 2020, 33, 5339–5351.
22. Asgari, R.; Waldstein, S.; Schlanitz, F.; Baratsits, M.; Schmidt-Erfurth, U.; Bogunović, H. U-Net with Spatial Pyramid Pooling for Drusen Segmentation in Optical Coherence Tomography. In Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2019; Volume 11855 LNCS, pp. 77–85.
23. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
24. Chen, L.-C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587.
25. Stofa, M.M.; Zulkifley, M.A.; Zainuri, M.A.A.M. Micro-Expression-Based Emotion Recognition Using Waterfall Atrous Spatial Pyramid Pooling Networks. Sensors 2022, 22, 4634.
26. Wang, P.; Chen, P.; Yuan, Y.; Liu, D.; Huang, Z.; Hou, X.; Cottrell, G. Understanding Convolution for Semantic Segmentation. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, Lake Tahoe, NV, USA, 12–15 March 2018; pp. 1451–1460.
27. Hamaguchi, R.; Fujita, A.; Nemoto, K.; Imaizumi, T.; Hikosaka, S. Effective Use of Dilated Convolutions for Segmenting Small Object Instances in Remote Sensing Imagery. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, Lake Tahoe, NV, USA, 12–15 March 2018; pp. 1442–1450.
28. Gao, H.; Yuan, H.; Wang, Z.; Ji, S. Pixel Deconvolutional Networks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 2528–2535.
29. Minaee, S.; Boykov, Y.; Porikli, F.; Plaza, A.; Kehtarnavaz, N.; Terzopoulos, D. Image Segmentation Using Deep Learning: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 3523–3542.
30. Amer, A.; Lambrou, T.; Ye, X. MDA-Unet: A Multi-Scale Dilated Attention U-Net for Medical Image Segmentation. Appl. Sci. 2022, 12, 3676.
31. Seeland, M.; Rzanny, M.; Alaqraa, N.; Wäldchen, J.; Mäder, P. Plant Species Classification Using Flower Images—A Comparative Study of Local Feature Representations. PLoS ONE 2017, 12, e0170629.
32. Wang, C.; Sun, W.; Fan, D.; Liu, X.; Zhang, Z.; Wang, M.; Yu, H.; Chen, J.; Zhu, Y. Adaptive Feature Weighted Fusion Nested U-Net with Discrete Wavelet Transform for Change Detection of High-Resolution Remote Sensing Images. Remote Sens. 2021, 13, 4971.
33. Peng, D.; Zhang, Y.; Guan, H. End-to-End Change Detection for High Resolution Satellite Images Using Improved UNet++. Remote Sens. 2019, 11, 1382.
34. Alcantarilla, P.F.; Stent, S.; Ros, G.; Arroyo, R.; Gherardi, R. Street-View Change Detection with Deconvolutional Networks. Auton. Robot. 2018, 42, 1301–1322.
35. Zhang, M.; Shi, W. A Feature Difference Convolutional Neural Network-Based Change Detection Method. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7232–7246.
36. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
37. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2015; Volume 9351, pp. 234–241.
38. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
39. Fan, J.; Bocus, M.J.; Hosking, B.; Wu, R.; Liu, Y.; Vityazev, S.; Fan, R. Multi-Scale Feature Fusion: Learning Better Semantic Segmentation for Road Pothole Detection. In Proceedings of the ICAS 2021–2021 IEEE International Conference on Autonomous Systems, Montreal, QC, Canada, 11–13 August 2021.
40. Wang, C.; Wang, Z.; Xi, W.; Yang, Z.; Bai, G.; Wang, R.; Duan, M. MufiNet: Multiscale Fusion Residual Networks for Medical Image Segmentation. In Proceedings of the International Joint Conference on Neural Networks, Glasgow, UK, 19–24 July 2020.
41. Wang, Z.; Peng, Y.; Li, D.; Guo, Y.; Zhang, B. MMNet: A Multi-Scale Deep Learning Network for the Left Ventricular Segmentation of Cardiac MRI Images. Appl. Intell. 2021, 52, 5225–5240.
42. Gao, S.H.; Cheng, M.M.; Zhao, K.; Zhang, X.Y.; Yang, M.H.; Torr, P. Res2Net: A New Multi-Scale Backbone Architecture. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 652–662.
43. Liu, Y.; Chen, D.; Ma, A.; Zhong, Y.; Fang, F.; Xu, K. Multiscale U-Shaped CNN Building Instance Extraction Framework with Edge Constraint for High-Spatial-Resolution Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2021, 59, 6106–6120.
44. Gong, H.; Li, Q.; Li, C.; Dai, H.; He, Z.; Wang, W.; Li, H.; Han, F.; Tuniyazi, A.; Mu, T. Multiscale Information Fusion for Hyperspectral Image Classification Based on Hybrid 2D-3D CNN. Remote Sens. 2021, 13, 2268.
45. Kim, B.C.; Yoon, J.S.; Choi, J.S.; Suk, H.I. Multi-Scale Gradual Integration CNN for False Positive Reduction in Pulmonary Nodule Detection. Neural Netw. 2018, 115, 1–10.
46. Gao, H.; Wu, H.; Chen, Z.; Zhang, Y.; Zhang, Y.; Li, C. Multiscale Spectral-Spatial Cross-Extraction Network for Hyperspectral Image Classification. IET Image Process. 2022, 16, 755–771.
47. Zhao, W.; Du, S. Learning Multiscale and Deep Representations for Classifying Remotely Sensed Imagery. ISPRS J. Photogramm. Remote Sens. 2016, 113, 155–165.
48. Li, S.; Zhu, X.; Bao, J. Hierarchical Multi-Scale Convolutional Neural Networks for Hyperspectral Image Classification. Sensors 2019, 19, 1714.
49. Gong, Z.; Zhong, P.; Yu, Y.; Hu, W.; Li, S. A CNN with Multiscale Convolution and Diversified Metric for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 3599–3618.
50. Hu, G.X.; Yang, Z.; Hu, L.; Huang, L.; Han, J.M. Small Object Detection with Multiscale Features. Int. J. Digit. Multimed. Broadcasting 2018, 2018, 4546896.
51. Cui, X.; Zheng, K.; Gao, L.; Zhang, B.; Yang, D.; Ren, J. Multiscale Spatial-Spectral Convolutional Network with Image-Based Framework for Hyperspectral Imagery Classification. Remote Sens. 2019, 11, 2220.
52. Li, X.; Jiang, Y.; Peng, H.; Yin, S. An Aerial Image Segmentation Approach Based on Enhanced Multi-Scale Convolutional Neural Network. In Proceedings of the 2019 IEEE International Conference on Industrial Cyber Physical Systems, ICPS 2019, Taipei, Taiwan, 6–9 May 2019; pp. 47–52.
53. Liu, Y.; Zhong, Y.; Qin, Q. Scene Classification Based on Multiscale Convolutional Neural Network. IEEE Trans. Geosci. Remote Sens. 2018, 56, 7109–7121.
54. Wang, G.; Li, W.; Zuluaga, M.A.; Pratt, R.; Patel, P.A.; Aertsen, M.; Doel, T.; David, A.L.; Deprest, J.; Ourselin, S.; et al. Interactive Medical Image Segmentation Using Deep Learning with Image-Specific Fine Tuning. IEEE Trans. Med. Imaging 2018, 37, 1562–1573.
55. Yin, S.; Bi, J. Medical Image Annotation Based on Deep Transfer Learning. J. Appl. Sci. Eng. 2019, 22, 385–390.
56. Li, P.; Chen, Z.; Yang, L.T.; Zhang, Q.; Deen, M.J. Deep Convolutional Computation Model for Feature Learning on Big Data in Internet of Things. IEEE Trans. Ind. Inform. 2018, 14, 790–798.
57. Zhao, L.; Chen, Z.; Yang, Y.; Zou, L.; Wang, Z.J. ICFS Clustering with Multiple Representatives for Large Data. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 728–738.
58. Zulkifley, M.A.; Abdani, S.R.; Zulkifley, N.H.; Shahrimin, M.I. Residual-Shuffle Network with Spatial Pyramid Pooling Module for COVID-19 Screening. Diagnostics 2021, 11, 1497.
59. Roslidar, R.; Syaryadhi, M.; Saddami, K.; Pradhan, B.; Arnia, F.; Syukri, M.; Munadi, K. BreaCNet: A High-Accuracy Breast Thermogram Classifier Based on Mobile Convolutional Neural Network. Math. Biosci. Eng. 2022, 19, 1304–1331.
60. Havaei, M.; Davy, A.; Warde-Farley, D.; Biard, A.; Courville, A.; Bengio, Y.; Pal, C.; Jodoin, P.M.; Larochelle, H. Brain Tumor Segmentation with Deep Neural Networks. Med. Image Anal. 2017, 35, 18–31.
61. Pace, D.F.; Dalca, A.V.; Geva, T.; Powell, A.J.; Moghari, M.H.; Golland, P. Interactive Whole-Heart Segmentation in Congenital Heart Disease. In Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2015; Volume 9351, pp. 80–88.
62. Zotti, C.; Luo, Z.; Lalande, A.; Jodoin, P.M. Convolutional Neural Network with Shape Prior Applied to Cardiac MRI Segmentation. IEEE J. Biomed. Health Inform. 2019, 23, 1119–1128.
63. Wolterink, J.M.; Leiner, T.; Viergever, M.A.; Išgum, I. Dilated Convolutional Neural Networks for Cardiovascular MR Segmentation in Congenital Heart Disease. In Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2017; Volume 10129 LNCS, pp. 95–102.
64. Du, X.; Song, Y.; Liu, Y.; Zhang, Y.; Liu, H.; Chen, B.; Li, S. An Integrated Deep Learning Framework for Joint Segmentation of Blood Pool and Myocardium. Med. Image Anal. 2020, 62, 101685.
65. Muralidharan, N.; Gupta, S.; Prusty, M.R.; Tripathy, R.K. Detection of COVID19 from X-Ray Images Using Multiscale Deep Convolutional Neural Network. Appl. Soft Comput. 2022, 119, 108610.
66. Amer, A.; Ye, X.; Janan, F. ResDUnet: A Deep Learning-Based Left Ventricle Segmentation Method for Echocardiography. IEEE Access 2021, 9, 159755–159763.
67. Yang, X.; Zhang, Y.; Lo, B.; Wu, D.; Liao, H.; Zhang, Y.T. DBAN: Adversarial Network with Multi-Scale Features for Cardiac MRI Segmentation. IEEE J. Biomed. Health Inform. 2021, 25, 2018–2028.
68. Wang, L.; Wang, J.; Liu, Z.; Zhu, J.; Qin, F. Evaluation of a Deep-Learning Model for Multispectral Remote Sensing of Land Use and Crop Classification. Crop J. 2022.
69. Xu, J.; Zhu, Y.; Zhong, R.; Lin, Z.; Xu, J.; Jiang, H.; Huang, J.; Li, H.; Lin, T. DeepCropMapping: A Multi-Temporal Deep Learning Approach with Improved Spatial Generalizability for Dynamic Corn and Soybean Mapping. Remote Sens. Environ. 2020, 247, 111946.
70. Turkoglu, M.O.; D’Aronco, S.; Perich, G.; Liebisch, F.; Streit, C.; Schindler, K.; Wegner, J.D. Crop Mapping from Image Time Series: Deep Learning with Multi-Scale Label Hierarchies. Remote Sens. Environ. 2021, 264, 112603.
71. Ubbens, J.R.; Stavness, I. Deep Plant Phenomics: A Deep Learning Platform for Complex Plant Phenotyping Tasks. Front. Plant Sci. 2017, 8, 1190.
72. Mohanty, S.P.; Hughes, D.P.; Salathé, M. Using Deep Learning for Image-Based Plant Disease Detection. Front. Plant Sci. 2016, 7, 1419.
73. Boulent, J.; Foucher, S.; Théau, J.; St-Charles, P.L. Convolutional Neural Networks for the Automatic Identification of Plant Diseases. Front. Plant Sci. 2019, 10, 941.
74. Hu, J.; Chen, Z.; Yang, M.; Zhang, R.; Cui, Y. A Multiscale Fusion Convolutional Neural Network for Plant Leaf Recognition. IEEE Signal Process. Lett. 2018, 25, 853–857.
75. Zulkifley, M.A.; Moubark, A.M.; Saputro, A.H.; Abdani, S.R. Automated Apple Recognition System Using Semantic Segmentation Networks with Group and Shuffle Operators. Agriculture 2022, 12, 756.
76. Ferentinos, K.P. Deep Learning Models for Plant Disease Detection and Diagnosis. Comput. Electron. Agric. 2018, 145, 311–318.
77. Genaev, M.A.; Skolotneva, E.S.; Gultyaeva, E.I.; Orlova, E.A.; Bechtold, N.P.; Afonnikov, D.A. Image-Based Wheat Fungi Diseases Identification by Deep Learning. Plants 2021, 10, 1500.
78. Rangarajan Aravind, K.; Maheswari, P.; Raja, P.; Szczepański, C. Crop Disease Classification Using Deep Learning Approach: An Overview and a Case Study. Deep Learn. Data Anal. 2020, 173–195.
79. Rahman, C.R.; Arko, P.S.; Ali, M.E.; Iqbal Khan, M.A.; Apon, S.H.; Nowrin, F.; Wasif, A. Identification and Recognition of Rice Diseases and Pests Using Convolutional Neural Networks. Biosyst. Eng. 2020, 194, 112–120.
80. Li, T.; Sun, F.; Sun, R.; Wang, L.; Li, M.; Yang, H. Chinese Herbal Medicine Classification Using Convolutional Neural Network with Multiscale Images and Data Augmentation. In Proceedings of the 2018 International Conference on Security, Pattern Analysis, and Cybernetics, SPAC 2018, Jinan, China, 14–17 December 2018; pp. 109–113.
81. Li, H.; Zhang, C.; Zhang, Y.; Zhang, S.; Ding, X.; Atkinson, P.M. A Scale Sequence Object-Based Convolutional Neural Network (SS-OCNN) for Crop Classification from Fine Spatial Resolution Remotely Sensed Imagery. Int. J. Digit. Earth 2021, 14, 1528–1546.
82. Wang, X.; Liu, J. Multiscale Parallel Algorithm for Early Detection of Tomato Gray Mold in a Complex Natural Environment. Front. Plant Sci. 2021, 12, 719.
83. Zhou, X.; Chen, S.; Ren, Y.; Zhang, Y.; Fu, J.; Fan, D.; Lin, J.; Wang, Q. Atrous Pyramid GAN Segmentation Network for Fish Images with High Performance. Electronics 2022, 11, 911.
84. Wang, J.; Ma, Y.; Zhang, L.; Gao, R.X.; Wu, D. Deep Learning for Smart Manufacturing: Methods and Applications. J. Manuf. Syst. 2018, 48, 144–156.
85. Jeschke, S.; Brecher, C.; Meisen, T.; Özdemir, D.; Eschert, T. Industrial Internet of Things and Cyber Manufacturing Systems. In Industrial Internet of Things; Springer: Cham, Switzerland, 2017; pp. 3–19.
86. Yin, S.; Li, X.; Gao, H.; Kaynak, O. Data-Based Techniques Focused on Modern Industry: An Overview. IEEE Trans. Ind. Electron. 2015, 62, 657–667.
87. Wang, D.; Guo, Q.; Song, Y.; Gao, S.; Li, Y. Application of Multiscale Learning Neural Network Based on CNN in Bearing Fault Diagnosis. J. Signal Process. Syst. 2019, 91, 1205–1217.
88. Yang, T.; Guo, Y.; Wu, X.; Na, J.; Fung, R.F. Fault Feature Extraction Based on Combination of Envelope Order Tracking and CICA for Rolling Element Bearings. Mech. Syst. Signal Process. 2018, 113, 131–144.
89. Hoang, D.T.; Kang, H.J. Rolling Element Bearing Fault Diagnosis Using Convolutional Neural Network and Vibration Image. Cogn. Syst. Res. 2019, 53, 42–50.
90. Guo, X.; Chen, L.; Shen, C. Hierarchical Adaptive Deep Convolution Neural Network and Its Application to Bearing Fault Diagnosis. Measurement 2016, 93, 490–502.
91. Ince, T.; Kiranyaz, S.; Eren, L.; Askar, M.; Gabbouj, M. Real-Time Motor Fault Detection by 1-D Convolutional Neural Networks. IEEE Trans. Ind. Electron. 2016, 63, 7067–7075.
92. Wang, H.; Li, S.; Song, L.; Cui, L. A Novel Convolutional Neural Network Based Fault Recognition Method via Image Fusion of Multi-Vibration-Signals. Comput. Ind. 2019, 105, 182–190.
93. Jiang, G.; He, H.; Yan, J.; Xie, P. Multiscale Convolutional Neural Networks for Fault Diagnosis of Wind Turbine Gearbox. IEEE Trans. Ind. Electron. 2019, 66, 3196–3207.
94. Shen, Y.; Wu, Q.; Huang, D.; Dong, S.; Chen, B. Fault Detection Method Based on Multi-Scale Convolutional Neural Network for Wind Turbine Gearbox. In Proceedings of the 16th IEEE International Conference on Control, Automation, Robotics and Vision, ICARCV 2020, Shenzhen, China, 13–15 December 2020; pp. 838–842.
95. Ding, X.; He, Q. Energy-Fluctuated Multiscale Feature Learning with Deep ConvNet for Intelligent Spindle Bearing Fault Diagnosis. IEEE Trans. Instrum. Meas. 2017, 66, 1926–1935.
96. Babu, G.S.; Zhao, P.; Li, X.L. Deep Convolutional Neural Network Based Regression Approach for Estimation of Remaining Useful Life. In Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2016; Volume 9642, pp. 214–228.
97. Guo, S.; Zhang, B.; Yang, T.; Lyu, D.; Gao, W. Multitask Convolutional Neural Network with Information Fusion for Bearing Fault Diagnosis and Localization. IEEE Trans. Ind. Electron. 2020, 67, 8005–8015.
98. Wang, B.; Lei, Y.; Li, N.; Wang, W. Multiscale Convolutional Attention Network for Predicting Remaining Useful Life of Machinery. IEEE Trans. Ind. Electron. 2021, 68, 7496–7504.
99. Zhu, J.; Chen, N.; Peng, W. Estimation of Bearing Remaining Useful Life Based on Multiscale Convolutional Neural Network. IEEE Trans. Ind. Electron. 2019, 66, 3208–3216.
100. Jiang, Y.; Lyu, Y.; Wang, Y.; Wan, P. Fusion Network Combined with Bidirectional LSTM Network and Multiscale CNN for Useful Life Estimation. In Proceedings of the 12th International Conference on Advanced Computational Intelligence, ICACI 2020, Dali, China, 14–16 August 2020; pp. 620–627.
101. Neogi, N.; Mohanta, D.K.; Dutta, P.K. Review of Vision-Based Steel Surface Inspection Systems. EURASIP J. Image Video Process. 2014, 2014, 50.
102. Xie, X. A Review of Recent Advances in Surface Defect Detection Using Texture Analysis Techniques. ELCVIA Electron. Lett. Comput. Vis. Image Anal. 2008, 7, 1–22.
103. Wang, R.; Shi, R.; Hu, X.; Shen, C. Remaining Useful Life Prediction of Rolling Bearings Based on Multiscale Convolutional Neural Network with Integrated Dilated Convolution Blocks. Shock Vib. 2021, 2021, 6616861.
104. Li, H.; Zhao, W.; Zhang, Y.; Zio, E. Remaining Useful Life Prediction Using Multi-Scale Deep Convolutional Neural Network. Appl. Soft Comput. 2020, 89, 106113.
Literature | Target Task | Network Structure | Method | Strength | Weakness |
---|---|---|---|---|---|
Gong et al., 2019 [49] | Hyperspectral Image | Spatial Pyramid Pooling | CNN with multiscale convolutional layers, using multiscale filter banks with different metrics to represent the features for HSI classification. | The accuracy is comparable to or even better than other classifiers in both the spectral and spectral-spatial classification of the HSI image. | Extracts only the spatial features in limited-size filtering or convolutional windows. |
Hu et al., 2018 [50] | Small Objects | Multiscale-Feature CNN | Identifying small objects by extracting features at different object convolution levels and applying multiscale features. | When compared with Faster RCNN, the accuracy of the small-object detection is significantly higher. | The performance is restricted by the computational costs and image representations. |
Cui et al., 2019 [51] | Hyperspectral Image | Atrous Spatial Pyramid Pooling | Integrating both fused features from multiple receptive fields and multiscale spatial features based on the structure of the feature pyramid at various levels. | Better accuracy compared with other classification methods for Indian Pine, Pavia University, and Salina Datasets. | The classification significantly depends on the quality and quantity of the labeled samples, which are costly and time consuming to obtain. |
Li et al., 2019 [52] | Aerial Image | Multiscale U-Net | The main structure is U-Net with cascaded dilated convolutions at the bottom with varying dilation rates. | The best whole-set accuracy compared with four well-known methods on the Inria Aerial Image Dataset, and the best IoU on the Chicago and Vienna images in the same dataset. | The average IoU performance is still very weak, especially on the Inria dataset. |
Gong et al., 2021 [44] | Hyperspectral Image | Multiscale Fusion + Spatial Pyramid Pooling | The main structure includes a 3D CNN module, a squeeze-and-excitation module, and a 2D CNN pyramid-pooling module. | The method was evaluated on three public hyperspectral classification datasets: Indian Pine, Salinas, and Pavia University. The classification accuracies were 96.09%, 97%, and 96.56%, respectively. | The method still misclassifies bricks and gravel, and the classification performance is still weak, especially on the Indian Pine dataset. |
Liu et al., 2021 [43] | Hyperspectral Image | Multiscale Fusion | Multiscale feature learning uses three simultaneous pretrained ResNet sub-CNNs, a fusion operation, and a U-shaped deconvolution network. A region proposal network (RPN) with an attention mechanism is used to extract building-instance locations, which are used to eliminate building occlusion. | When compared with a mask R-CNN, the proposed method improved the performance by 2.4% on the self-annotated building dataset of the instance-segmentation task, and by 0.17% on the ISPRS Vaihingen semantic-labeling-contest dataset. | The use of fusion strategies invariably results in increased computational and memory overhead. |
Liu et al., 2018 [53] | UC Merced Dataset, SIRI-WHU Dataset, Aerial Image Dataset (AID) | Multiscale CNN + SPP | The proposed method trains the network on multiscale images by developing a dual-branch CNN network: F-net (where training is performed at a fixed scale) and V-net (where training is performed with varied input scales every n iterations). | The MCNN reached a classification accuracy of 96.66 ± 0.90 for the UC Merced Dataset, 93.75 ± 1.13 for the SIRI-WHU Dataset, and 91.80 ± 0.22 for the AID Dataset. | This method reduces the possibility of feature discrimination by focusing solely on the feature map from the last CNN layer and ignoring the feature data from additional layers. |
Gao et al., 2022 [46] | Hyperspectral Image | Multiscale Fusion | This method employs a spectral-spatial-feature cross-extraction module (SSCEM). The module sends the previous CNN layer's information into the spatial and spectral extraction branches independently, so that changes in the other domain after each convolution can be fully exploited. | The proposed network outperforms many deep-learning-based networks on three HSI datasets. It also cuts down the number of training parameters of the network, which helps, to a certain extent, to prevent overfitting. | The performance is restricted by the complexity of the network structure, which implies a greater computational cost. |
Literature | Target Task | Network Structure | Method | Strength | Weakness |
---|---|---|---|---|---|
Wolterink et al., 2017 [63] | Vessel Segmentation | CNN + Stacked Dilation Convolution | A ten-layer CNN in which the first eight layers are feature-extraction layers, whereas Layers 9 and 10 are fully connected classification layers. Each feature-extraction layer uses 32 kernels, and the dilation rate increases between Layers 2 and 7. | The myocardium and blood pool had Dice indices of 0.80 ± 0.06 and 0.93 ± 0.02, respectively, average distances to boundaries of 0.96 ± 0.31 and 0.89 ± 0.24 mm, respectively, and Hausdorff distances of 6.13 ± 3.76 and 7.07 ± 3.01 mm, respectively. | Due to hardware limitations, the work still used a large receptive field, which led to less precise predictions. |
Du et al., 2020 [64] | Vessel Segmentation | Dilated Residual Network + Modified SPP | The network's inception module initializes a multilevel feature representation of the cardiovascular images. The dilated-residual-network (DRN) component extracts features, classifies the pixels, and predicts the segmentation zones. A hybrid pyramid-pooling network (HPPN) then aggregates the local and global DRN information. | Best quantitative segmentation result compared with four well-known methods in all five substructures (left ventricle (LV), right ventricle (RV), left atrium (LA), right atrium (RA), and LV myocardium (LV_Myo)). | The Hausdorff-distance value of this method is higher than that of U-Net, which shows that it still has some issues with segmenting small targets. |
Kim et al., 2018 [45] | Lung Cancer | Multiscale Fusion CNN | Multiscale-convolution inputs with varying levels of inherent contextual abstract information at multiple scales, with progressive integration and multistream feature integration in an end-to-end approach. | On two parts of the LUNA16 Dataset (V1 and V2), the method outperformed other approaches by a wide margin. The average CPMs were 0.908 for V1 and 0.942 for V2. | The anchor scheme used by the nodule detectors introduces an excessive number of hyperparameters that must be fine-tuned for each unique problem. |
Muralidharan et al., 2022 [65] | Chest X-ray | Multiscale Fusion | The input image is decomposed into seven modes, which are then fed into a multiscale deep CNN with 14 layers (blocks) and four additional layers. Each block has an input layer, convolution layer, batch-normalization layer, dropout layer, and max-pooling layer, and the block is stacked three successive times. | The proposed model successfully differentiates COVID-19 from viral pneumonia and normal classes with accuracy, precision, recall, and F1-score values of 0.96, 0.97, 0.99, and 0.98, respectively. | The obtained results are still based on random combinations of the extracted modes, and so the model needs to be run with every possible combination of the hyperparameters to obtain the desired result. |
Amer et al., 2021 [66] | Echocardiography | Multiscale Fusion + Cascaded Dilated Convolution | The network uses residual blocks and cascaded-dilated-convolution modules to extract both coarse and fine multiscale features from the input image. | A Dice-similarity performance of 95.1% compared with the expert's annotation, surpassing the Deeplabv3 and U-Net performances by 8.4% and 1.2%, respectively. | The work only measures the image-segmentation performance, without including the LV-ejection-fraction (ED and ES) clinical cardiac indicators. |
Yang et al., 2021 [67] | Cardiac MRI | Dilated Convolution | The dilated block of the segmentation network captures and aggregates multiscale information to create segmentation probability maps. The discriminator part differentiates the segmentation probability map and the ground truth at the pixel level to provide confidence probability maps. | The Dice coefficients on the ACDC 2017 for both ED and ES are 0.94 and 0.89, respectively. The Hausdorff distances for both the ED and ES are 10.6 and 12.6 mm, respectively. | The model still produces weak Dice coefficients in both the ED and ES of the left-ventricle-myocardium part. |
Wang et al., 2021 [41] | Cardiac MRI | Multiscale Fusion/Dilated Convolution | The encoder part uses dilated convolution. The decoding part reconstructs the full-size skip-connection structure for contextual-semantic-information fusion. | The Dice coefficients on the ACDC 2017, MICCAI 2009, and MICCAI 2018 datasets reached 96.2%, 98.0%, and 96.8%, respectively. Overall, Jaccard indices of 0.897, 0.964, and 0.937 were observed, with Hausdorff distances of 7.0, 5.2, and 7.5 mm, respectively. | The work only measures the image-segmentation performance, without including the LV-ejection-fraction (ED and ES) clinical cardiac indicators. |
Amer et al., 2022 [30] | Echocardiography Lung Computed Tomography (CT) Images | U-Net + Multi-scale Spatial Attention + Dilated Convolution | The model uses a U-Net architecture with channel attention and multiscale spatial attention to learn multiscale feature representations with diverse modalities, as well as shape and size variability. | The proposed model outperformed the basic U-Net, ResDUnet, Attention U-Net, and U-Net3+ models by 4.1%, 2.5%, 1.8%, and 0.4%, respectively, on lung CT images. It also outperformed the basic U-Net, ResDUnet, Attention U-Net, and U-Net3+ models by 2.8%, 1.6%, 1.1%, and 0.6%, respectively, on the left-ventricle images. | The approach still struggles to capture edge details accurately, and it loses segmentation detail at complicated edges. |
Literature | Target Task | Network Structure | Method | Strength | Weakness |
---|---|---|---|---|---|
Hu et al., 2018 [74] | Plant Leaf | Multiscale Fusion CNN | Using a series of bilinear interpolation operations, the input image is downsampled into several low-resolution images. These images are then fed into the network so that it can learn different features at different depths. | Produced a better accuracy rate on most of the MalayaKew Leaf Dataset and the LeafSnap Plant Leaf Dataset. | The training process requires a more complex sample set that needs to provide both whole and segmented images. |
Li et al., 2018 [80] | Chinese Herbal Medicines | Multiscale Fusion CNN | Near and far multiscale input images are fused into a six-channel image using a CNN with three convolutional and three pooling layers. | The requirements of Chinese-herbal-medicine classification were met by the model, with a classification accuracy of more than 90%. | The method still has several limitations, such as limited training data, lower classification accuracy, and a weak ability to avoid interference. |
Turkoglu et al., 2021 [70] | ZueriCrop Dataset | Early Fusion + CNN | The model consists of layered CNN networks. In a hierarchical tree, different network levels are indicative of increasingly finer label resolutions. At the refining stage, the three-dimensional probability regions from three different stages are passed to the CNN. | The achieved precision, recall, F1 score, and accuracy are 0.601, 0.498, 0.524, and 0.88, respectively, which outperforms the advanced benchmarked methods. | It is unclear how to adapt the model layout to standard CNNs without affecting the feature-extraction backbone for recurrent networks. |
Li et al., 2021 [81] | Crop Image (UAVSAR and RapidEye) | Multiscale Fusion CNN | A sequence of object scales is gradually fed into the CNN, which transforms the acquired features from smaller scales into larger scales by adopting gradually larger convolutional windows. | This technique provides a novel method for solving the issue of image classification for a variety of terrain types. | The model still generates blurred boundaries between crop fields due to the requirement for an input patch. |
Wang et al., 2021 [82] | Tomato Gray Mold Dataset | Feature Fusion + MobileNetv2 + Channel Attention Module | MobileNetv2 was used as the base network, whereby multiscale feature fusion provides the fused feature maps. The efficient channel-attention module then enhances these feature maps, and the relevant feature paths are weighted. The resultant features were used to predict gray mold on tomatoes. | Precision and F1 score reached 0.934 and 0.956, respectively, outperforming the Tiny-YOLOv3, MobileNetv2-YOLOv3, MobileNetv2-SSD, and Faster R-CNN performances. | Missed detections persist, especially at extreme shooting angles, which leads to inaccurate early diagnosis of different plant parts under different shooting conditions. |
Zhou et al., 2022 [83] | Fish Dataset | ASPP + GAN | A generative adversarial network (GAN) is introduced before applying CNN to augment the existing dataset. Then, the ASPP module fuses the input and output of a dilated convolutional layer with a short sample rate to acquire rich multiscale contextual information. | On the validation dataset, the obtained F1 score, GA, and mIoU reached 0.961, 0.981, and 0.973, respectively. | The model still loses a lot of segmentation detail at the complicated edges. |
Literature | Target Task | Network Structure | Method | Strength | Weakness |
---|---|---|---|---|---|
Ding X., He Q., 2017 [95] | Fault Bearing Dataset | Wavelet-Packet-Energy (WPE) Image + Deep Convolutional Network | The deep convolutional network has three convolutional layers, two max-pooling layers, and one multiscale layer. The multiscale layer combines the final convolutional layer's output with the subsequent pooling layer's output to diagnose any issue on the bearing. Six spindle-bearing datasets with ten-class health states under four loads are used to verify the performance of the proposed method. | The deep convolutional network achieved stable and high identification accuracies of 98.8%, 98.8%, 99.4%, 99.4%, 99.8%, and 99.6% for datasets A, B, C, D, E, and F, respectively. | Increased complexity, which implies a greater computational cost and limits practical deployment. |
Jiang et al., 2020 [100] | C-MAPSS Dataset | Bi-LSTM and Multiscale CNN Fusion Network | The last 3 layers of the fusion network used Bi-LSTM with 64 cells, a multiscale CNN with 32 convolution kernels, and 2 × 2 maximum pooling kernels. The combined output of the two networks determines the predicted RUL. | The proposed fusion model has better RMSE indicators compared with the CNN, LSTM, and Bi-LSTM, tested on four subsets of the dataset. | The method is prone to overfitting, and it is difficult to use the dropout algorithm to prevent it because recurrent connections to LSTM units are probabilistically removed from the activation and weight updates during network training. |
Wang et al., 2021 [103] | Pronostia Bearing Dataset | Multiscale CNN with Dilated Convolution Block | A complex signal is decomposed using an integrated dilated convolution block. Multiple stacked integrated dilated convolution blocks are fused to create a multiscale feature extractor to mitigate the information loss. | The mean absolute error (MAE) and root mean squared error (RMSE) of the proposed method are the lowest among the comparison methods. | The method does not include uncertainty prediction in the deep-learning model, which limits its practical use. |
Zhu et al., 2019 [99] | Pronostia Bearing Dataset | Multiscale-CNN | The time-frequency representation (TFR) can represent a complex and nonstationary signal of the bearing degradation. The TFRs and their assigned RULs were sent to a multiscale model structure to pull out more features that could be used to predict the RUL. The multiscale layer maintains the global and local properties to boost the network capacity. | The mean absolute error (MAE) and root mean squared error (RMSE) of the proposed method are the lowest among the other data-driven methods. | The performance is restricted by the complexity of the network structure, which implies a greater computational cost. |
Li et al., 2020 [104] | C-MAPSS Dataset | Multiscale Deep Convolutional Attention Network | The MS-DCNN has three different sizes of convolution operations and multiscale blocks that are assembled in parallel. The three multiscale-block output-feature maps are passed to a standard CNN after the multiscale convolution. At the end of the MS-DCNN network, one neuron is connected to provide the final predicted RUL value. | Compared with other advanced methods, such as the semi-supervised setup, MODBNE, DBN, and LSTM, the RMSE indicators of the proposed method reduced the error by 8.92%, 14.87%, 3.55%, and 1.94%, respectively, tested on four datasets. | To learn the prediction models, the method needs a substantial amount of data, which may not be feasible in real-life situations. |
Wang et al., 2021 [98] | Pronostia Bearing Dataset | Multiscale Convolutional Attention Network | First, self-attention modules are constructed to combine multisensor data. Then, an automatic multiscale learning technique is implemented. Finally, high-level representations are loaded into dynamic dense layers for regression analysis and RUL estimation. | The proposed strategy fuses multisensor data and improves RUL-prediction accuracy. Its prediction performance was better than that of previous prognostics methods. | The approach incorrectly presumes that the monitoring data collected by different sensors contribute equally to the RUL estimation, which leads to an inaccurate RUL prediction. |