A Lightweight Self-Supervised Representation Learning Algorithm for Scene Classification in Spaceborne SAR and Optical Images
Figure 1. Overview of the proposed work. Lite-SRL: on-board self-supervised representation learning algorithm for the RSSC task; CWB: computation workload balancing module; DHP: on-board distributed hybrid parallelism training framework.
Figure 2. Network structure.
Figure 3. (a) We design the training process of Lite-SRL as a sequential structure to adapt it to model parallelization. (b) Schematic of the proposed distributed hybrid parallel (DHP) training baseline.
Figure 4. Illustration of the proposed distributed hybrid parallel training baseline and dynamic chain system. (a) Distributed hybrid parallel training baseline. (b) Dynamic chain system. In iteration 1, Devices 1, 4, and 6 form a computation chain, while Devices 3 and 5 are in a waiting state. During this time, Device 5 completes the forward computation from Device 2. At the end of iteration 1, Device 6 disconnects from Device 4, automatically links to Device 5, and immediately performs the third part of the training; Device 4 links to Device 3 and waits to link with Device 6. In iteration 3, Device 4 links to Device 6, and the remaining nodes also link to available nodes. Subsequent iterations follow the same procedure.
Figure 5. Illustration of data augmentations.
Figure 6. Flowchart of the self-supervised learning experiments.
Figure 7. Guaranteed accuracy with less computation. (a) Fine-tune and freeze experiment results on the NWPU-45 dataset with a training proportion of 20%; the horizontal axis compares the number of parameters. (b) Freeze experiment results on the NWPU-45 dataset with a training proportion of 20%; the horizontal axis compares the training time per iteration, and the diameter of each bubble is proportional to the memory consumption during network training.
Figure 8. t-SNE visualization of feature distributions on different datasets. (a) Lite-SRL model on the WHU-SAR6 dataset; (b) fine-tuned Lite-SRL model on the WHU-SAR6 dataset; (c) Lite-SRL model on the OpenSARUrban dataset; (d) fine-tuned Lite-SRL model on the OpenSARUrban dataset; for the SAR datasets, owing to the imaging mechanism, we did not use an ImageNet pre-trained model. (e) ImageNet pre-trained model on the NWPU-45 dataset; (f) Lite-SRL model on the NWPU-45 dataset; (g) fine-tuned Lite-SRL model on the NWPU-45 dataset; (h) ImageNet pre-trained model on the AID dataset; (i) Lite-SRL model on the AID dataset; (j) fine-tuned Lite-SRL model on the AID dataset.
Figure 9. Confusion matrices of fine-tuned results: (a) on OpenSARUrban with a 20% training proportion; (b) on WHU-SAR6 with a 20% training proportion.
Figure 10. Confusion matrix of fine-tuned results on NWPU-45 with a 20% training proportion.
Figure 11. Confusion matrix of fine-tuned results on AID with a 50% training proportion.
Figure 12. Flowchart of Lite-SRL's deployment; each step corresponds to the content in the following figures. "N-Layers" corresponds to Figure 13a; "Statistic the Memory Usage" corresponds to Figure 13b; "Statistic the Time Consumption" corresponds to Figure 13c; "Candidate Partition Points" corresponds to Figure 14a; "Best Partition Point" corresponds to Figure 14b.
Figure 13. Data collected by CWB. (a) Partitionable layers contained in the Lite-SRL network structure, corresponding to 28 partitionable points {p_1, p_2, ..., p_28}. (b) CWB calculates the memory workload occupied by each network layer during training, including the intermediate variables and network parameters of each layer. (c) CWB measures time consumption, including the inference latency and backward-propagation latency of each layer when trained on a TX2, together with the data transmission latency between TX2s. The transmission latency is derived from the gradient data size between two layers and the inter-device transfer rate.
Figure 14. CWB calculates the optimal partition points. The two sets of candidate partition points are {p_2, p_3, p_4, p_5} and {p_7, p_8, p_9, p_10}; the remaining partition points have been screened out because they cannot satisfy the memory allocation requirements. (a) Runtime proportion of each node under the candidate partition points. (b) Equipment utilization evaluation indices under the candidate partition points, calculated with Equation (9).
Figure 15. The left side illustrates the baseline and the right side illustrates the dynamic chain system. (a) Lite-SRL with ResNet-18 as the encoder; three nodes are required to complete the training of a mini-batch; the baseline uses six nodes to form two chains, while the dynamic chain system can form three chains. (b) Lite-SRL with ResNet-34 as the encoder; four nodes are required to complete the training of a mini-batch; the baseline forms one chain with two nodes idle, while the dynamic chain system can schedule all nodes for training. (c) Lite-SRL with ResNet-50 as the encoder; five nodes are required to complete the training of a mini-batch; the baseline forms one chain with one node idle, while the dynamic chain system can schedule all nodes for training.
Figure A1. The total number of images in the RSSC datasets, i.e., OpenSARUrban [14], WHU-SAR6 [11], NWPU-RESISC45 [3], and AID [15], compared with a natural image dataset, i.e., ImageNet.
Abstract
1. Introduction
- To improve scene classification accuracy under insufficient annotated data, we propose a simple yet effective self-supervised representation learning algorithm called Lite-SRL. To reduce computation consumption, we design a lightweight contrastive learning structure in Lite-SRL and adopt the stop-gradient operation (a sketch of the corresponding loss is given after this list);
- To realize on-board deployment of the Lite-SRL algorithm, we propose a training framework called DHP and a generic computation workload balancing module called CWB. To the best of our knowledge, this is the first work to combine self-supervised learning with on-board data processing;
- Extensive experiments on four representative datasets demonstrate that Lite-SRL improves scene classification accuracy under limited annotated data and generalizes to both SAR and optical images. Compared with six state-of-the-art methods, Lite-SRL has clear advantages in overall accuracy, number of parameters, memory consumption, and training latency;
- Finally, to evaluate the proposed work's on-board operational capability, we deploy Lite-SRL on the low-power computing platform NVIDIA Jetson TX2.
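For context, the stop-gradient contrastive objective adopted here can be sketched as follows (SimSiam-style notation; p_1, p_2 denote predictions and z_1, z_2 encoder outputs of the two augmented views — our notation, with the exact form given by the paper's equations and Algorithm 1):

```latex
% Negative cosine similarity between prediction p and encoder output z
D(p, z) = -\,\frac{p}{\lVert p \rVert_2} \cdot \frac{z}{\lVert z \rVert_2}

% Symmetric loss over the two augmented views; stopgrad(.) blocks gradients
% through the target branch (the stop-gradient operation).
\mathcal{L} = \tfrac{1}{2}\, D\big(p_1, \operatorname{stopgrad}(z_2)\big)
            + \tfrac{1}{2}\, D\big(p_2, \operatorname{stopgrad}(z_1)\big)
```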
2. Related Works
2.1. RSSC under Limited Annotated Samples
2.2. Self-Supervised Contrastive Learning
2.3. Distributed Training under Limited Resources
3. Methods
3.1. Overview of the Proposed Framework
3.2. Lite-SRL Self-Supervised Representation Learning Network
3.2.1. Network Structure
Algorithm 1. Learning Procedure of Lite-SRL
E: Encoder (backbone with projection MLP); P: Prediction MLP; Aug: random image augmentation;
θ: parameters of E and P; Stop: stop-gradient operation.
Input: training samples x. Output: negative cosine similarity loss.
1: for number of training epochs do
2:   Form training samples x into a minibatch
3:   Do augmentation x1 = Aug(x), x2 = Aug(x)
4:   In the Lite-SRL 2-way structure do z1 = E(x1) and z2 = E(x2); p1 = P(z1) and p2 = P(z2)
5:   Calculate the negative cosine similarity loss with the stop-gradient operation Stop
6:   Do backward propagation with the SGD optimizer
7:   Update weights θ
8: end for
9: After training, use the pre-trained model for downstream Remote Sensing Scene Classification
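A minimal PyTorch-style sketch of this training loop, assuming a SimSiam-like two-branch setup; `encoder`, `predictor`, `aug`, and the hyperparameters are illustrative placeholders rather than the paper's exact implementation:

```python
import torch
import torch.nn.functional as F

def neg_cosine(p, z):
    # Negative cosine similarity; z is detached, i.e., the stop-gradient operation.
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

def train_lite_srl(encoder, predictor, aug, loader, epochs=100, lr=0.05):
    # encoder: backbone (+ projection MLP); predictor: prediction MLP; aug: random augmentation.
    params = list(encoder.parameters()) + list(predictor.parameters())
    opt = torch.optim.SGD(params, lr=lr, momentum=0.9, weight_decay=1e-4)
    for _ in range(epochs):
        for x, _ in loader:                        # labels are ignored (self-supervised)
            x1, x2 = aug(x), aug(x)                # two augmented views
            z1, z2 = encoder(x1), encoder(x2)      # 2-way encoding
            p1, p2 = predictor(z1), predictor(z2)  # predictions
            loss = 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)
            opt.zero_grad()
            loss.backward()                        # backward propagation
            opt.step()                             # update weights with SGD
    return encoder                                 # pre-trained encoder for downstream RSSC
```

Here `z.detach()` plays the role of the Stop operation in Algorithm 1: the loss gradient propagates only through the prediction branch.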
3.2.2. Lite-SRL Network Partition
3.3. Distributed Training Strategy
3.3.1. Computation Workload Balancing Module
Algorithm 2. CWB search for the best partition point
Step 1: CWB performs memory workload balancing. Step 2: CWB performs time equalization.
1: Assign the measured memory workload and time consumption to each partitionable point p_i
2: Assume 3 TX2s can satisfy the memory allocation; then the 2 sets of candidate partition points that satisfy memory workload balancing are recorded as A = {p_2, ..., p_5} and B = {p_7, ..., p_10}
3: for a in A do
4:   for b in B do
5:     Partition point 1 adopts a, partition point 2 adopts b
6:     Denote the running time of TX2_1 as t_1
7:     Denote the running time of TX2_2 as t_2
8:     Denote the running time of TX2_3 as t_3
9:     The training time for a mini-batch is T = t_1 + t_2 + t_3
10:    With partition points (a, b), the ratio of running time to waiting time of TX2_i is r_i = t_i / (T − t_i)
11:    Calculate the equipment utilization index E using Equation (9)
12:   end for
13: end for
14: The partition points with the highest score E are the best partition points
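As an illustration of Algorithm 2's nested search, the sketch below enumerates pairs of partition points that fit the per-node memory budget and scores them by a balance-based utilization measure; the function and its scoring rule are our hypothetical stand-ins for CWB and Equation (9), not the paper's implementation:

```python
from itertools import product

def search_best_partition(mem, time_cost, mem_budget):
    """Sketch of a CWB-style search for two partition points over a layer chain.

    mem[i] / time_cost[i]: memory and time workload of layer i, measured by CWB.
    mem_budget: memory available on one TX2 node.
    The utilization score below is a hypothetical stand-in for Equation (9).
    """
    n = len(mem)

    def segments(a, b):
        # Split the layer chain [0, n) into three parts at points a and b.
        return [(0, a), (a, b), (b, n)]

    def fits(a, b):
        # Step 1: memory workload balancing -- every segment must fit on one TX2.
        return all(sum(mem[s:e]) <= mem_budget for s, e in segments(a, b))

    # Candidate partition points that satisfy the memory constraint.
    candidates = [(a, b) for a, b in product(range(1, n), repeat=2) if a < b and fits(a, b)]

    def utilization(a, b):
        # Step 2: time equalization. r_i = running time / waiting time of node i;
        # a candidate is scored by its worst node, so balanced splits score highest.
        t = [sum(time_cost[s:e]) for s, e in segments(a, b)]
        total = sum(t)
        return min(ti / (total - ti) for ti in t)

    # Best partition point pair = highest utilization score.
    return max(candidates, key=lambda ab: utilization(*ab))
```

With the 28 partitionable points of Figure 13a, this brute-force double loop is cheap to evaluate and mirrors the "for a ... for b ..." structure of Algorithm 2.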
3.3.2. Dynamic Chain System
4. Experimental Setups
4.1. Datasets Description
- The OpenSARUrban [14] dataset consists of 10 categories of urban scene images collected from Sentinel-1; its scene images cover 21 major cities in China. Each category contains about 40 to 2000 images with a size of 100 × 100 pixels, and the resolution of the images is about 20 m;
- The WHU-SAR6 [11] dataset consists of six categories of scene images collected from Sentinel-1 and GF-3. Each category contains about 250 to 420 images, ranging in size from 500 to 600 pixels. Since the total number of WHU-SAR6 images is relatively small, to increase the dataset volume we cropped the images into small patches of 256 × 256 pixels without destroying the scene semantic information.
- The NWPU-RESISC45 [3] dataset is the current largest open benchmark dataset for scene classification task, consisting of 45 categories of scene images. Each category contains 700 images with a size of 256 × 256 pixels, and the spatial resolution of the images is about 0.2 to 30 m.
- The AID [15] dataset consists of 30 categories of scene images; each category contains about 200 to 400 images, for a total of 10,000 samples, each with a size of 600 × 600 pixels.
4.2. Data Augmentation
4.3. Implementation Details
- Experiments on self-supervised learning. In this part, we use workstations to comprehensively compare the proposed Lite-SRL with other advanced self-supervised methods. The two workstations are identically configured with an NVIDIA RTX 3090 GPU, an Intel Xeon E5-1650 CPU, and 64 GB of RAM.
- Experiments for on-board deployment of the Lite-SRL algorithm. We use the proposed distributed training modules and provide detailed records of the deployment process. The experimental on-board computing platform consists of NVIDIA Jetson TX2 nodes and a high-speed switch.
5. Experimental Results
5.1. Guaranteed Accuracy with Less Computation
5.2. Self-Supervised Representation Extractor
5.3. Improving the Scene Classification Accuracy with Limited Annotated Data
5.4. Confusion Matrix Analysis
6. Deployment of Lite-SRL
6.1. Computation Workload Balancing
6.2. Distributed Training with Higher Efficiency
7. Conclusions
- We will design a dedicated lightweight feature extractor in the self-supervised structure to further reduce the memory computation;
- We will explore techniques such as gradient compression, network pruning, etc., to further improve distributed training efficiency;
- We will explore hardware acceleration solutions for onboard distributed training;
- We expect to add more remote sensing observation missions to on-board distributed self-supervised training applications.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Abbreviation | Full Name |
---|---|
AID | Aerial Image Dataset |
BN | Batch Normalization |
BYOL | Bootstrap Your Own Latent |
CNN | Convolutional Neural Network |
CWB | Computation Workload Balancing Module |
MG-CAP | Multi-Granularity Canonical Appearance Pooling |
MLP | Multi-Layer Perceptron |
MoCo | Momentum Contrast for Visual Representation Learning |
MTL | Multitask Learning |
NWPU-45 | NWPU-Resisc45 Dataset |
DHP | Distributed Hybrid Parallelism Training Framework |
Lite-SRL | Lightweight Self-supervised Representation Learning algorithm |
ReLU | Rectified Linear Unit |
RSIs | Remote Sensing Images |
RSSC | Remote Sensing Scene Classification |
SGD | Stochastic Gradient Descent |
SimCLR | Simple Framework For Contrastive Learning |
SimSiam | Simple Siamese Representation Learning |
SwAV | Unsupervised Learning By Contrasting Cluster Assignments |
t-SNE | T-Distributed Stochastic Neighbor Embedding |
References
- Hu, F.; Xia, G.-S.; Hu, J.; Zhang, L. Transferring Deep Convolutional Neural Networks for the Scene Classification of High-Resolution Remote Sensing Imagery. Remote Sens. 2015, 7, 14680–14707. [Google Scholar] [CrossRef] [Green Version]
- Ni, K.; Liu, P.; Wang, P. Compact Global-Local Convolutional Network with Multifeature Fusion and Learning for Scene Classification in Synthetic Aperture Radar Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 7284–7296. [Google Scholar] [CrossRef]
- Cheng, G.; Han, J.; Lu, X. Remote Sensing Image Scene Classification: Benchmark and State of the Art. Proc. IEEE 2017, 105, 1865–1883. [Google Scholar] [CrossRef] [Green Version]
- Xu, X.; Zhang, X.; Zhang, T. Multi-Scale SAR Ship Classification with Convolutional Neural Network. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Online Event, 11–16 July 2021; pp. 4284–4287. [Google Scholar]
- Lu, X.; Sun, X.; Diao, W.; Feng, Y.; Wang, P.; Fu, K. LIL: Lightweight Incremental Learning Approach through Feature Transfer for Remote Sensing Image Scene Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5611320. [Google Scholar] [CrossRef]
- Zhang, T.; Zhang, X. Squeeze-And-Excitation Laplacian Pyramid Network with Dual-Polarization Feature Fusion for Ship Classification in SAR Images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 4019905. [Google Scholar] [CrossRef]
- Gu, Y.; Wang, Y.; Li, Y. A Survey on Deep Learning-Driven Remote Sensing Image Scene Understanding: Scene Classification, Scene Retrieval and Scene-Guided Object Detection. Appl. Sci. 2019, 9, 2110. [Google Scholar] [CrossRef] [Green Version]
- Zhang, T.; Zhang, X.; Ke, X.; Liu, C.; Xu, X. HOG-ShipCLSNet: A Novel Deep Learning Network with HOG Feature Fusion for SAR Ship Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5210322. [Google Scholar] [CrossRef]
- Liao, N.; Datcu, M.; Zhang, Z.; Guo, W.; Zhao, J.; Yu, W. Analyzing the Separability of SAR Classification Dataset in Open Set Conditions. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 7895–7910. [Google Scholar] [CrossRef]
- Zhang, T.; Zhang, X.; Shi, J.; Wei, S. HyperLi-Net: A Hyper-Light Deep Learning Network for High-Accurate and High-Speed Ship Detection from Synthetic Aperture Radar Imagery. ISPRS J. Photogramm. Remote Sens. 2020, 167, 123–153. [Google Scholar] [CrossRef]
- Su, B.; Liu, J.; Su, X.; Luo, B.; Wang, Q. CFCANet: A Complete Frequency Channel Attention Network for SAR Image Scene Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 11750–11763. [Google Scholar] [CrossRef]
- Zhang, T.; Zhang, X. A Polarization Fusion Network with Geometric Feature Embedding for SAR Ship Classification. Pattern Recognit. 2022, 123, 108365. [Google Scholar] [CrossRef]
- Dumitru, C.O.; Schwarz, G.; Datcu, M. SAR Image Land Cover Datasets for Classification Benchmarking of Temporal Changes. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 1571–1592. [Google Scholar] [CrossRef]
- Zhao, J.; Zhang, Z.; Yao, W.; Datcu, M.; Xiong, H.; Yu, W. OpenSARUrban: A Sentinel-1 SAR Image Dataset for Urban Interpretation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 187–203. [Google Scholar] [CrossRef]
- Xia, G.-S.; Hu, J.; Hu, F.; Shi, B.; Bai, X.; Zhong, Y.; Zhang, L.; Lu, X. AID: A Benchmark Data Set for Performance Evaluation of Aerial Scene Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3965–3981. [Google Scholar] [CrossRef] [Green Version]
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
- Zhang, T.; Zhang, X. A Full-Level Context Squeeze-And-Excitation ROI Extractor for SAR Ship Instance Segmentation. IEEE Geosci. Remote Sens. Lett. 2022, 19, 4506705. [Google Scholar] [CrossRef]
- Kolesnikov, A.; Zhai, X.; Beyer, L. Revisiting self-supervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 1920–1929. [Google Scholar]
- Noroozi, M.; Favaro, P. Unsupervised learning of visual representations by solving jigsaw puzzles. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 69–84. [Google Scholar]
- Stojnic, V.; Risojevic, V. Self-supervised learning of remote sensing scene representations using contrastive multiview coding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 1182–1191. [Google Scholar]
- Zhang, T.; Zhang, X.; Shi, J.; Wei, S.; Wang, J.; Li, J.; Su, H.; Zhou, Y. Balance Scene Learning Mechanism for Offshore and Inshore Ship Detection in SAR Images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 4004905. [Google Scholar] [CrossRef]
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 12–18 July 2020; Volume 119, pp. 1597–1607. [Google Scholar]
- Ayush, K.; Uzkent, B.; Meng, C.; Tanmay, K.; Burke, M.; Lobell, D.; Ermon, S. Geography-aware self-supervised learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 10181–10190. [Google Scholar]
- Franklin, D. NVIDIA Developer Blog: NVIDIA Jetson TX2 Delivers Twice the Intelligence to the Edge. Available online: https://devblogs.nvidia.com/jetson-tx2-delivers-twice-intelligence-edge/ (accessed on 13 April 2022).
- Xu, X.; Zhang, X.; Zhang, T. Lite-YOLOv5: A Lightweight Deep Learning Detector for On-Board Ship Detection in Large-Scene Sentinel-1 SAR Images. Remote Sens. 2022, 14, 1018. [Google Scholar] [CrossRef]
- Aitech’s S-A1760 Venus™ Brings NVIDIA-Based AI Supercomputing to Next Generation Space Applications: Radiation-CharActerized COTS System Qualified for Use in Small Sat Clusters and Short-Duration Spaceflights. Available online: https://aitechsystems.com/aitechs-s-a1760-venus-brings-nvidia-based-ai-supercomputing-to-next-generation-space-applications/ (accessed on 13 April 2022).
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Processing Syst. 2019, 32, 8026–8037. [Google Scholar]
- Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M. Tensorflow: A system for large-scale machine learning. In Proceedings of the 12th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 16), Savannah, GA, USA, 2–4 November 2016; pp. 265–283. [Google Scholar]
- Jia, Y.; Shelhamer, E.; Donahue, J.; Karayev, S.; Long, J.; Girshick, R.; Guadarrama, S.; Darrell, T. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA, 3–7 November 2014; pp. 675–678. [Google Scholar]
- Shazeer, N.; Cheng, Y.; Parmar, N.; Tran, D.; Vaswani, A.; Koanantakool, P.; Hawkins, P.; Lee, H.; Hong, M.; Young, C.; et al. Mesh-tensorflow: Deep learning for supercomputers. arXiv 2018, arXiv:1811.02084. [Google Scholar]
- Onoufriou, G.; Bickerton, R.; Pearson, S.; Leontidis, G. Nemesyst: A hybrid parallelism deep learning-based framework applied for internet of things enabled food retailing refrigeration systems. Comput. Ind. 2019, 113, 103133. [Google Scholar] [CrossRef] [Green Version]
- Grill, J.-B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.H.; Buchatskaya, E.; Doersch, C.; Pires, B.A.; Guo, Z.D.; Azar, M.G. Bootstrap your own latent: A new approach to self-supervised learning. arXiv 2020, arXiv:2006.07733. [Google Scholar]
- Chen, X.; He, K. Exploring simple siamese representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 15750–15758. [Google Scholar]
- Li, X.; Shi, D.; Diao, X.; Xu, H. SCL-MLNet: Boosting Few-Shot Remote Sensing Scene Classification via Self-Supervised Contrastive Learning. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5801112. [Google Scholar] [CrossRef]
- Li, Y.; Shao, Z.; Huang, X.; Cai, B.; Peng, S. Meta-FSEO: A Meta-Learning Fast Adaptation with Self-Supervised Embedding Optimization for Few-Shot Remote Sensing Scene Classification. Remote Sens. 2021, 13, 2776. [Google Scholar] [CrossRef]
- Tao, C.; Qi, J.; Lu, W.; Wang, H.; Li, H. Remote Sensing Image Scene Classification With Self-Supervised Paradigm Under Limited Labeled Samples. IEEE Geosci. Remote Sens. Lett. 2022, 19, 8004005. [Google Scholar] [CrossRef]
- Kang, J.; Fernandez-Beltran, R.; Duan, P.; Liu, S.; Plaza, A.J. Deep Unsupervised Embedding for Remotely Sensed Images Based on Spatially Augmented Momentum Contrast. IEEE Trans. Geosci. Remote Sens. 2021, 59, 2598–2610. [Google Scholar] [CrossRef]
- Jung, H.; Oh, Y.; Jeong, S.; Lee, C.; Jeon, T. Contrastive Self-Supervised Learning with Smoothed Representation for Remote Sensing. IEEE Geosci. Remote Sens. Lett. 2021, 19, 8010105. [Google Scholar] [CrossRef]
- Zhao, L.; Luo, W.; Liao, Q.; Chen, S.; Wu, J. Hyperspectral Image Classification with Contrastive Self-Supervised Learning under Limited Labeled Samples. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6008205. [Google Scholar] [CrossRef]
- Doersch, C.; Gupta, A.; Efros, A.A. Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1422–1430. [Google Scholar]
- Pathak, D.; Krahenbuhl, P.; Donahue, J.; Darrell, T.; Efros, A.A. Context encoders: Feature learning by inpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2536–2544. [Google Scholar]
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 9729–9738. [Google Scholar]
- Chen, X.; Fan, H.; Girshick, R.; He, K. Improved baselines with momentum contrastive learning. arXiv 2020, arXiv:2003.04297. [Google Scholar]
- Caron, M.; Misra, I.; Mairal, J.; Goyal, P.; Bojanowski, P.; Joulin, A. Unsupervised learning of visual features by contrasting cluster assignments. arXiv 2020, arXiv:2006.09882. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
- Kim, S.; Yu, G.-I.; Park, H.; Cho, S.; Jeong, E.; Ha, H.; Lee, S.; Jeong, J.S.; Chun, B.-G. Parallax: Sparsity-aware data parallel training of deep neural networks. In Proceedings of the Fourteenth EuroSys Conference, Dresden, Germany, 25–28 March 2019; pp. 1–15. [Google Scholar]
- Jia, Z.; Zaharia, M.; Aiken, A. Beyond data and model parallelism for deep neural networks. Proc. Mach. Learn. Syst. 2019, 1, 1–13. [Google Scholar]
- Lee, S.; Kim, J.K.; Zheng, X.; Ho, Q.; Gibson, G.; Xing, P. On Model Parallelization and Scheduling Strategies for Distributed Machine Learning; Carnegie Mellon University: Pittsburgh, PA, USA, 2014; pp. 2834–2842. [Google Scholar]
- Akintoye, S.B.; Han, L.; Zhang, X.; Chen, H.; Zhang, D. A hybrid parallelization approach for distributed and scalable deep learning. arXiv 2021, arXiv:2104.05035. [Google Scholar] [CrossRef]
- Demirci, G.V.; Ferhatosmanoglu, H. Partitioning sparse deep neural networks for scalable training and inference. In Proceedings of the ACM International Conference on Supercomputing, Virtual Event, 14–17 June 2021; pp. 254–265. [Google Scholar]
- Moreno-Alvarez, S.; Haut, J.M.; Paoletti, M.E.; Rico-Gallego, J.A. Heterogeneous model parallelism for deep neural networks. Neurocomputing 2021, 441, 1–12. [Google Scholar] [CrossRef]
- Das, D.; Avancha, S.; Mudigere, D.; Vaidynathan, K.; Sridharan, S.; Kalamkar, D.; Kaul, B.; Dubey, P. Distributed deep learning using synchronous stochastic gradient descent. arXiv 2016, arXiv:1602.06709. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
- Cheng, G.; Yang, C.; Yao, X.; Guo, L.; Han, J. When Deep Learning Meets Metric Learning: Remote Sensing Image Scene Classification via Learning Discriminative CNNs. IEEE Trans. Geosci. Remote Sens. 2018, 56, 2811–2821. [Google Scholar] [CrossRef]
- Chen, Z.; Wang, S.; Hou, X.; Shao, L. Recurrent transformer network for remote sensing scene categorisation. In Proceedings of the 2018 British Machine Vision Conference (BMVC), Newcastle, UK, 3–6 September 2018; Volume 266, p. 0987. [Google Scholar]
- Wang, S.; Guan, Y.; Shao, L. Multi-Granularity Canonical Appearance Pooling for Remote Sensing Scene Classification. IEEE Trans. Image Process. 2020, 29, 5396–5407. [Google Scholar] [CrossRef] [Green Version]
- Zhao, Z.; Luo, Z.; Li, J.; Chen, C.; Piao, Y. When Self-Supervised Learning Meets Scene Classification: Remote Sensing Scene Classification Based on a Multitask Learning Framework. Remote Sens. 2020, 12, 3276. [Google Scholar] [CrossRef]
- Zhang, T.; Zhang, X. HTC+ for SAR Ship Instance Segmentation. Remote Sens. 2022, 14, 2395. [Google Scholar] [CrossRef]
Dataset | Number of Images | Number of Categories | Training Proportions
---|---|---|---
OpenSARUrban 1 [14] | 16,679 | 10 | 10%, 20%
WHU-SAR6 2 [11] | 17,590 | 6 | 10%, 20%
NWPU-RESISC45 [3] | 31,500 | 45 | 10%, 20%
AID [15] | 10,000 | 30 | 10%, 20%, 50%
Method: Freeze (Overall Accuracy, %) | Parameters (Millions) | WHU-SAR6 10% | WHU-SAR6 20% | OpenSARUrban 10% | OpenSARUrban 20% | NWPU-45 10% | NWPU-45 20% | AID 10% | AID 20%
---|---|---|---|---|---|---|---|---|---
ImageNet 1 (Supervised) [16] | - | - | - | - | - | 73.17 | 77.08 | 79.40 | 80.45 |
SimCLR [22] | 13.57 | 83.40 | 86.73 | 67.87 | 68.33 | 86.45 | 88.32 | 85.52 | 87.23 |
MoCo-v2 [43] | 22.48 | 82.39 | 85.07 | 65.52 | 66.07 | 83.37 | 86.63 | 84.56 | 86.05 |
SWAV [44] | 18.45 | 83.04 | 86.30 | 65.98 | 67.28 | 84.16 | 87.85 | 84.85 | 86.59 |
BYOL [32] | 31.81 | 86.11 | 87.75 | 68.36 | 69.73 | 88.63 | 90.06 | 87.24 | 88.32 |
SimSiam [33] | 22.73 | 87.59 | 88.64 | 70.20 | 70.86 | 91.19 | 91.26 | 89.15 | 90.49 |
Lite-SRL (ours) | 12.82 | 87.71 | 88.56 | 70.23 | 71.09 | 91.22 | 91.28 | 89.27 | 90.67 |
Method: Fine-Tune (Overall Accuracy, %) | Parameters (Millions) | WHU-SAR6 10% | WHU-SAR6 20% | OpenSARUrban 10% | OpenSARUrban 20% | NWPU-45 10% | NWPU-45 20% | AID 10% | AID 20%
---|---|---|---|---|---|---|---|---|---
Randomly initialized | - | - | - | - | - | 77.16 | 82.87 | 80.63 | 83.47 |
ImageNet (Supervised) | - | - | - | - | - | 84.74 | 89.93 | 89.81 | 90.54 |
SimCLR | 13.57 | 91.85 | 93.74 | 80.21 | 83.87 | 90.35 | 92.02 | 91.32 | 93.54 |
MoCo-v2 | 22.48 | 90.59 | 92.70 | 79.07 | 82.75 | 88.71 | 90.56 | 89.96 | 91.47 |
SWAV | 18.45 | 91.58 | 93.37 | 79.85 | 83.63 | 89.26 | 92.07 | 91.53 | 92.84 |
BYOL | 31.81 | 93.21 | 94.86 | 80.62 | 84.88 | 90.57 | 92.94 | 91.95 | 93.68 |
SimSiam | 22.73 | 94.77 | 95.69 | 81.49 | 85.29 | 92.68 | 93.48 | 92.38 | 94.63 |
Lite-SRL (ours) | 12.82 | 94.57 | 95.83 | 81.76 | 85.43 | 92.77 | 93.51 | 92.55 | 94.82 |
Method (Overall Accuracy, %) | NWPU-45 10% | NWPU-45 20% | AID 20% | AID 50%
---|---|---|---|---
D-CNN with GoogLeNet [55] | 86.89 | 90.49 | 86.89 | 90.49 |
RTN [56] | 89.90 | 92.71 | 92.44 | - |
MG-CAP(Sqrt-E) [57] | 90.83 | 92.95 | 93.34 | 96.12 |
ResNet-101 [53] | 89.41 | 92.51 | 93.31 | 96.34 |
ResNet-101+MTL [58] | 91.61 | 93.93 | 93.67 | 96.61 |
ResNet-18+Lite-SRL (ours) | 92.77 | 93.51 | 94.82 | 95.78 |
ResNet-101+Lite-SRL (ours) | 93.41 | 94.43 | 95.29 | 96.82 |
Baseline: average running time of one iteration = 3572 ms. Nodes 1–2, 3–4, and 5–6 correspond to Partitions 1, 2, and 3 of the Lite-SRL network, respectively.

Baseline | Node 1 | Node 2 | Node 3 | Node 4 | Node 5 | Node 6
---|---|---|---|---|---|---
Average runtime of each node in one iteration (ms) | 1035 | 1039 | 1145 | 1139 | 921 | 923
Running iterations | 500 | 500 | 500 | 500 | 500 | 500
Dynamic chain system: average running time of one iteration = 2750 ms. Nodes execute the three network partitions according to the dynamic chain schedule, so the number of iterations run by each node differs.

Dynamic | Node 1 | Node 2 | Node 3 | Node 4 | Node 5 | Node 6
---|---|---|---|---|---|---
Average runtime of each node in one iteration (ms) | 1036 | 1038 | 1037 | 1140 | 1142 | 922
Running iterations | 329 | 331 | 340 | 406 | 594 | 1000
Method | Memory Consumption (MB) | Training Time: Baseline (ms) | Training Time: Dynamic (ms) | Improvement | Accuracy 2 (%): Baseline | Accuracy 2 (%): Dynamic
---|---|---|---|---|---|---
ResNet18 + Lite-SRL | 7599.7 | 3572 | 2750 | 23.0% | 91.31 | 91.27 |
ResNet34 + Lite-SRL | 10,185.9 | 4895 | 3984 | 18.6% | 91.75 | 91.78 |
ResNet50 1 + Lite-SRL | 13,039.3 | 6473 | 5962 | 6.9% | 92.11 | 92.09 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xiao, X.; Li, C.; Lei, Y. A Lightweight Self-Supervised Representation Learning Algorithm for Scene Classification in Spaceborne SAR and Optical Images. Remote Sens. 2022, 14, 2956. https://doi.org/10.3390/rs14132956