Self-Supervision and Self-Distillation with Multilayer Feature Contrast for Supervision Collapse in Few-Shot Remote Sensing Scene Classification
"> Figure 1
<p>The overall architecture of the SSMR method that we propose. The first stage is the self-supervised learning stage, using multi-layer feature contrast. In this stage, a feature extractor is trained as the embedding network of the second stage meta-learning. During the second stage, few-shot learning takes place using meta-learning. In the third stage, the model obtained in the second stage is subject to self-distillation based on multi-layer feature contrast.</p> "> Figure 2
<p>The overall architecture of the self-distillation stage. The self-distillation of the model is realized by comprehensively considering the difference between the soft label output by the K-1 model and the prediction of the K-generation model, the difference between the prediction of the K-generation model and the hard label, and the feature difference extracted by the K-1 model and the K-generation model. The selected features can be local or global.</p> "> Figure 3
<p>The samples in the MASATI remote sensing image dataset have high requirements for the performance of the classification model. Most remote sensing images of marine scenes contain a large number of wave ripples as interference. Especially for several types of data related to ships, the ship itself will be mixed with white ripples, which will increase the difficulty of classification and require a high local feature extraction ability of the model. The network also needs to be able to distinguish single ships, multiple ships, and land ships, which further requires the network to have the ability to capture global features.</p> "> Figure 4
<p>Part of the Miniimagenet dataset. The types of data, image structures, and shooting angles are greatly different from remote sensing images.</p> "> Figure 5
<p>We use remote sensing image data in the MASATI and data in the Miniimagenet datasets to create a new dataset for training. The training set contains a small number of remote sensing image data. During meta-learning training, there may be only one or two of the five images sampled in the training set or even no remote sensing images. When testing the performance of the model, the unseen remote sensing images are used to evaluate the cross-data domain ability of the model and whether the methods overcome the supervision collapse.</p> "> Figure 6
<p>The effects of shot numbers on the test performances of different methods are reported with 95% confidence intervals.</p> "> Figure 7
<p>A randomly selected remote sensing image and two sets of feature maps extracted from the embedding network.</p> ">
Abstract
1. Introduction
- We use self-supervised learning to reduce supervision collapse during meta-learning. Through self-supervised learning, the embedding network learns feature information in the data beyond the label information while remaining invariant to data augmentations (such as cropping and color offset). The embedding network captures category information through self-supervised learning and learns an effective feature representation. We show that self-supervised (contrastive) learning and meta-learning are mutually beneficial.
- During self-supervised learning, we take the multilayer features extracted from remote sensing images at different convolution layers of the embedding network and use them for contrastive learning, computing the distance between corresponding local or global features for classification (see the first sketch after this list). This contrastive scheme depends little on the label information of the data and improves the model's generalization to new, unseen data.
- We enhance the self-distillation method with multilayer feature contrast. When the previous-generation model constrains and transfers knowledge to the next-generation model, it uses not only the divergence of the label predictions but also the divergence of the features extracted by the convolution layers of the two generations (see the second sketch after this list). This design enriches the knowledge that self-distillation can transfer in few-shot scenarios and further reduces the training process's dependence on the small amount of label information, which otherwise easily leads to supervision collapse.
- We construct a challenging dataset to train and test the proposed method and compare it with several representative methods. The experimental results show that our method achieves superior performance and overcomes supervision collapse. We also conduct a series of additional ablation experiments to verify the effect of each module in our method.
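As a concrete illustration of the first two contributions, the sketch below implements contrastive learning over multilayer features in PyTorch. The function names (`info_nce`, `multilayer_contrast_loss`), the InfoNCE formulation, the temperature value, and the use of average pooling for the global features are illustrative assumptions, not the paper's exact definitions (those are given in Section 3.1).

```python
# A minimal sketch of multilayer feature contrast (hypothetical names).
import torch
import torch.nn.functional as F

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """InfoNCE between two batches of embeddings; matching rows are positives."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / tau                       # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)

def multilayer_contrast_loss(feats_a, feats_b, tau: float = 0.1):
    """Average InfoNCE over feature maps taken from several convolution layers.

    feats_a / feats_b: lists of (B, C, H, W) maps from two augmented views of
    the same batch. Global features use average pooling here; local features
    could instead keep the spatial grid and match per-location vectors.
    """
    loss = 0.0
    for fa, fb in zip(feats_a, feats_b):
        ga = fa.mean(dim=(2, 3))   # global feature: (B, C)
        gb = fb.mean(dim=(2, 3))
        loss = loss + info_nce(ga, gb, tau)
    return loss / len(feats_a)
```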
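Likewise, here is a hedged sketch of the self-distillation objective from the third contribution (and Figure 2): a temperature-scaled KL term against the previous generation's soft labels, a cross-entropy term against the hard labels, and a feature-matching term between the two generations. The weights `alpha` and `beta`, the temperature, and the choice of MSE as the feature distance are our assumptions; the exact loss is given in Section 3.3.

```python
# Hypothetical sketch of the generation-K self-distillation objective.
import torch
import torch.nn.functional as F

def self_distill_loss(student_logits, teacher_logits, labels,
                      student_feats, teacher_feats,
                      temp: float = 4.0, alpha: float = 0.5, beta: float = 0.1):
    """Combine (i) KL to the generation-(K-1) soft labels, (ii) cross-entropy
    to the hard labels, and (iii) feature matching between the generations."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temp, dim=1),
        F.softmax(teacher_logits.detach() / temp, dim=1),
        reduction="batchmean",
    ) * temp ** 2                      # standard temperature rescaling
    hard = F.cross_entropy(student_logits, labels)
    # Average distance between corresponding (local or global) feature maps.
    feat = sum(F.mse_loss(fs, ft.detach())
               for fs, ft in zip(student_feats, teacher_feats)) / len(student_feats)
    return alpha * soft + (1 - alpha) * hard + beta * feat
```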
2. Related Work
2.1. Few-Shot Image Classification
2.2. Self-Supervised Learning
2.3. Knowledge Distillation
3. Stopping Supervision Collapse: SSMR
3.1. Self-Supervised Learning Embedding Network Training Based on Multilayer Feature Contrast
3.2. Metric-Based Meta-Learning Model Fine-Tuning
3.3. Self-Distillation Model Optimization Based on Multi-Layer Feature Contrast
4. Experiment
4.1. Dataset
4.2. Model Architecture
4.3. Implementation Details
4.3.1. Data Augmentation
4.3.2. Optimizer Selection and Hyperparameter Setting
4.4. Quantitative Comparison
4.5. Feature Maps
4.6. Ablation Experiment
4.6.1. Selection of Contrastive Learning Features
4.6.2. Data Augmentation
4.6.3. Optimizer
4.6.4. Self-Supervised Learning Ablation Experiment
4.6.5. Self-Distillation Ablation Experiment
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Xia, Y.; Wan, S.; Jin, P.; Yue, L. A Novel Sea-Land Segmentation Algorithm Based on Local Binary Patterns for Ship Detection. Int. J. Signal Process. Image Process. Pattern Recognit. 2014, 7, 237–246.
- Tang, J.; Deng, C.; Huang, G.; Zhao, B. Compressed-Domain Ship Detection on Spaceborne Optical Image Using Deep Neural Network and Extreme Learning Machine. IEEE Trans. Geosci. Remote Sens. 2015, 53, 1174–1185.
- Kwon, Y.H.; Baek, S.H.; Lim, Y.K.; Pyo, J.; Ligaray, M.; Park, Y.; Cho, K.H. Monitoring Coastal Chlorophyll-a Concentrations in Coastal Areas Using Machine Learning Models. Water 2018, 10, 1020.
- Yang, Y.; Newsam, S. Bag-of-visual-words and spatial extensions for land-use classification. In Proceedings of the SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 2–5 November 2010; p. 270.
- Chen, S.; Tian, Y. Pyramid of Spatial Relatons for Scene-Level Land Use Classification. IEEE Trans. Geosci. Remote Sens. 2015, 53, 1947–1957.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
- He, T.; Zhang, Z.; Zhang, H.; Zhang, Z.; Xie, J.; Li, M. Bag of tricks for image classification with convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 558–567.
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
- Hu, F.; Xia, G.-S.; Hu, J.; Zhang, L. Transferring Deep Convolutional Neural Networks for the Scene Classification of High-Resolution Remote Sensing Imagery. Remote Sens. 2015, 7, 14680–14707.
- Browne, D.; Giering, M.; Prestwich, S. PulseNetOne: Fast Unsupervised Pruning of Convolutional Neural Networks for Remote Sensing. Remote Sens. 2020, 12, 1092.
- Kang, J.; Fernandez-Beltran, R.; Ye, Z.; Tong, X.; Plaza, A. Deep Metric Learning Based on Scalable Neighborhood Components for Remote Sensing Scene Characterization. IEEE Trans. Geosci. Remote Sens. 2020, 58, 8905–8918.
- Xia, G.-S.; Yang, W.; Delon, J.; Gousseau, Y.; Sun, H. Structural High-Resolution Satellite Image Indexing. In Proceedings of the ISPRS TC VII Symposium—100 Years ISPRS, Vienna, Austria, 5–7 July 2010; pp. 298–303.
- Zhao, L.; Tang, P.; Huo, L. Land-Use Scene Classification Using a Concentric Circle-Structured Multiscale Bag-of-Visual-Words Model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 4620–4631.
- Rußwurm, M.; Wang, S.; Korner, M.; Lobell, D. Meta-learning for few-shot land cover classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 200–201.
- Alajaji, D.; Alhichri, H.S.; Ammour, N.; Alajlan, N. Few-Shot Learning for Remote Sensing Scene Classification. In Proceedings of the Mediterranean and Middle-East Geoscience and Remote Sensing Symposium (M2GARSS), Tunis, Tunisia, 9–11 March 2020; pp. 81–84.
- Li, L.; Han, J.; Yao, X.; Cheng, G.; Guo, L. DLA-MatchNet for Few-Shot Remote Sensing Image Scene Classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 7844–7853.
- Kim, J.; Chi, M. SAFFNet: Self-Attention-Based Feature Fusion Network for Remote Sensing Few-Shot Scene Classification. Remote Sens. 2021, 13, 2532.
- Doersch, C.; Gupta, A.; Zisserman, A. CrossTransformers: Spatially-aware few-shot transfer. In Proceedings of the Annual Conference on Neural Information Processing Systems 2020 (NeurIPS 2020), Virtual, 6–12 December 2020; pp. 21981–21993.
- Snell, J.; Swersky, K.; Zemel, R.S. Prototypical networks for few-shot learning. In Proceedings of the Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 4077–4087.
- Ye, H.-J.; Hu, H.; Zhan, D.-C.; Sha, F. Few-shot learning via embedding adaptation with set-to-set functions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 8808–8817.
- Zhang, C.; Cai, Y.; Lin, G.; Shen, C. DeepEMD: Few-shot image classification with differentiable earth mover's distance and structured classifiers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 12203–12213.
- Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1126–1135.
- Robbins, H.E. A Stochastic Approximation Method. Ann. Math. Stat. 1951, 22, 400–407.
- Nichol, A.; Schulman, J. Reptile: A Scalable Metalearning Algorithm. arXiv 2018, arXiv:1803.02999.
- Jiang, X.; Havaei, M.; Varno, F.; Chartrand, G.; Chapados, N.; Matwin, S. Learning to learn with conditional class dependencies. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
- Vinyals, O.; Blundell, C.; Lillicrap, T.P.; Kavukcuoglu, K.; Wierstra, D. Matching Networks for One Shot Learning. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 3630–3638.
- Sung, F.; Yang, Y.; Zhang, L.; Xiang, T.; Torr, P.H.S.; Hospedales, T.M. Learning to Compare: Relation Network for Few-Shot Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1199–1208.
- Oreshkin, B.N.; López, P.R.; Lacoste, A. TADAM: Task dependent adaptive metric for improved few-shot learning. In Proceedings of the Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; pp. 721–731.
- Chen, W.Y.; Liu, Y.C.; Kira, Z.; Wang, Y.; Huang, J.B. A Closer Look at Few-shot Classification. arXiv 2019, arXiv:1904.04232.
- Hilliard, N.; Phillips, L.; Howland, S.; Yankov, A.; Hodas, N.O. Few-Shot Learning with Metric-Agnostic Conditional Embeddings. arXiv 2018, arXiv:1802.04376.
- Rusu, A.A.; Rao, D.; Sygnowski, J.; Vinyals, O.; Pascanu, R.; Osindero, S.; Hadsell, R. Meta-Learning with Latent Embedding Optimization. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
- Koch, G.; Zemel, R.; Salakhutdinov, R. Siamese neural networks for one-shot image recognition. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1–8.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
- Dosovitskiy, A.; Fischer, P.; Springenberg, J.T.; Riedmiller, M.A.; Brox, T. Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 1734–1747.
- Gidaris, S.; Singh, P.; Komodakis, N. Unsupervised representation learning by predicting image rotations. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
- Zhang, R.; Isola, P.; Efros, A.A. Colorful image colorization. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 649–666.
- Pathak, D.; Krähenbühl, P.; Donahue, J.; Darrell, T.; Efros, A.A. Context Encoders: Feature Learning by Inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2536–2544.
- Noroozi, M.; Favaro, P. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 69–84.
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R.B. Momentum Contrast for Unsupervised Visual Representation Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 9726–9735.
- Tian, Y.; Sun, C.; Poole, B.; Krishnan, D.; Schmid, C.; Isola, P. What makes for good views for contrastive learning? In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–12 December 2020; pp. 6827–6839.
- Van den Oord, A.; Li, Y.; Vinyals, O. Representation Learning with Contrastive Predictive Coding. arXiv 2018, arXiv:1807.03748.
- Tian, Y.; Krishnan, D.; Isola, P. Contrastive Representation Distillation. arXiv 2020, arXiv:1910.10699.
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 1597–1607.
- Gidaris, S.; Bursuc, A.; Komodakis, N.; Pérez, P.; Cord, M. Boosting Few-Shot Visual Learning with Self-Supervision. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 8058–8067.
- Su, J.-C.; Maji, S.; Hariharan, B. When does self-supervision improve few-shot learning? In Proceedings of the European Conference on Computer Vision, Virtual, 23–28 August 2020; pp. 645–666.
- Tian, Y.; Krishnan, D.; Isola, P. Contrastive Multiview Coding. In Proceedings of the European Conference on Computer Vision, Virtual, 23–28 August 2020; pp. 776–794.
- Wu, Z.; Xiong, Y.; Yu, S.X.; Lin, D. Unsupervised Feature Learning via Non-parametric Instance Discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3733–3742.
- Hinton, G.E.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531.
- Han, S.; Mao, H.; Dally, W.J. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. arXiv 2016, arXiv:1510.00149.
- Romero, A.; Ballas, N.; Kahou, S.E.; Chassang, A.; Gatta, C.; Bengio, Y. FitNets: Hints for thin deep nets. arXiv 2014, arXiv:1412.6550.
- Zagoruyko, S.; Komodakis, N. Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
- Liu, Y.; Cao, J.; Li, B.; Yuan, C.; Hu, W.; Li, Y.; Duan, Y.-F. Knowledge Distillation via Instance Relationship Graph. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 7089–7097.
- Wen, Y.; Zhang, K.; Li, Z.; Qiao, Y. A discriminative feature learning approach for deep face recognition. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 499–515.
- Zhang, L.; Song, J.; Gao, A.; Chen, J.; Bao, C.; Ma, K. Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 3712–3721.
- Bai, T.; Chen, J.; Zhao, J.; Wen, B.; Jiang, X.; Kot, A. Feature distillation with guided adversarial contrastive learning. arXiv 2020, arXiv:2009.09922.
- Zhou, W.; Newsam, S.; Li, C.; Shao, Z. PatternNet: A benchmark dataset for performance evaluation of remote sensing image retrieval. ISPRS J. Photogramm. Remote Sens. 2018, 145, 197–209.
- Cheng, G.; Han, J.; Lu, X. Remote sensing image scene classification: Benchmark and state of the art. Proc. IEEE 2017, 105, 1865–1883.
- Xiao, Z.; Long, Y.; Li, D.; Wei, C.; Tang, G.; Liu, J. High-Resolution Remote Sensing Image Retrieval Based on CNNs from a Dimensional Perspective. Remote Sens. 2017, 9, 725.
- Gallego, A.-J.; Pertusa, A.; Gil, P. Automatic Ship Classification from Optical Aerial Images with Convolutional Neural Networks. Remote Sens. 2018, 10, 511.
- Bachman, P.; Hjelm, R.D.; Buchwalter, W. Learning representations by maximizing mutual information across views. In Proceedings of the Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 15535–15545.
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456.
- Ba, J.; Kiros, J.R.; Hinton, G.E. Layer Normalization. arXiv 2016, arXiv:1607.06450.
- Tian, Y.; Wang, Y.; Krishnan, D.; Tenenbaum, J.B.; Isola, P. Rethinking few-shot image classification: A good embedding is all you need? In Proceedings of the European Conference on Computer Vision, Virtual, 23–28 August 2020; pp. 266–282.
- Saikia, T.; Brox, T.; Schmid, C. Optimized generic feature learning for few-shot classification across domains. arXiv 2020, arXiv:2001.07926.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Kipf, T.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2017, arXiv:1609.02907.
- Kang, D.; Kwon, H.; Min, J.; Cho, M. Relational Embedding for Few-Shot Classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 8822–8833.
- Cubuk, E.D.; Zoph, B.; Shlens, J.; Le, Q.V. RandAugment: Practical automated data augmentation with a reduced search space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 3008–3017.
- Balestriero, R.; Bottou, L.; LeCun, Y. The Effects of Regularization and Data Augmentation are Class Dependent. arXiv 2022, arXiv:2204.03632.
| Samples | Categories | Number |
|---|---|---|
| From MASATI | 7 (land, coast, ocean, ship, multiple ships, etc.) | 7389 (11.9%) |
| From Miniimagenet | 91 (dog, lion, fish, warship, trash can, etc.) | 54,600 (88.1%) |
| Total | 98 | 61,989 |
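Given that only about 12% of the pool comes from MASATI, five-way episodes drawn from the mixed 98-class pool naturally contain few or no remote sensing classes, as described with Figure 5. The following is a hedged sketch of such an episodic sampler; the helper name `sample_episode` and the query-set size of 15 are illustrative assumptions, not the paper's stated protocol.

```python
# Hypothetical N-way K-shot episode sampler over the mixed dataset.
import random
from collections import defaultdict

def sample_episode(labels, n_way=5, k_shot=1, n_query=15, rng=random):
    """labels: list where labels[i] is the class of sample i (98 classes here).
    Returns (support, query) index lists for one meta-learning episode."""
    by_class = defaultdict(list)
    for idx, c in enumerate(labels):
        by_class[c].append(idx)
    classes = rng.sample(sorted(by_class), n_way)      # draw 5 of 98 classes
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += picked[:k_shot]                     # K labeled shots
        query += picked[k_shot:]                       # held-out queries
    return support, query
```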
| Layers | Input Size | Output Size | ConvBlocks (Kernel, Output Channels, Stride, Padding, Numbers) |
|---|---|---|---|
| Conv1 | 128 × 128 | 62 × 62 | [5 × 5, 192, 2, 2]; [3 × 3, 192, 1, 0] |
| ResBlock2 | 62 × 62 | 30 × 30 | [4 × 4, 384, 2, 0]; [1 × 1, 384, 1, 0] × 15 |
| ResBlock3 | 30 × 30 | 14 × 14 | [4 × 4, 768, 2, 0]; [1 × 1, 768, 1, 0] × 15 |
| ResBlock4 | 14 × 14 | 7 × 7 | [2 × 2, 1536, 2, 0]; [1 × 1, 1536, 1, 0] × 15 |
| ResBlock5 | 7 × 7 | 5 × 5 | [3 × 3, 1536, 1, 0]; [1 × 1, 1536, 1, 0] × 15 |
| ResBlock6 | 5 × 5 | 3 × 3 | [3 × 3, 1536, 1, 0]; [1 × 1, 1536, 1, 0] × 15 |
| Conv7 | 3 × 3 | 1 × 1 | [3 × 3, 1536, 1, 0]; [1 × 1, 1536, 1, 0] |
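As a sanity check on the table, the PyTorch sketch below reproduces the listed spatial sizes (a 128 × 128 input reduced to a 1 × 1 × 1536 embedding). The residual wiring inside each ResBlock and the BatchNorm/ReLU placement are our assumptions; the table only fixes the kernel, channel, stride, padding, and repetition counts.

```python
# Sketch of an embedding network matching the spatial sizes in the table.
import torch
import torch.nn as nn

def conv(in_c, out_c, k, s, p):
    return nn.Sequential(nn.Conv2d(in_c, out_c, k, s, p),
                         nn.BatchNorm2d(out_c), nn.ReLU(inplace=True))

class ResBlock(nn.Module):
    """One reducing conv followed by 15 residual 1 x 1 convs (wiring assumed)."""
    def __init__(self, in_c, out_c, k, s):
        super().__init__()
        self.reduce = conv(in_c, out_c, k, s, 0)
        self.pointwise = nn.ModuleList(conv(out_c, out_c, 1, 1, 0) for _ in range(15))

    def forward(self, x):
        x = self.reduce(x)
        for pw in self.pointwise:
            x = x + pw(x)          # residual 1 x 1 refinement
        return x

embed = nn.Sequential(
    conv(3, 192, 5, 2, 2), conv(192, 192, 3, 1, 0),       # Conv1: 128 -> 62
    ResBlock(192, 384, 4, 2),                              # 62 -> 30
    ResBlock(384, 768, 4, 2),                              # 30 -> 14
    ResBlock(768, 1536, 2, 2),                             # 14 -> 7
    ResBlock(1536, 1536, 3, 1),                            # 7 -> 5
    ResBlock(1536, 1536, 3, 1),                            # 5 -> 3
    conv(1536, 1536, 3, 1, 0), conv(1536, 1536, 1, 1, 0),  # Conv7: 3 -> 1
)

x = torch.randn(2, 3, 128, 128)
assert embed(x).shape == (2, 1536, 1, 1)
```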
| Method | Embedding Net | One-Shot Five-Way Accuracy | Five-Shot Five-Way Accuracy |
|---|---|---|---|
| MAML [24] | ConvNet | 53.47 ± 0.82 | 66.46 ± 0.12 |
| RelationNet [29] | ConvNet | 44.21 ± 0.14 | 49.22 ± 0.26 |
| ProtoNet [21] | ResNet18 | 49.27 ± 0.66 | 65.16 ± 0.11 |
| FEAT [22] | ResNet18 | 47.45 ± 0.13 | 57.38 ± 0.22 |
| SemiProtoFEAT [22] | ResNet18 | 48.26 ± 0.22 | 59.52 ± 0.24 |
| GCN [68] | GCN | 46.91 ± 0.20 | 59.96 ± 0.21 |
| RENet [69] | ResNet18 | 64.11 ± 0.46 | 82.32 ± 0.32 |
| SSMR (ours) | Self-supervised network | 66.52 ± 0.20 | 83.26 ± 0.49 |
| Method | 10-Shot Accuracy | 15-Shot Accuracy | 20-Shot Accuracy |
|---|---|---|---|
| FEAT [22] | 57.95 ± 0.20 | 59.06 ± 0.76 | 60.71 ± 0.73 |
| MAML [24] | 71.04 ± 0.15 | 73.93 ± 0.25 | 73.44 ± 0.37 |
| ProtoNet [21] | 73.18 ± 0.20 | 75.57 ± 0.31 | 75.46 ± 0.14 |
| SSMR (ours) | 85.96 ± 0.12 | 86.48 ± 0.11 | 88.21 ± 0.10 |
| Local Feature | One-Shot Five-Way Accuracy | Five-Shot Five-Way Accuracy |
|---|---|---|
| 1 × 1 | 64.33 ± 0.20 | 81.67 ± 0.14 |
| 5 × 5 | 65.17 ± 0.15 | 82.14 ± 0.16 |
| 7 × 7 | 66.52 ± 0.20 | 83.26 ± 0.49 |
| Data Augmentation | One-Shot Five-Way Accuracy | Five-Shot Five-Way Accuracy |
|---|---|---|
| ColorJitter | 64.76 ± 0.15 | 81.83 ± 0.30 |
| RandomGrayscale | 64.60 ± 0.23 | 82.38 ± 0.45 |
| Resize + CenterCrop | 65.33 ± 0.20 | 82.67 ± 0.14 |
| Strategy in SimCLR [45] | 63.81 ± 0.43 | 80.19 ± 0.67 |
| RandAugment [70] | 62.59 ± 0.68 | 79.83 ± 0.11 |
| Optimizer | One-Shot Five-Way Accuracy | Five-Shot Five-Way Accuracy |
|---|---|---|
| Adam [67] | 59.67 ± 0.19 | 78.04 ± 0.35 |
| SGD [25] | 66.52 ± 0.20 | 83.26 ± 0.49 |
| Embedding Network | One-Shot Five-Way Accuracy | Five-Shot Five-Way Accuracy |
|---|---|---|
| ConvNet | 34.50 ± 0.12 | 40.59 ± 0.35 |
| ResNet18 | 49.33 ± 0.26 | 62.77 ± 0.51 |
| Self-supervised network | 66.52 ± 0.20 | 83.26 ± 0.49 |
| Method | One-Shot Five-Way Accuracy | Five-Shot Five-Way Accuracy |
|---|---|---|
| SMR (no self-distillation) | 65.61 ± 0.56 | 81.91 ± 0.27 |
| SSMR | 66.52 ± 0.20 | 83.26 ± 0.49 |