Enhancing Rooftop Photovoltaic Segmentation Using Spatial Feature Reconstruction and Multi-Scale Feature Aggregation
Figure 1. An illustrative diagram depicting rooftop photovoltaic panels at different spatial resolutions. The top row shows the original images, while the bottom row displays the corresponding segmentation labels.
Figure 2. The proposed network consists of three key components: the encoding phase, skip connections, and the decoding phase. The skip connections incorporate the Spatial Feature Reconstruction (SFR) module to strengthen feature extraction. To enlarge the receptive field, Multi-scale Feature Aggregation (MFA) is applied in the lower layers using parallel dilated convolutions. During the decoding phase, high-level semantic features from the MFA module are merged with low-level features to improve feature fusion.
Figure 3. The structural diagram of the Res2Net module.
Figure 4. The structural diagram of the proposed Spatial Feature Reconstruction module.
Figure 5. The training loss curves of the different methods.
Figure 6. The mIoU, mAcc, and mFscore performance metrics of the different methods.
Figure 7. Visualization results obtained by various methods on the Rooftop PV dataset. From left to right: (a) raw image; (b) ground truth; (c) ours; (d) U-Net [31]; (e) DeepLabv3+ [32]; (f) HRNet [33]; (g) Mask2Former [38]; (h) PSPNet [34]; (i) SegFormer [35]; (j) Beit [36]; (k) MaskFormer [37].
Figure 8. Visualized feature maps of the proposed modules, where (a,b) show the original image and the label map, respectively; (c) shows the feature map of the baseline; (d–f) show the feature maps after applying Res2Net, MFA, and SFR, respectively; (g) shows the feature map after combining MFA and SFR; and (h) shows the feature map with all modules combined.
Figure 9. Failure cases of our method in high-resolution scenarios.
Abstract
1. Introduction
1. This study introduces a new Multi-scale Feature Aggregation network designed to better capture PV panel information in remote sensing imagery through feature extraction at different scales. The architecture integrates features from multiple levels, fully exploiting the rich information in the images and enhancing the model's ability to recognize PV panels of varying sizes and shapes. This multi-scale design not only increases the model's flexibility in adapting to diverse ground conditions but also provides more comprehensive support for subsequent feature analysis.
2. Furthermore, this study proposes a new Spatial Feature Reconstruction method that captures global contextual information in the horizontal and vertical directions by reconstructing spatial features according to the characteristic geometry of PV panels. The method focuses on modeling the rectangular key regions in the imagery, allowing the model to locate the positions and shape features of PV panels more accurately. This reconstruction not only improves the accuracy of PV panel segmentation but also provides robust support for feature understanding in complex scenes.
3. Finally, on a publicly available rooftop PV segmentation dataset, the proposed method demonstrates significant performance advantages over the comparison methods. Systematic experiments validate that combining the Multi-scale Feature Aggregation network with the Spatial Feature Reconstruction approach substantially improves both the accuracy and the robustness of the model, laying a solid foundation for future applications in the automatic recognition and monitoring of PV panels.
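The multi-scale aggregation described in contribution 1 relies on parallel dilated convolutions to enlarge the receptive field (as summarized in the architecture description of Figure 2). A minimal PyTorch sketch of that pattern follows; the branch count, dilation rates, and layer names are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MultiScaleFeatureAggregation(nn.Module):
    """Sketch of an MFA-style block: parallel dilated 3x3 convolutions whose
    outputs are concatenated and fused by a 1x1 convolution. Dilation rates
    (1, 2, 4, 8) are assumed for illustration."""

    def __init__(self, channels, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                # padding == dilation keeps the spatial size unchanged
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        # Each branch sees a different receptive field; concatenation keeps
        # all scales, and the 1x1 conv projects back to the input width.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```

Because every branch preserves the feature-map resolution, the block can be dropped into the lower layers of an encoder without changing the surrounding tensor shapes.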
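Contribution 2's directional context modeling, which captures global information along the horizontal and vertical axes of rectangular PV panels, can be sketched with strip-pooling-style gating. This is an assumed reconstruction of the idea, not the paper's published SFR implementation; all layer names here are hypothetical.

```python
import torch
import torch.nn as nn

class SpatialFeatureReconstruction(nn.Module):
    """Sketch of direction-aware context modeling: average-pool the feature
    map into a vertical and a horizontal strip, refine each strip, broadcast
    both back, and gate the input with the combined context."""

    def __init__(self, channels):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # collapse width -> (N, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # collapse height -> (N, C, 1, W)
        self.conv_h = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0), bias=False)
        self.conv_w = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1), bias=False)
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        # Vertical strip: global context along each column, refined along height.
        sh = self.conv_h(self.pool_h(x)).expand(n, c, h, w)
        # Horizontal strip: global context along each row, refined along width.
        sw = self.conv_w(self.pool_w(x)).expand(n, c, h, w)
        # Gate the input with the combined directional context.
        return x * torch.sigmoid(self.fuse(sh + sw))
```

The gating favors pixels that are supported by strong responses along both image axes, which matches the rectangular key regions the SFR module is said to model.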
2. Related Work
3. Methods
3.1. Overview of the Proposed Method
3.2. Res2Net Backbone
3.3. Spatial Feature Reconstruction
3.4. Multi-Scale Feature Aggregation
3.5. Loss Function
4. Experimental Results and Analysis
4.1. Implementation Details
4.2. Datasets
4.3. Comparison Methods
(1) U-Net [31]: U-Net is a deep learning framework widely used for image segmentation. It features an encoder–decoder structure with skip connections, which effectively combine high-resolution spatial features with low-resolution contextual information. This design makes it highly effective in applications such as medical image segmentation.
(2) DeepLabv3+ [32]: DeepLabv3+ is an image segmentation model that leverages atrous convolution and pyramid pooling to extract multi-scale contextual features, enhancing segmentation accuracy by capturing both fine details and broader contextual relationships.
(3) HRNet [33]: The High-Resolution Network (HRNet) maintains high-resolution representations throughout its architecture, enabling precise multi-scale feature integration. It has demonstrated strong performance in image segmentation and object detection tasks.
(4) PSPNet [34]: The Pyramid Scene Parsing Network (PSPNet) introduces a pyramid pooling module to effectively capture global context information at various scales, which enhances its segmentation performance in complex and diverse scenes.
(5) SegFormer [35]: SegFormer is an efficient image segmentation model that integrates transformer-based and convolutional approaches. Its hybrid architecture balances accuracy and computational efficiency, achieving high performance across various segmentation tasks.
(6) Beit [36]: Beit is a vision transformer-based model designed for image segmentation and other vision tasks. It leverages masked image modeling pretraining to achieve high accuracy, particularly on large-scale datasets, though at the cost of increased computational complexity.
(7) MaskFormer [37]: MaskFormer is a transformer-based model that unifies segmentation tasks by predicting a set of binary masks with associated labels, effectively bridging the gap between instance, semantic, and panoptic segmentation while maintaining high accuracy across diverse segmentation scenarios.
(8) Mask2Former [38]: Mask2Former is a versatile segmentation framework that leverages a transformer architecture to generate masks for both object-level and pixel-level segmentation tasks in a unified manner, achieving high performance on various benchmark datasets.
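The encoder–decoder-with-skip-connections pattern shared by U-Net and several of these baselines can be illustrated with a minimal PyTorch sketch. This is a toy single-skip model for exposition only; layer widths and depths are arbitrary assumptions, not any baseline's actual architecture.

```python
import torch
import torch.nn as nn

class MiniUNet(nn.Module):
    """Toy encoder-decoder with one skip connection, illustrating how
    high-resolution encoder detail is fused with upsampled low-resolution
    context before the final prediction."""

    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        # 32 input channels: 16 from the decoder path + 16 from the skip.
        self.dec = nn.Conv2d(32, 1, 3, padding=1)

    def forward(self, x):
        e = self.enc(x)                        # high-resolution features
        m = self.mid(self.down(e))             # low-resolution context
        u = self.up(m)                         # back to input resolution
        return self.dec(torch.cat([u, e], 1))  # skip connection: fuse by concat
```

Real U-Net variants repeat this down/up pattern over several stages, but the fusion step at each stage follows the same concatenate-then-convolve idea.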
4.4. Evaluation Metrics
4.5. Experimental Results
Ablation Studies
5. Failure Cases and Limitations
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- SolarPower Europe. Global Market Outlook for Solar Power 2023–2027; Technique Report; SolarPower Europe: Brussels, Belgium, 2023. [Google Scholar]
- Qi, Q.; Zhao, J.; Tan, Z.; Tao, K.; Zhang, X.; Tian, Y. Development assessment of regional rooftop photovoltaics based on remote sensing and deep learning. Appl. Energy 2024, 375, 124172. [Google Scholar] [CrossRef]
- Zhu, R.; Guo, D.; Wong, M.S.; Qian, Z.; Chen, M.; Yang, B.; Chen, B.; Zhang, H.; You, L.; Heo, J.; et al. Deep solar PV refiner: A detail-oriented deep learning network for refined segmentation of photovoltaic areas from satellite imagery. Int. J. Appl. Earth Obs. Geoinf. 2023, 116, 103134. [Google Scholar] [CrossRef]
- Zhong, Q.; Nelson, J.R.; Tong, D.; Grubesic, T.H. A spatial optimization approach to increase the accuracy of rooftop solar energy assessments. Appl. Energy 2023, 316, 119128. [Google Scholar] [CrossRef]
- Lu, R.; Wang, N.; Zhang, Y.; Lin, Y.; Wu, W.; Shi, Z. Extraction of agricultural fields via DASFNet with dual attention mechanism and multi-scale feature fusion in South Xinjiang, China. Remote Sens. 2022, 14, 2253. [Google Scholar] [CrossRef]
- Qian, Z.; Chen, M.; Zhong, T.; Zhang, F.; Zhu, R.; Zhang, Z.; Zhang, K.; Sun, Z.; Lü, G. Deep roof refiner: A detail-oriented deep learning network for refined delineation of roof structure lines using satellite imagery. Int. J. Appl. Earth Obs. Geoinf. 2022, 107, 102680. [Google Scholar] [CrossRef]
- Li, L.; Lu, N.; Qin, J. Joint-task learning framework with scale adaptive and position guidance modules for improved household rooftop photovoltaic segmentation in remote sensing image. Appl. Energy 2025, 377, 124521. [Google Scholar] [CrossRef]
- Di Giovanni, G.; Rotilio, M.; Giusti, L.; Ehtsham, M. Exploiting building information modeling and machine learning for optimizing rooftop photovoltaic systems. Energy Build. 2024, 313, 114250. [Google Scholar] [CrossRef]
- Satpathy, P.R.; Ramacharamurthy, V.K.; Roslan, M.F.; Motahhir, S. An adaptive architecture for strategic Enhancement of energy yield in shading sensitive Building-Applied Photovoltaic systems under Real-Time environments. Energy Build. 2024, 324, 114877. [Google Scholar] [CrossRef]
- Aljafari, B.; Satpathy, P.R.; Thanikanti, S.B.; Nwulu, N. Supervised classification and fault detection in grid-connected PV systems using 1D-CNN: Simulation and real-time validation. Energy Rep. 2024, 12, 2156–2178. [Google Scholar] [CrossRef]
- Sulaiman, M.H.; Jadin, M.S.; Mustaffa, Z.; Daniyal, H.; Azlan, M.N.M. Short-term forecasting of rooftop retrofitted photovoltaic power generation using machine learning. J. Build. Eng. 2024, 94, 109948. [Google Scholar] [CrossRef]
- Malof, J.M.; Hou, R.; Collins, L.M.; Bradbury, K.; Newell, R. Automatic solar photovoltaic panel detection in satellite imagery. In Proceedings of the 2015 International Conference on Renewable Energy Research and Applications (ICRERA), Palermo, Italy, 22–25 November 2015; pp. 1428–1431. [Google Scholar]
- Yuan, J.; Yang, H.H.L.; Omitaomu, O.A.; Bhaduri, B.L. Large-scale solar panel mapping from aerial images using deep convolutional networks. In Proceedings of the 2016 IEEE International Conference on Big Data (Big Data), Washington, DC, USA, 5–8 December 2016; pp. 2703–2708. [Google Scholar]
- Golovko, V.; Bezobrazov, S.; Kroshchanka, A.; Sachenko, A.; Komar, M.; Karachka, A. Convolutional neural network based solar photovoltaic panel detection in satellite photos. In Proceedings of the 2017 9th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), Bucharest, Romania, 21–23 September 2017; pp. 14–19. [Google Scholar]
- Yan, L.; Zhu, R.; Kwan, M.P.; Luo, W.; Wang, D.; Zhang, S. Estimation of urban-scale photovoltaic potential: A deep learning-based approach for constructing three-dimensional building models from optical remote sensing imagery. Sustain. Cities Soc. 2023, 93, 104515. [Google Scholar] [CrossRef]
- Nasrallah, H.; Samhat, A.E.; Shi, Y.; Zhu, X.X.; Faour, G.; Ghandour, A.J. Lebanon solar rooftop potential assessment using buildings segmentation from aerial images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 18. [Google Scholar] [CrossRef]
- Cui, W.; Peng, X.; Yang, J.; Yuan, H.; Lai, L.L. Evaluation of rooftop photovoltaic power generation potential based on deep learning and high-definition map image. Energies 2023, 16, 6563. [Google Scholar] [CrossRef]
- Krapf, S.; Kemmerzell, N.; Khawaja Haseeb Uddin, S.; Hack Vazquez, M.; Netzler, F.; Lienkamp, M. Towards scalable economic photovoltaic potential analysis using aerial images and deep learning. Energies 2021, 14, 3800. [Google Scholar] [CrossRef]
- Lin, S.; Zhang, C.; Ding, L.; Zhang, J.; Liu, X.; Chen, G.; Wang, S.; Chai, J. Accurate recognition of building rooftops and assessment of long-term carbon emission reduction from rooftop solar photovoltaic systems fusing GF-2 and multi-source data. Remote Sens. 2022, 14, 3144. [Google Scholar] [CrossRef]
- Chen, S.; Shi, W.; Zhou, M.; Zhang, M.; Xuan, Z. CGSANet: A contour-guided and local structure-aware encoder–decoder network for accurate building extraction from very high-resolution remote sensing imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 15, 1526–1542. [Google Scholar] [CrossRef]
- Khan, S.D.; Alarabi, L.; Basalamah, S. An encoder–decoder deep learning framework for building footprints extraction from aerial imagery. Arab. J. Sci. Eng. 2023, 48, 1273–1284. [Google Scholar] [CrossRef]
- Shao, Z.; Tang, P.; Wang, Z.; Saleem, N.; Yam, S.; Sommai, C. BRRNet: A fully convolutional neural network for automatic building extraction from high-resolution remote sensing images. Remote Sens. 2020, 12, 1050. [Google Scholar] [CrossRef]
- Chen, J.; Jiang, Y.; Luo, L.; Gong, W. ASF-Net: Adaptive screening feature network for building footprint extraction from remote-sensing images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13. [Google Scholar] [CrossRef]
- Mei, J.; Li, R.J.; Gao, W.; Cheng, M.M. CoANet: Connectivity attention network for road extraction from satellite imagery. IEEE Trans. Image Process. 2021, 30, 8540–8552. [Google Scholar] [CrossRef] [PubMed]
- Hou, X.; Wang, B.; Hu, W.; Yin, L.; Wu, H. SolarNet: A deep learning framework to map solar power plants in China from satellite imagery. arXiv 2019, arXiv:1912.03685. [Google Scholar]
- Costa, M.V.C.V.D.; Carvalho, O.L.F.D.; Orlandi, A.G.; Hirata, I.; Albuquerque, A.O.D.; Silva, F.V.E.; Guimarães, R.F.; Gomes, R.A.T.; Júnior, O.A.D.C. Remote sensing for monitoring photovoltaic solar plants in Brazil using deep semantic segmentation. Energies 2021, 14, 2960. [Google Scholar] [CrossRef]
- Wang, J.; Chen, X.; Jiang, W.; Hua, L.; Liu, J.; Sui, H. PVNet: A novel semantic segmentation model for extracting high-quality photovoltaic panels in large-scale systems from high-resolution remote sensing imagery. Int. J. Appl. Earth Obs. Geoinf. 2023, 119, 103309. [Google Scholar] [CrossRef]
- Gao, S.H.; Cheng, M.M.; Zhao, K.; Zhang, X.Y.; Yang, M.H.; Torr, P. Res2Net: A New Multi-Scale Backbone Architecture. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 652–662. [Google Scholar] [CrossRef]
- Li, J.; Wen, Y.; He, L. Scconv: Spatial and channel reconstruction convolution for feature redundancy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 6153–6162. [Google Scholar]
- Zhou, Q.; Qu, Z.; Li, Y.X. Tunnel crack detection with linear seam based on mixed attention and multiscale feature fusion. IEEE Trans. Instrum. Meas. 2022, 71, 1–11. [Google Scholar] [CrossRef]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–808. [Google Scholar]
- Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep high-resolution representation learning for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 40, 3349–3364. [Google Scholar] [CrossRef] [PubMed]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
- Bao, H.; Dong, L.; Piao, S.; Wei, F. Beit: Bert pre-training of image transformers. arXiv 2021, arXiv:2106.08254. [Google Scholar]
- Cheng, B.; Schwing, A.; Kirillov, A. Per-pixel classification is not all you need for semantic segmentation. In Proceedings of the Advances in Neural Information Processing Systems, Online, 6–14 December 2021; pp. 17864–17875. [Google Scholar]
- Cheng, B.; Misra, I.; Schwing, A.G.; Kirillov, A.; Girdhar, R. Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1290–1299. [Google Scholar]
| Category | Item | Configuration |
|---|---|---|
| Hardware | GPU | RTX 4090 × 1 |
| Environment Config. | Python | 3.8.16 |
| | CUDA | 11.3 |
| | PyTorch | 1.11.0 |
| | MMSegmentation | 0.25.0 |
| Training Config. | Learning rate | 0.0001 |
| | Weight decay | 0.01 |
| | Loss function | BCE loss |
| | Optimizer | AdamW |
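Since training was done with MMSegmentation 0.25.0, the settings above would typically appear as a config fragment like the following. This is a hypothetical reconstruction mirroring the table, using mmseg's standard field names; the paper's actual config file is not published here.

```python
# Hypothetical MMSegmentation-style (v0.25.x) config fragment matching the
# implementation table: AdamW, lr 1e-4, weight decay 0.01, BCE loss.
optimizer = dict(type='AdamW', lr=1e-4, weight_decay=0.01)
optimizer_config = dict()

# Binary cross-entropy is expressed in mmseg as CrossEntropyLoss with
# use_sigmoid=True on the decode head.
loss_decode = dict(type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)
```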
Model | Precision | Recall | F1-Score | mAcc | mIoU | mDice |
---|---|---|---|---|---|---|
U-Net [31] | 92.96 | 93.26 | 93.11 | 93.26 | 87.62 | 93.11 |
DeepLabv3+ [32] | 94.81 | 95.99 | 95.39 | 95.99 | 91.43 | 95.39 |
HRNet [33] | 94.77 | 96.36 | 95.54 | 96.36 | 91.70 | 95.54 |
PSPNet [34] | 96.12 | 95.76 | 95.94 | 95.76 | 92.39 | 95.94 |
SegFormer [35] | 93.58 | 93.70 | 93.64 | 93.70 | 88.49 | 93.64 |
Beit [36] | 81.17 | 93.52 | 85.93 | 93.52 | 76.94 | 85.93 |
MaskFormer [37] | 96.10 | 96.50 | 96.30 | 96.50 | 93.02 | 96.30 |
Mask2Former [38] | 95.52 | 97.10 | 96.29 | 97.10 | 93.01 | 96.29 |
Ours | 96.56 | 97.10 | 97.75 | 98.39 | 94.15 | 96.91 |
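The precision, recall, F1-score, IoU, and Dice figures in the table above follow the standard confusion-matrix definitions for binary segmentation. A minimal NumPy reference sketch (not the paper's evaluation code, which presumably relies on MMSegmentation's built-in metrics):

```python
import numpy as np

def binary_seg_metrics(pred, gt):
    """Confusion-matrix metrics for a pair of binary masks. Assumes at least
    one positive prediction and one positive label, so denominators are
    nonzero in this illustration."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    dice = 2 * tp / (2 * tp + fp + fn)  # equals F1 for binary masks
    return dict(precision=precision, recall=recall, f1=f1, iou=iou, dice=dice)

# Tiny worked example: one true positive, one false positive, no false negative.
pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [0, 0]])
m = binary_seg_metrics(pred, gt)
# precision 0.5, recall 1.0, IoU 0.5
```

Note that Dice and F1 coincide for binary masks, which is why the F1-Score and mDice columns track each other so closely in the table.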
Method | FLOPs | Params | FPS |
---|---|---|---|
U-Net [31] | 38.35 G | 28.99 M | 28.35 |
DeepLabv3+ [32] | 48.67 G | 60.21 M | 15.42 |
HRNet [33] | 17.94 G | 65.85 M | 12.09 |
PSPNet [34] | 34.23 G | 46.61 M | 9.48 |
SegFormer [35] | 13.52 G | 37.16 M | 17.97 |
Mask2Former [38] | 125.43 G | 184.52 M | 6.05 |
Beit [36] | 96.27 G | 161.38 M | 6.89 |
MaskFormer [37] | 79.41 G | 148.54 M | 12.61 |
Ours | 42.18 G | 34.25 M | 21.24 |
Ablation results on the Rooftop PV dataset:
Methods | Precision | Recall | F1-Score | mAcc | mIoU | mDice |
---|---|---|---|---|---|---|
Baseline (B) [31] | 92.96 | 93.26 | 93.11 | 93.26 | 87.62 | 93.11 |
B+Res2Net | 93.45 | 93.79 | 93.62 | 94.64 | 92.95 | 93.78 |
B+SFR | 94.06 | 94.67 | 94.36 | 95.42 | 92.38 | 94.24 |
B+MFA | 94.67 | 95.32 | 94.99 | 96.75 | 92.89 | 94.76 |
B+Res2Net+SFR | 95.38 | 95.94 | 95.66 | 97.34 | 93.47 | 95.38 |
B+Res2Net+MFA | 95.77 | 96.14 | 95.95 | 97.72 | 93.58 | 95.78 |
B+MFA+SFR | 96.12 | 97.42 | 96.77 | 98.09 | 93.97 | 96.42 |
Ours | 97.56 | 97.10 | 97.75 | 98.39 | 94.15 | 96.91 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Xiao, Y.; Lin, L.; Ma, J.; Bi, M. Enhancing Rooftop Photovoltaic Segmentation Using Spatial Feature Reconstruction and Multi-Scale Feature Aggregation. Energies 2025, 18, 119. https://doi.org/10.3390/en18010119