Article

Perceptual Quality Assessment for Pansharpened Images Based on Deep Feature Similarity Measure

Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(24), 4621; https://doi.org/10.3390/rs16244621
Submission received: 30 October 2024 / Revised: 3 December 2024 / Accepted: 7 December 2024 / Published: 10 December 2024

Abstract

Pan-sharpening aims to generate high-resolution (HR) multispectral (MS) images by fusing HR panchromatic (PAN) and low-resolution (LR) MS images covering the same area. However, because real HR MS reference images are unavailable, accurately evaluating the quality of a fused image without a reference is challenging. On the one hand, most methods evaluate the fused image with full-reference indices on data simulated under the popular Wald’s protocol; whether such results carry over to full-resolution data fusion remains controversial. On the other hand, the few existing no-reference methods mostly depend on manually crafted features and cannot fully capture the subtle spatial/spectral distortions of the fused image. Therefore, this paper proposes a perceptual quality assessment method based on a deep feature similarity measure. The proposed network comprises a spatial/spectral feature extraction and similarity measure (FESM) branch and an overall evaluation network. The Siamese FESM branch extracts the spatial and spectral deep features and calculates the similarity of each corresponding pair of deep features to obtain the spatial and spectral feature parameters; the overall evaluation network then produces the overall quality assessment. Moreover, we propose to quantify both the overall precision of all the training samples and the variations among different fusion methods in a batch, thereby enhancing the network’s accuracy and robustness. The proposed method was trained and tested on a large subjective evaluation dataset comprising 13,620 fused images. The experimental results indicate that the method is effective and performs competitively.

1. Introduction

Considering the constraints imposed by satellite sensors and imaging conditions [1], including incident radiation energy, on-board storage capacity, and data transmission, a compromise must be made between spatial and spectral resolutions when obtaining remote sensing images. Most satellites provide pairs of HR PAN and LR MS images. Pan-sharpening is the process of merging the spectral information from a MS image with the high spatial information of a PAN image in order to generate a HR MS image.
The pan-sharpening method originated about four decades ago; in the course of this development, numerous fusion methods have been proposed, greatly promoting the development of this technology. Pan-sharpening methods can be divided into two categories: traditional fusion methods and deep learning-based fusion methods. The former category can be further categorized into methods based on component substitution (CS) [2,3], methods based on multiresolution analysis (MRA) [4,5] and methods based on variational optimization (VO) [6,7,8]. In the last decade, deep learning has made remarkable achievements in various fields [9,10]. Building upon this foundation, the deep learning (DL)-based pan-sharpening method exhibits exceptional performance because of its excellent feature extraction capabilities and nonlinear representation ability. Researchers have explored and developed a lot of DL-based pan-sharpening methods [11,12], which have promoted the development in this field. However, how to accurately evaluate the quality of the pansharpened image has always been a challenge [13,14].
Wald [15] proposed that the fused image should meet consistency and synthesis criteria. According to the aforementioned two criteria, quality assessment methods can be mainly categorized into two categories: evaluation at reduced resolution (RR) and evaluation at full resolution (FR). (1) RR evaluation: It is based on the Wald protocol’s synthesis principle, which is extensively employed as a strategy. For RR evaluation, the original PAN and MS images are firstly downsampled according to their spatial resolution ratio before being fused. The original MS image serves as the reference image, and the quality of the fused image is calculated by the existing quantitative indicators [16,17,18]. However, the RR method defaults to the scale invariance hypothesis, and the quality assessment results of simulation experiments may not represent the quality of the fused images under real observation conditions. It is particularly true when the spatial resolution ratio of PAN and MS images is large, as the feasibility of the simulation experiment itself is controversial. (2) FR evaluation: Due to the unattainability of the real HR MS image under real observation conditions, the FR evaluation is a research hotspot of great practical significance. The existing FR quality evaluation of fused images is mainly based on spatial and spectral aspects. The spatial distortion is calculated through a comparison between PAN and fused images, while the spectral distortion is assessed by comparing the fused and original LR MS observations. Traditional FR quality evaluation methods rely on hand-designed spatial and spectral features. For example, Alparone et al. [19] proposed the typical quality with no reference (QNR), which is the most widely used FR quality evaluation index. Based on this method, several variants have been developed [20,21]. Besides QNR-like methods, Meng et al. [1] proposed a FR quality assessment method based on online multivariate Gaussian (MVG). 
However, all the aforementioned quality assessment methods rely on manually crafted features and fail to effectively capture the spatial and spectral distortion features of the fused image.
The quality evaluation results should be correlated with the perceptual distance, which reflects the extent to which human perception distinguishes between the target image and the reference image. Currently, several perception-driven measurement methods have been proposed, including SSIM [16], MSSIM [22] and FSIM [23]. The deep learning community has discovered that the utilization of deep feature layers in convolutional networks holds immense potential for various kinds of tasks [24,25]. Deep feature layers can also be used in the quality assessment of pansharpened images. For instance, Bao et al. [26] proposed a blind FR evaluation method based on multi-stream collaborative learning. However, it simply integrated the spatial/spectral deep features of the pansharpened image without calculating the similarity between the corresponding deep features. As a result, the accuracy of the quality assessment results can be further improved.
The aforementioned FR evaluation methods based on deep learning suffer from two limitations:
Limitation 1: The feature extraction network yields a substantial number of feature parameters, resulting in the presence of redundant features. Moreover, the lack of accurate calculation of distortions of the fusion image possibly degrades the performance of the network, as the extracted deep features may not properly represent the spatial or spectral distortions of the fusion image.
Limitation 2: The loss function of the quality evaluation network roughly and indiscriminately employs $L_{MAE}$ or $L_{MSE}$ for all the samples, assuming that each evaluation score within a batch is independent, which disregards the correlations and variations among the evaluation scores of different fused images. This generally leads to the predicted evaluation score exhibiting a tendency to fluctuate within a narrow range while significantly deviating from the real distribution of the spatial and spectral distortions of the fused images.
In this paper, we propose a perceptual quality assessment method for pan-sharpening. The main contributions are summarized as follows:
(1)
We propose a perceptual quality assessment method based on deep feature similarity measure (DFSM-net), which can quantitatively calculate the similarity of the corresponding pair of deep features to improve the accuracy of feature parameters in reflecting the distortions of the fused image.
(2)
We propose a loss measurement based on the idea of decomposition to quantify the categorical variation in a batch, enabling the network to evaluate the error in a batch from another perspective and thereby enhancing the precision of network prediction scores.

2. Proposed Method

In this paper, we propose a perceptual evaluation method based on a deep feature similarity measure. The proposed network evaluates the spatial and spectral distortions of the fused image separately through a Siamese network and then produces the overall quality assessment. As illustrated in Figure 1, the network comprises a spatial and spectral feature extraction and similarity measure (FESM) branch and an overall evaluation network. The Siamese FESM branch extracts the spatial and spectral deep features and calculates the similarity of each corresponding pair of deep features to obtain the feature parameters; this captures the spatial/spectral distortion more accurately and also reduces feature redundancy. The overall evaluation network then produces the overall quality assessment. We also propose a new loss that quantifies the variations among different fusion methods in a batch to guide the network toward more accurate score prediction. In the pre-training stage, the spatial and spectral FESM branches are pre-trained with the spatial and spectral DMOS, respectively. In the overall training stage, the parameters pre-trained in the first stage are frozen and imported into the overall evaluation network. Subsequently, the fully connected layers are trained with the overall DMOS to output the overall quality evaluation score.

2.1. Feature Extraction and Similarity Measure Block

We propose the Siamese feature extraction and similarity measure (FESM) block to extract the distortion features of fused images. As illustrated in Figure 2, each branch in the FESM block includes a feature extractor and a similarity measure block.
The feature extractor requires input images to have consistent sizes. In the spectral feature extraction branch, we upsample MS images to the same size as fused images. In the spatial feature extraction branch, the PAN image serves as the reference image with only one band, while the fused image consists of 4 or 8 bands. Therefore, it is necessary to convert the fused image into a single-band image. To this end, we apply the NMF [27] algorithm to the fused image in the spatial branch. The NMF algorithm factorizes a matrix $V$ into matrices $W$ and $H$. Here, the PAN image is taken as the matrix $V$, the fused image as the matrix $W$, and the band coefficients to be determined as the matrix $H$. The coefficient matrix $H$ is obtained by minimizing $\|V - WH\|^2$ using the following update for $K$ iterations:
$$H_{au} \leftarrow H_{au} \frac{(W^T V)_{au}}{(W^T W H)_{au}}$$
where $W$ is the fused image, $V$ is the panchromatic image and $H$ is the coefficient matrix.
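The multiplicative update above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function name `nmf_band_weights`, the uniform initialization of $H$, the small `eps` added to the denominator, and the flattening of images into pixel-by-band matrices are all choices made here for the example. The default of 100 iterations follows the value of $K$ reported in the experimental setup.

```python
import numpy as np

def nmf_band_weights(fused, pan, n_iter=100, eps=1e-12):
    """Estimate non-negative band weights H so that fused @ H approximates pan.

    fused: (rows, cols, bands) non-negative array, plays the role of W.
    pan:   (rows, cols) non-negative array, plays the role of V.
    Returns the weights (bands,) and the resulting single-band image.
    """
    W = fused.reshape(-1, fused.shape[-1]).astype(np.float64)  # (pixels, bands)
    V = pan.reshape(-1, 1).astype(np.float64)                  # (pixels, 1)
    # Assumption: start from equal weights for every band.
    H = np.full((W.shape[1], 1), 1.0 / W.shape[1])
    WtV = W.T @ V   # fixed numerator term, since W is not updated
    WtW = W.T @ W
    for _ in range(n_iter):
        # H_au <- H_au * (W^T V)_au / (W^T W H)_au ; eps guards against 0/0
        H *= WtV / (WtW @ H + eps)
    single_band = (W @ H).reshape(pan.shape)
    return H.ravel(), single_band
```

Because only $H$ is updated while $W$ stays fixed to the fused image, the products $W^T V$ and $W^T W$ can be computed once outside the loop.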
The feature extractor is based on the VGG16 network [28]. Specifically, we employed a 5-layer VGG network to extract feature maps of input images. The VGG network is very suitable for extracting image features and transforming images into multi-scale representations. Ding [29] pointed out that maximum pooling may introduce aliasing artifacts during the downsampling process and proposed $l_2$ pooling as an alternative. In this paper, we also replace the maximum pooling with $l_2$ pooling:
$$P(x) = \sqrt{g(x \odot x)}$$
where $\odot$ represents element-wise multiplication, and $g(\cdot)$ denotes the convolution operation performed using a Hanning window for blurring.
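As a rough illustration of $l_2$ pooling, the NumPy sketch below squares the input, blurs it with a normalized Hanning window while downsampling, and takes the square root. The 3 × 3 window size, the stride of 2, the zero padding and the small `eps` are assumptions made for this example; the text does not state the values actually used.

```python
import numpy as np

def l2_pool(x, stride=2, eps=1e-12):
    """l2 pooling of a feature map x with shape (channels, height, width)."""
    # Assumed 3x3 Hanning window: inner taps of np.hanning(5) are [0.5, 1, 0.5].
    taps = np.hanning(5)[1:-1]
    g = np.outer(taps, taps)
    g /= g.sum()
    # Element-wise square, then zero-pad one pixel on each spatial side.
    sq = np.pad(x * x, ((0, 0), (1, 1), (1, 1)))
    c, h, w = x.shape
    out_h = (h + 2 - 3) // stride + 1
    out_w = (w + 2 - 3) // stride + 1
    out = np.zeros((c, out_h, out_w))
    # Strided convolution written as a sum of shifted, weighted slices.
    for dy in range(3):
        for dx in range(3):
            out += g[dy, dx] * sq[:, dy:dy + stride * out_h:stride,
                                  dx:dx + stride * out_w:stride]
    return np.sqrt(out + eps)
```

Unlike max pooling, every pixel in the window contributes to the output, which is the property that reduces aliasing during downsampling.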
The similarity parameters are calculated by measuring the similarity of the corresponding feature maps using the following formulas:
$$l(\tilde{x}_j^{(i)}, \tilde{y}_j^{(i)}) = \frac{2 \mu_{\tilde{x}_j}^{(i)} \mu_{\tilde{y}_j}^{(i)} + c_1}{(\mu_{\tilde{x}_j}^{(i)})^2 + (\mu_{\tilde{y}_j}^{(i)})^2 + c_1}$$
$$s(\tilde{x}_j^{(i)}, \tilde{y}_j^{(i)}) = \frac{2 \sigma_{\tilde{x}_j \tilde{y}_j}^{(i)} + c_2}{(\sigma_{\tilde{x}_j}^{(i)})^2 + (\sigma_{\tilde{y}_j}^{(i)})^2 + c_2}$$
where $x$ and $y$ represent the input images; $\mu_{\tilde{x}_j}^{(i)}$ represents the mean of the feature map of the fused image in the $i$-th channel of the $j$-th block of the convolutional network; $(\sigma_{\tilde{x}_j}^{(i)})^2$ represents the variance of that feature map; $\sigma_{\tilde{x}_j \tilde{y}_j}^{(i)}$ represents the covariance between the corresponding feature maps of the two inputs; $l$ represents the luminance similarity; $s$ represents the structure similarity; and the constants $c_1$ and $c_2$ are included to avoid instability.
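The two similarity terms can be computed per channel from global statistics of each feature map. The sketch below is illustrative only: the function name `channel_similarity` and the default values of `c1` and `c2` are assumptions, since the text does not give the constants.

```python
import numpy as np

def channel_similarity(fx, fy, c1=1e-6, c2=1e-6):
    """Luminance (l) and structure (s) similarity for each channel.

    fx, fy: feature maps of shape (channels, height, width) taken from the
    same block of the network for the two inputs being compared.
    c1, c2: small stabilizing constants (values assumed for this example).
    """
    mu_x = fx.mean(axis=(1, 2))
    mu_y = fy.mean(axis=(1, 2))
    dx = fx - mu_x[:, None, None]
    dy = fy - mu_y[:, None, None]
    var_x = (dx * dx).mean(axis=(1, 2))
    var_y = (dy * dy).mean(axis=(1, 2))
    cov_xy = (dx * dy).mean(axis=(1, 2))
    l = (2.0 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    s = (2.0 * cov_xy + c2) / (var_x + var_y + c2)
    return l, s
```

Each pair of feature maps is thus reduced to two scalars, one for $l$ and one for $s$, regardless of the spatial size of the maps.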
The luminance similarity parameters of the feature maps from all channels in every block of the convolutional network are aggregated to form the vector $L$:
$$L = \{ l(\tilde{x}_1^{(1)}, \tilde{y}_1^{(1)}), \ldots, l(\tilde{x}_1^{(n_1)}, \tilde{y}_1^{(n_1)}), \ldots, l(\tilde{x}_m^{(1)}, \tilde{y}_m^{(1)}), \ldots, l(\tilde{x}_m^{(n_m)}, \tilde{y}_m^{(n_m)}) \}$$
where $m \in \{1, 2, 3, 4, 5\}$, and $n_m \in \{64, 128, 256, 512, 512\}$.
The structure similarity parameters of the feature maps from all the channels in every block of the convolutional network are aggregated to form the vector $S$:
$$S = \{ s(\tilde{x}_1^{(1)}, \tilde{y}_1^{(1)}), \ldots, s(\tilde{x}_1^{(n_1)}, \tilde{y}_1^{(n_1)}), \ldots, s(\tilde{x}_m^{(1)}, \tilde{y}_m^{(1)}), \ldots, s(\tilde{x}_m^{(n_m)}, \tilde{y}_m^{(n_m)}) \}$$
where $m \in \{1, 2, 3, 4, 5\}$, and $n_m \in \{64, 128, 256, 512, 512\}$.
For the spatial feature extraction branch, the PAN image and the fused image transformed to one band are input into the neural network to obtain their respective deep feature maps. After calculating the luminance and structure similarity of each corresponding pair of deep feature maps, the parameter vectors $L_{spa}$ and $S_{spa}$ are obtained, and upon combination, the feature vector $SPA$ is derived:
$$SPA = [L_{spa}, S_{spa}]$$
For the spectral feature extraction branch, the upsampled MS image and the fused image are input into the neural network to obtain their respective deep feature maps. After calculating the luminance and structure similarity of each corresponding pair of deep feature maps, the parameter vectors $L_{spe}$ and $S_{spe}$ are obtained, and upon combination, the feature vector $SPE$ is derived:
$$SPE = [L_{spe}, S_{spe}]$$
The spatial feature vector $SPA$ and the spectral feature vector $SPE$ are combined to form the joint spatial–spectral feature vector.
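A short sketch of the concatenation makes the size of the joint vector explicit. With the channel counts listed above, each of $L$ and $S$ has 64 + 128 + 256 + 512 + 512 = 1472 entries, so each branch contributes 2944 and the joint vector has 5888, which matches the number of feature parameters quoted in the ablation section. The function and constant names are hypothetical, and the ordering (spectral first, then spatial) is taken from the regression formula in Section 2.2.

```python
import numpy as np

# Channels per VGG block, as listed in the text.
VGG_CHANNELS = (64, 128, 256, 512, 512)

def joint_feature_vector(l_spa, s_spa, l_spe, s_spe):
    """Concatenate the per-branch similarity vectors into one joint vector."""
    spa = np.concatenate([l_spa, s_spa])   # SPA = [L_spa, S_spa]
    spe = np.concatenate([l_spe, s_spe])   # SPE = [L_spe, S_spe]
    return np.concatenate([spe, spa])      # Concat[SPE, SPA]

n_per_vector = sum(VGG_CHANNELS)           # length of each L or S vector
dummy = np.zeros(n_per_vector)
joint = joint_feature_vector(dummy, dummy, dummy, dummy)
```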

2.2. Feature Regression

The parameters of the spatial/spectral FESM branch are determined during the pre-training stage, being pre-trained with spatial/spectral DMOS, respectively. In the overall quality evaluation stage, the parameters of the FESM branches are fixed and imported into the overall evaluation network to obtain joint spatial–spectral features.
The regression network is responsible for mapping these extracted features to the overall evaluation scores. We used a simple regression network for quality evaluation, containing two fully connected layers and a ReLU activation layer. The regression network receives the joint spatial–spectral feature vector and outputs the overall evaluation score:
$$\hat{q} = \mathrm{Conv}(\mathrm{ReLU}(\mathrm{Conv}(\mathrm{Concat}[SPE, SPA])))$$
where $\hat{q}$ is the quality evaluation score.
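The forward pass of such a regression head is small enough to write out directly. The NumPy sketch below treats the two layers as fully connected, as the text describes; the function name, the parameter names and the hidden width used in the usage note are assumptions, since the layer sizes are not stated.

```python
import numpy as np

def regression_head(spe, spa, w1, b1, w2, b2):
    """Map the joint spatial-spectral features to one quality score.

    w1: (hidden, n_features), b1: (hidden,)  -> first fully connected layer
    w2: (hidden,),            b2: scalar     -> second fully connected layer
    """
    features = np.concatenate([spe, spa])          # Concat[SPE, SPA]
    hidden = np.maximum(w1 @ features + b1, 0.0)   # ReLU activation
    return float(w2 @ hidden + b2)
```

For example, with 2944 features per branch and a hypothetical hidden width of 8, `w1` would have shape `(8, 5888)` and `w2` shape `(8,)`.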

2.3. Loss Function

The commonly used loss function for quality evaluation networks is $L_{MAE}$ or $L_{MSE}$. Such a loss only quantifies the overall precision over all the samples. It assumes that each evaluation score within a batch is independent, which disregards the correlations and variations among the evaluation scores of different fused images. After analyzing the fused images produced by different fusion methods, we found that they exhibit both similarities and variations in spatial and spectral distortions. From a statistical point of view, we assume that the quality evaluation scores have the following two characteristics. Firstly, the evaluation scores of fused images obtained from the same fusion method are highly similar. Secondly, the score distributions differ among fusion methods, and accounting for this is equally crucial for making the evaluation model sensitive to spatial and spectral distortions. However, training the network with the traditional $L_{MAE}$ or $L_{MSE}$ alone cannot statistically satisfy these characteristics. Consequently, we propose a loss that not only quantifies the difference between the predicted score and the subjective evaluation score but also quantifies the variations among different fusion methods in a batch, thereby enhancing the network’s accuracy and robustness. The loss function is composed of the following two components:
$$L = L_{MAE} + \lambda L_{CV}$$
where $L$ represents the total loss and $\lambda$ represents the weight of $L_{CV}$. The first component, $L_{MAE}$, quantifies the difference between the predicted scores and the subjective evaluation scores:
$$L_{MAE} = \frac{1}{N} \sum_{i=1}^{N} \| q - \hat{q} \|_1$$
where $q$ represents the evaluation score predicted by the network, while $\hat{q}$ denotes the subjective evaluation score. $N$ represents the batch size.
The second component, $L_{CV}$, represents the categorical variation in a batch. For any two different fusion methods in a batch, it compares the difference between the means of their predicted scores with the difference between the means of their ground truth scores, and takes the discrepancy between these two differences as the error for that pair of categories:
$$L_{CV} = \frac{2}{M(M-1)} \sum_{i=1}^{M} \sum_{\substack{j=1 \\ i \neq j}}^{M} \| (\mu_i - \mu_j) - (\hat{\mu}_i - \hat{\mu}_j) \|_1$$
where $\mu_i$ represents the mean of the prediction scores of the fused images by the $i$-th fusion method within a batch; $\hat{\mu}_i$ represents the mean of the ground truth evaluation scores of the fused images by the $i$-th fusion method in the same batch; $M$ represents the number of fusion methods.
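The combined loss can be illustrated with plain NumPy on a single batch. This is a sketch under stated assumptions rather than the training code: the function names and the `method_ids` array (one integer label per sample naming the fusion method that produced it) are introduced here for the example, and the double sum is taken over ordered pairs with $i \neq j$ exactly as the formula is written. The default `lam=4.0` follows the value of $\lambda$ reported in the experimental setup.

```python
import numpy as np

def mae_loss(pred, target):
    """Mean absolute error between predicted and subjective scores."""
    return float(np.mean(np.abs(pred - target)))

def cv_loss(pred, target, method_ids):
    """Categorical-variation term over the fusion methods present in a batch."""
    methods = np.unique(method_ids)
    m = len(methods)
    if m < 2:
        return 0.0  # no pair of methods to compare in this batch
    mu = np.array([pred[method_ids == k].mean() for k in methods])
    mu_hat = np.array([target[method_ids == k].mean() for k in methods])
    diff_pred = mu[:, None] - mu[None, :]          # mu_i - mu_j
    diff_true = mu_hat[:, None] - mu_hat[None, :]  # mu_hat_i - mu_hat_j
    # The diagonal (i == j) is zero, so summing the full matrix equals
    # summing over ordered pairs i != j.
    total = np.abs(diff_pred - diff_true).sum()
    return float(2.0 * total / (m * (m - 1)))

def total_loss(pred, target, method_ids, lam=4.0):
    return mae_loss(pred, target) + lam * cv_loss(pred, target, method_ids)
```

If every prediction in a batch collapses toward the same value, the MAE term can remain moderate while the differences between method means vanish; the second term penalizes exactly that situation.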

2.4. Network Training Details

The training process comprises two stages: pre-training and overall training stage. In the first stage, the spatial/spectral FESM network was pre-trained with the spatial/spectral DMOS, respectively. In the second stage, the parameters of the FESM branches that completed pre-training in the first stage were frozen and imported into the overall network. Subsequently, the fully connected layers were trained with the overall DMOS as a constraint, ensuring that backpropagation only modifies the parameters of these specific layers. We believe that the parameters in the FESM branch trained according to the spectral/spatial DMOS can more effectively extract spatial and spectral features of fused images. Figure 3 demonstrates that the pre-trained network exhibits accelerated convergence and achieves a higher SRCC index.

3. Results and Discussion

3.1. Datasets

To fully validate the effectiveness of the proposed approach, experiments were conducted on a vast collection of pansharpened images referred to as NBU_PansharpRSData [30]. The dataset comprises 13,620 fused images derived from a collection of 2270 pairs of PAN and MS images captured by different satellite sensors.
The differential mean opinion score (DMOS) is a recognized and widely used way to represent the quality evaluation results of distorted images in image quality datasets such as LIVE [31], CSIQ [32] and KonIQ10K [33]. In addition, the DMOS has been employed and shown to be feasible for the quality evaluation of remote sensing fusion images. For example, Zhou et al. [34] proposed a subjective evaluation dataset comprising 360 images, and Agudelo-Medina et al. [35] proposed one consisting of 440 images. However, these datasets are relatively small, and networks trained on them overfit easily. Xiong et al. [30] proposed a subjective quality assessment database consisting of 13,620 fusion images, which is more suitable for training a model and verifying its performance. In addition, the full-resolution quantitative evaluation of pansharpened images remains a subject of intense debate due to the unavailability of an authentic HR MS reference image. Therefore, we hold the view that subjective evaluation is a reliable measure and can be regarded, to some extent, as the benchmark for FR fusion of three- or four-band MS images without an HR MS reference image.
The specific calculation process of DMOS is as follows [30]: In the subjective test, a total of 28 graduate students (comprising 20 males and 8 females) were invited to take part in the evaluation. Before the official subjective test began, all participants were instructed to observe 60 pansharpened images and assess them based on the rating criteria in order to familiarize themselves with the evaluation standards. The spatial evaluation criteria focus on sharpness and artefacts, the spectral quality evaluation criterion is color preservation and the overall quality evaluation criteria focus on the general impression and object recognition. The participants used a five-level grading criterion to rate the fusion image. In the formal test, each participant was required to evaluate 300 pansharpened images consecutively, followed by a 30-min break before proceeding to the next test. After obtaining the raw subjective scores, the scores provided by unreliable participants were eliminated using confidence intervals. In the experiment, five participants were excluded, while the subjective scores provided by the remaining 23 participants were transformed to DMOS scores. The difference score $d_{i,j}$ for a subject was computed as
$$d_{i,j} = s_{i,j}^{ref} - s_{i,j}$$
where $s_{i,j}$ is the subjective score given to the $j$-th image by the $i$-th subject, and $s_{i,j}^{ref}$ is the subjective score given by the same subject to the reference corresponding to $s_{i,j}$.
$d_{i,j}$ is then transformed to a z-score:
$$z_{i,j} = \frac{d_{i,j} - \mu_i}{\sigma_i}$$
where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the scores given by subject $i$. $z_{i,j}$ is then rescaled to the range [0, 100] to obtain a DMOS score:
$$z'_{i,j} = \frac{100 (z_{i,j} + 3)}{6}$$
$$DMOS_j = \frac{1}{U} \sum_{i=1}^{U} z'_{i,j}$$
where $U$ denotes the count of valid subjects.
The original value range of the DMOS evaluation scores in the datasets is [0, 100]. During training, we uniformly divide the evaluation scores by 100 to narrow the value range to [0, 1].
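The score conversion described above can be sketched as follows. The function name and the array layout (subjects along rows, images along columns) are assumptions for this example, as is the choice to compute each subject's mean and standard deviation over that subject's difference scores. The screening of unreliable subjects is not shown.

```python
import numpy as np

def compute_dmos(scores, ref_scores):
    """Convert raw subjective scores to DMOS values in roughly [0, 100].

    scores:     (subjects, images) raw scores s_ij from valid subjects.
    ref_scores: array broadcastable to the same shape, reference scores.
    """
    d = ref_scores - scores                      # difference scores d_ij
    mu = d.mean(axis=1, keepdims=True)           # per-subject mean
    sigma = d.std(axis=1, keepdims=True)         # per-subject std deviation
    z = (d - mu) / sigma                         # z-scores
    z_rescaled = 100.0 * (z + 3.0) / 6.0         # maps [-3, 3] onto [0, 100]
    return z_rescaled.mean(axis=0)               # average over subjects

def normalize_for_training(dmos):
    """Divide by 100 so that training targets lie in [0, 1]."""
    return dmos / 100.0
```

The rescaling maps z-scores in [-3, 3] onto [0, 100]; a z-score outside that interval would fall slightly outside the nominal range.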

3.2. Experimental Setup

The experiments were implemented according to the commonly used evaluation protocols. Specifically, for deep learning-based methods, every dataset was partitioned into a training set and a testing set at a ratio of 4:1. To ensure a comprehensive evaluation of the entire dataset, the training and testing processes were repeated five times. For the traditional methods, since no training set was needed, every dataset was divided into five parts. Each part of the dataset was tested, and the experiment result was obtained by averaging the results of the five tests. The test results of the proposed method were compared with seven NR quality evaluation methods from 2008 to 2023. These methods include conventional methods, such as QNR [19], GQNR [20], HQNR [36], NIQE [37] and MQNR [1], and deep learning-based methods, such as FCBM [38] and MCL-net [26].
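One common way to realize a 4:1 split repeated five times so that every sample is tested once is five-fold partitioning. The sketch below assumes that reading of the protocol; the function name, the random shuffling and the seed are illustrative choices, not details given in the text.

```python
import numpy as np

def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs with an approximate 4:1 ratio."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    folds = np.array_split(perm, n_folds)
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate(
            [folds[j] for j in range(n_folds) if j != k])
        yield train_idx, test_idx
```

For methods that need no training, only the `test_idx` part of each pair would be used, and the five test results averaged.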
The proposed model was implemented in PyTorch and trained on an NVIDIA RTX 3090 GPU.
In the training process, we partitioned each input image into 16 patches of 256 × 256 pixels. The Adam optimizer was employed to minimize the error, with a batch size of 8 and 300 training epochs. The learning rate was initialized at $1 \times 10^{-5}$ and reduced by half every 50 epochs.
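The patch partitioning and the step learning-rate schedule can be written as two small helpers. This sketch assumes non-overlapping patches cut from a 1024 × 1024 input (the input size mentioned in the ablation section, which yields 16 patches) and an initial learning rate of $1 \times 10^{-5}$; the function names are hypothetical.

```python
import numpy as np

def split_into_patches(image, patch=256):
    """Cut an image (rows, cols, ...) into non-overlapping square patches."""
    rows, cols = image.shape[:2]
    return [image[r:r + patch, c:c + patch]
            for r in range(0, rows - patch + 1, patch)
            for c in range(0, cols - patch + 1, patch)]

def learning_rate(epoch, base_lr=1e-5, step=50, factor=0.5):
    """Learning rate at a given epoch when it is halved every `step` epochs."""
    return base_lr * factor ** (epoch // step)
```

Over 300 epochs this schedule halves the rate five times, so the final 50 epochs run at 1/32 of the initial value.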
In this paper, K was set to 100, and λ was set to 4.

3.3. Experimental Results and Analysis

3.3.1. Statistical Results on All Images in the Dataset

The results of the proposed approach and the comparison methods for each dataset are illustrated in Figure 4. It can be seen that the performance of conventional methods is relatively limited, and the performance varies among different satellite datasets, with some datasets exhibiting notably low indices. For example, MQNR exhibits relatively higher SRCC, KRCC and PLCC indices on QB datasets compared to other satellite datasets; GQNR exhibits a relatively superior performance on the IKONOS dataset compared to the other satellite datasets. The performance of the deep learning-based method surpasses that of the traditional algorithm on each dataset, thereby demonstrating the exceptional feature learning capability in deep learning. The SRCC, KRCC and PLCC indices of the proposed method exhibit significantly superior performance compared to the other methods. Moreover, this performance remains consistently stable across different satellite datasets. The RMSE of the evaluation results obtained from the proposed method is lower than the other evaluation methods, thereby indicating a higher level of accuracy in the prediction results.
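For readers unfamiliar with the four indices, the sketch below shows one way to compute them from predicted and subjective scores using NumPy alone. It is a simplified illustration: the rank helper ignores ties, and the Kendall coefficient is the tau-a variant computed with an O(n²) pairwise comparison, so library implementations may give slightly different values on tied data.

```python
import numpy as np

def _ranks(a):
    """Ranks 0..n-1 of the values in a (ties are not averaged)."""
    order = np.argsort(a)
    ranks = np.empty(len(a), dtype=np.float64)
    ranks[order] = np.arange(len(a))
    return ranks

def plcc(x, y):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])

def srcc(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return plcc(_ranks(x), _ranks(y))

def krcc(x, y):
    """Kendall rank correlation (tau-a) from pairwise sign agreement."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return float((sx * sy).sum() / (n * (n - 1)))

def rmse(x, y):
    """Root mean squared error."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(np.sqrt(np.mean((x - y) ** 2)))
```

SRCC and KRCC measure how well the predicted ordering matches the subjective ordering, PLCC measures linear agreement, and RMSE measures the absolute prediction error, which is why higher is better for the first three and lower is better for the last.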

3.3.2. Evaluation Results of the Sample Images

We conducted a further evaluation of fusion images using different fusion methods, including traditional pan-sharpening methods like BDSD-PC [39], MF [40], GSA [2] and MTF-GLP [4], as well as deep learning-based methods such as MSDCNN [41] and PSGAN [42]. We performed a subjective quality evaluation of the fusion images and obtained a ranking based on a comprehensive assessment of the image quality. Subjective evaluation ranking depends on the following aspects: sharpness, artifacts, color preservation and object recognition. The subjective evaluation results are as follows: BDSD-PC > PSGAN > MSDCNN > MF > GSA > MTF-GLP, as shown in Figure 5. The fusion image of the BDSD-PC method has excellent performance in both spatial detail and spectral fidelity. The prediction results of our proposed model are largely consistent with the subjective evaluation results, as shown in Table 1.

3.4. Ablation Experiment

3.4.1. Effectiveness of Similarity Measure (SM) Block

This block measures the similarity between each pair of corresponding feature maps, obtaining parameters including luminance similarity and structure similarity. With the help of this block, the similarity of the corresponding feature layers of the VGG network is calculated to measure the distortions of the fusion image. Each pair of feature layers is finally reduced to two parameters, and the total number of feature parameters of the network is 5888, with the size of the input fusion image being 1024 × 1024. In contrast, if we use the last feature layer of the VGG network as the feature parameter, the number of feature parameters is 262,144. Therefore, the redundancy of the features is reduced. To validate the effectiveness of this block, we conducted experiments by excluding it and utilized the feature map’s output from the last layer of the VGG16 network as the features of the fused image. The results in Table 2 demonstrate that the removal of this block leads to a significant decrease in network performance, thereby confirming the efficacy of this block. In the spectral branch, MS is upsampled to a panchromatic image size so that the image size of both is the same. In the ablation experiments, it is proven that upsampling the MS image helps preserve more image features than downsampling the fusion image and leads to an improved network performance, as shown in Table 3.

3.4.2. Effectiveness of Siamese Feature Extraction Branch

The Siamese feature extraction branch is used to extract spatial and spectral features from input images to form a joint spatial–spectral feature vector. Table 4 demonstrates that the Siamese feature extraction branch delivers more benefits than a single spatial/spectral branch.

3.4.3. Effectiveness of the Pre-Trained Feature Extraction Network

Additional experiments were performed to confirm the efficacy of pre-training in extracting the features of images. The findings from the experiment are presented in Table 5. It shows that all the indices of the pre-trained model are higher than those of the directly trained model. This demonstrates that the sensitive distortion features can be better extracted through pre-training feature extraction network, resulting in a higher performance.

3.4.4. Selection of Loss Function

Many IQA networks choose $L_{MAE}$ [43,44] or $L_{MSE}$ [45,46] as the loss. Such losses only quantify the difference between each predicted score and its subjective evaluation score, assuming that each evaluation score within a batch is independent, which disregards the correlations and variations among the evaluation scores of different fused images. In the experiment presented in Table 2, we conducted an ablation study on images of the IKONOS dataset, which indicates that excluding $L_{CV}$ from the model’s loss function leads to a significant drop in SRCC and other indices, accompanied by a notable increase in RMSE. This suggests a marked reduction in the accuracy of the prediction results. The distribution on the left of Figure 6 shows that removing $L_{CV}$ concentrates the scores predicted by the model in a narrow range. After adding $L_{CV}$, the distribution range of the scores is highly consistent with that of the real DMOS, which shows the effectiveness of $L_{CV}$.

3.4.5. Selection of Weight for $L_{CV}$

An ablation analysis of $\lambda$, the weight of $L_{CV}$, was conducted. In the experiments, $\lambda$ is chosen from {0.5, 1, 2, 4, 8}, and the results obtained are presented in Table 6. The findings demonstrate that, as the value of $\lambda$ increases, there is a gradual growth observed in SRCC and the other indices. When $\lambda$ equals 4, most indicators reach their maximum values.

3.4.6. Cross-Assessment on Subsets

To verify that the proposed assessment method is objective and reliable, the dataset is divided into two parts, Subset 1 and Subset 2, for cross-assessment. The proposed model is trained on each of the two subsets, and the subsets are then swapped for cross-assessment. The experimental results show that training and verification on Subset 2 obtain relatively high indices, while the indices on Subset 1 are relatively low, but the difference between them is minor, as shown in Table 7. This suggests that the proposed method is reliable and generalizes well.

4. Conclusions

In this paper, we propose a perceptual quality assessment method based on deep feature similarity measure, including spatial/spectral feature extraction and a similarity measure (FESM) branch and an overall evaluation network. In the spatial/spectral FESM branch, the feature extractor extracts the spatial/spectral feature maps, and the similarity measure block calculates the similarity of the corresponding feature maps to obtain the spatial and spectral feature parameters. The overall evaluation network realizes the overall quality assessment. Moreover, we propose to quantify both the overall precision of all the training samples and the variations among different fusion methods in a batch, thereby enhancing the network’s accuracy and robustness. The experiments were carried out on six datasets, and the experimental results proved that the proposed method was superior to the other methods. However, it takes a long time for the proposed method to converge. In the future, we will explore extracting global features from the frequency domain to achieve convergence faster.

Author Contributions

Conceptualization, Z.Z. and X.M.; methodology, Z.Z. and X.M.; software, Z.Z.; validation, Z.Z.; writing—original draft preparation, Z.Z.; writing—review and editing, Z.Z., S.Z., X.M., L.C. and F.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (42171326), the Zhejiang Provincial Natural Science Foundation of China (LR23D010001 and LY22F010014), the Ningbo Natural Science Foundation under Grant 2022J076, and, in part, by the 2025 Science and Technology Major Project of Ningbo City (2021Z107).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Meng, X.C.; Bao, K.D.; Shu, J.F.; Zhou, B.Z.; Shao, F.; Sun, W.W.; Li, S.T. A Blind Full-Resolution Quality Evaluation Method for Pansharpening. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–16.
  2. Aiazzi, B.; Baronti, S.; Selva, M. Improving Component Substitution Pansharpening Through Multivariate Regression of MS + Pan Data. IEEE Trans. Geosci. Remote Sens. 2007, 45, 3230–3239.
  3. Laben, C.A.; Brower, B.V. Process for Enhancing the Spatial Resolution of Multispectral Imagery Using Pan-Sharpening. U.S. Patent US6011875A, 4 January 2000.
  4. Aiazzi, B.; Alparone, L.; Baronti, S.; Garzelli, A.; Selva, M. MTF-tailored Multiscale Fusion of High-resolution MS and Pan Imagery. Photogramm. Eng. Remote Sens. 2006, 72, 591–596.
  5. Alparone, L.; Garzelli, A.; Vivone, G. Intersensor Statistical Matching for Pansharpening: Theoretical Issues and Practical Solutions. IEEE Trans. Geosci. Remote Sens. 2017, 55, 4682–4695.
  6. Meng, X.C.; Shen, H.F.; Li, H.F.; Zhang, L.P.; Fu, R.D. Review of the pansharpening methods for remote sensing images based on the idea of meta-analysis: Practical discussion and challenges. Inf. Fusion 2019, 46, 102–113.
  7. Shen, H.; Meng, X.; Zhang, L. An Integrated Framework for the Spatio–Temporal–Spectral Fusion of Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7135–7148.
  8. Dian, R.; Li, S.; Guo, A.; Fang, L. Deep Hyperspectral Image Sharpening. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 5345–5355.
  9. Hang, R.; Li, Z.; Ghamisi, P.; Hong, D.; Xia, G.; Liu, Q. Classification of Hyperspectral and LiDAR Data Using Coupled CNNs. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4939–4950.
  10. Zhuang, L.; Gao, L.; Zhang, B.; Fu, X.; Bioucas-Dias, J.M. Hyperspectral Image Denoising and Anomaly Detection Based on Low-Rank and Sparse Representations. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–17.
  11. Li, J.; Zheng, K.; Liu, W.; Li, Z.; Yu, H.; Ni, L. Model-Guided Coarse-to-Fine Fusion Network for Unsupervised Hyperspectral Image Super-Resolution. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5.
  12. Xie, Q.; Zhou, M.; Zhao, Q.; Xu, Z.; Meng, D. MHF-Net: An Interpretable Deep Network for Multispectral and Hyperspectral Image Fusion. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 1457–1473.
  13. Vivone, G.; Mura, M.D.; Garzelli, A.; Restaino, R.; Scarpa, G.; Ulfarsson, M.O.; Alparone, L.; Chanussot, J. A New Benchmark Based on Recent Advances in Multispectral Pansharpening: Revisiting Pansharpening with Classical and Emerging Pansharpening Methods. IEEE Geosci. Remote Sens. Mag. 2021, 9, 53–81.
  14. Arienzo, A.; Vivone, G.; Garzelli, A.; Alparone, L.; Chanussot, J. Full-Resolution Quality Assessment of Pansharpening: Theoretical and hands-on approaches. IEEE Geosci. Remote Sens. Mag. 2022, 10, 168–201.
  15. Wald, L.; Ranchin, T.; Mangolini, M. Fusion of satellite images of different spatial resolutions: Assessing the quality of resulting images. Photogramm. Eng. Remote Sens. 1997, 63, 691–699.
  16. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
  17. Alparone, L.; Baronti, S.; Garzelli, A.; Nencini, F. A Global Quality Measurement of Pan-Sharpened Multispectral Imagery. IEEE Geosci. Remote Sens. Lett. 2004, 1, 313–317.
  18. Garzelli, A.; Nencini, F. Hypercomplex Quality Assessment of Multi/Hyperspectral Images. IEEE Geosci. Remote Sens. Lett. 2009, 6, 662–665.
  19. Alparone, L.; Aiazzi, B.; Baronti, S.; Garzelli, A.; Nencini, F.; Selva, M. Multispectral and panchromatic data fusion assessment without reference. Photogramm. Eng. Remote Sens. 2008, 74, 193–200.
  20. Kwan, C.; Budavari, B.; Bovik, A.C.; Marchisio, G.B. Blind Quality Assessment of Fused WorldView-3 Images by Using the Combinations of Pansharpening and Hypersharpening Paradigms. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1835–1839.
  21. Alparone, L.; Garzelli, A.; Vivone, G. Spatial Consistency for Full-Scale Assessment of Pansharpening. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 5132–5134.
  22. Wang, Z.; Simoncelli, E.P.; Bovik, A.C. Multiscale structural similarity for image quality assessment. In Proceedings of the Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 9–12 November 2003; pp. 1398–1402.
  23. Zhang, L.; Zhang, L.; Mou, X.; Zhang, D. FSIM: A Feature Similarity Index for Image Quality Assessment. IEEE Trans. Image Process. 2011, 20, 2378–2386.
  24. Gatys, L.A.; Ecker, A.S.; Bethge, M. Image Style Transfer Using Convolutional Neural Networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2414–2423.
  25. Chen, H.; Shao, F.; Chai, X.; Jiang, Q.; Meng, X.; Ho, Y.S. Collaborative Learning and Style-Adaptive Pooling Network for Perceptual Evaluation of Arbitrary Style Transfer. IEEE Trans. Neural Netw. Learn. Syst. 2023.
  26. Bao, K.; Meng, X.; Chai, X.; Shao, F. A Blind Full Resolution Assessment Method for Pansharpened Images Based on Multistream Collaborative Learning. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–11.
  27. Lee, D.D.; Seung, H.S. Algorithms for non-negative matrix factorization. In Proceedings of the 13th International Conference on Neural Information Processing Systems, Denver, CO, USA, 1 January 2000; pp. 535–541.
  28. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR 2014, abs/1409.1556.
  29. Ding, K.; Ma, K.; Wang, S.; Simoncelli, E.P. Image Quality Assessment: Unifying Structure and Texture Similarity. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 2567–2581.
  30. Xiong, Y.M.; Shao, F.; Meng, X.C.; Jiang, Q.P.; Sun, W.W.; Fu, R.D.; Ho, Y.S. A large-scale remote sensing database for subjective and objective quality assessment of pansharpened images. J. Vis. Commun. Image Represent. 2020, 73, 102947.
  31. Sheikh, H.R.; Sabir, M.F.; Bovik, A.C. A Statistical Evaluation of Recent Full Reference Image Quality Assessment Algorithms. IEEE Trans. Image Process. 2006, 15, 3440–3451.
  32. Larson, E.C.; Chandler, D.M. Most apparent distortion: Full-reference image quality assessment and the role of strategy. J. Electron. Imaging 2010, 19, 011006.
  33. Hosu, V.; Lin, H.; Sziranyi, T.; Saupe, D. KonIQ-10k: An Ecologically Valid Database for Deep Learning of Blind Image Quality Assessment. IEEE Trans. Image Process. 2020, 29, 4041–4056.
  34. Zhou, B.Z.; Shao, F.; Meng, X.C.; Fu, R.D.; Ho, Y.S. No-Reference Quality Assessment for Pansharpened Images via Opinion-Unaware Learning. IEEE Access 2019, 7, 40388–40401.
  35. Agudelo-Medina, O.A.; Benítez-Restrepo, H.D.; Vivone, G.; Bovik, A.C. Perceptual Quality Assessment of Pan-Sharpened Images. Remote Sens. 2019, 11, 877.
  36. Aiazzi, B.; Alparone, L.; Baronti, S.; Carlà, R.; Garzelli, A.; Santurri, L. Full scale assessment of pansharpening methods and data products. In Proceedings of the Image and Signal Processing for Remote Sensing XX, Amsterdam, The Netherlands, 22–24 September 2014; p. 924402.
  37. Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “Completely Blind” Image Quality Analyzer. IEEE Signal Process. Lett. 2013, 20, 209–212.
  38. Wang, Y.; Liu, G.; Wei, L.Y.; Yang, L.; Xu, L. A method to improve full-resolution remote sensing pansharpening image quality assessment via feature combination. Signal Process. 2023, 208, 108975.
  39. Vivone, G. Robust Band-Dependent Spatial-Detail Approaches for Panchromatic Sharpening. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6421–6433.
  40. Restaino, R.; Vivone, G.; Mura, M.D.; Chanussot, J. Fusion of Multispectral and Panchromatic Images Based on Morphological Operators. IEEE Trans. Image Process. 2016, 25, 2882–2895.
  41. Yuan, Q.Q.; Wei, Y.C.; Meng, X.C.; Shen, H.F.; Zhang, L.P. A Multiscale and Multidepth Convolutional Neural Network for Remote Sensing Imagery Pan-Sharpening. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 978–989.
  42. Liu, X.; Wang, Y.; Liu, Q. Psgan: A Generative Adversarial Network for Remote Sensing Image Pan-Sharpening. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 873–877.
  43. Su, S.; Yan, Q.; Zhu, Y.; Zhang, C.; Ge, X.; Sun, J.; Zhang, Y. Blindly Assess Image Quality in the Wild Guided by a Self-Adaptive Hyper Network. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 3664–3673.
  44. Bosse, S.; Maniry, D.; Müller, K.R.; Wiegand, T.; Samek, W. Deep Neural Networks for No-Reference and Full-Reference Image Quality Assessment. IEEE Trans. Image Process. 2018, 27, 206–219.
  45. Zhu, H.; Li, L.; Wu, J.; Dong, W.; Shi, G. MetaIQA: Deep Meta-Learning for No-Reference Image Quality Assessment. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 14131–14140.
  46. Badal, N.; Soundararajan, R.; Garg, A.; Patil, A. No Reference Pansharpened Image Quality Assessment Through Deep Feature Similarity. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 7235–7247.

Figure 1. Flowchart of the DFSM-net.

Figure 2. Flowchart of feature extraction and similarity measure block.

Figure 3. SRCC by two different training methods.

Figure 4. Performance evaluation on six satellite datasets.

Figure 5. A sample pair of a PAN/MS image and the fusion images fused by different methods.

Figure 6. Scatter plots of the predicted scores against the subjective difference mean opinion scores (DMOS). (a) Loss function w/o $L_{CV}$. (b) Loss function with $L_{CV}$.

Table 1. Quantitative evaluation results of the sample images corresponding to Figure 5. (The more consistent with the subjective ranking results, the better.)

| IQA Method | QNR (Rank) | NIQE (Rank) | GQNR (Rank) | HQNR (Rank) | MQNR (Rank) | FCBM (Rank) | MCL-Net (Rank) | Proposed (Rank) |
|---|---|---|---|---|---|---|---|---|
| BDSD-PC | 0.9526 (3) | 0.6481 (4) | 1.4537 (3) | 0.9641 (3) | 0.6598 (1) | 0.5005 (1) | 0.5323 (3) | 0.3567 (1) |
| PSGAN | 0.9766 (1) | 0.3301 (2) | 0.2213 (2) | 0.9671 (2) | 1.1644 (5) | 0.5245 (5) | 0.5315 (2) | 0.4005 (2) |
| MSDCNN | 0.9754 (2) | 0.2900 (1) | 0.2144 (1) | 0.9682 (1) | 1.3304 (6) | 0.5236 (4) | 0.5287 (1) | 0.4024 (3) |
| MF | 0.8532 (5) | 0.6657 (5) | 5.5824 (5) | 0.9395 (4) | 0.7088 (2) | 0.5106 (4) | 0.5469 (6) | 0.4179 (4) |
| GSA | 0.8566 (4) | 0.6330 (3) | 4.5014 (4) | 0.9215 (6) | 0.7191 (3) | 0.5045 (2) | 0.5400 (4) | 0.4201 (5) |
| MTF-GLP | 0.8472 (6) | 0.6787 (6) | 5.7922 (6) | 0.9368 (5) | 0.9403 (4) | 0.5064 (3) | 0.5403 (5) | 0.4712 (6) |

Table 2. Ablation study on the SM block and $L_{CV}$. (The best is marked in bold and the second best is underlined).

| Setting | SRCC | KRCC | PLCC | RMSE |
|---|---|---|---|---|
| w/o SM and $L_{CV}$ | 0.4081 | 0.2698 | 0.4099 | 0.1525 |
| w/o $L_{CV}$ | 0.5971 | 0.4031 | 0.5938 | 0.1491 |
| w/o SM | 0.7755 | 0.5651 | 0.8101 | 0.0899 |
| Proposed | 0.8100 | 0.6063 | 0.8578 | 0.0838 |

Table 3. Ablation study on the size of the fusion images.

| Model | SRCC | KRCC | PLCC | RMSE |
|---|---|---|---|---|
| Downsampling | 0.7126 | 0.5059 | 0.7101 | 0.1449 |
| Proposed | 0.8100 | 0.6063 | 0.8578 | 0.0838 |

Table 4. Ablation study on the feature extraction branch. (The best is marked in bold and the second best is underlined).

| Model | SRCC | KRCC | PLCC | RMSE |
|---|---|---|---|---|
| Single spatial branch | 0.6454 | 0.4470 | 0.6710 | 0.1199 |
| Single spectral branch | 0.7891 | 0.5779 | 0.8395 | 0.0896 |
| Proposed | 0.8100 | 0.6063 | 0.8578 | 0.0838 |

Table 5. Ablation study on the training method.

| Model | SRCC | KRCC | PLCC | RMSE |
|---|---|---|---|---|
| Directly trained | 0.7613 | 0.5414 | 0.8038 | 0.0982 |
| Pre-trained | 0.8100 | 0.6063 | 0.8578 | 0.0838 |

Table 6. Ablation study on the weight of $L_{CV}$. (The best is marked in bold and the second best is underlined).

| Setting (λ) | SRCC | KRCC | PLCC | RMSE |
|---|---|---|---|---|
| 0.5 | 0.8050 | 0.5984 | 0.8495 | 0.0924 |
| 1 | 0.8057 | 0.6018 | 0.8555 | 0.0846 |
| 2 | 0.8095 | 0.6052 | 0.8568 | 0.0837 |
| 4 | 0.8100 | 0.6063 | 0.8578 | 0.0838 |
| 8 | 0.7950 | 0.5994 | 0.8304 | 0.1074 |

Table 7. The cross-assessment results on the subsets.

| Setting | SRCC | KRCC | PLCC | RMSE |
|---|---|---|---|---|
| Training on Subset 1 | 0.7959 | 0.5855 | 0.8428 | 0.1001 |
| Cross assessment on Subset 2 | 0.8071 | 0.5953 | 0.8007 | 0.1068 |
| Training on Subset 2 | 0.8200 | 0.6137 | 0.8099 | 0.0825 |
| Cross assessment on Subset 1 | 0.7730 | 0.5594 | 0.7834 | 0.1107 |


Share and Cite

Zhang, Zhenhua, Shenfu Zhang, Xiangchao Meng, Liang Chen, and Feng Shao. 2024. "Perceptual Quality Assessment for Pansharpened Images Based on Deep Feature Similarity Measure" Remote Sensing 16, no. 24: 4621. https://doi.org/10.3390/rs16244621
