Generative Adversarial Network for Overcoming Occlusion in Images: A Survey
Figure 1. Architecture of the original GAN [53].
Figure 2. The three sub-tasks in amodal completion.
Figure 3. Different types of image segmentation.
Figure 4. Outline of the approaches for addressing the challenges in overcoming occlusion through GAN. For amodal segmentation, the implemented architectures are a discriminator with a two-hourglass generator [60], a coarse-to-fine architecture with contextual attention [63] or multiple discriminators [67], and a generator with a priori knowledge [66]. For order recovery, the GAN is designed as a generator with a single discriminator [71,73] or with multiple discriminators [69,70]. To perform amodal content completion for facial images, the architectures include a single generator and discriminator [79,80,81], multiple discriminators [82,83,84,85,86,87], multiple generators [88], multiple generators and discriminators [89,90], or a coarse-to-fine architecture [91,92,93]. Generic object completion is carried out through a coarse-to-fine architecture [63], multiple discriminators with contextual attention [78], or partial convolution and CGAN [75,76]. Human completion for attribute classification is utilized in [108,110]. Other works use GAN to complete images of food [112], vehicles [67], and humans [66,114]. GAN is also used to generate training data of generic objects [115,117], humans [113,119,120], and face images [106].
Abstract
1. Introduction
- We survey the literature for available frameworks that utilize GAN in one or more aspects of amodal completion.
- We discuss in detail the architecture of existing works and how they incorporate GAN to tackle the problems caused by occlusion.
- We summarize the loss functions, the datasets, and the reported results of the available works.
- We also provide an overview of prevalent objective functions for training GAN models in amodal completion tasks.
- Finally, we discuss several directions for future research on occlusion-handling tasks in which GAN can be utilized.
2. Methodology
3. Related Works
# | Title | Publisher | Year |
---|---|---|---|
1 | Generative adversarial networks: introduction and outlook [37] | IEEE | 2017 |
2 | Comparative study on generative adversarial networks [51] | arXiv | 2018 |
3 | Generative adversarial networks: An overview [52] | IEEE | 2018 |
4 | Recent progress on generative adversarial networks (GANs): A survey [34] | IEEE | 2019 |
5 | How generative adversarial networks and their variants work: An overview [32] | ACM | 2019 |
6 | Generative adversarial networks (GANs): An overview of theoretical model, evaluation metrics, and recent developments [35] | arXiv | 2020 |
7 | Generative adversarial network technologies and applications in computer vision [36] | Hindawi | 2020 |
8 | Generative adversarial networks in digital pathology: a survey on trends and future potential [42] | Elsevier | 2020 |
9 | Deep generative adversarial networks for image-to-image translation: A review [38] | MDPI | 2020 |
10 | A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications [50] | IEEE | 2021 |
11 | Generative adversarial network: An overview of theory and applications [49] | Elsevier | 2021 |
12 | The theoretical research of generative adversarial networks: an overview [33] | Elsevier | 2021 |
13 | Generative adversarial networks (GANs) challenges, solutions, and future directions [28] | ACM | 2021 |
14 | Generative adversarial networks: a survey on applications and challenges [31] | Springer | 2021 |
15 | A survey on generative adversarial networks for imbalance problems in computer vision tasks [46] | Springer | 2021 |
16 | Generative Adversarial Networks and their Application to 3D Face Generation: A Survey [41] | Elsevier | 2021 |
17 | Applications of generative adversarial networks (GANs): An updated review [45] | Springer | 2021 |
18 | Generative Adversarial Networks in Computer Vision: A Survey and Taxonomy [3] | ACM | 2022 |
19 | Exploring Generative Adversarial Networks and Adversarial Training [27] | Elsevier | 2022 |
20 | Generative Adversarial Networks for face generation: A survey [40] | ACM | 2022 |
21 | Generative Adversarial Networks: A Survey on Training, Variants, and Applications [30] | Springer | 2022 |
22 | Augmenting data with generative adversarial networks: An overview [47] | IOS | 2022 |
23 | A Survey on Training Challenges in Generative Adversarial Networks for Biomedical Image Analysis [43] | arXiv | 2022 |
24 | Attention-based generative adversarial network in medical imaging: A narrative review [44] | Elsevier | 2022 |
25 | Generative adversarial networks for image super-resolution: A survey [48] | arXiv | 2022 |
26 | Generic image application using GANs (Generative Adversarial Networks): A Review [39] | Springer | 2022 |
27 | A Survey on Generative Adversarial Networks: Variants, Applications, and Training [29] | ACM | 2022 |
4. Background
4.1. Generative Adversarial Network
4.1.1. Achieving Nash Equilibrium
4.1.2. Mode Collapse
4.1.3. Vanishing Gradient
4.1.4. Lack of Evaluation Metrics
4.2. Amodal Completion
5. GAN in Amodal Completion
5.1. Amodal Segmentation
5.2. Order Recovery
5.3. Amodal Appearance Reconstruction
5.3.1. Generic Object Completion
5.3.2. Face Completion
5.3.3. Attribute Classification
5.3.4. Miscellaneous Applications
5.4. Training Data
6. Loss Functions
- Adversarial Loss: The loss function used in training GAN is known as an adversarial loss. It measures the distance between the distributions of the generated samples and the real samples. G and D each have a dedicated loss term, and together these terms form the adversarial loss shown in Equation (1). However, G is trained only on the term that reflects the distribution of the generated data, $\mathbb{E}_{z\sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$, where $z$ is the input noise, because the other term does not depend on G. Extensions to the original loss function are the conditional loss and the Wasserstein loss defined in CGAN and WGAN, respectively.
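- Content Loss: In image generation, content loss [138] measures the difference between the content representations of the real and the generated images, making them more similar in terms of perceptual content. If $p$ and $x$ are the original and the generated images, and $P^l$ and $F^l$ are their respective feature representations in layer $l$, the content loss is calculated as $$\mathcal{L}_{content}(p, x, l) = \frac{1}{2}\sum_{i,j}\left(F^l_{ij} - P^l_{ij}\right)^2,$$ where $F^l_{ij}$ is the activation of the $i$-th filter at position $j$ in layer $l$.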
- Reconstruction Loss: The key idea behind the reconstruction loss proposed by Li et al. [139] is to benefit from the visual features that D learns from the training data. The features that D extracts from the real data are fed to G to regenerate the real data. By adding the reconstruction loss to the GAN’s loss function, G is encouraged to reconstruct the real data from the features of D, which brings G closer to the configurations of the real data. The reconstruction loss equation is as follows:
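- Style Loss: The style loss, originally designed for image style transfer by Gatys et al. [138], ensures that the style representation of the generated image matches that of the input style image. It depends on the correlations between feature maps, given by the Gram matrix $G^l_{ij} = \sum_k F^l_{ik}F^l_{jk}$. Let $a$ and $x$ be the original image and the generated image, respectively, and $A^l$ and $G^l$ their corresponding style representations in layer $l$. The style loss is computed from the element-wise mean squared difference between $A^l$ and $G^l$, $$E_l = \frac{1}{4N_l^2M_l^2}\sum_{i,j}\left(G^l_{ij} - A^l_{ij}\right)^2, \qquad \mathcal{L}_{style}(a, x) = \sum_l w_l E_l,$$ where $N_l$ is the number of feature maps in layer $l$, $M_l$ is their spatial size (height × width), and $w_l$ are per-layer weights.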
- $L_1$ and $L_2$ Loss: $L_1$ loss is the absolute difference between the ground-truth and the generated image, whereas $L_2$ loss is the squared difference between the actual and the generated data. When used alone, these loss functions lead to blurred results [140]. However, when combined with other loss functions, they can improve the quality of the generated images, especially $L_1$ loss. The generator is then encouraged not only to fool the discriminator but also to be closer to the real data in the $L_1$ or $L_2$ sense. Although these losses cannot capture high-frequency details, they accurately capture low frequencies. $L_1$ loss enforces correctness in low-frequency features; hence, it results in less blurred images than $L_2$ [8]. Both losses are defined in Equations (6) and (7), where $y$ is the ground-truth image and $\hat{y}$ is the generated image: $$\mathcal{L}_{L_1} = \mathbb{E}\left[\lVert y - \hat{y}\rVert_1\right], \tag{6}$$ $$\mathcal{L}_{L_2} = \mathbb{E}\left[\lVert y - \hat{y}\rVert_2^2\right]. \tag{7}$$
- Perceptual Loss: The perceptual loss measures the high-level perceptual and semantic differences between the real and the fake images. Several works [141,142] define perceptual loss as a combination of the content loss (or feature reconstruction loss) and the style loss. In contrast, Liu et al. [62] simply compute the $L_1$ distance between the feature maps of the real and the completed images, as extracted by a pre-trained VGG-16 network. Others incorporate additional similarity metrics into it [140].
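- BCE Loss: The binary cross-entropy (BCE) loss measures how close the predicted probability is to the true label. Its value increases as the predicted probability deviates from the label. For $N$ samples with labels $y_i \in \{0, 1\}$ and predicted probabilities $\hat{y}_i$, the BCE is defined as $$\mathcal{L}_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log\hat{y}_i + (1 - y_i)\log\left(1 - \hat{y}_i\right)\right].$$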
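- Hinge Loss: In GAN, hinge loss is used to help the training converge to a Nash equilibrium. As proposed by Lim and Ye [143], the objective function for G is $$\mathcal{L}_G = -\mathbb{E}_{z\sim p_z(z)}\left[D(G(z))\right],$$ and the objective function for D is $$\mathcal{L}_D = \mathbb{E}_{x\sim p_{data}(x)}\left[\max\left(0, 1 - D(x)\right)\right] + \mathbb{E}_{z\sim p_z(z)}\left[\max\left(0, 1 + D(G(z))\right)\right].$$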
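The loss terms above can be written compactly in NumPy. This is a minimal sketch and not the implementation used by any surveyed work. It assumes the feature maps have already been extracted. The cited works use a pre-trained CNN for this; here each layer is a plain array of shape $N_l \times M_l$. It also assumes that BCE receives probabilities in $(0, 1)$ and that the hinge losses receive raw, unbounded discriminator scores. All function names and the weight `lam` are hypothetical.

```python
import numpy as np


def l1_loss(y, y_hat):
    """Mean absolute difference between ground truth y and generated y_hat."""
    return float(np.mean(np.abs(y - y_hat)))


def l2_loss(y, y_hat):
    """Mean squared difference between ground truth y and generated y_hat."""
    return float(np.mean((y - y_hat) ** 2))


def bce_loss(labels, probs, eps=1e-7):
    """Binary cross-entropy between 0/1 labels and predicted probabilities."""
    probs = np.clip(probs, eps, 1.0 - eps)  # keep log() finite
    return float(-np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs)))


def hinge_d_loss(d_real, d_fake):
    """Hinge loss for D, computed on raw (unbounded) discriminator scores."""
    return float(np.mean(np.maximum(0.0, 1.0 - d_real)) + np.mean(np.maximum(0.0, 1.0 + d_fake)))


def hinge_g_loss(d_fake):
    """Hinge-GAN loss for G: minimizing it raises D's score on generated samples."""
    return float(-np.mean(d_fake))


def content_loss(P, F):
    """Half the summed squared difference between layer-l features P (original) and F (generated)."""
    return float(0.5 * np.sum((F - P) ** 2))


def gram_matrix(F):
    """Gram matrix G_ij = sum_k F_ik * F_jk for features of shape (N_l, M_l)."""
    return F @ F.T


def style_layer_loss(F_a, F_x):
    """Per-layer style loss E_l between style-image features F_a and generated features F_x."""
    n_l, m_l = F_a.shape
    A, G = gram_matrix(F_a), gram_matrix(F_x)
    return float(np.sum((G - A) ** 2) / (4.0 * n_l**2 * m_l**2))


def style_loss(feats_a, feats_x, weights):
    """Weighted sum of per-layer style losses over several layers."""
    return sum(w * style_layer_loss(fa, fx) for fa, fx, w in zip(feats_a, feats_x, weights))


if __name__ == "__main__":
    y = np.array([0.0, 0.5, 1.0, 1.0])
    y_hat = np.array([0.0, 0.0, 1.0, 0.5])
    print(l1_loss(y, y_hat))  # 0.25
    print(l2_loss(y, y_hat))  # 0.125
    # Example combined generator objective; lam is a hypothetical weight.
    lam = 10.0
    d_fake = np.array([-2.0, 0.0])
    print(hinge_g_loss(d_fake) + lam * l1_loss(y, y_hat))  # 3.5
```

Weighted sums like the one in the demo are a common way to combine an adversarial term with reconstruction-type terms. The weights are model-specific hyperparameters.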
7. Open Challenges and Future Directions
- Amodal training data: Up until now, there has been no fully annotated generic amodal dataset with sufficient ground-truth labels for the three sub-tasks in amodal completion. Most of the existing datasets are specific to a particular application or task. This makes it harder not only to train the models but also to verify their learning capability. In many cases, there is insufficient labeled amodal validation data to establish the accuracy of the model. The challenges related to each sub-task in amodal completion are as follows:
  - For amodal segmentation, the current datasets do not contain sufficient occlusion cases between similar objects. Hence, the model cannot tell where the boundary of one object ends and the other one begins.
  - The existing real (manually annotated) amodal datasets have no ground-truth appearance for the occluded region. This makes training and validating the model for amodal content completion more challenging.
  - As for order recovery, some occlusion situations are very rare in the existing datasets, and it is impossible to cover all probable cases of occlusion in real datasets. Nevertheless, in the future, the current datasets can be extended through generated occlusion to include more of these infrequent cases with varying degrees of occlusion.
- Evaluation metrics: There are several quantitative and qualitative evaluation measures for GAN [59]. However, as the reported results show, there is no standard, agreed-upon metric for assessing the performance of GAN when it generates occluded content. Many existing works depend on human preference judgements, which can be biased and subjective. Therefore, designing a consensus evaluation metric is of utmost importance.
- Reference data: Existing GAN models fail to generate occluded content accurately if the hidden area is large, particularly when the occluded object is non-symmetric, such as the face or the human body. The visible region of the object may not hold sufficient relevant features to guide a visually plausible regeneration. As a next step, reference images can be used alongside the input image to guide the completion more effectively.
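One way to extend datasets with generated occlusion, as suggested above, is to paste a known occluder onto an unoccluded object. Because the original image and object mask are kept, this yields ground truth for all three sub-tasks at once. It gives the amodal mask, the modal (visible) mask, the depth order (the pasted occluder is in front), and the appearance of the hidden region. The sketch below assumes NumPy arrays and a rectangular patch with its own shape mask. The function names are hypothetical, and this is not a procedure taken from a specific surveyed work.

```python
import numpy as np


def synthesize_occlusion(image, amodal_mask, occluder, occ_mask, top, left):
    """Paste an occluder patch onto `image` with its top-left corner at (top, left).

    image:       H x W (x C) array showing the unoccluded target object.
    amodal_mask: H x W boolean mask of the full target object.
    occluder:    h x w (x C) patch; occ_mask is its h x w boolean shape mask.
    Returns (occluded_image, full_occluder_mask, modal_mask).
    """
    H, W = amodal_mask.shape
    h, w = occ_mask.shape
    if top < 0 or left < 0 or top + h > H or left + w > W:
        raise ValueError("occluder patch must fit inside the image")

    full_occ = np.zeros((H, W), dtype=bool)
    full_occ[top:top + h, left:left + w] = occ_mask

    occluded = image.copy()                        # keep the original as appearance ground truth
    window = occluded[top:top + h, left:left + w]  # a view into `occluded`
    window[occ_mask] = occluder[occ_mask]          # occluder is in front by construction

    modal_mask = amodal_mask & ~full_occ           # visible part of the target
    return occluded, full_occ, modal_mask


def occlusion_ratio(amodal_mask, modal_mask):
    """Fraction of the target object hidden by the occluder."""
    total = amodal_mask.sum()
    return float(1.0 - modal_mask.sum() / total) if total else 0.0
```

Varying `top`, `left`, and the patch size controls the degree of occlusion. `occlusion_ratio` reports that degree as the hidden fraction of the target.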
8. Discussion
- Architecture: While the original GAN consists of a single generator and discriminator, several works utilize multiple generators and discriminators. The implementation of local and global discriminators is especially common, because it enhances the quality of the generated data. The generator is encouraged to concentrate on both the global contextual and local features, and produce images that are closer to the distribution of the real data. In addition to this, an initial-to-refined (also called coarse-to-fine) architecture is implemented in many models. The initial stage produces a coarse output from the input image, which is then further refined in the refinement step.
- Objective function: To improve the quality of the generated output and stabilize the training of the GAN, a combination of loss terms is used. While adversarial loss and Hinge loss are used in training the two networks in the GAN, other objective functions encourage the model to produce an image that is consistent with the ground-truth image.
- Input: Under severe occlusion, the GAN may fail to produce a visually pleasing output when relying solely on the visible region. Therefore, providing additional input information guides the GAN toward better results. In amodal shape and content completion, synthetic instances similar to the occluded object are useful because the model can use them as a reference. A priori knowledge is also beneficial; it can either be manually encoded (e.g., utilizing various human poses for human deocclusion) or transferred from a pre-trained model (e.g., using a pre-trained face recognition model for face deocclusion). In addition, supplying the amodal mask and the category of the occluded object to the content completion task restricts the GAN model to completing the object in question. Producing the amodal mask requires a modal mask as input. If it is not available, most existing works rely on a pre-trained segmentation model to predict the visible segmentation mask.
- Feature extraction: The pixels in the visible region of an image contain essential information for various tasks; hence, they are considered valid pixels. In contrast, the invisible pixels are invalid and should not be included in the feature extraction/encoding process. However, vanilla convolution cannot differentiate between valid and invalid pixels, which leads to visual artifacts and color discrepancies in the generated images. Therefore, partial convolution and a soft gating mechanism are implemented to force the generator to focus only on valid pixels and minimize the effect of the invalid ones. In addition, dilated convolution layers can replace vanilla convolution layers to borrow information from relevant, spatially distant pixels. Finally, contextual attention layers and attention mechanisms are added to the networks of the GAN to leverage information from the image context and capture global dependencies.
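The re-normalized partial convolution of Liu et al. [62] works on sliding windows. For each window with input $X$, mask $M$, kernel $W$, and bias $b$, it outputs $W^{T}(X \odot M)\,\frac{\mathrm{sum}(\mathbf{1})}{\mathrm{sum}(M)} + b$ when $\mathrm{sum}(M) > 0$, and $0$ otherwise. It marks the location as valid in the updated mask whenever at least one input was valid. The NumPy sketch below illustrates this for a single channel. The function name, the stride of 1, the zero padding (with padded borders treated as invalid), and the explicit loops are illustrative assumptions rather than the reference implementation. The learned soft gating of gated convolution is not covered.

```python
import numpy as np


def partial_conv2d(x, mask, weight, bias=0.0):
    """Single-channel partial convolution, stride 1, zero 'same' padding.

    x:      H x W input feature map.
    mask:   H x W array, 1 for valid (visible) pixels and 0 for holes.
    weight: k x k kernel with odd k.
    Returns (output, updated_mask).
    """
    k = weight.shape[0]
    if weight.shape != (k, k) or k % 2 == 0:
        raise ValueError("weight must be a square kernel with odd size")
    pad = k // 2
    mask = mask.astype(float)
    xp = np.pad(x * mask, pad)   # hole pixels are zeroed before convolving
    mp = np.pad(mask, pad)       # padded border is treated as invalid
    H, W = x.shape
    out = np.zeros((H, W))
    new_mask = np.zeros((H, W))
    window_size = float(k * k)   # sum(1) in the formula above
    for i in range(H):
        for j in range(W):
            valid = mp[i:i + k, j:j + k].sum()   # sum(M) for this window
            if valid > 0:
                response = np.sum(weight * xp[i:i + k, j:j + k])
                # Re-scale so the output does not depend on how many inputs were valid.
                out[i, j] = response * (window_size / valid) + bias
                new_mask[i, j] = 1.0             # location becomes valid
    return out, new_mask
```

Stacking such layers shrinks the hole, because every location with at least one valid input becomes valid. For example, with a 3 × 3 kernel, a 3 × 3 hole in a 5 × 5 map shrinks to its single center pixel after one layer.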
9. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Thielen, J.; Bosch, S.E.; van Leeuwen, T.M.; van Gerven, M.A.; van Lier, R. Neuroimaging findings on amodal completion: A review. i-Perception 2019, 10, 2041669519840047. [Google Scholar] [CrossRef] [PubMed]
- Saleh, K.; Szénási, S.; Vámossy, Z. Occlusion Handling in Generic Object Detection: A Review. In Proceedings of the 2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI), Herl’any, Slovakia, 21–23 January 2021; pp. 477–484. [Google Scholar]
- Wang, Z.; She, Q.; Ward, T.E. Generative adversarial networks in computer vision: A survey and taxonomy. ACM Comput. Surv. (CSUR) 2021, 54, 1–38. [Google Scholar] [CrossRef]
- Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive growing of gans for improved quality, stability, and variation. arXiv 2017, arXiv:1710.10196. [Google Scholar]
- Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784. [Google Scholar]
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
- Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 214–223. [Google Scholar]
- Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134. [Google Scholar]
- Yang, T.; Pan, Q.; Li, J.; Li, S.Z. Real-time multiple objects tracking with occlusion handling in dynamic scenes. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 21–23 September 2005; Volume 1, pp. 970–975. [Google Scholar]
- Enzweiler, M.; Eigenstetter, A.; Schiele, B.; Gavrila, D.M. Multi-cue pedestrian classification with partial occlusion handling. In Proceedings of the 2010 IEEE Computer Society Conference on cOmputer Vision Furthermore, Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 990–997. [Google Scholar]
- Benenson, R. Occlusion Handling. In Computer Vision: A Reference Guide; Ikeuchi, K., Ed.; Springer US: Boston, MA, USA, 2014; pp. 551–552. [Google Scholar] [CrossRef]
- Tian, Y.; Guan, T.; Wang, C. Real-time occlusion handling in augmented reality based on an object tracking approach. Sensors 2010, 10, 2885–2900. [Google Scholar] [CrossRef] [Green Version]
- Ao, J.; Ke, Q.; Ehinger, K.A. Image amodal completion: A survey. In Computer Vision and Image Understanding; Elsevier: Amsterdam, The Netherlands, 2023; p. 103661. [Google Scholar]
- Anuj, L.; Krishna, M.G. Multiple camera based multiple object tracking under occlusion: A survey. In Proceedings of the 2017 International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), Bengaluru, India, 21–23 February 2017; pp. 432–437. [Google Scholar]
- Shravya, A.; Monika, K.; Malagi, V.; Krishnan, R. A comprehensive survey on multi object tracking under occlusion in aerial image sequences. In Proceedings of the 2019 1st International Conference on Advanced Technologies in Intelligent Control, Environment, Computing & Communication Engineering (ICATIECE), Bangalore, India, 19–20 March 2019; pp. 225–230. [Google Scholar]
- Ning, C.; Menglu, L.; Hao, Y.; Xueping, S.; Yunhong, L. Survey of pedestrian detection with occlusion. Complex Intell. Syst. 2021, 7, 577–587. [Google Scholar] [CrossRef]
- Li, F.; Li, X.; Liu, Q.; Li, Z. Occlusion Handling and Multi-scale Pedestrian Detection Based on Deep Learning: A Review. IEEE Access 2022, 10, 19937–19957. [Google Scholar] [CrossRef]
- Zhang, L.; Verma, B.; Tjondronegoro, D.; Chandran, V. Facial expression analysis under partial occlusion: A survey. ACM Comput. Surv. (CSUR) 2018, 51, 1–49. [Google Scholar] [CrossRef] [Green Version]
- Dagnes, N.; Vezzetti, E.; Marcolin, F.; Tornincasa, S. Occlusion detection and restoration techniques for 3D face recognition: A literature review. Mach. Vis. Appl. 2018, 29, 789–813. [Google Scholar] [CrossRef]
- Zeng, D.; Veldhuis, R.; Spreeuwers, L. A survey of face recognition techniques under occlusion. IET Biom. 2021, 10, 581–606. [Google Scholar] [CrossRef]
- Meena, M.K.; Meena, H.K. A Literature Survey of Face Recognition Under Different Occlusion Conditions. In Proceedings of the 2022 IEEE Region 10 Symposium (TENSYMP), Mumbai, India, 1–3 July 2022; pp. 1–6. [Google Scholar]
- Biswas, S. Performance Improvement of Face Recognition Method and Application for the COVID-19 Pandemic. Acta Polytech. Hung. 2022, 19, 1–21. [Google Scholar]
- Gilroy, S.; Jones, E.; Glavin, M. Overcoming occlusion in the automotive environment—A review. IEEE Trans. Intell. Transp. Syst. 2019, 22, 23–35. [Google Scholar] [CrossRef]
- Rosić, S.; Stamenković, D.; Banić, M.; Simonović, M.; Ristić-Durrant, D.; Ulianov, C. Analysis of the Safety Level of Obstacle Detection in Autonomous Railway Vehicles. Acta Polytech. Hung. 2022, 1, 187–205. [Google Scholar] [CrossRef]
- Macedo, M.C.d.F.; Apolinario, A.L. Occlusion Handling in Augmented Reality: Past, Present and Future. IEEE Trans. Vis. Comput. Graph. 2021, 29, 1590–1609. [Google Scholar] [CrossRef]
- Zhang, Z.; Ji, X.; Cui, X.; Ma, J. A Survey on Occluded Face recognition. In Proceedings of the 2020 The 9th International Conference on Networks, Communication and Computing, Tokyo, Japan, 18–20 December 2020; pp. 40–49. [Google Scholar]
- Sajeeda, A.; Hossain, B.M. Exploring Generative Adversarial Networks and Adversarial Training. Int. J. Cogn. Comput. Eng. 2022, 3, 78–89. [Google Scholar] [CrossRef]
- Saxena, D.; Cao, J. Generative adversarial networks (GANs) challenges, solutions, and future directions. ACM Comput. Surv. (CSUR) 2021, 54, 1–42. [Google Scholar] [CrossRef]
- Jabbar, A.; Li, X.; Omar, B. A survey on generative adversarial networks: Variants, applications, and training. ACM Comput. Surv. (CSUR) 2021, 54, 1–49. [Google Scholar] [CrossRef]
- Farajzadeh-Zanjani, M.; Razavi-Far, R.; Saif, M.; Palade, V. Generative Adversarial Networks: A Survey on Training, Variants, and Applications. In Generative Adversarial Learning: Architectures and Applications; Springer: Berlin/Heidelberg, Germany, 2022; pp. 7–29. [Google Scholar]
- Pavan Kumar, M.; Jayagopal, P. Generative adversarial networks: A survey on applications and challenges. Int. J. Multimed. Inf. Retr. 2021, 10, 1–24. [Google Scholar] [CrossRef]
- Hong, Y.; Hwang, U.; Yoo, J.; Yoon, S. How generative adversarial networks and their variants work: An overview. ACM COmputing Surv. (CSUR) 2019, 52, 1–43. [Google Scholar] [CrossRef] [Green Version]
- Li, Y.; Wang, Q.; Zhang, J.; Hu, L.; Ouyang, W. The theoretical research of generative adversarial networks: An overview. Neurocomputing 2021, 435, 26–41. [Google Scholar] [CrossRef]
- Pan, Z.; Yu, W.; Yi, X.; Khan, A.; Yuan, F.; Zheng, Y. Recent progress on generative adversarial networks (GANs): A survey. IEEE Access 2019, 7, 36322–36333. [Google Scholar] [CrossRef]
- Salehi, P.; Chalechale, A.; Taghizadeh, M. Generative adversarial networks (GANs): An overview of theoretical model, evaluation metrics, and recent developments. arXiv 2020, arXiv:2005.13178. [Google Scholar]
- Jin, L.; Tan, F.; Jiang, S. Generative adversarial network technologies and applications in computer vision. Comput. Intell. Neurosci. 2020, 2020, 1459107. [Google Scholar] [CrossRef] [PubMed]
- Wang, K.; Gou, C.; Duan, Y.; Lin, Y.; Zheng, X.; Wang, F.Y. Generative adversarial networks: Introduction and outlook. IEEE/CAA J. Autom. Sinica 2017, 4, 588–598. [Google Scholar] [CrossRef]
- Alotaibi, A. Deep generative adversarial networks for image-to-image translation: A review. Symmetry 2020, 12, 1705. [Google Scholar] [CrossRef]
- Porkodi, S.; Sarada, V.; Maik, V.; Gurushankar, K. Generic image application using GANs (Generative Adversarial Networks): A Review. Evol. Syst. 2022, 1–15. [Google Scholar] [CrossRef]
- Kammoun, A.; Slama, R.; Tabia, H.; Ouni, T.; Abid, M. Generative Adversarial Networks for face generation: A survey. ACM Comput. Surv. (CSUR) 2022, 55, 1–37. [Google Scholar] [CrossRef]
- Toshpulatov, M.; Lee, W.; Lee, S. Generative adversarial networks and their application to 3D face generation: A survey. Image Vis. Comput. 2021, 108, 104119. [Google Scholar] [CrossRef]
- Tschuchnig, M.E.; Oostingh, G.J.; Gadermayr, M. Generative adversarial networks in digital pathology: A survey on trends and future potential. Patterns 2020, 1, 100089. [Google Scholar] [CrossRef]
- Saad, M.M.; O’Reilly, R.; Rehmani, M.H. A Survey on Training Challenges in Generative Adversarial Networks for Biomedical Image Analysis. arXiv 2022, arXiv:2201.07646. [Google Scholar]
- Zhao, J.; Hou, X.; Pan, M.; Zhang, H. Attention-based generative adversarial network in medical imaging: A narrative review. Comput. Biol. Med. 2022, 149, 105948. [Google Scholar] [CrossRef]
- Alqahtani, H.; Kavakli-Thorne, M.; Kumar, G. Applications of generative adversarial networks (gans): An updated review. Arch. Comput. Methods Eng. 2021, 28, 525–552. [Google Scholar] [CrossRef]
- Sampath, V.; Maurtua, I.; Aguilar Martín, J.J.; Gutierrez, A. A survey on generative adversarial networks for imbalance problems in computer vision tasks. J. Big Data 2021, 8, 1–59. [Google Scholar] [CrossRef]
- Ljubić, H.; Martinović, G.; Volarić, T. Augmenting data with generative adversarial networks: An overview. Intell. Data Anal. 2022, 26, 361–378. [Google Scholar] [CrossRef]
- Tian, C.; Zhang, X.; Lin, J.C.W.; Zuo, W.; Zhang, Y.; Lin, C.W. Generative adversarial networks for image super-resolution: A survey. arXiv 2022, arXiv:2204.13620. [Google Scholar]
- Aggarwal, A.; Mittal, M.; Battineni, G. Generative adversarial network: An overview of theory and applications. Int. J. Inf. Manag. Data Insights 2021, 1, 100004. [Google Scholar] [CrossRef]
- Gui, J.; Sun, Z.; Wen, Y.; Tao, D.; Ye, J. A review on generative adversarial networks: Algorithms, theory, and applications. IEEE Trans. Knowl. Data Eng. 2021, 35, 3313–3332. [Google Scholar] [CrossRef]
- Hitawala, S. Comparative study on generative adversarial networks. arXiv 2018, arXiv:1801.04271. [Google Scholar]
- Creswell, A.; White, T.; Dumoulin, V.; Arulkumaran, K.; Sengupta, B.; Bharath, A.A. Generative adversarial networks: An overview. IEEE Signal Process. Mag. 2018, 35, 53–65. [Google Scholar] [CrossRef] [Green Version]
- Goodfellow Ian, J.; Jean, P.A.; Mehdi, M.; Bing, X.; David, W.F.; Sherjil, O.; Courville Aaron, C. Generative adversarial nets. In Proceedings of the 27th international Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Volume 2, pp. 2672–2680. [Google Scholar]
- Zhang, H.; Goodfellow, I.; Metaxas, D.; Odena, A. Self-attention generative adversarial networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 7354–7363. [Google Scholar]
- Rubner, Y.; Tomasi, C.; Guibas, L.J. The earth mover’s distance as a metric for image retrieval. Int. J. Comput. Vis. 2000, 40, 99–121. [Google Scholar] [CrossRef]
- Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A.C. Improved training of Wasserstein GANs. Adv. Neural Inf. Process. Syst. 2017, 30, 5767–5777. [Google Scholar]
- Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved techniques for training gans. Adv. Neural Inf. Process. Syst. 2016, 29, 2234–2242. [Google Scholar]
- Arjovsky, M.; Bottou, L. Towards principled methods for training generative adversarial networks. arXiv 2017, arXiv:1701.04862. [Google Scholar]
- Borji, A. Pros and cons of gan evaluation measures. Comput. Vis. Image Underst. 2019, 179, 41–65. [Google Scholar] [CrossRef] [Green Version]
- Zhou, Q.; Wang, S.; Wang, Y.; Huang, Z.; Wang, X. Human De-occlusion: Invisible Perception and Recovery for Humans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 3691–3701. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
- Liu, G.; Reda, F.A.; Shih, K.J.; Wang, T.C.; Tao, A.; Catanzaro, B. Image inpainting for irregular holes using partial convolutions. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 14–17 May 2018; pp. 85–100. [Google Scholar]
- Xiong, W.; Yu, J.; Lin, Z.; Yang, J.; Lu, X.; Barnes, C.; Luo, J. Foreground-aware image inpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5840–5848. [Google Scholar]
- Rajchl, M.; Lee, M.C.; Oktay, O.; Kamnitsas, K.; Passerat-Palmbach, J.; Bai, W.; Damodaram, M.; Rutherford, M.A.; Hajnal, J.V.; Kainz, B.; et al. DeepCut: Object segmentation from bounding box annotations using convolutional neural networks. IEEE Trans. Med. Imaging 2016, 36, 674–683.
- Yu, J.; Lin, Z.; Yang, J.; Shen, X.; Lu, X.; Huang, T.S. Generative image inpainting with contextual attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5505–5514.
- Zhang, Q.; Liang, Q.; Liang, H.; Yang, Y. Removal and Recovery of the Human Invisible Region. Symmetry 2022, 14, 531.
- Yan, X.; Wang, F.; Liu, W.; Yu, Y.; He, S.; Pan, J. Visualizing the invisible: Occluded vehicle segmentation and recovery. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7618–7627.
- Zhang, H.; Xu, T.; Li, H.; Zhang, S.; Wang, X.; Huang, X.; Metaxas, D.N. StackGAN++: Realistic image synthesis with stacked generative adversarial networks. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 1947–1962.
- Dhamo, H.; Tateno, K.; Laina, I.; Navab, N.; Tombari, F. Peeking behind objects: Layered depth prediction from a single image. Pattern Recognit. Lett. 2019, 125, 333–340.
- Mani, K.; Daga, S.; Garg, S.; Narasimhan, S.S.; Krishna, M.; Jatavallabhula, K.M. MonoLayout: Amodal scene layout from a single image. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 1689–1697.
- Zheng, C.; Dao, D.S.; Song, G.; Cham, T.J.; Cai, J. Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition. Int. J. Comput. Vis. 2021, 129, 3195–3215.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Dhamo, H.; Navab, N.; Tombari, F. Object-driven multi-layer scene decomposition from a single image. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 5369–5378.
- Yu, J.; Lin, Z.; Yang, J.; Shen, X.; Lu, X.; Huang, T.S. Free-form image inpainting with gated convolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4471–4480.
- Zhan, X.; Pan, X.; Dai, B.; Liu, Z.; Lin, D.; Loy, C.C. Self-supervised scene de-occlusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 3784–3792.
- Ehsani, K.; Mottaghi, R.; Farhadi, A. SeGAN: Segmenting and generating the invisible. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6144–6153.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Kahatapitiya, K.; Tissera, D.; Rodrigo, R. Context-aware automatic occlusion removal. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 1895–1899.
- Cai, J.; Han, H.; Cui, J.; Chen, J.; Liu, L.; Zhou, S.K. Semi-supervised natural face de-occlusion. IEEE Trans. Inf. Forensics Secur. 2020, 16, 1044–1057.
- Chen, Y.A.; Chen, W.C.; Wei, C.P.; Wang, Y.C.F. Occlusion-aware face inpainting via generative adversarial networks. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 1202–1206.
- Cheung, Y.M.; Li, M.; Zou, R. Facial Structure Guided GAN for Identity-preserved Face Image De-occlusion. In Proceedings of the 2021 International Conference on Multimedia Retrieval, Taipei, Taiwan, 21 August 2021; pp. 46–54.
- Li, Y.; Liu, S.; Yang, J.; Yang, M.H. Generative face completion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3911–3919.
- Mathai, J.; Masi, I.; AbdAlmageed, W. Does generative face completion help face recognition? In Proceedings of the 2019 International Conference on Biometrics (ICB), Crete, Greece, 4–7 June 2019; pp. 1–8.
- Liu, H.; Zheng, W.; Xu, C.; Liu, T.; Zuo, M. Facial landmark detection using generative adversarial network combined with autoencoder for occlusion. Math. Probl. Eng. 2020, 2020, 1–8.
- Cai, J.; Hu, H.; Shan, S.; Chen, X. FCSR-GAN: End-to-end learning for joint face completion and super-resolution. In Proceedings of the 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), Lille, France, 14–18 May 2019; pp. 1–8.
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690.
- Li, C.; Ge, S.; Zhang, D.; Li, J. Look through masks: Towards masked face recognition with de-occlusion distillation. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 3016–3024.
- Dong, J.; Zhang, L.; Zhang, H.; Liu, W. Occlusion-aware GAN for face de-occlusion in the wild. In Proceedings of the 2020 IEEE International Conference on Multimedia and Expo (ICME), London, UK, 6–10 July 2020; pp. 1–6.
- Jabbar, A.; Li, X.; Assam, M.; Khan, J.A.; Obayya, M.; Alkhonaini, M.A.; Al-Wesabi, F.N.; Assad, M. AFD-StackGAN: Automatic Mask Generation Network for Face De-Occlusion Using StackGAN. Sensors 2022, 22, 1747.
- Li, Z.; Hu, Y.; He, R.; Sun, Z. Learning disentangling and fusing networks for face completion under structured occlusions. Pattern Recognit. 2020, 99, 107073.
- Jabbar, A.; Li, X.; Iqbal, M.M.; Malik, A.J. FD-StackGAN: Face De-occlusion Using Stacked Generative Adversarial Networks. KSII Trans. Internet Inf. Syst. (TIIS) 2021, 15, 2547–2567.
- Duan, Q.; Zhang, L. Look more into occlusion: Realistic face frontalization and recognition with BoostGAN. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 214–228.
- Duan, Q.; Zhang, L.; Gao, X. Simultaneous face completion and frontalization via mask guided two-stage GAN. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 3761–3773.
- Liu, Z.; Luo, P.; Wang, X.; Tang, X. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3730–3738.
- Huang, G.B.; Mattar, M.; Berg, T.; Learned-Miller, E. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Proceedings of the Workshop on Faces in ’Real-Life’ Images: Detection, Alignment, and Recognition, Marseille, France, 17 October 2008.
- Le, V.; Brandt, J.; Lin, Z.; Bourdev, L.; Huang, T.S. Interactive facial feature localization. In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 679–692.
- Yi, D.; Lei, Z.; Liao, S.; Li, S.Z. Learning face representation from scratch. arXiv 2014, arXiv:1411.7923.
- Parkhi, O.M.; Vedaldi, A.; Zisserman, A. Deep Face Recognition. In Proceedings of the British Machine Vision Conference (BMVC), Swansea, UK, 7–10 September 2015; BMVA Press: Durham, UK, 2015; pp. 41.1–41.12.
- Guo, Y.; Zhang, L.; Hu, Y.; He, X.; Gao, J. MS-Celeb-1M: A dataset and benchmark for large-scale face recognition. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 87–102.
- Liao, S.; Lei, Z.; Yi, D.; Li, S.Z. A benchmark study of large-scale unconstrained face recognition. In Proceedings of the IEEE International Joint Conference on Biometrics, Clearwater, FL, USA, 29 September–2 October 2014; pp. 1–8.
- Lee, C.H.; Liu, Z.; Wu, L.; Luo, P. MaskGAN: Towards diverse and interactive facial image manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 5549–5558.
- Martinez, A.; Benavente, R. The AR Face Database; CVC Technical Report 24; Universitat Autònoma de Barcelona: Barcelona, Spain, 1998.
- Lucey, P.; Cohn, J.F.; Kanade, T.; Saragih, J.; Ambadar, Z.; Matthews, I. The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops, San Francisco, CA, USA, 13–18 June 2010; pp. 94–101.
- Gross, R.; Matthews, I.; Cohn, J.; Kanade, T.; Baker, S. Multi-PIE. Image Vis. Comput. 2010, 28, 807–813.
- Phillips, P.J.; Moon, H.; Rizvi, S.A.; Rauss, P.J. The FERET evaluation methodology for face-recognition algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1090–1104.
- Cong, K.; Zhou, M. Face Dataset Augmentation with Generative Adversarial Network. J. Phys. Conf. Ser. 2022, 2218, 012035.
- Yang, S.; Luo, P.; Loy, C.C.; Tang, X. WIDER FACE: A face detection benchmark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 5525–5533.
- Fabbri, M.; Calderara, S.; Cucchiara, R. Generative adversarial models for people attribute recognition in surveillance. In Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy, 29 August–1 September 2017; pp. 1–6.
- Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv 2015, arXiv:1511.06434.
- Fulgeri, F.; Fabbri, M.; Alletto, S.; Calderara, S.; Cucchiara, R. Can adversarial networks hallucinate occluded people with a plausible aspect? Comput. Vis. Image Underst. 2019, 182, 71–80.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Papadopoulos, D.P.; Tamaazousti, Y.; Ofli, F.; Weber, I.; Torralba, A. How to make a pizza: Learning a compositional layer-based GAN model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 8002–8011.
- Zhang, K.; Wu, D.; Yuan, C.; Qin, X.; Wu, H.; Zhao, X.; Zhang, L.; Du, Y.; Wang, H. Random Occlusion Recovery with Noise Channel for Person Re-identification. In Proceedings of the International Conference on Intelligent Computing, Shenzhen, China, 12–15 August 2020; Springer; pp. 183–191.
- Tagore, N.K.; Chattopadhyay, P. A bi-network architecture for occlusion handling in Person re-identification. Signal Image Video Process. 2022, 16, 1–9.
- Wang, X.; Shrivastava, A.; Gupta, A. A-Fast-RCNN: Hard positive generation via adversary for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2606–2615.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Han, G.; Zhou, W.; Sun, N.; Liu, J.; Li, X. Feature fusion and adversary occlusion networks for object detection. IEEE Access 2019, 7, 124854–124865.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99.
- Zhao, C.; Lv, X.; Dou, S.; Zhang, S.; Wu, J.; Wang, L. Incremental generative occlusion adversarial suppression network for person ReID. IEEE Trans. Image Process. 2021, 30, 4212–4224.
- Wu, D.; Zhang, K.; Zheng, S.J.; Hao, Y.T.; Liu, F.Q.; Qin, X.; Cheng, F.; Zhao, Y.; Liu, Q.; Yuan, C.A.; et al. Random occlusion recovery for person re-identification. J. Imaging Sci. Technol. 2019, 63, 30405.
- Zhou, B.; Lapedriza, A.; Khosla, A.; Oliva, A.; Torralba, A. Places: A 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 1452–1464.
- McCormac, J.; Handa, A.; Leutenegger, S.; Davison, A.J. SceneNet RGB-D: 5M photorealistic images of synthetic indoor trajectories with ground truth. arXiv 2016, arXiv:1612.05079.
- Silberman, N.; Hoiem, D.; Kohli, P.; Fergus, R. Indoor segmentation and support inference from RGBD images. In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; pp. 746–760.
- Song, S.; Yu, F.; Zeng, A.; Chang, A.X.; Savva, M.; Funkhouser, T. Semantic scene completion from a single depth image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1746–1754.
- Armeni, I.; Sax, S.; Zamir, A.R.; Savarese, S. Joint 2D-3D-semantic data for indoor scene understanding. arXiv 2017, arXiv:1702.01105.
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361.
- Chang, M.F.; Lambert, J.; Sangkloy, P.; Singh, J.; Bak, S.; Hartnett, A.; Wang, D.; Carr, P.; Lucey, S.; Ramanan, D.; et al. Argoverse: 3D tracking and forecasting with rich maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–19 June 2019; pp. 8748–8757.
- Zhu, Y.; Tian, Y.; Metaxas, D.; Dollár, P. Semantic amodal segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1464–1472.
- Qi, L.; Jiang, L.; Liu, S.; Shen, X.; Jia, J. Amodal instance segmentation with KINS dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3014–3023.
- Caesar, H.; Uijlings, J.; Ferrari, V. COCO-Stuff: Thing and stuff classes in context. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1209–1218.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
- Li, D.; Zhang, Z.; Chen, X.; Ling, H.; Huang, K. A richly annotated dataset for pedestrian attribute recognition. arXiv 2016, arXiv:1603.07054.
- Zheng, L.; Shen, L.; Tian, L.; Wang, S.; Wang, J.; Tian, Q. Scalable person re-identification: A benchmark. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1116–1124.
- Li, W.; Zhao, R.; Wang, X. Human reidentification with transferred metric learning. In Proceedings of the Computer Vision–ACCV 2012: 11th Asian Conference on Computer Vision, Daejeon, Republic of Korea, 5–9 November 2012; pp. 31–44.
- Li, W.; Zhao, R.; Xiao, T.; Wang, X. DeepReID: Deep filter pairing neural network for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 152–159.
- Zheng, Z.; Zheng, L.; Yang, Y. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3754–3762.
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Gatys, L.A.; Ecker, A.S.; Bethge, M. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2414–2423.
- Li, Y.; Xiao, N.; Ouyang, W. Improved generative adversarial networks with reconstruction loss. Neurocomputing 2019, 323, 363–372.
- Dosovitskiy, A.; Brox, T. Generating images with perceptual similarity metrics based on deep networks. Adv. Neural Inf. Process. Syst. 2016, 29, 658–666.
- Gatys, L.A.; Ecker, A.S.; Bethge, M. A neural algorithm of artistic style. arXiv 2015, arXiv:1508.06576.
- Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; pp. 694–711.
- Lim, J.H.; Ye, J.C. Geometric GAN. arXiv 2017, arXiv:1705.02894.
- Li, K.; Malik, J. Amodal instance segmentation. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; pp. 677–693.
- Héder, M.; Rigó, E.; Medgyesi, D.; Lovas, R.; Tenczer, S.; Török, F.; Farkas, A.; Emodi, M.; Kadlecsik, J.; Mezo, G.; et al. The past, present and future of the ELKH cloud. Inform. Társadalom 2022, 22, 128–137.
# | Title | Publisher | Year |
---|---|---|---|
1 | Multiple camera based multiple object tracking under occlusion: A survey [14] | IEEE | 2017 |
2 | Facial expression analysis under partial occlusion: A survey [18] | ACM | 2018 |
3 | Occlusion detection and restoration techniques for 3D face recognition: a literature review [19] | Springer | 2018 |
4 | Overcoming occlusion in the automotive environment—A review [23] | IEEE | 2019 |
5 | A comprehensive survey on multi object tracking under occlusion in aerial image sequences [15] | IEEE | 2019 |
6 | A Survey on Occluded Face recognition [26] | ACM | 2020 |
7 | Occlusion Handling in Generic Object Detection: A Review [2] | IEEE | 2021 |
8 | Occlusion Handling in Augmented Reality: Past, Present and Future [25] | IEEE | 2021 |
9 | A survey of face recognition techniques under occlusion [20] | Wiley | 2021 |
10 | Survey of pedestrian detection with occlusion [16] | Springer | 2021 |
11 | Occlusion Handling and Multi-scale Pedestrian Detection Based on Deep Learning: A Review [17] | IEEE | 2022 |
12 | Image Amodal Completion: A Survey [13] | arXiv | 2022 |
13 | A Literature Survey of Face Recognition Under Different Occlusion Conditions [21] | IEEE | 2022 |
# | Paper | Type of GAN | Loss Function | Dataset | Results |
---|---|---|---|---|---|
1. | Cai et al. [79] | OA-GAN | 1. Training with synthetic occlusion: PL, style loss, pixel loss, smoothness loss, ℓ2 loss, and AL. 2. Training with natural images: smoothness loss, ℓ2 loss, and AL. | CelebA [94] | PSNR = 22.61, SSIM = 0.787 |
2. | Chen et al. [80] | DCGAN | AL. | LFW [95] | Equal Error Rate (EER) * = 0.88 |
3. | Cheung et al. [81] | FSG-GAN | ℓ1 loss, identity-preserving loss, and AL. | CelebA and LFW. | CelebA: PSNR * = 20.7513, SSIM = 0.8318; LFW: PSNR * = 20.8905, SSIM = 0.8527 |
4. | Li et al. [82] | GAN with two discriminators | Local and global AL, RL (ℓ2), and pixel-wise softmax loss. | CelebA and Helen [96]. | PSNR * = 19.60, SSIM * = 0.803, ID = 0.470 |
5. | Mathai et al. [83] | GAN (modified generator) with two discriminators | RL (ℓ1), global WGAN loss, and local PatchGAN loss. | 1. Training the inpainter: CASIA WebFaces [97], VGG Faces [98], and MS-Celeb-1M [99]. 2. Testing the model: LFW and LFW-BLUFR [100]. | DIR@FAR = 89.68 |
6. | Liu et al. [84] | GAN (modified generator) with two discriminators | RL (ℓ2) and AL. | CelebAMask-HQ [101] | NRMSE = 6.96 (facial landmark detection result) |
7. | Cai et al. [85] | FCSR-GAN | MSE loss, PL, local and global AL, and face parsing loss. | CelebA and Helen. | CelebA: PSNR = 20.22, SSIM = 0.780; Helen: PSNR = 20.01, SSIM = 0.761 |
8. | Li et al. [87] | GAN (modified generator) with two discriminators | Local and global AL, RL, and contextual attention loss. | CelebA, AR [102], and LFW. | Recognition accuracy = 95.44% |
9. | Dong et al. [88] | OA-GAN | AL and ℓ1 loss | CelebA and CK+ [103], with additional occlusion images from the Internet. | PSNR * = 22.402, SSIM * = 0.753 |
10. | Jabbar et al. [89] | AFD-StackGAN (PatchGAN discriminators) | ℓ1 loss, RL (ℓ1 and SSIM), PL, and AL. | Custom dataset. | PSNR = 33.201, SSIM = 0.978, MSE = 32.435, NIQE (↓) = 4.902, BRISQUE (↓) = 39.872 |
11. | Li et al. [90] | DF-GAN | AL and cycle loss. | AR, Multi-PIE [104], Color FERET [105], and LFW. | AR: PSNR = 23.85, SSIM = 0.9168; MultiPIE: PSNR = 28.21, SSIM = 0.9176; FERET: PSNR = 28.15, SSIM = 0.931; LFW: PSNR = 23.18, SSIM = 0.869 |
12. | Jabbar et al. [91] | FD-StackGAN | RL (ℓ1 and SSIM loss), PL, and AL. | Custom dataset. | PSNR = 32.803, SSIM = 0.981, MSE = 34.145, NIQE (↓) = 4.499, BRISQUE (↓) = 42.504 |
13. | Duan and Zhang [92] | BoostGAN | AL, identity-preserving loss, ℓ1 loss, symmetry loss, and total variation (TV) loss. | Multi-PIE and LFW. | Recognition rate = 96.02 |
14. | Duan et al. [93] | TSGAN | AL, dual triplet loss, ℓ1 loss, symmetry loss, and TV loss. | Multi-PIE and LFW. | Recognition rate = 96.87 |
15. | Cong and Zhou [106] | DCGAN | Cycle-consistency loss from CycleGAN, AL, and Wasserstein distance loss. | WIDER FACE [107]. | IS = 10.36; FID = 8.85 |
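Many of the face de-occlusion models in the table combine pixel-level terms (an ℓ1 reconstruction loss, a TV loss) with an adversarial term, and most report PSNR. The following is a minimal pure-Python sketch of those pixel-level quantities on small grayscale images stored as nested lists. The function names, the anisotropic ℓ1 form of the TV loss, and the weights in `generator_loss` are illustrative assumptions rather than the settings of any surveyed paper, and the adversarial term is taken as a precomputed scalar.

```python
import math
from typing import Iterator, List, Tuple

# A grayscale image stored as rows of pixel intensities (illustrative representation).
Image = List[List[float]]


def _pixels(a: Image, b: Image) -> Iterator[Tuple[float, float]]:
    """Yield corresponding pixel pairs from two equally sized images."""
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            yield pa, pb


def l1_loss(pred: Image, target: Image) -> float:
    """Mean absolute error, i.e. the ℓ1 reconstruction term."""
    pairs = list(_pixels(pred, target))
    return sum(abs(p - t) for p, t in pairs) / len(pairs)


def total_variation(img: Image) -> float:
    """Anisotropic TV: summed absolute differences between neighboring pixels."""
    h, w = len(img), len(img[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if i + 1 < h:  # vertical neighbor
                tv += abs(img[i + 1][j] - img[i][j])
            if j + 1 < w:  # horizontal neighbor
                tv += abs(img[i][j + 1] - img[i][j])
    return tv


def psnr(pred: Image, target: Image, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; max_val = 255 assumes 8-bit images."""
    pairs = list(_pixels(pred, target))
    mse = sum((p - t) ** 2 for p, t in pairs) / len(pairs)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def generator_loss(pred: Image, target: Image, adv_term: float,
                   w_rec: float = 1.0, w_tv: float = 1e-4,
                   w_adv: float = 1e-3) -> float:
    """Weighted sum of loss terms; the default weights are hypothetical placeholders."""
    return (w_rec * l1_loss(pred, target)
            + w_tv * total_variation(pred)
            + w_adv * adv_term)
```

For example, a uniform error of 10 intensity levels gives an ℓ1 loss of 10 and an MSE of 100, so `psnr` returns 10·log10(255²/100) ≈ 28.1 dB. Because PSNR depends only on the MSE, it rewards pixel-accurate completions, which is one reason several of the surveyed works also report SSIM or recognition accuracy.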
# | Paper | Model | Loss Function | Dataset | Task | Results |
---|---|---|---|---|---|---|
1. | Zhou et al. [60] | GAN with PGA | 1. For mask generation: binary cross-entropy (BCE) loss, adversarial loss, and ℓ1 loss. 2. For content completion: adversarial loss, ℓ1 loss, perceptual loss, and style loss. | AHP (custom dataset) | Amodal segmentation and content completion | For mask generation: IoU = 86.1/40.3, L1 = 0.1635; For content completion: FID = 19.49, L1 = 0.0617 |
2. | Xiong et al. [63] | Coarse-to-fine structure with a PatchGAN discriminator | 1. For contour completion: a focal-loss-based content loss and a hinge loss as the adversarial loss. 2. For content completion: ℓ1 loss. | Places2 [121] and a custom-designed dataset | Contour and content completion | L1 = 0.009327, L2 = 0.002329, PSNR = 29.86, SSIM = 0.9383, user study = 731 out of 1099 valid votes |
3. | Zhang et al. [66] | GAN with multiple PatchGAN discriminators | 1. For mask generation: adversarial loss, perceptual loss, and BCE loss. 2. For content generation: adversarial loss, ℓ1 loss, style loss, content loss, and TV loss. | Custom dataset | Amodal segmentation and content completion | For mask generation: mIoU = 0.82, L1 = 0.0638; For content completion: L1 = 0.0344, L2 = 0.0324, FID = 33.28 |
4. | Yan et al. [67] | GAN with multiple PatchGAN-based discriminators | ℓ1 loss, perceptual loss, and adversarial loss. | OVD (custom dataset) | Amodal segmentation and content completion | For mask generation: P = 0.9854, R = 0.8148, F1 = 0.8898, IoU = 0.8066, L1 = 0.0320, L2 = 0.0314; For content completion: ICP = 0.8350, SS = 0.9356, L1 = 0.0173, L2 = 0.0063 |
5. | Dhamo et al. [69] | PatchGAN-based | Adversarial loss and ℓ1 loss | SceneNet [122] and NYU depth v2 [123] | RGB-D completion | Rel = 0.017, RMSE = 0.095, SSIM = 0.903, RMSE = 19.76, PSNR = 22.22 |
6. | Dhamo et al. [73] | Original GAN | 1. For object completion: ℓ1 loss. 2. For layout prediction: reconstruction (ℓ1) loss, perceptual loss, and adversarial loss. | SunCG [124] and Stanford2D-3D [125] | RGBA-D completion | SunCG: MPE = 43.12, RMSE = 65.66; Stanford2D-3D: MPE = 42.45, RMSE = 54.92 |
7. | Mani et al. [70] | GAN with two discriminators | ℓ2 loss, adversarial loss, and the discriminator loss. | KITTI [126] and Argoverse [127] | Scene completion | KITTI object: mIoU = 26.08, mAP = 40.79; KITTI tracking: mIoU = 24.16, mAP = 36.83; Argoverse: mIoU = 32.05, mAP = 48.31 |
8. | Zheng et al. [71] | GAN with two discriminators | Reconstruction loss, adversarial loss, and perceptual (ℓ1) loss. | COCOA [128], KINS [129], and CSD (custom dataset) | Scene completion | RMSE = 0.0914, SSIM = 0.8768, PSNR = 30.45 |
9. | Zhan et al. [75] | PCNet with CGAN | 1. For mask generation: BCE loss. 2. For content completion: losses in PC [62], ℓ1 loss, perceptual loss, and adversarial loss. | COCOA and KINS | Content completion | KINS: mIoU = 94.76%; COCOA: mIoU = 81.35% |
10. | Ehsani et al. [76] | SeGAN | 1. For mask generation: BCE loss. 2. For content generation: adversarial loss and ℓ1 loss. | DYCE (custom dataset) | Content completion | L1 = 0.07, L2 = 0.03, user study = 69.78% |
11. | Kahatapitiya et al. [78] | Inpainter with contextual attention | Spatially discounted ℓ1 reconstruction loss, and local and global WGAN-GP adversarial losses. | COCO-Stuff [130] and MS COCO [131] | Content completion | User study positive = 79.7%, negative = 20.3% |
12. | Fabbri et al. [108] | DCGAN-based | 1. For attribute classification: weighted BCE loss. 2. For content completion: reconstruction loss and adversarial loss of the generator. | RAP [132] | Content completion | mA = 65.82, accuracy = 76.01, P = 48.98, R = 55.50, F1 = 52.04 |
13. | Fulgeri et al. [110] | Modified GAN (one generator and three discriminators) | Adversarial loss, content loss, and attribute loss (weighted BCE). | RAP and AiC (custom dataset) | Content completion | RAP: mA = 72.18, accuracy = 59.59, P = 73.51, R = 73.72, F1 = 73.62, SSIM = 0.8239, PSNR = 20.65; AiC: mA = 78.37, accuracy = 53.3, P = 55.73, R = 85.46, F1 = 67.46, SSIM = 0.7101, PSNR = 21.81 |
14. | Papadopoulos et al. [112] | PizzaGAN | Adversarial loss, classification loss, cycle-consistency loss as in CycleGAN, and mask regularization loss. | Custom dataset | Amodal segmentation and content completion | Mask generation mIoU = 29.30% (quantitative results are not reported for content generation) |
15. | Zhang et al. [113] | CGAN | Adversarial loss. | Market-1501 [133] | Content completion | mAP = 90.42, Rank-1 = 93.35, Rank-5 = 96.87, Rank-10 = 97.92 |
16. | Tagore et al. [114] | OHGAN | BCE loss and ℓ2 loss. | CUHK01 [134], CUHK03 [135], Market-1501, and DukeMTMC-reID [136] | Content completion | CUHK01: Rank-1 = 93.4, Rank-5 = 96.4, Rank-10 = 98.8; CUHK03: Rank-1 = 92.8, Rank-5 = 95.4, Rank-10 = 97.0; Market-1501: Rank-1 = 94.0, Rank-5 = 96.4, Rank-10 = 97.5, mAP = 86.4; DukeMTMC-reID: Rank-1 = 91.2, Rank-5 = 93.4, Rank-10 = 95.8, mAP = 82.4 |
17. | Wang et al. [115] | A custom-designed adversarial network | BCE loss. | VOC2007, VOC2012 [137], and MS COCO | Occlusion generation and deformation | VOC2007: mAP = 73.6; VOC2012: mAP = 69.0; MS COCO: AP = 27.1 |
18. | Han et al. [117] | Adversary occlusion module | BCE loss. | VOC2007, VOC2012, MS COCO, and KITTI | Occlusion generation | VOC2007: mAP = 78.1; VOC2012: mAP = 76.7; MS COCO: AP = 42.7; KITTI: mAP = 89.01 |
19. | Wu et al. [120] | Original GAN | Euclidean loss, and BCE loss. | CUHK03, Market-1501, and DukeMTMC-reID | Content completion | Market-1501: mAP = 90.36, Rank-1 = 93.29, Rank-5 = 96.96, Rank-10 = 97.68; DukeMTMC-reID: mAP = 82.81, Rank-1 = 86.35, Rank-5 = 92.87, Rank-10 = 94.56; CUHK03: mAP = 61.95, Rank-1 = 59.78, Rank-5 = 70.64 |
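The amodal mask results in the table are reported as IoU or mIoU, and the person re-identification results as Rank-k accuracy. A minimal pure-Python sketch of both metrics follows. It is an illustration under stated assumptions, not the evaluation code of any surveyed work: the function names are hypothetical, IoU is averaged over instances (some works average per class instead), two empty masks are treated as a perfect match, and the same-camera filtering used by standard re-ID protocols such as Market-1501 is omitted.

```python
from typing import List, Sequence

# Binary mask stored as rows: 1 = object pixel, 0 = background.
Mask = List[List[int]]


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            inter += int(bool(p) and bool(g))
            union += int(bool(p) or bool(g))
    return inter / union if union else 1.0  # assumption: two empty masks match


def mean_iou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Average IoU over instances (hypothetical per-instance convention)."""
    ious = [mask_iou(p, g) for p, g in zip(preds, gts)]
    return sum(ious) / len(ious)


def rank_k_accuracy(dist: Sequence[Sequence[float]], query_ids: Sequence[int],
                    gallery_ids: Sequence[int], k: int) -> float:
    """Fraction of queries whose k nearest gallery images contain the true identity.

    dist[q][g] is the distance between query q and gallery image g.
    """
    hits = 0
    for q, row in enumerate(dist):
        # Rank gallery indices from nearest to farthest.
        ranked = sorted(range(len(row)), key=row.__getitem__)
        if any(gallery_ids[g] == query_ids[q] for g in ranked[:k]):
            hits += 1
    return hits / len(dist)
```

For instance, with distances `[[0.1, 0.5, 0.9], [0.8, 0.3, 0.2]]`, query identities `[1, 2]`, and gallery identities `[2, 1, 2]`, only the second query's nearest neighbor is correct, so Rank-1 is 0.5, while both queries find a match within their two nearest neighbors, giving Rank-2 = 1.0. Rank-k is therefore non-decreasing in k, which matches the ordering of the Rank-1, Rank-5, and Rank-10 columns above.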
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Saleh, K.; Szénási, S.; Vámossy, Z. Generative Adversarial Network for Overcoming Occlusion in Images: A Survey. Algorithms 2023, 16, 175. https://doi.org/10.3390/a16030175