MS-GAN: multi-scale GAN with parallel class activation maps for image reconstruction

Original article · The Visual Computer

Abstract

Image reconstruction has recently become a research hotspot in deep learning. Generative adversarial networks (GANs) have obtained remarkable results on this task, but existing GAN-based methods still fall short in reconstruction quality. To improve quality, we propose a more effective multi-scale GAN for image reconstruction, called MS-GAN. The generator of MS-GAN uses an improved U-net to capture important details from the sparse inputs. In MS-GAN, parallel class activation maps (P-CAMs) and spectral normalization (SN) are added to the U-net. P-CAMs consist of two parallel class activation maps (CAMs) and guide the generator to focus on the important details of the images for a more realistic visual effect. Training proceeds in two phases: a generating phase and a refinement phase. The generating phase uses binary sparse edges and color domains to generate preliminary images; the refinement phase further improves the quality of these preliminary images. Experiments on the edges2shoes, edges2handbags and Getchu datasets show that our approach outperforms existing state-of-the-art methods, and the images reconstructed by MS-GAN are more photo-realistic in terms of visual effects.
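The abstract describes the generator only at a high level: an improved U-net with spectral normalization (SN) and a P-CAM module built from two parallel class activation maps (CAMs). The PyTorch sketch below is one plausible reading under stated assumptions, not the authors' implementation: the names sn_conv and ParallelCAM are hypothetical, and using average- and max-pooling classifier weights to differentiate the two CAM branches is an assumption, since the abstract does not specify how the branches differ.

import torch
import torch.nn as nn

def sn_conv(in_ch, out_ch, k=3, s=1, p=1):
    # Spectral normalization (SN) on a convolution, mirroring the
    # paper's addition of SN to the U-net generator.
    return nn.utils.spectral_norm(nn.Conv2d(in_ch, out_ch, k, s, p))

class ParallelCAM(nn.Module):
    # Hypothetical stand-in for P-CAM: two parallel CAM branches whose
    # fused output gates the feature maps spatially, steering the
    # generator toward important image details.
    def __init__(self, channels):
        super().__init__()
        # Each branch learns per-channel classifier weights; spreading
        # those weights back over the feature map yields a CAM.
        self.fc_avg = nn.Linear(channels, 1, bias=False)
        self.fc_max = nn.Linear(channels, 1, bias=False)
        self.fuse = sn_conv(2 * channels, channels, k=1, p=0)

    def forward(self, x):
        c = x.shape[1]
        cam_avg = x * self.fc_avg.weight.view(1, c, 1, 1)  # CAM branch 1
        cam_max = x * self.fc_max.weight.view(1, c, 1, 1)  # CAM branch 2
        attn = torch.sigmoid(self.fuse(torch.cat([cam_avg, cam_max], dim=1)))
        return x * attn  # re-weighted features, same shape as the input

feats = torch.randn(1, 64, 32, 32)   # toy encoder features
gated = ParallelCAM(64)(feats)       # torch.Size([1, 64, 32, 32])

In a full pipeline, such a module would sit inside the U-net used by both training phases: the generating phase maps binary sparse edges and color domains to preliminary images, and the refinement phase passes those preliminary images through the network again to sharpen them.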





Funding

The work described in this paper was funded by the 2021 Ministry of Education New Liberal Arts Research and Reform Practice Project "The Construction of Interdisciplinary Innovation and Entrepreneurship Practice Teaching System Based on Design Empowerment" (2021160043) and the 2021 Hubei Provincial Department of Education Philosophy and Social Science Youth Project "Research on Chinese Ancient Porcelain Restoration Based on Virtual Simulation Technology" (21Q072). Any conclusions or recommendations stated here are those of the authors and do not necessarily reflect the official position of the sponsors (Grant Nos. HBCIR2020Z005 and HBCY1914).

Author information


Corresponding author

Correspondence to Aihua Ke.

Ethics declarations

Conflict of interest

Jian Rao declares that he has no conflict of interest. Aihua Ke declares that she has no conflict of interest. Gang Liu declares that he has no conflict of interest. Yue Ming declares that she has no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Rao, J., Ke, A., Liu, G. et al. MS-GAN: multi-scale GAN with parallel class activation maps for image reconstruction. Vis Comput 39, 2111–2126 (2023). https://doi.org/10.1007/s00371-022-02468-4

