
Pyramid style-attentional network for arbitrary style transfer

Published in Multimedia Tools and Applications

Abstract

The self-attention mechanism, as exemplified by the non-local network, is now widely applied in style transfer. By modeling long-range dependencies between content images and style images, attention-based models achieve good stylization while preserving semantic content. However, the self-attention mechanism must compute the relationship between every pair of positions in the content and style feature maps. The resulting computational complexity is high, which consumes substantial computing resources and hurts the efficiency of style transfer for high-resolution images. To address this problem, we propose a novel Pyramid Style-attentional Network (PSANet), which reduces the computational complexity of the self-attention network by applying pyramid pooling to the feature maps. We compare our method with the vanilla style-attentional network in terms of speed and quality. The experimental results show that our model significantly reduces computational complexity while achieving good transfer effects. In particular, for high-resolution images, the execution time of our method is reduced by \(34.7\%\).
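
The abstract describes the idea only at a high level, so the snippet below is a minimal PyTorch sketch of how pyramid pooling can shrink a style-attention module: the style feature map is adaptively pooled to a few coarse grids before attention, so the attention matrix grows with the number of pooled bins rather than with the style image's resolution. The pool sizes (1, 3, 6, 8), the 1x1 projections, the mean-variance normalization, and the residual connection are assumptions borrowed from the common style-attentional (SANet-style) design, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def mean_variance_norm(feat, eps=1e-5):
    # Per-channel (instance) normalization over spatial positions, commonly
    # applied to content/style features before computing attention.
    b, c = feat.shape[:2]
    flat = feat.view(b, c, -1)
    mean = flat.mean(dim=2, keepdim=True)
    std = flat.std(dim=2, keepdim=True) + eps
    return ((flat - mean) / std).view_as(feat)


class PyramidStyleAttention(nn.Module):
    """Style attention whose keys/values come from pyramid-pooled style features.

    The style feature map is adaptively pooled to several coarse grids
    (pool_sizes) and the pooled positions are concatenated, so the attention
    matrix has shape (Hc*Wc, S) with S = sum(s*s for s in pool_sizes)
    instead of (Hc*Wc, Hs*Ws). Pool sizes here are illustrative assumptions.
    """

    def __init__(self, channels, pool_sizes=(1, 3, 6, 8)):
        super().__init__()
        self.f = nn.Conv2d(channels, channels, 1)    # query projection (content)
        self.g = nn.Conv2d(channels, channels, 1)    # key projection (style)
        self.h = nn.Conv2d(channels, channels, 1)    # value projection (style)
        self.out = nn.Conv2d(channels, channels, 1)  # output projection
        self.pool_sizes = pool_sizes

    def _pyramid_pool(self, feat):
        # Pool to each grid size, flatten the grid positions, and concatenate
        # them along the position axis: (B, C, S).
        b, c = feat.shape[:2]
        pooled = [F.adaptive_avg_pool2d(feat, s).view(b, c, -1)
                  for s in self.pool_sizes]
        return torch.cat(pooled, dim=2)

    def forward(self, content, style):
        b, c, hc, wc = content.shape
        q = self.f(mean_variance_norm(content)).view(b, c, -1)         # (B, C, Hc*Wc)
        k = self._pyramid_pool(self.g(mean_variance_norm(style)))      # (B, C, S)
        v = self._pyramid_pool(self.h(style))                          # (B, C, S)
        attn = torch.softmax(torch.bmm(q.transpose(1, 2), k), dim=-1)  # (B, Hc*Wc, S)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, hc, wc)    # (B, C, Hc, Wc)
        return content + self.out(out)  # residual connection, as in SANet-style modules
```

With these assumed pool sizes the style side contributes only S = 1 + 9 + 36 + 64 = 110 positions, so the per-layer attention cost drops from O(HcWc · HsWs) to O(HcWc · S) and no longer scales with the style image's resolution, which is the source of the speed-up the abstract reports for high-resolution inputs.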

Acknowledgements

This work was supported by the Natural Science Foundation of Anhui Province of China under Grant No. 2008085MF220.

Author information

Corresponding author

Correspondence to Xianjin Fang.

Ethics declarations

Conflict of interest

We declare that we have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Gaoming Yang and Shicheng Zhang contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, G., Zhang, S., Fang, X. et al. Pyramid style-attentional network for arbitrary style transfer. Multimed Tools Appl 83, 13483–13502 (2024). https://doi.org/10.1007/s11042-023-15650-0

