Abstract
Self-attention mechanisms, exemplified by the non-local network, have been widely applied to style transfer. By modeling long-range dependencies between content and style images, such models achieve good stylization while preserving semantic content. However, self-attention must compute the relationship between every position of the content feature map and every position of the style feature map, so its computational complexity is high; this consumes considerable computing resources and degrades the efficiency of style transfer for high-resolution images. To address this problem, we propose a novel Pyramid Style-attentional Network (PSANet), which reduces the computational complexity of the self-attention network by applying pyramid pooling to the feature maps. We compare our method with the vanilla style-attentional network in terms of speed and quality. Experimental results show that our model significantly reduces computational complexity while achieving good transfer effects; in particular, for high-resolution images the execution time is reduced by \(34.7\%\).
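The following is a minimal PyTorch sketch of the idea described above, not the authors' implementation: adaptive pyramid pooling shrinks the style (key/value) feature map before style attention, so the attention cost drops from roughly O(N_c x N_s) to O(N_c x N_p) with N_p much smaller than N_s. All module and parameter names (e.g. pool_sizes) are illustrative assumptions.

```python
# Hedged sketch of pyramid-pooled style attention; layer names and pool sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidStyleAttention(nn.Module):
    def __init__(self, channels, pool_sizes=(1, 3, 6, 8)):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, 1)   # projects content features
        self.key = nn.Conv2d(channels, channels, 1)     # projects style features
        self.value = nn.Conv2d(channels, channels, 1)
        self.out = nn.Conv2d(channels, channels, 1)
        self.pool_sizes = pool_sizes

    def _pyramid(self, x):
        # Concatenate adaptively pooled style features into a short key/value sequence.
        b, c, _, _ = x.shape
        pooled = [F.adaptive_avg_pool2d(x, s).view(b, c, -1) for s in self.pool_sizes]
        return torch.cat(pooled, dim=2)                  # (B, C, N_p), N_p = sum of s*s

    def forward(self, content, style):
        b, c, h, w = content.shape
        q = self.query(content).view(b, c, h * w)        # (B, C, N_c)
        k = self._pyramid(self.key(style))               # (B, C, N_p)
        v = self._pyramid(self.value(style))             # (B, C, N_p)
        attn = torch.softmax(torch.bmm(q.transpose(1, 2), k) / c ** 0.5, dim=-1)
        fused = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return self.out(fused) + content                 # residual connection, as in style-attentional networks
```

Because the pooled sequence length N_p is fixed by pool_sizes rather than by the style image resolution, the attention map no longer grows quadratically with input size, which is the source of the speed-up claimed for high-resolution inputs.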
Acknowledgements
This work was supported by the Natural Science Foundation of Anhui Province of China under Grant No. 2008085MF220.
Ethics declarations
Conflict of interest
We declare that we have no conflict of interest.
Additional information
Gaoming Yang and Shicheng Zhang contributed equally to this work.
About this article
Cite this article
Yang, G., Zhang, S., Fang, X. et al. Pyramid style-attentional network for arbitrary style transfer. Multimed Tools Appl 83, 13483–13502 (2024). https://doi.org/10.1007/s11042-023-15650-0