TeSTNeRF: Text-Driven 3D Style Transfer via Cross-Modal Learning

Jiafu Chen, Boyan Ji, Zhanjie Zhang, Tianyi Chu, Zhiwen Zuo, Lei Zhao, Wei Xing, Dongming Lu

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI 2023)
AI and Arts track, pages 5788-5796. https://doi.org/10.24963/ijcai.2023/642

Text-driven 3D style transfer aims to stylize a scene according to a text prompt and to generate arbitrary novel views with cross-view consistency. Simply combining image/video style transfer methods with novel view synthesis methods results in flickering when the viewpoint changes, while existing 3D style transfer methods learn styles from images rather than texts. To address this problem, we design, for the first time, an efficient text-driven model for 3D style transfer, named TeSTNeRF, which stylizes the scene using texts via cross-modal learning: we leverage an advanced text encoder to embed the texts so as to control the 3D style transfer and to align the input text and the output stylized images in a latent space. Furthermore, to obtain better visual results, we introduce style supervision, learning feature statistics from style images and utilizing 2D stylization results to rectify abrupt color spills. Extensive experiments demonstrate that TeSTNeRF significantly outperforms existing methods and provides a new way to guide 3D style transfer.
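The abstract points to two training signals: aligning the stylized renderings with the input text in a shared text-image latent space, and matching feature statistics learned from style images. Below is a minimal, illustrative PyTorch sketch of what such losses commonly look like; the use of a CLIP-like joint encoder, the function names, and the AdaIN-style mean/std matching are assumptions for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def text_image_alignment_loss(image_emb: torch.Tensor,
                              text_emb: torch.Tensor) -> torch.Tensor:
    """Pull the stylized-view embedding toward the style text embedding.

    Both embeddings are assumed to come from a pretrained joint
    text-image encoder (e.g. CLIP); the exact encoder used by
    TeSTNeRF is not specified here.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Maximize cosine similarity in the shared latent space.
    return (1.0 - (image_emb * text_emb).sum(dim=-1)).mean()

def feature_statistics_loss(stylized_feats: torch.Tensor,
                            style_feats: torch.Tensor) -> torch.Tensor:
    """Match channel-wise mean/std of feature maps (style supervision).

    `stylized_feats` and `style_feats` are (B, C, H, W) feature maps
    from a frozen feature extractor (e.g. VGG); matching first- and
    second-order statistics is the standard AdaIN-style style loss.
    """
    def stats(x: torch.Tensor):
        return x.mean(dim=(2, 3)), x.std(dim=(2, 3))

    mu_out, sigma_out = stats(stylized_feats)
    mu_sty, sigma_sty = stats(style_feats)
    return F.mse_loss(mu_out, mu_sty) + F.mse_loss(sigma_out, sigma_sty)
```

In a setup like this, the two losses would be weighted and summed with the usual NeRF reconstruction objective, so the text alignment steers the overall style while the feature-statistics term keeps colors and textures close to the reference style images.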
Keywords:
Application domains: Images and visual arts; Other domains of art or creativity