Abstract
Creative sketching or doodling is an expressive activity in which imaginative, previously unseen depictions of everyday visual objects are drawn. Creative sketch image generation is a challenging vision problem: the task is to generate diverse yet realistic creative sketches that compose visual-world objects in unseen ways. Here, we propose a novel coarse-to-fine two-stage framework, DoodleFormer, that decomposes creative sketch generation into the creation of a coarse sketch composition followed by the incorporation of fine details in the sketch. We introduce graph-aware transformer encoders that effectively capture both global dynamic and local static structural relations among different body parts. To ensure diversity of the generated creative sketches, we introduce a probabilistic coarse sketch decoder that explicitly models the variations of each sketch body part to be drawn. Experiments are performed on two creative sketch datasets: Creative Birds and Creative Creatures. Our qualitative, quantitative, and human-based evaluations show that DoodleFormer outperforms the state of the art on both datasets, yielding realistic and diverse creative sketches. On Creative Creatures, DoodleFormer achieves an absolute gain of 25 in Fréchet inception distance (FID) over the state of the art. We also demonstrate the effectiveness of DoodleFormer for the related applications of text-to-creative-sketch generation, sketch completion, and house layout generation. Code is available at: https://github.com/ankanbhunia/doodleformer.
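For illustration, below is a minimal PyTorch sketch of the two ideas the abstract highlights: a graph-aware self-attention layer whose logits are biased by a part-adjacency graph (static local structure alongside learned global attention), and a probabilistic coarse decoder that samples a per-part latent so repeated draws yield diverse layouts. All module names, tensor shapes, and the adjacency-bias formulation are illustrative assumptions, not the paper's exact implementation.

import torch
import torch.nn as nn

class GraphAwareSelfAttention(nn.Module):
    """Self-attention biased by a static part-adjacency graph: tokens attend
    globally via learned weights, while an additive bias discourages attention
    between parts that are not connected in the graph (an assumed mechanism)."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # adj: (parts, parts) binary adjacency; zero entries receive a large
        # negative additive bias so the softmax nearly ignores unconnected pairs.
        bias = (1.0 - adj) * -1e4
        out, _ = self.attn(x, x, x, attn_mask=bias)
        return out

class CoarseDecoder(nn.Module):
    """Probabilistic stage-1 decoder: a reparameterized latent per body part,
    mapped to one bounding box (x, y, w, h), so sampling gives diverse coarse
    compositions. Stage 2 (fine details inside each box) is omitted here."""
    def __init__(self, dim: int, z_dim: int = 32):
        super().__init__()
        self.to_mu = nn.Linear(dim, z_dim)
        self.to_logvar = nn.Linear(dim, z_dim)
        self.to_box = nn.Linear(dim + z_dim, 4)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.to_box(torch.cat([h, z], dim=-1))

if __name__ == "__main__":
    B, P, D = 2, 8, 64                        # batch, body parts, feature dim
    parts = torch.randn(B, P, D)              # assumed learned part embeddings
    adj = (torch.rand(P, P) > 0.5).float()    # hypothetical part-adjacency graph
    adj.fill_diagonal_(1.0)                   # parts always attend to themselves
    boxes = CoarseDecoder(D)(GraphAwareSelfAttention(D)(parts, adj))
    print(boxes.shape)                        # torch.Size([2, 8, 4]) coarse boxes

Sampling boxes several times from the same part embeddings would produce different coarse layouts, which is the diversity mechanism the abstract describes; a second, fine-detail stage would then render strokes inside these boxes.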
Notes
1. Additional details and results are provided in the supplementary material.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bhunia, A.K. et al. (2022). DoodleFormer: Creative Sketch Drawing with Transformers. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13677. Springer, Cham. https://doi.org/10.1007/978-3-031-19790-1_21
DOI: https://doi.org/10.1007/978-3-031-19790-1_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19789-5
Online ISBN: 978-3-031-19790-1
eBook Packages: Computer Science, Computer Science (R0)