Abstract
With the rapid development of the cartoon industry, various studies on two-dimensional (2D) cartoons have been proposed for different application scenarios, such as quality assessment, style transfer, colorization, detection, compression, generation, and editing. However, the literature still lacks a comprehensive summary and introduction of these 2D cartoon image processing (CIP) works. Cartoon images are usually composed of clear lines, smooth color patches, and flat backgrounds, which differ markedly from natural images. Therefore, many CIP strategies tailored to these cartoon characteristics have been proposed. Especially with the development of deep learning, recent CIP methods have achieved better results than the direct application of natural image processing algorithms. This paper therefore reviews the commonalities and differences of 2D CIP methods according to their scenarios and applications, with a particular focus on recent deep-learning-based algorithms. In addition, it collects related CIP datasets, conducts experiments on several typical tasks, and discusses future work.
Aizawa, M., Sei, Y., Tahara, Y., Orihara, R., & Ohsuga, A. (2019). Do you like sclera? Sclera-region detection and colorization for anime character line drawings. International Journal of Networked and Distributed Computing, 7(3), 113–120.
Akita, K., Morimoto, Y., & Tsuruno, R. (2019). Fully automatic colorization for anime character considering accurate eye colors. In: ACM SIGGRAPH 2019 Posters (pp. 1–2).
Akita, K., Morimoto, Y., & Tsuruno, R. (2020). Deep-eyes: Fully automatic anime character colorization with painting of details on empty pupils. Eurographics 2020-Short Papers 2.
Andersson, F., & Arvidsson, S. (2020). Generative adversarial networks for photo to hayao miyazaki style cartoons. Preprint arXiv:2005.07702.
Aneja, D., & Li, W. (2019). Real-time lip sync for live 2d animation. Preprint arXiv:1910.08685.
Anime4k (2019). https://github.com/bloc97/Anime4K.
Anonymous, community, D., & Branwen, G. (2021). Danbooru2020: A large-scale crowdsourced and tagged anime illustration dataset. https://www.gwern.net/Danbooru2020.
Augereau, O., Matsubara, M., & Kise, K. (2016). Comic visualization on smartphones based on eye tracking. In Proceedings of the 1st International Workshop on coMics ANalysis, Processing and Understanding (pp. 1–4).
Augereau, O., Iwata, M., & Kise, K. (2018). A survey of comics research in computer science. Journal of Imaging, 4(7), 87.
Bahng, H., Yoo, S., Cho, W., Park, D.K., Wu, Z., Ma, X., & Choo, J. (2018). Coloring with words: Guiding image colorization through text-based palette generation. In Proceedings of the European conference on computer vision (eccv) (pp. 431–447).
Bilen, H., & Vedaldi, A. (2016). Weakly supervised deep detection networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2846–2854).
Bonneel, N., Tompkin, J., Sunkavalli, K., Sun, D., Paris, S., & Pfister, H. (2015). Blind video temporal consistency. ACM Transactions on Graphics (TOG), 34(6), 1–9.
Boser, B.E., Guyon, I.M., & Vapnik, V.N. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the fifth annual workshop on Computational learning theory (pp. 144–152).
Brennan, S. E. (2007). Caricature generator: The dynamic exaggeration of faces by computer. Leonardo, 40(4), 392–400.
Bryandlee (2021). https://github.com/bryandlee.
Cao, K., Liao, J., & Yuan, L. (2018). Carigans: Unpaired photo-to-caricature translation. Preprint arXiv:1811.00222.
Chainer-dcgan (2015). https://github.com/mattya/chainer-DCGAN.
Chaudhari, S., Polatkan, G., Ramanath, R., & Mithal, V. (2019). An attentive survey of attention models. Preprint arXiv:1904.02874.
Chen, X., & Gupta, A. (2015). Webly supervised learning of convolutional networks. In Proceedings of the IEEE international conference on computer vision (pp. 1431–1439).
Chen, Y., Chen, M., Song, C., & Ni, B. (2020). Cartoonrenderer: An instance-based multi-style cartoon image translator. In International conference on multimedia modeling, Springer (pp. 176–187).
Chen, Y., Lai, Y.K., & Liu, Y.J. (2018b). Cartoongan: Generative adversarial networks for photo cartoonization. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 9465–9474).
Chen, J., Liu, G., & Chen, X. (2019a). Animegan: A novel lightweight gan for photo animation. In International symposium on intelligence computation and applications, Springer (pp. 242–256).
Chen, J., Shen, Y., Gao, J., Liu, J., & Liu, X. (2018a). Language-based image editing with recurrent attentive models. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 8721–8729).
Chen, H., Zheng, N.N., Liang, L., Li, Y., Xu, Y.Q., & Shum, H.Y. (2002). Pictoon: a personalized image-based cartoon system. In Proceedings of the tenth ACM international conference on Multimedia (pp. 171–178).
Chen, H., Chai, X., Shao, F., Wang, X., Jiang, Q., Chao, M., & Ho, Y. S. (2021). Perceptual quality assessment of cartoon images. IEEE Transactions on Multimedia. https://doi.org/10.1109/TMM.2021.3121875.
Cheng, Z., Meng, F., & Mao, J. (2019). Semi-auto sketch colorization based on conditional generative adversarial networks. In 2019 12th international congress on image and signal processing. IEEE: BioMedical Engineering and Informatics (CISP-BMEI), (pp. 1–5).
Cheng, M. M., Zheng, S., Lin, W. Y., Vineet, V., Sturgess, P., Crook, N., et al. (2014). Imagespirit: Verbal guided image parsing. ACM Transactions on Graphics (TOG), 34(1), 1–11.
Chen, Y., Zhao, Y., Cao, L., Jia, W., & Liu, X. (2021). Learning deep blind quality assessment for cartoon images. IEEE Transactions on Neural Networks and Learning Systems, 1, 8519–8534.
Chen, Y., Zhao, Y., Li, S., Zuo, W., Jia, W., & Liu, X. (2019). Blind quality assessment for cartoon images. IEEE Transactions on Circuits and Systems for Video Technology, 30(9), 3282–3288.
Choi, Y., Choi, M., Kim, M., Ha, J.W., Kim, S., & Choo, J. (2018). Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 8789–8797).
Chu, W.T., & Li, W.W. (2017). Manga facenet: Face detection in manga based on deep neural network. In: Proceedings of the 2017 ACM on international conference on multimedia retrieval (pp. 412–415).
Chu, W., Hung, W.C., Tsai, Y.H., Cai, D., & Yang, M.H. (2019). Weakly-supervised caricature face parsing through domain adaptation. In 2019 IEEE international conference on image processing (ICIP), IEEE (pp. 3282–3286).
Ci, Y., Ma, X., Wang, Z., Li, H., & Luo, Z. (2018). User-guided deep anime line art colorization with conditional adversarial networks. In Proceedings of the 26th ACM international conference on Multimedia pp. (1536–1544).
Cohn, N., Taylor, R., & Pederson, K. (2017). A picture is worth more words over time: Multimodality and narrative structure across eight decades of American superhero comics. Multimodal Communication, 6(1), 19–37.
Dong, C., Loy, C. C., He, K., & Tang, X. (2015). Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2), 295–307.
Dumoulin, V., Belghazi, I., Poole, B., Mastropietro, O., Lamb, A., Arjovsky, M., & Courville, A. (2016). Adversarially learned inference. Preprint arXiv:1606.00704
Dunst, A., Hartel, R., & Laubrock, J. (2017). The graphic narrative corpus (gnc): design, annotation, and analysis for the digital humanities. In 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE (Vol. 3, pp. 15–20).
Edwards, P., Landreth, C., Fiume, E., & Singh, K. (2016). Jali: An animator-centric viseme model for expressive lip synchronization. ACM Transactions on Graphics (TOG), 35(4), 1–11.
Efros, A.A., & Freeman, W.T. (2001). Image quilting for texture synthesis and transfer. In Proceedings of the 28th annual conference on computer graphics and interactive techniques (pp. 341–346).
Favreau, J. D., Lafarge, F., & Bousseau, A. (2016). Fidelity vs. simplicity: a global approach to line drawing vectorization. ACM Transactions on Graphics (TOG), 35(4), 1–10.
Fišer, J., Asente, P., & Sỳkora, D. (2015). Shipshape: a drawing beautification assistant. In: Proceedings of the workshop on Sketch-Based Interfaces and Modeling (pp. 49–57).
Frans, K. (2017). Outline colorization through tandem adversarial networks. Preprint arXiv:1704.08834
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of statistics, 1189–1232.
Fujimoto, A., Ogawa, T., Yamamoto, K., Matsui, Y., Yamasaki, T., & Aizawa, K. (2016). Manga109 dataset and creation of metadata. In Proceedings of the 1st international workshop on comics analysis, processing and understanding (pp. 1–5).
Furukawa, S., Fukusato, T., Yamaguchi, S., & Morishima, S. (2017). Voice animator: Automatic lip-synching in limited animation by audio. In International conference on advances in computer entertainment, Springer (pp. 153–171).
Furusawa, C., Hiroshiba, K., Ogaki, K., & Odagiri, Y. (2017). Comicolorization: semi-automatic manga colorization. In SIGGRAPH Asia 2017 Technical Briefs (pp. 1–4).
Gatys, L.A., Ecker, A.S., & Bethge, M. (2015). A neural algorithm of artistic style. Preprint arXiv:1508.06576.
Gatys, L.A., Ecker, A.S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2414–2423).
Girshick, R. (2015). Fast r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 1440–1448).
Girshick, R, Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 580–587).
Gong J, Hold-Geoffroy, Y., & Lu, J. (2020). Autotoon: Automatic geometric warping for face cartoon generation. In Proceedings of the IEEE/CVF winter conference on applications of computer vision (pp. 360–369).
Gooch, B., Reinhard, E., & Gooch, A. (2004). Human facial illustrations: Creation and psychophysical evaluation. ACM Transactions on Graphics (TOG), 23(1), 27–44.
Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial networks. Preprint arXiv:1406.2661.
Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). A kernel two-sample test. The Journal of Machine Learning Research, 13(1), 723–773.
Grimm, C., & Joshi, P. (2012). Just draw it! a 3d sketching system.
Gu, Z., Dong, C., Huo, J., Li, W., & Gao, Y. (2021). Carime: Unpaired caricature generation with multiple exaggerations. IEEE Transactions on Multimedia. https://doi.org/10.1109/TMM.2021.3086722.
Guérin, C., Rigaud, C., Mercier, A., Ammar-Boudjelal, F., Bertet, K., Bouju, A., Burie, J.C., Louis, G., Ogier, J.M., & Revel, A. (2013). ebdtheque: a representative database of comics. In 2013 12th international conference on document analysis and recognition, IEEE (pp. 1145–1149).
Gupta, T., Schwenk, D., Farhadi, A., Hoiem, D., & Kembhavi, A. (2018). Imagine this! scripts to compositions to videos. In Proceedings of the European conference on computer vision (ECCV) (pp. 598–613).
Han, X., Hou, K., Du, D., Qiu, Y., Cui, S., Zhou, K., & Yu, Y. (2018). Caricatureshop: Personalized and photorealistic caricature sketching. IEEE transactions on visualization and computer graphics, 26(7), 2349–2361.
Hanser, E., Mc Kevitt, P., Lunney, T., & Condell, J. (2009). Scenemaker: Intelligent multimodal visualisation of natural language scripts. In: Irish conference on artificial intelligence and cognitive science, Springer (pp. 144–153).
Hati, Y., Jouet, G., Rousseaux, F., & Duhart, C. (2019). Paintstorch: a user-guided anime line art colorization tool with double generator conditional adversarial network. In European conference on visual media production (pp. 1–10).
Hensman, P., & Aizawa, K. (2017). cgan-based manga colorization using a single training image. In 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE (Vol. 3, pp. 72–77).
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. (2017). GANs trained by a two time-scale update rule converge to a local nash equilibrium. In Proceedings of the 31st international conference on neural information processing systems, Curran Associates Inc., NIPS’17 (p. 6629-6640).
Hicsonmez, S., Samet, N., Akbas, E., & Duygulu, P. (2020). Ganilla: Generative adversarial networks for image to illustration translation. Image and Vision Computing, 95, 103886.
Hoffman, J., Tzeng, E., Park, T., Zhu, J.Y., Isola, P., Saenko, K., Efros, A., & Darrell, T. (2018). Cycada: Cycle-consistent adversarial domain adaptation. In International conference on machine learning, PMLR (pp. 1989–1998).
Huang, X., & Belongie, S. (2017). Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE international conference on computer vision (pp. 1501–1510).
Huang, J., Liao, J., & Kwong, S. (2021). Semantic example guided image-to-image translation. IEEE Transactions on Multimedia, 23, 1654–1665.
Huang, J., Liao, J., Tan, Z., & Kwong, S. (2020). Multi-density sketch-to-image translation network. Preprint arXiv:2006.10649.
Huang, X., Liu, M.Y., Belongie, S., & Kautz, J. (2018b). Multimodal unsupervised image-to-image translation. In Proceedings of the European conference on computer vision (ECCV) (pp. 172–189).
Huang, J., Tan, M., Yan, Y., Qing, C., Wu, Q., & Yu, Z. (2018a). Cartoon-to-photo facial translation with generative adversarial networks. In Asian conference on machine learning, PMLR (pp. 566–581).
Huang, H., Wang, H., Luo, W., Ma, L., Jiang, W., Zhu, X., Li, Z., & Liu, W. (2017). Real-time neural style transfer for videos. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 783–791).
Huo, J., Li, W., Shi, Y., Gao, Y., & Yin, H. (2017). Webcaricature: a benchmark for caricature recognition. Preprint arXiv:1703.03230.
Ikuta, H., Ogaki, K., & Odagiri, Y. (2016). Blending texture features from multiple reference images for style transfer. In SIGGRAPH ASIA 2016 technical briefs (pp. 1–4).
Illustrationgan (2016). https://github.com/tdrussell/IllustrationGAN.
Inoue, N., Furuta, R., Yamasaki, T., & Aizawa, K. (2018). Cross-domain weakly-supervised object detection through progressive domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5001–5009).
Isola, P., Zhu, J.Y., Zhou, T., & Efros, A.A. (2017). Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1125–1134).
Ito, K., Matsui, Y., Yamasaki, T., & Aizawa, K. (2015). Separation of manga line drawings and screentones. In Eurographics (Short Papers) (pp. 73–76).
Iyyer, M., Manjunatha, V., Guha, A., Vyas, Y., Boyd-Graber, J., Daume, H., & Davis, L.S. (2017). The amazing mysteries of the gutter: Drawing inferences between panels in comic book narratives. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7186–7195).
Jampani, V., Gadde, R., & Gehler, P.V. (2017). Video propagation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 451–461).
Jang, W., Ju, G., Jung, Y., Yang, J., Tong, X., & Lee, S. (2021). Stylecarigan: Caricature generation via stylegan feature map modulation. ACM Transactions on Graphics (TOG), 40(4), 1–16.
Jeromel, A., & Žalik, B. (2020). An efficient lossy cartoon image compression method. Multimedia Tools and Applications, 79(1), 433–451.
Jha, S., Agarwal, N., & Agarwal, S. (2018a). Bringing cartoons to life: Towards improved cartoon face detection and recognition systems. Preprint arXiv:1804.01753.
Jha, S., Agarwal, N., & Agarwal, S. (2018b). Towards improved cartoon face detection and recognition systems. Preprint arXiv:1804.01753
Jin, Y., Zhang, J., Li, M., Tian, Y., Zhu, H., & Fang, Z. (2017). Towards the automatic anime characters creation with generative adversarial networks. Preprint arXiv:1708.05509.
Johnson, J., Alahi, A., & Fei-Fei, L. (2016). Perceptual losses for real-time style transfer and super-resolution. In European conference on computer vision, Springer (pp. 694–711).
Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2017). Progressive growing of gans for improved quality, stability, and variation. Preprint arXiv:1710.10196.
Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4401–4410).
Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., & Aila, T. (2020). Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 8110–8119).
Kataoka, Y., Matsubara, T., & Uehara, K. (2017). Automatic manga colorization with color style by generative adversarial nets. In 2017 18th IEEE/ACIS International conference on software engineering (pp. 495–499). Networking and Parallel/Distributed Computing (SNPD), IEEE: Artificial Intelligence.
Kim, T., Cha, M., Kim, H., Lee, J.K., & Kim, J. (2017). Learning to discover cross-domain relations with generative adversarial networks. In International conference on machine learning, PMLR (pp. 1857–1865).
Kim, H., Jhoo, H.Y., Park, E., & Yoo, S. (2019a). Tag2pix: Line art colorization using text tag with secat and changing loss. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 9056–9065).
Kim, J., Kim, M., Kang, H., & Lee, K. (2019b). U-gat-it: Unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation. Preprint arXiv:1907.10830.
Kingma, D.P., & Welling, M. (2013). Auto-encoding variational bayes. Preprint arXiv:1312.6114.
Kliegl, R., & Laubrock, J. (2017). Eye-movement tracking during reading. In Research methods in psycholinguistics and the neurobiology of language: A practical guide (pp. 68–88). Wiley-Blackwell.
Kodali, N., Abernethy, J., Hays, J., & Kira, Z. (2017). How to train your DRAGAN. Preprint arXiv:1705.07215.
Kopf, J., & Lischinski, D. (2012). Digital reconstruction of halftoned color comics. ACM Transactions on Graphics (TOG), 31(6), 1–10.
Kowalski, M., Naruniec, J., & Trzcinski, T. (2017). Deep alignment network: A convolutional neural network for robust face alignment. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops (pp. 88–97).
Laubrock, J., & Dunst, A. (2020). Computational approaches to comics analysis. Topics in cognitive science, 12(1), 274–310.
Lazarou, C. (2020). Autoencoding generative adversarial networks. Preprint arXiv:2004.05472.
Le, N.K.H., Why, Y.P., & Ashraf, G. (2011). Shape stylized face caricatures. In International conference on multimedia modeling, Springer (pp. 536–547).
Lee, Y., & Park, J. (2020). Centermask: Real-time anchor-free instance segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 13906–13915).
Lee, Y., Hwang, J.w., Lee, S., Bae, Y., & Park, J. (2019b). An energy and gpu-computation efficient backbone network for real-time object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops (pp. 752–760).
Lee, J., Kim, E., Lee, Y., Kim, D., Chang, J., & Choo, J. (2020). Reference-based sketch image colorization using augmented-self reference and dense semantic correspondence. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 5801–5810).
Lee, G., Kim, D., Yoo, Y., Han, D., Ha, J.W., & Chang, J. (2019a). Unpaired sketch-to-line translation via synthesis of sketches. In SIGGRAPH Asia 2019 technical briefs (pp. 45–48).
Lee, Y. J., Zitnick, C. L., & Cohen, M. F. (2011). Shadowdraw: Real-time user guidance for freehand drawing. ACM Transactions on Graphics (TOG), 30(4), 1–10.
Lei, C., & Chen, Q., (2019). Fully automatic video colorization with self-regularization and diversity. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 3753–3761).
Li, J. (2018). Twin-gan–unpaired cross-domain image translation with weight-sharing gans. Preprint arXiv:1809.00946.
Li, H., & Han, T. (2019). Towards diverse anime face generation: Active label completion and style feature network. In Eurographics (Short Papers) (pp. 65–68).
Li, C., & Wand, M. (2016). Combining markov random fields and convolutional neural networks for image synthesis. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2479–2486).
Li, Y., Fang, C., Yang, J., Wang, Z., Lu, X., & Yang, M.H. (2017b). Universal style transfer via feature transforms. Preprint arXiv:1705.08086.
Li, D., Huang, J.B., Li, Y., Wang, S., & Yang, M.H. (2016). Weakly supervised object localization with progressive domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3512–3520).
Li, Y., Liu, M.Y., Li, X., Yang, M.H., & Kautz, J. (2018). A closed-form solution to photorealistic image stylization. In Proceedings of the European conference on computer vision (ECCV) (pp. 453–468).
Li, Y., Wang, N., Liu, J., & Hou, X. (2017d). Demystifying neural style transfer. Preprint arXiv:1701.01036.
Li, X., Zhang, W., Shen, T., & Mei, T. (2019). Everyone is a cartoonist: Selfie cartoonization with attentive adversarial networks. In 2019 IEEE international conference on multimedia and expo (ICME), IEEE (pp. 652–657).
Li, B., Zhu, Y., Wang, Y., Lin, C.W., Ghanem, B., & Shen, L. (2021). Anigan: Style-guided generative adversarial networks for unsupervised anime face generation. Preprint arXiv:2102.12593.
Liang, L., Chen, H., Xu, Y.Q., & Shum, H.Y. (2002). Example-based caricature generation with exaggeration. In 10th Pacific conference on computer graphics and applications, 2002. Proceedings., IEEE (pp. 386–393).
Liang, X., Zhang, H., & Xing, E.P. (2017). Generative semantic manipulation with contrasting gan. Preprint arXiv:1708.00315.
Li, C., Liu, X., & Wong, T. T. (2017). Deep extraction of manga structural lines. ACM Transactions on Graphics (TOG), 36(4), 1–12.
Li, Y., Song, Y. Z., Hospedales, T. M., & Gong, S. (2017). Free-hand sketch synthesis with deformable stroke models. International Journal of Computer Vision, 122(1), 169–190.
Liu, G., Chen, X., & Hu, Y. (2018a). Anime sketch coloring with swish-gated residual u-net. In International symposium on intelligence computation and applications, Springer (pp. 190–204).
Liu, Z. Q., & Leung, K. M. (2006). Script visualization (scriptviz): a smart system that makes writing fun. Soft Computing, 10(1), 34–40.
Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., & Pietikäinen, M. (2020). Deep learning for generic object detection: A survey. International Journal of Computer Vision, 128(2), 261–318.
Liu, Y., Qin, Z., Wan, T., & Luo, Z. (2018). Auto-painter: Cartoon image generation from sketch by using conditional wasserstein generative adversarial networks. Neurocomputing, 311, 78–87.
Liu, X., Wong, T. T., & Heng, P. A. (2015). Closure-aware sketch simplification. ACM Transactions on Graphics (TOG), 34(6), 1–10.
Li, S., Wen, Q., Zhao, S., Sun, Z., & He, S. (2020). Two-stage photograph cartoonization via line tracing. Computer Graphics Forum, Wiley Online Library, 39, 587–599.
Li, W., Xiong, W., Liao, H., Huo, J., Gao, Y., & Luo, J. (2020). Carigan: Caricature generation through weakly paired adversarial learning. Neural Networks, 132, 66–74.
Maejima, A., Kubo, H., Funatomi, T., Yotsukura, T., Nakamura, S., & Mukaigawa, Y. (2019). Graph matching based anime colorization with multiple references. In ACM SIGGRAPH 2019 Posters (pp. 1–2).
Mainberger, M., Bruhn, A., Weickert, J., & Forchhammer, S. (2011). Edge-based compression of cartoon-like images with homogeneous diffusion. Pattern Recognition, 44(9), 1859–1873.
malnyun_faces (2021). https://github.com/bryandlee/malnyun_faces.
Ma, M., & Mc Kevitt, P. (2006). Virtual human animation in natural language visualisation. Artificial Intelligence Review, 25(1), 37–53.
Mao, X., Liu, X., Wong, T. T., & Xu, X. (2015). Region-based structure line detection for cartoons. Computational Visual Media, 1(1), 69–78.
Mathews, J., & Nair, M. S. (2015). Adaptive block truncation coding technique using edge-based quantization approach. Computers & Electrical Engineering, 43, 169–179.
Mishra, A., Rai, S.N., Mishra, A., & Jawahar, C. (2016). Iiit-cfw: A benchmark database of cartoon faces in the wild. In European conference on computer vision, Springer (pp. 35–47).
Mo, S., Cho, M., & Shin, J. (2018). Instagan: Instance-aware image-to-image translation. Preprint arXiv:1812.10889.
Mo, S., Cho, M., & Shin, J. (2020). Freeze the discriminator: a simple baseline for fine-tuning GANs. Preprint arXiv:2002.10964.
naver-webtoon-faces (2021). https://github.com/bryandlee/naver-webtoon-faces.
Nizan, O., & Tal, A. (2020). Breaking the cycle-colleagues are all you need. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 7860–7869).
Ni, Z., Zeng, H., Ma, L., Hou, J., Chen, J., & Ma, K. K. (2018). A Gabor feature-based quality assessment model for the screen content images. IEEE Transactions on Image Processing, 27(9), 4516–4528.
Odena, A., Olah, C., & Shlens, J. (2017). Conditional image synthesis with auxiliary classifier gans. In International conference on machine learning, PMLR (pp. 2642–2651).
Park, T., Liu, M.Y., Wang, T.C., & Zhu, J.Y. (2019). Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2337–2346).
Park, T., Zhu, J. Y., Wang, O., Lu, J., Shechtman, E., Efros, A., & Zhang, R. (2020). Swapping autoencoder for deep image manipulation. Advances in Neural Information Processing Systems, 33, 7198–7211.
Peng, C., Wang, N., Li, J., & Gao, X. (2020). Universal face photo-sketch style transfer via multiview domain translation. IEEE Transactions on Image Processing, 29, 8519–8534.
Pȩśko, M., Svystun, A., Andruszkiewicz, P., Rokita, P., & Trzciński, T. (2019). Comixify: Transform video into comics. Fundamenta Informaticae, 168(2–4), 311–333.
Pinkney, J.N., & Adler, D. (2020). Resolution dependent gan interpolation for controllable image synthesis between domains. Preprint arXiv:2010.05334.
Qian, Z., Bo, W., Wei, W., Hai, L., & Hui, L.J. (2020). Line art correlation matching network for automatic animation colorization. Preprint arXiv:2004.06718.
Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. Preprint arXiv:1511.06434.
Raj, Y. A., & Alli, P. (2019). Turtle edge encoding and flood fill based image compression scheme. Cluster Computing, 22(1), 361–377.
Real-cugan (2022). https://github.com/bilibili/ailab/tree/main/Real-CUGAN.
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. Preprint arXiv:1506.01497.
Ren, H., Li, J., & Gao, N. (2019). Two-stage sketch colorization with color parsing. IEEE Access, 8, 44599–44610.
Rosin, P.L., Wang, T., Winnemöller, H., Mould, D., Berger, I., Collomosse, J., Lai, Y.K., Li, C., Li, H., & Shamir, A., et al. (2017). Benchmarking non-photorealistic rendering of portraits.
Rosin, P., & Collomosse, J. (2012). Image and video-based artistic stylisation (Vol. 42). Berlin: Springer Science & Business Media.
Royer, A., Bousmalis, K., Gouws, S., Bertsch, F., Mosseri, I., Cole, F., & Murphy, K. (2020). Xgan: Unsupervised image-to-image translation for many-to-many mappings. In Domain Adaptation for Visual Understanding, Springer (pp. 33–49).
Ruder, M., Dosovitskiy, A., & Brox, T. (2016). Artistic style transfer for videos. In German conference on pattern recognition, Springer (pp. 26–36).
Saito, M., & Matsui, Y. (2015). Illustration2vec: a semantic vector representation of illustrations. In SIGGRAPH Asia 2015 Technical Briefs (pp. 1–4).
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016). Improved techniques for training gans. In Proceedings of the 30th international conference on neural information processing systems, Curran Associates Inc., NIPS’16 (p. 2234-2242).
Sanches, C.L., Augereau, O., & Kise, K. (2016). Manga content analysis using physiological signals. In Proceedings of the 1st international workshop on coMics ANalysis, Processing and Understanding (pp. 1–6).
Sato, K., Matsui, Y., Yamasaki, T., & Aizawa, K. (2014). Reference-based manga colorization by graph correspondence using quadratic programming. In SIGGRAPH Asia 2014 Technical Briefs (pp. 1–4).
Shen, W., Wang, X., Wang, Y., Bai, X., & Zhang, Z. (2015). Deepcontour: A deep convolutional feature learned by positive-sharing loss for contour detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3982–3991).
Shet, R.N., Lai, K.H., Edirisinghe, E.A., & Chung, P.W. (2005). Use of neural networks in automatic caricature generation: An approach based on drawing style capture.
Shi, Y., Deb, D., & Jain, A.K. (2019). Warpgan: Automatic caricature generation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10762–10771).
Shi, M., Zhang, J.Q., Chen, S.Y., Gao, L., Lai, Y.K., & Zhang, F.L. (2020). Deep line art video colorization with a few references. Preprint arXiv:2003.10685.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. Preprint arXiv:1409.1556.
Simo-Serra, E., Iizuka, S., & Ishikawa, H. (2018). Mastering sketching: Adversarial augmentation for structured prediction. ACM Transactions on Graphics (TOG), 37(1), 1–13.
Simo-Serra, E., Iizuka, S., & Ishikawa, H. (2018). Real-time data-driven interactive rough sketch inking. ACM Transactions on Graphics (TOG), 37(4), 1–14.
Simo-Serra, E., Iizuka, S., Sasaki, K., & Ishikawa, H. (2016). Learning to simplify: Fully convolutional networks for rough sketch cleanup. ACM Transactions on Graphics (TOG), 35(4), 1–11.
Siyao, L., Zhao, S., Yu, W., Sun, W., Metaxas, D., Loy, C.C., & Liu, Z. (2021). Deep animation video interpolation in the wild. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 6587–6595).
Sketchkeras (2017). https://github.com/lllyasviel/sketchKeras.
Song, G., Luo, L., Liu, J., Ma, W. C., Lai, C., Zheng, C., & Cham, T. J. (2021). Agilegan: Stylizing portraits by inversion-consistent transfer learning. ACM Transactions on Graphics (TOG), 40(4), 1–13.
Sonka, M., Hlavac, V., & Boyle, R. (2014). Image processing, analysis, and machine vision. Nelson Education.
Stricker, M., Augereau, O., Kise, K., & Iwata, M. (2018). Facial landmark detection for manga images. Preprint arXiv:1811.03214.
Su, H., Niu, J., Liu, X., Li, Q., Cui, J., & Wan, J. (2020). Unpaired photo-to-manga translation based on the methodology of manga drawing. Preprint arXiv:2004.10634.
Sultan, K.A., Jubair, M.I., Islam, M.N., & Khan, S.H. (2020). toon2real: Translating cartoon images to realistic images. In 2020 IEEE 32nd International conference on tools with artificial intelligence (ICTAI), IEEE (pp. 1175–1179).
Sultan, K., Rupty, L.K., Pranto, N.I., Shuvo, S.K., & Jubair, M.I. (2018). Cartoon-to-real: An approach to translate cartoon to realistic images using gan. Preprint arXiv:1811.11796.
Sultana, F., Sufian, A., & Dutta, P. (2019). A review of object detection models based on convolutional neural network. Preprint arXiv:1905.01614.
Sun, R., Huang, C., Shi, J., & Ma, L. (2018). Mask-aware photorealistic face attribute manipulation. Preprint arXiv:1804.08882.
Sun, L., Chen, P., Xiang, W., Chen, P., Gao, W. Y., & Zhang, K. J. (2019). Smartpaint: A co-creative drawing system based on generative adversarial networks. Frontiers of Information Technology & Electronic Engineering, 20(12), 1644–1656.
Sýkora, D., Buriánek, J., & Žára, J. (2004). Unsupervised colorization of black-and-white cartoons. In Proceedings of the 3rd international symposium on non-photorealistic animation and rendering (pp. 121–127).
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2818–2826).
Takayama, K., Johan, H., & Nishita, T. (2012). Face detection and face recognition of cartoon characters using feature extraction. In Image, Electronics and Visual Computing Workshop (p. 48).
Tang, H., Liu, H., Xu, D., Torr, P.H., & Sebe, N. (2019). Attentiongan: Unpaired image-to-image translation using attention-guided generative adversarial networks. Preprint arXiv:1911.11897.
Taylor, T. (2011). Compression of cartoon images. PhD thesis, Case Western Reserve University.
Taylor, S., Kim, T., Yue, Y., Mahler, M., Krahe, J., Rodriguez, A. G., et al. (2017). A deep learning approach for generalized speech animation. ACM Transactions on Graphics (TOG), 36(4), 1–11.
Thasarathan, H., Nazeri, K., & Ebrahimi, M. (2019). Automatic temporally coherent video colorization. In 2019 16th conference on computer and robot vision (CRV), IEEE (pp. 189–194).
Tsai, Y.C., Lee, M.S., Shen, M., & Kuo, C.C.J. (2006). A quad-tree decomposition approach to cartoon image compression. In 2006 IEEE workshop on multimedia signal processing, IEEE (pp. 456–460).
Tseng, C.C., & Lien, J.J.J. (2007). Synthesis of exaggerative caricature with inter and intra correlations. In Asian conference on computer vision, Springer (pp. 314–323).
Tseng, H.Y., Fisher, M., Lu, J., Li, Y., Kim, V., & Yang, M.H. (2020). Modeling artistic workflows for image generation and editing. In European conference on computer vision, Springer (pp. 158–174).
Tsubota, K., Ikami, D., & Aizawa, K. (2019). Synthesis of screentone patterns of manga characters. In 2019 IEEE international symposium on multimedia (ISM), IEEE (pp. 212–215).
Tzeng, E., Hoffman, J., Saenko, K., & Darrell, T. (2017). Adversarial discriminative domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7167–7176).
Uijlings, J. R., Van De Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2), 154–171.
waifu2x (2018). https://github.com/nagadomi/waifu2x.
Wang, X., & Yu, J. (2020). Learning to cartoonize using white-box cartoon representations. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 8090–8099).
Wang, T. C., Liu, M. Y., Zhu, J. Y., Tao, A., Kautz, J., & Catanzaro, B. (2018b). High-resolution image synthesis and semantic manipulation with conditional gans. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 8798–8807).
Wang, X., Oxholm, G., Zhang. D., & Wang, Y. F. (2017b). Multimodal transfer: A hierarchical deep convolutional neural network for fast artistic style transfer. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5239–5247).
Wang, L., Sindagi, V., & Patel, V. (2018a). High-quality facial photo-sketch synthesis using multi-adversarial networks. In 2018 13th IEEE international conference on automatic face & gesture recognition (FG 2018), IEEE (pp. 83–90).
Wang, Z., Chen, J., & Hoi, S. C. (2020). Deep learning for image super-resolution: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10), 3365–3387.
Wang, N., Gao, X., Sun, L., & Li, J. (2017). Bayesian face sketch synthesis. IEEE Transactions on Image Processing, 26(3), 1264–1274.
Wang, M., Hong, R., Yuan, X. T., Yan, S., & Chua, T. S. (2012). Movie2comics: Towards a lively video content presentation. IEEE Transactions on Multimedia, 14(3), 858–870.
Wang, N., Tao, D., Gao, X., Li, X., & Li, J. (2014). A comprehensive survey to face hallucination. International Journal of Computer Vision, 106(1), 9–30.
Wilber, M. J., Fang, C., Jin, H., Hertzmann, A., Collomosse, J., & Belongie, S. (2017). Bam! the behance artistic media dataset for recognition beyond photography. In Proceedings of the IEEE international conference on computer vision (pp. 1202–1211).
Wu, R., Gu, X., Tao, X., Shen, X., Tai, Y. W., et al. (2019). Landmark assisted cyclegan for cartoon face generation. Preprint arXiv:1907.01424.
Xiang, S., & Li, H. (2018). Anime style space exploration using metric learning and generative adversarial networks. Preprint arXiv:1805.07997.
Xiang, S., & Li, H. (2019). Disentangling style and content in anime illustrations. Preprint arXiv:1905.10742.
Xie, S., & Tu, Z. (2015). Holistically-nested edge detection. In Proceedings of the IEEE international conference on computer vision (pp. 1395–1403).
Xie, J., Winnemöller, H., Li, W., & Schiller, S. (2017). Interactive vectorization. In Proceedings of the 2017 CHI conference on human factors in computing systems (pp. 6695–6705).
Xie, M., Li, C., Liu, X., & Wong, T. T. (2020). Manga filling style conversion with screentone variational autoencoder. ACM Transactions on Graphics (TOG), 39(6), 1–15.
Xin, Y., Wong, H. C., Lo, S. L., & Li, J. (2020). Progressive full data convolutional neural networks for line extraction from anime-style illustrations. Applied Sciences, 10(1), 41.
Yang, C., Kim, T., Wang, R., Peng, H., & Kuo, C. C. J. (2019). Show, attend, and translate: Unsupervised image translation with self-regularization and attention. IEEE Transactions on Image Processing, 28(10), 4845–4856.
Yang, X., Li, F., & Liu, H. (2019). A survey of dnn methods for blind image quality assessment. IEEE Access, 7, 123788–123806.
Yao, C. Y., Hung, S. H., Li, G. W., Chen, I. Y., Adhitya, R., & Lai, Y. C. (2016). Manga vectorization and manipulation with procedural simple screentone. IEEE Transactions on Visualization and Computer Graphics, 23(2), 1070–1084.
Yeh, R., Chen, C., Lim, T. Y., Hasegawa-Johnson, M., & Do, M. N. (2016). Semantic image inpainting with perceptual and contextual losses. Preprint arXiv:1607.07539.
Yi, R., Liu, Y. J., Lai, Y. K., & Rosin, P. L. (2019). Apdrawinggan: Generating artistic portrait drawings from face photos with hierarchical gans. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10743–10752).
Yi, R., Liu, Y. J., Lai, Y. K., & Rosin, P. L. (2020a). Unpaired portrait drawing generation via asymmetric cycle mapping. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 8217–8225).
Yi, Z., Zhang, H., Tan, P., & Gong, M. (2017). Dualgan: Unsupervised dual learning for image-to-image translation. In Proceedings of the IEEE international conference on computer vision (pp. 2849–2857).
Yi, R., Xia, M., Liu, Y. J., Lai, Y. K., & Rosin, P. L. (2020b). Line drawings for face portraits from photos using global and local structure based gans. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10), 3462–3475.
Yonetsuji, T. (2017). Paintschainer. https://github.com/pfnet/PaintsChainer.
You, S., You, N., & Pan, M. (2019). Pi-rec: Progressive image reconstruction network with edge and color domain. Preprint arXiv:1903.10146.
Youku video super-resolution and enhancement challenge (Youku-VSRE2019) (2019). [Online]. Available: https://tianchi.aliyun.com/dataset/dataDetail?dataId=39568.
Yu, Z. Z. H. Z. Z., & Zheng, Z, G. B. (2017). Photo-to-caricature translation on faces in the wild.
Yu, Q., Yang, Y., Liu, F., Song, Y. Z., Xiang, T., & Hospedales, T. M. (2017). Sketch-a-net: A deep neural network that beats humans. International Journal of Computer Vision, 122(3), 411–425.
Zhang, H., Goodfellow, I., Metaxas, D., & Odena, A. (2019b). Self-attention generative adversarial networks. In International conference on machine learning, PMLR (pp. 7354–7363).
Zhang, B., He, M., Liao, J., Sander, P. V., Yuan, L., Bermak, A., & Chen, D. (2019a). Deep exemplar-based video colorization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 8052–8061).
Zhang, L., Ji, Y., & Liu, C. (2020b). Danbooregion: An illustration region dataset. In European conference on computer vision (ECCV) (pp. 137–154).
Zhang, L., Ji, Y., Lin, X., & Liu, C. (2017). Style transfer for anime sketches with enhanced residual u-net and auxiliary classifier gan. In 2017 4th IAPR Asian conference on pattern recognition (ACPR), IEEE (pp. 506–511).
Zhang, L., Li, C., Simo-Serra, E., Ji, Y., Wong, T. T., & Liu, C. (2021a). User-guided line art flat filling with split filling mechanism. In IEEE/CVF conference on computer vision and pattern recognition (CVPR) (pp. 9884–9893).
Zhang, B., Li, J., Wang, Y., Cui, Z., Xia, Y., Wang, C., Li, J., & Huang, F. (2020a). Acfd: Asymmetric cartoon face detector. Preprint arXiv:2007.00899.
Zhang, Y., Tsipidi, E., Schriber, S., Kapadia, M., Gross, M., & Modi, A. (2019c). Generating animations from screenplays. Preprint arXiv:1904.05440.
Zhang, L., Wang, X., Fan, Q., Ji, Y., & Liu, C. (2021b). Generating manga from illustrations via mimicking manga creation workflow. In IEEE/CVF conference on computer vision and pattern recognition (CVPR) (pp. 5638–5647).
Zhang, L., Li, C., Wong, T. T., Ji, Y., & Liu, C. (2018). Two-stage sketch colorization. ACM Transactions on Graphics (TOG), 37(6), 1–14.
Zhang, K., Zhang, Z., Li, Z., & Qiao, Y. (2016). Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10), 1499–1503.
Zhao, Y., Wu, R., & Dong, H. (2020). Unpaired image-to-image translation using adversarial consistency loss. In European conference on computer vision, Springer (pp. 800–815).
Zhe-Lin, L., Qin-Xiang, X., Li-Jun, J., & Shi-Zi, W. (2009). Full color cartoon image lossless compression based on region segment. In 2009 WRI world congress on computer science and information engineering, IEEE (Vol. 6, pp. 545–548).
Zheng, Y., Zhao, Y., Ren, M., Yan, H., Lu, X., Liu, J., & Li, J. (2020). Cartoon face recognition: A benchmark dataset. In Proceedings of the 28th ACM international conference on multimedia (pp. 2264–2272).
Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017a). Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision (pp. 2223–2232).
Zhu, M., Wang, N., Gao, X., Li, J., & Li, Z. (2019). Face photo-sketch synthesis via knowledge transfer. In IJCAI (pp. 1048–1054).
Zhu, J. Y., Zhang, R., Pathak, D., Darrell, T., Efros, A. A., Wang, O., & Shechtman, E. (2017b). Toward multimodal image-to-image translation. Preprint arXiv:1711.11586.
Zhu, M., Li, J., Wang, N., & Gao, X. (2021). Learning deep patch representation for probabilistic graphical model-based face sketch synthesis. International Journal of Computer Vision, 129(6), 1820–1836.
Zou, C., Mo, H., Du, R., Wu, X., Gao, C., & Fu, H. (2018). Lucss: Language-based user-customized colourization of scene sketches. Preprint arXiv:1808.10544.
Zou, C., Mo, H., Gao, C., Du, R., & Fu, H. (2019). Language-based colorization of scene sketches. ACM Transactions on Graphics (TOG), 38(6), 1–16.
Acknowledgements
This work is supported by the Key R&D and Transformation Program of Qinghai Province (No. 2021-GX-111), the Fundamental Research Funds for the Central Universities (No. JZ2022HGPA0309), the National Natural Science Foundation of China (Nos. 61972129, 62072013, 62076086), and the Shenzhen Cultivation of Excellent Scientific and Technological Innovation Talents program (No. RCJC20200714114435057).
Additional information
Communicated by Jianfei Cai.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Cite this article
Zhao, Y., Ren, D., Chen, Y. et al. Cartoon Image Processing: A Survey. Int J Comput Vis 130, 2733–2769 (2022). https://doi.org/10.1007/s11263-022-01645-1