Virtual clothing try-on, which transfers a clothing image onto a target person image, is drawing attention from both industry and research. The recently proposed 2D image-based and 3D model-based methods each have benefits and limitations. 3D model-based methods provide realistic clothing deformations, but they require a difficult 3D model construction process and cannot handle non-clothing areas well. Image-based deep neural network methods are good at generating disclosed human parts, retaining unchanged areas, and blending image parts, but they cannot handle large clothing deformations. In this paper, we propose CloTH-VTON, which combines the high-quality image synthesis of 2D image-based methods with 3D model-based deformation to the target human pose. For this 2D and 3D combination, we propose a novel method that reconstructs a 3D cloth model from a single 2D cloth image, leveraging a 3D human body model, and transfers it to the shape and pose of the target person. Our cloth reconstruction method can be easily applied to diverse cloth categories. Our method produces final try-on output with naturally deformed clothing and details preserved at high resolution.
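As a rough illustration of the "blending image parts" and "retaining the unchanged area" ideas mentioned in the abstract, the following is a minimal sketch of mask-based compositing. It is not the CloTH-VTON pipeline: the function name `composite_try_on`, the array layout, and the assumption that an already-deformed clothing layer and its mask are available are all hypothetical choices made for this example.

```python
import numpy as np


def composite_try_on(person, warped_cloth, cloth_mask):
    """Blend an already-warped clothing layer onto a person image.

    Hypothetical helper for illustration only (not the paper's method).
    person, warped_cloth: float arrays of shape (H, W, 3), values in [0, 1].
    cloth_mask: float array of shape (H, W), 1 where clothing should appear.
    """
    if person.shape != warped_cloth.shape:
        raise ValueError("person and warped_cloth must have the same shape")
    if cloth_mask.shape != person.shape[:2]:
        raise ValueError("cloth_mask must match the image height and width")
    # Add a channel axis so the mask broadcasts over the RGB channels.
    alpha = np.clip(cloth_mask, 0.0, 1.0)[..., None]
    # Clothing pixels come from the warped cloth; all others stay unchanged.
    return alpha * warped_cloth + (1.0 - alpha) * person


# Toy usage: a black "person" image and a white "cloth" patch in the centre.
person = np.zeros((4, 4, 3))
cloth = np.ones((4, 4, 3))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0
result = composite_try_on(person, cloth, mask)
```

In this sketch, pixels outside the mask are copied from the person image untouched, which is the sense in which the non-clothing area is retained; a soft (fractional) mask would blend the two layers along the clothing boundary.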
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (No. 2018R1D1A1B07043879).
© 2021 Springer Nature Switzerland AG
Cite this paper
Minar, M.R., Ahn, H. (2021). CloTH-VTON: Clothing Three-Dimensional Reconstruction for Hybrid Image-Based Virtual Try-ON. In: Ishikawa, H., Liu, CL., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. Lecture Notes in Computer Science, vol 12627. Springer, Cham. https://doi.org/10.1007/978-3-030-69544-6_10
DOI: https://doi.org/10.1007/978-3-030-69544-6_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69543-9
Online ISBN: 978-3-030-69544-6
eBook Packages: Computer Science (R0)