Abstract
Recent work on image harmonization treats the problem as a pixel-wise image translation task solved with large autoencoders. These methods perform unsatisfactorily and run slowly on high-resolution images. In this work, we observe that adjusting the input arguments of basic image filters, e.g., brightness and contrast, is sufficient for humans to produce realistic images from composite ones. Hence, we frame image harmonization as an image-level regression problem: learning the arguments of the filters that humans use for the task. We present the Harmonizer framework for image harmonization. Unlike prior methods based on black-box autoencoders, Harmonizer consists of a neural network that predicts filter arguments and several white-box filters that apply the predicted arguments to harmonize the image. We also introduce a cascade regressor and a dynamic loss strategy so that Harmonizer learns filter arguments more stably and precisely. Because the network outputs only image-level arguments and the filters are efficient, Harmonizer is much lighter and faster than existing methods. Comprehensive experiments demonstrate that Harmonizer notably surpasses existing methods, especially on high-resolution inputs. Finally, we apply Harmonizer to video harmonization, where it achieves consistent results across frames and runs at 56 fps at 1080p resolution.
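The white-box idea described in the abstract can be illustrated with a minimal sketch: a handful of scalar, image-level arguments drive simple filters that are applied only to the composited foreground. Everything here is an assumption made for illustration, not the paper's implementation: the filter formulas, the argument range of [-1, 1], the mid-gray contrast pivot, and the names `brightness_filter`, `contrast_filter`, and `apply_filters` are all hypothetical, and the arguments are passed in by hand where Harmonizer would obtain them from its regression network.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[float, float, float]  # RGB, each channel in [0, 1]


def clamp01(x: float) -> float:
    """Keep a channel value inside the valid [0, 1] range."""
    return min(1.0, max(0.0, x))


def brightness_filter(pixel: Pixel, arg: float) -> Pixel:
    """Hypothetical brightness filter, arg in [-1, 1].

    Positive values blend each channel toward white,
    negative values scale each channel toward black.
    """
    r, g, b = pixel
    if arg >= 0:
        return (r + (1.0 - r) * arg, g + (1.0 - g) * arg, b + (1.0 - b) * arg)
    scale = 1.0 + arg
    return (r * scale, g * scale, b * scale)


def contrast_filter(pixel: Pixel, arg: float) -> Pixel:
    """Hypothetical contrast filter, arg in [-1, 1].

    Scales each channel's distance from mid-gray (0.5), an assumed pivot.
    """
    gain = 1.0 + arg
    r, g, b = pixel
    return (
        clamp01(0.5 + (r - 0.5) * gain),
        clamp01(0.5 + (g - 0.5) * gain),
        clamp01(0.5 + (b - 0.5) * gain),
    )


def apply_filters(
    composite: List[Pixel], mask: List[float], args: Dict[str, float]
) -> List[Pixel]:
    """Filter the foreground (mask = 1) and leave the background (mask = 0) as is.

    `args` holds one scalar per filter for the whole image, which is why the
    predictor's output size does not grow with image resolution.
    """
    result: List[Pixel] = []
    for pixel, m in zip(composite, mask):
        filtered = contrast_filter(
            brightness_filter(pixel, args["brightness"]), args["contrast"]
        )
        # Alpha-blend the filtered foreground back over the original pixel.
        blended = (
            m * filtered[0] + (1.0 - m) * pixel[0],
            m * filtered[1] + (1.0 - m) * pixel[1],
            m * filtered[2] + (1.0 - m) * pixel[2],
        )
        result.append(blended)
    return result


if __name__ == "__main__":
    # Two-pixel "image": a dark foreground pixel and a bright background pixel.
    composite = [(0.2, 0.2, 0.2), (0.8, 0.8, 0.8)]
    mask = [1.0, 0.0]
    # Hand-picked stand-ins for the arguments a regressor would predict.
    predicted = {"brightness": 0.5, "contrast": 0.0}
    print(apply_filters(composite, mask, predicted))
```

Because the filters are closed-form per-pixel operations, the same arguments can be applied to an image of any resolution, which is consistent with the efficiency argument made in the abstract.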
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ke, Z., Sun, C., Zhu, L., Xu, K., Lau, R.W.H. (2022). Harmonizer: Learning to Perform White-Box Image and Video Harmonization. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13675. Springer, Cham. https://doi.org/10.1007/978-3-031-19784-0_40
DOI: https://doi.org/10.1007/978-3-031-19784-0_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19783-3
Online ISBN: 978-3-031-19784-0
eBook Packages: Computer Science, Computer Science (R0)