Abstract
Great interest has arisen in using Deep Generative Models (DGMs) for generative design. When assessing the quality of generated designs, human designers focus more on structural plausibility, e.g., no missing components, than on visual artifacts, e.g., noise or blurriness. Meanwhile, commonly used metrics such as the Fréchet Inception Distance (FID) may not evaluate such designs accurately, because they are sensitive to visual artifacts but tolerant of semantic errors. As such, FID might not be suitable for assessing the performance of DGMs on generative design tasks. In this work, we propose to encode the images under evaluation with a Denoising Autoencoder (DAE) and to measure the distribution distance in the resulting latent space, yielding a novel metric, the Fréchet Denoised Distance (FDD). We experimentally compare FDD with FID and other state-of-the-art metrics on multiple datasets, e.g., BIKED, Seeing3DChairs, FFHQ, and ImageNet. FDD effectively detects implausible structures and is more consistent with structural inspections by human experts. Our source code is publicly available at https://github.com/jiajie96/FDD_pytorch.
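To make the procedure concrete, below is a minimal sketch of how an FDD-style score can be computed, assuming a pretrained DAE encoder is available. The `dae_encoder` callable and the array shapes are illustrative assumptions; the authors' reference implementation in the repository above may differ in details such as preprocessing and the DAE architecture.

```python
# A minimal sketch of the FDD idea described in the abstract, NOT the authors'
# reference implementation (see https://github.com/jiajie96/FDD_pytorch for that).
# `dae_encoder` is a hypothetical callable mapping an image batch to latent codes.
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2, eps=1e-6):
    """Fréchet distance between two Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Matrix square root of the covariance product; add jitter if it is singular.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if not np.isfinite(covmean).all():
        offset = np.eye(sigma1.shape[0]) * eps
        covmean = linalg.sqrtm((sigma1 + offset) @ (sigma2 + offset))
    covmean = covmean.real  # discard tiny imaginary parts from numerical error
    return diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(covmean)

def fdd(real_images, fake_images, dae_encoder):
    """Fréchet Denoised Distance: Fréchet distance between Gaussian fits
    of DAE latent codes of real and generated images."""
    z_real = dae_encoder(real_images)  # assumed shape: (n_real, d)
    z_fake = dae_encoder(fake_images)  # assumed shape: (n_fake, d)
    mu_r, sig_r = z_real.mean(axis=0), np.cov(z_real, rowvar=False)
    mu_f, sig_f = z_fake.mean(axis=0), np.cov(z_fake, rowvar=False)
    return frechet_distance(mu_r, sig_r, mu_f, sig_f)
```

The Fréchet-distance computation itself mirrors the standard FID formula; only the feature extractor changes, from an ImageNet-trained Inception network to a DAE encoder, which is what makes the metric sensitive to structural rather than textural differences.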
References
Aubry, M., Maturana, D., Efros, A.A., Russell, B.C., Sivic, J.: Seeing 3d chairs: exemplar part-based 2d-3d alignment using a large dataset of cad models. In: 2014 IEEE CVPR, pp. 3762–3769 (2014). https://doi.org/10.1109/CVPR.2014.487
Baker, N., Lu, H., Erlikhman, G., Kellman, P.J.: Deep convolutional networks do not classify based on global object shape. PLoS Comput. Biol. 14 (2018). https://api.semanticscholar.org/CorpusID:54476941
Barratt, S.T., Sharma, R.: A note on the inception score. ArXiv abs/1801.01973 (2018). https://api.semanticscholar.org/CorpusID:38384342
Betzalel, E., Penso, C., Navon, A., Fetaya, E.: A study on the evaluation of generative models. CoRR abs/2206.10935 (2022). https://doi.org/10.48550/ARXIV.2206.10935
Binkowski, M., Sutherland, D.J., Arbel, M., Gretton, A.: Demystifying MMD GANs. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30–May 3, 2018, Conference Track Proceedings. OpenReview.net (2018). https://openreview.net/forum?id=r1lUOzWCW
Borji, A.: Pros and cons of GAN evaluation measures: new developments. Comput. Vis. Image Underst. 215, 103329 (2022). https://doi.org/10.1016/j.cviu.2021.103329, https://www.sciencedirect.com/science/article/pii/S1077314221001685
Brock, A., Donahue, J., Simonyan, K.: Large scale GAN training for high fidelity natural image synthesis. ArXiv abs/1809.11096 (2018). https://api.semanticscholar.org/CorpusID:52889459
Buzuti, L.F., Thomaz, C.E.: Fréchet autoencoder distance: a new approach for evaluation of generative adversarial networks. Comput. Vis. Image Underst. 235, 103768 (2023)
Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., Joulin, A.: Unsupervised learning of visual features by contrasting cluster assignments. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS 2020, Curran Associates Inc., Red Hook, NY, USA (2020)
Choi, Y., Uh, Y., Yoo, J., Ha, J.W.: Stargan v2: diverse image synthesis for multiple domains. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8185–8194 (2020). https://doi.org/10.1109/CVPR42600.2020.00821
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition, 2009. CVPR 2009, pp. 248–255. IEEE (2009). https://ieeexplore.ieee.org/abstract/document/5206848/
Dhariwal, P., Nichol, A.Q.: Diffusion models beat GANs on image synthesis. In: Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems (2021). https://openreview.net/forum?id=AAWuCvzaVt
Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Fan, J., Vuaille, L., Bäck, T., Wang, H.: On the noise scheduling for generating plausible designs with diffusion models (2023)
Fan, J., Vuaille, L., Wang, H., Bäck, T.: Adversarial latent autoencoder with self-attention for structural image synthesis. arXiv preprint arXiv:2307.10166 (2023)
Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F., Brendel, W.: Imagenet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. ArXiv abs/1811.12231 (2018). https://api.semanticscholar.org/CorpusID:54101493
Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014). http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. J. Mach. Learn. Res. 13(1), 723–773 (2012)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
Hermann, K.L., Chen, T., Kornblith, S.: The origins and prevalence of texture bias in convolutional neural networks. arXiv: Computer Vision and Pattern Recognition (2019). https://api.semanticscholar.org/CorpusID:220266152
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Advances in Neural Information Processing Systems, vol. 30 (NIPS 2017) (2017)
Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. arXiv preprint arXiv:2006.11239 (2020)
Horak, D., Yu, S., Khorshidi, G.S.: Topology distance: a topology-based approach for evaluating generative adversarial networks. In: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021, pp. 7721–7728. AAAI Press (2021). https://doi.org/10.1609/AAAI.V35I9.16943
Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. ArXiv abs/1710.10196 (2017). https://api.semanticscholar.org/CorpusID:3568073
Karras, T., Aittala, M., Aila, T., Laine, S.: Elucidating the design space of diffusion-based generative models. Adv. Neural. Inf. Process. Syst. 35, 26565–26577 (2022)
Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4401–4410 (2019)
Kucker, S.C., et al.: Reproducibility and a unifying explanation: lessons from the shape bias. Infant Behav. Dev. 54, 156–165 (2019). https://api.semanticscholar.org/CorpusID:53045726
Kynkäänniemi, T., Karras, T., Aittala, M., Aila, T., Lehtinen, J.: The role of imagenet classes in fréchet inception distance. In: The Eleventh International Conference on Learning Representations (2023). https://openreview.net/forum?id=4oXTQ6m_ws8
Landau, B., Smith, L.B., Jones, S.S.: The importance of shape in early lexical learning. Cogn. Dev. 3, 299–321 (1988). https://api.semanticscholar.org/CorpusID:205117480
Liu, W., et al.: Towards visually explaining variational autoencoders. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8642–8651 (2020)
Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3730–3738 (2015)
Maiorca, A., Yoon, Y., Dutoit, T.: Evaluating the quality of a synthesized motion with the fréchet motion distance. In: ACM SIGGRAPH 2022 Posters. SIGGRAPH 2022, Association for Computing Machinery, New York, NY, USA (2022). https://doi.org/10.1145/3532719.3543228
Naeem, M.F., Oh, S.J., Uh, Y., Choi, Y., Yoo, J.: Reliable fidelity and diversity metrics for generative models. In: III, H.D., Singh, A. (eds.) Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 119, pp. 7176–7185. PMLR (2020). https://proceedings.mlr.press/v119/naeem20a.html
Nobari, A.H., Rashad, M.F., Ahmed, F.: Creativegan: editing generative adversarial networks for creative design synthesis. CoRR abs/2103.06242 (2021). https://arxiv.org/abs/2103.06242
Oquab, M., et al.: DINOv2: learning robust visual features without supervision. Trans. Mach. Learn. Res. (2024). https://openreview.net/forum?id=a68SUt6zFt
Radford, A., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning (2021). https://api.semanticscholar.org/CorpusID:231591445
Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: Bengio, Y., LeCun, Y. (eds.) 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings (2016). http://arxiv.org/abs/1511.06434
Regenwetter, L., Curry, B., Ahmed, F.: BIKED: a dataset for computational bicycle design with machine learning benchmarks. J. Mech. Des. 144(3) (2021). https://doi.org/10.1115/1.4052585
Regenwetter, L., Nobari, A.H., Ahmed, F.: Deep generative models in engineering design: a review. J. Mech. Des. 144(7), 071704 (2022)
Salimans, T., Goodfellow, I.J., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training GANs. ArXiv abs/1606.03498 (2016). https://api.semanticscholar.org/CorpusID:1687220
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-cam: visual explanations from deep networks via gradient-based localization. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 618–626 (2017). https://doi.org/10.1109/ICCV.2017.74
Song, J., Meng, C., Ermon, S.: Denoising diffusion implicit models. arXiv:2010.02502 (2020). https://arxiv.org/abs/2010.02502
Stein, G., et al.: Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models. CoRR abs/2306.04675 (2023). https://doi.org/10.48550/ARXIV.2306.04675
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2818–2826 (2016). https://api.semanticscholar.org/CorpusID:206593880
Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: International Conference on Machine Learning (2008). https://api.semanticscholar.org/CorpusID:207168299
Xiao, H., Rasul, K., Vollgraf, R.: Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. ArXiv abs/1708.07747 (2017). https://api.semanticscholar.org/CorpusID:702279
Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 586–595. Computer Vision Foundation / IEEE Computer Society (2018). https://doi.org/10.1109/CVPR.2018.00068, http://openaccess.thecvf.com/content_cvpr_2018/html/Zhang_The_Unreasonable_Effectiveness_CVPR_2018_paper.html
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
Acknowledgements
We gratefully acknowledge Laure Vuaille for her valuable insights and the support provided by BMW Group.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Fan, J., Trigui, A., Bäck, T., Wang, H. (2025). Enhancing Plausibility Evaluation for Generated Designs with Denoising Autoencoder. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15136. Springer, Cham. https://doi.org/10.1007/978-3-031-73229-4_6
DOI: https://doi.org/10.1007/978-3-031-73229-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73228-7
Online ISBN: 978-3-031-73229-4