
Enhancing Plausibility Evaluation for Generated Designs with Denoising Autoencoder

  • Conference paper
  • In: Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Great interest has arisen in using Deep Generative Models (DGMs) for generative design. When assessing the quality of generated designs, human designers focus more on structural plausibility, e.g., no missing components, than on visual artifacts, e.g., noise or blurriness. Meanwhile, commonly used metrics such as the Fréchet Inception Distance (FID) may not evaluate such designs accurately, because they are sensitive to visual artifacts and tolerant of semantic errors. As such, FID might not be suitable for assessing the performance of DGMs on generative design tasks. In this work, we propose to encode the to-be-evaluated images with a Denoising Autoencoder (DAE) and to measure the distribution distance in the resulting latent space, yielding a novel metric: the Fréchet Denoised Distance (FDD). We experimentally compare FDD with FID and other state-of-the-art metrics on multiple datasets, e.g., BIKED, Seeing3DChairs, FFHQ and ImageNet. FDD effectively detects implausible structures and is more consistent with structural inspections by human experts. Our source code is publicly available at https://github.com/jiajie96/FDD_pytorch.
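
To make the metric concrete: like FID, FDD fits a Gaussian to the feature codes of each image set and computes the Fréchet distance between the two fits, with the DAE encoder taking the place of the Inception network. The minimal Python sketch below illustrates this construction; the encoder, array shapes, and function names are illustrative assumptions, not the authors' reference implementation (see the repository linked above for that).

    # Minimal FDD sketch: fit a Gaussian to the DAE latent codes of each
    # image set and compute the Frechet distance between the two fits.
    # `real_latents` / `fake_latents` are assumed to be (N, D) arrays
    # produced by a separately trained denoising-autoencoder encoder.
    import numpy as np
    from scipy import linalg

    def frechet_distance(mu1, sigma1, mu2, sigma2):
        # d^2 = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrt(sigma1 @ sigma2))
        diff = mu1 - mu2
        covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
        if np.iscomplexobj(covmean):  # drop tiny imaginary parts from numerics
            covmean = covmean.real
        return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

    def fdd(real_latents, fake_latents):
        mu_r, sigma_r = real_latents.mean(axis=0), np.cov(real_latents, rowvar=False)
        mu_f, sigma_f = fake_latents.mean(axis=0), np.cov(fake_latents, rowvar=False)
        return frechet_distance(mu_r, sigma_r, mu_f, sigma_f)

Swapping the feature extractor is the design choice this metric rests on: because a DAE is trained to undo pixel-level corruption, its latent space should be less sensitive to visual artifacts and more sensitive to structural errors than Inception features, which is the behaviour the abstract describes.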



References

  1. Aubry, M., Maturana, D., Efros, A.A., Russell, B.C., Sivic, J.: Seeing 3d chairs: exemplar part-based 2d-3d alignment using a large dataset of cad models. In: 2014 IEEE CVPR, pp. 3762–3769 (2014). https://doi.org/10.1109/CVPR.2014.487

  2. Baker, N., Lu, H., Erlikhman, G., Kellman, P.J.: Deep convolutional networks do not classify based on global object shape. PLoS Comput. Biol. 14 (2018). https://api.semanticscholar.org/CorpusID:54476941

  3. Barratt, S.T., Sharma, R.: A note on the inception score. ArXiv abs/1801.01973 (2018). https://api.semanticscholar.org/CorpusID:38384342

  4. Betzalel, E., Penso, C., Navon, A., Fetaya, E.: A study on the evaluation of generative models. CoRR abs/2206.10935 (2022). https://doi.org/10.48550/ARXIV.2206.10935

  5. Binkowski, M., Sutherland, D.J., Arbel, M., Gretton, A.: Demystifying MMD GANs. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30–May 3, 2018, Conference Track Proceedings. OpenReview.net (2018). https://openreview.net/forum?id=r1lUOzWCW

  6. Borji, A.: Pros and cons of GAN evaluation measures: new developments. Comput. Vis. Image Underst. 215, 103329 (2022). https://doi.org/10.1016/j.cviu.2021.103329, https://www.sciencedirect.com/science/article/pii/S1077314221001685

  7. Brock, A., Donahue, J., Simonyan, K.: Large scale GAN training for high fidelity natural image synthesis. ArXiv abs/1809.11096 (2018), https://api.semanticscholar.org/CorpusID:52889459

  8. Buzuti, L.F., Thomaz, C.E.: Fréchet autoencoder distance: a new approach for evaluation of generative adversarial networks. Comput. Vis. Image Underst. 235, 103768 (2023)


  9. Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., Joulin, A.: Unsupervised learning of visual features by contrasting cluster assignments. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS 2020, Curran Associates Inc., Red Hook, NY, USA (2020)


  10. Choi, Y., Uh, Y., Yoo, J., Ha, J.W.: Stargan v2: diverse image synthesis for multiple domains. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8185–8194 (2020). https://doi.org/10.1109/CVPR42600.2020.00821

  11. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition, 2009. CVPR 2009, pp. 248–255. IEEE (2009). https://ieeexplore.ieee.org/abstract/document/5206848/

  12. Dhariwal, P., Nichol, A.Q.: Diffusion models beat GANs on image synthesis. In: Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems (2021). https://openreview.net/forum?id=AAWuCvzaVt

  13. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

  14. Fan, J., Vuaille, L., Bäck, T., Wang, H.: On the noise scheduling for generating plausible designs with diffusion models (2023)


  15. Fan, J., Vuaille, L., Wang, H., Bäck, T.: Adversarial latent autoencoder with self-attention for structural image synthesis. arXiv preprint arXiv:2307.10166 (2023)

  16. Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F., Brendel, W.: Imagenet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. ArXiv abs/1811.12231 (2018). https://api.semanticscholar.org/CorpusID:54101493

  17. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014). http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf

  18. Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. J. Mach. Learn. Res. 13(1), 723–773 (2012)


  19. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90

  20. Hermann, K.L., Chen, T., Kornblith, S.: The origins and prevalence of texture bias in convolutional neural networks. arXiv: Computer Vision and Pattern Recognition (2019). https://api.semanticscholar.org/CorpusID:220266152

  21. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Advances in Neural Information Processing Systems, vol. 30 (NIPS 2017) (2017)


  22. Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. arXiv preprint arXiv:2006.11239 (2020)

  23. Horak, D., Yu, S., Khorshidi, G.S.: Topology distance: a topology-based approach for evaluating generative adversarial networks. In: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021, pp. 7721–7728. AAAI Press (2021). https://doi.org/10.1609/AAAI.V35I9.16943

  24. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. ArXiv abs/1710.10196 (2017). https://api.semanticscholar.org/CorpusID:3568073

  25. Karras, T., Aittala, M., Aila, T., Laine, S.: Elucidating the design space of diffusion-based generative models. Adv. Neural. Inf. Process. Syst. 35, 26565–26577 (2022)


  26. Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4401–4410 (2019)


  27. Kucker, S.C., et al.: Reproducibility and a unifying explanation: lessons from the shape bias. Infant Behav. Dev. 54, 156–165 (2019). https://api.semanticscholar.org/CorpusID:53045726

  28. Kynkäänniemi, T., Karras, T., Aittala, M., Aila, T., Lehtinen, J.: The role of imagenet classes in fréchet inception distance. In: The Eleventh International Conference on Learning Representations (2023). https://openreview.net/forum?id=4oXTQ6m_ws8

  29. Landau, B., Smith, L.B., Jones, S.S.: The importance of shape in early lexical learning. Cogn. Dev. 3, 299–321 (1988). https://api.semanticscholar.org/CorpusID:205117480

  30. Liu, W., et al.: Towards visually explaining variational autoencoders. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8642–8651 (2020)


  31. Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3730–3738 (2015)


  32. Maiorca, A., Yoon, Y., Dutoit, T.: Evaluating the quality of a synthesized motion with the fréchet motion distance. In: ACM SIGGRAPH 2022 Posters. SIGGRAPH 2022, Association for Computing Machinery, New York, NY, USA (2022). https://doi.org/10.1145/3532719.3543228

  33. Naeem, M.F., Oh, S.J., Uh, Y., Choi, Y., Yoo, J.: Reliable fidelity and diversity metrics for generative models. In: III, H.D., Singh, A. (eds.) Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 119, pp. 7176–7185. PMLR (2020). https://proceedings.mlr.press/v119/naeem20a.html

  34. Nobari, A.H., Rashad, M.F., Ahmed, F.: Creativegan: editing generative adversarial networks for creative design synthesis. CoRR abs/2103.06242 (2021). https://arxiv.org/abs/2103.06242

  35. Oquab, M., et al.: DINOv2: learning robust visual features without supervision. Trans. Mach. Learn. Res. (2024). https://openreview.net/forum?id=a68SUt6zFt

  36. Radford, A., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning (2021). https://api.semanticscholar.org/CorpusID:231591445

  37. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: Bengio, Y., LeCun, Y. (eds.) 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings (2016). http://arxiv.org/abs/1511.06434

  38. Regenwetter, L., Curry, B., Ahmed, F.: BIKED: a dataset for computational bicycle design with machine learning benchmarks. J. Mech. Des. 144(3) (2021). https://doi.org/10.1115/1.4052585

  39. Regenwetter, L., Nobari, A.H., Ahmed, F.: Deep generative models in engineering design: a review. J. Mech. Des. 144(7), 071704 (2022)


  40. Salimans, T., Goodfellow, I.J., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training gans. ArXiv abs/1606.03498 (2016), https://api.semanticscholar.org/CorpusID:1687220

  41. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-cam: visual explanations from deep networks via gradient-based localization. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 618–626 (2017). https://doi.org/10.1109/ICCV.2017.74

  42. Song, J., Meng, C., Ermon, S.: Denoising diffusion implicit models. arXiv:2010.02502 (2020). https://arxiv.org/abs/2010.02502

  43. Stein, G., et al.: Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models. CoRR abs/2306.04675 (2023). https://doi.org/10.48550/ARXIV.2306.04675

  44. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2818–2826 (2016). https://api.semanticscholar.org/CorpusID:206593880

  45. Van Den Oord, A., Vinyals, O., et al.: Neural discrete representation learning. In: Advances in Neural Information Processing Systems, vol. 30 (2017)


  46. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: International Conference on Machine Learning (2008). https://api.semanticscholar.org/CorpusID:207168299

  47. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. ArXiv abs/1708.07747 (2017). https://api.semanticscholar.org/CorpusID:702279

  48. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 586–595. Computer Vision Foundation / IEEE Computer Society (2018). https://doi.org/10.1109/CVPR.2018.00068, http://openaccess.thecvf.com/content_cvpr_2018/html/Zhang_The_Unreasonable_Effectiveness_CVPR_2018_paper.html

  49. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)



Acknowledgements

We gratefully acknowledge Laure Vuaille for her valuable insights and the support provided by BMW Group.

Author information


Corresponding author

Correspondence to Jiajie Fan.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 14762 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Fan, J., Trigui, A., Bäck, T., Wang, H. (2025). Enhancing Plausibility Evaluation for Generated Designs with Denoising Autoencoder. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15136. Springer, Cham. https://doi.org/10.1007/978-3-031-73229-4_6


  • DOI: https://doi.org/10.1007/978-3-031-73229-4_6


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73228-7

  • Online ISBN: 978-3-031-73229-4

  • eBook Packages: Computer Science, Computer Science (R0)
