Human Latent Metrics: Perceptual and Cognitive Response Correlates to Distance in GAN Latent Space for Facial Images

Published: 22 September 2022

Abstract

Generative adversarial networks (GANs) learn high-dimensional vector spaces (latent spaces) in which vectors and images can be interchangeably represented. Advances have extended their ability to generate images indistinguishable from real photographs, such as faces, and, more importantly, to manipulate images through their inherent vector values in the latent space. This interchangeability of latent vectors makes it possible to compute not only distances in the latent space, but potentially also human perceptual and cognitive distances toward images, that is, how humans perceive and recognize images. However, it remains unclear how distance in the latent space correlates with human perception and cognition. Our studies investigated the relationship between latent vectors and human perception or cognition through psycho-visual experiments that manipulate the latent vectors of face images. In the perception study, a change perception task examined whether participants could perceive visual changes in face images before and after moving an arbitrary distance in the latent space. In the cognition study, a face recognition task examined whether participants could recognize a face as the same person even after moving an arbitrary distance in the latent space. Our experiments show that the distance between face images in the latent space correlates with human perception and cognition of visual changes in face imagery, and that this relationship can be modeled with a logistic function. With our methodology, it becomes possible to convert between distance in the latent space and metrics of human perception and cognition, potentially enabling image processing that better reflects human perception and cognition.
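The abstract states that the probability of perceiving or recognizing a change can be modeled as a logistic function of the distance between face images in the GAN latent space. The sketch below illustrates, with hypothetical data and illustrative parameter names (none taken from the paper), how such a logistic psychometric function could be fitted and then inverted to convert a target detection probability back into a latent-space distance.

# Minimal sketch: fit a logistic function of latent-space distance to
# hypothetical change-detection rates. All values and names are illustrative,
# not taken from the paper.
import numpy as np
from scipy.optimize import curve_fit

def logistic(d, d0, k):
    # P(detect | latent distance d), with midpoint d0 and slope k.
    return 1.0 / (1.0 + np.exp(-k * (d - d0)))

# Hypothetical latent-space distances and observed detection proportions.
distances = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
detect_rate = np.array([0.05, 0.15, 0.45, 0.70, 0.90, 0.97])

(d0_hat, k_hat), _ = curve_fit(logistic, distances, detect_rate, p0=[6.0, 1.0])
print(f"midpoint d0 ~ {d0_hat:.2f}, slope k ~ {k_hat:.2f}")

# Inverting the fitted function converts a target probability into a
# latent-space distance, i.e. a perceptual/cognitive threshold in latent units.
p_target = 0.75
d_at_p = d0_hat - np.log(1.0 / p_target - 1.0) / k_hat
print(f"distance at P={p_target}: {d_at_p:.2f}")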

Cited By

  • (2024) Investigating the impact of motion visual synchrony on self face recognition using real time morphing. Scientific Reports 14(1). https://doi.org/10.1038/s41598-024-63233-2. Online publication date: 7-Jun-2024
  • (2023) Investigating Effects of Facial Self-Similarity Levels on the Impression of Virtual Agents in Serious/Non-Serious Contexts. Proceedings of the Augmented Humans International Conference 2023, 221-230. https://doi.org/10.1145/3582700.3582721. Online publication date: 12-Mar-2023

Index Terms

  1. Human Latent Metrics: Perceptual and Cognitive Response Correlates to Distance in GAN Latent Space for Facial Images

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    SAP '22: ACM Symposium on Applied Perception 2022
    September 2022
    86 pages
    ISBN: 9781450394550
    DOI: 10.1145/3548814
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 September 2022

    Author Tags

    1. change perception
    2. face cognition
    3. generative adversarial networks

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • JST Moonshot R&D Program

    Conference

    SAP '22

    Acceptance Rates

    Overall Acceptance Rate 43 of 94 submissions, 46%
