DOI: 10.1145/3313831.3376870
research-article
Open access

Modeling Human Visual Search Performance on Realistic Webpages Using Analytical and Deep Learning Methods

Published: 23 April 2020

Abstract

Modeling visual search not only offers an opportunity to predict the usability of an interface before testing it with real users but also advances scientific understanding of human behavior. In this work, we first conduct a set of analyses on a large-scale dataset of visual search tasks on realistic webpages. We then present a deep neural network that learns to predict the scannability of webpage content, i.e., how easy it is for a user to find a specific target. Our model leverages both heuristic-based features, such as target size, and unstructured features, such as raw image pixels. This approach allows us to model complex interactions that might be involved in a realistic visual search task, which cannot be achieved by traditional analytical models. We analyze the model's behavior to offer insights into how the salience map learned by the model aligns with human intuition.
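
The idea of combining heuristic-based features with raw pixels can be illustrated with a minimal sketch. This is not the authors' architecture: the function names, the 2x2 average pooling (a crude stand-in for learned convolutional features), and the linear scorer with placeholder weights are all assumptions made for illustration only.

```python
from typing import List, Sequence

Image = List[List[float]]  # grayscale pixels in [0, 1], row-major


def average_pool(image: Image, grid: int) -> List[float]:
    """Reduce an image to grid x grid mean-intensity cells.

    Hypothetical stand-in for learned image features; assumes the image
    is at least `grid` pixels tall and wide so that no cell is empty.
    """
    height, width = len(image), len(image[0])
    cells = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            block = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    return cells


def fuse_features(heuristic: Sequence[float], image: Image, grid: int = 2) -> List[float]:
    """Concatenate heuristic features (e.g. target size) with pooled pixels."""
    return list(heuristic) + average_pool(image, grid)


def predict_search_time(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Linear scorer over the fused vector; weights here are untrained placeholders."""
    return bias + sum(w * f for w, f in zip(weights, features))


# Toy 4x4 "screenshot": a bright block in the top-left corner.
screenshot = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
target_size = [0.1, 0.05]  # hypothetical normalized target width and height
features = fuse_features(target_size, screenshot)
score = predict_search_time(features, weights=[1.0] * len(features), bias=0.0)
```

In a trained model the weights would be learned from observed search times, and the pooling step would be replaced by layers that learn which visual patterns matter.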

Supplementary Material

MP4 File (a741-yuan-presentation.mp4)

    Published In

    CHI '20: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems
    April 2020
    10688 pages
    ISBN: 9781450367080
    DOI: 10.1145/3313831
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. convolutional neural network
    2. deep learning
    3. performance modeling
    4. scannability
    5. visual attention
    6. webpage

    Qualifiers

    • Research-article

    Conference

    CHI '20

    Acceptance Rates

    Overall Acceptance Rate 6,199 of 26,314 submissions, 24%

    Cited By

    • (2024) User Performance Modelling for Spatial Entities Comparison with Geodashboards: Using View Quality and Distractor as Concepts. Companion Proceedings of the 16th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, 7-14. DOI: 10.1145/3660515.3661325. Online publication date: 24-Jun-2024
    • (2024) Perceived User Reachability in Mobile UIs Using Data Analytics and Machine Learning. International Journal of Human–Computer Interaction, 1-24. DOI: 10.1080/10447318.2024.2327199. Online publication date: 25-Mar-2024
    • (2023) Never-ending Learning of User Interfaces. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 1-13. DOI: 10.1145/3586183.3606824. Online publication date: 29-Oct-2023
    • (2023) Cognitive Modelling: From GOMS to Deep Reinforcement Learning. Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 1-3. DOI: 10.1145/3544549.3574173. Online publication date: 19-Apr-2023
    • (2022) Debiased Label Aggregation for Subjective Crowdsourcing Tasks. Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems, 1-8. DOI: 10.1145/3491101.3519614. Online publication date: 27-Apr-2022
    • (2022) Cognitive Modelling: From GOMS to Deep Reinforcement Learning. Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems, 1-3. DOI: 10.1145/3491101.3503771. Online publication date: 27-Apr-2022
    • (2022) A Survey on the Use of Computer Vision to Improve Software Engineering Tasks. IEEE Transactions on Software Engineering 48:5, 1722-1742. DOI: 10.1109/TSE.2020.3032986. Online publication date: 1-May-2022
    • (2022) Predicting Human Performance in Vertical Hierarchical Menu Selection in Immersive AR Using Hand-gesture and Head-gaze. 2022 15th International Conference on Human System Interaction (HSI), 1-8. DOI: 10.1109/HSI55341.2022.9869495. Online publication date: 28-Jul-2022
    • (2021) A Review of Recent Deep Learning Approaches in Human-Centered Machine Learning. Sensors 21:7, 2514. DOI: 10.3390/s21072514. Online publication date: 3-Apr-2021
    • (2021) Adapting User Interfaces with Model-based Reinforcement Learning. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 1-13. DOI: 10.1145/3411764.3445497. Online publication date: 6-May-2021
