Open access

UICrit: Enhancing Automated Design Evaluation with a UI Critique Dataset

Authors:

Bjoern Hartmann,

Yang Li

UIST '24: Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology

Article No.: 46, Pages 1 - 17

https://doi.org/10.1145/3654777.3676381

Published: 11 October 2024

Abstract

Automated UI evaluation can be beneficial for the design process; for example, to compare different UI designs, or conduct automated heuristic evaluation. LLM-based UI evaluation, in particular, holds the promise of generalizability to a wide variety of UI types and evaluation tasks. However, current LLM-based techniques do not yet match the performance of human evaluators. We hypothesize that automatic evaluation can be improved by collecting a targeted UI feedback dataset and then using this dataset to enhance the performance of general-purpose LLMs. We present a targeted dataset of 3,059 design critiques and quality ratings for 983 mobile UIs, collected from seven designers, each with at least a year of professional design experience. We carried out an in-depth analysis to characterize the dataset’s features. We then applied this dataset to achieve a 55% performance gain in LLM-generated UI feedback via various few-shot and visual prompting techniques. We also discuss future applications of this dataset, including training a reward model for generative UI techniques, and fine-tuning a tool-agnostic multi-modal LLM that automates UI evaluation.

Index Terms

UICrit: Enhancing Automated Design Evaluation with a UI Critique Dataset
1. Human-centered computing
  1. Human computer interaction (HCI)
    1. Empirical studies in HCI
  2. Interaction design
    1. Systems and tools for interaction design

Published In

UIST '24: Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology

October 2024

2334 pages

ISBN: 9798400706288

DOI: 10.1145/3654777

Editors:
Lining Yao (University of California, Berkeley)
Mayank Goel (Carnegie Mellon University)
Alexandra Ion (Carnegie Mellon University)
Pedro Lopes (University of Chicago)

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution-NoDerivatives International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

UIST '24

UIST '24: The 37th Annual ACM Symposium on User Interface Software and Technology

October 13 - 16, 2024

Pittsburgh, PA, USA

Acceptance Rates

Overall Acceptance Rate 561 of 2,567 submissions, 22%
