DOI: 10.1145/3379337.3415845
Open access

Crosspower: Bridging Graphics and Linguistics

Published: 20 October 2020


Despite the ubiquity of direct manipulation techniques in computer-aided design applications, creating digital content remains a tedious and indirect task. This is because applications require users to perform numerous low-level editing operations rather than allowing them to directly indicate high-level design goals. Yet the creation of graphic content, such as videos, animations, and presentations, often begins with a description of design goals in natural language, such as screenplays, scripts, and outlines. Therefore, there is an opportunity for language-oriented authoring, i.e., leveraging the information found in the structure of a language to facilitate the creation of graphic content. We present a systematic exploration of the identification and graphic description of, and interaction with, various linguistic structures to assist in the creation of visual content. The prototype system, Crosspower, and its proposed interaction techniques enable content creators to indicate and customize their desired visual content in a flexible and direct manner.

Supplementary Material

  • VTT File (ufp4740vf.vtt)
  • VTT File (3379337.3415845.vtt)
  • SRT File (ufp4740vfc.srt): Video figure captions
  • M4V File (ufp4740vf.m4v): Video figure
  • MP4 File (ufp4740pv.mp4): Preview video
  • MP4 File (3379337.3415845.mp4): Presentation video




Information & Contributors


Published In

UIST '20: Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology
October 2020
1297 pages
This work is licensed under a Creative Commons Attribution International 4.0 License.



Association for Computing Machinery

New York, NY, United States


Author Tags

  1. language-oriented authoring
  2. natural language processing
  3. reification
  4. text-based editing


  • Research-article


UIST '20

Acceptance Rates

Overall Acceptance Rate 561 of 2,567 submissions, 22%



Article Metrics

  • Downloads (last 12 months): 281
  • Downloads (last 6 weeks): 51

Reflects downloads up to 18 Jan 2025.

Cited By

  • (2024) AltCanvas: A Tile-Based Editor for Visual Content Creation with Generative AI for Blind or Visually Impaired People. Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility, 1-22. DOI: 10.1145/3663548.3675600. Online publication date: 27-Oct-2024.
  • (2024) DrawTalking: Building Interactive Worlds by Sketching and Speaking. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 1-25. DOI: 10.1145/3654777.3676334. Online publication date: 13-Oct-2024.
  • (2024) LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing. Proceedings of the 29th International Conference on Intelligent User Interfaces, 699-714. DOI: 10.1145/3640543.3645143. Online publication date: 18-Mar-2024.
  • (2024) How do video content creation goals impact which concepts people prioritize for generating B-roll imagery? Proceedings of the 16th Conference on Creativity & Cognition, 542-549. DOI: 10.1145/3635636.3664252. Online publication date: 23-Jun-2024.
  • (2024) DrawTalking: Towards Building Interactive Worlds by Sketching and Speaking. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 1-8. DOI: 10.1145/3613905.3651089. Online publication date: 11-May-2024.
  • (2024) AI-Generated Media for Exploring Alternate Realities. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 1-8. DOI: 10.1145/3613905.3650861. Online publication date: 11-May-2024.
  • (2024) Elastica: Adaptive Live Augmented Presentations with Elastic Mappings Across Modalities. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-19. DOI: 10.1145/3613904.3642725. Online publication date: 11-May-2024.
  • (2023) Cells, Generators, and Lenses: Design Framework for Object-Oriented Interaction with Large Language Models. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 1-18. DOI: 10.1145/3586183.3606833. Online publication date: 29-Oct-2023.
  • (2023) Soundify: Matching Sound Effects to Video. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 1-13. DOI: 10.1145/3586183.3606823. Online publication date: 29-Oct-2023.
  • (2023) CrossTalk: Intelligent Substrates for Language-Oriented Interaction in Video-Based Communication and Collaboration. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 1-16. DOI: 10.1145/3586183.3606773. Online publication date: 29-Oct-2023.
