5.1.1 Alternative Modalities to Current Features.
Our data indicate that the study’s participants identified the need to remap current tools in the 3D sketching system tested (OpenBrush) to novel input methods. This remapping does not modify the existing functionality of a tool but rather the way it is controlled. We grouped these suggestions into three main categories: brush, object interaction, and menu (Figure 5). The Brush category includes any interaction that affects the brush style. The Object Interaction category includes any action that selects or manipulates an object/stroke of the drawing. Finally, the Menu category includes choosing a tool or performing an action from a menu.
Brush. In 3D sketching systems, the brush tool is fundamental for users to create new strokes by moving the VR controller in space. Interestingly, most participants did not mention changing the input method to draw strokes. Only two participants suggested other ways to create strokes. P10 mentioned that a physical, real-world pen would be a useful interaction method to accomplish the same functionality as the controller. P2 mentioned using a gesture plus the controller to redraw strokes by selecting a stroke and adding vertices to it. P2 described this as, “adjust it [...] like grab [...] certain [...] parts of it like I can grab this middle part like by selecting it and [...] use my hands [...] to like stretch it in the way that I want it to look.” This interaction is known as redrawing [6], and is present in applications such as Adobe Illustrator [1] and Adobe Photoshop [2].
One important aspect of the brush tool is the set of characteristics of the stroke drawn by moving the controller. In most 3D sketching systems, these characteristics control a stroke’s color, texture, and width. Users change the brush’s characteristics via settings found in the virtual menu attached to the opposite controller in the 3D space. For artists, access to the brush’s settings could be improved through gestures. Yet, among the participants, there was no consensus on which gestures to use. P12 suggested natural gestures like swiping left or right, “if there is a type of motion where I can just like maybe like swipe like a certain way to like just like change brushes.” On the other hand, P8 suggested wrist movements, describing, “maybe a wrist flick to be able to change between the two brushes.”
Object Interaction. Unlike in traditional 2D sketching with pen and paper, 3D strokes exist as objects in space that the user can manipulate (e.g., translate, rotate, and scale). Users can also manipulate other objects inside the environment, like drawing guides. Most 3D sketching systems allow users to manipulate these objects using one- or two-handed interactions with the controllers. Interacting with objects is an important task for artists, whether moving an object or affecting it by changing its properties. Participants suggested manipulating objects with other input modalities, such as gesture, speech, gaze, or bimanual interaction.
For unimodal input methods, participants who suggested using gestures mentioned the need for more natural interactions with the hand. One example of this is P2, who wished it were possible to “grab this middle part like by selecting it and like use my hands or something like that to like stretch it in the way that I want it to look.” P2 stated that this method would be preferable to using a controller to scale the stroke. Other participants also wanted to use their hands, but in a bimanual interaction. For example, P3 wished that “you could kind of use both hands to, like, grow a selection around something from a distance.” The participants suggested other input modalities, like speech and gaze, to make the interaction faster. For example, P13 wanted the ability to use speech to “select everything and all of the dots I’ve drawn,” and P7 mentioned “if I was looking there and I could just kind of grow a selection where I was looking.”
The participants also suggested multimodal interactions for object manipulation. Examples of proposed multimodal interactions include merging gesture and speech. An example of this is P2’s suggestion to use gesture and speech to delete strokes, “I could probably like point at it and like tell it to erase it.” Also, while attempting to select strokes, P1 mentioned that gesture and gaze would be a good way to manipulate strokes, “I feel like that would be a gaze with [...] my hand gesture.”
Menu. Accessing the menu is important because it reveals all the tools available to participants. The menu allows users to modify the properties or characteristics of the strokes in the 3D environment, like changing colors, textures, or brush width. P9 and P13 suggested extending the current way to switch between tools or properties; P9 wanted to continue using the controller to alternate between tools by “double click[ing] on a button to go back to [the] previous tool.” Similarly, P13 did not want to switch to a different input modality but instead wanted to use a different combination on the controller to switch colors. P13 demonstrated such an action to the researchers by tapping on the controller trackpad. While both participants preferred the controller for the current unimodal input, their methods for switching between tools differed slightly.
Other participants felt comfortable using multimodal inputs to interact in the environment. P2 wanted to use a combination of gesture and speech to erase strokes in the environment. In using gesture followed immediately by the verbal command “tell it to erase it,” P2 hoped to avoid accessing the menu multiple times: once to perform a selection, and a second time to access the erase feature from the menu. In contrast, P1 wanted to minimize the time needed to access the menu when duplicating strokes. Duplicating strokes involves selecting the strokes that will be duplicated, followed by another menu command to duplicate them. P1 hoped to save time by looking at the strokes that needed to be selected and then, while doing a circular motion on the controller with the “hands and then I used the gesture right here,” duplicating the strokes. Both participants wanted to save time by minimizing the number of times they needed to access the menu to perform common tasks. Accessing the menu multiple times would have distracted the participants, but multimodal inputs could have allowed them to focus on the task at hand.
5.1.2 Proposed Features.
Some of the participants’ suggestions on new functionalities are not currently available in OpenBrush. We also examined various tools and 3D drawing software available in the market, including OpenBrush, Gravity Sketch, ShapesXR [29], Paint 3D [68], Paint.Net [30], Photoshop, and Blender [37] (Table 2), and could identify only one solution that met the participants’ suggestions: Blender, which provides basic functionality for manipulating objects [36] in VR. We grouped these suggestions into five main categories, creation, manipulations, menu, selection, and animation (Figure 6 and Figure 7), and discuss them in detail below. The creation category is for creating objects, other than strokes, in the environment. The manipulations category allows the participant to alter the appearance of a stroke by splitting it, sculpting it, moving it, or erasing it from the environment; its beautification feature takes a non-straight stroke and fits its points to a perfect line. The proposed menu category would provide access to a menu or a set of sequential commands. The selection category would allow selection through other input modalities, such as speech, and grouping of multiple strokes via the controller. The animation category proposes a simulation composed of interactions between objects that keeps repeating.
Creation. While users can manipulate objects via the standard translation, rotation, and scaling, adding additional details, such as texture, is not a feature that is currently available in the application. P1 would have preferred to alter a selected stroke to reflect a particular aesthetic vision. P1 wanted to create a specific texture, but could not do so due to the current limitations of the software. Another aspect was that seven participants were interested in turning strokes that resembled a shape into a perfect geometric shape. Artists commonly use applications, such as Adobe Photoshop and Blender, to create geometric shapes from drawings. In Notability [38] for the iPad, this feature is known as perfect shapes, where the application, based on a machine learning model, attempts to approximate the shape that the user is drawing and creates a perfect shape, replacing the user’s drawing. This technique is also known as beautification. The approach for beautification differed slightly among the participants who proposed the feature. P4 suggested using speech to generate a flat circle in 3D space, not a sphere, by saying, “large circle, or something like that.” On the other hand, P11 wanted to use speech to generate objects, but in this case, P11 wanted to generate full 3D shapes, such as a sphere or a cube. Furthermore, P11 wanted to be as specific as possible about where the 3D shape had to go by saying, “I want this on [...] the Z plane or the Y plane.” While the requests were similar, the shapes to be generated differed slightly. In contrast, P9 was interested in generating custom shapes. P9 wanted to generate fur on the side of the dog by issuing the verbal command, “generate [fur] all over the surface.” Participants P8 and P11 (who use digital drawing applications) were interested in not only generating shapes but also filling the surface created by strokes or filling the volume. P8 and P11 agreed that filling the surface created by strokes was important; they differed in the object that was being filled. While painting the grass, P8 suggested a “fill feature so I could [...] connect a line here and then use a paint bucket to fill this all green would be interesting.” In contrast, P11 wanted to perform the same function but to fill the surface of a pre-made shape. Extending P8’s request, P12 wanted to fill any surface, regardless of the number of strokes that the object was made of. One observation is that the three participants (i.e., P8, P11, and P12) wanted to use only speech for the fill feature. However, P13 wanted a similar function by using gestures. When attempting to fill the volume of an object, P13 mentioned that “you could like make the shapes [...] come in filled” by gesturing towards the object. While speech and gesture were the most common inputs, the preferred unimodal input was speech. Interestingly, two participants, P1 and P9, mentioned wanting assistance from artificial intelligence (AI); for example, P1, after drawing a dog, wanted it “kind of AI generated to give you this.”
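As an illustration of what a perfect-shapes/beautification feature implies computationally, the following is a minimal sketch (Python/NumPy, not part of OpenBrush or the study software) that fits a least-squares circle to the points of a roughly drawn stroke and replaces the stroke with a resampled, perfect circle. It assumes the stroke has already been projected onto a 2D drawing plane; the function name beautify_circle is hypothetical.

```python
import numpy as np

def beautify_circle(points, n_samples=64):
    """Fit a circle to 2D stroke points (Kasa least-squares fit) and
    return a resampled 'perfect' circle replacing the rough stroke."""
    pts = np.asarray(points, dtype=float)        # shape (N, 2)
    x, y = pts[:, 0], pts[:, 1]
    # Solve x^2 + y^2 + a*x + b*y + c = 0 for a, b, c in the least-squares sense.
    A = np.column_stack([x, y, np.ones_like(x)])
    rhs = -(x**2 + y**2)
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    cx, cy = -a / 2.0, -b / 2.0
    radius = np.sqrt(cx**2 + cy**2 - c)
    # Resample evenly around the fitted circle.
    t = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    return np.column_stack([cx + radius * np.cos(t), cy + radius * np.sin(t)])
```

A wobbly hand-drawn loop would be replaced by the evenly sampled circle returned here; a fuller beautification feature, such as Notability’s perfect shapes, would first classify which primitive the stroke resembles before fitting it.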
Although the 3D sketching application allows participants to use their dominant hands to draw, it is limited by not allowing both hands to select strokes or draw. P8 would have liked to spread both arms to select all strokes that appeared between them from the headset’s perspective. Instead of using both hands to control the selection, P4 wanted to use the non-dominant hand to control the size of the stroke being drawn by the current brush. In the current system, the stroke size can be controlled by the dominant hand by swiping left or right on the controller trackpad but not by the opposite controller. In contrast, P3 wanted to be more involved in the drawing by using both hands (bimanual) to draw independently. While there was a disagreement on how they would use both hands to affect their drawing, the participants mentioned they would have benefited from using bimanual interaction to advance their drawings.
Manipulations. Artists may start with mental images of what they envision, but they may modify their visions as the drawing progresses. In order to allow for modification, participants proposed manipulating strokes through a set of features that includes beautification, stroke splitting, sculpting, moving, and erasing. The beautification of shapes was previously mentioned, but one participant wanted the beautification of single lines. P2 wanted to turn a stroke into a straight line by speaking “make the line straight” through the microphone (i.e., speech). P6 found it difficult to create a flat surface to draw the path and thus wanted the controller to have the ability to create a flat surface in the environment. P8 wanted to use straight lines. Unlike P2, however, P8 did not want a stroke to be beautified into a straight line, but rather wanted the application to draw a straight line.
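The single-stroke straightening that P2 described could, for example, replace a stroke’s vertices with points on its best-fit line segment. The sketch below is a hypothetical illustration based on a principal-axis fit of the 3D stroke points, not an implementation drawn from OpenBrush.

```python
import numpy as np

def straighten_stroke(points, n_samples=32):
    """Replace a roughly drawn 3D stroke with points lying on its
    best-fit straight segment (principal-axis projection)."""
    pts = np.asarray(points, dtype=float)            # shape (N, 3)
    centroid = pts.mean(axis=0)
    # Principal direction of the stroke's point cloud via SVD.
    _, _, vt = np.linalg.svd(pts - centroid)
    direction = vt[0]
    # Span of the stroke along that direction, resampled uniformly.
    t = (pts - centroid) @ direction
    ts = np.linspace(t.min(), t.max(), n_samples)
    return centroid + np.outer(ts, direction)
```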
In 2D, adjusting a stroke can be done by splitting it or removing part of it. In the tested application, a stroke can be removed or left as-is, but it cannot be split. P7 mentioned that erasing “the whole stroke and not just like individual parts of the stroke” was inefficient, as the participant would need additional time to erase the current stroke and then create two new strokes to give the appearance of a split stroke. To resolve this, P2 suggested splitting a stroke by saying “pull it apart” while using a gesture, by issuing the verbal command “split this line,” or by using a slicing gesture on the stroke.
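A stroke-splitting feature of the kind P2 and P7 described could cut the stroke’s ordered vertex list at the vertex nearest to where a slicing gesture intersects it. The following is a hypothetical sketch of that operation, not an OpenBrush feature.

```python
import numpy as np

def split_stroke(points, cut_point):
    """Split an ordered list of 3D stroke vertices into two strokes at
    the vertex closest to where a slicing gesture crossed the stroke."""
    pts = np.asarray(points, dtype=float)
    cut = np.asarray(cut_point, dtype=float)
    i = int(np.argmin(np.linalg.norm(pts - cut, axis=1)))
    # Guard against degenerate splits at either end of the stroke.
    if i == 0 or i == len(pts) - 1:
        return [pts]                  # nothing to split off
    return [pts[:i + 1], pts[i:]]     # both halves share the cut vertex
```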
Some branches of fine arts, like sculpting or even painting, can require artists to use their hands when working with clay or clay-like materials. P10 and P11, who enjoy sculpting, would like to see sculpting offered in future releases of OpenBrush. P10 wanted to use pre-made geometric shapes with the volume inside them filled to “just start kind of like sculpting” from the outside and working towards the inside. When asked if there was a preference between drawing and sculpting, P10 responded by saying that using hands for “sculpting [...] would probably be even more preferable.” It is clear that the participants were trying to transfer previous knowledge from real-life sculpting to sculpting in VR.
Finally, six participants wanted better control of the strokes or an alternate way to remove them. In the current version of OpenBrush, to select a stroke, the user has to make contact between the controller and the stroke. Instead of walking to a stroke, selecting it with the controller, and then moving it to another position, P11 wanted to “point at something and say like or just like being able to point to something and grab it,” as in using ray-cast pointing to select strokes that were far away. P11 also wanted to use ray-cast pointing to highlight an object and then either verbally tell the application to select it or grab it with the controller and move it to a more suitable location. Similarly, P4 wanted to be able to erase a stroke by just “point[ing] at it and like tell it to erase it.” In the case of these two participants, a multimodal interaction would have been suitable to accomplish their goal.
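The ray-cast selection P11 asked for amounts to finding the stroke that passes closest to the pointing ray. The sketch below illustrates one such distance test; pick_stroke and its distance threshold are hypothetical and stand in for whatever picking logic the application would provide.

```python
import numpy as np

def pick_stroke(strokes, ray_origin, ray_dir, max_dist=0.05):
    """Return the index of the stroke whose vertices pass closest to a
    pointing ray, or None if nothing is within max_dist (meters)."""
    o = np.asarray(ray_origin, dtype=float)
    d = np.asarray(ray_dir, dtype=float)
    d = d / np.linalg.norm(d)
    best, best_dist = None, max_dist
    for idx, stroke in enumerate(strokes):
        pts = np.asarray(stroke, dtype=float)
        t = np.clip((pts - o) @ d, 0.0, None)     # only consider points in front of the controller
        closest = o + np.outer(t, d)              # nearest point on the ray per vertex
        dist = np.linalg.norm(pts - closest, axis=1).min()
        if dist < best_dist:
            best, best_dist = idx, dist
    return best
```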
Menu. As each participant had taken at least one digital art class, they had experience using application interface menus. Although some desktop applications support accessing menus via speech, the tested VR 3D sketching application did not. P11 wanted to access the tools in the menu using speech by merely “say[ing] the name” of the shortcut corresponding to the menu. From the participant’s view, a shortcut, just like the shortcuts found in popular applications like Adobe Photoshop, allows the participant to reach a tool or an action by skipping several menus, thus saving time. When painting on a 2D digital canvas like Procreate [88] on an iPad, an artist can use a side palette to test out the brush size and color before using it to draw digitally. While the tested application allows the participant to change the stroke size by swiping left or right on the controller trackpad, P8 suggested a different method to access the tool by pressing on the trackpad rather than swiping left or right. The reasoning behind this, as P8 explained, is “to make that be a part of the trackpad, because it is a little bit choppy.” As P8 was swiping on the controller, the location of the controller in the VR environment was constantly drifting. At the same time, P8 suggested removing the menu on the non-dominant hand. The head rotation required to look at the non-dominant hand’s menu and select a different tool was described as distracting. P8’s reason follows:
“when I have to stop and find this button, I mean it is not that hard to find, but some way that you could swipe up on the trackpad and open a menu would be, I think, a little bit more efficient.” A pop-up menu close to the dominant (or drawing) controller would have been more efficient by minimizing the time needed to rotate the head.
Selection. An important aspect of 3D systems, such as OpenBrush, is the ability to select specific strokes or a group of strokes. Selecting strokes allows the user to erase or duplicate a single stroke or multiple strokes, which minimizes the time the user has to spend erasing or duplicating them. P3 would have liked to select strokes by using a bimanual interaction, like a T-pose, where the distance between the hands would indicate the range of the desired selection. Another way the same participant wanted to perform a stroke selection was by using speech. P4, P7, and P9 agreed on using speech to select all the strokes in the environment by saying “select all.” P8 suggested two different methods: using a dedicated button on the controller, which P9 agreed on, or using a combination of speech and gesture. Stroke selection would “probably use gaze,” according to P12, when asked which interaction modality would be preferred for selecting strokes. P13 felt that speech would be useful in selecting all the strokes by issuing the command “select everything,” which would group all the strokes in the environment.
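As one way to operationalize the bimanual range selection P3 proposed, a system could select every stroke that has at least one vertex inside a sphere spanned by the two hands. The sketch below is a hypothetical illustration of that idea, not an existing OpenBrush feature.

```python
import numpy as np

def bimanual_select(strokes, left_hand, right_hand):
    """Select strokes falling inside a sphere spanned by the two hands:
    centered at their midpoint, with radius half the distance between them."""
    l = np.asarray(left_hand, dtype=float)
    r = np.asarray(right_hand, dtype=float)
    center = (l + r) / 2.0
    radius = np.linalg.norm(r - l) / 2.0
    selected = []
    for idx, stroke in enumerate(strokes):
        pts = np.asarray(stroke, dtype=float)
        if np.any(np.linalg.norm(pts - center, axis=1) <= radius):
            selected.append(idx)
    return selected
```

Widening the arms would grow the selection range, matching the T-pose metaphor the participant described.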
Animation. While the tested application (OpenBrush) allows participants to showcase their creative side, animation is not supported. Some brush effects perform an animation as part of their texture, but the participant does not have any control over this animation. P1 wanted to create a custom animation that kept repeating itself: the effect of lightning coming out of bubbles. While this could not be created due to the limitations of the software, P1 said that it “would be nice” if that feature existed.
Multimodal Features. Multimodal interaction refers to an interaction that involves two or more input modalities being used to accomplish a task in the system (see Figure 7). For example, a participant may want to point to a stroke and say “delete.” For selection, P8 was the only one who suggested using a combination of speech and gesture. When grouping the features into common categories, it was found that participants in our study mostly proposed multimodal interaction techniques for creation tasks.
Multimodal Creation. Participants proposed specific features for filling shapes or objects with colors or textures and generating shapes and objects. As with the unimodal case of this feature category, these features were grouped under “Creation” since they would involve creating additional content in the VE. Unlike the Creation category for unimodal interactions, however, no detailing or drawing features were proposed for use with multimodal interaction techniques.
Filling. Participants also expressed the desire for OpenBrush to allow them to fill the inside or surface of an object or shape. Although some proposed techniques for accomplishing this involved unimodal interactions, others proposed multimodal interaction techniques. P2 proposed a multimodal interaction technique: pointing at an existing object and then using speech to fill it with color or texture. To fill in a tree, for instance, P2 described,
“Pointing at it, telling it [...], ‘Fill this tree up with green.’ ” This approach entailed drawing some kind of outline to indicate the tree, which P2 said could possibly mean drawing the wireframe for the object. It was not clear whether P2 meant creating a wireframe mesh, as is found in 3D modeling, or simply drawing an outline of the object and then specifying that it should be filled. Filling shapes/objects was also proposed by P10 to be accomplished through the coordinated use of the controller, a pen, and gesture. This interaction technique was focused primarily on texture and would involve selecting the drawn outline of a shape/object with the controller and then using the gesture and pen in undefined ways to fill the object with a desired texture.
Generating. During drawing tasks, participants wanted to be able to generate objects and shapes in OpenBrush. As described previously, some of the proposed interaction techniques for this desired feature only involved unimodal interactions. Other proposed interaction techniques for generating shapes and objects involved multiple modalities working in tandem. This sometimes involved a combination of full-sentence speech and pointing. When asked if an alternative interaction technique could help create the ground, P2 wanted to “Point at, like say, two points [...] and say, ‘Make a square.’ ” P2 further elaborated this proposed interaction technique as pointing to two separate points, such as the opposite corners of a square, followed by the verbal command to make a square; the system would then use those two points as a reference and create a square. Meanwhile, for 3D objects such as cylinders, P2 said that pointing at two points could specify the top and bottom of the object. Further details defining the dimensions of the shapes and objects were not provided by P2. P2 also proposed generating more complex objects at a specified location by pointing and simply saying to generate them. One example given was to “...point at, like, a certain point within, like, the bark of the tree and [...] tell it to sprout a branch.” Alternatively, P13 proposed using a combination of controller, gesture, and full-sentence speech to generate shapes. This interaction technique would use speech to say, as P13 described, “Make me a circle,” and then gesture could be used to specify where to place the shape/object while the controller would be used to control other attributes of the shape/object, such as the size.
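P2’s two-point square command implies a small geometric construction: given two opposite corners and the plane they lie in, the remaining corners follow from rotating the half-diagonal 90 degrees about the plane normal. The sketch below illustrates this under the assumption that both pointed locations lie in the given plane; square_from_corners is a hypothetical helper, not part of OpenBrush.

```python
import numpy as np

def square_from_corners(p1, p2, plane_normal=(0.0, 1.0, 0.0)):
    """Build the four corners of a square from two opposite corners,
    assuming both points lie in the plane with the given normal
    (the default is a horizontal 'ground' plane)."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    n = np.asarray(plane_normal, dtype=float)
    n = n / np.linalg.norm(n)
    center = (p1 + p2) / 2.0
    half_diag = (p2 - p1) / 2.0
    # The second diagonal is the first rotated 90 degrees about the normal.
    other_half = np.cross(n, half_diag)
    return np.array([center + half_diag, center + other_half,
                     center - half_diag, center - other_half])
```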
Because many of the proposed multimodal interaction techniques involved speech commands, implementing these interactions would require accurate speech recognition that can also incorporate the context provided by the other input modalities. For instance, when pointing at an object and using speech to fill it with color, the system would need to recognize which object is being pointed at and connect that to the spoken instructions. Because some aspects of the proposed interaction techniques were only vaguely described by participants, future work would also involve identifying what kinds of gesture, controller, or pen actions would be necessary to make these multimodal interaction techniques effective and satisfying for users.
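A minimal sketch of such a fusion step is given below: a toy keyword parser for a spoken “fill” command is combined with whatever object the pointing ray currently hits. The parser, the handle_utterance function, and the scene.fill call are hypothetical placeholders for a real speech-understanding pipeline and scene API.

```python
import re

COLORS = {"green", "red", "blue", "brown"}

def parse_fill_command(utterance):
    """Return the requested fill color if the utterance is a fill command."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    if "fill" in words:
        requested = words & COLORS
        if requested:
            return requested.pop()
    return None

def handle_utterance(utterance, pointed_object, scene):
    """Fuse the deictic target (what the pointing ray hits) with speech."""
    color = parse_fill_command(utterance)
    if color is not None and pointed_object is not None:
        scene.fill(pointed_object, color)   # hypothetical scene API
        return True
    return False

# Usage sketch: handle_utterance("Fill this tree up with green", tree_object, scene)
# would fill the pointed-at tree with green, as P2 described.
```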