5.1.1 Alternative Modalities to Current Features.
Our data indicate that the study’s participants identified the need to remap current tools in the 3D sketching system tested (OpenBrush) to novel input methods. This remapping does not modify the existing functionality of a tool but rather the way it is controlled. We grouped these suggestions into three main categories: brush, object interaction, and menu (Figure 5). The Brush category includes any interaction that affects the brush style. The Object Interaction category includes any action that selects or manipulates an object/stroke of the drawing. Finally, the Menu category includes choosing a tool or performing an action from a menu.
Brush. In 3D sketching systems, the brush tool is fundamental for users to create new strokes by moving the VR controller in space. Interestingly, most participants did not mention changing the input method to draw strokes. Only two participants suggested other ways to create strokes. P10 mentioned that a physical, real-world pen would be a useful interaction method to accomplish the same functionality as the controller. P2 mentioned using a gesture plus the controller to redraw strokes by selecting a stroke and adding vertices to it. P2 described this as, “adjust it [...] like grab [...] certain [...] parts of it like I can grab this middle part like by selecting it and [...] use my hands [...] to like stretch it in the way that I want it to look.” This interaction is known as redrawing [6], and is present in applications such as Adobe Illustrator [1] and Adobe Photoshop [2].
One important aspect of the brush tool is the set of characteristics of the stroke drawn by moving the controller. In most 3D sketching systems, these characteristics control a stroke’s color, texture, and width. Users change the brush’s characteristics via settings found in the virtual menu attached to the opposite controller in the 3D space. For artists, access to the brush’s settings could be improved through gestures. Yet, among the participants, there was no consensus on which gestures to use. P12 suggested natural gestures like swiping left or right, “if there is a type of motion where I can just like maybe like swipe like a certain way to like just like change brushes.” On the other hand, P8 suggested wrist movements, describing, “maybe a wrist flick to be able to change between the two brushes.”
Object Interaction. Unlike in traditional 2D sketching with pen and paper, 3D strokes exist as objects in space that the user can manipulate (e.g., translate, rotate, and scale). Users can also manipulate other objects inside the environment, like drawing guides. Most 3D sketching systems allow users to manipulate these objects using one- or two-handed interactions with the controllers. Interacting with objects is an important task for artists, whether moving an object or affecting it by changing its properties. Participants suggested manipulating objects with other input modalities, such as gesture, speech, gaze, or bimanual interaction.
For unimodal input methods, participants who suggested using gestures mentioned the need for more natural interactions with the hand. One example of this is P2, who wished it were possible to “grab this middle part like by selecting it and like use my hands or something like that to like stretch it in the way that I want it to look.” P2 stated that this method would be preferable to using a controller to scale the stroke. Other participants also wanted to use their hands, but in a bimanual interaction. For example, P3 wished that “you could kind of use both hands to, like, grow a selection around something from a distance.” The participants suggested other input modalities, like speech and gaze, to make the interaction faster. For example, P13 wanted the ability to use speech to “select everything and all of the dots I’ve drawn,” and P7 mentioned “if I was looking there and I could just kind of grow a selection where I was looking.”
The participants also suggested multimodal interactions for object manipulation. Examples of proposed multimodal interactions include merging gesture and speech. An example of this is P2’s suggestion to use gesture and speech to delete strokes, “I could probably like point at it and like tell it to erase it.” Also, while attempting to select strokes, P1 mentioned that gesture and gaze would be a good way to manipulate strokes, “I feel like that would be a gaze with [...] my hand gesture.”
Menu. Accessing the menu is important because it reveals all the tools available to participants. The menu allows users to modify the properties or characteristics of the strokes in the 3D environment, like changing colors, textures, or brush width. P9 and P13 suggested extending the current way to switch between tools or properties; P9 wanted to continue using the controller to alternate between tools by “double click[ing] on a button to go back to [the] previous tool.” Similarly, P13 did not want to switch to a different input modality but instead wanted to use a different combination on the controller to switch colors. P13 demonstrated such an action to the researchers by tapping on the controller trackpad. While both participants preferred the controller for the current unimodal input, their methods for switching between tools differed slightly.
Other participants felt comfortable using multimodal inputs to interact in the environment. P2 wanted to use a combination of gesture and speech to erase strokes in the environment. In using gesture followed immediately by the verbal command “tell it to erase it,” P2 hoped to avoid accessing the menu multiple times: once to perform a selection, and a second time to access the erase feature from the menu. In contrast, P1 wanted to minimize the time needed to access the menu when duplicating strokes. Duplicating strokes involves selecting the strokes that will be duplicated, followed by another menu command to duplicate them. P1 hoped to save time by looking at the strokes that needed to be selected and then, while doing a circular motion on the controller with the “hands and then I used the gesture right here,” duplicating the strokes. Both participants wanted to save time by minimizing the number of times they needed to access the menu to perform common tasks. Accessing the menu multiple times would have distracted the participants, but multimodal inputs could have allowed them to focus on the task at hand.
5.1.2 Proposed Features.
Some of the participants’ suggestions on new functionalities are not currently available in OpenBrush. We also examined various tools and 3D drawing software available in the market, including OpenBrush, Gravity Sketch, ShapesXR [29], Paint 3D [68], Paint.Net [30], Photoshop, and Blender [37] (Table 2), and could identify only one solution that met the participants’ suggestions: Blender, which provides basic functionality for manipulating objects [36] in VR. We grouped these suggestions into five main categories, creation, manipulations, menu, selection, and animation (Figure 6 and Figure 7), and discuss them in detail below. The creation category is for creating objects, other than strokes, in the environment. The manipulations category allows the participant to alter the appearance of a stroke by splitting it, sculpting it, moving it, or erasing it from the environment; its beautification feature takes a non-straight stroke and fits its points to a perfect line. The proposed menu category would provide access to a menu or a set of sequential commands. The selection category would allow selection through other input modalities, such as speech, and grouping of multiple strokes via the controller. The animation category proposes a simulation composed of interactions between objects that keeps repeating.
Creation. While users can manipulate objects via the standard translation, rotation, and scaling, adding additional details, such as texture, is not a feature that is currently available in the application. P1 would have preferred to alter a selected stroke to reflect a particular aesthetic vision. P1 wanted to create a specific texture, but could not do so due to the current limitations of the software. Another aspect was that seven participants were interested in turning strokes that resembled a shape into a perfect geometric shape. Artists commonly use applications, such as Adobe Photoshop and Blender, to create geometric shapes from drawings. In Notability [38] for the iPad, this feature is known as perfect shapes, where the application, based on a machine learning model, attempts to approximate the shape that the user is drawing and creates a perfect shape, replacing the user’s drawing. This technique is also known as beautification. The approach for beautification differed slightly among the participants who proposed the feature. P4 suggested using speech to generate a flat circle in 3D space, not a sphere, by saying, “large circle, or something like that.” On the other hand, P11 wanted to use speech to generate objects, but in this case, P11 wanted to generate full 3D shapes, such as a sphere or a cube. Furthermore, P11 wanted to be as specific as possible about where the 3D shape had to go by saying, “I want this on [...] the Z plane or the Y plane.” While the requests were similar, the shapes to be generated differed slightly. In contrast, P9 was interested in generating custom shapes. P9 wanted to generate fur on the side of the dog by issuing the verbal command, “generate [fur] all over the surface.” Participants P8 and P11 (who use digital drawing applications) were interested in not only generating shapes but also filling the surface created by strokes or filling the volume. P8 and P11 agreed that filling the surface created by strokes was important; they differed in the object that was being filled. While painting the grass, P8 suggested a “fill feature so I could [...] connect a line here and then use a paint bucket to fill this all green would be interesting.” In contrast, P11 wanted to perform the same function but to fill the surface of a pre-made shape. Extending P8’s request, P12 wanted to fill any surface, regardless of the number of strokes that the object was made of. One observation is that the three participants (i.e., P8, P11, and P12) wanted to use only speech for the fill feature. However, P13 wanted a similar function by using gestures. When attempting to fill the volume of an object, P13 mentioned that “you could like make the shapes [...] come in filled” by gesturing towards the object. While speech and gesture were the most common inputs, the preferred unimodal input was speech. Interestingly, two participants, P1 and P9, mentioned wanting assistance from artificial intelligence (AI); for example, P1, after drawing a dog, wanted it “kind of AI generated to give you this.”
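As an illustration of what a perfect-shapes/beautification feature implies computationally, the following is a minimal sketch (Python/NumPy, not part of OpenBrush or the study software) that fits a least-squares circle to the points of a roughly drawn stroke and replaces the stroke with a resampled, perfect circle. It assumes the stroke has already been projected onto a 2D drawing plane; the function name beautify_circle is hypothetical.

```python
import numpy as np

def beautify_circle(points, n_samples=64):
    """Fit a circle to 2D stroke points (Kasa least-squares fit) and
    return a resampled 'perfect' circle replacing the rough stroke."""
    pts = np.asarray(points, dtype=float)        # shape (N, 2)
    x, y = pts[:, 0], pts[:, 1]
    # Solve x^2 + y^2 + a*x + b*y + c = 0 for a, b, c in the least-squares sense.
    A = np.column_stack([x, y, np.ones_like(x)])
    rhs = -(x**2 + y**2)
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    cx, cy = -a / 2.0, -b / 2.0
    radius = np.sqrt(cx**2 + cy**2 - c)
    # Resample evenly around the fitted circle.
    t = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    return np.column_stack([cx + radius * np.cos(t), cy + radius * np.sin(t)])
```

A wobbly hand-drawn loop would be replaced by the evenly sampled circle returned here; a fuller beautification feature, such as Notability’s perfect shapes, would first classify which primitive the stroke resembles before fitting it.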
Although the 3D sketching application allows participants to use their dominant hands to draw, it is limited by not allowing both hands to select strokes or draw. P8 would have liked to spread both arms to select all strokes that appeared between them from the headset’s perspective. Instead of using both hands to control the selection, P4 wanted to use the non-dominant hand to control the size of the stroke being drawn by the current brush. In the current system, the stroke size can be controlled by the dominant hand by swiping left or right on the controller trackpad but not by the opposite controller. In contrast, P3 wanted to be more involved in the drawing by using both hands (bimanual) to draw independently. While there was a disagreement on how they would use both hands to affect their drawing, the participants mentioned they would have benefited from using bimanual interaction to advance their drawings.
Manipulations. Artists may start with mental images of what they envision, but they may modify their visions as the drawing progresses. In order to allow for modification, participants proposed manipulating strokes through a set of features that includes beautification, stroke splitting, sculpting, moving, and erasing. The beautification of shapes was previously mentioned, but one participant wanted the beautification of single lines. P2 wanted to turn a stroke into a straight line by speaking “make the line straight” through the microphone (i.e., speech). P6 found it difficult to create a flat surface to draw the path and thus wanted the controller to have the ability to create a flat surface in the environment. P8 wanted to use straight lines. Unlike P2, however, P8 did not want a stroke to be beautified into a straight line, but rather wanted the application to draw a straight line.
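The single-stroke straightening that P2 described could, for example, replace a stroke’s vertices with points on its best-fit line segment. The sketch below is a hypothetical illustration based on a principal-axis fit of the 3D stroke points, not an implementation drawn from OpenBrush.

```python
import numpy as np

def straighten_stroke(points, n_samples=32):
    """Replace a roughly drawn 3D stroke with points lying on its
    best-fit straight segment (principal-axis projection)."""
    pts = np.asarray(points, dtype=float)            # shape (N, 3)
    centroid = pts.mean(axis=0)
    # Principal direction of the stroke's point cloud via SVD.
    _, _, vt = np.linalg.svd(pts - centroid)
    direction = vt[0]
    # Span of the stroke along that direction, resampled uniformly.
    t = (pts - centroid) @ direction
    ts = np.linspace(t.min(), t.max(), n_samples)
    return centroid + np.outer(ts, direction)
```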
In 2D, adjusting a stroke can be done by splitting it or removing part of it. In the tested application, a stroke can be removed or left as-is, but it cannot be split. P7 mentioned that erasing “the whole stroke and not just like individual parts of the stroke” was inefficient, as the participant would need additional time to erase the current stroke and then create two new strokes to give the appearance of a split stroke. To resolve this, P2 suggested splitting a stroke by saying “pull it apart” while using a gesture, by issuing the verbal command “split this line,” or by using a slicing gesture on the stroke.
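A stroke-splitting feature of the kind P2 and P7 described could cut the stroke’s ordered vertex list at the vertex nearest to where a slicing gesture intersects it. The following is a hypothetical sketch of that operation, not an OpenBrush feature.

```python
import numpy as np

def split_stroke(points, cut_point):
    """Split an ordered list of 3D stroke vertices into two strokes at
    the vertex closest to where a slicing gesture crossed the stroke."""
    pts = np.asarray(points, dtype=float)
    cut = np.asarray(cut_point, dtype=float)
    i = int(np.argmin(np.linalg.norm(pts - cut, axis=1)))
    # Guard against degenerate splits at either end of the stroke.
    if i == 0 or i == len(pts) - 1:
        return [pts]                  # nothing to split off
    return [pts[:i + 1], pts[i:]]     # both halves share the cut vertex
```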
Some branches of fine arts, like sculpting or even painting, can require artists to use their hands when working with clay or clay-like materials. P10 and P11, who enjoy sculpting, would like to see sculpting offered in future releases of OpenBrush. P10 wanted to use pre-made geometric shapes with the volume inside them filled to “just start kind of like sculpting” from the outside and working towards the inside. When asked if there was a preference between drawing and sculpting, P10 responded by saying that using hands for “sculpting [...] would probably be even more preferable.” It is clear that the participants were trying to transfer previous knowledge from real-life sculpting to sculpting in VR.
Finally, six participants wanted better control of the strokes or an alternate way to remove them. In the current version of OpenBrush, to select a stroke, the user has to make contact between the controller and the stroke. Instead of walking to a stroke, selecting it with the controller, and then moving it to another position, P11 wanted to “point at something and say like or just like being able to point to something and grab it,” as in using ray-cast pointing to select strokes that were far away. P11 also wanted to use ray-cast pointing to highlight an object and then either verbally tell the application to select it or grab it with the controller and move it to a more suitable location. Similarly, P4 wanted to be able to erase a stroke by just “point[ing] at it and like tell it to erase it.” In the case of these two participants, a multimodal interaction would have been suitable to accomplish their goal.
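The ray-cast selection P11 asked for amounts to finding the stroke that passes closest to the pointing ray. The sketch below illustrates one such distance test; pick_stroke and its distance threshold are hypothetical and stand in for whatever picking logic the application would provide.

```python
import numpy as np

def pick_stroke(strokes, ray_origin, ray_dir, max_dist=0.05):
    """Return the index of the stroke whose vertices pass closest to a
    pointing ray, or None if nothing is within max_dist (meters)."""
    o = np.asarray(ray_origin, dtype=float)
    d = np.asarray(ray_dir, dtype=float)
    d = d / np.linalg.norm(d)
    best, best_dist = None, max_dist
    for idx, stroke in enumerate(strokes):
        pts = np.asarray(stroke, dtype=float)
        t = np.clip((pts - o) @ d, 0.0, None)     # only consider points in front of the controller
        closest = o + np.outer(t, d)              # nearest point on the ray per vertex
        dist = np.linalg.norm(pts - closest, axis=1).min()
        if dist < best_dist:
            best, best_dist = idx, dist
    return best
```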
Menu. As each participant had taken at least one digital art class, they had experience using application interface menus. Although some desktop applications support accessing menus via speech, the tested VR 3D sketching application did not. P11 wanted to access the tools in the menu using speech by merely “say[ing] the name” of the shortcut corresponding to the menu. From the participant’s view, a shortcut, just like the shortcuts found in popular applications like Adobe Photoshop, allows the participant to reach a tool or an action by skipping several menus, thus saving time. When painting on a 2D digital canvas like Procreate [88] on an iPad, an artist can use a side palette to test out the brush size and color before using it to draw digitally. While the tested application allows the participant to change the stroke size by swiping left or right on the controller trackpad, P8 suggested a different method to access the tool by pressing on the trackpad rather than swiping left or right. The reasoning behind this, as P8 explained, is “to make that be a part of the trackpad, because it is a little bit choppy.” As P8 was swiping on the controller, the location of the controller in the VR environment was constantly drifting. At the same time, P8 suggested removing the menu on the non-dominant hand. The head rotation required to look at the non-dominant hand’s menu and select a different tool was described as distracting. P8’s reason follows:
“when I have to stop and find this button, I mean it is not that hard to find, but some way that you could swipe up on the trackpad and open a menu would be, I think, a little bit more efficient.” A pop-up menu close to the dominant (or drawing) controller would have been more efficient by minimizing the time needed to rotate the head.
Selection. An important aspect of 3D systems, such as OpenBrush, is the ability to select specific strokes or a group of strokes. Selecting strokes allows the user to erase or duplicate a single stroke or multiple strokes, which minimizes the time the user has to spend erasing or duplicating them. P3 would have liked to select strokes by using a bimanual interaction, like a T-pose, where the distance between the hands would indicate the range of the desired selection. Another way the same participant wanted to perform a stroke selection was by using speech. P4, P7, and P9 agreed on using speech to select all the strokes in the environment by saying “select all.” P8 suggested two different methods: using a dedicated button on the controller, which P9 agreed on, or using a combination of speech and gesture. Stroke selection would “probably use gaze,” according to P12, when asked which interaction modality would be preferred for selecting strokes. P13 felt that speech would be useful in selecting all the strokes by issuing the command “select everything,” which would group all the strokes in the environment.
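As one way to operationalize the bimanual range selection P3 proposed, a system could select every stroke that has at least one vertex inside a sphere spanned by the two hands. The sketch below is a hypothetical illustration of that idea, not an existing OpenBrush feature.

```python
import numpy as np

def bimanual_select(strokes, left_hand, right_hand):
    """Select strokes falling inside a sphere spanned by the two hands:
    centered at their midpoint, with radius half the distance between them."""
    l = np.asarray(left_hand, dtype=float)
    r = np.asarray(right_hand, dtype=float)
    center = (l + r) / 2.0
    radius = np.linalg.norm(r - l) / 2.0
    selected = []
    for idx, stroke in enumerate(strokes):
        pts = np.asarray(stroke, dtype=float)
        if np.any(np.linalg.norm(pts - center, axis=1) <= radius):
            selected.append(idx)
    return selected
```

Widening the arms would grow the selection range, matching the T-pose metaphor the participant described.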
Animation. While the tested application (OpenBrush) allows participants to showcase their creative side, animation is not supported. Some brush effects perform an animation as part of their texture, but the participant does not have any control over this animation. P1 wanted to create a custom animation that kept repeating itself: the effect of lightning coming out of bubbles. While this could not be created due to the limitations of the software, P1 said that it “would be nice” if that feature existed.
Multimodal Features. Multimodal interaction refers to an interaction that involves two or more input modalities being used to accomplish a task in the system (see Figure 7). For example, a participant may want to point to a stroke and say “delete.” For selection, P8 was the only one who suggested using a combination of speech and gesture. When grouping the features into common categories, it was found that participants in our study mostly proposed multimodal interaction techniques for creation tasks.
Multimodal Creation. Participants proposed specific features for filling shapes or objects with colors or textures and generating shapes and objects. As with the unimodal case of this feature category, these features were grouped under “Creation” since they would involve creating additional content in the VE. Unlike the Creation category for unimodal interactions, however, no detailing or drawing features were proposed for use with multimodal interaction techniques.
Filling. Participants also expressed the desire for OpenBrush to allow them to fill the inside or surface of an object or shape. Although some proposed techniques for accomplishing this involved unimodal interactions, others proposed multimodal interaction techniques. P2 proposed a multimodal interaction technique: pointing at an existing object and then using speech to fill it with color or texture. To fill in a tree, for instance, P2 described,
“Pointing at it, telling it [...], ‘Fill this tree up with green.’ ” This approach entailed drawing some kind of outline to indicate the tree, which P2 said could possibly mean drawing the wireframe for the object. It was not clear whether P2 meant creating a wireframe mesh, as is found in 3D modeling, or simply drawing an outline of the object and then specifying that it should be filled. Filling shapes/objects was also proposed by P10 to be accomplished through the coordinated use of the controller, a pen, and gesture. This interaction technique was focused primarily on texture and would involve selecting the drawn outline of a shape/object with the controller and then using the gesture and pen in undefined ways to fill the object with a desired texture.
Generating. During drawing tasks, participants wanted to be able to generate objects and shapes in OpenBrush. As described previously, some of the proposed interaction techniques for this desired feature only involved unimodal interactions. Other proposed interaction techniques for generating shapes and objects involved multiple modalities working in tandem. This sometimes involved a combination of full-sentence speech and pointing. When asked if an alternative interaction technique could help create the ground, P2 wanted to “Point at, like say, two points [...] and say, ‘Make a square.’ ” P2 further elaborated this proposed interaction technique as pointing to two separate points, such as the opposite corners of a square, followed by the verbal command to make a square; the system would then use those two points as a reference and create a square. Meanwhile, for 3D objects such as cylinders, P2 said that pointing at two points could specify the top and bottom of the object. Further details defining the dimensions of the shapes and objects were not provided by P2. P2 also proposed generating more complex objects at a specified location by pointing and simply saying to generate them. One example given was to “...point at, like, a certain point within, like, the bark of the tree and [...] tell it to sprout a branch.” Alternatively, P13 proposed using a combination of controller, gesture, and full-sentence speech to generate shapes. This interaction technique would use speech to say, as P13 described, “Make me a circle,” and then gesture could be used to specify where to place the shape/object while the controller would be used to control other attributes of the shape/object, such as the size.
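P2’s two-point square command implies a small geometric construction: given two opposite corners and the plane they lie in, the remaining corners follow from rotating the half-diagonal 90 degrees about the plane normal. The sketch below illustrates this under the assumption that both pointed locations lie in the given plane; square_from_corners is a hypothetical helper, not part of OpenBrush.

```python
import numpy as np

def square_from_corners(p1, p2, plane_normal=(0.0, 1.0, 0.0)):
    """Build the four corners of a square from two opposite corners,
    assuming both points lie in the plane with the given normal
    (the default is a horizontal 'ground' plane)."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    n = np.asarray(plane_normal, dtype=float)
    n = n / np.linalg.norm(n)
    center = (p1 + p2) / 2.0
    half_diag = (p2 - p1) / 2.0
    # The second diagonal is the first rotated 90 degrees about the normal.
    other_half = np.cross(n, half_diag)
    return np.array([center + half_diag, center + other_half,
                     center - half_diag, center - other_half])
```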
Because many of the proposed multimodal interaction techniques involved speech commands, implementing these interactions would require accurate speech recognition that can also incorporate the context provided by the other input modalities. For instance, when pointing at an object and using speech to fill it with color, the system would need to recognize which object is being pointed at and connect that to the spoken instructions. Because some aspects of the proposed interaction techniques were only vaguely described by participants, future work would also involve identifying what kinds of gesture, controller, or pen actions would be necessary to make these multimodal interaction techniques effective and satisfying for users.
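A minimal sketch of such a fusion step is given below: a toy keyword parser for a spoken “fill” command is combined with whatever object the pointing ray currently hits. The parser, the handle_utterance function, and the scene.fill call are hypothetical placeholders for a real speech-understanding pipeline and scene API.

```python
import re

COLORS = {"green", "red", "blue", "brown"}

def parse_fill_command(utterance):
    """Return the requested fill color if the utterance is a fill command."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    if "fill" in words:
        requested = words & COLORS
        if requested:
            return requested.pop()
    return None

def handle_utterance(utterance, pointed_object, scene):
    """Fuse the deictic target (what the pointing ray hits) with speech."""
    color = parse_fill_command(utterance)
    if color is not None and pointed_object is not None:
        scene.fill(pointed_object, color)   # hypothetical scene API
        return True
    return False

# Usage sketch: handle_utterance("Fill this tree up with green", tree_object, scene)
# would fill the pointed-at tree with green, as P2 described.
```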