DOI: 10.1145/3544548.3580981
Research Article · Open Access

Visual StoryCoder: A Multimodal Programming Environment for Children’s Creation of Stories

Published: 19 April 2023

Abstract

Computational thinking (CT) education reaches only a fraction of young children, in part because CT learning tools often require expensive hardware or fluent literacy. Block-based programming environments address these challenges through symbolic graphical interfaces, but users often need instructor support to advance. Alternatively, voice-based tools provide direct instruction on CT concepts but can present memory and navigation challenges to users. In this work, we present Visual StoryCoder, a multimodal tablet application that combines the strengths of each of these approaches to overcome their respective weaknesses. Visual StoryCoder introduces children ages 5–8 to CT through creative storytelling, offers direct instruction via a pedagogical voice agent, and eases use through a block-like graphical interface. In a between-subjects evaluation comparing Visual StoryCoder to a leading block-based programming app for this age group (N = 24), we show that Visual StoryCoder is more understandable to independent learners, leads to higher-quality code after app familiarization, and encourages personally meaningful projects.

1 Introduction

In recent decades we have seen a growing push for computer science education for all children. This early exposure to computer science, especially through creative and personally meaningful applications, can increase later interest in the field among female and racially marginalized students [22, 23], contribute to the development of computational thinking (CT) and computational literacy [3], and build lifelong skills and readiness for a child’s educational career [27]. However, despite the benefits of such early exposure and evidence suggesting children can cognitively engage with CT practices [13], the available infrastructure for teaching computing to early elementary age (K–2) children does not meet all of our students’ needs.
In particular, children often lack the advanced literacy, numeracy, and fine motor skills needed to use most existing programming environments or CT tools [12]. Existing solutions for teaching computing to children are often unable to remove this literacy threshold [46] or require expensive specialized technologies that are unaffordable for many school districts or families and not readily available to all students [29, 54].
To circumvent these challenges, a number of researchers and educational technology developers have implemented symbolic block-based programming solutions. These block-based languages for young children replace text with icons to remove the need for literacy [20, 30] and disguise the underlying programming syntax using blocks that fit together only when syntactically correct [21, 46]. This paradigm allows children to piece visual—and sometimes tangible [29]—blocks together to create programs that operate in the visuospatial domain, typically controlling the appearance or motion of sprites.
Critically, though, these symbolic block-based programming languages present content in a manner which may not enforce best practices in coding, leading to needlessly complex, messy, and difficult-to-debug code [38, 53]. As they stand, these languages often require a more experienced teacher to encourage best practices or to guide the student toward building more complex programs [38]. Therefore, children can have trouble making meaningful progress when learning with these paradigms without classroom support [18, 38, 46].
To provide direct pedagogy in an at-home learning context, researchers have recently introduced voice-based programming education paradigms, which bypass literacy challenges by replacing text with speech interaction and use a voice-based agent to deliver instruction [14]. By combining CT and storytelling, these voice-based approaches also teach CT in a manner that simultaneously supports reading readiness, thereby tackling literacy limitations directly [14]. These systems deliver both computing and storytelling pedagogy to learners without requiring a teacher’s presence [14, 64, 67] and can successfully introduce key computing concepts to children using a language the child already speaks [14, 43]. Furthermore, by expanding beyond sprite-driven storytelling in the visuo-spatial domain, this approach may broaden the scope of stories children feel empowered to tell with their projects.
Voice-only interfaces do, however, have inherent navigation challenges and high memory demands [14]. A child must remember everything they have done up to a given point in order to decide what to do next, and navigation-based interactions can be a great deal more tedious via voice than they are in a graphical interface. These limitations to voice-only interfaces can hamper the complexity of the content that the system can deliver and of the projects that students can produce.
With Visual StoryCoder, we present a block-like graphical story creation environment with voice-based input and pedagogy, blending the strengths of block-based and voice-based computing education tools to overcome their respective weaknesses. The voice interface can deliver just-in-time instruction without requiring an instructor—thereby scaffolding best practices—and can enable the creation of complex, abstract, plot-driven stories, while the block-like graphical interface simplifies navigation tasks and provides visual cues to aid memory. This system supports children’s creation of a diverse range of personally meaningful projects, encourages computational practices like abstraction and modularization through scaffolding, and introduces key computing concepts like sequences, loops, events, and variables, all while allowing for independent at-home use.
In this paper, we describe the design and development of the Visual StoryCoder system and present the results of a between-subjects study comparing Visual StoryCoder to ScratchJr, a leading block-based programming app for early-elementary-age children. Overall, we show how Visual StoryCoder’s multimodal design supports independent usage by minimizing the need for experimenter instruction, yields higher-quality code after app familiarization, and supports the creation of a broad range of personally meaningful projects using guided open-ended planning.
Specifically, with Visual StoryCoder, we present the following contributions:
(1)
We show how the strengths of voice-based computing education can facilitate independent at-home usage of a block-like interface by providing just-in-time guidance and instruction to develop children’s system understanding.
(2)
We demonstrate how memory cues and navigational support from block-like graphics combined with voice-based content pedagogy jointly support the creation of high-quality code.
(3)
We illustrate how the strengths of each respective modality—open-ended voice input and user-created graphical support—combine to encourage the creation of personally meaningful projects.

2 Related Work

Here we describe related work on computational thinking concepts and practices, block-based programming languages for children’s coding education, educational voice interfaces for preliterate children, and systems that simultaneously support computing and storytelling education.

2.1 Computational Thinking Concepts and Practices

Computational thinking (CT)—a term first coined by Seymour Papert [44]—has been central to K–12 computing education efforts since Jeannette Wing published her influential article in 2006 [60]. Despite this focus, little consensus exists as to the exact makeup of this composite skill set. The literature does generally agree on the importance of certain computing practices, including decomposition and abstraction, and to a lesser degree on computing concepts like parallelism and conditional reasoning [1, 5, 6, 24, 57, 60, 61]. However, the majority of these CT definitions focus on older students.
Brennan and Resnick’s [5] CT framework, though, specifically targets elementary age learners. This framework emerged from studies and observations of Scratch [46] users and identifies specific computing concepts (e.g., sequences and loops), practices (e.g., abstracting and modularizing), and perspectives (e.g., computing for self-expression) that are critical and appropriate for elementary school computing education.
In parallel to the development of academic CT definitions, national efforts in the United States have contributed to the creation of curricular frameworks and objectives for elementary computer science education [7, 8]. Additionally, research into code quality assessment has produced metrics and methods for evaluating children’s programs and understanding [33, 41]. Examining the intersection between these CT definitions, curricular frameworks, and assessment mechanisms, we identify a set of concepts (sequences, loops, events, and data/variables), practices (abstraction, planning, and decomposition/modularization), and perspectives (computing for self-expression) on which we focus our work.

2.2 Block-Based Programming Languages

Block-based programming languages (BBPLs) are undoubtedly the dominant paradigm for introducing computational thinking in K–12 computing education. These languages use puzzle-piece-shaped function blocks that fit together only when syntactically correct, allowing a user to focus on semantics without worrying about syntax [56].
While the earliest BBPLs come from software engineering and structure-oriented editors [39, 40], their use in education dates to the 1990s [2, 9]. Today, the most prominent of these BBPLs—Scratch [36, 46] and Blockly [21]—rely on written text and are therefore inaccessible to most K–2 students who are still learning to read.
To circumvent these literacy challenges and reach younger users, researchers and developers have created text-free symbolic BBPLs, such as ScratchJr [20]. These tools replace written text with pictographic symbols representing each block’s functionality [20, 29, 30], thereby removing the need to know how to read and write and allowing younger children to learn to code. Often these symbolic BBPLs use digital blocks to control the movement and state of on-screen characters, called sprites [20, 30], but variations of these languages also include tangible blocks to control digital characters [29], digital blocks to control real-world hardware characters (i.e., robots) [51], or entirely tangible experiences [52, 54].
BBPLs are successful because they are approachable to new coders, especially in comparison to traditional programming languages. Many have been designed to follow the principles of constructionism, and are therefore intended to support learners in exploratory and discovery-based learning [44, 46]. The graphical interfaces are reminiscent of tangible jigsaw puzzles, creating a sense of familiarity from the start, and the inability to introduce syntax errors helps learners experience successes without toiling over frustrating error messages, thereby reducing cognitive load [59].
Yet these BBPLs also receive criticism because they can often present challenges to learners in the absence of educator support [25, 26, 35, 38]. To begin with, learners and educators may not see block-based programming as an authentic coding practice when compared to text-based coding [16, 58]. However, when targeting younger, preliterate learners, this concern about authenticity diminishes. More importantly, this body of work has also demonstrated that learners using BBPLs may not develop accepted coding habits or grasp the computational concepts that underlie their projects [25, 38]. Rather than forming a top-down plan, these students may follow a bottom-up development process that begins from individual blocks, which can lead to fine-grained, non-reusable code [38]. Even middle school students who completed an introductory programming course using a BBPL still demonstrated confusion and misconceptions around variables, loops, and conditional logic on a post-course assessment [25]. Critically, learners who do succeed when using a BBPL typically have additional support from educators [26, 37, 48, 49]. Structure, scaffolding, and direct instruction around relevant computing concepts—particularly when provided in the moment as learners are engaging with that idea—can aid in learning transfer [49], improve student grades and retention [37], and increase learning outcomes [26, 37].
There is no doubting the impact and influence that Scratch, ScratchJr, and other block-based languages have had as the dominant syntactical paradigm in K–12 computing education. However, computing still does not reach all learners, and not all students have access to critical educator support, suggesting that there is still room for exploration, growth, and new ideas in this space.

2.3 Educational Voice Interfaces for Children

Unlike BBPLs, voice interfaces provide direct instruction through a conversational agent, while still dismantling literacy barriers by removing written text. Several recent systems for young learners that feature conversational agents focus on storytelling or other narrative-based activities [14, 62, 65, 66, 67]. For instance, Elinor is an interactive conversational AI embedded in children’s narrative science programming that contributes to higher scores on assessments of program content [62], StoryBuddy allows parents to collaborate with voice-based AI to create story experiences for their children with interactive question answering [67], and StoryDrawer takes children’s voice input from storytelling and uses it as input for an AI to generate drawings for those stories [65].
These systems show that children can learn from their interactions with instructional voice-based AI [14, 63]. However, voice-based interfaces also present cognitive load and navigation challenges that can make it difficult for children to create complex projects or to move around within the system itself [14, 55].

2.4 Technologies That Simultaneously Support Computing and Storytelling

Storytelling is a common domain for early computing education because of the underlying structure of stories and the potential for stories to foster engagement through creativity [34]. In particular, several graphical computing education tools for older students support visual storytelling, especially via animations [10, 19, 31, 46], while other tools support programmable characters [47], storytellers [17], and listeners [4].
CyberPLACE takes storytelling into the physical world while introducing CT practices to upper-elementary age children (8–12 years old). With this tool, users can physically recreate a story’s setting using electronic modules [50]. Other tangible programming tools using storytelling as a domain aim to improve accessibility in computing education for older visually impaired users [32].
Notably, one voice-based system, StoryCoder, leverages storytelling as an approachable domain in which to introduce key computing concepts to preliterate students [14]. This system demonstrates that children can learn about key computing concepts via direct built-in voice instruction. However, child users of StoryCoder still struggle with memory demands, inherent system constraints that limit the complexity of their projects, and navigation challenges that make it difficult to edit stories once they have been created [14]. StoryCoder addresses these problems by guiding children through fixed user flows and presenting minimal lists of potential response options, but these solutions also prevent children from going back to edit their story and limit what those stories could be about.
Returning to our objective for this paper of merging the strengths of block-based and voice-based computing education, we see that the oral traditions of storytelling align well with a voice-based interface, and that storytelling metaphors from early literacy education can provide structure to scaffold well-planned and well-formed programs.

3 System Design Objectives

While there is abundant literature on block-based computing education, voice-guided approaches are still quite new to this space. We therefore began from the StoryCoder [14] system to identify design goals for this voice-guided multimodal platform.

3.1 User Needs and Curricular Objectives

StoryCoder was informed by a needfinding investigation that highlighted the importance of accessible, approachable, and engaging computing education. We consider those same needs as a baseline to guide development: we support access by designing a system to run on general-purpose hardware with no requirement for literacy; we support approachability by leveraging storytelling as a familiar domain by which to teach computing concepts; and we support engagement by allowing children to creatively tell stories that relate to their own interests.
Similarly, from a curricular standpoint, we identified computational thinking education objectives by cross-referencing computational thinking definitions [5] and curricular frameworks [7] specifically designed for the target age range (see Section 2.1). We therefore target the concepts of sequences, loops, events, and variables and the practices of decomposition, abstraction, and planning. However, while StoryCoder aimed to directly teach these computing concepts while scaffolding these practices, in this system, we aim to more directly engage children with decomposition and abstraction.

3.2 Qualitative Analysis of StoryCoder Session Data

Notably, a voice-only system like StoryCoder presented cognitive load and navigation challenges to young users because it lacked visual cues onto which to offload information [11, 55]. To guide our own system’s design, we first conducted a qualitative analysis of session data from the original StoryCoder user study to identify the specific challenges users faced. To conduct this analysis, we first transcribed all sections of the user study sessions in which children were engaging with the StoryCoder app. This process yielded 44 transcripts (22 children, each using the app in 2 sessions) of interactions between the child, the experimenter, and the app itself.
Following the structure presented by Myers et al. in analyzing voice interface usage [42], we identified categories of obstacles users faced and the types of tactics they employed to overcome them. We also noted specific features that children requested so we could support those requests in this new system. Two coders collaboratively developed a codebook emergent from the data itself, and then iteratively coded the data while updating this codebook until reaching saturation. Three distinct design objectives emerged from this coding process.

3.3 Design Objectives

In identifying design objectives for our multimodal Visual StoryCoder system, we aimed to resolve the most common obstacles children faced when using StoryCoder, while more actively engaging them with computing practices. Our goal was to combine the strengths of voice-based and block-based programming environments to overcome their respective weaknesses, building a system that provides direct instruction to learners to support system understanding and presents visual cues and structural scaffolds to facilitate the creation of high-quality code and personally meaningful projects.

3.3.1 System Understanding.

Our analysis of StoryCoder showed that children using the system would commonly forget the instructions, skip the instructions, or express confusion about those instructions, which resulted in instances of parent or experimenter help, typically at least once per user flow [14]. Given that not all children have access to computing education at school and not all parents have the bandwidth to co-play with their child at home, one objective we had for this system was for it to support usage without any parent or teacher support. To achieve this goal in Visual StoryCoder, a refined voice-based agent provides direct, just-in-time guidance to develop children’s system understanding. Critically, we provide this support for independent usage without including any written text in the app to ensure that even pre- and early readers can succeed.

3.3.2 Code Quality.

Beyond system understanding, we had the additional objective of supporting the creation of complex, high-quality code projects. Within a voice-only interface like StoryCoder, memory and navigation challenges limited the complexity of a project. StoryCoder enforced fixed user flows and asked children to structure their story to fit a simple, pre-existing template. In doing so, it introduced computing concepts (sequences, loops, events, and variables) without directly engaging children with computing practices (decomposition and abstraction). With our multimodal approach, we provide both visual and auditory scaffolding to actively engage learners with these computing practices, encouraging them to create high-quality, structured programs.
We achieve this objective by building further on the metaphors of early literacy education. As in StoryCoder, a learner begins by completing a structured planning process to determine the high-level components of their story (i.e., star, location, and problem). However, in Visual StoryCoder they also engage with abstraction and decomposition by modeling their program as a story, breaking that story apart into chapters, and breaking chapters into pages. They must sequence their pages within a chapter so that the chapter makes sense and order their chapters within the story in the same manner. Then, to utilize computing concepts, they can augment completed stories to play looping background music in parallel with chapters, select words as events in their stories to trigger sound effects, and create a variable out of the star, location, or problem to change that component out for something new.

3.3.3 Personally Meaningful Projects.

Finally, we aimed to drive engagement by supporting children in creating personally meaningful projects [28]. In particular, the StoryCoder system asked children to select components within their story (e.g., location or problem) from a set of pre-determined options. Additionally, when children told their stories, the voice-only system performed speech-to-text on their input without saving the raw audio. This design led to a number of input-error obstacles as children attempted to create stories about components that were not included in the provided response options or as the system transcribed their stories incorrectly. We identified several corresponding feature requests to a) make story planning more open-ended and b) keep the story the way the child had told it.
Figure 1: The primary story creation screen: A) Flow the Fish, B) chapter block, C) page block, D) defining chapters with emojis, E) adding chapters to the story, F) story playback, G) story augmentation with music loops in the loops menu, H) story augmentation with word events in the events menu, and I) story augmentation with story component variables in the variables menu, J) music loop block attached to a story chapter.
Therefore, in Visual StoryCoder, although we provide structure to the story planning process, we do not provide constraints as to what the story can be about. Rather than providing a list of options for the main character, location, or problem, open-ended voice input means that the child can choose to tell a story about anything they can think to say. The system provides examples to spur ideas, but does not limit children to those examples. Then children draw their own graphical representations of these inputs, which serve as both memory cues and visual support. When it comes to telling a story, we similarly preserve the child’s speech input for later playback, rather than replaying audio generated from a transcript. In this manner, we ensure children can create projects about the content and topics that they find most motivating and have their own voices reflected in those projects.
Figure 2: The story page creation screen: A) microphone input, B) page picture drawing space, C) story component stamps, D) toggle for drawing mode. The story page planning screen is the same, less the story component stamps.

4 System Design and Development

With these objectives guiding our system design, we have built Visual StoryCoder, a multimodal tablet application that introduces computing concepts and practices to preliterate children (ages 5–8) through story creation.

4.1 A Voice-Based Agent to Build System Understanding

The Visual StoryCoder app is entirely guided by Flow the Fish, an instructional voice agent with a visual representation in the top left corner of the screen (see Figure 1.A). On first-time app usage, there is a brief (2 minute) introductory segment in which Flow the Fish appears on-screen and shows the child how to listen to and repeat instructions, how to provide voice input and stop voice recordings, and how to get to the home screen. From then on, Flow automatically and appropriately replies to the child’s voice and tap inputs to provide just-in-time instructions on what comes next. This way, rather than needing to pay attention to and remember the full contents of an upfront tutorial, children get the relevant information exactly when they need it and in the context of their own project.
Critically, Flow provides two kinds of voice-based instruction. First, the agent explains to the child how to use the app. For example, when a child opens the app, this agent instructs the child to press the plus button to create a new project. Second, the agent provides directions and suggestions for structuring the story and code. As described in the next sections, Flow guides the child to create a story with a main character, a location, and a problem, and to decompose that story into separate chapters for the beginning, middle, and end. Similarly, the voice guidance explains key computing concepts (i.e., loops, events, and variables) and how to use those concepts in the context of the app. A full transcript of an interaction between this agent and a child user is included in the Supplementary Materials.

4.2 User Flow for High-Quality, Personally Meaningful Projects

The user flow consists of a structured story planning segment that encourages the development of personally meaningful projects, a story creation segment that emphasizes decomposition, abstraction, and sequencing, and a story augmentation segment that introduces three key computing concepts: loops, events, and variables.

4.2.1 Structured Story Planning.

Children begin by planning their story. With guidance from the app, they select the main character, location, and problem of their story. The app offers suggestions for what each of these might be, but provides no limitations or constraints on what a child can choose. With this design, children can receive inspiration if they feel stuck but ultimately have the freedom and flexibility to tell a story about whatever feels meaningful to them. For each of these story components, users provide voice input describing their choice and then draw a picture of the component they selected. For example, a child might say that the main character of the story is a horse and then they would draw a picture of that horse. These story component drawings later become stamps that children can use in the pictures in their story.

4.2.2 Story Creation with Decomposition and Abstraction.

In Visual StoryCoder, the storytelling metaphor presents an abstraction upon which children model their programs, a familiar framework by which to decompose it into parts, and an implicit need for logical sequencing to ensure the story makes sense.
Children create their story using chapter blocks and page blocks, which are visually designed to look like loop blocks and move blocks from Scratch or ScratchJr. They can drag a chapter block into their workspace and add as many page blocks as they would like to that chapter (see Figure 1.B and 1.C). Once a child has added a page block, they can tap on it to access an interface to voice record that page of the story and to create a picture to accompany that page (see Figure 2). These user-generated page pictures are a combination of drawings and dragged-and-dropped stamps of the story component pictures children drew during planning. For example, a child might drag their location story component stamp onto their picture and increase its size to fit the full frame as a background, but they could then draw on top of that stamp to further add to the picture.
When the child has completed their chapter they can select that chapter’s emoji identifier; tapping the emoji icon in the top corner of a chapter block pulls up an emoji keyboard that is used for selection (see Figure 1.D). Then, they can use the selected emoji to add that chapter to their story—the high-level program—in the story bar at the bottom (see Figure 1.E). When the child hits play on their story, it will play each of the added chapters in sequence. Each chapter in turn sequentially plays the audio of the child’s voice on each contained page while displaying the accompanying picture.
Within this design, the user-selected and user-drawn symbols (i.e., chapter emojis and page pictures), in conjunction with the familiar storytelling paradigm, enable decomposition and abstraction using a small number of blocks. The storytelling metaphor allows us to present each block as a general-purpose “function” that accepts user-defined parameters. These functions allow child programmers to decompose, abstract, and sequence their code while still remembering what everything in their program does. That is, a story is a high-level function that calls chapters via user-defined emojis, chapters are functions that call pages, and all page blocks accept spoken and drawn parameters that simultaneously give that page content and remind the user what that page says.
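To make this function-like structure concrete, the sketch below models a story as nested, parameterized units in Swift, the language the app was built in. It is a minimal illustration only; the type and property names (StoryPage, audioURL, and so on) are our assumptions, not the app’s actual code.

```swift
import Foundation

/// A page is a "function" whose parameters are the child's spoken audio and drawing.
struct StoryPage {
    let audioURL: URL      // the child's recorded narration for this page
    let drawingURL: URL    // the child's picture for this page
}

/// A chapter is a function, identified by a user-chosen emoji, that calls its pages in order.
struct StoryChapter {
    let emoji: String      // e.g., "🐴", selected from the emoji keyboard
    var pages: [StoryPage]
}

/// The story is the high-level program that calls chapters via their emojis.
struct Story {
    var chapters: [StoryChapter]

    /// Pressing play flattens the structure: each chapter plays its pages in sequence.
    func playbackOrder() -> [StoryPage] {
        chapters.flatMap { $0.pages }
    }
}
```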

4.2.3 Story Augmentation with Loops.

Once a child has created their story, they can then learn about loops by selecting background music. In the app, loop blocks each represent a short music clip. These loop blocks can slot onto the side of a chapter block and when the story plays, the music clip will play on a loop in the background for the duration of the chapter (see Figures 1.G and 1.J).

4.2.4 Story Augmentation with Events.

Children can create sound effects that trigger when selected word “events” occur during story playback. Using the events menu, children first select a single word from their story using voice input (see Figure 3.A). Although Flow the Fish verbally provides the child with word options in case they cannot think of one, the child can choose any word that appears in their project. Once the child has chosen a word, they then use the next voice input field to record a sound effect (see Figure 3.B). This can be any sound they’d like; some children use their voice and some record noises from their environment (e.g., fingers tapping on a table may be a sound effect for the word “typing”). Then, when the child replays the story, the app triggers the corresponding sound effect any time a selected word occurs.
Figure 3: The events menu. A) Children record the event trigger word in this left microphone input box. B) Children record the corresponding sound effect in the right microphone input box.

4.2.5 Story Augmentation with Variables.

In the app, children learn that a variable is a stand-in whose value can be swapped out for another value of the same type. During the story planning process, they selected a main character, a location, and a problem for their story, and then drew pictures of each of these components. When creating their story, they could drag and drop these drawings to use as stamps in their story’s pictures. In the variables menu, the child can pick a new main character, location, and/or problem for their story and then draw a new picture of that component. When the child then replays their story, every stamp of that component used in the story’s pictures is replaced with the newly drawn image (see Figure 4).
Figure 4: An original page drawing on the left, and the variables-augmented drawing on the right in which the user changed the star of the story from a cat to a dog.
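As an illustration of how this substitution can work, the hedged Swift sketch below resolves each stamp’s image at playback time from the current set of component drawings. The types and names here are our own assumptions for illustration, not the app’s implementation.

```swift
import UIKit

// Illustrative sketch of the "variable" substitution described above;
// names (StoryComponent, ComponentStamp) are assumptions, not the app's types.
enum StoryComponent { case star, location, problem }

struct ComponentStamp {
    let component: StoryComponent   // which planning drawing this stamp came from
    var frame: CGRect               // where the stamp sits on the page picture
}

/// Resolve a stamp's image at playback time. Because the lookup happens here,
/// redrawing a component in the variables menu changes every page that stamped it.
func image(for stamp: ComponentStamp,
           currentDrawings: [StoryComponent: UIImage]) -> UIImage? {
    currentDrawings[stamp.component]
}
```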

4.3 Implementation

We developed the app in Swift 5 for iOS 14. We maintained a record of the child’s state in the app (e.g., selecting the location during story planning) and used that state to determine what instructions the child received. This state could change based on voice or touch input, and instructions varied based on what the child had already done. In the case of voice input, we checked for a possible state update at the end of every voice recording; in the case of touch input, we used Combine to broadcast messages to the appropriate subscribers for any event that might trigger a state change. We used the Google Cloud speech-to-text library to process voice input into written text and the Google Cloud text-to-speech library to turn text into spoken instructions.
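The sketch below illustrates this state-driven guidance pattern: a single app-state value determines which instruction Flow gives next, and Combine broadcasts the UI events that may change that state. The state cases, class name, and instruction strings are illustrative assumptions, not the app’s actual source.

```swift
import Combine

// A minimal sketch of the described pattern; all names and instruction
// strings here are illustrative assumptions.
enum AppState {
    case home
    case planningStar, planningLocation, planningProblem
    case creatingPages
    case augmentingLoops, augmentingEvents, augmentingVariables
}

final class GuidanceModel {
    private(set) var state: AppState = .home
    private var cancellables = Set<AnyCancellable>()

    init(stateChanges: AnyPublisher<AppState, Never>) {
        // Touch events that might trigger a state change are published by the UI layer;
        // this subscriber updates the state and speaks the matching instruction.
        stateChanges
            .sink { [weak self] newState in
                self?.state = newState
                self?.speak(self?.instruction(for: newState) ?? "")
            }
            .store(in: &cancellables)
    }

    func instruction(for state: AppState) -> String {
        switch state {
        case .home:
            return "Press the plus button to start a new story."
        case .planningStar:
            return "Who is the star of your story? Tell me, then draw a picture of them."
        default:
            return "Tap me if you want to hear the instructions again."
        }
    }

    private func speak(_ text: String) {
        // In the app, instruction text is converted to audio via a cloud
        // text-to-speech service; omitted in this sketch.
    }
}
```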
Any time the user recorded any voice input, we maintained both the recorded audio file and the text transcript. This way we could include spoken audio within instructions or during story playback to preserve the child’s voice, but could use the text to make logical decisions within code. For example, we use both recordings and transcripts to drive the word event sound effect functionality. From the Google Cloud speech-to-text API, we receive a text transcript of the story along with time offsets of every word in that transcript. We then run speech-to-text on the word the child selected in the word event menu to get the text version of that audio input. To identify when the sound effect plays, we find all instances of that transcribed text in the transcript of the story. Then during story replay, we have a separate audio device playing the sound effects that triggers at those identified timestamps.
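A hedged sketch of that matching step is shown below, assuming each transcript word arrives with a time offset; the `TimedWord` type and function name are our own illustrative names rather than the app’s API.

```swift
import Foundation

// Illustrative sketch of the word-event matching described above.
struct TimedWord {
    let text: String
    let startTime: TimeInterval   // offset of the word within the story recording
}

/// Find every time at which the child's chosen event word occurs in the story,
/// so the recorded sound effect can be scheduled at each occurrence during replay.
func soundEffectTimes(storyWords: [TimedWord], eventWord: String) -> [TimeInterval] {
    let target = eventWord.lowercased().trimmingCharacters(in: .whitespacesAndNewlines)
    return storyWords
        .filter { $0.text.lowercased() == target }
        .map { $0.startTime }
}

// Example: if the child chose the word "typing", every occurrence of "typing"
// in the story transcript yields a timestamp at which the recorded effect plays.
```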

4.4 Iterative Testing

To ensure the app was bug-free and usable, our team engaged in staged iterative testing, first with adults and then with children of the target age range. All procedures were approved by our university’s IRB.
We first tested the app with eight adult users. These users were able to find a number of code bugs and identify points of confusion that would certainly be challenges for children. Once adult users were smoothly using the app, we transitioned to testing with children.
We conducted iterative testing with 11 children (ages 5.48–7.99 years, M = 7.15, SD = 0.69) remotely over Zoom. Seven of these children identified as male and four as female. The children were recruited via word of mouth, came from eight different U.S. states, and had no prior instruction in computer science or programming. Sessions lasted one hour, and families received a $25 gift card in exchange for participating. After obtaining informed consent, an experimenter observed the child’s interaction with the app and took notes on any challenges they had or any questions they asked. After every child participant, the research team compiled a list of app changes based on these observations and made the necessary updates before interacting with the next participant. Once several children in a row made it through the full user flow without assistance, we moved forward with the system evaluation described in the next section.

5 System Evaluation

To evaluate the efficacy of combining voice-based pedagogy with a block-like storytelling environment as a means to introduce computational thinking to young learners, we conducted a between-subjects study comparing Visual StoryCoder to ScratchJr, a leading block-based programming app for early childhood computing education. While ScratchJr adheres to constructionism in providing children a playground for independent exploratory learning, our block-like environment builds upon the direct instruction approach of voice-based computing education to impart pedagogy as a child plays. In this evaluation we specifically aim to evaluate how that built-in instruction impacts children’s understanding of both the system and the content.
Therefore, based on our established design objectives, we investigated the capacity of these systems to support usage and system understanding in the absence of an instructor (e.g., in an at-home setting), to yield high-quality programs in that usage context, and to support children in creating a breadth of personally meaningful projects. Specifically, with this evaluation we sought to answer the following research questions (RQs), informed by our design objectives and related to our overarching contributions:
(1)
How do built-in voice-based instruction and experimenter instruction affect the degree of system understanding among at-home users?
(2)
How do built-in voice-based instruction and scaffolding around computing practices affect the quality of code produced by child users?
(3)
How do these systems support the creation of personally meaningful projects to drive user engagement?

5.1 Participants

We conducted this study remotely over Zoom with 24 participants (ages 6.65–8.68 years, M = 7.61, SD = 0.66; 7 female, 17 male; 12 in 1st grade, 12 in 2nd grade). Each child participated in two 60-minute sessions on consecutive days; families received a $25 gift card for each session in exchange for participating. Children came from four counties across two U.S. states, none of the children had any prior programming instruction, and all of them had an iPad at home capable of running our app. All procedures were approved by our university’s IRB.

5.2 Procedure

Children participated in two 60-minute sessions on two consecutive days. On both days they used the same app—either Visual StoryCoder or ScratchJr. App assignments were counterbalanced within each grade level such that six children in each of first and second grade played with each app. The first session was aimed at answering how voice-based guidance and computing-practice-driven design can impact independent first-use play, for example in at-home contexts (RQ1 and RQ2). The second session then evaluated how the quality of children’s programs and the degree of system understanding differ between systems after receiving instruction from the experimenter (RQ1 and RQ2). We consider the contents of the artifacts children produced and their self-reported engagement scores in each session to answer RQ3.

5.2.1 Session 1.

The first session was aimed at understanding a child’s first-play usage in an independent at-home context (i.e., without help from an instructor or parent). In this session, we first obtained informed consent from the family before assisting them in installing either Visual StoryCoder or ScratchJr on their own device.
Figure 5: A child uses Visual StoryCoder during the evaluative user study.
We explained to the child that we were trying to learn how kids would use the app at home if they didn’t have a parent or teacher there to help. We asked the child to do the best they could when they got stuck, but explained that neither the experimenter nor the child’s parent could assist the child during gameplay. If the child was using ScratchJr, we directed them to start by watching the tutorial video; children using Visual StoryCoder automatically began with Flow the Fish’s intro sequence. Then the child could play with the app for 30 minutes or could choose to stop playing earlier (e.g., because of frustration or because they finished their project). If at any point during gameplay the child did look to the experimenter for help, the experimenter could only say: “What does the fish say?” (Visual StoryCoder) or “What does the tutorial video say?” (ScratchJr).
After the child played with the app, we conducted an artifact-based interview, administered a verbal quiz about parts of the system interface, and asked the child to self-report engagement, as described in Section 5.4. At the end of the session, we also asked the family to send the child’s program to the research team via built-in functionality in each app; we later used these sent programs to score code quality.

5.2.2 Session 2.

The second session was aimed at exploring a child’s usage of the app with the added support of instruction, explanations, or help from a human instructor. In this second session the experimenter told the child that they could now work together to play with the app. The experimenter and child first returned to the project the child created in the first session. Starting from the artifact-based interview and the points of confusion the child expressed during session one, the experimenter helped to clarify any misunderstandings and to resolve any bugs remaining in the child’s project. The experimenter did not proceed past this instruction phase until they had explained the key functionality of the app and the child had implemented each aspect of that functionality in their project (for ScratchJr: sequenced motion blocks, start on green flag, “go to page” end blocks, and the functionality of repeat/wait/grow/shrink blocks; for Visual StoryCoder: sequenced page blocks, start story on play button, adding chapters to a story, and the functionality of the loops/events/variables menus). After completing this instruction period, the child then had 30 minutes to create a new project. For this project, the child was told that the experimenter was permitted to answer questions.

5.3 ScratchJr as a Control Condition

Children in the control condition completed the same two-session procedure as those in the experimental condition, except they used ScratchJr instead of Visual StoryCoder. We compare our system to ScratchJr specifically because it is a research-driven block-based programming environment for this age group and was specifically intended to support the design objectives we aimed to achieve.
Figure 6: The ScratchJr app. Code blocks at the bottom form a script for the basketball sprite to follow.
In ScratchJr, children program animated stories by controlling the motion of characters called sprites using any of 27 code blocks (see Figure 6). Children’s stories can span up to four pages, each page can have its own background and sprites, and each sprite can have its own code. This app has a tutorial video, which all participants in this condition watched at the start of session one, but otherwise has little built-in aid for when children need help.

5.4 Metrics and Collected Data

To answer our research questions, we collected quantitative data about the quality of the code students produce, their understanding of the system they used, and their self-reported engagement with the app. We also collected qualitative data via user artifacts and by completing a structured artifact-based interview at the end of each session. For rubric-based metrics, we calculated inter-rater reliability using the weighted Kappa statistic (quadratic weighting) to determine consistency among raters.
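For reference, the quadratically weighted kappa we report takes the standard form below, where O and E are the observed and chance-expected proportions of rater pairs assigning categories i and j, and k is the number of rating categories.

```latex
\kappa_w \;=\; 1 \;-\; \frac{\sum_{i,j} w_{ij}\, O_{ij}}{\sum_{i,j} w_{ij}\, E_{ij}},
\qquad w_{ij} \;=\; \frac{(i - j)^2}{(k - 1)^2}
```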

5.4.1 System Understanding Quiz.

We evaluate children’s understanding of the system and its coding features at the end of each session using a system quiz. In this quiz we show children a screenshot of the app, draw a box around a particular element of the interface, and then ask what that element does in the app. Specifically, we selected the eight key elements in the ScratchJr tutorial video and the eight primary elements the instructional fish describes during gameplay in Visual StoryCoder. These are the same elements the experimenter reviewed with the child during the session two instruction period. We use this metric to consider the effect of built-in instruction on children’s understanding of the system (RQ1) and calculated an inter-rater reliability of κ = 0.87 (95% CI, 0.84 to 0.90).

5.4.2 Code Quality Rubric Score.

Kyza and colleagues [33] adapted the Dr. Scratch [41] rubric—which is commonly used to analyze the quality of Scratch programs—for use with ScratchJr. We in turn created a direct-mapping adaptation of this ScratchJr rubric for use with Visual StoryCoder, in which we replaced ScratchJr elements with their corresponding Visual StoryCoder elements (e.g., Visual StoryCoder pages map to ScratchJr scripts; see Supplementary Materials for the full rubrics). For each application, this rubric quantifies children’s usage of decomposition and abstraction, parallelism, logical thinking/synchronization, flow control, user interactivity, and data representation based on the presence of specific blocks within the project. Each of these six items is scored on a scale from zero to two, where zero means that practice or concept is not represented in the project, one suggests a basic understanding of that practice/concept, and two signifies a developing understanding. Higher code quality scores therefore indicate that children are using and learning about CT when creating their projects. We use these scores to evaluate both the impact of the system and of instruction on code quality (RQ2). Inter-rater reliability on these rubric scores was κ = 0.91 (95% CI, 0.91 to 0.91).
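A small sketch of how these per-item levels roll up into the total score we report is shown below; the item names follow the rubric described above, while the Swift types are our own illustration rather than part of either rubric.

```swift
// Illustrative tally of the six rubric items, each scored 0 (absent),
// 1 (basic understanding), or 2 (developing understanding).
enum RubricLevel: Int { case absent = 0, basic = 1, developing = 2 }

struct CodeQualityScore {
    var abstraction, parallelism, logicSynchronization,
        flowControl, interactivity, data: RubricLevel

    var total: Int {
        [abstraction, parallelism, logicSynchronization,
         flowControl, interactivity, data]
            .map(\.rawValue)
            .reduce(0, +)
    }
}
```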

5.4.3 Artifact-based Interview.

Artifact-based interviewing can lend nuance to the coarser “there or not there” quantitative system understanding and code quality metrics [5]. Based on prior work [5, 45], we developed an 11-question interview protocol to learn more about participants’ projects from the participants themselves. Through this protocol, we qualitatively assess children’s system understanding and usage of computing practices after session one’s independent use and session two’s instruction (RQ1 and RQ2), and we explore the systems’ capacities to support the creation of personally meaningful projects (RQ3).
Specifically, after each session, we asked children:
(1)
Tell me about your project.
(2)
How did you get the idea for your project?
(3)
What are you most proud of in this project?
(4)
What would you do if you had more time?
(5)
How did you get started making your project?
(6)
Can you tell me about how your project changed as you worked on it?
(7)
What was important for you to know in order to make the project?
(8)
What problems did you run into while making this project? How did you deal with those problems?
(9)
What happened when you got stuck?
(10)
What did you find most confusing while making this project?
(11)
Pretend you were talking to another kid who had never used this app before. What would you say to them to explain how the app works?

5.4.4 Giggle Gauge Engagement Score.

Although many aspects of an app could make that app engaging, we aimed to support personally meaningful projects in a specific effort to drive engagement (RQ3). Therefore to directly measure children’s engagement with each application, we asked participants to complete the Giggle Gauge self-report engagement metric [15]. This 7-item metric has been validated with early elementary-age children and produces an overall engagement score on a scale from one to four.

5.5 Data Analysis

We analyze the collected data to directly address the research questions driving this system evaluation and measure our success in achieving our three design objectives. For quantitative metrics, we treat an individual child’s score across sessions as a within-subjects variable and scores across apps as a between-subjects variable.

5.5.1 RQ1: System Understanding.

We look to children’s system understanding quiz scores as a quantitative metric to evaluate RQ1. In this analysis, we conduct a mixed ANOVA, where a significant effect of session would indicate an (expected) effect of experimenter instruction and increased app familiarization. A significant effect of app would indicate that one app contributed to overall higher system understanding when controlling for session. Finally, a significant interaction effect would indicate that one app fostered system understanding through usage whereas the other did so through experimenter instruction.
Qualitatively, to understand why differences in system understanding might exist and which features children credit for their own understanding, we look to artifact-based interview questions 7–11, related to understanding and usage.

5.5.2 RQ2: Code Quality.

To evaluate the impact of app and instruction on code quality, we analyze the data from children’s code quality rubric scores, again using a mixed ANOVA. In this analysis, a significant result across apps would indicate one app better supported such code quality in these sessions, a significant result across sessions would indicate an impact of experimenter instruction on code quality, and a significant interaction effect would again show how one app developed code quality in first-time usage whereas another leaned on experimenter instruction.
Qualitatively, we look to explain any difference in code quality by examining responses to artifact-based interview questions 5–6 related to process.

5.5.3 RQ3: Personally Meaningful Projects.

To evaluate the systems’ capacities to support the creation of personally meaningful projects, we conducted a brief content analysis of the projects themselves in conjunction with responses to artifact-based interview questions 1–4 related to those projects. We identified personally meaningful projects as those containing elements of importance to the individual (e.g., interests or interpersonal relationships). We then clustered these personally meaningful projects based on their similarities to analyze which app led to the creation of which kinds of projects.
In addition, we look to the Giggle Gauge metric to get a higher-level view of user engagement. On this metric, an average score above 3.6 indicates high engagement, an average score between 3.0 and 3.6 indicates mid-level engagement, and a score below 3.0 indicates low engagement. We consider which engagement-level bucket each app falls into, and we again conduct a mixed ANOVA to evaluate whether there is a quantitative difference in engagement across apps or across sessions.
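These cutoffs amount to a simple bucketing of the average score, sketched below as our own helper (the thresholds follow the Giggle Gauge guidance above; the function itself is not part of the instrument).

```swift
// Illustrative mapping from an average Giggle Gauge score (1–4) to the
// engagement levels used in our analysis.
enum EngagementLevel { case low, mid, high }

func engagementLevel(forAverageScore score: Double) -> EngagementLevel {
    switch score {
    case ..<3.0:     return .low    // below 3.0: low engagement
    case 3.0...3.6:  return .mid    // between 3.0 and 3.6: mid-level engagement
    default:         return .high   // above 3.6: high engagement
    }
}
```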

6 Results

We report findings across both sessions for all 24 children who participated in this study.
                                    Session 1            Session 2
                                    VS       SJ          VS       SJ
VS Chapter / SJ Pages               1.42     0.83        1.42     1.25
VS Pages / SJ Sprites               1.33     0.83        1.58     1.33
VS Story Chapter / SJ Green Flag    1.25     1.00        1.33     1.75
VS Play Button / SJ Shrink          1.42     1.00        1.83     1.75
VS Loop Block / SJ Wait             1.33     0.00        1.67     1.92
VS When / SJ Move                   0.75     2.00        1.25     2.00
VS Then / SJ Repeat                 1.17     1.25        1.58     2.00
VS Variables / SJ Go To Page        0.75     0.83        1.08     1.50
Total Score                         9.42     7.75        11.75    13.50
Table 1: Average system understanding score for each assessment item, split by app and session. VS stands for Visual StoryCoder and SJ stands for ScratchJr. The maximum score for each item is 2.

6.1 RQ1: System Understanding

A mixed ANOVA shows an unsurprising significant effect of session on system understanding. Specifically, we see an increase in system understanding in the second session, after children have had an opportunity to familiarize themselves with the app, received instruction, and had a chance to ask questions, F(1, 22) = 76.63, p < 0.001 (see Table 1). We do not, however, find a main effect of which app the child used on system understanding, F(1, 22) = 0.00.
Critically, we do find a significant interaction effect, F(1, 22) = 11.89, p < 0.01 (see Figure 7). Children using ScratchJr had a much larger increase in understanding after receiving experimenter instruction in session two, whereas the change in system understanding between sessions was much smaller for Visual StoryCoder. We interpret this as support for the ability of built-in instruction to teach the child about the system in Visual StoryCoder, whereas ScratchJr benefits substantially from external educator support.
Figure 7: Children score higher on the system understanding quiz in the second session, but an interaction effect shows this session two increase is much greater for children using ScratchJr than those using Visual StoryCoder. Error bars show 95% confidence intervals.
Qualitatively, during the artifact-based interview at the end of each session, participants spoke about how built-in instruction was important to them in figuring out how to use the Visual StoryCoder app. For instance, when asked what they did when they got stuck, P1, an 8-year-old using Visual StoryCoder, said:
“[It was important for me to know] to click on the fish when I didn’t know what to do.” –P1
In the app, tapping on the fish would make it repeat its instructions.
On the other hand, children using ScratchJr talked about being confused, tapping around randomly, or ignoring parts they did not understand, which also aligned with experimenter observations of their behavior. Another 8-year-old, P12, this time in the ScratchJr condition, said:
“When I was confused with something, I kind of just ignored it and did something different.” –P12
Based on our qualitative data, we believe built-in instruction was critical to system understanding in Visual StoryCoder, whereas in ScratchJr we only see a large increase in system understanding scores after external instruction.

6.2 RQ2: Code Quality

Looking next at code quality across the sessions, we see that average code quality among children using Visual StoryCoder (M = 5.93, SD = 2.55) is higher than among those using ScratchJr (M = 4.38, SD = 2.43), but not to a statistically significant degree when aggregated across sessions, F(1, 22) = 3.51, p = 0.07. We do not see a significant difference in scores across sessions or an interaction effect between session and app.
Looking at the data more granularly (see Table 2), we see that Visual StoryCoder matches or outscores ScratchJr in all areas except flow control, for which ScratchJr users need only use the repeat block in one place in their project to get full credit, whereas Visual StoryCoder users get full credit only by using repeated chapters within their story (i.e., full-chapter repetition within story playback). Conversely, Visual StoryCoder outscores ScratchJr most on parallelism and data, perhaps due to built-in instruction around parallel music loops and changing variables (i.e., data) within the story.
                          Session 1                          Session 2
                          Visual StoryCoder   ScratchJr      Visual StoryCoder   ScratchJr
Abstraction               0.83                0.92           1.08                1.00
Parallelism               1.00                0.75           1.83                1.00
Logic/Synchronization     0.58                0.08           0.75                0.42
Flow Control              0.83                1.33           1.00                1.17
Interactivity             0.92                0.58           1.00                0.92
Data                      1.00                0.58           1.08                0.17
Total Score               5.17                4.25           6.75                4.67
Table 2: Average score for each concept in the code quality rubric, split by app and session. The maximum score for each item is 2, except the maximum user interactivity score for Visual StoryCoder is 1 because no counterpart existed in Visual StoryCoder to the 2-point ScratchJr item. Higher scores indicate greater CT learning.
Figure 8: Children using Visual StoryCoder score higher on code quality in session two. Error bars show 95% confidence intervals.
Critically, we see a great deal of variance in code quality scores in session one, both quantitatively (a variance of 9.09 in session one versus 4.17 in session two) and qualitatively. In particular, in session one some children using ScratchJr did not touch the code blocks at all or dragged them into their projects without ever running the code. On the other hand, multiple children using Visual StoryCoder spent more than half of their time planning their story before ever reaching the creation pages. Consequently, they ran out of time before getting through the full user flow of the app and did not have an opportunity to engage with loops, events, or variables. In both cases, these behaviors reflect a user still acquainting themselves with the system; in the case of ScratchJr, children were figuring out the purpose of code blocks, and in the case of Visual StoryCoder, they budgeted their 30 minutes poorly because they did not know what lay ahead.
As a secondary analysis, we look only at session two, where participants have had an opportunity to familiarize themselves with the app and what it can do. Here, the usage of CT in these projects—and by extension the code quality score—is more reflective of children’s CT learning, as it is less impacted by still-developing system understanding. We find that in this second session students using Visual StoryCoder do create stories with statistically significantly higher code quality than those using ScratchJr, t(21.98) = 2.71, p = 0.01 (see Figure 8).
We attribute some of this difference in code quality to Visual StoryCoder’s scaffolding around planning, a key computing practice. Qualitatively, we find that children using Visual StoryCoder talked about how they needed to plan their story in order to tell it.
When asked how they got started making their project, P15, a 6-year-old using Visual StoryCoder, said:
“I got started by thinking who was in the book and then...I was, like, thinking what they would do...[It was important for me] to have a plan.” –P15
On the other hand, children using ScratchJr often spoke about making up their story as they went along. For instance, P24, a 7-year-old in this condition, said:
“I went to, like, the plus button and I saw the really pretty flower and I wanted it to be part of my story. Then I thought of that. But I technically made it up as I went.” –P24
We view these differences in self-reported planning behavior as further evidence for the way that Visual StoryCoder’s scaffolding directly supports children’s planning processes.

6.3 RQ3: Personally Meaningful Projects

We look to the stories children made and their interview responses to understand how these systems support the creation of personally meaningful projects. In particular, we look at instances in which children created projects about the things they like, the people they know, or issues they care about.
Several children using both apps told stories about their favorite things (Visual StoryCoder: N = 4; ScratchJr: N = 3). This finding includes several Visual StoryCoder users who told stories about their favorite animals (e.g., dragons or dinosaurs) or activities (e.g., basketball). Likewise, in ScratchJr, P7, an 8-year-old, selected a location for her animation based on her favorite season and recolored it according to her favorite colors:
“Well, I do love snow and winter, even though we don’t live where there’s winter, um, and so I thought do that. And I—my favorite colors are pink, purple, um, blue, and green, so I decided to put them on for the trees.” –P7
Similar to telling stories about favorite things, we see children across both apps choosing to tell stories that starred themselves or their family and friends (Visual StoryCoder: N = 3; ScratchJr: N = 3). Often these stories recounted particularly salient or cherished memories. P1, an 8-year-old using Visual StoryCoder, told the story of how he broke his leg, one of the most notable events in his life:
“It was the story when I broke my leg when I was three. And, um, and I don’t really remember it because I was really little...And that was, um, one of the biggest problems [in my life].” –P1
Along the same lines, a 7-year-old using ScratchJr, P24, recreated a trip to the beach with her dad:
“Well, it was about me and my Daddy going to the beach. And—and, so the first part I was in a room and I couldn’t wait to go to the beach. Then we went on a car to get to the beach. Then we were at the beach and then we were gonna go swimming.” –P24
However, we also see that only children using Visual StoryCoder chose to retell their favorite existing stories or create new stories that include their favorite characters (Visual StoryCoder: N = 4; ScratchJr: N = 0). We believe this behavior results from the open-ended character selection design in Visual StoryCoder. Children using this app can pick any character they would like and then draw that character, whereas those using ScratchJr all stuck to the characters available in the sprite selection palette. One 7-year-old using Visual StoryCoder, P16, described how he recreated his favorite story:
“I got the idea for the project by thinking of the story called Treasure Island and when I first started thinking of—when I first read the book.” –P16
Finally, P8, who used Visual StoryCoder, chose to create a project that brought attention to a societal issue he cared about. This 8-year-old's story about a lonely tree was meant to inspire people to care more about the environment and the way all living things depend on one another:
“A lot of people, they don’t care about wildlife or nature, like, at all. Like whenever I’m driving to my grandma and grandpa’s house, I have to drive a ways, so I look out the window and I almost always see a ton of trash. And it’s like no one cares about nature. So the story—so it inspired me to make a way that, when no one’s around to protect wildlife it gets sad and plants start to die. But when people are around to protect wildlife, all the plants won’t die.” –P8
We see that both ScratchJr and Visual StoryCoder support the creation of personally meaningful projects. However, the open-ended character and location selection steps during Visual StoryCoder's planning process, along with its blank drawing canvas (as opposed to image selection from a predetermined list), may have better supported additional types of personally meaningful stories (e.g., new stories about existing favorite characters). Overall, more children using Visual StoryCoder discussed creating a personally meaningful story (Visual StoryCoder: N = 10; ScratchJr: N = 6).
Finally, looking directly at engagement, which personally meaningful projects are intended to support, we find that both systems land in the mid-to-high range of the Giggle Gauge engagement scale [15]. Within the data we collected, there is not a significant difference in engagement across apps (F(1, 22) = 2.02, p = 0.17; Visual StoryCoder: M = 3.38, SD = 0.59; ScratchJr: M = 3.64, SD = 0.45) or sessions (F(1, 22) = 1.03, p = 0.32; Session 1: M = 3.45, SD = 0.57; Session 2: M = 3.57, SD = 0.51). However, it is worth noting that by scoring higher than 3.6, ScratchJr falls into the high engagement range of the Giggle Gauge metric, whereas Visual StoryCoder is in the mid-engagement range [15].

7 Discussion

We began this project by recognizing the strengths (e.g., approachability) and dominance of block-based programming languages, while also wondering how voice-based input might contribute to this idea of a syntax error-free block-like environment. Our evaluation demonstrates that Visual StoryCoder encourages the use of both computing concepts and practices to yield high-quality code and supports the creation of complex and personally meaningful stories. Children’s system understanding improves with educator support, but we found that built-in instruction in Visual StoryCoder decreases the need for that educator support, suggesting it might serve as a replacement in at-home learning contexts where learners may not have access to the same help resources (RQ1). We also found that children’s code quality is higher when using Visual StoryCoder than ScratchJr, and significantly so in the second session, suggesting an impact from built-in voice-based instruction and scaffolding around computing practices (RQ2). Across both conditions, children told personally meaningful stories about themselves or their favorite things, but only those using Visual StoryCoder told stories that included their favorite fictional characters or that included a social justice message. However, this broader range of personally meaningful projects did not ultimately lead to higher overall engagement (RQ3).
In this section we discuss the impact and generalizability of the built-in voice-based instruction in the app, the ability of graphics to resolve challenges in voice interfaces to better support learning, and the way literacy metaphors and user-generated symbols support decomposition and abstraction.

7.1 Impact and Generalizability of Built-In Voice-Based Instruction for System Understanding

Visual StoryCoder provides verbal usage instructions to support system understanding. This instruction offers just-in-time pedagogy to support independent at-home usage, allowing children to successfully create programs and engage with computing practices even in the absence of an instructor.
In Visual StoryCoder, the system usage instructions likely contributed to the interaction effect in system understanding; when receiving instruction on usage from the app itself, children had a better understanding of how the app worked and what different elements did, thereby decreasing the effect of external instruction from an experimenter. At the same time, this approach to usage instruction generalizes to other apps. Many apps with a logical user flow, including ScratchJr, give an upfront tutorial on system usage. Rather than giving all of that information at the start, many systems could instead provide it to the user as needed. We acknowledge that this already occurs with written text in systems designed for older users, but voice-based just-in-time guidance presents an opportunity to take the same approach for pre- and early readers.
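To make the pattern concrete, the sketch below is a minimal, hypothetical illustration of as-needed usage instruction: each screen speaks a short prompt the first time a child reaches it, rather than front-loading a tutorial. The screen names and prompt text are invented for illustration and are not taken from Visual StoryCoder's implementation.

```python
# Hypothetical just-in-time usage instruction: speak each screen's prompt
# only on the child's first visit, instead of an upfront tutorial.
visited = set()

PROMPTS = {  # hypothetical screen names and prompt text
    "planning": "Let's plan your story! Who is it about?",
    "chapters": "Tap a chapter to record what happens in it.",
    "music":    "Pick a music loop to play while your story plays.",
}

def on_screen_opened(screen_name, speak):
    """Speak a screen's instruction only the first time it is opened."""
    if screen_name in PROMPTS and screen_name not in visited:
        visited.add(screen_name)
        speak(PROMPTS[screen_name])

# Example: the second visit to "planning" stays silent.
on_screen_opened("planning", print)
on_screen_opened("planning", print)
```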

7.2 Voice-Guided Planning to Support Code Quality

In addition to instruction on system usage described above, Visual StoryCoder’s voice-based guidance on program structure can provide constraints to scaffold higher-quality code. In particular, the system provides direction on planning story components and then on decomposing that story into chapters. This verbal guidance outlines a structure for the child to follow that ultimately allows for increased complexity and encourages a well-thought-out project.
Critically, such instruction can be generalized across systems as well. For example, ScratchJr could provide similar support to create a multi-part story within the app. Such guidance might encourage a child to select sprites and backgrounds in advance and decide on a plot-line before writing any code. It might further explain the structure of a multi-part story and how those parts might be instantiated as pages within the app.
However, it is important to acknowledge that this program structure support does limit some of the flexibility of the system itself. Specifically, it supports children in creating a program that tells a three-chapter story with a conflict. If children instead want to use the system to write a speech, keep a diary, or recount an event that does not have a conflict (e.g., a family vacation), the supports provided are not as useful.
During the second session of our user study, we did see two children break the bounds of the program structure; they disregarded the instructions in order to create a story that contained more than three chapters. To more explicitly support these behaviors in non-novice users, we also provide children with the option of turning Flow the Fish off, once they understand the structure of programs and how the system works. Although we did not give children this option during the user study, we did show them this setting at the end of our last session when telling them they were permitted to keep the app at home. Some of the children were excited to learn about this option, specifically articulating that they no longer needed the instructions on system usage now that they understood how the app worked.
As an alternative solution, we envision future work exploring adaptive instructions that fade over time. Specifically, we imagine a system that automatically and gradually decreases the amount of instruction provided to the child. However, if a child appears stuck (perhaps signified by a period of time without interaction) or regularly neglects to use certain functionality, the app might still provide appropriate, targeted, in-the-moment instruction.
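The sketch below illustrates one way such fading might work; it is a hypothetical design sketch rather than an existing feature, and the two-repetition limit and 60-second idle threshold are arbitrary placeholders.

```python
# Hypothetical adaptive-fading instruction: prompts are spoken less often
# as the child gains experience, but return if the child seems stuck.
import time

IDLE_THRESHOLD_S = 60  # placeholder "stuck" threshold

class FadingInstructor:
    def __init__(self, speak):
        self.speak = speak
        self.times_heard = {}          # prompt id -> times already given
        self.last_action = time.time()

    def note_activity(self):
        """Call whenever the child interacts with the app."""
        self.last_action = time.time()

    def maybe_instruct(self, prompt_id, text):
        heard = self.times_heard.get(prompt_id, 0)
        idle = time.time() - self.last_action > IDLE_THRESHOLD_S
        # Speak while the prompt is still new, or whenever the child
        # appears stuck, regardless of prior experience.
        if heard < 2 or idle:
            self.times_heard[prompt_id] = heard + 1
            self.speak(text)

agent = FadingInstructor(print)
agent.maybe_instruct("chapters", "Tap a chapter to add what happens next.")
```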

7.3 Graphical Cues and Metaphors to Support Code Quality

Visual StoryCoder presents a block-like graphical interface as a means to circumvent challenges in voice-only interfaces. While a voice interface can be successful in delivering pedagogy, this approach can also present cognitive load and navigation challenges that can prevent users from creating more complex stories [14]. To use a voice-only app, children need to remember everything they have done previously to decide what to do next. Furthermore, if they want to go back or navigate elsewhere in the app, the overhead for doing so is considerably higher via voice than it would be with a single button tap.
Visual StoryCoder’s graphical interface overcomes these navigation challenges by providing tap-input on menus and navigation bars throughout the app. Children can further specify the start and end of voice input via an on-screen microphone button, rather than relying on the app to algorithmically detect such events through an open microphone stream, which can create privacy concerns. This on-screen input also simplifies the process of editing a story (e.g., changing part of the audio or reordering parts)—a tedious task using only voice input—which in turn can simplify and drive iterative design behavior.
Furthermore, Visual StoryCoder leverages metaphors from literacy education in conjunction with graphical support from user-selected and user-drawn symbols (i.e., chapter emojis and page pictures) to encourage decomposition and abstraction within a child's program. In particular, every chapter and page receives its own user-generated identifier, either via a page picture or chapter emoji, which the user can then use to reference that part of the story elsewhere in their project. In doing so, the child can learn to zoom in and out to different levels of abstraction (i.e., working at the granularity of an individual page or from the perspective of the high-level story).
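As a rough illustration of this identifier-based structure, the sketch below models a story as chapters keyed by child-chosen emoji, with pages keyed by child-drawn pictures; the data model, field names, and file names are hypothetical and do not reflect the app's actual implementation.

```python
# Hypothetical story model: chapters named by child-chosen emoji, pages
# named by child-drawn pictures. Playback references chapters by symbol,
# so repeating a chapter is just repeating its identifier.
from dataclasses import dataclass, field

@dataclass
class Page:
    picture: str  # filename of the child's drawing (memory cue)
    audio: str    # recorded narration for this page

@dataclass
class Chapter:
    emoji: str                              # child-chosen identifier
    pages: list = field(default_factory=list)

story = {"🐉": Chapter("🐉"), "🏰": Chapter("🏰"), "🎉": Chapter("🎉")}
story["🐉"].pages.append(Page(picture="dragon_drawing.png", audio="page1.m4a"))

playback = ["🐉", "🏰", "🐉", "🎉"]  # repeating "🐉" replays that chapter
for emoji in playback:
    for page in story[emoji].pages:
        print(f"show {page.picture}, play {page.audio}")
```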
Returning to the world of sprites and symbolic block-based languages, we imagine future work exploring a programming interface that allows participants to define their own functions independent of any sprite and draw their own symbol to represent that function in their code. For example, a child might create a function that makes a sprite travel in a square—up two, right two, down two, left two—and assign that function a hand-drawn square as its symbol. Then in their code, just as they would add a move block, they could add their own move-in-a-square block to call and run that entire function from any sprite in their project.
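The sketch below makes this idea concrete; the Sprite class, function name, and symbol file name are hypothetical stand-ins rather than an existing ScratchJr API.

```python
# Hypothetical child-defined "move-in-a-square" block, labeled by a
# hand-drawn symbol and callable from any sprite.
class Sprite:
    """Minimal stand-in for a sprite with a 2D position."""
    def __init__(self, name):
        self.name, self.x, self.y = name, 0, 0
    def move(self, dx, dy):
        self.x, self.y = self.x + dx, self.y + dy

def move_in_a_square(sprite, steps=2):
    """Child-defined function: up two, right two, down two, left two."""
    sprite.move(0, steps)
    sprite.move(steps, 0)
    sprite.move(0, -steps)
    sprite.move(-steps, 0)

# Binding the function to the child's hand-drawn symbol lets it appear as
# a reusable custom block in the palette (file name is hypothetical).
custom_blocks = {"square_drawing.png": move_in_a_square}

cat = Sprite("cat")
custom_blocks["square_drawing.png"](cat)  # cat ends where it started
```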

7.4 Open-Ended Voice and Graphics for Personally Meaningful Projects

A multimodal approach enables us to design for the strengths of each component interface and to use each approach as a tool to resolve the other’s weaknesses. One of the great benefits of voice-based input is its open-endedness. Rather than selecting from a list of potential characters or locations, open-ended voice-based input allows children to tell stories about anything they can think of. Yet this approach comes with its own challenges; students still need visual cues of the voice-based content to remember what they have already done and to interact with this content on-screen (e.g., for ordering their story).
To address these demands, Visual StoryCoder shows a graphical representation of the pages in children's stories, using the child's own drawings as memory cues. These user-drawn visuals help children remember the content they have created while preserving the open-endedness of voice input: rather than requiring written text or enforcing the use of images from a curated library, we can present visuals to children without placing any constraints on the content of their stories. This multimodal approach empowers children to tell a story about anything or anyone they can think of, and in doing so supports engagement through the creation of more personally meaningful projects, which can in turn better support learning [28].
Evaluating the broader landscape of commercial voice interfaces, we see growing recognition of the challenges in voice-only interaction and a shift toward a multimodal approach. Even many smart speaker lines, initially voice-only offerings, have added screens to their product lineups (e.g., Amazon Echo Show and Google Nest Hub). While voice presents a promising avenue for at-home education in early childhood, our work shows that accompanying graphics can potentially improve these learning outcomes.
Looking ahead, a longitudinal evaluation should investigate the longer-term learning potential of multimodal interfaces for coding education, including Visual StoryCoder. In particular, there are open questions around how this system might support transitions to other block-based programming languages (e.g., Scratch) later in a child's education.

8 Conclusion

In this paper, we presented Visual StoryCoder, a multimodal system that leverages metaphors from literacy education to introduce computational thinking to preliterate children. In particular, our approach blends the strengths of voice-based (i.e., built-in just-in-time instruction and open-ended input) and graphical block-based (i.e., eased navigation and salient memory cues) approaches to overcome their respective weaknesses. We show how just-in-time voice-based instruction can develop users' system understanding, how voice-guided planning and graphical support drive higher-quality code, and how open-ended input and story creation encourage the creation of personally meaningful projects. Specifically, in a between-subjects comparison to ScratchJr, a leading block-based programming app, we find that Visual StoryCoder supports independent use by showing decreased impact of experimenter instruction on system understanding, yields higher-quality code after app familiarization, and encourages children to create personally meaningful projects through open-ended planning and voice-based story creation. Looking ahead, our contributions establish the exciting potential for multimodal creative learning activities to advance early computing education.

Supplementary Material

Supplemental Materials (3544548.3580981-supplemental-materials.zip)
MP4 File (3544548.3580981-talk-video.mp4)
Pre-recorded Video Presentation
MP4 File (3544548.3580981-video-preview.mp4)
Video Preview

References

[1]
Valerie Barr and Chris Stephenson. 2011. Bringing computational thinking to K-12: What is involved and what is the role of the computer science education community? ACM Inroads 2, 1 (2011), 48–54.
[2]
Andrew Begel. 1996. LogoBlocks. https://andrewbegel.com/mit/begel-aup.pdf.
[3]
Marina Umaschi Bers. 2018. Coding as a playground: Programming and computational thinking in the early childhood classroom. Routledge.
[4]
Marina Umaschi Bers and Justine Cassell. 1998. Interactive storytelling systems for children: using technology to explore language and identity. Journal of Interactive Learning Research 9 (1998), 183–215.
[5]
Karen Brennan and Mitchel Resnick. 2012. New frameworks for studying and assessing the development of computational thinking. In Proceedings of the 2012 annual meeting of the American Educational Research Association, Vol. 1. 25.
[6]
Sue Inn Ch’ng, Yeh Ching Low, Yun Li Lee, Wai Chong Chia, and Lee Seng Yeong. 2019. Video Games: A Potential Vehicle for Teaching Computational Thinking. In Computational Thinking Education. Springer, Singapore, 247–260.
[7]
K-12 Computer Science Framework Steering Committee 2016. K-12 computer science framework. ACM.
[8]
Computer Science Teachers Association. 2017. CSTA K-12 Computer Science Standards. http://www.csteachers.org/standards.
[9]
Matthew Conway, Steve Audia, Tommy Burnette, Dennis Cosgrove, and Kevin Christiansen. 2000. Alice: lessons learned from building a 3D system for novices. In Proceedings of the SIGCHI conference on Human factors in computing systems. 486–493.
[10]
Stephen Cooper, Wanda Dann, and Randy Pausch. 2000. Alice: a 3-D tool for introductory programming concepts. In Journal of Computing Sciences in Colleges, Vol. 15. Consortium for Computing Sciences in Colleges, 107–116.
[11]
Anind K Dey, Lara D Catledge, Gregory D Abowd, and Colin Potts. 1997. Developing voice-only applications in the absence of speech recognition technology. Technical Report. Georgia Institute of Technology.
[12]
Griffin Dietz, Jenny Han, Hyowon Gweon, and James A Landay. 2021. Design Guidelines for Early Childhood Computer Science Education Tools. In Design Thinking Research. Springer, 291–306.
[13]
Griffin Dietz, James A Landay, and Hyowon Gweon. 2019. Building blocks of computational thinking: Young children’s developing capacities for problem decomposition. In Proceedings of the 41st Annual Meeting of the Cognitive Science Society. 1647–1653.
[14]
Griffin Dietz, Jimmy K Le, Nadin Tamer, Jenny Han, Hyowon Gweon, Elizabeth L Murnane, and James A Landay. 2021. StoryCoder: Teaching Computational Thinking Concepts Through Storytelling in a Voice-Guided App for Children. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 1–15.
[15]
Griffin Dietz, Zachary Pease, Brenna McNally, and Elizabeth Foss. 2020. Giggle gauge: a self-report instrument for evaluating children’s engagement with technology. In Proceedings of the Interaction Design and Children Conference. 614–623.
[16]
Betsy DiSalvo. 2014. Graphical qualities of educational technology: Using drag-and-drop and text-based programs for introductory computer science. IEEE computer graphics and applications 34, 6 (2014), 12–15.
[17]
Allison Druin, Jamie Montemayor, Jim Hendler, Britt McAlister, Angela Boltman, Eric Fiterman, Aurelie Plaisant, Alex Kruskal, Hanne Olsen, Isabella Revett, 1999. Designing PETS: A personal electronic teller of stories. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. ACM, 326–329.
[18]
Daniel C Edelson, Roy D Pea, Louis Gomez, 1996. Constructivism in the collaboratory. Constructivist learning environments: Case studies in instructional design 151 (1996).
[19]
Brittany Terese Fasy, Stacey A Hancock, Barbara Z Komlos, Brendan Kristiansen, Samuel Micka, and Allison S Theobold. 2020. Bring the Page to Life: Engaging Rural Students in Computer Science Using Alice. In Proceedings of the 2020 ACM Conference on Innovation and Technology in Computer Science Education. 110–116.
[20]
Louise P Flannery, Brian Silverman, Elizabeth R Kazakoff, Marina Umaschi Bers, Paula Bontá, and Mitchel Resnick. 2013. Designing ScratchJr: support for early childhood learning through computer programming. In Proceedings of the 12th International Conference on Interaction Design and Children. ACM, 1–10.
[21]
N. Fraser. 2013. Blockly: A visual programming editor. Google (2013).
[22]
Google. 2014. Women who choose computer science–what really matters: The critical role of encouragement and exposure. Technical Report.
[23]
Google Inc. and Gallup Inc. 2016. Diversity gaps in computer science: Exploring the underrepresentation of girls, Blacks and Hispanics. https://services.google.com/fh/files/misc/diversity-gaps-in-computer-science-report.pdf.
[24]
Lindsey Ann Gouws, Karen Bradshaw, and Peter Wentworth. 2013. Computational thinking in educational activities: an evaluation of the educational game light-bot. In Proceedings of the 18th ACM conference on innovation and technology in computer science education. 10–15.
[25]
Shuchi Grover and Satabdi Basu. 2017. Measuring student learning in introductory block-based programming: Examining misconceptions of loops, variables, and boolean logic. In Proceedings of the 2017 ACM SIGCSE technical symposium on computer science education. 267–272.
[26]
Shuchi Grover, Roy Pea, and Stephen Cooper. 2015. Designing for deeper learning in a blended computer science course for middle school students. Computer science education 25, 2 (2015), 199–237.
[27]
Mark Guzdial. 2015. Learner-centered design of computing education: Research on computing for everyone. Synthesis Lectures on Human-Centered Informatics 8, 6 (2015), 1–165.
[28]
Kathy Hirsh-Pasek, Jennifer M Zosh, Roberta Michnick Golinkoff, James H Gray, Michael B Robb, and Jordy Kaufman. 2015. Putting education in “educational” apps: Lessons from the science of learning. Psychological Science in the Public Interest 16, 1 (2015), 3–34.
[29]
Michael S Horn and Robert JK Jacob. 2007. Tangible programming in the classroom with tern. In CHI’07 extended abstracts on Human factors in computing systems. ACM, 1965–1970.
[30]
Felix Hu, Ariel Zekelman, Michael Horn, and Frances Judd. 2015. Strawbies: explorations in tangible programming. In Proceedings of the 14th International Conference on Interaction Design and Children. ACM, 410–413.
[31]
Caitlin Kelleher, Randy Pausch, and Sara Kiesler. 2007. Storytelling Alice motivates middle school girls to learn computer programming. In Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, 1455–1464.
[32]
Varsha Koushik, Darren Guinness, and Shaun K Kane. 2019. StoryBlocks: A Tangible Programming Game To Create Accessible Audio Stories. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, 1–12.
[33]
Eleni A Kyza, Yiannis Georgiou, Andria Agesilaou, and Markos Souropetsis. 2021. A Cross-Sectional Study Investigating Primary School Children’s Coding Practices and Computational Thinking Using ScratchJr. Journal of Educational Computing Research (2021), 07356331211027387.
[34]
Michelle H Land. 2013. Full STEAM ahead: The benefits of integrating the arts into STEM. Procedia Computer Science 20 (2013), 547–552.
[35]
Michael J Lee and Amy J Ko. 2015. Comparing the effectiveness of online learning approaches on CS1 learning outcomes. In Proceedings of the eleventh annual international conference on international computing education research. 237–246.
[36]
John Maloney, Mitchel Resnick, Natalie Rusk, Brian Silverman, and Evelyn Eastmond. 2010. The scratch programming language and environment. ACM Transactions on Computing Education (TOCE) 10, 4 (2010), 1–15.
[37]
Lauren E Margulieux, Briana B Morrison, and Adrienne Decker. 2019. Design and pilot testing of subgoal labeled worked examples for five core concepts in CS1. In Proceedings of the 2019 ACM Conference on Innovation and Technology in Computer Science Education. 548–554.
[38]
Orni Meerbaum-Salant, Michal Armoni, and Mordechai Ben-Ari. 2011. Habits of programming in scratch. In Proceedings of the 16th annual joint conference on Innovation and technology in computer science education. ACM, 168–172.
[39]
Philip Miller, John Pane, Glenn Meter, and Scott Vorthmann. 1994. Evolution of novice programming environments: The structure editors of Carnegie Mellon University. Interactive Learning Environments 4, 2 (1994), 140–158.
[40]
Sten Minör. 1992. Interacting with structure-oriented editors. International Journal of Man-Machine Studies 37, 4 (1992), 399–418.
[41]
Jesús Moreno-León and Gregorio Robles. 2015. Dr. Scratch: A web tool to automatically evaluate Scratch projects. In Proceedings of the workshop in primary and secondary computing education. 132–133.
[42]
Chelsea Myers, Anushay Furqan, Jessica Nebolsky, Karina Caro, and Jichen Zhu. 2018. Patterns for how users overcome obstacles in voice user interfaces. In Proceedings of the 2018 CHI conference on human factors in computing systems. 1–7.
[43]
John F Pane, Brad A Myers, and Leah B Miller. 2002. Using HCI techniques to design a more usable programming system. In Proceedings IEEE 2002 Symposia on Human Centric Computing Languages and Environments. IEEE, 198–206.
[44]
Seymour Papert. 1980. Mindstorms: Computers, children, and powerful ideas. NY: Basic Books (1980), 255.
[45]
Dylan J Portelance and Marina Umaschi Bers. 2015. Code and Tell: Assessing young children’s learning of computational thinking using peer video interviews with ScratchJr. In Proceedings of the 14th international conference on interaction design and children. 271–274.
[46]
Mitchel Resnick, John Maloney, Andrés Monroy-Hernández, Natalie Rusk, Evelyn Eastmond, Karen Brennan, Amon Millner, Eric Rosenbaum, Jay Silver, Brian Silverman, 2009. Scratch: Programming for all. Commun. ACM 52, 11 (2009), 60–67.
[47]
Kimiko Ryokai, Michael Jongseon Lee, and Jonathan Micah Breitbart. 2009. Children’s storytelling and programming with robotic characters. In Proceedings of the seventh ACM conference on Creativity and Cognition. ACM, 19–28.
[48]
Emmanuel Schanzer, Kathi Fisler, and Shriram Krishnamurthi. 2013. Bootstrap: Going beyond programming in after-school computer science. In SPLASH education symposium.
[49]
Emmanuel Schanzer, Kathi Fisler, Shriram Krishnamurthi, and Matthias Felleisen. 2015. Transferring skills at solving word problems from computing to algebra through Bootstrap. In Proceedings of the 46th ACM Technical symposium on computer science education. 616–621.
[50]
Arash Soleimani, Danielle Herro, and Keith Evan Green. 2019. CyberPLAYce—A tangible, interactive learning tool fostering children’s computational thinking through storytelling. International Journal of Child-Computer Interaction 20 (2019), 9–23.
[51]
Sphero, Inc. n.d. Sphero. https://www.sphero.com.
[52]
Amanda Sullivan, Mollie Elkin, and Marina Umaschi Bers. 2015. KIBO robot demo: engaging young children in programming and engineering. In Proceedings of the 14th international conference on interaction design and children. ACM, 418–421.
[53]
Peeratham Techapalokul. 2017. Sniffing through millions of blocks for bad smells. In Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education. 781–782.
[54]
Terrapin. n.d. BeeBot. http://www.terrapinlogo.com.
[55]
Zhuxiaona Wei and James A Landay. 2018. Evaluating speech-based smart devices using new usability heuristics. IEEE Pervasive Computing 17, 2 (2018), 84–96.
[56]
David Weintrop. 2019. Block-based programming in computer science education. Commun. ACM 62, 8 (2019), 22–25.
[57]
David Weintrop, Elham Beheshti, Michael Horn, Kai Orton, Kemi Jona, Laura Trouille, and Uri Wilensky. 2016. Defining computational thinking for mathematics and science classrooms. Journal of Science Education and Technology 25, 1 (2016), 127–147.
[58]
David Weintrop and Uri Wilensky. 2015. To block or not to block, that is the question: students’ perceptions of blocks-based programming. In Proceedings of the 14th international conference on interaction design and children. 199–208.
[59]
David Weintrop and Uri Wilensky. 2017. How block-based languages support novices. Journal of Visual Languages and Sentient Systems 3 (2017), 92–100.
[60]
Jeannette M Wing. 2006. Computational thinking. Commun. ACM 49, 3 (2006), 33–35.
[61]
Jeannette M Wing. 2008. Computational thinking and thinking about computing. Philosophical transactions of the royal society of London A: mathematical, physical and engineering sciences 366, 1881 (2008), 3717–3725.
[62]
Ying Xu, Valery Vigil, Andres S Bustamante, and Mark Warschauer. 2022. “Elinor’s Talking to Me!”: Integrating Conversational AI into Children’s Narrative Science Programming. In CHI Conference on Human Factors in Computing Systems. 1–16.
[63]
Ying Xu and Mark Warschauer. 2019. Young children’s reading and learning with conversational agents. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems. 1–8.
[64]
Ying Xu and Mark Warschauer. 2020. Exploring young children’s engagement in joint reading with a conversational agent. In Proceedings of the Interaction Design and Children Conference. 216–228.
[65]
Chao Zhang, Cheng Yao, Jiayi Wu, Weijia Lin, Lijuan Liu, Ge Yan, and Fangtian Ying. 2022. StoryDrawer: A Child-AI Collaborative Drawing System to Support Children’s Creative Visual Storytelling. In CHI Conference on Human Factors in Computing Systems. 1–15.
[66]
Zheng Zhang, Ying Xu, Yanhao Wang, Tongshuang Wu, Bingsheng Yao, Daniel Ritchie, Mo Yu, Dakuo Wang, and Toby Jia-Jun Li. 2021. Building a storytelling conversational agent through parent-AI collaboration. (2021).
[67]
Zheng Zhang, Ying Xu, Yanhao Wang, Bingsheng Yao, Daniel Ritchie, Tongshuang Wu, Mo Yu, Dakuo Wang, and Toby Jia-Jun Li. 2022. StoryBuddy: A Human-AI Collaborative Chatbot for Parent-Child Interactive Storytelling with Flexible Parental Involvement. arXiv preprint arXiv:2202.06205 (2022).
