1 Introduction

Wizard of Oz (WOz) is a well-known method in human-centered design, human factors, and other fields for exploring user interfaces in complex systems (Dow et al. 2005a). Perhaps the earliest uses of the WOz approach include a natural language understanding system (Bobrow et al. 1983) and a simulated listening typewriter for speech recognition (Gould et al. 1983), although the term ‘Wizard of Oz’ was not coined until the 1980s (Kelley 1984). When WOz is used, users are usually led to believe they are interacting with a fully functioning system, while in reality the system is controlled by a human, a ‘wizard’. The WOz method is powerful in iterative design processes for evaluating and testing concepts and designs before a whole system is completed. The method can help designers avoid getting locked into a particular design, and it allows a design to be evaluated before too much time is invested in developing a prototype, which saves time and money (Dow et al. 2005a, 2005b).

WOz studies have been conducted in many different contexts, with technology ranging from lo-fi to hi-fi prototyping. In human-robot interaction research WOz is commonly used since robots cannot interact in socially appropriate ways, and there are even reporting guidelines aimed at rigour in experiments and reporting (Riek 2012). In the context of mixed reality and games, WOz does not seem to have been used extensively, although there are a number of reports. To provide some examples, WOz was used in a study on “The voices of Oakland”, an audio-based tour set in a historic cemetery in Atlanta, with a focus on storytelling (Dow et al. 2005b). Paelke and Sester (2010) explored augmented paper maps, an integration of paper maps and mobile devices. Paavilainen (2008) describes the evaluation of implementing mobile use context in a multi-player casual quiz game, while Bernhaupt et al. (2007) explored “Capture the Flag”, a location-based multi-player game for mobile devices, using GPS positioning of game players. In a study by Vahdat et al. (2013) the WOz method was used for exploring a serious game for collaborative learning on city traffic rules and signs. Marco et al. (2012) designed a farmer game with tabletop toys combined with a virtual farm for young children, and Höysniemi et al. (2004) designed a computer vision based action game for children, both using the WOz method. These studies point out different merits of the WOz method and its usefulness, for instance, concept evaluation, flexibility and cost effectiveness.

Yet, despite the benefits of using WOz, there are some limitations. One of them, as pointed out by Dow et al. (2005a, p.18), is the “effort required to engineer a successful WOz interface and integrate it with an incomplete system”. Höysniemi and Read (2005) point out other potentially problematic methodological issues, summarised in three categories. The first concerns technology, for instance, designing systems that cannot be realised. The second concerns the method’s deceptive nature, which can lead to unethical research since, for instance, children may not understand what they are participating in. The third category is wizard problems, that is, the wizards’ capability and effect on a study. We will return to limitations and ethics in the last section of this paper.

In this paper we describe the use of the WOz method during the iterative development of a hi-fi game prototype. The game is a multi-player mixed reality game, aimed at raising children’s online risk awareness. The WOz method was used for evaluation and exploration of the game concept and some of the game mechanics that needed to be integrated and synchronised. The work described in this paper is part of a larger project that also included basic research on online interactions between children and adults, and later on, game evaluations (Susi and Torstensson 2019; Susi et al. 2019).

The next sections describe the game in more detail and the WOz setup. The following sections describe the exploration of four main in-game mechanics and the WOz techniques used in this project. The paper concludes with a discussion, including the benefits and limitations of the WOz approach, and ethical aspects.

2 A Multi-player Mixed Reality Game

Hidden in the Park (Parkgömmet© in Swedish, The Change Attitude Foundation 2019) is a multi-player mixed reality game intended for children 8–10 years old. It is a hide and seek adventure game for 2–4 players, where the goal is to find another player’s hidden treasure to win the game. The game design also creates a shared game experience, which provides a social dimension of interaction, competition, and discussion about different game events. The game includes analogue and digital game components. It comprises a cardboard tabletop game board with squares along the outer edges, where game pieces are moved forward, similar to Monopoly. The tabletop game board also has AR tags to support augmented reality. There is a tablet computer equipped with AR technology, which allows players to view the tabletop game board as a three-dimensional animated world, in which treasures are hidden by pointing and touching the screen. As a player hides a treasure, she gets a set of four clue cards to the hiding place in return. The tablet contains a digital dice and after each roll the player moves her physical game piece ahead on the tabletop game board. At the same time, the tabletop game board is represented on the tablet, where a digital version of the player’s game piece moves along with each roll of the dice, to help players keep track of their correct positions. The game includes digital mini-games and surprise elements that provide an extra dimension of fun game events. Besides being a fun game, the game’s underlying purpose is to raise young children’s risk awareness during online interactions. To this end, the clue cards are treated as personal information that should be guarded so that they are not exposed to the other players.

The game design, however, also contains four deceptive unknown characters (UCs), one for each player, that communicate with the players through text messages (SMS) that appear on the tablet. These UCs try to lure players into taking pictures of their clue cards with the tablet’s camera and sending them in return. The UCs use different strategies aimed at gaining information (pictures of clue cards) from the players, and if they succeed they will expose the information at some point. When a player’s clue cards are exposed the other players may find her hidden treasure. Players can choose whether or not to take a picture, but if they do take a picture, they will also face the consequences of having complied: the sudden insight of having been deceived as the information is exposed to all other players. The strategies implemented in the game’s UCs are based on research on real online interactions, where adults try to gain personal information from young people (cf. Susi and Torstensson 2019; Susi et al. 2019). We chose four main strategies that were transferred to and implemented in the game: flattery, bribes, coercion, and threats. Regardless of whether players take pictures or not, clue cards have to be revealed at some point to progress the game play. When clues are revealed the players get the opportunity to look for a hidden treasure by viewing the 3D view on the tablet, and when a treasure is found, the game ends. The game events then create a basis for a pedagogical follow-up discussion centered on online risk awareness.

3 The WOz Setup

For an overview of the setup of our study, we use the taxonomy formulated by Höysniemi and Read (2005), which includes a number of points of variability and the setup for each such point. The taxonomy is here summarised as a table, in which the first two columns include points of variability and the setup for each point (Table 1). The third column adds information about the setup of our study. In this paper we mainly focus on the first points of variability, that is, the technology and wizard interventions. Further details are provided in the following sections.

Table 1. Taxonomy of Wizard of Oz studies (Höysniemi and Read 2005), with the setup of the present study in the third column.

The WOz sessions were set up with children playing the game, researchers and the wizard all in the same room (Fig. 1). There were eight children who participated by playing the game in groups of 3–4 players in different constellations, on five different occasions. Written consent for participation was gathered from both the children themselves and their parents. The researchers included a wizard, two observers and a game master who provided instructions. The only instructions provided were that each player had to hide a treasure using the tablet, and that the tablet would tell them what to do. Since all people were in the same room, the participants could see the wizard but they did not initially know he was wizarding and thereby controlling the game.

The children were not informed about the wizard setup until after the third play session; from then on, all participants knew about the wizard. Interestingly, the children’s game play was not visibly affected once they knew there was a wizard; they were absorbed by the game play and the game’s competitive elements, and they fully ignored the wizard. At the time, the game was a fully functional but not completed prototype. The wizard could control most of the game mechanics in real-time and modify computer responses to players’ actions.

Fig. 1. The WOz setup with participants and researchers/observers on the left, and the wizard and controls on the right.

Both the tabletop game board and its digital representation on the tablet had rudimentary graphics and symbols (Fig. 2). The game board layout and the graphics were subject to changes and further refinement as a consequence of the WOz approach. Changes were continuously implemented in several iterations of the game during the 10-week period of WOz testing.

Fig. 2. A group of children playing an early version of the game prototype with rudimentary graphics and symbols. (©Niklas Torstensson). (Color figure online)

4 The Wizard at Work

This section describes the wizard’s role in the development process, the tools developed for the task, the nature of the interventions made by the wizard, and how these affected the game development process.

4.1 The Technology and the Wizard’s Role

The game prototype and the technology for wizarding were set up by a research engineer, highly skilled in programming and system development. A basic game prototype was developed in Unity 5 (Unity Technologies 2015), with all the required game mechanics. The game was then installed on a computer tablet (Samsung Galaxy Note 10.1 2014 Edition). The wizard system was implemented as a local web server, facilitated by WampServer (a software stack for Windows, including the Apache web server, MySQL database, and PHP programming language). The wizard’s browser-based graphical interface was programmed in PHP and JavaScript. The wizard’s computer and the game computer tablet had a WLAN connection to the server for data exchange.

There were a number of interconnected processes that needed to be implemented in order to make the game work as intended, and to make them work in parallel for two, three or four players. Finding a balance between fun and the game’s more serious intention was another challenge in itself. With the uncertainty of how to fit everything within one coherent game concept, the WOz approach was the wiser option during the prototyping stage, which went from an initial prototype with basic coding to a finalised, fully coded prototype.

During this process, the wizard assumed different roles during the exploration and evaluation of the game mechanics. As described by Dow et al. (2005a) the wizard’s role can change, from a controller to a moderator, and then to a supervisor. As a controller, the wizard fully simulates an unbuilt system component whereas the supervisor instead oversees a working system, with the possibility of overriding system or user decisions. The third role, the moderator, lies somewhere between the former two. The moderator can, e.g., intercept output from a working system component before it is sent as input to the rest of the system. In our study, the wizard assumed all these roles as the game development progressed, but not in a linear way. Instead, the wizard’s role continuously shifted back and forth depending on which part of the game system needed wizarding. Initially, no game event processes were pre-programmed in the game system. Instances like what number would come up with the rolling of the digital dice, how often to get a chance to play a mini-game, and monetary gains and losses were individually coded, and their appearance was randomised. After a few rounds of game play, the wizard took control over game responses to player interactions. On an overall level, there were five iterations of the game, since it was further developed between each of the five game sessions where the children played the game. However, there were also on-the-fly iterations during the actual game play, as the wizard implemented changes that came into effect during the next round of rolling the dice.

Gradually, as the game mechanics, processes and game progression fell into place, the wizard’s manual operations were coded into sets of options and sets of sequences to choose from. The wizard could, however, still override the coding and choose different responses. Finally, when all parts were balanced the coding was completed. The whole process of exploration and testing with the WOz method lasted 10 weeks.

4.2 The Wizard’s Graphical Interface

For the WOz method to work, and as a tool of control, a graphical user interface (GUI) for wizard intervention was created. Because of the real-time setting and the need for quick responses, this GUI has to be detailed, but at the same time very clear and easy to survey. The wizard’s GUI shows a representation of the state of the game – present, past and future – for all players. Each player is assigned a colour: red, yellow, blue, or green. The controls in the GUI are colour-mapped accordingly. In the game, players’ actions are interconnected, but each player is individually represented on the wizard’s GUI (Fig. 3).

Fig. 3. The wizard’s GUI where each player is represented individually (©Mikael Lebram). (Color figure online)

The wizard can see the overall state of the game through the GUI and, for each player, a representation of the tabletop game board with the player’s position, and the state of their clue cards (revealed or not) (Fig. 4).

Fig. 4. The wizard’s GUI for player yellow (©Mikael Lebram). (Color figure online)

In closer detail, Fig. 5a shows an event and text message log, and below the log is a representation of the tabletop game board as little squares, showing player yellow’s current position and clue cards. There is also information about the treasure hiding place and the number of coins. In Fig. 5b we see the flow of text messages and player responses, while Fig. 5c shows the wizard’s options for input to the system, e.g., to write a free-text message, send a yes/no question, send a photo request, or make a player gain or lose coins. The wizard can oversee and override decisions made by the system and instead try alternative scenarios.

Fig. 5a–c. The wizard’s GUI with details for player yellow (©Mikael Lebram). (Color figure online)
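The per-player information that the wizard's GUI presents can be pictured as a small data structure. The following Python sketch is an illustration only: the class names `ClueCard` and `PlayerView`, and all field names, are hypothetical and are not taken from the actual wizard software, which was written in PHP and JavaScript.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClueCard:
    symbol: str             # e.g. "grass", "flag" or "barrel"
    revealed: bool = False  # True once the card has been exposed on the tablet


@dataclass
class PlayerView:
    """Hypothetical per-player state as it might be shown to the wizard."""
    colour: str        # "red", "yellow", "blue" or "green"
    position: int      # index of the player's square on the game board
    coins: int
    hiding_place: str  # where the player has hidden the treasure
    clue_cards: List[ClueCard] = field(default_factory=list)
    message_log: List[str] = field(default_factory=list)

    def revealed_symbols(self) -> List[str]:
        """Return the symbols of the clue cards that have been exposed."""
        return [card.symbol for card in self.clue_cards if card.revealed]
```

Keeping one such record per colour mirrors the individual player panels described above, while the game logic itself may still connect the players' actions.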

Another important feature that proved valuable for analysis, development and follow-up is that the Wizard application saves a complete log of every game round when the tablet has been connected to the server. This makes it possible to both back-track problematic situations and to get statistical data from the different game rounds. Being able to analyse and understand game events in retrospect proved a vital part of the development process. The game logs show, for instance, the time spent on different game events and how many coins a player had at an instance when a bribe had no effect.
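To illustrate the kind of retrospective analysis such a log supports, the following is a minimal Python sketch. The record layout (`LogEntry` with `timestamp`, `player`, `event`, `coins`) and the event names are assumptions made for illustration; the actual log format is not specified here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LogEntry:
    timestamp: float  # seconds since the start of the game round (assumed unit)
    player: str       # player colour, e.g. "yellow"
    event: str        # hypothetical event name, e.g. "mini_game"
    coins: int        # the player's coin count when the event was logged


def time_per_event(log: List[LogEntry]) -> Dict[str, float]:
    """Sum the time spent on each event type.

    An event is assumed to last until the next entry in the log, so the
    final entry contributes no duration.
    """
    ordered = sorted(log, key=lambda entry: entry.timestamp)
    totals: Dict[str, float] = defaultdict(float)
    for current, following in zip(ordered, ordered[1:]):
        totals[current.event] += following.timestamp - current.timestamp
    return dict(totals)


def coins_when_bribe_declined(log: List[LogEntry]) -> List[int]:
    """Collect the coin counts at the moments a bribe had no effect."""
    return [entry.coins for entry in log if entry.event == "bribe_declined"]
```

With entries of this kind, a question such as "how many coins did players hold when they turned down a bribe?" reduces to a simple filter over the log.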

5 Exploration and Evaluation of Game Mechanics

There were a number of interconnected processes that needed to be implemented in order to make the game work as intended. The focal point, for the game to fulfil its main purpose of enhancing the players’ risk awareness, is that at least one of the players actually falls into the trap of taking and sending a picture of a clue card, so that the picture/clue gets revealed and the hidden treasure is found. There had to be a progression, causing game events to unfold, to drive the game play forward so the game would come to an end. The SMSs, prompting players to make choices, like sending pictures of clue cards, must fit in the on-going game play context. There was also a need for a monetary system, partly with the function of rewarding players, but more importantly as an incentive to make players comply with requests in SMSs in order to acquire more coins. Furthermore, if no player agreed to take a picture, it would still be necessary to somehow reveal clues to push the game play forward. Yet another dimension to consider was the time it would take to play the game. Since the game is intended for elementary school settings there needed to be a time constraint for game play that fits well into an ordinary school day in Swedish elementary schools. Lessons in schools are typically 40 min, and ideally the game play should fit within one lesson.

The major game mechanics we focused on were 1) progression of game play, 2) timing and sequences of text messages intended to lure players to reveal clues to the hiding place of a treasure, 3) the appropriate number of coins to motivate players to take actions to gain more coins, and 4) when and how clues should be exposed, to forward game progression so the game would end within a certain time frame. These four aspects are described in the following subsections.

5.1 Progression of Game Play

The first aspect to consider was progression of game play. Depending on which squares on the tabletop gameboard the players reach, the outcome can vary considerably. The game is event-based, and every square on the gameboard is assigned to a certain event that concerns either the player who ends up on that square, or it may concern all the players. Examples of such events are to re-roll the dice and move forward or backward, to lose or gain coins, to play a mini-game or to use the tablet to visually search for a treasure. It was important to find a proper balance between mere fun (e.g., to play mini-games), and progression from a revealed clue to a hiding place, to the state of experiencing the consequences of having revealed too much information.

Initially, the rolling of the dice was not controlled, but programmed to be random. This led to an overrepresentation of mini-games and gaining of coins, long game play times, and it hampered the desired progression leading to the escalation of events and, finally, bringing the game to an end.

The wizard software was then designed to allow control of the game’s progression by manipulating the dice, to decide what kind of a game event square a player would reach. The wizard could also use the interface to see and manipulate queued and upcoming game events from a dynamic list of game events. By this level of control, the wizard could influence the game play progression and find a balance between the level of fun and game play progression.
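The idea of steering progression through the dice can be sketched as follows. This is a minimal Python illustration under stated assumptions: a six-sided dice, a board that wraps around like a loop, and hypothetical event names. The function `roll_dice` is not the actual wizard software.

```python
import random
from typing import List, Optional


def roll_dice(board: List[str],
              position: int,
              wanted_event: Optional[str] = None,
              rng: Optional[random.Random] = None) -> int:
    """Return a dice value from 1 to 6.

    If the wizard has requested an event type and at least one roll reaches
    a square of that type, one such roll is returned. Otherwise the roll is
    random, as in the initial uncontrolled version of the game.
    """
    if rng is None:
        rng = random.Random()
    if wanted_event is not None:
        # Collect every roll that lands on a square with the wanted event.
        matching = [roll for roll in range(1, 7)
                    if board[(position + roll) % len(board)] == wanted_event]
        if matching:
            return rng.choice(matching)
    return rng.randint(1, 6)
```

From the players' point of view the result still looks like an ordinary roll, which is what lets the wizard reduce, for example, an overrepresentation of mini-games without changing the visible rules.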

5.2 Timing and Sequences of SMS

The second aspect to consider was the timing and sequences of the text messages that are intended to lure players to reveal clues to the hiding places of their treasures. The unknown characters’ SMSs represent different strategies to gain access to players’ clue cards: flattery, bribes, coercion, and eventually threats. These strategies constitute four of the behaviours found in the underlying research on online interactions between adults and children (Susi and Torstensson 2019). The first three types of SMSs appear in various orders, but once a player has taken and sent a picture the UC gains leverage and escalates the process to threats, as in “If you don’t send another one I’ll show your previous picture to everyone”. Since every player has four clue cards, they can comply and send more pictures, but they do not initially know why the UC asks for pictures or what he or she will do with them.

The challenge was to set the timing of SMSs correctly in relation to game events but also in relation to the overall progression of the game play. The SMSs needed to appear occasionally but at the right time for each player. The messages also had to be related to what had just taken place. For instance, if a player scores well in a mini-game, there could be an SMS saying “You’re doing well!”, but it could not be followed by a coercive message saying “please, just one more” (begging for another picture) since that would not fit the context and it would be confusing and meaningless to the player.

There were some issues with the SMSs as they appeared too rarely and their timing with other events was off. For instance, messages were not synchronised with the frequency of different kinds of events. There were also too many rounds of play without any messages at all, and players gained too many coins. The SMSs with bribes and coercion were essential, for instance to bribe a player with coins for a mini-game in return for a picture. In case a player declines the bribe, the next step should be to coerce the player to agree. However, it became clear that when players had a lot of coins neither bribes nor coercion had any effect, which stalled the game play progression. Having noticed these issues, the wizard began to control what SMSs should appear, and when they should appear. He could override the system and send a certain pre-coded SMS, or write messages of his own choice. The WOz approach made it possible to experiment with these techniques in order to find the most efficient way of timing the messages without actually implementing them in code.
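The sequencing constraints described in this section can be summarised as a small decision rule. The Python sketch below is one possible reading, not the implemented game logic: the `PlayerState` fields, the priority order of the checks, and the `COIN_THRESHOLD` value are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional

COIN_THRESHOLD = 2  # hypothetical: bribes are only attempted at or below this


@dataclass
class PlayerState:
    coins: int
    pictures_sent: int = 0
    last_sms: Optional[str] = None     # strategy of the previous message
    declined_last_bribe: bool = False
    just_scored_well: bool = False     # e.g. a good mini-game result


def next_sms_strategy(state: PlayerState) -> Optional[str]:
    """Pick the next UC strategy, or None if no message fits the context."""
    if state.pictures_sent > 0:
        # Threats only become possible once the UC holds a picture.
        return "threat"
    if state.last_sms == "bribe" and state.declined_last_bribe:
        # A declined bribe is followed by coercion.
        return "coercion"
    if state.just_scored_well:
        # Flattery has to relate to what just happened in the game.
        return "flattery"
    if state.coins <= COIN_THRESHOLD:
        # Bribes only work on players who need coins.
        return "bribe"
    # Player has many coins: wait until coins have been lost.
    return None
```

Returning `None` for a player with many coins reflects the observation above that bribes and coercion had no effect in that situation, so some other game event has to change the state first.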

5.3 The Monetary System

The third aspect concerned the appropriate number of coins to gain or lose, to motivate players to take actions to gain some more. Coins are used, e.g., to pay for mini-games, so coins should provide enough incentive to get players to take pictures of their clue cards in order to gain more coins. For that to work, there has to be a proper number of coins; with too many coins there is no need to accept a bribe, while having no coins would exclude a player from taking part in anything that has to be paid for, and the player could fall outside the socially shared game experience.

Initially, actual physical metal coins were introduced in the game, but that soon proved to be one element too many to handle. The coins drew too much attention away from the game play, and it was also hard to keep the coins in place on the table since the young players tended to move around the table. Yet, this gave a clue to the number of coins or amount of currency needed in the game, which turned out to be 0–6 coins, less than initially estimated. The physical coins were replaced with virtual coins in the game system, and by controlling the dice, the wizard was also able to control whether a player would lose or gain coins. Gains and losses were decided in relation to the number of coins a player had at the time. The amount of money also had to be set in relation to SMSs: if a player had little or no money she should be asked to take a picture, be offered a bribe, and be coerced to accept. If a player instead had many coins she should lose some, to set the stage for bribes etc. The coins turned out to provide a competitive element to the game where the main attraction lay in whether a player had more or fewer coins than another player, rather than the exact number of coins someone actually had. By altering gains and losses, and setting them in relation to the proper messages, the monetary system could be set in balance.
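The balancing principle, that wealthy players should lose coins while players with few coins are targeted with bribes, can be sketched in a few lines of Python. The 0–6 range comes from the study; treating it as a hard limit, and the cut-offs `FEW_COINS` and `MANY_COINS`, are assumptions made purely for illustration.

```python
MIN_COINS = 0
MAX_COINS = 6    # the range that turned out to be sufficient in the study
FEW_COINS = 1    # hypothetical cut-off for "little or no money"
MANY_COINS = 5   # hypothetical cut-off for "many coins"


def plan_for_player(coins: int) -> str:
    """Suggest the wizard's next move based on the player's coin count."""
    if coins >= MANY_COINS:
        return "lose_coins"       # set the stage for later bribes
    if coins <= FEW_COINS:
        return "offer_bribe"      # the player now has an incentive to comply
    return "no_intervention"


def apply_coin_change(coins: int, delta: int) -> int:
    """Apply a gain or loss while keeping the total within the assumed range."""
    return max(MIN_COINS, min(MAX_COINS, coins + delta))
```

For example, `plan_for_player(6)` suggests a loss, and `apply_coin_change(1, -2)` stops at zero rather than going negative.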

5.4 Exposure of Clue Cards

The fourth aspect concerned the exposure of clue cards to forward game progression and end the game within a certain time frame. The main point with the clue cards is that they represent where a player has hidden her treasure, so they are comparable to personal information that should not be revealed. At the same time, their exposure leads to someone winning the game. Assuming that the flattery, bribes etc. would work, players would take pictures. When a picture has been taken, the UC should request more pictures and coerce the player to comply, and also threaten to reveal a previous picture unless the player sends another one. This process mimics real-world online events.

The clue cards contain symbols, like flags, grass, and barrels. The symbols on the physical clue cards are black and white and each player has four cards with different symbols. There is no need to hide the cards since they all look pretty much the same. The game system, however, assigns different colours to the symbols. Once a clue card is exposed (shown on the tablet), the players see, e.g., that there is green grass where player red has hidden her treasure. When a player gets the opportunity to look for player red’s treasure in the 3D view of the game board, the player has to find green grass. When two or more clues have been revealed, the players need to find the place where all the exposed clues are gathered. When a player finds the right spot she wins the game. In essence then, players want to see the other players’ clues but would not want to expose their own. But here the UCs enter the scene and lure players to take pictures and send them in return. What the players do not know is that the UCs will deceive them, and at some point expose the pictures to all players.
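The search for "the place where all the exposed clues are gathered" is, in effect, a subset test: every additional exposed clue narrows the set of possible spots. The Python sketch below illustrates this with a hypothetical world model in which each named spot carries a set of (colour, symbol) features. The spot names and the function `candidate_spots` are invented for the example.

```python
from typing import Dict, List, Set, Tuple

Clue = Tuple[str, str]  # (colour, symbol), e.g. ("green", "grass")


def candidate_spots(world: Dict[str, Set[Clue]], exposed: Set[Clue]) -> List[str]:
    """Return the spots whose features include every exposed clue."""
    return sorted(spot for spot, features in world.items()
                  if exposed <= features)


# Hypothetical 3D world with three spots.
world = {
    "pond": {("green", "grass"), ("red", "flag")},
    "hill": {("green", "grass"), ("blue", "barrel")},
    "gate": {("red", "flag")},
}

one_clue = candidate_spots(world, {("green", "grass")})
two_clues = candidate_spots(world, {("green", "grass"), ("blue", "barrel")})
```

In this toy world one exposed clue leaves two candidate spots, and a second clue narrows the search to a single spot, which is why each further exposure brings the game closer to its end.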

To get players to take pictures, the wizard had to make the players lose money, as described in the previous section. The players were very eager to play mini-games and even though there are five different mini-games, all players tended to choose the same one; they competed for the highest score and with a good score they also gained coins. With little or no money the players received messages saying for instance, “Would you like to get two coins?”. If the players responded with a ‘yes’, the next message would be “Then take a picture”.

Initially there was no control of the sequence of SMSs, except that threats could not appear until a player had taken a picture. Hence there was no control of when a UC should request pictures of clue cards or when clues should be exposed, which meant that the game did not progress as expected, and nothing much happened except rolling the dice and moving game pieces. The fact that the text message sequences were just random meant that requests for pictures of clue cards appeared solely by chance. At the same time, players could be in possession of so many coins that it rendered any attempt at a bribe useless. This seriously affected the game play time, and play-rounds simply took too long. The wizard had to take control by overriding the game system, controlling the number of coins a player would have, and deciding when to send which SMS. By exploring different combinations and sequences of game events, the wizard could intertwine different game events into a coherent game system.

We also had to find a clever way of exposing clues in case no player would actually agree to take any picture. The solution was to make a message appear saying “Oh no, someone saw you hide your treasures and will reveal a clue for each player”, and one clue for each player would be shown on the tablet. This mechanism of ‘automated exposure’ is triggered if a certain time of game play has elapsed with no pictures taken. Players may also become more wary of taking any pictures once they see that the pictures can be exposed, which could lead to very long play time. In that case, the same time trigger mechanism comes into effect to forward the game progression.
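A time trigger of this kind can be sketched as follows. The Python example is a minimal illustration under assumptions: the timeout value is invented, and measuring the elapsed time from the most recent picture (or from the start of the game if none has been taken) is one interpretation of how the same trigger could cover both cases described above.

```python
from typing import Dict, List, Optional

EXPOSURE_TIMEOUT_MIN = 10.0  # hypothetical timeout, in minutes


def should_auto_expose(now_min: float,
                       last_picture_min: Optional[float],
                       timeout_min: float = EXPOSURE_TIMEOUT_MIN) -> bool:
    """Return True if too much time has passed without a picture being taken."""
    # With no picture so far, the clock runs from the start of the game.
    reference = last_picture_min if last_picture_min is not None else 0.0
    return now_min - reference >= timeout_min


def auto_expose(hidden_clues: Dict[str, List[str]]) -> Dict[str, str]:
    """Reveal one still-hidden clue for each player who has any left.

    The revealed clue is removed from the player's list of hidden clues.
    """
    revealed: Dict[str, str] = {}
    for player, clues in hidden_clues.items():
        if clues:
            revealed[player] = clues.pop(0)
    return revealed
```

Because the check depends only on elapsed time, it moves the game forward regardless of whether players refuse from the outset or stop complying after the first exposure.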

When the monetary system, the SMSs, and incentives for taking pictures and the following threats of exposure were set in balance, the game progression and game play time also fell in place. With all processes properly balanced the play time is 20–45 min. This is suitable for Swedish elementary schools where a lesson usually lasts 40 min.

Once all the game elements were balanced and synchronised the prototype was coded and finalised.

6 Discussion

To summarise this WOz study, there are four main processes at the core of the game that have to work in unison to make the game function as intended: the progression of game play, the timing and sequencing of SMSs, the monetary system and, lastly, the exposure of the clue cards. It was a complex process to interweave all the game events, and to synchronise them for each player within one and the same game context. We found the WOz method highly appropriate for creating the integration of the different processes, but the method also has inherent limitations and ethical issues.

Höysniemi and Read (2005) raise some interesting concerns regarding WOz, for instance, that the method in itself is deceptive, which may lead to unethical research, and that it is questionable whether children have the ability to give informed consent to participate in WOz experiments. First of all, we did not conduct experiments but instead explored and evaluated the design concept. The participating children gave their consent to participate, although it is nearly impossible to say what their understanding of the situation was, except that they would play a new game that was being developed. The children’s parents were informed about the study and they also gave their consent for their children’s participation. We did not find the issue of consent to be problematic. We would instead rather agree with Marco et al. (2012, p.159), that it is “important to remember that children are not really ‘testing’ our prototypes; they are in fact playing, and they will only do so for fun”. Our impression is that the children played for fun and nothing else. We did not want to affect the game play by revealing the setup in advance, so the children were told only after a while that the wizard had been controlling parts of the game play. The fact that they had been subjected to the ‘deceptiveness’ of the WOz approach did not trigger any negative reactions. On the contrary, they described the experience as “cool” and they were excited to be the first players/end users of a new game.

Höysniemi and Read (2005) also discuss some pitfalls when the users in WOz experiments are children. These include developmental stage effects when using technology, difficulties with reading, and “differing ability to understand the setup or deception” (p.3). Understanding can certainly be a problem, but the game discussed here is specifically adapted to a level of understanding that can be expected at the ages of 8–10 years. Adaptations include, for instance, reading, instructions, and interaction styles. The WOz setup, or deception, was no serious issue, as discussed above. A more serious and important issue is the dimension of real-world deception, which was here transformed into game mechanics, and the game’s design that intentionally deceives players by bribing them to give up their clue cards. Is it fair game to do so? We argue it is. The realisation that an ‘unknown character’ may not be who she or he seems to be provides an opportunity for learning about risk awareness. This is learning not just ‘in theory’, but through first-hand experience of deception and potential consequences of sharing too much personal information, and all this under safe off-line conditions. Related to this is also the subject of exposing clue cards even when no one has taken a picture during game play. The truth is that in reality, pictures can easily be shared without the photographed subject’s knowledge.

One of the major challenges with the WOz method was the obvious need for high technical competence, which has been pointed out before. As such competence was available in-house, it was possible to explore different solutions without spending resources on coding and, possibly, an elaborate prototype that would have to be discarded. Even though testing is time consuming, it proved a considerable advantage to be able to let the wizard switch between the different roles of controller, moderator, and supervisor, and control different parts of the game before final coding. Another advantage was the many iterations that could take place, both during game play and between play sessions.

In sum, the need for both technical and programming competence can be a highly limiting factor in the use of the WOz approach, but we found the method highly useful in the process of developing a fully functional prototype with intricate game mechanics. This approach allowed us to explore the game concept and to save time and costs, and it led to important improvements before completion of the game prototype. The prototype was then finalised into a complete and distributable game by a game development company. The finalised game is now distributed free of charge to all of the approximately 5000 Swedish elementary schools (from 2019 onwards).