
1 Introduction

StarCraft II (SC2) is a real-time strategy game in which two opposing sides compete to destroy all of each other's buildings. SC2 is played with fog of war, meaning that each player can see only their own units and buildings plus whatever falls within their units' range of visibility.

The major activities of SC2 can be grouped into four categories: resource gathering, attacking and defending, scouting, and deciding what to build and how to manage resources. At the start of each game, each player is given one home base and 12 worker units with which to collect resources. Players use worker units to gather Vespene Gas from Vespene Geysers and Minerals from Mineral Patches on the map. Resources play an important role in the game because units have different Vespene and Mineral costs. After acquiring some resources, players can decide whether to pursue a lengthy, economic style of game play or a shorter, aggressive style such as a unit rush. Unit rushes can be very effective against inexperienced players. However, defeating seasoned players typically requires a more economic strategy in which players scout out enemy bases, carefully choose when and where to engage the enemy, and manage their resources wisely.

With recent advancements in SC2 AI, agents now outrank 99.8% of officially ranked players [7]. Rather than trying to create a better AI agent, this paper intends to create human-autonomy teaming (HAT) aids that enhance a player's situational awareness and performance by expanding on a modular deep learning architecture for SC2 and applying visualization techniques and principles.

2 Background and Related Work

This work is inspired by previous research on using a flexible modular architecture to share decision responsibility among multiple agents [6]. In that work, each agent was its own module whose output was distinctly separated from the others', making a suite of battle management HAT aids nearly a one-to-one mapping onto these agents. The proposed architecture had five key modules with the following responsibilities and designs:

  1. Worker Management: Responsible for ensuring that resources are gathered at maximum efficiency; this module is scripted.

  2. Build Order: Determines what unit or building to produce; this module uses a fully connected network.

  3. Tactics: Determines where to send the army; this module uses a fully convolutional network.

  4. Micromanagement: Manages units to destroy more opposing units; this module is scripted.

  5. Scouting: Sends scouts for tracking opponent information; this module is scripted and uses a recurrent neural network.

Previous research has also been done on applying visualization techniques and principles to SC2. One study [4] focused on visualizing SC2 replays rather than visualizing the game in real-time to determine if they could teach players critical aspects of the game a posteriori. Because the focus of their study was on human interactions with games, the visualization systems were built to maintain immersion but were overloaded with information. While the systems were good at supporting the analysis of professional replays, the amount of information presented would easily overwhelm someone playing the game in real-time. This paper intends to maintain immersion for players without overwhelming them by removing non-essential content and visually representing essential content in an easy-to-digest manner.

3 Modular Architecture

There are three playable races in SC2. Each race comes with its own set of advantages and disadvantages as well as its own units and buildings. To reduce complexity, however, this work focuses only on the Protoss race. To further reduce the complexity and stochasticity of the AI agent's environment, this work focuses solely on Protoss versus Protoss match-ups on the Ascension to Aiur map (Fig. 1).

Fig. 1.

The Ascension to Aiur map is shown above. At the beginning of the game, one player starts in the upper left-hand corner while their opponent starts opposite them in the lower right-hand corner. The two starting bases are marked by green squares with circles in them. (Color figure online)

Of the five modules described in the previous work, this paper expands the Build Order, Tactics, and Micromanagement modules and introduces a Tactical Visualization module.

3.1 Build Order Agent

In SC2, the build order is the pattern of production aimed at achieving a certain goal. Build order is determined by several factors, such as current unit and building counts, current capabilities, enemy capabilities, and resources at hand. From a command and control (C2) standpoint, this problem is akin to resource management, logistics and operational planning, and course-of-action recommendation. The approach taken for the build order module is based on work done in StarCraft: Brood War (the predecessor to SC2) [5]. In that work, deep learning was used rather than goal-based AI because of its ability to adapt to the opponent at different states of the game.

This work presents a fully connected network with four rectified linear unit (ReLU) layers and one softmax output layer. The input to this network is a game state vector from a player's point of view, consisting of their active units, observed enemy units, technology depth/upgrades, and the state of their current resources. Table 1 below describes the input vector in greater detail. The softmax layer produces 1 of 64 potential outputs, representing which unit, building, or upgrade should be built next.

Table 1. Build order inputs
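For concreteness, a minimal Keras sketch of such a network is given below. The paper specifies four ReLU layers and a 64-way softmax output; the layer widths and the length of the state vector are our assumptions.

```python
# A minimal sketch of the build order network, assuming illustrative
# layer widths and state vector length (the paper fixes only the depth,
# activations, and 64-way output).
from tensorflow.keras import layers, models

STATE_DIM = 128   # assumed length of the game state vector (see Table 1)
NUM_ACTIONS = 64  # units, buildings, and upgrades the agent can produce

model = models.Sequential([
    layers.Input(shape=(STATE_DIM,)),
    layers.Dense(256, activation="relu"),   # four ReLU layers, widths assumed
    layers.Dense(256, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_ACTIONS, activation="softmax"),  # 1-of-64 build action
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```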

The fully connected network was trained via supervised learning on replays from the winner's perspective of each match. By training the network to predict what should be built next based on a winner's build, we bias the agent toward learning only winning build orders. Table 2 below compares our build order agent's results to Justesen and Risi's work on StarCraft: Brood War. Our model orders its outputs by their probability of being correct; the top-1 choice represents what the model considers the most probable output, and the top-3 choices represent its 3 most probable outputs. The top-1 error rate in Table 2 is the percentage of test cases whose true label did not match the model's top-1 choice; likewise, the top-3 error rate is the percentage of test cases whose true label was not among the model's top-3 choices. Because different training runs of the same model vary slightly in accuracy, the results shown in Table 2 are from the best-performing model.

Table 2. Build order agent results
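The top-1 and top-3 error rates reported in Table 2 can be computed as sketched below; the array names are illustrative.

```python
# A minimal sketch of computing top-k error rates from softmax outputs.
import numpy as np

def top_k_error(probs, labels, k):
    """probs: (n, 64) softmax outputs; labels: (n,) true action indices."""
    # Indices of the k most probable actions for each test case.
    top_k = np.argsort(probs, axis=1)[:, -k:]
    hits = np.any(top_k == labels[:, None], axis=1)
    return 1.0 - hits.mean()

# Example usage (hypothetical test arrays):
# top1 = top_k_error(model.predict(x_test), y_test, k=1)
# top3 = top_k_error(model.predict(x_test), y_test, k=3)
```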

With error rates comparable to previous work based on professional player actions, the build order agent is a promising battle management aid that achieves its preliminary goal of mimicking human player builds. Future work for the build order agent includes increasing accuracy and exploring builds through deep reinforcement learning. User-centered future work for this module includes improving the content and presentation of information to users (e.g., removing extraneous content, visualizing content with graphs) and adjusting the quantity and timing of suggestions to optimize reaction time.

3.2 Tactical Visualization

As previously mentioned, SC2 is played with fog of war, which means the map is only partially observable, and players are limited to knowing only their own units’ and buildings’ locations. This partial observability adds a layer of complexity to the game due to high levels of uncertainty in a player’s situational awareness. To reduce the level of uncertainty for users, the tactical visualization module seeks to visually display predicted enemy unit locations and densities.

Fig. 2.

This figure shows the distribution of game lengths in the data used in this study.

To better understand the raw data collected from players' replays, preliminary analytics such as hard-coded heatmaps and histograms were used to find appropriate features and game timestamps. Outlying trials were eliminated from the training and test data to streamline the process and to control for excessive zero-buffering that would skew experimental results. Varied game lengths (ranging from 1 min to 27 min in this case) mean that agents cannot simply take an overall average at different phases throughout a game to determine an enemy's position. Different game lengths often result from different player styles and different quantities of units. To account for different player styles, this work trains on a wide range of game lengths. Histogramming, as seen in Fig. 2, illustrates one of the underlying issues in framing the analysis of tactics and strategy in this game. Using this graph to visualize the lengths of each game, data was extracted to maximize the information available to train the neural nets with 1248 games, encompassing most of the initial curve ranging from 90 s to 590 s. Each time segment from 90 s to 590 s had more than 20 trials, which was our threshold for including data in the experiment. The remaining game data was not used because too few games lasted long enough to justify incorporating them into the training data.
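A minimal sketch of this data-selection step is shown below. The data file, the 10 s segment width, and the array names are assumptions; the 90 s to 590 s window and the 20-trial threshold come from the study.

```python
# A minimal sketch of selecting games whose lengths fall in time segments
# (assumed 10 s wide) between 90 s and 590 s with more than 20 trials each.
import numpy as np

# game_lengths: assumed 1-D array of replay durations in seconds.
game_lengths = np.load("game_lengths.npy")       # hypothetical data file
edges = np.arange(90, 600, 10)                   # segment boundaries
counts, _ = np.histogram(game_lengths, bins=edges)

keep = np.zeros(len(game_lengths), dtype=bool)
for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    if n > 20:                                   # threshold from the study
        keep |= (game_lengths >= lo) & (game_lengths < hi)
selected = game_lengths[keep]                    # ~1248 games in this work
```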

At the outset of each game, it is nearly impossible to anticipate the strategy players will use or the length of the match. A hard-coded algorithm will therefore struggle to accurately anticipate the quantities and placement of an adversary's forces on the map. This problem is mitigated by using a smaller map, like Ascension to Aiur: games on smaller maps tend to finish more quickly and encourage agents to choose strategies focused on winning as soon as possible.

The authors of [3] propose a more resilient solution to this issue: a convolutional Long Short-Term Memory (LSTM) autoencoder architecture that "defogs" the map. An LSTM is a type of recurrent neural network (RNN) which, in this case, takes the player's observations as inputs. The desired output is the defogged game state, which shows the enemy's positions at the time of the observation; the network is trained on the player's observations with this defogged state as the target. After training and validation, the network thereby learns to defog the game by predicting where the enemy lies. We intend to use the same encoder-decoder architecture as the Brood War approach in [3]. The decoder then provides an output layer of predictions covering unit types, counts, and locations.

As implemented in StarCraft: Brood War, the approach in [3] has shown promise in predicting enemy positions. Inspired by these methods and their success, we have used the Keras API to implement three different architectures and compare their accuracies. All three architectures use an encoder-decoder setup with four stacked layers. The first and last layers of each architecture have the same number of filters, and the middle two layers contain half that number. Autoencoders with this kind of architecture are typically used to reduce input noise, which we hope will increase the likelihood of correct defogging predictions.

The first architecture we implemented focuses on the temporal aspect of the data by using an LSTM. Using LSTMs exclusively has the potential downside of requiring the spatial data to be flattened to one dimension before it is input to the architecture. However, the sequential nature of game play in SC2 lends itself to LSTM architectures.

The second architecture we implemented focuses on retaining some of the spatial data by using stacked convolutional filters. Following the same approach as [3], we also step the total resolution of the game space down to a single \(16\times 16\) field. This size helps the user conceptualize quadrants and sub-quadrants of proposed defogged enemy positions, and it simplifies the input data. Within this space, the 256 squares are grouped into three channels representing probes, troops, and buildings. This input setup helps both players and the AI agent discern the different types of enemy units occupying different areas of the map.
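To make this input representation concrete, the sketch below shows one way observed units could be binned into a \(16\times 16\) grid with three channels. The unit tuples, map dimensions, and per-cell counting scheme are illustrative assumptions rather than the exact implementation.

```python
# A minimal sketch of encoding one observation as a 16x16 grid with
# three channels (probes, troops, buildings). Unit tuples and map size
# are hypothetical stand-ins for the real game data.
import numpy as np

GRID = 16
CHANNELS = {"probe": 0, "troop": 1, "building": 2}

def encode_observation(units, map_size=(176.0, 176.0)):
    """units: iterable of (x, y, kind) tuples from one observation."""
    grid = np.zeros((GRID, GRID, len(CHANNELS)), dtype=np.float32)
    for x, y, kind in units:
        col = min(int(x / map_size[0] * GRID), GRID - 1)
        row = min(int(y / map_size[1] * GRID), GRID - 1)
        grid[row, col, CHANNELS[kind]] += 1.0   # count units per cell
    return grid
```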

The last architecture incorporates both the spatial and temporal aspects of our game data. Using Keras's ConvLSTM2D layer, this architecture combines convolutional neural nets with the LSTM algorithm. A very similar architecture was used in [1], where the authors applied unsupervised and semi-supervised learning to detect anomalous frames in video surveillance data. This architecture is likely to perform the best because it integrates all dimensions of the input data without reshaping it. By retaining the spatial and temporal data, this neural net is expected to discover patterns that are otherwise obscured when matrices are heavily processed before being input to a network.
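For illustration, a minimal Keras sketch of this third architecture is shown below, following the four-layer filter pattern described above (equal filter counts in the outer layers, half as many in the middle two). The specific filter counts, kernel size, and observation-sequence length are assumptions, not the trained model's exact hyperparameters.

```python
# A minimal sketch of the ConvLSTM2D encoder-decoder over sequences of
# 16x16x3 observations; filter counts and sequence length are assumed.
from tensorflow.keras import layers, models

SEQ_LEN, GRID, CH = 8, 16, 3   # assumed observation-sequence length

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, GRID, GRID, CH)),
    layers.ConvLSTM2D(32, (3, 3), padding="same", return_sequences=True),
    layers.ConvLSTM2D(16, (3, 3), padding="same", return_sequences=True),
    layers.ConvLSTM2D(16, (3, 3), padding="same", return_sequences=True),
    layers.ConvLSTM2D(32, (3, 3), padding="same", return_sequences=True),
    # Per-cell prediction of the defogged enemy channels.
    layers.Conv3D(CH, (1, 1, 1), activation="sigmoid", padding="same"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```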

Once the above architectures are implemented, further work can be conducted on larger maps to study the resiliency of our approach. The map used in this study, for instance, begins with players at specific locations on the field. This effectively biases the data of the game by taking out some of the stochasticity present in other maps.

Fig. 3.

After a neural net predicts the position of an adversary, a heatmap is generated with the position of the player included. The figure above shows an example of the output of the neural net. (Color figure online)

Outputs of the neural nets are used to build heatmaps that illustrate the statistical probability of enemy locations. Figure 3 depicts an example of neural net outputs where green represents probes, red represents units that can attack, and blue represents buildings. The more transparent a heatmap overlay is, the fewer units occupy that part of the map. The SC2 visualization work in [4] used heatmap information to build tactics directing where a player should move units based on enemy unit locations. The heatmaps produced by the tactical visualization module can provide key graphics to aid the player in partially observable situations while also directing a player's army placement.
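As a sketch of how such an overlay could be rendered, the code below maps a \(16\times 16\times 3\) prediction onto an RGBA layer drawn over the minimap, using the same color coding with opacity scaled by predicted density. The array shapes, and the assumption that predictions lie in [0, 1], are ours.

```python
# A minimal sketch of turning a 16x16x3 prediction into a minimap
# heatmap overlay: green probes, red attackers, blue buildings, with
# transparency proportional to predicted density (values assumed in [0, 1]).
import numpy as np
import matplotlib.pyplot as plt

def overlay(pred, minimap):
    """pred: (16, 16, 3) predicted densities; minimap: background image."""
    rgba = np.zeros((16, 16, 4), dtype=np.float32)
    rgba[..., 1] = pred[..., 0]                  # probes -> green
    rgba[..., 0] = pred[..., 1]                  # troops -> red
    rgba[..., 2] = pred[..., 2]                  # buildings -> blue
    rgba[..., 3] = pred.max(axis=-1)             # denser cells more opaque
    plt.imshow(minimap)
    plt.imshow(rgba, extent=(0, minimap.shape[1], minimap.shape[0], 0),
               interpolation="nearest")
    plt.axis("off")
    plt.show()
```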

3.3 Micromanagement and Scouting Agent

In the original modular architecture, the micromanagement and scouting modules were scripted [6]. They remain scripted in this work due to our limited time and resources. The automation of both scouting and micromanagement can be enabled and disabled during a game. The micromanagement module's main goal is to optimize the damage dealt to enemy units by rearranging friendly units' positions and attacking maneuvers. For example, a scripted action called focus fire involves searching for the enemy unit with the lowest remaining health and concentrating the entire army's attack on that single enemy.
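A minimal sketch of this focus-fire rule is given below. The Unit fields and the returned attack orders are hypothetical stand-ins for the actual game API the module is built on.

```python
# A minimal sketch of scripted focus fire: every friendly unit attacks
# the visible enemy with the lowest remaining health. The Unit type and
# order format are hypothetical, not the real game API.
from dataclasses import dataclass

@dataclass
class Unit:
    tag: int
    health: float

def focus_fire(friendly_units, visible_enemies):
    """Direct every friendly unit at the weakest visible enemy."""
    if not visible_enemies:
        return []
    target = min(visible_enemies, key=lambda u: u.health)
    # One attack order per friendly unit, all on the same target.
    return [(unit.tag, "attack", target.tag) for unit in friendly_units]
```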

The scouting agent’s goal is to search the map for enemy units, buildings, and bases. Because SC2 is played with fog of war and enemy locations are therefore unknown, scouting is essential to the goal of the game, which is to destroy all enemy buildings. Scouting also plays a prominent role in determining what types of units to build. For example, if a player knew their enemy’s army consisted mostly of air units, then the player would build the appropriate air unit counters.

Future work for both agents includes using reinforcement learning techniques to optimize these strategies; however, improvements are still being tested because of the complexity of the problem.

4 Human-Machine Interface

The human-machine interface (HMI) we created is a web-based interface that houses the build order module, tactical visualization module, and micromanagement and scouting automation modules. To maintain player immersion while still allowing module recommendations to stand out, user-centered design considerations were applied to the HMI. For example, similar content was grouped into panels, and the panels were clearly separated from one another on the screen so players could easily find relevant information. Likewise, a color scheme similar to the color scheme used in SC2 was selected and used consistently throughout the HMI in order to maintain immersion while offering easily digestible information.

Figure 4 is a screenshot of the current web-based HMI design. The left side of the screen houses toggle switches to turn the scouting and micromanagement automation on and off, along with a text box that displays hints about the tool or game state. In the middle of the screen, the minimap displays the tactical visualization module's predictions of where enemy units and buildings are based on the current state of the game. Because the visualization module's outputs suggest the density and locations of enemy units, overlays of different colors are placed on the minimap. Lastly, on the right side of the screen, the top 4 build order recommendations are listed with the most recommended unit at the top. Because the build order module outputs probabilities, the probability that a winning strategy would build the recommended unit or building is also provided under its name.

Fig. 4.

The human-machine interface that houses the modules and automation features.

Future work for the HMI includes conducting usability tests and adding supportive build functionality to the build order's recommendations. Usability testing will help ensure that key information is conveyed to players and determine whether additional information is needed. Build functionality will allow players to auto-build recommended units by simply clicking on the build order's recommendations. Similar to the auto-build functionality provided by the HMI's toggle switches, auto-building from recommendations would let the system build and place units automatically, helping automate lower-level work and freeing a player's attention and mental resources for higher-level decisions.

5 Experiment Design

An experiment was designed to test the impact of the battle aid system on player performance. The physical setup consists of two monitors placed side by side; one displays Blizzard's official SC2 game interface while the other displays the HMI battle aid produced by this work. Figure 5 depicts the physical setup for the experiment.

Fig. 5.

The SC2 game screen put next to the HMI.

While the battle aid is meant to help players regardless of their previous experience with SC2, the initial experiment will only include novice players with little to no experience with real-time strategy games. This will be done to avoid confounding the data and to ensure that we are testing the effects of our battle management aid rather than the effects of a participant’s prior experience with real-time strategy games.

Participants will play four games of SC2: two with the HMI and two without. Participants will start the study with a short SC2 and tool tutorial that walks them through the basics of the game and its rules. Questionnaires measuring mental workload, trust, and usability of the HMI will be administered before and after each game. Participant performance, measured by win-loss record and Blizzard score, will be compared between games played with and without the HMI.

6 Discussion

A feature that is being explored but has not yet been incorporated into the current interface is a more explainable version of the build order module. While confidence scores and model sensitivity can be estimated from the current battle aid modules, future implementations of the battle aid system will draw not only on traditional reinforcement learning concepts but also on causal and counterfactual reinforcement learning concepts in order to convey suggestions to users more easily. Recent work suggests that causality can play a role in increasing human users' understanding [2], while counterfactual reasoning would give the system a better way to explain why actions that seem correct in a user's eyes were not selected.

Though the work done in [2] did not find a significant impact on users' trust in an agent, future iterations of this battle aid will hopefully see a different outcome. Because trust is earned gradually rather than given all at once, the HAT battle aid produced in this work would need to build up a reputation of good suggestions over the course of several games in order to deserve a user's trust.

7 Summary

This paper presented a modular approach to creating deep learning agents for SC2 and an approach for converting those agents into battle management aids. Although the modules are still being improved, the initial work is cohesive and offers users a tool that may help them understand and perform better in real-time strategy games.