1 Introduction

Most literary works, including popular novels and major classic plays, are now published as electronic text. This enables researchers to apply information technology-based methodologies to the computer-aided study of literary works. Over the last few decades of literary analysis, scholars have noted that quantitative analysis, which exploits mathematical techniques, is necessary to unveil the plot of narratives. Some researchers focus on networks between characters or keywords that appear in a narrative because they advance the plot of the narrative. For example, in a pioneering study, F. Moretti at the Stanford Literary Laboratory stimulated literary scholars to quantify the plot using network theory [1]. That study represented a narrative as a character network consisting of vertices (nodes) and edges (links): the vertices represent characters and the edges represent the relationships between characters. Such network graphs provide an intuitive tool for interpreting the plot structure of a narrative. As big data analysis and computer visualization technology have advanced, this movement to extract character networks from narratives has expanded to a wide variety of genres (see the review by Labatut and Bost, 2019 [2]): Biblical social networks [3,4,5], epic poems (e.g., the Iliad [6]), novels (e.g., Harry Potter [7], Game of Thrones [8, 9], and British novels [10]), movies [11, 12], and a wide range of other genres [13,14,15,16].

Additionally, a number of quantitative analyses have been performed on William Shakespeare’s plays to understand their character networks. In the early years of such analyses, Spurgeon and Foster manually counted designated keywords in printed literature to study Shakespeare’s plays [17, 18]. Stiller et al. [19, 20] investigated the proportion of existing edges in the network and showed that characters are connected by a small number of degrees of separation. Sparavigna and Marazzato [21] visualized the character networks of Hamlet and Othello using Graphviz, an open-source graph visualization software package. They demonstrated some features of character networks and discussed the role of the central characters who advance the plot of the play. Masias et al. [22, 23] adopted social network analysis to investigate the centralities of key characters in Romeo and Juliet. Nalisnick and Baird [24] analyzed sentiment between characters in Shakespeare’s plays using the AFINN word list [25] and combined the results with social network analysis, which incorporated negative and positive sentiments between characters. Their approach succeeded in showing a two-sided separation among the characters in Hamlet using structural balance theory following Marvel et al. [26]. However, the results for Othello did not show a reasonable division using the theory. In our previous studies [27, 28], we resolved word counting in the time domain and used it directly to visualize the story structure. This method is applicable for visually understanding conflict structures in narratives such as William Shakespeare’s plays and Charles Dickens’ novels [29].

As shown in previous studies, the standard approach used to visualize the connections between play characters [e.g., 19–23] is simply based on counting natural conversations or the co-occurrences of the play characters. This standard approach can lead to confused interpretations of play characters’ relationships because most plays and dramas have been divided into acts and scenes since Horace (65–8 BC), a Roman poet, advocated that a play should consist of five acts [30]. It is also well known that Shakespeare’s plays have a five-act structure. Figure 1 shows a schematic example of interactions between two play characters (A and B) in “Story 1” (top row) and “Story 2” (bottom row). In “Story 1,” the two play characters do not interact with each other except in “ACT 1,” where they communicate five times at the beginning of the story (the number of communications N = 5). If we calculate the cross-correlation of their co-occurrences C, it has a negative value (C < 0). By contrast, in “Story 2” (the bottom row in Fig. 1), the two play characters interact with each other in every “ACT,” which means that N = 5 and C > 0. If we follow the standard approach, it is difficult to distinguish these two cases, even though the relationship is negative for “Story 1” and positive for “Story 2.” Because the dramatic structure (organization of a play) significantly influences the co-occurrences of play characters, we incorporate the idea of cross-correlation into our visualization analysis to understand the relationships between play characters through an entire story.
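The contrast between counting and correlating can be made concrete with a small numerical sketch. The appearance counts below are hypothetical values chosen only to mimic the two schematic stories (they are not data read from Fig. 1), and the helper names `cooccurrence_count` and `correlation` are illustrative.

```python
import numpy as np

def cooccurrence_count(a, b):
    """N: number of joint appearances, counted segment by segment."""
    return int(np.minimum(a, b).sum())

def correlation(a, b):
    """C: covariance normalized by the two standard deviations."""
    da, db = a - a.mean(), b - b.mean()
    return float((da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum()))

# Hypothetical appearance counts in ten story segments (two per act).
# Story 1: A and B meet five times in ACT 1, then appear in different acts.
story1_a = np.array([5, 0, 3, 3, 3, 3, 0, 0, 0, 0], dtype=float)
story1_b = np.array([5, 0, 0, 0, 0, 0, 3, 3, 3, 3], dtype=float)
# Story 2: A and B meet once in every act and never appear alone.
story2_a = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=float)
story2_b = story2_a.copy()

n1, c1 = cooccurrence_count(story1_a, story1_b), correlation(story1_a, story1_b)
n2, c2 = cooccurrence_count(story2_a, story2_b), correlation(story2_a, story2_b)
print("Story 1: N =", n1, " C =", round(c1, 2))
print("Story 2: N =", n2, " C =", round(c2, 2))
```

Both stories give the same count N, whereas the sign of C differs, which is the distinction the counting approach cannot express.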

Fig. 1 Dramatic structure of a play: “Story 1” (top row) vs. “Story 2” (bottom row)

As a result of recent progress in node-link visualization tools, not only experts but also non-experts in mathematical techniques can use user-friendly software for large graph visualization, such as Gephi [32], NodeXL [33], and Pajek [34], to analyze the networks between objects [2, 31]. Some visualization tools can process graphs with more than 10,000 nodes and exploit force-directed (or energy-based) graph drawing algorithms that automatically lay out the networks and the clustering of the nodes in graph space [35,36,37]. For instance, ForceAtlas2 for Gephi uses spring-electric forces as the attractive and repulsive forces between objects (nodes) [38]. Beveridge and Shan (2016) [8] applied this tool to visualize the social network of A Game of Thrones, the first novel in the series A Song of Ice and Fire. In their visualization using Gephi, they linked pairs of characters whenever their names appeared within 15 words of each other.
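The window-based linking rule described for [8] can be illustrated with a minimal sketch. This is not the code used in that study: the function name `window_edges`, the toy sentence, and the reading of “within 15 words” as a token distance of at most 15 are assumptions made here for illustration.

```python
from collections import Counter

def window_edges(tokens, names, window=15):
    """Count name pairs whose occurrences lie at most `window` tokens apart."""
    positions = [(i, tok) for i, tok in enumerate(tokens) if tok in names]
    edges = Counter()
    for a in range(len(positions)):
        i, u = positions[a]
        for b in range(a + 1, len(positions)):
            j, v = positions[b]
            if j - i > window:      # later occurrences are even farther away
                break
            if u != v:              # ignore a name co-occurring with itself
                edges[tuple(sorted((u, v)))] += 1
    return edges

# Toy input (hypothetical sentence, not taken from the novel).
tokens = "Jon spoke to Sam while far away Arya trained alone".split()
names = {"Jon", "Sam", "Arya"}
print(window_edges(tokens, names, window=3))
```

The resulting counter can serve as a weighted edge list for a graph tool; a smaller window keeps only names that appear close together.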

In these methodologies reported to date, the visualization process always needs to be optimized by users using trial-and-error operations to satisfactorily reach understandable results for third parties. The trial usually includes changing parameters in pre-defined functions such as force amplitude and iteration number to assess numerical convergence. Furthermore, this optimization process depends on the target of the literary genre because of the difference in the number of play characters in each story and the story structure being targeted. Unlike researchers in many of the previous studies, we attempt to exclude these subjective processes by proposing another force-based method that uses potential theory for graphically visualizing the mutual relationships among the characters in a play. Our algorithms include attractive and repulsive forces between two characters (nodes) in a space with a cyclic boundary condition. Each force between two characters is defined by a common function of the cross-correlation between them based on their appearances in a plot line, and we calculate their neutral arrangement in a plot space from the forces acting among all the characters. These definitions allow us to obtain a mathematically unique solution in the plot space without manual trial-and-error tests, thereby promoting the most objective evaluation of the characters’ relationships.

In the present paper, we explain how we determine the mutual relationship between an arbitrary pair of characters, and how we visualize all individual relationships among characters. We also analyze the relationship among three characters to find unique triangular relationships hidden in the narratives. The motivation of our research is not only to design mathematical procedures for text processing but also to create striking visualizations from which people can immediately understand the story structure. Therefore, we demonstrate the performance of this method using internationally well-known literary works: ten famous Shakespeare plays. Because these plays involve many play characters with various complex relationships for the enjoyment of the audience, we can validate the proposed method against conventional interpretations presented to date.

2 Text analysis method

In computational linguistics, several algorithms for processing complex natural languages have been proposed, which are mainly based on pattern recognition techniques. Neural networks, support vector machines, and deep learning for big data analysis are currently popular topics [39, 40]. By contrast, our method takes an orthodox approach to establish a basic structure for visualization by analyzing the time-series functionalization of words. Our method consists of three parts, as shown in Fig. 2.

Fig. 2 Flow chart of our method for visualizing play characters’ mutual relationships

The preprocessing routine shown on the left of the figure loads the target text, decomposes it into single words, and creates a list of the words. This part is based on a common word extraction algorithm used in many text-analysis programs, including that in our previous analysis [27]. Our present study starts at the first routine shown on the right of the figure with the selection of play characters from the list of words. The second routine creates the graphical visualization.

2.1 Play character appearances

In the preprocessing routine, we store all the words in w(i) in order of appearance, i. At the end of the routine, we list all the words in the target text in two array variables, L(k) and M(k), where k is an index for the order of detection, L is a list that consists of each unique word, and M lists the number of times that the word L(k) is used in the entire text. For example, if the first word in the text is “Romeo” and it appears 100 times in the story, the routine outputs are L(1) = “Romeo” and M(1) = 100. Note that not only play characters’ names but also other types of words (e.g., verbs, adjectives, and link words) are listed in the array L [29]. In this study, we manually select the names of play characters from the characters list provided by the Open Source of Shakespeare website (Footnote 1).
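The construction of w, L, and M can be sketched as follows. This is a minimal illustration rather than the authors' preprocessing routine: the tokenizing regular expression and the function name `word_lists` are assumptions, and Python lists are 0-indexed, so L(1) in the text corresponds to `L[0]` here.

```python
import re

def word_lists(text):
    """Return w (all words in order), L (unique words in order of first
    detection), and M (number of occurrences of each word in L)."""
    w = re.findall(r"[A-Za-z']+", text)   # assumed tokenizer: letters and apostrophes
    index = {}                            # word -> position in L
    L, M = [], []
    for word in w:
        if word not in index:
            index[word] = len(L)
            L.append(word)
            M.append(0)
        M[index[word]] += 1
    return w, L, M

w, L, M = word_lists("Romeo loves Juliet and Juliet loves Romeo")
print(L)
print(M)
```

Selecting the play characters then amounts to keeping the entries of `L` that match a manually supplied list of names.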

Table 1 shows the number of appearances of play characters in five Shakespeare plays, sorted in descending order of M. In each play, the main characters have the highest number of occurrences, which is consistent with the fact that the plays were originally created for theater entertainment. Table 1 only lists the 12 most frequently mentioned characters, but the actual number of characters in each play is more than 15; for example, Macbeth has 40 characters in total.

Table 1 Play characters in five Shakespeare plays and their number of appearances

To evaluate a single play character’s appearances in a storyline, we define the following function:

$$ \delta \left( {k,i} \right) = \begin{cases} 1 & \text{if } w(i) = L(k) \\ 0 & \text{if } w(i) \ne L(k) \end{cases}, $$
(1)

where the variables k and i denote the index of each proper noun and the order of nouns in the entire text over time (from i = 1 to \(n_{t}\), the total number of nouns), respectively. For example, w(1) denotes the first noun (string) in the story. The function defined by Eq. (1) is the two-dimensional (2D) distribution of each play character L(k) along the word appearance order, i. Next, we convert order i, which is an integer value, to a real number t, which represents the story progress from t = 0 until the end (t = T) as follows:

$$ t = \frac{i - 1}{{n_{t} - 1}}T, \quad i = 1,2,3, \cdots ,n_{t} . $$
(2)

We can convert the binary function in Eq. (1) into a discontinuous delta function \(\delta \left( {k,t} \right)\). We compute a play character’s appearance frequency f (k, t) from the delta function using the following conversion:

$$ f\left( {k,t} \right) = \int_{0}^{T} {\delta \left( {k,\tau } \right)\max \left\{ {1 - \frac{{\left| {t - \tau } \right|}}{\lambda },0} \right\}} \, d\tau , $$
(3)

where τ is the intermediate variable for the time integral. Equation (3) provides a cumulative distribution of single pulses, which correspond to the appearances of play characters, represented as a continuous function. We use a triangular pulse with a width of λ, which is equivalent to the wavelength of an averaging filter. When the total number of play characters’ appearances is N, we take the smallest value of λ from sampling theory using λ/T = 50/N. For a narrative with a total of 1,000 appearances of play characters, for example, the lower limit of λ /T is 50/1,000 = 1/20, which means that 20 steps can be used to divide the story from the start to the end. Because all the target texts used in this study satisfy N > 1,000, we set λ/T = 1/20 in all cases.
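Equations (1)–(3) can be sketched numerically as a sum of triangular pulses, one per appearance of the character. The sketch below assumes that the pulse centered on an appearance at time τ is max(1 − |t − τ|/λ, 0) and that the integral over delta functions reduces to a sum over appearance times; the function name `appearance_frequency` and the sampling grid are illustrative choices, not part of the original method description.

```python
import numpy as np

def appearance_frequency(w, name, lam_over_T=1 / 20, T=1.0, n_samples=201):
    """Sampled f(k, t) for the character `name` in the word sequence w."""
    n_t = len(w)                                   # needs at least two words
    # Eq. (2) with a 0-based index: story time of every word position.
    t_words = np.arange(n_t) / (n_t - 1) * T
    # Eq. (1): keep the times at which w(i) equals the character name.
    hits = t_words[np.array([word == name for word in w])]
    t = np.linspace(0.0, T, n_samples)
    lam = lam_over_T * T
    # Eq. (3): sum of triangular pulses (assumed shape, see lead-in).
    pulses = np.maximum(1.0 - np.abs(t[:, None] - hits[None, :]) / lam, 0.0)
    return t, pulses.sum(axis=1)

# Hypothetical 21-word text in which "Romeo" appears once, in the middle.
words = ["x"] * 21
words[10] = "Romeo"
t, f = appearance_frequency(words, "Romeo")
print(t[np.argmax(f)], f.max())
```

Overlapping pulses add up, so densely repeated appearances produce a higher value of f than isolated ones.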

Figure 3 shows the example results of play character appearance frequencies obtained by Eqs. (1)–(3). In Shakespeare’s tragedy Romeo and Juliet, (a) the play characters Nurse and Peter often appear in the same scenes, (b) Tybalt and Capulet appear to be independent of each other, and (c) Benvolio and Juliet appear according to different distributions as if they were avoiding each other in the storyline.

Fig. 3 Frequencies of play character appearances in Romeo and Juliet

3 Cross-correlation analysis

3.1 Pairwise correlation

After we obtain all the appearance functions of individual characters in a narrative, we can quantify any two-character relationships using the following cross-correlation function:

$$ C_{ij} = \frac{{\int {\left[ {f_{i} (t) - \overline{f}_{i} } \right] \cdot \left[ {f_{j} (t) - \overline{f}_{j} } \right]dt} }}{{\sqrt {\int {\left[ {f_{i} (t) - \overline{f}_{i} } \right]}^{2} dt \cdot \int {\left[ {f_{j} (t) - \overline{f}_{j} } \right]}^{2} dt} }}, $$
(4)

where the indices i and j denote two characters (they are distinct from the word order i used in Eq. (1)). Equation (4) gives the cross-correlation coefficient, that is, the covariance between the two functions normalized by the square root of the product of their variances. The overbar represents the mean. Coefficient C lies in the range |C| ≤ 1 and becomes unity for i = j. The case C > 0 indicates that two play characters coappear in the same scenes while interacting with each other. The case C < 0 denotes the isolation of play characters with respect to each other, which appears to be more repulsive than a neutral relationship.
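Equation (4) can be evaluated directly on sampled appearance functions. In the sketch below the functions are assumed to be sampled on a uniform time grid, so the integration step cancels between numerator and denominator; the names `cross_correlation` and `correlation_matrix` are illustrative.

```python
import numpy as np

def cross_correlation(f_i, f_j):
    """C_ij of Eq. (4) for two uniformly sampled appearance functions."""
    d_i = f_i - f_i.mean()
    d_j = f_j - f_j.mean()
    return float(np.sum(d_i * d_j) / np.sqrt(np.sum(d_i ** 2) * np.sum(d_j ** 2)))

def correlation_matrix(F):
    """All pairwise C_ij; F has shape (n_characters, n_samples)."""
    D = F - F.mean(axis=1, keepdims=True)
    norms = np.sqrt((D ** 2).sum(axis=1))
    return (D @ D.T) / np.outer(norms, norms)

# Hypothetical appearance functions used only to exercise the functions.
t = np.linspace(0.0, 1.0, 200)
F = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t ** 2])
C = correlation_matrix(F)
print(np.round(C, 2))
```

On a uniform grid this coincides with the Pearson correlation, so `np.corrcoef(F)` returns the same matrix. A character whose appearance function is constant has zero variance and must be excluded before the division.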

As an example, Table 2 shows the cross-correlation coefficient matrix for the 10 most frequently appearing characters in Romeo and Juliet. The italic and bold numbers indicate the highest and lowest correlations in each column except for the diagonal line on which the correlation becomes unity. For instance, if we consider Romeo’s column, the most positive character is Benvolio (0.45) and the most negative character is Capulet (−0.27), who is known to be against Romeo’s family, the Montagues. Romeo also has a large positive correlation with Mercutio (0.41). Moreover, regarding Juliet’s column, she is close to Nurse (0.24), but she rarely appears with Benvolio (−0.42). Another remarkable combination with a positive correlation is detected for Benvolio and Mercutio, who are Romeo’s friends. A notable pair with a negative correlation is Friar and Benvolio. The pair can be understood as representing opposing attitudes toward the love between Romeo and Juliet: Friar supports Romeo’s love of Juliet, whereas Benvolio does not. They rarely appear in the same scene in the play and each of them interacts with Romeo in different scenes. Note that the computational time taken to obtain these matrix components was approximately a minute on a laptop computer (CPU 1.1 GHz, RAM 8 GB, OS 64 bit). Note also that play characters that appeared fewer than 10 times in the play were removed from the cross-correlation analysis because the denominator became too small for statistical purposes. This is one of the limitations of the proposed method. However, play characters that appear only a few times are regarded as minor supporting roles that have less effect on the main story frame, and we ignored them in this study.

Table 2 Cross-correlation coefficient matrix obtained for Romeo and Juliet

3.2 Triangular correlation

We can extend the above approach to the analysis of a triangular relationship using the following formula:

$$ T_{ijk} = \frac{{\int {\left[ {f_{i} (t) - \overline{f}_{i} } \right] \cdot \left[ {f_{j} (t) - \overline{f}_{j} } \right] \cdot \left[ {f_{k} (t) - \overline{f}_{k} } \right]dt} }}{{\sqrt {\int {\left[ {f_{i} (t) - \overline{f}_{i} } \right]}^{2} dt \cdot \int {\left[ {f_{j} (t) - \overline{f}_{j} } \right]}^{2} dt \cdot \int {\left[ {f_{k} (t) - \overline{f}_{k} } \right]}^{2} dt} }}, $$
(5)

where suffix k denotes the third character in addition to the two characters i and j. When the play contains a key character who governs the pairwise relation between two characters, Eq. (5) has a significant value. Thus, we can identify hidden relationships organized by three characters in storylines using triangular correlation analysis. In fact, such a relation is commonly adopted to make the story entertaining by enriching characters’ relationships.

Figure 4 shows the triangular correlation of all the characters in Romeo and Juliet. We computed the abscissa as a predicted correlation using the following function:

$$ C_{ijk} = C_{ij} \cdot C_{jk} \cdot C_{ki} , $$
(6)

where Cij, Cjk, and Cki are the pairwise correlations defined by Eq. (4); that is, we predicted the relationship among the three characters as a simple product of three pairwise correlations. Therefore, by comparing Tijk with Cijk, we can find which triangular relationships are substantial in the storyline relative to the predicted correlation. Considering the result in Fig. 4, the combination of Tybalt–Prince–Benvolio has a high value above the line of Tijk = Cijk, which suggests that these three characters interlock the frame of the story. By contrast, the combination of Romeo–Page–Balthasar has the largest negative value, whereas it has a positive predicted correlation. This means that a simple triple product of the pairwise correlations cannot extract the triangular relation. The combination of Romeo–Juliet–Nurse has a value lower than the line Tijk = Cijk, which implies that the three characters tend to appear in pairs rather than all together in each scene of the story.
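The comparison between Eq. (5) and Eq. (6) can be sketched as follows. Unlike in Eq. (4), the integration step does not cancel in Eq. (5) (the numerator carries one factor of dt and the denominator dt to the power 3/2), so the sketch takes the sampling interval `dt` explicitly and approximates each integral by a Riemann sum. The function names and the test signals are hypothetical.

```python
import numpy as np

def triangular_correlation(f_i, f_j, f_k, dt):
    """T_ijk of Eq. (5), with each integral approximated by sum(...) * dt."""
    d_i, d_j, d_k = (f - f.mean() for f in (f_i, f_j, f_k))
    num = np.sum(d_i * d_j * d_k) * dt
    den = np.sqrt((np.sum(d_i ** 2) * dt)
                  * (np.sum(d_j ** 2) * dt)
                  * (np.sum(d_k ** 2) * dt))
    return float(num / den)

def predicted_correlation(C, i, j, k):
    """C_ijk of Eq. (6): product of the three pairwise correlations."""
    return float(C[i, j] * C[j, k] * C[k, i])

# Hypothetical signals on a uniform grid over one story (T = 1).
t = np.arange(100) / 100
dt = 0.01
s = np.sin(2 * np.pi * t)       # symmetric about its mean
g = (t < 0.1).astype(float)     # one short burst of joint appearances

C = np.corrcoef(np.vstack([s, s, s]))   # pairwise C_ij on a uniform grid
print(triangular_correlation(s, s, s, dt), predicted_correlation(C, 0, 1, 2))
print(triangular_correlation(g, g, g, dt))
```

Three identical symmetric signals have a predicted correlation of one but a triangular correlation near zero, whereas a short shared burst gives a large positive value, which illustrates why the triple product alone cannot stand in for Eq. (5).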

Fig. 4 Triangular correlation of three characters Tijk in Romeo and Juliet compared with the predicted correlation Cijk

Another example is shown in Fig. 5 for Othello. The highest value is recorded by the combination of Lodovico–Gratiano–Cassio. These characters are gearing up to fight each other in the story of Othello. The lowest correlation is the combination of Othello–Iago–Desdemona. This corresponds well with the story, in which the mutual relationship between Othello and his young wife Desdemona is threatened by Iago’s various ruses. Consequently, we can extract the triangular relationship among the play characters as a positive value for their collusion and as a negative value for their division.

Fig. 5 Triangular correlation of three characters Tijk in Othello compared with the predicted correlation Cijk

4 Graphical imagery

4.1 Physical potential model

Standard projection techniques exist that are used to visualize correlations among keywords with high information dimensions onto a 2D visualization space: node-link representations and adjacency matrix with a heat map. In this study, we adopt node-link visualization because it has an advantage over matrix-based visualization in terms of readability if the number of vertices is smaller than 20 [41].

To visualize the mutual relationships among play characters (keywords) using node-link representations, we calculated their equilibrium arrangement using the metaphor of a physical potential model (PPM), whose forces are functions of the cross-correlation coefficients obtained in the previous section. The basic idea of the PPM is simple: we place two play characters close to each other if they have a positive correlation to depict an intimate relationship, whereas we place them far from each other if they have a negative correlation. Therefore, the relationship between two play characters is expressed as their distance in 2D space. A cross-correlation around zero (called null correlation) means that almost no relation exists between play characters, whereas a negative correlation between two play characters means that they avoid each other along the storyline. Such a pair of characters needs to be separated intentionally using repulsive forces. Note, however, that this does not immediately suggest antagonism between the characters. Because the number of play characters is finite, some pairs have a negative correlation by chance, without the writer having intended it. This is a feature of statistics because the number of samples is limited. By contrast, a large negative value for a cross-correlation implies opposition, which can be explained by several reasons, including antagonism. We demonstrate this for each play in later sections. Another noteworthy aspect of our PPM, which exploits both attractive and repulsive forces to represent the relationship, is that it is different from classical spring models, such as radial visualization (RadViz) [42,43,44] and the enhanced models [45, 46]. These classical spring models use a static force balance based on an elastic spring mechanism as a physical metaphor to visualize the correlations between objects (e.g., documents) and dimensions (e.g., certain keywords).

Figure 6 illustrates how our physical dynamics model works. For three characters A, B, and C, we use the forces acting between two characters to represent two-character relationships: A–B, B–C, and A–C (see Fig. 6a). If A–B is a close relationship, as indicated by a positive cross-correlation coefficient, we decrease the distance between A and B. Because a negative cross-correlation is provided for A–C, we increase the distance between A and C. For the case of an almost null correlation (B–C), no force acts between B and C. We iterated this displacement using virtual time steps and the locations of the three play characters converged to the configuration shown in Fig. 6b. To perform this operation computationally, we use the following equations:

$$ \overrightarrow {{X_{i} }} = \overrightarrow {{X_{i} }} + \overrightarrow {{F_{i} }} \cdot \Delta t, \quad \overrightarrow {{F_{i} }} = \sum\limits_{j = 1,\, j \ne i}^{n} {\frac{{\overrightarrow {{X_{j} }} - \overrightarrow {{X_{i} }} }}{{\left| {\overrightarrow {{X_{j} }} - \overrightarrow {{X_{i} }} } \right|}}} \cdot F_{ij} , $$
(7)

where Xi denotes the position vector of the play character labeled i and Δt is a virtual time step. Fi is the force acting on the play character, which is the sum of the elementary forces acting on i from the other n − 1 play characters j, where n denotes the total number of play characters. The magnitude of the elementary force is modeled as

$$ F_{ij} = \max \left( {C_{ij} ,0} \right) \cdot \frac{1}{{r^{2} }} - \max \left( { - C_{ij} ,0} \right) \cdot \frac{1}{{r^{2} }} - \frac{a}{{r^{4} }},\begin{array}{*{20}c} {} & {} \\ \end{array} r = \left| {\overrightarrow {{X_{j} }} - \overrightarrow {{X_{i} }} } \right|, $$
(8)

where Cij is the cross-correlation coefficient computed by Eq. (4) and r is the distance between the positions of two play characters i and j in the space. The first term is an attractive force that acts when Cij > 0, which weakens with respect to the distance squared between the two play characters. The second term expresses the repulsive force at Cij < 0, which attenuates inversely as a function of distance. The third term is a local isotropic repulsive force that prevents two play characters from being placed at the same location, where a is a control parameter that indicates how much distance is maintained. We note that Eq. (8) mimics the Lennard–Jones potential, which describes the potential energy of the interaction between two non-bonding atoms (or molecules) based on their distance. To represent the dynamics between two characters, we use a factor of \(1/r^{2}\) for the attractive force (Newtonian potential) and \(1/r\) for the repulsive force (logarithmic potential). With this combination, all the play character objects move in an understandable manner that expresses the mutual relationships among them.
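One iteration of Eqs. (7) and (8) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the values of `a` and `dt` are hypothetical, the cyclic boundary is omitted, and the exponents of the attractive and repulsive terms are exposed as parameters (defaulting to the \(1/r^{2}\) written in Eq. (8)) so that the \(1/r\) repulsion mentioned in the text can also be tried with `p_repel=1`.

```python
import numpy as np

def pair_force(c, r, a=0.01, p_attract=2, p_repel=2):
    """Signed magnitude F_ij of Eq. (8); positive values pull i toward j."""
    attract = max(c, 0.0) / r ** p_attract
    repel = max(-c, 0.0) / r ** p_repel
    core = a / r ** 4               # short-range repulsion against overlap
    return attract - repel - core

def step(X, C, dt=0.01, a=0.01, p_attract=2, p_repel=2):
    """One explicit update of Eq. (7) for all characters (no cyclic boundary)."""
    X = np.asarray(X, dtype=float)
    F = np.zeros_like(X)
    n = len(X)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue            # a character exerts no force on itself
            d = X[j] - X[i]
            r = np.linalg.norm(d)
            F[i] += d / r * pair_force(C[i, j], r, a, p_attract, p_repel)
    return X + F * dt

# Two hypothetical characters at unit distance.
X0 = np.array([[0.0, 0.0], [1.0, 0.0]])
C_pos = np.array([[1.0, 0.5], [0.5, 1.0]])     # positive correlation
C_neg = np.array([[1.0, -0.5], [-0.5, 1.0]])   # negative correlation
print(np.linalg.norm(np.diff(step(X0, C_pos), axis=0)))
print(np.linalg.norm(np.diff(step(X0, C_neg), axis=0)))
```

For a positively correlated pair, the attraction and the short-range repulsion balance at r = sqrt(a / Cij) when both terms use the default exponents, which shows how the control parameter a sets the minimum separation.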

Fig. 6 Physical dynamics model used to depict play character relationships using cross-correlation coefficients

Our PPM also represents the influence of a single play character with respect to other play characters. Figure 6c shows how multiple forces are changed as a fourth play character D joins. The total number of forces to be calculated is n (n − 1)/2, where n is the number of play characters. In Fig. 6c, for example, we consider six forces to obtain the local balance of the play characters (n = 4). We consider all these interactions between the nearest neighbors in a space with cyclic boundary conditions. The use of a cyclic boundary also confines all the play characters’ positions to a single domain.
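A common way to realize interactions between nearest neighbors under a cyclic boundary is the minimum-image convention, sketched below. Whether the original implementation uses exactly this convention is an assumption, as are the unit box size and the function names.

```python
import itertools
import numpy as np

def periodic_displacement(x_i, x_j, box=1.0):
    """Shortest vector from x_i to x_j in a square domain of side `box`
    with cyclic boundaries (minimum-image convention)."""
    d = np.asarray(x_j, dtype=float) - np.asarray(x_i, dtype=float)
    return d - box * np.round(d / box)

def wrap(X, box=1.0):
    """Map positions back into the single domain [0, box)."""
    return np.mod(X, box)

def n_pairs(n):
    """Number of pairwise forces among n characters, n (n - 1) / 2."""
    return len(list(itertools.combinations(range(n), 2)))

print(periodic_displacement([0.05, 0.5], [0.95, 0.5]))
print(wrap(np.array([[1.2, -0.1]])))
print(n_pairs(4))
```

With this displacement in place of the plain difference of positions, two characters near opposite edges of the domain interact as close neighbors, and wrapping after each update keeps every character inside the same domain.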

Figure 7 shows how each play character moves during the computation of the model for Romeo and Juliet. Note that 20 characters appear in this narrative. We plotted these characters in a regular arrangement at the initial condition, t = 0. The size of the circle for each character corresponds to the character’s number of appearances throughout the narrative (e.g., the circle for the main character Romeo is the largest, followed by the circle for Juliet). The solid lines indicate the attractive forces, while the broken lines represent the repulsive forces. As the time step increases, both forces displace the play characters to reduce the individual magnitude of the forces between each pair. At the converged state at t = 1000, a specific arrangement takes place that no longer changes, that is, the relationships among the play characters reach a state of equilibrium. The number of iterations depends on the virtual time step and the number of play characters. In most cases, we confirmed that 1000 was sufficient.

Fig. 7 Numerical convergence of model analysis for Romeo and Juliet

Next, we evaluate our visualization method. Figure 8 shows the effect of the initial arrangement of nodes (characters) in Romeo and Juliet. We obtained each panel by calculating the force balance among nodes that are randomly arranged in the initial state. The main character Romeo appears the most frequently and is indicated in red. Other characters are colored according to the number of appearances in the play, classified into three levels: yellow, green, and blue. In the 2D plot, the solution of the force balance is not exactly unique if the number of characters is greater than 4 (we discuss this in Sect. 4.3). Despite this, the final arrangements that we obtained for the main characters are geometrically similar, such as for the relation among Romeo, Juliet, and Nurse; only the positions of minor characters, who do not appear many times in the story, are affected by the initial arrangement.

Fig. 8 Effect of the initial arrangement for Romeo and Juliet

Figure 9 shows the effect of the function used for the attractive (the first term) or repulsive (the second term) force in Eq. (8). If we adopt the function \(r^{-1}\), it takes a long time to reach the converged state because the nodes (characters) move around widely in the plot space. By contrast, it is difficult to determine the relationships between key characters when the force functions are \(r^{-3}\) and \(r^{-4}\) because they do not fully interact with each other once they are placed apart; that is, the final arrangement of the characters strongly depends on their initial positions when a higher-order potential is adopted. In our assessment, the best function is \(r^{-2}\), which is analogous to the potential of gravity or Coulomb forces in nature.

Fig. 9 Comparison of different force functions: a \(r^{-1}\), b \(r^{-2}\), c \(r^{-3}\), and d \(r^{-4}\) in Eq. (8) for Romeo and Juliet

We also examined Jacomy’s model [38] for Romeo and Juliet (Fig. 10). Figure 10a was obtained when the repulsive force between characters was a function of r−2 (not weighted) without the cyclic boundary condition (this corresponds to the model of the spring-electric layout in Fig. 1 (center) in the paper of Jacomy et al. 2014 [38]). In this case, once key characters are positioned far from the center, they do not interact with each other. This means that the initial condition of their arrangement influences the final arrangement more sensitively in Jacomy’s model. If we apply cyclic boundary conditions for Jacomy’s model, the key characters would be positioned around the center (Fig. 10b) because the repulsion caused by negative correlation is weakened by the global attractive potential.

Fig. 10 Examination of Jacomy’s model for Romeo and Juliet

4.2 Results and discussion

Using the proposed method, we visualized ten Shakespeare plays. The visualization results depict the attractive and repulsive forces of the characters’ relationships. The main character in each play is highlighted in red to make it easier to understand the relationships between the major and supporting roles. Generally, Shakespeare’s plays are divided into three genres: tragedy, comedy, and history. Figures 11, 12 and 13 show the results of play characters for tragedies, comedies, and histories, respectively.

Fig. 11 Visualization results of play characters for Shakespeare’s tragedies

Fig. 12 Visualization results for play characters in Shakespeare’s comedies

Fig. 13 Visualization results of play characters for Shakespeare’s history plays

4.2.1 Tragedy

4.2.1.1 Hamlet

Figure 11a shows the narrative of Hamlet, where the main character Hamlet is closely accompanied by several people, that is, Polonius, Rosencrantz, Guildenstern, Gertrude, and Laertes, whereas his young fiancée Ophelia stands at the edge of the group and his friend Horatio appears alone on the opposite side. Fortinbras, who will be chosen as the next king in the narrative, also appears alone.

4.2.1.2 Othello

In the narrative of Othello (Fig. 11b), his wife Desdemona always appears with him. At the bottom, Iago, who is a key character that ruins their relationship, stands apart from the group and interacts with Othello. Note that Brabantio, Desdemona’s father, is placed at the edge together with Emilia, Iago’s wife.

4.2.1.3 King Lear

The visualization result in Fig. 11c clearly indicates the different behaviors of the three daughters of the main character Lear. The two daughters Regan and Goneril appear at the top, who convince their father with false flattery to obtain a large territory. By contrast, the youngest daughter, Cordelia, appears at the bottom, who is different from her two sisters and always loyal to Lear.

4.2.1.4 Macbeth

For the story of Shakespeare’s Macbeth, the resulting character distribution is more extreme, as shown in Fig. 11d. The main character Macbeth is caught in the middle between Macduff and Duncan, who both strongly influence the destiny of Macbeth. At the center of another cluster is Lady, Macbeth’s wife, who also directly influences his life. The three witches in Macbeth appear only at the beginning of the narrative and their total number of appearances is fewer than ten, which is insufficient to statistically visualize their roles using the present approach. This results in their absence from the figure, which is another limitation of the present statistical approach.

4.2.1.5 Romeo and Juliet

Figure 11e shows the result for Romeo and Juliet. Three groups are identified: Romeo’s group, Juliet’s group, and Benvolio’s group. This corresponds well to the scenes in the play. The Montagues and Capulets are two households feuding with each other, but they belong to the same group because they coappear often on stage in the play.

As described above, in these tragedies, the existence of main characters is highlighted more than in the other categories, that is, comedies and histories, because their frequencies of appearances are dominant: the main characters in tragedies predominantly govern the entire story. Each group is visually clustered because the story in tragedies is simple, and the relationships between the play characters are easy to understand for both the reader and audience.

4.2.2 Comedy

4.2.2.1 Twelfth Night

Figure 12a shows the results for a comedy, where clusters are formed. The boy and girl twins, Sebastian and Viola, encounter a countess, Olivia, and a duke, Orsino, and fall in love with them, respectively. As shown in the result, Viola is close to Orsino and Sebastian is close to Olivia. In the other cluster, another love story appears, where Toby, Olivia’s uncle, loves Maria, her maid.

4.2.2.2 A Midsummer Night’s Dream

This scenario is relatively complicated because the love stories of many characters progress simultaneously. Our analysis determined three clusters, as shown in Fig. 12b. In the two clusters at the bottom, four young people, that is, Demetrius, Lysander, Helena, and Hermia, are related to each other. Titania, the queen of the fairies, appears solely near the top cluster and falls in love with Bottom because of Puck’s magic in the story.

4.2.2.3 The Merchant of Venice

Figure 12c represents the strong relationships among the play characters. The main character Bassanio sticks to Antonio, his friend and the merchant of Venice. At the center, Shylock, a rich Jew, influences Bassanio, Antonio, and Portia, who is Bassanio’s fiancée. At the edge of the cluster, Lancelot and Gobbo, Shylock’s servants, and Nerissa, Portia’s maid, support their master and mistress.

The visualization results show that comedies feature more main characters, with many love stories progressing in parallel. Even though the play characters are softly clustered, the stories of comedies are complicated and have overlapping clusters. This suggests that the relationships among the play characters are relatively obscure, which deepens the narrative and attracts readers and the audience.

4.2.3 History

4.2.3.1 Julius Caesar

Figure 13a shows the result for a history play. The main character Caesar appears at the center and is caught in the middle between Brutus and Cassius, who are conspirators against Caesar. In this story, Octavius, Lepidus, and Antony become the triumvirs in ancient Rome after Caesar is killed by Brutus. They form a triangle and surround Caesar’s cluster, where Octavius appears at the top of Caesar’s cluster, Lepidus is shown at the edge of the left cluster, and Antony is located at the bottom.

4.2.3.2 King Richard III

The visualization result in Fig. 13b depicts the complicated relationships surrounding the main character, Richard. Note that Richard is close to Richmond, the next king after Richard’s death. In the right cluster, Richard’s enemies, Edward, Margaret, and Anne, gather, whereas his favorite characters, Elizabeth and Buckingham, appear in the left cluster. Many sub-characters are scattered under Richard, and their aim is to entertain readers.

In these results, the main characters in history plays are easily distinguishable because of the high frequency of their appearances. However, the number of play characters in history plays is larger than that in tragedies. This results in diffuse clusters and can confuse readers until they fully understand how individual characters influence the storyline of the narrative. This is regarded as a limitation of the present visualization, which uses a 2D mapping of characters.

4.3 Elimination of a key character

The proposed method can also visualize the arrangement of play characters when a specific character is virtually eliminated. Moretti [1] and Sparavigna [21] visualized a character network for Hamlet, and then removed a main character (“Hamlet” or “Claudius”) from the network to show the main character’s influence on the character network. They simply removed the main character’s links from the network; however, we recalculate the force balance among all the characters after the elimination of a key character. Because we obtain all the cross-correlations between two-character relationships independently, eliminating a key character can quantify the influence that character has on other characters in the narrative. This allows us to reveal the role and power balance of play characters in a narrative and offer alternative new insights.
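The elimination step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `remove_character`, the nested-list matrix layout, and the numeric values are hypothetical. Because each pairwise value is obtained independently, eliminating a character only drops that character's row and column; the values among the remaining characters are unchanged, and the force balance would then be recomputed on the reduced matrix.

```python
def remove_character(names, corr, target):
    """Drop `target` from a pairwise matrix (hypothetical helper).

    names : list of character names, in matrix order
    corr  : square nested list, corr[i][j] = pairwise value for names[i], names[j]
    Returns the reduced name list and the reduced matrix.
    """
    if target not in names:
        raise ValueError(f"unknown character: {target}")
    # Indices of the characters that remain after the elimination.
    keep = [i for i, name in enumerate(names) if name != target]
    reduced_names = [names[i] for i in keep]
    # Pairwise values among the remaining characters are copied unchanged.
    reduced = [[corr[i][j] for j in keep] for i in keep]
    return reduced_names, reduced


# Illustrative values only; they are not taken from the paper.
names = ["Lear", "Goneril", "Regan"]
corr = [
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.3],
    [0.4, 0.3, 1.0],
]
reduced_names, reduced = remove_character(names, corr, "Lear")
```

The layout computation would then be rerun on `reduced`, so every remaining character settles into a new equilibrium rather than keeping its old position with the links simply deleted.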

4.3.1 Removing “Lear” from King Lear

The main story is described using the different behaviors of Lear’s three daughters, Goneril, Regan, and Cordelia. When the main character Lear is eliminated, the visualization result changes from Fig. 11c to Fig. 14a. Because of the absence of Lear, these three daughters further separate and form a triangle. In particular, Goneril and Regan are substantially separated from each other. This implies that the existence of Lear keeps these two daughters closer and makes them behave more similarly in the original narrative. Another effect that we can observe is the clustering of play characters around Gloucester, including his two sons, Edgar and Edmund. Thus, a new love story could begin between the daughters and sons or a new war story could begin between Gloucester and his two sons.

Fig. 14 Play character arrangements if key characters are removed

4.3.2 Removing “Iago” from Othello

As discussed above, Iago is a key character who ruins the relationship between Othello and Desdemona in the story. When we removed Iago, the visualization result changed from Fig. 11b to Fig. 14b. It seems that Othello continues to love Desdemona; however, Brabantio, Desdemona’s father, could disturb their love instead of Iago. Furthermore, Emilia, Desdemona’s lady-in-waiting and Iago’s wife, may be the next key character in the story because Emilia is depicted as an independent and attractive woman in the story. From this interpretation, we can hypothesize that the narrative would change from a tragic story into a comical love story.

5 Conclusions

We developed a method to visualize the mutual relationships among play characters using electronic text and targeted Shakespeare’s plays. We first extracted the frequency of appearances of individual play characters in the time domain. This conversion to numerical values is valid when researchers target plays because plays consist of words in a time sequence. Then, we calculated the cross-correlation values for all pairs of characters and also the triangular relationships hidden in the story. To present the results visually, we used node-link representations based on a PPM to simulate the attraction and repulsion forces between two play characters. A novelty of our approach is the objectivity of the visualized results ensured by the cyclic boundary condition used in the plot space. This warrants a dynamic equilibrium state among all the play characters that interact with all others. We used the proposed method to visualize ten of Shakespeare’s plays. The results showed that the play characters’ relationships depend on the type of narrative, that is, tragedy, comedy, or history. In our demonstration, we appropriately extracted and visualized known relations among the major characters. We also applied the method while artificially excluding key characters from the story, which allowed us to evaluate the influence of those characters and indicate the possibility of new Shakespearean criticism.
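The first two steps summarized above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the implementation used in this study: the play is assumed to be reduced to an ordered list of character appearances, the time domain is split into equal-width windows, and a zero-lag normalized correlation stands in for the cross-correlation. The function names `appearance_series` and `zero_lag_correlation` and the toy input are hypothetical.

```python
from math import sqrt


def appearance_series(appearances, n_bins):
    """Count each character's appearances in n_bins equal-width time windows.

    appearances : character names in order of appearance (assumed input format)
    """
    total = len(appearances)
    series = {}
    for position, name in enumerate(appearances):
        window = position * n_bins // total  # index of the time window
        series.setdefault(name, [0] * n_bins)[window] += 1
    return series


def zero_lag_correlation(x, y):
    """Normalized correlation of two equal-length series at zero lag."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no co-appearance signal
    return cov / sqrt(var_x * var_y)


# Toy sequence: A and B share the first half, C fills the second half.
toy = ["A", "B", "A", "B", "C", "C", "C", "C"]
series = appearance_series(toy, n_bins=2)
r_ab = zero_lag_correlation(series["A"], series["B"])
r_ac = zero_lag_correlation(series["A"], series["C"])
```

In this toy case, characters that share the same windows receive a positive value and characters that never coappear receive a negative one, which is the kind of pairwise signal a force-based layout could turn into attraction or repulsion.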

The proposed method has limitations, such as the dependence on the initial condition when using the PPM because the results are depicted in two dimensions. As the total number of play characters increases, the distribution of the plots becomes dense, which reduces the visibility of clusters and groups. By contrast, play characters who appear fewer than ten times throughout the narrative cannot be considered because of the lack of statistical reliability. As future work, we plan to examine the following extensions to the proposed method: i) By dividing each story into multiple stages, we can extract more dynamic relationships among the play characters. ii) We will attempt three-dimensional illustration of relationships to achieve a more accurate visualization.
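The appearance threshold mentioned above can be illustrated with a simple filter. This is a sketch only: the threshold of ten comes from the text, whereas the function name `eligible_characters`, the input format (one list entry per appearance), and the counts in the example are hypothetical.

```python
from collections import Counter

MIN_APPEARANCES = 10  # characters appearing fewer than ten times are excluded


def eligible_characters(appearances, min_appearances=MIN_APPEARANCES):
    """Return the characters with enough appearances to be visualized."""
    counts = Counter(appearances)
    return sorted(name for name, count in counts.items()
                  if count >= min_appearances)


# Illustrative counts only; they are not taken from the play.
toy = ["Macbeth"] * 12 + ["Witch"] * 9 + ["Lady"] * 10
kept = eligible_characters(toy)
```

Here `kept` contains "Lady" and "Macbeth" but not "Witch", mirroring how characters below the threshold drop out of the figure.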