Structure and dynamics of growing networks of Reddit threads
Applied Network Science volume 9, Article number: 48 (2024)
Abstract
Millions of people use online social networks to reinforce their sense of belonging, for example by giving and asking for feedback as a form of social validation and self-recognition. It is common to observe disagreement among people’s beliefs and points of view when this feedback is expressed. Modeling and analyzing such interactions is crucial to understand social phenomena that arise when people face different opinions while expressing and discussing their values. In this work, we study a Reddit community in which people participate to judge or be judged with respect to some behavior, as it represents a valuable source to study how users express judgments online. We model threads of this community as complex networks of user interactions growing in time, and we analyze the evolution of their structural properties. We show that the evolution of Reddit networks differs from that of other real social networks, despite falling in the same category. This happens because their global clustering coefficient is extremely small and their average shortest path length increases over time. Such properties reveal how users discuss in threads, i.e., mostly with one other user and often through a single message. We strengthen this result by analyzing the role that disagreement and reciprocity play in such conversations. We also show that the evolution of Reddit threads over time is governed by two subgraphs growing at different speeds. We find that, in the studied community, the difference in speed is higher than in other communities because of the user guidelines enforcing specific user interactions. Finally, we interpret the obtained results on user behavior by drawing on Social Judgment Theory.
Introduction
Conversations on online social media have become a crucial aspect of modern communication (Smith 2018), shaping how individuals interact with each other, share information, and form connections (Grabowicz et al. 2011). Social network platforms enable conversations to reach a wide audience, while allowing for real-time sharing of information, opinions, and reactions. These online conversations are driven by three key factors: (i) the purpose of the users and how they want to communicate, (ii) the functionality provided by the platform and its limitations, and (iii) the user guidelines and recommendations governing the community (Gillespie 2018; Russo et al. 2023). The latter two factors, functionality and guidelines, are key to encouraging certain types of interactions and conversational structures over others. For example, while X (formerly known as Twitter) used to force users to write messages of at most 140 characters, Facebook and other thread-based forums do not have such restrictions, leading to more informative messages and shorter interactions (Alis et al. 2015). Additionally, some platforms have guidelines and recommendations that, while not technologically enforced, aim to guide user behavior. These guidelines serve as suggestions encouraging users to conform to an expected conduct. For instance, Instagram’s Community Guidelines (Footnote 1) promote authentic interactions to avoid spam, and Facebook groups often have customized rules. Whether these guidelines effectively contribute to shaping user behavior remains to be explored. It is well known, however, that many of these rules, regulations and technical limitations change over time based on users’ interactions with the platform. A recent example of how users have forced platforms to modify their regulations is X. Before 2018, users who needed to write longer tweets (i.e., with more than 140 characters) would split the long text into multiple interconnected tweets, using formats like “Tweet 1/12”.
In December 2017, X adapted to this behavior by introducing a new feature called threads, allowing users to concatenate multiple tweets in a sequence. This made it easier to create and follow longer conversations or narratives within a single thread.
The goal of this work is to uncover the effect of platform guidelines on online conversations and evaluate the extent to which such guidelines influence participation. In particular, we focus on Reddit, a social media platform where people participate in “self-governing and self-organized” communities known as subreddits (Jamnik and Lane 2019; Medvedev et al. 2019). Each subreddit has its own rules and guidelines, specifying what is allowed inside the subreddit and recommending how users should behave, often based on a specific topic (e.g., r/science, r/gaming). Beyond being a simple forum, Reddit hosts many subreddits with their own guidelines and internal features, making it a valuable resource for conducting social research on opinion formation (Shatz 2017; Hintz and Betts 2022). In recent years, a plethora of studies of specific subreddits has emerged, particularly focusing on their evolution and dynamics to understand how such communities develop and grow over time (Krohn and Weninger 2019; Weninger et al. 2013; Horawalavithana et al. 2022). Many studies have also analyzed discussions in Reddit communities to understand how interactions among participants influence behavior. For example, Petruzzellis et al. (2023) exploited the r/ChangeMyView subreddit to analyze changes in online information consumption behavior arising after opinion changes. Cauteruccio and Kou (2023) investigated the emotional experiences in eSports spectatorship using the r/leagueoflegends subreddit, showing that spectators supporting the same team tend to engage in cohesive discussions, while interactions among those supporting different teams are less salient.
Additionally, a significant body of research has focused on the language used in Reddit discussions, examining linguistic patterns, sentiment, and rhetorical strategies to gain deeper insights into the nature and impact of online communication within these communities. For instance, Helm (2024) studied the r/Incel community to identify subcultural discourse and understand how it affirms deviant behaviors, while Bouzoubaa et al. (2024) analyzed drug-related subreddits to understand their role in the online discourse surrounding substance use.
On Reddit, users can write and publish posts (known as submissions) or comments. Each post, together with its comments, constitutes a thread, i.e., a conversation where comments are organized hierarchically in a tree-like format. The post is the root of the thread, each comment is a node in the discussion tree, and replies to posts or comments create branches in the tree. Reddit communities are moderated by designated users (the moderators) who establish the community rules and ensure that everyone follows them when participating. Moderators maintain order by performing actions such as deleting posts or comments and banning users. In each community, the rules are often displayed on the side of the webpage.
In this work, we focus on the r/AmItheAsshole (AITA) subreddit (Footnote 2), an online community where people post stories about personal experiences with ambiguous moral valence, asking others whether they have been “assholes” (or not) in the narrated story, i.e., whether they are to blame for the conflict described. Users creating such posts should provide detailed descriptions of their stories in the text, including relevant background information about the people involved. Other users then perform the explicit judgment by voting, which involves writing a comment that includes a specific acronym corresponding to their judgment. The available acronyms provided by the community are listed in Table 1.
The subreddit guidelines suggest that, along with the acronym, users should include in the comment a brief motivation for the vote to explain their choice to other readers. The AITA community uses Reddit’s integrated voting system to allow participants to rate the judgments they agree with by upvoting them. Downvoting to express disagreement is not allowed in this context, since downvotes are reserved for reporting off-topic or spam discussions and harassing comments. The community has established an 18-hour waiting period before assigning the final verdict, and users must vote within this timeframe. As users upvote different comments, a consensus emerges over time, with one judgment gaining the majority of agreements as the collective decision. After the time window passes, this judgment is accepted as the official verdict and is made public by assigning a flair to the post, i.e., a tag with the respective judgment acronym. More details about the voting process are provided in "Operationalization" Section.
In the AITA community, the explicit request for a judgment is therefore a requirement of the subreddit, allowing researchers to study how humans express moral judgments through socio-linguistic features. Indeed, the comments contained in AITA threads offer the ground truth of what people voted for and often why. This explains why the AITA community has received much attention in the last two years. Botzer et al. (2023) exploited the AITA subreddit to study the presence and impact of moral valence, as well as whether gender and age play a role in users’ judgments. De Candia et al. (2022) examined which demographic factors and topics are associated with judgments, while Giorgi et al. (2023) analyzed whether linguistic and narrative features can reveal if the author of the post is also the character in the story or is narrating a story from a third-person perspective.
In order to analyze the dynamics of interactions in the AITA subreddit, we collected more than 6,000 threads that received significant attention in 2023 (see details in "Data" Section). For each thread, we compute the amount of judgment expressed by individuals and the group-level disagreement ("Operationalization" Section). We then model each thread as a complex multi-graph network of user interactions evolving over time ("Temporal network analysis" Section). We study the growth of such networks by reconstructing each conversation over time and comparing the evolution of structural properties with the existing literature on growing real social networks (Footnote 3), including other subreddits ("Results" Section). In particular, we focus on the clustering coefficient and the average shortest path length as structural properties growing over time, since these highlight the peculiar evolution of AITA networks and help explain the reasons behind user behavior. Furthermore, we compute the reciprocity and the disagreement of such networks to understand whether they play a role in AITA discussions.
In short, the contributions of this work are the following:
- Our temporal analysis of the communication exchange shows that Reddit user interaction networks consist of two subgraphs, a star and a periphery, that exhibit different speeds of growth ("Results" Section). The star structure is mainly formed by users who do not engage in conversations and instead reply to the root message of each subreddit thread (i.e., the post), while the periphery is mostly composed of users engaging in long conversations.
- We find that the speed at which participants contribute in these subgraphs is strongly influenced by the intention of the participants. In the periphery subgraph, participants who vote in addition to writing comments respond almost twice as slowly as users who only comment ("Response time" Section). At a macro level, we explain how people engage in conversations with other users in subreddits through the insights revealed by the evolution of structural properties. Specifically, the increasing average shortest path length and the decreasing (and very small) clustering coefficient reflect the behavior of people discussing mostly with only one other user, often through a single message ("Structural properties of AITA evolving over time" Section).
- Our analysis shows that these interaction networks evolve differently from other social networks, despite falling into the same category of “real social networks” ("Growing networks of Reddit threads" Section). More specifically, we compare the AITA subreddit with other subreddits by examining the growth of the two subgraphs within the dynamic networks. We show that the star subgraph grows between 2 and 3 times faster than the periphery subgraph in AITA, a gap significantly larger than in other Reddit communities. We interpret this as a consequence of community rules shaping user behavior.
- Our analysis shows that disagreement in the judgment process is associated with more interactions in the thread but may prevent some users from expressing a judgment. Specifically, we find that when disagreement is higher (i.e., when the judgment is not obvious), people prefer to discuss rather than judge: they engage with others through more comments, and if they express a vote, they struggle to clearly pick a side ("Disagreement and reciprocity" Section).
Finally, we analyze the underlying social dynamics among users by drawing on social psychology theories and interpreting their effect on the graph structure evolution. Specifically, we interpret our results through the lens of Social Judgment Theory (Brehmer 1988).
Theoretical framework
One of the goals of this work is to shed light on how people discuss in online communities where they are asked to explicitly express their opinion. Specifically, we aim to measure to what extent users’ disagreement affects the evolution of online conversations. We do this by modeling all the threads in the subreddit as a set of growing networks of user interactions (see "Methodological framework" Section). In this section, we lay the groundwork for understanding social judgment dynamics ("Social judgment" Section), and we review the state of the art on growing social networks ("Growing social networks" Section).
Social judgment
In social psychology, judgment is defined as the cognitive process of forming opinions, evaluations, or assessments about oneself, others, or situations. It is generally the product of non-conscious systems that operate quickly based on some evidence (Gilbert 2002). For example, when engaging in a conversation with someone, body language, tone of voice, and facial expressions are cues that serve as evidence to formulate judgments about the person. Social psychologists have studied various aspects of judgment, including how people make decisions, evaluate others, and interpret social information. Specifically, Social Judgment Theory (SJT) (Brehmer 1988) is a theoretical framework within social psychology that seeks to understand how individuals form and evaluate judgments about themselves and others. SJT also investigates why, in particular social contexts, people are more inclined to express judgments (Morrison and Miller 2008; Noelle-Neumann 1993; Morrison and Miller 2011; Hornsey 2003; Matthes et al. 2010). For example, Adamic et al. (2021) and Spears (2021) found that users are more prone to express negative judgments in anonymous settings where either the giver or the receiver of the opinion is unknown. Despite the extensive scientific literature, we have little understanding of the role that disagreement plays in such settings.
Research on user interactions in online platforms has primarily focused on conflict, controversies, and affective polarization (Addawood et al. 2017; Garimella et al. 2018; Lamba et al. 2015; Mejova et al. 2014; Conover et al. 2021), analyzing these social behaviors mostly through sentiment and topic analysis. In particular, Kumar et al. (2018) used Reddit data to study conflictual interactions of users across different communities. They found that less than 1% of communities start the majority of conflicts, and that such conflicts are initiated by highly active community members and carried out by significantly less active members. In our work, instead, we are interested in studying the role of disagreement among users. Despite the plethora of studies about the role of polarization and conflict in online conversations, the question of whether and how disagreement affects people’s moral judgments remains unexplored.
Growing social networks
Conversational data, such as the actions and interactions of users in online platforms, can be modeled as dynamic social networks (Newman 2003; Scott 2000; Wasserman and Faust 1994), for example follower-followee relationships, Facebook friendship links, e-mail or message exchanges, and retweet patterns. When these networks are not synthetic but built from real user interaction data, they are commonly referred to as “real social networks” (Newman 2003; Leskovec et al. 2005) to emphasize that the data originate from actual networks rather than mechanistic models. These types of networks include a wide variety of online connections such as friendship or following relations in social media (e.g., X), interactions such as sharing messages, replying to emails, or real-life interactions (e.g., academic co-authorship). The properties of this category of networks have been extensively studied from both a static and a dynamic perspective. The structural evolution of growing social networks has been studied intensively by Newman, who analyzed structural properties of several models of growth (Newman 2003), showed that preferential attachment is the origin of power-law degree distributions in collaboration networks (Newman 2001), and developed a new growth model that reproduces features of real-world friendship networks (Jin et al. 2001). However, most research has focused on studying structural properties of networks after a sufficiently long period, rather than on how such properties evolve during a network’s growth (see Table 2). An exception is the work of Leskovec et al. (2005), who, through empirical observation of four real graphs (three of which were social) growing over time, showed that such networks become denser over time and that their diameter shrinks.
In summary, real social networks represent a subclass of social networks that includes a wide variety of graphs with diverse underlying dynamics. As a result, discoveries in the literature about the growth of real social network structures and properties over time may not be universally applicable to all graphs within this class. For example, it is reasonable to think that a graph of retweets could grow differently over time compared to a graph of messages in a group chat. Although such graphs belong to the same category of networks, the differences in their structural properties as they evolve over time require further investigation.
Methodological framework
In order to study online conversations in which users express moral judgments, we collect data from the AITA community ("Data" Section) and operationalize the judgment behavior of participants ("Judgment behavior" Section). Then, we provide a measure of disagreement among users, representing how polarized their judgments are ("Disagreement" Section). Finally, we model each conversation as a growing complex network and study its evolution in time ("Temporal network analysis" Section).
Data
We downloaded 6366 threads, containing a total of 6,372,251 comments, from the AITA subreddit using the PRAW library (Footnote 4). In particular, we downloaded the “top” submissions, i.e., those with the highest score, measured as the difference between upvotes and downvotes of a post (the thread root). By definition, top posts are likely to have received significant attention, possibly resulting in a large volume of comments. In order to gather a representative dataset, we performed 10 different queries across various temporal scopes, ranging from one week to multiple years, each gathering a different set of top submissions along with all their comments. We set the limit of each query to 1000 to comply with the Reddit API limits (Footnote 5) and removed duplicate threads. The final dataset size is reported in Table 3, which also contains the temporal scope of data selection. Figure 1a shows the distribution of thread sizes (measured as the number of comments), while Fig. 1b shows the distribution of final verdicts across threads. Note that 75% of the threads have fewer than 2,000 comments, and 80% of them have been assigned “NTA” as the final verdict.
Operationalization
Judgment behavior
In the AITA community, users participate by writing posts (to be judged by others) or comments (to judge others). This paradigm established by the community implies that people commenting are expected to express a vote. We distinguish between voting (i.e. writing comments containing at least one acronym among those listed in Table 1) and discussing (writing text without expressing a vote). For the purposes of this work, we decided to disregard the INFO acronym, as it does not constitute a vote by definition.
The AITA community has specific guidelines about how users should vote and how the votes are processed to obtain the final verdict. Users can access these rules from the dedicated page (Footnote 6), the FAQ page (Footnote 7), or the “Voting rules” section in the navigation panel of the homepage. These resources are also referenced in every post, since a bot automatically includes them in a top-level comment produced as soon as the post is published. This comment is pinned on top for maximum visibility, so users are aware of how they are expected to behave. According to the AITA rules, users must vote by including one and only one voting label in their top-level comment. This implies that: (i) users cannot include more than one label in the text, (ii) the label must be one of those provided by the community and be correctly spelled, and (iii) the comment containing it must appear in the first level of the thread. The label can appear at any point in the text and does not need to be capitalized. Since the judgment process (votes and upvotes) lasts 18 h, comments must also be published within this time window to count in the vote.
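The voting rules above can be sketched as a small classifier. This is a minimal sketch under stated assumptions, not the authors’ pipeline. The label set is a hypothetical stand-in for Table 1, using the common AITA acronyms with INFO excluded. The `depth` convention (1 for first-level comments) and all function names are also hypothetical. Matching whole words case-insensitively only approximates “correctly spelled”; for example, the casual word “nah” would be read as NAH. Comments with several distinct labels are marked unsure, as in the temporal network model.

```python
import re

# Hypothetical stand-in for Table 1 (INFO excluded because it is not a vote).
VOTE_LABELS = ("YTA", "NTA", "ESH", "NAH", "YWBTA", "YWNBTA")
# Whole-word, case-insensitive match: the label may appear anywhere in the text.
_LABEL_RE = re.compile(r"\b(" + "|".join(VOTE_LABELS) + r")\b", re.IGNORECASE)

VOTING_WINDOW_S = 18 * 3600  # 18-hour judgment window, in seconds


def extract_labels(text):
    """Return the set of distinct vote labels in a comment (repeats counted once)."""
    return {label.upper() for label in _LABEL_RE.findall(text)}


def comment_judgment(text):
    """Return a single label, 'unsure' (several distinct labels), or None (discussion)."""
    labels = extract_labels(text)
    if not labels:
        return None
    if len(labels) > 1:
        return "unsure"
    return next(iter(labels))


def is_valid_vote(text, depth, seconds_after_post):
    """Check the rules: exactly one label, first-level comment, within 18 h of the post."""
    return (
        len(extract_labels(text)) == 1
        and depth == 1
        and 0 <= seconds_after_post <= VOTING_WINDOW_S
    )
```

Counting distinct labels (so that “NTA ... definitely NTA” still counts as one vote) is an interpretive choice. The text does not say how repeated labels are treated.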
Disagreement
To measure disagreement in AITA threads, we use the codified information about judgments expressed by users. As mentioned earlier, in the AITA community users explicitly take a side and make it public when they express a vote. Consequently, we label each comment with the respective judgment label. The voting labels represent the sides that users take, making it straightforward to determine which side each comment belongs to. In this context, we measure the level of disagreement in a thread as the uncertainty of the judgments expressed in its comments: we compute the probability of each label appearing (i.e., of each side being taken) and measure the Shannon entropy of this distribution for the post.
Following De Candia et al. (2022), who used binary entropy on aggregated votes to measure controversiality, we use multi-label entropy to operationalize disagreement.
Given the set of labels \(\mathcal {X}\), the entropy of a post is defined as:
\[ H = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x) \]
where p(x) is the discrete probability distribution of the labels appearing in the comments of the post. Since we do not consider the INFO label, we have six possible labels (see Table 1), so the maximum value of entropy for each post is \(\log_2 |\mathcal{X}| = \log_2 6 \approx 2.6\). Entropy values close to 2.6 indicate maximum uncertainty and therefore maximum divisiveness: judgments are uniformly split among the different labels, with people equally taking all the different sides. In this case, we say that the post has high disagreement. In contrast, a value of 0 represents the maximum level of certainty: all judgments are unanimous and all users take the same side, so the post has no disagreement. As shown in Fig. 2, around 53% of the posts have low entropy (\(< 0.65\)), indicating that in more than half of the posts people agree on the judgment.
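The per-post entropy \(H = -\sum_{x} p(x) \log_2 p(x)\) can be computed directly from the labels of a post’s voting comments. This is a minimal sketch that assumes comments without a vote, or marked unsure, have already been filtered out. The text does not spell out how those are handled. The function name is hypothetical.

```python
import math
from collections import Counter


def thread_entropy(labels):
    """Shannon entropy (in bits) of the vote-label distribution of one post."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no votes: treat as no measurable disagreement
    # p(x) is estimated as the relative frequency of each label among the votes
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For example, an 80/20 split between NTA and YTA gives about 0.72 bits, which is above the 0.65 low-entropy threshold. Six equally frequent labels reach the maximum \(\log_2 6\).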
Temporal network analysis
We model the discussions collected from the AITA community as networks of user interactions. For each thread, we build a directed multi-graph \(M = (V, E, t, x)\) with attributed nodes and edges. The set of vertices V represents users and the set of edges E represents the answering comments. We extract the voting acronyms of each comment and store them as a vertex attribute set X. Hence, \(x: V \rightarrow X\) is a function assigning to each vertex the set of judgments expressed by that user in their comments. Since we cannot determine the expressed vote from comments containing different acronyms (e.g., [“NTA”, “ESH”, “YTA”]), we label those judgments as unsure (Footnote 8). The temporal information is embedded by a scalar function \(t: E \rightarrow T\), stored as an edge attribute, where T is an ordered set of time annotations with a resolution of seconds.
In order to compare our networks with the state of the art on real social networks, we first need to verify that they are scale-free, i.e., that their degree distribution follows a power law \(P(k) \propto k^{-\gamma}\) with \(2< \gamma < 3\). Hence, we fit our empirical data to a power-law distribution and examine the distribution of the fitted exponents to verify that they mostly fall in the range [2, 3]. The results are shown in Fig. 3. To assess the goodness of fit, we performed a one-sample Kolmogorov-Smirnov (KS) test on the degree distribution of every network. The test returned a KS statistic smaller than 0.35 for all networks and a p-value greater than 0.001 for 88% of them, so for most networks the hypothesis that the degree distribution follows a power law cannot be rejected. These results support the conclusion that AITA networks are scale-free, so we can compare their properties with those of other real social networks.
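To illustrate the fitting step, the sketch below uses the continuous maximum-likelihood estimator popularized by Clauset, Shalizi and Newman, \(\hat\gamma = 1 + n / \sum_i \ln(k_i / k_{\min})\). It also computes the KS distance between the empirical and fitted cumulative distributions. The text does not say which estimator, \(k_{\min}\), or software was used, so treat this as an assumed stand-in. Three further caveats apply. The function name is hypothetical. The continuous form only approximates integer degree data. Only the KS statistic is computed: the p-value would additionally require a procedure such as a bootstrap.

```python
import math


def fit_power_law(degrees, k_min=1.0):
    """Fit P(k) ~ k^(-gamma) for k >= k_min; return (gamma, KS distance)."""
    tail = sorted(k for k in degrees if k >= k_min)
    n = len(tail)
    if n < 2:
        raise ValueError("need at least two values >= k_min")
    log_sum = sum(math.log(k / k_min) for k in tail)
    if log_sum == 0:
        raise ValueError("all values equal k_min; gamma is undefined")
    # Continuous maximum-likelihood estimate of the exponent
    gamma = 1.0 + n / log_sum

    # KS distance: largest gap between the empirical CDF and the fitted CDF
    # F(k) = 1 - (k / k_min)^(1 - gamma), checked on both sides of each step.
    ks = 0.0
    for i, k in enumerate(tail):
        fitted = 1.0 - (k / k_min) ** (1.0 - gamma)
        ks = max(ks, abs((i + 1) / n - fitted), abs(i / n - fitted))
    return gamma, ks
```

A network would pass the range check if the returned \(\gamma\) lies in [2, 3].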
Then, we study each network M of user interactions from a temporal perspective by reconstructing it over time. For each thread, we obtain a sequence of directed networks \(G = \langle G_0, \dots, G_{|E|} \rangle\) that grow over time, where \(G_k = (V_k, E_k)\), \(k = 0, \dots, |E|\), is the network after the first k messages have been posted. Therefore, \(V_k \subseteq V\) includes the user starting the thread and all users who commented up to the k-th message. Each edge \((v, u) \in E_k\), with \(E_k \subseteq E\), indicates that user v has written at least one comment to user u.
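The reconstruction can be sketched as a generator that replays reply edges in time order. This is a minimal sketch with a hypothetical input format: each reply is a `(t, v, u)` tuple, meaning user v replied to user u at time t in seconds. \(E_k\) keeps distinct (v, u) pairs, since an edge means “at least one comment”, while k counts every message.

```python
def growing_snapshots(op, replies):
    """Yield (k, V_k, E_k) for k = 0..|E|, adding one reply per step in time order.

    op:      the user who started the thread (the only node of G_0)
    replies: list of (t, v, u) tuples, user v replying to user u at time t
    """
    nodes = {op}
    edges = set()
    yield 0, set(nodes), set(edges)
    for k, (_, v, u) in enumerate(sorted(replies, key=lambda r: r[0]), start=1):
        nodes.update((v, u))
        edges.add((v, u))  # repeated replies between the same pair collapse to one edge
        # Copies keep earlier snapshots unchanged (simple, but O(k) memory per step)
        yield k, set(nodes), set(edges)
```

Copying the sets at every step keeps the sketch easy to follow. For threads with thousands of comments, one would compute the metrics incrementally instead.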
Results
As users join the conversation thread, new interactions are formed over time, and the network grows. The dynamic evolution of the network generates two distinct subgraphs: one consisting of participants directly responding to the author of the post (i.e., users writing first-level comments), and the other comprising users joining with comments located at deeper levels in the thread. We refer to these subgraphs as the star and the periphery, respectively. Figure 4 illustrates one of the AITA networks evolving over time, demonstrating how these interactions and subgraphs develop. Red nodes represent users voting in at least one comment, while blue nodes represent users writing comments without expressing a vote (i.e., discussing). Figure 5 shows that most of the voters are located in the star, a consequence of the community rules, which state that votes should be expressed in first-level comments. We describe in detail how this rule impacts user behavior in the community in "Discussion and conclusion" Section.
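The star and periphery split depends only on where each comment sits in the thread. This is a minimal sketch with a hypothetical input format: each comment id maps to its author and parent id. Edges point from the replying user to the author of the parent, as in the temporal network model. Comments whose parent is missing from the data are simply skipped here.

```python
def split_star_periphery(post_id, post_author, comments):
    """Split reply edges (replier, replied-to user) into star and periphery lists.

    comments: dict mapping comment id -> (author, parent_id); a comment whose
              parent_id equals post_id is a first-level comment.
    """
    author_of = {post_id: post_author}
    for cid, (author, _) in comments.items():
        author_of[cid] = author

    star, periphery = [], []
    for cid, (author, parent_id) in comments.items():
        if parent_id not in author_of:
            continue  # parent not in the data (e.g. removed); skipped in this sketch
        edge = (author, author_of[parent_id])
        if parent_id == post_id:
            star.append(edge)  # first-level comment: reply to the post author
        else:
            periphery.append(edge)  # deeper comment: reply within a conversation
    return star, periphery
```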
In the following subsections, we provide different views of these networks of interactions and their subgraphs, and investigate whether the guidelines of the AITA subreddit result in significantly different structural and growth properties. First, we describe why the star and the periphery exist, and we explore the response time of comments in the network ("Response time" Section). In "Structural properties of AITA evolving over time" Section, we analyze how the networks grow from a global perspective, by comparing the evolution of their structural properties with the state of the art of real dynamic social networks, previously summarized in Table 2.
Then, in "Growing networks of Reddit threads" Section, we compare the growth of the two substructures of AITA networks with networks from other subreddits, concluding that the star grows between 2 and 3 times faster than the periphery subgraph. This difference is significantly larger than in other subreddits. Finally, in "Disagreement and reciprocity" Section, we examine the relation between thread entropy and other features of the threads to show that disagreement plays a role in the discussions of the AITA community.
Response time
To capture how quickly users participate in the star and in the periphery, we compute how fast they respond to a message (i.e., how fast a replying edge is added in each subgraph). We calculate the time difference between a comment and its parent (the root post or the comment being replied to) and refer to this quantity as the response time R. To exclude outliers, we only consider response times within the range \([\mu - 2\sigma , \mu + 2\sigma ]\), where \(\mu\) is the mean response time of parent–child edges in the given graph and \(\sigma\) is the standard deviation.
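The response-time filter can be sketched in a few lines. This is a minimal sketch with a hypothetical input: a list of `(t_child, t_parent)` timestamp pairs, in seconds, for one subgraph. σ is taken as the population standard deviation, which is an assumption: the text does not say whether the sample or population estimate is used.

```python
import statistics


def filtered_response_times(pairs):
    """Response times R = t_child - t_parent, keeping values in [mu - 2 sigma, mu + 2 sigma]."""
    r = [child - parent for child, parent in pairs]
    if len(r) < 2:
        return r  # too few values to estimate a spread; nothing is filtered
    mu = statistics.fmean(r)
    sigma = statistics.pstdev(r)  # population standard deviation (assumption)
    lo, hi = mu - 2 * sigma, mu + 2 * sigma
    return [x for x in r if lo <= x <= hi]
```

Because μ and σ are themselves pulled by extreme values, a single very late reply is removed while the bulk of ordinary replies is kept.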
Figure 6 shows the distribution of response times in the star (left) and in the periphery (right), both for comments containing a vote (blue) and for those that do not (red). On average, the response time in AITA threads is between \(10^4\) and \(10^5\) s. Our main interest lies in the periphery, where we observe that the response time of voting comments is higher than that of non-voting comments, suggesting that writing a comment that contains a judgment requires more time. In "Discussion and conclusion" Section, we discuss this phenomenon in depth in relation to the AITA community guidelines and SJT. The difference in response time between voting and non-voting comments in the star is neither informative, due to a large imbalance in the data (with more than 70% of voting comments in the star), nor statistically significant.
Finally, note that the difference in average response time between the star and the periphery is very small, and is likely an artifact of how the measure has been constructed. While the response time in the periphery always represents the time between a comment and its immediate reply, that is not the case for the star. In the star subgraph, the response time keeps increasing as the network grows, since the parent is always the root (post). For instance, the R of a given first-level comment is always larger than that of any earlier first-level comment, because both are measured from the same root timestamp.
Structural properties of AITA evolving over time
According to the literature, the average shortest path length of growing real social networks usually decreases over time (Leskovec et al. 2005; Jeong et al. 2001; Barabási et al. 2002; Lee et al. 2006; Barabasi and Albert 1999; Boccaletti et al. 2006; Dorogovtsev and Mendes 2002; Watts and Strogatz 1998; Newman 2002; Ravasz and Barabási 2003). This happens because the average number of steps needed to connect two random individuals tends to become relatively small due to the increasing number of paths available. As the network expands over time, more connections are established, increasing the likelihood of finding shorter paths between individuals. The literature attributes this phenomenon to (i) the presence of highly connected individuals (“hubs”) that reduce the distance between different parts of the network, and (ii) the tendency for networks to exhibit a clustered structure, creating local neighborhoods or communities within the network. Hence, real social networks that are scale-free exhibit preferential attachment and community structure, both contributing to shortening the average path length (Barabasi and Albert 1999; Pattanayak et al. 2022; Sallaberry et al. 2013).
In this work, we demonstrate that Reddit networks of user interactions evolve differently from what the literature describes for growing real social networks. Specifically, during the network reconstruction process (explained in "Temporal network analysis" Section), every time a new edge is added, we calculate the following structural properties of the network: density (d), global clustering coefficient (GCC), average shortest path length (ASPL) and diameter (D). We show that, despite these networks being scale-free (see "Temporal network analysis" Section), their ASPL increases with time. Moreover, their GCC is five orders of magnitude smaller than expected, since on average very few clusters are formed. Figure 7 shows the evolution of these metrics over time for all threads (i.e., averaging the metric value at each timestamp over all the networks). As more edges are created over time, the ASPL increases while the GCC, already very small on average, decreases further.
This unexpected behavior, with almost no clusters forming, is what causes the ASPL to increase over time. Table 2 summarizes the state of the art on growing real social networks. All the examples in the table have a high GCC and, when available, an ASPL that decreases over time. Comparing the last row, which represents our AITA networks, with the other rows makes clear that the GCC is negligible and that the ASPL behaves differently as these networks evolve: the more edges are added, the more the ASPL increases.
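A minimal sketch of the per-edge recomputation of d, GCC, ASPL and D is given below, in plain Python so that it has no dependencies. It assumes an undirected, unweighted user-user network that stays connected as edges arrive (true when each new edge attaches a replier to a user already in the thread); the function names are hypothetical and this is not necessarily how the authors computed the metrics. GCC is computed as transitivity, i.e. closed triplets over connected triplets.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def snapshot_metrics(adj):
    """Return (density, GCC, ASPL, diameter) of a connected undirected graph."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    # GCC (transitivity): closed triplets / connected triplets; each triangle
    # is counted once at each of its three corners.
    closed = triplets = 0
    for nb in adj.values():
        k = len(nb)
        triplets += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(nb, 2) if b in adj[a])
    gcc = closed / triplets if triplets else 0.0
    # ASPL and diameter from one BFS per node.
    total = diameter = 0
    for u in adj:
        dist = bfs_distances(adj, u)
        total += sum(dist.values())
        diameter = max(diameter, max(dist.values()))
    aspl = total / (n * (n - 1)) if n > 1 else 0.0
    return density, gcc, aspl, diameter


def grow(edges):
    """Insert user-user reply edges in time order; record metrics after each."""
    adj, history = {}, []
    for u, v in edges:
        if u == v:
            continue  # a user replying to themselves adds no edge
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        history.append(snapshot_metrics(adj))
    return history


# Toy thread: the author (OP) gets two direct replies, then a reply chain forms.
history = grow([("u1", "OP"), ("u2", "OP"), ("u3", "u1"), ("u4", "u3")])
for d, gcc, aspl, diam in history:
    print(f"d={d:.3f} GCC={gcc:.3f} ASPL={aspl:.3f} D={diam}")
```

In this toy thread, which grows without forming triangles, the ASPL rises from 1.0 to 2.0 while the GCC stays at 0. Recomputing all-pairs BFS after every edge costs O(n·m) per step; for larger threads, NetworkX offers the same measures (`nx.density`, `nx.transitivity`, `nx.average_shortest_path_length`, `nx.diameter`).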
Growing networks of Reddit threads
In this section, we examine the growth speed of the two substructures in the AITA subreddit and compare it with networks from other subreddits whose community rules do not incentivize a particular behavior. For our comparison, we use five pre-existing subreddit datasets that are openly available online and include temporal information on the comments. We pre-process these datasets by removing threads containing fewer than 2 comments, as well as duplicate comments. Table 4 shows basic statistics of the datasets after pre-processing. For each dataset, we reconstruct its conversations over time following the same methodology described in "Temporal network analysis" Section.
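The pre-processing step above (removing duplicate comments and threads with fewer than 2 comments) might look like the following sketch. The record layout (`thread_id`, `comment_id`, `created`) is hypothetical, since the public datasets differ in their schemas, and duplicates are assumed to be comments that share an id.

```python
from collections import defaultdict


def preprocess(comments, min_comments=2):
    """Group comments by thread, dropping duplicates and too-short threads.

    comments: iterable of dicts with the hypothetical keys
    'thread_id', 'comment_id' and 'created' (Unix seconds).
    """
    seen = set()
    threads = defaultdict(list)
    for c in comments:
        if c["comment_id"] in seen:  # assumption: duplicates share an id
            continue
        seen.add(c["comment_id"])
        threads[c["thread_id"]].append(c)
    # Keep threads with at least `min_comments` comments, in time order.
    return {tid: sorted(cs, key=lambda c: c["created"])
            for tid, cs in threads.items() if len(cs) >= min_comments}


rows = [
    {"thread_id": "t1", "comment_id": "c1", "created": 5.0},
    {"thread_id": "t1", "comment_id": "c1", "created": 5.0},  # duplicate
    {"thread_id": "t2", "comment_id": "c2", "created": 9.0},
    {"thread_id": "t2", "comment_id": "c3", "created": 7.0},
]
clean = preprocess(rows)
print({tid: [c["comment_id"] for c in cs] for tid, cs in clean.items()})
```

Thread `t1` is dropped because only one comment survives de-duplication; `t2` is kept with its comments sorted by time, ready for the temporal reconstruction.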
In order to compare the speed of conversations with similar duration, we compute the distribution of thread lengths for each subreddit. After removing outliers (i.e., extremely long conversations), we group threads by duration (dividing them into 10 bins) and compute the speed for each group of conversations. We calculate the speed of each of the two growing subgraphs as follows:
where \(|e_{m}|\) is the total number of edges of the subgraph g at minute m. We compute the speed for three different time intervals (1 min, 10 min and 1 h) to observe the growth at different granularities. Speeds that could not be computed because of missing data have been set to 0. Figure 8 shows that the difference between the growth speeds of the two subgraphs is larger in AITA than in other subreddits. The horizontal bars in the plots represent the difference in speed, expressed as the number of nodes that join the conversation every minute. This difference is highest in the AITA community, where the star grows between 2 and 3 times faster than the periphery. The results for the 10-min and 1-h intervals are not plotted for simplicity, as they yield similar results. We discuss the implications of this result in "Discussion and conclusion" Section.
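The speed formula itself is not reproduced in this section, so the sketch below only illustrates one plausible reading of it, stated as an assumption: cumulative edge counts \(|e_m|\) of a subgraph (star or periphery) are taken at window boundaries of a given length (1 min, 10 min or 1 h), the increment in each window is divided by the window length to obtain new edges per minute, and the window speeds are averaged. A subgraph without edges gets speed 0, following the convention stated above. The function name `growth_speed` is hypothetical, and edge times are assumed not to precede the post.

```python
import math


def growth_speed(edge_times, start, delta_min=1.0):
    """Mean number of new edges per minute over windows of `delta_min` minutes.

    edge_times: creation times (seconds) of the edges of one subgraph.
    start: creation time (seconds) of the root post.
    Hypothetical definition: speed in window k = (|e_{k+1}| - |e_k|) / delta.
    """
    if not edge_times:
        return 0.0  # no data: speed set to 0
    delta_s = delta_min * 60.0
    n_windows = max(1, math.ceil((max(edge_times) - start) / delta_s))
    increments = [0] * n_windows  # |e_{k+1}| - |e_k| for each window k
    for t in edge_times:
        k = min(int((t - start) // delta_s), n_windows - 1)
        increments[k] += 1
    return sum(c / delta_min for c in increments) / n_windows


# Toy timestamps (seconds after the post) for the two subgraphs of one thread.
star = growth_speed([10, 20, 70, 80, 130], start=0.0)
periphery = growth_speed([50, 150], start=0.0)
print(star, periphery, star / periphery)
```

With these toy timestamps the star gains about 1.67 edges per minute and the periphery about 0.67, a ratio of 2.5. Passing `delta_min=10` or `delta_min=60` gives the coarser granularities.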
Disagreement and reciprocity
To understand whether disagreement in the judgment process drives discussions in AITA conversations, we test for a monotonic relationship between thread entropy (computed in "Disagreement" Section) and other features of the threads: the ASPL and GCC (computed in "Structural properties of AITA evolving over time" Section), the percentage of users participating only once, the length of the thread (in number of comments), the percentage of users that participate without voting, the average length of comments (in number of words), the score of the comments (see "Data" Section), the thread duration, the frequency of the comments (number of edges per minute), the average sentiment of the thread, and the percentage of users expressing an “unsure” comment (see "Temporal network analysis" Section). Among these features, we also include a measure of reciprocal interactions.
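Thread entropy is defined in "Disagreement" Section, which is not part of this excerpt. Assuming it is the Shannon entropy (in bits) of the distribution of judgment labels expressed in a thread, with “unsure” comments excluded as stated in the notes, it could be computed as in the sketch below. The label strings (`YTA`, `NTA`, `ESH`, `NAH`) are illustrative AITA verdict acronyms, and the base-2 logarithm is an assumption.

```python
import math
from collections import Counter


def thread_entropy(labels, exclude=("unsure",)):
    """Shannon entropy (bits) of the judgment labels voted in one thread.

    Assumption: higher entropy means votes are spread across verdicts,
    i.e. more disagreement; excluded labels do not count as votes.
    """
    counts = Counter(label for label in labels if label not in exclude)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # sum of p * log2(1/p) over observed labels
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(thread_entropy(["NTA", "NTA", "NTA"]))                # full consensus
print(thread_entropy(["NTA", "NTA", "YTA", "YTA", "unsure"]))  # split vote
print(thread_entropy(["YTA", "NTA", "ESH", "NAH"]))         # maximal spread
```

Under this assumption, a thread where all voters agree has entropy 0, an even two-way split has entropy 1, and an even four-way split has entropy 2.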
Reciprocity is an important behavioral feature of discussion dynamics that fosters mutual participation in conversations between users (Aragón et al. 2017). It is traditionally defined as follows (Aragón et al. 2017):
\(r = \frac{E^{\leftrightarrow }}{E} \qquad (3)\)
where \(E^{\leftrightarrow }\) is the number of bidirectional edges and E is the total number of edges. This metric ranges from 0 to 1, where a value of 0 indicates the absence of reciprocal edges in the network, and a value of 1 indicates that all edges are reciprocated. We are interested in measuring the amount of reciprocity in AITA threads to assess its role in the judgment process, especially in relation to disagreement. To characterize reciprocity, we exploit the directed network of replies between users in each thread. In such networks, a directed edge from user u to user v exists if u replied to v in the discussion. Using the metric in Eq. 3, we compute the reciprocity of every static network. Figure 9 shows the distribution of reciprocity in our dataset of networks. The distribution is right-skewed, with very small reciprocity for the majority of the threads (0.03 on average), suggesting that very few comments in AITA discussions are reciprocated. Moreover, while the theoretical upper limit of reciprocity is 1, the maximum value observed in our data is \(\sim 0.43\), revealing that no thread shows a high level of mutual exchange. We interpret this result in connection with other findings in Sect. 5.
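The reciprocity metric (Eq. 3) translates directly into code. In this sketch (hypothetical function name), `edges` holds (replier, replied_to) pairs from the directed reply network of one thread; collapsing repeated replies between the same ordered pair into a single edge and ignoring self-replies are both assumptions. A mutual pair u → v, v → u contributes two reciprocated edges, so r = 1 exactly when every edge is reciprocated.

```python
def reciprocity(edges):
    """Share of directed edges (u -> v) whose reverse edge (v -> u) also exists."""
    # Distinct directed edges without self-loops.
    E = {(u, v) for u, v in edges if u != v}
    if not E:
        return 0.0
    bidirectional = sum(1 for u, v in E if (v, u) in E)
    return bidirectional / len(E)


# b and a talk to each other; c and d reply to a once without an answer.
print(reciprocity([("b", "a"), ("a", "b"), ("c", "a"), ("d", "a")]))
```

Here two of the four directed edges are reciprocated, giving r = 0.5; a thread where every user replies once to the post author and nobody answers back has r = 0.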
To corroborate the existence of a relationship between disagreement and all the above-mentioned features, we compute the Spearman rank correlation; results are summarized in Table 5 and discussed in Sect. 5. We observe that when thread entropy is high (i.e., there is more disagreement in the expressed judgments), users tend to write more than one comment, often engaging in reciprocal discussions with others. Such threads also contain more comments, and their participants tend not to vote; when they do vote, they include more than one label in the comment, indicating uncertainty in picking a side. Notably, when the networks are randomized by edge rewiring, the relationship between disagreement and reciprocal discussions tends to disappear.
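The Spearman rank correlation used above is the Pearson correlation between rank-transformed variables, with tied values assigned their average rank. The dependency-free sketch below computes only the coefficient ρ, not the p-value used to judge significance; in practice `scipy.stats.spearmanr` returns both. The per-thread values in the usage lines are invented for illustration and are not data from the study.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho; undefined (division by zero) if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Invented per-thread values: entropy vs reciprocity.
entropy = [0.0, 0.5, 1.0, 1.5]
recip = [0.0, 0.01, 0.05, 0.2]
print(spearman(entropy, recip))
```

Because only ranks matter, any strictly increasing relationship (as in the toy values) yields ρ = 1 regardless of its shape, which is why Spearman suits the monotonic relationships tested here.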
Discussion and conclusion
In this paper, we analyzed Reddit threads by modeling them as networks of user interactions and computing the evolution of their structural properties over time. We showed that these networks differ from other real social networks, despite falling in the same category, as they exhibit a negligible GCC and an increasing ASPL. We also showed that networks of the AITA community grow differently from networks of other subreddits, as the difference in speed between the two subgraphs is larger. In this section, we discuss these results in the context of Social Judgment Theory, particularly regarding disagreement in the judgment process.
We interpret the results presented in "Structural properties of AITA evolving over time" Section by referring to the structure of the platform, which allows threaded conversations and shapes user interaction differently from other real social networks. Indeed, Reddit is not a relationship-based social network: most user interactions are content-driven rather than user-driven (Makow et al. 2017). Users on Reddit join a conversation to comment on specific content (a post or a comment) rather than to address a specific person. This difference in how the platform is built shapes user interactions and leads the networks to behave differently as they evolve over time.
Furthermore, the unexpected behavior of GCC and ASPL reveals that participants mostly interact with only one other user, and often by a single message. To further inspect this behavior, we derived a measure of reciprocal interaction and studied its relation to disagreement in the judgment process of AITA threads. We have shown in "Disagreement and reciprocity" Section that disagreement plays an important role in online discussions where people are expected to express a judgment. It is significantly related to more discussion and more reciprocal interactions and, at the same time, to more uncertainty in judgment expression. This could indicate that, despite the anonymity of users on Reddit, users may not feel free to explicitly express their opinions in discussions with high disagreement. This is consistent with SJT: while people are more prone to express opinions in anonymous environments and settings (Adamic et al. 2021; Spears 2021), in situations of high disagreement they perceive less support for their viewpoint from the social environment, making them less likely to express their judgments (Glynn et al. 1997; Chun and Lee 2017). Furthermore, regarding the expression of social judgment in online discussions, we have also shown that comments containing a judgment have a higher response time than comments that do not include one ("Response time" Section). This finding aligns with moral judgment theories stating that responses to moral dilemmas require cognitive control, a process that takes time (Suter and Hertwig 2011). The longer time needed for voting comments could also be due to the AITA community guidelines, which encourage users to justify the expressed vote in the text of their comments. In summary, the obtained results shed light on previously unexplored aspects of SJT, especially those related to online communication.
We conclude that the temporal analysis of the structural properties of these networks reveals the following behavioral patterns of users discussing on Reddit. Participants mostly interact with only one other user, often by a single message. The lack of clusters, together with the very small reciprocity, suggests that most of the new users participating in the conversation do not engage with more than one person. They join the thread to respond to a single user, rarely with more than one message exchange.
In this work we also demonstrated that, in AITA conversations, the star grows faster than the periphery (Sect. 4.3). We interpret this as a direct consequence of the community guidelines, which steer the behavior of participants. As explained in "Judgment behavior" Section, these rules state that votes expressed in comments that are not first-level are not considered for the final judgment verdict, which encourages people to participate in the thread by replying to the post author. The periphery, as a consequence, reflects the spontaneous behavior of users who discuss instead of voting (only 30% of the voters vote in the periphery, as shown in Fig. 5).
Availability of data and materials
The datasets analyzed during the current study are available from the corresponding author on reasonable request.
Notes
https://www.reddit.com/r/AmItheAsshole. According to Reddit (2023), AITA has been the most viewed Reddit community since 2020.
In this work we use the term “real social networks” to refer to networks modeling social interactions that come from real-world data, i.e., data that is neither random nor synthetically generated.
Python Reddit API Wrapper (https://praw.readthedocs.io/en/stable/).
Note that we did not include “unsure” comments in the disagreement computation described in the previous section, since it is impossible to infer from such comments which judgment users are willing to express.
References
Adamic LA (1999) The small world web. In: Research and advanced technology for digital libraries. Springer, Berlin, Heidelberg
Addawood A et al. (2017) Telling apart tweets associated with controversial versus non-controversial topics. In: Proc. Second Workshop on NLP and Computational Social Science. ACL, pp. 32–41. https://doi.org/10.18653/v1/W17-2905
Alis CM et al (2015) Quantifying regional differences in the length of twitter messages. PLOS ONE. https://doi.org/10.1371/journal.pone.0122278
Aragón P, Gómez V, Kaltenbrunner A (2017) To thread or not to thread: the impact of conversation threading on online discussion. In: Proc. International AAAI Conference on Web and Social Media. 11(1). https://ojs.aaai.org/index.php/ICWSM/article/view/14880
Barabasi AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512. https://doi.org/10.1126/science.286.5439.509
Barabási AL et al (2002) Evolution of the social network of scientific collaborations. Phys A 311(3):590–614. https://doi.org/10.1016/S0378-4371(02)00736-7
Benjamin D, Žiga T, Urša Z (2022) Semantic Analysis of Russo-Ukrainian War Tweet Networks. In: SCORES: Ljubljana, Slovenia
Boccaletti S et al (2006) Complex networks: structure and dynamics. Phys Rep 424(4):175–308. https://doi.org/10.1016/j.physrep.2005.10.009
Botzer N, Gu S, Weninger T (2023) Analysis of moral judgment on reddit. IEEE Trans Comput Social Syst 10(3):947–957. https://doi.org/10.1109/TCSS.2022.3160677
Bouzoubaa L, Young J, Rezapour R (2024) Exploring the landscape of drug communities on reddit: a network study. In Proceedings of the 2023 IEEE/ACM international conference on advances in social networks analysis and mining. ACM, https://doi.org/10.1145/3625007.3629125
Brehmer B (1988) Chapter 1 The development of social judgment theory. In: Advances in Psychology. Vol. 54. Human Judgment the SJT View. North-Holland, pp. 13–40. https://doi.org/10.1016/S0166-4115(08)62169-X
BwandoWando (2024) Reddit r/Jokes Dataset. https://doi.org/10.34740/KAGGLE/DSV/7381469
BwandoWando (2024) Reddit r/Ukraine Dataset. https://doi.org/10.34740/KAGGLE/DSV/7742648
BwandoWando (2024) Reddit r/PinoyProgrammer Dataset. https://doi.org/10.34740/KAGGLE/DSV/7742835
Cauteruccio F, Kou Y (2023) Investigating the emotional experiences in eSports spectatorship: the case of league of legends. Inf Process Manag. https://doi.org/10.1016/j.ipm.2023.103516
Chun JW, Lee MJ (2017) When does individuals’ willingness to speak out increase on social media? Perceived social support and perceived power/control. Comput Hum Behav 74:120–129. https://doi.org/10.1016/j.chb.2017.04.010
Conover M et al. (2021) Political Polarization on Twitter. In: Proc. International AAAI conference on web and social media 5(1):89–96. https://doi.org/10.1609/icwsm.v5i1.14126
De Candia S et al. (2022) Social norms on reddit: a demographic analysis. In: 14th ACM Web science conference 2022. WebSci ’22: 14th ACM Web Science Conference 2022. ACM, June 26, pp. 139–147. https://doi.org/10.1145/3501247.3531549
Dorogovtsev SN, Mendes JFF (2002) Evolution of networks. Adv Phys 51(4):1079–1187. https://doi.org/10.1080/00018730110112519
Garimella K et al (2018) Quantifying controversy on social media. Trans Soc Comput. https://doi.org/10.1145/3140565
Gilbert DT (2002) Inferential correction. In: Heuristics and biases: the psychology of intuitive judgment. Cambridge University Press, pp. 167–184. https://doi.org/10.1017/CBO9780511808098.011
Gillespie T (2018) Custodians of the internet: platforms, content moderation, and the hidden decisions that shape social media. pp. 1–288. https://doi.org/10.12987/9780300235029
Giorgi S et al. (2023) Author as character and narrator: deconstructing personal narratives from the r/AmITheAsshole Reddit Community. In: Proc. International AAAI conference on web and social media 17, pp. 233–244. https://doi.org/10.1609/icwsm.v17i1.22141
Glynn CJ, Hayes AF, Shanahan J (1997) Perceived support for one’s opinions and willingness to speak out: a meta-analysis of survey studies on the ’spiral of silence’. In: Public Opinion Quarterly. 61(3)
Grabowicz PA et al (2011) Social features of online networks: the strength of intermediary ties in online social media. PLoS ONE 7:e29358
Grossman J (2002) Patterns of collaboration in mathematical research. 35(9)
Helm B et al (2024) Examining incel subculture on Reddit. J Crime Just 47(1):27–45. https://doi.org/10.1080/0735648X.2022.2074867
Hintz EA, Betts T (2022) Reddit in communication research: current status, future directions and best practices. Ann Int Commun Assoc 46(2):116–133. https://doi.org/10.1080/23808985.2022.2064325
Horawalavithana S et al (2022) Online discussion threads as conversation pools: predicting the growth of discussion threads on reddit. Comput Math Org Theory. https://doi.org/10.1007/s10588-021-09340-1
Hornsey MJ et al (2003) On being loud and proud: non-conformity and counter-conformity to group norms. Br J Soc Psychol 42:319–35
Jamnik M, Lane D (2019) The use of reddit as an inexpensive source for high-quality data. Pract Assess Res Eval. https://doi.org/10.7275/j18t-c009
Jeong H, Néda Z, Barabasi AL (2001) Measuring preferential attachment in evolving networks. Europhysics Lett. https://doi.org/10.1209/epl/i2003-00166-9
Jin EM, Girvan M, Newman MEJ (2001) Structure of growing social networks. Phys Rev E 64(4):046132. https://doi.org/10.1103/PhysRevE.64.046132
Matthes J, Morrison KR, Schemer C (2010) A spiral of silence for some: attitude certainty and the expression of political minority opinions. Commun Res 37(6):774–800. https://doi.org/10.1177/0093650210362685
Kong JS, Sarshar N, Roychowdhury VP (2008) Experience versus talent shapes the structure of the Web. Proc Natl Acad Sci USA 105(37):13724–13729. https://doi.org/10.1073/pnas.0805921105
Krohn R, Weninger T (2019) Modelling online comment threads from their start. arXiv:1910.08575 [cs.SI]
Kumar S et al. (2018) Community interaction and conflict on the web. In: Proc. 2018 world wide web conference. International world wide web conferences steering committee, pp. 933–943. https://doi.org/10.1145/3178876.3186141
Adamic LA et al (2021) Rating friends without making enemies. Proc Int AAAI Conf Web Social Media 5(1):2–9. https://doi.org/10.1609/icwsm.v5i1.14121
Lamba H, Malik MM, Pfeffer J (2015) A Tempest in a Teacup? Analyzing firestorms on Twitter. In: 2015 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM). pp. 17–24. https://doi.org/10.1145/2808797.2808828
Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proc. eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. ACM, pp. 177–187. https://doi.org/10.1145/1081870.1081893
Mayank K, Ke S (2022) Can Scale-free network growth with triad formation capture simplicial complex distributions in real communication networks? arxiv:abs/2203.06491
Medvedev AN, Lambiotte R, Delvenne JC (2019) The anatomy of reddit: an overview of academic research. In: Dynamics on and of complex networks III. Springer, Cham. https://doi.org/10.1007/978-3-030-14683-2_9
Mejova Y et al. (2014) Controversy and sentiment in online news. In: Computation and Journalism Symposium
Morrison KR, Miller DT (2011) Explaining differences in opinion expression: direction matters. In: Rebels in groups: dissent, deviance, difference and defiance. Wiley Blackwell, pp. 219–237
Newman MEJ (2002) Random graphs as models of networks. http://arxiv.org/abs/cond-mat/0202208
Newman MEJ (2004) Coauthorship networks and patterns of scientific collaboration. In: Proc. National Academy of Sciences of the United States of America 101.Suppl 1, pp. 5200–5205. https://doi.org/10.1073/pnas.0307545100
Newman MEJ (2001) Clustering and preferential attachment in growing networks. Phys Rev E 64(2):025102. https://doi.org/10.1103/PhysRevE.64.025102
Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256. https://doi.org/10.1137/S003614450342480
Santoro N et al. (2011) Time-varying graphs and social network analysis: temporal indicators and metrics. arxiv:abs/1102.0629
Makow N, Mistele M, Yu C (2017) A network model for reddit post virality prediction
Noelle-Neumann E (1993) The spiral of silence: public opinion-our social skin. University of Chicago Press
Pattanayak HS, Verma HK, Sangal AL (2022) Lengthening of average path length in social networks due to the effect of community structure. J King Saud Univ Comput Inf Sci. https://doi.org/10.1016/j.jksuci.2020.10.014
Petruzzellis F et al. (2023) On the relation between opinion change and information consumption on reddit. In: Proceedings of the international AAAI conference on web and social media 17. https://doi.org/10.1609/icwsm.v17i1.22181
Pohl JS et al. (2022) Invasion@Ukraine: providing and describing a twitter streaming dataset that captures the outbreak of war between Russia and Ukraine in 2022. In: Proc. International AAAI Conference on Web and Social Media. 17:1093–1101
Ravasz E, Barabási A, (2003) Hierarchical organization in complex networks. Phys Rev E. https://doi.org/10.1103/PhysRevE.67.026112
Reddit (2023) Reddit Recap 2023. https://www.reddit.com/r/recap/comments/18c4kvr/keeping_it_dialed_in_2023_redditors_sought_honest/?rdt=57856
Morrison KR, Miller DT (2008) Distinguishing between silent and vocal minorities: not all deviants feel marginal. J Pers Soc Psychol 94(5):871–882. https://doi.org/10.1037/0022-3514.94.5.871
Rossi R, Ahmed N (2015) The network data repository with interactive graph analytics and visualization. In: AAAI. https://networkrepository.com
Spears R (2021) Social influence and group identity. Annu Rev Psychol 72(1):367–390. https://doi.org/10.1146/annurev-psych-070620-111818
Russo G et al. (2023) Spillover of antisocial behavior from fringe platforms: the unintended consequences of community banning. In: Proceedings of the international AAAI conference on web and social media 17(1):742–753. https://doi.org/10.1609/icwsm.v17i1.22184
Sallaberry A, Zaidi F, Melançon G (2013) Model for generating artificial social networks having community structures with small-world and scale-free properties. Social Netw Anal Min. https://doi.org/10.1007/s13278-013-0105-0
Lee SH, Kim PJ, Jeong H (2006) Statistical properties of sampled networks. Phys Rev E. https://doi.org/10.1103/physreve.73.016102
Scott J (2000) Social network analysis: a handbook, 2nd edn. Sage, London
Shatz I (2017) Fast, free, and targeted: reddit as a source for recruiting participants online. Social Sci Comput Rev 35(4):537–549. https://doi.org/10.1177/0894439316650163
Tsugawa S, Niida S (2020) The impact of social network structure on the growth and survival of online communities. In: Proc. 2019 IEEE/ACM International conference on advances in social networks analysis and mining. ACM, pp. 1112–1119. https://doi.org/10.1145/3341161.3343526
Smith D (2018) Chapter 1 - Social Media in Society. In: Growing your library career with social media. Chandos Publishing, https://doi.org/10.1016/B978-0-08-102411-9.00001-7
Suter RS, Hertwig R (2011) Time and moral judgment. In: Cognition. https://doi.org/10.1016/j.cognition.2011.01.018
Szabó G, Alava M, Kertész J (2004) Clustering in complex networks. In: Complex networks. Lecture Notes in Physics. Springer, pp. 139–162. https://doi.org/10.1007/978-3-540-44485-5_7
Wasserman S, Faust K (1994) Social network analysis: methods and applications. Structural analysis in the social sciences. Cambridge University Press, https://doi.org/10.1017/CBO9780511815478
Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature. https://doi.org/10.1038/30918
Weninger T, Zhu XA, Han J (2013) An exploration of discussion threads in social news sites: a case study of the Reddit community. In: Proceedings of the 2013 IEEE/ACM International conference on advances in social networks analysis and mining. ACM, https://doi.org/10.1145/2492517.2492646
Zhu Y et al. (2022) A Reddit Dataset for the Russo-Ukrainian Conflict in 2022. https://doi.org/10.48550/arXiv.2206.05107
Acknowledgements
We would like to thank Prof. Matteo Magnani for his comments and suggestions.
Funding
Open access funding provided by Uppsala University. This work has been partly funded by eSSENCE, an e-Science collaboration funded as a strategic research area of Sweden. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author information
Contributions
Both authors contributed to the design and implementation of the research. Data acquisition, data curation, formal analysis and validation were carried out by D.G. Both authors contributed to the analysis of the results and to the writing of the manuscript. Both authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Goglia, D., Vega, D. Structure and dynamics of growing networks of Reddit threads. Appl Netw Sci 9, 48 (2024). https://doi.org/10.1007/s41109-024-00654-y
DOI: https://doi.org/10.1007/s41109-024-00654-y