Introduction

The last few years have seen an eruption of political protests aided by internet technologies. The phrase "Twitter revolution" was coined in 2009 to refer to the mass mobilizations that took place in Moldova1 and, a few months later, in Iran2, in both cases to protest against fraudulent elections. Since then, the number of events connecting social media with social unrest has multiplied, not only in authoritarian regimes, as exemplified by the recent wave of uprisings across the Arab world, but also in western liberal democracies, particularly in the aftermath of the financial crisis and changes to welfare policies. These protests respond to very different socio-economic circumstances and are driven by very different political agendas, but they all seem to share the same morphological feature: the use of social networking sites (SNSs) to help protesters self-organize and attain a critical mass of participants. There is, however, little evidence of how exactly SNSs encourage recruitment. Empirical research on online activity around riots and protests is scarce and the few studies that exist3,4,5 show no clear patterns of protest growth. Related research has shown that information cascades in online networks occur only rarely6,7,8, with the implication that even online it is difficult to reach and mobilize a high number of people. Revolutions, riots and mass mobilizations are also rare and, as such, difficult to predict; but when they happen, they unleash potentially dramatic consequences. The relevant question, which we set out to answer here, is not when these protests take place but whether and how SNSs contribute to triggering their explosion.

Sociologists have long analyzed networks as the main recruitment channels through which social movements grow9,10. Empirical research has shown that networks were crucial to the organization of collective action long before the internet could act as an organizing tool, with historical examples that include the 1871 insurgency of the Paris Commune11, the 1960s civil rights struggles in the U.S.12 and the demonstrations that took place in East Germany prior to the fall of the Berlin Wall13,14. These studies provide evidence that recruits to a movement tend to be connected to others already involved and that networks open channels through which influence on behavior spreads, but they are limited by the quality of the network data analyzed, particularly around time dynamics. Analytical models have tried to overcome these data limitations by recreating the formal features of interpersonal influence and analyzing how they are related to diffusion15,16,17,18 and to examples of social contagion like collective action or the growth of social movements19,20,21,22,23. Four main findings arise from these models. First, the shape of the threshold distribution, i.e. the variance in people's intrinsic propensity to join, determines the global reach of cascades. Second, individual thresholds interact with the size of local networks: two actors with the same propensity might be recruited at different times if one is connected to a larger number of people. Third, attaining a critical mass depends on being able to activate a sufficiently large number of low-threshold actors who are also well connected in the overall network structure. Fourth, exposure to multiple sources can matter more than multiple exposures: unlike epidemics, the social contagion of behavior often requires reinforcement from multiple people.
Recent experiments have confirmed the relevance of complex contagions for explaining behavior in online contexts24, and large-scale analyses have validated their effects on information diffusion on Twitter25.

Models of collective action have identified important network mechanisms behind the decision to join a protest, but they suffer from a lack of empirical calibration and external validity. Online networks, and the role that SNSs play in articulating the growth of protests, offer a great opportunity to explore recruitment mechanisms in an empirical setting. We analyze one such setting by studying the protests that took place in Spain in May 2011. The mobilization emerged as a reaction to the political response to the financial crisis, and it was organized around broad demands for new forms of democratic representation. The main target of the campaign was to organize a protest on May 15, which brought tens of thousands of people to the streets of 59 cities all over the country. After the march, hundreds of participants decided to camp in the city squares until May 22, the date of the local and regional elections; crowded demonstrations took place daily during that week. After the elections, the movement remained active, but the protests gradually lost strength and their media visibility waned (more background information in SI).

We analyze Twitter activity around those protests for the period April 25 (20 days before the first mass mobilizations) to May 25 (10 days after the first mass mobilizations and 3 days after the elections). The data set follows the posting behavior of 87,569 users and tracks a total of 581,750 protest messages (see Methods). We know, for each user, who they follow and who is following them. In addition to this asymmetric network, we also consider a version of the network that only retains reciprocated, and therefore stronger, connections. Previous research has suggested that Twitter is closer to a news media platform than to a social network7; it follows that the properties of the online network cannot be directly compared to those of other social networks because of the prominence of broadcasters. The symmetric (reciprocated) network mitigates the relatively higher influence of these hubs of activity and retains only connections that reflect mutual acknowledgement between users, which is arguably a stronger proxy for offline relationships. Contrasting recruitment patterns in the asymmetric and symmetric networks allows us to test whether the dynamics of mobilization depend on weak, broadcasting links or on stronger connections based on mutual recognition. Our analysis of recruitment is based on the assumption that users joined the movement the moment they started sending Tweets about it. We also assume that once they are activated, they remain so for the rest of the period we consider.

Results

By the end of our 30-day window, most users in the network had sent at least one message related to the protest, with only about 2% remaining silent (but still being exposed to movement information, Fig. 1). The most significant increase in activity takes place right after the initial protest (May 15), during the week leading to the elections of May 22. Up to that point, only about 10% of the users had sent at least one message related to the protests.
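The curve in Fig. 1 is a cumulative count of first protest messages. The sketch below shows one way to compute it, assuming each user's activation time is the timestamp of their first protest message and that users stay active afterwards, as stated above. The function name `activation_curve` and its arguments are hypothetical, not part of the authors' pipeline.

```python
from datetime import datetime, timedelta


def activation_curve(first_message_times, n_users, start, hours):
    """Fraction of users activated by the end of each hour since `start`.

    first_message_times: dict user -> datetime of the user's first protest
    message (users who never posted are simply absent, so they count only
    in the denominator n_users). Activation is assumed to be permanent.
    """
    counts = [0] * hours
    for t in first_message_times.values():
        # timedelta // timedelta gives the whole number of hours elapsed
        h = (t - start) // timedelta(hours=1)
        if 0 <= h < hours:
            counts[h] += 1
    curve, total = [], 0
    for c in counts:
        total += c  # accumulate activations hour by hour
        curve.append(total / n_users)
    return curve


start = datetime(2011, 4, 25)
times = {
    "u1": start + timedelta(minutes=30),
    "u2": start + timedelta(minutes=90),
    "u3": start + timedelta(minutes=105),
}
print(activation_curve(times, n_users=4, start=start, hours=3))
# → [0.25, 0.75, 0.75]
```

The fourth user never posts, so the curve saturates below 1, just as about 2% of users in the data remained silent.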

Figure 1

Fraction of recruited users over time.

The vertical axis is normalized by the total number of users (87,569); the horizontal axis tracks time, with activated users accumulated by hour. At the end of our time window the proportion of activated users is 98.03%, which means that the vast majority of users sent at least one protest message during this month. Vertical labels flag some of the events that took place during the period.

Activation times tell us the exact moment when users start emitting messages and allow us to distinguish between activists leading the protests and those who reacted in later stages. We calculated, for each user, the proportion of followed neighbors that had been active at the time of recruitment ($k_a/k_{in}$). This gives us a measure that approximates the threshold parameter used in formal models of social contagion, particularly those that incorporate networks17,18,22. Activists with an intrinsic willingness to participate have a threshold $k_a/k_{in} \approx 0$, whereas those who need a lot of pressure from their local networks before they decide to join are at the opposite extreme, $k_a/k_{in} \approx 1$. Looking at the empirical distribution, most users in our case exhibit intermediate values (Fig. 2A). Although the distribution is roughly uniform for almost the full threshold interval, there are two local maxima at 0 (users who act as the recruitment seeds) and 0.5 (users who join when half their neighbors already did). The symmetric network has a significantly higher number of users with $k_a/k_{in} = 0$ because it mitigates the influence of hubs or broadcasters (i.e. users who do not reciprocate connections, about 7,000 in the overall network, but who contribute to activating low-threshold participants, the seeds in the symmetric network). The shape of the distribution changes before and after 15 May, the first big demonstration day (Fig. 2B). Most early participants (i.e. users who sent a message prior to the first mass mobilizations and to the news media coverage of the events) needed, on average, less local pressure to join, which is consistent with their role as leaders of the movement. Because most activity takes place after 15-M, the threshold distribution for the ten days that followed is not very different from the threshold distribution for the full period.
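The threshold measure can be made concrete with a short sketch. It assumes the follower network is stored as a dict mapping each user to the set of users they follow, and that a followee counts as active only if activated strictly before the user; the tie-breaking rule and the function name `thresholds` are assumptions for illustration. Passing the reciprocated network instead gives the symmetric-network version.

```python
def thresholds(following, activation):
    """Empirical threshold k_a/k_in for each activated user.

    following: dict user -> set of users that user follows (k_in = its size)
    activation: dict user -> activation time (any comparable values)
    Assumption: a followee counts as active only if activated strictly
    before the user; users who follow nobody are skipped (ratio undefined).
    """
    result = {}
    for user, t_user in activation.items():
        friends = following.get(user, set())
        k_in = len(friends)
        if k_in == 0:
            continue
        # k_a: followees already active at the user's activation time
        k_a = sum(1 for f in friends
                  if f in activation and activation[f] < t_user)
        result[user] = k_a / k_in
    return result


following = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a"},
             "d": {"a", "b", "c", "e"}}
activation = {"a": 1, "b": 2, "c": 3, "d": 5}
print(thresholds(following, activation))
# → {'a': 0.0, 'b': 0.5, 'c': 1.0, 'd': 0.75}
```

Here `a` is a seed ($k_a/k_{in} = 0$), while `d` joins once three of its four followees are active; `e` never activates and counts only in the denominator.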

Figure 2

Distribution of thresholds $k_a/k_{in}$.

(A) The vertical axis measures the proportion of users activated for each threshold of activated neighbors, tracked in the horizontal axis. The figure shows measures for both the asymmetrical and symmetrical networks. When broadcasters (i.e. users with a high number of followers who do not reciprocate connections) are eliminated, the number of early participants with $k_a/k_{in} = 0$ increases by an order of magnitude, which suggests that broadcasters are influential at recruiting low-threshold individuals. Panel (B) splits the data in two subsets: the first subset considers recruitment activity before 15-M, the day of the first mass demonstrations; the second subset tracks activity after 15-M. Media coverage, which increased after 15-M, does not seem to cause a significant rise in the number of early-activated, low-threshold users.

The actual chronological time of activation varies across same-threshold actors (see SI, Fig. S2); this variation is expected given that actors react to different local networks, both in size and composition. The time it takes neighbors to join, however, also influences the activation of users. We measure the pace at which the number of active neighbors grows using the logarithmic derivative of activation times26, $\Delta k_a/k_a = (k_a^{t+1} - k_a^{t})/k_a^{t+1}$. The rationale behind $\Delta k_a/k_a$ is that some users might be susceptible to recruitment bursts, that is, more likely to join if many of their neighbors do so in a short time span. This emphasis on time dynamics qualifies the idea of complex contagion: receiving stimuli from multiple sources is important because, unlike epidemics, social contagion often requires exposure to a diversity of sources22; evidence of recruitment bursts would suggest that the effects of multiple and diverse exposures are magnified if they take place in a short time window. We find that early participants, i.e. users with low thresholds, are insensitive to recruitment bursts; for the vast majority of users, however, exposure to sudden rates of activation precedes their decision to join (Fig. 3A). Users with moderate thresholds who are susceptible to bursts act as the critical mass that makes the movement grow from a minority of early participants to the vast majority of users: without them, late participants (the majority of users that made the movement explode) would not have joined in (Fig. 3B).
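The burst measure $\Delta k_a/k_a$ compares the number of active followees at two moments. The sketch below assumes $t+1$ is the user's own activation time and $t$ is one fixed window earlier; the length of that window is not specified here, so the `window` parameter and the function name `burst` are hypothetical. A value near 1 means almost all active neighbors joined within the last window (a burst), and a value of 0 means none did.

```python
def burst(following, activation, user, window):
    """Relative growth of active followees just before `user` activated.

    Computes (k_a^{t+1} - k_a^t) / k_a^{t+1}, taking t+1 as the user's
    activation time and t as `window` earlier (window length is an
    assumption). Returns None when no followee was active at t+1,
    e.g. for seed users, because the ratio is then undefined.
    """
    t1 = activation[user]
    t0 = t1 - window
    active_times = [activation[f] for f in following.get(user, ())
                    if f in activation]
    k_t1 = sum(1 for t in active_times if t < t1)  # active at t+1
    k_t0 = sum(1 for t in active_times if t < t0)  # active at t
    if k_t1 == 0:
        return None
    return (k_t1 - k_t0) / k_t1


following = {"u": {"a", "b", "c", "d"}}
activation = {"u": 10, "a": 1, "b": 8, "c": 9, "d": 11}
print(burst(following, activation, "u", window=5))
# → 0.6666666666666666
```

Two of the three followees active at time 10 (`b` and `c`) joined within the preceding window, so two thirds of `u`'s local pressure arrived as a burst; `d` activates after `u` and is ignored.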

Figure 3

Thresholds, recruitment bursts and time of activation.

(A) The figure measures the association between bursts of activity and thresholds; while early participants ($k_a/k_{in} < 0.2$) are not affected by bursts, moderate-threshold users ($0.2 \le k_a/k_{in} \le 0.5$) and high-threshold users ($k_a/k_{in} > 0.5$) are more likely to join the exchange of messages if they see a sudden increase of participants in their local networks; the slope of the curve indicates that higher-threshold users are more susceptible to bursts of activity. (B) This figure shows the percentage of activated users grouped as early, mid and late participants for each day of the period considered; most late participants joined the protests after 15-M, once a critical mass of mid participants had already been activated.

Information diffusion follows different dynamics. Very few messages generate cascades of global scale. To reconstruct cascades, we assume that if a user emits a message at time $t$ and one of their followers also emits a message within the interval $(t, t+\Delta t)$, both messages belong to the same chain. A chain is aborted when none of the followers exposed to a message acts as a spreader, and messages can only belong to a single chain, i.e. only messages that do not belong to a previous chain are considered seeds for a new cascade (see Methods). The vast majority of these chains die soon, with only a very small fraction reaching global dimensions, a result that is robust to different time intervals (Fig. 4A). This supports previous findings6,7,8 and reveals that cascades are rare even in the context of exceptional events. We ran a k-shell decomposition27 to identify the network position of users acting as seeds of the most successful chains. We found a positive association between network centrality, as measured by the classification of nodes in high k-cores, and cascade size (Fig. 4B). This positive association suggests that agents at the core of the network (not necessarily those with a higher number of connections, but those connected to equally well-connected users; Fig. 4C,D) are the most effective when it comes to spreading information, again in accordance with what has been found in research on epidemics and contagion28. Spreaders, though, need to be recruited first, and the same decomposition analysis does not find any significant association between thresholds and topology, i.e. early participants do not have a characteristic network position; they are instead scattered all over the network (see SI, Fig. S5).
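The chain rule above can be sketched as a single chronological pass over messages. This is a minimal illustration, not the authors' code, and several details are assumptions: when several followees posted within $(t, t+\Delta t)$, the message joins the chain of the most recent one; only each followee's latest message is checked (any older message in the window would be less recent); and chain size is counted in messages. The function name `build_chains` is hypothetical.

```python
def build_chains(messages, following, dt):
    """Assign each protest message to exactly one diffusion chain.

    messages: list of (time, user) tuples
    following: dict user -> set of users they follow (their exposure sources)
    dt: window; a message by v at time t joins the chain of a message by a
        followee u at time s if s < t < s + dt (most recent such u wins,
        an assumption). Otherwise the message seeds a new chain.
    Returns (chain id per message in chronological order, size per chain).
    """
    last = {}        # user -> (time, chain_id) of that user's latest message
    chain_of = []
    sizes = []
    for t, v in sorted(messages):
        best = None
        for u in following.get(v, ()):
            if u in last:
                s, cid = last[u]
                if s < t < s + dt and (best is None or s > best[0]):
                    best = (s, cid)
        if best is None:
            cid = len(sizes)      # seed: no exposure within dt, new chain
            sizes.append(0)
        else:
            cid = best[1]         # continue the followee's chain
        sizes[cid] += 1
        chain_of.append(cid)
        last[v] = (t, cid)
    return chain_of, sizes


following = {"b": {"a"}, "c": {"b"}, "d": {"a"}}
messages = [(0, "a"), (1, "b"), (2, "c"), (10, "d"), (3, "e")]
print(build_chains(messages, following, dt=5))
# → ([0, 0, 0, 1, 2], [3, 1, 1])
```

The chain a → b → c grows to three messages, while `d` posts too long after `a` to be counted as a continuation and seeds its own chain. Re-running with several values of `dt` is one way to check the robustness reported for Fig. 4A.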

Figure 4

Distribution of cascade sizes and core position of spreaders.

(A) The distribution of cascade sizes ($N_c$) suggests that only a few cascades percolate to affect most users and that the vast majority die in the early stages of diffusion. (B) There is a positive correlation between network centrality, as measured by the classification of nodes in high k-cores, and cascade sizes, suggesting that users at the core of the network are more likely to be the seeds of global chains of information diffusion. (C) The nodes in the network arranged according to their k-core; node size accounts for degree centrality and node color indicates the maximum size of the cascades generated by the user (users generating the largest cascades are depicted in orange). (D) Example of a global cascade affecting about 35,000 nodes. Nodes in blue are users who participated in the diffusion of protest messages; nodes in orange were exposed to the messages but did not send messages of their own. The darker the shade of blue, the earlier users joined the cascade as spreaders; the lighter the shade of yellow, the later users joined the cascade as listeners.

Discussion

The role that SNSs play in helping protests grow is uncontested by most media reports of recent events. However, there is not much evidence of how exactly these online platforms can help disseminate calls for action and organize a collective movement. Our findings suggest that there are two parallel processes taking place: the dynamics of recruitment and the dynamics of information diffusion. While being central in the network is crucial to being influential in the diffusion process, there is no topological position that characterizes the early participants who trigger recruitment. This suggests that whatever exogenous factors motivate early participants to start sending messages, the consequence is that they create random seeding in the online network: they spur foci of early activity that are topologically heterogeneous and that spread through low-threshold individuals. This finding is consistent with previous simulation work that tests (and challenges) the influentials hypothesis17,18. However, a small core of central users is still critical to trigger chains of messages of high orders of magnitude. The advantage that this minority has as cascade generators derives from their location in the network; contrary to what has been argued in previous research4, centrality in the network of followers is still a meaningful measure of influence in online networks, at least in the context of mass mobilizations.

The decision to join a protest depends on multiple factors that we do not capture with online data, for instance the amount of offline news media to which users are exposed. It is not surprising, then, that network position does not account for time of activation as it does for cascading influence (the diffusion of messages is, for the most part, endogenous, depending on the network structure). However, one element of the recruitment process is endogenous as well: the timing of exposures. The existence of recruitment bursts indicates that the effects of complex contagion22 are boosted by accelerated exposure, that is, by multiple stimuli received from different sources within a small time window. These bursts, facilitated by the speed at which information flows online, provide empirical evidence of what scholars of social movements have called, metaphorically, collective effervescence29. We provide an empirical measure for that metaphor and find that most users are susceptible to it. These findings qualify threshold models of collective action that do not take into account the urgency to join that bursts of activity instill in people.

In addition, this study provides evidence of why horizontal organizations (like the platform coordinating this protest, see SI) are so successful at mobilizing people through SNSs: their decentralized structure, based on coalitions of smaller organizations, plants activation seeds randomly at the start of the recruitment process, which maximizes the chances of reaching a percolating core; users at this network core, in turn, contribute to the growth of the movement by generating cascades of messages that trigger new activations, and so forth. These joint dynamics illustrate the trade-off between global bridges (controlled by well-connected users) and local networks: the former are efficient at transmitting information, the latter at transmitting behavior22. This is one reason why Twitter has played a prominent role in so many recent protests and mobilizations: it combines the global reach of broadcasters with local, personalized relations (which we approximate in the form of reciprocal connections); in light of our data, both features are important to articulate the growth of a movement. These features, however, are necessary, not sufficient, conditions. Again, generating recruitment patterns on this scale is still an exceptional event, and this study sheds no light that helps predict future occurrences; but it shows that when exceptional events like mass mobilizations take place, recruitment and information diffusion dynamics reinforce each other along the way.

Our data has two main limitations. First, we might be overestimating social influence because we do not control for demographic information and the effects of homophily in network formation30. Studies that control for demographic attributes, however, still find that networks are significant predictors of recruitment10,14; in light of those findings, we can only assume that online networks will still be significant channels for the spread of behavior once demographics are taken into account. Second, we do not control for exposure to offline media, which is likely to have interacted with social influence, or for other sources of information that might have also contributed to recruitment (for instance, offline discussion networks). The lack of media coverage before the demonstrations of May 15 allows us to conduct a natural experiment and compare how the network channels recruitment with and without the common knowledge of media exposure. We show that there is no significant shift to the left of the threshold distribution once the media starts reporting on the protests; such a shift would have indicated that exposure to mass media led to a higher proportion of users joining the protests in the absence of local pressure. On the contrary, we find that local pressure is still an important precursor for a large number of users and that the vast majority are still susceptible to bursts of activity in their local networks.

Our findings, however, are still limited by the fact that we are not capturing the full range of information exposure: users had access to other sources we do not consider that might have also influenced their decision to join the movement, which means that we are surely overestimating the influence effects of Twitter activity. The different adoption times that we analyse suggest that for some users (the early adopters) online activity on Twitter probably carried more weight in their decision to join than for others (the late adopters, who needed reinforcement from other sources, probably mass media or offline networks, before displaying their commitment online). Further investigations should consider the relative weight that different sources of information have in shaping individual behaviour.

In addition, future research should consider whether our results are robust using time-dependent networks. One of the main assumptions of our analysis is that the network of followers does not change during the period considered; in fact, a significant number of connections are likely to have been created as a result of the mobilization itself. Future work should also address whether our findings are platform-dependent or universal to different types of online networks. Recent events, like the riots in London in August 2011, suggest that different online platforms are being used to mobilize different populations31. The question that future research should consider is whether the same recruitment patterns apply regardless of the technology being used, or whether the affordances of the technology (e.g. public/private by default) shape the collective dynamics that they help coordinate. The replication of these analyses with data covering similar events (like the OccupyWallStreet protests, which started in New York and soon spread to other U.S. cities) could help determine whether the dynamics we identify here can be generalized to different social contexts.

Methods

The data contains time-stamped tweets for the period April 25 to May 25. Messages related to the protests were identified using a list of 70 #hashtags (full list in SI). The collection of messages is restricted to the Spanish language and to users connected from Spain, and it was archived by a local start-up company, Cierzo Development Ltd, using the SMMART Platform. We estimate that our sample captures more than a third of all protest-related messages exchanged on Twitter. The network of followers was reconstructed by applying a one-step snowball sampling procedure, using the authors who sent protest messages as the seed nodes. An arc (i,j) in this network means that user i is following the Tweets of user j, and we assume that this network is static for the period we consider. The symmetric network filters out all asymmetric arcs, that is, for every arc (i,j) there also needs to be an arc (j,i).
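The filtering rule for the symmetric network is simple to express in code. The sketch below assumes arcs are given as (i, j) pairs with the meaning above (i follows j); the function names `reciprocated` and `following_map` are hypothetical helpers, and the adjacency-dict layout is one convenient choice among several.

```python
def reciprocated(arcs):
    """Keep only arcs (i, j) for which the reverse arc (j, i) also exists."""
    arc_set = set(arcs)
    return {(i, j) for (i, j) in arc_set if (j, i) in arc_set}


def following_map(arcs):
    """Adjacency dict: user i -> set of users j that i follows."""
    out = {}
    for i, j in arcs:
        out.setdefault(i, set()).add(j)
    return out


arcs = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
print(sorted(reciprocated(arcs)))
# → [('a', 'b'), ('b', 'a')]
print(following_map(reciprocated(arcs)))
# → {'a': {'b'}, 'b': {'a'}}
```

Only the mutual a–b tie survives; one-way follows such as a → c, typical of links to broadcasters, are dropped. The resulting adjacency dict is symmetric, so it can also serve directly as an undirected graph.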

We reconstruct message chains assuming that protest activity is contagious if it takes place in short time windows. We do not have access to re-tweet (RT) information, but since all our messages are related to the 15-M movement, chains refer to the same subject matter (although the precise content of the messages in the same chain might differ). This measurement maps the extent to which the stream of content related to the protests diffuses in given time windows.

The k-shell decomposition assigns a shell index $k_s$ to each user by iteratively pruning the network. The process starts by removing all nodes with degree k = 1; because each removal lowers the degree of the removed node's neighbours, pruning is repeated until no node with degree k ≤ 1 remains, and all removed nodes are classified (together with their links) in the shell with index $k_s = 1$. Nodes with remaining degree k = 2 are then pruned in the same way and assigned to $k_s = 2$, and so forth until all nodes are removed (and all users are classified). Shells are layers of centrality in the network: users classified in shells with higher indexes are located at the core, whereas users with lower indexes define the periphery of the network (see SI for details of node classification in shells).
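A minimal sketch of the pruning procedure follows. It assumes the network is treated as undirected (for example, the reciprocated network, or the follower network with arc direction ignored), which is an assumption about how direction is handled. The function name `k_shell` is hypothetical. Following the description above, numbering starts at $k = 1$, so isolated nodes end up in shell 1 rather than the shell 0 used by some libraries.

```python
def k_shell(adj):
    """Shell index k_s for every node of an undirected graph.

    adj: dict node -> set of neighbours (must be symmetric).
    For k = 1, 2, ... repeatedly prune nodes whose *remaining* degree is
    <= k; every node removed during round k gets shell index k.
    Isolated nodes land in shell 1 because pruning starts at k = 1.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        k += 1
        stack = [v for v in remaining if degree[v] <= k]
        while stack:
            v = stack.pop()
            if v not in remaining:
                continue  # already pruned via another neighbour
            remaining.discard(v)
            shell[v] = k
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1  # removing v lowers u's degree
                    if degree[u] <= k:
                        stack.append(u)  # u now falls into this shell too
    return shell


# Triangle a-b-c, with a chain a-d-e hanging off node a
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"},
       "d": {"a", "e"}, "e": {"d"}}
print(sorted(k_shell(adj).items()))
# → [('a', 2), ('b', 2), ('c', 2), ('d', 1), ('e', 1)]
```

Node `d` starts with degree 2 but is pruned into shell 1 once `e` is removed, which is why the pruning must be repeated within each shell. The triangle forms the core even though `a` has the highest degree and `b` and `c` do not, in line with the observation that core position is not simply a matter of having many connections.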