Combining interventions to reduce the spread of viral misinformation

Data collection and processing

All data were collected in accordance with the University of Washington Institutional Review Board. Our dataset was collected in real time during the 2020 US election. We relied on a set of 160 keywords to collect posts from Twitter’s API (1.04 billion posts). The keywords were updated in response to new narratives—for instance, adding ‘sharpiegate’ and related terms after false narratives emerged about the use of Sharpie markers invalidating ballots. Working with the Electoral Integrity Partnership, we catalogued instances of false or misleading narratives that were either detected by the team or reported by external partners2. This led to a large corpus of tickets associated with validated reports of misleading, viral information about election integrity.

Tickets that shared a common theme were consolidated into incidents. We developed search terms and a relevant date range for each incident to query posts from our tweet database. Incidents (N = 430) were generally characterized by one or more periods of intense activity followed by a return to a baseline state (Fig. 1a). The search terms and descriptions of the incidents are provided along with the data.

We then wished to extract segments of the time series that exhibit macroscopic features consistent with viral dynamics. More specifically, candidate events should exhibit quiescent periods before and after the event where our search terms return to baseline levels. However, multiple peaks may occur between these boundaries. To extract candidate events, we computed the raw time series of post volume per five minutes for each of our distinct incidents. We then identified events by finding the five-minute interval within the aggregated time series with the largest number of collected posts. Other peaks in activity were considered part of separate events if they were at least 30% of the magnitude of the largest peak (to filter out noise). Starting with the largest peak, we identified its boundaries as the points before and after the peak where the number of posts in five minutes was less than 5% of the maximum volume. This may include multiple peaks within the same event, if no quiescent period occurred between them. We then repeated this process for all remaining peaks. If periods of activity less than 5% of the maximum peak height did not occur within the range of data collection, the first (or last) time point collected was used to denote the beginning (or end) of an event. Finally, events were required to last at least an hour (that is, 12 data points). This process extracted 544 candidate events from 269 incidents.
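The extraction heuristic above can be sketched as follows. This is an illustrative re-expression, not our processing code; the function and parameter names (`extract_events`, `peak_frac`, `edge_frac`, `min_len`) are ours for exposition:

```python
import numpy as np

def extract_events(counts, peak_frac=0.30, edge_frac=0.05, min_len=12):
    """Sketch of the event-extraction heuristic described above.

    counts: 1-D array of posts per five-minute bin for one incident.
    Returns a list of (start, end) index pairs (inclusive), largest peak first.
    """
    counts = np.asarray(counts, dtype=float)
    global_max = counts.max()
    # Candidate peaks: the global maximum plus any peaks that are at
    # least `peak_frac` (30%) of its height (noise filter).
    order = np.argsort(counts)[::-1]
    events, claimed = [], np.zeros(len(counts), dtype=bool)
    for p in order:
        if counts[p] < peak_frac * global_max or claimed[p]:
            continue
        # Walk outwards until volume drops below `edge_frac` (5%) of the
        # peak (a quiescent period) or the data boundary is reached; this
        # may sweep multiple peaks into one event if no quiescent period
        # separates them.
        lo = p
        while lo > 0 and counts[lo] >= edge_frac * counts[p]:
            lo -= 1
        hi = p
        while hi < len(counts) - 1 and counts[hi] >= edge_frac * counts[p]:
            hi += 1
        if hi - lo + 1 >= min_len:      # events must last at least an hour
            events.append((lo, hi))
        claimed[lo:hi + 1] = True
    return events
```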

Statistical and computational model

Model derivation

We then derived a model of spreading dynamics during viral misinformation cascades. We restricted our model to the dynamics of misinformation flow within a single event rather than longer-timescale processes such as the adoption of beliefs and behaviours. The spread of beliefs and behaviours often requires that multiple neighbours have adopted the state (that is, a complex contagion)16. The acceptance of a given misinformation narrative, for instance, can involve complicated cognitive processes involving partisan leanings, prior knowledge, attention, the message content and a host of other factors4,17.

Re-sharing of information on Twitter, however, requires solely that a single neighbour has shared a piece of content for it to potentially be seen and retweeted18,19. Moreover, empirical work has demonstrated that out-degree (follower count) nearly linearly predicts engagement20,21. These features are hallmarks of simple contagions at the timescales of interest in our events. Following previous work, we therefore model the spread of viral misinformation as a simple contagion22,23. At the core of our model is a latent virality parameter, v, which tracks the amount of attention a topic is garnering over time. Unlike in typical compartmental models, accounts vary widely in their out-degree, from 0 followers to more than 100 million. In disease research, branching process models have incorporated various degree distributions to examine the role of super-spreaders14.

We build on models of super-spreading and leverage the fact that the out-degree of each account can be estimated by their total followers24,25. When a user posts during an event, our model assumes that virality is increased in proportion to their number of followers (that is, the total exposed). However, network saturation and competition for attention with other topics can reduce virality over time. We incorporate this by adding a decay function, such that virality naturally decays over time. Together, growth from sharing and decay from saturation and competition define virality. Posts in a given time step are predicted by virality in the previous time step. These phenomena can be captured by a minimally parameterized branching process model, such that:

$$\begin{array}{lll}{\mathbb{E}}[{y}_{t}]&=&\exp (\alpha +\beta {v}_{t-1})\\ {v}_{t}&=&{v}_{t-1}\delta {{\mathrm{e}}}^{-\lambda t}+{x}_{t-1}\\ {x}_{t-1}&=&\log \left(\mathop{\sum }\limits_{j=1}^{{y}_{t-1}}{F}_{j,t-1}\right)\end{array}$$

(1)

where yt is the number of posts (that is, retweets, tweets, replies and quote tweets) at five-minute interval time t, α is the baseline rate of discussion and β is the effect of virality, v. Virality is a latent parameter proportional to the total number of users at a given point in time that are exposed to misinformation. It represents the extent to which an event, at a given point in time, is visible in timelines across Twitter. Virality decays as an exponential function via δ and λ. Here, δ captures the baseline rate of decay per time step, and λ controls how that decay changes over the lifetime of an event. This could be due to algorithmic processes favouring new content or user saturation for very large events. Every time step, for each of yt accounts that posts, the log sum (xt) of their followers, Fj, is added to virality.
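As an illustration, one step of equation (1) can be computed directly. The parameter values below are arbitrary placeholders for exposition, not fitted estimates:

```python
import numpy as np

# Illustrative values only; the fitted posteriors vary by event.
alpha, beta, delta, lam = -3.0, 0.5, 0.9, 0.01

def step(v_prev, followers_prev, t):
    """One step of equation (1).

    v_prev: virality at t-1.
    followers_prev: follower counts of accounts that posted at t-1.
    Returns (expected posts at t, updated virality v_t).
    """
    mu_t = np.exp(alpha + beta * v_prev)              # E[y_t]
    x_prev = np.log(np.sum(followers_prev))           # growth from sharing
    v_t = v_prev * delta * np.exp(-lam * t) + x_prev  # decay plus growth
    return mu_t, v_t
```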

We note that our model does not explicitly incorporate a network, as is common in many simulations of information and behaviour spread online16. Our primary reason for doing this is that algorithmic filtering of content renders the true network topology unknown. Reconstructing a network would require additional epistemic assumptions, which could bias the results in opaque ways26. Moreover, research on disease has highlighted the utility of modelling interventions in the absence of network structure, notably when the degree distribution is known or approximated14. We note that the success of simple models in understanding the spread of infectious disease is not due to simplistic contagion dynamics. For disease, daily interactions, immune-system dynamics, population structure, behaviour and air-flow patterns create remarkably complex and dynamic network topologies of disease spread.

Our model was fit to each event using PyStan v.2.9.1.1 (refs. 27,28). We fit events separately (rather than hierarchically) as they varied widely in their timescales, magnitudes and contexts within the broader 2020 election cycle. Of the 544 candidate events, our model performed well on 454 events (~10.4 million posts) of rapid misinformation spread. Our model was unlikely to be suitable for all events because it assumes that post volume is well predicted by the number of previously exposed accounts on Twitter. If, for instance, an incident received substantial news coverage (for example, Dominion software narratives), our model would probably fail.

To safeguard against this, we relied on a number of criteria to ensure model fit to a given event. Events were included in the final analysis if (1) the posterior 89% CI of total posts contained the observed value, (2) the chains successfully converged for all parameters (\(\hat{R} < 1.1\)), (3) the fit did not contain divergent transitions and (4) the event lasted longer than an hour (that is, >12 data points to fit). In addition to these criteria, events surrounding the Dominion narrative were removed as they involved long periods of high-volume online discussion. This filtering process resulted in the inclusion of 454 events (83% of total events) and ~10.4 million posts.
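Criterion (1) amounts to a posterior-predictive check on total volume; a minimal sketch (the function name and interface are illustrative, and the remaining criteria come from the sampler diagnostics):

```python
import numpy as np

def passes_ppc(observed_total, simulated_totals, ci=0.89):
    """Criterion (1): does the observed post total fall inside the
    posterior 89% CI of simulated totals for this event?"""
    lo, hi = np.percentile(simulated_totals,
                           [(1 - ci) / 2 * 100, (1 + ci) / 2 * 100])
    return lo <= observed_total <= hi
```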

Statistical model

We derived parameters for our model statistically using a custom-written model in Stan27. Posts yt at time t are assumed to be distributed as a gamma–Poisson mixture (that is, negative binomial) with expected value μt. A gamma–Poisson distribution was chosen because it allows for overdispersion of discrete events occurring in a fixed interval (here, posts). Specifically:

$$\begin{array}{lll}{y}_{t}& \sim& {{{\rm{NegativeBinomial}}}}2({\mu }_{t},\phi )\,{{{\rm{for}}}}\,t=2…T\\ {\mu }_{t}&=&\exp (\alpha +\beta {v}_{t-1})\,{{{\rm{for}}}}\,t=2…T\\ {v}_{t}&=&{v}_{t-1}\delta {{\mathrm{e}}}^{-\lambda t}+{x}_{t-1}\\ \alpha & \sim &{{{\rm{Normal}}}}(-3,3)\\ \beta & \sim &{{{\rm{Normal}}}}(0,3)\\ \delta & \sim &{{{\rm{Beta}}}}(1,1)\\ \lambda & \sim &{{{\rm{HalfExponential}}}}(1)\\ \phi & \sim &{{{\rm{HalfExponential}}}}(1)\\ {v}_{1}&=&{x}_{1}\\ {x}_{t-1}&=&\log \left(\mathop{\sum }\limits_{j=1}^{{y}_{t-1}}{F}_{j}+1\right)\end{array}$$

Here α is the baseline rate of detection for related keywords, and β is the effect of virality, v, on posts in a subsequent time step. Virality is calculated as a decaying function of vt−1 and the log of the sum of account follower counts Fj for posts in the previous time step. One follower is added to each user to avoid an undefined value in time steps with no followers. The log transform accounts for the link function (exp), transforming the linear model into an expected value for the negative binomial distribution. Given the wide range of possible event shapes, generic, weakly informative priors were chosen for all parameters. The models were fit using NUTS in PyStan with the default sampling parameters27,28.
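For concreteness, the likelihood the Stan program evaluates can be re-expressed in numpy/scipy. This is a minimal sketch following the model equations above, not our fitting code; NB2(μ, φ) maps to scipy's (n, p) parameterization with n = φ and p = φ/(φ + μ):

```python
import numpy as np
from scipy.stats import nbinom

def event_loglik(y, F_sums, alpha, beta, delta, lam, phi):
    """Log-likelihood of one observed event under the statistical model.

    y[i]: posts in the (i+1)-th five-minute bin.
    F_sums[i]: sum of follower counts (plus one) of accounts posting in
    that bin, so x equals log(F_sums[i]).
    """
    v = np.log(F_sums[0])                      # v_1 = x_1
    ll = 0.0
    for i in range(1, len(y)):                 # paper's time t = i + 1
        t = i + 1
        mu = np.exp(alpha + beta * v)          # E[y_t] given v_{t-1}
        # NB2(mu, phi) as scipy nbinom with n=phi, p=phi/(phi+mu)
        ll += nbinom.logpmf(y[i], phi, phi / (phi + mu))
        v = v * delta * np.exp(-lam * t) + np.log(F_sums[i - 1])  # v_t
    return ll
```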

Computational model

Our computational model relied on the posterior distributions of parameters obtained from fitting our statistical model separately to each event. For each simulation, one sample was drawn at random from the posterior for a given event. At t = 1, the model was initialized with the volume of posts and total exposed users from the first time step in which any posts were observed. At each subsequent time step, our computational model predicted the number of new posts, yt, by sampling from a negative binomial distribution as per our statistical model. For each of yt new posts, we drew a follower count from the actual distribution of accounts that retweeted for that event at that time step. Doing so allowed us to control for the possibility that some accounts tend to appear earlier in a viral event. This process was repeated for the duration of the actual event.
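A minimal sketch of this forward simulation, assuming numpy's (n, p) parameterization of the negative binomial (n = φ, p = φ/(φ + μ), matching NB2(μ, φ)); all names are illustrative and this is not the authors' code:

```python
import numpy as np

def simulate_event(alpha, beta, delta, lam, phi, follower_pools, y0, f0,
                   rng=None):
    """Forward-simulate one event from a single posterior draw.

    follower_pools[t]: empirical follower counts of accounts observed
    posting at time step t, from which simulated posters are drawn.
    y0, f0: posts and total exposed followers in the first time step.
    """
    rng = rng or np.random.default_rng(0)
    T = len(follower_pools)
    y = np.zeros(T, dtype=int)
    y[0] = y0
    v = np.log(f0 + 1)                       # v_1 = x_1
    for t in range(1, T):
        mu = np.exp(alpha + beta * v)        # expected posts this step
        # NB2(mu, phi) draw. (The paper additionally caps y_t at twice
        # the observed volume to guard against rare runaway growth;
        # omitted here for brevity.)
        y[t] = rng.negative_binomial(phi, phi / (phi + mu))
        # Draw follower counts from the accounts active at this step.
        F = (rng.choice(follower_pools[t], size=y[t])
             if y[t] > 0 else np.array([]))
        v = v * delta * np.exp(-lam * t) + np.log(F.sum() + 1)
    return y
```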

We simulated the removal of misinformation by simply setting yt+1 = 0 after a specified intervention time, t. Virality circuit breakers were enacted by multiplying virality at each time step by a constant. For example, a 10% reduction in virality was implemented as \({\hat{v}}_{t}={v}_{t}(1-0.1)\). As with content removal, this occurred only after a specified time step. In the case of the combined approach, virality circuit breakers (and subsequent removal) were employed at a given probability for each simulation run. We implemented nudges by multiplying follower counts by a constant, reducing the pool of susceptible accounts (that is, for account j, \({\hat{F}}_{j}={F}_{j}(1-\eta )\)). Finally, we implemented a three-strikes rule by identifying the third incident in which a given account appeared in our full dataset. They were removed from simulations for all events that occurred after their third strike.
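The four interventions can be sketched as small modifiers applied inside such a simulation loop; all function names and signatures here are illustrative:

```python
import numpy as np

def remove_content(y_t, t, t_intervene):
    """Removal: no further posts once content is taken down."""
    return 0 if t >= t_intervene else y_t

def circuit_breaker(v_t, t, t_intervene, reduction=0.10):
    """Virality circuit breaker: damp virality by a constant fraction
    at every step after the intervention time."""
    return v_t * (1 - reduction) if t >= t_intervene else v_t

def nudge(followers, eta):
    """Nudge: shrink each poster's effective audience,
    F_j -> F_j * (1 - eta)."""
    return np.asarray(followers) * (1 - eta)

def struck_out(strikes, account_id):
    """Three-strikes rule: exclude accounts from events occurring
    after their third flagged incident."""
    return strikes.get(account_id, 0) >= 3
```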

Additionally, our model included a maximum value of twice the observed posts per time interval to account for a rare condition in which long-tail parameters would lead to runaway growth. This occurred rarely enough to be challenging to quantify (<1% of model runs), but the cap was implemented to reduce upward bias in control conditions. We did this to ensure conservative estimates of efficacy, as interventions could reduce the possibility of runaway growth without meaningfully impacting engagement. Such a feature would be expected in any model of a growth process with exceptionally long-tailed distributions of follower counts and spread at a given time step (that is, a negative binomial).

For the figures shown in the main text and the tables presented in the Supplementary Information, we ran 500 simulations of all 454 events. For each run, we computed the cumulative engagement. The 500 simulations were summed across runs, from which we calculated the medians and CIs. All simulations were implemented in Python29 v.3.9.10.

Model validation

Some form of model validation strengthens any theoretical approach. As data-derived models of large-scale processes are uncommon in the social sciences, we offer some notes on validation and its limitations in this context. Ideally, our findings could be externally validated in an empirical setting. In our case, the gold standard would be to have Twitter implement our recommended policies in some locations but not others and examine subsequent engagement with viral misinformation.

Validation of this sort is both practically and ethically prohibitive. Ethically, the application of our theory to real-world social networks should occur after broader scientific scrutiny and not before publication. As these experiments impose actual costs on the individuals impacted by platform policies, a complete evaluation by the scientific community is necessary to evaluate potential benefits and mitigate risks. Ethical challenges aside, such an experiment is impractical, as it would require Twitter to rewrite its platform guidelines and hire fact-checkers at our suggestion. To the extent that Twitter conducts internal experiments, observational validation by the scientific community (that is, natural experiments) is confounded by unseen changes in the user interface, algorithmic sorting, concurrent A/B testing or other aspects of the experiment that are not disclosed to researchers.

This is a problem inherent to any data-derived model of a complex system at scale. Climate models suggest that reducing greenhouse gases will slow climate change and highlight the relative efficacy of various approaches30. Yet empirical validation at scale would require convincing nations to experimentally reduce greenhouse gases alongside a control world where these policies are not applied. Similarly, an experiment involving altering conditions in an enclosed space may be consistent with data-derived models yet provide little additional insight31. Furthermore, there is no known orthogonal world in which models of anthropogenic disturbance can be externally validated. Nevertheless, models of greenhouse gas reduction remain our best hope at reversing climate change. A recent perspective has argued that similar approaches are probably necessary for the stewardship of our social systems10.

Here we take a similar approach to climate models to validate our model internally (that is, within our dataset). Climate models can be validated by allowing them to condition on data and then run freely for some period. If the model successfully retrodicts conditions at a future point in time, it provides evidence that the model captures the dynamics of interest. We follow much the same approach here, simulating total engagement from the initial tweet throughout an event. At the coarsest level, the total number of observed posts (10.4 million) falls within the 89% CI of our baseline simulations (10.8 million, 89% CI, (9.8, 11.7)). On the scale of individual events, posterior predictive simulations recover the number of observed posts over several orders of magnitude, despite the model only being seeded with posts in the first time step and the time-varying empirical follower distribution (Supplementary Fig. 2). This holds for events that vary widely in duration, from one hour to several days. Visual inspection of posterior-predictive time series similarly indicates that our model recovers fine-grained temporal dynamics, even for our largest events where the number of data points far exceeds the model parameters (Supplementary Fig. 3). Considering the relatively small number of parameters (five in this model), this provides evidence that our model is adequately capturing key features of the underlying temporal dynamics.

Post-event engagement

Our model cannot directly evaluate post-event engagement, as it is designed to capture viral spreading dynamics rather than long, noisy periods of posting about a topic. These periods would be difficult to capture directly with a generative model, making it challenging to infer the impact of interventions on misinformation about a topic in general. However, there is a consistent relationship between the proportion of posts that occur within our definition of an event and those that occur subsequent to the event (Fig. 4c).

We can leverage this fact to gain insight into how interventions may impact discussion following the viral periods we analysed. To accomplish this, we used a Bayesian log-normal regression to estimate the effect of posts within the largest event on subsequent engagement (Supplementary Table 10):

$$\begin{array}{ll}\beta & \sim {{{\rm{Cauchy}}}}(0,1)\\ \sigma & \sim {{{\rm{Cauchy}}}}(0,1)\\ \mu &=\beta x\\ y& \sim {{{\rm{LogNormal}}}}(\mu ,\sigma )\end{array}$$

Here, y is post-event engagement, and x is engagement during the largest event. The intercept is set at zero, as an event with no posts would not be expected to produce subsequent posts. We then use the posterior distribution from this model to estimate subsequent engagement as a function of engagement during our simulated events with intervention. This is summed across events to generate the estimates shown in Fig. 4d. This method provides insight, but we note that it is limited by the assumption that the relationship between within- and post-event engagement is invariant to interventions. Furthermore, it is limited by the extent to which our data collection process captured posts across the entire incident (that is, event and subsequent posts).
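Given posterior draws of β and σ, post-event engagement for a simulated event can be sampled as follows. This is a sketch; it assumes x is supplied on the same scale used in the fit, and the function name is illustrative:

```python
import numpy as np

def predict_post_event(x_event, beta_draws, sigma_draws, rng=None):
    """Posterior-predictive post-event engagement for one event:
    y ~ LogNormal(beta * x_event, sigma), with the intercept fixed
    at zero as in the regression above."""
    rng = rng or np.random.default_rng(0)
    mu = beta_draws * x_event            # one location per posterior draw
    return rng.lognormal(mean=mu, sigma=sigma_draws)
```

Summing such draws across all simulated events yields the totals shown in Fig. 4d.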

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.
