Models are pivotal to battling the current COVID-19 crisis. In their call to action, Squazzoni et al. (2020) convincingly set out how social simulation researchers could and should respond in the short run, posing three challenges for the community, among them a COVID-19 prediction challenge. Although Squazzoni et al. (2020) stress the importance of transparent communication of model assumptions and conditions, we question the liberal use of the word ‘prediction’ for the outcomes of the broad arsenal of models that our own and other modelling communities are using to mitigate the COVID-19 crisis. We offer four key arguments for using expectations derived from scenarios when explaining our models to a wider, possibly non-academic audience.

The current COVID-19 crisis requires us to implement life-changing policies that, to a large extent, build upon predictions from complex, quickly adapted, and sometimes poorly understood models. Examples of models spurring catchphrase news headlines abound (Imperial College, AceMod-Australian Census-based Epidemic Model, IndiaSIM, IHME, etc.). And even though most of these models will be useful for assessing the comparative effectiveness of interventions in our aim to ‘flatten the curve’, the predictions that reach the news media are those of total cases or the timing of the inflection point.

The current focus on predictive epidemiological and behavioural models revives an important discussion about prediction in social systems. “[T]here is a lot of pressure for social scientists to predict” (Edmonds, Polhill & Hales, 2019), and we might add ‘especially nowadays’. But forecasting in human systems is often tricky (Hofman, Sharma & Watts, 2017). Approaches that take well-understood theories and simple mechanisms often fail to grasp the complexity of social systems, yet models that rely on complex, supervised machine-learning-like approaches may offer misleading levels of confidence (as Salganik et al., 2020, recently and elegantly showed). COVID-19 models appear to be no exception, as a recent review concluded that “[…] their performance estimates are likely to be optimistic and misleading” (Wynants et al., 2020, p. 9). Squazzoni et al. describe these pitfalls too (2020: paragraph 3.3). In the crisis at hand, it may even be counter-productive to rely on complex models that combine well-understood mechanisms with many uncertain parameters (Elsenbroich & Badham, 2020).

Considering the level of confidence we can have in predictive models in general, we believe there is an issue with the way the community communicates predictions. Scientists often use ‘prediction’ to refer to some outcome of a (statistical) model where they ‘predict’ aspects of the data that are already known but have been momentarily set aside. Edmonds et al. (2019: paragraph 2.4) state that “[b]y ‘prediction’, we mean the ability to reliably anticipate well-defined aspects of data that is not currently known to a useful degree of accuracy via computations using the model”. Predictive accuracy, in this case, can then be computed later on by comparing the prediction to the truth. Scientists know that when talking about the predictions of their models, they don’t claim to generalize to situations outside the narrow scope of their study sample or their artificial society. We are not predicting the future, and wouldn’t claim we could. However, this is wildly different from how ‘prediction’ is commonly understood: as an estimate of some unknown thing in the future. Now that our models quickly reach the general public, we need to be careful about the way we talk about their outcomes.

Predictions in the COVID-19 crisis will remain imperfect. In the current outbreak, society cannot afford to wait until models of interventions have been falsified against empirical data. As the virus continues to spread rapidly, our only option is to rely on models as a basis for policy, ceteris paribus. And it is precisely here, at ‘ceteris paribus’, that the term ‘prediction’ misses the mark. All things will not be equal tomorrow, the next day, or the day after that (Van Bavel et al. [2020] note numerous topics that affect managing the COVID-19 pandemic and its impact on society). Policies around the globe are constantly being tweaked, and people’s behaviour changes dramatically as a consequence (Google, 2020). Relying too much on predictions may give a false sense of security.

We propose using the word ‘prediction’ sparingly and talking about scenarios or expectations instead wherever possible. We identify four reasons why you should avoid talking about prediction right now:

Not everyone is acquainted with noise and emergence. Computational social scientists generally understand the effects of noise in social systems (Squazzoni et al., 2020: paragraph 1.8). Small behavioural irregularities can be amplified in complex systems, fostering unexpected outcomes. Yet scientists who do not study complex social systems may be unfamiliar with the principles we have internalized by now, and may place too much confidence in the median outputs of volatile models that enter the scientific sphere as predictions.

Predictions do not convey uncertainty. The general public is usually unacquainted with esoteric academic concepts. For instance, a flatten-the-curve scenario is generally presented as a mean or median trajectory, often without showing the variability across different scenarios. Yet numerous other outcomes are possible under different parameter values. We fear that if we state a prediction to a lay public, people will expect it to occur with certainty. If we forecast a sunny day but it rains, people are upset. Talking about scenarios, expectations, and mechanisms may prevent confusion and opposition when the forecast does not come true.

It’s a model, not reality. The previous argument feeds into the third notion: be honest about what you model. A model is a model. Even the most richly calibrated model is a model. That is not to say that such models are not informative (we reiterate: models are not a shot in the dark). Still, richly calibrated models based on poor data may be more misleading than less calibrated models (Elsenbroich & Badham, 2020). Empirically calibrated models may inspire more confidence at face value, but it lies in the nature of complex systems that small measurement errors in the input data can lead to big deviations in the outputs. Models present a scenario for our theoretical reasoning with a given set of parameter values. We can update a model with empirical data to increase its reliability, but it remains a scenario about a future state given an (often extensive) set of assumptions (recently beautifully visualized by Koerth, Bronner, & Mithani, 2020).

Stop predicting, start communicating. Communication is pivotal during a crisis. An abundance of research shows that communicating clearly and honestly is best practice during a crisis and generally reassures the public (e.g., Seeger, 2006). Squazzoni et al. (2020) call for transparent communication, stating that “[t]he limitations of models and the policy recommendations derived from them have to be openly communicated and transparently addressed”. We are united in our aim to avert the COVID-19 crisis but should be careful that overconfidence doesn’t erode society’s trust in science. Stating unequivocally that we hope, based on expectations, to avert a crisis by implementing some policy does not preclude changing course when an updated scenario about the future requires us to do so. Modellers should communicate clearly to policy-makers and the general public that this is the role of computational models that are being updated daily.
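The second point, that a single median curve hides the spread of possible outcomes, can be made concrete with even a very simple stochastic model. The following is a minimal sketch under our own assumptions: a textbook discrete-time stochastic SIR model with a Reed-Frost style infection probability, and illustrative parameters we chose ourselves (300 people, 3 initially infected, transmission 0.3 and recovery 0.1 per day). It is not any of the COVID-19 models discussed here; it simply runs the same model many times and reports the spread of epidemic peaks rather than one number.

```python
import random
import statistics


def binomial(n, p, rng):
    """Draw from Binomial(n, p) by summing n Bernoulli trials (fine for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def peak_infected(beta, gamma, population, initial_infected, days, rng):
    """One run of a discrete-time stochastic SIR model; returns the epidemic peak."""
    s, i = population - initial_infected, initial_infected
    peak = i
    for _ in range(days):
        if i == 0:
            break  # the outbreak has died out in this run
        # Probability that a given susceptible is infected by at least one
        # of the i infectious people today (Reed-Frost style).
        p_infection = 1.0 - (1.0 - beta / population) ** i
        new_infections = binomial(s, p_infection, rng)
        new_recoveries = binomial(i, gamma, rng)
        s -= new_infections
        i += new_infections - new_recoveries
        peak = max(peak, i)
    return peak


def scenario_summary(runs=100, seed=42):
    """Run the same (illustrative) model many times and summarise the peaks."""
    rng = random.Random(seed)
    peaks = [peak_infected(beta=0.3, gamma=0.1, population=300,
                           initial_infected=3, days=60, rng=rng)
             for _ in range(runs)]
    deciles = statistics.quantiles(peaks, n=10)  # 9 cut points
    return {
        "min": min(peaks),
        "p10": deciles[0],
        "median": statistics.median(peaks),
        "p90": deciles[-1],
        "max": max(peaks),
        "peaks": peaks,
    }


if __name__ == "__main__":
    summary = scenario_summary()
    for key in ("min", "p10", "median", "p90", "max"):
        print(f"{key:>6}: {summary[key]}")
```

How wide the spread is depends entirely on the assumed parameters; the sketch only shows that reporting the median alone discards information that a scenario-based presentation would keep.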

Squazzoni et al. (2020) set out the agenda for our community in the coming months and it is an important one. Let’s hope that the expectations from the scenarios in our well-informed models will not fall on deaf ears.

Understanding a situation is the precondition for making good decisions. In the extraordinary situation of a global pandemic, the lack of consensus about a good decision path is evident in the variety of government measures across countries, in analyses of the decisions made, and in debates on what the future will look like. What is also clear is how little we understand the situation and the impact of policy choices. We are faced with the complexity of social systems, our ability to only ever partially understand them, and the political pressure to make decisions on partial information.

The JASSS call to arms (Squazzoni et al. 2020) points out the need for the ABM community to produce relevant models for this kind of emergency situation. Whilst we wholly agree that ABM can contribute to the debate and to decision making, we would also like to point out some of the potential pitfalls inherent in misapplying and misinterpreting ABM.

Small change, big difference: Given the complexity of the real world, some aspects will be better understood than others. Trying to produce a very large model encompassing several different aspects might be counter-productive, as we would mix well-understood aspects with highly hypothetical knowledge. It might be better to have different, smaller models (on the epidemic, the economy, human behaviour, etc.), each of which can be assessed with its own level of validation and veracity and developed by modellers with subject-matter understanding, theoretical knowledge and familiarity with the relevant data.

Carving up complex systems: If separate models are developed, then we necessarily make decisions about the boundaries of our models. For a complex system, any carving up can separate important interactions, for example the way in which fear of the epidemic can drive protective behaviour, thereby reducing contacts and limiting the spread. While it is tempting to think that a “bigger”, more encompassing model is necessarily a better carving up of the system because it eliminates these boundaries, in fact it simply moves them inside the model and hides them.

Policy decisions are moral decisions: Deciding on the right course to take is a matter for the policy maker, with all the competing interests and interdependencies of the different aspects of the situation in mind. Scientists are there to provide the best information for understanding a situation, and models can be used to understand the consequences of different courses of action and the uncertainties associated with them. Models can inform policy decisions, but they must not obscure the fact that a moral choice has to be made.

Delaying a decision is making a decision to do nothing: Like any other policy option, a decision to maintain the status quo while gathering further information has its own consequences. The Call to Action (paragraph 1.6) refers to public pressure for immediate responses, but this underplays the pressure arising from other sources. It is important to recognise the logical fallacy: “We must do something. This is something. Therefore we must do this.” However, if there are options available that are clearly better than doing nothing, then it is equally illogical to do nothing.

Instead of trying to compete with existing epidemiological models, ABM could focus on the things it is really good at:

Understanding uncertainty in complex systems resulting from heterogeneity, social influence, and feedback. For the case at hand, this means not building another model of the epidemic spread (there are excellent SEIR models doing that) but exploring how heterogeneity in the infected population (such as in contact patterns or personal behaviour in response to infection) can influence the spread. Other possibilities include social effects, such as how fear might spread and influence panic buying or compliance with the lockdown.

Build models for the pieces that are missing and couple these to the pieces that exist, thereby enriching the debate about the consequences of policy options by making those connections clear.

Visualise and communicate difficult-to-understand and counterintuitive developments. Right now people are struggling to understand exponential growth, the dynamics of social distancing, the consequences of an overwhelmed health system, and the delays between actions and their consequences. It is well established that such fundamentals of systems thinking are difficult (Booth Sweeney and Sterman, https://doi.org/10.1002/sdr.198). Models such as the simple ones in the Washington Post, or less abstract ones like the routine day activity model of Vermeulen et al. (2020), do a wonderful job at this, allowing people to understand how their individual behaviour will contribute to the spread or containment of a pandemic.

Highlight missing data and inform future collection. This unfolding pandemic is being assessed constantly using highly compromised data; for example, reported infection rates in different countries are determined by how much testing is done. Death rates might be the most comparable measure, but even there we have reporting delays and omissions. Trying to build models is one way to identify what needs to be known to properly evaluate the consequences of policy options.
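Two of the difficulties listed above, exponential growth and the delay between actions and their consequences, can be illustrated without any agent-based machinery at all. The toy sketch below is ours and is not taken from any of the models mentioned: it assumes constant daily growth factors (1.15 before an intervention, 0.9 after it) and a fixed 10-day delay between infection and a case appearing in the statistics. All numbers are illustrative only.

```python
import math


def daily_infections(days, initial, growth_before, growth_after, intervention_day):
    """New infections per day under constant daily growth factors.

    Growth switches from growth_before to growth_after after intervention_day.
    """
    infections = [float(initial)]
    for day in range(1, days):
        factor = growth_before if day <= intervention_day else growth_after
        infections.append(infections[-1] * factor)
    return infections


def reported_cases(infections, reporting_delay):
    """Cases enter the statistics only reporting_delay days after infection."""
    hidden = min(reporting_delay, len(infections))
    return [0.0] * hidden + infections[:len(infections) - hidden]


def doubling_time(daily_growth):
    """Days needed for daily new infections to double at a constant growth factor."""
    return math.log(2) / math.log(daily_growth)


if __name__ == "__main__":
    infections = daily_infections(days=60, initial=10, growth_before=1.15,
                                  growth_after=0.9, intervention_day=20)
    reported = reported_cases(infections, reporting_delay=10)
    peak_true = max(range(len(infections)), key=infections.__getitem__)
    peak_reported = max(range(len(reported)), key=reported.__getitem__)
    print(f"doubling time before intervention: {doubling_time(1.15):.1f} days")
    print(f"infections peak on day {peak_true}, reported cases on day {peak_reported}")
```

With these assumptions infections peak on day 20, when the intervention takes effect, while reported cases keep rising until day 30; a 15% daily growth factor corresponds to a doubling time of about five days.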

The problem we are faced with in this pandemic is one of complexity, not one of ABM, and we must ensure we are honouring the complexity rather than just paying lip service to it. We agree that model transparency, open data collection and interdisciplinary research are important, and we want all scientific knowledge to be used in the best possible way to achieve a positive outcome to this global crisis.

But it is also important to consider the comparative advantage of agent-based modellers. Yes, we have considerable commitment to, and expertise in, open code and data. But so do many other disciplines. Health information is routinely collected in national surveys and administrative datasets, and governments have a great deal of established expertise in health data management. Of course, our individual skills in coding models, data visualisation, and relevant theoretical knowledge can be offered to individual projects as required. But we believe our institutional response should focus on activities where other disciplines are less well equipped, applying systems thinking to understand and communicate the consequences of uncertainty and complexity.

The JASSS position paper ‘Computational Models That Matter During a Global Pandemic Outbreak: A Call to Action’ (Squazzoni et al 2020) calls on the scientific community to improve the transparency, access, and rigour of their models. A topic that we think is equally important, and should be part of this list, is the quest for more “interdisciplinarity”: scientific communities working together to tackle the difficult job of understanding the complex situation we are currently in and to be able to give advice.

The modelling/simulation community in the UK (and more broadly) tends to work in silos. The two big communities we have been exposed to are the epidemiological modelling community and the social simulation community. They do not usually collaborate, despite working on very similar problems and using similar methods (e.g. agent-based modelling). They publish in different journals, use different software, attend different conferences, and sometimes even use different terminology for the same concepts.

The UK pandemic response strategy (Gov.UK 2020) is guided by advice from the Scientific Advisory Group for Emergencies (SAGE), which in turn comprises three independent expert groups: SPI-M (epidemic modellers), SPI-B (experts in behaviour change from psychology, anthropology and history), and NERVTAG (clinicians, epidemiologists, virologists and other experts). Of these, modelling from SPI-M member institutions has played an important role in informing the UK government’s response to the ongoing pandemic (e.g. Ferguson et al 2020). Current members of SPI-M belong to what could be considered the ‘epidemic modelling community’. Their models tend to be heavily data-dependent, which is justifiable given that most of their modelling focuses on viral transmission parameters. However, this emphasis on empirical data can sometimes lead them either not to model behaviour change or to model it in a highly stylised fashion, although more examples of epidemic-behaviour models are appearing in the recent epidemiological literature (e.g. Verelst et al 2016; Durham et al 2012; van Boven et al 2008; Venkatesan et al 2019). Yet computational models of behaviour change are prominently missing from the modelling work informing the current response to the pandemic. This, from what we have seen, is where the ‘social simulation’ community can contribute its expertise and modelling methodologies in a very valuable way. The Social Simulation Conference programmes and proceedings (e.g. SSC2019 2019) are a good resource for epidemiologists wishing to find out more about the wide spectrum of modelling ideas. Unfortunately, however, the public health community, including policymakers, are either unaware of these modelling ideas or unsure how they are relevant to them.

As a recent article points out, one important concern about how behaviour change may have been modelled in the SPI-M COVID-19 models is the assumption that changes in contact rates resulting from a lockdown in the UK and the USA will mimic those obtained from surveys performed in China, which is unlikely to be valid given the large political and cultural differences between these societies (Adam 2020). For the immediate COVID-19 response, requiring cross-disciplinary validation for all models that feed into policy may be a valuable step towards more credible models.

Effective collaboration between academic communities relies on a degree of familiarity with, and trust in, each other’s work, and much of this will need to be built up during inter-pandemic periods (i.e. “peace time”). In the long term, publishing and presenting in each other’s journals and conferences (i.e. giving other academic communities the opportunity to peer-review a piece of modelling work) could help foster a more collaborative environment, ensuring that we are in a much better position to leverage all available expertise during a future emergency. We should aim to take the best from across modelling communities and work together on hybrid modelling solutions that provide insight by delivering statistics as well as narratives (Moss 2020). Working in silos is both unhelpful and inefficient.

There is a lot of pressure on social scientists to predict. Not only is an ability to predict implicit in all requests to assess or optimise policy options before they are tried, but prediction is also the “gold standard” of science. However, there is a debate among modellers of complex social systems about whether this is possible to any meaningful extent. In this context, the aim of this paper is to issue the following challenge:

Are there any documented examples of models that predict useful aspects of complex social systems?

To do this the paper will:

define prediction in a way that corresponds to what a wider audience might expect of it

give some illustrative examples of prediction and non-prediction

request examples where the successful prediction of social systems is claimed

and outline the aspects on which these examples will be analysed

About Prediction

We start with a definition of prediction taken from Edmonds et al. (2019). This is a pragmatic definition designed to encapsulate common-sense usage: what a wider public (e.g. policy makers or grant givers) might reasonably expect from “a prediction”.

By ‘prediction’, we mean the ability to reliably anticipate well-defined aspects of data that is not currently known to a useful degree of accuracy via computations using the model.

Let us clarify the language of this definition.

It has to be reliable. That is, one can rely upon a prediction at the time it is made. A model that predicts erratically, and only occasionally gets it right, is no help, since one does not know whether to believe any particular prediction. This usually means that (a) it has made successful predictions for several independent cases and (b) the conditions under which it works are (roughly) known.

What is predicted has to be unknown at the time of prediction. That is, the prediction has to be made before it is verified. Predicting known data (as when a model is checked on out-of-sample data) is not sufficient [1]. Nor is the practice of looking for phenomena that are consistent with the results of a model after they have been generated (since this ignores all the phenomena that are not consistent with the model).

What is being predicted is well defined. That is, how to use the model to make a prediction about observed data is clear. An abstract model that is very suggestive, appearing to predict phenomena but only in a vague and undefined manner where one has to invent the mapping between model and data to make this work, may be useful as a way of thinking about phenomena, but this is different from empirical prediction.

Which aspects of the data are being predicted is open. As Watts (2014) points out, this is not restricted to point numerical predictions of some measurable value but could be a wider pattern. Examples include a probabilistic prediction, a range of values, a negative prediction (this will not happen), or a second-order characteristic (such as the shape of a distribution or a correlation between variables). What is important is that (a) this is a useful characteristic to predict and (b) it can be checked by an independent actor. Thus, for example, when predicting a value, the accuracy required of that prediction depends on its use.

The prediction has to use the model in an essential manner. Claiming to predict something obviously inevitable which does not use the model is insufficient – the model has to distinguish which of the possible outcomes is being predicted at the time.
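Two of these criteria, that a prediction is made before it is verified and that it can be checked by an independent actor, can be made operational. As one possible illustration, under our own assumptions, an independent checker could score published probabilistic predictions of yes/no events with the Brier score (the mean squared difference between forecast probabilities and what actually happened), ignoring any "prediction" issued after its outcome was already known. The Brier score is our choice of a standard scoring rule, not something the definition prescribes, and the `Prediction` and `Outcome` records below are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Prediction:
    event: str
    probability: float  # forecast probability that the event occurs
    issued: date        # when the prediction was made public


@dataclass(frozen=True)
class Outcome:
    event: str
    occurred: bool
    observed: date      # when the outcome became known


def brier_score(predictions, outcomes):
    """Mean squared gap between forecast probabilities and 0/1 outcomes.

    Only predictions issued strictly before their outcome was known are scored;
    anything else is not a prediction under the definition used here.
    """
    by_event = {o.event: o for o in outcomes}
    errors = []
    for p in predictions:
        o = by_event.get(p.event)
        if o is None or p.issued >= o.observed:
            continue  # not yet verifiable, or made after the fact
        errors.append((p.probability - float(o.occurred)) ** 2)
    if not errors:
        raise ValueError("no genuine, verified predictions to score")
    return sum(errors) / len(errors)


def brier_skill_score(predictions, outcomes, reference_probability=0.5):
    """Skill relative to a naive forecaster that always issues the same probability.

    1 is perfect, 0 is no better than the reference, negative is worse.
    """
    reference = [Prediction(p.event, reference_probability, p.issued)
                 for p in predictions]
    return 1.0 - brier_score(predictions, outcomes) / brier_score(reference, outcomes)
```

A constant 0.5 reference is the simplest baseline; a base-rate reference would be a natural, stricter alternative.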

Thus, prediction is different from other kinds of scientific/empirical uses, such as description and explanation (Edmonds et al. 2019). Some modellers use “prediction” to mean any output from a model, regardless of its relationship to any observation of what is being modelled [2]. Others use “prediction” for any empirical fitting of data, regardless of whether that data is known beforehand. Here, however, we wish to be clearer and avoid any “post-truth” softening of the meaning of the word, for two reasons: (a) distinguishing different kinds of model use is crucial in matters of model checking or validation, and (b) these “softer” kinds of empirical purpose will simply confuse the wider public when we talk to them about “prediction”. One suspects that modellers have accepted these other meanings because they then allow them to claim they can predict (Edmonds 2017).

Some Examples

Nate Silver and his team aim to predict future social phenomena, such as the results of elections and the outcomes of sports competitions. He correctly predicted, in advance, the outcome in all 50 states of the 2012 US presidential election, in which Obama was re-elected. This is a data-hungry approach, which involves the long-term development of simulations that carefully examine what can be inferred from the available data, with repeated trial and error. The forecasts are probabilistic and repeated many times. As well as making predictions, his unit tries to establish the level of uncertainty in those predictions, being honest about the probability of those predictions coming about given the likely levels of error and bias in the data. These models are not agent-based but mostly statistical in nature, so it is debatable whether they treat their subject as a complex system; they certainly do not use any theory from complexity science. His book (Silver 2012) describes his approach. Post hoc analysis of predictions (explaining why they worked or not) is kept distinct from the predictive models themselves: this analysis may inform changes to the predictive model but is not then incorporated into the model. The analysis is thus kept independent of the predictive model so that it can be an effective check.

Many models in economics and ecology claim to “predict”, but on inspection this only means that they fit some empirical data. For example, Meese & Rogoff (1983) looked at 40 econometric models that claimed to predict some time series. However, 37 of the 40 models failed completely when tested on newly available data from the same time series they claimed to predict. Clearly, although presented as predictive models, they could not predict unknown data. Although we do not know for sure, presumably these models had been (explicitly or implicitly) fitted to the out-of-sample data, because that data was already known to the modeller. That is, if the model failed to fit the out-of-sample data when it was tested, it was adjusted until it did work, or alternatively, only those models that fitted the out-of-sample data were published.
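The presumed mechanism, selecting models by their fit to "out-of-sample" data that is already known, can be mimicked with a toy simulation. This is a sketch under our own assumptions, not a reconstruction of the Meese & Rogoff analysis: the series is a simulated random walk (so its changes are pure noise), the 500 candidate models are hypothetical forecasters that by construction have no skill, and the modeller keeps whichever candidate best fits the known window. A no-change (random-walk) forecast serves as the benchmark, and results are averaged over 30 repetitions.

```python
import random
import statistics


def mse(errors):
    """Mean squared error of a list of forecast errors."""
    return sum(e * e for e in errors) / len(errors)


def one_experiment(rng, n_known=50, n_new=50, n_models=500, model_noise=0.5):
    """One round of 'fit to known out-of-sample data, then face new data'.

    Returns MSEs for (selected model on known data, selected model on new data,
    no-change forecast on known data, no-change forecast on new data).
    """
    # The series being 'predicted' is a random walk: its changes are pure noise.
    known = [rng.gauss(0.0, 1.0) for _ in range(n_known)]
    new = [rng.gauss(0.0, 1.0) for _ in range(n_new)]

    # Hypothetical candidate models: each forecasts the next change with its own
    # fixed pseudo-random rule, so none has any genuine predictive skill.
    best_known, best_state = float("inf"), None
    for _ in range(n_models):
        model = random.Random(rng.randrange(2**32))
        forecasts = [model.gauss(0.0, model_noise) for _ in range(n_known)]
        err = mse([x - f for x, f in zip(known, forecasts)])
        if err < best_known:
            # Keep the candidate that best fits the already-known data,
            # remembering where its rule has got to.
            best_known, best_state = err, model.getstate()

    # Confront the selected model with data that really was unknown.
    model = random.Random()
    model.setstate(best_state)
    forecasts = [model.gauss(0.0, model_noise) for _ in range(n_new)]
    best_new = mse([x - f for x, f in zip(new, forecasts)])

    # Benchmark: always forecast 'no change' (error = the change itself).
    return best_known, best_new, mse(known), mse(new)


def overfitting_demo(repetitions=30, seed=2024):
    """Average the four MSEs over independent repetitions."""
    rng = random.Random(seed)
    results = [one_experiment(rng) for _ in range(repetitions)]
    return tuple(statistics.fmean(column) for column in zip(*results))


if __name__ == "__main__":
    labels = ("selected, known data", "selected, new data",
              "no-change, known data", "no-change, new data")
    for label, value in zip(labels, overfitting_demo()):
        print(f"{label:>22}: {value:.3f}")
```

With these settings one should expect the selected model to beat the no-change benchmark on the known window and to lose to it on the new data, mirroring the pattern described above even though no candidate ever had any skill.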

The Challenge

We envisage the challenge proceeding as follows.

We publicise this paper, requesting that people send us examples of prediction or near-prediction of complex social systems, with pointers to the appropriate documentation.

We collect these and analyse them according to the characteristics and questions described below.

We will post some interim results in January 2020 [3], in order to prompt more examples and to stimulate discussion. The final deadline for examples is the end of March 2020.

We will publish the list of all the examples sent to us on the web, and present our summary and conclusions at Social Simulation 2020 in Milan and have a discussion there about the nature and prospects for the prediction of complex social systems. Anyone who contributed an example will be invited to be a co-author if they wish to be so-named.

How suggestions will be judged

For each suggestion, we will seek answers to the following questions:

What are the papers or documents that describe the model?

Is there an explicit claim that the model can predict (as opposed to might in the future)?

What kind of characteristics are being predicted (number, probabilistic, range…)?

Is there evidence of a prediction being made before the prediction was verified?

Is there evidence of the model being used for a series of independent predictions?

Were any of the predictions verified by a team that is independent of the one that made the prediction?

Is there evidence of the same team or similar models making failed predictions?

To what extent did the model need extensive calibration/adjustment before the prediction?

What role does theory play (if any) in the model?

Are the conditions under which predictive ability is claimed described?

Of course, negative answers to any of the above questions about a particular model do not mean that the model cannot predict. What we are assessing is the evidence that a model can predict something meaningful about complex social systems. Silver (2012) describes the method by which his team attempts prediction, but this method might differ from that described in most theory-based academic papers.

Possible Outcomes

This exercise might shed some light on some interesting questions, such as:

What kind of prediction of complex social systems has been attempted?

Are there any examples where the reliable prediction of complex social systems has been achieved?

Are there certain kinds of social phenomena that seem to be more amenable to prediction than others?

Does aiming to predict with a model entail any difference in method compared with projects with other aims?

Are there any commonalities among the projects that achieve reliable prediction?

Is there anything we could (collectively) do that would encourage or document good prediction?

It may well be that whether prediction is achievable depends on exactly what is meant by the word.

Acknowledgements

This paper resulted from a “lively discussion” after Gary’s (Polhill et al. 2019) talk about prediction at the Social Simulation conference in Mainz. Many thanks to all those who joined in. Of course, we had had many discussions about prediction before this, including Gary’s previous attempt at a prediction competition (Polhill 2018) and Scott Moss’s arguments about prediction in economics (which have many parallels with the debate here).

Notes

[1] This is sufficient for other empirical purposes, such as explanation (Edmonds et al. 2019).

[2] Confusingly, they sometimes use the word “forecasting” for what we mean by prediction here.

A few years ago I worked on an ABM that I eventually published in a book. Recently, I have conducted new experiments with the same model, re-analyzed the data, and used a different dataset to validate the model. Where can I publish this new work on an older model? I submitted it to a special issue of a journal, but it was rejected because “the model was not original”. While the model is not original, the new data analysis and validation are, and I think this kind of work is even more important in the context of current discussions about the replication crisis in science.

Berea, A. (2019) What are the best journals or publishers for reports of re-validations of existing models? Review of Artificial Societies and Social Simulation, 31st October 2019. https://rofasss.org/2019/10/30/best-journal/

How one thinks about knowledge can have a significant impact on how one develops models as well as how one might judge a good model.

Pragmatism. Under this view, a simulation is a tool for a particular purpose. Different purposes will imply different tests for a good model. What is useful for one purpose might well not be good for another, and different kinds of models and modelling processes might suit each purpose. A simulation whose purpose is to explore the theoretical implications of some assumptions might well be very different from one aiming to explain some observed data. An example of this approach is Edmonds et al. (2019).

Social Constructivism. Here, knowledge about social phenomena (including simulation models) is collectively constructed. There is no other kind of knowledge than this. Each simulation is a way of thinking about social reality and plays a part in constructing it. What counts as a suitable construction may vary over time, between cultures, etc. What a group of people construct is not necessarily limited to simulations that are related to empirical data. Ahrweiler & Gilbert (2005) seem to take this view, but it is more explicit in some participatory modelling work, where the aim is to construct a simulation that is acceptable to a group of people, e.g. (Etienne 2014).

Relativism. There are no bad models, only different ways of mediating between your thought and reality (Morgan 1999). If you work hard on developing your model, you do not get a better model, only a different one. This might be a consequence of holding to an Epistemological Constructivist position.

Descriptive Realism. A simulation is a picture of some aspect of reality (albeit at a much lower ‘resolution’ and imperfectly). If one obtains a faithful representation of some aspect of reality as a model, one can use it for many different purposes. This could imply very complicated models (depending on what one observes and decides is relevant), which might themselves be difficult to understand. I suspect that many people have this in mind as they develop models, but few explicitly take this approach. A possible example is Fieldhouse et al. (2016).

Classic Positivism. Here, the empirical fit and the analytic understanding of the simulation are all that matter, nothing else. Models should be tested against data and discarded if inadequate (or they compete, and one is currently ahead empirically). They should also be simple enough to be thoroughly understood. There is no obligation to be descriptively realistic. Many physics approaches to social phenomena follow this path (e.g. Helbing 2010, Galam 2012).

Of course, few authors make their philosophical position explicit – usually one has to infer it from their text and modelling style.