On behalf of the Oxford COVID-19 Evidence Service Team

Centre for Evidence-Based Medicine, Nuffield Department of Primary Care Health Sciences, University of Oxford

Correspondence to trish.greenhalgh@phc.ox.ac.uk

VERDICT

NEWS and its updated version, NEWS2, are early warning scores which were originally developed for monitoring hospital in-patients over time using repeated measurements.

There is no research on the value of these tools for COVID-19 outside hospital. NEWS2 includes blood pressure and oxygen saturation measurements that are difficult or impossible to take remotely. It does not include age or comorbidities, which are known to be strong independent predictors of survival in COVID-19.

Enthusiasm for NEWS2 in the primary care management of COVID-19 may be premature. If used at all, this score should be used alongside a wider clinical assessment of the patient and in the context of changes over time.

Further research is needed on the use of NEWS/NEWS2 in primary care, and on the use of prognostic scores more generally in the context of COVID-19. It should also be remembered that not every sick patient has COVID-19.

METHODOLOGICAL NOTE

This review attempts to make sense of a complex and contested literature. Rather than present an exhaustive analysis of every published paper on NEWS, we have chosen to review a selected sample of studies and scholarly commentaries using a combination of conventional critical appraisal techniques [1-3] and narrative (hermeneutic) methodology.[4] The former are primarily concerned with the methods (was the study done right? is it biased? etc.); the latter are concerned with meaning-making from complex bodies of evidence.

RAPID GUIDANCE ON NEWS IN COVID-19

NEWS2 is recommended by the National Institute for Health and Care Excellence (NICE) in its guidelines for the management of COVID-19 patients in critical care.[5]

NICE guidance for the management of COVID-19 pneumonia in a primary care setting is much more circumspect, reflecting the lack of direct evidence of the value of this score in the primary care context. The guideline states that “Use of the NEWS2 tool in the community for predicting the risk of clinical deterioration may be useful. However, a face-to-face consultation should not be arranged solely to calculate a NEWS2 score” (paragraph 3.7).[6]

BACKGROUND

The National Early Warning Score (NEWS) was launched by the Royal College of Physicians (RCP) in 2012 to improve the identification, monitoring and management of unwell patients in hospital.[7] It is based on a logistic regression model designed to predict in-hospital patient mortality within 24 hours of a set of vital sign observations.[8] Originally consisting of pulse rate, respiratory rate, blood pressure, temperature and oxygen saturation, it was updated in 2017 to NEWS2, which incorporated new onset of confusion and a separate scoring system for oxygen saturation in patients with type 2 respiratory failure. To quote from the RCP website:

The NEWS2 is based on a simple aggregate scoring system in which a score is allocated to physiological measurements, already recorded in routine practice, when patients present to, or are being monitored in hospital. Six simple physiological parameters form the basis of the scoring system:

respiration rate

oxygen saturation

systolic blood pressure

pulse rate

level of consciousness or new confusion*

temperature

[…] The score is then aggregated and uplifted by 2 points for people requiring supplemental oxygen to maintain their recommended oxygen saturation.
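The aggregation described in the quotation above can be sketched in a few lines of Python. This is a minimal illustration only, not the RCP chart: the function name `aggregate_news2` and the parameter keys are hypothetical, and the per-parameter sub-scores are assumed to have already been read off the official scoring chart, whose bands are not reproduced here.

```python
from typing import Mapping

# Hypothetical keys for the six physiological parameters named in the text.
PARAMETERS = (
    "respiration_rate",
    "oxygen_saturation",
    "systolic_blood_pressure",
    "pulse_rate",
    "consciousness",
    "temperature",
)

# The quotation states that the aggregate is uplifted by 2 points for
# people requiring supplemental oxygen.
OXYGEN_UPLIFT = 2


def aggregate_news2(sub_scores: Mapping[str, int], on_supplemental_oxygen: bool) -> int:
    """Sum the per-parameter sub-scores and apply the oxygen uplift.

    The sub-scores must come from the official chart; this function does
    not derive them from raw vital signs.
    """
    missing = [name for name in PARAMETERS if name not in sub_scores]
    if missing:
        # In a remote consultation some readings may be unavailable;
        # refuse to compute a total rather than guess.
        raise ValueError("missing sub-scores: " + ", ".join(missing))
    total = sum(sub_scores[name] for name in PARAMETERS)
    if on_supplemental_oxygen:
        total += OXYGEN_UPLIFT
    return total


# Invented sub-scores for illustration; not patient data.
example = {
    "respiration_rate": 2,
    "oxygen_saturation": 1,
    "systolic_blood_pressure": 0,
    "pulse_rate": 1,
    "consciousness": 0,
    "temperature": 1,
}
print(aggregate_news2(example, on_supplemental_oxygen=True))  # 7
```

The function deliberately raises an error when a component is missing, which mirrors the practical difficulty discussed later in this review: a score cannot be completed if blood pressure or oxygen saturation cannot be measured.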

The NEWS2 scores for escalating care are shown in Chart 1, which is reproduced from the Royal College of Physicians guidance.[7] A recent in-hospital study showed that NEWS2 did not appear to add predictive value over NEWS, even in patients with type 2 respiratory failure.[9]

The recording chart recommended for monitoring repeated NEWS scores in inpatients is shown in the Appendix.

The use of NEWS/NEWS2 has been mandated in the UK for acute trusts and ambulance services since 2019, on the grounds that a “common language of sickness” will reduce the risk of miscommunication in safety-critical settings, especially though not exclusively in the management of suspected sepsis.[10] The scores are designed to be used over time, comparing the patient’s current score with a trend over days, hours or minutes.
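The idea of reading a score against its own trend, rather than as a single number, can be sketched as follows. This is an illustrative assumption-laden example: the `Observation` class and both function names are hypothetical, the readings are invented, and no clinical threshold for what counts as a meaningful rise is implied.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Sequence


@dataclass(frozen=True)
class Observation:
    """One aggregate score recorded at a point in time (hypothetical type)."""
    taken_at: datetime
    score: int


def consecutive_changes(observations: Sequence[Observation]) -> List[int]:
    """Change in score between each pair of consecutive readings, in time order."""
    ordered = sorted(observations, key=lambda o: o.taken_at)
    return [later.score - earlier.score for earlier, later in zip(ordered, ordered[1:])]


def rise_since_first(observations: Sequence[Observation]) -> int:
    """Difference between the latest and the earliest recorded score."""
    ordered = sorted(observations, key=lambda o: o.taken_at)
    if len(ordered) < 2:
        # A single reading carries no trend information.
        raise ValueError("at least two observations are needed to describe a trend")
    return ordered[-1].score - ordered[0].score


# Invented readings for illustration; not patient data.
readings = [
    Observation(datetime(2020, 4, 1, 8, 0), 2),
    Observation(datetime(2020, 4, 1, 12, 0), 3),
    Observation(datetime(2020, 4, 1, 16, 0), 5),
]
print(consecutive_changes(readings))  # [1, 2]
print(rise_since_first(readings))     # 3
```

The sketch refuses to describe a trend from one reading, which is the situation a GP faces when a score is calculated once at the point of referral.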

It was beyond the scope of this rapid review to definitively address whether and in what circumstances the NEWS/NEWS2 score adds value in hospital settings. In brief, a high NEWS2 score appears to predict poor survival in patients admitted to critical care facilities; a low score predicts good survival (see, for example, [11-13]).

It is important to note that NEWS was developed pragmatically by front-line clinicians with (initially) limited research input. An in-press systematic review by a University of Oxford team has set out numerous methodological weaknesses which challenge the validity of many of the empirical studies on early warning scores undertaken on hospital inpatients.[14, 15] Criticisms included retrospective designs, small sample sizes, small numbers of primary outcome events, discarding of missing data (rather than imputation), and weak methodology for internal and external validation. The authors concluded that “Early warning scores are widely used prediction models, and often mandated in daily clinical practice to identify early clinical deterioration in hospital patients. Despite wide use, this study found many early warning scores in clinical use have methodological weaknesses. They may not perform as well as expected, and therefore have a potentially highly detrimental effect on patient care.”

There appear to be very few published studies on the use of NEWS/NEWS2 in the specific context of COVID-19, even in hospital settings. Note, however, that this review did not include an exhaustive search for such studies. A paper published from China during the early phase of the COVID-19 pandemic offered an early warning score based on an adapted version of the NEWS2 score with age >65 (score 3 points) added to reflect emerging evidence that age is an independent predictor of survival.[16] That modified NEWS score, which appears to have been developed pragmatically, is shown below.

The relevance of all the above to primary care is unclear and controversial. On the one hand, as Finnikin has put it, “The universal use of a common scoring system allows clinical information to be communicated efficiently across departments, clinical settings, and between clinical colleagues. Its usefulness as a common language, combined with the drive to identify sepsis early, have contributed to the widespread adoption and acceptance of NEWS2 in secondary care.” [17]

On the other hand, a “common language” of sickness could prove unhelpful and even harmful if the language relates to unvalidated and misleading concepts. Roland et al, writing from a hospital paediatric perspective (since NEWS2 is not validated for use in children) have cautioned against the uncritical use of supposedly standardised instruments: “communication and standardisation will need to be balanced against using a tool that simply isn’t valid to detect the range of potential pathologies seen in both adults and children in primary care.” [18]

Against this background, we aimed to establish the state of knowledge, ignorance and uncertainty in relation to the use of NEWS2 in the context of COVID-19 in primary care.

USE OF NEWS/NEWS2 IN PRIMARY CARE SETTINGS: EXISTING EVIDENCE

We know that high NEWS2 scores are rare in primary care settings [19], though slightly less rare in care homes.[20] We know that some authors have recommended the use of NEWS2 by GPs in the specific context of suspected sepsis.[21] The question is whether a high NEWS score might add value over a standard clinical assessment in the context of ongoing monitoring and referral to hospital in a patient with possible COVID-19.

Our search identified no published peer-reviewed empirical research studies on the use of NEWS/NEWS2 scores to guide decision-making in COVID-19 patients in primary care settings.

Furthermore, most studies of NEWS/NEWS2 in pre-hospital settings were on samples drawn from ambulance services, with none in primary care, and had similar methodological weaknesses to those of the in-hospital studies described above.[15] In particular, the calculation of NEWS/NEWS2 in pre-hospital settings was usually done retrospectively as part of a research study rather than being an integral part of the care of the patient (in other words, NEWS was not actually being used; the studies were considering the hypothetical question of whether it might have added value had it been used). With that caveat, our summary of the pre-hospital literature follows.

We identified the following key publications (including two in-press papers which the authors kindly shared with us):

Patel et al’s systematic review of the predictive value of NEWS in pre-hospital settings, published in 2018 [22]

Additional studies of the predictive value of NEWS/NEWS2 in pre-hospital settings published after the Patel review, of which we describe one example, by Martin-Rodriguez et al [23]

Two new observational studies from the West of England CLAHRC about the use of NEWS2 by GPs (but not in the context of COVID-19): Pullyblank et al [24] and Scott et al [25]

We used the AMSTAR2 checklist [3] for the systematic review, QUADAS-2 checklist [2] for predictive accuracy studies and the CASP critical appraisal checklist for cohort studies [1] for the observational studies. Below, we briefly review the scope, methods and findings of the studies listed above.

This review, by Rita Patel and colleagues from the West of England CLAHRC (an NIHR-funded applied research collaboration) aimed to evaluate the effectiveness and predictive accuracy of early warning scores (including but not limited to NEWS) to predict deteriorating patients in pre-hospital settings. Our own assessment of the Patel review against CASP and AMSTAR2 checklists indicated that it was of high quality. The authors identified 17 primary studies, of which 5 [26-30] had evaluated NEWS (none had evaluated NEWS2, which was introduced later). Patel et al applied the QUADAS-2 checklist to appraise these primary studies.

The five NEWS studies included in this review consisted of the following:

Shaw et al.[29] A retrospective analysis of ambulance records in a sample of 287 patients, which asked whether the NEWS score as assessed by ambulance staff predicted admission to critical care facilities and death (it did, with 87% sensitivity and 60% specificity, but AUROC was not calculated).

Silcock et al.[30] A retrospective analysis of ambulance records in a sample of 1684 patients, which asked whether the NEWS score predicted admission to critical care facilities and 48-hour and 30-day mortality (it did, with an AUROC of up to 0.89).

Infinger et al (abstract only).[26] A retrospective analysis from USA of ambulance records in a sample of 101 patients, which asked whether the NEWS score predicted outcome (unclear which) in high-risk patients with severe sepsis (it did, with a sensitivity of 90% and specificity of 25%).

AUROC stands for ‘area under the receiver operating characteristic curve’. An AUROC value of 1.0 indicates 100% sensitivity and 100% specificity for the outcome tested. Lower values for AUROC indicate that the test is either less sensitive or less specific (and usually both). Few prediction scores have an AUROC above 0.9; an AUROC of 0.5 indicates that the test is no better than guesswork.
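For readers less familiar with these measures, the sketch below shows how sensitivity, specificity and AUROC can be computed from a set of scores and known outcomes. It is a minimal illustration under stated assumptions: the function names are hypothetical, the eight scores and outcomes are invented, and the AUROC is calculated using its standard interpretation as the probability that a randomly chosen patient with the outcome has a higher score than a randomly chosen patient without it (ties counting as half).

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], outcomes: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Treat score >= threshold as a positive test.

    outcomes[i] is True if patient i had the adverse outcome.
    """
    pairs = list(zip(scores, outcomes))
    tp = sum(1 for s, o in pairs if o and s >= threshold)      # true positives
    fn = sum(1 for s, o in pairs if o and s < threshold)       # false negatives
    tn = sum(1 for s, o in pairs if not o and s < threshold)   # true negatives
    fp = sum(1 for s, o in pairs if not o and s >= threshold)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


def auroc(scores: Sequence[float], outcomes: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    with_outcome = [s for s, o in zip(scores, outcomes) if o]
    without_outcome = [s for s, o in zip(scores, outcomes) if not o]
    if not with_outcome or not without_outcome:
        raise ValueError("need at least one patient with and one without the outcome")
    wins = 0.0
    for p in with_outcome:
        for n in without_outcome:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie is no better than a coin toss
    return wins / (len(with_outcome) * len(without_outcome))


# Invented data for illustration; not taken from any study cited here.
scores = [1, 2, 3, 5, 6, 7, 8, 9]
outcomes = [False, False, False, True, False, True, True, True]

print(sensitivity_specificity(scores, outcomes, threshold=5))  # (1.0, 0.75)
print(sensitivity_specificity(scores, outcomes, threshold=7))  # (0.75, 1.0)
print(auroc(scores, outcomes))                                 # 0.9375
```

Note that sensitivity and specificity depend on the threshold chosen, whereas the AUROC summarises performance across all thresholds. This is why a study that does not report its threshold cannot be replicated in practice.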

In summary, none of the NEWS studies reviewed by Patel et al had been carried out in primary care. None had evaluated the measurement of NEWS by GPs or other primary care staff. In addition, each of the five studies either i) did not report the NEWS threshold used (making it unreplicable in practice), or ii) excluded 18-31% of eligible patients from the analysis (creating a potential bias so large that we cannot trust the estimates of predictive accuracy for the score), or both. We agree with the conclusion of Patel et al that in all five studies, methodological quality was weak and external validity to a primary care setting is low.

Importantly, the primary studies evaluated in this systematic review were all addressing the question of whether NEWS could predict adverse outcome. They did not assess the use of NEWS as a communication tool (the “common language of sickness” referred to above).

In conclusion, this well-conducted but slightly dated systematic review does not provide any direct evidence to either support or refute the use of NEWS2 by GPs in the acute setting.

Our search (see details below) updated the search strategy used by Patel et al, narrowed to focus on NEWS or NEWS2 rather than all scoring systems.

We identified 15 additional studies that examined NEWS or NEWS2 scores recorded by emergency medical (ambulance) services. A brief examination of these reports showed that many were conducted with highly selected patient populations. Some publications appeared to be based on different analyses of the same study populations.

No studies were undertaken in a primary care setting. Therefore, we did not conduct a further appraisal of these papers, except one which we describe as an example from a research group who have published extensively on this subject:

Martin-Rodriguez et al. [23] This prospective multi-centre cohort study from Spain on patients using an emergency medical (ambulance) service asked whether NEWS2 predicted 24-hour, 48-hour and 7-day mortality. They reported positive findings, with AUROC of 0.86, 0.89 and 0.84 respectively. However, these headline figures referred to the predictive power of a NEWS2 score of 9 or more. For NEWS2 scores of 0-4, 5-6 and 7 or more, AUROC values of 0.61, 0.37 and 0.79 respectively were reported.

Though the study appears to have been well-conducted, there are a number of potential sources of bias. For example, more than 60% of initially eligible patients were excluded due to on-site resolution or allocation to BLS (basic life support, a category of ambulance providing basic care) rather than ALS (advanced life support, an ambulance providing advanced care with two paramedics, a nurse and a doctor). The patients triaged to ALS formed the study population. Systematic exclusion of patients requiring less intensive care, or no further care, will have artificially inflated estimates of the accuracy of NEWS2 if applied to ambulance-callers generally. In addition, 12% of included patients were excluded from the analysis because of missing data; it is unclear what effect this could have had on the results.

It is unclear whether the NEWS2 score collected by ambulance staff was used to inform patient care. If so, this may introduce bias through the mechanism of changing decision-making behaviour of the ambulance staff in a study where the aim was only to explore the predictive value of the NEWS2 score, not measure an effect of its use. Though the reported accuracy of the NEWS2 score in this study may be an overestimate, it offers some support for the predictive value of NEWS2 as being similar to NEWS in an ambulance setting.

We found no direct evidence to either support or refute the use of NEWS2 by GPs in the acute setting.

Studies of the use of NEWS2 by GPs in acutely unwell patients

The West of England Academic Health Sciences Network has been proactive in implementing NEWS2 ‘system wide’ – that is, in primary care as well as secondary care and ambulance services – using a breakthrough collaborative model. This regional initiative (covering a population of 2.4 million) commenced in January 2015 and was complete in most settings by February 2017. The region comprises six acute trusts, one ambulance trust, two community mental health trusts, seven community health services and five clinical commissioning groups. From a primary care perspective, GPs were asked to provide a NEWS score (or the components of it to allow calculation by the receiving organisation) at the point of referral.

Two recent publications from that initiative are relevant to this review. Note that the positive findings from this particular region may not be transferable to other regions which have not had such an intensive, system-wide initiative to promote NEWS2 understanding and use.

Pullyblank et al [24] report an observational study using the Suspicion of Sepsis (SoS) national dashboard to study the effects of a natural experiment: system-wide use of NEWS2 in patients with suspected sepsis implemented in the West of England between 2015 and 2018, but not in any other region in England. SoS is a set of 250 ICD codes for bacterial infections, nationally recognised as a suitable denominator population for patients at risk of sepsis.

Over the study period, there was a statistically significant and clinically meaningful reduction in mortality for patients at risk of sepsis in emergency admissions, contemporaneous with implementation, compared to historical data from the same region and comparative data from the rest of England. Their Figure 3, a statistical process chart, shows a change over time in crude mortality from ~6.2% to ~5.2%, compared to the rest of England which remained unchanged at ~6.7% (their Figure 2).

However, we do not know how many patients were included in this study, and what these percentages may mean in terms of the number of deaths delayed. No data were presented to support the claim that length of stay is reduced. The proportion of the cohort for whom GPs calculated NEWS scores is uncertain. In addition, the study does not address what impact NEWS or NEWS2 score may have on patients for whom sepsis is not suspected.
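To make the point about absolute numbers concrete, the sketch below converts the approximate percentages quoted above into absolute and relative differences. It is illustrative arithmetic only: the function names are hypothetical, the percentages are the approximate crude figures read from the study's chart, and the cohort size of 10,000 admissions is an invented placeholder, since the actual number of patients was not reported.

```python
def absolute_risk_reduction(before_pct: float, after_pct: float) -> float:
    """Difference in crude mortality, in percentage points."""
    return before_pct - after_pct


def relative_risk_reduction(before_pct: float, after_pct: float) -> float:
    """Reduction expressed as a fraction of the starting mortality."""
    return (before_pct - after_pct) / before_pct


def fewer_deaths(before_pct: float, after_pct: float, admissions: int) -> float:
    """Expected difference in deaths for a given number of admissions."""
    return admissions * (before_pct - after_pct) / 100.0


BEFORE_PCT = 6.2  # approximate crude mortality at the start of the period
AFTER_PCT = 5.2   # approximate crude mortality at the end of the period
HYPOTHETICAL_ADMISSIONS = 10_000  # assumption: the true cohort size is unknown

arr = absolute_risk_reduction(BEFORE_PCT, AFTER_PCT)
rrr = relative_risk_reduction(BEFORE_PCT, AFTER_PCT)
diff = fewer_deaths(BEFORE_PCT, AFTER_PCT, HYPOTHETICAL_ADMISSIONS)

print(f"absolute reduction: {arr:.1f} percentage points")
print(f"relative reduction: {rrr:.1%}")
print(f"fewer deaths per {HYPOTHETICAL_ADMISSIONS} admissions: {diff:.0f}")
```

Because these are crude, unadjusted figures from an observational comparison, the output describes the size of the observed difference and says nothing about its cause.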

In conclusion, whilst caution should be applied in interpreting observational studies, the Pullyblank study lends support to the use of NEWS2 as a communication tool about suspected sepsis between primary and secondary care. It does not provide evidence about the use of NEWS2 to support decision-making in primary care. Nor does it provide direct evidence that NEWS2 adds value in patients with suspected COVID-19.

Scott et al [25] report another study of the system-wide use of NEWS in a West of England cohort. Data were collected in one of the acute trusts and many correlations explored, including the relationship between the severity of the NEWS score calculated at the time of GP referral and a number of outcomes: time from referral to arrival (faster), time to review in hospital (faster), length of stay (longer), admission to ICU (more likely), proportion of patients diagnosed with sepsis (higher), and 2- and 30-day mortality (higher). The authors point to a finding that the 22% of patients for whom a NEWS score was not calculated at referral had care processes and outcomes similar to those with the lowest NEWS scores, but also had a higher risk of adverse outcomes. However, the absolute differences in outcome rates are small and were not statistically significant. Whilst the findings are encouraging, the study design does not allow us to draw firm conclusions about causation (in other words, we cannot conclude that the use of the NEWS score caused faster care processes, though this is one possible interpretation).

A commentary by Finnikin on the above two papers,[17] published in the same issue of the British Journal of General Practice, expresses caution about the use of NEWS2 in general practice. Written before the COVID-19 crisis escalated, the article makes the point that general practice is the “risk sink” of the National Health Service (NHS) – that is, a traditional role of GPs even before COVID-19 was to hold onto clinical risk. GPs tread a delicate line between “referring excessively (so that secondary care is overwhelmed) and identifying those patients who can be managed at home or in a care home (without overwhelming community services)”.[17]

ROYAL COLLEGE OF GPs GUIDANCE

Largely on the basis of the in-press Pullyblank and Scott studies described above,[24, 25] in April 2020 the Royal College of GPs published an educational resource recommending the judicious use of NEWS2 in the assessment of deteriorating patients and for referral conversations at the primary-secondary care interface.[31] The website includes the statement: “NEWS2 has been validated for all-cause patient deterioration, but is of particular value in identifying those patients who may have sepsis”. Whilst that statement is broadly correct in relation to the inpatient monitoring of non-COVID-19 patients, and it also reflects the experience of the primary care management of suspicion of sepsis in one UK region,[24, 25] we believe it does not necessarily apply to the assessment of COVID-19 patients in the community.

This new guidance, which was put out in response to a call for a “common language” in response to the COVID-19 crisis, updates a previous decision by the RCGP’s National Council in 2019, when they voted against recommending the use of NEWS2 in favour of “physiological measurement”, and called for more focused research on the use of NEWS2 scores in primary care.[32] At that time, RCGP Council recommended NEWS2 scoring in combination with clinical judgement for the assessment of illness severity and to support decisions to refer (or not) to hospital. In addition, they recommend its use as a communication tool between GPs, ambulance services and secondary care, highlighting its potential value in assessing deterioration over time.

Whilst the RCGP now cautiously recommends the use of NEWS2, it does not recommend it as a substitute for clinical judgement. In its latest statement, for example, the College points out:

“GPs have always had to make challenging and complex decisions in a short time; should a patient be reassured, reviewed or referred? By using NEWS2 alongside clinical judgment, subtle changes in physiology may become apparent.” [31]

THEORETICAL CONCERNS RELEVANT TO USE OF NEWS/NEWS2 IN THE PRIMARY CARE MANAGEMENT OF COVID-19 PATIENTS

Even in the secondary care environment for which they were designed, NEWS/NEWS2 and other early warning scores have been criticised on several grounds (see also the systematic review by Gerry et al described above [14]):

Content validity: the physiological measures that make up the score capture some but not all dimensions of acute sickness, so clinicians should always use the score alongside clinical judgement, not as a substitute for it. Numerous physiological conditions and the effects of medication will mask elements of the score (beta-blockers causing slowing of the heart rate are an obvious example) [33];

Insufficient emphasis on trends over time. Reporting a NEWS/NEWS2 score as a single observation at a single point in time is neither sensitive nor specific. In some patients, a high score can occur without serious acute illness; in others, the score is not high until very late in the illness. Key to the detection of deterioration is comparing the current NEWS/NEWS2 score against previous sets of vital signs for the same patient. Failing to do this will result in missed cases of deterioration and also unnecessary escalation of care for some patients [34];

Cultural and behavioural issues. The accuracy of an early warning score is dependent on the skill and commitment of the person taking the readings. In conditions of suboptimal staffing, accuracy is likely to fall. In hierarchical referral pathways, a lower-status member of staff may fail to escalate concerns (and a higher-status one may fail to acknowledge concerns) without the support of an “objective” score. This may create a culture of dependency and over-reliance on a less than accurate score [35].

The subsequent implementation of a scoring system in new settings may not be supported by the complementary components of the complex intervention which was the subject of the original research. For example, the system-wide implementation of NEWS in the West of England described by Pullyblank et al [24] involved an extensive two-year programme of stakeholder engagement, education, IT support and public involvement. Implementing NEWS scoring elsewhere without this package of support may not achieve the same results.

Risk factors and physiological responses specific to COVID-19. Information is emerging in the literature on COVID-19 suggesting particular markers may be of predictive value. For example: higher age, male sex, high body mass index and cardiovascular disease all independently predict poor prognosis. There have been reports of unreliable or late changes to respiratory rate even in the context of severe illness. These differences might significantly reduce the predictive power of the NEWS2 score. Use of the NEWS2 score might result in the non-application of these new prognostic insights.

These reservations about NEWS/NEWS2 raise important questions about its usefulness – and potential for harm – in the specific context of the COVID-19 crisis. We believe these questions have not been answered by the research published (or submitted) to date, and that they require urgent new research.

RESEARCH QUESTION 1: Is NEWS2 a valid measure of severity in COVID-19 (and does it predict patients who are likely to deteriorate)?

To our knowledge, this question has not yet been answered in either secondary or primary care. As noted above, the only published paper describing use of NEWS2 in COVID-19 is a hospital-based descriptive account, in which NEWS2 was modified to include age (score an additional three points if over 65), but this was not based on a formal validation.[16]

RESEARCH QUESTION 2: Is a single NEWS2 score sufficiently sensitive (i.e. will it pick up all patients with critical deterioration in a timely manner) and specific (will it exclude all or most patients who are not seriously ill and unlikely to deteriorate)?

Again, we have no data at all on the sensitivity or specificity of NEWS2, especially when used in the ‘risk sink’ of primary care. Urgent research is needed to explore (and, hopefully, exclude) potential kinds of harm from the use of NEWS2: a) missing patients who are very sick by over-relying on a low or moderate NEWS2 score; and b) increasing workload of secondary care by referring too many patients with high NEWS2 scores who are actually at low risk of deterioration.

RESEARCH QUESTION 3: Is NEWS2 likely to be practicable in the context of unprecedented pressures on the system and the remote-by-default policy being pursued in general practice (in which most patients are managed by telephone or video)?

An important element of any tool or scoring system is its practicality and acceptability. NEWS was originally developed in a hospital setting and based on data that are routinely available in most inpatient hospital settings. Such data are much harder to obtain in primary care. A small qualitative study (25 interviewees) conducted in the West of England CLAHRC (where a ‘system wide’ use of NEWS had been implemented as a quality improvement initiative) identified subtle but important challenges, including the time taken to complete it, when using this score in a primary care setting.[36] The authors commented: “in primary care, clinicians had to select patients for NEWS and adopt different methods of clinical assessment, whereas for paramedics it fitted well with usual clinical practice and was used for all patients. In community services and mental health, modifications were ‘needed’ to make the tool relevant to some patient populations.”

The logistics of obtaining data to populate a NEWS2 score have become much more complex and challenging in the current COVID-19 crisis. There is no evidence that the advantages of bringing patients to a face-to-face appointment purely to calculate NEWS2 outweigh the potential disadvantages (contagion from clinician to patient or vice versa). Anecdotal reports from the clinical front line suggest that biomarkers may be normal even when the patient is very sick and deteriorating. And we simply do not know whether a ‘light’ version of NEWS2 in the context of a remote consultation (e.g. omitting blood pressure and oxygen saturation) would be better than not using the score at all.

CONCLUSIONS

NEWS2 is an early warning score developed pragmatically for use in hospital inpatient settings and based on data routinely available to hospital staff;

Research suggests that NEWS2 is useful in stratifying patients and triggering intervention in hospital settings and that it may add value in pre-hospital settings when used by ambulance staff;

Very few studies on the use of NEWS2 by GPs and other primary care staff have been published. One recent study from a single UK region suggests (but falls short of proving) that it is useful in patients with suspicion of sepsis; another study from the same region suggests (but again falls short of proving) that patients with higher NEWS2 scores are transferred and processed more quickly than those with lower scores;

We found only one published paper (in a hospital setting) describing the use of NEWS2 in the management of COVID-19 patients; in this Chinese study, age >65 (3 points) had been added to the score;

We believe that, notwithstanding the need for a common language of sickness, reliance on NEWS2 in assessing, referring and communicating about patients with suspected COVID-19 in primary care is premature. The score should be used, if at all, in the context of a wider clinical assessment;

Research is urgently needed to affirm the value of NEWS2 in COVID-19 and ensure that it does not cause harm.

Disclaimer: the article has not been peer-reviewed; it should not replace individual clinical judgement and the sources cited should be checked. The views expressed in this commentary represent the views of the authors and not necessarily those of the host institution, the NHS, the NIHR, or the Department of Health and Social Care. The views are not a substitute for professional medical advice.