Studies of stratosphere–troposphere coupling, particularly those
seeking to understand the dynamical processes underlying the coupling
following extreme events such as major stratospheric warmings, suffer
significantly from the relatively small number of such events in the
“satellite” era (1979 to present). This limited sampling of a highly
variable dynamical system means that composite averages tend to have large
uncertainties. Including years during which radiosonde observations of the
stratosphere were of sufficiently high quality substantially extends this
record, reducing this sampling uncertainty by up to 20 %. Moreover, many open
questions in this field involve aspects of tropospheric dynamics likely to be
better constrained by “conventional” (i.e. radiosonde and surface-based)
observations.

Based on an intercomparison of reanalyses, a quantitative case is made that for
many purposes the improved sampling obtained by including this period
outweighs the reduced precision of the reanalyses in the Northern Hemisphere.
Studies of stratosphere–troposphere coupling should therefore consider the
use of this period when using reanalysis data. These results also support
continued attention to this period from centres producing reanalyses.

One of the central challenges to the detailed study of the large-scale coupling
between the stratosphere and the troposphere is the relatively limited record of
high-quality, global observations. In the absence of more insightful modes of
analysis, quantifying the dynamical processes relevant for the coupling requires
large samples to isolate them from unrelated dynamical variability. Despite the
availability of nearly four decades of global satellite-based observations, the
length of the observational record remains a fundamental limitation to this
statistical approach. This is demonstrated explicitly here, as well as by
another closely related contribution (Gerber and Martineau, 2018) to the Stratosphere-troposphere Processes And their Role in Climate
(SPARC) Reanalysis Intercomparison Project (S-RIP; Fujiwara et al., 2017).

The coupling between the stratosphere and the troposphere remains a significant
source of uncertainty in projected climate changes over the coming
century (Manzini et al., 2014; Simpson et al., 2018), as well as an important
source of skill in seasonal forecasting (Sigmond et al., 2013). Global models
exhibit a diversity of stratospheric circulation (Manzini et al., 2014) and
variability (Charlton-Perez et al., 2013; Taguchi, 2017), and of
tropospheric responses to stratospheric variability (Hitchcock and Simpson, 2014).
Observations of the true circulation can be used to identify which models are
correctly representing these processes, but this relies on comparing the
time-averaged behaviour of the models to the observations, and the large
interannual variability in the observed circulation means that the sampling
uncertainty remains large. Accounting for sampling error in such large-scale
dynamical phenomena is a major concern for many other dynamical questions,
including identifying regional signals of climate change and teleconnection
patterns (e.g. Deser et al., 2017).

Studies of observed stratosphere–troposphere coupling often rely on reanalysis
products, which combine a wide range of observations with global forecast models
(see Fujiwara et al., 2017, for a comprehensive discussion, as well as descriptions of all reanalysis products and centres). Two of the
older products, ERA-40 and NCEP-NCAR R1, begin in 1957 and 1948, respectively,
dates which coincide with significant extensions of the global radiosonde
observing network. Many more recent products (ERA-Interim, MERRA, MERRA-2, CFSR)
by contrast cover only the period from 1979 onwards, after the availability of
sounding data from the Microwave Sounding Unit (MSU) and Stratospheric Sounding Unit
(SSU) instruments. It is convenient to label the period after 1979 the
“satellite” era, though it is worth noting that a number of satellite data
products exist prior to 1979, as discussed by Uppala et al. (2005). Amongst
the more modern products only JRA-55 begins prior to the satellite era, in 1958.
However, both ERA-5 and JRA-3Q, two newer products unavailable at the time of
writing, are expected to cover the pre-satellite era as well.

For the purposes of the present work, the “radiosonde” era will refer to the
period of 1958 through 1978, although radiosonde data exist prior to this period
and continue to be important afterwards. There is no general consensus
amongst studies of stratosphere–troposphere coupling as to whether to include
the radiosonde era. This is complicated by the fact that the coverage of ERA-40
ends in 2002, leaving out the most recent (and best-observed) decade and a half.
Some studies have made use of the older reanalysis products ERA-40 and NCEP-NCAR
R1 alone (Charlton and Polvani, 2007; Mitchell et al., 2013), while others consider
exclusively the satellite record (Dunn-Sigouin and Shaw, 2014; Kodera et al., 2015; Birner and Albers, 2017). Still others choose to merge multiple reanalyses, using an
older product for the radiosonde era and a more modern product for the satellite
era (Hitchcock et al., 2013; Lehtonen and Karpechko, 2016). The value of JRA-55 as a
single modern product that spans both the radiosonde and satellite eras is thus
evident (and as such it will be privileged in the analysis that follows), but
the question remains whether the observational record during the radiosonde era
is of “sufficiently” high quality to be worth considering.

The first identification of a sudden stratospheric warming is credited to
Scherhag (1952) and much was known about their dynamics prior to the
availability of a long satellite-based observational record
(e.g. Matsuno, 1971; Labitzke, 1977; McIntyre, 1982), largely
on the basis of radiosonde observations. Moreover, a successful 5-day
forecast of the sudden warming that occurred in January 1958 initialized from
ERA-40 has been demonstrated (Simmons et al., 2005). All of this suggests that
the observational record prior to 1979 is of real value in constraining the
behaviour of the coupled stratosphere–troposphere system around
sudden stratospheric warmings.

The immediate goal of this work is to evaluate the representation of a number of
quantities of interest to the problem of stratosphere–troposphere coupling in
the radiosonde era, with a view to a more quantitative assessment of the value
of this period. For the Northern Hemisphere, the arguments given below clearly
indicate that this period is valuable. However, since this judgement depends on the specific quantity of
interest, a broader goal is to discuss how to answer this question more
generally. Indeed, the same arguments should apply to the study of many other
features of the large-scale atmospheric circulation, particularly of those
phenomena with large spatial scales and characteristic timescales of the order
of weeks to months. The same approach could also be applied in principle to the
period prior to 1958, although no effort has been made to do so here.

Figure 1. (a) Winds from JRA-55 for 36 sudden warmings. Events from the satellite
period are in dark grey; those from the radiosonde period are in light grey
and are dashed. (b) Winds for a single satellite-period event for all
reanalyses; this event is shown by the black line in panel (a). (c) Winds for a
single radiosonde-period event for all reanalyses covering this period; this
event is shown by the dashed black line in panel (a).

This evaluation is based on the availability of multiple reanalysis products.
Since in general the different reanalyses assimilate subsets of the same
observational record into distinct forecast models, the level of agreement
provides a simple measure of how strongly the observations constrain the
quantity in question. This method has caveats in that the underlying forecast
models may share biases that result in them getting consistently wrong answers.
More critically, the availability of only one modern reanalysis product that
covers the radiosonde era (and assimilates radiosonde data) means that this
comparison must be based in part on older reanalyses with known deficiencies
(e.g. Long et al., 2017). Nonetheless, as will be argued below, the
agreement is close enough in the Northern Hemisphere to suggest that this period
has real value for carrying out many classes of dynamical studies. This is
broadly consistent with the conclusions of Gerber and Martineau (2018) and of
Hersbach et al. (2017), which explicitly examined the value of upper-air
observations over the period 1939 to 1967 in an experimental reanalysis product.

The outline of this paper is as follows. The reanalysis data considered here
are described in Sect. 2. Section 3 presents, as an initial example, a discussion
of the time series of zonal mean zonal wind at 10 hPa and 60° N
that is central to the identification of major sudden stratospheric warmings.
Section 4 presents more general criteria for determining when the radiosonde era
should be included. These criteria are then discussed in Sect. 5, as they apply
to a wider variety of zonal mean quantities, including fluxes of heat and momentum
that are relevant to stratosphere–troposphere coupling. Section 6 presents
conclusions and a discussion.

Zonally averaged output from the 12 reanalysis products listed in
Table 1 is considered here. Of these reanalyses, five (JRA-55,
NCEP-NCAR, ERA-40, 20CR v2, and ERA-20C) include the period of 1958 through
1978. Two reanalysis products (20CR v2 and ERA-20C) extend further back but do
not assimilate upper-air observations; following the nomenclature of
Fujiwara et al. (2017), these will be referred to as “surface-input”
reanalyses, in contrast to “full-input” reanalyses. A third category is
“conventional-input” reanalysis, the sole present example being the JRA-55C product. This
is noteworthy in this context as it assimilates only “conventional”, that is to
say, non-satellite-based, observations. It therefore provides a means of
estimating the additional value of incorporating the satellite observations.
A useful comparative description of these reanalysis products including details
of the underlying forecast models, the observational datasets assimilated, and
the assimilation techniques used can be found in Fujiwara et al. (2017). The
data used here have been regridded to a uniform latitude–pressure grid and
are described in Martineau et al. (2018).

Table 1. Reanalysis products and dates considered in the present work. See Fujiwara et al. (2017) for a
much more thorough discussion of the observations assimilated into each
product. Abbreviations for certain products used within the text are
indicated within parentheses.

* Although MERRA-2 includes 1980, there are spin-up issues
in early 1980 which affect the Arctic vortex.

Anomalies are computed from climatologies based on the years 1981 through 2001.
These years are chosen since they are included in all of the reanalysis products
under present consideration. Leap years are handled by omitting 1 July so
that all years are treated as 365 days long. These climatologies (computed for
each reanalysis) are used regardless of the period under consideration.
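The anomaly convention described above can be sketched as follows. This is a minimal Python illustration, assuming that daily values for one reanalysis are held in a dict mapping `datetime.date` to a float; the names `day_index`, `daily_climatology` and `anomalies` are hypothetical and are not taken from any reanalysis toolkit.

```python
import calendar
import datetime as dt
from collections import defaultdict

BASE_YEARS = range(1981, 2002)  # climatology years 1981 through 2001 inclusive


def day_index(d):
    """0-based day in a 365-day calendar, or None for 1 July of a leap year.

    Omitting 1 July in leap years keeps every year 365 days long; in leap
    years, days after 1 July are shifted back by one so that they align
    with the same calendar dates in non-leap years.
    """
    idx = d.timetuple().tm_yday - 1
    if calendar.isleap(d.year):
        if (d.month, d.day) == (7, 1):
            return None
        if d > dt.date(d.year, 7, 1):
            idx -= 1
    return idx


def daily_climatology(series):
    """Mean value for each day index, using only the base years."""
    sums, counts = defaultdict(float), defaultdict(int)
    for d, value in series.items():
        i = day_index(d)
        if i is not None and d.year in BASE_YEARS:
            sums[i] += value
            counts[i] += 1
    return {i: sums[i] / counts[i] for i in sums}


def anomalies(series, clim):
    """Anomalies from the fixed base-period climatology, for any date."""
    return {d: value - clim[day_index(d)]
            for d, value in series.items() if day_index(d) is not None}
```

The same base-period climatology is subtracted from every year, including 1958 to 1978, so that anomalies from different periods are directly comparable.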

As an initial example, Fig. 1a shows time series of zonal mean zonal
wind at 60° N, 10 hPa from the JRA-55 reanalysis for a set of 36
sudden stratospheric warming events, identified following
Charlton and Polvani (2007). The central dates (lag 0) of the events are defined
by when the wind at this grid point reverses from westerly to easterly, so all
of the time series pass through zero at this point. However, the inter-event
variance of the winds is large both immediately prior to and shortly after the
central date. This spread arises only weakly from the timing of
the event within the cold season; a similar plot of anomalies from the
climatological mean shows very similar growth in the inter-event spread (not
shown). As a result of this large dynamical variability, the composite mean has
a large sampling variability independent of the quality of the observations or
the forecast models underlying the reanalysis products.

In contrast, Fig. 1b shows the same time series from all 12
reanalysis products for a single event that occurred on 21 February 1989. The
inter-reanalysis spread is in general much smaller than the inter-event
variability emphasized in Fig. 1a. The exceptions are the
surface-input reanalyses, ERA-20C and 20CR v2. JRA-55C, which does not
assimilate satellite observations, is notably indistinguishable from other
reanalysis products, suggesting that satellite observations are not required to
closely constrain these winds.

Although there are far fewer reanalysis products that include the radiosonde
period, Fig. 1c shows that the three reanalyses spanning this period
which assimilate radiosonde observations (JRA-55, NCEP-NCAR, and ERA-40) exhibit
a similarly close agreement, showing only a somewhat larger spread across
reanalyses than in the satellite period. This again suggests that the
radiosondes are providing a strong constraint on the flow, and that as a result
the events that occurred during the radiosonde era are of significant potential
value for constraining our knowledge of the composite mean evolution of sudden
warmings.

Since sudden stratospheric warmings are typically identified by the date on
which this wind reverses sign, these slight differences in reanalyzed winds can
lead to the identification of central dates which differ by a day or two, and in
some cases can lead to an event being identified in one reanalysis but not in
others. This sensitivity is a generic feature of thresholds in the event
definition, not of the particular choice of definition.
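To illustrate how a threshold turns small wind differences into different central dates, here is a simplified Python sketch of wind-reversal detection. It is not the full Charlton and Polvani (2007) algorithm: the 20-day separation between events and the requirement that westerlies recover for 10 consecutive days (to exclude final warmings) are assumed values, the input is assumed to be one winter of daily winds at 60° N, 10 hPa, and `central_dates` is a hypothetical name.

```python
def central_dates(u, min_sep=20, recover=10):
    """Indices of westerly-to-easterly reversals in a daily wind series u (m/s).

    Simplified, assumed rules: a reversal within `min_sep` days of the
    previous central date is ignored, and the winds must return to
    westerly for `recover` consecutive days afterwards.
    """
    found = []
    for t in range(1, len(u)):
        if not (u[t - 1] >= 0.0 and u[t] < 0.0):
            continue  # no reversal on day t
        if found and t - found[-1] < min_sep:
            continue  # too close to the previous event
        run = 0
        for value in u[t + 1:]:
            run = run + 1 if value > 0.0 else 0
            if run >= recover:
                found.append(t)
                break
    return found


# Two hypothetical reanalyses differing by 1.5 m/s on a single day.
base = [20, 12, 6, 2, -1, -4, -6, -3, 2] + [8] * 15
other = base.copy()
other[4] = 0.5
print(central_dates(base), central_dates(other))  # [4] [5]

# A shallow reversal seen in one product but not in another.
shallow = [15, 8, 3, 0.5, -0.3, 0.8, 5] + [10] * 12
shallow_b = shallow.copy()
shallow_b[4] = 0.3
print(central_dates(shallow), central_dates(shallow_b))  # [4] []
```

Any sharp threshold behaves this way, which is why the choice of definition does not remove the sensitivity.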

This leads to difficulties with comparing composites of events in different
reanalyses: because of the large inter-event variability, the exclusion of even
just one event from a given reanalysis composite mean can produce differences in
the composite mean that easily overwhelm the differences in the reanalyzed flow
itself. Thus, small differences in the identification of events can “alias” into
relatively large apparent differences in the overall composite evolution.

Similar considerations preclude the direct comparison of composite averages of
satellite-era and radiosonde-era events: they differ, but not evidently by
more than should be expected from this dynamical sampling uncertainty. To
isolate the intrinsic differences between reanalyses from this aliasing of
sampling variability, one must instead consider a fixed set of events across all
reanalyses. This is done here by selecting, for each event, the date on which it
occurred in the majority of the available reanalyses, following the S-RIP chapter 6 analysis of
stratosphere–troposphere coupling and Butler et al. (2017).
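One way to build such a fixed list is sketched below in Python. The grouping rule is an assumption made for illustration: detections from different reanalyses within `window` days of each other are treated as one event, the event is kept if a strict majority of products detect it, and it is assigned its most frequently detected date. The actual S-RIP procedure may differ in detail, and `common_event_dates` is a hypothetical name.

```python
import datetime as dt
from collections import Counter


def common_event_dates(detections, window=10):
    """detections: dict mapping reanalysis name -> list of central dates."""
    n_products = len(detections)
    # Pool all detections, sorted by date.
    pooled = sorted((d, name) for name, dates in detections.items() for d in dates)
    # Chain detections separated by at most `window` days into candidate events.
    clusters, current = [], []
    for d, name in pooled:
        if current and (d - current[-1][0]).days > window:
            clusters.append(current)
            current = []
        current.append((d, name))
    if current:
        clusters.append(current)
    common = []
    for cluster in clusters:
        if len({name for _, name in cluster}) > n_products / 2:
            counts = Counter(d for d, _ in cluster)
            top = max(counts.values())
            # Most frequently detected date; the earliest one if tied.
            common.append(min(d for d, c in counts.items() if c == top))
    return common


# Hypothetical detections from three products A, B and C.
detections = {
    "A": [dt.date(2000, 1, 10), dt.date(2000, 3, 15)],
    "B": [dt.date(2000, 1, 10)],
    "C": [dt.date(2000, 1, 11), dt.date(2000, 2, 20)],
}
print(common_event_dates(detections))  # [datetime.date(2000, 1, 10)]
```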

These points are illustrated in Fig. 2, which demonstrates that
composites of events across reanalyses agree better when a fixed set of dates is
taken than when event dates are chosen individually for each reanalysis. This is
true of the full-input reanalyses for both the satellite era and the radiosonde era.

Figure 2. Composites of zonal mean zonal wind at 10 hPa, 60° N during
sudden stratospheric warmings for events during the satellite era (a, b) and
the radiosonde era (c, d). Events in panels (a, c) are determined by applying the wind
reversal criteria of Charlton and Polvani (2007) to each reanalysis
individually, while those in panels (b, d) are taken to be common across all
reanalyses. Line colours are as in Fig. 1.

Figure 3. (a) Frequency of all events and of events classified as splits or
displacements for the satellite period versus for the radiosonde
period. (b) Same as panel (a) but for each month of extended winter. Error bars
indicate 95 % confidence intervals; see text for details.

In contrast, the surface-input reanalyses (ERA-20C and 20CR v2) generally agree better
with the composites when event dates are chosen per reanalysis, particularly
around the central date of the event. This suggests that while the surface
observations are sufficient to constrain the stratospheric flow to some extent,
the breakdown of the stratospheric vortex is also significantly determined by
the behaviour of the forecast model in these products.

Considering a list of fixed event dates provides a useful starting point for
quantifying the additional information contained in the radiosonde era. Using
the fixed set of event dates as a basis, Fig. 3a shows estimates of
the overall frequency of sudden stratospheric warmings for the satellite era
alone and for the full 1958–2016 era, as well as for split and displacement
events. The month-by-month frequency is shown in Fig. 3b. Confidence
intervals in all cases are estimated with a bootstrapping procedure: $N$ years
are selected from the period from 1958 to 2016 with replacement, and the events
that occurred in these $N$ years are then used to compute event frequencies,
with events counted multiple times for those years that are selected more than once. For the
satellite era, $N = N_s = 32$, while for the total period, $N = N_t = N_s + N_r = 53$. This whole process is repeated 10 000 times, and the bounds of the
confidence intervals are taken to be the 2.5th and 97.5th percentiles.
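The bootstrap described above can be sketched in a few lines of Python. The per-year event counts below are synthetic placeholders, not the observed record, and `bootstrap_ci` is a hypothetical helper; only the resampling logic (whole years drawn with replacement from the full record, 10 000 repetitions, 2.5th and 97.5th percentiles) follows the text. With $N_s = 32$ and $N_t = 53$, the ratio of interval widths should be comparable to $\sqrt{N_s/N_t} \approx 0.78$, up to the discreteness of the counts.

```python
import random
import statistics


def bootstrap_ci(events_per_year, n_years, n_boot=10_000, seed=0):
    """95 % interval for events per year from resampling whole years.

    Years are drawn with replacement from the full record, so a year drawn
    twice contributes its events twice.
    """
    rng = random.Random(seed)
    freqs = [sum(rng.choices(events_per_year, k=n_years)) / n_years
             for _ in range(n_boot)]
    cuts = statistics.quantiles(freqs, n=40)  # cut points every 2.5 %
    return cuts[0], cuts[-1]                  # 2.5th and 97.5th percentiles


# Synthetic record (NOT observed counts): 53 years, 0 or 1 event per year.
record = [1 if k % 5 in (0, 2, 3) else 0 for k in range(53)]
lo_s, hi_s = bootstrap_ci(record, n_years=32)  # satellite era, N = N_s
lo_t, hi_t = bootstrap_ci(record, n_years=53)  # full record, N = N_t
ratio = (hi_t - lo_t) / (hi_s - lo_s)
print(round(ratio, 2), round((32 / 53) ** 0.5, 2))
```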

As expected from the central limit theorem, the confidence intervals are reduced
by a factor very close to $\sqrt{N_s/N_t}$. This amounts to about a 20 %
reduction, providing a stronger observational constraint on the climatological
frequency of sudden stratospheric warmings. A similar reduction is obtained for
the occurrence frequency of splits and displacements, classified following
Lehtonen and Karpechko (2016), as well as for the seasonal distribution of events.

Since the bootstrapping is based on the entire record, the confidence intervals
for the satellite era are not centred on the mean frequencies. The use of the
longer baseline results in a slight shift of the seasonal peak, suggesting that
in the long term, January events are in fact more frequent than February events,
in contrast to the February peak obtained using the satellite period alone. This
difference in apparent seasonality has also been discussed by
Gómez-Escola et al. (2012). These changes could in principle be a result of
some longer-term trend or decadal variability external to the stratosphere, but
they are fully consistent with the null hypothesis of sampling variability from
an unchanged underlying seasonality. In this latter interpretation, the full
record therefore represents a modest but useful strengthening of the
observational constraints on these statistics.

Despite these promising examples, one should expect in general that the quality
of the reanalyses is not as high during the radiosonde era as during the
satellite era. In this light, one might regard the 20 % reduction in the
confidence intervals found in Fig. 3 as an upper bound. While errors in the
reanalyses will in general arise from both observational uncertainty as well as
from uncertainty arising from the underlying forecast model and assimilation
process, these will be considered together here as “reanalysis” uncertainty.

A simple way to quantify the potential improvement from including the radiosonde
era is to treat the reanalysis and sampling uncertainties as uncorrelated
Gaussian variances and consider the effect on the sample mean of drawing from
two periods with different variances. More explicitly, we consider some physical
observable $X$ (for instance, the zonal mean zonal wind at 10 hPa and
60° N) to be modelled by a normally distributed random variable with mean
$\mu$ and variance $\sigma^2$. Since we are interested in the statistics of the
sample mean, the central limit theorem in principle allows the assumption of
Gaussianity to be relaxed, but the role of non-Gaussian statistics will not be
explicitly considered.

Figure 4. The effective value $\delta$ of radiosonde-era degrees of freedom
relative to that of satellite-era degrees of freedom in reducing the overall
uncertainty, shown as a function of $\alpha_r$ and $\alpha_s$ for three
values of $\beta$: (a) 0.1 (radiosonde era much longer than satellite era),
(b) 0.6 (roughly appropriate for the observational records considered
here), and (c) 0.9 (radiosonde era much shorter than satellite era). Contour
interval is 0.25, with the 0 contour indicated in bold.

We further assume that the variance consists of two uncorrelated components,
$\sigma^2 = \sigma_d^2 + \sigma_o^2$: the first, $\sigma_d^2$, arising from the
dynamical variability of the atmosphere, and the second, $\sigma_o^2$, from the
reanalysis uncertainty. We then consider two sets of observations of this
variable: one of $N_s$ samples with smaller reanalysis error representing the
satellite era, with $\sigma_o = \sigma_s$, and one of $N_r$ samples with
relatively larger reanalysis error representing the radiosonde era, with
$\sigma_o = \sigma_r$. We take the dynamical variability to be constant across
both samples. The variance of a sum of independent random variables is the sum
of the variances of the individual variables, and dividing by $N$ scales the
variance by $1/N^2$; hence, the variance of the sample mean during
the satellite era is

$$\operatorname{Var}\left[\frac{1}{N_s}\sum_{i=1}^{N_s} X_i^s\right] = \frac{\sigma_d^2 + \sigma_s^2}{N_s}, \tag{1}$$

while that of the sample mean over the entire period is

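$$\operatorname{Var}\left[\frac{1}{N_s+N_r}\left(\sum_{i=1}^{N_s} X_i^s + \sum_{i=1}^{N_r} X_i^r\right)\right] = \frac{N_s\left(\sigma_d^2 + \sigma_s^2\right) + N_r\left(\sigma_d^2 + \sigma_r^2\right)}{\left(N_s + N_r\right)^2}. \tag{2}$$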

Here, the superscript on $X$ indicates the “era” from which the sample is drawn
(and thus its variance).
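Equation (2) can be checked numerically with a small Monte Carlo experiment. This Python sketch assumes zero-mean Gaussian dynamical and reanalysis errors, as in the toy model above; the helper names are hypothetical and the parameter values are arbitrary, chosen only for illustration.

```python
import random
import statistics


def simulated_mean_variance(sigma_d, sigma_s, sigma_r, n_s, n_r,
                            trials=5000, seed=2):
    """Empirical variance of the full-record sample mean under the toy model."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Each sample = dynamical part + independent reanalysis error.
        sat = [rng.gauss(0.0, sigma_d) + rng.gauss(0.0, sigma_s) for _ in range(n_s)]
        rad = [rng.gauss(0.0, sigma_d) + rng.gauss(0.0, sigma_r) for _ in range(n_r)]
        means.append((sum(sat) + sum(rad)) / (n_s + n_r))
    return statistics.variance(means)


def eq2_variance(sigma_d, sigma_s, sigma_r, n_s, n_r):
    """Right-hand side of Eq. (2)."""
    return (n_s * (sigma_d**2 + sigma_s**2)
            + n_r * (sigma_d**2 + sigma_r**2)) / (n_s + n_r) ** 2


args = (1.0, 0.2, 0.6, 32, 21)
print(simulated_mean_variance(*args), eq2_variance(*args))
```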

A first criterion for including both periods is that the standard deviation
of the sample mean should be reduced relative to that obtained from the
satellite era alone. As argued in the previous section, if the reanalysis
errors of the two periods are equal ($\sigma_r = \sigma_s$), the standard deviation of
the mean when the whole record is considered will be reduced by a factor
$\sqrt{N_s/(N_s+N_r)}$. If the reanalysis errors of the two periods differ,
some straightforward manipulations of the formulas above can be used to show
that the factor can be written
$\sqrt{N_s/(N_s+\delta N_r)}$, with

$$\delta = \frac{1 - \beta f}{1 + (1-\beta) f}, \qquad f = \frac{\alpha_r^2 - \alpha_s^2}{1 + \alpha_s^2}. \tag{3}$$

Here, $\alpha_{s,r} = \sigma_{s,r}/\sigma_d$ is the ratio of the reanalysis
standard deviation in each respective period to the dynamical standard
deviation, and $\beta = N_s/N_t$ is the length of the satellite era as a
fraction of the total length of the record. For the observational period
considered here, $\beta \approx 0.6$.
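Because the algebra behind Eq. (3) is compressed, a short numerical check may help. The Python sketch below evaluates $\delta$ from Eq. (3) and confirms that $\sqrt{N_s/(N_s+\delta N_r)}$ matches the ratio of the standard deviations implied by Eqs. (1) and (2). The function names are hypothetical and the parameter values are arbitrary.

```python
import math


def delta(alpha_s, alpha_r, beta):
    """Efficiency factor of Eq. (3)."""
    f = (alpha_r**2 - alpha_s**2) / (1.0 + alpha_s**2)
    return (1.0 - beta * f) / (1.0 + (1.0 - beta) * f)


def reduction_direct(sigma_d, sigma_s, sigma_r, n_s, n_r):
    """Square root of Eq. (2) divided by Eq. (1)."""
    var_sat = (sigma_d**2 + sigma_s**2) / n_s
    var_all = (n_s * (sigma_d**2 + sigma_s**2)
               + n_r * (sigma_d**2 + sigma_r**2)) / (n_s + n_r) ** 2
    return math.sqrt(var_all / var_sat)


def reduction_via_delta(sigma_d, sigma_s, sigma_r, n_s, n_r):
    """sqrt(N_s / (N_s + delta * N_r)) with delta from Eq. (3)."""
    d = delta(sigma_s / sigma_d, sigma_r / sigma_d, n_s / (n_s + n_r))
    return math.sqrt(n_s / (n_s + d * n_r))


for args in [(1.0, 0.1, 0.3, 32, 21), (2.0, 0.5, 2.5, 37, 21)]:
    print(reduction_direct(*args), reduction_via_delta(*args))
```

When $\sigma_r = \sigma_s$, $f = 0$ and $\delta = 1$, recovering the factor $\sqrt{N_s/(N_s+N_r)}$.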

The factor $\delta$ can be loosely interpreted as an efficiency factor for the
sampling during the radiosonde period. Since it depends on the number of
observations in both periods, its value will in general change (through $\beta$)
with the size of the sample; however, in the limit that the reanalysis error in
both eras is small compared to the dynamical error, $\delta \approx 1 - f \approx 1 + \alpha_s^2 - \alpha_r^2$, in which case its value is independent of the sample
size. This result, central to the argument of this work, indicates that even if
the reanalysis uncertainty in the radiosonde era is much larger than the
reanalysis uncertainty in the satellite era, $\delta$ will be close to 1 so long
as the dynamical uncertainty dominates both.

Figure 4 shows values of $\delta$ as a function of $\alpha_r$ and
$\alpha_s$ for three values of $\beta$. One can note several properties of this
factor. Firstly, $\delta$ can be negative for sufficiently large values of
$\alpha_r$, although this threshold depends on the value of $\beta$: from Eq. (3),
$\delta < 0$ when $\beta f > 1$, which for small $\alpha_s$ means
$\alpha_r > 1/\sqrt{\beta}$. For the present observational record (Fig. 4b), when $\alpha_s$ is
small, this occurs only when $\alpha_r$ is somewhat larger than 1 (about 1.3 for
$\beta \approx 0.6$), that is, when the
reanalysis uncertainty is somewhat larger than the dynamical uncertainty.
This threshold occurs at smaller values of $\alpha_r$ as $\beta$ increases, so
that, for marginal cases, the value of the radiosonde era in reducing overall
uncertainty will decrease with time as a longer record of higher-quality
observations becomes available.
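Setting the numerator of Eq. (3) to zero gives the radiosonde-era error ratio beyond which adding the older period makes the composite mean less certain: $\beta f = 1$, that is, $\alpha_r^2 = \alpha_s^2 + (1 + \alpha_s^2)/\beta$. The hypothetical helper below evaluates this threshold for the three values of $\beta$ used in Fig. 4.

```python
import math


def delta(alpha_s, alpha_r, beta):
    """Efficiency factor of Eq. (3)."""
    f = (alpha_r**2 - alpha_s**2) / (1.0 + alpha_s**2)
    return (1.0 - beta * f) / (1.0 + (1.0 - beta) * f)


def alpha_r_at_zero_delta(alpha_s, beta):
    """alpha_r at which delta changes sign, i.e. where beta * f = 1."""
    return math.sqrt(alpha_s**2 + (1.0 + alpha_s**2) / beta)


for beta in (0.1, 0.6, 0.9):  # the three panels of Fig. 4
    a = alpha_r_at_zero_delta(0.0, beta)
    print(beta, round(a, 2), delta(0.0, a, beta))
```

For $\alpha_s = 0$ the threshold reduces to $\alpha_r = 1/\sqrt{\beta}$.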

Figure 5. Standard deviation of de-seasonalized (a) winds in DJF and (b) temperatures in JJA
from the JRA-55 reanalysis over the satellite period. (c, d) Standard deviation
of the differences in the same quantities (respectively) across six reanalysis
products for the satellite period. (e, f) As in panels (c, d) but across three
reanalysis products for the radiosonde period. See text for details.

Secondly, $\delta$ remains close to 1 if $\alpha_r \approx \alpha_s$. Because
this statistical model assumes that both periods are drawn from populations with
the same underlying mean, it assigns equal value to both periods, regardless of
how large the reanalysis uncertainty is relative to the dynamical uncertainty.
In practice, the dynamical variability $\sigma_d$ is estimated here from the
interannual variability of the field in question. The reanalysis uncertainty
$\sigma_o$ is estimated from the statistics of differences between
different reanalysis products: more precisely, as the time mean of the standard
deviation across reanalyses. If the observations are not constraining the flow
in a significant way, the reanalysis product will reflect the dynamics of the
underlying forecast model, and the flow across the various reanalyses will become
uncorrelated. In this case, assuming that the forecast models produce reasonably
accurate dynamical variability, the estimate of $\sigma_o$ should approach
$\sqrt{2}\,\sigma_d$, that is, $\alpha \approx \sqrt{2}$. To see this, consider
the time series of an observable from a given reanalysis $X_i$ as the sum of the true
atmospheric evolution $X_a$ and a correction $x_i$. If the standard deviation of
the forecast model is correct, $X_i$ has the same standard deviation as $X_a$.
When these two components become decorrelated, the correction $x_i = X_i - X_a$ will be the
difference between two uncorrelated time series, each with standard deviation
$\sigma_d$, and so has variance $2\sigma_d^2$. Since $X_a$ is independent of the reanalysis, the standard deviation
across reanalyses will therefore be $\sqrt{2}\,\sigma_d$.
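The $\sqrt{2}$ limit can be illustrated with a short Monte Carlo experiment in Python. The "truth" and the unconstrained model series are assumed here to be independent Gaussian white noise with the same standard deviation, which is a deliberate simplification; `decorrelated_error_std` is a hypothetical name.

```python
import math
import random


def decorrelated_error_std(sigma_d=1.0, n=100_000, seed=1):
    """Std of x_i = X_i - X_a when X_i and X_a are independent with std sigma_d."""
    rng = random.Random(seed)
    truth = [rng.gauss(0.0, sigma_d) for _ in range(n)]     # X_a
    free_run = [rng.gauss(0.0, sigma_d) for _ in range(n)]  # X_i, unconstrained by observations
    errors = [xi - xa for xi, xa in zip(free_run, truth)]   # x_i
    mean = sum(errors) / n
    return math.sqrt(sum((e - mean) ** 2 for e in errors) / n)


print(decorrelated_error_std(), math.sqrt(2.0))
```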

This suggests a second criterion: if $\alpha_r$ (or $\alpha_s$) approaches
$\sqrt{2}$, the observations are not providing any significant constraint on the
fluctuations. In this case, we should not regard the reanalysis as providing any
kind of estimate of the true behaviour of the climate system, and this part of
the time series should not be included. To avoid influence of the forecast
model, one might reasonably require $\alpha$ to be significantly less than
$\sqrt{2}$.

An important assumption that has been made is that the reanalysis uncertainty is
dominated by a stochastic component that is uncorrelated in time. One can easily
suppose the presence of systematic errors that remain relatively fixed in time,
differing only when the assimilated observations change in a substantial way.
Such a systematic error will not be reduced by a larger sample size; if such an
error $\epsilon$ is present during the radiosonde era, its contribution to the
overall uncertainty will be $\epsilon(1-\beta)$. However, in the case that the
dynamical sampling error dominates the random component of the uncertainty, this
systematic error can still be neglected if $\epsilon \ll \sigma_d/\sqrt{N_t}$.
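The two terms in this comparison are easy to evaluate. The Python sketch below, with a hypothetical helper name and arbitrary illustrative numbers, computes the bias $\epsilon(1-\beta)$ left in the full-record mean by a constant radiosonde-era offset, confirms it by averaging a record directly, and returns the dynamical sampling error of the full-record mean, $\sigma_d/\sqrt{N_t}$, against which it is compared.

```python
import math


def systematic_vs_sampling(eps, sigma_d, n_s, n_r):
    """Bias in the full-record mean from a fixed radiosonde-era offset eps,
    and the dynamical sampling error of that mean."""
    n_t = n_s + n_r
    beta = n_s / n_t
    bias = eps * (1.0 - beta)            # eps weighted by the radiosonde fraction
    sampling = sigma_d / math.sqrt(n_t)  # standard error from dynamical variability
    return bias, sampling


# Direct check: a constant offset eps in the N_r radiosonde years shifts the
# mean by eps * N_r / N_t, however many years are averaged.
eps, n_s, n_r = 0.5, 32, 21
values = [0.0] * n_s + [eps] * n_r
print(sum(values) / (n_s + n_r), systematic_vs_sampling(eps, 3.0, n_s, n_r))
```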

Since the dynamical standard deviation is in general a function of the flow, and
the reanalysis standard deviation is a function of the observational network,
the relative information content present in the radiosonde period will vary both
spatially and temporally, and will depend on what quantity is under
consideration. A complete survey is therefore impossible, but in the next
section a brief overview of some commonly used quantities of importance to
stratosphere–troposphere interaction is given.

Figure 5 shows estimates of the de-seasonalized standard deviation,
$\sigma_d$, and reanalysis standard deviations $\sigma_s$ and $\sigma_r$ for
zonal wind in boreal winter and temperature in boreal summer. The standard
deviation of the anomaly from the climatology in JRA-55 is used as an estimate
of $\sigma_d$. The variability of DJF zonal winds is large in the Arctic
stratospheric polar vortex, and to a lesser extent in the region of the
quasi-biennial oscillation (QBO) and on the flanks of the tropospheric jets.
JJA temperatures also show enhanced variability in the winter
stratosphere as well as in the deep tropical stratosphere, but the structures are
less pronounced. In the troposphere, the largest variability is at the poles.

The reanalysis uncertainty is estimated during the satellite period
(Fig. 5c, d) as the standard deviation across six reanalysis products (JRA-55,
NCEP-NCAR R1, ERA-40, ERA-Interim, MERRA-2, and CFSR; this choice is further
justified below) after first removing their respective climatological means. The
standard deviation is of the order of 0.1 m s−1 through much of the extratropics with
a slight increase with height, particularly in the winter upper stratosphere.
There is considerably larger inter-reanalysis spread in the deep tropical
stratosphere, where the lack of strong balance constraints reduces the utility of
the thermodynamic measurements available from satellites
(Kawatani et al., 2016). Nonetheless, the reanalysis uncertainty remains
significantly less than the dynamical uncertainty throughout the QBO region,
partly due to enhanced dynamical variability and partly due to the observational
constraints from radiosondes. In contrast, the inter-reanalysis spread in
temperatures is small (0.1 to 0.2 K) throughout most of the summer hemisphere
below 10 hPa but is larger in the upper stratosphere and the winter polar
stratosphere. A weak maximum is also seen near the tropical and Southern
Hemisphere tropopauses.
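A minimal Python sketch of these two estimates at a single grid point is given below. The use of the sample ($n-1$) standard deviation across products, and of plain lists rather than gridded arrays, are assumptions made for illustration; the function names are hypothetical and the numbers are synthetic.

```python
import statistics


def dynamical_std(anoms):
    """sigma_d: standard deviation of one reanalysis's anomaly time series."""
    return statistics.pstdev(anoms)


def reanalysis_spread(anoms_by_product):
    """sigma_o: time mean of the across-product standard deviation.

    anoms_by_product holds equal-length anomaly series, one per reanalysis,
    each already relative to that product's own climatology.
    """
    per_time = [statistics.stdev(values) for values in zip(*anoms_by_product)]
    return statistics.fmean(per_time)


reference = [1.0, -1.0, 1.0, -1.0]
products = [reference, [1.2, -0.8, 1.1, -1.1], [0.9, -1.2, 0.8, -0.9]]
alpha = reanalysis_spread(products) / dynamical_std(reference)
print(round(alpha, 3))
```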

Figure 6. Standard deviations of pairwise differences between winds in
different reanalysis products at (a) 30 hPa, 60° N (DJF), (b) 100 hPa,
60° S (JJA), (c) 500 hPa, 40–50° N (DJF), and (d) 500 hPa, 40–50° S (JJA). All quantities are in m s−1. The diagonal
elements show the de-seasonalized standard deviation of the corresponding
quantity, elements below the diagonal show differences for the satellite era,
and elements above the diagonal show differences for the radiosonde era.
Elements are shaded by the ratio of the difference to the mean of the
dynamical standard deviations from the corresponding two diagonal
elements: light blue (less than 10 %), dark blue (10 % to 30 %), light red (30 % to
100 %), and dark red (greater than 100 %).

The reanalysis uncertainty during the radiosonde period (Fig. 5e, f)
is estimated similarly but using the three full-input reanalyses that cover
this period (JRA-55, NCEP-NCAR R1, and ERA-40). Above 10 hPa, where data from
NCEP-NCAR R1 are not available, the estimate is based on only two products. This
results in some weak discontinuities apparent near 10 hPa. The structure of the
inter-reanalysis spread is to first order similar to that during the satellite
period but is larger in magnitude. Interhemispheric differences are more
apparent, with both wind and temperature spreads generally noticeably larger in the
Southern Hemisphere (the winds in the upper stratosphere are an exception). This is
generally consistent with the sparser set of observational constraints in the
Southern Hemisphere. Nonetheless, in many regions, the spread remains substantially smaller than
the dynamical variability. Some features with small vertical length scales are
present in the JJA temperature variance; this is likely associated with known
artificial vertical temperature oscillations present in ERA-40
(e.g. Randel et al., 2004).

The “reanalysis” uncertainty is, as discussed above, not associated solely
with the properties of the observational data available but also with those of the
assimilation and forecast model used by the respective reanalysis products, and
could therefore depend strongly upon which products are included in the
calculation. For this reason, it is not immediately obvious that the
inter-reanalysis spread used here is a reasonable estimate of the reanalysis
uncertainty; for instance, certain reanalyses may be outliers for a given
quantity and may thus inflate the overall spread.

Figure 6 thus shows pairwise inter-reanalysis differences, computed
as a standard deviation over time of the difference between the anomalies from
two different reanalyses. For example, if $u_i'$ is the anomalous zonal mean
zonal wind of reanalysis $i$, the difference $\sigma_{ij}$ between two
reanalyses $i$ and $j$ is

$$\sigma_{ij} = \left[\frac{1}{T}\int \left(u_i'(t) - u_j'(t)\right)^2 \, dt\right]^{1/2}. \tag{4}$$

Entries below the diagonal are computed for the satellite period; those above
the diagonal are for the radiosonde period. Entries on the diagonal show the
dynamical variability computed from the corresponding reanalysis:

$$\sigma_{ii} = \left[ \frac{1}{T} \int u_i'(t)^2 \, \mathrm{d}t \right]^{1/2}. \tag{5}$$

The ratio of the inter-reanalysis spread to the dynamical variability (an
estimate of $\alpha_r$ and $\alpha_s$) is indicated by the colour of the
off-diagonal cells. Cells are coloured red for ratios greater than 0.3, a
threshold well below the strict condition $\alpha < 2$.
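A minimal sketch of Eqs. (4) and (5) in Python with NumPy, assuming the anomaly series from all reanalyses have already been placed on a common, uniformly spaced time axis, so that the time integral divided by $T$ reduces to a sample mean. The function names `pairwise_sigma` and `spread_ratio` are hypothetical, and normalizing each off-diagonal entry by the diagonal entry of reanalysis $i$ is an assumption made here for illustration.

```python
import numpy as np


def pairwise_sigma(anoms):
    """Discrete analogue of Eqs. (4) and (5).

    anoms: array of shape (R, T) of de-seasonalized anomalies u_i'(t) for
    R reanalyses on a common, uniformly spaced set of T time samples, so the
    time integral divided by T becomes a mean over samples.
    Returns an (R, R) array: off-diagonal entries are RMS differences
    sigma_ij, diagonal entries are RMS anomalies sigma_ii.
    """
    u = np.asarray(anoms, dtype=float)
    n = u.shape[0]
    sigma = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # Eq. (5) on the diagonal, Eq. (4) off the diagonal.
            diff = u[i] if i == j else u[i] - u[j]
            sigma[i, j] = np.sqrt(np.mean(diff ** 2))
    return sigma


def spread_ratio(sigma, i, j, ref=None):
    """Pairwise difference normalized by the dynamical variability.

    By default the diagonal entry of reanalysis i is the reference
    (an assumption of this sketch; averaging the two diagonal entries
    would be another reasonable choice).
    """
    ref = i if ref is None else ref
    return sigma[i, j] / sigma[ref, ref]


# Synthetic example: two "reanalyses" sharing a common signal plus small noise.
rng = np.random.default_rng(0)
truth = rng.standard_normal(5000)
anoms = np.vstack([truth + 0.1 * rng.standard_normal(5000),
                   truth + 0.1 * rng.standard_normal(5000)])
sigma = pairwise_sigma(anoms)
ratio = spread_ratio(sigma, 0, 1)  # close to sqrt(0.02) / sqrt(1.01), about 0.14
```

Computing the matrix once from satellite-era samples and once from radiosonde-era samples would supply the lower and upper triangles of a table laid out like Fig. 6.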

Differences are shown for four regions in the winters of the respective
hemispheres: Fig. 6a, b in the Northern and Southern Hemisphere stratosphere
(30 hPa), respectively, and Fig. 6c, d in the Northern and Southern Hemisphere
troposphere (500 hPa). The 30 hPa level is used as representative of the
stratosphere to reduce the effects of the model lid in NCEP-NCAR R1 and NCEP-DOE
R2; the conclusions are otherwise essentially unchanged at 10 hPa. The
estimates of the dynamical variability (along the diagonal) agree closely across
all reanalyses, with the exception of 20CR v2, which is significantly less
variable in the stratosphere.

Figure 7. Ratios (a, b) $\alpha_s$ and (c, d) $\alpha_r$, and (e, f) the effective
value δ of radiosonde-era degrees of freedom as defined in Sect. 3 for
(a, c, e) zonal winds in DJF and (b, d, f) temperatures in JJA. Note the
different scale for panel (d).

In the Northern Hemisphere, the differences between full-input and
conventional-input reanalyses (those other than 20CR v2 and ERA-20C) are in all
cases below 30 % of the dynamical variability. Looking more closely, reanalysis
products that share the same or related forecast models tend to be in closer
agreement than those from different centres, and there is in general better
agreement between the more modern products (JRA-55, ERA-Interim, MERRA-2, CFSR)
than between older products. This confirms that the forecast model and
assimilation procedure are contributing factors to the “reanalysis” error. In
the Northern Hemisphere, the agreement between the conventional-input reanalysis
JRA-55C (which does not assimilate satellite observations) and other products is
nearly as good as that of JRA-55, even in the stratosphere. In the Northern
Hemisphere troposphere, the two surface-input reanalyses agree with other
products to within 30 % of the dynamical variability, but this
agreement degrades substantially in the stratosphere. Nonetheless, at least for
ERA-20C, the agreement is to within the dynamical variability, suggesting that
surface observations do offer some constraint on the evolution of the
stratosphere.

In the Southern Hemisphere, agreement is poorer everywhere than in the
corresponding cases in the Northern Hemisphere. The full-input reanalyses
agree to within 30 % in the troposphere and, with a few exceptions, in the
stratosphere as well. In the Southern Hemisphere, the conventional-input
reanalysis JRA-55C is more noticeably degraded relative to the agreement
between other full-input reanalyses, although the differences are still
substantially less than the dynamical variability. The surface-input products
also show larger differences in the troposphere.

As expected, differences in the radiosonde era are in general larger than the
corresponding differences in the satellite era; the one exception is in the
Northern Hemisphere stratosphere with 20CR v2, whose agreement with JRA-55,
ERA-40, and NCEP-NCAR R1 appears slightly improved in the absence of
satellite observations. Nonetheless, agreement between these latter full-input
products in the Northern Hemisphere remains very close, with only a slight
degradation within the troposphere, and with ERA-40 and JRA-55 agreeing
in the Northern Hemisphere stratosphere to within 10 % of the dynamical
variability. In contrast, differences in the Southern Hemisphere troposphere
approach dynamical variability and exceed it in the stratosphere.

Given the smaller sample size of products which represent the
radiosonde period, general conclusions cannot be as strong as those from the satellite
period; nonetheless, the choice of reanalyses used in Fig. 5 is justified in
that no significant outliers are apparent. Lower values of the reanalysis
uncertainty would likely be obtained if only more modern reanalyses were
included, but this would make comparisons to the radiosonde era impossible.
Given the general improvement in agreement across modern reanalyses
seen in the satellite era, it is plausible that further improvements within the
radiosonde era are also possible.

With the estimates of $\sigma_d$, $\sigma_r$, and $\sigma_s$ justified to some
extent, these can be used to estimate the ratios $\alpha_r$ and $\alpha_s$ and,
from these, δ, the effective value of the radiosonde era, according to the
criteria discussed in the previous section. Following Fig. 5, these quantities
are shown for boreal winter zonal winds and austral winter temperatures in Fig. 7.

The ratio $\alpha_s$ is in general smaller for the zonal winds than for
temperatures. Consistent with Fig. 5, values are generally smallest in the
Northern Hemisphere extratropics, below 0.1 for the winds and below 0.2 for
temperatures. For the winds, the ratio is generally below 0.4, with somewhat
larger values near the surface in the deep tropics as well as above
10 hPa in the tropics and at high southern latitudes. For temperatures, values are
below 0.4 or so in the extratropics up to about 50 hPa, but notably approach 1
near the tropopause in the tropics where dynamical variability is small, as well
as in the Southern Hemisphere, and through much of the stratosphere.

The ratio $\alpha_r$ shares many of the structural features present in
$\alpha_s$ but with generally larger values. Most importantly for the present
discussion, the Northern Hemisphere extratropical winds still show values
generally below 0.2. For zonal winds, the ratio exceeds 0.5 but remains below 1
through most of the Southern Hemisphere, indicating that the observations are
less effective at constraining the winds in this hemisphere but that there is
still some information common across reanalyses. As
with $\alpha_s$, $\alpha_r$ is larger for temperatures than for zonal winds,
particularly near the tropical and Southern Hemisphere tropopause where values
are well above 1. Values in the Northern Hemisphere extratropics through the
lower stratosphere remain small, but the summertime mid-stratospheric
temperatures (where dynamical variability is relatively weak) are not well
constrained. Much of the wintertime Southern Hemisphere also shows values near
1.

Figure 8. Ratio of the power spectrum of the differences in zonal winds between
JRA-55 and other reanalyses (as indicated in the legend) to the power
spectrum of winds in JRA-55 itself. Winds are de-seasonalized and from (a, b)
30 hPa, 60° N and (c, d) 500 hPa, 40° N in the satellite era (a, c) and
radiosonde era (b, d). Note that the legend is divided across the
panels but applies equally to each. Frequencies corresponding to periods of
1 year, 1 month (30 days), 1 week, and 1 day are indicated on the
horizontal axis. The black horizontal line is at 2, indicative of the lack of
observational constraints (see text).

Using these values of $\alpha_r$ and $\alpha_s$, Fig. 7e, f show the
calculated value of δ. The values for the zonal wind remain quite close
to 1 through the Northern Hemisphere and tropics in boreal winter. In the
Southern Hemisphere, below 10 hPa, the values are reduced but, perhaps
surprisingly, remain above 0.5. This reflects to some extent the fact that the
underlying reanalysis uncertainty $\sigma_s$ is larger in the Southern Hemisphere
than in the Northern Hemisphere, even during the satellite era. These values
suggest that DJF winds are constrained well enough by observations in the
radiosonde era that they may be of some value in reducing uncertainty. This
is, however, not the case for JJA temperatures in the Southern Hemisphere
(Fig. 7f), nor in fact for JJA winds or DJF temperatures (though these
latter cases are not shown explicitly), for which values of δ are in many
cases below 0; this is notably the case for temperatures near the tropical
tropopause as well.

In summary, these criteria show clear value in including the radiosonde era in
dynamical analyses of Northern Hemisphere quantities from the troposphere up to
the mid-stratosphere. There is some indication that useful information may also
be gained for Southern Hemisphere summer winds. For most other Southern
Hemisphere quantities, however, this is not the case. Temperatures near the
tropical tropopause also show significantly
worse agreement during the radiosonde period.

As they are based on the overall variance, these estimates are most sensitive to
the dominant dynamical structures of interannual variability in the flow, which
typically have longer timescales and larger length scales. These bulk
estimates therefore do not necessarily imply that the observational constraints
on dynamical processes at shorter timescales are equally strong. To begin to
assess this point, Fig. 8 compares the power spectra of de-seasonalized
winds from JRA-55 in the stratosphere and troposphere with the power spectra of
pairwise differences between JRA-55 and other reanalyses. These provide
frequency-dependent estimates of $\sigma_d^2$ and $\sigma_o^2$, respectively, and
thus the ratio of these two spectra in the corresponding eras provides a
frequency-dependent estimate of $\alpha_s^2$ and $\alpha_r^2$. Such spectra are
shown for Northern Hemisphere winds in the stratosphere (Fig. 8a, b)
and in the troposphere (Fig. 8c, d).

During the satellite era, the difference spectra for most reanalyses are, at low
frequencies, 2–3 orders of magnitude smaller than the reference spectrum,
consistent with the 5 %–10 % estimate of the raw differences, since these plots
show the variance instead of the standard deviation. These values can be
compared to the horizontal line at a value of 2, which is expected if
observations provide no constraint on the flow. At higher frequencies, the
differences reach the same order as the dynamical variability at timescales of a
few days in the stratosphere; in the troposphere, differences amongst the more
modern reanalyses remain below dynamical variability up to the highest frequency
considered (corresponding to a period of 6 h). Within the stratosphere,
differences from NCEP-NCAR R1 and NCEP-DOE R2 are significantly larger than
those from other reanalyses at all frequencies, and the differences from ERA-20C
and 20CR v2 are of the order of the reference spectrum. Within the troposphere,
the surface-input reanalyses still agree noticeably less well with JRA-55, with
difference spectra that approach the reference spectrum at periods shorter than
about half a week.
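The reference value of 2 follows from a short variance argument. Assuming, for illustration, two reanalyses whose anomalies are uncorrelated (no common observational constraint) but share the same variance $\sigma_d^2$:

$$
\begin{aligned}
\operatorname{Var}\left(u_i' - u_j'\right)
  &= \operatorname{Var}\left(u_i'\right) + \operatorname{Var}\left(u_j'\right)
     - 2\operatorname{Cov}\left(u_i', u_j'\right) \\
  &= \sigma_d^2 + \sigma_d^2 - 0 = 2\sigma_d^2 .
\end{aligned}
$$

The same identity holds frequency by frequency for power spectra, $S_{i-j}(f) = S_i(f) + S_j(f) - 2\,\mathrm{Re}\,S_{ij}(f)$, where $S_{ij}$ is the cross-spectrum; zero coherence and equal spectra give $S_{i-j}/S_i = 2$. A ratio well below 2 therefore indicates variability shared by the two reanalyses at that frequency, which the analysis above interprets as a common observational constraint.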

During the radiosonde era (Fig. 8b, d), the differences are, as
expected, larger than during the satellite era, although similar features can be
noted: better agreement between JRA-55 and ERA-40, and significantly worse
agreement with the surface-input reanalyses. This suggests that processes with
timescales even as short as a few days are still significantly constrained in
the Northern Hemisphere extratropics, although this constraint is not as strong
(relative to dynamical variability) as is the case for processes on timescales
of a month or longer.

A similar spectral analysis could be applied spatially to determine which
spatial scales are reliable. However, this has not been directly considered
and would be better applied to fully three-dimensional data as opposed to the
zonal means considered here.

Up to this point, the analysis has considered both the radiosonde and satellite
eras to be to some extent uniform in time in their properties, yet the
observational record evolved during these periods as well. To consider briefly
the evolution of the observational constraint over time, the ratio α
can be estimated for each month individually. In this case, we
consider pairwise differences between JRA-55 and other reanalyses as an estimate
of $\sigma_o$, and the standard deviation of JRA-55 itself as an estimate of
$\sigma_d$. In all cases, the time series are first de-seasonalized.
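A minimal sketch of the month-by-month estimate, assuming the de-seasonalized series are available at sub-monthly resolution together with an integer label giving the calendar month of each sample. Taking $\sigma_o$ and $\sigma_d$ as within-month RMS values (mirroring Eqs. (4) and (5)), and smoothing with a plain 12-month running mean, are assumptions of this illustration; `monthly_alpha` and `running_mean` are hypothetical names. The synthetic example pairs a well-constrained year with an unconstrained one.

```python
import numpy as np


def monthly_alpha(ref, other, month_index):
    """Month-by-month estimate of alpha = sigma_o / sigma_d.

    ref: de-seasonalized series from the reference reanalysis (JRA-55 here).
    other: de-seasonalized series from a second reanalysis, same times.
    month_index: integer label of the calendar month of each sample
        (e.g. 0 for January 1958, 1 for February 1958, ...), a hypothetical
        bookkeeping convention for this sketch.
    sigma_o and sigma_d are taken as RMS values within each month (assumption).
    """
    ref = np.asarray(ref, dtype=float)
    other = np.asarray(other, dtype=float)
    month_index = np.asarray(month_index)
    months = np.unique(month_index)
    alpha = np.empty(months.size)
    for k, m in enumerate(months):
        sel = month_index == m
        sigma_o = np.sqrt(np.mean((ref[sel] - other[sel]) ** 2))
        sigma_d = np.sqrt(np.mean(ref[sel] ** 2))
        alpha[k] = sigma_o / sigma_d
    return months, alpha


def running_mean(x, window=12):
    """Mean over `window` consecutive values; output is window - 1 shorter."""
    x = np.asarray(x, dtype=float)
    return np.convolve(x, np.ones(window) / window, mode="valid")


# Synthetic example: 24 months of 300 samples each.
rng = np.random.default_rng(2)
month_index = np.repeat(np.arange(24), 300)
ref = rng.standard_normal(month_index.size)
other = np.where(month_index < 12,
                 ref + 0.2 * rng.standard_normal(ref.size),  # well-constrained year
                 rng.standard_normal(ref.size))              # unconstrained year
months, alpha = monthly_alpha(ref, other, month_index)
smooth = running_mean(alpha)
```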

Since the interest is primarily in the early part of the record,
Fig. 9 shows this ratio for zonal winds in the Northern
Hemisphere stratosphere (at 60° N, 30 hPa) and in the Southern Hemisphere
troposphere (at 45° S, 500 hPa), spanning from 1958 to 1986. The
month-by-month values fluctuate considerably but nonetheless show a distinct
annual cycle, with lower values of α during the respective winter months when
the dynamical variability is higher. A clearer trend can be observed by
considering δ computed from 12-month running averages of α (bold
lines in Fig. 9). In the Northern Hemisphere stratosphere, values
for ERA-40 remain well below 0.5 through nearly all of the period in question,
and NCEP-NCAR R1 is only somewhat larger. Although the methodology used here
cannot yet be used to examine the period prior to 1958, these relatively low
values suggest that even earlier periods could be of value. This speculation is
supported by the results of Hersbach et al. (2017), who found this period to
be of value in particular for constraining the evolution of the QBO.

Figure 9. Time-dependent estimate of α for (a) U at 30 hPa,
60° N and (b) U at 500 hPa, 45° S. The faint lines are
computed based on month-by-month variability (see text for details), while
bold lines are computed based on 12-month running means of α.

The surface-input reanalyses show large fluctuations over time but less of a
clear trend. For ERA-20C, the value of α remains close to 1 through much
of the period, though at the beginning of the period the value is only slightly
larger than for NCEP-NCAR R1. The values for 20CR v2 are systematically larger,
not far below the limit of 2, despite the lower overall variance at
these heights seen in Fig. 6.

In the Southern Hemisphere, again, values show a clear seasonal cycle; while there
are times of the year during which the agreement is better, the 12-month running
average is above 1 for all products through the 1960s, dropping somewhat
through the early 1970s and to values of less than 0.5 only after 1979. This
suggests that the tropospheric flow is only weakly constrained by the
observations prior to 1979. In this case, 20CR v2 shows somewhat better
agreement with JRA-55 than ERA-20C through the early 1980s.

The assessment of inter-reanalysis differences presented here suggests that there
is considerable value for dynamical studies in including the radiosonde era,
particularly in the extratropical Northern Hemisphere. The criteria discussed
suggest that for lower-frequency, large-scale processes such as those
responsible for stratosphere–troposphere coupling during sudden
stratospheric warmings, including the radiosonde era could reduce confidence intervals by
close to 20 %, despite the increase in reanalysis uncertainty during this
time. To assess whether this is in fact the case, Fig. 10 presents
bootstrap estimates of uncertainties (at the 95 % level) on composites of
several dynamical quantities fundamental to this coupling: the vertically
integrated zonal wind, vertically integrated meridional momentum fluxes, and
meridional heat fluxes at 100 hPa. The vertical integral is taken from 1000
to 100 hPa (see, e.g. Hitchcock and Simpson, 2016). The bootstrap estimates are
carried out by generating a large number of synthetic composites by selecting
N events with replacement from the full period (shown in solid lines with
shaded confidence intervals) and from the satellite period (shown in dashed
lines with outlined confidence intervals).
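The bootstrap procedure can be sketched as follows, assuming each event has already been reduced to a lag-averaged value (or a latitude profile) so that events form the leading axis of an array. The function name `bootstrap_composite_ci` and the percentile-interval construction are assumptions for illustration; the text specifies only that N events are resampled with replacement and that 95 % intervals are reported.

```python
import numpy as np


def bootstrap_composite_ci(event_values, n_boot=10000, level=0.95, seed=None):
    """Composite mean and percentile bootstrap confidence interval.

    event_values: array of shape (N, ...) with one entry per event, e.g. a
        lag-averaged anomaly (or latitude profile) for each sudden warming.
    Synthetic composites are formed by drawing N events with replacement;
    the interval is read from percentiles of the synthetic composite means.
    """
    x = np.asarray(event_values, dtype=float)
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    idx = rng.integers(0, n, size=(n_boot, n))  # resampled event indices
    boot_means = x[idx].mean(axis=1)            # one composite per resample
    tail = 50.0 * (1.0 - level)
    lo, hi = np.percentile(boot_means, [tail, 100.0 - tail], axis=0)
    return x.mean(axis=0), lo, hi


# Synthetic example: 400 events, each a 3-point latitude profile.
rng = np.random.default_rng(3)
events_full = rng.standard_normal((400, 3))
mean, lo, hi = bootstrap_composite_ci(events_full, seed=0)
```

Calling the function separately on the full-period events and on the satellite-era subset would give the two sets of intervals being compared; any systematic radiosonde-era error would simply enter as additional spread among the full-period events.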

Importantly, any systematic error present in these quantities during the
radiosonde era will contribute to the bootstrapped confidence intervals. The
fact, then, that in each case the confidence intervals are (with some regional
exceptions; not shown explicitly) reduced by roughly 20 % suggests that
any such systematic errors are small relative to the sampling error.

As was the case with the event frequencies shown in Fig. 3, the
composite means agree nearly everywhere to within estimated confidence
intervals, as should be the case. Within these uncertainties, the tropospheric
jet shift is seen at somewhat lower latitudes during the full period with a less
pronounced low-latitude signal; the momentum flux anomalies are somewhat more
positive, and the heat-flux anomalies during the recovery phase suggest somewhat
more suppression of the upward wave flux. While the differences in composite
means are modest, including this period reduces the confidence intervals on
these quantities by the expected amount, providing better observational
constraints on dynamical understanding and modelling efforts.

The advent of more advanced satellite-based sounding instruments in the late
1970s resulted in major improvements in the monitoring of the detailed state of
the atmosphere. Nonetheless, “conventional” upper-air observations play an
important complementary role, and the network of surface and radiosonde
observations in place prior to this period represents a valuable resource for
observationally constraining atmospheric variability. For dynamical studies that
rely on statistical composites of specific anomalous conditions, the dominant
source of error in many cases arises from sampling this atmospheric
variability, not from observational uncertainties.

In particular, this study has considered the value of the “radiosonde” era from
1958 to 1978 relative to the “satellite” era from 1979 to 2010, using
differences between presently available reanalysis products to characterize the
constraint provided by the observations in these two periods. In principle,
including the radiosonde era allows the confidence intervals associated with the
dynamical variability to be reduced by up to 20 %.
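The scale of this figure can be checked with idealized arithmetic. Assuming that confidence intervals scale as $1/\sqrt{N_\mathrm{eff}}$, that event counts are proportional to the number of winters, and (as a labelled simplification, not a reproduction of Eq. (3), which also depends on the ratios of reanalysis to dynamical uncertainty) that the radiosonde era contributes $\delta N_r$ effective winters, adding the 21 winters of 1958–1978 to the 32 of 1979–2010 narrows the interval by $1 - \sqrt{32/53} \approx 22\,\%$ when $\delta = 1$. The function `ci_reduction` is a hypothetical helper.

```python
import math


def ci_reduction(n_sat, n_rad, delta=1.0):
    """Fractional narrowing of a composite confidence interval.

    Idealized assumptions for illustration only: the interval width scales as
    1 / sqrt(N_eff), the number of events is proportional to the number of
    winters, and the radiosonde era contributes delta * n_rad effective
    winters (delta = 1: as valuable as the satellite era; delta = 0: no value).
    """
    n_eff = n_sat + delta * n_rad
    return 1.0 - math.sqrt(n_sat / n_eff)


print(round(ci_reduction(32, 21), 3))             # 0.223 (1979-2010 plus 1958-1978)
print(round(ci_reduction(32, 21, delta=0.5), 3))  # 0.132
```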

Figure 10. (a) Composite mean of vertically averaged zonal wind anomalies,
averaged over lags of 5 to 60 days following major warmings. The solid line shows
the composite for all events, while the dashed line shows the composite for
the satellite era alone. Confidence intervals for the whole period are
shaded, while those for the satellite era are indicated by thin dashed lines. (b)
Similar but for vertically integrated momentum fluxes. (c) Similar but for
meridional heat fluxes at 100 hPa, averaged over lags −15 to 0 (in
red), and over lags 5 to 60 (in blue). See text for details.

The value of the radiosonde era towards reducing the overall sampling
uncertainty in composites is quantified by Eq. (3). This
depends on the ratio of the “reanalysis” uncertainty (including uncertainty
arising from the observations as well as that arising from the assimilation
process) to the dynamical uncertainty (the variability of the dynamical
phenomena themselves). A key conclusion to draw from this relationship is that
even if the reanalysis uncertainty is significantly greater in the radiosonde
era than in the satellite era, so long as the dynamical uncertainty dominates both,
the radiosonde era will be of nearly equivalent value to the satellite era.
However, since this criterion assesses the relative value of the two periods, it
is important as well to consider directly the ratio of the reanalysis
uncertainty to the dynamical uncertainty. If this is too large, this indicates a
more significant influence of the underlying forecast model.

Since these criteria depend on the physical properties of the climate system,
the observations available, and the reanalysis forecast model and
assimilation system, they must be applied on a case-by-case basis. The present
work cannot hope to provide a comprehensive survey. However, basic zonal mean
quantities including zonal winds, temperatures, and fluxes of momentum and heat,
as archived for 12 reanalysis products (see Table 1) by
Martineau (2017), have been considered here.

For all quantities considered, the reanalysis uncertainty in the Northern
Hemisphere extratropics from the surface up to the mid-stratosphere (about
10 hPa) is found to be sufficiently small relative to the dynamical variability to
make the radiosonde era of clear value in reducing composite uncertainties.
For zonal mean zonal winds, the interannual variability is such that despite
larger reanalysis uncertainties, this is also the case for tropical winds (even
in the stratosphere), and even Southern Hemisphere winds may be of some value in
the austral summer. However, temperatures through much of the Southern
Hemisphere are not well enough constrained to be worth including the radiosonde
era. This is also notably true of temperatures in the tropical tropopause
layer.

This test has also been applied to the surface-input reanalyses ERA-20C and 20CR
v2. The statistics of differences between these products and full-input
reanalyses clearly indicate that, at least for ERA-20C, their stratospheric
evolution bears some meaningful resemblance to reality. However, this constraint
is still much weaker than that available to full-input or even conventional-input
products, with inter-reanalysis differences of similar magnitude to the
dynamical variability. Furthermore, while differences between other reanalyses
are reduced when considering fixed dates for sudden stratospheric warmings, for
the surface-input reanalyses, the comparison is improved when considering
per-reanalysis dates, suggesting that, in these surface-input reanalyses,
sudden stratospheric warmings are at least as much a product of the forecast
model dynamics as a result of assimilated observations.

While these criteria do not consider the possibility of systematic biases in
the radiosonde era, direct bootstrap estimates generally confirm this reduction
in uncertainty of several dynamical quantities relevant to
stratosphere–troposphere coupling following sudden stratospheric warmings in the
Northern Hemisphere.

As a final note, while considerable improvements have been documented for more
modern reanalyses during the satellite period (e.g. Long et al., 2017),
there are at present not enough modern reanalyses that cover the radiosonde era
to clearly document improvements over this earlier period. It seems likely that
similar attention to the radiosonde era could produce similar improvements.
Given the value of this period for dynamical studies demonstrated in this and
other recent studies (Hersbach et al., 2017; Gerber and Martineau, 2018), the intent to
include this period in two upcoming products (ERA-5 and JRA-3Q) is welcome.

The author thanks Sean Davis and Gloria Manney for helpful discussions, as well as
the lead authors of the S-RIP chapter on stratosphere–troposphere coupling,
Patrick Martineau and Edwin Gerber, for their support of this work. The reviewer comments of
Adrian Simmons, Edwin Gerber, and two anonymous referees led to significant
improvements in the text and were also much appreciated.

Studies of the dynamics of stratosphere–troposphere coupling benefit from long observational records in order to distinguish common dynamical features from unrelated atmospheric variability. On the basis of a comparison between a range of reanalysis products, this study argues that the period from 1958 to 1979 is of significant value in the Northern Hemisphere for this purpose, despite the lack of global satellite records.
