Abstract

Time-varying GARCH-M models are commonly used in econometrics and financial economics. Yet the recursive nature of the conditional variance makes exact likelihood analysis of these models computationally infeasible. This paper outlines the issues and suggests employing a Markov chain Monte Carlo algorithm which allows the calculation of a classical estimator via the simulated EM algorithm, or of a simulated Bayesian solution, in only O(T) computational operations, where T is the sample size. Furthermore, the theoretical dynamic properties of a time-varying GQARCH(1,1)-M model are derived. We discuss them and apply the suggested Bayesian estimation to three major stock markets.

1. Introduction

Time series data emerging from diverse fields appear to possess time-varying second conditional moments. Furthermore, theoretical results quite often postulate specific relationships between the second and the first conditional moments. For instance, in the stock market context, the first conditional moment of stock market excess returns, given some information set, is a possibly time-varying, linear function of volatility (see, e.g., Merton [1], Glosten et al. [2]). These findings have led to modifications and extensions of the initial ARCH model of Engle [3] and its generalization by Bollerslev [4], giving rise to a plethora of dynamic heteroscedasticity models. These models have been employed extensively to capture the time variation in the conditional variance of economic series in general, and of financial time series in particular (see Bollerslev et al. [5] for a survey).

Although the vast majority of research in conditional heteroscedasticity is aimed at the stylized facts of financial stock returns and of economic time series in general, Arvanitis and Demos [6] have shown that a family of time-varying GARCH-M models can in fact be consistent with the sample characteristics of time series describing the temporal evolution of velocity changes of turbulent fluid and gas molecules. Although these statistical characteristics match their financial analogues to a considerable degree (e.g., leptokurtosis, volatility clustering, and quasi long-range dependence in the squares are common), there are also significant differences between the aforementioned physical systems and financial markets: examples are the anticorrelation effect and the asymmetry of velocity changes, in contrast to the zero autocorrelation and the leverage effect of financial returns (see Barndorff-Nielsen and Shephard [7] as well as Mantegna and Stanley [8, 9]). It was shown that the above-mentioned family of models can even create anticorrelation in the means, so long as an AR(1) time-varying parameter is introduced.

It is clear that, from an econometric viewpoint, it is important to study how to efficiently estimate models with partially unobserved GARCH processes. In this context, our main contribution is to show how to employ the method proposed in Fiorentini et al. [10] to achieve MCMC likelihood-based estimation of a time-varying GARCH-M model by means of feasible O(T) algorithms, where T is the sample size. The crucial idea is to transform the GARCH model into a first-order Markov model. However, in our model, the error term enters the in-mean equation multiplicatively and not additively as it does in the latent factor models of Fiorentini et al. [10]. Thus, we show that their method applies to more complicated models as well.

We prefer to employ a GQARCH specification for the conditional variance (Engle [3] and Sentana [11]) since it encompasses all the existing restricted quadratic variance functions (e.g., the augmented ARCH model), its properties are very similar to those of GARCH models (e.g., stationarity conditions), and it avoids some of their drawbacks (e.g., it is very easy to generalize to multivariate models). Moreover, many theories in finance involve an explicit tradeoff between risk and expected returns. For that matter, we use an in-mean model, which is ideally suited to handling such questions in a time series context where the conditional variance may be time varying. However, a number of studies question the existence of a positive mean/variance ratio, directly challenging the mean/variance paradigm. For example, when Glosten et al. [2] explicitly include the nominal risk-free rate in the conditioning information set, they obtain a negative ARCH-M parameter. For these reasons, we allow the conditional variance to affect the mean with a possibly time-varying coefficient, which for simplicity we assume follows an AR(1) process. Thus, our model is a time-varying GQARCH-M-AR(1) model.

As we shall see in Section 2.1, this model is able to capture the so-called stylized facts of excess stock returns. These are (i) the sample mean is positive and much smaller than the standard deviation, that is, a high coefficient of variation, (ii) the autocorrelation of excess returns is insignificant, with the possible exception of the first one, (iii) the distribution of returns is nonnormal, mainly due to excess kurtosis and possibly (negative) asymmetry, (iv) there is strong volatility clustering, that is, significant positive autocorrelation of squared returns even at high lags, and (v) the so-called leverage effect, that is, negative errors increase future volatility more than positive ones of the same size.

The structure of the paper is as follows. In Section 2, we present the model and derive the theoretical properties of the GQARCH(1,1)-M-AR(1) model. In Section 3, we review Bayesian and classical likelihood approaches to inference for the time-varying GQARCH-M model. We show that the key task (in both cases) is to be able to produce consistent simulators, and that the estimation problem arises from the existence of two unobserved processes, which renders exact likelihood-based estimation computationally infeasible. Hence, we demonstrate that the method proposed by Fiorentini et al. [10] is needed to achieve a first-order Markov transformation of the model, thus reducing the computations from O(T²) to O(T). A comparison of the efficient (O(T) calculations) and the inefficient (O(T²) calculations) simulators is also given. An illustrative empirical application on weekly returns from three major stock markets is presented in Section 4, and we conclude in Section 5.

2. GQARCH(1,1)-M-AR(1) Model

The definition of our model is as follows.

Definition 2.1. The time-varying parameter GQARCH(1,1)-M-AR(1) model is
\[
r_t = \delta_t h_t + \varepsilon_t, \qquad \varepsilon_t = z_t h_t^{1/2}, \tag{2.1}
\]
where
\[
\delta_t = (1-\varphi)\delta + \varphi\delta_{t-1} + \varphi_u u_t, \tag{2.2}
\]
\[
h_t = \omega + \alpha\left(\varepsilon_{t-1}-\gamma\right)^2 + \beta h_{t-1}, \tag{2.3}
\]
z_t ∼ i.i.d. N(0,1), u_t ∼ i.i.d. N(0,1), and u_t, z_s are independent for all t, s. Here {r_t}_{t=1}^T are the observed excess returns, T is the sample size, {δ_t}_{t=1}^T is an unobserved AR(1) process independent (with δ_0 = δ) of {ε_t}_{t=1}^T, and {h_t}_{t=1}^T is the conditional variance (with h_0 equal to the unconditional variance and ε_0 = 0), which is supposed to follow a GQARCH(1,1). It is obvious that δ_t is the market price of risk (see, e.g., Merton [1], Glosten et al. [2]). Let ℱ_{t−1} denote the natural filtration generated by the past values of {ε_t} and {r_t}.
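For concreteness, the data-generating process (2.1)–(2.3) can be simulated directly from the recursions. The following Python sketch (function and parameter names are ours, not from the paper) uses the initial conditions stated in the definition: δ_0 = δ, ε_0 = 0, and h_0 at its unconditional value.

```python
import numpy as np

def simulate_tvgqarch_m(T, delta_bar, phi, phi_u, omega, alpha, beta, gamma, seed=0):
    """Simulate the time-varying GQARCH(1,1)-M-AR(1) of Definition 2.1.

    r_t     = delta_t * h_t + eps_t,  eps_t = z_t * sqrt(h_t)
    delta_t = (1-phi)*delta_bar + phi*delta_{t-1} + phi_u * u_t
    h_t     = omega + alpha*(eps_{t-1} - gamma)**2 + beta*h_{t-1}
    """
    rng = np.random.default_rng(seed)
    r = np.empty(T); h = np.empty(T); delta = np.empty(T)
    # initial conditions as in the text: delta_0 = delta_bar, eps_0 = 0,
    # h_0 equal to the unconditional variance
    h_prev = (omega + alpha * gamma**2) / (1.0 - alpha - beta)
    eps_prev, delta_prev = 0.0, delta_bar
    for t in range(T):
        h[t] = omega + alpha * (eps_prev - gamma)**2 + beta * h_prev
        delta[t] = (1 - phi) * delta_bar + phi * delta_prev + phi_u * rng.standard_normal()
        eps = np.sqrt(h[t]) * rng.standard_normal()
        r[t] = delta[t] * h[t] + eps
        eps_prev, h_prev, delta_prev = eps, h[t], delta[t]
    return r, h, delta
```

With T large, the sample mean of h_t should be close to its unconditional value (ω + αγ²)/(1 − α − β).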

Establishing the theoretical properties of this model is quite an important issue. Specifically, it is interesting to investigate whether the model can accommodate the main stylized facts of financial markets. On the other hand, the estimation of the model requires its transformation into a first-order Markov model in order to implement the method of Fiorentini et al. [10]. Let us start with the theoretical properties.

2.1. Theoretical Properties

Let us consider first the moments of the conditional variance h_t, which are needed for the moments of r_t. The proof of the following lemma is based on raising h_t to the appropriate power in (2.3) and taking expectations, using E(z_t^4) = 3, E(z_t^6) = 15, and E(z_t^8) = 105.

Lemma 2.2. If 105α⁴ + 60α³β + 18α²β² + 4αβ³ + β⁴ < 1, the first four moments of the conditional variance in (2.3) are given by (writing c = ω + αγ²)
\[
E\left(h_t\right) = \frac{c}{1-\alpha-\beta}, \qquad
E\left(h_t^2\right) = \frac{c\left[c(1+\alpha+\beta) + 4\alpha^2\gamma^2\right]}{\left(1-3\alpha^2-2\alpha\beta-\beta^2\right)(1-\alpha-\beta)},
\]
\[
E\left(h_t^3\right) = \frac{c^3 + 3c\left[c(\alpha+\beta) + 4\alpha^2\gamma^2\right]E\left(h_t\right) + 3\left[c\left(3\alpha^2+2\alpha\beta+\beta^2\right) + 4\alpha^2\gamma^2(3\alpha+\beta)\right]E\left(h_t^2\right)}{1 - 15\alpha^3 - 9\alpha^2\beta - 3\alpha\beta^2 - \beta^3},
\]
\[
E\left(h_t^4\right) = \frac{c^4 + 4c^2\left[c(\alpha+\beta) + 6\alpha^2\gamma^2\right]E\left(h_t\right) + \left[6c^2\left(3\alpha^2+2\alpha\beta+\beta^2\right) + 48\alpha^2\gamma^2\left(c(3\alpha+\beta)+\alpha^2\gamma^2\right)\right]E\left(h_t^2\right) + \left[4c\left(15\alpha^3+9\alpha^2\beta+3\alpha\beta^2+\beta^3\right) + 24\alpha^2\gamma^2\left(15\alpha^2+6\alpha\beta+\beta^2\right)\right]E\left(h_t^3\right)}{1 - 105\alpha^4 - 60\alpha^3\beta - 18\alpha^2\beta^2 - 4\alpha\beta^3 - \beta^4}. \tag{2.4}
\]

Consequently, the moments of 𝑟𝑡 are given in the following theorem taken from Arvanitis and Demos [6].

Theorem 2.3. The first two moments of the model in (2.1), (2.2), and (2.3) are given by
\[
E\left(r_t\right) = \delta E\left(h_t\right), \qquad E\left(r_t^2\right) = \left(\delta^2 + \frac{\varphi_u^2}{1-\varphi^2}\right)E\left(h_t^2\right) + E\left(h_t\right), \tag{2.5}
\]
whereas the skewness and kurtosis coefficients are
𝑟Sk𝑡=𝑆𝑟𝑡Var1.5𝑟𝑡𝑟,kurt𝑡=𝜅Var2𝑟𝑡,(2.6)
where
𝑆𝑟𝑡𝛿=𝛿2𝜑+32𝑢1−𝜑2𝐸ℎ3𝑡ℎ+3𝛿𝐸2+2𝛿3𝐸3ℎ𝑡𝛿−3𝛿2+𝜑2𝑢1−𝜑2𝐸ℎ2𝑡ℎ+𝐸𝑡𝐸ℎ𝑡,𝛿𝜅=4+6𝛿2𝜑2𝑢1−𝜑2𝜑+34𝑢1−𝜑22𝐸ℎ4𝑡+3𝛿22−𝛿2𝐸ℎ𝑡𝐸3ℎ𝑡𝛿+62+𝜑2𝑢1−𝜑2𝐸ℎ3𝑡−4𝛿2𝛿2+3𝜑2𝑢1−𝜑2𝐸ℎ3𝑡𝐸ℎ𝑡+6𝛿2𝛿2+𝜑2𝑢1−𝜑2𝐸ℎ𝐸(ℎ)−2𝐸(ℎ)+32𝑡,(2.7)
and 𝐸(ℎ𝑡), 𝐸(ℎ2𝑡), 𝐸(ℎ3𝑡),and 𝐸(ℎ4𝑡) are given in Lemma 2.2.

In terms of stationarity, the process {r_t} is 4th-order stationary if and only if
\[
|\varphi| < 1, \qquad 105\alpha^4 + 60\alpha^3\beta + 18\alpha^2\beta^2 + 4\alpha\beta^3 + \beta^4 < 1. \tag{2.8}
\]
These conditions are the same as in Arvanitis and Demos [6], indicating that the presence of the asymmetry parameter, γ, does not affect the stationarity conditions (see also Bollerslev [4] and Sentana [11]). Furthermore, 4th-order stationarity is needed because we would like to measure the autocorrelation of the squared r_t's (volatility clustering), as well as the correlation of r_t² and r_{t−1} (leverage effect). The dynamic moments of the conditional variance, and those between the conditional variance h_t and the error ε_t, are given in the following two lemmas (for a proof see Appendix B).

Lemma 2.4. Under the assumption of Lemma 2.2, one has that
\[
\mathrm{Cov}\left(h_t,h_{t-k}\right) = (\alpha+\beta)^k V\left(h_t\right),
\]
\[
\mathrm{Cov}\left(h_t^2,h_{t-k}\right) = A^k\left[E\left(h_t^3\right) - E\left(h_t^2\right)E\left(h_t\right)\right] + B\,\frac{(\alpha+\beta)^k - A^k}{(\alpha+\beta) - A}\,V\left(h_t\right),
\]
\[
\mathrm{Cov}\left(h_t,h_{t-k}^2\right) = (\alpha+\beta)^k\left[E\left(h_{t-1}^3\right) - E\left(h_{t-1}^2\right)E\left(h_{t-1}\right)\right],
\]
\[
\mathrm{Cov}\left(h_t^2,h_{t-k}^2\right) = A^k V\left(h_t^2\right) + B\,\frac{(\alpha+\beta)^k - A^k}{(\alpha+\beta) - A}\left[E\left(h_t^3\right) - E\left(h_t^2\right)E\left(h_t\right)\right], \tag{2.9}
\]
where A = 3α² + 2αβ + β² and B = 2[2α²γ² + (ω + αγ²)(α + β)].

Lemma 2.5. One has
\[
\mathrm{Cov}\left(h_t,\varepsilon_{t-k}\right) = E\left(h_t\varepsilon_{t-k}\right) = -2\alpha\gamma(\alpha+\beta)^{k-1}E\left(h_t\right),
\]
\[
\mathrm{Cov}\left(h_t h_{t-k},\varepsilon_{t-k}\right) = E\left(h_t h_{t-k}\varepsilon_{t-k}\right) = -2\alpha\gamma(\alpha+\beta)^{k-1}E\left(h_{t-1}^2\right),
\]
\[
\mathrm{Cov}\left(h_t,\varepsilon_{t-k}^2\right) = (\alpha+\beta)^k V\left(h_t\right) + 2\alpha(\alpha+\beta)^{k-1}E\left(h_t^2\right),
\]
\[
\mathrm{Cov}\left(h_t^2,\varepsilon_{t-k}\right) = E\left(h_t^2\varepsilon_{t-k}\right) = -4\alpha\gamma(3\alpha+\beta)A^{k-1}E\left(h_t^2\right) - 4\alpha\gamma\left[\left(\omega+\alpha\gamma^2\right)\frac{(\alpha+\beta)^k - A^k}{(\alpha+\beta) - A} + 2\alpha^2\gamma^2\,\frac{(\alpha+\beta)^{k-1} - A^{k-1}}{(\alpha+\beta) - A}\right]E\left(h_t\right),
\]
\[
\mathrm{Cov}\left(h_t^2,\varepsilon_{t-k}^2\right) = A^k\left[E\left(h_t^3\right) - E\left(h_t^2\right)E\left(h_t\right)\right] + B\,\frac{(\alpha+\beta)^k - A^k}{(\alpha+\beta) - A}\,V\left(h_t\right) + 4\alpha(3\alpha+\beta)A^{k-1}E\left(h_t^3\right) + \left[2\alpha B\,\frac{(\alpha+\beta)^{k-1} - A^{k-1}}{(\alpha+\beta) - A} + 4\left(2\alpha^2\gamma^2 + \omega + \alpha\gamma^2\right)A^{k-1}\right]E\left(h_t^2\right),
\]
\[
\mathrm{Cov}\left(h_t^2 h_{t-k},\varepsilon_{t-k}\right) = E\left(h_t^2 h_{t-k}\varepsilon_{t-k}\right) = -4\alpha\gamma A^{k-1}\left[(3\alpha+\beta)E\left(h_t^3\right) + \left(\omega+\alpha\gamma^2\right)E\left(h_t^2\right)\right] - 2\alpha\gamma B\,\frac{(\alpha+\beta)^{k-1} - A^{k-1}}{(\alpha+\beta) - A}\,E\left(h_t^2\right), \tag{2.10}
\]
where A and B are given in Lemma 2.4.

Furthermore, from Arvanitis and Demos [6] we know that the following results hold.

Theorem 2.6. The autocovariance of returns for the model in (2.1)–(2.3) is given by
\[
\gamma_k = \mathrm{Cov}\left(r_t,r_{t-k}\right) = \delta^2\,\mathrm{Cov}\left(h_t,h_{t-k}\right) + \delta E\left(h_t\varepsilon_{t-k}\right) + \varphi^k\frac{\varphi_u^2}{1-\varphi^2}E\left(h_t h_{t-k}\right), \tag{2.11}
\]
and the covariance of the squares with the levels and the autocovariance of the squares are
\[
\mathrm{Cov}\left(r_t^2,r_{t-k}\right) = E\left(\delta_t^2\delta_{t-k}\right)\mathrm{Cov}\left(h_t^2,h_{t-k}\right) + \mathrm{Cov}\left(\delta_t^2,\delta_{t-k}\right)E\left(h_t^2\right)E\left(h_{t-k}\right) + E\left(\delta_{t-k}\right)\mathrm{Cov}\left(h_t,h_{t-k}\right) + E\left(\delta_t^2\right)E\left(h_t^2\varepsilon_{t-k}\right) + E\left(h_t\varepsilon_{t-k}\right),
\]
\[
\mathrm{Cov}\left(r_t^2,r_{t-k}^2\right) = E\left(\delta_t^2\delta_{t-k}^2\right)\mathrm{Cov}\left(h_t^2,h_{t-k}^2\right) + \mathrm{Cov}\left(\delta_t^2,\delta_{t-k}^2\right)E\left(h_t^2\right)E\left(h_{t-k}^2\right) + E\left(\delta_t^2\right)\mathrm{Cov}\left(h_t^2,\varepsilon_{t-k}^2\right) + 2E\left(\delta_t^2\delta_{t-k}\right)E\left(h_t^2 h_{t-k}\varepsilon_{t-k}\right) + E\left(\delta_{t-k}^2\right)\mathrm{Cov}\left(h_t,h_{t-k}^2\right) + \mathrm{Cov}\left(h_t,\varepsilon_{t-k}^2\right) + 2\delta E\left(h_t h_{t-k}\varepsilon_{t-k}\right), \tag{2.12}
\]
where all the needed covariances and expectations on the right-hand sides are given in Lemmas 2.4 and 2.5, and
\[
\mathrm{Cov}\left(\delta_t^2,\delta_{t-k}\right) = \mathrm{Cov}\left(\delta_t,\delta_{t-k}^2\right) = 2\varphi^k\delta\frac{\varphi_u^2}{1-\varphi^2}, \qquad \mathrm{Cov}\left(\delta_t^2,\delta_{t-k}^2\right) = 4\varphi^k\delta^2\frac{\varphi_u^2}{1-\varphi^2} + 2\varphi^{2k}\frac{\varphi_u^4}{\left(1-\varphi^2\right)^2}. \tag{2.13}
\]
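As an illustration, (2.11) can be assembled numerically from Lemmas 2.2, 2.4, and 2.5. The sketch below (our own helper, for k ≥ 1) shows how the three pieces combine, and in particular how the leverage term −2αγ(α+β)^{k−1}E(h_t) can make γ_k negative for suitable δ, γ > 0.

```python
import numpy as np

def return_autocov(k, delta_bar, phi, phi_u, omega, alpha, beta, gamma):
    """gamma_k = Cov(r_t, r_{t-k}) of (2.11), assembled from
    Lemmas 2.2, 2.4, and 2.5 (k >= 1)."""
    c = omega + alpha * gamma**2
    Eh = c / (1 - alpha - beta)                           # Lemma 2.2
    Eh2 = (c * (c*(1 + alpha + beta) + 4*alpha**2*gamma**2)
           / ((1 - 3*alpha**2 - 2*alpha*beta - beta**2) * (1 - alpha - beta)))
    Vh = Eh2 - Eh**2
    cov_hh = (alpha + beta)**k * Vh                       # Lemma 2.4
    Eheps = -2*alpha*gamma*(alpha + beta)**(k-1) * Eh     # Lemma 2.5
    s2_delta = phi_u**2 / (1 - phi**2)                    # Var(delta_t)
    return (delta_bar**2 * cov_hh
            + delta_bar * Eheps
            + phi**k * s2_delta * (cov_hh + Eh**2))
```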

From the above theorems and lemmas, it is obvious that our model can accommodate all the stylized facts. For example, negative asymmetry is possible, and volatility clustering and the leverage effect (negative Cov(h_t, ε_{t−k})) can be accommodated. Furthermore, the model can accommodate negative autocorrelations, γ_k, something that is not possible for the GARCH-M model (see Fiorentini and Sentana [12]). Finally, another interesting issue is the diffusion limit of the time-varying GQARCH-M process. As shown by Arvanitis [13], the weak convergence of the time-varying GQARCH(1,1)-M coincides with the general conclusions presented elsewhere in the literature, namely, that weak limits of endogenous volatility models are exogenous (stochastic volatility) continuous-time processes. Moreover, Arvanitis [13] suggests that there is a distributional relation between the GQARCH model and continuous-time Ornstein-Uhlenbeck models with respect to appropriate nonnegative Lévy processes.

Let us turn our attention to the estimation of our model. We will show that estimating our model is a hard task and the use of well-known methods such as the EM-algorithm cannot handle the problem due to the huge computational load that such methods require.

3. Likelihood-Inference: EM and Bayesian Approaches

The purpose of this section is the estimation of the time-varying GQARCH(1,1)-M model. Since our model involves two unobserved components (one from the time-varying in-mean parameter and one from the error term), the estimation method required is an EM and more specifically a simulated EM (SEM), as the expectation terms at the 𝐸 step cannot be computed. The main modern way of carrying out likelihood inference in such situations is via a Markov chain Monte Carlo (MCMC) algorithm (see Chib [14] for an extensive review). This simulation procedure can be used either to carry out Bayesian inference or to classically estimate the parameters by means of a simulated EM algorithm.

The idea behind MCMC methods is that, in order to sample a given probability distribution, referred to as the target distribution, a suitable Markov chain is constructed (using a Metropolis-Hastings (M-H) algorithm or a Gibbs sampling method) with the property that its limiting, invariant distribution is the target distribution. In most problems, the target distribution is absolutely continuous, and as a result the theory of MCMC methods is based on that of Markov chains on continuous state spaces [15]. This means that, by simulating the Markov chain a large number of times and recording its values, a sample of (correlated) draws from the target distribution can be obtained. It should be noted that the Markov chain samplers are invariant by construction, and, therefore, the existence of the invariant distribution does not have to be checked in any particular application of an MCMC method.

The Metropolis-Hastings (M-H) algorithm is a general MCMC method to produce sample variates from a given multivariate distribution. It is based on a candidate generating density that is used to supply a proposal value, which is accepted with probability given by the ratio of the target densities times the ratio of the proposal densities. There are a number of choices for the proposal density (e.g., random walk M-H chain, independence M-H chain, tailored M-H chain), and the components may be revised either in one block or in several blocks. Another MCMC method, which is a special case of the multiple-block M-H method with acceptance rate always equal to one, is the Gibbs sampling method, brought into statistical prominence by Gelfand and Smith [16]. In this algorithm, the parameters are grouped into blocks, and each block is sampled according to its full conditional distribution, denoted π(φ_t | φ_{−t}). By Bayes' theorem, π(φ_t | φ_{−t}) ∝ π(φ_t, φ_{−t}), the joint distribution of all blocks, and so full conditional distributions are usually quite simply derived. One cycle of the Gibbs sampling algorithm is completed by simulating {φ_t}_{t=1}^p, where p is the number of blocks, from the full conditional distributions, recursively updating the conditioning variables as one moves through each distribution. Under some general conditions, the Markov chain generated by the M-H or the Gibbs sampling algorithm converges to the target density as the number of iterations becomes large.
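As a minimal illustration of the Gibbs principle (a toy example of ours, unrelated to the model's specifics), consider sampling a bivariate standard normal with correlation ρ by alternating the two full conditionals x | y ∼ N(ρy, 1 − ρ²) and y | x ∼ N(ρx, 1 − ρ²):

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_draws, burn=500, seed=0):
    """Two-block Gibbs sampler whose invariant distribution is the
    standard bivariate normal with correlation rho."""
    rng = np.random.default_rng(seed)
    s = np.sqrt(1 - rho**2)                       # conditional std deviation
    x = y = 0.0
    draws = np.empty((n_draws, 2))
    for i in range(burn + n_draws):
        x = rho * y + s * rng.standard_normal()   # draw x | y
        y = rho * x + s * rng.standard_normal()   # draw y | x
        if i >= burn:
            draws[i - burn] = (x, y)
    return draws
```

The recorded draws are a correlated sample from the target; their sample correlation converges to ρ as the chain length grows.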

Within the Bayesian framework, MCMC methods have proved very popular, with the posterior distribution of the parameters as the target density (see [17]). Another application of MCMC is the analysis of hidden Markov models, where the approach relies on augmenting the parameter space to include the unobserved states and simulating the target distribution via the conditional distributions (this procedure is called data augmentation and was pioneered by Tanner and Wong [18]). Kim et al. [19] discuss an MCMC algorithm for the stochastic volatility (SV) model, which is an example of a state space model in which the state variable h_t (log-volatility) appears nonlinearly in the observation equation. The idea is to approximate the model by a conditionally Gaussian state space model with the introduction of multinomial random variables that follow a seven-point discrete distribution.

The analysis of a time-varying GQARCH-M model becomes substantially complicated, since the log-likelihood of the observed variables can no longer be written in closed form. In this paper, we focus on both the Bayesian and the classical estimation of the model. Unfortunately, the non-Markovian nature of the GARCH process implies that each time we simulate one error we implicitly change all future conditional variances. As pointed out by Shephard [20], a regrettable consequence of this path dependence in volatility is that standard MCMC algorithms will require an O(T²) computational load (see [21]). Since this cost has to be borne for each parameter value, such procedures are generally infeasible for the large financial datasets that we see in practice.

3.1. Estimation Problem: Simulated EM Algorithm

As mentioned already, the estimation problem arises because we have two unobserved processes. More specifically, we cannot write down the likelihood function in closed form, since we observe neither ε_t nor δ_t. On the other hand, the conditional log-likelihood function of our model, if δ_t were observed, would be the following:
ℓ𝐫,𝜹∣𝜙,ℱ0=ln𝑝𝐫∣𝜹,𝜙,ℱ0+ln𝑝𝜹∣𝜙,ℱ01=−𝑇ln2𝜋−2𝑇𝑡=1lnℎ𝑡−12𝑇𝑡=1𝜀𝑡2ℎ𝑡−𝑇2𝜑ln2𝑢−12𝑇𝑡=1𝛿𝑡−𝛿(1−𝜑)−𝜑𝛿𝑡−12𝜑2𝑢,(3.1)
where 𝐫=(𝑟1,…,𝑟𝑇),𝜹=(𝛿1,…,𝛿𝑇), and 𝐡=(ℎ1,…,ℎ𝑇).

However, the 𝛿𝑡’s are unobserved, and, thus, to classically estimate the model, we have to rely on an EM algorithm [22] to obtain estimates as close to the optimum as desired. At each iteration, the EM algorithm obtains 𝜙(𝑛+1), where 𝜙 is the parameter vector, by maximizing the expectation of the log-likelihood conditional on the data and the current parameter values, that is, 𝐸(ℓ(⋅)∣𝐫,𝜙(𝑛),ℱ0) with respect to 𝜙 keeping 𝜙(𝑛) fixed.

The E step, thus, requires the expectation of the complete log-likelihood. For our model, this is given by
\[
E\left(\ell(\cdot)\mid\mathbf{r},\phi^{(n)},\mathcal{F}_0\right) = -T\ln 2\pi - \frac{T}{2}\ln\varphi_u^2 - \frac{1}{2}\sum_{t=1}^{T}E\left(\ln h_t\mid\mathbf{r},\phi^{(n)},\mathcal{F}_0\right) - \frac{1}{2}\sum_{t=1}^{T}E\left(\frac{\varepsilon_t^2}{h_t}\mid\mathbf{r},\phi^{(n)},\mathcal{F}_0\right) - \frac{1}{2}\sum_{t=1}^{T}E\left(\frac{\left(\delta_t - \delta(1-\varphi) - \varphi\delta_{t-1}\right)^2}{\varphi_u^2}\mid\mathbf{r},\phi^{(n)},\mathcal{F}_0\right). \tag{3.2}
\]

It is obvious that we cannot compute such quantities. For that matter, we may rely on a simulated EM, where the expectation terms are replaced by averages over simulations, so that we have an SEM, or a simulated score. The SEM log-likelihood is
\[
\mathrm{SEM}\ell = -T\ln 2\pi - \frac{T}{2}\ln\varphi_u^2 - \frac{1}{2M}\sum_{i=1}^{M}\sum_{t=1}^{T}\ln h_t^{(i)} - \frac{1}{2M}\sum_{i=1}^{M}\sum_{t=1}^{T}\frac{\left(\varepsilon_t^{(i)}\right)^2}{h_t^{(i)}} - \frac{T(1-\varphi)^2\delta^2}{2\varphi_u^2} - \frac{1}{2\varphi_u^2}\frac{1}{M}\sum_{i=1}^{M}\sum_{t=1}^{T}\left(\delta_t^{(i)}\right)^2 + \frac{(1-\varphi)\delta}{\varphi_u^2}\frac{1}{M}\sum_{i=1}^{M}\sum_{t=1}^{T}\delta_t^{(i)} + \frac{\varphi}{\varphi_u^2}\frac{1}{M}\sum_{i=1}^{M}\sum_{t=1}^{T}\delta_t^{(i)}\delta_{t-1}^{(i)} - \frac{(1-\varphi)\varphi\delta}{\varphi_u^2}\frac{1}{M}\sum_{i=1}^{M}\sum_{t=1}^{T}\delta_{t-1}^{(i)} - \frac{\varphi^2}{2\varphi_u^2}\frac{1}{M}\sum_{i=1}^{M}\sum_{t=1}^{T}\left(\delta_{t-1}^{(i)}\right)^2. \tag{3.3}
\]
Consequently, we need to obtain the quantities (1/M)Σ_{i=1}^M Σ_{t=1}^T ln h_t^{(i)}, (1/M)Σ_{i=1}^M Σ_{t=1}^T (ε_t^{(i)})²/h_t^{(i)}, (1/M)Σ_{i=1}^M Σ_{t=1}^T δ_t^{(i)}, (1/M)Σ_{i=1}^M Σ_{t=1}^T δ_{t−1}^{(i)}, (1/M)Σ_{i=1}^M Σ_{t=1}^T δ_t^{(i)}δ_{t−1}^{(i)}, (1/M)Σ_{i=1}^M Σ_{t=1}^T (δ_t^{(i)})², and (1/M)Σ_{i=1}^M Σ_{t=1}^T (δ_{t−1}^{(i)})², where M is the number of simulations.
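Since (3.3) is nothing but the average of (3.1) over the M simulated paths, the sufficient-statistic bookkeeping can be checked mechanically. Below is a sketch (function names ours; each δ^{(i)} path includes its starting value δ_0, and note that once the square in (3.1) is expanded the cross-term in δ_{t−1} enters with a negative sign):

```python
import numpy as np

def complete_loglik(eps, h, delta, delta_bar, phi, phi_u2):
    """Complete-data log-likelihood (3.1) for one simulated path.
    `delta` has length T+1 with delta[0] = delta_0."""
    T = len(eps)
    resid = delta[1:] - delta_bar*(1 - phi) - phi*delta[:-1]
    return (-T*np.log(2*np.pi) - 0.5*np.sum(np.log(h)) - 0.5*np.sum(eps**2 / h)
            - 0.5*T*np.log(phi_u2) - 0.5*np.sum(resid**2) / phi_u2)

def sem_loglik(eps_s, h_s, delta_s, delta_bar, phi, phi_u2):
    """SEM objective (3.3): assembled from the averaged sufficient
    statistics; equals the mean of (3.1) over the M draws."""
    M, T = eps_s.shape
    d_t, d_lag = delta_s[:, 1:], delta_s[:, :-1]
    avg = lambda x: np.sum(x) / M
    return (-T*np.log(2*np.pi) - 0.5*T*np.log(phi_u2)
            - 0.5*avg(np.log(h_s)) - 0.5*avg(eps_s**2 / h_s)
            - 0.5*T*(1 - phi)**2*delta_bar**2 / phi_u2
            - 0.5*avg(d_t**2) / phi_u2
            + (1 - phi)*delta_bar*avg(d_t) / phi_u2
            + phi*avg(d_t*d_lag) / phi_u2
            - (1 - phi)*phi*delta_bar*avg(d_lag) / phi_u2
            - 0.5*phi**2*avg(d_lag**2) / phi_u2)
```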

Thus, to classically estimate our model by using an SEM algorithm, the basic problem is to sample from 𝐡∣𝜙,𝐫,ℱ0 where 𝜙 is the vector of the unknown parameters and also sample from 𝜹∣𝜙,𝐫,ℱ0.

In terms of identification, the model is not identified up to the second moment (see Corollary 1 in Sentana and Fiorentini [23]). The reason is that we can transfer unconditional variance from the error, ε_t, to the price of risk, δ_t, and vice versa. One possible solution is to fix ω such that E(h_t) = 1, or to set φ_u to a specific value. In fact, in an earlier version of the paper, we fixed φ_u to be 1 (see Anyfantaki and Demos [24]). Nevertheless, from a Bayesian viewpoint, the lack of identification is not too much of a problem, as the parameters are identified through their proper priors (see Poirier [25]).

Next, we will exploit the Bayesian estimation of the model, and, since we need to resort to simulations, we will show that the key task is again to simulate from 𝜹∣𝜙,𝐫,ℱ0.

3.2. Simulation-Based Bayesian Inference

In our problem, the key issue is that the likelihood function of the sample, p(r | φ, ℱ_0), is intractable, which precludes the direct analysis of the posterior density p(φ | r, ℱ_0). This problem may be overcome by focusing instead on the posterior density of the model using Bayes' rule:
\[
p\left(\phi,\boldsymbol{\delta}\mid\mathbf{r}\right) \propto p\left(\phi,\boldsymbol{\delta}\right)p\left(\mathbf{r}\mid\phi,\boldsymbol{\delta}\right) \propto p\left(\boldsymbol{\delta}\mid\phi\right)p(\phi)\,p\left(\mathbf{r}\mid\phi,\boldsymbol{\delta}\right), \tag{3.4}
\]
where
\[
\phi = \left(\delta, \varphi, \varphi_u, \alpha, \beta, \gamma, \omega\right). \tag{3.5}
\]

On the other hand,
\[
p\left(\mathbf{r}\mid\phi,\boldsymbol{\delta}\right) = \prod_{t=1}^{T}p\left(r_t\mid r_{t-1},\boldsymbol{\delta},\phi\right) = \prod_{t=1}^{T}\frac{1}{\sqrt{2\pi h_t}}\exp\left(-\frac{\varepsilon_t^2}{2h_t}\right) \tag{3.7}
\]
is the full-information likelihood. Once we have the posterior density, we obtain the parameters' marginal posterior densities by integration; MCMC is one way of performing this numerical integration.

The Hammersley-Clifford theorem (see Clifford [26]) says that a joint distribution can be characterized by its complete conditional distributions. Hence, given initial values {δ_t}^{(0)}, φ^{(0)}, we draw {δ_t}^{(1)} from p({δ_t} | r, φ^{(0)}) and then φ^{(1)} from p(φ | {δ_t}^{(1)}, r). Iterating these steps, we finally obtain {({δ_t}^{(i)}, φ^{(i)})}_{i=1}^M, and under mild conditions it can be shown that the distribution of the sequence converges to the joint posterior distribution p(φ, δ | r).

The above simulation procedure may be carried out by first dividing the parameters into two blocks:
\[
\phi_1 = \left(\delta, \varphi, \varphi_u^2\right), \qquad \phi_2 = \left(\alpha, \beta, \gamma, \omega\right). \tag{3.8}
\]
Then the algorithm is described as follows.
(1) Initialize φ.
(2) Draw from p(δ_t | δ_{≠t}, r, φ).
(3) Draw from p(φ | δ, r) in the following blocks:
  (i) draw from p(φ_1 | δ, r) using Gibbs sampling; this is updated in one block;
  (ii) draw from p(φ_2 | δ, r) by M-H; this is updated in a second block.
(4) Go to (2).

We review the implementation of each step.

3.2.1. Gibbs Sampling

The task of simulating from an AR model has already been discussed in the literature. Here, we follow the approach of Chib [27]; since we do not have any MA terms, inference is simpler.

Suppose that the prior distribution of (δ, φ_u², φ) is given by
\[
p\left(\delta,\varphi_u^2,\varphi\right) = p\left(\delta\mid\varphi_u^2\right)p\left(\varphi_u^2\right)p(\varphi), \tag{3.9}
\]
which means that (δ, φ_u²) is a priori independent of φ.

Also, the following holds for the prior distributions of the parameter subvector φ_1:
\[
p\left(\delta\mid\varphi_u^2\right) \sim N\left(\delta_{pr},\varphi_u^2\sigma_{\delta_{pr}}^2\right), \qquad p\left(\varphi_u^2\right) \sim \mathrm{IG}\left(\frac{v_0}{2},\frac{d_0}{2}\right), \qquad p(\varphi) \sim N\left(\varphi_0,\sigma_{\varphi_0}^2\right)I_\varphi, \tag{3.10}
\]
where I_φ is an indicator ensuring stationarity of the AR(1) process (|φ| < 1), IG is the inverted gamma distribution, and the hyperparameters v_0, d_0, δ_pr, σ²_{δ_pr}, φ_0, σ²_{φ_0} have to be specified.

Now, the joint posterior is proportional to
\[
p\left(\delta,\varphi,\varphi_u^2\mid\mathbf{r},\boldsymbol{\delta}\right) \propto \prod_{t=1}^{T}\frac{1}{\sqrt{2\pi\varphi_u^2}}\exp\left(-\frac{\left(\delta_t-(1-\varphi)\delta-\varphi\delta_{t-1}\right)^2}{2\varphi_u^2}\right) \times N\left(\delta_{pr},\varphi_u^2\sigma_{\delta_{pr}}^2\right) \times \mathrm{IG}\left(\frac{v_0}{2},\frac{d_0}{2}\right) \times N\left(\varphi_0,\sigma_{\varphi_0}^2\right)I_\varphi. \tag{3.11}
\]
From a Bayesian viewpoint, the right-hand side of the above equation is equal to the "augmented" prior, that is, the prior augmented by the latent δ. (We would like to thank the associate editor for bringing this to our attention.) We proceed to the generation of these parameters.

Generation of δ. First we see how to generate δ. Following again Chib [27], we may write
\[
\delta_t^* = \delta_t - \varphi\delta_{t-1}, \qquad \delta_t^*\mid\mathcal{F}_{t-1} \sim N\left((1-\varphi)\delta,\varphi_u^2\right), \tag{3.12}
\]
or, equivalently,
\[
\delta_t^* = (1-\varphi)\delta + v_t, \qquad v_t \sim N\left(0,\varphi_u^2\right). \tag{3.13}
\]
Under the above, and using Chib's (1993) notation, the proposal distribution is the following Gaussian distribution (see Chib [27] for a proof).

Proposition 3.1. The proposal distribution of δ is
\[
\delta\mid\boldsymbol{\delta},\varphi,\varphi_u^2 \sim N\left(\tilde{\delta},\varphi_u^2\tilde{\sigma}_\delta^2\right), \tag{3.14}
\]
where
\[
\tilde{\delta} = \tilde{\sigma}_\delta^2\left[\frac{\delta_{pr}}{\sigma_{\delta_{pr}}^2} + (1-\varphi)\sum_{t=1}^{T}\delta_t^*\right], \qquad \tilde{\sigma}_\delta^2 = \left[\frac{1}{\sigma_{\delta_{pr}}^2} + T(1-\varphi)^2\right]^{-1}. \tag{3.15}
\]
Hence, the generation of δ is completed, and we may turn to the generation of the other parameters.

Generation of φ_u². For the generation of φ_u², and using the notation of [27], we have the following.

Proposition 3.2. The proposal distribution of φ_u² is
\[
\varphi_u^2\mid\boldsymbol{\delta},\varphi,\delta \sim \mathrm{IG}\left(\frac{v_0+T}{2},\frac{d_0+Q+d}{2}\right), \tag{3.16}
\]
where
\[
Q = \frac{\left(\delta-\delta_{pr}\right)^2}{\sigma_{\delta_{pr}}^2}, \qquad d = \sum_{t=2}^{T}\left(\delta_t^* - \delta(1-\varphi)\right)^2. \tag{3.17}
\]
Finally, we turn to the generation of φ.
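A sketch of the inverted-gamma draw of Proposition 3.2, under our reading of the reconstructed formulas (the IG variate is obtained as the scale parameter divided by a unit-scale gamma draw; function names ours, and for simplicity the residual sum runs over all t):

```python
import numpy as np

def draw_phi_u2(delta, delta_bar, phi, delta_pr, s2_delta_pr, v0, d0, rng):
    """Inverted-gamma full-conditional draw for phi_u^2.
    `delta` holds delta_0..delta_T."""
    dstar = delta[1:] - phi * delta[:-1]               # delta*_t
    T = len(dstar)
    Q = (delta_bar - delta_pr)**2 / s2_delta_pr        # prior contribution
    d = np.sum((dstar - delta_bar*(1 - phi))**2)       # residual sum of squares
    shape = (v0 + T) / 2.0
    scale = (d0 + Q + d) / 2.0
    return scale / rng.gamma(shape)                    # IG(shape, scale) draw
```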

Generation of φ. For the generation of φ, we follow again Chib [27] and write
\[
\delta_t = (1-\varphi)\delta + \varphi\delta_{t-1} + v_t, \qquad v_t \sim N\left(0,\varphi_u^2\right). \tag{3.18}
\]
We may now state the following proposition (see Chib [27] for a proof).

Proposition 3.3. The proposal distribution of φ is
\[
\varphi\mid\boldsymbol{\delta},\delta,\varphi_u^2 \sim N\left(\hat{\varphi},\hat{\sigma}_\varphi^2\right)I_\varphi, \tag{3.19}
\]
where
\[
\hat{\varphi} = \hat{\sigma}_\varphi^2\left[\sigma_{\varphi_0}^{-2}\varphi_0 + \varphi_u^{-2}\sum_{t=1}^{T}\left(\delta_{t-1}-\delta\right)\left(\delta_t-\delta\right)\right], \qquad \hat{\sigma}_\varphi^{-2} = \sigma_{\varphi_0}^{-2} + \varphi_u^{-2}\sum_{t=1}^{T}\left(\delta_{t-1}-\delta\right)^2. \tag{3.20}
\]
The Gibbs sampling scheme is now complete, and the next step of the algorithm requires the generation of the conditional variance parameters via an M-H algorithm, which is presented next.

3.2.2. Metropolis-Hastings

Step (3)-(ii) is the task of simulating from the posterior of the parameters of a GQARCH-M process. This has been already addressed by Kim et al. [19], Bauwens and Lubrano [28], Nakatsuma [29], Ardia [30] and others.

First, we need to decide on the priors. For the parameters α, β, γ, ω, we use normal densities as priors:
\[
p(\boldsymbol{\alpha}) \sim N\left(\mu_{\boldsymbol{\alpha}},\Sigma_{\boldsymbol{\alpha}}\right)I_{\boldsymbol{\alpha}}, \qquad p(\beta) \sim N\left(\mu_\beta,\sigma_\beta^2\right)I_\beta, \tag{3.21}
\]
and similarly
\[
p(\gamma) \sim N\left(\mu_\gamma,\sigma_\gamma^2\right), \tag{3.22}
\]
where 𝜶 = (ω, α)′, and I_𝜶, I_β are the indicators ensuring the constraints 𝜶 > 0, α + β < 1 and β > 0, α + β < 1, respectively; μ_· and σ²_· (and Σ_𝜶) are the hyperparameters.

We form the joint prior by assuming prior independence between 𝜶, β, γ, and the joint posterior is then obtained by combining the joint prior and the likelihood function by Bayes' rule:
\[
p\left(\boldsymbol{\alpha},\beta,\gamma\mid\mathbf{r}\right) \propto \prod_{t=1}^{T}\frac{1}{\sqrt{2\pi h_t}}\exp\left(-\frac{\varepsilon_t^2}{2h_t}\right) \times N\left(\mu_{\boldsymbol{\alpha}},\Sigma_{\boldsymbol{\alpha}}\right)I_{\boldsymbol{\alpha}} \times N\left(\mu_\beta,\sigma_\beta^2\right)I_\beta \times N\left(\mu_\gamma,\sigma_\gamma^2\right). \tag{3.23}
\]

For the M-H algorithm, we use the following approximated GARCH model, as in Nakatsuma [29], which is derived from the well-known ARMA representation of GARCH models [4]:
\[
\varepsilon_t^2 = \omega + \alpha\left(\varepsilon_{t-1}-\gamma\right)^2 + \beta\varepsilon_{t-1}^2 + w_t - \beta w_{t-1}, \tag{3.24}
\]
where w_t = ε_t² − h_t, with (approximately) w_t ∼ N(0, 2h_t²).
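Note that (3.24) is an exact identity once w_t = ε_t² − h_t; only the normality of w_t is an approximation. A quick numerical check on a simulated path (parameter values ours):

```python
import numpy as np

rng = np.random.default_rng(0)
omega, alpha, beta, gamma = 0.1, 0.1, 0.8, 0.2
T = 500
h = np.empty(T); eps = np.empty(T)
h_prev = (omega + alpha * gamma**2) / (1 - alpha - beta)   # h_0 at its unconditional value
eps_prev = 0.0
for t in range(T):
    h[t] = omega + alpha * (eps_prev - gamma)**2 + beta * h_prev
    eps[t] = np.sqrt(h[t]) * rng.standard_normal()
    h_prev, eps_prev = h[t], eps[t]

w = eps**2 - h                                             # w_t = eps_t^2 - h_t
# ARMA form (3.24): eps_t^2 = omega + alpha*(eps_{t-1}-gamma)^2
#                            + beta*eps_{t-1}^2 + w_t - beta*w_{t-1}
lhs = eps[1:]**2
rhs = omega + alpha*(eps[:-1] - gamma)**2 + beta*eps[:-1]**2 + w[1:] - beta*w[:-1]
```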

Then the corresponding approximated likelihood is written as
\[
p\left(\boldsymbol{\varepsilon}^2\mid\boldsymbol{\delta},\mathbf{r},\phi_2\right) = \prod_{t=1}^{T}\frac{1}{2h_t\sqrt{\pi}}\exp\left(-\frac{\left[\varepsilon_t^2-\omega-\alpha\left(\varepsilon_{t-1}-\gamma\right)^2-\beta\varepsilon_{t-1}^2+\beta w_{t-1}\right]^2}{4h_t^2}\right), \tag{3.25}
\]
and the generation of 𝜶, β, γ is based on the above likelihood, where {h_t} is updated each time after the corresponding parameters are updated. The generation of the four variance parameters is given below.

Generation of 𝜶. For the generation of 𝜶, we first note that w_t in (3.24) can be written as a linear function of 𝜶:
\[
w_t = \tilde{\varepsilon}_t^2 - \zeta_t\boldsymbol{\alpha}, \tag{3.26}
\]
where ζ_t = [ι̃_t, ε̂_t²] with
\[
\tilde{\varepsilon}_t^2 = \varepsilon_t^2 - \beta\varepsilon_{t-1}^2 + \beta\tilde{\varepsilon}_{t-1}^2, \qquad \hat{\varepsilon}_t^2 = \left(\varepsilon_{t-1}-\gamma\right)^2 + \beta\hat{\varepsilon}_{t-1}^2, \qquad \tilde{\iota}_t = 1 + \beta\tilde{\iota}_{t-1}. \tag{3.27}
\]
Now, let the two following vectors be
\[
Y_{\boldsymbol{\alpha}} = \left(\tilde{\varepsilon}_1^2,\ldots,\tilde{\varepsilon}_T^2\right), \qquad X_{\boldsymbol{\alpha}} = \left(\zeta_1,\ldots,\zeta_T\right). \tag{3.28}
\]
Then the likelihood function of the approximated model is rewritten as
\[
p\left(\boldsymbol{\varepsilon}^2\mid\mathbf{r},\boldsymbol{\delta},\phi_2\right) = \prod_{t=1}^{T}\frac{1}{\sqrt{2\pi\cdot 2h_t^2}}\exp\left(-\frac{\left(\tilde{\varepsilon}_t^2-\zeta_t\boldsymbol{\alpha}\right)^2}{2\cdot 2h_t^2}\right). \tag{3.29}
\]
Using this, we have the following proposal distribution of 𝜶 (see Nakatsuma [29] or Ardia [30] for a proof).

Proposition 3.4. The proposal distribution of 𝜶 is
\[
\boldsymbol{\alpha}\mid Y,X,\Lambda,\phi_{2\setminus\boldsymbol{\alpha}} \sim N\left(\bar{\mu}_{\boldsymbol{\alpha}},\bar{\Sigma}_{\boldsymbol{\alpha}}\right)I_{\boldsymbol{\alpha}}, \tag{3.30}
\]
where μ̄_𝜶 = Σ̄_𝜶(X′_𝜶Λ^{−1}Y_𝜶 + Σ_𝜶^{−1}μ_𝜶), Σ̄_𝜶 = (X′_𝜶Λ^{−1}X_𝜶 + Σ_𝜶^{−1})^{−1}, and Λ = diag(2h_1², …, 2h_T²). I_𝜶 imposes the restrictions 𝜶 > 0 and α + β < 1. Hence, a candidate 𝜶̃ is sampled from this proposal density and accepted with probability
\[
\min\left\{\frac{p\left(\tilde{\boldsymbol{\alpha}},\beta,\gamma\mid\boldsymbol{\delta},\mathbf{r}\right)q\left(\boldsymbol{\alpha}^*\mid\tilde{\boldsymbol{\alpha}},\beta,\gamma,\boldsymbol{\delta},\mathbf{r}\right)}{p\left(\boldsymbol{\alpha}^*,\beta,\gamma\mid\boldsymbol{\delta},\mathbf{r}\right)q\left(\tilde{\boldsymbol{\alpha}}\mid\boldsymbol{\alpha}^*,\beta,\gamma,\boldsymbol{\delta},\mathbf{r}\right)},\,1\right\}, \tag{3.31}
\]
where 𝜶* is the previous draw. A similar procedure is used for the generation of β and γ.

Generation of β. Following Nakatsuma [29], we linearize w_t by the first-order Taylor expansion
\[
w_t(\beta) \approx w_t\left(\beta^*\right) + \xi_t\left(\beta^*\right)\left(\beta-\beta^*\right), \tag{3.32}
\]
where ξ_t is the first-order derivative of w_t(β) evaluated at β*, the previous draw of the M-H sampler. Define
\[
r_t = w_t\left(\beta^*\right) + g_t\left(\beta^*\right)\beta^*, \tag{3.33}
\]
where g_t(β*) = −ξ_t(β*), which is computed by the recursion
\[
g_t = \varepsilon_{t-1}^2 - w_{t-1} + \beta^* g_{t-1}, \tag{3.34}
\]
with g_t = 0 for t ≤ 0 [30]. Then,
\[
w_t(\beta) \approx r_t - g_t\left(\beta^*\right)\beta. \tag{3.35}
\]
Let the following two vectors be
\[
Y_\beta = \left(r_1,\ldots,r_T\right), \qquad X_\beta = \left(g_1,\ldots,g_T\right). \tag{3.36}
\]
Then, the likelihood function of the approximated model is rewritten as
\[
p\left(\boldsymbol{\varepsilon}^2\mid\boldsymbol{\delta},\mathbf{r},\phi_2\right) = \prod_{t=1}^{T}\frac{1}{\sqrt{2\pi\cdot 2h_t^2}}\exp\left(-\frac{\left[w_t\left(\beta^*\right)+\xi_t\left(\beta^*\right)\left(\beta-\beta^*\right)\right]^2}{2\cdot 2h_t^2}\right). \tag{3.37}
\]
We have the following proposal distribution for β (for a proof see Nakatsuma [29] or Ardia [30]).

Proposition 3.5. The proposal distribution for β is
\[
\beta\mid Y,X,\sigma_\beta^2,\phi_{2\setminus\beta} \sim N\left(\bar{\mu}_\beta,\bar{\sigma}_\beta^2\right)I_\beta, \tag{3.38}
\]
where μ̄_β = σ̄²_β(X′_βΛ^{−1}Y_β + μ_β/σ²_β), σ̄²_β = (X′_βΛ^{−1}X_β + 1/σ²_β)^{−1}, and Λ = diag(2h_1², …, 2h_T²). I_β imposes the restrictions β > 0 and α + β < 1. Hence, a candidate β̃ is sampled from this proposal density and accepted with probability
\[
\min\left\{\frac{p\left(\tilde{\beta},\alpha,\gamma\mid\boldsymbol{\delta},\mathbf{r}\right)q\left(\beta^*\mid\tilde{\beta},\alpha,\gamma,\boldsymbol{\delta},\mathbf{r}\right)}{p\left(\beta^*,\alpha,\gamma\mid\boldsymbol{\delta},\mathbf{r}\right)q\left(\tilde{\beta}\mid\beta^*,\alpha,\gamma,\boldsymbol{\delta},\mathbf{r}\right)},\,1\right\}. \tag{3.39}
\]

Finally, we explain the generation of 𝛾.

Generation of γ. As with β, we linearize w_t by a first-order Taylor expansion at γ*, the previous draw in the M-H sampler. In this case,
\[
r_t = w_t\left(\gamma^*\right) + g_t\left(\gamma^*\right)\gamma^*, \tag{3.40}
\]
where g_t(γ*) = −ξ_t(γ*), which is computed by the recursion
\[
g_t = -2\alpha\left(\varepsilon_{t-1}-\gamma^*\right) + \beta g_{t-1}, \tag{3.41}
\]
and g_t = 0 for t ≤ 0. Then
\[
w_t(\gamma) \approx r_t - g_t\gamma. \tag{3.42}
\]
Let again
\[
Y_\gamma = \left(r_1,\ldots,r_T\right), \qquad X_\gamma = \left(g_1,\ldots,g_T\right), \tag{3.43}
\]
and the likelihood function of the approximated model is rewritten as
\[
p\left(\boldsymbol{\varepsilon}^2\mid\boldsymbol{\delta},\mathbf{r},\phi_2\right) = \prod_{t=1}^{T}\frac{1}{\sqrt{2\pi\cdot 2h_t^2}}\exp\left(-\frac{\left[w_t\left(\gamma^*\right)+\xi_t\left(\gamma^*\right)\left(\gamma-\gamma^*\right)\right]^2}{2\cdot 2h_t^2}\right). \tag{3.44}
\]
Thus, we have the following proposal distribution for γ (for a proof see Nakatsuma [29] and Ardia [30]).

Proposition 3.6. The proposal distribution of γ is
\[
\gamma\mid Y,X,\sigma_\gamma^2,\phi_{2\setminus\gamma} \sim N\left(\bar{\mu}_\gamma,\bar{\sigma}_\gamma^2\right), \tag{3.45}
\]
where μ̄_γ = σ̄²_γ(X′_γΛ^{−1}Y_γ + μ_γ/σ²_γ), σ̄²_γ = (X′_γΛ^{−1}X_γ + 1/σ²_γ)^{−1}, and Λ = diag(2h_1², …, 2h_T²). A candidate γ̃ is sampled from this proposal density and accepted with probability
\[
\min\left\{\frac{p\left(\tilde{\gamma},\alpha,\beta\mid\boldsymbol{\delta},\mathbf{r}\right)q\left(\gamma^*\mid\tilde{\gamma},\alpha,\beta,\boldsymbol{\delta},\mathbf{r}\right)}{p\left(\gamma^*,\alpha,\beta\mid\boldsymbol{\delta},\mathbf{r}\right)q\left(\tilde{\gamma}\mid\gamma^*,\alpha,\beta,\boldsymbol{\delta},\mathbf{r}\right)},\,1\right\}. \tag{3.46}
\]

The algorithm described above is a special case of an MCMC algorithm which, as it iterates, converges to draws from the required density p(φ, δ | r). Posterior moments and marginal densities can be estimated (simulation consistently) by averaging the relevant function of interest over the sample variates. The posterior mean of φ is simply estimated by the sample mean of the simulated φ values. These estimates can be made arbitrarily accurate by increasing the simulation sample size. However, it should be remembered that the sample variates from an MCMC algorithm are a high-dimensional (correlated) sample from the target density, and sometimes the serial correlation can be quite high for badly behaved algorithms.

All that remains, therefore, is step (2). Thus, from the above, it is seen that the main task is, again as with the classical estimation of the model, to simulate from δ | φ, r, ℱ_0.

3.2.3. MCMC Simulation of 𝜀∣𝜙,𝐫,ℱ0

For a given set of parameter values and initial conditions, it is generally simpler to simulate {𝜀𝑡} for 𝑡=1,…,𝑇 and then compute {𝛿𝑡}𝑇𝑡=1 than to simulate {𝛿𝑡}𝑇𝑡=1 directly. For that matter, we concentrate on simulators of 𝜀𝑡 given 𝐫 and 𝜙. We set the mean and the variance of 𝜀0 equal to their unconditional values, and, given that ℎ𝑡 is a sufficient statistic for ℱ𝑡−1 and the unconditional variance is a deterministic function of 𝜙, ℱ0 can be eliminated from the information set without any information loss.

Now, sampling from p(ε | r, φ) ∝ p(r | ε, φ)p(ε | φ) is feasible by using an M-H algorithm in which we update a single ε_t at a time, leaving all the others unchanged [20]. In particular, let us write the nth iteration of the Markov chain as ε_t^n. Then we generate a potential new value of the Markov chain, ε_t^new, by proposing from some candidate density g(ε_t | ε^n_{∖t}, r, φ), where ε^n_{∖t} = {ε_1^{n+1}, …, ε_{t−1}^{n+1}, ε_{t+1}^n, …, ε_T^n}, which we accept with probability
\[
\min\left[1,\ \frac{p\left(\varepsilon_t^{\mathrm{new}}\mid\varepsilon_{\setminus t}^{n},\mathbf{r},\phi\right)g\left(\varepsilon_t^{n}\mid\varepsilon_{\setminus t}^{n},\mathbf{r},\phi\right)}{p\left(\varepsilon_t^{n}\mid\varepsilon_{\setminus t}^{n},\mathbf{r},\phi\right)g\left(\varepsilon_t^{\mathrm{new}}\mid\varepsilon_{\setminus t}^{n},\mathbf{r},\phi\right)}\right]. \tag{3.47}
\]

If it is accepted, we set ε_t^{n+1} = ε_t^new; otherwise we keep ε_t^{n+1} = ε_t^n. Although the proposal is much better behaved, since it is only one-dimensional, each time we consider modifying a single error we have to compute
\[
\frac{p\left(\varepsilon_t^{\mathrm{new}}\mid\varepsilon_{\setminus t}^{n},\mathbf{r},\phi\right)}{p\left(\varepsilon_t^{n}\mid\varepsilon_{\setminus t}^{n},\mathbf{r},\phi\right)} = \frac{p\left(r_t\mid\varepsilon_t^{\mathrm{new}},h_{t,t}^{\mathrm{new}},\phi\right)p\left(\varepsilon_t^{\mathrm{new}}\mid h_{t,t}^{\mathrm{new}},\phi\right)}{p\left(r_t\mid\varepsilon_t^{n},h_t^{n,t},\phi\right)p\left(\varepsilon_t^{n}\mid h_t^{n,t},\phi\right)} \times \prod_{s=t+1}^{T}\frac{p\left(r_s\mid\varepsilon_s^{n},h_{s,t}^{\mathrm{new}},\phi\right)p\left(\varepsilon_s^{n}\mid h_{s,t}^{\mathrm{new}},\phi\right)}{p\left(r_s\mid\varepsilon_s^{n},h_s^{n,t},\phi\right)p\left(\varepsilon_s^{n}\mid h_s^{n,t},\phi\right)}, \tag{3.48}
\]
where, for s = t + 1, …, T,
\[
h_{s,t}^{\mathrm{new}} = V\left(\varepsilon_s\mid\varepsilon_{s-1}^{n},\ldots,\varepsilon_{t+1}^{n},\varepsilon_t^{\mathrm{new}},\varepsilon_{t-1}^{n+1},\ldots,\varepsilon_1^{n+1}\right), \qquad h_s^{n,t} = V\left(\varepsilon_s\mid\varepsilon_{s-1}^{n},\ldots,\varepsilon_{t+1}^{n},\varepsilon_t^{n},\varepsilon_{t-1}^{n+1},\ldots,\varepsilon_1^{n+1}\right), \tag{3.49}
\]
while
\[
h_{t,t}^{\mathrm{new}} = h_t^{n,t}. \tag{3.50}
\]

Nevertheless, each time we revise one ε_t, we also have to revise T − t conditional variances, because the recursive nature of the GARCH model makes h_{s,t}^new depend upon ε_t^new for s = t + 1, …, T. Since t = 1, …, T, we need to calculate O(T²) normal densities, and so this algorithm is O(T²); moreover, this cost must be borne for every value of φ. To avoid this huge computational load, we show how to use the method proposed by Fiorentini et al. [10] and thus do MCMC with only O(T) calculations. The method is described in the following subsection.
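The path dependence is easy to exhibit: revising a single error leaves h_1, …, h_t unchanged but perturbs every later conditional variance, so a single-site update at time t requires O(T − t) recomputations and a full sweep O(T²). A sketch (helper names ours):

```python
import numpy as np

def h_path(eps, omega, alpha, beta, gamma, h0):
    """Conditional variances implied by a full error path:
    h[0] = h0 and h[t] = omega + alpha*(eps[t-1] - gamma)**2 + beta*h[t-1]."""
    h = np.empty(len(eps) + 1)
    h[0] = h0
    for t in range(1, len(eps) + 1):
        h[t] = omega + alpha * (eps[t-1] - gamma)**2 + beta * h[t-1]
    return h

rng = np.random.default_rng(0)
eps = rng.standard_normal(200)
h = h_path(eps, 0.1, 0.1, 0.8, 0.2, 1.0)
eps_rev = eps.copy()
eps_rev[100] += 1.0                    # revise a single error at t = 100 ...
h_rev = h_path(eps_rev, 0.1, 0.1, 0.8, 0.2, 1.0)
# ... h_1..h_100 are untouched, but every h_s with s > 100 must be recomputed
```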

3.3. Estimation Method Proposed: Classical and Bayesian Estimation

The method proposed by Fiorentini et al. [10] is to transform the GARCH model into a first-order Markov model and so do MCMC with only $O(T)$ calculations. Following their transformation, we augment the state vector with the variables $h_{t+1}$ and then sample the joint Markov process $\{h_{t+1},s_t\}\mid\mathbf{r},\phi$, where
𝑠𝑡𝜀=sign𝑡−𝛾,(3.51)
so that $s_t=\pm 1$ with probability one. The mapping is one-to-one and has no singularities. More specifically, if we know $\{h_{t+1}\}$ and $\phi$, then we know the value of
\[
\left(\varepsilon_t-\gamma\right)^2=\frac{h_{t+1}-\omega-\beta h_t}{\alpha}\quad\forall t\geq 1. \tag{3.52}
\]

Hence the additional knowledge of the signs of $\varepsilon_t-\gamma$ would reveal the entire path of $\{\varepsilon_t\}$ as long as $h_0$ (which equals the unconditional value in our case) is known, and thus we may also recover the unobserved random variables $\{\delta_t\}\mid\mathbf{r},\phi,\{h_{t+1}\}$.
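Since (3.52) determines $\left|\varepsilon_t-\gamma\right|$ from consecutive variances, the signs recover the whole error path. A minimal numpy sketch of this inversion (the parameter values, including $\gamma$ and $h_0$, are illustrative; $\alpha$ and $\beta$ are taken from the simulation section below):

```python
import numpy as np

def recover_eps(h, s, omega, alpha, beta, gamma):
    """Invert h_{t+1} = omega + alpha*(eps_t - gamma)**2 + beta*h_t:
    |eps_t - gamma| = sqrt((h_{t+1} - omega - beta*h_t)/alpha), then apply the signs."""
    d = np.sqrt((h[1:] - omega - beta * h[:-1]) / alpha)
    return gamma + s * d

omega, alpha, beta, gamma = 0.1, 0.084, 0.688, 0.3   # illustrative parameter values
rng = np.random.default_rng(1)
eps = rng.standard_normal(100)
h = np.empty(101)
h[0] = 1.0                                           # any known h_0 works
for t in range(100):
    h[t + 1] = omega + alpha * (eps[t] - gamma) ** 2 + beta * h[t]

s = np.sign(eps - gamma)                             # s_t = sign(eps_t - gamma)
eps_rec = recover_eps(h, s, omega, alpha, beta, gamma)
assert np.allclose(eps_rec, eps)                     # variance path + signs reveal {eps_t}
```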

Now we have to sample from
𝑝𝑠𝑡,ℎ𝑡+1∝∣𝐫,𝜙𝑇𝑡=1𝑝𝑠𝑡∣ℎ𝑡+1,ℎ𝑡𝑝ℎ,𝜙𝑡+1∣ℎ𝑡𝑝𝑟,𝜙𝑡∣𝑠𝑡,ℎ𝑡,ℎ𝑡+1,𝜙,(3.53)
where the second and the third terms come from the model, and the first comes from the fact that $\varepsilon_t\mid\mathcal{F}_{t-1}\sim N\left(0,h_t\right)$, but $\varepsilon_t\mid\{h_{t+1}\},\mathcal{F}_{t-1}$ takes the values
\[
\varepsilon_t=\gamma\pm d_t, \tag{3.54}
\]
where
𝑑𝑡=ℎ𝑡+1−𝜔−𝛽ℎ𝑡𝛼.(3.55)

From the above, it is seen that we should first simulate $\{h_{t+1}\}\mid\mathbf{r},\phi$, since we do not alter the volatility process when we flip from $s_t=-1$ to $s_t=1$ (implying that the signs do not cause the volatility process) but we do alter $\varepsilon_t$, and then simulate $\{s_t\}\mid\{h_{t+1}\},\mathbf{r},\phi$. The second step is a Gibbs sampling scheme whose acceptance rate is always one; moreover, conditional on $\{h_{t+1}\},\mathbf{r},\phi$ the elements of $\{s_t\}$ are independent, which further simplifies the calculations. We prefer to review first the Gibbs sampling scheme and then the simulation of the conditional variance.

3.3.1. Simulations of {𝑠𝑡}∣{ℎ𝑡+1},𝐫, 𝜙

First, we see how to sample from $\{s_t\}\mid\{h_{t+1}\},\mathbf{r},\phi$. To obtain the required conditionally Bernoulli distribution, we first establish some notation (see Appendix A):
\[
c_t=\frac{1}{\sqrt{v_{t\mid r_t,h_t}}}\left[\varphi\left(\frac{\gamma+d_t-\varepsilon_{t\mid r_t,h_t}}{\sqrt{v_{t\mid r_t,h_t}}}\right)+\varphi\left(\frac{\gamma-d_t-\varepsilon_{t\mid r_t,h_t}}{\sqrt{v_{t\mid r_t,h_t}}}\right)\right],
\]
\[
\varepsilon_{t\mid r_t,h_t}=E\left(\varepsilon_t\mid r_t,h_t\right)=\frac{\left(1-\varphi^2\right)\left(r_t-\delta h_t\right)}{\varphi^2_u h_t+1-\varphi^2},\qquad
v_{t\mid r_t,h_t}=\operatorname{Var}\left(\varepsilon_t\mid r_t,h_t\right)=\frac{\varphi^2_u h_t^2}{\varphi^2_u h_t+1-\varphi^2}, \tag{3.56}
\]
where $\varphi(\cdot)$ denotes the standard normal pdf.

Using the above notation, we see that the probability of drawing $s_t=1$ conditional on $\{h_{t+1}\}$ is equal to the probability of drawing $\varepsilon_t=\gamma+d_t$ conditional on $h_{t+1},h_t,r_t,\phi$, where $d_t$ is given by (3.55):
\[
p\left(s_t=1\mid\{h_{t+1}\},\mathbf{r},\phi\right)=p\left(\varepsilon_t=\gamma+d_t\mid h_{t+1},h_t,r_t,\phi\right)=\frac{1}{c_t\sqrt{v_{t\mid r_t,h_t}}}\varphi\left(\frac{\gamma+d_t-\varepsilon_{t\mid r_t,h_t}}{\sqrt{v_{t\mid r_t,h_t}}}\right). \tag{3.57}
\]

Similarly for the probability of drawing $s_t=-1$. Both of these quantities are easy to compute; for example,
\[
\varphi\left(\frac{\gamma+d_t-\varepsilon_{t\mid r_t,h_t}}{\sqrt{v_{t\mid r_t,h_t}}}\right)=\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{\gamma+d_t-\varepsilon_{t\mid r_t,h_t}}{\sqrt{v_{t\mid r_t,h_t}}}\right)^2\right\}, \tag{3.58}
\]
and so we may simulate {𝑠𝑡}∣{ℎ𝑡+1},𝐫,𝜙 using a Gibbs sampling scheme. Specifically, since conditional on {ℎ𝑡+1},𝐫,𝜙 the elements of {𝑠𝑡} are independent, we actually draw from the marginal distribution, and the acceptance rate for this algorithm is always one.

The Gibbs sampling algorithm for drawing $\{s_t\}\mid\{h_{t+1}\},\mathbf{r},\phi$ may be described as follows.
(1) Specify an initial value $s^{(0)}=\left(s_1^{(0)},\ldots,s_T^{(0)}\right)$.
(2) Repeat for $k=1,\ldots,M$.
(a) Repeat for $t=0,\ldots,T-1$.
(i) Draw $s_t^{(k)}=1$ with probability
\[
\frac{1}{c_t\sqrt{v_{t\mid r_t,h_t}}}\varphi\left(\frac{\gamma+d_t-\varepsilon_{t\mid r_t,h_t}}{\sqrt{v_{t\mid r_t,h_t}}}\right), \tag{3.59}
\]
and $s_t^{(k)}=-1$ with probability
\[
1-\frac{1}{c_t\sqrt{v_{t\mid r_t,h_t}}}\varphi\left(\frac{\gamma+d_t-\varepsilon_{t\mid r_t,h_t}}{\sqrt{v_{t\mid r_t,h_t}}}\right). \tag{3.60}
\]
(3) Return the values $\{s^{(1)},\ldots,s^{(M)}\}$.
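The sweep above can be sketched in numpy. This sketch assumes the latent-parameter structure behind (3.56), with $\delta$ the mean of $\delta_t$, $\varphi$ its AR(1) coefficient, and $\varphi_u^2$ its innovation variance; all numerical inputs are hypothetical:

```python
import numpy as np

def draw_signs(r, h, d, gamma, delta, phi, phi_u2, rng):
    """One Gibbs sweep for {s_t} | {h_{t+1}}, r, phi.  The s_t are conditionally
    independent Bernoulli draws with P(s_t = 1) given by eq. (3.57)."""
    # posterior mean and variance of eps_t given r_t, h_t, eq. (3.56)
    m = (1 - phi**2) * (r - delta * h) / (phi_u2 * h + 1 - phi**2)
    v = phi_u2 * h**2 / (phi_u2 * h + 1 - phi**2)
    # normal-kernel weights; the common factors 1/(c_t*sqrt(v_t)) and 1/sqrt(2*pi)
    # cancel in the normalized probability
    w_plus = np.exp(-0.5 * (gamma + d - m) ** 2 / v)
    w_minus = np.exp(-0.5 * (gamma - d - m) ** 2 / v)
    p_plus = w_plus / (w_plus + w_minus)
    return np.where(rng.random(len(r)) < p_plus, 1.0, -1.0)

rng = np.random.default_rng(2)
T = 500
r = rng.standard_normal(T)
h = np.full(T, 0.5)                 # h_t path (hypothetical)
d = np.full(T, 0.2)                 # d_t from eq. (3.55) (hypothetical)
s = draw_signs(r, h, d, gamma=0.3, delta=0.1, phi=0.85, phi_u2=0.05, rng=rng)
```

Because the draws are conditionally independent, the whole sweep is vectorized and costs $O(T)$.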

3.3.2. Simulation of $\{h_{t+1}\}\mid\mathbf{r},\phi$ (Single-Move Sampler)

On the other hand, the first step involves simulating from $\{h_{t+1}\}\mid\mathbf{r},\phi$. To avoid large dependence in the chain, we use an M-H algorithm where we simulate one $h_{t+1}$ at a time, leaving the others unchanged [20, 31]. So if $h^n_{t+1}$ is the current value at the $n$th iteration of the Markov chain, we draw a candidate value $h^{\text{new}}_{t+1}$ from a candidate (proposal) density $g\left(h_{t+1}\mid(h)^n_{/t+1},\mathbf{r},\phi\right)$, where $(h)^n_{/t+1}=\{h^{n+1}_1,h^{n+1}_2,\ldots,h^{n+1}_t,h^n_{t+2},\ldots,h^n_{T+1}\}$. We set $h^{n+1}_{t+1}=h^{\text{new}}_{t+1}$ with acceptance probability
\[
\min\left\{1,\frac{p\left(h^{\text{new}}_{t+1}\mid(h)^n_{/t+1},\mathbf{r},\phi\right)g\left(h^n_{t+1}\mid(h)^n_{/t+1},\mathbf{r},\phi\right)}{p\left(h^n_{t+1}\mid(h)^n_{/t+1},\mathbf{r},\phi\right)g\left(h^{\text{new}}_{t+1}\mid(h)^n_{/t+1},\mathbf{r},\phi\right)}\right\}, \tag{3.61}
\]
where we have used the fact that
\[
p(\mathbf{h}\mid\mathbf{r},\phi)=p\left((h)_{/t}\mid\mathbf{r},\phi\right)p\left(h_t\mid(h)_{/t},\mathbf{r},\phi\right). \tag{3.62}
\]

However, we may simplify the acceptance rate further. More specifically, we have that
\[
p\left(h_{t+1}\mid(h)_{/t+1},\mathbf{r},\phi\right)\propto p\left(h_{t+2}\mid h_{t+1},\phi\right)p\left(h_{t+1}\mid h_t,\phi\right)p\left(r_{t+1}\mid h_{t+2},h_{t+1},\phi\right)p\left(r_t\mid h_{t+1},h_t,\phi\right). \tag{3.63}
\]

Now, since the following should hold,
\[
h_{t+1}\geq\omega+\beta h_t \tag{3.64}
\]
and similarly
\[
h_{t+1}\leq\beta^{-1}\left(h_{t+2}-\omega\right), \tag{3.65}
\]
the support of the conditional distribution of $h_{t+1}$ given $h_t$ is bounded from below by $\omega+\beta h_t$, and the same applies to the distribution of $h_{t+2}$ given $h_{t+1}$ (the lower limit corresponds to $d_t=0$ and the upper limit to $d_{t+1}=0$). This means that the range of values of $h_{t+1}$ compatible with $h_t$ and $h_{t+2}$ in the GQARCH case is bounded from above and below; that is,
\[
h_{t+1}\in\left[\omega+\beta h_t,\;\beta^{-1}\left(h_{t+2}-\omega\right)\right]. \tag{3.66}
\]

From the above, we understand that it makes sense for the proposal to obey the support of the density, and so we can simplify the acceptance rate by setting
\[
g\left(h_{t+1}\mid(h)_{/t+1},\mathbf{r},\phi\right)=p\left(h_{t+1}\mid h_t,\phi\right) \tag{3.67}
\]
appropriately truncated from above (the truncation from below is automatically satisfied). But this proposal density ignores the information contained in $r_t$, and so, following Fiorentini et al. [10], we can achieve a substantially higher acceptance rate if we propose from
\[
g\left(h_{t+1}\mid(h)_{/t+1},\mathbf{r},\phi\right)=p\left(h_{t+1}\mid r_t,h_t,\phi\right). \tag{3.68}
\]

A numerically efficient way to simulate $h_{t+1}$ from $p\left(h_{t+1}\mid r_t,h_t,\phi\right)$ is to sample an underlying doubly truncated Gaussian random variable by the inverse transform method. More specifically, we may draw
\[
\varepsilon_t\mid r_t,h_t,\phi\sim N\left(\frac{\left(1-\varphi^2\right)\left(r_t-\delta h_t\right)}{\varphi^2_u h_t+1-\varphi^2},\;\frac{\varphi^2_u h_t^2}{\varphi^2_u h_t+1-\varphi^2}\right) \tag{3.69}
\]
doubly truncated so that it remains within the bounds
\[
\varepsilon^{\text{new}}_t\in\left[\gamma-l_t,\;\gamma+l_t\right], \tag{3.70}
\]
where
𝑙𝑡=ℎ𝑡+2−𝜔−𝛽𝜔−𝛽2ℎ𝑡𝛽𝛼,(3.71)
using the inverse transform method, and then compute
\[
h^{\text{new}}_{t+1}=\omega+\alpha\left(\varepsilon^{\text{new}}_t-\gamma\right)^2+\beta h_t, \tag{3.72}
\]
which in turn implies a real value for $d^{\text{new}}_{t+1}=\sqrt{\left(h_{t+2}-\omega-\beta h^{\text{new}}_{t+1}\right)/\alpha}$ and so guarantees that $h^{\text{new}}_{t+1}$ lies within the acceptance bounds.
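As a sanity check on the bounds, any $\varepsilon^{\text{new}}_t\in\left[\gamma-l_t,\gamma+l_t\right]$ indeed maps into the interval (3.66). A small numerical illustration (all values hypothetical):

```python
import numpy as np

omega, alpha, beta, gamma = 0.1, 0.084, 0.688, 0.3   # illustrative parameter values
h_t, h_t2 = 0.5, 0.9                                 # current h_t and next-but-one h_{t+2}

# eq. (3.71): half-width of the admissible interval for eps_t
l_t = np.sqrt((h_t2 - omega - beta * omega - beta**2 * h_t) / (beta * alpha))

eps_new = gamma + 0.9 * l_t                          # any point inside [gamma - l_t, gamma + l_t]
h_new = omega + alpha * (eps_new - gamma) ** 2 + beta * h_t   # eq. (3.72)

lo, hi = omega + beta * h_t, (h_t2 - omega) / beta   # eq. (3.66)
assert lo <= h_new <= hi                             # h_new respects both bounds

d_new = np.sqrt((h_t2 - omega - beta * h_new) / alpha)   # real-valued, as claimed
```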

The inverse transform method for drawing the doubly truncated Gaussian random variable first draws a uniform random number
\[
u\sim U(0,1) \tag{3.73}
\]
and then computes
\[
u^{*}=(1-u)\,\Phi\left(\frac{\gamma-l_t-\varepsilon_{t\mid r_t,h_t}}{\sqrt{v_{t\mid r_t,h_t}}}\right)+u\,\Phi\left(\frac{\gamma+l_t-\varepsilon_{t\mid r_t,h_t}}{\sqrt{v_{t\mid r_t,h_t}}}\right). \tag{3.74}
\]

A draw is then given by
\[
\varepsilon^{\text{new}}_t=\varepsilon_{t\mid r_t,h_t}+\sqrt{v_{t\mid r_t,h_t}}\,\Phi^{-1}\left(u^{*}\right). \tag{3.75}
\]

However, if the degree of truncation is small (the bounds cut off little probability mass), the extra computations involved make the inverse transform method unnecessarily slow, and so we prefer the accept-reject method, where we draw
\[
\varepsilon^{\text{new}}_t\mid r_t,h_t,\phi\sim N\left(\frac{\left(1-\varphi^2\right)\left(r_t-\delta h_t\right)}{\varphi^2_u h_t+1-\varphi^2},\;\frac{\varphi^2_u h_t^2}{\varphi^2_u h_t+1-\varphi^2}\right) \tag{3.76}
\]
and accept the draw if $\gamma-l_t\leq\varepsilon^{\text{new}}_t\leq\gamma+l_t$; otherwise we repeat the drawing (this method is inefficient if the truncation lies in the tails of the distribution). It may be worth assessing the degree of truncation first and, depending on its tightness, choosing one simulation method or the other.
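Both samplers for the doubly truncated Gaussian can be sketched as follows (generic location $m$ and variance $v$; the bounds $\gamma\pm l_t$ and all numbers are illustrative):

```python
import numpy as np
from statistics import NormalDist

N01 = NormalDist()   # standard normal cdf / inverse cdf from the Python stdlib

def trunc_norm_inverse(m, v, lo, hi, u):
    """Inverse-transform draw from N(m, v) truncated to [lo, hi]: map a U(0,1)
    draw u through the truncated cdf, cf. eqs. (3.73)-(3.75)."""
    s = np.sqrt(v)
    a, b = N01.cdf((lo - m) / s), N01.cdf((hi - m) / s)
    return m + s * N01.inv_cdf((1 - u) * a + u * b)

def trunc_norm_reject(m, v, lo, hi, rng):
    """Accept-reject: redraw from N(m, v) until the draw falls in [lo, hi];
    fast when the truncation is mild, slow when [lo, hi] sits in the tails."""
    while True:
        x = m + np.sqrt(v) * rng.standard_normal()
        if lo <= x <= hi:
            return x

rng = np.random.default_rng(3)
gamma, l_t = 0.3, 1.2                # illustrative bounds: eps in [gamma - l_t, gamma + l_t]
x1 = trunc_norm_inverse(0.0, 0.5, gamma - l_t, gamma + l_t, rng.random())
x2 = trunc_norm_reject(0.0, 0.5, gamma - l_t, gamma + l_t, rng)
assert gamma - l_t <= x1 <= gamma + l_t and gamma - l_t <= x2 <= gamma + l_t
```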

The conditional density of $\varepsilon^{\text{new}}_t$ is given, according to the definition of a truncated normal distribution, by
\[
p\left(\varepsilon^{\text{new}}_t\mid\left|\varepsilon^{\text{new}}_t-\gamma\right|\leq l_t,r_t,h_t,\phi\right)=\frac{1}{\sqrt{v_{t\mid r_t,h_t}}}\varphi\left(\frac{\varepsilon^{\text{new}}_t-\varepsilon_{t\mid r_t,h_t}}{\sqrt{v_{t\mid r_t,h_t}}}\right)\left[\Phi\left(\frac{\gamma+l_t-\varepsilon_{t\mid r_t,h_t}}{\sqrt{v_{t\mid r_t,h_t}}}\right)-\Phi\left(\frac{\gamma-l_t-\varepsilon_{t\mid r_t,h_t}}{\sqrt{v_{t\mid r_t,h_t}}}\right)\right]^{-1}, \tag{3.77}
\]
where Φ(⋅) is the cdf of the standard normal.

By using the change of variable formula, we have that the density of $h^{\text{new}}_{t+1}$ is
\[
p\left(h^{\text{new}}_{t+1}\mid h^{\text{new}}_{t+1}\in\left[\omega+\beta h_t,\beta^{-1}\left(h_{t+2}-\omega\right)\right],r_t,h_t,\phi\right)=\frac{c^{\text{new}}_t}{2\alpha d^{\text{new}}_t}\left[\Phi\left(\frac{\gamma+l_t-\varepsilon_{t\mid r_t,h_t}}{\sqrt{v_{t\mid r_t,h_t}}}\right)-\Phi\left(\frac{\gamma-l_t-\varepsilon_{t\mid r_t,h_t}}{\sqrt{v_{t\mid r_t,h_t}}}\right)\right]^{-1}. \tag{3.78}
\]

Using Bayes' theorem, we have that the acceptance probability is
\[
\min\left\{1,\frac{p\left(h_{t+2}\mid h^{\text{new}}_{t+1},r_{t+1},\phi\right)p\left(r_{t+1}\mid h^{\text{new}}_{t+1},\phi\right)}{p\left(r_{t+1}\mid h^n_{t+1},\phi\right)p\left(h_{t+2}\mid h^n_{t+1},r_{t+1},\phi\right)}\right\}. \tag{3.79}
\]

Since the degree of truncation is the same for the old and the new value, the acceptance probability becomes
\[
\min\left\{1,\frac{p\left(r_{t+1}\mid h^{\text{new}}_{t+1}\right)}{p\left(r_{t+1}\mid h^n_{t+1}\right)}\frac{c^{\text{new}}_{t+1}}{c^n_{t+1}}\frac{d^n_{t+1}}{d^{\text{new}}_{t+1}}\right\}, \tag{3.80}
\]
where $p\left(r_{t+1}\mid h_{t+1}\right)$ is a univariate normal density, since
\[
r_{t+1}\mid h_{t+1}\sim N\left(\delta h_{t+1},\;\frac{\varphi^2_u}{1-\varphi^2}h^2_{t+1}+h_{t+1}\right). \tag{3.81}
\]

Hence,
\[
p\left(r_{t+1}\mid h^n_{t+1}\right)=\frac{1}{\sqrt{2\pi\left[\left(\varphi^2_u/\left(1-\varphi^2\right)\right)\left(h^n_{t+1}\right)^2+h^n_{t+1}\right]}}\exp\left\{-\frac{\left(r_{t+1}-\delta h^n_{t+1}\right)^2}{2\left[\left(\varphi^2_u/\left(1-\varphi^2\right)\right)\left(h^n_{t+1}\right)^2+h^n_{t+1}\right]}\right\}, \tag{3.82}
\]
and the acceptance probability becomes
\[
\min\left\{1,\left(\frac{h^n_{t+1}}{h^{\text{new}}_{t+1}}\right)^{3/2}\sqrt{\frac{h^n_{t+2}-\omega-\beta h^n_{t+1}}{h^n_{t+2}-\omega-\beta h^{\text{new}}_{t+1}}}\,\frac{\kappa\left(h^{\text{new}}_{t+1}\right)}{\kappa\left(h^n_{t+1}\right)}\right\}, \tag{3.83}
\]
where, for $i\in\{n,\text{new}\}$,
\[
\kappa\left(h^i_{t+1}\right)=\exp\left\{-\frac{\left(r_{t+1}-\delta h^i_{t+1}\right)^2}{2h^i_{t+1}\left[\left(\varphi^2_u/\left(1-\varphi^2\right)\right)h^i_{t+1}+1\right]}-\frac{\left(\varphi^2_u/\left(1-\varphi^2\right)\right)h^i_{t+1}+1}{2\left(\varphi^2_u/\left(1-\varphi^2\right)\right)\left(h^i_{t+1}\right)^2}\left[\left(\gamma-\frac{r_{t+1}-\delta h^i_{t+1}}{\left(\varphi^2_u/\left(1-\varphi^2\right)\right)h^i_{t+1}+1}\right)^2+\frac{h^n_{t+2}-\omega-\beta h^i_{t+1}}{\alpha}\right]\right\}\times\frac{1+\exp\left(2Q^i_{t+1}\right)}{\exp\left(Q^i_{t+1}\right)}, \tag{3.84}
\]
with
\[
Q^i_{t+1}=\frac{\left(\varphi^2_u/\left(1-\varphi^2\right)\right)h^i_{t+1}+1}{\left(\varphi^2_u/\left(1-\varphi^2\right)\right)\left(h^i_{t+1}\right)^2}\left(\gamma-\frac{r_{t+1}-\delta h^i_{t+1}}{\left(\varphi^2_u/\left(1-\varphi^2\right)\right)h^i_{t+1}+1}\right)\sqrt{\frac{h^n_{t+2}-\omega-\beta h^i_{t+1}}{\alpha}}.
\]

Overall, the MCMC for $\{h_{t+1}\}\mid\mathbf{r},\phi$ includes the following steps.
(1) Specify an initial value $\{h^{(0)}\}$.
(2) Repeat for $n=1,\ldots,M$.
(a) Repeat for $t=0,\ldots,T-1$.
(i) Use the inverse transform method to simulate
\[
\varepsilon^{\text{new}}_t\mid r_t,h_t,\phi\sim N\left(\frac{\left(1-\varphi^2\right)\left(r_t-\delta h_t\right)}{\varphi^2_u h_t+1-\varphi^2},\;\frac{\varphi^2_u h_t^2}{\varphi^2_u h_t+1-\varphi^2}\right) \tag{3.85}
\]
doubly truncated.
(ii) Calculate
\[
h^{\text{new}}_{t+1}=\omega+\alpha\left(\varepsilon^{\text{new}}_t-\gamma\right)^2+\beta h_t. \tag{3.86}
\]
Steps (2)(a)(i) and (2)(a)(ii) are equivalent to drawing
\[
h^{\text{new}}_{t+1}\sim p\left(h_{t+1}\mid r_t,h_t,\phi\right) \tag{3.87}
\]
appropriately truncated.
(iii) Calculate
\[
\alpha_r=\min\left\{1,\frac{p\left(r_{t+1}\mid h^{\text{new}}_{t+1}\right)}{p\left(r_{t+1}\mid h^n_{t+1}\right)}\frac{c^{\text{new}}_{t+1}}{c^n_{t+1}}\frac{d^n_{t+1}}{d^{\text{new}}_{t+1}}\right\}. \tag{3.88}
\]
(iv) Set
\[
h^{n+1}_{t+1}=\begin{cases}h^{\text{new}}_{t+1}&\text{if }\mathrm{Unif}(0,1)\leq\alpha_r,\\ h^n_{t+1}&\text{otherwise.}\end{cases} \tag{3.89}
\]

Remark 3.7. Every time we change $h_{t+1}$, we calculate only one normal density, since the transformation is Markovian; and, since $t=0,\ldots,T-1$, we need only $O(T)$ calculations.

Notice that if we retain $h^{\text{new}}_{t+1}$, then $\varepsilon^{\text{new}}_t$ is retained, and we will not need to simulate $s_t$ at a later stage. In fact, we only need to simulate $s_t$ at $t=T$, since we need to know $\varepsilon_T$. The final step involves computing
\[
\delta^{(n)}_{t+1}=\frac{r_{t+1}-\varepsilon^{(n)}_{t+1}}{h^{(n)}_{t+1}},\quad t=0,\ldots,T-1,\;n=1,\ldots,M. \tag{3.90}
\]

Using all the above simulated values, we may now average over the simulations and compute the quantities needed for the SEM algorithm. As for the Bayesian inference, having completed Step (2), we may proceed to the Gibbs sampling and M-H steps to obtain draws from the required posterior density. Thus, the first-order Markov transformation of the model makes feasible an MCMC algorithm which allows the calculation of a classical estimator via the simulated EM algorithm, and a simulation-based Bayesian inference, in $O(T)$ computational operations.

3.4. A Comparison of the Simulators

In order to compare the performance of the inefficient and the efficient MCMC samplers introduced in the previous subsections, we have generated realizations of size $T=240$ from the simple GQARCH(1,1)-M-AR(1) model with parameters $\delta=0.1$, $\varphi=0.85$, $\alpha=0.084$, $\beta=0.688$,