Given a sample of observations {y_i : i = 1, …, N}, the
log-likelihood function for an unknown parameter θ is

(1)   l_N(\theta) \equiv \sum_{i=1}^N \ln f(\theta \mid y_i).

Let f̃(θ | y_i, ω) be an unbiased simulator such that

(2)   E_\omega[\tilde f(\theta \mid y, \omega) \mid y] = f(\theta \mid y),

where ω is a vector of R simulated random variates. Then, the
maximum simulated likelihood (MSL) estimator for θ is

(3)   \tilde\theta_{\mathrm{MSL}} \equiv \arg\max_\theta \tilde l_N(\theta),

where \tilde l_N(\theta) \equiv \sum_{i=1}^N \ln \tilde f(\theta \mid y_i, \omega_i)
for some sequence of simulations {ω_i}.

There are two points which deserve special attention. First, the estimator is
conditional upon the particular sequence of simulations {ω_i}
used; each such sequence will yield a different estimate. Second, even though
the simulator of f is unbiased, the resulting MSL estimate
will be biased. That is, even though we have

(4)   E_\omega[\tilde f(\theta \mid y, \omega) \mid y] = f(\theta \mid y),

this does not imply

(5)   E\left[\arg\max_\theta \tilde l_N(\theta)\right] = \arg\max_\theta l_N(\theta).

Unbiased simulation of the log-likelihood function is generally infeasible:
although the likelihood itself can usually be simulated without bias, the
natural log is a nonlinear (concave) transformation, so by Jensen's inequality
ln f̃ is a downward-biased simulator of ln f.
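This bias is easy to see in a small Monte Carlo sketch. The numbers below are hypothetical: a lognormal simulator with mean f = 0.2 stands in for f̃, chosen purely because it is unbiased in levels and strictly positive.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for an unbiased simulator: f_tilde is lognormal
# with E[f_tilde] = f_true, so it is unbiased in levels but strictly
# positive, which keeps ln(f_tilde) well defined.
f_true = 0.2
sigma = 0.5
M = 1_000_000  # Monte Carlo replications

z = rng.standard_normal(M)
f_tilde = f_true * np.exp(sigma * z - 0.5 * sigma**2)  # E[f_tilde] = f_true

print(f_tilde.mean())           # close to f_true: unbiased in levels
print(np.log(f_tilde).mean())   # below ln(f_true): biased after taking logs
print(np.log(f_true))
```

For this particular simulator the log bias is exactly −σ²/2 = −0.125, an instance of Jensen's inequality: E[ln f̃] ≤ ln E[f̃].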

Consistency

All is not lost
because, even though our estimate is biased, we can still obtain an estimator whose
probability limit is the same as that of the MLE. This requires that the sample average of
the simulated log-likelihood converge to the sample average log-likelihood. This
can be accomplished by increasing the number of simulations R, and thus decreasing the
simulation error, at a sufficiently fast rate relative to the sample size N. We have the
following lemma (see Newey and McFadden, 1994):

Lemma. Suppose the following:

θ ∈ Θ ⊂ ℝ^K and Θ is compact,

Q_0(θ) and Q_N(θ) are continuous in θ,

θ_0 ≡ \arg\max_{\theta \in \Theta} Q_0(θ) is unique,

θ̂_N ≡ \arg\max_{\theta \in \Theta} Q_N(θ), and

Q_N(θ) → Q_0(θ) in probability, uniformly in θ, as N → ∞.

Then, θ̂_N → θ_0 in probability.
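A toy instance of the lemma (hypothetical, for illustration only): take Q_0(θ) = −E[(θ − y)²] with y ~ N(1, 1), so θ_0 = 1 is the unique maximizer, and let Q_N be the sample analogue maximized over a grid on the compact set Θ = [−3, 3].

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective: Q_N(theta) = -(1/N) sum_i (theta - y_i)^2 with y ~ N(1, 1),
# so Q_0(theta) = -(theta - 1)^2 - 1 and theta_0 = 1.  Theta = [-3, 3].
theta_grid = np.linspace(-3.0, 3.0, 1201)

def Q_N(y, grid):
    # Expand the square so we never build an N-by-grid matrix.
    return -(grid**2 - 2.0 * grid * y.mean() + np.mean(y**2))

for N in (10, 1_000, 100_000):
    y = rng.normal(1.0, 1.0, size=N)
    theta_hat = theta_grid[np.argmax(Q_N(y, theta_grid))]
    print(N, theta_hat)   # the argmax settles near theta_0 = 1 as N grows
```

Here uniform convergence of Q_N to Q_0 over the compact grid is what pulls the sample argmax toward θ_0, exactly as the lemma requires.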

Now, suppose that f satisfies the conditions of this lemma. In particular,
suppose that the observations y_i are iid, that θ is identified,
and that f(θ | y) is continuous in θ over some compact set
Θ. Finally, assume that
E[\sup_{\theta \in \Theta} |\ln f(\theta \mid y)|] is finite.

Now, given a sequence of simulation draws ω_{ir}, iid across r, the
MSL estimator defined as

(6)   \tilde\theta_{\mathrm{MSL}} \equiv \arg\max_\theta \frac{1}{N} \sum_{i=1}^N \ln \tilde f(\theta \mid y_i, \omega_i)

is consistent if R → ∞ as N → ∞. For a proof, refer to
Hajivassiliou and Ruud (1994, p. 2417).
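To see the role of R, consider a hypothetical random-effects toy model (not from the source): y_i = θ_0 + η_i + ε_i with η, ε iid N(0, 1), so f(θ | y) is the N(θ, 2) density and the exact MLE of θ is the sample mean. The simulator f̃(θ | y_i, ω_i) = R^{-1} Σ_r φ(y_i − θ − ω_{ir}), with ω_{ir} ~ N(0, 1) and φ the standard normal density, is unbiased for f. The sketch below compares a grid-maximized MSL estimate to the exact MLE as R grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard normal density.
phi = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

# Toy model: y_i = theta0 + eta_i + eps_i, so y_i ~ N(theta0, 2) and the
# exact MLE of theta0 is the sample mean.
theta0, N = 1.0, 2_000
y = theta0 + rng.normal(size=N) + rng.normal(size=N)
theta_grid = np.linspace(0.0, 2.0, 401)

def msl_estimate(R):
    # Unbiased simulator: f_tilde(theta | y_i) = mean_r phi(y_i - theta - omega_ir),
    # with the same draws omega reused for every theta on the grid.
    omega = rng.normal(size=(N, R))
    ll = [np.log(phi(y[:, None] - t - omega).mean(axis=1)).sum() for t in theta_grid]
    return theta_grid[int(np.argmax(ll))]

mle = y.mean()
for R in (1, 10, 100):
    print(R, msl_estimate(R), mle)   # MSL approaches the MLE as R grows
```

Because the draws are held fixed across θ, each R defines a deterministic objective, echoing the earlier point that the estimator is conditional on the particular simulation sequence used.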

Asymptotic Normality

Suppose that f̃ is differentiable in θ. Then we can form a mean-value (Taylor)
expansion of ∇_θ l̃(θ) around θ_0:

(7)   \nabla_\theta \tilde l(\hat\theta_{\mathrm{MSL}}) = \nabla_\theta \tilde l(\theta_0) + \nabla_\theta^2 \tilde l(\bar\theta)\,(\hat\theta_{\mathrm{MSL}} - \theta_0)

for some θ̄ lying on the line segment between θ̂_MSL
and θ_0.
By the first-order condition, the left-hand side equals zero, and after multiplying by \sqrt{N} and
rearranging we find

(8)   \sqrt{N}\,(\hat\theta_{\mathrm{MSL}} - \theta_0) = -\left[\frac{1}{N} \nabla_\theta^2 \tilde l(\bar\theta)\right]^{-1} \frac{1}{\sqrt{N}} \nabla_\theta \tilde l(\theta_0).

Now, the consistency of θ̂_MSL implies consistency of
θ̄, and so

(9)   \frac{1}{N} \nabla_\theta^2 \tilde l(\bar\theta) \to_p E\left[\nabla_\theta^2 \ln f(\theta_0 \mid y)\right].

As for the gradient term, we have

(10)   \frac{1}{\sqrt{N}} \nabla_\theta \tilde l(\theta_0) = \frac{1}{\sqrt{N}} \sum_{i=1}^N \frac{\nabla_\theta \tilde f(\theta_0 \mid y_i, \omega_i)}{\tilde f(\theta_0 \mid y_i, \omega_i)}.

Ideally, to prove asymptotic normality we would like this term to converge to some
mean-zero normal distribution. However,
the expectations of the individual terms in this summation are nonzero, so we
cannot apply a central limit theorem directly. We can rewrite this term as
follows:

(11)   \frac{1}{\sqrt{N}} \nabla_\theta \tilde l(\theta_0) = \frac{1}{\sqrt{N}} \nabla_\theta l(\theta_0) + A_N + B_N

with

(12)   A_N = \frac{1}{\sqrt{N}} \sum_{i=1}^N \left\{ \nabla_\theta \ln \tilde f(\theta_0 \mid y_i, \omega_i) - E_\omega\left[\nabla_\theta \ln \tilde f(\theta_0 \mid y_i, \omega_i)\right] \right\}

and

(13)   B_N = \frac{1}{\sqrt{N}} \sum_{i=1}^N \left\{ E_\omega\left[\nabla_\theta \ln \tilde f(\theta_0 \mid y_i, \omega_i)\right] - \nabla_\theta \ln f(\theta_0 \mid y_i) \right\}.

The term A_N represents the pure simulation noise and has expectation zero.
The term B_N represents the simulation bias. Proposition 4 of Hajivassiliou and Ruud (1994, p. 2418)
shows that if R grows fast enough relative to N, specifically if R/\sqrt{N} \to \infty,
then the simulation bias is harmless. Finally, Proposition 5 (p. 2419) shows that
θ̂_MSL is in fact asymptotically efficient.
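The rate at work here can be checked numerically. In the hypothetical setup below, f̃_R averages R iid unbiased draws of a likelihood value f = 0.2; a second-order expansion gives E[ln f̃_R] − ln f ≈ −Var(draw)/(2 f² R) = O(1/R), so the √N-scaled bias term B_N is of order √N/R.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unbiased draws of the likelihood value f_true, with a small
# sd so that the R-draw average f_tilde stays strictly positive.
f_true, single_sd, M = 0.2, 0.1, 200_000

for R in (10, 20, 40, 80):
    draws = f_true + single_sd * rng.standard_normal((M, R))
    f_tilde = draws.mean(axis=1)                  # unbiased for f_true
    bias = np.log(f_tilde).mean() - np.log(f_true)
    print(R, bias)   # roughly halves each time R doubles: O(1/R)
```

Since the per-observation log bias shrinks like 1/R while B_N scales it up by √N, the bias term vanishes exactly when R grows faster than √N.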