Asymptotic Variance of Estimation; Mean Squared Error

For the statistical model introduced in Section 3.4.2 we now investigate the asymptotic behavior of the variance as the number of simulation steps tends to infinity.
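As a concrete illustration of the kind of estimator whose variance is studied here, the following sketch simulates a small ergodic Markov chain and evaluates the sample-mean estimator of the stationary expectation. The two-state transition matrix and the function f are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Illustrative two-state chain; P and f are hypothetical choices,
# not taken from the text.
rng = np.random.default_rng(0)
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])     # irreducible, aperiodic transition matrix
f = np.array([1.0, 5.0])       # function whose stationary expectation we estimate

def mcmc_estimate(n, x0=0):
    """Sample mean (1/n) * sum_k f(X_k) along one simulated path."""
    x, total = x0, 0.0
    for _ in range(n):
        x = rng.choice(2, p=P[x])
        total += f[x]
    return total / n

# Stationary distribution: the left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()

print(pi @ f)                  # exact stationary expectation, 19/7
print(mcmc_estimate(10_000))   # close to 19/7 for large n
```

For large n the path average approaches the stationary expectation; how fast its variance shrinks is exactly the question of this section.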

**Proof**
- Clearly,

and thus
- This representation will now be used to show (78), first for the case of the stationary initial distribution.
- In this case we observe
and
- Furthermore, by the stationarity of the Markov chain,
- A combination of the results above yields

- where the second equality is due to the identity
- Taking into account the representation formula (76), this implies (78).

- It is left to show that (78) is also true for an arbitrary initial distribution.
- At this point we will use a more precise notation: we will write instead of and instead of .
- It suffices to show that

- For this purpose we introduce the following notation: For let and
- Then, by (79),

where we denote the three summands in the last expression by , and , respectively.
- does not depend on and hence .
- As the state space is finite we obtain for that
- This implies for any , as
- Furthermore, for we have the following estimate

where it is easy to see that the supremum is finite.
- Due to the ergodicity of the Markov chain, the last summand becomes arbitrarily small for sufficiently large .
- This completes the proof of (80).

**Remarks**

- Recall that
- By Theorem 2.1 from "Statistik I" we obtain for the *mean squared error* of the MCMC estimator defined in (70) that

- i.e., the mean squared error of the MCMC estimator is equal to the sum of the squared bias and the variance of the estimator .

- Both summands on the right-hand side of (81) converge to 0 if , but with different rates of convergence.
- Consequently, the asymptotic behavior of the mean squared error of is crucially influenced by the *asymptotic variance* of the estimator, whereas the *bias* plays a minor role.
- In other words: it can make sense to choose the simulation matrix such that
  - the asymptotic variance is as small as possible,
  - even if this results in a certain increase of the asymptotic bias .
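The decomposition (81) can be checked empirically: over many independent replications of the chain, the mean squared error of the estimator equals the squared empirical bias plus the empirical variance (exactly, when the variance uses the 1/m normalization). The two-state chain, f, and run lengths below are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Empirical check of the bias-variance decomposition of the MSE, cf. (81).
# The two-state chain, f, and run lengths are illustrative assumptions.
rng = np.random.default_rng(1)
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
f = np.array([1.0, 5.0])
mu = np.array([4/7, 3/7]) @ f          # stationary expectation of f

def estimate(n, x0=0):
    """Sample mean of f along one chain path started in state x0."""
    x, s = x0, 0.0
    for _ in range(n):
        x = rng.choice(2, p=P[x])
        s += f[x]
    return s / n

ests = np.array([estimate(100) for _ in range(500)])
mse = np.mean((ests - mu) ** 2)
decomposed = (ests.mean() - mu) ** 2 + ests.var()   # squared bias + variance
print(mse, decomposed)   # agree up to floating-point rounding
```

Starting the chain off-stationarity (x0 fixed rather than drawn from pi) is what makes the bias term non-zero here; it nevertheless contributes far less to the MSE than the variance term.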

In order to investigate this problem more deeply we introduce the following notation: Let

- Let and be two transition matrices on such that and are reversible.
- For arbitrary such that let ,
- i.e., outside the diagonal all entries of the transition matrix are greater than or equal to the corresponding entries of the transition matrix .

- Then, for any function ,
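Numerically, the statement of the theorem can be explored via the fundamental matrix from (74). One standard expression for the asymptotic variance of the sample mean on a finite state space is 2 f'D Z f - f'D f - (pi'f)^2, where D is the diagonal matrix of pi and Z is the fundamental matrix; the matrices P1 and P2 below (both reversible with respect to the uniform distribution, with P2 dominating P1 off the diagonal) and f are illustrative assumptions.

```python
import numpy as np

# Asymptotic variance of the sample mean via the fundamental matrix
# Z = (I - P + Pi)^(-1), cf. (74); the example matrices are illustrative.
def asymptotic_variance(P, pi, f):
    n = len(pi)
    Pi = np.outer(np.ones(n), pi)           # every row equals pi
    Z = np.linalg.inv(np.eye(n) - P + Pi)   # fundamental matrix
    D = np.diag(pi)
    return 2 * f @ D @ Z @ f - f @ D @ f - (pi @ f) ** 2

pi = np.array([0.5, 0.5])
f = np.array([0.0, 1.0])
P1 = np.array([[0.8, 0.2],
               [0.2, 0.8]])    # reversible w.r.t. pi (symmetric)
P2 = np.array([[0.5, 0.5],
               [0.5, 0.5]])    # larger off-diagonal entries, also reversible

v1 = asymptotic_variance(P1, pi, f)   # -> 1.0
v2 = asymptotic_variance(P2, pi, f)   # -> 0.25
print(v1, v2)                         # ordering as claimed: v2 <= v1
```

Moving probability mass off the diagonal makes the chain leave each state more often, which reduces the autocorrelation of f along the path and hence the asymptotic variance.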

**Proof**
- Let be a transition matrix such that the pair is reversible.
- It suffices to show that

- By Theorem 3.19,

where denotes the fundamental matrix of introduced in (74).
- On the other hand, as , we get that
- Taking into account (84) this implies

- As the pair is reversible, by the representation formula (75) for the fundamental matrix that was derived in Lemma 3.3 we obtain for arbitrary
- As is a stochastic matrix and is reversible, only the entries where (or alternatively the entries where ) can be chosen arbitrarily. This can be seen as follows.
- For every pair such that , the entries , and can be expressed via in the following way:
- For arbitrary the entry of the matrix product is given by
- This implies that the matrix is non-negative definite, i.e., for all .
- By (86) this yields for arbitrary such that

- This completes the proof of (83).

**Remarks**
- As a particular consequence of Theorem 3.20 we get that the simulation matrix of the Metropolis algorithm (i.e., if we consider equality in (50)) minimizes the asymptotic variance within the class of all Metropolis-Hastings algorithms having an arbitrary but fixed "potential transition matrix" .
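This remark can be illustrated numerically: with the same proposal matrix Q, the Metropolis acceptance probability min(1, pi_j q_ji / (pi_i q_ij)) dominates, off the diagonal, other valid acceptance rules such as Barker's rule b/(a+b), so by the theorem the Metropolis chain has the smaller asymptotic variance. The target pi, proposal Q, function f, and the variance formula used below are illustrative assumptions, not taken verbatim from the text.

```python
import numpy as np

# Metropolis vs. Barker acceptance with the same proposal matrix Q;
# pi, Q, and f are illustrative assumptions, not from the text.
def mh_matrix(Q, pi, accept):
    """Metropolis-Hastings transition matrix for acceptance rule accept(a, b),
    where a = pi_i * q_ij and b = pi_j * q_ji."""
    n = len(pi)
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                P[i, j] = Q[i, j] * accept(pi[i] * Q[i, j], pi[j] * Q[j, i])
        P[i, i] = 1.0 - P[i].sum()
    return P

def asymptotic_variance(P, pi, f):
    """Asymptotic variance via the fundamental matrix Z = (I - P + Pi)^(-1)."""
    n = len(pi)
    Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))
    D = np.diag(pi)
    return 2 * f @ D @ Z @ f - f @ D @ f - (pi @ f) ** 2

pi = np.array([0.2, 0.3, 0.5])
Q = np.full((3, 3), 1.0 / 3.0)        # symmetric proposal
f = np.array([0.0, 1.0, 2.0])

metropolis = mh_matrix(Q, pi, lambda a, b: min(1.0, b / a))
barker = mh_matrix(Q, pi, lambda a, b: b / (a + b))

v_m = asymptotic_variance(metropolis, pi, f)
v_b = asymptotic_variance(barker, pi, f)
print(v_m, v_b)    # Metropolis has the smaller asymptotic variance
```

Both constructions are reversible with respect to pi by design; the Metropolis rule simply accepts every proposal the Barker rule would accept, and more.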