Statistical Tests

In literature numerous

*statistical significance tests*are discussed in order to investigate characteristics of random number generators; see e.g. G.S. Fishman (1996)*Monte Carlo: Concepts, Algorithms and Applications*, Springer, New York.

- We only recall two such tests
- which are important for investigating characteristics of linear congruential generators (and other random number generators)
- and have already been mentioned in the courses ,,Statistik I'' and ,,Statistik II'', respectively.

- Pearson's -goodness of fit test (see Section 5.2 of the
lecture notes for ,,Statistik II'') is used to check
- if the generated pseudo-random numbers can be regarded as realizations of uniformly distributed random variables
- and if we may assume the independence of these random variables.

- Another method for the generation of sequences
of
numbers having desirable characteristics
- is based on minimizing the
*Kolmogorov distance*(see Section 1.5 of ,,Statistik I'' and Section 5.1 of ,,Statistics II'', respectively)*every*natural number . - In literature this procedure is referred to as
*Quasi-Monte-Carlo-Method*; see e.g. H. Niederreiter (1992)*Random Number Generation and Quasi-Monte-Carlo Methods*, SIAM, Philadelphia.

- is based on minimizing the

*-goodness of fit test of uniform distribution*The following test is considered in order to check if the pseudo-random numbers

- can be regarded as realizations of independent sampling variables that are uniformly distributed on the interval .
- The interval is divided in subintervals of equal length
and
- we consider the -dimensional (hypothetical) vector of parameters and
- the test statistic
where

- If the sampling variables are independent and uniformly distributed on the interval , by Theorem 5.5 from ,,Statistik II''
- the test statistic is asymptotically distributed.
- Thus, for sufficiently large the hypothesis
is rejected if
- We will illustrate this test by the following numerical example. For
,
and we want to check if
- the hypothesis that the sampling variables are uniformly
distributed is conformable with a sample
of pseudo-random numbers. The sample
has the following vector
of class
frequencies:

- In this case we obtain
and hence
- Thus, the hypothesis of a uniform distribution on is not rejected.

- the hypothesis that the sampling variables are uniformly
distributed is conformable with a sample
of pseudo-random numbers. The sample
has the following vector
of class
frequencies:

**Remarks**-
- As a generalization of the -goodness of fit test for
checking the uniform distribution of some sample variables one can
also check
- if for a given natural number (e.g. or ) the pseudo-random vectors can be regarded
- as realizations of independent random vectors that are uniformly distributed on .

- For this purpose the unit cube is divided into
smaller cubes of equal size,
- which are of the form .
- Furthermore, we consider the -dimensional (hypothetical) vector of parameters and
- the test statistic
where

- As a generalization of the -goodness of fit test for
checking the uniform distribution of some sample variables one can
also check

*Run Test*There are a number of other significance tests allowing to evaluate the quality of random number generators. In particular it can be verified

- if the generated pseudo-random numbers can be regarded as realizations of independent random variables having a certain distribution. In our case we consider the hypothesis of a uniform distribution on .
- The following
*run test*checks in particular- if the
*independence*assumption for the sampling variables is reflected sufficiently well by the pseudo-random numbers . - This is done by analyzing the lengths of monotonically increasing
subsequences, also called
*runs*, within the sequence of pseudo-random numbers.

- if the
- For this purpose we define the random variables
by
the recursion formula

where . - The random variables
where

are called the runs of the sequence .

The significance test that will be constructed is based on the following property of the runs .**Theorem 3.3**The random variables introduced in are independent and identically distributed such that

if the random variables are independent and uniformly distributed on .**Proof**-
- Let
be independent and uniformly distributed on
.
- Then for all and for arbitrary natural numbers
, we get that

- This implies that the runs are independent and identically distributed.

- Then for all and for arbitrary natural numbers
, we get that
- Furthermore, an induction argument shows that for arbitrary
and

- For , equation (8) obviously holds. By the
formula of total probability we obtain

where the last equality is a consequence of the independence and -uniform distribution of . - Assume now that (8) is true for some . Then

where the second but one equality uses the induction hypothesis. - This shows (8) for any .

- For , equation (8) obviously holds. By the
formula of total probability we obtain
- Moreover, by (8) we can conclude that for any

- Let
be independent and uniformly distributed on
.
**Remarks**-
- Let us assume that sufficiently many pseudo-random numbers have been generated that are resulting in the runs according to (5) and (6).
- We choose pairwise disjoint intervals
on the positive real axis such that
- the probabilities
- For these probabilities we consider the -dimensional
(hypothetical) vector

and - the test statistic
where

- the probabilities
- According to Theorem 3.3 for large the hypothesis will be rejected if . Note that this requires the generation a sufficiently large number of pseudo-random numbers .