

Statistical Tests





  1. $\chi^2$-goodness-of-fit test for the uniform distribution

    The following test is considered in order to check whether the pseudo-random numbers $ u_1,\ldots,u_n$ can be regarded as realizations of independent random variables $ U_1,\ldots,U_n$ that are uniformly distributed on $ (0,1]$.

    Remarks
     
    • As a generalization of the $\chi^2$-goodness-of-fit test for the uniform distribution of a sample, one can also check
      • whether, for a given natural number $ d\ge 1$ (e.g. $ d=2$ or $ d=3$), the pseudo-random vectors $ (u_1,\ldots,u_d),\ldots,(u_{(n-1)d+1},\ldots,u_{nd})$
      • can be regarded as realizations of independent random vectors $ (U_1,\ldots,U_d),\ldots,(U_{(n-1)d+1},\ldots,U_{nd})$ that are uniformly distributed on $ (0,1]^d$.
    • For this purpose the unit cube $ (0,1]^d$ is divided into $ r^d$ smaller cubes $ B_j$ of equal size,
      • which are of the form $ ((i_1-1)/r,i_1/r]\times\ldots\times((i_d-1)/r,i_d/r]$ with $ i_1,\ldots,i_d\in\{1,\ldots,r\}$.
      • Furthermore, we consider the $ (r^d-1)$-dimensional (hypothetical) vector $ {\mathbf{p}}_0=(1/r^d,\ldots,1/r^d)$ of parameters and
      • the test statistic $ T_n:\mathbb{R}^{nd}\to[0,\infty)$ where

        $\displaystyle T_n({\mathbf{u}}_1,\ldots,{\mathbf{u}}_n)=\sum_{j=1}^{r^d}\;\frac{\bigl(Z_j({\mathbf{u}}_1,\ldots,{\mathbf{u}}_n)-n/r^d\bigr)^2}{n/r^d}\;,$

        where $ {\mathbf{u}}_i=(u_{(i-1)d+1},\ldots,u_{id})$ and $ Z_j({\mathbf{u}}_1,\ldots,{\mathbf{u}}_n)=\#\{i:\, 1\le i\le n,\, {\mathbf{u}}_i\in B_j\}$. Notice that $ Z_j({\mathbf{u}}_1,\ldots,{\mathbf{u}}_n)$ denotes the number of pseudo-random vectors falling into the cube $ B_j$; a small numerical sketch of this test is given below.

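As an illustration of this multidimensional $\chi^2$-test, the following is a minimal Python sketch; the function name chi2_uniformity_test, the choices of $ d$, $ r$ and $ \alpha$, and the NumPy generator that merely stands in for the pseudo-random numbers to be tested are all illustrative assumptions, not part of the construction above. It computes the counts $ Z_j$, the statistic $ T_n$ and the critical value $ \chi^2_{r^d-1,1-\alpha}$.

import numpy as np
from scipy.stats import chi2

def chi2_uniformity_test(u, d=2, r=4, alpha=0.05):
    """Chi-square goodness-of-fit test of uniformity on (0,1]^d.

    u is a flat sequence of pseudo-random numbers; it is grouped into n
    non-overlapping d-dimensional vectors, and Z_j counts how many of
    them fall into each of the r^d cubes B_j.
    """
    n = len(u) // d                                    # number of d-dimensional vectors
    vectors = np.asarray(u[:n * d]).reshape(n, d)
    # component-wise cell index in {0, ..., r-1}; NumPy generators return
    # values in [0, 1), so the cells are treated as right-open here, which
    # does not affect the statistic
    cells = np.minimum(np.floor(r * vectors).astype(int), r - 1)
    j = np.ravel_multi_index(cells.T, dims=(r,) * d)   # flat index of cube B_j
    z = np.bincount(j, minlength=r ** d)               # counts Z_j
    expected = n / r ** d                              # hypothetical value n / r^d
    t_n = np.sum((z - expected) ** 2 / expected)       # test statistic T_n
    critical = chi2.ppf(1 - alpha, df=r ** d - 1)      # chi^2_{r^d - 1, 1 - alpha}
    return t_n, critical, t_n > critical               # True = reject H_0

# example usage with a stand-in generator for the numbers to be tested
rng = np.random.default_rng(seed=1)
print(chi2_uniformity_test(rng.random(20000), d=2, r=4))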

  2. Run Test

    There are a number of other significance tests that allow one to evaluate the quality of random number generators. In particular, it can be verified whether the lengths of the monotonically increasing runs of the generated sequence $ u_1,u_2,\ldots$ behave as they would for independent and $ (0,1]$-uniformly distributed random variables.


    The significance test that will be constructed is based on the following property of the runs $ W_1,W_2,\ldots$.

    Theorem 3.3   The random variables $ W_1,W_2,\ldots$ introduced in (5) and (6) are independent and identically distributed such that

    $\displaystyle P(W_j=k)=\frac{k}{(k+1)!}\;,\qquad\forall\, k=1,2,\ldots\,,$ (7)

    if the random variables $ U_1,U_2,\ldots$ are independent and uniformly distributed on $ (0,1]$.

    Proof
     
    • Let $ U_1,U_2,\ldots$ be independent and uniformly distributed on $ (0,1]$.
      • Then for all $ n\ge 1$ and for arbitrary natural numbers $ k_1,\ldots,k_n\ge 1$, we get that
        $\displaystyle \begin{aligned}
        P(W_1=k_1,\ldots,W_n=k_n)
        &= P\bigl(V_1=k_1,\,V_2-V_1-1=k_2,\ldots,V_n-V_{n-1}-1=k_n\bigr)\\
        &= P\bigl(V_1=k_1,\,V_2=k_2+k_1+1,\ldots,V_n=k_n+\ldots+k_1+n-1\bigr)\\
        &= P\bigl(U_i\le U_{i+1}\;\forall\,i=1,\ldots,k_1-1,\;U_{k_1}>U_{k_1+1},\\
        &\qquad\; U_i\le U_{i+1}\;\forall\,i=k_1+2,\ldots,k_1+1+k_2-1,\;U_{k_1+1+k_2}>U_{k_1+1+k_2+1},\;\ldots,\\
        &\qquad\; U_i\le U_{i+1}\;\forall\,i=k_1+1+\ldots+k_{n-1}+2,\ldots,k_1+1+\ldots+k_{n-1}+1+k_n-1,\\
        &\qquad\qquad U_{k_1+1+\ldots+k_{n-1}+1+k_n}>U_{k_1+1+\ldots+k_{n-1}+1+k_n+1}\bigr)\\
        &= P\bigl(U_i\le U_{i+1}\;\forall\,i=1,\ldots,k_1-1,\;U_{k_1}>U_{k_1+1}\bigr)\\
        &\qquad\cdot\, P\bigl(U_i\le U_{i+1}\;\forall\,i=k_1+2,\ldots,k_1+1+k_2-1,\;U_{k_1+1+k_2}>U_{k_1+1+k_2+1}\bigr)\\
        &\qquad\cdot\,\ldots\,\cdot\, P\bigl(U_i\le U_{i+1}\;\forall\,i=k_1+1+\ldots+k_{n-1}+2,\ldots,k_1+1+\ldots+k_{n-1}+1+k_n-1,\\
        &\qquad\qquad\quad U_{k_1+1+\ldots+k_{n-1}+1+k_n}>U_{k_1+1+\ldots+k_{n-1}+1+k_n+1}\bigr)\\
        &= P\bigl(U_i\le U_{i+1}\;\forall\,i=1,\ldots,k_1-1,\;U_{k_1}>U_{k_1+1}\bigr)\cdot\ldots\cdot
           P\bigl(U_i\le U_{i+1}\;\forall\,i=1,\ldots,k_n-1,\;U_{k_n}>U_{k_n+1}\bigr)\,,
        \end{aligned}$

        where the factorization in the second-to-last step uses the independence of $ U_1,U_2,\ldots$ and the last equality follows since the $ U_i$ are identically distributed.

      • This implies that the runs $ W_1,W_2,\ldots$ are independent and identically distributed.
    • Furthermore, an induction argument shows that for arbitrary $ k\in\mathbb{N}$ and $ t\in(0,1]$

      $\displaystyle P(U_1\le\ldots\le U_k\le t)=\frac{t^k}{k!}\,.$ (8)

      • For $ k=1$, equation (8) obviously holds. By the formula of total probability we obtain
        $\displaystyle \begin{aligned}
        P(U_1\le\ldots\le U_{k+1}\le t)
        &= \int_0^1 P(U_1\le\ldots\le U_k\le U_{k+1}\le t\mid U_{k+1}=x)\,P(U_{k+1}\in dx)\\
        &= \int_0^1 P(U_1\le\ldots\le U_k\le x\le t\mid U_{k+1}=x)\,P(U_{k+1}\in dx)\\
        &= \int_0^1 P(U_1\le\ldots\le U_k\le x\le t)\,dx\,,
        \end{aligned}$

        where the last equality is a consequence of the independence and $ (0,1]$-uniform distribution of $ U_1,U_2,\ldots$.
      • Assume now that (8) is true for some $ k\ge 1$. Then
        $\displaystyle \begin{aligned}
        P(U_1\le\ldots\le U_{k+1}\le t)
        &= \int_0^t P(U_1\le\ldots\le U_k\le x)\,dx\\
        &= \int_0^t \frac{x^k}{k!}\,dx \;=\; \frac{t^{k+1}}{(k+1)!}\;,
        \end{aligned}$

        where the second equality uses the induction hypothesis.
      • This shows (8) for any $ k\ge 1$.
    • Moreover, by (8) we can conclude that for any $ k\in\mathbb{N}$
      $\displaystyle \begin{aligned}
      P(U_1\le\ldots\le U_k,\,U_k>U_{k+1})
      &= \int_0^1 P(U_1\le\ldots\le U_k,\,U_k>x)\,dx\\
      &= \int_0^1 \Bigl(P(U_1\le\ldots\le U_k\le 1)-P(U_1\le\ldots\le U_k\le x)\Bigr)\,dx\\
      &= \int_0^1 \Bigl(\frac{1}{k!}\;-\;\frac{x^k}{k!}\Bigr)\,dx\\
      &= \frac{1}{k!}\;-\;\frac{1}{(k+1)!}\;=\;\frac{k}{(k+1)!}\;.
      \end{aligned}$


       
        $ \Box$

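    As a quick consistency check, note that (7) indeed defines a probability distribution on $ \{1,2,\ldots\}$: using the representation $ \frac{k}{(k+1)!}=\frac{1}{k!}-\frac{1}{(k+1)!}$ obtained at the end of the proof, the probabilities telescope to

    $\displaystyle \sum_{k=1}^\infty P(W_j=k)=\sum_{k=1}^\infty\Bigl(\frac{1}{k!}-\frac{1}{(k+1)!}\Bigr)=1\,.$

    In particular, $ P(W_j=1)=1/2$, $ P(W_j=2)=1/3$ and $ P(W_j=3)=1/8$, so short runs dominate.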

    Remarks
     
    • Let us assume that sufficiently many pseudo-random numbers $ u_1,u_2,\ldots$ have been generated, resulting in the $ n$ runs $ w_1,\ldots,w_n$ according to (5) and (6).
    • We choose $ r$ pairwise disjoint intervals $ (a_1,b_1],\ldots,(a_r,b_r]$ on the positive real axis such that
      • the probabilities

        $\displaystyle p_{0,j}=\sum_{k\in\mathbb{N}\cap(a_j,b_j]}\frac{k}{(k+1)!}\;,\qquad j=1,\ldots,r,$

        are almost equal.
      • For these probabilities we consider the $ (r-1)$-dimensional (hypothetical) vector
        $ {\mathbf{p}}_0=(p_{0,1},\ldots,p_{0,r-1})$ and
      • the test statistic $ T_n:\mathbb{R}^n\to[0,\infty)$ where

        $\displaystyle T_n(w_1,\ldots,w_n)=\sum_{j=1}^r\;\frac{\bigl(Y_j(w_1,\ldots,w_n)-np_{0,j}\bigr)^2}{np_{0,j}}\;,$

        and $ Y_j(w_1,\ldots,w_n)=\#\{i:\, 1\le i\le n,\, a_j<w_i\le b_j\}$ denotes the number of run lengths among $ w_1,\ldots,w_n$ belonging to class $ j$.
    • According to Theorem 3.3, for large $ n$ the hypothesis $ H_0:{\mathbf{p}}={\mathbf{p}}_0$ will be rejected if $ T_n(w_1,\ldots,w_n)>\chi^2_{r-1,1-\alpha}$. Note that this requires the generation of a sufficiently large number of pseudo-random numbers $ u_1,u_2,\ldots$; a sketch of the whole procedure in code is given below.

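To illustrate the whole procedure, here is a minimal Python sketch of the run test. The run extraction follows the decomposition used in the proof of Theorem 3.3 (the element immediately after each descent is skipped); the four classes $ \{1\},\{2\},\{3\},\{4,5,\ldots\}$, the helper names run_lengths and run_test, and the NumPy generator are illustrative choices only. In practice the classes should be chosen so that the probabilities $ p_{0,j}$ are almost equal, as recommended above.

import math
import numpy as np
from scipy.stats import chi2

def run_lengths(u):
    """Run lengths w_1, w_2, ...: length of each increasing run, where the
    element immediately after every descent is skipped (cf. the proof of
    Theorem 3.3, which makes consecutive runs independent)."""
    runs, start, i = [], 0, 0
    while i + 1 < len(u):
        if u[i] > u[i + 1]:
            runs.append(i - start + 1)   # run of length i - start + 1 ends here
            start, i = i + 2, i + 2      # skip u[i + 1]
        else:
            i += 1
    return np.array(runs)

def run_test(u, alpha=0.05):
    """Chi-square run test with the illustrative classes {1}, {2}, {3}, {4, 5, ...}."""
    w = run_lengths(u)
    n = len(w)
    y = np.array([np.sum(w == 1), np.sum(w == 2), np.sum(w == 3), np.sum(w >= 4)])
    p = [k / math.factorial(k + 1) for k in (1, 2, 3)]   # k / (k+1)! from (7)
    p0 = np.array(p + [1.0 - sum(p)])                    # remaining mass for the last class
    t_n = np.sum((y - n * p0) ** 2 / (n * p0))           # test statistic T_n
    critical = chi2.ppf(1 - alpha, df=len(p0) - 1)       # chi^2_{r-1, 1-alpha}
    return t_n, critical, t_n > critical                 # True = reject H_0

# example usage with a stand-in generator for the numbers to be tested
rng = np.random.default_rng(seed=3)
print(run_test(rng.random(100000)))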


Ursa Pantle 2006-07-20