1 Introduction

Random samples from a normal distribution are widely used in scientific computing. In numerical simulations of statistical physics, such as plasma particle-in-cell simulations (Hockney and Eastwood 1981; Birdsall and Langdon 1985) and Monte Carlo simulations (Hammersley and Handscomb 1964), initial distributions of particles in both configuration and velocity spaces strongly affect final results. A set of samples with an exactly requested distribution has been commonly used for initial loading of particles (Byers and Grewal 1970), whose numerical method is known as the “quiet start” (Morse and Nielson 1971).

A distribution of particles in the velocity space should be as close as possible to a requested distribution, such as the Maxwell–Boltzmann distribution (i.e., the normal distribution or the Gaussian distribution). Script language environments, such as Python™, MATLAB® and its clones, have their own built-in functions that generate pseudo-random samples from a normal distribution. On the other hand, high-performance programming languages such as C and Fortran do not have their own intrinsic functions/subroutines that generate pseudo-random samples from a normal distribution, while C++ has a class “normal_distribution” since C++11. There is a numerical method that generates pseudo-random samples with a standard normal distribution based on the central limit theorem (e.g., Lindeberg 1922), which requires twelve source samples from a uniform distribution to generate one sample from a normal distribution. Pseudo-random samples from a normal distribution are also generated from pseudo-random samples from a uniform distribution by using numerical methods such as the rejection sampling (e.g., Kinderman and Monahan 1977; Marsaglia and Tsang 1984; Leva 1992; Marsaglia and Tsang 1998, 2000) and the inverse transform sampling (e.g., Box and Muller 1958). The rejection sampling generally needs a larger number of source samples than the desired number of output samples; therefore, it is not easy to parallelize.

The purpose of the present study is to examine numerical methods for generating pseudo-random samples from a normal distribution in multiple dimensions for high-performance computing languages such as C and Fortran. The present study focuses on the inverse transform sampling, which, unlike the rejection sampling, is a one-to-one conversion from a uniform distribution into an arbitrary desired distribution. A numerical method for generating samples from an exact uniform distribution is given in Sect. 2. New approximations of the inverse cumulative distribution function for a standard normal distribution in one dimension (i.e., the inverse error function) and in three dimensions are given in Sect. 3. Numerical methods for generating samples from standard normal distributions with these approximations are compared against the classic method with the Box–Muller transform (Box and Muller 1958) in Sect. 4. Finally, the conclusion is given in Sect. 5.

2 Source samples from uniform distributions

Unlike script language environments, high-performance programming languages such as C and Fortran do not have their own intrinsic functions or subroutines that generate pseudo-random samples from a normal distribution. Hence, samples from a normal distribution are generated with source samples from a uniform distribution. In this section, source samples from a uniform distribution are discussed firstly.

C has an intrinsic function “rand,” which returns a pseudo-random integer from a uniform distribution in the range between 0 and “RAND_MAX” (which depends on compilers). The period of the “rand” function is \(2^{32}-1\). Fortran 90 has an intrinsic subroutine “random_number,” which returns an array of pseudo-random floating-point values “r” (in either single- or double-precision) from a uniform distribution in the range of \(0 \le r < 1\). However, the period of the random number generator depends on compilers. For example, the period of random_number of Intel® oneAPI ver. 2023 (ifort) is approximately \(10^{18}(\approx 2^{60})\) (Intel 2022a). On the other hand, the period of random_number of the latest GNU Fortran (gfortran) is \(2^{256}-1\) (GCC 2022). Instead of these intrinsic functions/subroutines, the Mersenne Twister (MT19937) (Matsumoto and Nishimura 1998) has been commonly used in scientific computing, since the Mersenne Twister has a very long period of \(2^{19937}-1\). C++ has a class “mt19937” since C++11.

In the present study, another type of pseudo-random samples from an exact uniform distribution is discussed. Firstly, a set of “sequential” samples in the range of \(0 \le s < 1\) is generated from an exact uniform distribution by the following simple procedure,

$$\begin{aligned} s_{i} = \textrm{mod}\left( \frac{i}{N} + r, 1\right) , \end{aligned}$$
(1)

where N is the number of samples, \(0 \le r < 1\) is a random floating-point value as a seed, and \(i = 1,\cdots , N\) is an integer sequence. The modulo function “\(\textrm{mod}(a,m)\)” returns the remainder after division of a by m. Secondly, by shuffling the samples generated from Eq. (1), a set of pseudo-random samples from an exact uniform distribution is generated. The shuffling of a list is performed by using a set of random integers without duplication, which is known as a random permutation.

$$\begin{aligned} {\varvec{u}} \Leftarrow {\varvec{s}} \left\{ \textrm{randperm}\left( N \right) \right\} . \end{aligned}$$
(2)

In C and Fortran, the shuffling of a list is performed by using uniformly distributed pseudo-random numbers described above. Note that script language environments have their own built-in functions for random permutations. For example, MATLAB® has a built-in function “randperm(N),” which returns a random permutation of the integers between 1 and N (Mathworks 2023).
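The procedure of Eqs. (1)–(2) can be sketched in a few lines; Python is used here for brevity, although the target languages of the present study are C and Fortran. The function name and the use of Python's `random` module are illustrative choices, not part of the original method.

```python
import random

def exact_uniform_samples(n, seed_value=None):
    """Eqs. (1)-(2): n "sequential" samples i/N + r (mod 1) from an exact
    uniform distribution on [0, 1), followed by a random permutation."""
    rng = random.Random(seed_value)
    r = rng.random()                                    # seed, 0 <= r < 1
    s = [((i / n) + r) % 1.0 for i in range(1, n + 1)]  # Eq. (1)
    rng.shuffle(s)                                      # Eq. (2): shuffle
    return s

samples = exact_uniform_samples(1000, seed_value=42)
# Sorted, the samples return to an evenly spaced grid of spacing 1/N,
# which is what makes the distribution "exactly" uniform.
```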

In the uniformly distributed samples generated from Eq. (2), a specific value reappears only after a period longer than that of the seed pseudo-random samples. Note, however, that when the number of samples is \(N = 2^n\), the same value can appear in different streams.

3 Generation of normal distributions

3.1 Inverse transform sampling in one dimension

The inverse transform sampling method uses the inverse function of the cumulation (or prefix sum) of a desired distribution to transform a uniform distribution to the desired distribution. For the one-dimensional standard normal (Gaussian or Maxwell–Boltzmann) distribution, the cumulative distribution is given as

$$\begin{aligned} F(x) = \int _{-\infty }^x \frac{1}{\sqrt{2\pi }}\exp \left( -\frac{t^2}{2}\right) \textrm{d}t = \frac{1}{2}\textrm{erf}\left( \frac{x}{\sqrt{2}}\right) +\frac{1}{2}, \end{aligned}$$
(3)

where \(\textrm{erf}(x)\) is the error function, which is intrinsic (“erf”) in C/C++ and Fortran (since Fortran 2008). The inverse cumulative distribution is given as

$$\begin{aligned} F^{-1}(u) = x = \sqrt{2} \textrm{erf}^{-1} \left( 2u-1\right) . \end{aligned}$$
(4)

This equation performs the one-to-one transform of a set of samples \(0\le u<1\) from a uniform distribution to a set of samples x from a standard normal distribution.
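In Python, the transform of Eq. (4) is available directly through the standard library: `statistics.NormalDist.inv_cdf` (Python 3.8+) evaluates the standard normal quantile \(F^{-1}(u)\), which equals \(\sqrt{2}\,\textrm{erf}^{-1}(2u-1)\). A minimal sketch:

```python
from statistics import NormalDist

def sample_normal_1d(uniform_samples):
    """Eq. (4): one-to-one transform of uniform samples 0 < u < 1 into
    standard normal samples, F^{-1}(u) = sqrt(2) * erfinv(2u - 1).
    NormalDist().inv_cdf evaluates exactly this quantile function."""
    quantile = NormalDist().inv_cdf
    return [quantile(u) for u in uniform_samples]

x = sample_normal_1d([0.025, 0.5, 0.975])
# The median (u = 0.5) maps to 0; u = 0.025 and 0.975 map to about -+1.96.
```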

The inverse transform sampling in one dimension in Eq. (4), however, is not commonly used, because there has been no fast and freely available numerical library for the inverse error function. Numerical libraries of recent commercial compilers, e.g., Intel® oneAPI Math Kernel Library (oneMKL) (Intel 2022b) and NVIDIA® CUDA® API (NVIDIA 2023), have a built-in function “erfinv,” while the GNU compiler does not. Note that MATLAB® also has a built-in function “erfinv” (Mathworks 2023).

Instead of the numerical libraries of commercial compilers, an approximation of the inverse error function is used in the present study. The inverse error function has been approximated by several methods. Classic approximations are obtained by the Taylor series expansion (Philip 1960; Carlitz 1963) and the Chebyshev series expansion (Blair et al. 1976). However, the numerical accuracy of these approximations depends on the number of terms retained.

In the present study, an invertible approximation of the error function (e.g., Winitzki 2008; Soranzo and Epure 2012a, b, 2014) is examined. Note that there are a variety of approximations for the error function \(\textrm{erf}(x)\), as listed in the following references (Soranzo and Epure 2014; Yerukala and Boiroju 2015; Matić et al. 2018; Eidous and Abu-Shareefa 2020, and the references therein), but most of them are not invertible. In the present study, an accurate and invertible approximation of the error function obtained by Soranzo and Epure (2012a) is used,

$$\begin{aligned} \textrm{erf}(x) \approx \sqrt{1-\exp \left( -\frac{ax^2+bx^4}{1+cx^2+dx^4}\right) } \end{aligned}$$
(5)

the inverse of which results in

$$\begin{aligned}{} & {} \textrm{erf}^{-1}(x) \approx \sqrt{\frac{ \sqrt{\left\{ c\log (1-x^2)+a\right\} ^2-4\left\{ d\log (1-x^2)+b\right\} \log (1-x^2)} -\left\{ c\log (1-x^2)+a\right\} }{2\left\{ d\log (1-x^2)+b\right\} }} \end{aligned}$$
(6)

for \(x\ge 0\). In the previous studies, the parameters a, b, c and d are assumed to be constant so that the approximation remains invertible (Winitzki 2008; Soranzo and Epure 2012a, b).

In the present study, the parameters \(a=4/\pi\) and \(b=0.14\) are assumed to be constant, identical to Winitzki (2008). On the other hand, the parameters c and d are assumed to be functions of \(x(\ge 0)\) in order to reduce numerical errors relative to the built-in inverse error function “erfinv” in MATLAB® for \(0.7<x<1\). The argument x is separated into three intervals, \(0 \le x < 0.72\), \(0.72 \le x < 0.94\), and \(0.94 \le x < 1\). A brute-force search is performed to find the pair of parameters c and d that minimizes the root mean square of the numerical errors in each interval. The parameters c and d are then given as follows,

$$\begin{aligned} c(x)= & {} c_1+0.5(c_2-c_1)\left\{ 1+\tanh \left( \frac{x-0.72}{0.01}\right) \right\} +0.5(c_3-c_2)\left\{ 1+\tanh \left( \frac{x-0.94}{0.01}\right) \right\} \\ d(x)= & {} d_1+0.5(d_2-d_1)\left\{ 1+\tanh \left( \frac{x-0.72}{0.01}\right) \right\} +0.5(d_3-d_2)\left\{ 1+\tanh \left( \frac{x-0.94}{0.01}\right) \right\} \end{aligned}$$

with \(c_1=0.14\), \(c_2=0.1404\), \(c_3=0.1415\), \(d_1=0.00145\), \(d_2=0.0008\), and \(d_3=0.0002\).
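A sketch of the resulting approximation of \(\textrm{erf}^{-1}\) in Eq. (6) is given below in Python. The tanh blending of c and d is written as a smoothed step, \(0.5\{1+\tanh (\cdot )\}\), so that c and d reduce to \((c_1,d_1)\), \((c_2,d_2)\), and \((c_3,d_3)\) on the three intervals; this reading of the blending formula, and all function names, are our own.

```python
import math

# Constants: a = 4/pi and b = 0.14 (Winitzki 2008); c and d switch between
# (c1, d1), (c2, d2), (c3, d3) on the three intervals of the argument.
A, B = 4.0 / math.pi, 0.14
C1, C2, C3 = 0.14, 0.1404, 0.1415
D1, D2, D3 = 0.00145, 0.0008, 0.0002

def _blend(x, p1, p2, p3):
    """Smoothed-step blending between the three interval values."""
    step1 = 0.5 * (1.0 + math.tanh((x - 0.72) / 0.01))
    step2 = 0.5 * (1.0 + math.tanh((x - 0.94) / 0.01))
    return p1 + (p2 - p1) * step1 + (p3 - p2) * step2

def erfinv_approx(x):
    """Eq. (6): approximate erfinv for 0 <= x < 1 by inverting Eq. (5)."""
    if x == 0.0:
        return 0.0
    c = _blend(x, C1, C2, C3)
    d = _blend(x, D1, D2, D3)
    L = math.log(1.0 - x * x)
    p, q = c * L + A, d * L + B
    t = (math.sqrt(p * p - 4.0 * q * L) - p) / (2.0 * q)  # t = erfinv(x)^2
    return math.sqrt(t)
```

Applying the built-in `math.erf` to the result provides a simple round-trip check of the accuracy.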

Fig. 1
figure 1

Comparison between the built-in inverse error function “erfinv” in MATLAB® and the approximation from Eq. (6). The thick line shows “erfinv” and the solid line shows the approximation from Eq. (6). The two dashed lines show the Taylor series expansion from Eq. (7) with \(M=20\) and \(M=50\), where M denotes the number of Taylor series terms. b The profiles of the function/approximations for \(0 \le x \le 1\), a an enlarged view for \(0.9 \le x \le 1\), and c the relative error \(\eta\)

Figure 1 shows a comparison between erfinv in MATLAB® and the approximation from Eq. (6). As a reference, the Taylor series expansion of the inverse error function with the number of Taylor series terms \(M=20\) and \(M=50\) is superimposed as well, where

$$\begin{aligned} \textrm{erf}^{-1}(x) \approx \sum _{m=0}^M \frac{h_m}{2m+1} \left( \frac{\sqrt{\pi }}{2}x\right) ^{2m+1} \end{aligned}$$
(7)

with (Carlitz 1963)

$$\begin{aligned} h_m = \sum _{k=0}^{m-1}\frac{h_k h_{m-1-k}}{(k+1)(2k+1)}, \ \ \ h_0=1. \end{aligned}$$

Panel (b) shows the profiles of the function/approximations for \(0 \le x \le 1\), and panel (a) shows an enlarged view for \(0.9 \le x \le 1\). Panel (c) shows the relative error, \(\eta = |(f-f_\textrm{approx}) /{f} |\). The relative error of the Taylor series expansion is around the machine epsilon of double-precision floating point for small x. However, the relative error exceeds \(10^{-2}\) at \(x\approx 0.96\) with \(M=20\) and at \(x\approx 0.98\) with \(M=50\). This result suggests that much higher-degree terms are necessary to approximate the inverse error function near \(x=1\) by the Taylor series expansion. The relative error of the new approximation is less than \(10^{-4}\) for \(x<0.9937\). It should be noted, however, that the new approximation is not invertible, since c and d are functions of x.
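For reference, the Taylor series of Eq. (7) with the recurrence for \(h_m\) can be sketched as follows; the series argument \((\sqrt{\pi }/2)x\) follows the conventional Maclaurin expansion of the inverse error function.

```python
import math

def erfinv_taylor(x, M=20):
    """Eq. (7): Maclaurin series of the inverse error function, with the
    coefficients h_m built from the recurrence of Carlitz (1963)."""
    h = [1.0]                                   # h_0 = 1
    for m in range(1, M + 1):
        h.append(sum(h[k] * h[m - 1 - k] / ((k + 1) * (2 * k + 1))
                     for k in range(m)))
    z = 0.5 * math.sqrt(math.pi) * x            # series argument
    return sum(h[m] / (2 * m + 1) * z ** (2 * m + 1) for m in range(M + 1))
```

As in Fig. 1, the truncated series is accurate for small x but requires many more terms as x approaches 1.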

3.2 Inverse transform sampling in two dimensions

For the two-dimensional standard normal distribution, the cumulative distribution is given by using the transform from the rectangular coordinate to the polar coordinate as

$$\begin{aligned} F(x,y)= & {} \int _{-\infty }^x \int _{-\infty }^y \frac{1}{2\pi }\exp \left( -\frac{t_1^2+t_2^2}{2}\right) \textrm{d}t_1\textrm{d}t_2 \nonumber \\= & {} \int _{0}^\chi \int _{0}^{\zeta } \frac{r}{2\pi }\exp \left( -\frac{r^2}{2}\right) \textrm{d}r\textrm{d}\phi \nonumber \\= & {} \frac{\zeta }{2\pi } \left\{ 1-\exp \left( -\frac{\chi ^2}{2}\right) \right\} \end{aligned}$$
(8)

with \(\chi ^2 = x^2+y^2\) and \(\zeta = \tan ^{-1}\left( y/x\right)\). Hence, the inverse cumulative distribution is given as

$$\begin{aligned} F^{-1}(u_1,u_2) = (x,y) = \left( \sqrt{-2\log (u_1)}\cos (2\pi u_2), \sqrt{-2\log (u_1)}\sin (2\pi u_2)\right) . \end{aligned}$$
(9)

This procedure is well known as the Box–Muller transform (Box and Muller 1958), which transforms two samples \(0\le (u_1,u_2)<1\) from a two-dimensional uniform distribution into two samples (x, y) from a two-dimensional standard normal distribution in a one-to-one manner.
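A minimal sketch of the Box–Muller transform of Eq. (9) follows; mapping the uniform sample into \((0,1]\) via \(1-u\) to avoid \(\log (0)\) is an implementation detail, not part of the original formula.

```python
import math
import random

def box_muller(u1, u2):
    """Eq. (9): transform one pair of uniform samples into one pair of
    standard normal samples."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return (radius * math.cos(angle), radius * math.sin(angle))

rng = random.Random(1)
# 1 - rng.random() lies in (0, 1], which keeps log(u1) finite.
pairs = [box_muller(1.0 - rng.random(), rng.random()) for _ in range(50000)]
xs = [v for pair in pairs for v in pair]
```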

Note that samples from a two-dimensional standard normal distribution are generated as well by replacing \((u_1,u_2)\) with another set of samples from a two-dimensional uniform distribution \(-1\le (v_1,v_2)<1\), where

$$\begin{aligned} u_1=v_1^2+v_2^2, \ \ \ \cos (2\pi u_2) = \frac{v_1}{\sqrt{v_1^2+v_2^2}}, \ \ \ \sin (2\pi u_2) = \frac{v_2}{\sqrt{v_1^2+v_2^2}} \end{aligned}$$

with samples satisfying \(v_1^2+v_2^2 > 1\) rejected, which is known as the Marsaglia polar method (Marsaglia and Bray 1964). However, rejection-based methods are not the focus of the present study.
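Although rejection-based methods are outside the scope of the present study, the Marsaglia polar method described above can be sketched as follows for comparison (the loop structure and function name are illustrative choices):

```python
import math
import random

def marsaglia_polar(rng):
    """Marsaglia polar method: draw (v1, v2) uniformly on [-1, 1)^2,
    reject points outside the unit disk, and return one pair of
    standard normal samples."""
    while True:
        v1 = 2.0 * rng.random() - 1.0
        v2 = 2.0 * rng.random() - 1.0
        s = v1 * v1 + v2 * v2
        if 0.0 < s < 1.0:                     # rejection step
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return (v1 * factor, v2 * factor)

rng = random.Random(3)
xs = [v for _ in range(50000) for v in marsaglia_polar(rng)]
```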

The Box–Muller transform (Box and Muller 1958) has been widely used, because almost all programming languages have intrinsic (or built-in) functions “sqrt,” “cos,” “sin,” and “log,” unlike “erfinv.” As a drawback, however, four sets of uniformly distributed samples are needed for generating three sets of normally distributed samples in three dimensions, because the Box–Muller transform is not able to generate one set of normally distributed samples from one set of uniformly distributed samples.

3.3 Inverse transform sampling in three dimensions

For the three-dimensional standard normal distribution, the cumulative distribution is given by using the transform from the rectangular coordinate to the spherical coordinate as

$$\begin{aligned} F(x,y,z)= & {} \int _{-\infty }^x \int _{-\infty }^y \int _{-\infty }^z \frac{1}{\sqrt{2\pi }^3}\exp \left( -\frac{t_1^2+t_2^2+t_3^2}{2}\right) \textrm{d}t_1\textrm{d}t_2 \textrm{d}t_3 \nonumber \\= & {} \int _{0}^\chi \int _{0}^{\upsilon } \int _{0}^{\zeta } \frac{r^2}{\sqrt{2\pi }^3}\exp \left( -\frac{r^2}{2}\right) \sin \theta \textrm{d}r\textrm{d}\theta \textrm{d}\phi \nonumber \\= & {} \frac{\zeta }{\sqrt{2\pi }^3} \left\{ \sqrt{\frac{\pi }{2}} \textrm{erf} \left( \frac{\chi }{\sqrt{2}}\right) - \chi \exp \left( -\frac{\chi ^2}{2}\right) \right\} (1-\cos \upsilon ) \end{aligned}$$
(10)

with \(\chi ^2 = x^2+y^2+z^2\), \(\upsilon = \cos ^{-1}\left( z/\sqrt{x^2+y^2+z^2} \right)\), and \(\zeta = \cos ^{-1}\left( x/\sqrt{x^2+y^2}\right)\). There is no analytic inverse function of the \(\chi\) component of the cumulative distribution g(x),

$$\begin{aligned} g(x) \equiv \textrm{erf}\left( \frac{x}{\sqrt{2}}\right) -\sqrt{\frac{2}{\pi }}x\exp \left( -\frac{x^2}{2}\right) . \end{aligned}$$
(11)

In the present study, the cumulative distribution g(x) is approximated by the following function that is explicitly invertible,

$$\begin{aligned} g(x) \approx \left\{ 1-\exp \left( -\frac{ax^2+bx^4}{1+cx^2+dx^4}\right) \right\} ^{\frac{3}{2}}. \end{aligned}$$
(12)
Fig. 2
figure 2

Comparison between g(x) and the approximation from Eq (12). The thick line shows g(x) and the solid line shows the approximation from Eq. (12). a The profiles of the function/approximations for \(0 \le x \le 6\), b the relative error \(\eta\), and c the absolute error \(\varepsilon\)

Figure 2 shows a comparison between g(x) and the approximation from Eq. (12) with \(a=0.4129\), \(b=0.0823\), \(c=0.1906\) and \(d=-0.000925\), which are obtained by a brute-force search to minimize the root mean square of numerical errors from Eq. (11). Panel (a) shows the profiles of the function/approximation for \(0 \le x \le 6\). Panel (b) shows the corresponding relative error. Panel (c) shows the corresponding absolute error. The relative error of the approximation in Eq. (12) is less than \(10^{-2}\), and the absolute error of the approximation in Eq. (12) is less than \(10^{-4}\).

With the constants a, b, c and d, the inverse function of g(x) is then given as follows,

$$\begin{aligned} g^{-1}(x) \approx \sqrt{\frac{ \sqrt{\left\{ c\log (1-x^\frac{2}{3})+a\right\} ^2-4\left\{ d\log (1-x^\frac{2}{3})+b\right\} \log (1-x^\frac{2}{3})} -\left\{ c\log (1-x^\frac{2}{3})+a\right\} }{2\left\{ d\log (1-x^\frac{2}{3})+b\right\} }}. \end{aligned}$$
(13)

Hence, the inverse cumulative distribution of the three-dimensional standard normal distribution is given as

$$\begin{aligned} F^{-1}(u_1,u_2,u_3) = (x,y,z) = \left( \begin{array}{l} g^{-1}(u_1) \sin \left\{ \cos ^{-1}(2u_2-1) \right\} \cos (2\pi u_3) \\ g^{-1}(u_1) \sin \left\{ \cos ^{-1}(2u_2-1) \right\} \sin (2\pi u_3) \\ g^{-1}(u_1) \cos \left\{ \cos ^{-1}(2u_2-1) \right\} \\ \end{array} \right) ^T. \end{aligned}$$
(14)

Note that \(\cos \left\{ \cos ^{-1}(2u_2-1) \right\} = 2u_2-1\) and \(\sin \left\{ \cos ^{-1}(2u_2-1) \right\} = \sqrt{1-(2u_2-1)^2} = 2\sqrt{u_2(1-u_2)}\).
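The three-dimensional method of Eqs. (12)–(14) can be sketched as follows; the constant names and the function decomposition are illustrative. Because Eq. (13) is the exact analytic inverse of Eq. (12), the pair round-trips to floating-point precision, while the approximation itself matches g(x) of Eq. (11) to within the stated absolute error.

```python
import math

# Constants from the brute-force fit below Eq. (12).
GA, GB, GC, GD = 0.4129, 0.0823, 0.1906, -0.000925

def g_approx(x):
    """Eq. (12): invertible approximation of the radial cumulative
    distribution g(x) of the 3D standard normal distribution."""
    t = x * x
    return (1.0 - math.exp(-(GA * t + GB * t * t) /
                           (1.0 + GC * t + GD * t * t))) ** 1.5

def g_inverse(u):
    """Eq. (13): analytic inverse of the approximation g_approx."""
    if u == 0.0:
        return 0.0
    L = math.log(1.0 - u ** (2.0 / 3.0))
    p, q = GC * L + GA, GD * L + GB
    t = (math.sqrt(p * p - 4.0 * q * L) - p) / (2.0 * q)
    return math.sqrt(t)

def sample_normal_3d(u1, u2, u3):
    """Eq. (14): transform three uniform samples into one sample
    (x, y, z) from the 3D standard normal distribution."""
    radius = g_inverse(u1)
    cos_theta = 2.0 * u2 - 1.0
    sin_theta = math.sqrt(1.0 - cos_theta * cos_theta)
    return (radius * sin_theta * math.cos(2.0 * math.pi * u3),
            radius * sin_theta * math.sin(2.0 * math.pi * u3),
            radius * cos_theta)
```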

4 Numerical tests

In the present numerical tests, four sets of source samples are generated with two methods. In the first method, four uniform distributions, \(({\varvec{u}}_1,{\varvec{u}}_2,{\varvec{u}}_3,{\varvec{u}}_4)\), are generated from Eq. (1) and then shuffled by Eq. (2). In the second method, pseudo-random numbers, \(({\varvec{r}}_1,{\varvec{r}}_2,{\varvec{r}}_3,{\varvec{r}}_4)\), are generated with MT19937 (Matsumoto and Nishimura 1998). Then, by using these source samples, three sets of random samples from standard normal distributions are generated with the one-, two-, and three-dimensional inverse transform sampling methods shown in the previous section. Note that \({\varvec{u}}_4\) and \({\varvec{r}}_4\) are used for the two-dimensional inverse transform sampling (i.e., the Box–Muller transform) only.

Fig. 3
figure 3

Histograms of standard normal distributions with \(N_{sample}=10,000\) samples and \(N_{bin}=100\) bins with different source distributions and different inverse transform samplings. The solid curves correspond to the 1D standard normal distribution in Eq. (16). The samples “\({\varvec{h}}^{(1)}_{u,1}\)” are generated by the 1D inverse transform sampling in Eq. (4) with the source samples \({\varvec{u}}_1\). The samples “\({\varvec{h}}^{(2)}_{u,1}\)” and “\({\varvec{h}}^{(2)}_{u,2}\)” are generated by the 2D inverse transform sampling (i.e., the Box–Muller transform) in Eq. (9) with the source samples \({\varvec{u}}_1\) and \({\varvec{u}}_2\). The samples “\({\varvec{h}}^{(3)}_{u,1}\),” “\({\varvec{h}}^{(3)}_{u,2}\)” and “\({\varvec{h}}^{(3)}_{u,3}\)” are generated by the 3D inverse transform sampling in Eq. (14) with the source samples \({\varvec{u}}_1\), \({\varvec{u}}_2\) and \({\varvec{u}}_3\). The samples “\({\varvec{h}}_{r}\)” are generated with the source samples \({\varvec{r}}\)

Figure 3 shows an example of 1D standard normal distributions with \(N_{sample}=10,000\) samples. The bars show the histogram of the samples with 100 bins. The samples “\({\varvec{h}}^{(1)}_{u,1}\)” and “\({\varvec{h}}^{(1)}_{r,1}\)” are generated by the 1D inverse transform sampling in Eq. (4) with the source samples \({\varvec{u}}_1\) and \({\varvec{r}}_1\), respectively. The samples “\({\varvec{h}}^{(2)}_{u,1}\)” and “\({\varvec{h}}^{(2)}_{u,2}\)” are generated by the 2D inverse transform sampling (i.e., the Box–Muller transform) in Eq. (9) with the source samples \({\varvec{u}}_1\) and \({\varvec{u}}_2\). The samples “\({\varvec{h}}^{(2)}_{r,1}\)” and “\({\varvec{h}}^{(2)}_{r,2}\)” are generated by the 2D inverse transform sampling with the source samples \({\varvec{r}}_1\) and \({\varvec{r}}_2\). The samples “\({\varvec{h}}^{(3)}_{u,1}\),” “\({\varvec{h}}^{(3)}_{u,2}\)” and “\({\varvec{h}}^{(3)}_{u,3}\)” are generated by the 3D inverse transform sampling in Eq. (14) with the source samples \({\varvec{u}}_1\), \({\varvec{u}}_2\) and \({\varvec{u}}_3\). The samples “\({\varvec{h}}^{(3)}_{r,1}\),” “\({\varvec{h}}^{(3)}_{r,2}\)”and “\({\varvec{h}}^{(3)}_{r,3}\)” are generated by the 3D inverse transform sampling with the source samples \({\varvec{r}}_1\), \({\varvec{r}}_2\) and \({\varvec{r}}_3\).

The 1D histograms (\(h_i\)) of these distributions are evaluated by the standard deviation from the 1D standard normal distribution f(x),

$$\begin{aligned} \sigma = \sqrt{ \frac{1}{N_{bin}} \sum _{i=1}^{N_{bin}} \left\{ h_i - f(x_i) \right\} ^2 }, \end{aligned}$$
(15)

where \(N_{bin}\) denotes the number of bins for the histogram with

$$\begin{aligned} f(x) = \frac{N_{sample}}{\sqrt{2\pi }} \exp \left( -\frac{x^2}{2}\right) . \end{aligned}$$
(16)
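The evaluation of Eqs. (15)–(16) can be sketched as follows. The histogram counts are divided by the bin width so that \(h_i\) is directly comparable with \(f(x_i)\); this normalization, together with the histogram range and the use of Python's `random.gauss` as a stand-in sample generator, are our assumptions.

```python
import math
import random

def normal_deviation(samples, n_bin=100, x_range=(-5.0, 5.0)):
    """Eqs. (15)-(16): standard deviation of a width-normalized histogram
    from the reference profile f(x) = N/sqrt(2*pi) * exp(-x^2/2)."""
    n_sample = len(samples)
    lo, hi = x_range
    dx = (hi - lo) / n_bin
    counts = [0] * n_bin
    for v in samples:
        i = math.floor((v - lo) / dx)       # bin index; out-of-range dropped
        if 0 <= i < n_bin:
            counts[i] += 1
    total = 0.0
    for i in range(n_bin):
        x_mid = lo + (i + 0.5) * dx
        f = n_sample / math.sqrt(2.0 * math.pi) * math.exp(-0.5 * x_mid * x_mid)
        h = counts[i] / dx                  # count normalized by bin width
        total += (h - f) ** 2
    return math.sqrt(total / n_bin)         # Eq. (15)

rng = random.Random(7)
samples = [rng.gauss(0.0, 1.0) for _ in range(100000)]
sigma = normal_deviation(samples)
```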
Fig. 4
figure 4

Normalized deviation from the 1D standard normal distribution \(\sigma /N_{spb}\) as a function of the number of samples per bin \(N_{spb}\). Samples are generated with the source samples a \({\varvec{r}}\) and b \({\varvec{u}}\), respectively. The circle marks show the deviation of samples generated by the 1D inverse transform sampling in Eq. (4). The triangle marks show the deviation of samples generated by the 2D inverse transform sampling in Eq. (9). The square marks show the deviation of samples generated by the 3D inverse transform sampling in Eq. (14). The error bars show the maximum and minimum deviations

Figure 4 shows the ratio of deviation (\(\sigma /N_{spb}\)) of each sample as a function of \(N_{spb} \equiv N_{sample}/N_{bin}\). In panel (a), samples are generated with the source samples \({\varvec{r}}\), i.e., MT19937 (Matsumoto and Nishimura 1998). In panel (b), samples are generated with the source samples \({\varvec{u}}\), i.e., random permutations from uniform distributions. The circle marks show the deviation of samples generated by the 1D inverse transform sampling in Eq. (4). The triangle marks show the deviation of samples generated by the 2D inverse transform sampling (i.e., the Box–Muller transform) in Eq. (9). The square marks show the deviation of samples generated by the 3D inverse transform sampling in Eq. (14). The error bars show the maximum and minimum deviations.

It is clearly shown that the 1D inverse transform sampling with source samples of random permutations provides samples from the 1D standard normal distribution with a lower deviation. There is no difference in the deviation between the 2D and 3D inverse transform samplings with source samples of random permutations. The deviations of samples generated by the 1D/2D/3D inverse transform samplings with source samples of MT19937 are the same as those by the 2D/3D inverse transform samplings with source samples of random permutations. The deviation of samples from the 1D standard normal distribution decreases with the 0.5th power of the number of samples per bin.

The 2D histograms (\(h_{i,j}\)) of these distributions are then evaluated by the standard deviation from the 2D standard normal distribution f(xy) as well,

$$\begin{aligned} \sigma = \sqrt{ \frac{1}{{N_{bin}^2}} \sum _{i=1}^{N_{bin}} {\sum _{j=1}^{N_{bin}}} \left\{ h_{i,j} - f(x_i,y_j) \right\} ^2 }, \end{aligned}$$
(17)

with

$$\begin{aligned} f(x,y) = \frac{N_{sample}}{2\pi } \exp \left( -\frac{x^2+y^2}{2}\right) . \end{aligned}$$
(18)
Fig. 5
figure 5

Normalized deviation from the 2D standard normal distribution \(\sigma /N_{spb}\) as a function of the number of samples per bin \(N_{spb}\). Samples are generated with the source samples a \({\varvec{r}}\) and b \({\varvec{u}}\), respectively. The circle marks show the deviation of samples generated by the 1D inverse transform sampling in Eq. (4). The triangle marks show the deviation of samples generated by the 2D inverse transform sampling in Eq. (9). The square marks show the deviation of samples generated by the 3D inverse transform sampling in Eq. (14). The error bars show the maximum and minimum deviations

Figure 5 shows the ratio of deviation (\(\sigma /N_{spb}\)) of each sample as a function of \(N_{spb} \equiv N_{sample}/N_{bin}^2\) with the same format as in Fig. 4. The deviation of samples from the 2D standard normal distribution decreases with the 0.5th power of the number of samples per bin. The deviations of samples generated by the 1D/2D/3D inverse transform samplings with source samples of MT19937 are the same as those by the 1D/2D/3D inverse transform samplings with source samples of random permutations.

The 3D histograms (\(h_{i,j,k}\)) of these distributions are also evaluated by the standard deviation from the 3D standard normal distribution f(xyz),

$$\begin{aligned} \sigma = \sqrt{ \frac{1}{N_{bin}^3} \sum _{i=1}^{N_{bin}} \sum _{j=1}^{N_{bin}} \sum _{k=1}^{N_{bin}} \left\{ h_{i,j,k} - f(x_i,y_j,z_k) \right\} ^2 }, \end{aligned}$$
(19)

with

$$\begin{aligned} f(x,y,z) = \frac{N_{sample}}{\sqrt{2\pi }^3} \exp \left( -\frac{x^2+y^2+z^2}{2}\right) . \end{aligned}$$
(20)

Figure 6 shows the ratio of deviation (\(\sigma /N_{spb}\)) of each sample as a function of \(N_{spb} \equiv N_{sample}/N_{bin}^3\) with the same format as in Fig. 4. The results of the 3D histograms are comparable to those of the 2D histograms.

Fig. 6
figure 6

Normalized deviation from the 3D standard normal distribution \(\sigma /N_{spb}\) as a function of the number of samples per bin \(N_{spb}\). Samples are generated with the source samples a \({\varvec{r}}\) and b \({\varvec{u}}\), respectively. The circle marks show the deviation of samples generated by the 1D inverse transform sampling in Eq. (4). The triangle marks show the deviation of samples generated by the 2D inverse transform sampling in Eq. (9). The square marks show the deviation of samples generated by the 3D inverse transform sampling in Eq. (14). The error bars show the maximum and minimum deviations

The computing costs of the random number generators are measured on a single compute core of an Intel® Xeon® Gold 6342 processor with Intel® oneAPI ver. 2021.5.0. It takes 1.84 s to generate \(N=10^8\) samples from the 1D standard normal distribution with Eq. (4), 1.32 s to generate \(N=10^8\times 2\) samples from the 2D standard normal distribution with Eq. (9), and 2.33 s to generate \(N=10^8\times 3\) samples from the 3D standard normal distribution with Eq. (14). It also takes 0.18 s to generate \(N=10^8\) samples from an exact uniform distribution with Eq. (1), and 0.90 s to permute the \(N=10^8\) samples with Eq. (2). As a reference, it takes 4.28 s to generate \(N=10^8\) samples with the “random_number” subroutine of Fortran.

5 Conclusion

The inverse transform sampling methods are revisited for generating samples from standard normal distributions with source samples from uniform distributions. In the present study, the inverse cumulative distribution functions for standard normal distributions in one and three dimensions are approximated by new functions that are analytically invertible. The source samples are generated from uniform distributions by two methods. One is the Mersenne Twister (MT19937) (Matsumoto and Nishimura 1998) and the other is an exact uniform distribution with random permutations (shuffling).

It is shown that the deviations of samples from the 1D, 2D, and 3D standard normal distributions decrease with the 0.5th power of the number of samples per bin. The numerical tests have shown that there is no substantial difference among the deviations of samples from the standard normal distributions generated by 1D, 2D, and 3D inverse transform samplings with MT19937 and random permutations, suggesting that the new approximations of the 1D and 3D inverse cumulative distribution functions work well.

The conventional Box–Muller transform (Box and Muller 1958) (i.e., the 2D inverse transform sampling) uses four sets of samples from uniform distributions to generate three sets of samples from standard normal distributions. On the other hand, the present 1D and 3D inverse transform sampling methods use three sets of samples from uniform distributions to generate three sets of samples from standard normal distributions, which is computationally cheaper. The 1D and 3D inverse cumulative distribution functions proposed in the present study consist of “sqrt” and “log” functions, which are straightforward to code in C/C++ and Fortran.