
A New Test for Hamming-Weight Dependencies

Published: 25 July 2022


Abstract

We describe a new statistical test for pseudorandom number generators (PRNGs). Our test can find bias induced by dependencies among the Hamming weights of the outputs of a PRNG, even for PRNGs that pass state-of-the-art tests of the same kind from the literature, and particularly for generators based on F2-linear transformations such as the dSFMT [22], xoroshiro1024+  [1], and WELL512 [19].


1 INTRODUCTION

Pseudorandom number generators (PRNGs) are deterministic algorithms that generate a seemingly random output. A w-bit PRNG is defined by a state space S, a computable transition (or next-state) function \( \tau :S\rightarrow S \), and a computable output function \( \varphi :S\rightarrow \lbrace \,0,1\,\rbrace ^w \) that maps the state space into w-bit words. One then considers an initial state, or seed, \( \sigma \in S \), and computes the sequence of w-bit outputs \( \begin{equation*} \varphi (\sigma), \varphi (\tau (\sigma)), \varphi \bigl (\tau ^2(\sigma)\bigr), \varphi \bigl (\tau ^3(\sigma)\bigr), \dots \end{equation*} \) The outputs can be used to generate reals in the unit interval, for example, by multiplying them by \( 2^{-w} \). Knuth [8] discusses PRNGs at length.1

A classic example is given by multiplicative congruential generators, which are defined by a prime modulus \( \mu \) and a multiplier \( \alpha \). Then, \( S=\mathbf {Z}/\mu \mathbf {Z} \), \( \tau :x\mapsto \alpha x \), and the output function is given by the binary representation of x (one tries to choose \( \mu \) close to \( 2^w \)). Another well-known example is the class of \( \mathbf {F}_2 \)-linear generators [10], in which S is the vector space \( \mathbf {F}_2^n \) (i.e., n bits) and \( \tau \) is an \( \mathbf {F}_2 \)-linear transformation on S; usually, however, the transformation can be expressed by basic \( \mathbf {F}_2 \)-linear operations on words, such as rotations, shifts, and XORs, rather than in matrix form. The output function might pick a word of w bits from the state: for example, if \( n=kw \) for some k, the state can be represented by k w-bit words, and the output function can just choose one of them. In some generators, moreover, the output function is not \( \mathbf {F}_2 \)-linear.

Several theoretical properties help in the design of PRNGs: however, once designed, a PRNG is submitted to a set of statistical tests, which try to discover some statistical bias in the output of the generator. The tests compute, using the output of the generator, statistics whose distribution is known (at least approximately) under the assumption that the output of the generator is random. Then, by applying the (complementary) cumulative distribution function to the statistics, one obtains a p-value, which should be neither too close to zero nor too close to one (see Knuth [8] for a complete introduction to the statistical testing of PRNGs).

The Hamming weight of a w-bit word x is the number of ones in its binary representation. Tests for Hamming-weight dependencies try to discover some statistical bias in the Hamming weight of the output of the generator. In particular, such tests do not depend on the numerical values of the outputs: indeed, sorting the bits of each examined block (e.g., first all zeroes and then all ones) would not modify the results of the test (albeit the values would now be very small).

Since the number of ones in a random w-bit word has a binomial distribution with w trials and probability of success \( 1/2 \), in the most trivial instance one examines m consecutive outputs \( {x}_0 \), \( {x}_1 \), \( \dots \, \), \( {x}_{m-1} \) and checks that the average of their Hamming weights has the correct distribution (which is quickly approximated very well by a normal distribution as m grows [5]). Tests may also try to detect dependencies: for example, one can consider (overlapping) pairs of consecutive outputs and check that the associated pairs of Hamming weights have the expected distribution [11]. Matsumoto and Nishimura [17] have introduced a theoretical figure of merit that can predict after how many samples an \( \mathbf {F}_2 \)-linear generator will fail a specific Hamming-weight test. The NIST statistical test suite for PRNGs [20] contains tests based on Hamming-weight dependencies as well.

In this article, we introduce a new test for Hamming-weight dependencies that improves significantly over the state of the art. We find bias in some old and some new generators in which tests of this type from TestU01 [12], a well-known framework for statistical testing of PRNGs, were unable to detect any bias even using a large amount of output.

All code used in this article is available under the GNU General Public License.2 Code for reproducing the results of this work has been permanently stored on the Zenodo platform.3


2 MOTIVATION

It has been known since the early days of \( \mathbf {F}_2 \)-linear generators that sparse transition matrices induce some form of dependency on the Hamming weight of the state. Since the output is computed starting from the state, these dependencies might induce Hamming-weight dependencies in the output as well. For example, if the state has very low Hamming weight (i.e., very few ones), one might need a few iterations (or more than a million, in the case of the Mersenne Twister with 19,937 bits of state [16]) before the state contains ones and zeroes in approximately equal amounts. However, this is a minor problem because for generators with, say, at least 128 bits of state, the probability of passing through such states is negligible.

Yet what we witness very clearly in the case of an almost-all-zeros state might be true in general: states with few ones might lead to states with few ones, or, due to cancellation in XOR operations, states with many ones might lead to states with few ones. This kind of dependency is more difficult to detect.

Here we consider as motivation a few generators: xorshift128+ [25] is the stock generator of most JavaScript implementations in common browsers; the SFMT (SIMD-Friendly Mersenne Twister) [21] is a recent improvement on the classic Mersenne Twister using SIMD instructions, and we will use the version with 607 bits of state; the dSFMT [22] is another version of the Mersenne Twister that natively generates doubles, and we will use the version with 521 bits of state; WELL is a family of generators with excellent equidistribution properties [19], and we will use the version with 512 bits of state; and finally, we consider a new \( \mathbf {F}_2 \)-linear transformation, xoroshiro, designed by the authors, and the associated generators xoroshiro128+ and xoroshiro1024+ [1].4 All of these generators have quite sparse transition matrices (WELL512 having the densest matrix), and one would expect some kind of Hamming-weight dependency to appear.

To check whether this is true, we can test for such dependencies using TestU01 [12], a well-known framework for testing generators, which implements tests related to Hamming weights from L’Ecuyer and Panneton [11] and Rukhin et al. [20] (please refer to the TestU01 guide for a detailed description of the tests). Table 1 shows the basic parameters of the nine tests we performed. The parameters were inspired by the authors’ choices in the BigCrush test suite [12], but instead of analyzing \( 10^{9} \) or fewer outputs, as happens in BigCrush, we analyze up to \( 10^{13} \) 32-bit values (i.e., 40 TB of data), hoping that examining a much larger amount of data might help in finding bias in the generators. Besides the parameters inspired by BigCrush, following a suggestion from a referee, we also tried a number of power-of-two values for the L parameter of HammingIndep and HammingCorr.

Table 1.

Label   Test            Parameters
HW0     HammingWeight2  \( N=1 \), \( L=10^6 \)
HW1     HammingWeight2  \( N=1 \), \( L=10^7 \)
HW2     HammingWeight2  \( N=1 \), \( L=10^8 \)
HIk     HammingIndep    \( N=1 \), \( L=k \)
HCk     HammingCorr     \( N=1 \), \( L=k \)

  • Note: We consider the entire output of the generator (i.e., TestU01 parameters \( r=0 \), \( s=32 \)). The parameter d of HammingIndep has been set to zero. The parameter k varies among 30, 300, and 1,200 for HI and 30, 300, and 500 for HC, as in BigCrush. Moreover, in both cases, we tested k equal to 128, 256, 512, and 1,024 following a suggestion from a referee.

Table 1. Parameters for the Statistical Tests Related to Hamming Weights from TestU01 [12]


Some of the generators have a 64-bit output, but TestU01 expects generators to return 32-bit integers, so we tested the lower 32 bits, the upper 32 bits, and the upper and lower bits interleaved. (We do not discard any bits, as is possible in TestU01.) In the case of the dSFMT, there is a specific call to generate a 32-bit value.

The disappointing results are reported in Table 2: despite the very sparse nature of the transition matrices of these generators, and the very large amount of data, only problems with the SFMT (already using \( 10^9 \) values), xorshift128+ (using \( 10^{10} \) values), and xoroshiro128+ (using \( 10^{13} \) values) are reported.5 All other p-values at \( 10^{13} \) are within the range \( [0.01..0.99] \).6

Table 2.

Generator         \( n=10^{9} \)   \( n=10^{10} \)   \( n=10^{11} \)   \( n=10^{12} \)   \( n=10^{13} \)
SFMT (607 bits)   HI512 (I)
xorshift128+                       HI128 (U)
dSFMT (521 bits)
WELL512
xoroshiro128+                                                                            HI128 (I)
xoroshiro1024+
  • Note: The n parameter gives the number of 32-bit outputs examined. The tests have been run on the lower 32 bits of the output (“L”), the upper 32 bits (“U”), and interleaving the upper and lower bits (“I”). We report the first test failed by a generator, where failure is a p-value outside of the range \( [0.01..0.99] \).

Table 2. Results for the TestU01 [12] Statistical Tests Related to Hamming Weights



3 TESTING HAMMING-WEIGHT DEPENDENCIES

In this section, we introduce a new test for Hamming-weight dependencies that will find bias in all generators from Table 2. For the generators whose bias was detected by TestU01, the test will obtain similar or better p-values using an order of magnitude less data.

Let us denote with \( \nu x \) [9] the Hamming weight of a w-bit word x—that is, the number of ones in its binary representation. We would like to examine the output of a generator and find medium-range dependencies in the Hamming weights of its outputs.

For example, the current output might have an average weight (close to \( w/2 \)) with higher-than-expected probability if three outputs ago we have seen a word with average weight, or the current output might have a non-average weight (high or low) with higher-than-expected probability depending on whether four outputs ago we have seen average weight and five outputs ago we have seen non-average weight.

We will consider sequences of w-bit values (w even and not too small, say \( \ge 16 \)) extracted from the output of the generator. Usually, w will be the entire output of the generator, but it is possible to run the test on a subset of bits, break the generator output into smaller pieces fed sequentially to the test, or glue (a subset of) the generator output into larger pieces.

The basic idea of the test is to generate a vector whose coordinates should appear to be drawn from independent random variables with a standard normal distribution, given that the original sequence was random; to apply a unitary transformation, obtaining a transformed vector; and to derive a p-value using the fact that the coordinates of the transformed vector should still appear to be drawn from independent random variables with a standard normal distribution [24], given that the original sequence was random. The transform is designed in such a way as to make dependencies such as those we described emerge more clearly.

First of all, we must define the window we will be working on: we fix a parameter k and consider overlapping k-tuples of consecutive w-bit values (ideally, the number of bits of state should be less than kw). We will write \( \langle x_0, x_1, ..., x_{k-1}\rangle \) for such a generic k-tuple.

Now we need to classify outputs as “average” or “extremal” with respect to their Hamming weight. We thus consider an integer parameter \( \ell \le w/2 \) and the map \( \begin{equation*} x\stackrel{d}{\mapsto }{\left\lbrace \begin{array}{ll}0& \text{if $\nu x\lt w/2 -\ell $,}\\ 1& \text{if $w/2 -\ell \le \nu x\le w/2 + \ell $,}\\ 2& \text{if $\nu x\gt w/2 + \ell $.} \end{array}\right.} \end{equation*} \) In other words, we compute the Hamming weight of x and categorize x in three classes: left tail (before the \( 2\ell +1 \) most frequent weights), central (the \( 2\ell +1 \) central, most frequent weights), and right tail (after the \( 2\ell +1 \) most frequent weights). The standard choice for \( \ell \) is the integer such that the overall probability of the \( 2\ell +1 \) most frequent weights is closest to \( 1/2 \). For example, for \( w=32 \) we have \( \ell =1 \), whereas for \( w=64 \) we have \( \ell =2 \).

We thus get from the k-tuple \( \langle x_0, x_1, ..., x_{k-1}\rangle \) a signature \( \langle d(x_0), d(x_1), ..., d(x_{k-1})\rangle \) of k trits (base-3 digits), which we will identify with its value as a number in base 3: \( \begin{equation*} \sum _{i=0}^{k-1}d(x_i)3^{k-1-i}. \end{equation*} \)

Now, given a sequence of m w-bit values, for each signature s we compute the average number of ones in the word appearing after a k-tuple with signature s in the sequence. More precisely, a subsequence of the form \( \langle x_0, x_1, ..., x_k\rangle \) contributes \( \nu x_k \) to the average associated with the signature \( \langle d(x_0), d(x_1), ..., d(x_{k-1})\rangle \).

This bookkeeping can be easily performed using \( 3^k \) integer variables while streaming the generator output. For a large m, this procedure yields \( 3^k \) values with approximately normal distribution [5],7 which we normalize to a standard normal distribution; we denote the resulting row vector with \( \mathbf {v}=\langle v_0, v_1, \dots, v_{3^k-1}\rangle \).8,9

We now apply to \( \mathbf {v} \) a Walsh–Hadamard-like transform, multiplying \( \mathbf {v} \) by the k-th Kronecker power10 \( T_k \) of the unitary base matrix (1) \( \begin{equation} M=\left(\begin{matrix}\frac{1}{\sqrt 3} & \phantom{-}\frac{1}{\sqrt 2} & \phantom{-}\frac{1}{\sqrt 6}\\ \frac{1}{\sqrt 3} & \phantom{-}0 & -\frac{2}{\sqrt 6}\\ \frac{1}{\sqrt 3} & -\frac{1}{\sqrt 2} & \phantom{-}\frac{1}{\sqrt 6}\\ \end{matrix}\right). \end{equation} \) Assuming that \( T_k \) is indexed using sequences of trits as numerals in base 3, the transform can be implemented recursively in the same vein as the fast Walsh–Hadamard transform (or any transform based on Kronecker powers), as if we write \( \mathbf {v}= [\mathbf {v}^0\;\mathbf {v}^1\;\mathbf {v}^2] \), where \( \mathbf {v}^0 \), \( \mathbf {v}^1 \), and \( \mathbf {v}^2 \) are the three subvectors indexed by signatures starting with 0, 1, and 2, respectively, we have by definition \( T_0=1 \) and (2) \( \begin{equation} \begin{split} \mathbf {v} T_k=[\mathbf {v}^0\quad \mathbf {v}^1\quad \mathbf {v}^2]\left(\begin{matrix}\frac{1}{\sqrt 3}T_{k-1} & \phantom{-}\frac{1}{\sqrt 2}T_{k-1} & \phantom{-}\frac{1}{\sqrt 6}T_{k-1}\\ \frac{1}{\sqrt 3}T_{k-1} & \phantom{--}0 & -\frac{2}{\sqrt 6}T_{k-1}\\ \frac{1}{\sqrt 3}T_{k-1} & -\frac{1}{\sqrt 2}T_{k-1} & \phantom{-}\frac{1}{\sqrt 6}T_{k-1}\\ \end{matrix}\right)\\ =\left[\frac{1}{\sqrt 3}(\mathbf {v}^0 +\mathbf {v}^1 +\mathbf {v}^2)T_{k-1} \quad \frac{1}{\sqrt 2}(\mathbf {v}^0 -\mathbf {v}^2)T_{k-1} \quad \frac{1}{\sqrt 6}(\mathbf {v}^0 -2\mathbf {v}^1 +\mathbf {v}^2)T_{k-1} \right]. \end{split} \end{equation} \) A detailed C implementation of \( T_k \) will be described in Section 4.3.

We will denote the transformed vector by \( \mathbf {v}^{\prime }= \mathbf {v} T_k \), and we shall write \( v^{\prime }_i \) for the transformed values. Since \( T_k \) is unitary, the \( v^{\prime }_i \)’s must still appear to be drawn from a standard normal distribution, and we can thus compute a p-value for each of them. We combine p-values by dividing the indices of the vector \( \mathbf {v}^{\prime } \) into C categories \( {\mathscr{C}}_1 \), \( {\mathscr{C}}_2 \), \( \dots \, \), \( {\mathscr{C}}_{C} \) using the number of non-zero trits contained in their base-3 representation—that is, the number of non-zero trits in the associated signature: \( \mathscr{C}_j \), \( 1\le j\lt C \), contains indices with exactly j non-zero trits, whereas \( \mathscr{C}_C \) contains all remaining indices, whose base-3 representation has at least C non-zero trits (\( C\le k \); usually, \( C=\lfloor k/2\rfloor + 1 \)). We discard \( v^{\prime }_0 \).

Given a category, say of cardinality c, we thus have a p-value \( p_i \) for each \( v^{\prime }_i \) in the category; we then consider the minimum of the p-values, say \( \bar{p} \), and compute a category p-value by composing with the cumulative distribution function of the minimum of c independent uniform random variables in the unit interval [3, Equation (2.2.2)], obtaining the category p-value \( 1-(1-\bar{p})^c \). Finally, we take the minimum category p-value over all categories and apply again the same cumulative distribution function with parameter C, since we are taking the minimum over C categories: this yields the final p-value of the test. Formally, \( \begin{equation*} p = 1 - \left(1 - \min _{1\le j \le C}\left(1 - \left(1 - \min _{i\in \mathscr{C}_j} p_i \right)^{\left|\mathscr{C}_j\right|}\right)\right)^C. \end{equation*} \)

The point of the transform \( T_k \) is that while \( v_i \) represents the (normalized) average number of ones after k previous outputs with density pattern described by the trit representation of i, \( v^{\prime }_i \) represents a combination of the average number of ones after k previous outputs satisfying different constraints: in the end, a unitary transformation is just a change of coordinates.

As a simple example, let us write the index \( \bar{\imath } \) of a transformed value \( v^{\prime }_{\bar{\imath }} \) as a sequence of trits \( {t}_1 \), \( {t}_2 \), \( \dots \, \), \( {t}_{k} \). If the trits are all zero, looking at (2) one can see that we are just computing the normalized sum of all values, which is of little interest: indeed, we discard \( v^{\prime }_0 \).

On the contrary, if a single trit, say in position \( \bar{\jmath } \), is equal to 1, \( v^{\prime }_{\bar{\imath }} \) is given by the sum of all \( v_i \)’s in which the \( \bar{\jmath } \)-th trit of i is 0 (\( \bar{\jmath } \) steps before we have seen few ones) minus the sum of all \( v_i \)’s in which the \( \bar{\jmath } \)-th trit of i is 2 (\( \bar{\jmath } \) steps before we have seen many ones): if the Hamming weight of the output depends on the Hamming weight of the output \( \bar{\jmath } \) steps before, the value of \( v^{\prime }_{\bar{\imath }} \) will be biased. Intuitively, if the transition matrix is very sparse, we expect vectors with low or high Hamming weight to be mapped to vectors with the same property.

If instead a single trit in position \( \bar{\jmath } \) is equal to 2, we will detect a kind of bias in which the Hamming weight of the current value depends on whether the Hamming weight of the output \( \bar{\jmath } \) steps before was extremal or average: more precisely, whether the value is larger when the Hamming weight of the output \( \bar{\jmath } \) steps before was average and smaller when the Hamming weight of the output \( \bar{\jmath } \) steps before was extremal (and vice versa). Intuitively, we expect that the shift/rotate-XOR of states with a very small or very large number of ones will have a small number of ones (in the first case, by sparsity, in the second case, by cancellation).

More complex trit patterns detect more complex dependencies. The most interesting patterns, however, are usually those with few non-zero trits, as a zero trit acts as a “don’t care about that previous output”: this property is immediate from (2), as the first \( 3^{k-1} \) output values, which correspond to a “don’t care” value in the first position, are obtained by applying recursively \( T_{k-1} \) to the renormalized sum of \( \mathbf {v}^0 \), \( \mathbf {v}^1 \), and \( \mathbf {v}^2 \), thus combining the values associated with signatures identical except for the first trit.

This is also why we assemble p-values by categories: by putting indices with a higher chance of giving low p-values in small categories, the test becomes more sensitive.

3.1 Results

We ran tests with \( w=32 \) or \( w=64 \) and k ranging from 8 to 19, depending on the state size. We performed the tests incrementally—that is, for increasingly larger values of m—and stopped after a petabyte (\( 10^{15} \) bytes) of data or if we detected a p-value smaller than \( 10^{-20} \).

Table 3 reports some generators failing our test. All generators considered other than xorshift pass BigCrush, except for linearity tests [2, 4, 14] (called MatrixRank and LinearComp in TestU01). We report the faulty signature—that is, the pattern of dependencies that caused the low p-value: it provides interesting insights into the structure of the generator. Indeed, we can see that for generators that cycle through their state array, combining a small part of the state, the test can locate exactly the dependencies on those parts: for example, the Hamming weight of the output of xorshift1024 depends, not surprisingly, on the Hamming weight of the first and last words of the state.

Table 3.

Generator                           \( p=10^{-20} \) @                              Faulty Signature
xorshift128                         \( 8\times 10^8 \)                              00000021
xorshift128+                        \( 6\times 10^9 \)                              00000012 (transitional)
xorshift1024                        \( 6\times 10^8 \)                              2000000000000001
xorshift1024+                       \( 9\times 10^9 \)                              2000000000000001 (transitional)
xoroshiro128                        \( 1\times 10^{10} \)                           00000012
xoroshiro128+                       \( 5\times 10^{12} \)                           00000012
xoroshiro1024                       \( 5\times 10^{12} \)                           1100000000000001
xoroshiro1024+                      \( 4\times 10^{13} \)                           1100000000000001 (transitional)
Tiny Mersenne Twister (127 bits)    \( 8 \times 10^{13}\rightarrow \)               00000202 (\( w=32 \))
Mersenne Twister (521 bits)         \( 4\times 10^{10}\rightarrow \)                1000000100000000, 2000000100000000
Mersenne Twister (607 bits)         \( 4\times 10^{8}\rightarrow 4\times 10^{10} \) 1000000001000000000, 2000000001000000000
SFMT (607 bits)                     \( 4\times 10^{8} \)                            001000001000
dSFMT (521 bits)                    \( 6 \times 10^{12} \)                          1001000100100010
WELL512                             \( 3 \times 10^{15} \)                          2001002200000000
  • Note: We report the number of bytes generating a p-value smaller than \( 10^{-20} \). We report also the trit signature that caused the low p-value. Ranges (represented using the arrow symbol \( \rightarrow \)) appear when we tried several variants: a missing right extreme means that some instances did not fail the test within the 1-PB limit.

Table 3. Detailed Results of the Test Described in Section 3 for \( w=64 \)


First, we examine a xorshift generator [13] with 128 bits of state and its variant xorshift128+, which we discussed in Section 2. We can find bias in the latter using just 6 GB of data. Analogously, we find bias in a xorshift generator with 1,024 bits of state, and in the SFMT [21] with 607 bits of state using just 400 MB of data. On these extremely simple generators, the performance of the test is thus in line with that of the tests in TestU01.

However, once we turn to the other generators in Table 2, the situation is different: we can find bias in all generators, sometimes using an order of magnitude less data than in Table 2.

Our test can also find Hamming-weight dependencies in some generators of the Mersenne Twister family of small-to-medium size. First of all, we consider the 64-bit Tiny Mersenne Twister [23], which has 127 bits of state and a significantly more complex structure than the other generators in Table 3. Moreover, contrary to other members of the Mersenne Twister family, the output function of the Tiny Mersenne Twister contains a non-\( \mathbf {F}_2 \)-linear operation: a sum in \( \mathbf {Z}/2^{64}\mathbf {Z} \). To find the bias, we had to resort to a slightly more detailed analysis, using \( w=32 \) and breaking up the 64-bit output of the generator into two 32-bit words. We report a range of results because we tried a few parameters published by the authors.

We also analyzed the classic Mersenne Twister [16] at 521 and 607 bits. We used the library of Matsumoto and Nishimura [15] for the dynamic creation of Mersenne Twisters and generated eight different instances of each generator: this is why we report in Table 3 a range of values and multiple signatures. The 607-bit version performs much worse than the 521-bit version (in fact, all instances we tested failed even the classical Gap test from BigCrush). More importantly, however, we found huge variability in the test results depending on the parameters generated by the library: in some cases, the 607-bit Mersenne Twister performs in our test similarly to a xorshift128 generator, which has a simpler structure and a much smaller state.

Finally, we were able to find bias in WELL512 [19]. In this case, we noticed that the p-value was slowly drifting toward zero at about 1 PB of data, so we continued the test until it passed the threshold \( 10^{-20} \).

A comparison between Tables 2 and 3 clearly shows that our new test is significantly more powerful than the tests of the same kind available in TestU01, as it can detect bias on \( \mathbf {F}_2 \)-linear generators for which no such bias was previously detectable. In fact, to the best of our knowledge, this is the first time that Tiny Mersenne Twister, the dSFMT at 521 bits, and WELL512 fail a test that is not a linearity test.

It is worth noting that at the time of the first submission of this work, xoroshiro128+ did not present failures in Table 2. We were able to find a low p-value (\( \approx 3\times 10^{-11} \)) only by specifying the value 128 for the parameter L, as suggested by a referee. Larger values of L (e.g., 256, 300, ...) do not yield a failure. This is in sharp contrast with our test, where testing with \( k^{\prime }\gt k \) will preserve the failures found in dimension k, because if s is a failing k-dimensional signature, then \( 0^{k^{\prime }-k}s \) will be a failing \( k^{\prime } \)-dimensional signature, up to some small adjustments due to the different scaling to the standard normal distribution and the different size of categories. In other words, increasing the dimension of the test will not prevent it from detecting bias that was previously detectable at a lower dimension: the same does not happen for the HammingIndep test of TestU01.


4 IMPLEMENTATION DETAILS

We will now discuss some implementation details. To be able to perform our test in the petabyte range, it must be engineered carefully: in particular, the main loop enumerating the output of the generator and computing the values \( v_i \) must be as fast as possible. Counting the number of ones in a word can be performed using single-clock specialized instructions in modern CPUs. The \( v_i \)’s are stored in an array of \( 3^k \) elements indexed by the value of a signature as a numeral in base 3, as required by the recursive implementation of \( T_k \) (see Section 4.3). One can very easily keep track of the current trit signature value by using the update rule \( s\leftarrow \lfloor s / 3\rfloor + t\cdot 3^{k-1} \), where t is the next trit.

We can replace the division with the fixed-point computation \( \lfloor (\lceil 2^{32}/3\rceil s) / 2^{32}\rfloor \) (this strategy works up to \( k=19 \) using 64-bit integers), so by precomputing \( \lceil 2^{32}/3\rceil \) and \( 3^{k-1} \), the costly operations in the update of s can be reduced to two independent multiplications.

4.1 Small Counters

The main implementation challenge, however, is that of reducing the counter update area to improve the locality of access to the counters, and possibly making it fit into some level of the processor cache.11 In a naive implementation, we would need to use two “large” 64-bit values to store the number of appearances of signature s and the sum of Hamming weights of the following words. Instead, we will use a single “small” 32-bit value, with a fourfold space saving. In particular, we will use 13 bits for the counter and 19 bits for the summation. This is a good choice, as the largest Hamming weight for \( w=32 \) or \( w=64 \) is 64, so if the counter does not overflow, the summation will not, either.12

We fix a batch size and update the small values blindly through the batch. At the end of the batch, we update the large counters using the current values of the small counters and zero the latter ones. At the same time, we check that the sum of the small counters is equal to the batch size: if not, a counter overflowed. Otherwise, we continue with the next batch, possibly computing the transform and generating a p-value.

4.2 Batch Sizes

How large should a batch be? We prefer larger batches, as access to the large counters will be minimized, but too large a batch will overflow the small counters. This question is interesting, as it is related to the mean passage time distribution of the Markov chain having all possible signatures as states, and the probability of moving from signature s to signature \( \lfloor s / 3\rfloor + t\cdot 3^{k-1} \) given by the probability of observing the trit t. Let this probability be p for the central values (trit 1) and \( (1-p)/2 \) for the extremal values (trits 0 and 2). We are interested in the following question: given k, p, and a batch size B, what is the probability that a counter will overflow? This can be reduced to the following question: given k, p, and a batch size B, what is the probability that the Markov chain after B steps passes through the all-ones signature more than \( 2^{13} \) times?13 We want to keep this probability very low (say, \( 10^{-100} \)) so as not to interfere with the computation of the p-values from the test; moreover, in this way, if we detect a counter overflow we can simply report that we witnessed an event that cannot happen with probability greater than \( 10^{-100} \), given that the source is random: that is, a p-value.

Note that, in principle, we could use general results about Markov chains [6, Theorem 7.4.2] which state that in the limit, the number of passages is normally distributed with mean and variance related to those of the recurrence time distribution, which can, in turn, be computed symbolically using the Drazin inverse [7, 18].

Since, however, no explicit bound is known for the convergence speed of the preceding limit, we decided to compute exactly the mean passage time distribution for the all-ones signature. To do this, we model the problem as a further Markov chain with states \( x_{c,j} \), where \( 0\le c\le b \) (b is a given overflow bound) and \( 0\le j\lt k \).

The idea is that we will define transitions so that after u steps the probability of being in state \( x_{c,j} \) will be the probability that, after examining u w-bit values, our current trit signature has a maximal suffix of j trits equal to one and we have counted exactly c passages through the all-ones signature (b or more, when \( c=b \)); the value \( j=k-1 \) represents maximal suffixes both of length \( k-1 \) and of length k (we can lump them together, as receiving a one increases the passage count in both cases). We use an initial probability distribution in which all states with \( c\ne 0 \) have probability zero, and every state with \( c=0 \) has probability equal to the steady-state probability of a maximal suffix of length j, which implies that we are implicitly starting the original chain in the steady state. However, as argued in the work of Hunter [6], the initial distribution is essentially irrelevant in this context.

We now define the transitions so that the probability distribution of the new chain evolves in parallel with the distribution of passage times of the original chain (with the probability for more than b passages lumped together):

  • all states \( x_{c,j} \) have a transition with probability \( 1-p \) to \( x_{c,0} \);

  • all states \( x_{c,j} \) with \( j\lt k-1 \) have a transition with probability p to \( x_{c,j+1} \);

  • all states \( x_{c,k-1} \) with \( c\lt b \) have a transition with probability p to \( x_{c+1,k-1} \); and

  • there is a loop with probability p on \( x_{b,k-1} \).

It is easy to show that after u steps, the sum of the probabilities associated with the states \( x_{b,-} \) is exactly the probability of overflow of the counter associated with the all-ones signature. We thus iterate the Markov chain (squaring the transition matrix is possible only for small k) until, at some step B, we obtain a probability of, say, \( 3^{-k}\bar{p} \): we can then guarantee that, given that the source is random, running our test with batches of size B we can observe an overflow only with probability at most \( \bar{p} \).

This approach becomes unfeasible when we need to iterate the Markov chain more than, say, \( 10^7 \) times. At that point, however, we use a very good approximation: we apply a simple dynamic-programming scheme to the results for \( 10^6 \) steps to extend them to a larger number of steps. The idea is that if we know the probability \( q_{u,c} \) that the counter for the all-ones signature is c after u steps, then approximately \( \begin{align*} q_{u+v,c} &= \sum _{f + g = c}q_{u,f}\cdot q_{v,g} \qquad \text{for $0\le c \lt b$,}\\ q_{u+v,b} &= \sum _{f + g \ge b}q_{u,f}\cdot q_{v,g}. \end{align*} \) The approximation is due to the fact that the preceding equations implicitly assume that the Markov chain is reset to its steady-state distribution after u steps, but experiments at smaller sizes show that the error caused by this approximation is, as expected, negligible for large u. We thus initialize \( q_{u,c} \) for \( u=10^6 \) with exact data, and then iterate the preceding process to obtain the probabilities \( q_{2^hu,c} \). These probabilities are then combined in the same way to approximate the probabilities associated with every multiple of u; at that point, we can find the desired batch size by a binary search governed by the condition that the probability associated with the overflow bound b is below a suitable threshold (e.g., \( 10^{-100}/3^k \)).14
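One combination step of this scheme can be sketched as follows; the helper name and the layout (an array of \( b+1 \) probabilities whose last entry lumps all counts of b or more) are our assumptions:

```c
#include <stddef.h>

/* Sketch of one combination step of the dynamic-programming scheme
   above: given the (lumped) counter distributions after u and v steps,
   approximate the distribution after u + v steps, accumulating all
   counts with f + g >= b in the last entry. Name and layout are
   illustrative. */
void convolve_counts(const double *qu, const double *qv, double *quv, int b) {
    for (int c = 0; c <= b; c++) quv[c] = 0;
    for (int f = 0; f <= b; f++)
        for (int g = 0; g <= b; g++)
            quv[f + g < b ? f + g : b] += qu[f] * qv[g];
}
```

Applying the step with \( u=v \) repeatedly, h times, yields the probabilities \( q_{2^hu,c} \) mentioned in the text.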

In the end, we computed the ideal batch size as described earlier for \( 1\le k\le 19 \) and included the results in our code (e.g., when \( w=64 \), one obtains \( 15\times 10^3 \) for \( k=1 \), \( 23\times 10^6 \) for \( k=8 \), and \( 10^9 \) for \( k=16 \)). Combining all ideas described in this section, our test for Hamming-weight dependencies with parameters \( w=64 \) and \( k=8 \) can analyze a terabyte of output of a 64-bit generator in little more than 3 minutes on an Intel Core i7-8700B CPU @3.20 GHz. The \( k=16 \) test is an order of magnitude slower owing to the larger amount of memory accessed.

4.3 Implementing the Transform Tk

In Figure 1, we show an in-place, recursive C implementation of the transform \( T_k \) defined in Section 3. The code is similar to analogous code for the Walsh–Hadamard transform or similar transforms based on Kronecker powers.

Fig. 1. The code for the (in-place) transform described in Section 3. It should be invoked with sig equal to \( 3^{k-1} \).

The code assumes that the \( 3^k \)-dimensional vector \( \mathbf {v} \) is represented in the array v. The value associated with each signature is stored in a position equal to the signature (considered, as usual, as a base-3 numeral). In particular, the first \( 3^{k-1} \) values correspond to signatures of the form 0s, the following \( 3^{k-1} \) values to signatures of the form 1s, and the last \( 3^{k-1} \) values to signatures of the form 2s. The function transform() must be invoked on v with the additional parameter sig set to \( 3^{k-1} \).

We first note that if \( k=1 \), the function will execute the body of the for loop just once, resulting in the in-place multiplication of the 3-dimensional vector \( \mathbf {v} \) by the base matrix M, as expected.

In the general case, the code scans the three subarrays v, p1, and p2, each of length \( 3^{k-1} \), which, as discussed previously, correspond to signatures starting with 0, 1, and 2, respectively. In the notation of (2), these subarrays correspond to the subvectors \( \mathbf {v}^0 \), \( \mathbf {v}^1 \), and \( \mathbf {v}^2 \), and it is immediate that the three subvectors appearing in the final result of (2) are computed in place by the for loop. After that computation, the function applies recursively the transform of one dimension less to each of the three subarrays, again in place. By induction, we conclude that transform() correctly implements the transform \( T_k \) in place.
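Since Figure 1 is not reproduced here, the following sketch reconstructs the shape of the code from the description above. The 3×3 base matrix M is a placeholder (the actual matrix is defined in Section 3); only the recursive structure and indexing follow the text:

```c
#include <stdint.h>

/* Placeholder base matrix: the actual M is defined in Section 3 of the
   paper and is not reproduced here. */
static const int64_t M[3][3] = { { 1, 1, 1 }, { 1, 0, -1 }, { 1, -1, 1 } };

/* Sketch of the in-place recursive transform described in the text.
   v, p1, p2 are the subarrays of signatures starting with trit 0, 1,
   and 2; invoke with sig = 3^(k-1). */
void transform(int64_t *v, int64_t sig) {
    int64_t *const p1 = v + sig, *const p2 = v + 2 * sig;
    for (int64_t i = 0; i < sig; i++) {  /* combine the 3 subvectors by M */
        const int64_t a = v[i], b = p1[i], c = p2[i];
        v[i]  = M[0][0] * a + M[0][1] * b + M[0][2] * c;
        p1[i] = M[1][0] * a + M[1][1] * b + M[1][2] * c;
        p2[i] = M[2][0] * a + M[2][1] * b + M[2][2] * c;
    }
    if (sig > 1) {                       /* recurse on each third in place */
        transform(v,  sig / 3);
        transform(p1, sig / 3);
        transform(p2, sig / 3);
    }
}
```

For \( k=1 \) (sig equal to 1) the loop body runs once and multiplies the 3-dimensional vector by M, matching the base case discussed above; for larger k, the combine-then-recurse structure computes the Kronecker power \( M^{\otimes k} \) applied to v.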


5 CONCLUSION

We have described a new test for Hamming-weight dependencies based on a unitary transform. Properly implemented, the test is very powerful: for example, in a matter of hours, it finds bias in the dSFMT with 521 bits of state and in xoroshiro1024+; it even finds bias in WELL512, whose transition matrix is much denser, and in the Tiny Mersenne Twister. For these generators, no bias was previously known beyond linearity tests. In particular, the Hamming-weight tests in TestU01 [12], a state-of-the-art testing framework, are unable to find any bias in several generators of Table 3, whereas all those generators fail our test.

Our test is very effective on \( \mathbf {F}_2 \)-linear generators with relatively sparse transition matrices, particularly when wk is not smaller than the number of bits of state of the generator. In practice, the best results are obtained on generators with less than a few thousand bits of state.

Similar to linearity tests, a failure in our test is an indication of lesser randomness, but in general the impact will depend on the application. We do not expect dependencies like those in xorshift128+ or the SFMT to be pernicious, but they highlight a weakness of the associated \( \mathbf {F}_2 \)-linear transformation.


ACKNOWLEDGMENTS

The authors would like to thank Jeffrey Hunter for useful pointers to the literature about mean passage times in Markov chains and Pierre L’Ecuyer for a number of suggestions that improved the quality of the presentation.

Footnotes

  1. Here we are slightly simplifying the presentation: in general, the codomain of the output function can be an arbitrary finite set; moreover, depending on the detailed definition, the output sequence might start with \( \varphi (\tau (\sigma)) \).
  2. http://prng.di.unimi.it/.
  3. https://zenodo.org/badge/latestdoi/412112034.
  4. The generators combine two words of state using a sum in \( \mathbf {Z}/2^{64}\mathbf {Z} \).
  5. The latter failure emerged only after testing with additional values of the parameter L suggested by one of the referees.
  6. In two cases, we found p-values slightly outside this range, but a test using \( 1.2\times 10^{13} \) values showed that they were statistical flukes.
  7. In practice, one must size m depending on the number of signatures so that each signature has a sufficiently large number of associated samples. Implementations can quickly provide the user with a preview output using the value zero for random variables associated with non-appearing signatures, warning the user that final p-values too close to one might be artifacts.
  8. Breaking the generator output in smaller pieces obviously provides a finer analysis of the distribution of each piece, but a test with parameters k and w “covers” kw bits of output: if we instead analyze values made of \( w/2 \) bits, to cover kw bits of output we need to increase the length of tuples to 2k, with a quadratic increase of memory usage.
  9. There is also a transitional variant of the test: we see the sequence of w-bit values as a stream of bits, xor the stream with itself shifted forward by one bit, and run the test on the resulting w-bit values. In practice, we look for Hamming-weight dependencies between bit transitions.
  10. For a definition of the Kronecker product, see the work of Zhang [26, Section 4.3].
  11. When k is large, this is not possible, but we provide the option of improving memory access using large pages of the translation lookaside buffer where available.
  12. With a similar argument, when \( w=16 \) one can choose 14 bits for the counter and 18 bits for the summation.
  13. It is obvious that the all-ones signature has the highest probability in the steady-state distribution, and that by bounding its probability of overflow we obtain a valid bound for all other signatures as well.
  14. It is worth noting that based on the preceding computations, the normal approximation [6] is not very accurate even after a billion steps.

REFERENCES

  1. [1] David Blackman and Sebastiano Vigna. 2021. Scrambled linear pseudorandom number generators. ACM Transactions on Mathematical Software 47, 4 (2021), Article 36, 32 pages.
  2. [2] G. D. Carter. 1989. Aspects of Local Linear Complexity. Ph.D. Dissertation. University of London.
  3. [3] H. A. David and H. N. Nagaraja. 2004. Order Statistics. Wiley.
  4. [4] E. D. Erdmann. 1992. Empirical Tests of Binary Keystreams. Master’s Thesis. Department of Mathematics, Royal Holloway and Bedford New College, University of London.
  5. [5] Christian Hipp and Lutz Mattner. 2008. On the normal approximation to symmetric binomial distributions. Theory of Probability and Its Applications 52, 3 (2008), 516–523.
  6. [6] Jeffrey J. Hunter. 1983. Mathematical Techniques of Applied Probability, Volume 2, Discrete-Time Models: Techniques and Applications. Academic Press, New York, NY.
  7. [7] Jeffrey J. Hunter. 2008. Variances of first passage times in a Markov chain with applications to mixing times. Linear Algebra and Its Applications 429, 5 (2008), 1135–1162.
  8. [8] Donald E. Knuth. 1998. The Art of Computer Programming, Volume 2: Seminumerical Algorithms (3rd ed.). Addison-Wesley, Reading, MA.
  9. [9] Donald E. Knuth. 2011. The Art of Computer Programming, Volume 4A: Combinatorial Algorithms, Part 1. Addison-Wesley, Reading, MA.
  10. [10] Pierre L’Ecuyer and François Panneton. 2009. \( \mathbf {F}_2 \)-linear random number generators. In Advancing the Frontiers of Simulation, Christos Alexopoulos, David Goldsman, and James R. Wilson (Eds.). International Series in Operations Research & Management Science, Vol. 133. Springer US, 169–193.
  11. [11] Pierre L’Ecuyer and Richard Simard. 1999. Beware of linear congruential generators with multipliers of the form \( a=\pm 2^q\pm 2^r \). ACM Transactions on Mathematical Software 25, 3 (1999), 367–374.
  12. [12] Pierre L’Ecuyer and Richard Simard. 2007. TestU01: A C library for empirical testing of random number generators. ACM Transactions on Mathematical Software 33, 4 (2007), Article 22.
  13. [13] George Marsaglia. 2003. Xorshift RNGs. Journal of Statistical Software 8, 14 (2003), 1–6.
  14. [14] George Marsaglia and Liang-Huei Tsay. 1985. Matrices and the structure of random number sequences. Linear Algebra and Its Applications 67 (1985), 147–156.
  15. [15] Makoto Matsumoto and Takuji Nishimura. 1998. Dynamic creation of pseudorandom number generators. Monte Carlo and Quasi-Monte Carlo Methods 2000 (1998), 56–69.
  16. [16] Makoto Matsumoto and Takuji Nishimura. 1998. Mersenne Twister: A 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Transactions on Modeling and Computer Simulation 8, 1 (1998), 3–30.
  17. [17] Makoto Matsumoto and Takuji Nishimura. 2002. A nonempirical test on the weight of pseudorandom number generators. In Monte Carlo and Quasi-Monte Carlo Methods 2000: Proceedings of a Conference Held at Hong Kong Baptist University, Hong Kong SAR, China, Kai-Tai Fang, Harald Niederreiter, and Fred J. Hickernell (Eds.). Springer, Berlin, Germany, 381–395.
  18. [18] Carl D. Meyer, Jr. 1975. The role of the group generalized inverse in the theory of finite Markov chains. SIAM Review 17, 3 (1975), 443–464.
  19. [19] François Panneton, Pierre L’Ecuyer, and Makoto Matsumoto. 2006. Improved long-period generators based on linear recurrences modulo 2. ACM Transactions on Mathematical Software 32, 1 (2006), 1–16.
  20. [20] Andrew Rukhin, Juan Soto, James Nechvatal, Miles Smid, Elaine Barker, Stefan Leigh, Mark Levenson, et al. 2001. A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications. NIST Special Publication 800-22 (with revisions dated May 15, 2001). National Institute of Standards and Technology.
  21. [21] Mutsuo Saito and Makoto Matsumoto. 2008. SIMD-oriented fast Mersenne Twister: A 128-bit pseudorandom number generator. In Monte Carlo and Quasi-Monte Carlo Methods 2006, Alexander Keller, Stefan Heinrich, and Harald Niederreiter (Eds.). Springer, 607–622.
  22. [22] Mutsuo Saito and Makoto Matsumoto. 2009. A PRNG specialized in double precision floating point numbers using an affine transition. In Monte Carlo and Quasi-Monte Carlo Methods 2008, Pierre L’Ecuyer and Art B. Owen (Eds.). Springer, Berlin, Germany, 589–602.
  23. [23] Mutsuo Saito and Makoto Matsumoto. 2015. Tiny Mersenne Twister (Version 1.1). Retrieved April 2, 2022 from http://www.math.sci.hiroshima-u.ac.jp/m-mat/MT/TINYMT/.
  24. [24] Y. L. Tong. 2012. The Multivariate Normal Distribution. Springer, New York, NY.
  25. [25] Sebastiano Vigna. 2016. Further scramblings of Marsaglia’s xorshift generators. Journal of Computational and Applied Mathematics 315 (2016), 175–181.
  26. [26] F. Zhang. 2011. Matrix Theory: Basic Results and Techniques. Springer, New York, NY.


Published in

ACM Transactions on Modeling and Computer Simulation, Volume 32, Issue 3 (July 2022), 119 pages
ISSN: 1049-3301
EISSN: 1558-1195
DOI: 10.1145/3514182

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 25 July 2022
        • Online AM: 28 March 2022
        • Revised: 1 March 2022
        • Accepted: 1 March 2022
        • Received: 1 April 2021
