1 Introduction

A common property of cryptographic primitives in the domain of public-key cryptography (PKC) is that there is, in most cases, a natural distinction between a secret-key holder (SKH) and a public-key holder (PKH). For instance, in the digital signature (DS) context the SKH is the signer, and in public-key encryption (PKE) the SKH is the receiver; the verifier and the sender, respectively, are PKHs. The security properties of such schemes are typically focused on protecting primarily the SKH: In the signature context, unforgeability means that the signer cannot be impersonated by an adversary, and security notions for PKE require that messages encrypted to the receiver remain confidential. Thus, naturally, the SKH has a vital interest in its keys being properly generated, i.e., in a way covered by the security model, while this is only of secondary importance to the PKH.

In some PKC applications, however, parties not holding the secret key might also require assurance that the key material has been generated properly. Typical examples arise in multi-party settings where the SKH manages a set of mutually distrusting parties who require protection from each other. For instance, in group signature schemes there is a group manager that issues certificates to registered parties, allowing them to sign messages on behalf of the whole group. While the resulting signatures should in principle be anonymous (cannot be linked to the particular signer), to prevent misuse there is often a traceability feature that allows the group manager to revoke the anonymity of a signer by creating a publicly-verifiable non-interactive proof that testifies that an indicated signer created a particular signature. If such a tracing option exists, the group manager should however not be able to falsely accuse a member of having signed some document. Many group signature schemes have been proposed in the past, but some of them (e.g., [1]) provably provide the latter property only if the group manager’s keys are properly formed. Other settings where trust in the secret keys generated by other parties is required include e-cash [13], cryptographic accumulators [9], undeniable signatures [18], and double-authentication preventing signatures [2, 27].

If a cryptographic scheme is solely based on the discrete logarithm problem (DLP) in a prime-order group, checking that keys of the type \(X=g^x\) are well-formed is a trivial job (because all keys are well-formed). In the RSA setting the situation is more subtle: Given parameters \((N,e)\), before assuming the security of the system the PKH might want to be convinced that the following questions can be answered affirmatively: (1) does N have precisely two prime divisors, (2) is N square-free, (3) is e coprime to \(\varphi (N)\), i.e., is the mapping \(m\mapsto m^e\bmod N\) a bijection (rather than lossy). Further, in some settings it might be necessary to know (4) whether \(N=pq\) is a safe-prime modulus, i.e., whether \((p-1)/2\) and \((q-1)/2\) are themselves prime. In settings specifically based on the hardness of factoring an additional question might be (5) whether squaring is a bijection on \(\mathbf {QR}(N)\), more specifically (6) whether N is a Blum integer, and even more specifically (7) whether N is a Rabin–Williams integer.

What are known approaches for convincing participants of the validity of predicates like the ones listed above? In some research papers corresponding arguments are just missing [1], or they are side-stepped by explicitly assuming honesty of key generation in the model [2]. Other papers refer to works like [10] that propose non-interactive proof systems for convincing verifiers of the validity of such relations. Concretely, [10] provides a NIZK framework for showing that an RSA number is the product of two safe primes. While powerful, the NIZK technique turns out to be practically not usable: The argument is over the intermediate results of four Miller–Rabin tests, a large number of range tests, etc., making the resulting proof string prohibitively long. Another approach is to pick prime numbers, moduli, and exponents in a certain way such that showing specific properties becomes feasible with number-theoretic techniques. Working with restricted parameter classes might however remove standard conformance and render implementations less efficient; for instance, the authors of [23] develop tools for showing that the mapping \(m\mapsto m^e\) is a permutation, but these tools work only for fairly large values of e.

A third approach is tightly connected with the number-theoretic structures that motivate the requirements for the conditions listed above. (It is less general than the NIZK approach of [10] but usually does not require picking parameters in a specific way.) For instance, if an application of RSA requires that e be coprime to \(\varphi (N)\) then this is for a specific reason, namely that information shall not be lost (but remain recoverable) when raising it to the power of e. Thus, instead of abstractly checking the \(\gcd (e,\varphi (N))=1\) relation, a corresponding check could be centered precisely around the information-loss property of the exponentiation operation. Our results are based on this strategy. Our techniques are inspired by, and improve on, prior work that we describe in detail in the following.

1.1 Interactive Zero-Knowledge Testing of Certain Relations

We reproduce results of Gennaro et al. [19]. As a running example, consider the question of whether \(e\not \mid \varphi (N)\) holds, where N is an RSA modulus and e a small prime exponent. The relation holds if and only if the mapping \(x\mapsto x^e \bmod N\) is bijective, which is characterized by all \(y\in \mathbb {Z}_N^*\) having an eth root. This motivates an (interactive) protocol in which a prover convinces a verifier of relation \(e\not \mid \varphi (N)\) by first letting the verifier pick a random value \(y\in \mathbb {Z}_N^*\) and send it to the prover, then letting the prover (who knows the factorization of N) compute the eth root \(x\in \mathbb {Z}_N^*\) of y and return it to the verifier, and finally letting the verifier accept if and only if \(x^e=y\bmod N\). Prover and verifier may run multiple repetitions of this protocol, each time with a fresh challenge y. If the prover is able to return a valid response for each challenge, then the verifier is eventually convinced of the \(e\not \mid \varphi (N)\) claim. Indeed, if \(e\mid \varphi (N)\), then at most one in e elements of \(\mathbb {Z}_N^*\) has an eth root, so each repetition catches a cheating prover with probability at least \(1-1/e\), and several repetitions detect the fraud with high probability.
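The basic (not yet zero-knowledge) round described above can be sketched as follows. This is a toy illustration only: the function names and the parameters \(p=7\), \(q=11\), \(e\in \{3,13\}\) are assumptions chosen so that everything can be checked by brute force, and they do not come from [19].

```python
import math
import secrets


def units(N):
    """All elements of Z_N^* (brute force; only sensible for toy moduli)."""
    return [y for y in range(1, N) if math.gcd(y, N) == 1]


def honest_response(y, e, p, q):
    """Prover side: e-th root of y modulo N = p*q, using the factorization."""
    d = pow(e, -1, (p - 1) * (q - 1))  # raises ValueError if gcd(e, phi(N)) != 1
    return pow(y, d, p * q)


def verifier_accepts(x, y, e, N):
    """Verifier side: accept iff x^e = y (mod N)."""
    return pow(x, e, N) == y % N


def run_protocol(e, p, q, rounds=16):
    """Repeat the challenge-response round with fresh random challenges."""
    N = p * q
    group = units(N)
    for _ in range(rounds):
        y = secrets.choice(group)          # verifier's challenge
        x = honest_response(y, e, p, q)    # prover's response
        if not verifier_accepts(x, y, e, N):
            return False
    return True


def solvable_fraction(e, N):
    """Pair (number of elements that have an e-th root, size of Z_N^*)."""
    group = units(N)
    return len({pow(x, e, N) for x in group}), len(group)
```

For \(N=77\) we have \(\varphi (N)=60\). With \(e=13\) every challenge has a root, whereas with \(e=3\) only 20 of the 60 elements of \(\mathbb {Z}_{77}^*\) are cubes, so a random challenge is unanswerable with probability 2/3.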

Note that if the protocol were deployed in precisely the way we described it, it would be of limited use. The reason is that it is not zero-knowledge; in particular, the prover would effectively implement an ‘eth root oracle’ for values y arbitrarily picked by the verifier, and this would likely harm the security of most applications. The proposal of [19] considers fixing this by making sure that challenges y are picked in a sufficiently random way. Concretely, the full protocol [19, Sect. 4.1] involves four message passes as follows: (1) the verifier picks \(y_1\in \mathbb {Z}_N^*\) and sends a commitment to this value to the prover, (2) the prover picks \(y_2\in \mathbb {Z}_N^*\) and sends this value to the verifier, (3) the verifier opens the commitment; both parties now compute \(y\leftarrow y_1y_2\), (4) the prover computes the eth root of y and sends it to the verifier. Unfortunately, the security analysis of [19] does not cover the full protocol; rather it restricts attention to only the last prover-to-verifier message and shows that it is zero-knowledge under the assumption that value y “can be thought as provided by a trusted third party” [19, Sect. 2.3]. We stress that a proof for the full four-message protocol is not immediate: Proving it zero-knowledge seems to require assuming an extractability property of the commitment scheme (so that the simulator can find ‘the right’ \(y_2\) value), and the increased interactiveness calls for a fresh analysis in a concurrent communication setting anyway (if the protocol shall be of practical relevance). Neither of these issues is mentioned, let alone resolved, in [19].
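The four message passes can be traced in a small simulation. The sketch below runs both roles in one function and uses a salted SHA-256 hash as the commitment; that choice of commitment, as well as all function names and the toy parameters, are assumptions made for illustration and are not prescribed by [19].

```python
import hashlib
import math
import secrets


def random_unit(N):
    """Sample a uniform element of Z_N^* by rejection."""
    while True:
        y = secrets.randbelow(N - 1) + 1
        if math.gcd(y, N) == 1:
            return y


def commit(value, N):
    """Hypothetical hash-based commitment: returns (commitment, opening nonce)."""
    nonce = secrets.token_bytes(16)
    data = nonce + value.to_bytes((N.bit_length() + 7) // 8, "big")
    return hashlib.sha256(data).digest(), nonce


def opens_correctly(com, nonce, value, N):
    """Check that (nonce, value) is a valid opening of the commitment."""
    data = nonce + value.to_bytes((N.bit_length() + 7) // 8, "big")
    return hashlib.sha256(data).digest() == com


def four_message_round(e, p, q):
    """One run of the four-message flow with an honest prover and verifier."""
    N = p * q
    y1 = random_unit(N)
    com, nonce = commit(y1, N)                 # (1) verifier -> prover: commitment
    y2 = random_unit(N)                        # (2) prover -> verifier: y2
    if not opens_correctly(com, nonce, y1, N): # (3) verifier opens, prover checks
        return False
    y = (y1 * y2) % N                          # both parties derive the challenge
    d = pow(e, -1, (p - 1) * (q - 1))
    x = pow(y, d, N)                           # (4) prover -> verifier: e-th root
    return pow(x, e, N) == y                   # verifier's final check
```

The simulation only illustrates the message flow; it says nothing about the zero-knowledge or concurrency issues raised above.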

1.2 Our Results

We construct practical protocols for convincing a verifier that certain relevant number-theoretic properties hold for RSA parameters. This includes statements on the number of prime factors of the modulus, its square-freeness, etc. Concretely, we propose two generic protocol frameworks that can be instantiated to become proof systems for many different relations: The first framework is based on [19] and effectively compresses the first three messages of the full protocol into a single one by, intuitively speaking, using a random oracle to implement the mentioned trusted third party. Precisely, continuing our running example, we let the verifier only specify a random seed r and let both parties derive value y as per \(y\leftarrow H(r)\) via a random oracle. The random oracle model turns out to be strong enough to make the full protocol sound and zero-knowledge. Because of the reduced number of message passes, concurrency is not an issue.
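For the running example, the idea of deriving the challenge as \(y\leftarrow H(r)\) might look as follows. The sketch instantiates H heuristically with SHA-256 plus a counter and rejection sampling into \(\mathbb {Z}_N^*\); this instantiation, the helper names, and the toy modulus are assumptions for illustration, not the construction analysed in this article. Reducing a 256-bit digest modulo N is only adequate for small toy moduli.

```python
import hashlib
import math
import secrets


def hash_to_units(seed: bytes, N: int) -> int:
    """Heuristic stand-in for a random oracle H mapping seeds into Z_N^*."""
    ctr = 0
    while True:
        digest = hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        y = int.from_bytes(digest, "big") % N
        if math.gcd(y, N) == 1:  # rejects y = 0 and non-units
            return y
        ctr += 1


def verifier_challenge() -> bytes:
    """The verifier only sends a random seed r."""
    return secrets.token_bytes(16)


def prover_respond(r: bytes, e: int, p: int, q: int) -> int:
    """Prover derives y = H(r) and returns its e-th root modulo N = p*q."""
    N = p * q
    y = hash_to_units(r, N)
    d = pow(e, -1, (p - 1) * (q - 1))
    return pow(y, d, N)


def verifier_check(r: bytes, x: int, e: int, N: int) -> bool:
    """Verifier recomputes y = H(r) and accepts iff x^e = y (mod N)."""
    return pow(x, e, N) == hash_to_units(r, N)
```

Since the prover never sees a challenge chosen directly by the verifier, it no longer acts as a root oracle for arbitrary values.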

The second framework is similar in spirit but uses the random oracle in a different and novel way. Here, the challenge y can be freely picked by the verifier (no specific distribution is required), the prover again computes the eth root x of it, but instead of sharing x with the verifier it only discloses the hash H(x) of it. Note that, unless the verifier knows value x anyway, if H behaves like a random oracle then the hash value does not leak anything.
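A minimal sketch of this second idea for the running example is given below, assuming that the verifier knows a root of its own challenge because it generated the challenge as \(y=x^e\). SHA-256 stands in for the random oracle, and all names and toy parameters are hypothetical.

```python
import hashlib
import math
import secrets


def H(x: int, N: int) -> bytes:
    """Heuristic stand-in for the random oracle, applied to a solution x."""
    return hashlib.sha256(x.to_bytes((N.bit_length() + 7) // 8, "big")).digest()


def random_unit(N: int) -> int:
    """Sample a uniform element of Z_N^* by rejection."""
    while True:
        x = secrets.randbelow(N - 1) + 1
        if math.gcd(x, N) == 1:
            return x


def verifier_start(e: int, N: int):
    """Verifier picks x, keeps it as state, and sends the challenge y = x^e."""
    x = random_unit(N)
    return x, pow(x, e, N)


def prover_respond(y: int, e: int, p: int, q: int) -> bytes:
    """Prover computes the e-th root of y but discloses only its hash."""
    N = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return H(pow(y, d, N), N)


def verifier_finish(state_x: int, response: bytes, N: int) -> bool:
    """Verifier accepts iff the response is the hash of the root it knows."""
    return response == H(state_x, N)
```

In this flow the response reveals nothing new to a verifier that already knows x, which matches the intuition stated above.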

We highlight that the second protocol has two important advantages over the first: (1) The first protocol requires a random oracle that maps into the ‘problem space’ (here: challenge space \(\mathbb {Z}_N^*\)). However, for some number-theoretic tests, e.g., whether N is a Blum integer, the problem space we (and [19]) work with is \(\mathbf {QR}(N)\), i.e., the set of quadratic residues modulo N, and for such spaces it is unclear how to construct a random oracle mapping into them. Note that, in contrast, the second protocol does not require hashing into any particular set. (2) Some number-theoretic relations allow for an easier check when the second framework is used. For instance, identifying Blum integers involves the prover computing the four square roots that quadratic residues always have. In the first protocol framework, returning all four square roots is prohibitive as this would immediately allow for factorizing N. In the second framework, however, hash values of all square roots can be returned without doing harm to security.
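The remark that disclosing all four square roots allows factoring N can be checked on a toy Blum integer. The following brute-force sketch uses \(N=77=7\cdot 11\) (both primes are congruent to 3 modulo 4); the function names are illustrative assumptions.

```python
import math


def square_roots(y, N):
    """All square roots of y in Z_N^* (brute force; toy moduli only)."""
    return [x for x in range(1, N)
            if math.gcd(x, N) == 1 and (x * x) % N == y % N]


def factor_from_roots(roots, N):
    """Given roots a, b of the same square with a != +-b, gcd(a - b, N) splits N."""
    for a in roots:
        for b in roots:
            if a != b and (a + b) % N != 0:
                g = math.gcd(a - b, N)
                if 1 < g < N:
                    return g
    return None
```

For instance, the square roots of 4 modulo 77 are 2, 9, 68, and 75, and \(\gcd (2-9,77)=7\) is a nontrivial factor. Hashes of the four roots, in contrast, do not support this gcd computation.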

Please consider Sect. 5 for the full list of number-theoretic properties for which we provide a proof system.

1.3 Related Work

We note that techniques similar to ours appear implicitly or explicitly in a couple of prior works. For instance, the validation of RSA parameters is a common challenge in password-based key agreement; in particular, adversaries might announce specially crafted parameters \((N,e)\) that would lead to partial password exposure. The work of Zhu et al. [34] addresses this, but without a formal analysis, by pursuing approaches that are similar to our second protocol instantiated with one particular number-theoretic relation. The work of [28] provides a security analysis of [34]. (It seems, however, that the analysis is incomplete: The output length of the hash function does not appear in the theorem statement, but for short output lengths the statement is obviously wrong.) We conclude by noting that both [28, 34], and also a sequence of follow-up works in the domain of password-based key agreement, employ variants of our two protocols in an ad-hoc fashion, and not at the generic level and for the large number of number-theoretic problems as we do.

A higher level of abstraction, also in the domain of password-based key agreement, can be found in the work of Catalano et al. [11]. Their work considers exclusively our first approach. Further, while considering soundness and zero-knowledge definitions for language problems, their constructions are not on that level but directly targeting specific number-theoretic problems.

Considering proof systems not relying on random oracles, basically any desired property of an RSA modulus can be proven by employing general zero-knowledge proof systems for NP languages [8, 20, 21]. However, these protocols are usually less efficient than proof systems designed to establish a particular property. Thus a large number of papers provide systems of the latter type. Targeted properties include that an RSA modulus N has factors of approximately equal size [6, 12, 16, 17, 24] or is the product of two safe primes [10]. The approach of having the prover provide solutions to number-theoretic problems is taken in several proof systems. Concretely, there are protocols of this type proving that N is square-free [7, 19], has at most two prime factors [5, 19, 25, 29], satisfies a weakened definition of Blum integer [5, 29], or is the product of two almost strong primes [19]. A shortcoming common to the protocols deciding whether N has at most two prime factors is that they either have a two-sided error or have to impose additional restrictions on N, the former leading to an increased number of repetitions of the protocol in order to achieve security, the latter to artificially restricted choices of N.

Bellare and Yung [3] show that any trapdoor permutation can be certified, i.e., they provide a protocol to prove that a function is invertible on an overwhelming fraction of its range. Kakvi et al. [23] show that given an RSA modulus N and an exponent e such that \( e\ge N^{1/4} \) Coppersmith’s method can be used to efficiently determine whether the RSA function \( x\mapsto x^e \) defines a permutation on \( \mathbb {Z}_N^*\). However, their result does not apply to exponents of size smaller than \( N^{1/4}\). A proof for RSA key generation with verifiable randomness is given in [22]. The protocol makes use of the protocols of [7, 29] as subroutines and relies on a trusted third party. Benhamouda et al. [4] provide a protocol proving in the random oracle model that at least two of the factors of a number N were generated using a particular prime number generator. However, in order to achieve security the construction requires N to be the product of many factors, which usually is prohibitive in the RSA setting.

We note that a topic in cryptography somewhat connected to our work is the fraudulent creation of parameters. More specifically, the works in [30,31,32,33] consider Kleptography, i.e., the creation of asymmetric key pairs by an adversary-modified generation algorithm such that, using a trapdoor, the adversary can recover the secret key from the public key. Preventing such attacks is not the goal of our work, and our protocols will indeed not succeed in catching properly performed Kleptography.

By nothing-up-my-sleeves (NUMS) parameter generation one subsumes techniques to propose parameters for cryptosystems in an explainable and publicly reproducible way. For instance, the internal constants of the hash functions of the SHA family are derived from the digits of the square and cube roots of small prime numbers, making the existence of trapdoors (e.g., for finding collisions) rather unlikely. While we do not advise against NUMS techniques, we note that using them restricts freedom in parameter generation and thus might break standard conformance and lead to less efficient systems. Moreover, and more relevantly in the context of our work, NUMS techniques typically apply to DL-based cryptosystems and not to RSA-based ones.

1.4 Organization

The overall focus of this work is on providing practical methods for proving certain properties of RSA-like parameter sets. Our interactive proof systems, however, follow novel design principles that promise finding application also outside of the number-theoretic domain. We thus approach our goal in a layered fashion, by first exposing our proof protocols such that they work for abstract formulations of problems and corresponding solutions, and then showing how these formalizations can be instantiated with the number-theoretic relations we are interested in.

More concretely, the structure of this article is as follows: In Sect. 2 we fix notation and recall some general results from number theory. In Sect. 3 we formulate a variant of the is-word-in-language problem and connect it to problems and solutions in some domain; we further introduce the concept of a challenge-response protocol for proving solutions of the word problem. In Sect. 4 we study two such protocols: Hash-then-Solve, which is inspired by the work of [19], and Solve-then-Hash, which is novel. Finally, in Sect. 5 we show how RSA-related properties can be expressed as instances of our general framework so that they become accessible by our proof systems.

2 Preliminaries

We fix notation and recall basic facts from number theory.

2.1 Notation

Parts of this article involve the specification of program code. In such code we use assignment operator ‘\(\leftarrow \)’ when the assigned value results from a constant expression (including from the output of a deterministic algorithm), and we write ‘\(\leftarrow _{\scriptscriptstyle \$}\)’ when the value is either sampled uniformly at random from a finite set or is the output of a randomized algorithm. In a security experiment, the event that some algorithm A outputs the value v is denoted with \(A\Rightarrow v\). In particular, \(\Pr [A\Rightarrow 1]\) denotes the probability, taken over the coins of A, that A outputs value 1. We use bracket notation to denote associative arrays (a data structure that implements a ‘dictionary’). For instance, for an associative array A the instruction \(A[7]\leftarrow 3\) assigns value 3 to memory position 7, and the expression \(A[2]=5\) tests whether the value at position 2 is equal to 5. Associative arrays can be indexed with elements from arbitrary sets. When assigning lists to each other, with ‘\(\mathop {\underline{ {x}}}\)’ we mark “don’t-care” positions. For instance, \((a,\mathop {\underline{ {x}}})\leftarrow (9,4)\) is equivalent to \(a\leftarrow 9\) (value 4 is discarded). We use the ternary operator known from the C programming language: If C is a Boolean condition and \(e_1,e_2\) are arbitrary expressions, the expression “\({C}\mathrel {\texttt {?}}{e_1}\mathrel {\texttt {:}}{e_2}\)” evaluates to \(e_1\) if C holds, and to \(e_2\) if C does not hold. We further use Iverson brackets to convert Booleans to numerical values. That is, writing “[C]” is equivalent to writing “\({C}\mathrel {\texttt {?}}{1}\mathrel {\texttt {:}}{0}\)”. If A is a randomized algorithm we write [A(x)] for the set of outputs it produces with non-zero probability if invoked on input x. 
If \(u,v\) are (row) vectors of values, \(u\!\parallel \!v\) denotes their concatenation, i.e., the vector whose first elements are those of u, followed by those of v. We use symbol \(\mathbin {\dot{\cup }}\) to indicate when the union of two sets is a disjoint union.

2.2 Number Theory

We write \(\mathbb {N}=\{1,2,3,\ldots \}\) and \(\mathbb {P}\subseteq \mathbb {N}\) for the set of natural numbers and prime numbers, respectively. For every natural number \(N\in \mathbb {N}\) we denote the set of prime divisors of N with \(\mathbb {P}(N)\). Thus, for any \( N\in \mathbb {N}\) there exists a unique family \((\nu _p)_{p\in \mathbb {P}(N)}\) of multiplicities \(\nu _p\in \mathbb {N}\) such that

$$\begin{aligned} N=\prod _{p\in \mathbb {P}(N)}p^{\nu _p}. \end{aligned}$$

We denote with \(\mathrm {odd}(N)\) the ‘odd part’ of N, i.e., what remains of N after all factors 2 are removed; formally, \(\mathrm {odd}(N)=\prod _{p\in \mathbb {P}(N),p\ne 2}p^{\nu _p}\).

Consider \( N\in \mathbb {N}\) and the ring \( \mathbb {Z}_N=\mathbb {Z}/N\mathbb {Z}\). The multiplicative group \( \mathbb {Z}_N^* \) of \( \mathbb {Z}_N \) has order \( \varphi (N)=\prod _{p\in \mathbb {P}(N)}(p-1)p^{\nu _p-1} \), where \( \varphi \) is Euler’s totient function. By the Chinese Remainder Theorem (CRT) there exists a ring isomorphism

$$\begin{aligned} \mathbb {Z}_N\cong \prod _{p\in \mathbb {P}(N)}\mathbb {Z}_{p^{\nu _p}}. \end{aligned}$$

For \(N,e\in \mathbb {N}\) consider the exponentiation mapping \(x\mapsto x^e\bmod N\). This mapping is 1-to-1 on \(\mathbb {Z}_N^*\) iff \(\gcd (e,\varphi (N))=1\). The general statement, which holds for all e and all N not divisible by 8 (in particular for all odd N, where every group \(\mathbb {Z}_{p^{\nu _p}}^*\) is cyclic), is that the exponentiation mapping is L-to-1 for

$$\begin{aligned} L=\prod _{p\in \mathbb {P}(N)}\gcd (e,\varphi (p^{\nu _p})). \end{aligned}\tag{1}$$
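The formula for L can be sanity-checked by brute force on small odd moduli. The sketch below compares the predicted value with the number of preimages actually observed; the helper names are illustrative, and trial-division factoring is only meant for toy inputs.

```python
import math


def factorize(N):
    """Prime factorization of N as {p: multiplicity}, by trial division."""
    factors = {}
    p = 2
    while p * p <= N:
        while N % p == 0:
            factors[p] = factors.get(p, 0) + 1
            N //= p
        p += 1
    if N > 1:
        factors[N] = factors.get(N, 0) + 1
    return factors


def predicted_L(N, e):
    """L as the product over prime powers p^nu of gcd(e, phi(p^nu))."""
    L = 1
    for p, nu in factorize(N).items():
        L *= math.gcd(e, (p - 1) * p ** (nu - 1))
    return L


def observed_preimage_counts(N, e):
    """Set of preimage counts of x -> x^e mod N over all images in Z_N^*."""
    counts = {}
    for x in range(1, N):
        if math.gcd(x, N) == 1:
            y = pow(x, e, N)
            counts[y] = counts.get(y, 0) + 1
    return set(counts.values())
```

For \(N=77\), for example, cubing is 3-to-1 and squaring is 4-to-1, while \(e=13\) yields a bijection; in each case every image has exactly \(L\) preimages.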

We write \(\mathbf {QR}(N)\) for the (group of) quadratic residues (i.e., squares) modulo N.

3 Challenge-Response Protocols for Word Problems

We define notions of languages, statements, witnesses, and a couple of algorithms that operate on such objects. We then introduce the notion of a challenge-response protocol for the word problem in such a setting.

3.1 Associating Problems with the Words of a Language

Statements, candidates, witnesses. Let \(\varSigma \) be an alphabet and let \(\mathcal {L}\subseteq \mathcal {U}\subseteq \varSigma ^*\) be languages. We assume that deciding membership in \(\mathcal {U}\) is efficient, while for \(\mathcal {L}\) this might not be the case. Each element \(x\in \varSigma ^*\) is referred to as a statement. A statement x is a candidate if \(x\in \mathcal {U}\). A statement x is valid if \(x\in \mathcal {L}\); otherwise, it is invalid. (Thus, in general there coexist valid and invalid candidates.) For all candidates x we assume a (possibly empty) set of witnesses \(\mathcal {W}_x\) such that valid candidates are characterized by having a witness: \(\forall x\in \mathcal {U}:|\mathcal {W}_x|\ge 1\iff x\in \mathcal {L}\).

Relating problems with candidates. For all candidates \(x\in \mathcal {U}\) let \(\mathcal {P}_x\) be a problem space and \(\mathcal {S}_x\) a solution space, where we require that deciding membership in \(\mathcal {P}_x\) is efficient. Let \({\mathcal {R}\! el }_x\subseteq \mathcal {P}_x\times \mathcal {S}_x\) be a relation that (abstractly) matches problems with solutions. For any problem \( P \in \mathcal {P}_x\) we write \({\mathcal {S}\! ol }_x( P ):=\{ S \mid ( P , S )\in {\mathcal {R}\! el }_x\}\subseteq \mathcal {S}_x\) for the set of its solutions. Not necessarily all problems are solvable, so we partition the problem space as \(\mathcal {P}_x=\mathcal {P}^+_x\mathbin {\dot{\cup }}\mathcal {P}^-_x\) such that precisely the elements of \(\mathcal {P}^+_x\) have solutions: \( P \in \mathcal {P}^+_x\iff |{\mathcal {S}\! ol }_x( P )|\ge 1\) and, equivalently, \( P \in \mathcal {P}^-_x\iff {\mathcal {S}\! ol }_x( P )=\emptyset \). We extend relation \({\mathcal {R}\! el }_x\) to \({\mathcal {R}\! el }^*_x:={\mathcal {R}\! el }_x\cup (\mathcal {P}^-_x\times \{\bot \})\) by marking problems without solution with the special value \(\bot \), and we extend notion \({\mathcal {S}\! ol }_x\) to \({\mathcal {S}\! ol }^*_x\) such that for all \( P \in \mathcal {P}^-_x\) we have \({\mathcal {S}\! ol }^*_x( P )=\{\bot \}\). We require that every candidate has at least one problem-solution pair: \(\forall x\in \mathcal {U}:|{\mathcal {R}\! el }_x|\ge 1\).

We assume four efficient algorithms, \(\mathrm {Verify}\), \(\mathrm {Sample}\), \(\mathrm {Sample}^*\), and \(\mathrm {Solve}\), that operate on these sets. Deterministic algorithm \(\mathrm {Verify}\) implements for all candidates the indicator function of \({\mathcal {R}\! el }\), i.e., decides whether a problem and a solution are matching. More precisely, \(\mathrm {Verify}\) takes a candidate \(x\in \mathcal {U}\), a problem \( P \in \mathcal {P}_x\), and a potential solution \( S \in \mathcal {S}_x\) for \( P \), and outputs a bit that indicates whether \(( P , S )\) is contained in \({\mathcal {R}\! el }_x\) or not. Formally, \(\forall x\in \mathcal {U},( P , S )\in \mathcal {P}_x\times \mathcal {S}_x:\mathrm {Verify}(x, P , S )=1\iff ( P , S )\in {\mathcal {R}\! el }_x\). Algorithm \(\mathrm {Sample}\) is randomized, takes a candidate \(x\in \mathcal {U}\), and outputs a (matching) problem-solution pair \(( P , S )\in {\mathcal {R}\! el }_x\). Algorithm \(\mathrm {Sample}^*\) is randomized, takes a candidate \(x\in \mathcal {U}\), and outputs a pair \(( P , S )\in {\mathcal {R}\! el }^*_x\) (note that \( S =\bot \) if \( P \in \mathcal {P}^-_x\)). Finally, deterministic algorithm \(\mathrm {Solve}\) takes a (valid) statement \(x\in \mathcal {L}\), a witness \(w\in \mathcal {W}_x\) for it, and a problem \( P \in \mathcal {P}_x\), and outputs the subset of \(\mathcal {S}_x\) that contains all solutions of \( P \). (If no solution exists, \(\mathrm {Solve}\) outputs the empty set.) Formally, \(\forall x\in \mathcal {L},w\in \mathcal {W}_x, P \in \mathcal {P}_x:\mathrm {Solve}(x,w, P )={\mathcal {S}\! ol }_x( P )\).
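To make the syntax concrete, the four algorithms can be written out for the running example, where a candidate is a pair \((N,e)\), a witness is the factorization \((p,q)\), problems and solutions are elements of \(\mathbb {Z}_N^*\), and a solution of P is an eth root of P. This instantiation is an assumption made here for illustration and is not necessarily the one used in Sect. 5; in particular, reusing Sample as Sample\(^*\) is a simplification that relies on all problems being solvable for valid candidates.

```python
import math
import secrets


def verify(x, P, S):
    """Indicator of Rel_x: 1 iff S is a unit and S^e = P (mod N)."""
    N, e = x
    return int(math.gcd(S, N) == 1 and pow(S, e, N) == P % N)


def sample(x):
    """Randomized: pick a solution S in Z_N^* and output the pair (S^e, S)."""
    N, e = x
    while True:
        S = secrets.randbelow(N - 1) + 1
        if math.gcd(S, N) == 1:
            return pow(S, e, N), S


# Assumption: every output of sample lies in Rel_x, hence also in Rel*_x.
sample_star = sample


def solve(x, w, P):
    """Deterministic: set of all e-th roots of P, for a valid candidate x."""
    N, e = x
    p, q = w
    d = pow(e, -1, (p - 1) * (q - 1))  # exists because the candidate is valid
    return {pow(P, d, N)}
```

For the valid toy candidate \((77,13)\) with witness \((7,11)\), every pair returned by `sample` passes `verify`, and `solve` recovers exactly the sampled solution.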

If we write \(\mathcal {P}=\bigcup \mathcal {P}_x\), \(\mathcal {S}=\bigcup \mathcal {S}_x\), \({\mathcal {R}\! el }=\bigcup {\mathcal {R}\! el }_x\), \({\mathcal {R}\! el }^*=\bigcup {\mathcal {R}\! el }^*_x\), \(\mathcal {W}=\bigcup \mathcal {W}_x\), where the unions are over all \(x\in \mathcal {U}\), a shortcut notation for the syntax of the four algorithms is

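$$\begin{aligned} \mathrm {Verify}&:\ \mathcal {U}\times \mathcal {P}\times \mathcal {S}\rightarrow \{0,1\} \\ \mathrm {Sample}&:\ \mathcal {U}\rightarrow {\mathcal {R}\! el } \\ \mathrm {Sample}^*&:\ \mathcal {U}\rightarrow {\mathcal {R}\! el }^* \\ \mathrm {Solve}&:\ \mathcal {L}\times \mathcal {W}\times \mathcal {P}\rightarrow 2^{\mathcal {S}} \end{aligned}$$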

Number of solutions, spectrum, solvable-problem density. Note that different problems \( P \in \mathcal {P}^+\) have, in general, different numbers of solutions. For any set \(\mathcal {M}\subseteq \mathcal {U}\) of candidates, the spectrum \(\#\mathcal {M}\) collects the cardinalities of the solution sets of all solvable problems associated with the candidates listed in \(\mathcal {M}\). Formally,

$$ \#\mathcal {M} :=\{|{\mathcal {S}\! ol }_x( P )|: x\in \mathcal {M}, P \in \mathcal {P}^+_x\}. $$

Consequently, \(\max \#\mathcal {L}\) is the largest number of solutions that solvable problems associated with valid candidates might have, and \(\min \#(\mathcal {U}\setminus \mathcal {L})\) is the smallest number of solutions of solvable problems associated with invalid candidates. Further, for a set \(\mathcal {M}\subseteq \mathcal {U}\) the solvable-problem density distribution \(\varDelta \mathcal {M}\), defined as

$$\begin{aligned} \varDelta \mathcal {M}:=\{|\mathcal {P}^+_x|/|\mathcal {P}_x|: x\in \mathcal {M}\}, \end{aligned}$$

indicates the fractions of problems that are solvable (among the set of all problems), for all candidates in \(\mathcal {M}\). Most relevant in this article are the derived quantities \(\min \varDelta \mathcal {L}\) and \(\max \varDelta (\mathcal {U}\setminus \mathcal {L})\).
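The spectrum and the solvable-problem density can be computed by brute force for the running example. In the sketch below a candidate is a pair \((N,e)\), problems are elements of \(\mathbb {Z}_N^*\), and solutions are eth roots; the toy candidate sets and all function names are assumptions chosen for illustration.

```python
import math
from fractions import Fraction


def solution_sets(N, e):
    """Map each solvable problem y in Z_N^* to its set of e-th roots."""
    sols = {}
    for s in range(1, N):
        if math.gcd(s, N) == 1:
            sols.setdefault(pow(s, e, N), set()).add(s)
    return sols


def spectrum(candidates):
    """#M: cardinalities of solution sets over all solvable problems."""
    return {len(roots)
            for (N, e) in candidates
            for roots in solution_sets(N, e).values()}


def density(candidates):
    """Delta M: fraction |P^+_x| / |P_x| for each candidate x = (N, e)."""
    result = set()
    for (N, e) in candidates:
        n_problems = sum(1 for y in range(1, N) if math.gcd(y, N) == 1)
        result.add(Fraction(len(solution_sets(N, e)), n_problems))
    return result


valid = [(77, 13), (77, 7)]    # gcd(e, phi(77)) = 1
invalid = [(77, 3), (77, 5)]   # e divides phi(77) = 60
```

With these toy sets, valid candidates have spectrum \(\{1\}\) and density \(\{1\}\), while the invalid ones have spectrum \(\{3,5\}\) and densities \(\{1/3,1/5\}\), so \(\min \#(\mathcal {U}\setminus \mathcal {L})=3\) and \(\max \varDelta (\mathcal {U}\setminus \mathcal {L})=1/3\).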

Uniformity notions for sampling algorithms. For the two sampling algorithms defined above we introduce individual measures of quality. For \(\mathrm {Sample}\) we say it is problem-uniform (on invalid candidates) if for all \(x\in \mathcal {U}\setminus \mathcal {L}\) the problem output by \(\mathrm {Sample}(x)\) is uniformly distributed in \(\mathcal {P}^+_x\). Formally, for all \(x\in \mathcal {U}\setminus \mathcal {L}, P '\in \mathcal {P}^+_x\) we require that

$$\begin{aligned} \Pr [( P ,\mathop {\underline{ {x}}})\leftarrow _{\scriptscriptstyle \$}\mathrm {Sample}(x): P = P ']= 1/|\mathcal {P}^+_x|. \end{aligned}$$

Further we say that \(\mathrm {Sample}\) is solution-uniform (on invalid candidates) if for all \(x\in \mathcal {U}\setminus \mathcal {L}\) and each pair \(( P , S )\) output by \(\mathrm {Sample}(x)\), solution \( S \) is uniformly distributed among all solutions for \( P \). Formally, we require that for all \(x\in \mathcal {U}\setminus \mathcal {L},( P ', S ')\in [\mathrm {Sample}(x)]\) we have

$$\begin{aligned} \Pr [( P , S )\leftarrow _{\scriptscriptstyle \$}\mathrm {Sample}(x): S = S '\mid P = P ']= 1/|{\mathcal {S}\! ol }_x( P ')|. \end{aligned}$$

For \(\mathrm {Sample}^*\) we say it is problem-uniform (on valid candidates) if for all \(x\in \mathcal {L}\) the problem output by \(\mathrm {Sample}^*(x)\) is uniformly distributed in \(\mathcal {P}_x\). Formally, for all \(x\in \mathcal {L}, P '\in \mathcal {P}_x\) we require that

$$\begin{aligned} \Pr [( P ,\mathop {\underline{ {x}}})\leftarrow _{\scriptscriptstyle \$}\mathrm {Sample}^*(x): P = P ']= 1/|\mathcal {P}_x|. \end{aligned}$$

Further we say that \(\mathrm {Sample}^*\) is solution-uniform (on valid candidates) if for all \(x\in \mathcal {L}\) and each pair \(( P , S )\) output by \(\mathrm {Sample}^*(x)\), the solution \( S \) is uniformly distributed among all solutions of \( P \) (if a solution exists at all, i.e., if \( S \ne \bot \)). Formally, we require that for all \(x\in \mathcal {L},( P ', S ')\in [\mathrm {Sample}^*(x)]\) we have

$$\begin{aligned} \Pr [( P , S )\leftarrow _{\scriptscriptstyle \$}\mathrm {Sample}^*(x): S = S '\mid P = P ']= 1/|{\mathcal {S}\! ol }^*_x( P ')|. \end{aligned}$$

3.2 Challenge-Response Protocols

In the context of Sect. 3.1, a challenge-response protocol (CRP) for \( (\mathcal {L},\mathcal {U}) \) specifies a (verifier) state space \( {\mathcal {S}\! t }\), a challenge space \( {\mathcal {C}\! h }\), a response space \( {\mathcal {R}\! sp }\), and efficient algorithms \(\mathrm {V}_1,\mathrm {P},\mathrm {V}_2\) such that \( \mathrm {V}=(\mathrm {V}_1,\mathrm {V}_2) \) implements a stateful verifier and \(\mathrm {P}\) implements a (stateless) prover. In more detail, algorithm \( \mathrm {V}_1 \) is randomized, takes a candidate \( x\in \mathcal {U}\), and returns a pair \( ( st ,c) \), where \( st \in {\mathcal {S}\! t }\) is a state and \( c\in {\mathcal {C}\! h }\) a challenge. Prover \( \mathrm {P}\), on input of a valid statement \( x\in \mathcal {L}\), a corresponding witness \( w\in \mathcal {W}_x \), and a challenge \( c\in {\mathcal {C}\! h }\), returns a response \( r\in {\mathcal {R}\! sp }\). Finally, deterministic algorithm \( \mathrm {V}_2 \), on input a state \( st \in {\mathcal {S}\! t }\) and a response \( r\in {\mathcal {R}\! sp }\), outputs a bit that indicates acceptance (1) or rejection (0). An overview of the algorithms’ syntax is as follows.

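$$\begin{aligned} \mathrm {V}_1&:\ \mathcal {U}\rightarrow {\mathcal {S}\! t }\times {\mathcal {C}\! h } \\ \mathrm {P}&:\ \mathcal {L}\times \mathcal {W}\times {\mathcal {C}\! h }\rightarrow {\mathcal {R}\! sp } \\ \mathrm {V}_2&:\ {\mathcal {S}\! t }\times {\mathcal {R}\! sp }\rightarrow \{0,1\} \end{aligned}$$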

We define the following correctness and security properties for CRPs.

  • Correctness. Intuitively, a challenge-response protocol is correct if honest provers convince honest verifiers of the validity of valid statements. Formally, we say a CRP is \(\delta \)-correct if for all valid candidates \( x\in \mathcal {L}\) and corresponding witnesses \( w\in \mathcal {W}_x \) we have

    $$\begin{aligned} \Pr \left[ ( st ,c)\leftarrow _{\scriptscriptstyle \$}\mathrm {V}_1(x);r\leftarrow _{\scriptscriptstyle \$}\mathrm {P}(x,w,c):\mathrm {V}_2( st ,r)\Rightarrow 1\right] \ge \delta . \end{aligned}$$

    If the CRP is 1-correct we also say it is perfectly correct.

  • Soundness. Intuitively, a challenge-response protocol is sound if (dishonest) provers cannot convince honest verifiers of the validity of invalid statements. Formally, a CRP is \(\varepsilon \)-sound if for all invalid candidates \( x\in \mathcal {U}\setminus \mathcal {L}\) and all (potentially unbounded) algorithms \( \mathrm {P}^* \) we have

    $$\begin{aligned} \Pr \left[ ( st ,c)\leftarrow _{\scriptscriptstyle \$}\mathrm {V}_1(x);r\leftarrow _{\scriptscriptstyle \$}\mathrm {P}^*(x,c):\mathrm {V}_2( st ,r)\Rightarrow 0\right] \ge \varepsilon . \end{aligned}$$

    If the CRP is 1-sound we also say it is perfectly sound. We also refer to the quantity \(1-\varepsilon \) as the soundness error.

  • Zero-knowledge. Intuitively, a challenge-response protocol is (perfectly) zero-knowledge if (dishonest) verifiers do not learn anything from interacting with (honest) provers, beyond the fact that the statement is valid. Formally, a CRP is (perfectly) zero-knowledge if there exists a simulator \( \mathrm {S}\) such that for all (potentially unbounded) distinguishers \( \mathrm {D}\), all valid candidates \( x\in \mathcal {L}\), and all corresponding witnesses \( w\in \mathcal {W}_x \), we have

    $$\begin{aligned} |\Pr [\mathrm {D}^{\mathrm {P}(x,w,\cdot )}\Rightarrow 1]-\Pr [\mathrm {D}^{\mathrm {S}(x,\cdot )}\Rightarrow 1]|=0 . \end{aligned}$$

    Here, with \(\mathrm {P}(x,w,\cdot )\) and \(\mathrm {S}(x,\cdot )\) we denote oracles that invoke the prover algorithm \(\mathrm {P}\) on input \((x,w,c)\) and the simulator \(\mathrm {S}\) on input \((x,c)\), respectively, where challenge c is in both cases provided by distinguisher \(\mathrm {D}\) on a call-by-call basis.

In Sect. 4 we study two frameworks for constructing challenge-response protocols of the described type. The analyses of the corresponding protocols will be in the random oracle model, meaning that the algorithms \( \mathrm {V}_1 , \mathrm {P}, \mathrm {V}_2 \) have access to an oracle \( \mathrm {H}\) implementing a function drawn uniformly from the set of all functions between some fixed domain and range. Also the above correctness and security definitions need corresponding adaptation by (1) extending the probability spaces to also include the random choice of \(\mathrm {H}\), and (2) giving all involved algorithms, i.e., \(\mathrm {V}_1 , \mathrm {P}, \mathrm {V}_2 , \mathrm {P}^*, \mathrm {D}\), oracle access to \(\mathrm {H}\). In the zero-knowledge definition, simulator \( \mathrm {S}\) simulates both \( \mathrm {P}\) and \( \mathrm {H}\).

4 Constructing Challenge-Response Protocols

In Sect. 3 we linked the word decision problem of a language to challenge-response protocols (CRP). Concretely, if \(\mathcal {L}\subseteq \mathcal {U}\) are languages, a corresponding CRP would allow a prover to convince a verifier that a given candidate statement is in \(\mathcal {L}\) rather than in \(\mathcal {U}\setminus \mathcal {L}\). In the current section we study two such protocols, both requiring a random oracle. The first protocol, Hash-then-Solve, is inspired by prior work but significantly improves on it, while the second protocol, Solve-then-Hash, is novel. The bounds on correctness and security of the two protocols are, in general, incomparable. In the following paragraphs we give a high-level overview of their working principles.

Let \(x\in \mathcal {U}\) be a (valid or invalid) candidate statement. In the protocol of Sect. 4.1 a random oracle \(\mathrm {H}\) is used to generate problem instances for x as per \( P \leftarrow \mathrm {H}(r)\), where r is a random seed picked by the verifier. If \( P \) has a solution \( S \), the prover recovers it and shares it with the verifier who accepts iff the solution is valid. (If \( P \) has multiple solutions, the prover picks one of them at random.) Note that solving problems is in general possible also for invalid candidates, but the idea behind this protocol is that it allows for telling apart elements of \(\mathcal {L}\) and \(\mathcal {U}\setminus \mathcal {L}\) if the fraction of solvable problems among the set of all problems associated with valid candidates is strictly bigger than the fraction of solvable problems among all problems associated with invalid candidates, i.e., if \(\min \varDelta \mathcal {L}>\max \varDelta (\mathcal {U}\setminus \mathcal {L})\). (As we show in Sect. 5, this is the case for some interesting number-theoretic decision problems.)

We now turn to the protocol of Sect. 4.2. Here, the random oracle is not used to generate problems as above. Rather, the random oracle is used to hash solutions into bit strings. Concretely, the verifier randomly samples a problem \( P \) with corresponding solution \( S \). It then sends \( P \) to the prover who derives the set of all solutions for it; this set obviously includes \( S \). The prover hashes all these solutions and sends the set of resulting hash values to the verifier. The latter accepts if the hash value of \( S \) is contained in this set. Note that finding the set of all solutions for problems is in general possible also for invalid candidates, but the protocol allows for telling apart valid from invalid candidates if (solvable) problems associated with valid candidates have strictly fewer solutions than problems associated with invalid candidates, i.e., if \(\max \#\mathcal {L}<\min \#(\mathcal {U}\setminus \mathcal {L})\). Indeed, if the verifier does not accept more hash values than the maximum number of solutions for valid statements, a cheating prover will make the verifier accept only with a limited probability, while in the valid case the verifier will always accept. (We again refer to Sect. 5 for number-theoretic problems that have the required property.)

Let us quickly compare the two approaches. In principle, whether they are applicable crucially depends on languages \(\mathcal {L},\mathcal {U}\) and the associated problem and solution spaces. Note that the random oracles are used in very different ways: in the first protocol to ensure a fair sampling of a problem such that no solution is known a priori (to either party), and in the second protocol to hide from the verifier those solutions that it does not know anyway. That the random oracle in the first protocol has to map into the problem space might represent a severe technical challenge, as for some relevant problem spaces it seems infeasible to find a construction for such a random oracle (see Footnote 3). In such cases the second protocol might be applicable.

4.1 A GMR-Inspired Protocol: Hash-then-Solve

A general protocol framework for showing that certain properties hold for a candidate RSA modulus (that it is square-free, Blum, etc.) was proposed by Gennaro, Micali, and Rabin in [19]. Recall from the discussion in the introduction that the full version of their protocol has a total of four message passes and involves both number-theoretic computations and the use of a commitment scheme. In this section we study a variant of this protocol where the commitment scheme is implemented via a random oracle. The benefit is that the protocol becomes more compact and less interactive. Concretely, the number of message passes decreases from four to two.

Let \(\mathcal {L}\subseteq \mathcal {U}\subseteq \varSigma ^*\) be as in Sect. 3.1, and let \(l\in \mathbb {N}\) be a security parameter. Let \((\mathrm {H}_x)_{x\in \mathcal {U}}\) be a family of hash functions (in the security reduction: random oracles) such that for each \(x\in \mathcal {U}\) we have a mapping \(\mathrm {H}_x:\{0,1\}^l\rightarrow \mathcal {P}_x\). Consider the challenge-response protocol with algorithms \(\mathrm {V}_1,\mathrm {P},\mathrm {V}_2\) as specified in Fig. 1. The idea of the protocol is that the verifier picks a random seed r which it communicates to the prover and from which both parties deterministically derive a problem as per \( P \leftarrow \mathrm {H}_x(r)\). The prover, using its witness, computes the set \(\mathbf {S}\) of all solutions of \( P \), denotes one of them with \( S \), and sends \( S \) to the verifier. (If \( P \) has no solution, the prover sends \(\bot \).) The verifier accepts (meaning: concludes that \(x\in \mathcal {L}\)) iff \( S \ne \bot \) and \( S \) is indeed a solution for \( P \). Importantly, while the prover selects the solution \( S \) within set \(\mathbf {S}\) in a deterministic way (so that for each seed r and thus problem \( P \) it consistently exposes the same solution even if queried multiple times), from the point of view of the verifier the solution \( S \) is picked uniformly at random from the set of all solutions of \( P \). This behavior is implemented by letting the prover make its selection based on an additional random oracle that is made private to the prover by including the witness w in each query. Theorem 1 assesses the correctness and security of the protocol.

Fig. 1. Hash-then-Solve: Random-oracle based version of the GMR protocol from [19]. Specifications of the three CRP algorithms can be readily extracted from the code: algorithm \(\mathrm {V}_1\) is in lines 00–01, algorithm \(\mathrm {V}_2\) is in lines 02–05, and algorithm \(\mathrm {P}\) is in lines 06–10. The expression of the form \( S \leftarrow \$_ P (\mathbf {S})\) in line 09 is an abbreviation for \( S \leftarrow \mathrm {RO}(x,w, P ,\mathbf {S})\), where \(\mathrm {RO}:\{0,1\}^*\rightarrow \mathbf {S}\) is a (private) random oracle.
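The message flow described above can be illustrated by the following Python sketch. It is a toy under several assumptions that are not taken from Fig. 1: the problem and solution spaces are \( \mathbb {Z}_N^* \) with relation \( P \equiv S ^e \bmod N\), the oracles \(\mathrm {H}_x\) and \(\mathrm {RO}\) are emulated with SHA-256, the moduli 15 and 21 are chosen only for readability, and the prover finds solutions by brute force instead of using its witness. All function names are hypothetical.

```python
import hashlib
import secrets
from math import gcd


def H(N, r):
    """Stand-in for the oracle H_x: hash seed r into Z_N^* (rejection sampling).
    The reduction mod N is slightly biased, which is ignored in this toy."""
    ctr = 0
    while True:
        digest = hashlib.sha256(f"H|{N}|{r.hex()}|{ctr}".encode()).digest()
        cand = int.from_bytes(digest, "big") % N
        if gcd(cand, N) == 1:
            return cand
        ctr += 1


def solutions(N, e, P):
    """All S in Z_N^* with S^e = P (mod N); brute force, toy sizes only."""
    return [a for a in range(1, N) if gcd(a, N) == 1 and pow(a, e, N) == P]


def V1(N, seed_bytes=16):
    """Verifier, first stage: pick a random seed; state and challenge coincide."""
    r = secrets.token_bytes(seed_bytes)
    return r, r


def prover(N, e, w, r):
    """Derive the problem from the seed and answer with one of its solutions."""
    P = H(N, r)
    sols = solutions(N, e, P)
    if not sols:
        return None  # plays the role of the bottom symbol
    # Emulated private oracle RO(x, w, P, S): the choice is deterministic per
    # problem, but depends on the witness string w.
    digest = hashlib.sha256(f"RO|{N}|{w}|{P}|{sols}".encode()).digest()
    return sols[int.from_bytes(digest, "big") % len(sols)]


def V2(N, e, st, S):
    """Verifier, second stage: accept iff S is a valid solution of H(st)."""
    if S is None:
        return 0
    return int(0 < S < N and gcd(S, N) == 1 and pow(S, e, N) == H(N, st))


def acceptance_rate(N, e, w, runs=300):
    """Empirical acceptance frequency over independent protocol runs."""
    accepted = 0
    for _ in range(runs):
        st, c = V1(N)
        accepted += V2(N, e, st, prover(N, e, w, c))
    return accepted / runs
```

With these toy parameters, `acceptance_rate(15, 15, "3*5")` concerns a modulus for which \( a\mapsto a^{15} \) permutes \( \mathbb {Z}_{15}^* \), so every problem is solvable, whereas for `acceptance_rate(21, 21, "3*7")` only a fraction of the problems has a solution and the verifier rejects correspondingly often.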

Theorem 1

The Hash-then-Solve protocol defined in Fig. 1 is \(\delta \)-correct and \(\varepsilon \)-sound and perfectly zero-knowledge, where

$$\begin{aligned} \delta =\min \varDelta (\mathcal {L}) \qquad \text {and}\qquad \varepsilon =1-\max \varDelta (\mathcal {U}\setminus \mathcal {L}) , \end{aligned}$$

if hash functions \((\mathrm {H}_x)_{x\in \mathcal {U}}\) are modeled as random oracles. For this result we assume that the \(\mathrm {Sample}^*\) algorithm is both problem-uniform and solution-uniform.

Proof

Correctness. Let \( x\in \mathcal {L}\) and \(w\in \mathcal {W}_x\). Since \( \mathrm {H}_x \) is modeled as a random oracle, problem \( P \) assigned in line 07 is uniformly distributed in \( \mathcal {P}_x\). Set \(\mathbf {S}\) from line 08 is empty if \( P \in \mathcal {P}^-_x\) and contains elements if \( P \in \mathcal {P}^+_x\). The probability that the prover outputs a solution, and that the verifier accepts it in line 05, is thus precisely \( |\mathcal {P}^+_x|/|\mathcal {P}_x|\). A lower bound for this value is \(\delta =\min \varDelta (\mathcal {L})\).

Soundness. Let \( x\in \mathcal {U}\setminus \mathcal {L}\). A necessary condition for the verifier to accept in line 05 is that there exists a solution to problem \( P =\mathrm {H}_x(r) \), i.e., that \( P \in \mathcal {P}^+_x\). Since \( \mathrm {H}_x \) is modeled as a random oracle, \( P \) is uniformly distributed in \( \mathcal {P}_x\). The probability of \( P \) having a solution is thus \( |\mathcal {P}^+_x|/|\mathcal {P}_x|\). This value is at most \( \max \varDelta (\mathcal {U}\setminus \mathcal {L})\). Thus \(\varepsilon =1-\max \varDelta (\mathcal {U}\setminus \mathcal {L})\) is a lower bound for the probability of the verifier not accepting in a protocol run.

Zero-knowledge. We show that the protocol is zero-knowledge by specifying and analyzing a simulator \( \mathrm {S}\). Its code is in Fig. 2. The prover oracle \( \mathrm {P}(x,w,\cdot ) \) and the random oracle \( \mathrm {H}_x(\cdot ) \) are simulated by algorithms \( \mathrm {P}_\mathrm {sim}\) and \( \mathrm {H}_\mathrm {sim}\), respectively. Associative array \(\mathbf {R}\) reflects the input-output map of the random oracle and is initialized such that all inputs map to special value \(\bot \). If \(\mathrm {H}_\mathrm {sim}\) is queried on a seed r, a fresh problem-solution pair is sampled using the \(\mathrm {Sample}^*\) algorithm, the pair is registered in \(\mathbf {R}\), and the problem part is returned to the caller. Note that by the assumed problem-uniformity of \(\mathrm {Sample}^*(x)\) this is an admissible implementation of a random oracle that maps to set \(\mathcal {P}_x\).

The task of the \( \mathrm {P}_\mathrm {sim}\) algorithm is to return, for any seed r, a uniformly picked solution for the problem \( P =\mathrm {H}_x(r)\); if no solution exists, the oracle shall return \(\bot \). This is achieved by returning the solution part of the problem-solution pair that was sampled using \(\mathrm {Sample}^*\) when processing the random oracle query \(\mathrm {H}_x(r)\). Note that this argument uses both the solution uniformity of \(\mathrm {Sample}^*\) and the fact that the \(\mathrm {P}\) algorithm from Fig. 1 is deterministic and in particular always outputs the same solution if a seed is queried multiple times to a \( \mathrm {P}(x,w,\cdot ) \) prover.    \(\square \)

Fig. 2. Simulator \( \mathrm {S}\). Associative array \(\mathbf {R}\) is initialized as per \(\mathbf {R}[\cdot ]\leftarrow \bot \), i.e., such that all values initially map to \(\bot \). Note that lines 00–02 become redundant if one requires (w.l.o.g.) that \(\mathrm {H}_\mathrm {sim}(r)\) is always queried before \(\mathrm {P}_\mathrm {sim}(r)\).
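The simulation strategy from the proof of Theorem 1 can be sketched in Python as follows. This is an illustration rather than a transcription of Fig. 2, and it rests on assumptions: the relation is \( P \equiv S ^e \bmod N\) on \( \mathbb {Z}_N^* \), the modulus is chosen such that \( a\mapsto a^e \) is a permutation (so that the sampler shown is both problem-uniform and solution-uniform and every problem is solvable), and a Python dictionary with missing keys plays the role of array \(\mathbf {R}\) with default value \(\bot \). The class and method names are hypothetical.

```python
import secrets
from math import gcd


class HashThenSolveSimulator:
    """Toy simulator: answers H- and P-queries without knowing a witness."""

    def __init__(self, N, e):
        self.N, self.e = N, e
        self.R = {}  # seed -> (problem, solution); a missing key means "bottom"

    def sample_star(self):
        """Assumed sampler: uniform solution a in Z_N^*, problem a^e mod N."""
        while True:
            a = secrets.randbelow(self.N)
            if gcd(a, self.N) == 1:
                return pow(a, self.e, self.N), a

    def H_sim(self, r):
        """Simulated random oracle: register a fresh pair on the first query."""
        if r not in self.R:
            self.R[r] = self.sample_star()
        return self.R[r][0]

    def P_sim(self, r):
        """Simulated prover: return the solution registered for seed r."""
        if r not in self.R:  # only needed if P_sim is queried before H_sim
            self.H_sim(r)
        return self.R[r][1]
```

Because each pair is stored on first use, repeated queries on the same seed return the same problem and the same solution, mirroring the deterministic behavior of the real prover.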

4.2 Our New Protocol: Solve-then-Hash

We propose a new challenge-response protocol for the word decision problem in languages. Like the one from Sect. 4.1 it uses a random oracle, but it does so in a quite different way: The random oracle is not used for generating problems, but for hashing solutions. The advantage is that constructing a random oracle that maps into a problem space might be difficult (for certain problem spaces), while hashing solutions to bit strings is always easy.

Let \(\mathcal {L}\subseteq \mathcal {U}\subseteq \varSigma ^*\) be as in Sect. 3.1. Let \(\mathcal {H}\) be a finite set and \( \mathrm {H}:\{0,1\}^*\rightarrow \mathcal {H}\) a hash function (in the security reduction: a random oracle). The idea of the protocol is as follows. The verifier samples a problem-solution pair \(( P , S )\) and communicates the problem to the prover. The prover, using its witness, computes the set \(\mathbf {S}\) of all solutions of \( P \) and the set \(\mathbf {h}\) of hash values of these solutions, and returns \(\mathbf {h}\) to the verifier. Finally, the verifier checks whether the hash value h of \( S \) is contained in this set. An important detail is that the prover uses pseudorandom bit-strings to pad the returned set of hash values to constant size: If \(k=\max \#\mathcal {L}\) is the maximum number of solutions of problems associated with valid candidates, then the prover exclusively outputs sets \(\mathbf {h}\) of this cardinality. The algorithms of the corresponding challenge-response protocol are specified in Fig. 3. (Note that when transmitting \(\mathbf {h}\) from the prover to the verifier an encoding has to be chosen that hides the order in which elements were added to \(\mathbf {h}\).) The analysis of our protocol is in Theorem 2. The main technical challenge of the proof is that it has to deal with collisions of the random oracle (two or more solutions might hash to the same string).

Fig. 3. Solve-then-Hash: Our new challenge-response protocol. We assume \(k=\max \#\mathcal {L}\). Specifications of the three CRP algorithms can be readily extracted from the code: algorithm \(\mathrm {V}_1\) is in lines 00–02, algorithm \(\mathrm {V}_2\) is in lines 03–05, and algorithm \(\mathrm {P}\) is in lines 06–12. In line 08, the cardinality of set \(\mathbf {S}\) is denoted with t. Expressions of the form \(h\leftarrow \$^u_v(\mathcal {H})\) in line 10 are abbreviations for \(h\leftarrow \mathrm {RO}(x,w,u,v)\), where \(\mathrm {RO}:\{0,1\}^*\rightarrow \mathcal {H}\) is a (private) random oracle.
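To make the description of the protocol concrete, the following Python sketch implements one possible toy instantiation. It is not the code of Fig. 3 and relies on assumptions: the relation is \( P \equiv S ^e \bmod N\) on \( \mathbb {Z}_N^* \), \(\mathrm {H}\) and the private oracle \(\mathrm {RO}\) are emulated by truncated SHA-256, the padding inputs to \(\mathrm {RO}\) are chosen ad hoc, solutions are found by brute force instead of via the witness, and the moduli 15 and 21 are illustrative only. All function names are hypothetical.

```python
import hashlib
import secrets
from math import gcd

HASH_BYTES = 16  # models a hash range of size 2^128 (assumption)


def H(P, S):
    """Stand-in for the random oracle H on a problem-solution pair."""
    return hashlib.sha256(f"H|{P}|{S}".encode()).digest()[:HASH_BYTES]


def sample(N, e):
    """Uniform solution a in Z_N^*, together with the problem a^e mod N."""
    while True:
        a = secrets.randbelow(N)
        if gcd(a, N) == 1:
            return pow(a, e, N), a


def solutions(N, e, P):
    """All solutions of problem P; brute force, toy sizes only."""
    return [a for a in range(1, N) if gcd(a, N) == 1 and pow(a, e, N) == P]


def V1(N, e):
    """Verifier, first stage: the state is h = H(P, S), the challenge is P."""
    P, S = sample(N, e)
    return H(P, S), P


def prover(N, e, w, P, k):
    """Hash all solutions, then pad with pseudorandom values up to k elements."""
    hs = {H(P, S) for S in solutions(N, e, P)}
    i = 0
    while len(hs) < k:
        pad = hashlib.sha256(f"RO|{N}|{w}|{P}|{i}".encode()).digest()
        hs.add(pad[:HASH_BYTES])
        i += 1
    return frozenset(hs)  # an unordered set hides the insertion order


def cheating_prover(N, e, P, k):
    """A simple strategy for invalid N: submit the hashes of k solutions."""
    return frozenset(H(P, S) for S in solutions(N, e, P)[:k])


def V2(st, response, k):
    """Verifier, second stage: at most k values, and h must be among them."""
    return int(len(response) <= k and st in response)


def acceptance_rate(N, e, k, respond, runs=300):
    """Empirical acceptance frequency for a given response strategy."""
    accepted = 0
    for _ in range(runs):
        st, c = V1(N, e)
        accepted += V2(st, respond(c), k)
    return accepted / runs
```

In this toy setting with \(k=1\), the honest prover for \(N=15\), \(e=15\) is always accepted, while for \(N=21\), \(e=21\) every solvable problem has three solutions, so a prover restricted to a single hash value is accepted in roughly one third of the runs.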

Theorem 2

Let \( k=\max \#\mathcal {L}\), \(m=\min \#(\mathcal {U}\setminus \mathcal {L})\), and \( M=\max \#(\mathcal {U}\setminus \mathcal {L}) \), such that \( k\le m\le M\). Then the Solve-then-Hash protocol defined in Fig. 3 is perfectly correct and \(\varepsilon \)-sound and perfectly zero-knowledge, where

$$\begin{aligned} \varepsilon =1- \bigl (k/m+k/|\mathcal {H}|+(\min (M,q))^2/|\mathcal {H}|\bigr )\approx 1-k/m , \end{aligned}$$

if \(\mathrm {H}\) is modeled as a random oracle and q is the maximum number of random oracle queries posed by any (dishonest) prover \( \mathrm {P}^*\). For this result we assume that the \(\mathrm {Sample}\) algorithm is both problem-uniform and solution-uniform.
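For a feeling of the magnitudes involved, the bound of Theorem 2 can be evaluated numerically. The following Python sketch does so with exact rational arithmetic; the function name and the example parameters (a 128-bit hash range, \(2^{40}\) oracle queries, \(k=1\), \(m=M=3\)) are assumptions chosen for illustration and do not stem from the paper.

```python
from fractions import Fraction


def soundness_thm2(k, m, M, q, hash_space_size):
    """epsilon = 1 - (k/m + k/|H| + min(M, q)^2 / |H|), as in Theorem 2."""
    collision_term = Fraction(min(M, q) ** 2, hash_space_size)
    error = Fraction(k, m) + Fraction(k, hash_space_size) + collision_term
    return 1 - error


# Illustrative parameters only.
eps = soundness_thm2(k=1, m=3, M=3, q=2**40, hash_space_size=2**128)
print(float(eps))
```

For these parameters the two terms involving \( |\mathcal {H}|\) are negligible, and the result is dominated by \(1-k/m\), in line with the approximation stated in the theorem.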

Proof

Correctness. Let \( x\in \mathcal {L}\) and \(w\in \mathcal {W}_x\). Then for \( ( P , S ) \) from line 00 we have \( S \in \mathbf {S}\) in line 07. Further, as \(x\in \mathcal {L}\) we have \(t\le k=\max \#\mathcal {L}\) in line 08 and thus \(|\mathbf {h}|\le k\) in line 04 and \(h\in \mathbf {h}\) in line 05. Thus \( \mathrm {V}_2 \) accepts with probability 1.

Soundness. Let \( x\in \mathcal {U}\setminus \mathcal {L}\) be an invalid candidate and \( \mathrm {P}^* \) a (malicious) prover. Let \( Win \) denote the event that \( \mathrm {P}^* \) succeeds in finding a response \( \mathbf {h}\) such that verifier \( \mathrm {V}_2 \) accepts, i.e. the event \( \left\{ (h, P )\leftarrow _{\scriptscriptstyle \$}\mathrm {V}_1(x);\mathbf {h}\leftarrow _{\scriptscriptstyle \$}\mathrm {P}^*(x, P ): \mathrm {V}_2(h,\mathbf {h})\Rightarrow 1\right\} \). Recall that \( {\mathcal {S}\! ol }_x( P ) \) denotes the set of solutions of problem \( P \), and let \( S _1,\dots , S _l\in {\mathcal {S}\! ol }_x( P ) \) denote the solutions to the problem on which \( \mathrm {P}^* \) queries random oracle \(\mathrm {H}\), i.e., the elements such that \( \mathrm {P}^* \) queries for \( \mathrm {H}( P , S _i) \) with \( i\in \{1,\ldots ,l\}\). We define \( Col =\{\exists i\ne j: \mathrm {H}( P , S _i)=\mathrm {H}( P , S _j)\} \) as the event that the hash values of at least two of the queried solutions collide. We have

$$\begin{aligned} \Pr [ Win ]&= \Pr \left[ Win \mid Col \right] \Pr \left[ Col \right] +\Pr \left[ Win \mid \lnot Col \right] \Pr \left[ \lnot Col \right] \\&\le \Pr \left[ Col \right] +\Pr \left[ Win \mid \lnot Col \right] . \end{aligned}$$

We conclude that \( \Pr [ Win ]< k/m+k/|\mathcal {H}|+(\min (M,q))^2/|\mathcal {H}|\) by showing that

$$ \text {(a)}\; \Pr \left[ Col \right] <(\min (M,q))^2/|\mathcal {H}|$$

and

$$ \text {(b)}\; \Pr \left[ Win \mid \lnot Col \right] \le k/m+k/|\mathcal {H}|. $$

For claim (a), note that \( x\in \mathcal {U}\setminus \mathcal {L}\) implies that the set \( {\mathcal {S}\! ol }_x( P ) \) of solutions of problem \( P \) has at most \( \max \#(\mathcal {U}\setminus \mathcal {L})=M \) elements. \( \mathrm {P}^* \) makes at most q queries to \( \mathrm {H}\). Hence \( l\le \min (M,q)\). We obtain

$$\begin{aligned} \Pr \left[ Col \right]&= \Pr \left[ \exists i\ne j:\mathrm {H}( P , S _i)=\mathrm {H}( P , S _j)\right] \\&\le l^2\Pr \left[ \mathrm {H}( P , S _1)=\mathrm {H}( P , S _2)\right] \le \min (M,q)^2/|\mathcal {H}|, \end{aligned}$$

where the last two inequalities hold since \( \mathrm {H}\) is modeled as a random oracle.

We conclude the proof by showing claim (b). Recall that \( S \) is the solution sampled alongside problem \( P \). Since algorithm \(\mathrm {Sample}\) is solution-uniform, \( S \) is distributed uniformly in \( {\mathcal {S}\! ol }_x( P ) \), which implies that \( \mathrm {H}( P , S ) \) is uniformly distributed in \( \{\mathrm {H}( P , S '): S '\in {\mathcal {S}\! ol }_x( P )\}\). Note that \( |{\mathcal {S}\! ol }_x( P )|\ge m=\min \#(\mathcal {U}\setminus \mathcal {L}) \) and that, conditioned on \( \lnot Col \), all values \( \mathrm {H}( P , S ') \) that \( \mathrm {P}^* \) knows are distinct. Conditioned on the events \( S \in \{ S _1,\dots , S _l\} \) and \( \lnot Col \), prover \( \mathrm {P}^* \) guesses \( \mathrm {H}( P , S ) \) with probability at most \(1/l\). If, on the other hand, \( S \notin \{ S _1,\dots , S _l\} \), then \( \mathrm {H}( P , S ) \) is uniformly distributed from \( \mathrm {P}^* \)’s point of view. Hence its best chance of guessing it is \( 1/|\mathcal {H}|\). Note that \( \Pr [ S \in \{ S _1,\dots , S _l\}]\le l/m\). Summing up, conditioned on \( \lnot Col \), prover \( \mathrm {P}^* \)’s chance of correctly guessing \( \mathrm {H}( P , S ) \) with a single value is bounded by \( l/m \cdot 1/l + 1/|\mathcal {H}|= 1/m+1/|\mathcal {H}|\). Event \( Win \) cannot occur if \( \mathbf {h}\) contains more than k elements (line 04), so a union bound over the at most k elements of \( \mathbf {h}\) yields \( \Pr \left[ Win \mid \lnot Col \right] \le k/m+k/|\mathcal {H}|\).

Fig. 4. Simulator \( \mathrm {S}\) for the protocol of Fig. 3. We require (w.l.o.g.) that \(\mathrm {H}_\mathrm {sim}(\cdot )\) is queried at most once on each input. Expressions of the form \(h\leftarrow \$^u_v(\mathcal {H})\) in line 02 are abbreviations for \(h\leftarrow \mathrm {RO}(u,v)\), where \(\mathrm {RO}:\{0,1\}^*\rightarrow \mathcal {H}\) is a (private) random oracle. In line 07, the lengths of vectors \(\mathbf {R}_\mathrm {U}[ P ]\) and \(\mathbf {R}_\mathrm {F}[ P ]\) are \(t-1\) and \(k-t+1\), respectively. In line 08, the new lengths of vectors \(\mathbf {R}_\mathrm {U}[ P ]\) and \(\mathbf {R}_\mathrm {F}[ P ]\) are t and \(k-t\), respectively.

Zero-knowledge. We show that the protocol is zero-knowledge by specifying and analyzing a simulator \( \mathrm {S}\). Its code is in Fig. 4. The prover oracle \( \mathrm {P}(x,w,\cdot ) \) and the random oracle \( \mathrm {H}(\cdot ,\cdot ) \) are simulated by algorithms \( \mathrm {P}_\mathrm {sim}\) and \( \mathrm {H}_\mathrm {sim}\), respectively. For oracle \( \mathrm {H}\) we assume w.l.o.g. that it is not queried twice on the same input.

Core components of our simulator are the associative arrays \( \mathbf {R}_\mathrm {U}[\cdot ] \) and \( \mathbf {R}_\mathrm {F}[\cdot ] \) that associate problems with used and fresh random hash values, respectively. The simulator starts with initializing for each problem a vector of k-many fresh hash values (see Footnote 4). Oracle \( \mathrm {H}_\mathrm {sim}\) on input a problem-solution pair \( ( P , S ) \) checks whether \( S \) is a solution to \( P \). If not, a random hash value is returned. Otherwise the vector of (fresh) hash values \( \mathbf {R}_\mathrm {F}[ P ] \) associated to \( P \) is retrieved. The first element of this vector is taken as the response of the random oracle query; however, before the response is output, the element is appended to the vector of (used) hash values \( \mathbf {R}_\mathrm {U}[ P ] \) associated to \( P \). Note that this procedure will never fail (i.e., a value never has to be taken from \( \mathbf {R}_\mathrm {F}[ P ] \) after the list is emptied) since there are at most \( k=\max \#\mathcal {L}\) solutions to \( P \). Queries to \( \mathrm {P}_\mathrm {sim}\) on input \( P \) are answered with the set \(\mathbf {h}\) of all elements contained in \( \mathbf {R}_\mathrm {F}[ P ] \) and \( \mathbf {R}_\mathrm {U}[ P ] \), which by definition of \( \mathrm {H}_\mathrm {sim}\) stays unchanged throughout the simulation. Since these elements are initialized as random hash values, responses to queries to \( \mathrm {P}_\mathrm {sim}\) have the correct distribution. Furthermore, for every \( S \in {\mathcal {S}\! ol }_x( P ) \) we have that \( \mathrm {H}_\mathrm {sim}( P , S ) \) is contained in \( \mathrm {P}_\mathrm {sim}( P )\). Summing up, the output of \( \mathrm {P}_\mathrm {sim}\) and \( \mathrm {H}_\mathrm {sim}\) is correctly distributed and simulator \( \mathrm {S}\) provides distinguisher \( \mathrm {D}\) with a perfect simulation of \( \mathrm {P}(x,w,\cdot )\).    \(\square \)
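The bookkeeping with the arrays \( \mathbf {R}_\mathrm {U} \) and \( \mathbf {R}_\mathrm {F} \) can be sketched in Python as follows. The sketch is an illustration under assumptions and not the code of Fig. 4: the relation is \( P \equiv S ^e \bmod N\) on \( \mathbb {Z}_N^* \), hash values are 16-byte strings drawn with the operating system's randomness instead of a private random oracle, the vectors are initialized lazily on first use, and possible collisions among the k fresh values are ignored. The class and method names are hypothetical.

```python
import secrets
from math import gcd

HASH_BYTES = 16  # models a hash range of size 2^128 (assumption)


class SolveThenHashSimulator:
    """Toy simulator for the prover oracle and the random oracle."""

    def __init__(self, N, e, k):
        self.N, self.e, self.k = N, e, k
        self.R_F = {}  # problem -> list of fresh hash values
        self.R_U = {}  # problem -> list of used hash values

    def _init(self, P):
        """Lazily create k fresh random hash values for problem P."""
        if P not in self.R_F:
            self.R_F[P] = [secrets.token_bytes(HASH_BYTES) for _ in range(self.k)]
            self.R_U[P] = []

    def H_sim(self, P, S):
        """Simulated random oracle; assumed to be queried once per input."""
        is_solution = (0 < S < self.N and gcd(S, self.N) == 1
                       and pow(S, self.e, self.N) == P)
        if not is_solution:
            return secrets.token_bytes(HASH_BYTES)
        self._init(P)
        h = self.R_F[P].pop(0)  # take the first fresh value ...
        self.R_U[P].append(h)   # ... and record it as used
        return h

    def P_sim(self, P):
        """Simulated prover: all used and fresh values for P, as a set."""
        self._init(P)
        return frozenset(self.R_U[P] + self.R_F[P])
```

Moving a value from the fresh to the used vector does not change their union, which is why the answer of `P_sim` for a given problem stays the same before and after random oracle queries.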

4.3 Generalizing the Analysis of the Solve-then-Hash Protocol

We generalize the statement of Theorem 2, making it applicable to a broader class of languages. Recall that our protocol from Sect. 4.2 decides membership in a language \( \mathcal {L}\subseteq \mathcal {U}\) if for every (invalid) candidate \( x\in \mathcal {U}\setminus \mathcal {L}\) and every solvable problem \( P \in \mathcal {P}_x^+ \) the number \( |{\mathcal {S}\! ol }_x( P )|\) of solutions to \( P \) exceeds the maximum number \( \max \#\mathcal {L}\) of solutions to problems associated with valid candidates. We next relax this condition by showing that for soundness it already suffices if the expected value of \( |{\mathcal {S}\! ol }_x( P )|\) (over randomly sampled \( P \in \mathcal {P}_x^+ \)) exceeds \( \max \#\mathcal {L}\). In order to do so, we associate to \( \mathcal {L}\) and \( \mathcal {U}\) the function \(\varepsilon _{\mathcal {L},\mathcal {U}}:[0,1]\rightarrow \mathbb {R}^+\) such that

$$\begin{aligned} \varepsilon _{\mathcal {L},\mathcal {U}}(\gamma ):=\min \{\varepsilon '\mid \forall x\in \mathcal {U}\setminus \mathcal {L}:\Pr [ P \leftarrow _{\scriptscriptstyle \$}\mathcal {P}_x^+:\max \#(\mathcal {L})/|{\mathcal {S}\! ol }_x( P )|\le \varepsilon ']\ge \gamma \} , \end{aligned}$$

i.e., the function that associates to each probability value \( \gamma \in [0,1] \) the smallest factor \( \varepsilon ' \) such that for every invalid x a uniformly sampled problem with probability of at least \( \gamma \) has at least \( \max \#(\mathcal {L})/\varepsilon ' \) solutions.

In Theorem 3 we give a correspondingly refined soundness analysis of the Solve-then-Hash protocol. Note that, as the protocol itself did not change, the correctness and zero-knowledge properties do not require a new analysis. Note further that \( \varepsilon _{\mathcal {L},\mathcal {U}}(1)=\max \#(\mathcal {L})/\min \#(\mathcal {U}\setminus \mathcal {L}) \), and that thus the soundness analysis of Theorem 2 is just the special case of Theorem 3 where \( \gamma =1\).
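The definition of \( \varepsilon _{\mathcal {L},\mathcal {U}}(\gamma ) \) can be made tangible with a small computation. The following Python sketch assumes a finite toy setting in which, for each invalid candidate, the solution counts \( |{\mathcal {S}\! ol }_x( P )|\) of all solvable problems are given explicitly as a list; the function name, the input format, and the example numbers are assumptions for illustration. The sketch requires \(\gamma >0\).

```python
from fractions import Fraction


def eps_LU(gamma, k, solution_counts_by_x):
    """Smallest eps' such that, for every invalid x, a uniform solvable problem
    satisfies k / |Sol_x(P)| <= eps' with probability at least gamma."""
    if not 0 < gamma <= 1:
        raise ValueError("gamma must lie in (0, 1]")
    worst = Fraction(0)
    for counts in solution_counts_by_x.values():
        ratios = sorted(Fraction(k, n) for n in counts)
        total = len(ratios)
        # The smallest admissible eps' for this x is the first ratio (in
        # ascending order) at which at least a gamma-fraction is covered.
        for i, ratio in enumerate(ratios):
            if i + 1 >= gamma * total:
                worst = max(worst, ratio)
                break
    return worst


# Hypothetical solution counts for two invalid candidates, with k = 1.
toy_counts = {"x1": [2, 4, 4, 4], "x2": [3, 3, 3]}
print(eps_LU(Fraction(1), 1, toy_counts))
print(eps_LU(Fraction(3, 4), 1, toy_counts))
```

In this example the value for \(\gamma =1\) equals k divided by the smallest solution count, whereas lowering \(\gamma \) lets the function ignore the few problems with unusually few solutions and thus yields a smaller value.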

Theorem 3

Let \( k=\max \#\mathcal {L}\) and \( M=\max \#(\mathcal {U}\setminus \mathcal {L}) \) such that \( k\le M\). Then for every \( \gamma \in [0,1] \) the Solve-then-Hash protocol defined in Fig. 3 is perfectly correct and \(\varepsilon \)-sound and perfectly zero-knowledge, where

$$\begin{aligned} \varepsilon =1-\bigl (\varepsilon _{\mathcal {L},\mathcal {U}}(\gamma )+(1-\gamma )/(1-c)+k/|\mathcal {H}|+c\bigr ) \approx \gamma -\varepsilon _{\mathcal {L},\mathcal {U}}(\gamma ) , \end{aligned}$$

if \(\mathrm {H}\) is modeled as a random oracle, q is the maximum number of random oracle queries posed by any (dishonest) prover \( \mathrm {P}^* \) and \( c=(\min (M,q))^2/|\mathcal {H}|\). For this result we assume that the \(\mathrm {Sample}\) algorithm is both problem-uniform and solution-uniform.
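The bound of Theorem 3 can likewise be evaluated numerically. The Python sketch below uses exact rational arithmetic; the function name and the example parameters (a 128-bit hash range, \(2^{40}\) oracle queries, \(k=1\), \(M=4\), \(\gamma =3/4\), and \( \varepsilon _{\mathcal {L},\mathcal {U}}(\gamma )=1/4 \)) are assumptions made for illustration only.

```python
from fractions import Fraction


def soundness_thm3(eps_lu_gamma, gamma, k, M, q, hash_space_size):
    """epsilon = 1 - (eps_LU(gamma) + (1-gamma)/(1-c) + k/|H| + c),
    where c = min(M, q)^2 / |H|, as in Theorem 3."""
    c = Fraction(min(M, q) ** 2, hash_space_size)
    error = (Fraction(eps_lu_gamma) + (1 - Fraction(gamma)) / (1 - c)
             + Fraction(k, hash_space_size) + c)
    return 1 - error


# Illustrative parameters only.
eps3 = soundness_thm3(Fraction(1, 4), Fraction(3, 4), k=1, M=4,
                      q=2**40, hash_space_size=2**128)
print(float(eps3))
```

With a large hash range the value c is negligible, so the result is close to \( \gamma -\varepsilon _{\mathcal {L},\mathcal {U}}(\gamma ) \); setting \(\gamma =1\) makes the term \((1-\gamma )/(1-c)\) vanish.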

Proof

The correctness and zero-knowledge property of the protocol were already shown in the proof of Theorem 2. We thus show the bound on the soundness error. Fix \(\gamma \in [0,1]\) and let \(\varepsilon _{\mathcal {L},\mathcal {U}}=\varepsilon _{\mathcal {L},\mathcal {U}}(\gamma )\). Let \( x\in \mathcal {U}\setminus \mathcal {L}\) be an invalid candidate and \( \mathrm {P}^* \) a (malicious) prover. Let \( Win \) denote the event that \( \mathrm {P}^* \) succeeds in finding a response \( \mathbf {h}\) such that verifier \( \mathrm {V}_2 \) accepts, i.e. the event \( \left\{ (h, P )\leftarrow _{\scriptscriptstyle \$}\mathrm {V}_1(x);\mathbf {h}\leftarrow _{\scriptscriptstyle \$}\mathrm {P}^*(x, P ): \mathrm {V}_2(h,\mathbf {h})\Rightarrow 1\right\} \). Recall that \( {\mathcal {S}\! ol }_x( P ) \) denotes the set of solutions of problem \( P \), and let \( S _1,\dots , S _l\in {\mathcal {S}\! ol }_x( P ) \) denote the solutions to the problem on which \( \mathrm {P}^* \) queries random oracle \(\mathrm {H}\), i.e., the elements such that \( \mathrm {P}^* \) queries for \( \mathrm {H}( P , S _i) \) with \( i\in \{1,\ldots ,l\}\). We define \( Col =\{\exists i\ne j: \mathrm {H}( P , S _i)=\mathrm {H}( P , S _j)\} \) as the event that the hash values of at least two of the queried solutions collide. We have

$$\begin{aligned} \Pr [ Win ]&= \Pr \left[ Win \mid Col \right] \Pr \left[ Col \right] +\Pr \left[ Win \mid \lnot Col \right] \Pr \left[ \lnot Col \right] \\&\le \Pr \left[ Col \right] +\Pr \left[ Win \mid \lnot Col \right] . \end{aligned}$$

We conclude that \( \Pr [ Win ]<\varepsilon _{\mathcal {L},\mathcal {U}}+(1-\gamma )/(1-c)+k/|\mathcal {H}|+c \) by showing that

$$ \text {(a)}\; \Pr \left[ Col \right] <(\min (M,q))^2/|\mathcal {H}|=c $$

and

$$ \text {(b)}\; \Pr \left[ Win \mid \lnot Col \right] \le \varepsilon _{\mathcal {L},\mathcal {U}}+(1-\gamma )/(1-c)+k/|\mathcal {H}|. $$

Claim (a) follows as in the proof of Theorem 2. In order to prove (b) we denote by \( PG \) the event that the problem \( P \) given as input to \( \mathrm {P}^* \) by the verifier is “good” in the sense of having many solutions, i.e. the event \( \{\max \#(\mathcal {L})/|{\mathcal {S}\! ol }_x( P )|\le \varepsilon _{\mathcal {L},\mathcal {U}}\}\). We have

$$\begin{aligned} \Pr [ Win \mid \lnot Col ]&=\Pr [ Win \mid \lnot Col \wedge PG ]\Pr [ PG \mid \lnot Col ]\\&\qquad + \Pr [ Win \mid \lnot Col \wedge \lnot PG ]\Pr [\lnot PG \mid \lnot Col ]\\&\le \Pr [ Win \mid \lnot Col \wedge PG ]+\Pr [\lnot PG \mid \lnot Col ]\\&\le \Pr [ Win \mid \lnot Col \wedge PG ]+\Pr [\lnot PG ]/\Pr [\lnot Col ] . \end{aligned}$$

As stated above, we have \( \Pr [\lnot Col ]\ge 1-c\). Further, by problem-uniformity, \( P \) is distributed uniformly on \( \mathcal {P}_x^+ \) and by the definition of \( \varepsilon _{\mathcal {L},\mathcal {U}} \) we have \( \Pr [\lnot PG ]\le 1-\gamma \). Hence \( \Pr [\lnot PG ]/\Pr [\lnot Col ]\le (1-\gamma )/(1-c) \) and it remains to show that \( \Pr [ Win \mid \lnot Col \wedge PG ]\le \varepsilon _{\mathcal {L},\mathcal {U}}+k/|\mathcal {H}|\). Since \( S \) is sampled with (solution-uniform) \( \mathrm {Sample}\), it is distributed uniformly on \( {\mathcal {S}\! ol }_x( P ) \), which implies that \( \mathrm {H}( P , S ) \) is uniformly distributed on \( \{\mathrm {H}( P , S '): S '\in {\mathcal {S}\! ol }_x( P )\}\). Recall that \( k=\max \#\mathcal {L}\). If event \( PG \) occurs then \(|{\mathcal {S}\! ol }_x( P )|\ge k/\varepsilon _{\mathcal {L},\mathcal {U}}\). Further, conditioned on \( \lnot Col \), all values \( \mathrm {H}( P , S ') \) that \( \mathrm {P}^* \) knows are distinct. Conditioned on the events \( S \in \{ S _1,\dots , S _l\} \), \( PG \), and \( \lnot Col \), prover \( \mathrm {P}^* \) guesses \( \mathrm {H}( P , S ) \) with probability at most \(1/l\). If, on the other hand, \( S \notin \{ S _1,\dots , S _l\} \), then from \( \mathrm {P}^* \)’s point of view \( \mathrm {H}( P , S ) \) is uniformly distributed on \( \mathcal {H}\). Hence in this case its best chance of guessing it is \( 1/|\mathcal {H}|\). Note that \( \Pr [ S \in \{ S _1,\dots , S _l\}\mid \lnot Col \wedge PG ]\le l\cdot \varepsilon _{\mathcal {L},\mathcal {U}}/k\). Summing up, conditioned on \( \lnot Col \) and \( PG \), prover \( \mathrm {P}^* \)’s chance of correctly guessing \( \mathrm {H}( P , S ) \) with a single value is bounded by \( l\varepsilon _{\mathcal {L},\mathcal {U}}/k \cdot 1/l + 1/|\mathcal {H}|= \varepsilon _{\mathcal {L},\mathcal {U}}/k+1/|\mathcal {H}|\). Event \( Win \) cannot occur if \( \mathbf {h}\) contains more than k elements (line 04), so a union bound over the at most k elements of \( \mathbf {h}\) yields \( \Pr \left[ Win \mid \lnot Col \wedge PG \right] \le \varepsilon _{\mathcal {L},\mathcal {U}}+k/|\mathcal {H}|\), which completes the proof of claim (b).    \(\square \)

5 Challenge-Response Protocols in the Domain of Number-Theory

We provide several protocols to prove number-theoretic properties of a number \( N\in \mathbb {N}\), the corresponding witness being the factorization of N. More formally, we consider the universe

$$\begin{aligned} \mathcal {L}_\mathrm {odd}=\{N\in \mathbb {N}: \nu _2=0; |\mathbb {P}(N)|\ge 2\} \end{aligned}$$

of odd numbers that have at least two prime factors. Note that \( \mathcal {L}_\mathrm {odd}\) can be efficiently decided. We associate problem and solution spaces as defined in Sect. 3.1 to several languages \( \mathcal {L}\subseteq \mathcal {L}_\mathrm {odd}\), hence obtaining membership checking protocols via Theorems 1 and 2. In most cases the problem and solution space associated to a statement \( N\in \mathcal {L}_\mathrm {odd}\) are defined as \( \mathbb {Z}_N^* \), while the defining relation \( {\mathcal {R}\! el }_N \) for problem b and solution a is of the type \( b\equiv a^e \bmod N\), where the exponent e is chosen according to the number-theoretic property of N we want to prove. Equation (1) of Sect. 2.2 serves as a primary tool to deduce bounds on \( \max \#(\mathcal {L}) \) and \( \min \#(\mathcal {L}_\mathrm {odd}\setminus \mathcal {L})\). Defining \( {\mathcal {R}\! el }_N \) in the described way enables us to sample from it as follows. Algorithm \( \mathrm {Sample}\) first chooses a solution a uniformly from \( \mathcal {S}_N=\mathbb {Z}_N^*\). Then the corresponding problem b is set to \( a^e \bmod N\). In this way a is uniformly distributed on \( {\mathcal {S}\! ol }_N(b) \) and the proposed algorithm samples solution-uniformly (for both valid and invalid candidates) as required for the Solve-then-Hash protocol of Sect. 4.2.
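The sampling procedure just described is short enough to be written out. The following Python sketch implements it together with a brute-force count of solutions per solvable problem; the choice \(e=N\) and the toy moduli 15 and 21 are assumptions made for illustration, and the function names are hypothetical.

```python
import secrets
from collections import Counter
from math import gcd


def sample(N, e):
    """Pick a uniformly in Z_N^* (rejection sampling) and set b = a^e mod N.
    Conditioned on b, the solution a is uniform among the solutions of b."""
    while True:
        a = secrets.randbelow(N)
        if gcd(a, N) == 1:
            return pow(a, e, N), a


def solution_counts(N, e):
    """Map each solvable problem b to its number of solutions (brute force)."""
    return Counter(pow(a, e, N) for a in range(1, N) if gcd(a, N) == 1)


# Toy comparison with e = N: a permutation for N = 15, but not for N = 21.
print(sorted(solution_counts(15, 15).values()))
print(sorted(solution_counts(21, 21).values()))
```

For \(N=15\) every problem has exactly one solution, whereas for \(N=21\) only four problems are solvable and each of them has three solutions; gaps of this kind in the number of solutions are what the Solve-then-Hash protocol exploits.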

For some of the considered languages the map \( a\mapsto a^e \) defines a permutation on \( \mathbb {Z}_N^* \) for every valid statement \( N\in \mathcal {L}\). In this case every problem is solvable, so we have \( \mathcal {P}_N^+=\mathcal {P}_N \), and the described sampling algorithm also fulfills the property of problem-uniformity and can be used in the Hash-then-Solve protocol of Sect. 4.1. For others of the considered languages the space \( \mathcal {P}_N^+ \) of solvable problems is a proper subset of \( \mathcal {P}_N \) and it does not seem feasible to construct an algorithm with the desired properties. In these cases only the Solve-then-Hash protocol can be used to decide the language.

Table 1. Protocols for properties of RSA moduli. Assume \( k=\max \#\mathcal {L}\) and \( m=\min \#(\mathcal {U}\setminus \mathcal {L})\). Columns seven and eight indicate whether the Hash-then-Solve (HtS) or Solve-then-Hash (StH) protocol can be used to decide \( \mathcal {L}\). \(\mathcal {L}_\mathrm {pp}\) and \( \mathcal {L}_\mathrm {rsa}\) are intersections of other decidable languages and can be decided by running the corresponding protocols in parallel.

Considered languages. We provide a toolbox of protocols checking arguably the most important properties required of RSA-type moduli. An overview of our results is given in Table 1. Combining several of the protocols gives a method to check for properties required by typical applications. For example, the property that the RSA map \( a\mapsto a^e\bmod N \) defined by numbers (N, e) is “good” can be checked by showing that N has exactly two prime factors and is square free and that e indeed defines a permutation on \( \mathbb {Z}_N^*\). If an application requires a feature more specific than the ones we treat, then likely corresponding problem and solution spaces and a corresponding relation can be found. As a starting point we consider the languages

$$\begin{aligned} \mathcal {L}_\mathrm {sf}&:=\{N\in \mathcal {L}_\mathrm {odd}:\gcd (N,\varphi (N))=1\} \\ \mathcal {L}_\mathrm {ppp}&:=\{N\in \mathcal {L}_\mathrm {odd}:|\mathbb {P}(N)|=2\} \end{aligned}$$

of square free numbers and prime power products, i.e. numbers having exactly two prime factors. For both languages the corresponding relation was implicitly given in [19]. Note that by definition of \( \varphi (N) \) condition \( (\gcd (\varphi (N),N)=1) \) implies that \( \nu _p=1 \) for every \( p\in \mathbb {P}(N) \) and hence indeed the number is square free. Due to the choice of the relation it additionally implies that \( p\not \mid (q-1) \) for every \( p,q\in \mathbb {P}(N)\). Intersecting both languages yields the language

$$\begin{aligned} \mathcal {L}_\mathrm {pp}:=\{pq\in \mathcal {L}_\mathrm {odd}:p,q\in \mathbb {P},p\ne q,p\not \mid (q-1),q\not \mid (p-1)\} \end{aligned}$$

of prime products. Each N in this language is the product of two distinct primes, a minimal requirement on RSA moduli. We further give relations for the languages

$$\begin{aligned} \mathcal {L}_\mathrm {per}&:=\{(N,e)\in \mathcal {L}_\mathrm {odd}\times \mathbb {N}:a\mapsto a^e\text { defines a permutation}\}\\ \mathcal {L}_\mathrm {rsa}&:=\{(N,e)\in \mathcal {L}_\mathrm {pp}\times \mathbb {N}:a\mapsto a^e\text { defines a permutation}\} \end{aligned}$$

of pairs (N, e) such that exponentiation with e defines a permutation on \( \mathbb {Z}_N^* \), and of such pairs in which N is additionally a prime product. The relations were implicitly used in [11, 34]. Building on the protocol for \( \mathcal {L}_\mathrm {pp}\) we consider the language

$$\begin{aligned} \mathcal {L}_\mathrm {blum}&:=\{pq\in \mathcal {L}_\mathrm {pp}: p\equiv q\equiv 3\bmod 4\} \end{aligned}$$

of Blum integers, i.e. prime products with both primes being congruent to 3 modulo 4. We give problem and solution spaces and a corresponding relation, which to our knowledge has not been used so far, such that \( \mathcal {L}_\mathrm {blum}\) can be decided in universe \( \mathcal {L}_\mathrm {pp}\). Finally, we show that it can be efficiently decided whether the trapdoor function corresponding to Paillier’s encryption scheme, which corresponds to pairs (N, g) consisting of a prime product N and an element g of \( \mathbb {Z}_{N^2}^* \), indeed defines a bijection. A protocol for this property has to our knowledge not been given so far. Note that given (N, g) it is assumed to be hard to decide whether the corresponding map is bijective, since it has been shown to be a lossy trapdoor function under the decisional quadratic residuosity assumption [15].

5.1 Deciding \( \mathcal {L}_\mathrm {sf}\)

Consider the language

$$\begin{aligned} \mathcal {L}_\mathrm {sf}:=\{N\in \mathcal {L}_\mathrm {odd}: \gcd (N,\varphi (N))=1 \} \end{aligned}$$

of square free integers, i.e. of odd numbers such that for every \( p,q\in \mathbb {P}(N) \) we have \( \nu _p=1 \) and \( p\not \mid q-1\). We show that \( \mathcal {L}_\mathrm {sf}\) can be decided in universe \( \mathcal {L}_\mathrm {odd}\). For a statement \( N\in \mathcal {L}_\mathrm {odd}\) let the corresponding witness be its factorization. We define the corresponding problem and solution spaces and the defining relation as

$$\begin{aligned} \mathcal {P}_N&= \mathbb {Z}_N^*\\ \mathcal {S}_N&= \mathbb {Z}_N^*\\ {\mathcal {R}\! el }_N&= \{(b,a)\in (\mathbb {Z}_N^*)^2: b\equiv a^N \bmod N\} . \end{aligned}$$

\( {\mathcal {R}\! el }_N \) is defined via the map \( \mathbb {Z}_N^*\rightarrow \mathbb {Z}_N^* \); \( a\mapsto a^N\). By Eq. (1) of Sect. 2.2 this map is a bijection exactly if \( N\in \mathcal {L}_\mathrm {sf}\), i.e. if \( \gcd (N,\varphi (N))=1 \), and, since N is odd, at least 3-to-1 if \( N\in \mathcal {L}_\mathrm {odd}\setminus \mathcal {L}_\mathrm {sf}\). Hence \( \max \#(\mathcal {L}_\mathrm {sf})=1 \) and \( \min \#(\mathcal {L}_\mathrm {odd}\setminus \mathcal {L}_\mathrm {sf})=3\).
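
The stated bounds can be checked by brute force on toy moduli. The sketch below is only an illustration: the helper name `preimage_sizes` is hypothetical, and the moduli 15, 21 and 45 are example values chosen here, not taken from the text.

```python
import math
from collections import Counter


def preimage_sizes(N):
    """Return the set of preimage sizes of a -> a^N mod N on Z_N^*."""
    units = [a for a in range(1, N) if math.gcd(a, N) == 1]
    counts = Counter(pow(a, N, N) for a in units)
    return set(counts.values())


# N = 15: phi(15) = 8,  gcd(15, 8) = 1,  so the map is a bijection.
print(preimage_sizes(15))  # {1}
# N = 21: phi(21) = 12, gcd(21, 12) = 3, so the map is 3-to-1.
print(preimage_sizes(21))  # {3}
# N = 45: phi(45) = 24, gcd(45, 24) = 3, so the map is 3-to-1.
print(preimage_sizes(45))  # {3}
```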

We now describe the corresponding algorithms. Algorithm \( \mathrm {Sample}\) samples from \( {\mathcal {R}\! el }_N \) by choosing \( a\leftarrow _{\scriptscriptstyle \$}\mathbb {Z}_N^* \), setting \( b\leftarrow a^N \) and returning the problem-solution pair (b, a). As discussed above, since the solution a is sampled at random and the corresponding problem b is derived from it afterwards, a is uniformly distributed on \( {\mathcal {S}\! ol }_N(b) \) and \( \mathrm {Sample}\) is solution-uniform. \( \mathrm {Verify}\) on input (b, a) checks whether \( b\equiv a^N \bmod N \) and responds accordingly. Note that Nth roots modulo N can be efficiently computed given the factorization of N. Hence it is possible to construct the problem solving algorithm \( \mathrm {Solve}\) and by Theorem 2 language \( \mathcal {L}_\mathrm {sf}\) can be decided using the Solve-then-Hash protocol.
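
One possible way to realize \( \mathrm {Solve}\) for a valid statement is to invert the exponent N modulo \( \varphi (N) \). The following sketch assumes that N is given as a list of distinct primes and that \( \gcd (N,\varphi (N))=1 \); the name `solve_nth_root` is hypothetical, and the code relies on Python 3.8 or later for `math.prod` and modular inverses via `pow`.

```python
import math


def solve_nth_root(b, primes):
    """Return the a in Z_N^* with a^N = b mod N.

    Assumes N = prod(primes) is square free with gcd(N, phi(N)) = 1.
    """
    N = math.prod(primes)
    phi = math.prod(p - 1 for p in primes)
    # d is the inverse of N modulo phi(N); pow raises ValueError
    # if N is not invertible, i.e. if gcd(N, phi(N)) != 1.
    d = pow(N, -1, phi)
    return pow(b, d, N)
```

Since \( N\cdot d\equiv 1 \bmod \varphi (N) \), raising the result to the Nth power returns b for every \( b\in \mathbb {Z}_N^* \).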

For every valid statement \( N\in \mathcal {L}_\mathrm {sf}\) the map \( \mathbb {Z}_N^*\rightarrow \mathbb {Z}_N^*\); \(a\mapsto a^N \) defining the relation \( {\mathcal {R}\! el }_N \) is a bijection. Hence in this case every problem \( b\in \mathcal {P}_N \) is solvable. Further the problems sampled by \( \mathrm {Sample}\) are uniformly distributed on \( \mathcal {P}_N \) and solutions are uniformly distributed on the corresponding solution set \( {\mathcal {S}\! ol }_N(b)\). Thus \( \mathrm {Sample}\) is both problem-uniform and solution-uniform, and therefore fulfills the requirements for use as sampling algorithm \( \mathrm {Sample}^* \) in the Hash-then-Solve protocol of Sect. 4.1.

5.2 Deciding \( \mathcal {L}_\mathrm {ppp}\)

Consider the language

$$\begin{aligned} \mathcal {L}_\mathrm {ppp}:=\{N\in \mathcal {L}_\mathrm {odd}:|\mathbb {P}(N)|=2\} \end{aligned}$$

of prime power products, i.e. of odd numbers that have exactly two prime factors. We show that \( \mathcal {L}_\mathrm {ppp}\) can be decided in universe \( \mathcal {L}_\mathrm {odd}\). For a statement \( N\in \mathcal {L}_\mathrm {odd}\) let the corresponding witness be its factorization. We define the corresponding problem and solution spaces and the defining relation as

$$\begin{aligned} \mathcal {P}_N&= \mathbb {Z}_N^*\\ \mathcal {S}_N&= \mathbb {Z}_N^*\\ {\mathcal {R}\! el }_N&= \{(b,a)\in (\mathbb {Z}_N^*)^2: b\equiv a^2 \bmod N\} . \end{aligned}$$

\( {\mathcal {R}\! el }_N \) is defined via the map \( \mathbb {Z}_N^*\rightarrow \mathbb {Z}_N^* \); \( a\mapsto a^2\). Since N is odd we obtain by Eq. (1) of Sect. 2.2 that this map is 4-to-1 if \( N\in \mathcal {L}_\mathrm {ppp}\), i.e. if N has exactly two distinct prime factors, and at least 8-to-1 if \( N\in \mathcal {L}_\mathrm {odd}\setminus \mathcal {L}_\mathrm {ppp}\). Hence \( \max \#(\mathcal {L}_\mathrm {ppp})=4 \) and \( \min \#(\mathcal {L}_\mathrm {odd}\setminus \mathcal {L}_\mathrm {ppp})=8\).
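
Since squaring is a homomorphism, every solvable problem has as many solutions as 1 has square roots, so the bounds can be illustrated by counting those roots. The helper name `square_roots_of_one` and the toy moduli below are assumptions made for this sketch.

```python
import math


def square_roots_of_one(N):
    """List all a in Z_N^* with a^2 = 1 mod N (the kernel of squaring)."""
    return [a for a in range(1, N) if math.gcd(a, N) == 1 and a * a % N == 1]


# Two distinct odd prime factors: squaring is 4-to-1.
print(len(square_roots_of_one(15)))   # 4  (15 = 3 * 5)
print(len(square_roots_of_one(45)))   # 4  (45 = 3^2 * 5)
# Three distinct odd prime factors: squaring is 8-to-1.
print(len(square_roots_of_one(105)))  # 8  (105 = 3 * 5 * 7)
```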

We now describe the corresponding algorithms. Algorithm \( \mathrm {Sample}\) samples from \( {\mathcal {R}\! el }_N \) by choosing \( a\leftarrow _{\scriptscriptstyle \$}\mathbb {Z}_N^* \), setting \( b\leftarrow a^2 \) and returning the problem-solution pair (b, a). Note that \( \mathrm {Sample}\) is solution-uniform. \( \mathrm {Verify}\) on input (b, a) checks whether \( b\equiv a^2\bmod N \) and responds accordingly. Note that square roots modulo N can be efficiently computed given the factorization of N. Hence it is possible to construct the problem solving algorithm \( \mathrm {Solve}\) and by Theorem 2 language \( \mathcal {L}_\mathrm {ppp}\) can be decided using the Solve-then-Hash protocol.

Let \( N\in \mathcal {L}_\mathrm {ppp}\) be a valid statement. The set \( \mathcal {P}^+_N \) of solvable problems is the set \( \mathbf {QR}(N) \) of quadratic residues modulo N. Hence a sampling algorithm \( \mathrm {Sample}^* \) compatible with the Hash-then-Solve protocol of Sect. 4.1 would require that (a) the sampled problems are uniformly distributed in \( \mathbb {Z}_N^* \) and (b) if a sampled problem is solvable then it is accompanied by a solution. While both sampling uniformly from \( \mathbb {Z}_N^* \) and sampling uniformly from \( (b,a)\in {\mathcal {R}\! el }_N\subseteq \mathbf {QR}(N)\times \mathbb {Z}_N^* \) are easy, it is unclear how to construct an algorithm with the required properties that does not need access to the factorization of N. The authors of [19] overcome this problem by imposing additional requirements on N. They give a protocol able to verify that \( pq=N\in \mathcal {L}_\mathrm {ppp}\) such that \( p,q\not \equiv 1\bmod 8 \) and \( p\not \equiv q\bmod 8\). For this restricted language exactly one element of the set \( \{+b,-b,+2b,-2b\} \) has a square root for every \( b\in \mathbb {Z}_N^*\). Changing the relation to pairs (b, a) such that a is the root of one of those elements, one then defines \( \mathrm {Sample}^* \) to sample (b, a) with algorithm \( \mathrm {Sample}\) from above and then output \( (c\, b,a) \), where \( c\leftarrow _{\scriptscriptstyle \$}\{+1,-1,+2,-2\}\).
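
The property of the restricted language can be observed on small examples. The sketch below counts, for every unit b, how many elements of \( \{+b,-b,+2b,-2b\} \) are quadratic residues; the helper name `residue_counts` and the moduli 15 and 161 are illustrative assumptions, not values from [19].

```python
import math


def residue_counts(N):
    """For each b in Z_N^*, count the quadratic residues among {b, -b, 2b, -2b}.

    Returns the set of counts observed over all b.
    """
    units = [a for a in range(1, N) if math.gcd(a, N) == 1]
    squares = {a * a % N for a in units}
    return {sum((c * b) % N in squares for c in (1, -1, 2, -2)) for b in units}


# 15 = 3 * 5: 3 and 5 are both not 1 mod 8 and differ mod 8,
# so exactly one of the four candidates is a square for every b.
print(residue_counts(15))   # {1}
# 161 = 7 * 23: both primes are 7 mod 8, so the condition p != q mod 8 fails.
print(residue_counts(161))  # {0, 2}
```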

5.3 Deciding \( \mathcal {L}_\mathrm {per}\)

Consider the language

$$\begin{aligned} \mathcal {L}_\mathrm {per}:=\{(N,e)\in \mathcal {L}_\mathrm {odd}\times \mathbb {N}:a\mapsto a^e\text { defines a permutation}\} \end{aligned}$$

of pairs (N, e) such that the map \( a\mapsto a^e \) defines a permutation. We show that \( \mathcal {L}_\mathrm {per}\) can be decided in universe \( \mathcal {L}_\mathrm {odd}\). For a statement \( N\in \mathcal {L}_\mathrm {odd}\) let the corresponding witness be its factorization. We define the corresponding problem and solution spaces and the defining relation as

$$\begin{aligned} \mathcal {P}_N&= \mathbb {Z}_N^*\\ \mathcal {S}_N&= \mathbb {Z}_N^*\\ {\mathcal {R}\! el }_N&= \{(b,a)\in (\mathbb {Z}_N^*)^2: b\equiv a^e \bmod N\}. \end{aligned}$$

\( {\mathcal {R}\! el }_N \) is defined via the map \( \mathbb {Z}_N^*\rightarrow \mathbb {Z}_N^* \); \( a\mapsto a^e\). Since this map is a homomorphism, it is at least 2-to-1 if it is not bijective. Hence \( \max \#(\mathcal {L}_\mathrm {per})=1 \) and \( \min \#(\mathcal {L}_\mathrm {odd}\setminus \mathcal {L}_\mathrm {per})=2\).
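
A brute-force check on a toy modulus illustrates the dichotomy between a permutation and a map that is at least 2-to-1. The helper names `is_permutation` and `kernel_size` are hypothetical, and N = 35 with the exponents 5, 3 and 2 are example values chosen for this sketch.

```python
import math


def is_permutation(N, e):
    """Check by exhaustion whether a -> a^e mod N permutes Z_N^*."""
    units = [a for a in range(1, N) if math.gcd(a, N) == 1]
    return len({pow(a, e, N) for a in units}) == len(units)


def kernel_size(N, e):
    """Number of a in Z_N^* with a^e = 1 mod N; the map is kernel_size-to-1."""
    return sum(1 for a in range(1, N)
               if math.gcd(a, N) == 1 and pow(a, e, N) == 1)


print(is_permutation(35, 5), kernel_size(35, 5))  # True 1
print(is_permutation(35, 3), kernel_size(35, 3))  # False 3
print(is_permutation(35, 2), kernel_size(35, 2))  # False 4
```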

We now describe the corresponding algorithms. Algorithm \( \mathrm {Sample}\) samples from \( {\mathcal {R}\! el }_N \) by choosing \( a\leftarrow _{\scriptscriptstyle \$}\mathbb {Z}_N^* \), setting \( b\leftarrow a^e \) and returning the problem-solution pair (b, a). Note that \( \mathrm {Sample}\) is both problem-uniform and solution-uniform. \( \mathrm {Verify}\) on input (b, a) checks whether \( b\equiv a^e\bmod N \) and responds accordingly. Note that e-th roots modulo N can be efficiently computed given the factorization of N. Hence it is possible to construct the problem solving algorithm \( \mathrm {Solve}\) and by Theorem 2 language \( \mathcal {L}_\mathrm {per}\) can be decided using the Solve-then-Hash protocol.

Further, for every valid statement \( N\in \mathcal {L}_\mathrm {per}\) the map \( \mathbb {Z}_N^*\rightarrow \mathbb {Z}_N^*\); \(a\mapsto a^e \) defining the relation \( {\mathcal {R}\! el }_N \) is a bijection. Hence in this case every problem \( b\in \mathcal {P}_N \) is solvable. Further the problems sampled by \( \mathrm {Sample}\) are uniformly distributed on \( \mathcal {P}_N \) and solutions are uniformly distributed on the corresponding solution set \( {\mathcal {S}\! ol }_N(b)\). Thus \( \mathrm {Sample}\) is both problem-uniform and solution-uniform, and therefore fulfills the requirements for use as sampling algorithm \( \mathrm {Sample}^* \) in the Hash-then-Solve protocol of Sect. 4.1.

5.4 Deciding \( \mathcal {L}_\mathrm {pp}\) and \( \mathcal {L}_\mathrm {rsa}\)

Consider the languages

$$ \mathcal {L}_\mathrm {pp}:=\{pq\in \mathcal {L}_\mathrm {odd}:p,q\in \mathbb {P},p\ne q,p\not \mid (q-1),q\not \mid (p-1)\} $$

of prime products, i.e. square-free numbers having exactly two prime factors, and

$$ \mathcal {L}_\mathrm {rsa}:=\{(N,e)\in \mathcal {L}_\mathrm {pp}\times \mathbb {N}:a\mapsto a^e\text { defines a permutation}\} $$

of pairs (N, e) such that N is a prime product and the RSA map \( \mathbb {Z}_N^*\rightarrow \mathbb {Z}_N^* \); \( a\mapsto a^e \) defines a permutation. We have \( \mathcal {L}_\mathrm {pp}=\mathcal {L}_\mathrm {ppp}\cap \mathcal {L}_\mathrm {sf}\) and \( \mathcal {L}_\mathrm {rsa}=\mathcal {L}_\mathrm {per}\cap \mathcal {L}_\mathrm {ppp}\cap \mathcal {L}_\mathrm {sf}\). The protocols deciding \( \mathcal {L}_\mathrm {sf}\), \( \mathcal {L}_\mathrm {ppp}\) and \( \mathcal {L}_\mathrm {per}\) are all defined with respect to the same universe \( \mathcal {L}_\mathrm {odd}\). By running them in parallel we hence obtain protocols deciding \( \mathcal {L}_\mathrm {pp}\) or \( \mathcal {L}_\mathrm {rsa}\) respectively with respect to \( \mathcal {L}_\mathrm {odd}\).
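
The identity \( \mathcal {L}_\mathrm {pp}=\mathcal {L}_\mathrm {ppp}\cap \mathcal {L}_\mathrm {sf}\) can be made concrete with reference predicates that use the factorization directly. This is not the interactive protocol, only a ground-truth check for small inputs; all function names are hypothetical, inputs are assumed to lie in \( \mathcal {L}_\mathrm {odd}\), and trial division is an assumed stand-in for knowing the witness.

```python
import math


def factorize(N):
    """Trial division; returns {prime: multiplicity}. Suitable for small N only."""
    factors, p = {}, 2
    while p * p <= N:
        while N % p == 0:
            factors[p] = factors.get(p, 0) + 1
            N //= p
        p += 1
    if N > 1:
        factors[N] = factors.get(N, 0) + 1
    return factors


def euler_phi(factors):
    """Euler's totient computed from a factorization."""
    return math.prod((p - 1) * p ** (k - 1) for p, k in factors.items())


def in_L_sf(N):
    return math.gcd(N, euler_phi(factorize(N))) == 1


def in_L_ppp(N):
    return len(factorize(N)) == 2


def in_L_pp(N):
    # Conjunction of the two properties, mirroring the parallel composition.
    return in_L_sf(N) and in_L_ppp(N)
```

For instance, 15 lies in both languages, 21 fails the square-free condition because 3 divides 7 - 1, and 255 = 3 · 5 · 17 satisfies \( \gcd (N,\varphi (N))=1 \) but has three prime factors.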

5.5 Deciding \( \mathcal {L}_\mathrm {blum}\)

Consider the language

$$ \mathcal {L}_\mathrm {blum}:=\{pq\in \mathcal {L}_\mathrm {pp}: p\equiv q\equiv 3\bmod 4\} $$

of Blum integers. We show that \( \mathcal {L}_\mathrm {blum}\) can be decided in universe \( \mathcal {L}_\mathrm {pp}\). For a statement \( N\in \mathcal {L}_\mathrm {pp}\) let the corresponding witness be its factorization. We define the corresponding problem and solution spaces and the defining relation as

$$\begin{aligned} \mathcal {P}_N&= \mathbb {Z}_N^*\\ \mathcal {S}_N&= \mathbb {Z}_N^*\\ {\mathcal {R}\! el }_N&= \{(b,a)\in (\mathbb {Z}_N^*)^2: b\equiv a^4 \bmod N\}. \end{aligned}$$

Since all statements are elements of \( \mathcal {L}_\mathrm {pp}\) and hence have two odd prime factors, every square in \( \mathbb {Z}_N^* \) has four square roots. Further, if N is a Blum integer then each element of \( \mathbf {QR}(N) \) has exactly one root that is again a square. This implies that every problem of \( \mathcal {P}^+=\{b\in \mathbb {Z}_N^*: b\equiv a^4 \text { for some }a\in \mathbb {Z}_N^*\} \) has four corresponding solutions, i.e. \( \max \#(\mathcal {L}_\mathrm {blum})=4\). If on the other hand \( N\in \mathcal {L}_\mathrm {pp}\setminus \mathcal {L}_\mathrm {blum}\), then every element of the form \( b=a^4 \) has at least two square roots, which are elements of \( \mathbf {QR}(N)\). Hence in this case we obtain \( \min \#(\mathcal {L}_\mathrm {pp}\setminus \mathcal {L}_\mathrm {blum})=8\).
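
The gap between four and at least eight solutions can be observed by counting the fourth roots of 1, since \( a\mapsto a^4 \) is a homomorphism and every solvable problem has exactly that many solutions. The helper name `fourth_roots_of_one` and the moduli 77, 35 and 65 (all prime products in \( \mathcal {L}_\mathrm {pp}\)) are example choices for this sketch.

```python
import math


def fourth_roots_of_one(N):
    """Count the a in Z_N^* with a^4 = 1 mod N (the kernel of a -> a^4)."""
    return sum(1 for a in range(1, N)
               if math.gcd(a, N) == 1 and pow(a, 4, N) == 1)


# 77 = 7 * 11 is a Blum integer (7 and 11 are both 3 mod 4).
print(fourth_roots_of_one(77))  # 4
# 35 = 5 * 7 is not a Blum integer (5 is 1 mod 4).
print(fourth_roots_of_one(35))  # 8
# 65 = 5 * 13 is not a Blum integer (both primes are 1 mod 4).
print(fourth_roots_of_one(65))  # 16
```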

We now describe the corresponding algorithms. Algorithm \( \mathrm {Sample}\) samples from \( {\mathcal {R}\! el }_N \) by choosing \( a\leftarrow _{\scriptscriptstyle \$}\mathbb {Z}_N^* \), setting \( b\leftarrow a^4 \) and returning the problem-solution pair (ba). Note that \( \mathrm {Sample}\) is solution-uniform. \( \mathrm {Verify}\) on input (ba) checks whether \( b\equiv a^4\bmod N \) and responds accordingly. Note that 4th roots modulo N can be efficiently computed given the factorization of N. Hence it is possible to construct the problem solving algorithm \( \mathrm {Solve}\) and by Theorem 2 language \( \mathcal {L}_\mathrm {blum}\) can be decided using the Solve-then-Hash protocol.

Let \( N\in \mathcal {L}_\mathrm {blum}\) be a valid statement. Since for Blum integers squaring is a permutation on \( \mathbf {QR}(N) \), the space of solvable problems is given by \( \mathbf {QR}(N)\). Hence as in the case of the relation for language \( \mathcal {L}_\mathrm {ppp}\) it seems infeasible to construct an alternative sampling algorithm \( \mathrm {Sample}^* \) that admits the use of the Hash-then-Solve protocol of Sect. 4.1.

5.6 Deciding \( \mathcal {L}_\mathrm {pai}\)

Let \( N\in \mathcal {L}_\mathrm {pp}\) and \( g\in \mathbb {Z}_{N^2}^* \) such that N divides the order of the group generated by g. In this case the following function associated to N and g, which is used in Paillier’s encryption scheme [26], defines a bijection that can be efficiently inverted given the factorization of N.

$$ f_{N,g}:{\left\{ \begin{array}{ll} \mathbb {Z}_N\times \mathbb {Z}_N^*&{}\rightarrow \mathbb {Z}_{N^2}^*\\ (a_1,a_2)&{}\mapsto g^{a_1}\,a_2^N\bmod N^2 \end{array}\right. } $$

In this section we show that our protocols can be used to check in universe \( \mathcal {L}_\mathrm {pp}\) whether a public key (N, g) for the Paillier encryption scheme indeed defines a bijection. Hence consider the language

$$ \mathcal {L}_\mathrm {pai}:=\{(N,g)\in \mathcal {L}_\mathrm {pp}\times \mathbb {N}:g\in \mathbb {Z}_{N^2}^*,f_{N,g}\text { is a bijection}\}. $$

Note that the condition \( g\in \mathbb {Z}_{N^2}^* \) can be efficiently checked. For a statement \( N\in \mathcal {L}_\mathrm {pp}\) let the corresponding witness be its factorization. We define the corresponding problem and solution spaces and the defining relation as

$$\begin{aligned} \mathcal {P}_N&= \mathbb {Z}_{N^2}^*\\ \mathcal {S}_N&= \mathbb {Z}_N\times \mathbb {Z}_N^*\\ {\mathcal {R}\! el }_N&= \{(b,a)\in \mathcal {P}_N\times \mathcal {S}_N: b\equiv f_{N,g}(a) \bmod N^2\}. \end{aligned}$$

\( {\mathcal {R}\! el }_N \) is defined via map \( f_{(N,g)} \), which is a homomorphism. Hence if it is not bijective it is at least 2-to-1 and we obtain \( \max \#(\mathcal {L}_\mathrm {pai})=1 \) and \( \min \#(\mathcal {L}_\mathrm {pp}\setminus \mathcal {L}_\mathrm {pai})=2\).
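
For a toy modulus the bijectivity of the Paillier map can be tested by exhaustion. In the sketch below the function name `paillier_map_is_bijection` is hypothetical, and N = 15 with g = 1 + N = 16 and g = 1 are example parameters chosen here; domain and range both have \( N\cdot \varphi (N) \) elements, so the map is a bijection exactly if its image has that size.

```python
import math


def paillier_map_is_bijection(N, g):
    """Check by exhaustion whether (a1, a2) -> g^a1 * a2^N mod N^2 is a bijection
    from Z_N x Z_N^* onto Z_{N^2}^*. Feasible for very small N only."""
    N2 = N * N
    if math.gcd(g, N2) != 1:
        return False  # g must be a unit modulo N^2
    units = [a for a in range(1, N) if math.gcd(a, N) == 1]
    image = {pow(g, a1, N2) * pow(a2, N, N2) % N2
             for a1 in range(N) for a2 in units}
    return len(image) == N * len(units)


print(paillier_map_is_bijection(15, 16))  # True
print(paillier_map_is_bijection(15, 1))   # False
```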

We now describe the corresponding algorithms. Algorithm \( \mathrm {Sample}\) samples from \( {\mathcal {R}\! el }_N \) by choosing \( a\leftarrow _{\scriptscriptstyle \$}\mathbb {Z}_N\times \mathbb {Z}_N^* \), setting \( b\leftarrow f_{(N,g)}(a) \) and returning the problem-solution pair (b, a). Note that \( \mathrm {Sample}\) is both problem-uniform and solution-uniform. \( \mathrm {Verify}\) on input (b, a) checks whether \( b\equiv f_{(N,g)}(a) \) and responds accordingly. Map \( f_{(N,g)} \) can be efficiently inverted given the factorization of N. Hence it is possible to construct the problem solving algorithm \( \mathrm {Solve}\) and by Theorem 2 language \( \mathcal {L}_\mathrm {pai}\) can be decided using the Solve-then-Hash protocol.

For every valid statement \( N\in \mathcal {L}_\mathrm {pai}\) the map \( f_{(N,g)} \) defining the relation \( {\mathcal {R}\! el }_N \) is a bijection. Hence in this case every problem \( b\in \mathcal {P}_N \) is solvable. Further the problems sampled by \( \mathrm {Sample}\) are uniformly distributed on \( \mathcal {P}_N \) and solutions are uniformly distributed on the corresponding solution set \( {\mathcal {S}\! ol }_N(b)\). Thus \( \mathrm {Sample}\) is both problem-uniform and solution-uniform, and therefore fulfills the requirements for use as sampling algorithm \( \mathrm {Sample}^* \) in the Hash-then-Solve protocol of Sect. 4.1.

The constructions can be easily adapted to handle the generalized version of the trapdoor function from [14], which uses domain \( \mathbb {Z}_{N^s}\times \mathbb {Z}_N^* \) and range \( \mathbb {Z}_{N^{s+1}}^* \) for some \( s\in \mathbb {N}\).