1. Introduction
With the gradual maturing of emerging technologies such as blockchain and cloud computing, the growing demand for sensitive data protection and the development of privacy protection technologies reinforce each other. Zero-knowledge proof technology has been one of the research areas of greatest interest to academia and industry in recent years. As an important cryptographic primitive, the hash function has a wide range of application scenarios in combination with zero-knowledge proofs.
Zero-knowledge proof is an efficient protocol based on probabilistic verification, which enables the prover to prove to the verifier that the prover knows a secret value without revealing any information about the value. As early as 1987, it was proved that any NP (non-deterministic polynomial) problem has a zero-knowledge proof [1]. The general solution for converting NP problems into zero-knowledge proofs is to split the problem into an intermediate form expressed by arithmetic gates or logic gates as the basic operation, and then extract the intermediate form into a circuit description matrix; finally, the proof is generated based on the back-end program of a specific zero-knowledge proof protocol.
However, a unified automatic zero-knowledge proof circuit generator applied to different algorithms cannot meet current efficiency requirements. Zero-knowledge proofs of hash functions are increasingly used in practical blockchain scenarios, and more efficient circuit conversion methods are required. At present, a zero-knowledge proof circuit based on SM3 has not yet been implemented, and an efficient circuit implementation method is needed for it.
The main research content of this paper is the design and implementation of a zero-knowledge proof circuit based on the typical hash function SM3. At the current stage of zero-knowledge proof development, the main implementation difficulty lies in converting the computational process of the problem to be proved into an arithmetic circuit, which is a tedious and error-prone task. Starting from the correspondence between the calculation steps of SM3 and the circuit modules, we designed a layered circuit structure. Each layer contains multiple zero-knowledge proof sub-circuits and is composed of lower-level sub-circuits. With a rational design of the circuit structure and a normalized circuit generation paradigm, we implement a secure and efficient SM3 zero-knowledge proof circuit and extend it to a variety of zero-knowledge proof protocols. The main contributions of this paper are as follows.
A layered structure of SM3 circuit is proposed, which realizes circuit decoupling and multiplexing, and the lower-layer circuit is transparent to the upper-layer circuit. The circuits designed through this architecture have low coupling, and each packaged sub-circuit has a specific logical meaning, which is convenient for later expansion of larger circuits.
We implemented the SM3 zero-knowledge proof circuit and performed functional and performance tests on the physical machine. Full conversion details and paradigms are given for each subcomputing step of SM3 to the R1CS circuit-constrained form. The realized SM3 hash preimage zero-knowledge proof circuit has high efficiency, and its complexity is reduced by half compared with the general zero-knowledge proof circuit generator. The realization effect is comparable to the current optimal implementation of SHA256.
Based on the realized SM3 zero-knowledge proof circuit, various zero-knowledge proof protocols are extended and implemented. We designed and implemented zero-knowledge proof circuits for the equality of an elliptic curve discrete logarithm and an SM3 preimage and for hash chain proofs, and finally tested the implementation results.
This paper is organized as follows. In Section 2, preliminary knowledge related to the SM3 zero-knowledge proof circuit is introduced. In Section 3, we give the architecture design of the zero-knowledge proof circuit based on SM3 and analyze how the SM3 zero-knowledge proof circuit is generated. Then, in Section 4, we study a detailed circuit conversion generation scheme and the dependencies between the circuits, and introduce a specific SM3-based zero-knowledge proof circuit conversion generation paradigm. In Section 5, we present the results of functional and performance tests and analyze them. In Section 6, we propose several concrete implementations that extend the SM3 zero-knowledge proof circuit to practical protocols widely used in blockchain. Finally, we conclude this paper in Section 7.
2. Related Work
In 2016, Groth et al. [2] proposed a concise, non-interactive zero-knowledge proof scheme based on bilinear pairings, which is called Groth16. The most significant advantage of this scheme is that it uses asymmetric pairings, so that the proof contains only three group elements; as a result, no system proposed in recent years has surpassed it in proof size. However, Groth16 requires a trusted setup, and the initialization phase must be re-run with different parameters for every new circuit, which complicates its practical use. Based on the inner product proof technique proposed by Bootle et al. [3], Bünz et al. [4] proposed the Bulletproofs zero-knowledge proof system in 2017. Bulletproofs supports logarithmic-sized aggregate proofs and does not require a trusted setup. Sasson et al. [5] proposed zk-STARKs in 2018, a quantum-attack-resistant zero-knowledge proof system without a trusted setup. Maller et al. [6] continued to optimize the globally updatable CRS in 2019, reducing the size of the CRS from quadratic to linear in the circuit size.
The most cumbersome part of a general zero-knowledge proof scheme is expressing the problem in the form of an arithmetic circuit or a Boolean circuit. In 2010, Meiklejohn et al. [7] implemented a zero-knowledge proof description language system, which has an interpreter that can convert the input program into a special cryptographic language. After that, such zero-knowledge proof circuit representation tools began to develop. For example, Sasson et al. [8] proposed a virtual machine called TinyRAM in 2011, which can receive NP problems described in the C language and execute the proof and verification process in the virtual machine. In 2013, Parno et al. [9] provided a compiler, Pinocchio, that can convert C programs into quadratic arithmetic programs and quadratic span programs. The TinyRAM team continued to improve their system and proposed the new models vnTinyRAM and scalable TinyRAM. Scalable TinyRAM draws on the idea of recursive proof generation by Valiant et al. [10] and Bitansky et al. [11], and realizes recursive proof generation through elliptic curve technology.
The Pinocchio toolchain provides a near-practical standardized solution for zero-knowledge proof systems and is still in use today. Another typical zero-knowledge proof circuit generator is Pantry, proposed by Braun et al. [12]. Pantry enables computational state to be stored in untrusted memory, mainly by having the prover generate a digest for each computational process. To improve the efficiency of automatic circuit generation, Kosba et al. [13] developed a new compiler, xJsnark, which optimizes short and long integer types and supports programming in Java. Dimitris et al. [14] presented Zilch, a framework that accelerates and simplifies the deployment of verifiable computation (VC) and zero-knowledge proofs of knowledge (ZKPK) for any application transparently, i.e., without the need for a trusted setup. Although current zero-knowledge proof arithmetic circuit generation tools can automatically convert a computation into a circuit, the conversion efficiency is low, and improving it remains a direction for future optimization.
4. Circuit Hierarchical Architecture Design Based on SM3
This section mainly introduces the design ideas of the entire SM3 circuit architecture. First, we determine the zero-knowledge proof framework after implementing the circuit program. Then, the specific program function modules in the circuit implementation are introduced from the horizontal level. Finally, we divide the circuit scale and calling relationship into layers, showing the circuit architecture design at the vertical level.
4.1. Frame Selection
The generation process of a zero-knowledge proof is similar in principle to a compiler. It can be divided into two parts: the front-end and the back-end. The front-end is responsible for converting the problem that needs to be proved into a form that can be accepted by the zero-knowledge proof protocol, such as an arithmetic circuit. The back-end is responsible for generating zero-knowledge proofs for this specific intermediate form, and indicators such as generation efficiency and security depend on which zero-knowledge proof protocol the back-end adopts.
Zero-knowledge proof circuit generation is the process of zero-knowledge proof front-end generating an intermediate form of zero-knowledge proof. The general solution to this process is to use circuit generation tools such as Pinocchio, which is similar to the front-end of a compiler. It automatically converts each intermediate variable during statement execution into pin variables of an arithmetic circuit by analyzing the syntax of high-level language statements, and finally generates a complete circuit. However, this automated tool creates a lot of circuit redundancy and reduces the proof efficiency. By manually generating the SM3 circuit, we rationally design the circuit conversion structure, improve the reusability of the sub-circuit structure, and ultimately improve the conversion efficiency and ensure the correctness of the circuit.
A more suitable back-end framework can be selected by analyzing the efficiency of several zero-knowledge proof protocols based on arithmetic circuits. Table 1 shows that, in terms of proof generation time, zk-STARKs has the highest asymptotic complexity, while the other zero-knowledge proof schemes share a better one. Moreover, Groth16, Sonic, and Plonk achieve a constant proof size, whereas the proof sizes of Bulletproofs and zk-STARKs grow logarithmically and log-squared, respectively, with the circuit size.
Compared with range proofs, membership proofs, etc., the SM3 preimage proof involves many rounds of iteration and operates on data with a larger number of bits, resulting in an enormous circuit size. Therefore, zero-knowledge proof schemes such as Groth16 and Plonk, whose proving cost has relatively low asymptotic complexity in the circuit size, are considered first.
At the same time, it is also necessary to consider the development ecosystem of specific zero-knowledge proof circuits. The libsnark library, which implements Groth16, is currently the most widely used zero-knowledge proof library and has a rich built-in circuit library. Because it is developed in C++, it has a good ecosystem and is easy to combine with others’ open-source circuit code. The purpose of implementing the SM3-based hash preimage zero-knowledge proof is to enable more application scenarios for national cryptographic (SM) algorithms in zero-knowledge proofs, so follow-up application research and development based on this circuit is essential. Therefore, the ecosystem and ease of use of libsnark give it a great advantage.
4.2. Functional Module Design
From the horizontal level, the architecture of this paper can be divided into the following functional modules.
The variable assignment module assigns each variable of the SM3 algorithm to the zero-knowledge proof circuit carrier object of libsnark, where the variables can be divided into public and private variables. The public variable refers to the public input in the proof, such as the image in the SM3 known preimage proof, while the private variable refers to the secret input in the proof and the intermediate value generated in the calculation process. In addition, the operation process is divided into modulo operation and bit operation. The bit operation needs to split the variable into multiple bits for allocation. When a variable is used numerous times in the calculation, it should not be assigned multiple times, which will cause redundancy of variables and repeated constraints between the same variables.
The variable constraint module generates R1CS constraints from the circuit variables assigned by the previous module, i.e., the values represented by each circuit gate’s input and output pins. Multiple related R1CS constraints can be combined linearly, and the variables generate constraints of the form $(A \cdot X) \circ (B \cdot X) = C \cdot X$, where $\circ$ denotes the element-wise product. This is also the most critical step, in which the complete arithmetic circuit structure is generated. A, B and C are the matrices that fully represent the constraint coefficients of the circuit, and X represents the vector formed by the variables in the circuit. Only a correct X can make this equation hold.
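To make the matrix form of an R1CS instance concrete, the following minimal sketch checks the relation $(A \cdot X) \circ (B \cdot X) = C \cdot X$ row by row for a two-constraint toy circuit. The modulus `P`, the variable layout of `X`, and the function names are assumptions made only for this illustration; they are not taken from the paper or from libsnark.

```python
# Toy R1CS satisfiability check over a prime field (illustrative only).
P = 2**61 - 1  # hypothetical prime modulus; real back-ends use a pairing-friendly field


def mat_vec(M, x):
    """Multiply matrix M (a list of rows) by vector x modulo P."""
    return [sum(m * v for m, v in zip(row, x)) % P for row in M]


def r1cs_satisfied(A, B, C, x):
    """Return True when (A.x) * (B.x) == C.x holds for every constraint row."""
    ax, bx, cx = mat_vec(A, x), mat_vec(B, x), mat_vec(C, x)
    return all((a * b - c) % P == 0 for a, b, c in zip(ax, bx, cx))


# Assumed variable layout for this sketch: X = (1, a, b, out).
# Row 0 encodes a * b = out; row 1 encodes the bit constraint a * (1 - a) = 0.
A = [[0, 1, 0, 0], [0, 1, 0, 0]]
B = [[0, 0, 1, 0], [1, -1, 0, 0]]
C = [[0, 0, 0, 1], [0, 0, 0, 0]]

good = [1, 1, 5, 5]   # a = 1, b = 5, out = 5
bad = [1, 2, 5, 10]   # satisfies row 0, but a = 2 is not a bit
print(r1cs_satisfied(A, B, C, good), r1cs_satisfied(A, B, C, bad))
```

The second assignment shows why every row matters: it satisfies the multiplication constraint but is rejected by the bit constraint.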
The evidence generation module computes and assigns the X vectors in the constraints generated by the previous module. The assignment of X as a variable that makes the equation true is called the witness in zero-knowledge proofs. The module first assigns each input variable, and then disassembles each calculation step of the entire calculation circuit. It assigns values to intermediate variables step by step, and finally generates complete evidence that can satisfy the circuit constraints.
4.3. Circuit Hierarchy Design
The vertical architecture of the SM3 zero-knowledge proof circuit is divided into layers based on circuit scale and calling relationship. After the low-level circuit modules are encapsulated, they can be called directly by high-level modules, and the lower layer is transparent to the upper layer. The four-layer structure is divided from bottom to top into:
The auxiliary operation layer: This layer provides the minimum unit circuit required for this design and can realize basic arithmetic circuits such as bit operation and format conversion. All upper-layer circuits are constructed based on these auxiliary circuits;
The core operation layer: This layer provides the most core Boolean function and the arithmetic circuit of the permutation function in the SM3 compression function. The compression function of SM3 involves multiple rounds of iterative operations, and each round of iterative operations involves a large number of Boolean functions and permutation functions in this layer. Therefore, the efficiency of circuit conversion in this layer will significantly affect the final circuit size;
The iterative compression layer: This layer implements the complete circuit of the SM3 compression function to compress a single message block, which can be divided into the message expansion circuit and the round function circuit. This layer implements the basic computing unit of SM3;
Merkle–Damgård layer: This layer connects the circuits of the iterative compression layer in series, thereby realizing the calculation of the SM3 hash value for any length of message input and generating circuit constraints.
The vertical layer design of the circuit is shown in Figure 4, and each layer of the circuit is composed of several sub-circuit modules.
4.3.1. Auxiliary Operation Layer
This XOR3 module generates a two-bit or three-bit XOR circuit.
When a variable exists both as a single message word and as bit operation variables, this packing module is used to constrain the single message word variable against its 32 bit variables. It is applied in scenarios in which bit operations and modulo addition operations are alternately performed, such as the computation of intermediate variables in the compression function.
The function of the circuit constraint generated by the modulo operation is to ensure that the non-modulo variable and the modulo variable are consistent in the useful bits. Therefore, it is necessary to use the message word splitting circuit above to establish constraints between the non-modulo variable and its bit variables, and then use the message word merging circuit to establish constraints between the lower 32 bit variables and the modulo variable, thereby constraining the values before and after the modular reduction.
This circular shift module cyclically shifts the array of bit variables to the left without changing the content represented by a single bit variable, and consequently, it does not affect other bit operations.
4.3.2. Core Operation Layer
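The following Boolean function FF is used in the compression function of SM3.
$$FF_j(X, Y, Z) = \begin{cases} X \oplus Y \oplus Z, & 0 \le j \le 15 \\ (X \wedge Y) \vee (X \wedge Z) \vee (Y \wedge Z), & 16 \le j \le 63 \end{cases}$$
The following Boolean function GG is used in the compression function of SM3.
$$GG_j(X, Y, Z) = \begin{cases} X \oplus Y \oplus Z, & 0 \le j \le 15 \\ (X \wedge Y) \vee (\neg X \wedge Z), & 16 \le j \le 63 \end{cases}$$
Each permutation function XORs a 32-bit message word with two cyclically shifted copies of itself. SM3 uses the following two permutation functions.
$$P_0(X) = X \oplus (X \lll 9) \oplus (X \lll 17)$$
$$P_1(X) = X \oplus (X \lll 15) \oplus (X \lll 23)$$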
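As a word-level reference for the core operation layer, the sketch below writes the Boolean functions and permutation functions of the SM3 standard in plain Python, outside any circuit. The function names `ff`, `gg`, `p0`, `p1`, and `rotl` are chosen for this illustration only.

```python
MASK = 0xFFFFFFFF  # all values are 32-bit message words


def rotl(x, n):
    """Rotate the 32-bit word x left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK


def ff(j, x, y, z):
    """Boolean function FF_j: XOR for rounds 0..15, majority for rounds 16..63."""
    if j < 16:
        return x ^ y ^ z
    return (x & y) | (x & z) | (y & z)


def gg(j, x, y, z):
    """Boolean function GG_j: XOR for rounds 0..15, choice for rounds 16..63."""
    if j < 16:
        return x ^ y ^ z
    return (x & y) | (~x & z & MASK)


def p0(x):
    """Permutation function P0, used in the round function."""
    return x ^ rotl(x, 9) ^ rotl(x, 17)


def p1(x):
    """Permutation function P1, used in message expansion."""
    return x ^ rotl(x, 15) ^ rotl(x, 23)


print(bin(ff(16, 0b1100, 0b1010, 0b0110)), bin(gg(16, 0b1100, 0b1010, 0b0110)))
```

A circuit for these functions must reproduce exactly these word-level values, which makes such a reference convenient for functional testing.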
4.3.3. Iterative Compression Layer
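Message expansion refers to expanding the 16 words of a message block into 132 words ($W_0, \dots, W_{67}$ and $W'_0, \dots, W'_{63}$) through two loops.
$W_j$ is generated as follows.
$$W_j = P_1(W_{j-16} \oplus W_{j-9} \oplus (W_{j-3} \lll 15)) \oplus (W_{j-13} \lll 7) \oplus W_{j-6}, \quad 16 \le j \le 67$$
$W'_j$ is generated as follows.
$$W'_j = W_j \oplus W_{j+4}, \quad 0 \le j \le 63$$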
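The two expansion loops of the SM3 standard can be sketched as follows. The helper names and the hand-built padded block for the message `abc` are assumptions of this example; the code runs outside the circuit and only illustrates the values the message expansion circuit has to constrain.

```python
MASK = 0xFFFFFFFF


def rotl(x, n):
    """Rotate the 32-bit word x left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK


def p1(x):
    """Permutation function P1 of the SM3 standard."""
    return x ^ rotl(x, 15) ^ rotl(x, 23)


def expand(block):
    """Expand one 64-byte block into W (68 words) and W' (64 words)."""
    w = [int.from_bytes(block[4 * i:4 * i + 4], "big") for i in range(16)]
    for j in range(16, 68):
        w.append(p1(w[j - 16] ^ w[j - 9] ^ rotl(w[j - 3], 15))
                 ^ rotl(w[j - 13], 7) ^ w[j - 6])
    w_prime = [w[j] ^ w[j + 4] for j in range(64)]
    return w, w_prime


# Single padded block for the three-byte message b"abc" (padded by hand here).
block = b"abc" + b"\x80" + bytes(52) + (24).to_bytes(8, "big")
w, w_prime = expand(block)
print(len(w) + len(w_prime))  # 68 + 64 = 132 words in total
print(hex(w[16]))             # first expanded word
```

Each of the 52 new words in `w` and the 64 words in `w_prime` corresponds to one expansion step, 116 steps in total.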
The input of SM3 compression function is the output of the previous round of compression function and the message word after the block expansion of the current message. This module performs 64 rounds of iterative operations on these two data, uses A, B, C, D, E, F, G, and H registers to store intermediate variables, and finally obtains the 256-bit output of the current compression function. The complete calculation process is as follows, shown in Algorithm 1.
4.3.4. Merkle–Damgård Layer
This layer divides the message input into blocks, instantiates multiple iterative compression circuit modules according to the number of blocks, and connects them in series. As the private input of the proof, the hash preimage is passed into the message input part of each iterative compression circuit. Correspondingly, as the public input, the hash value is assigned to the output circuit pins of the last iterative compression circuit.
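Algorithm 1 SM3 round function algorithm
Input: initial values of registers A, B, C, D, E, F, G, H
Output: updated values of registers A, B, C, D, E, F, G, H
(all additions are modulo 2^32; <<< denotes a left cyclic shift)
function ROUNDFUNCTION(A, B, C, D, E, F, G, H)
    j ← 0
    while j < 64 do
        SS1 ← ((A <<< 12) + E + (T_j <<< (j mod 32))) <<< 7
        SS2 ← SS1 ⊕ (A <<< 12)
        TT1 ← FF_j(A, B, C) + D + SS2 + W'_j
        TT2 ← GG_j(E, F, G) + H + SS1 + W_j
        D ← C
        C ← B <<< 9
        B ← A
        A ← TT1
        H ← G
        G ← F <<< 19
        F ← E
        E ← P_0(TT2)
        j ← j + 1
    end while
    return A, B, C, D, E, F, G, H
end function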
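The round function of the SM3 standard can be sketched in Python as below. This is a plain software reference, not the circuit itself; the names `round_step` and `round_function` are hypothetical, and the constants `T_j` and the initial value `IV` are those of the SM3 standard. The example applies a single round to the initial registers with the first message word of the padded message `abc`.

```python
MASK = 0xFFFFFFFF
IV = (0x7380166F, 0x4914B2B9, 0x172442D7, 0xDA8A0600,
      0xA96F30BC, 0x163138AA, 0xE38DEE4D, 0xB0FB0E4E)


def rotl(x, n):
    """Rotate the 32-bit word x left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK


def p0(x):
    """Permutation function P0 of the SM3 standard."""
    return x ^ rotl(x, 9) ^ rotl(x, 17)


def round_step(j, regs, w_j, w1_j):
    """Apply round j to registers (A..H) with words W_j and W'_j."""
    a, b, c, d, e, f, g, h = regs
    t = 0x79CC4519 if j < 16 else 0x7A879D8A          # round constant T_j
    ss1 = rotl((rotl(a, 12) + e + rotl(t, j)) & MASK, 7)
    ss2 = ss1 ^ rotl(a, 12)
    if j < 16:
        ff, gg = a ^ b ^ c, e ^ f ^ g
    else:
        ff = (a & b) | (a & c) | (b & c)
        gg = (e & f) | (~e & g & MASK)
    tt1 = (ff + d + ss2 + w1_j) & MASK                # addition modulo 2^32
    tt2 = (gg + h + ss1 + w_j) & MASK
    return (tt1, a, rotl(b, 9), c, p0(tt2), e, rotl(f, 19), g)


def round_function(regs, w, w_prime):
    """Run all 64 rounds; w has 68 words and w_prime has 64 words."""
    for j in range(64):
        regs = round_step(j, regs, w[j], w_prime[j])
    return regs


# Round 0 for the message "abc": W_0 = W'_0 = 0x61626380.
after_round_0 = round_step(0, IV, 0x61626380, 0x61626380)
print([hex(r) for r in after_round_0])
```

The three additions modulo 2^32 in `ss1`, `tt1`, and `tt2` are the places where a circuit needs the modulo sub-circuit; every other step is a bit operation or a re-labeling of registers.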
4.4. Security Analysis
For the zero-knowledge proof protocol, its security consists of three parts: completeness, soundness, and zero knowledge. Here, we combine its completeness and soundness as correctness and analyze it separately from zero-knowledge.
Verifying zero-knowledge proof requires a probabilistic algorithm, which essentially challenges all circuit gate constraints of the zero-knowledge proof circuit. Only if all R1CS constraints for the computational transformation are correct can it be guaranteed that all intermediate variables will satisfy the constraints when the input evidence is valid.
The circuit hierarchy design in this paper performs the conversion of all calculation steps according to the SM3 standard algorithm. These conversions can be divided into Boolean operation conversion, algebraic operation conversion, and mixed operation conversion. Therefore, the design ensures the correctness of constraint generation from two dimensions of the calculation process and the calculation form.
Zero-knowledge should guarantee that the circuit-generated proofs contain information independent of the circuit’s secret inputs. This property depends on the zero-knowledge proof back-end protocol. Since the Groth16 based on the libsnark framework selected in this design is statistical zero-knowledge, this property can be satisfied.
As described, this circuit is secure in theoretical design, and its correctness can be further verified in the functional test in Section 4.
5. Implementation and Optimization of SM3 Preimage Zero-Knowledge Proof Circuit
5.1. Circuit Conversion
This section converts the specific operations of each step of SM3 into addition gates and multiplication gates and generates the R1CS constraint relationship.
5.1.1. Auxiliary Operation Circuit
The compression function of SM3 includes not only algebraic operations, such as addition modulo $2^{32}$, but also various Boolean operations, such as the $FF_j$ and $GG_j$ Boolean functions. Algebraic operations use a 32-bit message word as the basic operation unit, while Boolean operations operate on a per-bit basis. Therefore, for mixed operations involving algebraic and Boolean operations, the 32-bit message word needs to be repeatedly converted between its single-word form and its 32 individual bits. For example, for message word $X$, its binary representation is $(x_{31}, x_{30}, \dots, x_0)$. The relationship between its algebraic form and its bits, expressed with addition and multiplication, is:
$$X = \sum_{i=0}^{31} x_i \cdot 2^i$$
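Furthermore, writing the R1CS constraint as $A \cdot B = C$ and letting $x_i$ denote the bits of the message word $X$, the three variable terms can be taken as:
$$A = \sum_{i=0}^{31} 2^i x_i, \quad B = 1, \quad C = X$$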
In addition to the constraint between the single bits and the complete message word, there is an implicit constraint. Since circuit variables carry no data type, additional constraints need to be placed on each variable representing a single bit to ensure that it really is a bit; namely, any bit $x$ obtained by expanding the message word should satisfy:
$$x \cdot (1 - x) = 0$$
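The need for both the packing constraint and the per-bit constraint can be illustrated with a small check. In this sketch the modulus `P` and all function names are assumptions; the point is that a dishonest "bit" vector can satisfy the weighted-sum relation while violating the bit constraint.

```python
P = 2**61 - 1  # hypothetical prime field modulus, chosen only for this sketch


def split_word(x):
    """Return the 32 bits of x, least significant bit first."""
    return [(x >> i) & 1 for i in range(32)]


def packing_holds(word, bits):
    """Check word == sum(bits[i] * 2^i) in the field."""
    return (sum(b << i for i, b in enumerate(bits)) - word) % P == 0


def bits_are_boolean(bits):
    """Check b * (1 - b) == 0 for every claimed bit."""
    return all(b * (1 - b) % P == 0 for b in bits)


word = 2
honest = split_word(word)      # [0, 1, 0, ..., 0]
cheat = [2] + [0] * 31         # 2 * 2^0 also sums to 2, but 2 is not a bit

print(packing_holds(word, honest), bits_are_boolean(honest))
print(packing_holds(word, cheat), bits_are_boolean(cheat))
```

Without the bit constraint, the second witness would pass the packing check and later bit operations on it would be meaningless.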
The three-input XOR circuit is implemented based on the combination of three single-bit numbers. The single-bit Boolean operation $d = a \oplus b \oplus c$ can be divided into two steps, where $d$ is the calculation result and $t$ is the calculated intermediate value:
$$t = a \oplus b, \quad d = t \oplus c$$
Convert XOR operation to addition and multiplication:
$$t = a + b - 2ab, \quad d = t + c - 2tc$$
Further, for the XOR operation of three 32-bit message words, it is only necessary to use the message word-splitting circuit to split and constrain each message word and generate the circuit for the message bits at the corresponding position. Finally, the calculated message bits are combined into a single message word output through the message word combining circuit.
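A minimal sketch of the two-step XOR conversion is given below. The variable names `a`, `b`, `c`, `t`, and `d` are chosen for this illustration, and the arithmetic identity `u XOR v = u + v - 2uv` for bits is used; each step corresponds to one rank-1 constraint of the form `(2u) * v = u + v - out`.

```python
from itertools import product


def xor3_arith(a, b, c):
    """Three-input XOR on bits using only addition and multiplication."""
    t = a + b - 2 * a * b   # t = a XOR b, i.e. (2a) * b = a + b - t
    d = t + c - 2 * t * c   # d = t XOR c, i.e. (2t) * c = t + c - d
    return t, d


def xor3_word(x, y, z):
    """Split three 32-bit words, apply the per-bit circuit, and merge the result."""
    out = 0
    for i in range(32):
        _, d = xor3_arith((x >> i) & 1, (y >> i) & 1, (z >> i) & 1)
        out |= d << i
    return out


# Exhaustive check of the single-bit conversion against the Boolean operator.
print(all(xor3_arith(a, b, c)[1] == a ^ b ^ c
          for a, b, c in product((0, 1), repeat=3)))
```

Setting the third input to zero reduces the same circuit to a two-input XOR, since `t + 0 - 0 = t`.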
When calculating the intermediate variables $SS1$, $TT1$, and $TT2$, the compression function of SM3 performs addition modulo $2^{32}$, which requires a constraint relationship between the effective bits before and after the reduction. For the modulo operation in the modulo circuit, the message word splitting operation is used to convert the unreduced value into message bits, and then the valid bits are constrained to be equal.
For instance, given the unreduced message word $X$ with bits $x_0, \dots, x_{n-1}$ ($n > 32$) and the reduced message word $Y$:
$$Y = X \bmod 2^{32}$$
Convert modulo operation to addition and multiplication:
$$X = \sum_{i=0}^{n-1} x_i 2^i, \quad Y = \sum_{i=0}^{31} x_i 2^i$$
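The split-then-merge treatment of modular addition can be sketched as follows. The function names are hypothetical, and the number of extra carry bits is derived from the number of operands as an assumption of this example; the unreduced sum is split into bits, and only the low 32 bits are merged into the reduced word.

```python
MASK32 = 0xFFFFFFFF


def mod_add_witness(operands):
    """Compute the unreduced sum, its bit decomposition, and the reduced word."""
    s = sum(operands)
    n_bits = 32 + (len(operands) - 1).bit_length()  # room for the carry bits
    bits = [(s >> i) & 1 for i in range(n_bits)]
    z = s & MASK32
    return s, bits, z


def mod_add_constraints_hold(s, bits, z):
    """Check the bit, splitting, and merging constraints of the modulo circuit."""
    booleans = all(b * (1 - b) == 0 for b in bits)
    split_ok = s == sum(b << i for i, b in enumerate(bits))        # all bits
    merge_ok = z == sum(b << i for i, b in enumerate(bits[:32]))   # low 32 bits
    return booleans and split_ok and merge_ok


operands = [0xFFFFFFFF] * 4   # a four-term sum, as in the TT1 and TT2 steps
s, bits, z = mod_add_witness(operands)
print(len(bits), hex(z), mod_add_constraints_hold(s, bits, z))
```

The carry bits above position 31 are constrained to be bits but are otherwise discarded, which is exactly the effect of reducing modulo 2^32.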
The cyclic shift operation used in the SM3 calculation process cyclically shifts the message word to the left; namely, the original message word $X$ is cyclically shifted to the left by $i$ bits to obtain the new message word $Y$:
$$Y = X \lll i$$
Convert shift operation to addition and multiplication:
$$y_{(k+i) \bmod 32} = x_k, \quad 0 \le k \le 31, \qquad Y = \sum_{k=0}^{31} y_k 2^k$$
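On bit variables, a left cyclic shift is only a re-indexing of existing variables, so it can be sketched as a permutation of a list. The helper names below are hypothetical, and bits are stored least significant first as an assumption of this example.

```python
def split_word(x):
    """Return the 32 bits of x, least significant bit first."""
    return [(x >> k) & 1 for k in range(32)]


def merge_bits(bits):
    """Recombine bits (least significant first) into a word."""
    return sum(b << k for k, b in enumerate(bits))


def rotl_bits(bits, i):
    """Left cyclic shift by i: output bit (k + i) mod 32 is input bit k."""
    return [bits[(k - i) % 32] for k in range(32)]


x = 0x7380166F
y = merge_bits(rotl_bits(split_word(x), 12))
print(hex(y))
```

Because the shifted word reuses the same bit variables in a different order, no new multiplication is needed for the shift itself; only the final merge into a word adds a linear relation.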
5.1.2. Core Operation Circuit
The Boolean function $FF_j$ is a piecewise function over the 64 rounds of the SM3 round function. Its $X \oplus Y \oplus Z$ part uses the three-input XOR circuit in the auxiliary operation circuit described above for conversion; the operation $(X \wedge Y) \vee (X \wedge Z) \vee (Y \wedge Z)$ is converted into the bit form $(x \wedge y) \vee (x \wedge z) \vee (y \wedge z)$ by the message word splitting and merging module in the auxiliary operation circuit. For the single-bit form, the converted arithmetic form is:
$$xy + xz + yz - 2xyz$$
Finally, the individual message bits are combined into a complete message word using the message word merging module in the auxiliary operation circuit.
Similarly, the Boolean function $GG_j$ is a piecewise function over the 64 rounds of the SM3 round function. Its $X \oplus Y \oplus Z$ part uses the three-input XOR circuit in the auxiliary operation circuit described above for conversion. The operation $(X \wedge Y) \vee (\neg X \wedge Z)$ is converted into the bit form $(x \wedge y) \vee (\neg x \wedge z)$ by the message word splitting and merging module in the auxiliary operation circuit. For the single-bit form, the converted arithmetic form is:
$$xy + (1 - x)z$$
Likewise, the individual message bits are combined into a complete message word using the message word merging module in the auxiliary operation circuit.
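The single-bit arithmetic forms of the majority and choice branches can be checked exhaustively, as in the sketch below. The split into rank-1 constraints shown in `maj_residuals` and `ch_residual` is one possible arrangement assumed for this example, with a hypothetical intermediate variable `t = x * y`; it is not necessarily the arrangement used in the paper.

```python
from itertools import product


def maj_arith(x, y, z):
    """Arithmetic form of (x AND y) OR (x AND z) OR (y AND z) on bits."""
    return x * y + x * z + y * z - 2 * x * y * z


def ch_arith(x, y, z):
    """Arithmetic form of (x AND y) OR (NOT x AND z) on bits."""
    return x * y + (1 - x) * z


def maj_residuals(x, y, z, t, out):
    """Residuals of x * y = t and z * (x + y - 2t) = out - t (zero when satisfied)."""
    return (x * y - t, z * (x + y - 2 * t) - (out - t))


def ch_residual(x, y, z, out):
    """Residual of x * (y - z) = out - z (zero when satisfied)."""
    return x * (y - z) - (out - z)


ok = True
for x, y, z in product((0, 1), repeat=3):
    maj = (x & y) | (x & z) | (y & z)
    ch = (x & y) | ((1 - x) & z)
    ok &= maj_arith(x, y, z) == maj and ch_arith(x, y, z) == ch
    ok &= maj_residuals(x, y, z, x * y, maj) == (0, 0)
    ok &= ch_residual(x, y, z, ch) == 0
print(ok)
```

Under this arrangement the majority branch costs two multiplication constraints per bit and the choice branch one, which is why the conversion efficiency of this layer has a large effect on the total circuit size.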
The permutation function equations used by SM3 involve two operations of cyclic shift and three-digit XOR. Thus, the corresponding permutation function circuit and R1CS constraint can be generated only by combining the circular shift circuit and the three-digit XOR circuit in the auxiliary operation circuit.
5.1.3. Iterative Compression Circuit
The message expansion calculation includes two iterative forms that expand the message block to 132 words. The calculation of $W_j$ is built from the inside out by nesting the cyclic shift circuit, the three-input XOR circuit, and the permutation function circuit of the lower layers. For the calculation of $W'_j$, a three-input XOR circuit can be used, inputting the two message words and setting the third input to the constant zero.
Since each round function performs multiple calculations, each calculation can be converted into addition and multiplication circuits by multiplexing the underlying circuits. The multiplexing relationship between each calculation expression of the SM3 round function and the sub-circuit modules is shown in Table 2.
5.1.4. Merkle–Damgård Circuit
This circuit is at the top of the layered circuit architecture of this design, and is also the input and output circuit for the SM3 hash preimage zero-knowledge proof. The input variable of this circuit is the SM3 preimage $x$, the output variable is the hash value $y$, and the constraint relationship to be proved is $y = SM3(x)$. Next, this constraint relationship is built by using the lower-level circuit modules to construct a multi-message-block computing circuit.
When SM3 preprocesses the message, it will pad the message to make the message length an integer multiple of 512. This step will generate a large number of intermediate variables and constraints between variables, resulting in redundant circuit size, especially when the length to be padded is long.
Since the 64-bit string at the end of the padded message is the binary representation of the length of the original message, the original message can be directly restored from the padded message. Because this padding process is not part of the one-way function, implementing message padding outside the circuit, rather than including it in the zero-knowledge proof circuit, does not compromise security.
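The padding step and its invertibility can be sketched outside the circuit as follows. This example assumes byte-aligned messages, and the function names `sm3_pad` and `sm3_unpad` are hypothetical; the padding rule itself (a single 1 bit, zero bits, then the 64-bit message length) is that of the SM3 standard.

```python
def sm3_pad(msg: bytes) -> bytes:
    """Pad a byte-aligned message to a multiple of 512 bits (64 bytes)."""
    bit_len = 8 * len(msg)
    padded = msg + b"\x80"                          # a single 1 bit, then zeros
    padded += b"\x00" * ((56 - len(padded)) % 64)   # leave 8 bytes for the length
    return padded + bit_len.to_bytes(8, "big")


def sm3_unpad(padded: bytes) -> bytes:
    """Recover the original message from the trailing 64-bit length field."""
    bit_len = int.from_bytes(padded[-8:], "big")
    return padded[: bit_len // 8]


padded = sm3_pad(b"abc")
print(len(padded), sm3_unpad(padded))
```

Since `sm3_unpad` recovers the message from public structure alone, a verifier loses nothing when the prover pads the message before it enters the circuit.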
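After the message padding preprocessing is completed, the message is divided into blocks $B^{(0)}, B^{(1)}, \dots, B^{(n-1)}$ of 512 bits each. For the SM3 calculation using the Merkle–Damgård structure, the corresponding constraint for each message block is $V^{(i+1)} = CF(V^{(i)}, B^{(i)})$. Based on this basic calculation unit, the constraint relation $y = SM3(x)$ can be transformed into the constraints of the image and preimage of $n$ compression functions $CF$:
$$V^{(1)} = CF(V^{(0)}, B^{(0)}), \quad V^{(2)} = CF(V^{(1)}, B^{(1)}), \quad \dots, \quad y = V^{(n)} = CF(V^{(n-1)}, B^{(n-1)})$$
where $CF$ is the compression function and $V^{(0)}$ is the initial value of registers $A, B, C, D, E, F, G, H$.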
Finally, for the constraint of a single compression function, it contains 116 rounds of message expansion and 64 rounds of round functions, which can be realized by splicing the message expansion circuit and the round function circuit in the iterative operation circuit layer.
5.2. Circuit Implementation Paradigm
Regarding the realization of specific circuits, there are two types of implementation: general-purpose circuit generator implementations and dedicated implementations. The former only needs to pass a C program with formatted input and output description arguments to a circuit generator, such as Pinocchio, which automatically generates a circuit in R1CS form. In contrast, after designing the circuit structure manually, we realized the sub-circuit modules one by one from the bottom up.
The code implementation of this paper is based on the libsnark framework. The libsnark code library provides various circuit base classes, and each base class module implements the member methods of circuit instantiation, arithmetic constraint binding and internal state calculation. Therefore, all circuit modules in this paper will inherit these base classes for implementation.
Each circuit type inherits from the circuit module base class. Its member variables include sub-circuit objects and intermediate variable objects. Sub-circuit objects refer to all sub-circuits that can be reused by the current circuit type. Intermediate variable objects refer to all intermediate variables not included in the sub-circuit in the current circuit calculation process. Its member methods are circuit constructor, constraint generation function and state calculation function; corresponding to the circuit generation paradigms are circuit instantiation, arithmetic constraint binding, and internal state calculation.
5.2.1. Circuit Instantiation
Circuit instantiation refers to generating circuit instances for each intermediate variable of a computational process. For each zk-SNARKs proof, libsnark will initialize a blank circuit board object. The intermediate variables corresponding to each circuit line in the circuit board belong to the specified finite field, and each intermediate variable has a unique label on the circuit board.
The circuit instantiation step is carried out in the constructor of the circuit object, which takes the input and output variables of the current circuit as function parameters, constructs the variables inside the circuit, and creates a unique label for each of these internal variables on the entire proof circuit board. Starting from the input variables of the circuit, after each round of calculation, the newly generated intermediate value needs to be allocated a unique variable on the circuit board. If the variable requires the message word splitting operation, the corresponding bit variables need to be allocated additionally.
5.2.2. Arithmetic Constraint Binding
After the circuit is instantiated, all intermediate variables of the current zero-knowledge proof calculation are bound to the corresponding wires in the circuit with unique labels, and these wires are independent of each other. The arithmetic constraint binding step constrains these wires to each other through multiplication gates and addition gates. This step only needs to input the allocated circuit variables, arranged according to the R1CS constraint form given in Section 3, into the add-constraint interface provided by libsnark, which binds these variables to the specified constraints.
The steps for adding a constraint in R1CS form are:
(1) According to the previous conversion results, extract the three parts (A, B, and C) of the R1CS constraint form, such as Equations (13) to (15);
(2) A, B, and C contain the addition and multiplication of one or more variables, called linear combinations of circuit variables. In the implementation process, these variables are formed into corresponding linear combination objects through the interface provided by libsnark;
(3) Input the linear combination objects A, B, and C as the parameters of the add-R1CS-constraint interface in libsnark to generate the corresponding constraint object.
5.2.3. Internal State Calculation
Only if the prover has the correct input message can all intermediate variables in the SM3 execution process be calculated correctly. Thus, all the R1CS constraints generated by the arithmetic constraint binding process are satisfied, and a correct proof is provided.
Any sub-circuit only exposes the circuit input and output but hides the internal variables. The internal state calculation process is to assign the input variable of the circuit and then calculate the value of the internal variable step by step according to the calculation process of the circuit. Then, we assign it to the internal variable set in the previous step until the entire calculation process is completed.
There are two points to note. First, the internal state calculation method only depends on the calculation itself. If the calculation itself is a mixed operation, it can be directly implemented by the corresponding operator without considering the conversion to addition and multiplication. Second, attention should be paid to the form in which the internal variables exist. If the input is an integer on a finite field, and the circuit variables exist in the form of bits, the corresponding conversions should be performed.
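The three-step paradigm (circuit instantiation, arithmetic constraint binding, internal state calculation) can be mimicked with a small stand-in written in Python. Everything in this sketch is hypothetical: the `Board` and `Xor2Gadget` classes, their method names, and the modulus `P` are invented for illustration and only loosely mirror the gadget pattern of libsnark, whose actual C++ interfaces are not reproduced here.

```python
P = 2**61 - 1  # hypothetical prime modulus for this toy board


class Board:
    """Toy circuit board: variable 0 is the constant ONE."""

    def __init__(self):
        self.values = [1]
        self.constraints = []

    def allocate(self):
        """Allocate a fresh variable and return its unique label (index)."""
        self.values.append(0)
        return len(self.values) - 1

    def add_constraint(self, a, b, c):
        """Add A * B = C, each given as a linear combination {label: coefficient}."""
        self.constraints.append((a, b, c))

    def evaluate(self, lc):
        return sum(coef * self.values[i] for i, coef in lc.items()) % P

    def is_satisfied(self):
        return all((self.evaluate(a) * self.evaluate(b) - self.evaluate(c)) % P == 0
                   for a, b, c in self.constraints)


class Xor2Gadget:
    """Two-input XOR sub-circuit exposing only its inputs and its output."""

    def __init__(self, board, a, b):       # step 1: circuit instantiation
        self.board, self.a, self.b = board, a, b
        self.out = board.allocate()

    def generate_constraints(self):        # step 2: arithmetic constraint binding
        # (2a) * b = a + b - out
        self.board.add_constraint({self.a: 2}, {self.b: 1},
                                  {self.a: 1, self.b: 1, self.out: -1})

    def generate_witness(self):            # step 3: internal state calculation
        v = self.board.values
        v[self.out] = v[self.a] ^ v[self.b]   # computed with the native operator


board = Board()
a, b = board.allocate(), board.allocate()
gadget = Xor2Gadget(board, a, b)
gadget.generate_constraints()
board.values[a], board.values[b] = 1, 1
gadget.generate_witness()
print(board.values[gadget.out], board.is_satisfied())
```

Note that the witness step uses the native XOR operator directly, while the constraint step uses the addition and multiplication form; the two only need to agree on the resulting values.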
5.3. Circuit Implementation Optimization
Under the premise that the circuit conversion process is correct, it is necessary to optimize the circuit implementation further, reduce circuit redundancy, and improve the proof efficiency. Hence, three main optimizations have been made in the specific implementation process.
5.3.1. Avoid Complex Control Flow
The SM3 algorithm is an arithmetic-based algorithm that does not involve complex control flow or memory access. If the Pinocchio circuit generator is used to analyze the C program and generate the circuit automatically, it produces a large number of redundant variables for some control flow statements. Constraints are then generated between these redundant variables, which makes the circuit unnecessarily large. Consequently, the implementation in this paper deliberately avoids the control flow and memory access features of high-level programming languages, and the code consists mainly of assignments, loops, and conventional operations.
5.3.2. Reuse Temporary Variables
During program execution, the same intermediate value may be assigned to different temporary variables in different life cycles. A general zero-knowledge proof circuit generation tool allocates a separate circuit variable for each of them. This paper instead binds the temporary variables that represent the same intermediate value in different life cycles to the libsnark protoboard as a single, globally unique circuit variable, which realizes the reuse of temporary variables.
5.3.3. Preprocess Message Padding
In this paper, the message padding step is carried out outside the zero-knowledge proof circuit, which avoids the large amount of circuit redundancy caused by variable-length variable allocation and constraint operations for message padding. In the implementation, a program independent of the zero-knowledge proof circuit pads the input message according to the SM3 padding rule to produce message blocks whose total length is a multiple of 512 bits. These blocks are used as the fixed-length input of the SM3 zero-knowledge proof circuit.
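A minimal sketch of such an out-of-circuit padding program is shown below in Python. It follows the SM3 padding rule (a single 1 bit, then zero bits, then the 64-bit big-endian message length) and assumes byte-aligned input; the function names `sm3_pad` and `split_blocks` are hypothetical and are not taken from the paper's code.

```python
def sm3_pad(message: bytes) -> bytes:
    """Pad a byte-aligned message according to the SM3 padding rule.

    Appends the bit 1 (as byte 0x80), then zero bytes until the length
    is 56 mod 64 bytes, then the original bit length as 8 big-endian bytes.
    """
    bit_len = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += bit_len.to_bytes(8, "big")
    return padded


def split_blocks(padded: bytes):
    """Split padded data into 64-byte (512-bit) blocks for the circuit."""
    if len(padded) % 64 != 0:
        raise ValueError("padded length must be a multiple of 64 bytes")
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]
```

For example, `split_blocks(sm3_pad(b"abc"))` yields a single 512-bit block, whereas a 56-byte message needs two blocks because the marker bit and the length field no longer fit into the first one.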
6. Test and Analysis
6.1. Test Environment
This paper tests the implemented SM3 zero-knowledge proof circuit from two perspectives: functionality and performance. The physical machine test environment uses the Ubuntu operating system and the libsnark zero-knowledge proof framework. Moreover, we use Pinocchio as the circuit generator for comparison.
The specific physical machine test environment is shown in
Table 3.
6.2. Functional Test
The functional test checks whether the SM3 zero-knowledge proof circuit is implemented correctly, and it is the most basic test. Since the correctness of a zero-knowledge proof is an essential part of its security, this functional test also verifies the security of the SM3 zero-knowledge proof circuit implementation from the perspective of correctness.
Typically, y is the public input of the zero-knowledge proof system, and x is the secret input. We randomly select the input message x and calculate the SM3 value y = SM3(x). The prover holds the message x and, through the interface implemented in this paper, generates a zero-knowledge proof of the statement y = SM3(x). The verifier passes the zero-knowledge proof obtained from the prover and the public input y to be verified through the interface. The verification program is then run to obtain the verification result.
The correctness of the SM3 zero-knowledge proof circuit is verified by changing the secret input in the test.
Table 4 selects six sets of test cases from 1000 tests for display.
Figure 5 and
Figure 6 show the test cases of test number 1 and test number 2 in
Table 4, respectively, and the two are used as a set of test comparisons. All 1000 sets of tests produce the correct result.
6.3. Performance Test
This paper provides performance tests in both the longitudinal and horizontal dimensions. The longitudinal test compares the circuit size of our efficient SM3 zero-knowledge proof circuit with that produced by the general zero-knowledge proof circuit generation tool Pinocchio, which reflects the performance difference between the designed circuit structure and the automatically generated one. The horizontal test compares the circuit size of this implementation with that of the most efficient implementation of the SHA256 zero-knowledge proof circuit. Because that SHA256 circuit is also generated manually, the horizontal test reflects the circuit design efficiency relative to the calculation process of the hash function itself.
The performance test results are shown in
Figure 7. Four sets of data are shown: the circuit sizes of SM3 and SHA256, each obtained by optimal manual generation and by automatic generation with Pinocchio.
It can be seen from the longitudinal test results that the circuit implemented in this paper is about half the size of the SM3 circuit generated by the general-purpose zero-knowledge proof circuit generation tool Pinocchio. In principle, Pinocchio generates many repeated intermediate variables during the conversion process, and constraining these repeated intermediate variables causes considerable redundancy [
36]. The manual circuit generation process reduces repetitive variables and improves circuit generation efficiency.
According to current results from industry, the optimal practice of the SHA256 zero-knowledge proof circuit generates 27,904 circuit gates, while the number of circuit gates generated by the automatic circuit generation tool reaches 58,160 [
37]. Both the SM3 implementation in this paper and the SHA256 implementations in related work rely on manual circuit generation to reduce intermediate variable redundancy and circuit size. Taking SHA256 as the comparison object, the horizontal test results show that the best-practice SHA256 circuit is about half the size of the automatically generated one, which is comparable to the optimization achieved in this paper.
SM3 and SHA256 are similar in their iterative structure and in some logical functions. The SM3 circuit is larger than the SHA256 circuit for two reasons. On the one hand, the message expansion of SM3 generates 132 message words, more than that of SHA256, which increases the complexity; on the other hand, the iterative compression step of SM3 involves more mixed operations, requiring frequent message word splitting and merging operations.
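The count of 132 message words can be reproduced with a short Python sketch of the SM3 message expansion, which produces 68 words W and 64 words W′ per 512-bit block. The sketch follows the published SM3 expansion rule; the function names and the hard-coded padded block for the message "abc" are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def p1(x):
    """SM3 permutation P1(X) = X xor (X <<< 15) xor (X <<< 23)."""
    return x ^ rotl32(x, 15) ^ rotl32(x, 23)


def expand(block: bytes):
    """Expand one 64-byte block into the word lists W (68) and W' (64)."""
    if len(block) != 64:
        raise ValueError("block must be exactly 64 bytes")
    w = [int.from_bytes(block[4 * i:4 * i + 4], "big") for i in range(16)]
    for j in range(16, 68):
        w.append(
            p1(w[j - 16] ^ w[j - 9] ^ rotl32(w[j - 3], 15))
            ^ rotl32(w[j - 13], 7)
            ^ w[j - 6]
        )
    w_prime = [w[j] ^ w[j + 4] for j in range(64)]
    return w, w_prime


# Padded single block for the 3-byte message "abc" (bit length 24).
ABC_BLOCK = b"abc" + b"\x80" + b"\x00" * 52 + (24).to_bytes(8, "big")
```

Calling `expand(ABC_BLOCK)` returns 68 + 64 = 132 words, each of which needs its own circuit variables and constraints.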
Moreover, Poseidon is a hash function specially designed for zero-knowledge proof calculations. As can be seen from
Table 5, although the size of the SM3 circuit is reduced to roughly 30,000 gates in this paper, the final circuit generated by Poseidon has only a few hundred gates due to the use of a simple substitution box (S-box). However, a SNARK-friendly hash function such as Poseidon cannot reach the strength of SM3 and SHA256 in terms of security; as a result, this SM3 zero-knowledge proof circuit implementation is irreplaceable for scenarios with high security requirements.
Correspondingly, we also test the generation and verification time of the zero-knowledge proof circuit. This paper mainly focuses on designing and implementing the zero-knowledge proof circuit based on SM3 and completes the zero-knowledge proof process through a fixed back-end library. The current industrial zero-knowledge proof generation time is in the order of seconds, while the verification time is in the order of milliseconds. According to
Figure 5 and
Figure 6 above, when the hash input is short, the zero-knowledge proof verification time of the SM3 circuit designed in this paper is about 14 ms, which is in line with the industry average. Since this paper focuses on testing the SM3 circuit, the input is padded in advance, and steps such as the CRS trusted setup are not included in the measurement. Under these conditions, the proof generation time for the SM3 circuit is 103 ms, which indicates high efficiency.
Further, after multiple tests, we show that the circuit size and zero-knowledge proof time increase linearly with the input message’s length in
Figure 8. The SM3 algorithm requires that the input message be padded to a multiple of 512 bits. Only one message block is needed when the hash input is a short message that fits into a single 512-bit block after padding. In this case, the SM3 circuit size is 32,836. As the input message's length grows, the message needs to be padded into multiple message blocks. In the implementation, multiple SM3 circuits are connected in series, which multiplies the total circuit size, generation time, and verification time accordingly.
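The linear growth can be expressed as a simple size model, sketched below in Python. It takes the single-block size of 32,836 from the measurement above and assumes that each additional block adds exactly one more SM3 circuit; the function names are hypothetical, and the block count follows the SM3 padding rule (one marker bit plus a 64-bit length field).

```python
SINGLE_BLOCK_SIZE = 32_836  # circuit size reported above for one 512-bit block


def num_blocks(message_bits: int) -> int:
    """Number of 512-bit blocks after SM3 padding.

    Padding appends 1 marker bit and a 64-bit length field, then the
    total is rounded up to the next multiple of 512 bits.
    """
    if message_bits < 0:
        raise ValueError("message length must be non-negative")
    return (message_bits + 1 + 64 + 511) // 512


def circuit_size(message_bits: int) -> int:
    """Estimated total size when one SM3 circuit is chained per block.

    Assumes exactly linear scaling, as described in the text.
    """
    return SINGLE_BLOCK_SIZE * num_blocks(message_bits)
```

Under this model, a 1000-bit message occupies three blocks, so the estimated circuit size is three times that of the single-block circuit.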
To summarize, this SM3 zero-knowledge proof circuit is about half the size of the circuit produced by the automatic circuit generation tool. This reduction by half matches the improvement achieved by the best-practice SHA256 zero-knowledge proof circuit, which indirectly confirms the performance of the scheme. When the proof time complexity of the zero-knowledge proof back-end protocol is superlinear in the circuit size, the circuit optimization reduces the zero-knowledge proof time by more than half.
7. Application Design and Implementation of Zero-Knowledge Proof Based on SM3
Cryptography is one of the cornerstones of blockchain, and cryptographic concepts are widely used in blockchain. This section introduces two zero-knowledge proof protocols based on SM3: one combined with elliptic curve discrete logarithms and one built on hash chains. These efficient protocols can improve the overall operating efficiency and security of blockchain systems. Both protocols are programmed with the libsnark library on top of the SM3 zero-knowledge proof circuit realized above, and follow the circuit generation paradigm described earlier.
7.1. Proof of Equivalence between SM3 and Elliptic Curve Discrete Logarithm
Since elliptic curve discrete logarithm cryptosystems provide stronger security at the same key length, many cryptographic algorithms based on the elliptic curve discrete logarithm problem are widely used in practice. Examples include the elliptic curve signature algorithms ECDSA and EdDSA, the ElGamal encryption algorithm, hybrid encryption schemes based on elliptic curves, and Pedersen commitments in elliptic curve form. Elliptic curve signature algorithms play an important role in blockchain, and ECDSA appears in many well-known blockchain projects, such as Bitcoin and Ethereum.
These algorithms all perform their calculations on elliptic curves (such as secp256k1, curve25519, or p521); the private key is an integer, and the public key is the corresponding point on the elliptic curve. Combining elliptic curve discrete logarithms with SM3 makes it possible to construct more practical privacy-preserving protocols.
7.1.1. Protocol Design
As mentioned above, the protocol is programmed with the libsnark library, and its steps are as follows.
- (1)
The prover selects an integer from as the secret number x to be proved;
- (2)
The prover multiplies the secret number x by the base point of the elliptic curve to calculate the point on the elliptic curve;
- (3)
The prover calculates the SM3 hash value of the secret number x;
- (4)
The prover adopts a general zero-knowledge proof scheme, taking x as the secret input, and the elliptic curve point and the SM3 hash value as the public input.
Then, the prover generates the proof.
The verifier runs the polynomial time verification algorithm of the zero-knowledge proof scheme to verify the proof’s correctness.
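The relation that this protocol proves can be sketched outside the circuit as follows. The Python code below only checks the two relationships in the clear; it does not generate a zero-knowledge proof. Two stand-ins are assumed and should not be mistaken for the paper's choices: a toy textbook curve y² = x³ + 2x + 2 over F₁₇ with base point (5, 1) replaces the real 256-bit curve, and SHA-256 from the standard library replaces SM3. All function names are hypothetical.

```python
import hashlib

# Toy curve y^2 = x^3 + 2x + 2 over F_17, base point G = (5, 1).
# Assumed stand-in for a real 256-bit curve; it offers no security.
P_MOD, A = 17, 2
G = (5, 1)


def ec_add(p, q):
    """Add two affine points; None represents the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p and q are inverses of each other
    if p == q:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)


def scalar_mul(k, point):
    """Compute k * point with the double-and-add method."""
    result = None
    while k:
        if k & 1:
            result = ec_add(result, point)
        point = ec_add(point, point)
        k >>= 1
    return result


def stand_in_hash(x: int) -> bytes:
    """SHA-256 used as a stand-in for SM3 (assumption of this sketch)."""
    return hashlib.sha256(x.to_bytes(32, "big")).digest()


def public_input(x: int):
    """Steps (2) and (3): the curve point and the hash of the secret x."""
    return scalar_mul(x, G), stand_in_hash(x)


def relation_holds(x: int, point, digest) -> bool:
    """Both relationships that the circuit would constrain."""
    return scalar_mul(x, G) == point and stand_in_hash(x) == digest
```

A secret that satisfies only one of the two relationships, or neither, makes `relation_holds` return `False`, which corresponds to a failed verification in the actual protocol.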
7.1.2. Protocol Implementation
The protocol implementation steps are as follows.
- (1)
Implement the message padding algorithm of SM3 outside the circuit, i.e., convert the private input into bit form and append additional bit variables until the length is an integer multiple of 512 bits;
- (2)
Use the padded bit sequence as the input of the SM3 circuit;
- (3)
Use the original, unpadded bit sequence as the input of the elliptic curve calculation;
- (4)
In the elliptic curve calculation circuit, a message word merging circuit is used to combine the input bit sequence into an elliptic curve calculation unit, and an appropriate elliptic curve is selected to calculate the points on the elliptic curve;
- (5)
Take the SM3 value to be proved and the elliptic curve point as the output value of the two circuits, respectively;
- (6)
Generate the constraints and intermediate variable values for the complete circuit according to the circuit generation paradigm, and obtain a zero-knowledge proof.
This paper finally selects the 256-bit elliptic curve sm2p256v1 over a prime field, defined for the SM2 Chinese national cryptographic elliptic curve algorithm [
38] as the standard, and its parameters are:
,
,
.
7.1.3. Test and Analysis
We carry out a functional test of the protocol implementation built on the SM3 zero-knowledge proof circuit. The results show that verification fails whenever either the elliptic curve scalar multiplication relationship or the SM3 calculation relationship between the secret number and the public input in a test case is not satisfied. Thus, the protocol is correctly implemented.
Likewise, we carry out a performance test of the protocol implementation. When the padded length of the secret number is 512 bits, the circuit size is 33,796. It can be seen from the protocol design that the circuit complexity mainly depends on the SM3 circuit. The elliptic curve discrete logarithm part of the proof is relatively small, and the generation and verification times are also close to those of the SM3 circuit. According to the test results, the generation time is about 700 ms, and the verification time is about 17 ms. Therefore, this paper's optimization of the SM3 circuit significantly improves the efficiency of this type of protocol implementation.
7.2. Hash Chain Proof Based on SM3
Hash chains have a wide range of applications in key generation and blockchain. For instance, a hash chain uses the one-way property of the hash function to generate one-time keys. The user and the server are verified by adjacent outputs of the hash chain. After the verification, the key is updated to the last output on the hash chain. In addition, because a hash chain must be computed sequentially, it can be used as a weakly verifiable delay function [
39], which can be applied to time-constrained blockchain scenarios such as space–time proofs.
7.2.1. Protocol Design
For a hash chain with hash function SM3 and length n, the prover needs to prove that it holds an input message x such that a specific public value y = SM3^n(x) is obtained, where SM3^n denotes n successive applications of SM3, without leaking any other information about x.
Similar to the hash preimage proof, this hash chain proof contains only one private input and one public input, which are the message x and the public value y, respectively. The prover executes the general zero-knowledge proof algorithm to prove y = SM3^n(x). The verifier runs the polynomial-time verification algorithm of the zero-knowledge proof scheme to verify the proof's correctness.
7.2.2. Protocol Implementation
Since the topology of the hash chain is linear, its implementation is relatively simple: the protocol can be implemented simply by connecting the SM3 circuits described above in series.
7.2.3. Test and Analysis
We perform multiple sets of tests on this circuit implementation. With the padded input message length fixed at 512 bits, the length of the hash chain is varied for comparison. When the hash chain length is one, the circuit degenerates into an SM3 preimage proof circuit with a size of 32,836; when the hash chain length increases to n, the circuit size is 32,836n, which is consistent with the theoretical analysis. The performance of the circuit mainly depends on the performance of the SM3 circuit, and both the proof generation time and the verification time are those of a single SM3 circuit multiplied by the hash chain length.
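The chain computation and the size relation above can be sketched in a few lines of Python. SHA-256 from the standard library is used as a stand-in for SM3, which is an assumption of this sketch, and the function names are hypothetical. The size model simply multiplies the single-circuit size of 32,836 by the chain length, as reported in the test.

```python
import hashlib

SINGLE_SM3_CIRCUIT_SIZE = 32_836  # size of one SM3 circuit reported above


def stand_in_hash(data: bytes) -> bytes:
    """SHA-256 used as a stand-in for SM3 (assumption of this sketch)."""
    return hashlib.sha256(data).digest()


def hash_chain(x: bytes, n: int, h=stand_in_hash) -> bytes:
    """Apply the hash function n times in sequence, starting from x."""
    if n < 1:
        raise ValueError("chain length must be at least 1")
    value = x
    for _ in range(n):
        value = h(value)
    return value


def chain_circuit_size(n: int) -> int:
    """Circuit size for a chain of n SM3 circuits connected in series."""
    if n < 1:
        raise ValueError("chain length must be at least 1")
    return SINGLE_SM3_CIRCUIT_SIZE * n
```

Because each 256-bit digest is padded into a single 512-bit block before the next hash, every link of the chain costs one single-block circuit in this model.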
8. Conclusions
This paper summarizes the current usage scenarios and implementation challenges of the SM3 hash preimage zero-knowledge proof, and studies general zero-knowledge proof protocols and the technology for generating zero-knowledge proof circuits. Moreover, we analyze the basic process and the primary considerations for implementing the SM3 zero-knowledge proof circuit. On this basis, a four-layer SM3 circuit structure is designed, and a detailed scheme is given for the conversion process of the sub-circuits in each layer. After that, based on the libsnark framework, we implement all sub-circuit modules recursively from top to bottom in accordance with the designed circuit implementation paradigm. In this way, we fully realize the SM3 zero-knowledge proof circuit, which passes the functional and performance tests. In addition, we identify several typical privacy protection requirements in the blockchain field and design extended zero-knowledge proof protocols that incorporate SM3.
Although current research on zero-knowledge proofs mainly focuses on the efficiency and security of the back-end protocol, converting the problem to be proved into a zero-knowledge proof circuit remains a tedious task. It is impractical for users of zero-knowledge proof technology to manually implement the corresponding zero-knowledge proof circuits for all NP problems. Therefore, we hope that future work in this field will further investigate automatic conversion tools with high conversion efficiency, in order to reduce circuit redundancy and bring the circuit size close to the theoretical limit.