Generic constructions of PoRs from codes and instantiations

Julien Lavauzelle; Françoise Levy-dit-Vehel

doi:10.1515/jmc-2018-0018

Open Access Published by De Gruyter February 19, 2019

From the journal Journal of Mathematical Cryptology

https://doi.org/10.1515/jmc-2018-0018

Abstract

In this paper, we show how to construct – from any linear code – a Proof of Retrievability (𝖯𝗈𝖱) which features very low computation complexity on both the client (𝖵𝖾𝗋𝗂𝖿𝗂𝖾𝗋) and the server (𝖯𝗋𝗈𝗏𝖾𝗋) sides, as well as small client storage (typically 512 bits). We adapt the security model initiated by Juels and Kaliski [PoRs: Proofs of retrievability for large files, Proceedings of the 2007 ACM Conference on Computer and Communications Security—CCS 2007, ACM, New York 2007, 584–597] to fit into the framework of Paterson, Stinson and Upadhyay [A coding theory foundation for the analysis of general unconditionally secure proof-of-retrievability schemes for cloud storage, J. Math. Cryptol. 7 2013, 3, 183–216], from which our construction evolves. We thus provide a rigorous treatment of the security of our generic design; more precisely, we sharply bound the extraction failure of our protocol according to this security model. Next we instantiate our formal construction with codes built from tensor-products as well as with Reed–Muller codes and lifted codes, yielding 𝖯𝗈𝖱s with moderate communication complexity and (server) storage overhead, in addition to the aforementioned features.

Keywords: Proof of retrievability; error-correcting code; cloud storage

MSC 2010: 11T71

1 Introduction

1.1 Motivation

Cloud computing and storage have evolved quite spectacularly over the past decade. In particular, data outsourcing allows users and companies to lighten their storage burden and maintenance costs. However, it raises several issues: for example, how can someone efficiently check that he can retrieve, without any loss, a massive file that he had uploaded to a distant server and erased from his personal system?

Proofs of retrievability (𝖯𝗈𝖱s) address this issue. They are cryptographic protocols involving two parties: a client (or a verifier) and a server (or a prover). 𝖯𝗈𝖱s usually consist of the following phases. First, a key generation process creates secret material related to the file, meant to be kept by the client only. Then the file is initialised, that is, it is encoded and/or encrypted according to the secret data held by the client. This processed file is uploaded to the server. In order to check retrievability, the client can run a verification procedure, which is the core of the 𝖯𝗈𝖱. Finally, if the client is convinced that the server still holds his file, the client can proceed at any time to the extraction of the file.

Several parameters must be taken into account. Plainly, the verification process has to feature a low communication complexity, as the main goal is to avoid downloading a large part of the file to only check its extractability. Second, the storage overhead induced by the protocol must be low, as large server overhead would imply high fees for the customer. Third, the computation cost of the verification procedure must be low, both for the client (which is likely to own a lightweight device) and the server (whose computation work could also be expensive for the client).

Notice that proofs of data possession (𝖯𝖣𝖯) represent protocols close to what is needed in 𝖯𝗈𝖱s. However, in 𝖯𝖣𝖯s, one does not require the client to be able to extract the file from the server. Instances of 𝖯𝖣𝖯s are given by Ateniese et al. [2]. Besides, protocols of Lillibridge et al. [8] and Naor and Rothblum [10] are very often seen as precursors for 𝖯𝗈𝖱s. For instance, the work of Naor and Rothblum [10] considers a setting in which the client directly accesses the file stored by the prover/server (while the actual 𝖯𝗈𝖱 definition uses “an arbitrary program as opposed to a simple memory layout and this program may answer these questions in an arbitrary manner” [14]).

1.2 Previous work

Juels and Kaliski [6] gave the first formal definition of 𝖯𝗈𝖱s. They also proposed a first construction based on so-called sentinels (namely, random parts of the file to be checked during the verification step) the client keeps secretly on his device. Additionally, an erasure code ensures the integrity of the file to be extracted. This seminal work also raised several interesting points. On the one hand, it revealed that (i) the client must store secret data to be used in the verification step and (ii) coding is needed in order to retrieve the file without erasures or errors. On the other hand, in Juels and Kaliski’s construction, the verification step can only be performed a finite number of times since sentinels cannot be reused endlessly.

As a consequence, Shacham and Waters proposed to consider unbounded-use 𝖯𝗈𝖱s in [14], where they built two kinds of 𝖯𝗈𝖱s. The first one is based on linear combinations of authenticators produced via pseudo-random functions; its security was proved using cryptographic tools such as unforgeable MAC schemes, semantically secure symmetric encryption and secure PRFs. The second one is a publicly verifiable scheme based on the Diffie–Hellman problem in bilinear groups.

Bowers, Juels and Oprea [3] adopted a coding-theoretic approach (inner code, outer code) to compare variants of Shacham–Waters and Juels–Kaliski schemes. They focused on the efficiency of the schemes, and proved that, despite bounded use, new variants of Juels–Kaliski construction are highly competitive compared to other existing schemes.

In [11], Paterson, Stinson and Upadhyay provide a general framework for 𝖯𝗈𝖱s in the unconditional security model. They show that retrievability of the file can be expressed as error correction of a so-called response code. That allows them to precisely quantify the extraction success as a function of the success probability of a proving algorithm: indeed, in this setting, extraction can be naturally seen as nearest-neighbour decoding in the response code. They notably apply their framework to prove the security of a modified version of the Shacham–Waters scheme. Also, notice that, prior to [11], Dodis, Vadhan and Wichs [4] proposed another coding-theoretic model for 𝖯𝗈𝖱s that allowed them to build efficient bounded-use and unbounded-use 𝖯𝗈𝖱 schemes.

With practicality in mind, other features have been deployed on 𝖯𝗈𝖱s. For instance, Wang et al. [15] presented a 𝖯𝗈𝖱 construction based on Merkle hash trees, which allows efficient file updates on the server. Their scheme is provably secure under cryptographic assumptions (hardness of Diffie–Hellman in bilinear groups, unforgeable signatures, etc.) and has been improved by Mo, Zhou and Chen [9] in order to prevent unbalanced trees. More recently, other features have been proposed for 𝖯𝗈𝖱s, such as multi-prover 𝖯𝗈𝖱s (see [12]) or public verifiability (for instance in [13]).

1.3 Our approach

As we remarked before, most 𝖯𝗈𝖱 schemes rely on two techniques: (i) the client locally stores secret data in order to check the integrity of the file, and (ii) the client encodes the file in order to repair a small number of erasures and errors that could have been missed during the verification step.

In this work, we propose to build 𝖯𝗈𝖱 schemes using codes that fulfil the two previous goals, when equipped with a suitable family of efficiently computable random permutations. More precisely, our idea is the following. Given a file $F$, a code $\mathcal{C}$ and a family of random permutations $\sigma_K$, the client sends to the server an encoded and scrambled version $\sigma_K(\mathcal{C}(F))$ of his file. Then the verification step consists in checking “short” relations among descrambled symbols of $w=\mathcal{C}(F)$, which come, for instance, from low-weight parity-check equations for $\mathcal{C}$. Moreover, during the extraction step, the code $\mathcal{C}$ provides the redundancy necessary to repair erasures and potential unnoticed errors.

In the present work, we develop a seminal idea that appeared in [7], where the authors proposed a construction of 𝖯𝗈𝖱s based on lifted codes. We here provide a more generic construction and give a deeper analysis of its security.

While our scheme features neither updatability nor public verifiability, we emphasise the genericity of our construction, which is based on well-studied algebraic and combinatorial structures, namely, codes and their parity-check equations. Moreover, since the code $\mathcal{C}$ is public, the client must only store the secret material associated to the random permutations $\sigma_K$, which consists of a few bytes. Besides, an honest server simply needs to read pieces of $w$ during the verification step, and therefore has a very low computational burden compared to many other 𝖯𝗈𝖱 schemes.

1.4 Organisation

Section 2 is devoted to the definition and security model of proofs of retrievability. Despite the great disparity of models in 𝖯𝗈𝖱 literature, we try to keep close to the definitions given in [6, 11] for the sake of uniformity.

Section 3 presents our construction of 𝖯𝗈𝖱. Precisely, in Section 3.1, we introduce objects called verification structures for a code 𝒞 that will be used in the definition of our 𝖯𝗈𝖱 scheme (Section 3.2). A rigorous analysis of our scheme is the purpose of the remainder of that section.

The performance of our generic construction is given in Section 4. We then provide several instances in Section 5, proving the practicality of our 𝖯𝗈𝖱 schemes for some classes of codes.

2 Proofs of retrievability

2.1 Definition of underlying protocols

We recall that, in proofs of retrievability, a user wants to estimate whether a message m can be retrieved from an encoded version w of the message stored on a server. In all that follows, the user will be known as the 𝖵𝖾𝗋𝗂𝖿𝗂𝖾𝗋 (who wants to verify the retrievability of the message) while the server is the 𝖯𝗋𝗈𝗏𝖾𝗋 (who aims at proving the retrievability). The message space is denoted by ℳ while 𝒲, the (server) file space, is the set of encoded versions of the messages. We also denote by 𝒦 the set of secret values (or keys) kept by the 𝖵𝖾𝗋𝗂𝖿𝗂𝖾𝗋, and by ℛ the space of responses to challenges.

Throughout the paper, the symbols $\leftarrow_R$ and $\leftarrow$ respectively denote the output of randomised and deterministic algorithms.

Definition 2.1.

A keyed proof of retrievability (𝖯𝗈𝖱) is a tuple of algorithms (𝖪𝖾𝗒𝖦𝖾𝗇, 𝖨𝗇𝗂𝗍, 𝖵𝖾𝗋𝗂𝖿𝗒, 𝖤𝗑𝗍𝗋𝖺𝖼𝗍) running as follows:

The key generation algorithm 𝖪𝖾𝗒𝖦𝖾𝗇 generates uniformly at random a key $\kappa\leftarrow_R\mathcal{K}$. The key κ is secretly kept by the 𝖵𝖾𝗋𝗂𝖿𝗂𝖾𝗋.
The initialisation algorithm 𝖨𝗇𝗂𝗍 is a deterministic algorithm which takes, as input, a message m∈ℳ and a key κ∈𝒦, and outputs a file w∈𝒲. 𝖨𝗇𝗂𝗍 is run by the 𝖵𝖾𝗋𝗂𝖿𝗂𝖾𝗋, which initially holds the message m. After the process, the file w is sent to the 𝖯𝗋𝗈𝗏𝖾𝗋, and the message m is erased on the 𝖵𝖾𝗋𝗂𝖿𝗂𝖾𝗋’s side. Upon receipt of w, the 𝖯𝗋𝗈𝗏𝖾𝗋 sets a deterministic algorithm 𝖯(w) that will be run during the verification procedure.
The verification algorithm 𝖵𝖾𝗋𝗂𝖿𝗒 is a randomised algorithm initiated by the 𝖵𝖾𝗋𝗂𝖿𝗂𝖾𝗋, which needs a secret key κ∈𝒦 and interacts with the 𝖯𝗋𝗈𝗏𝖾𝗋. 𝖵𝖾𝗋𝗂𝖿𝗒 is depicted in Figure 1 and works as follows:
1. the 𝖵𝖾𝗋𝗂𝖿𝗂𝖾𝗋 runs a random query generator that outputs a challenge $u\leftarrow_R\mathcal{Q}$ (the set 𝒬 being the so-called query set);
2. the challenge u is sent to the 𝖯𝗋𝗈𝗏𝖾𝗋;
3. the 𝖯𝗋𝗈𝗏𝖾𝗋 outputs a response $r_u\leftarrow\mathsf{P}(w)(u)\in\mathcal{R}$;
4. the 𝖵𝖾𝗋𝗂𝖿𝗂𝖾𝗋 checks the validity of $r_u$ according to u and κ; the algorithm 𝖵𝖾𝗋𝗂𝖿𝗒 finally outputs the Boolean value $\mathsf{Check}(u,r_u,\kappa)$.
The extraction algorithm 𝖤𝗑𝗍𝗋𝖺𝖼𝗍 is run by the 𝖵𝖾𝗋𝗂𝖿𝗂𝖾𝗋. It takes, as input, κ and $r=(r_u : u\in\mathcal{Q})\in\mathcal{R}^{\mathcal{Q}}$ and outputs either a message m′∈ℳ or a failure symbol ⊥. We say that extraction succeeds if $\mathsf{Extract}(r,\kappa)=m$.

The vector $r=(r_u\leftarrow\mathsf{P}(w)(u))_{u\in\mathcal{Q}}\in\mathcal{R}^{\mathcal{Q}}$ is called the response word associated to 𝖯(w).
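The four steps of 𝖵𝖾𝗋𝗂𝖿𝗒 can be sketched as follows. This is only an illustration of the message flow in Definition 2.1: the function names, the tag-based toy instance and its parameters (`P_MOD`, `toy_init`, `toy_check`) are assumptions made for the example and are not the scheme constructed in this paper.

```python
import random
from typing import Callable, Hashable, Sequence

def verify(query_set: Sequence[Hashable],
           prover: Callable[[Hashable], object],
           check: Callable[[Hashable, object, object], bool],
           key: object,
           rng: random.Random) -> bool:
    """Run one audit of Definition 2.1 and return Check(u, r_u, kappa)."""
    u = rng.choice(query_set)   # step 1: draw a challenge u uniformly from Q
    r_u = prover(u)             # steps 2-3: send u, receive r_u = P(w)(u)
    return check(u, r_u, key)   # step 4: validate r_u against u and the key

# --- Hypothetical toy instance (NOT the scheme of this paper) ---
P_MOD = 101  # small prime, used only for the illustrative tags

def toy_init(message, key):
    """Store each symbol together with a tag depending on the key and position."""
    a, b = key
    return [(x, (a * x + b + i) % P_MOD) for i, x in enumerate(message)]

def toy_check(u, r_u, key):
    a, b = key
    x, tag = r_u
    return tag == (a * x + b + u) % P_MOD

key = (17, 42)
w = toy_init([3, 14, 15, 92], key)
queries = list(range(len(w)))

def honest_prover(u):
    return w[u]

def tampering_prover(u):
    x, tag = w[u]
    return ((x + 1) % P_MOD, tag)   # alters the symbol but keeps the old tag

rng = random.Random(0)
honest_ok = all(verify(queries, honest_prover, toy_check, key, rng) for _ in range(20))
tampered_ok = any(verify(queries, tampering_prover, toy_check, key, rng) for _ in range(20))
print(honest_ok, tampered_ok)
```

The prover is modelled as a function of the challenge only, which matches the deterministic, non-adaptive response algorithm 𝖯(w) assumed in the text.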

Figure 1

Definition of the algorithm 𝖵𝖾𝗋𝗂𝖿𝗒.

Note that, in assuming that the response algorithm 𝖯(w) is deterministic and non-adaptive^[1], we follow the work of Paterson, Stinson and Upadhyay [11]. The authors justify determinism of response algorithms by the fact that any probabilistic prover can be replaced by a deterministic prover whose success probability is at least as good as the probabilistic one.

In Definition 2.1, we can see that a deterministic algorithm 𝖯(w) can be represented by the vector of its outputs $r=(\mathsf{P}(w)(u))_{u\in\mathcal{Q}}$, called the response word of 𝖯(w). Therefore, we can assume that, before the verification step, the 𝖯𝗋𝗈𝗏𝖾𝗋 produces a word $r(w)\in\mathcal{R}^{\mathcal{Q}}$ related to the file w he holds. In other words, we model provers as algorithms 𝖯 which, given as input w, return a word $r\in\mathcal{R}^{\mathcal{Q}}$.

Following [11], we also assume in this paper that the extraction algorithm 𝖤𝗑𝗍𝗋𝖺𝖼𝗍 is deterministic, though, in general, it can be randomised. Finally, notice that proofs of retrievability aim at proving the extractability of a file. The extraction algorithm is therefore a tool to retrieve the whole file. Hence its computational efficiency is not a crucial feature.

Table 1 summarises the information held by each entity after the initialisation step. Table 2 reports the inputs and outputs of the algorithms involved in a 𝖯𝗈𝖱.

Table 1

Information held by each entity after the initialisation step.

𝖵𝖾𝗋𝗂𝖿𝗂𝖾𝗋	𝖯𝗋𝗈𝗏𝖾𝗋
κ	w

Table 2

Inputs and outputs of the algorithms involved in a 𝖯𝗈𝖱.

Algorithm	𝖪𝖾𝗒𝖦𝖾𝗇	𝖨𝗇𝗂𝗍	𝖵𝖾𝗋𝗂𝖿𝗒	𝖢𝗁𝖾𝖼𝗄	𝖤𝗑𝗍𝗋𝖺𝖼𝗍
Input	$1^\lambda$	m, κ	r, κ	u, $r_u$, κ	r, κ
Output	κ	w	True or False	True or False	m′ or ⊥

2.2 Security models

One should first notice that, despite many efforts, proofs of retrievability lack a general agreement on the definition of their security model. Nevertheless, our definitions remain very close to the ones given in the original work of Juels and Kaliski [6].

For a response word $r\in\mathcal{R}^{\mathcal{Q}}$ given by the 𝖯𝗋𝗈𝗏𝖾𝗋 and a key κ∈𝒦 kept by the 𝖵𝖾𝗋𝗂𝖿𝗂𝖾𝗋, we first define the success of r according to κ as

\[
\operatorname{succ}(r,\kappa) := \Pr_u\big(\mathsf{Check}(u,r_u,\kappa)=\mathtt{True}\big),
\]

where the probability is taken over the internal randomness of 𝖵𝖾𝗋𝗂𝖿𝗒. A first security model can be defined as follows.
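As an illustration, the success of a response word can be computed as a plain average when the query generator is uniform over 𝒬. The sketch below makes that uniformity assumption explicit; the parity-based `parity_check` and the sample response word are hypothetical and serve only to exercise the function.

```python
from fractions import Fraction

def succ(response_word, key, check):
    """Fraction of challenges u whose response r_u passes Check.

    Assumes the query generator is uniform over the query set, so that the
    probability over u is a plain average over the coordinates of r.
    """
    passed = sum(1 for u, r_u in response_word.items() if check(u, r_u, key))
    return Fraction(passed, len(response_word))

# Hypothetical check: a response is valid when its symbols sum to 0 over F_2.
def parity_check(u, r_u, key):
    return sum(r_u) % 2 == 0

r = {0: (0, 0, 0), 1: (1, 1, 0), 2: (1, 0, 0), 3: (0, 1, 1)}
rate = succ(r, None, parity_check)
print(rate)
```

Here three of the four responses have even parity, so the response word would pass the audit condition succ(r,κ) ≥ 1−ε exactly when ε ≥ 1/4.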

Definition 2.2 (Security model, strong version).

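Let $\varepsilon,\tau\in[0,1]$. A proof of retrievability (𝖪𝖾𝗒𝖦𝖾𝗇, 𝖨𝗇𝗂𝗍, 𝖵𝖾𝗋𝗂𝖿𝗒, 𝖤𝗑𝗍𝗋𝖺𝖼𝗍) is strongly (ε,τ)-sound if, for every initial file m∈ℳ, every uploaded file w∈𝒲 and every prover $\mathsf{P}:\mathcal{W}\to\mathcal{R}^{\mathcal{Q}}$, we have

\[
\Pr\left(\begin{array}{c}\mathsf{Extract}(r,\kappa)\neq m\\ \operatorname{succ}(r,\kappa)\ge 1-\varepsilon\end{array}\ \middle|\ \begin{array}{l}\kappa\leftarrow_R\mathsf{KeyGen}(1^\lambda)\\ w\leftarrow\mathsf{Init}(m,\kappa)\\ r\leftarrow\mathsf{P}(w)\end{array}\right)\le\tau, \tag{2.1}
\]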

the probability being taken over the internal randomness of 𝖪𝖾𝗒𝖦𝖾𝗇 under the constraint that w=𝖨𝗇𝗂𝗍⁢(m,κ).

A remark concerning parameters ε and τ

In proofs of retrievability, we aim at making the extraction of the desired file m as sure as possible when the audit succeeds. Hence it is desirable to have τ small. On the other hand, the parameter ε measures the rate of unsuccessful audits that leads the 𝖵𝖾𝗋𝗂𝖿𝗂𝖾𝗋 to believe the extraction will fail. Therefore, one does not necessarily need to look for large values of ε, though, in practice, a large ε affords more flexibility, for instance, if communication errors occur between the 𝖯𝗋𝗈𝗏𝖾𝗋 and the 𝖵𝖾𝗋𝗂𝖿𝗂𝖾𝗋 during the verification procedure.

Definition 2.2 provides a strong security model, in the sense that (i) it does not require any bound on the response algorithms given by the 𝖯𝗋𝗈𝗏𝖾𝗋 and (ii) the probability in (2.1) is taken over fixed messages m (informally, it means the 𝖯𝗋𝗈𝗏𝖾𝗋 knows m).

However, keyed proofs of retrievability are usually insecure according to the security model given in Definition 2.2. For instance, in [11], Paterson, Stinson and Upadhyay noticed that in the Shacham–Waters scheme [14], given the knowledge of m and w, an unbounded 𝖯𝗋𝗈𝗏𝖾𝗋 may be able to

compute (or at least randomly guess) a key κ such that 𝖨𝗇𝗂𝗍⁢(m,κ)=w,
build m′≠m such that 𝖨𝗇𝗂𝗍⁢(m′,κ)=w′,
set 𝖯(w′)=r′ which (a) successfully passes every audit and (b) leads to the extraction of m′≠m.

Hence we choose to use a weaker but still realistic security model, where, informally, the 𝖯𝗋𝗈𝗏𝖾𝗋 only knows what he stores (that is, w) and has no information on the initial message m. The following security model thus remains consistent with the one given by Paterson, Stinson and Upadhyay [11].

Definition 2.3 (Security model, weak version).

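Let $\varepsilon,\tau\in[0,1]$. A proof of retrievability (𝖪𝖾𝗒𝖦𝖾𝗇, 𝖨𝗇𝗂𝗍, 𝖵𝖾𝗋𝗂𝖿𝗒, 𝖤𝗑𝗍𝗋𝖺𝖼𝗍) is weakly (ε,τ)-sound (or simply (ε,τ)-sound) if, for every polynomial-time prover $\mathsf{P}:\mathcal{W}\to\mathcal{R}^{\mathcal{Q}}$ and every uploaded file w∈𝒲, we have

\[
\Pr\left(\begin{array}{c}\mathsf{Extract}(r,\kappa)\neq m\\ \operatorname{succ}(r,\kappa)\ge 1-\varepsilon\end{array}\ \middle|\ \begin{array}{l}m\leftarrow_R\mathcal{M}\\ \kappa\leftarrow_R\mathsf{KeyGen}(1^\lambda)\\ w\leftarrow\mathsf{Init}(m,\kappa)\\ r\leftarrow\mathsf{P}(w)\end{array}\right)\le\tau. \tag{2.2}
\]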

In equation (2.2), the randomness comes from pairs (m,κ)∈ℳ×𝒦 picked uniformly at random among those satisfying w=𝖨𝗇𝗂𝗍⁢(m,κ).

Since we deal with values of τ very close to 0, we also say that a strongly (ε,τ)-sound 𝖯𝗈𝖱 admits $\lambda=-\log_2(\tau)$ bits of security against ε-adversaries.

Informally, saying that a 𝖯𝗈𝖱 is not weakly sound amounts to finding a polynomial-time deterministic algorithm 𝖯 which

takes, as input, a file w∈𝒲 and outputs a response word r∈ℛ𝒬,
makes the extraction fail with non-negligible probability (over messages m and keys κ such that the corresponding response words are successfully audited).

3 Our generic construction

Schematically, in the initialisation phase of our construction, the 𝖵𝖾𝗋𝗂𝖿𝗂𝖾𝗋

encodes his file according to a code 𝒞,
scrambles the resulting codeword using a tuple of permutations over the base field,
uploads the result to the 𝖯𝗋𝗈𝗏𝖾𝗋.

As we explained in the introduction, the verification step then consists in checking that the server is still able to give answers that, once descrambled, satisfy low-weight parity-check equations for 𝒞.

For this purpose, we next introduce objects called verification structures for codes, which will be used in the definition of our generic 𝖯𝗈𝖱 scheme.

3.1 Verification structures: A tool for our PoR scheme

We here consider $\mathbb{F}_q$, the finite field with $q$ elements. Following standard coding theory terminology, the support of a word $w\in\mathbb{F}_q^n$ is $\operatorname{supp}(w):=\{i\in[1,n] : w_i\neq 0\}$, and its weight is $\operatorname{wt}(w):=|\operatorname{supp}(w)|$.

In this work, we need to consider codes whose alphabets are finite-dimensional spaces $\mathcal{R}$ over $\mathbb{F}_q$, typically $\mathcal{R}=\mathbb{F}_q^s$. Precisely, a code $\mathcal{C}$ of length $n$ over $\mathcal{R}$ is a subset of $\mathcal{R}^n$. A code $\mathcal{C}\subseteq\mathcal{R}^n$ is $\mathbb{F}_q$-linear if $\mathcal{C}$ is a vector space over $\mathbb{F}_q$. When $\mathcal{R}=\mathbb{F}_q$, we get the usual definition of linear codes over finite fields. Unless stated otherwise, we only consider $\mathbb{F}_q$-linear codes, which we will refer to as codes.

We usually denote by $k$ the dimension over $\mathbb{F}_q$ of a code $\mathcal{C}$. Its minimum distance $d_{\min}(\mathcal{C})$ is the smallest Hamming distance between two distinct codewords. If $n$ is the length of $\mathcal{C}$, then $d_{\min}(\mathcal{C})/n\in[0,1]$ is the relative minimum distance of the code $\mathcal{C}$, while $k/n$ represents its rate. If $\mathcal{C}\subseteq\mathbb{F}_q^n$, its dual code $\mathcal{C}^\perp$ is defined as $\{h\in\mathbb{F}_q^n : \sum_{i=1}^n h_ic_i=0 \text{ for all } c\in\mathcal{C}\}$. Codewords in $\mathcal{C}^\perp$ are also called parity-check equations for $\mathcal{C}$.

Definition 3.1 (Verification structure).

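Let $1\le\ell\le n$ and $\mathcal{C}\subseteq\mathbb{F}_q^n$ be a code. Let also $\mathcal{Q}$ be a non-empty set of $\ell$-subsets of $[1,n]$. Set $\mathcal{R}=\mathbb{F}_q^\ell$. We define the restriction map $R$ associated to $\mathcal{Q}$ as

\[
R:\mathcal{Q}\times\mathbb{F}_q^n\to\mathcal{R},\qquad (u,w)\mapsto w|_u.
\]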

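Given an integer $s\ge 1$ and a map $V:\mathcal{Q}\times\mathcal{R}\to\mathbb{F}_q^s$, we say that $(\mathcal{Q},V)$ is a verification structure for $\mathcal{C}$ if the following holds:

For all $i\in[1,n]$, there exists $u\in\mathcal{Q}$ such that $i\in u$.
For all $u\in\mathcal{Q}$, the map $\mathbb{F}_q^n\to\mathbb{F}_q^s$ given by $a\mapsto V(u,R(u,a))$ is surjective and vanishes on the code $\mathcal{C}$. Explicitly,
\[
V(u,R(u,c))=0\quad\text{for all } c\in\mathcal{C}.
\]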

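The map $V$ is then called a verification map for $\mathcal{C}$, and the set $\mathcal{Q}$ a query set for $\mathcal{C}$. By convention, for $w\in\mathbb{F}_q^n$ and $r\in\mathcal{R}^{\mathcal{Q}}$, we define

\[
R(w):=(R(u,w) : u\in\mathcal{Q})\in\mathcal{R}^{\mathcal{Q}},\qquad V(r):=(V(u,r_u) : u\in\mathcal{Q})\in(\mathbb{F}_q^s)^{\mathcal{Q}}.
\]

Finally, the code $R(\mathcal{C}):=\{R(c) : c\in\mathcal{C}\}$ is called the response code of $\mathcal{C}$.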

Example 3.2 (Fundamental example).

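Let $\mathcal{C}$ be a code, and let $\mathcal{H}$ be a set of parity-check equations for $\mathcal{C}$ of Hamming weight $\ell$, whose supports are pairwise distinct. Define the query set $\mathcal{Q}=\{\operatorname{supp}(h) : h\in\mathcal{H}\}$ and, for any $u\in\mathcal{Q}$, let $h(u)$ be the unique parity-check equation in $\mathcal{H}$ whose support is $u$. Finally, we define a map $V$ by

\[
V:\mathcal{Q}\times\mathcal{R}\to\mathbb{F}_q,\qquad (u,r)\mapsto\sum_{i=1}^{\ell}h(u)_{u_i}\,r_i.
\]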

Notice that we set s=1 here. By construction, it is clear that (𝒬,V) is a verification structure for 𝒞.

Example 3.3 (Toy example).

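Let $\mathcal{C}\subseteq\mathbb{F}_2^7$ be a binary Hadamard code of length $n=7$ and dimension $k=3$. In other words, $\mathcal{C}$ is defined by a parity-check matrix

\[
H=\begin{pmatrix}
1&1&1&0&0&0&0\\
1&0&0&1&1&0&0\\
1&0&0&0&0&1&1\\
0&1&0&0&1&1&0\\
0&1&0&1&0&0&1\\
0&0&1&1&0&1&0\\
0&0&1&0&1&0&1
\end{pmatrix}.
\]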

According to Example 3.2, we define 𝒬 to be the set of supports of rows of H. In other words,

𝒬={{1,2,3},{1,4,5},{1,6,7},{2,5,6},{2,4,7},{3,4,6},{3,5,7}}.

Then the verification map $V:\mathcal{Q}\times\mathbb{F}_2^3\to\mathbb{F}_2$ can be defined as follows. If $u=\{u_1,u_2,u_3\}\in\mathcal{Q}$ and $b\in\mathbb{F}_2^u$ is indexed according to $u$, then we define

\[
V(u,b)=\sum_{i=1}^{3}b_{u_i}.
\]

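Now let $m=(m_1,m_2,m_3)\in\mathbb{F}_2^3$. The message $m$ can be encoded into

\[
c=(m_1,\ m_2,\ m_1+m_2,\ m_3,\ m_1+m_3,\ m_1+m_2+m_3,\ m_2+m_3)\in\mathcal{C}.
\]

Hence the word $r=R(c)\in(\mathbb{F}_2^3)^7$ is

\[
\begin{aligned}
r&=\big((c_1,c_2,c_3),\ (c_1,c_4,c_5),\ (c_1,c_6,c_7),\ (c_2,c_5,c_6),\ (c_2,c_4,c_7),\ (c_3,c_4,c_6),\ (c_3,c_5,c_7)\big)\\
&=\big((m_1,\ m_2,\ m_1+m_2),\ (m_1,\ m_3,\ m_1+m_3),\ (m_1,\ m_1+m_2+m_3,\ m_2+m_3),\\
&\qquad (m_2,\ m_1+m_3,\ m_1+m_2+m_3),\ (m_2,\ m_3,\ m_2+m_3),\\
&\qquad (m_1+m_2,\ m_3,\ m_1+m_2+m_3),\ (m_1+m_2,\ m_1+m_3,\ m_2+m_3)\big).
\end{aligned}
\]

For each vector-coordinate $b\in\mathbb{F}_2^3$ of $r=R(c)$, one can now check that $\sum_j b_j=0$. Hence we get $V(R(c))=0$, as expected.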
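The toy example can be checked exhaustively with a few lines of code. The sketch below re-implements the encoding, the restriction map and the verification map of Example 3.3 under illustrative names (`encode`, `restrict`, `V`), and tests every message of $\mathbb{F}_2^3$ against every query.

```python
from itertools import product

# Query set of Example 3.3 (1-indexed supports of the rows of H).
Q = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 5, 6), (2, 4, 7), (3, 4, 6), (3, 5, 7)]

def encode(m):
    """Encoding map of the toy code, all arithmetic over F_2."""
    m1, m2, m3 = m
    return tuple(x % 2 for x in (m1, m2, m1 + m2, m3, m1 + m3, m1 + m2 + m3, m2 + m3))

def restrict(u, w):
    """Restriction map R(u, w) = w|_u."""
    return tuple(w[i - 1] for i in u)

def V(u, b):
    """Verification map: sum of the three queried symbols over F_2."""
    return sum(b) % 2

codewords = [encode(m) for m in product((0, 1), repeat=3)]
all_vanish = all(V(u, restrict(u, c)) == 0 for c in codewords for u in Q)
min_weight = min(sum(c) for c in codewords if any(c))
print(all_vanish, min_weight)
```

Since every coordinate belongs to exactly three queries in this example, flipping a single symbol of a codeword makes exactly three of the seven checks fail.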

From now on, we denote by N=|𝒬| the length of the response code R⁢(𝒞) of a code 𝒞 equipped with a verification structure (𝒬,V).

3.2 Definition of our PoR scheme

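Let $(\mathcal{Q},V)$ be a verification structure for $\mathcal{C}\subseteq\mathbb{F}_q^n$, and let $\sigma\in\mathfrak{S}(\mathbb{F}_q)^n$, where $\mathfrak{S}(\mathbb{F}_q)$ denotes the set of permutations over $\mathbb{F}_q$. Any $n$-tuple of permutations $\sigma=(\sigma_1,\dots,\sigma_n)\in\mathfrak{S}(\mathbb{F}_q)^n$ naturally acts on $c\in\mathbb{F}_q^n$ by

\[
\sigma(c):=(\sigma_1(c_1),\dots,\sigma_n(c_n)),
\]

and we define $\sigma(\mathcal{C})=\{\sigma(c) : c\in\mathcal{C}\}$. Let finally

\[
V_\sigma:\mathcal{Q}\times\mathbb{F}_q^\ell\to\mathbb{F}_q^s,\qquad (u,y)\mapsto V\big(u,\sigma|_u^{-1}(y)\big),
\]

where $\sigma|_u^{-1}(y)=(\sigma_{u_1}^{-1}(y_1),\dots,\sigma_{u_\ell}^{-1}(y_\ell))$. The map $V_\sigma$ has been defined in order to satisfy

\[
V_\sigma(u,R(u,\sigma(c)))=V(u,R(u,c))
\]

for every $(c,u)\in\mathcal{C}\times\mathcal{Q}$.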
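The identity above can be verified numerically on a small instance. The following sketch uses a hypothetical code over $\mathbb{F}_5$ of length 4 defined by two weight-3 parity checks (both chosen only for illustration), draws one random permutation per coordinate, and checks the identity for every codeword and every query.

```python
import random
from itertools import product

q, n = 5, 4
# Two hypothetical weight-3 parity checks, keyed by their 0-indexed supports.
checks = {(0, 1, 2): (1, 1, 3), (1, 2, 3): (1, 2, 2)}

def V(u, r):
    """Verification map built from the parity check supported on u (s = 1)."""
    return sum(h * x for h, x in zip(checks[u], r)) % q

def restrict(u, w):
    return tuple(w[i] for i in u)

# The code is the set of words on which both checks vanish (brute force).
code = [c for c in product(range(q), repeat=n)
        if all(V(u, restrict(u, c)) == 0 for u in checks)]

rng = random.Random(1)
sigma = [rng.sample(range(q), q) for _ in range(n)]      # sigma[i][x] = sigma_i(x)
sigma_inv = [[s.index(y) for y in range(q)] for s in sigma]

def scramble(c):
    """Coordinate-wise action of sigma on a word."""
    return tuple(sigma[i][x] for i, x in enumerate(c))

def V_sigma(u, y):
    """Descramble the queried symbols, then apply V."""
    descrambled = tuple(sigma_inv[i][x] for i, x in zip(u, y))
    return V(u, descrambled)

ok = all(V_sigma(u, restrict(u, scramble(c))) == V(u, restrict(u, c))
         for c in code for u in checks)
print(len(code), ok)
```

Only the verifier, who knows `sigma`, can evaluate `V_sigma`; a server holding `scramble(c)` sees symbols that no longer satisfy the public parity checks in general.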

Based on this, our 𝖯𝗈𝖱 construction is given in Figure 2.

Figure 2

Definition of our 𝖯𝗈𝖱 scheme.

Figure 3

Our extraction procedure 𝖤𝗑𝗍𝗋𝖺𝖼𝗍⁢(r,σ).

3.3 Analysis

3.3.1 Preliminary results

We first give results concerning verification structures and response codes. The following two lemmata are straightforward to prove.

Lemma 3.4.

Let $(\mathcal{Q},V)$ be a verification structure for a code $\mathcal{C}\subseteq\mathbb{F}_q^n$. Then $(\mathcal{Q},V_\sigma)$ is a verification structure for $\sigma(\mathcal{C})$.

Lemma 3.5.

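Let $\mathcal{Q}$ be any query set for a code $\mathcal{C}\subseteq\mathbb{F}_q^n$ whose elements have cardinality $\ell\ge 1$. Then its response code $R(\mathcal{C})$ is an $\mathbb{F}_q$-linear code over the alphabet $\mathcal{R}\simeq\mathbb{F}_q^\ell$.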

Remark 3.6.

By considering $\sigma(\mathcal{C})$ instead of $\mathcal{C}$, we lose the $\mathbb{F}_q$-linearity, but one can check that verification structures still make sense and provide the result claimed in Lemma 3.4.

The next result states that the map 𝒞↦σ⁢(𝒞) does not modify the distance between codewords.

Lemma 3.7.

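Let $\mathcal{C}\subseteq\mathbb{F}_q^n$ be a linear code, $(\mathcal{Q},V)$ a verification structure for $\mathcal{C}$, and $\sigma\in\mathfrak{S}(\mathbb{F}_q)^n$. Then it holds that

the distance distributions of $\mathcal{C}$ and $\sigma(\mathcal{C})$ are the same,
the distance distributions of $R(\mathcal{C})$ and $R(\sigma(\mathcal{C}))$ are the same.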

Proof.

Since every $\sigma_i$ is one-to-one, for any $c,c'\in\mathcal{C}$, we get

\[
d(c,c')=|\{i\in[1,n] : c_i\neq c'_i\}|=|\{i\in[1,n] : \sigma_i(c_i)\neq\sigma_i(c'_i)\}|=d(\sigma(c),\sigma(c')).
\]

The proof for response codes relies on the same argument. ∎

Note that these results imply that, if $\mathcal{C}$ is linear, then the minimum distance of $R(\sigma(\mathcal{C}))$ equals the minimum weight of $R(\mathcal{C})$.

Definition 3.8.

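Let $\varepsilon\in[0,1]$ and $(\mathcal{Q},V)$ be a verification structure for a code $\mathcal{C}\subseteq\mathbb{F}_q^n$. We say $r\in\mathcal{R}^{\mathcal{Q}}$ is $\varepsilon$-close to $(\mathcal{Q},V)$ if

\[
\operatorname{wt}(V(r)):=|\{u\in\mathcal{Q} : V(u,r_u)\neq 0\}|\le\varepsilon N.
\]

Let now $c\in\mathcal{C}$ and $\beta\in[0,1]$. We say that $r\in\mathcal{R}^{\mathcal{Q}}$ is a $\beta$-liar for $(\mathcal{Q},V,c)$ if

\[
|\{u\in\mathcal{Q} : V(u,r_u)=0\ \text{and}\ r_u\neq R(u,c)\}|\le\beta N.
\]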

Bounded-distance error-and-erasure decoder

Let $\mathcal{A}\subseteq\mathbb{F}_q^n$ be any code of minimum distance $d$, and let $a\in\mathcal{A}$ be corrupted with $b$ errors and $e$ erasures, resulting in a word $r'\in(\mathbb{F}_q\cup\{\perp\})^n$. Then it is well known that, as long as $2b+e<d$, it is possible to retrieve $a$ from $r'$ thanks to a so-called bounded-distance error-and-erasure decoding algorithm. This is precisely the decoding algorithm that we employ in Figure 3 on the code $\mathcal{A}=R(\mathcal{C})$.
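A brute-force version of such a decoder is easy to write for small codes. The sketch below is an assumption-laden illustration: it searches the whole code (which is only feasible for toy sizes), represents the erasure symbol ⊥ by `None`, and uses the binary code of Example 3.3 as $\mathcal{A}$ rather than a response code. If two codewords both satisfied $2b+e<d$, they would be at distance less than $d$ from each other, so the first match is the only one.

```python
from itertools import product

def encode(m):
    """Encoding map of the toy code of Example 3.3 over F_2."""
    m1, m2, m3 = m
    return tuple(x % 2 for x in (m1, m2, m1 + m2, m3, m1 + m3, m1 + m2 + m3, m2 + m3))

CODE = [encode(m) for m in product((0, 1), repeat=3)]
# For a linear code, the minimum distance is the minimum nonzero weight.
D_MIN = min(sum(a) for a in CODE if any(a))

def decode(received, code=CODE, d=D_MIN):
    """Return the codeword a with 2*b + e < d, or None if there is none.

    b counts disagreements on non-erased positions, e counts erasures (None).
    """
    e = sum(1 for x in received if x is None)
    for a in code:
        b = sum(1 for x, y in zip(received, a) if x is not None and x != y)
        if 2 * b + e < d:
            return a
    return None

a = encode((1, 1, 0))
corrupted = list(a)
corrupted[0] ^= 1      # one error
corrupted[3] = None    # one erasure
decoded = decode(corrupted)
print(decoded == a)
```

With one error and one erasure, $2b+e=3<4=d$, so the original codeword is recovered.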

Our framework allows us to reformulate the extraction success in terms of a probability to decode corrupted codewords. More precisely:

Proposition 3.9.

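Let $\sigma\in\mathfrak{S}(\mathbb{F}_q)^n$, $m\in\mathbb{F}_q^k$, and denote by $d$ the minimum distance of $R(\mathcal{C})$, of length $N$. Let also $r\in\mathcal{R}^{\mathcal{Q}}$ be the response word output by a proving algorithm 𝖯 taking $w=\sigma(\mathcal{C}(m))$ as input. Finally, assume that $r$ is $\varepsilon$-close to $(\mathcal{Q},V_\sigma)$ and a $\beta$-liar for $(\mathcal{Q},V_\sigma,w)$, with $(\varepsilon+2\beta)N<d$. Then $\mathsf{Extract}(r,\sigma)=m$, where $\mathsf{Extract}(r,\sigma)$ is defined in Figure 3.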

Proof.

Recall that $r'\in(\mathcal{R}\cup\{\perp\})^{\mathcal{Q}}$ represents the word we get from $r$ after step (ii) of the algorithm given in Figure 3. Let us now translate our assumptions on $r$ in coding-theoretic terminology:

$r$ is $\varepsilon$-close to $(\mathcal{Q},V_\sigma)$ means that there are at most $\varepsilon N$ challenges $u\in\mathcal{Q}$ for which we know that the coordinate $r'_u$ is not authentic. This justifies that we assign erasure symbols to these coordinates.
$r$ is a $\beta$-liar for $(\mathcal{Q},V_\sigma,w)$ means that there are at most $\beta N$ other corrupted values $r'_u$, but we cannot identify them. Therefore, we can treat these coordinates as errors.

To sum up, we see $r'$ as a corruption of $R(\mathcal{C}(m))$ with at most $\varepsilon N$ erasures and at most $\beta N$ errors, where $N=|\mathcal{Q}|$. Since we assume that $(\varepsilon+2\beta)N<d$, we know from the previous discussion that the decoding succeeds in retrieving $m$. ∎

3.3.2 Bounding the extraction failure

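According to Definition 2.3, our 𝖯𝗈𝖱 scheme is weakly (ε,τ)-sound if, for every polynomial-time algorithm 𝖯 outputting a response word $r(w)$ from a file $w$, we have

\[
\Pr_{\sigma,m}\left(\begin{array}{c}\text{decoding } r(w) \text{ into } m \text{ fails}\\ \operatorname{wt}(V_\sigma(r(w)))\le\varepsilon N\end{array}\ \middle|\ \begin{array}{l}m\leftarrow_R\mathbb{F}_q^k\\ \sigma\leftarrow_R\mathfrak{S}(\mathbb{F}_q)^n\\ w=\sigma(\mathcal{C}(m))\end{array}\right)\le\tau.
\]

Using Proposition 3.9, the security analysis of our 𝖯𝗈𝖱 scheme reduces to measuring the ability of the 𝖯𝗋𝗈𝗏𝖾𝗋 to produce a response word $r$ which is $\varepsilon$-close to $(\mathcal{Q},V_\sigma)$ and a $\beta$-liar for $(\mathcal{Q},V_\sigma,w)$, with $(\varepsilon+2\beta)N\ge d$.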

For fixed $r\in\mathcal{R}^{\mathcal{Q}}$, $\sigma\in\mathfrak{S}(\mathbb{F}_q)^n$ and $w=\sigma(\mathcal{C}(m))$ the authentic file given to the prover, we define three subsets of $\mathcal{Q}$:

$\mathcal{D}(r,w):=\{u\in\mathcal{Q} : r_u\neq R(w)_u\}$ and $D(r,w):=|\mathcal{D}(r,w)|=\operatorname{wt}(r-R(w))$. These are the challenges $u$ on which the response word $r$ differs from the authentic one $R(w)$.
$\mathcal{E}(r,\sigma):=\{u\in\mathcal{Q} : V_\sigma(u,r_u)\neq 0\}$ and $E(r,\sigma):=|\mathcal{E}(r,\sigma)|=\operatorname{wt}(V_\sigma(r))$. These are the challenges $u$ on which the associated coordinate $r_u$ is not accepted by the verification map (it corresponds to erasures in the decoding process).
$\mathcal{B}(r,\sigma,w):=\{u\in\mathcal{Q} : r_u\neq R(w)_u\ \text{and}\ V_\sigma(u,r_u)=0\}$ and $B(r,\sigma,w):=|\mathcal{B}(r,\sigma,w)|$. These are the challenges $u$ on which the associated coordinate $r_u$ is accepted by the verification map, but differs from the authentic response $R(w)_u$ (it corresponds to errors in the decoding process).

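One can easily check that, for every $\sigma$, the sets $\mathcal{E}(r,\sigma)$ and $\mathcal{B}(r,\sigma,w)$ define a partition of $\mathcal{D}(r,w)$, so that $B(r,\sigma,w)=D(r,w)-E(r,\sigma)$. Decoding is guaranteed to succeed unless $2B(r,\sigma,w)+E(r,\sigma)\ge d_{\min}(R(\mathcal{C}))$, that is, unless $2D(r,w)-E(r,\sigma)\ge d_{\min}(R(\mathcal{C}))$. The probability of extraction failure can thus be written as

\[
\Pr\left(\begin{array}{c}2D(r,w)-E(r,\sigma)\ge d_{\min}(R(\mathcal{C}))\\ E(r,\sigma)\le\varepsilon N\end{array}\ \middle|\ \begin{array}{l}m\leftarrow_R\mathbb{F}_q^k\\ \sigma\leftarrow_R\mathfrak{S}(\mathbb{F}_q)^n\\ w=\sigma(\mathcal{C}(m))\end{array}\right). \tag{3.1}
\]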
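To make the three sets concrete, the sketch below computes them for a tampered response word in the setting of Example 3.3. It is an illustration under simplifying assumptions: $\sigma$ is taken to be the identity, so that the acceptance test is the plain parity check, and the tampered coordinates are chosen by hand.

```python
# Query set of Example 3.3 (1-indexed supports).
Q = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 5, 6), (2, 4, 7), (3, 4, 6), (3, 5, 7)]

def accepts(u, r_u):
    """Toy verification map over F_2, with sigma taken to be the identity."""
    return sum(r_u) % 2 == 0

def split_challenges(r, authentic):
    """Return the sets D (differs), E (rejected) and B (accepted but wrong)."""
    D = {u for u in Q if r[u] != authentic[u]}
    E = {u for u in Q if not accepts(u, r[u])}
    B = {u for u in D if accepts(u, r[u])}
    return D, E, B

w = (1, 0, 1, 1, 0, 0, 1)                       # codeword encoding m = (1, 0, 1)
authentic = {u: tuple(w[i - 1] for i in u) for u in Q}

r = dict(authentic)
r[(1, 2, 3)] = (0, 0, 1)   # one symbol changed: rejected by the parity check
r[(1, 4, 5)] = (0, 0, 0)   # two symbols changed: accepted although not authentic

D, E, B = split_challenges(r, authentic)
print(len(D), len(E), len(B), 2 * len(D) - len(E))
```

The quantity $2D-E$ printed last equals $2B+E$, which is the value compared with $d_{\min}(R(\mathcal{C}))$ in (3.1).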

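For $w\in\mathbb{F}_q^n$, let us define the set of admissible permutations and messages

\[
\Phi_w:=\{(\sigma,m)\in\mathfrak{S}(\mathbb{F}_q)^n\times\mathbb{F}_q^k : w=\sigma(\mathcal{C}(m))\},
\]

so that equation (3.1) rewrites

\[
\Pr\left(\begin{array}{c}2D(r,w)-E(r,\sigma)\ge d_{\min}(R(\mathcal{C}))\\ E(r,\sigma)\le\varepsilon N\end{array}\ \middle|\ (\sigma,m)\leftarrow_R\Phi_w\right).
\]

Later on, we will use the notation $\Pr_{\Phi_w}$ to refer to the fact that $(\sigma,m)$ is uniformly drawn from $\Phi_w$. Similarly, we will use the notation $\mathbb{E}_{\Phi_w}$ for the expectation and $\operatorname{Var}_{\Phi_w}$ for the variance.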

Given $r\in\mathcal{R}^{\mathcal{Q}}$, we also define

\[
\alpha(r,w):=\max_{u\in\mathcal{D}(r,w)}\Pr_{\Phi_w}\big(V_\sigma(u,r_u)=0\big)
\]

and $\alpha:=\max_{(r,w)}\alpha(r,w)$, where $(r,w)$ are such that $D(r,w)\neq 0$. The parameter $\alpha\in(0,1)$ is called the bias of the verification structure $(\mathcal{Q},V)$ for $\mathcal{C}$. It corresponds to the maximum probability that a response is accepted but not authentic.

Lemma 3.10.

For all $r\in\mathcal{R}^{\mathcal{Q}}$ and $w\in\mathbb{F}_q^n$, we have

\[
\mathbb{E}_{\Phi_w}(E(r,\sigma))\ge(1-\alpha)D(r,w).
\]

Proof.

A simple computation shows

\[
\mathbb{E}_{\Phi_w}(E(r,\sigma))=\mathbb{E}_{\Phi_w}\Big(\sum_{u\in\mathcal{D}(r,w)}\mathbb{1}_{V_\sigma(u,r_u)\neq 0}\Big)=\sum_{u\in\mathcal{D}(r,w)}\Pr_{\Phi_w}\big(V_\sigma(u,r_u)\neq 0\big)\ge\sum_{u\in\mathcal{D}(r,w)}(1-\alpha)=(1-\alpha)D(r,w).\ ∎
\]

Lemma 3.10 essentially means that, if an adversary to our 𝖯𝗈𝖱 scheme wants its response word to be (on average) $\varepsilon$-close to the verification structure, then he should modify at most $D(r,w)\le\frac{\varepsilon N}{1-\alpha}$ responses. Below, we take advantage of this result, and we measure the probability of an extraction failure.

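First, for $\delta,\varepsilon\in(0,1)$, let

\[
p(r,w;\varepsilon,\delta):=\Pr_{\Phi_w}\big(2D(r,w)-E(r,\sigma)\ge\delta N\ \text{and}\ E(r,\sigma)\le\varepsilon N\big)=\Pr_{\Phi_w}\big(E(r,\sigma)\le\min\{\varepsilon N,\,2D(r,w)-\delta N\}\big).
\]

The probability $p(r,w;\varepsilon,\delta)$ represents the probability that the extraction fails for a response code of relative distance $\delta$ and an adversarial response word $r$ associated to $w$, which is $\varepsilon$-close to the verification structure. Let us bound $p(r,w;\varepsilon,\delta)$.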

Proposition 3.11.

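Let $\delta,\varepsilon\in(0,1)$ be such that $\delta\frac{1-\alpha}{1+\alpha}>\varepsilon$. Let also $r\in\mathcal{R}^{\mathcal{Q}}$ and $w\in\mathbb{F}_q^n$. Then we have

\[
p(r,w;\varepsilon,\delta)\le\frac{\operatorname{Var}_{\Phi_w}(E(r,\sigma))}{\left(\frac{1+\alpha}{2}\left(\delta\frac{1-\alpha}{1+\alpha}-\varepsilon\right)\right)^{2}N^{2}}.
\]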

Proof.

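We distinguish three cases.

(i) $2D(r,w)-\delta N<0$. The event $E(r,\sigma)\le\min\{\varepsilon N,\,2D(r,w)-\delta N\}$ never occurs since $E(r,\sigma)\ge 0$. Hence $p(r,w;\varepsilon,\delta)=0$.

(ii) $\varepsilon N\le 2D(r,w)-\delta N$. By Lemma 3.10, the inequality $E(r,\sigma)\le\varepsilon N$ implies

\[
E(r,\sigma)-\mathbb{E}_{\Phi_w}(E)\le\varepsilon N-(1-\alpha)D(r,w)\le\varepsilon N-(1-\alpha)\frac{\varepsilon+\delta}{2}N\le-\frac{1+\alpha}{2}\left(\delta\frac{1-\alpha}{1+\alpha}-\varepsilon\right)N.
\]

Hence, using Chebyshev’s inequality,

\[
p(r,w;\varepsilon,\delta)=\Pr_{\Phi_w}\big(E(r,\sigma)\le\varepsilon N\big)\le\Pr_{\Phi_w}\left(\big|E(r,\sigma)-\mathbb{E}_{\Phi_w}(E)\big|\ge\frac{1+\alpha}{2}\left(\delta\frac{1-\alpha}{1+\alpha}-\varepsilon\right)N\right)\le\frac{\operatorname{Var}_{\Phi_w}(E(r,\sigma))}{\left(\frac{1+\alpha}{2}\left(\delta\frac{1-\alpha}{1+\alpha}-\varepsilon\right)\right)^{2}N^{2}}.
\]

(iii) $0\le 2D(r,w)-\delta N<\varepsilon N$. In this case, $E(r,\sigma)\le 2D(r,w)-\delta N$ implies

\[
E(r,\sigma)-\mathbb{E}_{\Phi_w}(E)\le(1+\alpha)D(r,w)-\delta N\le(1+\alpha)\frac{\varepsilon+\delta}{2}N-\delta N\le-\frac{1+\alpha}{2}\left(\delta\frac{1-\alpha}{1+\alpha}-\varepsilon\right)N.
\]

Therefore, similarly to the previous case, we obtain the claimed result. ∎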
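For the reader's convenience, the final step of cases (ii) and (iii) can be expanded as follows. This is a routine check rather than part of the original argument; it shows that the last inequality in both chains is in fact an equality, and that both cases end with the same quantity.

```latex
\begin{aligned}
\varepsilon N-(1-\alpha)\frac{\varepsilon+\delta}{2}N
  &=\frac{N}{2}\big((1+\alpha)\varepsilon-(1-\alpha)\delta\big)
   =-\frac{1+\alpha}{2}\left(\delta\frac{1-\alpha}{1+\alpha}-\varepsilon\right)N,\\
(1+\alpha)\frac{\varepsilon+\delta}{2}N-\delta N
  &=\frac{N}{2}\big((1+\alpha)\varepsilon-(1-\alpha)\delta\big)
   =-\frac{1+\alpha}{2}\left(\delta\frac{1-\alpha}{1+\alpha}-\varepsilon\right)N.
\end{aligned}
```

The hypothesis $\delta\frac{1-\alpha}{1+\alpha}>\varepsilon$ is exactly what makes this common quantity negative, so that Chebyshev's inequality applies to a deviation of positive size.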

For any $u\in\mathcal{D}(r,w)$, denote by $X_u$ the $\{0,1\}$-valued random variable $\mathbb{1}_{V_\sigma(u,r_u)=0}$ when $\sigma$ is uniformly drawn from $\Phi_w$. It holds that $E(r,\sigma)=\sum_{u\in\mathcal{D}(r,w)}(1-X_u)$.

Recall that two real random variables Y,Z are uncorrelated if 𝔼⁢(Y⁢Z)=𝔼⁢(Y)⁢𝔼⁢(Z). For instance, two independent random variables are uncorrelated.

Lemma 3.12.

Let $r\in\mathcal{R}^{\mathcal{Q}}$ and $w\in\mathbb{F}_q^n$. If the random variables $\{X_u\}_{u\in\mathcal{D}(r,w)}$ are pairwise uncorrelated, then

\[
\operatorname{Var}_{\Phi_w}(E(r,\sigma))\le D(r,w).
\]

Proof.

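By assumption, $\{X_u\}_{u\in\mathcal{D}(r,w)}$ are pairwise uncorrelated; hence

\[
\operatorname{Var}_{\Phi_w}(E(r,\sigma))=\sum_{u\in\mathcal{D}(r,w)}\operatorname{Var}_{\Phi_w}(1-X_u).
\]

The trivial bound $\operatorname{Var}_{\Phi_w}(1-X_u)\le 1$ gives the result. ∎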

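As a corollary of Proposition 3.11 and Lemma 3.12, under the same hypothesis and assuming $\delta\frac{1-\alpha}{1+\alpha}>\varepsilon$, we get

\[
p(r,w;\varepsilon,\delta)\le\frac{4}{N\big((1-\alpha)\delta-(1+\alpha)\varepsilon\big)^{2}}
\]

since $D(r,w)\le N$. Moreover, if $\lim_{N\to\infty}\delta>0$ and $\lim_{N\to\infty}\alpha=0$, then $p(r,w;\varepsilon,\delta)=\mathcal{O}(1/N)$.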

Therefore, we end up with the following theorem.

Theorem 3.13.

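Let $(\mathcal{Q},V)$ be a verification structure for $\mathcal{C}$ with bias $\alpha$. Let $N=|\mathcal{Q}|$, and let $\delta=d_{\min}(R(\mathcal{C}))/N$ be the relative distance of the associated response code. Finally, assume that, for any $r\in\mathcal{R}^{\mathcal{Q}}$ and any $w\in\mathbb{F}_q^n$, the variables $\{X_u\}_{u\in\mathcal{D}(r,w)}$ are pairwise uncorrelated. Then, for any $\varepsilon<\delta\frac{1-\alpha}{1+\alpha}$, the 𝖯𝗈𝖱 scheme associated to $\mathcal{C}$ and $(\mathcal{Q},V)$ is $(\varepsilon,\tau)$-sound, where

\[
\tau=\frac{4}{N\big((1-\alpha)\delta-(1+\alpha)\varepsilon\big)^{2}}.
\]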
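The bound of Theorem 3.13 is easy to evaluate numerically. The helper below is a sketch with illustrative names (`soundness_tau`, `security_bits`); the parameter values in the example are hypothetical and do not correspond to any instantiation from this paper.

```python
import math

def soundness_tau(N, delta, alpha, eps):
    """Evaluate tau = 4 / (N * ((1 - alpha) * delta - (1 + alpha) * eps)^2)."""
    if not eps < delta * (1 - alpha) / (1 + alpha):
        raise ValueError("Theorem 3.13 requires eps < delta * (1 - alpha) / (1 + alpha)")
    return 4 / (N * ((1 - alpha) * delta - (1 + alpha) * eps) ** 2)

def security_bits(tau):
    """Bits of security lambda = -log2(tau)."""
    return -math.log2(tau)

# Hypothetical parameters: N = 2^40 queries, delta = 1/2, negligible bias, eps = 1/4.
tau = soundness_tau(N=2**40, delta=0.5, alpha=0.0, eps=0.25)
print(tau, security_bits(tau))
```

Since τ decreases only like 1/N, each additional bit of security in this bound requires doubling the number of queries N.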

For asymptotically small $\alpha$, a code $\mathcal{C}$ equipped with a verification structure satisfying the conditions of Theorem 3.13 thus gives an $(\varepsilon,\tau)$-sound 𝖯𝗈𝖱 scheme for every $\varepsilon<(1+o(1))\delta$ and $\tau=\mathcal{O}(1/N)$.

According to Theorem 3.13, we thus need to look for (sequences of) codes 𝒞 and associated verification structures (𝒬,V) such that

the response code R⁢(𝒞) admits a good relative distance δ=dmin⁢(R⁢(𝒞))/N,
the bias α is small,
random variables {Xu}u∈𝒟⁢(r,w) are pairwise uncorrelated.

Sections 3.4 and 3.5 characterise conditions under which the last two points are fulfilled. Then, in Section 5, we discuss which response codes can achieve good relative distance.

3.4 Estimating α

In this section, we prove that, assuming Φw approximates the uniform distribution over 𝔖⁢(𝔽q)n in a sense that we make precise later, the bias α can be bounded according to parameters of the verification structure.

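Let us fix $r\in\mathcal{R}^{\mathcal{Q}}$, $w\in\mathbb{F}_q^n$ and $u\in\mathcal{Q}$. We recall that $\alpha$ is defined by

\[
\alpha=\max_{r,w}\ \max_{u\in\mathcal{D}(r,w)}\Pr_{\Phi_w}\big(V_\sigma(u,r_u)=0\big),
\]

where randomness comes from $\sigma\leftarrow_R\Phi_w=\{(\sigma,m)\in\mathfrak{S}(\mathbb{F}_q)^n\times\mathbb{F}_q^k : w=\sigma(\mathcal{C}(m))\}$. We notice that this is equivalent to writing $\sigma\leftarrow_R\{\sigma\in\mathfrak{S}(\mathbb{F}_q)^n : \sigma^{-1}(w)\in\mathcal{C}\}$.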

For convenience, we will view ru∈ℛ=𝔽qℓ as a vector indexed by u=(u1,…,uℓ), so that we can easily denote by ru⁢[uj]∈𝔽q its j-th coordinate, 1≤j≤ℓ. We define the code Ku:=kerV(u,⋅)⊆𝔽qℓ, and up to re-indexing coordinates, 𝒞|u⊆Ku. This allows us to write that, for every σ, we have Vσ⁢(u,ru)=0 if and only if σu-1⁢(ru)∈Ku. Finally, we denote by Zu:={i∈u,ru[i]≠R(w)u[i]} the set of coordinates of ru that are not authentic.

Let Yu⁢(σ) represent the event “σu-1(ru)∈Ku∣supp(σu-1(ru))=Zu”. Informally, the reason why we consider an event Yu⁢(σ) conditioned by supp⁡(σu-1⁢(ru))=Zu is that the 𝖯𝗋𝗈𝗏𝖾𝗋 is free to choose any support Zu on which he can modify the original file. More formally, this constraint will help us to bound the probability PrΦw⁡(Vσ⁢(u,ru)=0) in Lemma 3.14. We say that Φw is sufficiently uniform if, for every u∈𝒬, we have

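$$\gamma_u:=\frac{\Pr[Y_u(\sigma)\mid\sigma\leftarrow_R\Phi_w]-\Pr[Y_u(\sigma)\mid\sigma\leftarrow_R\mathfrak{S}(\mathbb{F}_q)^n]}{\Pr[Y_u(\sigma)\mid\sigma\leftarrow_R\mathfrak{S}(\mathbb{F}_q)^n]}=o(1)$$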

when the file size n⁢log⁡q→∞. In other words, Φw is sufficiently uniform if it is a good approximation of the whole set of n-tuples of permutations, when considering the probability that Yu⁢(σ) happens.

Lemma 3.14.

Let $r$, $w$, $u$ and $Z_u$ be defined as above. Let also $A_u=|\{x\in K_u,\ \operatorname{supp}(x)=Z_u\}|$. Then

$$\Pr_{\Phi_w}\bigl(V_\sigma(u,r_u)=0\bigr)\le(1+\gamma_u)\,\frac{A_u}{(q-1)^{|Z_u|}}.$$

Proof.

For every σ such that (σ,m)∈Φw, we know that σu-1⁢(R⁢(w)u)∈Ku, and we recall that Vσ⁢(u,ru)=0 if and only if σu-1⁢(ru)∈Ku. Since Ku is linear, and up to considering σu-1⁢(R⁢(w)u-ru) instead, we can assume without loss of generality that σu-1⁢(ru)⁢[i]=0 for every i∈u∖Zu. In other words, we assume that supp⁡(σu-1⁢(ru))=Zu.

Remark that

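$$\Pr_{\sigma\leftarrow_R\mathfrak{S}(\mathbb{F}_q)^n}\bigl[\sigma_u^{-1}(r_u)\in K_u\mid\operatorname{supp}(\sigma_u^{-1}(r_u))=Z_u\bigr]=\Pr_{x\leftarrow_R\mathbb{F}_q^\ell}\bigl[x\in K_u\mid\operatorname{supp}(x)=Z_u\bigr]=\frac{A_u}{(q-1)^{|Z_u|}}$$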

since Au counts the number of codewords in Ku whose support is Zu.

Therefore, we get

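$$\begin{aligned}
\Pr_{\Phi_w}\bigl(V_\sigma(u,r_u)=0\bigr)
&\le\Pr_{\Phi_w}\bigl[V_\sigma(u,r_u)=0\mid\operatorname{supp}(\sigma_u^{-1}(r_u))=Z_u\bigr]\\
&=(1+\gamma_u)\,\Pr_{\mathfrak{S}(\mathbb{F}_q)^n}\bigl[V_\sigma(u,r_u)=0\mid\operatorname{supp}(\sigma_u^{-1}(r_u))=Z_u\bigr]\\
&=(1+\gamma_u)\,\Pr_{x\leftarrow_R\mathbb{F}_q^\ell}\bigl[x\in K_u\mid\operatorname{supp}(x)=Z_u\bigr]\\
&=(1+\gamma_u)\,\frac{A_u}{(q-1)^{|Z_u|}}.
\end{aligned}$$ ∎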

Lemma 3.15.

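Let $S_u$ be the $\mathbb{F}_q$-vector space $\langle\{x\in K_u,\ \operatorname{supp}(x)=Z_u\}\rangle$, and assume that $S_u\neq\{0\}$. We have

$$A_u\le q^{|Z_u|-d_{\min}(S_u)+1}.$$

Proof.

We prove that, if $A_u>q^e$ for some integer $e\ge 0$, then $d_{\min}(S_u)\le|Z_u|-e$, which clearly implies our result. If $A_u>q^e$, then $\dim S_u>e$ since $|S_u|\ge A_u$. Since $S_u$ is supported on $Z_u$, the Singleton bound then provides

$$d_{\min}(S_u)\le|Z_u|-\dim S_u+1\le|Z_u|-e.$$ ∎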

Finally, we get the following upper bound on α.

Proposition 3.16.

Let $\Delta=\min\{d_{\min}(K_u),\ u\in\mathcal{Q}\}$. Then

$$\alpha\le(1+\gamma)\Bigl(1+\frac{1}{q-1}\Bigr)^{\ell}q^{-\Delta+1},$$

where $\gamma=\max_u\gamma_u$.

Proof.

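Remark that $S_u$, defined in the previous lemma, is a subcode of $K_u$ shortened on $u\setminus Z_u$. Hence

$$d_{\min}(K_u)\le d_{\min}(S_u),$$

and we can apply Lemmas 3.14 and 3.15 to obtain the desired bound

$$\alpha\le\max_{u,r}\,(1+\gamma_u)\Bigl(\frac{q}{q-1}\Bigr)^{|Z_u|}q^{-d_{\min}(K_u)+1}\le(1+\gamma)\Bigl(1+\frac{1}{q-1}\Bigr)^{\ell}q^{-\Delta+1},$$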

where γ=maxu⁡γu. ∎
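To get a feeling for the bound of Proposition 3.16, the sketch below evaluates it numerically. The helper `alpha_upper_bound` is a hypothetical name, and the sample parameters (ℓ = q, Δ = 2, γ = 0) are an assumption matching the plane Reed–Muller setting discussed in Appendix A, not values prescribed here.

```python
def alpha_upper_bound(q, ell, Delta, gamma=0.0):
    """Evaluate (1 + gamma) * (1 + 1/(q - 1))^ell * q^(-Delta + 1)."""
    return (1 + gamma) * (1 + 1 / (q - 1)) ** ell * q ** (1 - Delta)


if __name__ == "__main__":
    # With ell = q and Delta = 2 the bound behaves like a constant times 1/q.
    for q in (8, 64, 512):
        bound = alpha_upper_bound(q, ell=q, Delta=2)
        print(q, bound, q * bound)
```

The product q times the bound stays bounded as q grows, which is the 𝒪(1/q) behaviour one expects when Δ = 2 and ℓ ≤ q.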

If every $\Phi_w$ is sufficiently uniform, then, by definition, we have $\gamma=o(1)$ when the file size $n\log q\to\infty$. This assumption is significant since we desire to have a small bias $\alpha$, which is deeply linked to the soundness of 𝖯𝗈𝖱s (see Theorem 3.13). In Appendix A, we present experimental estimates of $\alpha$, validating the assumption that $\Phi_w$ is sufficiently uniform.

3.5 Pairwise uncorrelation of {Xu}u∈𝒟

This section is devoted to proving that variables {Xu}u∈𝒟⁢(r,w) are pairwise uncorrelated if the supports of challenges u∈𝒟⁢(r,w) have small pairwise intersection. For this purpose, let us recall that, for fixed r∈ℛ𝒬, w and u∈𝒟⁢(r,w), the random variable Xu represents 𝟙Vσ⁢(u,ru)=0 when σ is uniformly picked in Φw.

We first state a technical lemma that will be useful to prove Proposition 3.18 below. For clarity, we denote by d⊥⁢(𝒞) the minimum distance of the dual code 𝒞⊥ of a linear code 𝒞.

Lemma 3.17.

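Let $\mathcal{C}\subseteq\mathbb{F}_q^n$ be a linear code and $T\subset[1,n]$, $|T|=t$, where $t<d^\perp(\mathcal{C})$. For $a\in\mathbb{F}_q^T$, we define

$$\mathcal{V}_a=\{c\in\mathcal{C},\ c|_T=a\}\quad\text{and}\quad N_a=|\mathcal{V}_a|.$$

Then

(i) $\mathcal{V}_0=\{v\in\mathcal{C},\ v|_T=0\}$ is a linear subcode of $\mathcal{C}$;
(ii) for every non-zero $a\in\mathbb{F}_q^T$, there exists a non-zero $c^{(a)}\in\mathcal{C}$ such that $\mathcal{V}_a=\mathcal{V}_0+\{c^{(a)}\}$;
(iii) for every $a\in\mathbb{F}_q^T$, $N_a=q^{k-t}$, where $k=\dim\mathcal{C}$.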

Proof.

(i) The set $\mathcal{V}_0=\{v\in\mathcal{C},\ v|_T=0\}$ matches the well-known definition of the shortening of a code. It is easy to prove that it defines a linear code.

(ii) Let $a\in\mathbb{F}_q^T$ be non-zero, and let us first prove that there exists $c^{(a)}\in\mathcal{C}$ such that $c^{(a)}|_T=a$. If it were not the case, then, by definition, we would have $\mathcal{C}|_T\neq\mathbb{F}_q^t$. But this is impossible since $\mathcal{C}^\perp$ contains no non-zero codeword of weight at most $t$. It is then easy to check that $\mathcal{V}_a=\mathcal{V}_0+\{c^{(a)}\}$.

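(iii) First notice that $\mathcal{V}_a\cap\mathcal{V}_b=\emptyset$ if $a\neq b$, and that all the sets $\mathcal{V}_a$ have the same cardinality $|\mathcal{V}_0|$ by (ii). Since

$$\mathcal{C}=\bigcup_{a\in\mathbb{F}_q^t}\mathcal{V}_a,$$

we get $q^t N_a=q^k$, which is the expected result. ∎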
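Lemma 3.17 can be checked by brute force on a small code. The sketch below is an illustration under assumptions of our choosing: it uses a binary [7,4] Hamming code (whose dual has minimum distance 4) and arithmetic modulo a prime q; the names `codewords` and `fibres_are_uniform` are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

# Generator matrix of a binary [7,4] Hamming code; its dual has minimum distance 4.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def codewords(G, q):
    """Enumerate all codewords of the code generated by G over F_q (q prime)."""
    k, n = len(G), len(G[0])
    return [
        tuple(sum(m[i] * G[i][j] for i in range(k)) % q for j in range(n))
        for m in product(range(q), repeat=k)
    ]


def fibres_are_uniform(G, q, t):
    """Check that, for every T of size t, each a in F_q^T has exactly q^(k-t) preimages."""
    k, n = len(G), len(G[0])
    words = codewords(G, q)
    for T in combinations(range(n), t):
        counts = Counter(tuple(c[j] for j in T) for c in words)
        if len(counts) != q ** t or set(counts.values()) != {q ** (k - t)}:
            return False
    return True


if __name__ == "__main__":
    for t in range(1, 5):
        print(t, fibres_are_uniform(G, 2, t))
```

For t = 1, 2, 3 the hypothesis t < d⊥(𝒞) holds and the count is uniform; for t = 4 the hypothesis fails and so does the conclusion, since some set T is the support of a dual codeword.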

Proposition 3.18.

If max⁡{|u∩v|,u≠v∈Q}<min⁡{d⊥⁢(C|u),u∈Q}, then the random variables {Xu}u∈Q are pairwise uncorrelated.

Proof.

Recall that Ku:=kerV(u,⋅) and that, by definition of a verification structure, we have 𝒞|u⊆Ku. For u≠v∈𝒬, let us prove that 𝔼⁢(Xu⁢Xv)=𝔼⁢(Xu)⁢𝔼⁢(Xv). First,

𝔼⁢(Xu⁢Xv)=Pr⁡(Vσ⁢(u,ru)=0⁢and⁢Vσ⁢(v,rv)=0)=Pr⁡(σ-1⁢(ru)|u∈Ku⁢and⁢σ-1⁢(rv)|v∈Kv).

Denote t=|u∩v|, and let (𝐚,𝐛)∈(𝔽qt)2. We denote by Z⁢(σ,𝐚,𝐛) the event

σ-1⁢(ru)|u∩v=𝐚 and σ-1⁢(rv)|u∩v=𝐛.

We first notice that {σ|u∩v-1,σ∈Φw}=𝔖(𝔽q)t. Indeed, we can here use an argument similar to the proof of Lemma 3.17: the constraint σ-1⁢(w)∈𝒞 is ineffective on σ|u∩v-1 since |u∩v|≤t<d⊥⁢(𝒞|z) for every z∈𝒬. Therefore, for every (𝐚,𝐛)∈(𝔽qt)2, we have

Pr⁡(Z⁢(σ,𝐚,𝐛))=q-2⁢t,

and it follows that

𝔼⁢(Xu⁢Xv)=1q2⁢t⁢∑𝐚,𝐛∈(𝔽qt)2Pr⁡(σ-1⁢(ru)|u∈Ku⁢and⁢σ-1⁢(rv)|v∈Kv∣Z⁢(σ,𝐚,𝐛)).

Recall now that t<min⁡{d⊥⁢(𝒞|u),u∈𝒬}≤min⁡{d⊥⁢(Ku),u∈𝒬}. Hence, for fixed 𝐚 and 𝐛, the variables σ-1(ru)|u∈Ku∣Z(σ,𝐚,𝐛) and σ-1(rv)|v∈Kv∣Z(σ,𝐚,𝐛) are independent (once again, it is a consequence of the structure results of Lemma 3.17). Therefore,

𝔼⁢(Xu⁢Xv)=1q2⁢t⁢∑𝐚,𝐛∈(𝔽qt)2Pr⁡(σ-1⁢(ru)|u∈Ku∣Z⁢(σ,𝐚,𝐛))⁢Pr⁡(σ-1⁢(rv)|v∈Kv∣Z⁢(σ,𝐚,𝐛)).

Then

𝔼⁢(Xu⁢Xv)=1q2⁢t⁢∑𝐚,𝐛∈(𝔽qt)2Pr⁡(σ-1⁢(ru)|u∈Ku∣σ-1⁢(ru)|u∩v=𝐚)⁢Pr⁡(σ-1⁢(rv)|v∈Kv∣σ-1⁢(rv)|u∩v=𝐛),

and we conclude since

𝔼⁢(Xu)=q-t⁢∑𝐚∈𝔽qtPr⁡(σ-1⁢(ru)|u∈Ku∣σ-1⁢(ru)|u∩v=𝐚).∎

4 Performance

4.1 Efficient scrambling of the encoded file

In the 𝖯𝗈𝖱 scheme we propose, the storage cost of an n-tuple of permutations in 𝔖⁢(𝔽q)n is excessive since it is superlinear in the original file size. In this subsection, we propose a storage-efficient way to scramble the codeword c∈𝒞 produced by the 𝖵𝖾𝗋𝗂𝖿𝗂𝖾𝗋.

Precisely, we want to define a family of maps (σ(κ))κ, where σ(κ):𝒞→𝔽qn, c↦w∈𝔽qn, with the following requirements:

For every $\kappa$, the map $\sigma^{(\kappa)}$ is efficiently computable and requires little storage.
For every κ and every c∈𝒞, if w=σ(κ)⁢(c), then, for every i∈[1,n], the local inverse map wi↦ci is efficiently computable.
If κ is randomly generated but unknown, then, given the knowledge of w=σ(κ)⁢(c) and 𝒞, it is hard to produce a response word r∈ℛ𝒬 such that, for many u∈𝒬, both Vσ(κ)⁢(u,ru)=0 and ru≠w|u hold. To be more specific and in light of the security analysis of Section 3.3, we require that it is hard to distinguish σ(κ)⁢(c) from a random (z1,…,zn)∈𝔽qn, where symbols zi are picked independently and uniformly at random.

We here propose to derive σ(κ) from a suitable block cipher, yielding the explicit construction given below. Of course, other proposals can be envisioned.

The construction

Let IV denote a random initialisation vector for AES in CTR mode (IV could be a nonce concatenated with a random value). Vector IV is kept secret by the 𝖵𝖾𝗋𝗂𝖿𝗂𝖾𝗋, as well as a randomly chosen key κ for the cipher. Let also f be a permutation polynomial over 𝔽q of degree d>1. For instance, one could choose f⁢(x)=xd with gcd⁡(d,q-1)=1. Notice that polynomial f can be made public.

Let $s=\lfloor 256/\lceil\log_2 q\rceil\rfloor$ be the number of $\mathbb{F}_q$-symbols one can store in a 256-bit word^[2]. Up to appending a few random bits to $c$, we assume that $s\mid n$, and we define $t=n/s$. Let us fix a partition of $[1,n]$ into $s$-tuples $i=(i_1,\dots,i_s)$; it can be, for instance, $(1,\dots,s)$, $(s+1,\dots,2s)$, …, $((t-1)s+1,\dots,n)$. Notice that this partition does not need to be chosen at random. Given $c=(c_1,\dots,c_n)\in\mathcal{C}$ and $i$ an element of the above partition, we now define

$$b_i=\bigl(f(c_{i_1})\mid\dots\mid f(c_{i_s})\bigr)\oplus\mathrm{AES}_\kappa(\mathrm{IV}\oplus i)\in\{0,1\}^{256}.$$

If log2⁡q∤256, trailing zeroes can be added to evaluations of f. Finally, the pseudo-random permutation σ is defined by

σ(c):=(b1,…,bt).

Design rationale

AES is a natural choice when one needs a (secret-)keyed pseudo-random permutation. Also notice that, with this construction, one only needs to store the key κ and the vector IV since the other objects (the polynomial f, the partition) are made public. Hence our objectives in terms of storage are met.

We now point out the necessity to use i as a part of the input of the AES cipher. Assume that we do not. Then the local permutation σj, 1≤j≤n, would not depend on j. As a consequence, for a certain class of codes, the local verification map ru↦Vσ⁢(u,ru) would not depend on u, and a malicious 𝖯𝗋𝗈𝗏𝖾𝗋 would then be able to produce accepted answers while storing only a small piece of the file w (e.g., w|u for only one u∈𝒬).

Another mandatory feature is the non-linearity of the permutation polynomial $f$. Indeed, assume, for instance, that $f=\mathrm{id}$. Then, given the knowledge of $w=\sigma(c)$, it would be very easy for a malicious 𝖯𝗋𝗈𝗏𝖾𝗋 to produce a word $w'\neq w$ such that $r'=R(w')$ is always accepted by the 𝖵𝖾𝗋𝗂𝖿𝗂𝖾𝗋. Simply, the 𝖯𝗋𝗈𝗏𝖾𝗋 defines $w'=w+c'$, where $c'$ is any non-zero codeword of $\mathcal{C}$. Hence one sees that the polynomial $f$ must be non-linear in order to prevent this kind of attack.
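The construction of this subsection can be sketched as follows. This is only an illustration under loudly stated assumptions: q is a prime p, f(x) = x^d with gcd(d, p-1) = 1, and, because the Python standard library ships no AES, the keystream block AES_κ(IV ⊕ i) is replaced by HMAC-SHA256 keyed with κ and applied to IV concatenated with the block index. The function names are hypothetical.

```python
import hashlib
import hmac

BLOCK_BITS = 256


def keystream_block(key: bytes, iv: bytes, index: int) -> int:
    # Stand-in for AES_kappa(IV xor i): HMAC-SHA256(kappa, IV || i). Not the paper's cipher.
    digest = hmac.new(key, iv + index.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big")


def scramble(c, p, d, key, iv):
    """Map a codeword c over F_p (p prime) to a list of 256-bit blocks b_i."""
    bits = (p - 1).bit_length()          # ceil(log2 p) bits per symbol
    s = BLOCK_BITS // bits               # symbols per 256-bit block
    if len(c) % s:
        raise ValueError("pad c so that s divides n")
    blocks = []
    for i in range(len(c) // s):
        packed = 0
        for symbol in c[i * s:(i + 1) * s]:
            packed = (packed << bits) | pow(symbol, d, p)   # f(x) = x^d
        packed <<= BLOCK_BITS - s * bits                    # trailing zeroes
        blocks.append(packed ^ keystream_block(key, iv, i))
    return blocks


def recover_symbol(blocks, j, p, d, key, iv):
    """Local inverse: recover c_j (0-indexed) from the block that contains it."""
    bits = (p - 1).bit_length()
    s = BLOCK_BITS // bits
    i, pos = divmod(j, s)
    packed = blocks[i] ^ keystream_block(key, iv, i)
    y = (packed >> (BLOCK_BITS - (pos + 1) * bits)) & ((1 << bits) - 1)
    return pow(y, pow(d, -1, p - 1), p)  # invert x -> x^d; needs Python >= 3.8


if __name__ == "__main__":
    p, d = 257, 3                        # gcd(3, 256) = 1, so x -> x^3 permutes F_257
    key, iv = b"k" * 32, b"v" * 32       # toy secrets, for illustration only
    c = [(7 * j + 3) % p for j in range(56)]
    w = scramble(c, p, d, key, iv)
    print(all(recover_symbol(w, j, p, d, key, iv) == c[j] for j in range(len(c))))
```

Only the key and the IV need to be kept secret by the client; the exponent d and the partition into blocks are public, in line with the design rationale above.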

4.2 Parameters

We here consider a 𝖯𝗈𝖱 built upon a code 𝒞⊆𝔽qn with verification structure (𝒬,V) satisfying ℛ=𝔽qℓ and V⁢(ℛ)=𝔽qs. We also assume that we use an n-tuple of pseudo-random permutations as described in the previous subsection.

Communication complexity

At each verification step, the client sends an ℓ-tuple of coordinates (u1,…,uℓ), ui∈[1,n]. The server then answers with corresponding symbols wui∈𝔽q. Therefore, the upload communication cost is ℓ⁢log2⁡n bits, while the download communication cost is ℓ⁢log2⁡q, thus a total of ℓ⁢(log2⁡n+log2⁡q) bits.

Computation complexity

In the initialisation phase, following the encryption described in Section 4.1, the client essentially has

to compute the codeword c∈𝒞 associated to its message,
to make n evaluations of the permutation polynomial f over 𝔽q,
to compute $t=\frac{n\log_2 q}{256}$ AES ciphertexts to produce the word $w$ to be sent to the server.

Given a generator matrix of $\mathcal{C}$, the codeword $c$ can be computed in $\mathcal{O}(kn)$ operations over $\mathbb{F}_q$ with a matrix-vector product. Notice that quasi-linear-time encoding algorithms exist for some classes of codes. Besides, if a monomial or a sparse permutation polynomial is used, then the cost of each evaluation is $\mathcal{O}((\log_2 q)^3)$. If we denote by $c$ the bitcost of an AES encryption, we get a total bitcost of $\mathcal{O}\bigl(nk(\log_2 q)^2+n(\log_2 q)^3+c\,n\log_2 q\bigr)$ for the initialisation phase. Recall this is a worst-case scenario in which the encoding process is inefficient.

At each verification step, an honest server only needs to read ℓ symbols from the file it stores. Hence its computation complexity is 𝒪⁢(ℓ). The client has to compute a matrix-vector product over 𝔽q, where the matrix has size s×ℓ and the vector has size ℓ, thus a computation cost of 𝒪⁢(ℓ⁢s) operations over 𝔽q.

Storage needs

The client stores 2×256 bits for secret material κ and IV to use in AES. The server storage overhead exactly corresponds to the redundancy of the linear code 𝒞, that is, (n-dim⁡𝒞)⁢log2⁡q bits.
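The communication and storage figures above are simple enough to tabulate directly. The following sketch, with the hypothetical helper `por_costs`, just restates the formulas of this subsection; the sample parameters are arbitrary illustrative values.

```python
import math


def por_costs(n, k, q, ell):
    """Per-audit communication and storage figures, in bits, as stated in Section 4.2."""
    log_q = math.log2(q)
    return {
        "upload_bits": ell * math.log2(n),        # ell coordinates in [1, n]
        "download_bits": ell * log_q,             # ell symbols of F_q
        "client_storage_bits": 2 * 256,           # key kappa and IV
        "server_overhead_bits": (n - k) * log_q,  # redundancy of the code
    }


if __name__ == "__main__":
    # Arbitrary example: a length-4096 code of dimension 1953 over F_64, queries of size 64.
    for name, value in por_costs(n=4096, k=1953, q=64, ell=64).items():
        print(name, value)
```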

Other features

Our 𝖯𝗈𝖱 scheme is unbounded-use since every challenge reveals nothing about the secret data held by the client. It does not feature dynamic updates of files. However, we must emphasise that the file $w$ the client produces can be split among several servers, and the verification step remains possible even if the servers do not communicate with each other. Indeed, computing a response to a challenge does not require mixing distinct symbols $w_i$ of the uploaded file. Therefore, our scheme is well suited for the storage of large static distributed databases. Parameters of the 𝖯𝗈𝖱 schemes we propose are reported in Figure 4.

Figure 4

Summary of parameters of our 𝖯𝗈𝖱 construction for an original file of size k⁢log2⁡q bits and a code 𝒞 of dimension k over 𝔽q equipped with a verification structure (𝒬,V) such that |u|=ℓ and rank⁡V⁢(u,⋅)≤s for all u∈𝒬.

5 Instantiations

In this section, we present several instantiations of our 𝖯𝗈𝖱 construction. We first recall basics and notation from coding theory.

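The code $\mathrm{Rep}(\ell)\subseteq\mathbb{F}_q^\ell$ denotes the repetition code $\langle(1,\dots,1)\rangle$. We recall that $\mathrm{Rep}(\ell)^\perp$ is the parity code $\mathrm{Par}(\ell):=\{c\in\mathbb{F}_q^\ell,\ \sum_{i=1}^{\ell}c_i=0\}$. Let $\mathcal{C},\mathcal{C}'$ be two linear codes over $\mathbb{F}_q$ of respective parameters $[n,k,d]$ and $[n',k',d']$. Their tensor product $\mathcal{C}\otimes\mathcal{C}'$ is the $\mathbb{F}_q$-linear code generated by the words

$$(c_ic'_j:\ 1\le i\le n,\ 1\le j\le n')\in\mathbb{F}_q^{nn'},\qquad c\in\mathcal{C},\ c'\in\mathcal{C}'.$$

It has dimension $kk'$ and minimum distance $dd'$. We also denote by

$$\mathcal{C}^{\otimes s}:=\underbrace{\mathcal{C}\otimes\dots\otimes\mathcal{C}}_{s\text{ times}}\subseteq\mathbb{F}_q^{n^s}$$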

the s-fold tensor product of 𝒞 with itself.
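The parameters of a tensor product can be checked by brute force on a toy example. The sketch below assumes a prime field (arithmetic modulo q) and uses the parity code Par(3) as an example of our choosing; a generator matrix of the tensor product is obtained as the Kronecker product of generator matrices. The helper names are hypothetical.

```python
from itertools import product


def kron(G1, G2, q):
    """Kronecker product of two generator matrices over F_q (q prime)."""
    return [[(a * b) % q for a in r1 for b in r2] for r1 in G1 for r2 in G2]


def span(G, q):
    """Set of all codewords generated by the rows of G over F_q."""
    k, n = len(G), len(G[0])
    return {
        tuple(sum(m[i] * G[i][j] for i in range(k)) % q for j in range(n))
        for m in product(range(q), repeat=k)
    }


def min_distance(G, q):
    """Minimum Hamming weight of a non-zero codeword (exhaustive search)."""
    return min(sum(x != 0 for x in c) for c in span(G, q) if any(c))


if __name__ == "__main__":
    par3 = [[1, 1, 0], [0, 1, 1]]        # binary parity code Par(3), parameters [3, 2, 2]
    tensor = kron(par3, par3, 2)         # generator matrix of Par(3) (x) Par(3)
    print(len(tensor[0]), len(span(tensor, 2)), min_distance(tensor, 2))
```

On this example the tensor square has length 9, 2^4 codewords and minimum distance 4, in agreement with the dimension kk′ and distance dd′ recalled above.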

5.1 Tensor-product codes

The upcoming subsection illustrates our construction with a simple but non-practical instance. The next ones lead to practical 𝖯𝗈𝖱 instances.

5.1.1 A simple but non-practical instance

Let n=N⁢ℓ and 𝒬={ui={iℓ+1,iℓ+2,…,(i+1)ℓ},i∈[0,N-1]}. The set 𝒬 defines a partition of [1,n]. We define the code

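$$\mathcal{C}=\Bigl\{c\in\mathbb{F}_q^n,\ \sum_{j\in u}c_j=0\ \text{for all }u\in\mathcal{Q}\Bigr\}\subseteq\mathbb{F}_q^n.$$

In other words, $\mathcal{C}=\mathrm{Par}(\ell)\otimes\mathbb{F}_q^N$, and a parity-check matrix $H$ for $\mathcal{C}$ is given by

$$H=\begin{pmatrix}
1\cdots1 & 0\cdots0 & \cdots & 0\cdots0\\
0\cdots0 & 1\cdots1 & \ddots & \vdots\\
\vdots & \ddots & \ddots & 0\cdots0\\
0\cdots0 & \cdots & 0\cdots0 & 1\cdots1
\end{pmatrix}.$$

The verification map $V:\mathcal{Q}\times\mathbb{F}_q^\ell\to\mathbb{F}_q$ is defined by $V(u,b):=\sum_{j=1}^{\ell}b_{u_j}$ for all $(u,b)\in\mathcal{Q}\times\mathbb{F}_q^\ell$. By construction (see the fundamental Example 3.2), the pair $(\mathcal{Q},V)$ defines a verification structure for $\mathcal{C}$.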

Lemma 5.1.

Let C=Par⁢(ℓ)⊗FqN as above. Then the response code R⁢(C) has minimum distance 1.

Proof.

We see that the restriction map R sends the codeword (1,-1,0,0,…,0)∈𝒞 to a word of weight 1. Besides, R is injective, so dmin⁢(R⁢(𝒞))>0. ∎

Since δ=dmin⁢(R⁢(𝒞))/N=1/N→0 when N goes to infinity, an attempt to build a 𝖯𝗈𝖱 scheme from 𝒞 cannot be practical.

5.1.2 Higher order tensor-product codes

Let 𝒜⊆𝔽qℓ be a non-degenerate [ℓ,k𝒜,d𝒜]q-linear code, and define 𝒞=𝒜⊗s⊆𝔽qn, where n=ℓs. Notice that it will be more convenient to see coordinates of words w∈𝔽qn as elements of [1,ℓ]s.

For 𝐚∈[1,ℓ]s and 1≤i≤s, we define Li,𝐚⊂[1,ℓ]s, the “i-th axis-parallel line with basis 𝐚”, as

$$L_{i,\mathbf{a}}:=\{\mathbf{x}\in[1,\ell]^s\ \text{such that}\ x_j=a_j\ \text{for all}\ j\neq i\}.$$

By definition of 𝒞, a word c lies in 𝒞 if and only if, for every L=Li,𝐚, the restriction c|L∈𝒜. This means that we can define

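a set of queries $\mathcal{Q}=\{L_{i,\mathbf{a}},\ i\in[1,s],\ \mathbf{a}\in[1,\ell]^s\}$,
a verification map $V:\mathcal{Q}\times\mathcal{R}\to\mathbb{F}_q^{\ell-k_{\mathcal{A}}}$, $(L,r)\mapsto Hr$, where $H$ is a parity-check matrix for $\mathcal{A}$ whose columns are ordered according to the line $L$.

By the previous discussion, it is clear that $c\in\mathcal{C}$ implies that $V(L,c|_L)=0$ for every $L\in\mathcal{Q}$ (in fact, these two assertions are equivalent). Hence $(\mathcal{Q},V)$ defines a verification structure for $\mathcal{C}$, and we have $N=|\mathcal{Q}|=s\ell^{s-1}$.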
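The query set of axis-parallel lines is easy to enumerate, which gives a direct check of the count N = sℓ^{s-1}. The sketch below is a hypothetical helper; each line is indexed by its direction i and by the s-1 coordinates it fixes, so that every line is listed once.

```python
from itertools import product


def axis_parallel_lines(ell, s):
    """List the lines L_{i,a} of [1, ell]^s as pairs (direction i, tuple of points)."""
    coords = range(1, ell + 1)
    lines = []
    for i in range(s):
        # The coordinates other than the i-th one are fixed along the line.
        for rest in product(coords, repeat=s - 1):
            points = tuple(rest[:i] + (x,) + rest[i:] for x in coords)
            lines.append((i, points))
    return lines


if __name__ == "__main__":
    lines = axis_parallel_lines(ell=4, s=3)
    print(len(lines))                                        # s * ell^(s-1)
    print(sum((1, 1, 1) in points for _, points in lines))   # lines through one point
```

Each point of the grid lies on exactly s lines, one per direction, so a single corrupted symbol affects s coordinates of the response word.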

Lemma 5.2.

Let $\mathcal{C}=\mathcal{A}^{\otimes s}$ as above. Then $R(\mathcal{C})$ has minimum distance $s\cdot d_{\mathcal{A}}^{\,s-1}$.

Proof.

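Let us first prove that the minimum distance of $R(\mathcal{C})$ is at least $s\cdot d_{\mathcal{A}}^{\,s-1}$. Let $r=R(c)\in R(\mathcal{C})$, and assume $r\neq 0$. Then there exists $L\in\mathcal{Q}$ such that $0\neq r_L=c|_L\in\mathcal{A}$. Therefore, $c_{\mathbf{x}}\neq 0$ for some $\mathbf{x}\in L\subset[1,\ell]^s$. Consider the set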

Si,𝐱={𝐲∈[1,ℓ]s,yi=xi}.

Very informally, the set Si,𝐱 corresponds to the hyperplane passing through 𝐱 and “orthogonal” to the i-th axis. By definition of 𝒞=𝒜⊗s, we know that c|Si,𝐱∈𝒜⊗(s-1)∖{0} for every 1≤i≤s. Let

Ui=supp⁡(c|Si,𝐱)={𝐮(i,1),…,𝐮(i,ti)}

with ti≥dmin⁢(𝒜⊗(s-1))=(d𝒜)s-1. Every 𝐮(i,j)∈Ui defines a line Li,𝐮(i,j) on which c|Li,𝐮(i,j) is a non-zero codeword of 𝒜. Equivalently, r is non-zero on index Li,𝐮(i,j)∈𝒬. Therefore,

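$$\operatorname{wt}(r)=|\{L\in\mathcal{Q},\ r_L\neq 0\}|\ \ge\ \Bigl|\bigcup_{i=1}^{s}\{L_{i,\mathbf{u}^{(i,j)}},\ 1\le j\le t_i\}\Bigr|\ \ge\ \sum_{i=1}^{s}t_i\ \ge\ s\,(d_{\mathcal{A}})^{s-1}.$$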

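Let us now build a word $r\in R(\mathcal{C})$ of weight $s\,(d_{\mathcal{A}})^{s-1}$. Let $w\in\mathcal{A}\setminus\{0\}$ be a minimum-weight codeword of $\mathcal{A}$, and define $W:=\operatorname{supp}(w)\subseteq[1,\ell]$. Define $c=w^{\otimes s}\in\mathcal{C}$; then $\operatorname{supp}(c)=W^s$. Let finally $r=R(c)$. We see that $r_{L_{i,\mathbf{x}}}\neq 0$ if and only if $\mathbf{x}\in W^s$. Hence we get

$$\operatorname{wt}(r)=|\{L\in\mathcal{Q},\ r_L\neq 0\}|=\Bigl|\bigcup_{i=1}^{s}\{L_{i,\mathbf{x}},\ \mathbf{x}\in W^s\}\Bigr|=s\cdot d_{\mathcal{A}}^{\,s-1}$$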

since each line Li,𝐱 is counted d𝒜 times when 𝐱 runs over Ws. ∎

Proposition 5.3.

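Let $\delta>0$, and let $\mathcal{A}$ be an $[\ell,\ell(1-\delta)+1,\ell\delta]_q$ MDS code. Define $\mathcal{C}=\mathcal{A}^{\otimes s}$ and $(\mathcal{Q},V)$ as above. If every $\Phi_w$ is sufficiently uniform, then the 𝖯𝗈𝖱 scheme associated to $\mathcal{C}$ and $(\mathcal{Q},V)$ is $(\varepsilon,\tau)$-sound for $\tau=\mathcal{O}\bigl(\frac{1}{(\delta\ell)^{s}s}\bigr)$ and every $\varepsilon<\varepsilon_0$, where $\varepsilon_0=(1+\mathcal{O}(q^{-\delta\ell+1}))\,\delta^{s}$ when $\ell\to\infty$.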

Proof.

First, the relative distance of $R(\mathcal{C})$ is $\delta^s$ according to Lemma 5.2. Then the random variables $\{X_u\}_{u\in\mathcal{D}}$ are pairwise uncorrelated because the inequality

$$\max_{u\neq v\in\mathcal{Q}^2}|u\cap v|=1<\ell(1-\delta)+2=\min_{u\in\mathcal{Q}}d_{\min}\bigl((\mathcal{C}|_u)^\perp\bigr)$$

allows us to apply Proposition 3.18. Besides, if every $\Phi_w$ is sufficiently uniform, then the bias $\alpha$ satisfies $\alpha=\mathcal{O}(q^{-\delta\ell+1})$ and hence $\frac{1-\alpha}{1+\alpha}=1+\mathcal{O}(q^{-\delta\ell+1})$. Therefore, we can use Theorem 3.13, and we get the desired result. ∎

Parameters

We mainly focus on the download communication complexity in the verification step and on the server storage overhead since these are the most crucial parameters which depend on the family of codes 𝒞 we use. Besides, we consider that it is more relevant to analyse the ratio between these quantities and the file size than their absolute values.

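Here, for an initial file of size $|F|=((1-\delta)q+1)^s\log_2 q$ bits, we get

a redundancy rate
$$\frac{n\log_2 q}{|F|}=\Bigl(\frac{q}{(1-\delta)q+1}\Bigr)^{s}\le\frac{1}{(1-\delta)^{s}},$$
a communication complexity rate
$$\frac{\ell\log_2 q}{|F|}=\frac{q}{((1-\delta)q+1)^{s}}\le\frac{q^{1-s}}{(1-\delta)^{s}}.$$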

Example 5.4.

In Table 3, we present various parameters of 𝖯𝗈𝖱 instances admitting $0.10\le\varepsilon_0\le 0.16$, for files of size approaching $10^4$, $10^6$ and $10^9$ bits. Here $\mathcal{A}$ is a $[q,(1-\delta)q+1,\delta q]_q$ MDS code (e.g., a Reed–Solomon code), and $\mathcal{C}=\mathcal{A}^{\otimes s}$.

Table 3

Parameters of 𝖯𝗈𝖱 instances admitting 0.10≤ε0≤0.16.

q	δq	s	File size (bits)	Comm. rate	Redundancy rate	ε0
16	10	4	9,604	6.664×10^-3	27.3	0.153
25	13	3	10,985	1.138×10^-2	7.112	0.141
64	24	2	10,086	3.807×10^-2	2.437	0.141
32	21	5	1,244,160	1.286×10^-4	134.8	0.122
47	28	4	960,000	2.938×10^-4	30.5	0.126
101	47	3	1,164,625	6.071×10^-4	6.193	0.101
512	180	2	998,001	4.617×10^-3	2.364	0.124
128	85	5	1,154,413,568	7.762×10^-7	208.3	0.129
256	150	4	1,048,636,808	1.953×10^-6	32.77	0.118
1,024	550	3	1,071,718,750	9.555×10^-6	10.02	0.155
12,167	3,900	2	957,037,536	1.78×10^-4	2.166	0.103
16,384	5,500	2	1,658,765,150	1.383×10^-4	2.266	0.113

The previous example shows that, while the communication rate is reasonable for these 𝖯𝗈𝖱 instances over large files, the storage needs remain large.
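The rows of Table 3 follow from the formulas of the Parameters paragraph. The sketch below recomputes them; the helper name is hypothetical, and two points are assumptions inferred from the table rather than stated in the text: symbols are counted with ⌈log₂ q⌉ bits (which matches the row q = 25), and ε0 is approximated by its leading term δ^s.

```python
def tensor_por_parameters(q, dq, s):
    """Parameters of the PoR built on the s-fold tensor power of a [q, q - dq + 1, dq] MDS code.

    Returns (file size in bits, communication rate, redundancy rate, leading term of eps0).
    """
    bits = (q - 1).bit_length()      # ceil(log2 q): assumed symbol size in bits
    k = q - dq + 1                   # dimension (1 - delta) q + 1 of the MDS code
    file_bits = k ** s * bits
    comm_rate = q * bits / file_bits
    redundancy = (q / k) ** s
    eps0 = (dq / q) ** s             # delta^s, ignoring the (1 + O(.)) factor
    return file_bits, comm_rate, redundancy, eps0


if __name__ == "__main__":
    for q, dq, s in [(16, 10, 4), (25, 13, 3), (64, 24, 2), (101, 47, 3)]:
        print(q, dq, s, tensor_por_parameters(q, dq, s))
```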

5.2 Reed–Muller and related codes

Low-degree Reed–Muller codes are known to admit many distinct low-weight parity-check equations, whose supports correspond to affine subspaces of the ambient space. Therefore, they seem naturally adapted to our construction. Let us first consider the plane (or bivariate) Reed–Muller code case.

5.2.1 The plane Reed–Muller code RMq⁢(2,q-2)

Let 𝒞 be the Reed–Muller code

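$$\mathcal{C}=\mathrm{RM}_q(2,q-2):=\{(f(x,y))_{(x,y)\in\mathbb{F}_q^2},\ f\in\mathbb{F}_q[X,Y],\ \deg f\le q-2\}.$$

It is well known that $\mathcal{C}$ has length $q^2$ and dimension $(q-1)(q-2)/2$. Besides, for every line

$$L=\{\mathbf{x}=(at+b,ct+d),\ t\in\mathbb{F}_q\}\subset\mathbb{F}_q^2$$

and every $c\in\mathcal{C}$, we can check that $\sum_{\mathbf{x}\in L}c_{\mathbf{x}}=0$. Indeed, let $f\in\mathbb{F}_q[X,Y]$ with $\deg f\le q-2$. The restriction of $f$ to an affine line $L$ can be interpolated as a univariate polynomial $f|_L$ of degree at most $\deg f$. Our claim follows since $\sum_{z\in\mathbb{F}_q}z^i=0$ for every $0\le i\le q-2$.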

Therefore, we can define 𝒬 as the set of affine lines L of 𝔽q2 and V⁢(L,r)=∑j=1ℓrj∈𝔽q. From the previous discussion, we see that (𝒬,V) is a verification structure for 𝒞. Also notice there are q⁢(q+1) distinct affine lines in 𝔽q2; hence N=q⁢(q+1).
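The line-sum property behind this verification structure can be tested numerically. The sketch below restricts itself to a prime field 𝔽_p (an assumption made so that integers modulo p suffice), enumerates the p(p+1) affine lines of the plane, and sums the evaluations of a random polynomial of degree at most p-2 along each line. The helper names are hypothetical.

```python
import random


def affine_lines(p):
    """All p(p+1) affine lines of F_p^2 (p prime), each given as a list of points."""
    lines = [[(t, (m * t + b) % p) for t in range(p)] for m in range(p) for b in range(p)]
    lines += [[(a, t) for t in range(p)] for a in range(p)]   # vertical lines X = a
    return lines


def random_poly(p, max_deg, rng):
    """Random bivariate polynomial of total degree <= max_deg, as {(i, j): coefficient}."""
    return {(i, j): rng.randrange(p) for i in range(max_deg + 1) for j in range(max_deg + 1 - i)}


def evaluate(poly, x, y, p):
    return sum(c * pow(x, i, p) * pow(y, j, p) for (i, j), c in poly.items()) % p


def line_sums(poly, p):
    """Value of V(L, c|_L) = sum of the evaluations along L, for every affine line L."""
    return [sum(evaluate(poly, x, y, p) for x, y in line) % p for line in affine_lines(p)]


if __name__ == "__main__":
    p = 7
    f = random_poly(p, p - 2, random.Random(0))
    print(len(affine_lines(p)), all(v == 0 for v in line_sums(f, p)))
    # A monomial of degree p - 1 is outside the code and fails the check on some line.
    print(any(v != 0 for v in line_sums({(p - 1, 0): 1}, p)))
```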

Lemma 5.5.

Let C=RMq⁢(2,q-2), equipped with its verification structure defined as above. Then the response code R⁢(C) has minimum distance q2+2.

Proof.

Any non-zero codeword $c\in\mathcal{C}$ consists of the evaluation of a non-zero polynomial $f(X,Y)\in\mathbb{F}_q[X,Y]$ of degree at most $q-2$. Denote by $L_1,\dots,L_a\subset\mathbb{F}_q^2$ the affine lines on which $f$ vanishes, i.e., $f(P)=0$ for every $P\in L_i$, $1\le i\le a$. We claim that $a\le q-2$. Indeed, since $f$ has total degree less than $q-1$, it also vanishes on the closed lines $\overline{L_1},\dots,\overline{L_a}$, considered as affine lines in $\overline{\mathbb{F}_q}^2$, where $\overline{\mathbb{F}_q}$ denotes the algebraic closure of $\mathbb{F}_q$. Denote by $g_i\in\mathbb{F}_q[X,Y]$ the monic polynomial of degree 1 which defines $\overline{L_i}$. From Hilbert's Nullstellensatz, there exists $r>0$ such that $(\prod_{i=1}^{a}g_i)\mid f^r$. Since the $g_i$'s have degree 1 and are distinct, we get $a\le\deg f\le q-2$. Hence the affine lines different from $L_1,\dots,L_a$ correspond to non-zero coordinates of $R(c)$. There are $q(q+1)-a\ge q^2+2$ such lines, so $d_{\min}(R(\mathcal{C}))\ge q^2+2$.

Now we claim there exists a word $r\in R(\mathcal{C})$ of weight $N-q+2=q^2+2$. Let $L^{(0)}$ and $L^{(1)}$ be two distinct parallel affine lines, respectively defined by $X=0$ and $X=1$. We build the word $c$ which is $-1$ on coordinates corresponding to points in $L^{(0)}$, $1$ on those corresponding to points in $L^{(1)}$ and $0$ elsewhere. One can check that $c\in\mathcal{C}$; indeed, $c$ corresponds to the evaluation of $\prod_{z\in\mathbb{F}_q\setminus\{0,1\}}(z-X)$. Now, if we want to compute $\operatorname{wt}(R(c))$, we only need to count the number of lines which intersect neither $L^{(0)}$ nor $L^{(1)}$. Clearly, there are only $q-2$ such lines. Hence $\operatorname{wt}(R(c))=q(q+1)-(q-2)$, and this concludes the proof. ∎
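The extremal word used in this proof can be examined on small prime fields. The sketch below assumes q = p prime and counts the affine lines on which a given polynomial function is not identically zero, i.e. the weight of the corresponding response word; the function names are hypothetical.

```python
from itertools import product


def affine_lines(p):
    """All p(p+1) affine lines of F_p^2 (p prime), each given as a list of points."""
    lines = [[(t, (m * t + b) % p) for t in range(p)] for m in range(p) for b in range(p)]
    lines += [[(a, t) for t in range(p)] for a in range(p)]
    return lines


def response_weight(f, p):
    """Number of affine lines on which the function f is not identically zero."""
    return sum(any(f(x, y) % p for x, y in line) for line in affine_lines(p))


def extremal_polynomial(p):
    """The polynomial prod_{z not in {0, 1}} (z - X) from the proof of Lemma 5.5."""
    def f(x, y):
        value = 1
        for z in range(2, p):
            value = value * (z - x) % p
        return value
    return f


if __name__ == "__main__":
    for p in (3, 5, 7):
        print(p, response_weight(extremal_polynomial(p), p), p * p + 2)
    # Exhaustive check of the minimum distance for p = 3 (polynomials of degree <= 1).
    weights = [
        response_weight(lambda x, y, a=a, b=b, c=c: a + b * x + c * y, 3)
        for a, b, c in product(range(3), repeat=3) if (a, b, c) != (0, 0, 0)
    ]
    print(min(weights))
```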

Proposition 5.6.

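Let $\mathcal{C}=\mathrm{RM}(2,q-2)$, and let $(\mathcal{Q},V)$ be its associated verification structure. If every $\Phi_w$ is sufficiently uniform, then the 𝖯𝗈𝖱 scheme associated to $\mathcal{C}$ and $(\mathcal{Q},V)$ is $(\varepsilon,\tau)$-sound for $\varepsilon=1-o(1)$ and $\tau=\mathcal{O}\bigl(\frac{1}{(1-\varepsilon)q^{2}}\bigr)$, when $q\to\infty$.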

Proof.

One can check that the random variables $\{X_u\}_{u\in\mathcal{D}}$ are pairwise uncorrelated since

$$\max_{u\neq v\in\mathcal{Q}^2}|u\cap v|=1<\ell(1-\delta)+2=\min_{u\in\mathcal{Q}}d_{\min}\bigl((\mathcal{C}|_u)^\perp\bigr).$$

Besides, the relative distance of $R(\mathcal{C})$ is $\frac{q^2+2}{q(q+1)}\to 1$ according to Lemma 5.5. If every $\Phi_w$ is sufficiently uniform, the bias $\alpha$ satisfies $\alpha\in\mathcal{O}(1/q)$ and hence $\frac{1-\alpha}{1+\alpha}=1+\mathcal{O}(1/q)$. Therefore, we can use Theorem 3.13, and we get the desired result. ∎

Parameters

For an initial file of size $|F|=\frac{1}{2}(q-1)(q-2)\log_2 q$ bits, we get

a redundancy rate
$$\frac{q^2\log_2 q}{|F|}=\frac{2}{(1-1/q)(1-2/q)}\to 2,$$
a communication complexity rate
$$\frac{q\log_2 q}{|F|}=\frac{2}{q}\cdot\frac{1}{(1-1/q)(1-2/q)}=\mathcal{O}(1/q).$$

5.2.2 Storage improvements via lifted codes

The redundancy rate of Reed–Muller codes presented above stays stuck above 2. Affine lifted codes, introduced by Guo, Kopparty and Sudan [5], make it possible to break this barrier while keeping the same verification structure. Generically, they are defined as follows:

$$\mathrm{Lift}(m,d):=\bigl\{(f(\mathbf{P}))_{\mathbf{P}\in\mathbb{F}_q^m},\ f\in\mathbb{F}_q[X_1,\dots,X_m]\ \text{such that, for every affine line }L\subset\mathbb{F}_q^m,\ (f(\mathbf{Q}))_{\mathbf{Q}\in L}\in\mathrm{RS}_q(d+1)\bigr\}.$$

We refer to [5] for more details about the construction. Here we focus on Lift⁢(2,q-2) since it can be compared to RM⁢(2,q-2). Indeed, one sees that

(5.1)RM⁢(2,q-2)⊆Lift⁢(2,q-2),

and equation (5.1) turns into a proper inclusion as long as q is not a prime. Besides, by definition of lifted codes, Lift⁢(2,q-2) admits the same verification structure as the one presented previously for RM⁢(2,q-2).

Lemma 5.7.

The response code of $\mathrm{Lift}(2,q-2)$ has minimum distance at least $q^2-q+2$.

Proof.

The rationale is similar to the proof of Lemma 5.5. Let $0\neq c\in\mathcal{C}$, $c=(f(\mathbf{P}))_{\mathbf{P}\in\mathbb{F}_q^2}$, $f\in\mathbb{F}_q[X,Y]$, and denote by $L_1,\dots,L_a\subset\mathbb{F}_q^2$ the lines on which $f$ vanishes. The restriction of $f$ along $L_i$ can be interpolated as a univariate polynomial $f|_{L_i}(T)$ of degree at most $q-2$ since $(f(\mathbf{Q}))_{\mathbf{Q}\in L_i}$ lies in the Reed–Solomon code $\mathrm{RS}_q(q-1)$ by definition of lifted codes. Therefore, $f|_{L_i}(T)=0$, and $f$ vanishes on $\overline{L_i}$. Repeating arguments in the proof of Lemma 5.5, we get $a\le\deg f\le 2q-2$ and $d_{\min}(R(\mathrm{Lift}(2,q-2)))\ge q^2+q-2q+2=q^2-q+2$. ∎

We believe the bound given in Lemma 5.7 is not tight, but it is sufficient to have dmin⁢(ℛ⁢(Lift⁢(2,q-2)))/N→1. Similarly to Proposition 5.6, we can then prove that practical 𝖯𝗈𝖱s can be constructed with the family of lifted codes Lift⁢(2,q-2).

Proposition 5.8.

Let $\mathcal{C}=\mathrm{Lift}(2,q-2)$, and let $(\mathcal{Q},V)$ be its associated verification structure. If every $\Phi_w$ is sufficiently uniform, then the 𝖯𝗈𝖱 scheme associated to $\mathcal{C}$ and $(\mathcal{Q},V)$ is $(\varepsilon,\tau)$-sound for every $\varepsilon<1$ and $\tau=\mathcal{O}\bigl(\frac{1}{(1-\varepsilon)q^{2}}\bigr)$, when $q\to\infty$.

The crucial improvement is that lifted codes potentially have much higher dimension than Reed–Muller codes. For $q=2^e$, the dimension of $\mathrm{Lift}(2,q-2)$ can be proved to equal $4^e-3^e$ [5].

Example 5.9.

In Table 4, we present parameters of 𝖯𝗈𝖱s based on Reed–Muller codes and lifted codes, using files of size approaching $10^4$, $10^6$ and $10^9$ bits.

Table 4

Parameters of 𝖯𝗈𝖱s based on Reed–Muller codes and lifted codes.

Code	q	File size (bits)	Comm. rate	Redundancy rate
Lift	32	3,905	4.097×10^-2	1.311
RM	64	11,718	3.277×10^-2	2.097
Lift	64	20,202	1.901×10^-2	1.217
Lift	256	471,800	4.341×10^-3	1.111
RM	512	1,172,745	3.929×10^-3	2.012
Lift	512	2,182,149	2.112×10^-3	1.081
Lift	8,192	851,689,033	1.25×10^-4	1.024
RM	16,384	1,878,704,142	1.221×10^-4	2.000
Lift	16,384	3,691,134,818	6.214×10^-5	1.018

Note that this family of codes has been used in the 𝖯𝗈𝖱 proposal of [7].
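The entries of Table 4 can be recomputed from the dimensions stated in Sections 5.2.1 and 5.2.2. The sketch below is only a bookkeeping aid with hypothetical function names; it assumes q = 2^e, takes the dimension (q-1)(q-2)/2 for the plane Reed–Muller code and 4^e - 3^e for the lifted code as given in the text, and uses queries of size ℓ = q.

```python
def rm_dimension(q):
    """Dimension of RM_q(2, q - 2) as stated in Section 5.2.1."""
    return (q - 1) * (q - 2) // 2


def lift_dimension(e):
    """Dimension of Lift(2, q - 2) for q = 2^e, as quoted from [5]."""
    return 4 ** e - 3 ** e


def plane_code_parameters(q, dim):
    """Return (file size in bits, communication rate, redundancy rate) for a length-q^2 code."""
    bits = (q - 1).bit_length()      # log2 q when q is a power of two
    file_bits = dim * bits
    return file_bits, q * bits / file_bits, q * q / dim


if __name__ == "__main__":
    for e in (5, 6, 8, 9):
        q = 2 ** e
        print("RM  ", q, plane_code_parameters(q, rm_dimension(q)))
        print("Lift", q, plane_code_parameters(q, lift_dimension(e)))
```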

5.2.3 On more generic families of codes

We have presented two rather small families of codes producing practical instances of 𝖯𝗈𝖱. Let us give a short summary of approximate lower bounds on crucial 𝖯𝗈𝖱 parameters that have been shown in previous sections in Table 5.

Table 5

Approximate lower bounds on crucial 𝖯𝗈𝖱 parameters.

Family of codes over 𝔽q	Redundancy rate	Communication complexity rate
s-fold tensor product (Section 5.1.2)	(1-δ)^(-s)	q^(-(s-1)) (1-δ)^(-s)
Plane RM (Section 5.2.1)	2	2 q^(-1)
Plane lifted code (Section 5.2.2)	1 + q^(log2(3)-2)	q^(-1) + q^(log2(3)-3)

Now we quickly mention other families of codes that could be interesting to consider.

Multi-variate generalisation

We have only presented Reed–Muller and lifted codes embedded into the affine plane $\mathbb{F}_q^2$. One could of course consider a broader ambient space $\mathbb{F}_q^m$, $m>2$. Lines would have smaller relative weight compared to the ambient space, and thus we would decrease the communication complexity of our 𝖯𝗈𝖱 schemes. We must however pay attention to the storage overhead, which can drastically increase if $m$ gets large: for instance, any Reed–Muller code $\mathrm{RM}_q(m,q-2)$ has rate $\le 1/m!$.

Lower degree generalisation

In order to increase the soundness of our 𝖯𝗈𝖱 schemes, one could consider Reed–Muller codes RMq⁢(2,d) (as well as related lifted codes) with a lower degree d<q-2. The communication complexity remains unchanged; however, we could observe overwhelming storage overhead if d is too small.

Combinatorial generalisation

Codes Lift⁢(2,q-2) can be viewed as codes from designs (see [1] for more details), where the underlying block design is the classical affine plane. Considering designs with smaller block size would lead to 𝖯𝗈𝖱s with smaller communication complexity. But once again, this could be expensive in terms of storage since only a few designs produce high-dimensional codes.

6 Conclusion

We have proposed a security model for 𝖯𝗈𝖱s in line with previous work, together with a generic code-based framework. We have then sharply quantified the extraction failure of our 𝖯𝗈𝖱 construction as a function of code parameters. Specialising this construction for particular families of codes, we provided instances with practical parameters. We hope our work will be an incentive for further proposals of code instances, aiming at better 𝖯𝗈𝖱 parameters.

Communicated by Doug Stinson

Funding source: Agence Nationale de la Recherche

Award Identifier / Grant number: 15-CE39-0013-01

Funding statement: This work is partially funded by French ANR-15-CE39-0013-01 “Manta”.

A Experimental estimate of the bias α

We here confirm our heuristic on the fact that Φw is sufficiently uniform, by providing experimental estimates of α.

Setup

We consider 𝖯𝗈𝖱 schemes using Reed–Muller codes 𝒞=RMq⁢(2,q-2), as presented in Section 5.2.1. We also fix the word w∈𝔽qn uploaded on the server during the initialisation step. Remark that, for varying w, all Φw are equivalently distributed. Indeed, if ψ∈𝔖⁢(𝔽q)n satisfies ψ⁢(w)=w′, then the distribution of permutations picked from Φw′ can be obtained by applying ψ to permutations picked from Φw. Hence, without loss of generality, we assume w=0. Proposition 3.16 claims that, in this context, α should be 𝒪⁢(1/q) since Δ=2 and ℓ≤q. For convenience, we write pΦ:=ℙΦw(Vσ(u,ru)=0), and we recall that α is an upper bound on pΦ (for varying u and r).

We carry out three kinds of tests in order to estimate α:

Test 1. We sample N challenges u, and, for each sample, we fix t≤ℓ and ru in {x∈𝔽qℓ,|Zu|=t}. Then we estimate pΦ by running M trials and computing the average number of times Vσ⁢(u,ru)=0 occurs. We denote by ξM⁢(pΦ) this estimator. We then collect the maximum value of ξM⁢(pΦ) among the N samples of u.
Test 2. A challenge u is fixed. For several values of t, we pick N responses ru randomly in {x∈𝔽qℓ,|Zu|=t}. For every ru, we estimate pΦ with M samples. We collect the maximum value of ξM⁢(pΦ) among the N values of ru that have been picked.
Test 3. A challenge u is fixed, as well as a response ru to this challenge, which satisfies |Zu|=t for several values of t∈[2,ℓ]. We then run M trials and collect ξM⁢(pΦ).
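To make the flavour of Test 3 concrete, here is a simplified Monte Carlo sketch. It is not the experimental code behind the figures and tables of this appendix, and it rests on explicit modelling assumptions: q is prime, w = 0, the check V is the line sum of Section 5.2.1 with ℓ = q, the restriction c|_u is uniform in Par(q), and each σ_i is a uniform permutation constrained by σ_i(c_i) = w_i, so that σ_i⁻¹ of a tampered symbol is uniform among the symbols different from c_i.

```python
import random


def estimate_acceptance(q, t, trials, rng):
    """Estimate Pr[V_sigma(u, r_u) = 0] when exactly t symbols of the response are tampered.

    Simplified model (assumption): q prime, w = 0, line-sum check on a line of q points.
    """
    hits = 0
    for _ in range(trials):
        # Restriction of a random codeword to the line: uniform vector with zero sum.
        c = [rng.randrange(q) for _ in range(q - 1)]
        c.append(-sum(c) % q)
        total = 0
        for i in range(q):
            if i < t:
                # Tampered position: sigma_i^{-1}(r_i) is uniform among symbols != c_i.
                total += rng.choice([x for x in range(q) if x != c[i]])
            else:
                # Authentic position: sigma_i^{-1}(w_i) = c_i.
                total += c[i]
        hits += (total % q == 0)
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(1)
    for q in (7, 17, 31):
        print(q, estimate_acceptance(q, t=2, trials=20000, rng=rng), 1 / (q - 1))
```

In this model a single tampered symbol is always detected, and for t = 2 the acceptance probability equals 1/(q-1), which is the reference value used in Tables 6 and 7.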

Figure 5

Estimators for various values of M∈[103,106], of q∈{8,64} and of test i, i∈{1,2,3}. Support size t=2 is fixed. For tests 1 and 2, the parameter N is set to 10. Black horizontal lines represent the expected value of α.

Figure 6

Estimators for various values of M∈[103,106], of q∈{8,64}, and of test i, i∈{1,2,3}. Support size t=3 is fixed. For tests 1 and 2, the parameter N is set to 10. Black horizontal lines represent the expected value of α.

Figure 7

Estimators for various values of M∈[103,106], of q∈{8,64} and of test i, i∈{1,2,3}. Support size t=ℓ is fixed. For tests 1 and 2, the parameter N is set to 10. Black horizontal lines represent the expected value of α.

Influence of M and the chosen test on the estimator

At the end of the document, Figures 5, 6 and 7 confirm that, for fixed N and q and for any test i we use, i∈{1,2,3}, our estimator ξM⁢(pΦ) converges to a value close to 1/(q-1).

Influence of N on the estimator

Table 6 shows experimentally that, for M large enough and fixed q, the number N has little influence on the estimator (N being the number of challenges u sampled in test 1 and the number of responses ru sampled in test 2). The minor increase of the values can be attributed to statistical fluctuation, since the number of samples M=100,000 is finite.

Table 6

Estimators using tests 1 and 2 with M=100,000 and t=2 for q∈{8,64} and various values of N. The quantity 1/(q-1) represents an estimated upper bound on α that ξM⁢(pΦ) should approximate.

	Test 1		Test 2
N	q=8	q=64	q=8	q=64
1	0.1418	0.0152	0.1414	0.0158
5	0.1433	0.0163	0.1431	0.0162
10	0.1443	0.0165	0.1452	0.0166
50	0.1455	0.0169	0.1450	0.0168
100	0.1452	0.0167	0.1458	0.0168
500	0.1464	0.0169	0.1470	0.0168
1/(q-1)=	0.1429	0.01587	0.1429	0.01587

Influence of q on the estimator

In Table 7, we show that the estimator ξM⁢(pΦ) converges to an expected value 1/(q-1) for any value of q.

Table 7

Estimators using test 3 with M=1,000,000 and t=2 for various values of prime powers q. The quantity 1/(q-1) represents an estimated upper bound on α that ξM⁢(pΦ) should approximate.

q	ξM⁢(pΦ)	1/(q-1)
4	0.333	0.3333
7	0.166	0.1667
8	0.143	0.1429
16	0.0665	0.06667
17	0.0627	0.0625
31	0.0335	0.03333
32	0.032	0.03226
64	0.0161	0.01587
128	0.00791	0.007874
256	0.00382	0.003922
257	0.00398	0.003906

Acknowledgements

The authors would like to thank Daniel Augot who shared fruitful discussions on the definition of proofs-of-retrievability, as well as Alain Couvreur for his suggestion leading to the proof of Lemma 5.7.

References

[1] E. F. Assmus and J. D. Key, Designs and Their Codes, Cambridge Tracts in Math., Cambridge University, Cambridge, 1992. 10.1017/CBO9781316529836Search in Google Scholar

[2] G. Ateniese, R. C. Burns, R. Curtmola, J. Herring, O. Khan, L. Kissner, Z. N. J. Peterson and D. Song, Remote data checking using provable data possession., ACM Trans. Inf. Syst. Secur. 14 (2011), no. 1, Article ID 12. 10.1145/1952982.1952994Search in Google Scholar

[3] K. D. Bowers, A. Juels and A. Oprea, Proofs of Retrievability: Theory and Implementation, Proceedings of the First ACM Cloud Computing Security Workshop—CCSW 2009, ACM, New York (2009), 43–54. 10.1145/1655008.1655015Search in Google Scholar

[4] Y. Dodis, S. P. Vadhan and D. Wichs, Proofs of retrievability via hardness amplification, Theory of Cryptography—TCC 2009, Lecture Notes in Comput. Sci. 5444, Springer, Berlin (2009), 109–127. 10.1007/978-3-642-00457-5_8Search in Google Scholar

[5] A. Guo, S. Kopparty and M. Sudan, New affine-invariant codes from lifting, Innovations in Theoretical Computer Science—ITCS ’13, ACM, New York (2013), 529–540. 10.1145/2422436.2422494Search in Google Scholar

[6] A. Juels and B. S. Kaliski, Jr., PoRs: Proofs of retrievability for large files, Proceedings of the 2007 ACM Conference on Computer and Communications Security—CCS 2007, ACM, New York (2007), 584–597. doi:10.1145/1315245.1315317

[7] J. Lavauzelle and F. Levy-dit-Vehel, New proofs of retrievability using locally decodable codes, IEEE International Symposium on Information Theory—ISIT 2016, IEEE Press, Piscataway (2016), 1809–1813. doi:10.1109/ISIT.2016.7541611

[8] M. Lillibridge, S. Elnikety, A. Birrell, M. Burrows and M. Isard, A cooperative internet backup scheme, Proceedings of the 2003 Usenix Annual Technical Conference, USENIX, Berkeley (2003), 29–41.

[9] Z. Mo, Y. Zhou and S. Chen, A dynamic proof of retrievability (PoR) scheme with O(log n) complexity, Proceedings of IEEE International Conference on Communications—ICC 2012, IEEE Press, Piscataway (2012), 912–916. doi:10.1109/ICC.2012.6364056

[10] M. Naor and G. N. Rothblum, The complexity of online memory checking, J. ACM 56 (2009), no. 1, Article ID 2. doi:10.1109/SFCS.2005.71

[11] M. B. Paterson, D. R. Stinson and J. Upadhyay, A coding theory foundation for the analysis of general unconditionally secure proof-of-retrievability schemes for cloud storage, J. Math. Cryptol. 7 (2013), no. 3, 183–216. doi:10.1515/jmc-2013-5002

[12] M. B. Paterson, D. R. Stinson and J. Upadhyay, Multi-prover proof of retrievability, J. Math. Cryptol. 12 (2018), no. 4, 203–220. doi:10.1515/jmc-2018-0012

[13] B. Sengupta and S. Ruj, Efficient proofs of retrievability with public verifiability for dynamic cloud storage, preprint (2016), https://arxiv.org/abs/1611.03982. doi:10.1109/TCC.2017.2767584

[14] H. Shacham and B. Waters, Compact proofs of retrievability, J. Cryptology 26 (2013), no. 3, 442–483. doi:10.1007/s00145-012-9129-2

[15] Q. Wang, C. Wang, K. Ren, W. Lou and J. Li, Enabling public auditability and data dynamics for storage security in cloud computing, IEEE Trans. Parallel Distrib. Syst. 22 (2011), no. 5, 847–859. doi:10.1109/TPDS.2010.183

Received: 2018-04-27

Revised: 2018-09-04

Accepted: 2019-01-25

Published Online: 2019-02-19

Published in Print: 2019-06-01

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.