1. Introduction

With the rapid progress of wireless communication and sensor technologies, the Internet of Things (IoT) is changing our lives. From medical devices and air quality monitoring to intelligent street lights, energy-efficient buildings, and smart homes, the IoT is in more places than ever. As the number of connected IoT devices increases, the amount of data generated by these devices will also increase exponentially. According to a recent forecast from the International Data Corporation (IDC) [1], there will be 41.6 billion connected IoT devices in 2025, and they will generate 79.4 ZB of data. For IoT devices with limited computation and storage resources, properly using and storing such vast amounts of data is an important challenge. Cloud computing makes it possible to process and store massive amounts of data, and it also allows resource-limited data users to easily access the data stored in the cloud at any time and from anywhere.

As a cryptographic tool for realizing secure multiparty computation, private set intersection (PSI) allows two parties holding sets to compute their intersection without revealing any information other than the intersection. Since PSI was proposed by Freedman et al. in [2], a large number of PSI schemes [3–7] have been put forward. The strong privacy protection of PSI gives it important real-life applications, such as private contact tracing [8], DNA testing and pattern matching [9], remote medical diagnostics [10], and the effectiveness assessment of online advertising [11]. Over the last few years, PSI has been developed further and has become very practical, with extremely fast implementations that can process millions of items in seconds. However, most PSI schemes require the two parties possessing the datasets to jointly calculate the intersection locally. Driven by the commercial value of cloud computing services, a user might instead delegate the PSI computation over the outsourced datasets to a cloud service provider.

To our knowledge, in most of the existing PSI protocols, both participants jointly compute the intersection of their sets in an interactive manner, which requires each participant to hold a local copy of its dataset. This places a heavy burden on resource-limited users. The advent of cloud computing makes the delegation of PSI computation promising, since cloud servers can provide flexible and cost-effective storage space and on-demand computing power. Recently, several cloud-based PSI protocols [6, 12, 13] have been proposed. In these schemes, to achieve PSI computation, the user needs to outsource its dataset to a cloud server. However, the cloud server is not fully trusted, and it might reveal or tamper with the items in the outsourced dataset. To ensure the privacy of the outsourced dataset, its items should be processed with cryptographic algorithms before outsourcing. However, the complicated cryptographic operations not only impose a heavy computation burden on resource-constrained data users but also impede access to the data owner's dataset. To realize access control, data owners must be online in real time to execute PSI computations with the authorized data user.

Recently, to remove the need for the data owner to be online in real time and to achieve fine-grained access control, Ali et al. proposed an attribute-based private set intersection computation protocol [14]. However, their scheme cannot ensure the integrity of the blinded dataset returned from the cloud: since the cloud is not fully trusted, it may return only part of the blinded dataset or delete some items of the outsourced dataset. Additionally, in their scheme, the access policy needs to be embedded in the blinded dataset. This direct exposure of the access policy can leak private information about data users, since the access policy often contains sensitive information.

To solve the above issues and provide fine-grained access control, in this paper, we propose an efficient cloud-based private set intersection (PSI) computation protocol. The main contributions of this work are summarized as follows:
(i) Fine-grained access control of PSI computation: the protocol provides fine-grained access control for PSI computation in the cloud environment, and access control over the data owners' outsourced datasets is realized by applying attribute-based encryption.
(ii) Offline data owner: in the PSI-computation phase, the data owner does not need to be online in real time, which reduces the communication and computation burdens of the data owner.
(iii) Resisting collusion attacks: even when colluding, the cloud server and unauthorized data users cannot obtain any information about the outsourced datasets.
(iv) Data secrecy: for a data owner with dataset $X$, an authorized data user with dataset $Y$ only learns the intersection $X \cap Y$; the data user obtains no other information about dataset $X$ except $X \cap Y$.
(v) Integrity: the protocol ensures the integrity of the returned blinded dataset in order to resist malicious behaviors of the cloud server.
(vi) Hidden access policy: to satisfy more practical privacy requirements, the proposed protocol ensures that the cloud server cannot derive any sensitive information about the attributes from the blinded dataset.

1.1. Paper Organization

The remainder of the paper is organized as follows. Section 2 reviews related work, and Section 3 describes some preliminaries. Section 4 gives the problem formulation of the proposed protocol. Section 5 presents the proposed PSI protocol. Section 6 analyzes the security of the proposed protocol, and Section 7 evaluates its performance. Finally, Section 8 summarizes the paper.

2. Related Work

Meadows proposed the first secure PSI protocol [15] based on multiplicative homomorphic techniques. Because the scheme is based on public-key cryptography, its running time is unacceptable, in particular when the dataset becomes large. In [2], Freedman et al. proposed a private set intersection protocol by means of partially homomorphic encryption and a point-value polynomial representation of sets. Later, Hazay and Nissim extended it to the malicious setting in [16]. In [17], Kissner and Song proposed privacy-preserving set operations. Their scheme can compute not only the intersection of the sets but also their union in a privacy-preserving way.

In [18], Jarecki and Liu proposed a novel PSI protocol based on the composite residuosity problem. In their scheme, the user and the server obtain the intersection of the two datasets by using a parallel oblivious pseudorandom function (OPRF). However, it relies on the common reference string model. In [19], De Cristofaro et al. proposed two PSI protocols in the malicious model. However, their schemes are unable to hide the cardinality of the user's dataset. To overcome this problem, Ateniese et al. proposed a PSI protocol [20] under the RSA assumption, but their scheme is only proven secure in the random oracle model. Based on the scheme in [18], De Cristofaro and Tsudik presented an efficient PSI protocol [21] by using OPRF techniques. Over the past few years, many efficient PSI protocols [3–13, 16, 17] have been successively proposed.

According to the cryptographic techniques used to construct them, the existing PSI protocols are mainly classified into three groups:
(i) Public-key-based PSI protocols: homomorphic encryption is a common cryptographic technique for designing PSI protocols. In the early days, most PSI protocols were constructed based on homomorphic encryption; the protocols in [2, 8, 10, 15, 22, 23] are classic instances. In this type of protocol, the data owner first encrypts its dataset to obtain the corresponding ciphertext and sends it to the data user; then, using the homomorphic properties of the encryption, the data user conducts some specific operations on the ciphertext and its own dataset. Finally, the data owner obtains the corresponding intersection by using its private key. This type of protocol is suitable for scenarios in which both participants possess strong computing capability. In general, such protocols incur a higher computation cost since public-key cryptography is involved. However, they are suitable for designing PSI protocols with a custom function.
(ii) Circuit-based PSI protocols: the circuit-based generic technique of secure computation is another method for designing PSI protocols. Fairplay [24] gave the first PSI protocol using Yao's garbled-circuit approach. In subsequent work, Huang et al. presented three PSI protocols based on Yao's generic garbled-circuit method [25]. Their schemes are competitive with the fastest public-key-based protocols. Afterwards, Pinkas et al. gave some new optimizations for circuit-based PSI in [26]. Following the idea of secure multiparty computation, this type of protocol transforms the specific function into a garbled Boolean circuit to realize secure computation, and its key technique is symmetric cryptography. The advantage of this general circuit approach is that it makes the design and implementation of the protocol easier. However, due to its generality, the garbled circuit makes the scalability of the protocol poor.
(iii) Oblivious transfer- (OT-) based PSI protocols: the oblivious transfer protocol is a foundation of secure computation. To realize large-scale data processing, Dong et al. presented two efficient PSI protocols [27] based on Bloom filters and the OT extension protocol. Their protocols are efficient and highly scalable compared with some other PSI protocols. To reduce the runtime, Pinkas et al. gave an optimization of the PSI protocol of [27] using random OT extension in [26]. The core idea of this type of protocol is to have both parties collaboratively engage in many OT protocols. In general, this type of PSI protocol has lower computation and communication costs, but it demands extra key-related computations such as secret key agreement.

From the above analysis, public-key-based PSI protocols have higher computation complexity, whereas circuit-based and OT-based PSI protocols are more efficient because they use symmetric encryption; however, key agreement or the secure transfer of secret keys also requires additional computation costs and communication overhead.

3. Preliminaries

3.1. Composite Order Bilinear Group

Throughout the paper, we only consider composite order bilinear groups since our scheme is based on such a construction. In the following, we review some concepts of such bilinear pairings:
(1) $G$ and $G_T$ are two cyclic groups with the same composite order $N = pq$, where $p$ and $q$ are two distinct primes, and the discrete logarithm problem in group $G$ is assumed to be hard.
(2) Let $e: G \times G \to G_T$ denote a computable bilinear map which satisfies the following criteria:
(i) Bilinearity: for arbitrary $g, h \in G$ and all $a, b \in \mathbb{Z}_N$, we have $e(g^a, h^b) = e(g, h)^{ab}$.
(ii) Nondegeneracy: there exists $g \in G$ such that $e(g, g)$ has order $N$ in $G_T$.
(iii) Orthogonal property: let $G_p$ and $G_q$ denote the two subgroups of $G$ of order $p$ and $q$, respectively. For $h_p \in G_p$ and $h_q \in G_q$, we have $e(h_p, h_q) = 1$.
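To make the bilinearity and orthogonality properties concrete, the following is a minimal sketch in a toy "exponent model": a group element $g^a$ is represented by its exponent $a$ modulo $N$, so the pairing reduces to multiplication of exponents. This model is an illustrative assumption and offers no security; the primes and the helper names `pair`, `in_Gp`, and `in_Gq` are hypothetical and are not taken from the proposed scheme or from any pairing library.

```python
# Toy "exponent model" of a composite-order bilinear group (NOT secure).
# An element g^a of G is stored as its exponent a mod N, and the pairing
# e(g^a, g^b) = e(g, g)^(a*b) is stored as the exponent a*b mod N.
p, q = 1009, 1013          # small distinct primes, illustrative only
N = p * q

def pair(a: int, b: int) -> int:
    """Exponent of e(g^a, g^b) with respect to e(g, g)."""
    return (a * b) % N

def in_Gp(k: int) -> int:
    """Element g^(q*k) of the order-p subgroup, as an exponent."""
    return (q * k) % N

def in_Gq(k: int) -> int:
    """Element g^(p*k) of the order-q subgroup, as an exponent."""
    return (p * k) % N

# Bilinearity: e(u^a, v^b) = e(u, v)^(a*b).
u, v, a, b = 17, 29, 5, 7
bilinear_ok = pair(u * a, v * b) == (pair(u, v) * a * b) % N

# Orthogonality: pairing an element of G_p with one of G_q gives the
# identity of G_T, i.e. exponent 0, because the exponents multiply to
# a multiple of N = p*q.
orthogonal_ok = all(
    pair(in_Gp(s), in_Gq(t)) == 0
    for s in range(1, 50)
    for t in range(1, 50)
)
```

In this model, pairing two elements of the same subgroup is generally not the identity, which is why blinding factors taken from the other subgroup cancel under the pairing while the payload survives.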

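3.1.1. The Decisional Bilinear Diffie–Hellman Assumption in $G_p$

Let $g_p$ be a generator of $G_p$ and let $(g_p, g_p^{a}, g_p^{b}, g_p^{c}, Z)$ be a random 5-tuple, where $a, b, c \in \mathbb{Z}_p$ and $Z \in G_T$. The assumption states that no efficient PPT algorithm $\mathcal{A}$ can distinguish the case $Z = e(g_p, g_p)^{abc}$ from the case of a random $Z$. $\mathcal{A}$'s advantage in breaking the decisional bilinear Diffie–Hellman (DBDH) problem in $G_p$ is defined as

$$\mathrm{Adv}_{\mathcal{A}}^{\mathrm{DBDH}} = \left| \Pr\left[\mathcal{A}(g_p, g_p^{a}, g_p^{b}, g_p^{c}, e(g_p, g_p)^{abc}) = 1\right] - \Pr\left[\mathcal{A}(g_p, g_p^{a}, g_p^{b}, g_p^{c}, Z) = 1\right] \right|.$$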

We say that an algorithm breaks the DBDH problem if it is capable of distinguishing the two cases with nonnegligible probability; the DBDH assumption holds if no PPT algorithm can do so.

3.2. Access Tree

Access trees can make the representation of access control policies easier to understand. In what follows, we explain the access trees used in our constructions.

An access tree is a tree-like access structure in which each leaf node is associated with an attribute value and each inner node $x$ is represented by a threshold gate $(k_x, num_x)$, where $num_x$ is the number of children of inner node $x$ and $k_x$ is its threshold value satisfying $1 \le k_x \le num_x$. Specifically, when $k_x = 1$ and $k_x = num_x$, the corresponding threshold gate is the OR-gate and the AND-gate, respectively. For each leaf node $x$, its threshold value is $k_x = 1$.

For ease of presentation, the children of each node $x$ are ordered from 1 to $num_x$. We also define three functions: $\mathrm{parent}(x)$ is the parent of node $x$, $\mathrm{index}(x)$ is the number associated with node $x$, and $\mathrm{att}(x)$ is the attribute associated with the leaf node $x$.

Assume that $r$ is the root of an access tree $\mathcal{T}$; then we use $\mathcal{T}_r$ to represent this tree, and $\mathcal{T}_x$ denotes the subtree of $\mathcal{T}$ rooted at node $x$ if $x$ is an inner node of $\mathcal{T}$. If an attribute set $S$ satisfies the subtree $\mathcal{T}_x$, we write $\mathcal{T}_x(S) = 1$. Satisfaction is defined by the following two cases:
(1) When $x$ is a leaf node, $\mathcal{T}_x(S) = 1$ is returned if and only if $\mathrm{att}(x) \in S$;
(2) When $x$ is a nonleaf node, $\mathcal{T}_x(S)$ is computed recursively. For all children $x'$ of node $x$, if at least $k_x$ children satisfy $\mathcal{T}_{x'}(S) = 1$, then $\mathcal{T}_x(S) = 1$ is returned.
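The recursive satisfaction test described above can be sketched as follows. This is a minimal illustration rather than the implementation used in our experiments; the `Node` class, the helpers `leaf` and `gate`, and the example attribute names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Node:
    attribute: Optional[str] = None          # set for leaf nodes only
    threshold: int = 1                       # threshold value; leaves use 1
    children: List["Node"] = field(default_factory=list)

def satisfies(node: Node, attrs: Set[str]) -> bool:
    """Return True if the attribute set satisfies the subtree rooted at node."""
    if not node.children:                    # leaf: attribute must be present
        return node.attribute in attrs
    # Inner node: count satisfied children and compare with the threshold.
    hits = sum(1 for child in node.children if satisfies(child, attrs))
    return hits >= node.threshold

def leaf(attribute: str) -> Node:
    return Node(attribute=attribute)

def gate(threshold: int, *children: Node) -> Node:
    return Node(threshold=threshold, children=list(children))

# (doctor AND cardiology) OR admin:
# an AND-gate has threshold == number of children, an OR-gate has threshold 1.
tree = gate(1, gate(2, leaf("doctor"), leaf("cardiology")), leaf("admin"))
```

For instance, `satisfies(tree, {"doctor", "cardiology"})` and `satisfies(tree, {"admin"})` both hold, whereas `satisfies(tree, {"doctor"})` does not.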

4. Problem Formulation

In this section, to better understand the motivation of the proposed scheme, we will give the system model and threat model that correspond to our protocol.

4.1. System Model

For a cloud-based PSI computation protocol with fine-grained access control and integrity verification, its system model is shown in Figure 1. The system model consists of four entities: key generation center (KGC), cloud service provider (CSP), data owners, and data users. The roles of these entities are described as follows.

4.1.1. Key Generation Center

It is responsible for establishing system parameters and generating the secret keys of the attributes for data users. In addition, it also generates a public-private key pair for the data owners’ signature algorithm.

4.1.2. Cloud Service Provider

It has abundant storage space and powerful computing capability, and it can provide storage services of the outsourced dataset for data owners and PSI computation services for data users.

4.1.3. Data Owner

It is the owner of a dataset $X$. To achieve fine-grained access control, the data owner needs to define an access control policy before outsourcing the dataset, and then it blinds the dataset based on the defined access policy.

4.1.4. Data User

It also possesses a dataset $Y$ and can request the CSP to generate a token in order to compute the private set intersection $X \cap Y$. It is worth noting that only a data user whose attributes satisfy the access policy defined by the data owner can obtain the PSI token returned by the CSP.

4.2. Threat Model

In our proposed protocol, as in [6, 11, 14, 17], the CSP is not a fully trusted entity: it may attempt to tamper with or delete items in the outsourced dataset, and it might also try to extract sensitive information from the outsourced dataset by colluding with unauthorized data users. The data user is regarded as a malicious entity. It may collude with the CSP to obtain information beyond the intersection. Additionally, an unauthorized data user may attempt to obtain the qualification for PSI computation. The KGC is assumed to be a trusted entity. It is responsible for generating secret keys for the data user's attribute set and public/private key pairs for data owners. The data owner is also assumed to be a trusted entity. It honestly encrypts its dataset and outsources it to the CSP, and it never reveals the elements of its private dataset.

4.3. Security Goals

For the proposed protocol, the goals are as follows:
(1) It achieves fine-grained access control, so that only data users satisfying the access policy defined by the data owner can conduct PSI computation.
(2) The authorized data users learn nothing except the intersection of the datasets; that is to say, it achieves data secrecy.
(3) The CSP does not learn any information about the protocol results or the outsourced dataset; that is to say, it is adaptively secure against chosen-dataset attack.
(4) It must ensure the integrity of the blinded dataset returned by the cloud server to the data user.

5. Our Concrete Construction

In this section, we present a concrete construction of the cloud-based PSI protocol with fine-grained access control and integrity verification. The protocol consists of the following three stages, which are described in detail below.

5.1. System Initiation

In this stage, the key generation center (KGC) is responsible for initializing system parameters and producing the secret key of a data user from its attribute set. To do so, it executes the following two algorithms: Setup and KeyGen.

5.1.1. Setup

Taking a security parameter and the universal attribute set as inputs, it outputs the bilinear group parameters, where $G$ and $G_T$ are cyclic groups of order $N = pq$, and $p$ and $q$ are two distinct 512-bit primes. Let $G_p$ and $G_q$ denote the two subgroups of group $G$ of order $p$ and $q$, respectively, and let $g_p$ and $g_q$ be generators of $G_p$ and $G_q$, respectively. $e: G \times G \to G_T$ is a bilinear pairing map.

For the universal attribute set , where is the maximum number of attributes, randomly choose and to compute public keys

Additionally, KGC chooses two collision-resistant hash functions and satisfying and . Finally, public parameters are published as follows:and master secret keys are securely stored. Note that each corresponds to each attribute .

5.1.2. KeyGen

For a data user with an attribute set , if it wants to register with the system, it sends its attribute set to the KGC, and then the KGC uses its master secret key to generate the secret key for the data user with attribute set .
(1) First of all, KGC randomly chooses a number to compute
(2) and then, for each attribute in , it computes .
(3) The resultant secret key is .

In addition, for data owner, it randomly chooses to compute its public key . This public-private key pair is used to generate digital signature.

5.2. Blinding of Dataset

If a data owner wants to outsource its dataset to the cloud server, it needs to blind the dataset before outsourcing in order to ensure its security. To achieve the confidentiality and fine-grained access control of the data, it executes the following Blind algorithm.

5.2.1. Blind (Param, , X)

For a dataset and an access structure , the algorithm takes and as inputs and outputs the blinded data. The detailed process is given as follows.
(1) First of all, it picks a number and at random to compute
where is the size of the dataset and is the concatenation operator.
(2) Then, it generates a signature on by randomly choosing , where and .
(3) To facilitate the expression of the access structure, we adopt an access tree . For each node in , the algorithm randomly allocates a degree polynomial in a top-down manner, where is the children number of node . In particular, for the root of access tree , the allocated polynomial should satisfy ; for any other node , it should satisfy , where denotes the index value of node among the children of its parent node, and is the parent node of node . Finally, for each leaf node of access tree corresponding to an attribute , the blinded data is computed
where is a random element in group . Note that corresponds to the attribute .
(4) At last, the blinded dataset of is , and the data owner uploads it to the cloud server.
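The top-down polynomial allocation in step (3) follows the usual threshold secret-sharing pattern for access trees: the polynomial of a child takes, as its constant term, the value of the parent polynomial at the child's index. The sketch below illustrates only this pattern. It assumes that each polynomial has degree equal to the node threshold minus one (the common choice in access-tree constructions), uses a prime modulus `P` as a stand-in for the group order, and works on plain integers rather than group elements; the names `random_poly`, `evaluate`, and `share` are hypothetical.

```python
import secrets

P = 2**61 - 1  # prime modulus standing in for the group order (assumption)

def random_poly(constant: int, degree: int):
    """Coefficients [c0, c1, ..., c_degree] with c0 = constant."""
    return [constant] + [secrets.randbelow(P) for _ in range(degree)]

def evaluate(poly, x: int) -> int:
    """Evaluate the polynomial at x modulo P using Horner's rule."""
    result = 0
    for coeff in reversed(poly):
        result = (result * x + coeff) % P
    return result

def share(tree, secret: int, out: dict) -> None:
    """Distribute `secret` over the tree from the root down to the leaves.

    `tree` is either an attribute name (leaf) or a pair
    (threshold, [children]). Leaf shares are written to `out`.
    Attribute names are assumed to be unique within the tree.
    """
    if isinstance(tree, str):
        out[tree] = secret                      # constant term at the leaf
        return
    threshold, children = tree
    poly = random_poly(secret, threshold - 1)   # assumed degree: threshold - 1
    for index, child in enumerate(children, start=1):
        # Child constant term = parent polynomial evaluated at the child's index.
        share(child, evaluate(poly, index), out)

# Example: root AND-gate over two attributes.
leaf_shares = {}
share((2, ["a1", "a2"]), 12345, leaf_shares)
```

With an OR-gate (threshold 1) the polynomial is constant, so every child receives the secret itself; with an AND-gate all children are needed to interpolate it.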

5.3. PSI Computation

To obtain the private set intersection with the blinded dataset of a dataset $X$ uploaded by a specific data owner, this stage is divided into three parts: Token1 generation, Token2 generation, and set intersection computation. First, a data user with dataset $Y$ runs the TokenGen1 algorithm to produce a PSI-token 1 for the cloud server, and then the cloud server runs the TokenGen2 algorithm to produce a PSI-token 2 by using PSI-token 1. Finally, the data user computes the intersection of the two datasets $X$ and $Y$ by using the PSI-token 2 produced by the cloud server. The detailed processes are given as follows.

5.3.1. TokenGen1

Taking system parameters and secret key of data user , this algorithm randomly selects a number to compute a PSI-token 1:

5.3.2. TokenGen2

Taking the PSI-token 1 and the blinded dataset , this algorithm first verifies whether the data user's attribute set satisfies the access tree associated with the blinded dataset . If it does not, then it outputs $\perp$ and aborts.

For the sake of illustration, we define a recursive algorithm that takes as input an access tree , an attribute set , the blinded dataset , and a node from .

For each node in , the cloud server executes a recursive algorithm as follows:
(1) If is a leaf node and , where is the attribute corresponding to leaf node , then it computes
where is the -th element in PSI-token 1 and is the -th in .
(2) If is a nonleaf node, the recursive algorithm runs as follows: for all child nodes of node , it calls for and stores the results, where denotes all children of and denotes the j-th child node of . The algorithm is run until the children of a node are leaf nodes, and then we compute
where is the child node of node and is the threshold value of parent node , and for , its Lagrange coefficient is
By recursively running in a bottom-up manner in access tree , we can obtain the PSI-token 2,
In the end, the algorithm returns the PSI-token 2 and the signature of .
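The recombination step at a nonleaf node is Lagrange interpolation carried out in the exponent. The sketch below shows this on a toy prime-order subgroup of $\mathbb{Z}_p^{*}$ instead of a pairing group: the parameters `p`, `q`, `g`, the example polynomial, and the helper names are illustrative assumptions, and the coefficient used is the standard Lagrange coefficient evaluated at zero.

```python
# Toy parameters (NOT secure): q is prime, q divides p - 1, and g = 4
# generates the subgroup of order q in Z_p^*.
p, q, g = 2039, 1019, 4

def lagrange_at_zero(i: int, indices) -> int:
    """Lagrange coefficient prod_{j != i} (0 - j) / (i - j) modulo q."""
    num, den = 1, 1
    for j in indices:
        if j != i:
            num = (num * (-j)) % q
            den = (den * (i - j)) % q
    return (num * pow(den, -1, q)) % q   # modular inverse (Python 3.8+)

def combine(exp_shares: dict) -> int:
    """Combine values g^{poly(i)} into g^{poly(0)} without learning poly."""
    indices = list(exp_shares)
    result = 1
    for i, value in exp_shares.items():
        result = (result * pow(value, lagrange_at_zero(i, indices), p)) % p
    return result

def poly(z: int) -> int:
    """Example polynomial of degree 2 (threshold 3) with constant term 123."""
    return (123 + 45 * z + 6 * z * z) % q

# Any three children suffice for a threshold-3 gate; here children 1, 3, 4.
exp_shares = {i: pow(g, poly(i), p) for i in (1, 3, 4)}
recovered = combine(exp_shares)
```

Here `recovered` equals `pow(g, 123, p)`, that is, the constant term of the polynomial in the exponent. In a composite-order group the inverses in the coefficients must exist modulo the group order, which holds with overwhelming probability for small child indices.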

5.3.3. PSI Computation

Upon receiving and , the algorithm takes , , and as inputs and executes Algorithm 1. In this algorithm, lines 1–3 check the validity of the signature , which provides integrity checking of the dataset returned by the cloud server. Lines 7–17 seek the intersection of the two datasets. If holds, then an element of the intersection is found.

Algorithm 1: PSI computation.
Input:
Output: the intersection of dataset $X$ and dataset $Y$
(1) if the signature check fails, then
(2)  printf ("the returned dataset is not intact");
(3)  exit (0);
(4) end
(5) Compute ;
(6) Set ;
(7) for do
(8)  for do
(9)   Compute ;
(10)   if , then
(11)    printf ("%s", yj);
(12)   else
(13)    j++;
(14)   end
(15)  end
(16)  i++;
(17) end
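The matching loop in lines 7–17 can be pictured with the following sketch. It is deliberately not the blinding formula of the proposed scheme: it assumes a hypothetical blinding in which each element is hashed and then masked by XOR with a pad derived from a key, and it assumes that an authorized data user has already recovered that key from PSI-token 2. The signature check of lines 1–3 is omitted, and the function names are illustrative.

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256 digest, used here as a stand-in hash function."""
    return hashlib.sha256(data).digest()

def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def blind(dataset, key: bytes):
    """Owner side (hypothetical): mask H(x) with a key-derived pad."""
    pad = h(b"pad" + key)
    return [xor(h(x.encode()), pad) for x in dataset]

def intersect(blinded, user_set, key: bytes):
    """User side: recompute the blinded form of each y and look it up."""
    pad = h(b"pad" + key)
    lookup = set(blinded)       # set lookup replaces the inner loop
    return [y for y in user_set if xor(h(y.encode()), pad) in lookup]

key = b"key recovered via the token (assumed)"
blinded_x = blind(["alice", "bob", "carol"], key)
common = intersect(blinded_x, ["bob", "dave", "carol"], key)
```

A data user who holds the wrong key derives a different pad and finds no matches, which mirrors the requirement that only authorized users can compute the intersection. Using a set for the lookup is a design choice of this sketch; Algorithm 1 is written as a nested loop.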
5.4. Correctness

In this subsection, we show that the proposed scheme is correct. In the blinding stage, the dataset is blinded into , and each in has the following format .

In addition, in the stage of intersection computation, data user can use PSI-token 2 to obtain the following relation:

Thus, it can obtain by running Algorithm 1 in PSI computation phase. It means that the proposed scheme is correct.

6. Security Analysis

In this section, we show that the proposed PSI scheme satisfies data secrecy and resists the adaptively chosen-dataset attack.

Theorem 1. Suppose that the decisional bilinear Diffie–Hellman (DBDH) problem in is difficult to solve; then the proposed PSI scheme is adaptively secure against chosen-dataset attack in the standard model.

Proof. Let be a probabilistic polynomial-time (PPT) adversary who launches an attack on the proposed PSI scheme. If it breaks the proposed PSI scheme with nonnegligible probability , then we are able to construct a challenger which solves the DBDH problem.
First of all, let us recall the DBDH problem in subgroup of group . Assume that is an instance of the DBDH problem, where are random numbers and ; the goal is to determine whether the case holds. In the following interactive game, attempts to solve the DBDH problem by invoking as a subroutine.

6.1. Init Phase

In this phase, the adversary randomly selects the challenged access structure and an attribute set and sends them to the challenger.

6.2. Setup

In this phase, initializes system parameters based on the instance of the DBDH problem. Firstly, it chooses to compute

It implies .

Also, for to , it randomly chooses , , and to compute

Additionally, it also sets and chooses to set data owner’s public key as . Finally, it sends to the adversary .

6.3. Phase 1

To simulate the game, can adaptively make a series of queries in this phase.
(i) KeyGen query: when the adversary makes a KeyGen query with an attribute set , where , to respond to it, executes the following steps:
(1) First of all, it randomly chooses to compute
It implicitly defines .
(2) For to , because all , we can compute
(3) Finally, the secret key of the corresponding attribute is returned to the adversary .
(ii) TokenGen1 queries: when the adversary issues a TokenGen1 query with attribute , first makes a KeyGen query with attribute to obtain secret key , and then it chooses a random number to compute a PSI-token 1:
and it sends to the adversary.
(iii) Decryption queries: for a decryption query of the ciphertext with attribute set , parses access tree from and checks whether matches the access structure . If it matches, then performs a KeyGen query with to obtain , and then it makes use of to call the Decryption algorithm to retrieve data .

6.4. Challenge

In this stage, to produce a challenge, the adversary sends the challenged access tree , in which leaf nodes are involved and correspond to attributes , and two datasets and to the challenger , where and are of the same size.

Subsequently, randomly flips a coin to produce the following blinded dataset:
(1) It sets
(2) Assume that root denotes the root node of . Following the top-down principle, to compute leaf node information, we first construct a polynomial satisfying . Although is unknown, we can compute the exponential form of , namely,
where is the random number and is the threshold value of root node in access tree . For the children node of root node , we can compute , where represents the polynomial corresponding to node that is the -child of the root node.
According to the above method, we can obtain the following values in a top-down manner, namely,
where denotes the polynomial of leaf node in access tree . Because and is a random element, we have
(3) Then it produces a signature by randomly choosing , where and .
(4) At last, the blinded dataset is , which is returned to .

Obviously, when , we have

Thus, the blinded dataset produced in the above way is valid.

6.5. Phase 2

In this phase, can still issue a series of new queries as in Phase 1, but the following restrictions must be satisfied:
(1) is not allowed to make the KeyGen queries.
(2) TokenGen1 queries on the blinded dataset are not allowed.

6.6. Guess

Eventually, the adversary returns its guess . If , then it outputs true. Otherwise, false is returned.

From the adversary 's point of view, the simulation of is indistinguishable from the real game. When is a random element of , the produced blinded dataset has the same distribution as the real blinded dataset, and it is independent of the choice of dataset . In this case, the probability of guessing is

When holds, the produced blinded dataset is a valid one. If the adversary breaks the proposed scheme in nonnegligible probability , then it means that the adversary can solve the DBDH problem in groups with the following probability:

However, because the DBDH problem in groups is hard to solve, the probability of the adversary breaking the proposed scheme is negligible.

Theorem 2. If it is infeasible to generate a message that yields a given hash value for one-way hash function , then our proposed PSI scheme can satisfy data secrecy in the standard model.

Proof. Suppose that there exists an adversary which breaks data secrecy in the proposed scheme; then we can construct an algorithm to solve the one-way problem of the hash function. First, we review the one-way problem of a hash function. Given a hash value and a one-way cryptographic hash function , the goal is to find a number such that it satisfies . To break the one-way problem of the hash function, the algorithm needs to initialize system parameters and play an interactive game with the adversary . The detailed processes are given as follows:
(i) Setup: the algorithm takes a security parameter as input and outputs system parameters . For the universal attribute set , randomly choose and to compute public keys
Additionally, KGC chooses two collision-resistant hash functions and satisfying and . Finally, public parameters
are sent to the adversary .
(ii) Phase 1: in this phase, the adversary is able to adaptively issue KeyGen queries and TokenGen1 queries. When the adversary issues such queries, the algorithm runs KeyGen() and TokenGen1() to respond to them since it has the master secret key.
(iii) Challenge: to produce a challenge, the adversary submits an access tree and -element dataset to the algorithm . The algorithm randomly chooses a number to compute the blinded dataset .
(1) For to , it computes .
(2) When , it sets . It means that there is a number which satisfies .
(3) The other components of are computed by the Blind algorithm, since the algorithm possesses the master secret key and the private key of the data owner.
(4) Finally, it returns to the adversary .
(iv) Phase 2: in this phase, the adversary is still able to issue the same queries as those in Phase 1.
(v) Guess: at last, the adversary outputs its guess in the position for the dataset with nonnegligible probability (note that or ).
If the adversary wins the game, then it means that satisfies the following relation:
Thus, given a hash value , the algorithm can find a value to satisfy
Obviously, this contradicts the one-way property of the hash function. Therefore, for two datasets and , the adversary can obtain nothing about or , except the intersection in .

Theorem 3. The proposed scheme can ensure the integrity of the returned blinded dataset by the cloud assuming that the underlying Schnorr signature is unforgeable.

Proof. Suppose that there exists an adversary which breaks the integrity of the returned blinded dataset in the proposed scheme; then a challenger can construct an algorithm which breaks the Schnorr signature, that is, which outputs a new message-signature pair.
Let be a challenger; to break the security of the Schnorr signature, the challenger runs as follows:
(i) Setup: the algorithm takes a security parameter as input and outputs system parameters . For the universal attribute set , randomly choose and to compute public keys
Additionally, KGC chooses two collision-resistant hash functions and satisfying and , and it chooses a number as public key. Finally, public parameters
are sent to the adversary , and it keeps the master private key secret.
(ii) Blinding dataset queries: the adversary is able to adaptively issue Blinding Dataset queries with a dataset and an access tree . When receiving a query , the challenger executes as follows:
(1) It picks a number at random to compute
(2) Then, it sends to algorithm , and algorithm makes a signing query to the signing oracle in the Schnorr signature security game with string . After obtaining the returned signature , returns it to the challenger .
(3) Next, the challenger computes the following values according to the access tree :
where is a random element in group .
(4) Finally, it returns the blinded dataset to the adversary .
(iii) Output: eventually, the adversary outputs . The adversary wins the game if is a valid signature on and the string has never been submitted as a signing query to . If wins this game, then sets as the output of the algorithm. Because is a valid signature on and never makes a signing query with string , it means that algorithm successfully breaks the unforgeability of the Schnorr signature.
Obviously, this contradicts the unforgeability of the Schnorr signature. Thus, the proposed PSI protocol can ensure the integrity of the dataset.
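For readers unfamiliar with the signature that the integrity argument relies on, the following is a minimal sketch of textbook Schnorr signing and verification over a toy subgroup of $\mathbb{Z}_p^{*}$. It is not the instantiation used in the proposed scheme, whose signature is computed in the pairing group; the parameters `p`, `q`, `g` are far too small to be secure, the challenge is kept as the full hash value rather than reduced modulo `q` to keep the toy example simple, and all function names are hypothetical.

```python
import hashlib
import secrets

# Toy group (NOT secure): q is prime, q divides p - 1, g has order q.
p, q, g = 2039, 1019, 4

def H(r: int, message: bytes) -> int:
    """Challenge derived from the commitment r and the message."""
    digest = hashlib.sha256(str(r).encode() + b"|" + message).digest()
    return int.from_bytes(digest, "big")

def keygen():
    """Return (private key, public key) with public key g^sk mod p."""
    sk = secrets.randbelow(q - 1) + 1
    return sk, pow(g, sk, p)

def sign(sk: int, message: bytes):
    k = secrets.randbelow(q - 1) + 1      # fresh randomness per signature
    r = pow(g, k, p)                      # commitment
    e = H(r, message)                     # challenge
    s = (k + sk * e) % q                  # response
    return e, s

def verify(pk: int, message: bytes, sig) -> bool:
    e, s = sig
    # Recompute the commitment: g^s * pk^(-e) = g^(k + sk*e - sk*e) = g^k.
    r = (pow(g, s, p) * pow(pk, -e, p)) % p
    return H(r, message) == e

sk, pk = keygen()
signature = sign(sk, b"blinded dataset")
```

Verification accepts `signature` for the original message and rejects it for any modified message, which is the property used to detect a cloud server that deletes or alters items of the blinded dataset.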

Theorem 4. Our proposed PSI scheme can achieve hidden access attributes.

Proof. In the proposed scheme, to keep the attributes anonymous, we use the orthogonality property of composite order bilinear groups. In the encryption phase, the random elements and are introduced into and in the blinded dataset. This effectively prevents malicious attackers from testing the access policy against a candidate access structure and thereby guessing the access structure. Thus, the scheme achieves hidden access attributes.

7. Performance Analysis

In this section, we evaluate the efficiency of the proposed PSI protocol in terms of computational costs. We do not compare it against other PSI protocols, because such a comparison would not be fair: the proposed PSI protocol aims to support fine-grained access control with hidden attributes and to ensure the integrity of the returned dataset, whereas most existing PSI protocols aim to be secure against semihonest adversaries or to improve the efficiency of the PSI protocol. To our knowledge, it is the first PSI protocol supporting attribute-hidden fine-grained access control and integrity verification. Additionally, the experimental results show that the proposed PSI protocol is quite efficient.

To illustrate the effectiveness of the proposed PSI scheme, we implemented an experimental simulation on a laptop running Ubuntu 18.04 with an Intel(R) Core(TM) 4130 CPU and 4 GB RAM. All algorithms are written in C on Linux. Because the proposed scheme is based on composite-order bilinear pairings, we adopt the “Type A1 pairing” in the PBC Library, which provides a level of security equivalent to the 1024-bit discrete logarithm problem. For simplicity, we assume that the access tree consists only of a root node and leaf nodes; that is, the policy is a conjunction of attributes joined by AND gates.
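The simplified policy shape used in the experiments (one AND root over attribute leaves) can be sketched as a threshold tree in which the root requires all of its children. The class and function names and the attribute strings below are hypothetical, and the check runs on attributes in the clear purely to show the policy semantics; in the proposed scheme the attributes are hidden and the check is enforced cryptographically.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Access-tree node: a leaf carries an attribute, a gate carries a threshold."""
    attribute: Optional[str] = None
    threshold: int = 0
    children: List["Node"] = field(default_factory=list)


def and_policy(attributes: List[str]) -> Node:
    """Root AND gate over leaf attributes, i.e. an n-of-n threshold gate."""
    leaves = [Node(attribute=a) for a in attributes]
    return Node(threshold=len(leaves), children=leaves)


def satisfies(node: Node, user_attributes: Set[str]) -> bool:
    """Return True if the attribute set satisfies the (sub)tree rooted at node."""
    if node.attribute is not None:
        return node.attribute in user_attributes
    satisfied = sum(satisfies(child, user_attributes) for child in node.children)
    return satisfied >= node.threshold


policy = and_policy(["doctor", "cardiology", "hospital-A"])
print(satisfies(policy, {"doctor", "cardiology", "hospital-A", "on-call"}))
print(satisfies(policy, {"doctor", "cardiology"}))
```

With this shape, the number of leaves is the only policy parameter, which is why the measured running times are reported against the number of attributes.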

In the proposed protocol, the private set intersection is computed only by the data user, and the data owner does not need to participate in this phase. Throughout the protocol, the data owner can stay offline once the dataset has been outsourced to the cloud server. In contrast, in many PSI protocols [6, 18, 20, 27], both the data owner and the data user have to be online and interact with each other.

Here, we show the performance of the proposed PSI protocol by evaluating the execution-time overhead of each algorithm.

The Setup algorithm initializes the system parameters; its execution time is mainly determined by the cardinality of the universal attribute set and increases with it, as shown in Figure 2(a). The performance of the KeyGen algorithm is shown in Figure 2(b). Figure 2 shows that the running time of these two algorithms grows with the number of attributes, so both curves are monotonically increasing straight lines. When the number of attributes is 60, the running time of each algorithm is approximately 1.2 seconds, which is acceptable.

For the Blind algorithm, the running time comes mainly from generating the fine-grained access structure and is linear in the size of the attribute set. In contrast, once the access attributes are fixed, the cardinality of the dataset has little influence on the runtime, because the dataset is blinded with XOR operations, whose cost is negligible. The corresponding performance graphs are shown in Figures 3(a) and 3(b). Figure 3(b) shows that when the number of attributes is 20 and the cardinality of the dataset is , the runtime of the algorithm is only about 371.578 ms.

For the TokenGen1 and TokenGen2 algorithms, the runtime is linear in the size of the data user’s attribute set, because the two algorithms correspond to the decryption process of an attribute-based encryption scheme. Their performance graphs are shown in Figures 4(a) and 4(b).

For the PSI Computation algorithm, the runtime depends only on the cardinality of the dataset, as shown in Figure 5. As the cardinality of the dataset grows from to , the runtime hardly changes, since the algorithm needs only 3 exponentiations in , 1 pairing operation, hash operations, and XOR operations, and hash and XOR are lightweight operations that take almost no time. When the cardinality of the dataset is , the PSI computation runtime is 296.369 ms. Thus, the algorithm is well suited to resource-limited data users.
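The cost argument above (a fixed number of expensive group operations, then only a hash and an XOR per element) can be made concrete with a small sketch. It assumes that a single fixed-length mask has already been derived from the expensive operations and that each element is hashed with SHA-256 before being XORed with that mask; both are assumptions for illustration, and the sketch is not the blinding formula of the proposed scheme.

```python
import hashlib
import secrets
from typing import List


def digest(element: bytes) -> bytes:
    """Fixed-length hash of a dataset element (SHA-256 is assumed)."""
    return hashlib.sha256(element).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def blind_dataset(elements: List[bytes], mask: bytes) -> List[bytes]:
    """One hash and one XOR per element; no group operations in the loop."""
    return [xor_bytes(digest(e), mask) for e in elements]


def intersect(user_elements: List[bytes], blinded: List[bytes], mask: bytes) -> List[bytes]:
    """Unmask the blinded values and keep the user's elements that match."""
    unblinded = {xor_bytes(b, mask) for b in blinded}
    return [e for e in user_elements if digest(e) in unblinded]


mask = secrets.token_bytes(32)   # stands in for a value derived once per run
owner_set = [b"alice", b"bob", b"carol"]
user_set = [b"bob", b"dave", b"carol"]

blinded_set = blind_dataset(owner_set, mask)
print(intersect(user_set, blinded_set, mask))
```

Because the per-element work is limited to hashing and XOR, increasing the dataset cardinality adds little to the total time compared with the fixed cost of deriving the mask.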

8. Conclusion

In this work, we presented the first private set intersection (PSI) protocol that supports hidden-attribute fine-grained access control and integrity verification, based on attribute-based encryption. The main goal of the proposed scheme is to provide customized functionality for practical applications. Compared with most existing schemes, the proposed scheme has the following merits: (i) the data owner controls access to the dataset by defining an access policy before the dataset is outsourced; (ii) the data owner can stay offline during the whole PSI protocol; (iii) the integrity of the dataset returned from the cloud is ensured; (iv) the main PSI computation burden is transferred to the cloud; (v) one-to-many PSI computation is supported, that is, after the blinded dataset is outsourced to the cloud, data users can run PSI computations with the cloud an arbitrary number of times. In addition, after giving the corresponding security analysis, we evaluated each algorithm of the proposed scheme by experimental simulation; the results show that the proposed scheme is efficient and practical. To reduce the computational cost of the whole scheme, we will study fine-grained access control PSI schemes with constant computation in future work.

Data Availability

This article contains the data that support the results of this study. If other data used to support the results of this study are needed, they can be obtained from the corresponding author.

Disclosure

The funders had no role in the design of the study; the collection, analyses, or interpretation of data; the writing of the manuscript; or the decision to publish the results.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Natural Science Foundation of Beijing (no. 4212019), National Natural Science Foundation of China (no. 62172005), Guangxi Key Laboratory of Cryptography and Information Security (no. GCIS201808), and Foundation of Guizhou Provincial Key Laboratory of Public Big Data (no. 2019BDKF JJ012).