ABSTRACT
We study synthetic data release for answering multiple linear queries over a set of database tables in a differentially private way. Two special cases have been considered in the literature: releasing a synthetic dataset for answering multiple linear queries over a single table, and releasing the answer to a single counting (join size) query over a set of database tables. Compared to the single-table case, the join operator makes query answering more challenging, since the sensitivity (i.e., how much an individual data record can affect the answer) can be heavily amplified by complex join relationships. We present an algorithm for the general problem, and prove a lower bound showing that our general algorithm achieves parameterized optimality (up to logarithmic factors) on some simple queries (e.g., two-table join queries) in the most commonly used privacy parameter regimes. For the case of hierarchical joins, we present a data partition procedure that exploits the concept of uniformized sensitivities to further improve the utility.
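To make the sensitivity amplification concrete, here is a minimal Python sketch on a toy instance. It is not taken from the paper: the relation names R1(A, B) and R2(B, C), the helper `join_size`, and the data are assumptions made purely for illustration. It compares how much one added record changes a counting query on a single table with how much it changes the join size.

```python
from collections import Counter

def join_size(r1, r2):
    """Size of the natural join of R1(A, B) and R2(B, C) on attribute B.

    Hypothetical helper for illustration; tuples are plain Python pairs.
    """
    degree_in_r2 = Counter(b for b, _ in r2)      # number of R2 tuples per join value
    return sum(degree_in_r2[b] for _, b in r1)    # each R1 tuple matches that many R2 tuples

# Toy instance (assumed): 50 tuples in R2 share the join value b = 0.
r1 = [(a, 0) for a in range(3)]
r2 = [(0, c) for c in range(50)]

new_record = (99, 0)                              # one individual's record added to R1
r1_neighbor = r1 + [new_record]

single_table_change = len(r1_neighbor) - len(r1)                  # counting query on R1 alone
join_change = join_size(r1_neighbor, r2) - join_size(r1, r2)      # join size query

print(single_table_change, join_change)
```

In this toy instance the single-table count changes by 1, while the join size changes by the degree of the join value in R2 (here 50), which is why noise calibrated to the single-table sensitivity would not suffice for the join query.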
Index Terms
- Differentially Private Data Release over Multiple Tables