Enabling Genomic-Phenomic Association Discovery without Sacrificing Anonymity
Figure 14
Datasets used for comparison of anonymization strategies.
The DATA SELECT process extracts a subset of records from the Synthetic Derivative (SD) into a smaller, specific dataset, such as BioVU or a demonstration cohort. The ANONYMIZE process is the anonymization algorithm described in this manuscript. The DEMO EXTRACT process selects the remaining records associated with the Demonstration cohort from a larger, anonymized dataset. The resultant datasets are as follows: the anonymized version of the Synthetic Derivative (SD-Anon); the anonymized version of BioVU (BioVU-Anon); the Demonstration group records extracted from SD-Anon; the Demonstration group records extracted from BioVU-Anon; and the anonymized version of the Demonstration cohort. The last three datasets each represent a different anonymization of the Demonstration group.
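The data flow in this figure can be sketched in a few lines of Python. This is only an illustrative outline under stated assumptions: datasets are modeled as dictionaries keyed by a record identifier that survives anonymization, the function names and output keys are hypothetical, and the `anonymize` argument is a stand-in for the manuscript's algorithm, which is not reproduced here.

```python
from typing import Callable, Dict, Set

Record = Dict[str, object]
Dataset = Dict[str, Record]  # hypothetical layout: record id -> record


def data_select(dataset: Dataset, ids: Set[str]) -> Dataset:
    """DATA SELECT: copy the chosen records into a smaller dataset."""
    return {rid: rec for rid, rec in dataset.items() if rid in ids}


def demo_extract(anonymized: Dataset, demo_ids: Set[str]) -> Dataset:
    """DEMO EXTRACT: keep the Demonstration records still present
    in a larger, already anonymized dataset."""
    return {rid: rec for rid, rec in anonymized.items() if rid in demo_ids}


def build_comparison_datasets(
    sd: Dataset,
    biovu_ids: Set[str],
    demo_ids: Set[str],
    anonymize: Callable[[Dataset], Dataset],  # placeholder for ANONYMIZE
) -> Dict[str, Dataset]:
    """Produce the five datasets named in the caption."""
    biovu = data_select(sd, biovu_ids)
    demo = data_select(sd, demo_ids)
    sd_anon = anonymize(sd)
    biovu_anon = anonymize(biovu)
    return {
        "SD-Anon": sd_anon,
        "BioVU-Anon": biovu_anon,
        # Three different anonymizations of the Demonstration group:
        "demo_from_SD-Anon": demo_extract(sd_anon, demo_ids),
        "demo_from_BioVU-Anon": demo_extract(biovu_anon, demo_ids),
        "demo_anonymized_directly": anonymize(demo),
    }
```

The point of the sketch is the ordering of the steps: the same Demonstration records are anonymized in the context of the full SD, in the context of BioVU, or on their own, so the three resulting versions can differ even though they describe the same patients.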