Abstract
Deep cross-modal retrieval techniques have recently achieved remarkable performance, but they also pose potentially severe threats to data privacy. An enormous amount of user-generated content that conveys personal information is now released and shared on the Internet. A retrieval system can be abused to pinpoint sensitive information about a particular Internet user, causing privacy leakage. In this article, we propose a data-centric Proactive Privacy-preserving Cross-modal Learning algorithm that protects data by employing a generator to transform original data into adversarial data with quasi-imperceptible perturbations before release. If the data source is infiltrated, the adversarial data inside it can confuse retrieval models under the attacker’s control into making erroneous predictions. We consider protection under a realistic and challenging setting in which no prior knowledge of the malicious models is available. To handle this, a surrogate retrieval model is introduced instead, acting as the target to fool. The whole network is trained under a game-theoretical framework, in which the generator and the retrieval model continually evolve to compete against each other. To facilitate the optimization, a Gradient Reversal Layer module is inserted between the two models, enabling one-step learning. Extensive experiments on widely used realistic datasets demonstrate the effectiveness of the proposed method.
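The one-step idea behind a Gradient Reversal Layer can be illustrated with a toy scalar sketch. This is not the authors' implementation: the names `GradientReversal` and `train_step`, the squared-error loss, the scalar "generator" parameter `g`, the scalar "surrogate" weight `w`, and the perturbation bound `eps` are all assumptions made for illustration. The layer is the identity in the forward pass and flips the sign of the gradient in the backward pass, so a single backward pass lets the surrogate descend the loss while the generator ascends it.

```python
class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x

    def backward(self, grad_output):
        return -self.lam * grad_output


def train_step(x, y, g, w, grl, lr=0.1, eps=0.1):
    """One joint update of a toy generator (g) and a toy surrogate model (w).

    Hypothetical setup: the generator adds a bounded perturbation g to x,
    and the surrogate predicts w * input under a squared-error loss.
    Returns the updated (g, w) and the loss before the update.
    """
    # Forward pass: generator -> GRL -> surrogate -> loss.
    x_adv = x + g
    h = grl.forward(x_adv)
    pred = w * h
    loss = (pred - y) ** 2

    # Backward pass, written out by hand for this scalar model.
    d_pred = 2.0 * (pred - y)
    d_w = d_pred * h            # gradient for the surrogate weight
    d_h = d_pred * w            # gradient arriving at the GRL output
    d_g = grl.backward(d_h)     # reversed gradient reaching the generator

    # Both parameters take an ordinary descent step on their gradients.
    # Because d_g is sign-flipped, the generator effectively ascends the loss.
    w_new = w - lr * d_w
    g_new = g - lr * d_g
    # Keep the perturbation small (projection onto [-eps, eps]).
    g_new = max(-eps, min(eps, g_new))
    return g_new, w_new, loss


if __name__ == "__main__":
    grl = GradientReversal(lam=1.0)
    g, w = 0.0, 1.0
    for step in range(5):
        g, w, loss = train_step(x=1.0, y=2.0, g=g, w=w, grl=grl)
        print(step, round(g, 4), round(w, 4), round(loss, 4))
```

In this toy setting the surrogate's weight moves to reduce the error while the perturbation moves in the direction that increases it, all within one update. In a real system both players would be neural networks and the backward pass would come from an automatic differentiation framework.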
Index Terms
- Proactive Privacy-preserving Learning for Cross-modal Retrieval