research-article

Proactive Privacy-preserving Learning for Cross-modal Retrieval

Published: 25 January 2023

Abstract

Deep cross-modal retrieval techniques have recently achieved remarkable performance, yet they also pose potential threats to data privacy. Enormous amounts of user-generated content conveying personal information are released and shared on the Internet every day, and an adversary may abuse a retrieval system to pinpoint sensitive information about a particular user, causing privacy leakage. In this article, we propose a data-centric Proactive Privacy-preserving Cross-modal Learning algorithm that protects data by employing a generator to transform original data into adversarial data with quasi-imperceptible perturbations before release. If the data source is infiltrated, the adversarial data inside can mislead retrieval models under the attacker's control into making erroneous predictions. We consider protection under a realistic and challenging setting in which no prior knowledge of the malicious models is available. To handle this, a surrogate retrieval model is introduced, acting as the target to fool. The whole network is trained under a game-theoretical framework in which the generator and the retrieval model continually evolve against each other. To facilitate optimization, a Gradient Reversal Layer module is inserted between the two models, enabling one-step learning. Extensive experiments on widely used real-world datasets demonstrate the effectiveness of the proposed method.
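The training scheme described above, a perturbation generator pitted against a surrogate retrieval model through a Gradient Reversal Layer so both can be updated in a single backward pass, can be sketched in PyTorch. The following is an illustrative reconstruction, not the authors' implementation: module names such as PerturbationGenerator and SurrogateRetriever, the classification proxy used in place of a cross-modal retrieval loss, and the perturbation budget epsilon are all assumptions made for the sake of a self-contained example.

```python
# Minimal sketch (assumed, not the authors' code) of adversarial data generation
# trained jointly with a surrogate model via a Gradient Reversal Layer (GRL).
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negated, scaled gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the gradient before it reaches the generator, so one backward pass
        # trains the surrogate model to retrieve well and the generator to fool it.
        return -ctx.lambd * grad_output, None


def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)


class PerturbationGenerator(nn.Module):
    """Maps an input image to a bounded, quasi-imperceptible perturbation (hypothetical)."""

    def __init__(self, channels=3, epsilon=8 / 255):
        super().__init__()
        self.epsilon = epsilon
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1),
        )

    def forward(self, x):
        return self.epsilon * torch.tanh(self.net(x))


class SurrogateRetriever(nn.Module):
    """Stand-in retrieval model; label logits serve as a simple proxy objective."""

    def __init__(self, channels=3, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16, num_classes)

    def forward(self, x):
        return self.head(self.backbone(x))


def train_step(generator, retriever, optimizer, x, y, lambd=1.0):
    delta = generator(x)                             # bounded perturbation
    x_adv = (x + delta).clamp(0, 1)                  # adversarial data to be released
    logits = retriever(grad_reverse(x_adv, lambd))   # GRL sits between the two models
    loss = F.cross_entropy(logits, y)                # proxy for a retrieval loss
    optimizer.zero_grad()
    loss.backward()                                  # one backward pass updates both players
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    g, r = PerturbationGenerator(), SurrogateRetriever()
    opt = torch.optim.Adam(list(g.parameters()) + list(r.parameters()), lr=1e-3)
    x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
    print(train_step(g, r, opt, x, y))
```

In the actual method, the cross-entropy proxy would be replaced by a cross-modal similarity-preserving objective computed over image and text branches; the key point illustrated here is that the gradient reversal turns the shared loss into a min-max game that needs only a single optimization step per batch.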

• Published in

  ACM Transactions on Information Systems, Volume 41, Issue 2
  April 2023, 770 pages
  ISSN: 1046-8188
  EISSN: 1558-2868
  DOI: 10.1145/3568971

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 25 January 2023
• Online AM: 28 June 2022
• Accepted: 5 June 2022
• Revised: 28 April 2022
• Received: 28 November 2021
