research-article

Proactive Privacy-preserving Learning for Cross-modal Retrieval

Authors:
Peng-Fei Zhang, The University of Queensland, Brisbane, QLD, Australia (ORCID: 0000-0002-6790-2098)
Guangdong Bai, The University of Queensland, Brisbane, QLD, Australia (ORCID: 0000-0002-6390-9890)
Hongzhi Yin, The University of Queensland, Brisbane, QLD, Australia (ORCID: 0000-0003-1395-261X)
Zi Huang, The University of Queensland, Brisbane, QLD, Australia (ORCID: 0000-0002-9738-4949)

ACM Transactions on Information Systems, Volume 41, Issue 2, Article No. 35, pp. 1–23. https://doi.org/10.1145/3545799

Published: 25 January 2023

Abstract

Deep cross-modal retrieval techniques have recently achieved remarkable performance, but their power also poses a serious potential threat to data privacy. Enormous amounts of user-generated content conveying personal information are now released and shared on the Internet. An adversary may abuse a retrieval system to pinpoint sensitive information about a particular Internet user, causing privacy leakage. In this article, we propose a data-centric Proactive Privacy-preserving Cross-modal Learning algorithm that protects data by employing a generator to transform the original data into adversarial data with quasi-imperceptible perturbations before release. If the data source is infiltrated, the adversarial data inside it can confuse retrieval models under the attacker’s control into making erroneous predictions. We consider protection under a realistic and challenging setting in which no prior knowledge of the malicious models is available. To handle this, a surrogate retrieval model is introduced to act as the target to fool. The whole network is trained under a game-theoretical framework in which the generator and the retrieval model continually evolve against each other. To facilitate the optimization, a Gradient Reversal Layer module is inserted between the two models, enabling one-step learning. Extensive experiments on widely used realistic datasets demonstrate the effectiveness of the proposed method.
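
The role of the Gradient Reversal Layer can be illustrated with a small sketch. This is not the authors' implementation: it assumes a scalar toy "generator" (parameter `theta`), a linear toy "surrogate model" (parameter `w`), and a squared-error loss, and the names `GradientReversal` and `one_step` are hypothetical. It relies only on the general property of a gradient reversal layer, namely that it is the identity in the forward pass and multiplies the gradient by a negative constant in the backward pass, so a single backward pass lets the model descend the loss while the generator ascends it.

```python
import math
from dataclasses import dataclass


@dataclass
class GradientReversal:
    """Identity in the forward pass; negates and scales the gradient on the way back."""
    lam: float = 1.0  # reversal strength (assumed hyperparameter)

    def forward(self, x: float) -> float:
        return x

    def backward(self, grad_output: float) -> float:
        return -self.lam * grad_output


def one_step(theta, w, x, y, grl, lr=0.1, eps=0.1):
    """One joint update of a toy generator (theta) and a toy surrogate model (w)."""
    # Toy generator: a perturbation bounded in (-eps, eps), added to the input.
    t = math.tanh(theta)
    delta = eps * t
    x_adv = grl.forward(x + delta)

    # Toy surrogate model: linear score with a squared-error loss.
    score = w * x_adv
    loss = (score - y) ** 2

    # Backward pass, written out by hand.
    d_score = 2.0 * (score - y)
    grad_w = d_score * x_adv       # gradient for the surrogate model
    grad_x_adv = d_score * w       # gradient w.r.t. the adversarial input
    # The gradient is reversed before it reaches the generator parameter.
    grad_theta = grl.backward(grad_x_adv) * eps * (1.0 - t * t)

    # The same descent rule is applied to both players in one step:
    # the model lowers the loss, the generator (via the reversal) raises it.
    return theta - lr * grad_theta, w - lr * grad_w, loss


if __name__ == "__main__":
    theta, w = 0.0, 1.0
    grl = GradientReversal(lam=1.0)
    for step in range(5):
        theta, w, loss = one_step(theta, w, x=1.0, y=2.0, grl=grl)
        print(step, round(theta, 4), round(w, 4), round(loss, 4))
```

Without the reversal, the two players would need alternating updates with opposite signs; placing the layer between them is what allows both to be updated by one ordinary descent step in this sketch.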

Index Terms

Proactive Privacy-preserving Learning for Cross-modal Retrieval
1. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Multimedia and multimodal retrieval

Published in
ACM Transactions on Information Systems Volume 41, Issue 2
April 2023
770 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/3568971
Editor:
Min Zhang
Tsinghua University, China
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 25 January 2023
- Online AM: 28 June 2022
- Accepted: 5 June 2022
- Revised: 28 April 2022
- Received: 28 November 2021
Author Tags
Privacy protection
cross-modal retrieval
deep learning
adversarial data
Qualifiers
- research-article
- Refereed
Article Metrics
- Total Citations: 1
- Total Downloads: 1,188
- Downloads (Last 12 months): 783
- Downloads (Last 6 weeks): 78