DOI: 10.1145/3442381.3449801
research-article

Deep Co-Attention Network for Multi-View Subspace Learning

Published: 03 June 2021

ABSTRACT

Many real-world applications involve data from multiple modalities and thus exhibit view heterogeneity. For example, user modeling on social media might leverage both the topology of the underlying social network and the content of the users’ posts; in the medical domain, multiple views could be X-ray images taken at different poses. To date, various techniques, such as methods based on canonical correlation analysis, have been proposed and achieve promising results. Meanwhile, it is critical for decision-makers to be able to understand the prediction results from these methods. For example, given a diagnostic result that a model produced from X-ray images of a patient at different poses, the doctor needs to know why the model made that prediction. However, state-of-the-art techniques usually fail to utilize the complementary information of each view and to explain their predictions in an interpretable manner.

To address these issues, we propose a deep co-attention network for multi-view subspace learning. It aims to extract both the common information and the complementary information in an adversarial setting, and to provide end-users with robust interpretations of its predictions via the co-attention mechanism. In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation by incorporating the classifier into the model. This improves the quality of the latent representation and accelerates convergence. Finally, we develop an efficient iterative algorithm to find the optimal encoders and discriminator, which are evaluated extensively on synthetic and real-world data sets. We also conduct a case study to demonstrate how the proposed method robustly interprets the predictions on an image data set.
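
The abstract does not define the cross reconstruction loss, so the following is only a minimal sketch of the general idea, assuming two views, linear encoders and decoders, and a mean squared error. Each view is rebuilt from the latent code of the other view, which can only succeed if the two codes share information. The function name `cross_reconstruction_loss` and all matrices are hypothetical illustrations, not the paper's formulation.

```python
import numpy as np

def cross_reconstruction_loss(x1, x2, enc1, enc2, dec1, dec2):
    """MSE when each view is rebuilt from the OTHER view's latent code.

    Assumption: linear encoders/decoders; the paper's networks are deep.
    """
    z1 = x1 @ enc1          # latent code of view 1, shape (n, k)
    z2 = x2 @ enc2          # latent code of view 2, shape (n, k)
    x1_from_z2 = z2 @ dec1  # view 1 rebuilt from view 2's code
    x2_from_z1 = z1 @ dec2  # view 2 rebuilt from view 1's code
    return np.mean((x1 - x1_from_z2) ** 2) + np.mean((x2 - x2_from_z1) ** 2)

# Synthetic two-view data generated from one shared latent factor.
rng = np.random.default_rng(0)
n, d1, d2, k = 8, 5, 4, 3
shared = rng.normal(size=(n, k))
a1 = rng.normal(size=(k, d1))   # mixing matrix for view 1
a2 = rng.normal(size=(k, d2))   # mixing matrix for view 2
x1, x2 = shared @ a1, shared @ a2

# Encoders that invert the mixing recover the shared factor exactly,
# so cross reconstruction is (numerically) perfect.
loss_ideal = cross_reconstruction_loss(
    x1, x2, np.linalg.pinv(a1), np.linalg.pinv(a2), a1, a2
)

# Random encoders and decoders give unrelated codes and a large loss.
loss_random = cross_reconstruction_loss(
    x1, x2,
    rng.normal(size=(d1, k)), rng.normal(size=(d2, k)),
    rng.normal(size=(k, d1)), rng.normal(size=(k, d2)),
)
```

In this toy setting the loss is near zero only when both encoders map their views to the same latent code, which is the intuition for why such a term could encourage a common subspace.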

  • Published in

    WWW '21: Proceedings of the Web Conference 2021
    April 2021
    4054 pages
    ISBN:9781450383127
    DOI:10.1145/3442381

    Copyright © 2021 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
