research-article

"Seeing is believing: the quest for multimodal knowledge" by Gerard de Melo and Niket Tandon, with Martin Vesely as coordinator

Published: 13 April 2016

Abstract

There is a growing conviction that the future of computing will crucially depend on our ability to better exploit data to produce more intelligent systems. Increasingly, this will involve drawing simultaneously on multiple heterogeneous modalities, to take full advantage of the vast quantities of images and videos now available on the Web and elsewhere. We give several examples of methods that leverage prior knowledge for better, more semantically informed visual analytics, as well as methods that use multimodal data for better textual analytics. Important progress may come from approaches specifically geared towards harvesting rich multimodal knowledge. For example, our Knowlywood system relies on Hollywood movies to learn about human activities. Once acquired, knowledge of this sort can then be re-used across different tasks, much like humans draw on their accumulated knowledge when making sense of the world.

  • Published in

    ACM SIGWEB Newsletter, Volume 2016, Issue Spring (Spring 2016)
    23 pages
    ISSN: 1931-1745
    EISSN: 1931-1435
    DOI: 10.1145/2903513

    Copyright © 2016 held by the owner/author(s).

    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States
