
"Seeing is believing: the quest for multimodal knowledge" by Gerard de Melo and Niket Tandon, with Martin Vesely as coordinator

Authors:
Gerard de Melo, IIIS, Tsinghua University
Niket Tandon, Max Planck Institute for Informatics

ACM SIGWEB Newsletter, Volume 2016, Issue Spring (Spring 2016), Article No. 4, pp. 1–9. https://doi.org/10.1145/2903513.2903517

Published: 13 April 2016


Abstract

There is a growing conviction that the future of computing will crucially depend on our ability to better exploit data to produce more intelligent systems. Increasingly, this will involve drawing simultaneously on multiple heterogeneous modalities, to take full advantage of the vast quantities of images and videos now available on the Web and elsewhere. We give several examples of methods that leverage prior knowledge for better, more semantically informed visual analytics, as well as methods that use multimodal data for better textual analytics. Important progress may come from approaches specifically geared towards harvesting rich multimodal knowledge. For example, our Knowlywood system relies on Hollywood movies to learn about human activities. Once acquired, knowledge of this sort can then be re-used across different tasks, much like humans draw on their accumulated knowledge when making sense of the world.



Published in
ACM SIGWEB Newsletter, Volume 2016, Issue Spring (Spring 2016), 23 pages
ISSN: 1931-1745
EISSN: 1931-1435
DOI: 10.1145/2903513
Editor: Jessica Rubart
Copyright © 2016. Copyright is held by the owner/author(s).
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States