Extracting Visual Knowledge from the Internet: Making Sense of Image Data

Yao, Yazhou; Zhang, Jian; Hua, Xian-Sheng; Shen, Fumin; Tang, Zhenmin

doi:10.1007/978-3-319-27671-7_72

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9516)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Recent successes in visual recognition can be attributed primarily to three factors: feature representation, learning algorithms, and the ever-increasing size of labeled training data. Extensive research has been devoted to the first two, but much less attention has been paid to the third. Because manual data labeling is expensive, even recent large-scale efforts such as ImageNet remain relatively small with respect to the needs of everyday applications. In this work, we focus on how to automatically generate image data that represent a given visual concept at a vast scale.

With the generated image data, we can train a robust recognition model for the given concept. We evaluate the proposed webly supervised approach on the benchmark Pascal VOC 2007 dataset. The results demonstrate the superiority of our method over many other state-of-the-art image data collection methods.

Author information

Authors and Affiliations

Advanced Analytics Institute, University of Technology, Sydney, Australia
Yazhou Yao & Jian Zhang
Nanjing University of Science and Technology, Nanjing, China
Yazhou Yao & Zhenmin Tang
Alibaba Group, Hangzhou, China
Xian-Sheng Hua
University of Electronic Science and Technology of China, Chengdu, China
Fumin Shen

Corresponding author

Correspondence to Yazhou Yao.

Editor information

Editors and Affiliations

University of Texas at San Antonio, San Antonio, USA
Qi Tian
Dept. of Information Engineering, University of Trento, Povo, Trento, Italy
Nicu Sebe
EECS, University of Central Florida, Orlando, Florida, USA
Guo-Jun Qi
EURECOM, Sophia-Antipolis, France
Benoit Huet
Hefei University of Technology, Hefei, Anhui, China
Richang Hong
School of Computing and Information, Hefei University of Technology, Hefei, Anhui, China
Xueliang Liu

About this paper

Cite this paper

Yao, Y., Zhang, J., Hua, XS., Shen, F., Tang, Z. (2016). Extracting Visual Knowledge from the Internet: Making Sense of Image Data. In: Tian, Q., Sebe, N., Qi, GJ., Huet, B., Hong, R., Liu, X. (eds) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science, vol 9516. Springer, Cham. https://doi.org/10.1007/978-3-319-27671-7_72

DOI: https://doi.org/10.1007/978-3-319-27671-7_72
Published: 03 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27670-0
Online ISBN: 978-3-319-27671-7
eBook Packages: Computer Science, Computer Science (R0)
