Extracting Visual Knowledge from the Internet: Making Sense of Image Data

  • Conference paper
MultiMedia Modeling (MMM 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9516)

Abstract

Recent successes in visual recognition can be attributed primarily to feature representation, learning algorithms, and the ever-increasing size of labeled training data. Extensive research has been devoted to the first two, but much less attention has been paid to the third. Because manual data labeling is costly, even recent efforts such as ImageNet remain relatively small compared with the needs of everyday applications. In this work, we focus on how to automatically generate identifying image data for a given visual concept on a vast scale.

With the generated image data, we can train a robust recognition model for the given concept. We evaluate the proposed webly supervised approach on the benchmark Pascal VOC 2007 dataset, and the results demonstrate the superiority of our method over many other state-of-the-art methods in image data collection.

Author information

Corresponding author

Correspondence to Yazhou Yao.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Yao, Y., Zhang, J., Hua, XS., Shen, F., Tang, Z. (2016). Extracting Visual Knowledge from the Internet: Making Sense of Image Data. In: Tian, Q., Sebe, N., Qi, GJ., Huet, B., Hong, R., Liu, X. (eds) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science, vol 9516. Springer, Cham. https://doi.org/10.1007/978-3-319-27671-7_72

  • DOI: https://doi.org/10.1007/978-3-319-27671-7_72

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27670-0

  • Online ISBN: 978-3-319-27671-7

  • eBook Packages: Computer Science, Computer Science (R0)
