Multimodal Geolocation Estimation of News Photos

  • Conference paper
Advances in Information Retrieval (ECIR 2023)

Abstract

The widespread growth of multimodal news requires sophisticated approaches to interpret content and relations of different modalities. Images are of utmost importance since they represent a visual gist of the whole news article. For example, it is essential to identify the locations of natural disasters for crisis management or to analyze political or social events across the world. In some cases, verifying the location(s) claimed in a news article might help human assessors or fact-checking efforts to detect misinformation, i.e., fake news. Existing methods for geolocation estimation typically consider only a single modality, e.g., images or text. However, news images can lack sufficient geographical cues to estimate their locations, and the text can refer to various possible locations. In this paper, we propose a novel multimodal approach to predict the geolocation of news photos. To enable this approach, we introduce a novel dataset called Multimodal Geolocation Estimation of News Photos (MMG-NewsPhoto). MMG-NewsPhoto is, so far, the largest dataset for the given task and contains more than half a million news texts with the corresponding image, out of which 3000 photos were manually labeled for the photo geolocation based on information from the image-text pairs. For a fair comparison, we optimize and assess state-of-the-art methods using the new benchmark dataset. Experimental results show the superiority of the multimodal models compared to the unimodal approaches.

Notes

  1. Source code & dataset: https://github.com/TIBHannover/mmg-newsphoto

Acknowledgements

This work was partially funded by the EU Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement no. 812997 (CLEOPATRA ITN), and by the Ministry of Lower Saxony for Science and Culture (Responsible AI in digital society, project no. 51171145).

Author information

Corresponding author

Correspondence to Golsa Tahmasebzadeh.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tahmasebzadeh, G., Hakimov, S., Ewerth, R., Müller-Budack, E. (2023). Multimodal Geolocation Estimation of News Photos. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13981. Springer, Cham. https://doi.org/10.1007/978-3-031-28238-6_14

  • DOI: https://doi.org/10.1007/978-3-031-28238-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-28237-9

  • Online ISBN: 978-3-031-28238-6

  • eBook Packages: Computer Science, Computer Science (R0)
