Multimodal Geolocation Estimation of News Photos

Tahmasebzadeh, Golsa; Hakimov, Sherzod; Ewerth, Ralph; Müller-Budack, Eric

doi:10.1007/978-3-031-28238-6_14

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13981)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

The rapid growth of multimodal news requires sophisticated approaches to interpret the content and relations of different modalities. Images are of utmost importance since they convey the visual gist of the whole news article. For example, it is essential to identify the locations of natural disasters for crisis management or to analyze political or social events across the world. In some cases, verifying the location(s) claimed in a news article can help human assessors and fact-checking efforts detect misinformation, i.e., fake news. Existing methods for geolocation estimation typically consider only a single modality, e.g., images or text. However, news images can lack sufficient geographical cues to estimate their locations, and the text can refer to various possible locations. In this paper, we propose a novel multimodal approach to predict the geolocation of news photos. To enable this approach, we introduce a novel dataset called Multimodal Geolocation Estimation of News Photos (MMG-NewsPhoto). MMG-NewsPhoto is, so far, the largest dataset for the given task and contains more than half a million news texts with their corresponding images, of which 3000 photos were manually labeled for photo geolocation based on information from the image-text pairs. For a fair comparison, we optimize and assess state-of-the-art methods using the new benchmark dataset. Experimental results show that the multimodal models outperform the unimodal approaches.

Notes

1. Source code & dataset: https://github.com/TIBHannover/mmg-newsphoto

Acknowledgements

This work was partially funded by the EU Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement no. 812997 (CLEOPATRA ITN), and by the Ministry of Lower Saxony for Science and Culture (Responsible AI in digital society, project no. 51171145).

Author information

Authors and Affiliations

TIB–Leibniz Information Centre for Science and Technology, Hannover, Germany
Golsa Tahmasebzadeh, Ralph Ewerth & Eric Müller-Budack
L3S Research Center, Leibniz University Hannover, Hannover, Germany
Golsa Tahmasebzadeh, Ralph Ewerth & Eric Müller-Budack
Computational Linguistics, University of Potsdam, Potsdam, Germany
Sherzod Hakimov

Authors

Golsa Tahmasebzadeh
Sherzod Hakimov
Ralph Ewerth
Eric Müller-Budack

Corresponding author

Correspondence to Golsa Tahmasebzadeh.

Editor information

Editors and Affiliations

University of Amsterdam, Amsterdam, The Netherlands
Jaap Kamps
Université Grenoble-Alpes, Saint-Martin-d’Hères, France
Lorraine Goeuriot
Università della Svizzera Italiana, Lugano, Switzerland
Fabio Crestani
University of Copenhagen, Copenhagen, Denmark
Maria Maistro
University of Tsukuba, Ibaraki, Japan
Hideo Joho
Dublin City University, Dublin, Ireland
Brian Davis
Dublin City University, Dublin, Ireland
Cathal Gurrin
Universität Regensburg, Regensburg, Germany
Udo Kruschwitz
Dublin City University, Dublin, Ireland
Annalina Caputo

About this paper

Cite this paper

Tahmasebzadeh, G., Hakimov, S., Ewerth, R., Müller-Budack, E. (2023). Multimodal Geolocation Estimation of News Photos. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13981. Springer, Cham. https://doi.org/10.1007/978-3-031-28238-6_14

DOI: https://doi.org/10.1007/978-3-031-28238-6_14
Published: 17 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28237-9
Online ISBN: 978-3-031-28238-6
eBook Packages: Computer Science, Computer Science (R0)
