Abstract
The widespread growth of multimodal news requires sophisticated approaches to interpret content and relations of different modalities. Images are of utmost importance since they represent a visual gist of the whole news article. For example, it is essential to identify the locations of natural disasters for crisis management or to analyze political or social events across the world. In some cases, verifying the location(s) claimed in a news article might help human assessors or fact-checking efforts to detect misinformation, i.e., fake news. Existing methods for geolocation estimation typically consider only a single modality, e.g., images or text. However, news images can lack sufficient geographical cues to estimate their locations, and the text can refer to various possible locations. In this paper, we propose a novel multimodal approach to predict the geolocation of news photos. To enable this approach, we introduce a novel dataset called Multimodal Geolocation Estimation of News Photos (MMG-NewsPhoto). MMG-NewsPhoto is, so far, the largest dataset for the given task and contains more than half a million news texts with the corresponding image, out of which 3000 photos were manually labeled for the photo geolocation based on information from the image-text pairs. For a fair comparison, we optimize and assess state-of-the-art methods using the new benchmark dataset. Experimental results show the superiority of the multimodal models compared to the unimodal approaches.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
Source code & dataset: https://github.com/TIBHannover/mmg-newsphoto.
References
Armitage, J., Kacupaj, E., Tahmasebzadeh, G., Swati, Maleshkova, M., Ewerth, R., Lehmann, J.: MLM: a benchmark dataset for multitask learning with multiple languages and modalities. In: International Conference on Information and Knowledge Management, CIKM, pp. 2967–2974. ACM (2020). https://doi.org/10.1145/3340531.3412783
Avrithis, Y., Kalantidis, Y., Tolias, G., Spyrou, E.: Retrieving landmark and non-landmark images from community photo collections. In: International Conference on Multimedia, MM, pp. 153–162. ACM (2010). https://doi.org/10.1145/1873951.1873973
Ba, L.J., Kiros, J.R., Hinton, G.E.: Layer normalization. CoRR (2016). http://arxiv.org/abs/1607.06450
Baatz, G., Saurer, O., Köser, K., Pollefeys, M.: Large scale visual geo-localization of images in mountainous terrain. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7573, pp. 517–530. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33709-3_37
Berton, G.M., Masone, C., Caputo, B.: Rethinking visual geo-localization for large-scale applications. In: Conference on Computer Vision and Pattern Recognition, CVPR, pp. 4868–4878. IEEE (2022). https://doi.org/10.1109/CVPR52688.2022.00483
Biten, A.F., Gómez, L., Rusiñol, M., Karatzas, D.: Good news, everyone! context driven entity-aware captioning for news images. In: Conference on Computer Vision and Pattern Recognition, CVPR, pp. 12466–12475. Computer Vision Foundation/IEEE (2019). http://openaccess.thecvf.com/content_CVPR_2019/html/Biten_Good_News_Everyone_Context_Driven_Entity-Aware_Captioning_for_News_Images_CVPR_2019_paper.html
Boiarov, A., Tyantov, E.: Large scale landmark recognition via deep metric learning. In: Zhu, W., et al. (eds.) Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM, pp. 169–178. ACM (2019). https://doi.org/10.1145/3357384.3357956
Brank, J., Leban, G., Grobelnik, M.: Semantic annotation of documents based on wikipedia concepts. Informatica (Slovenia) (2018). http://www.informatica.si/index.php/informatica/article/view/2228
Brejcha, J., Čadík, M.: State-of-the-art in visual geo-localization. Pattern Anal. Appl. 20(3), 613–637 (2017). https://doi.org/10.1007/s10044-017-0611-1
Cheng, J., Wu, Y., AbdAlmageed, W., Natarajan, P.: QATM: quality-aware template matching for deep learning. In: Conference on Computer Vision and Pattern Recognition, CVPR. pp. 11553–11562. Computer Vision Foundation/IEEE (2019). http://openaccess.thecvf.com/content_CVPR_2019/html/Cheng_QATM_Quality-Aware_Template_Matching_for_Deep_Learning_CVPR_2019_paper.html
Crandall, D.J., Backstrom, L., Huttenlocher, D.P., Kleinberg, J.M.: Mapping the world’s photos. In: International Conference on World Wide Web, WWW, pp. 761–770. ACM (2009). https://doi.org/10.1145/1526709.1526812
Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, pp. 4171–4186. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/n19-1423
Hays, J., Efros, A.A.: IM2GPS: estimating geographic information from a single image. In: Conference on Computer Vision and Pattern Recognition, CVPR. IEEE Computer Society (2008)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition, CVPR, pp. 770–778. IEEE Computer Society (2016). https://doi.org/10.1109/CVPR.2016.90
Honnibal, M., Montani, I.: spaCy 2: Natural language understanding with bloom embeddings, convolutional neural networks and incremental parsing (2017). https://spacy.io
Izbicki, M., Papalexakis, E.E., Tsotras, V.J.: Exploiting the earth’s spherical geometry to geolocate images. In: European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD, pp. 3–19. Springer (2019). https://doi.org/10.1007/978-3-030-46147-8_1
Kim, H.J., Dunn, E., Frahm, J.: Learned contextual feature reweighting for image geo-localization. In: Conference on Computer Vision and Pattern Recognition, CVPR, pp. 3251–3260. IEEE Computer Society (2017). https://doi.org/10.1109/CVPR.2017.346
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: International Conference on Learning Representations, ICLR (2015). http://arxiv.org/abs/1412.6980
Kordopatis-Zilos, G., Galopoulos, P., Papadopoulos, S., Kompatsiaris, I.: Leveraging efficientnet and contrastive learning for accurate global-scale location estimation. In: International Conference on Multimedia Retrieval, ICMR, pp. 155–163. ACM (2021). https://doi.org/10.1145/3460426.3463644
Kordopatis-Zilos, G., Papadopoulos, S., Kompatsiaris, I.: Geotagging text content with language models and feature mining. Proc. IEEE, 1971–1986 (2017). https://doi.org/10.1109/JPROC.2017.2688799
Kordopatis-Zilos, G., Popescu, A., Papadopoulos, S., Kompatsiaris, Y.: Placing images with refined language models and similarity search with pca-reduced VGG features. In: MediaEval 2016 Workshop. CEUR-WS.org (2016). http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_13.pdf
Krippendorff, K.: Computing krippendorff’s alpha-reliability (2011). https://repository.upenn.edu/asc_papers/43
Larson, M.A., Soleymani, M., Gravier, G., Ionescu, B., Jones, G.J.F.: The benchmarking initiative for multimedia evaluation: Mediaeval 2016. IEEE MultiMedia, 93–96 (2017). https://doi.org/10.1109/MMUL.2017.9
Mackenzie, J.M., Benham, R., Petri, M., Trippas, J.R., Culpepper, J.S., Moffat, A.: CC-News-En: A large english news corpus. In: International Conference on Information and Knowledge Management, CIKM, pp. 3077–3084. ACM (2020). https://doi.org/10.1145/3340531.3412762
Müller-Budack, E., Pustu-Iren, K., Ewerth, R.: Geolocation estimation of photos using a hierarchical model and scene classification. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11216, pp. 575–592. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01258-8_35
Müller-Budack, E., Theiner, J., Diering, S., Idahl, M., Ewerth, R.: Multimodal analytics for real-world news using measures of cross-modal entity consistency. In: International Conference on Multimedia Retrieval, ICMR, pp. 16–25. ACM (2020). https://doi.org/10.1145/3372278.3390670
Müller-Budack, E., Theiner, J., Diering, S., Idahl, M., Hakimov, S., Ewerth, R.: Multimodal news analytics using measures of cross-modal entity and context consistency. Int. J. Multimed. Inf. Retrieval 10(2), 111–125 (2021). https://doi.org/10.1007/s13735-021-00207-4
Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Fürnkranz, J., Joachims, T. (eds.) International Conference on Machine Learning (ICML), pp. 807–814. Omnipress (2010). https://icml.cc/Conferences/2010/papers/432.pdf
Nominatim. https://nominatim.org/release-docs/latest/api/Reverse/. Accessed 19 May 2022
Radford, A., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, ICML, pp. 8748–8763. PMLR (2021). http://proceedings.mlr.press/v139/radford21a.html
Ramisa, A., Yan, F., Moreno-Noguer, F., Mikolajczyk, K.: Breakingnews: Article annotation by image and text processing. IEEE Trans. Pattern Anal. Mach. Intell., 1072–1085 (2018). https://doi.org/10.1109/TPAMI.2017.2721945
Seo, P.H., Weyand, T., Sim, J., Han, B.: CPlaNet: enhancing image geolocalization by combinatorial partitioning of maps. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11214, pp. 544–560. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01249-6_33
Serdyukov, P., Murdock, V., van Zwol, R.: Placing flickr photos on a map. In: SIGIR Conference on Research and Development in Information Retrieval, SIGIR, pp. 484–491. ACM (2009). https://doi.org/10.1145/1571941.1572025
Singhal, S., Shah, R.R., Chakraborty, T., Kumaraguru, P., Satoh, S.: Spotfake: a multi-modal framework for fake news detection. In: IEEE International Conference on Multimedia Big Data, BigMM, pp. 39–47. IEEE (2019). https://doi.org/10.1109/BigMM.2019.00-44
Theiner, J., Müller-Budack, E., Ewerth, R.: Interpretable semantic photo geolocation. In: Winter Conference on Applications of Computer Vision, WACV, pp. 1474–1484. IEEE (2022). https://doi.org/10.1109/WACV51458.2022.00154
Thomee, B., et al.: The new data and new challenges in multimedia research. CoRR (2015). http://arxiv.org/abs/1503.01817
Tomesek, J., Cadík, M., Brejcha, J.: Crosslocate: cross-modal large-scale visual geo-localization in natural environments using rendered modalities. In: Winter Conference on Applications of Computer Vision, WACV, pp. 2193–2202. IEEE (2022). https://doi.org/10.1109/WACV51458.2022.00225
Trevisiol, M., Jégou, H., Delhumeau, J., Gravier, G.: Retrieving geo-location of videos with a divide & conquer hierarchical multimodal approach. In: International Conference on Multimedia Retrieval, ICMR, pp. 1–8. ACM (2013). https://doi.org/10.1145/2461466.2461468
Uzkent, B., et al.: Learning to interpret satellite images using Wikipedia. In: International Joint Conference on Artificial Intelligence, IJCAI, pp. 3620–3626. ijcai.org (2019). https://doi.org/10.24963/ijcai.2019/502
Vo, N.N., Jacobs, N., Hays, J.: Revisiting IM2GPS in the deep learning era. In: International Conference on Computer Vision, ICCV, pp. 2640–2649. IEEE Computer Society (2017)
Vrandecic, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM, 78–85 (2014). https://doi.org/10.1145/2629489
Weyand, T., Araujo, A., Cao, B., Sim, J.: Google landmarks dataset v2 - a large-scale benchmark for instance-level recognition and retrieval. In: Conference on Computer Vision and Pattern Recognition, CVPR, pp. 2572–2581. IEEE (2020). https://doi.org/10.1109/CVPR42600.2020.00265
Weyand, T., Kostrikov, I., Philbin, J.: PlaNet - photo geolocation with convolutional neural networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 37–55. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_3
Acknowledgements
This work was partially funded by the EU Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement no. 812997 (CLEOPATRA ITN), and by the Ministry of Lower Saxony for Science and Culture (Responsible AI in digital society, project no. 51171145).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tahmasebzadeh, G., Hakimov, S., Ewerth, R., Müller-Budack, E. (2023). Multimodal Geolocation Estimation of News Photos. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13981. Springer, Cham. https://doi.org/10.1007/978-3-031-28238-6_14
Download citation
DOI: https://doi.org/10.1007/978-3-031-28238-6_14
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28237-9
Online ISBN: 978-3-031-28238-6
eBook Packages: Computer ScienceComputer Science (R0)