Abstract
Real estate ads are a rich source of information for studying social representations of residential space. However, extracting knowledge from them poses methodological challenges, notably regarding their spatial content. Artificial intelligence techniques for finding and extracting knowledge and relationships from textual data improve on classical Natural Language Processing (NLP) approaches. This paper first conceptualizes what kind of information on urban space can be targeted in real estate ads. It then proposes an automated protocol based on artificial intelligence to extract named entities and the relationships among them. The extracted information is finally modeled as RDF graphs and queried through GeoSPARQL. Initial results are presented from a case study of real estate ads on the French Riviera, with a focus on toponymy. Perspectives for quantitative spatial analysis of the geolocated RDF models of real estate ads are also highlighted.
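To make the pipeline outlined in the abstract more concrete, the following minimal sketch goes from an ad text to RDF triples and a GeoSPARQL query. It is an illustration under stated assumptions, not the protocol of the paper: a simple gazetteer and regular-expression lookup stands in for the AI-based named entity and relation extraction, and the gazetteer entries, approximate coordinates, relation cues, `ex:` predicates and `http://example.org/` namespace are all hypothetical. Only the GeoSPARQL terms (`geo:hasGeometry`, `geo:asWKT`, `geo:wktLiteral`, `geof:distance`) come from the OGC standard.

```python
import re

# Hypothetical mini-gazetteer: toponym -> WKT point (lon lat, approximate values).
GAZETTEER = {
    "Nice": "POINT(7.26 43.70)",
    "Antibes": "POINT(7.12 43.58)",
}

# Hypothetical spatial-relation cues mapped to illustrative predicate names.
RELATIONS = {"in": "locatedIn", "near": "near", "close to": "near"}

EX = "http://example.org/"                      # placeholder namespace
GEO = "http://www.opengis.net/ont/geosparql#"   # GeoSPARQL vocabulary


def extract_relations(text):
    """Return (predicate, toponym) pairs for '<cue> <toponym>' patterns.

    Cues are matched case-insensitively; toponyms are matched case-sensitively
    so that 'in nice condition' is not mistaken for the city of Nice.
    """
    cues = "|".join(sorted(map(re.escape, RELATIONS), key=len, reverse=True))
    places = "|".join(map(re.escape, GAZETTEER))
    pattern = re.compile(rf"\b((?i:{cues}))\s+({places})\b")
    return [(RELATIONS[m.group(1).lower()], m.group(2))
            for m in pattern.finditer(text)]


def to_ntriples(ad_id, relations):
    """Serialize extracted relations as N-Triples lines with WKT geometries."""
    ad = f"<{EX}ad/{ad_id}>"
    lines = []
    for predicate, toponym in relations:
        place = f"<{EX}place/{toponym}>"
        geom = f"<{EX}geom/{toponym}>"
        lines.append(f"{ad} <{EX}{predicate}> {place} .")
        lines.append(f"{place} <{GEO}hasGeometry> {geom} .")
        lines.append(
            f'{geom} <{GEO}asWKT> "{GAZETTEER[toponym]}"^^<{GEO}wktLiteral> .'
        )
    return lines


# Example GeoSPARQL query (to be run on a GeoSPARQL-capable triple store):
# ads described as "near" a place lying within 5 km of a reference point.
QUERY = """
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX uom:  <http://www.opengis.net/def/uom/OGC/1.0/>
PREFIX ex:   <http://example.org/>

SELECT ?ad ?place WHERE {
  ?ad ex:near ?place .
  ?place geo:hasGeometry/geo:asWKT ?wkt .
  FILTER(geof:distance(?wkt, "POINT(7.26 43.70)"^^geo:wktLiteral, uom:metre) < 5000)
}
"""

if __name__ == "__main__":
    ad_text = "Flat in Antibes, close to Nice, in nice condition."
    for line in to_ntriples("1", extract_relations(ad_text)):
        print(line)
```

The case-sensitive toponym match is a deliberately crude way of handling ambiguity between place names and ordinary words; a trained NER model would be expected to resolve such cases from context instead.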
Acknowledgement
This research was carried out thanks to a research grant by KCityLabs, KINAXIA Group (CIFRE Agreement with UMR ESPACE).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Blanchi, A., Fusco, G., Emsellem, K., Cadorel, L. (2022). Studying Urban Space from Textual Data: Toward a Methodological Protocol to Extract Geographic Knowledge from Real Estate Ads. In: Gervasi, O., Murgante, B., Misra, S., Rocha, A.M.A.C., Garau, C. (eds) Computational Science and Its Applications – ICCSA 2022 Workshops. ICCSA 2022. Lecture Notes in Computer Science, vol 13378. Springer, Cham. https://doi.org/10.1007/978-3-031-10562-3_37
DOI: https://doi.org/10.1007/978-3-031-10562-3_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10561-6
Online ISBN: 978-3-031-10562-3
eBook Packages: Computer Science, Computer Science (R0)