
SpatialML: annotation scheme, resources, and evaluation

Language Resources and Evaluation

Abstract

SpatialML is an annotation scheme for marking up references to places in natural language. It covers both named and nominal references to places, grounding them where possible with geo-coordinates, and characterizes relationships among places in terms of a region calculus. A freely available annotation editor has been developed for SpatialML, along with several annotated corpora. Inter-annotator agreement on SpatialML extents is 91.3 F-measure on a corpus of SpatialML-annotated ACE documents released by the Linguistic Data Consortium. Disambiguation agreement on geo-coordinates on ACE is 87.93 F-measure. An automatic tagger for SpatialML extents scores 86.9 F on ACE, while a disambiguator scores 93.0 F on it. Results are also presented for two other corpora. In adapting the extent tagger to new domains, merging the training data from the ACE corpus with annotated data in the new domain provides the best performance.
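The agreement and tagger scores above are reported as F-measures. As a minimal sketch of how such a figure can be computed over annotated extents, the following assumes that an extent is a (start, end) character offset pair and that two extents match only if they are identical. The function name, the sample offsets, and the exact-match criterion are illustrative assumptions, not the scoring procedure used in the article.

```python
def f_measure(reference, candidate):
    """Balanced F-measure between two collections of (start, end) extents.

    Assumption: an extent counts as matched only when both offsets agree.
    """
    reference, candidate = set(reference), set(candidate)
    matched = len(reference & candidate)
    if matched == 0:
        return 0.0
    precision = matched / len(candidate)
    recall = matched / len(reference)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical extents from two annotators on the same document.
annotator_a = [(0, 6), (15, 21), (40, 52), (60, 66)]
annotator_b = [(0, 6), (15, 21), (40, 50), (60, 66), (70, 75)]

# Three extents match exactly: precision is 3/5 and recall is 3/4.
score = f_measure(annotator_a, annotator_b)
print(f"F-measure: {100 * score:.1f}")
```

Treating one annotator as the reference and the other as the candidate is a common convention for agreement scoring; because the balanced F-measure is symmetric in precision and recall, swapping the two roles gives the same value.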

Notes

  1. http://sourceforge.net/projects/spatialml.

  2. Note that even in situations where it is acceptable for a place to be construed as a point, its punctuality is only an abstraction at some level of resolution.

  3. http://projects.ldc.upenn.edu/ace/annotation/2005Tasks.html.

  4. http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ND.html#NDGEOG.

  5. http://www.opengis.net/gml/.

  6. http://www.ontospace.uni-bremen.de/linguisticOntology.html.

  7. http://callisto.mitre.org.

  8. http://gnswww.nga.mil/geonames/GNS/index.jsp.

  9. http://geonames.usgs.gov/pls/gnispublic.

  10. http://www.alexandria.ucsb.edu/downloads/gazprotocol/.

  11. http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T03.

  12. http://www.promedmail.org (we are investigating the possibility of sharing this corpus).

  13. http://www.ice.gov/ (this data can be shared).

  14. http://sourceforge.net/projects/carafe.

  15. http://ccil.org/~cowan/XML/tagsoup/.

  16. In the ProMED study, which was conducted early in the project, LatLongs had to agree exactly as strings, with leading or trailing zeros treated as errors. This scoring accounts for some of the lower performance on ProMED.

  17. http://www.uni-hildesheim.de/logclef/LAGI_TaskGuidelines.html.

  18. On the ACE Mandarin corpus, as a baseline, the entity tagger scores 61.8 F-measure without the benefit of a Chinese place name list feature.
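The effect of the exact-string criterion described in note 16 can be illustrated with a small sketch. It contrasts string equality with numeric equality for coordinate values; the function names and sample coordinates are hypothetical, and the numeric comparison is shown only as one possible alternative, not as the scoring used in the later studies.

```python
from decimal import Decimal


def latlong_equal_as_strings(a, b):
    """Strict criterion: both coordinate strings must be identical."""
    return a == b


def latlong_equal_as_numbers(a, b):
    """Lenient criterion (an assumption): compare the numeric values,
    so leading or trailing zeros do not count as disagreement."""
    return all(Decimal(x) == Decimal(y) for x, y in zip(a, b))


# Hypothetical (latitude, longitude) pairs naming the same point.
gold = ("38.90", "-77.03")
system = ("38.9", "-77.030")

strict = latlong_equal_as_strings(gold, system)    # False: the strings differ
lenient = latlong_equal_as_numbers(gold, system)   # True: the values agree
```

Under the strict criterion a disambiguator is penalized for formatting differences alone, which is consistent with the note's remark that this scoring accounts for some of the lower performance on ProMED.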

References

  • Barker, E., & Purves, R. (2008). A caption annotation system for georeferencing images. In Fifth workshop on geographic information retrieval (GIR’08). ACM 17th Conference on Information and Knowledge Management, Napa, CA, October 30, 2008.

  • Bateman, J. (2008). The long road from spatial language to geospatial information, and the even longer road back: the role of ontological heterogeneity. Invited talk, LREC workshop on methodologies and resources for processing spatial language. http://www.sfbtr8.spatial-cognition.de/SpatialLREC/.

  • Clementini, E., Di Felice, P., & Hernández, D. (1997). Qualitative representation of positional information. Artificial Intelligence, 95(2), 317–356.

  • Cohn, A. G., Bennett, B., Gooday, J., & Gotts, N. M. (1997). Qualitative spatial representation and reasoning with the region connection calculus. GeoInformatica, 1, 275–316.

  • Cristiani, M., & Cohn, A. G. (2002). SpaceML: A mark-up language for spatial knowledge. Journal of Visual Languages and Computing, 13, 97–116.

  • Daume III, H. (2007). Frustratingly easy domain adaptation. In Proceedings of ACL’2007.

  • Egenhofer, M., & Herring, J. (1990). Categorizing binary topological relations between regions, lines, and points in geographic databases. Technical report, Department of Surveying Engineering, University of Maine.

  • Garbin, E., & Mani, I. (2005). Disambiguating toponyms in news. In Proceedings of the human language technology conference and conference on empirical methods in natural language processing (pp. 363–370).

  • Leidner, J. L. (2006). Toponym resolution: A first large-scale comparative evaluation. Research Report EDI-INF-RR-0839.

  • Levinson, S. C. (2006). Space in language and cognition: Explorations in cognitive diversity. Cambridge: Cambridge University Press.

  • Mandl, T., Agosti, M., Di Nunzio, G. M., Yeh, A., Mani, I., Doran, C. et al. (2009). LogCLEF 2009: The CLEF 2009 multilingual logfile analysis track overview. Working notes for the CLEF 2009 workshop, Corfu, Greece. http://clef.isti.cnr.it/2009/working_notes/LogCLEF-2009-Overview-Working-Notes-2009-09-14.pdf.

  • Mardis, S., & Burger, J. (2005). Design for an integrated gazetteer database: Technical description and user guide for a gazetteer to support natural language processing applications. Mitre technical report, MTR 05B0000085. http://www.mitre.org/work/tech_papers/tech_papers_06/06_0375/index.html.

  • Papadias, D., Theodoridis, Y., Sellis, T. K., & Egenhofer, M. J. (1995). Topological relations in the world of minimum bounding rectangles: A study with R-trees. In Proceedings of the 1995 ACM SIGMOD international conference on management of data (pp. 92–103). San Jose, California. May 22–25, 1995.

  • Pustejovsky, J., Ingria, B., Sauri, R., Castano, J., Littman, J., Gaizauskas, R., et al. (2005). The specification language TimeML. In I. Mani, J. Pustejovsky, & R. Gaizauskas (Eds.), The language of time: A reader (pp. 545–557). Oxford: Oxford University Press.

  • Pustejovsky, J., & Moszkowicz, J. L. (2008). Integrating motion predicate classes with spatial and temporal annotations. In Proceedings of COLING 2008: Companion volume—posters and demonstrations (pp. 95–98).

  • Randell, D. A., Cui, Z., & Cohn, A. G. (1992). A spatial logic based on regions and connection. In Proceedings of 3rd international conference on knowledge representation and reasoning, Morgan Kaufmann, San Mateo (pp. 165–176).

  • Rashid, A., Shariff, B. M., Egenhofer, M. J., & Mark, D. M. (1998). Natural-language spatial relations between linear and area objects: The topology and metric of English-language terms. International Journal of Geographical Information Science, 12(3), 215–246.

  • Schilder, F., Versley, Y., & Habel, C. (2004). Extracting spatial information: Grounding, classifying and linking spatial expressions. Workshop on geographic information retrieval at the 27th ACM SIGIR conference, Sheffield, England, UK.

  • Sundheim, B., Mardis, S., & Burger, J. (2006). Gazetteer linkage to WordNet. In The Third International WordNet Conference, South Jeju Island, Korea. http://nlpweb.kaist.ac.kr/gwc/pdf2006/7.pdf.

Acknowledgments

This research has been funded by the MITRE Innovation Program (Public Release Case Number 09-3827). We would like to thank three anonymous reviewers for their comments. We fondly and gratefully remember our late co-author Janet Hitzeman (1962–2009), without whom this work would not have been possible.

Author information

Corresponding author

Correspondence to Inderjeet Mani.

About this article

Cite this article

Mani, I., Doran, C., Harris, D. et al. SpatialML: annotation scheme, resources, and evaluation. Lang Resources & Evaluation 44, 263–280 (2010). https://doi.org/10.1007/s10579-010-9121-0

  • DOI: https://doi.org/10.1007/s10579-010-9121-0
