doi:10.1016/j.compenvurbsys.2005.08.001
Copyright © 2005 Elsevier Ltd. All rights reserved.
Web-based delineation of imprecise regions
Avi Arampatzis (a), Marc van Kreveld (a), Iris Reinbacher (a), Christopher B. Jones (b), Subodh Vaid (b), Paul Clough (c), Hideo Joho (c) and Mark Sanderson (c)
(a) Institute of Information and Computing Sciences, Utrecht University, P.O. Box 80.089, 3508TB Utrecht, The Netherlands
(b) School of Computer Science, Cardiff University, UK
(c) Department of Information Studies, University of Sheffield, UK
Received 18 January 2005; accepted 18 July 2005. Available online 16 November 2005.
Abstract
This paper describes several steps in deriving the boundaries of imprecise regions using the Web as the information source. We discuss how to use the Web to obtain locations that belong to the region to be delineated and locations that do not, and then propose methods to compute the region algorithmically. The methods introduced are evaluated to judge the potential of the approach.
Keywords: Geographical information systems (GIS); World-Wide Web (WWW); Imprecise regions; Fuzzy boundaries; Geometric algorithms
Fig. 1. Example Google search result for “* is located in the Midlands”.
Fig. 2. α-Shape of a set of red points (circles) and its adaptation so that a blue point (square) is no longer inside.
Fig. 3. Construction illustrating how a polygon is adapted so that the blue point p is no longer inside.
Fig. 4. Delaunay triangulation of a set of red and blue points, and a polygon that separates them by connecting midpoints of Delaunay edges.
Fig. 5. Illustration of the green angle of four of the points.
Fig. 6. The polygon obtained by two recolorings of the points in Fig. 5.
Fig. 7. Number of snippets and those containing at least one correct location (useful) by region.
Fig. 8. Number of snippets and those containing at least one correct location (useful) by trigger phrase.
Fig. 9. Different values of α considerably affect the boundary obtained by the α-shape algorithm. Shown are the outcomes for Wales with α = 315, α = 400, α = 600, and α = 700.
Fig. 10. Different values of α considerably affect the boundary obtained by the α-shape algorithm. Shown are the outcomes for the Midlands with α = 315, α = 400, α = 600, and α = 700.
Fig. 11. Delineated polygon for Wales before recoloring, and the outcome of the recoloring algorithm with angles 185, 215, and 260.
Fig. 12. Delineated polygon for the Midlands before recoloring, and the outcome of the recoloring algorithm with angles 185, 215, and 260.
Fig. 13. Delineated polygon for East Anglia with the α-shape algorithm (α = 600) and the recoloring method (angle 215).
Fig. 14. Delineated polygon for South East with the adaptation method (α = 600) and the recoloring method (angle 215).
Table 1. Trigger phrases used to identify geo-references

Table 2. Evaluation results for geo-parsing

Where C = correct; PC = partially correct; M = missing; FP = false positives; F1 strict = F1 computed using correct; F1 lenient = F1 computed using correct and partially correct; F1 avg = average of F1 strict and F1 lenient.
Table 3. Number of locations identified which are region members and ambiguous (using full IE)

Table 4. Locations extracted from the local context of the target region (the sentence)

Table 5. Top 20 locations extracted from the Google snippets (sentence only) and titles ranked by ascending order of frequency
