Document information

doi:10.1038/npre.2008.2110.1

Building and Using Geospatial Ontology in the BioCaster Surveillance System

Son Doan1, Quoc-Hung Ngo2, Ai Kawazoe1 & Nigel Collier1


  1. National Institute of Informatics, Japan
  2. University of Information Technology, Vietnam National University (HCM), Vietnam
Document Type:
Poster
Date:
Received 22 July 2008 20:28 UTC; Posted 24 July 2008
Subjects:
Bioinformatics
Abstract:

This abstract presents an approach to building a geospatial ontology from Wikipedia and using it in BioCaster, a system for detecting and tracking infectious disease outbreaks from online news. Motivated by the need to interpret the geospatial dynamics of events, we built a database containing the names of countries and major cities from Wikipedia. We started by automatically extracting country and dependent territory names and sub-country (subdivision and dependent area) names, identified by ISO 3166-1 and ISO 3166-2 codes respectively. We then re-created the part-whole relation between countries and sub-countries by verifying the links from each country to its sub-countries; verification was done by manual checking. The building process is therefore semi-automatic: locations are extracted automatically and verified by a human. In addition, we extracted the absolute longitude and latitude of each location for use in Google Maps and Google Earth applications. Finally, we combined the geospatial hierarchy from Wikipedia with the BioCaster ontology (BCO). The preliminary result is a geospatial ontology with two administrative levels: 243 countries and 4,025 sub-countries. The geospatial ontology has been integrated into the existing BCO, a multilingual public health ontology focusing on infectious diseases, and is available at http://biocaster.nii.ac.jp.
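The two-level structure described above can be illustrated with a minimal sketch. It is not the BioCaster implementation: the class names, the `propose_links` and `confirm_link` helpers, and the idea of deriving candidate links from the ISO 3166-2 code prefix are assumptions made for illustration, and the coordinates are rough values chosen only to fill the fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Location:
    code: str                      # ISO 3166-1 code (country) or ISO 3166-2 code (sub-country)
    name: str
    lat: float                     # absolute latitude in decimal degrees
    lon: float                     # absolute longitude in decimal degrees
    parent: Optional[str] = None   # code of the containing country once the link is confirmed


@dataclass
class GeoOntology:
    locations: Dict[str, Location] = field(default_factory=dict)

    def add(self, loc: Location) -> None:
        self.locations[loc.code] = loc

    def propose_links(self) -> List[Tuple[str, str]]:
        """Automatic step (assumed): an ISO 3166-2 code such as "BR-SP"
        starts with the ISO 3166-1 alpha-2 code of its country."""
        proposals = []
        for code in self.locations:
            prefix, sep, _ = code.partition("-")
            if sep and prefix in self.locations:
                proposals.append((code, prefix))
        return proposals

    def confirm_link(self, child: str, parent: str) -> None:
        """Manual step: record a part-whole link after a human has checked it."""
        self.locations[child].parent = parent

    def children(self, country_code: str) -> List[str]:
        return sorted(
            loc.name for loc in self.locations.values() if loc.parent == country_code
        )


onto = GeoOntology()
onto.add(Location("BR", "Brazil", -14.2, -51.9))        # approximate coordinates
onto.add(Location("BR-SP", "Sao Paulo", -23.5, -46.6))  # approximate coordinates
onto.add(Location("BR-GO", "Goias", -16.0, -49.6))      # approximate coordinates

# Here every proposal is accepted; in a semi-automatic workflow a person
# would inspect each proposed pair before it is confirmed.
for child_code, parent_code in onto.propose_links():
    onto.confirm_link(child_code, parent_code)

print(onto.children("BR"))
```

Keeping the proposal and confirmation steps separate mirrors the split between automatic extraction and human verification.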

The geospatial ontology was used to develop an algorithm for detecting the locations of outbreaks reported in news stories. First, locations in news stories are automatically tagged by a named entity recognizer based on a support vector machine trained on 1,000 manually annotated texts. Second, location names from the text are mapped to identifiers in the geospatial ontology at the country and sub-country levels. Grounding proceeds as follows: we rank disease-location pairs by frequency in a set of collected articles that share similar date stamps, then choose the top-ranked pairs and re-map them onto each news story by regular expression matching. To infer country names where this information is missing from the text, we manually constructed a ranked list of sub-country and country pairs based on population size.
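The grounding steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names, the input shape (one list of already-tagged disease-location pairs per article, all from the same date window), the cut-off `n`, and the contents of `RANKED_COUNTRIES` are hypothetical. The entry for "Exampleville" is a made-up ambiguous name showing how a population-ranked list would break ties.

```python
import re
from collections import Counter
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]  # (disease, location)


def top_pairs(articles: List[List[Pair]], n: int) -> List[Pair]:
    """Rank disease-location pairs by frequency across articles and keep the top n."""
    counts = Counter(pair for article in articles for pair in article)
    return [pair for pair, _ in counts.most_common(n)]


def remap(text: str, pairs: List[Pair]) -> List[Pair]:
    """Keep the pairs whose disease and location both match the story text."""
    found = []
    for disease, location in pairs:
        disease_re = r"\b" + re.escape(disease) + r"\b"
        location_re = r"\b" + re.escape(location) + r"\b"
        if re.search(disease_re, text, re.IGNORECASE) and re.search(
            location_re, text, re.IGNORECASE
        ):
            found.append((disease, location))
    return found


# Hypothetical hand-built table: for each sub-country name, candidate countries
# ordered so that the entry with the largest population comes first.
RANKED_COUNTRIES: Dict[str, List[str]] = {
    "Ha Noi": ["Vietnam"],
    "Kampala": ["Uganda"],
    "Exampleville": ["Country A", "Country B"],  # invented ambiguous name
}


def infer_country(sub_country: str) -> Optional[str]:
    """Pick the top-ranked country when the text does not name one."""
    candidates = RANKED_COUNTRIES.get(sub_country)
    return candidates[0] if candidates else None


articles = [
    [("cholera", "Ha Noi")],
    [("cholera", "Ha Noi"), ("ebola", "Kampala")],
    [("cholera", "Ha Noi")],
]
best = top_pairs(articles, n=1)
story = "Officials in Ha Noi confirmed new cholera cases this week."
grounded = remap(story, best)
print(grounded, infer_country("Ha Noi"))
```

Escaping each name with `re.escape` and wrapping it in word boundaries avoids accidental matches inside longer words, which matters for short location names.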

Data collected over a 10-week period (Dec 20, 2007 to Feb 20, 2008) showed that the system detected 7,412 English articles covering 110 countries and 360 sub-countries, distributed by region as follows: 58.00% Africa, 18.23% Asia, 11.37% South America, 5.30% North America, 3.40% Middle East, 2.86% Europe and 0.34% Oceania. Relevant articles came predominantly from a few sources such as Google News, the European Media Monitor and ProMED-mail. Among the disease/country outbreaks successfully detected during this period were Ebola in Uganda (Bundibugyo, Kampala, Mbarara), yellow fever in Brazil (Goias, Sao Paulo), avian influenza in Indonesia (Jakarta, Banten), and cholera in Vietnam (Ha Noi, Ha Tay).

The results were plotted on a publicly available Google Map and indicate that our geospatial ontology met our requirements. In the future, we plan to extend the ontology to deeper levels such as districts and sub-districts (wards, towns, villages). We will also consider evaluating and comparing our geospatial ontology against other available resources such as GAZ and DBpedia.
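The abstract does not say which file format carried the extracted coordinates to the map. One common choice for Google Earth is KML, so the sketch below assumes it. The function name `to_kml` and the sample point are illustrative, and the Kampala coordinates are approximate.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

KML_NS = "http://www.opengis.net/kml/2.2"


def to_kml(points: Iterable[Tuple[str, float, float]]) -> str:
    """Serialize (name, latitude, longitude) triples as KML placemarks."""
    ET.register_namespace("", KML_NS)  # emit KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in points:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinates are written longitude first, then latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


kml_text = to_kml([("Ebola: Kampala, Uganda", 0.35, 32.58)])  # approximate coordinates
print(kml_text)
```

The longitude-first ordering is easy to get wrong when the stored values are latitude/longitude pairs.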

Presented at:
Bio-Ontologies: Knowledge in Biology 2008, 20 July 2008


Additional information

License:
This document is licensed to the public under the Creative Commons Attribution 3.0 License
How to cite this document:

Doan, Son, Ngo, Quoc-Hung, Kawazoe, Ai, and Collier, Nigel. Building and Using Geospatial Ontology in the BioCaster Surveillance System. Available from Nature Precedings <http://dx.doi.org/10.1038/npre.2008.2110.1> (2008)

