Copyright © 2004 Elsevier B.V. All rights reserved.
Bio2X: a rule-based approach for semi-automatic transformation of semi-structured biological data to XML
Received 21 May 2004.
Abstract
Integrating geographically dispersed, heterogeneous, and complex biological databases is a key research area. A successful data integration system needs a simple, self-describing data exchange format. However, many biological databases provide their data as flat files, which are poor exchange formats. XML, by contrast, can serve both as a powerful data model and as a better data exchange format. In this paper, we present Bio2X, a system that transforms flat-file data into highly hierarchical XML data using a rule-based machine learning technique. Bio2X has been fully implemented in Java. Our experiments on transforming real-world biological data demonstrate the effectiveness of the Bio2X approach.
Author Keywords: Flat files; Rule base; Machine learning; XML; Transformer
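To make the idea of rule-driven flat-file-to-XML transformation concrete, the following is a minimal sketch in Java. It is not the Bio2X algorithm: the rules here are written by hand rather than induced by machine learning, and the output is only one level deep. The class name `FlatFileToXml`, the `RULES` table, the element names, and the sample record (in the style of two-letter line codes such as `ID`, `DE`, and `OS`) are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlatFileToXml {
    // Hypothetical hand-written rules: two-letter line code -> XML element name.
    // (Bio2X induces its rules; this fixed table is only an illustration.)
    static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("ID", "identifier");
        RULES.put("DE", "description");
        RULES.put("OS", "organism");
    }

    // Escape the characters that are not allowed in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Convert one flat-file record into a single <entry> element.
    public static String toXml(String record) {
        StringBuilder xml = new StringBuilder("<entry>");
        for (String line : record.split("\n")) {
            if (line.length() < 3) {
                continue; // too short to hold a code and a value (e.g. the "//" terminator)
            }
            String element = RULES.get(line.substring(0, 2));
            if (element == null) {
                continue; // no rule for this line code, so the line is skipped
            }
            String value = escape(line.substring(2).trim());
            xml.append('<').append(element).append('>')
               .append(value)
               .append("</").append(element).append('>');
        }
        return xml.append("</entry>").toString();
    }

    public static void main(String[] args) {
        // Illustrative record, not taken from the paper.
        String record = "ID   CYC_HUMAN\nDE   Cytochrome c\nOS   Homo sapiens\n//";
        System.out.println(toXml(record));
    }
}
```

A fixed lookup table keeps the sketch short. A system of the kind the abstract describes would additionally need rules that split a line's value into nested elements, which this example does not attempt.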
Article Outline
- 1. Introduction
- 2. Structure of biological data
- 3. Design of extraction rules
- 3.1. Overview
- 3.2. Extracting children of the root
- 3.3. Extracting hierarchical structure from values
- 3.4. Disjunctive rules
- 4. Rule induction system
- 5. Experimental results
- 6. Related work
- 7. Conclusions and future work
- References
- Vitae





