QDex: A Database Profiler for Generic Bio-data Exploration and Quality Aware Integration

Moussouni, F.; Berti-Équille, L.; Rozé, G.; Loréal, O.; Guérin, E.

doi:10.1007/978-3-540-77010-7_2

F. Moussouni¹, L. Berti-Équille², G. Rozé¹, O. Loréal¹ & E. Guérin¹

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4832)

Included in the following conference series:

International Conference on Web Information Systems Engineering


Abstract

In human health and life sciences, researchers extensively collaborate with each other, sharing genomic, biomedical and experimental results. This necessitates dynamically integrating different databases into a single repository or a warehouse. The data integrated in these warehouses are extracted from various heterogeneous sources, having different degrees of quality and trust. Most of the time, they are neither rigorously chosen nor carefully controlled for data quality. Data preparation and data quality metadata are recommended but still insufficiently exploited for ensuring quality and validating the results of information retrieval or data mining techniques.

In previous work, we built a data warehouse called GEDAW (Gene Expression Data Warehouse) that stores various kinds of information: data on genes expressed in the liver during iron overload and liver diseases, relevant information from public databanks (mostly in XML), in-house DNA-chip experiments, and medical records. Drawing on this experience, this paper briefly reports the lessons learned about biomedical data integration and data quality, and the solutions we propose to the numerous problems raised by schema evolution in both the data sources and the warehousing system. In this context, we present QDex, a Quality driven bio-Data Exploration tool, which provides a functional and modular architecture for database profiling and exploration, enabling users to set up query workflows and take advantage of data quality profiling metadata before the complex processes of data integration in the warehouse. An illustration of the QDex tool is then presented.
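The abstract does not say which quality metrics QDex computes. As a minimal, hypothetical sketch of what column-level profiling metadata can look like, the Python below computes completeness (share of non-missing values) and distinctness (share of unique values among the non-missing ones) for each attribute of a set of records. The names `ColumnProfile` and `profile_rows`, the choice of these two metrics, and the treatment of `None` and empty strings as missing are illustrative assumptions, not QDex's actual design.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ColumnProfile:
    """Quality metadata for one attribute (hypothetical metrics, not QDex's)."""
    name: str
    row_count: int
    null_count: int
    distinct_count: int

    @property
    def completeness(self) -> float:
        # Fraction of rows that carry a non-missing value.
        if self.row_count == 0:
            return 1.0
        return 1 - self.null_count / self.row_count

    @property
    def distinctness(self) -> float:
        # Fraction of non-missing values that are unique.
        non_null = self.row_count - self.null_count
        return 0.0 if non_null == 0 else self.distinct_count / non_null


def profile_rows(rows: List[Dict[str, Any]]) -> Dict[str, ColumnProfile]:
    """Profile every column seen in a list of records.

    Assumptions: values are hashable scalars; a missing key, None, or an
    empty string counts as a missing value.
    """
    columns = sorted({key for row in rows for key in row})
    profiles: Dict[str, ColumnProfile] = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None and v != ""]
        profiles[col] = ColumnProfile(
            name=col,
            row_count=len(values),
            null_count=len(values) - len(present),
            distinct_count=len(set(present)),
        )
    return profiles


if __name__ == "__main__":
    # Toy records with made-up values, for illustration only.
    records = [
        {"gene": "gene_A", "annotation": "term_1"},
        {"gene": "gene_B", "annotation": None},
        {"gene": "gene_A", "annotation": ""},
    ]
    for col, p in profile_rows(records).items():
        print(f"{col}: completeness={p.completeness:.2f}, "
              f"distinctness={p.distinctness:.2f}")
```

In a workflow like the one the abstract describes, metadata of this kind would be inspected before integration, for example to spot a sparsely populated annotation column or an identifier column that unexpectedly contains duplicates.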



Author information

Authors and Affiliations

INSERM U522 CHU Pontchaillou, 35033 Rennes, France
F. Moussouni, G. Rozé, O. Loréal & E. Guérin
IRISA, Campus Universitaire de Beaulieu, 35042 Rennes, France
L. Berti-Équille


Editor information

Mathias Weske, Mohand-Saïd Hacid, Claude Godart


About this paper

Cite this paper

Moussouni, F., Berti-Équille, L., Rozé, G., Loréal, O., Guérin, E. (2007). QDex: A Database Profiler for Generic Bio-data Exploration and Quality Aware Integration. In: Weske, M., Hacid, MS., Godart, C. (eds) Web Information Systems Engineering – WISE 2007 Workshops. WISE 2007. Lecture Notes in Computer Science, vol 4832. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77010-7_2


DOI: https://doi.org/10.1007/978-3-540-77010-7_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77009-1
Online ISBN: 978-3-540-77010-7
eBook Packages: Computer Science, Computer Science (R0)
