Proceedings of the IV International research conference "Information technologies in Science, Management, Social sphere and Medicine" (ITSMSSM 2017)

Review and analysis of means and methods for automatic data extraction from heterogeneous sources

Authors
Alexey Samoylov, Alexey Tselykh, Nikolay Sergeev, Margarita Kucherova
Corresponding Author
Alexey Samoylov
Available Online December 2017.
DOI
10.2991/itsmssm-17.2017.43
Keywords
Big data, data extraction, knowledge discovery, data mining, heterogeneous data sources
Abstract

Data analysis is complicated by the need to extract and prepare the data first. The difficulty arises from having to integrate sources that are heterogeneous in both structure and format. The technical solution is to use ETL systems, which automate the extraction, transformation and loading of data into a storage according to strictly defined rules. Current research in this area focuses on increasing performance and on documenting the semantics of the process so that it can be reused. The paper presents the results of a review and analysis of current solutions for extracting large volumes of heterogeneous data.
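The extract, transform and load steps named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the sample data, the field names (`full_name`, `total`), the target table `facts` and all function names are assumptions chosen for illustration, and an in-memory SQLite database stands in for the storage.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample sources: similar records in two different formats.
CSV_SOURCE = "id,name,amount\n1,alice,10.5\n2,bob,7\n"
JSON_SOURCE = '[{"id": 3, "full_name": "carol", "total": "4.25"}]'


def extract_csv(text):
    """Extract: read rows from CSV text as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Extract: read records from a JSON array."""
    return json.loads(text)


def transform(record, name_key, amount_key):
    """Transform: map a source record onto the common (id, name, amount) schema."""
    return (
        int(record["id"]),
        str(record[name_key]).title(),
        float(record[amount_key]),
    )


def load(conn, rows):
    """Load: write the unified rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the three steps against both sources and return the stored rows."""
    conn = sqlite3.connect(":memory:")  # assumed stand-in for the storage
    rows = [transform(r, "name", "amount") for r in extract_csv(CSV_SOURCE)]
    rows += [transform(r, "full_name", "total") for r in extract_json(JSON_SOURCE)]
    load(conn, rows)
    return conn.execute("SELECT id, name, amount FROM facts ORDER BY id").fetchall()


if __name__ == "__main__":
    print(run_etl())
```

The per-source key mapping passed to `transform` plays the role of the "strictly defined rules": each heterogeneous source gets its own rule for reaching the shared schema, while the load step stays the same.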

Copyright
© 2017, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the IV International research conference "Information technologies in Science, Management, Social sphere and Medicine" (ITSMSSM 2017)
Series
Advances in Computer Science Research
Publication Date
December 2017
ISBN
10.2991/itsmssm-17.2017.43
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Alexey Samoylov
AU  - Alexey Tselykh
AU  - Nikolay Sergeev
AU  - Margarita Kucherova
PY  - 2017/12
DA  - 2017/12
TI  - Review and analysis of means and methods for automatic data extraction from heterogeneous sources
BT  - Proceedings of the IV International research conference "Information technologies in Science, Management, Social sphere and Medicine" (ITSMSSM 2017)
PB  - Atlantis Press
SP  - 201
EP  - 204
SN  - 2352-538X
UR  - https://doi.org/10.2991/itsmssm-17.2017.43
DO  - 10.2991/itsmssm-17.2017.43
ID  - Samoylov2017/12
ER  -