Universal Chemical Markup (UCM) - A new format for common chemical data
- Published
- Accepted
- Subject Areas
- Data Science, World Wide Web and Web Science, Software Engineering
- Keywords
- Universal Chemical Markup, UCM, UCM XML structure, UCM built-in validation, UCM examples, UCM VIEWER, recording chemical structures with properties, combining XML schema languages, combining XML formats
- Copyright
- © 2015 Mokrý et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
- Cite this article
- 2015. Universal Chemical Markup (UCM) - A new format for common chemical data. PeerJ PrePrints 3:e1336v1 https://doi.org/10.7287/peerj.preprints.1336v1
Abstract
Background: We introduce a new chemical format called UCM (Universal Chemical Markup). The format is based on XML (Extensible Markup Language), and its first version focuses on recording chemical structures and their properties.
Results: UCM currently supports structures containing isotopes, ions and various types of bonding, including delocalized bonds. Properties are expressed by combining UCM with UnitsML (Units Markup Language): quantities and their scientific units are defined in UnitsML and then referenced from UCM when property values are recorded. Users can also add literature references with BibTeXML (BibTeX Markup Language) and annotate the recorded data with plain-text or XHTML (Extensible Hypertext Markup Language) descriptions. In contrast to presently available general-purpose chemical formats, UCM offers built-in validation that combines grammar-based and pattern-based XML schema languages. All recorded data can therefore be validated precisely against the UCM schemas in standard XML validators.
Conclusions: We developed the structure of UCM from scratch on the basis of an analysis described in our previous article. Starting from scratch allowed us to integrate BibTeXML, UnitsML and XHTML, as well as chemical line notations and identifiers, into UCM. It also helped us to avoid redundant parts and to create an implementation that aims to minimize ambiguity and is designed to be easily extensible in the future.
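The define-then-reference pattern described in the Results (quantities declared once with their units, then referred to when property values are recorded) can be illustrated with a minimal Python sketch. Everything in the sample document is an assumption made for illustration: the element names `record`, `quantity` and `property`, the attribute `quantityRef`, and the placeholder values are hypothetical and are not taken from the UCM or UnitsML schemas. The check for dangling references only mimics the kind of rule a pattern-based schema could express; it is not the UCM validation itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are illustrative only,
# NOT the real UCM or UnitsML vocabulary. Values are placeholders.
DOC = """
<record>
  <quantities>
    <quantity id="q-mp" name="melting point" unit="K"/>
    <quantity id="q-density" name="density" unit="g/cm3"/>
  </quantities>
  <properties>
    <property quantityRef="q-mp" value="273.15"/>
    <property quantityRef="q-density" value="0.9167"/>
  </properties>
</record>
"""


def resolve_properties(xml_text):
    """Return (name, value, unit) tuples by resolving each property's reference."""
    root = ET.fromstring(xml_text)
    # Index the quantity definitions by their id attribute.
    quantities = {q.get("id"): q for q in root.iter("quantity")}
    resolved = []
    for prop in root.iter("property"):
        ref = prop.get("quantityRef")
        if ref not in quantities:
            # Referential rule: every property must point at a defined quantity.
            raise ValueError(f"undefined quantity reference: {ref}")
        quantity = quantities[ref]
        resolved.append(
            (quantity.get("name"), float(prop.get("value")), quantity.get("unit"))
        )
    return resolved


RESULT = resolve_properties(DOC)
```

Keeping the unit on the quantity definition rather than on each value means a unit is stated once and cannot drift between records, which is one plausible motivation for the indirection.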
Author Comment
This is a preprint submission to PeerJ Computer Science.
Supplemental Information
Universal Chemical Markup - Supplemental information
Supplemental information for the article "Universal Chemical Markup (UCM) - A new format for common chemical data" comprises additional file 1 (UCM examples and schemas) and additional file 2 (UCM documentation).