doi:10.1016/j.future.2005.10.005
Copyright © 2005 Elsevier Ltd. All rights reserved.
BioSimGrid: Grid-enabled biomolecular simulation data storage and analysis
Muan Hong Ng (a, corresponding author), Steven Johnston (a), Bing Wu (c), Stuart E. Murdock (b), Kaihsu Tai (d), Hans Fangohr (a), Simon J. Cox (a), Jonathan W. Essex (b), Mark S.P. Sansom (d) and Paul Jeffreys (c)
(a) Southampton e-Science Centre, SO17 1BJ Southampton, UK
(b) School of Chemistry, University of Southampton, SO17 1BJ Southampton, UK
(c) Oxford e-Science Centre, OX2 6NN Oxford, UK
(d) Department of Biochemistry, University of Oxford, OX1 3QU Oxford, UK
Received 15 August 2005; revised 20 October 2005; accepted 21 October 2005. Available online 2 February 2006.
Abstract
In computational biomolecular research, large amounts of simulation data are generated to capture the motion of proteins. These data can be analysed in a number of ways to reveal the biochemical properties of the proteins. However, the traditional practice of storing the data in the laboratory where the simulations were run often hinders wider sharing and cross-comparison of simulation results. The data are also commonly encoded in a format specific to the simulation package that produced them, and can be analysed only with tools developed for that package. The BioSimGrid platform addresses these challenges by exploiting the Grid to facilitate data sharing. Working in either a scripting or a web environment, users can deposit their data in BioSimGrid and reuse them for analysis. BioSimGrid tools manage multiple storage locations transparently to users and provide a set of retrieval and analysis tools for processing the data conveniently and efficiently. This paper details the usage and implementation of BioSimGrid, which combines commercial databases and the Storage Resource Broker, with Python scripts gluing these building blocks together. It also presents a case study of how BioSimGrid can be used for better storage, retrieval and analysis of biomolecular simulation data.
Keywords: Biomolecular simulation; Database; Grid computing; Storage resource broker; Python
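Because simulation data are tied to package-specific formats, BioSimGrid splits deposition into parsers, a validator and an importer (Fig. 2). The Python sketch below illustrates that pattern under stated assumptions; it is not BioSimGrid's code. The registry `PARSERS`, the functions `register_parser`, `validate` and `deposit`, the toy `xyz-lines` text format and the in-memory `store` dictionary are all hypothetical stand-ins for real format parsers and the database/SRB back end.

```python
from typing import Callable, Dict, List, Tuple

Coord = Tuple[float, ...]
Frame = List[Coord]
Parser = Callable[[str], List[Frame]]

# Hypothetical registry: format name -> parser turning raw text into frames.
PARSERS: Dict[str, Parser] = {}


def register_parser(fmt: str) -> Callable[[Parser], Parser]:
    """Decorator that registers a parser, so a new format needs no other changes."""
    def wrap(func: Parser) -> Parser:
        PARSERS[fmt] = func
        return func
    return wrap


@register_parser("xyz-lines")
def parse_xyz_lines(text: str) -> List[Frame]:
    """Toy format (assumption): one 'x y z' line per atom, frames separated by a blank line."""
    frames: List[Frame] = []
    for block in text.strip().split("\n\n"):
        if not block.strip():
            continue
        frames.append([tuple(float(v) for v in line.split())
                       for line in block.splitlines() if line.strip()])
    return frames


def validate(frames: List[Frame]) -> None:
    """Reject empty trajectories, non-3-D coordinates and varying atom counts."""
    if not frames:
        raise ValueError("trajectory contains no frames")
    n_atoms = len(frames[0])
    for i, frame in enumerate(frames):
        if len(frame) != n_atoms:
            raise ValueError(f"frame {i} has {len(frame)} atoms, expected {n_atoms}")
        if any(len(atom) != 3 for atom in frame):
            raise ValueError(f"frame {i} contains a coordinate that is not 3-D")


def deposit(store: Dict[str, List[Frame]], traj_id: str, fmt: str, text: str) -> int:
    """Parse, validate, then import; returns the number of frames stored."""
    if fmt not in PARSERS:
        raise KeyError(f"no parser registered for format {fmt!r}")
    frames = PARSERS[fmt](text)
    validate(frames)
    store[traj_id] = frames  # stands in for the database/SRB import step
    return len(frames)


if __name__ == "__main__":
    store: Dict[str, List[Frame]] = {}
    print(deposit(store, "demo_traj", "xyz-lines", "0 0 0\n1 1 1\n\n0 0 1\n1 1 2"))
    # prints 2
```

In this sketch, supporting another simulation package only requires decorating one more parser function; the validator and importer stay unchanged, which mirrors the extensibility described for the deposition modules.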
Fig. 1. The three-tier architecture of BioSimGrid, depicting the data layer, the middle-tier layer and the application layer.
Fig. 2. The modular design of the BioSimGrid deposition modules, which comprise simulation data parsers, a data validator and a data importer. New parsers can easily be added to support new data formats. (Source: Phil. Trans. R. Soc. A.)
Fig. 3. A retrieval scenario: a frame collection specifies which frames of which trajectory are to be accessed, and each frame object is used to access its coordinates or metadata.
Fig. 4. The federated BioSimGrid in operation during data deposition.
Fig. 5. The federated BioSimGrid in operation during an analysis in which data are retrieved.
Fig. 6. An example of a Python deposition routine implemented by user ‘Bob’ to deposit a NAMD trajectory into BioSimGrid. The underlying complexity of parsing, validating and importing the simulation data is hidden from the user.
Fig. 7. An example of a Python retrieval routine for retrieving frames 1, 2 and 3 from trajectory ‘BioSimGrid_GB-STH_1’ (lines 1–11). Lines 13–18 illustrate how the frame object is used to access the actual data.
Fig. 8. A Python analysis script that calculates the average structure of a molecule. This shows how an existing analysis tool (BioSim.Analysis.AverageStructure) can be used in a BioSimGrid script.
Fig. 9. The four enzymes (OMPLA, AChE, OmpT and PagP) used in a comparative analysis of molecular dynamics simulations.
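Figs. 3, 7 and 8 describe selecting frames of a trajectory through a frame collection and then computing an average structure from them. The following minimal Python sketch shows that flow on plain in-memory data. The `FrameCollection` class, the `average_structure` function, the 1-based frame numbering and the `store` layout are hypothetical and do not reproduce the BioSimGrid API or `BioSim.Analysis.AverageStructure`. The average here is a plain per-atom arithmetic mean with no prior superposition of frames; a production analysis would often fit frames to a reference first.

```python
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Frame = List[Coord]


class FrameCollection:
    """Hypothetical frame collection: which frames of which trajectory to access."""

    def __init__(self, store: Dict[str, List[Frame]], traj_id: str,
                 frame_numbers: Sequence[int]) -> None:
        self.store = store
        self.traj_id = traj_id
        self.frame_numbers = list(frame_numbers)

    def __iter__(self) -> Iterator[Frame]:
        frames = self.store[self.traj_id]
        for n in self.frame_numbers:
            if n < 1:
                raise IndexError(f"frame numbers start at 1, got {n}")
            yield frames[n - 1]  # assumption: frames are numbered from 1


def average_structure(frames: Iterable[Frame]) -> Frame:
    """Per-atom mean coordinates over the given frames (no fitting/superposition)."""
    totals: List[List[float]] = []
    count = 0
    for frame in frames:
        if count == 0:
            totals = [[0.0, 0.0, 0.0] for _ in frame]
        for acc, (x, y, z) in zip(totals, frame):
            acc[0] += x
            acc[1] += y
            acc[2] += z
        count += 1
    if count == 0:
        raise ValueError("no frames selected")
    return [(sx / count, sy / count, sz / count) for sx, sy, sz in totals]


if __name__ == "__main__":
    store = {"demo_traj": [
        [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
        [(2.0, 2.0, 0.0), (4.0, 0.0, 0.0)],
        [(1.0, 1.0, 1.0), (3.0, 3.0, 3.0)],
    ]}
    print(average_structure(FrameCollection(store, "demo_traj", [1, 2])))
    # prints [(1.0, 1.0, 0.0), (3.0, 0.0, 0.0)]
```

Separating frame selection from the analysis function lets the same averaging code run over any subset of frames, which is the division of roles the captions describe between the frame collection and the analysis tool.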

Corresponding author address: Southampton e-Science Centre, School of Engineering Sciences, Building 25, Highfield, University of Southampton, SO17 1BJ Southampton, Hants, UK. Tel.: +44 2380598520.