The UK e-Science Core Programme and the Grid

https://doi.org/10.1016/S0167-739X(02)00082-1

Abstract

This paper describes the £120M UK ‘e-Science’ (http://www.research-councils.ac.uk/ and http://www.escience-grid.org.uk) initiative and begins by defining what is meant by the term e-Science. The majority of the £120M, some £75M, is funding large-scale e-Science pilot projects in many areas of science and engineering. The infrastructure needed to support such projects must permit routine sharing of distributed and heterogeneous computational and data resources as well as supporting effective collaboration between groups of scientists. Such an infrastructure is commonly referred to as the Grid. Apart from £10M towards a Teraflop computer, the remaining funds, some £35M, constitute the e-Science ‘Core Programme’. The goal of this Core Programme is to advance the development of robust and generic Grid middleware in collaboration with industry. The key elements of the Core Programme will be outlined including details of a UK e-Science Grid testbed. The pilot e-Science projects that have so far been announced are then briefly described. These projects span a range of disciplines from particle physics and astronomy to engineering and healthcare, and illustrate the breadth of the UK e-Science Programme. In addition to these major e-Science projects, the Core Programme is funding a series of short-term e-Science demonstrators across a number of disciplines as well as projects in network traffic engineering and some international collaborative activities. We conclude with some remarks about the need to develop a data architecture for the Grid that will allow federated access to relational databases as well as flat files.

Introduction

The term ‘e-Science’ was introduced by Dr. John Taylor, Director General of Research Councils in the UK Office of Science and Technology (OST). From his previous experience as Head of Hewlett-Packard’s European Research Laboratories, and from his experience as Director General of the OST, Taylor saw that many areas of science are becoming increasingly reliant on new ways of collaborative, multidisciplinary working. The term e-Science is intended to capture these new modes of working [1]:

‘e-Science is about global collaboration in key areas of science and the next generation of infrastructure that will enable it.’

The infrastructure to enable this science revolution is generally referred to as the Grid [2]. Two examples of such e-Science projects are from particle physics and astronomy. The world-wide particle physics community is planning an exciting new series of experiments to be carried out on the new ‘Large Hadron Collider’ (LHC) experimental facility under construction at CERN in Geneva. The goal is to find signs of the Higgs boson, key to the generation of mass for both the vector bosons and the fermions of the Standard Model of the weak and electromagnetic interactions. The experimental physicists are also hoping for indications of other new types of matter such as supersymmetric particles which may shed light on the ‘dark matter’ problem of cosmology. These LHC experiments are on a scale never before seen in physics, with each experiment involving a collaboration of over 100 institutions and over 1000 physicists from Europe, the USA and Japan. When operational in 2005, the LHC will generate petabytes of experimental data per year, for each experiment. This vast amount of data needs to be pre-processed and distributed for further analysis by all members of the consortia to search for signals betraying the presence of the Higgs boson or other surprises. The physicists need to put in place an LHC Grid infrastructure that will permit the transport and data mining of such distributed data sets. There are a number of funded projects in Europe (EU DataGrid [3], EU DataTag [4], UK GridPP [5]) and in the USA (NSF GriPhyN [6], DOE PPDataGrid [7], NSF iVDGL [8]) in which the particle physicists are working to build a Grid that will support these needs.

Our second example of e-Science is much more directly data-centric. In the UK, the astronomers are planning to create a ‘virtual observatory’ in their e-Science AstroGrid project. There are similar initiatives in the USA with the NSF NVO [9] project and the EU AVO [10] project. The goal of these projects is to provide uniform access to a federated, distributed repository of astronomical data spanning all wavelengths from radio waves to X-rays. At present, astronomical data at different wavelengths are taken with different telescopes and stored in a wide variety of formats. Their goal is to create something like a ‘data warehouse’ for astronomical data that will enable new types of studies to be performed. Again, the astronomers are considering building a Grid infrastructure to support these Virtual Observatories. Later in this article we shall describe other types of e-Science and e-Engineering problems that have more obvious interest to industry.

The vision for a layer of ‘Grid’ middleware that provides a set of core services to enable such new types of science and engineering is due to Ian Foster, Carl Kesselman and Stephen Tuecke. In their Globus project, they have developed parts of a prototype open source Grid Toolkit [11]. Their choice of the name ‘Grid’ to describe this middleware infrastructure resonates with the idea of a future in which computing resources, compute cycles and storage, as well as expensive scientific facilities and software, can be accessed on-demand like the electric power utilities of today. These ‘e-Utility’ ideas are also reminiscent of the recent trend in the Web community towards a model of ‘Web services’ advertised by brokers and consumed by applications; the Grid and Web services models have recently been brought together in the Open Grid Services Architecture (OGSA) [12].

In the next section, we outline the general structure of the UK e-Science programme and discuss the ‘existence proof’ of the NASA Information Power Grid (IPG) [13] as the closest example of a working ‘production Grid’. We also set our activity in context by listing some of the other funded Grid projects around the world. In Section 3, we describe the UK e-Science Core Programme in some detail and in Section 4 we describe the presently funded UK e-Science pilot projects in engineering and the physical sciences. We conclude with some remarks about the evolution of Grid middleware architecture and the possible take-up of the Grid by industry.

Funding and structure of the UK e-Science programme

Under the UK Government’s Spending Review in 2000, the OST was allocated £98M to establish a 3-year e-Science R&D Programme. This e-Science initiative spans all the Research Councils—the Biotechnology and Biological Sciences Research Council (BBSRC), the Council for the Central Laboratory of the Research Councils (CCLRC), the Engineering and Physical Sciences Research Council (EPSRC), the Economic and Social Research Council (ESRC), the Medical Research Council (MRC), the Natural Environment

Structure of the Core Programme

As we have explained, the goal of the e-Science Core Programme is to identify the generic middleware requirements arising from the e-Science pilot projects. In collaboration with scientists, computer scientists and industry, the Director has a mandate to develop a framework that will promote the emergence of robust, industrial-strength Grid middleware that will not only underpin individual application areas but also be of relevance to industry and commerce.

The Core Programme has been structured

Conclusions

There are many challenges to be overcome before we can realise the vision of e-Science and the Grid described above. These are not only technical issues such as scalability, dependability, interoperability, fault tolerance, resource management, performance and security but also more people-centric issues relating to collaboration and the sharing of resources and data.

As an example of a technical issue, we believe that realistic performance estimation will be key to the realisation of the vision of the

Acknowledgements

The authors thank Juri Papay for valuable assistance in preparing this paper and Jim Fleming and Ray Browne for their help in constructing the Core Programme. We also thank Paul Messina for encouragement and insight and Ian Foster and Carl Kesselman for their constructive engagement with the UK programme.

References (29)

  [1] John Taylor e-Science definition:...
  [2] I. Foster, C. Kesselman (Eds.), The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, Los Altos, CA,...
  [3] B. Segal, Grid Computing: The European Data Project, IEEE Nuclear Science Symposium and Medical Imaging Conference,...
  [4] The DataTag Project:...
  [5] GridPP:...
  [6] GriPhyN:...
  [7] The Particle Physics DataGrid:...
  [8] International Virtual Data Grid Laboratory:...
  [9] National Virtual Observatory:...
  [10] Opticon Astrophysical Virtual Observatory:...
  [11] I. Foster, C. Kesselman, Globus: a metacomputing infrastructure toolkit, Int. J. Supercomput. Appl. 11 (2) (1997)...
  [12] I. Foster, C. Kesselman, J. Nick, S. Tuecke, The Physiology of the Grid: Open Grid Services Architecture for...
  [13] NASA Information Power Grid:...
  [14] M. Litzkow, M. Livny, M. Mutka, ‘Condor—a hunter of idle workstations’, Proceedings of the Eighth International...
    Tony Hey is a Professor of Computation at the University of Southampton and has been Head of the Department of Electronics and Computer Science and Dean of Engineering and Applied Science at Southampton. From 31 March 2001, he has been seconded to the EPSRC and DTI as Director of the UK’s Core e-Science Programme. He is a Fellow of the Royal Academy of Engineering, the British Computer Society, the Institution of Electrical Engineers and the Institution of Electrical and Electronic Engineers. Professor Hey is European editor of the journal ‘Concurrency and Computation: Practice and Experience’ and is on the organising committee of many international conferences.

    Professor Hey has worked in the field of parallel and distributed computing since the early 1980s. He was instrumental in the development of the MPI message-passing standard and in the Genesis Distributed Memory Parallel Benchmark suite. In 1991, he founded the Southampton Parallel Applications Centre, which has played a leading technology transfer role in Europe and the UK in collaborative industrial projects. His personal research interests are concerned with performance engineering for Grid applications, but he also retains an interest in experimental explorations of quantum computing and quantum information theory. As the Director of the UK e-Science Programme, Tony Hey is currently excited by the vision of the increasingly global scientific collaborations being enabled by the development of the next generation ‘Grid’ middleware. The successful development of the Grid will have profound implications for industry and he is much involved with industry in the move towards OpenSource/OpenStandard Grid software.

    Tony Hey is also the author of two popular science books: ‘The Quantum Universe’ and ‘Einstein’s Mirror’. Most recently he edited the ‘Feynman Lectures on Computation’ for publication, and a companion volume entitled ‘Feynman and Computation’.

    Anne E. Trefethen acts as the Deputy Director of the Core e-Science Programme. She is on secondment from NAG Ltd., where she is the Vice President for Research and Development. Anne joined NAG in 1997 and in her role there she leads technical development in NAG’s numerical and statistical products.

    Before joining NAG, Anne was the Associate Director for Scientific Computational Support at the Cornell Theory Center. For most of the 1990s, the Cornell Theory Center was one of four NSF funded national supercomputer centres in the USA. During her first 3 years at the centre, she was a research scientist in the performance and algorithms group, working largely on parallel linear algebra and parallel scientific applications, and also teaching on the centre’s courses on parallel techniques and numerical algorithms. As an Associate Director she was responsible for the division that supported computational scientists across the country in terms of software, application and algorithmic consultancy, education and training, strategic applications support, and outreach programs. At that time, as well as many technical activities, she was very much involved in the design and development of Web-based educational courses and led the research activity, MultiMatlab, to create a distributed Matlab environment for large-scale problem solving.

    From 1988 to 1992, Anne was a Research Scientist in the Mathematical and Computational Sciences Group at Thinking Machines Corporation. She was the project leader for the first independent release of the Connection Machine Scientific Software Library (CMSSL).
