Elsevier

Science of The Total Environment

Volume 534, 15 November 2015, Pages 144-158

Data management challenges in analysis and synthesis in the ecosystem sciences

https://doi.org/10.1016/j.scitotenv.2015.03.092

Highlights

  • Transdisciplinary collaboration is difficult, but delivers innovative products.

  • Be disciplined: use an end-to-end Data Management Plan.

  • Change the norms: sharing and publishing data requires a new set of skills.

  • Anticipate Reuse: encourage colleagues to agree on minimum elements for dataset reusability.

Abstract

Open-data has created an unprecedented opportunity with new challenges for ecosystem scientists. Skills in data management are essential to acquire, manage, publish, access and re-use data. These skills span many disciplines and require trans-disciplinary collaboration.

Science synthesis centres support analysis and synthesis through collaborative ‘Working Groups’ where domain specialists work together to synthesise existing information to provide insight into critical problems. The Australian Centre for Ecological Analysis and Synthesis (ACEAS) served a wide range of stakeholders, from scientists to policy-makers to managers. This paper investigates the level of sophistication in data management in the ecosystem science community through the lens of the ACEAS experience, and identifies the important factors required to enable us to benefit from this new data-world and produce innovative science.

ACEAS promoted the analysis and synthesis of data to solve transdisciplinary questions, and promoted the publication of the synthesised data. To do so, it provided support in many of the key skillsets required. Analysis and synthesis in multi-disciplinary and multi-organisational teams, and publishing data were new for most. Data were difficult to discover and access, and to make ready for analysis, largely due to lack of metadata. Data use and publication were hampered by concerns about data ownership and a desire for data citation. A web portal was created to visualise geospatial datasets to maximise data interpretation. By the end of the experience there was a significant increase in appreciation of the importance of a Data Management Plan.

It is extremely doubtful that the work would have occurred or data delivered without the support of the Synthesis centre, as few of the participants had the necessary networks or skills. It is argued that participation in the Centre provided an important learning opportunity, and has resulted in improved knowledge and understanding of good data management practices.

Introduction

The size and number of datasets are increasing at a tremendous rate due to advances in the technologies used for data collection and collation. This has created an opportunity to address some of the complex, multi-dimensional environmental problems, which require multi-, inter- and trans-disciplinary approaches (Peters, 2010). Policy and decision makers expect clear, understandable information on urgent environmental issues such as climate change, natural resource depletion, biodiversity loss and environmental health. Access to a wide variety of information, however, brings problems of complexity, transparency, integrity and interpretation.

There is a wealth of data emerging in the ecosystem sciences from existing sources, and new data are being created all the time. In this domain, every observation is unique, due to its context in space and time. If the information on an observation is lost, it is lost forever because it is almost impossible to measure the observation again in the original context (Ellison, 2010). This is one of the greatest motivations for the re-use of existing data for knowledge creation. With the advancement of technology in capturing and processing data, we have reached the fourth paradigm of data-intensive science and communication, where collaboration between different domain skillsets is required to successfully conduct meta-analysis (Hey et al., 2009). It is important to promote multi-disciplinary teams able to contribute to the meta-analysis, or synthesis, required to solve the complex problems facing our world. Synthesis has become increasingly important for ecologists as the abundance of data and the need for the development of solutions to complex, trans-disciplinary environmental problems has grown (Marx, 2013).

It is difficult to build a team to carry out transdisciplinary synthesis activities in conventional research laboratories or institutes because of the wide range of skillsets required to achieve outcomes, a difficulty compounded in the natural resource sciences by impediments to data-sharing (Volk et al., 2014). Analysis and synthesis of transdisciplinary data can be highly challenging, but can be enhanced by utilising data infrastructure, large-scale networks and innovative tools (Carpenter et al., 2009, Pooley et al., 2014). Synthesis builds on existing data infrastructure, research findings, and expertise obtained through the availability of technological and scientific capital to create new knowledge that is greater than the sum of the components. This is a cost-effective way of capitalising on existing data for a range of scientific problems. As a by-product, the tools and technological innovations delivered as part of synthesis activities have been used widely beyond the applications for which they were intended (e.g. Metacat: Berkley et al., 2001; Knowledge Network for Biodiversity, KNB: Reichman, 2004; Dryad hosted by NESCent: www.nescent.org).

The maturation of ecology from individual small-scale, short-term research to large-scale, long-term, multidisciplinary field studies has given rise to the creation of synthesis centres (Reichman, 2004). One of the first ecological synthesis centres established to support trans-disciplinary synthesis was the National Centre for Ecological Analysis and Synthesis (NCEAS) in the United States of America in 1995. Since then, several synthesis centres have started in various ecosystem and biological science domains (www.synthesis-consortium.org). Synthesis centres provide a structure that brings together researchers, theorists, modellers, managers, and practitioners within a working group model to solve a common problem. The working group members use existing data to answer new questions and address complex environmental problems that require immediate solutions. The commitment of participants to share their data requires a conducive environment, flexible data policies, technical support, and appropriate data management. Synthesis centres provide the support and expertise to help individuals and groups make the most of their own and others' data to solve complex scientific problems.

This paper provides an analysis of the data management challenges experienced in an ecosystem science synthesis centre. This is exemplified through the Australian Centre for Ecological Analysis and Synthesis (ACEAS), established within the Terrestrial Ecosystem Research Network (TERN) to help scientists, policy-makers and managers tackle some of the ecosystem questions facing Australia and the world. After analysing the wider community’s attitudes to data access, sharing, and publication at the start of ACEAS, we provide insight into the challenges of data management for synthesis, reflect on the data management literacy of participants, and discuss actions taken to mitigate challenges. It is argued that the synthesis centre functions as an incubator of data management practice.

In ecosystem science, data can take quantitative or qualitative forms such as numbers, text, code, GPS co-ordinates, algorithms, models, audio, video and animations. Data management is the development and implementation of policies, plans and processes that manage these data to maintain the integrity, security and useability of data. The ideal outcome is for data to be self-described so that others can discover and re-use it effectively (Strasser et al., 2012). Effective data management is central for ecological synthesis to:

  • improve efficiency and access to scientific data;

  • solve complex, multi-scale environmental problems;

  • allow synthesis products to be more easily accessible by a range of users;

  • enhance transparency and scientific participation in decision making (Faniel and Zimmerman, 2011); and

  • enable longitudinal analyses and experiments (which require access to data collected decades ago).

Although we have unprecedented opportunities to generate new knowledge from data-intensive science, the data are not necessarily fit-for-purpose, available (open access), quality assured, and licensed properly for appropriate reuse. Changes in practices in data handling are being driven by new data formats, technological advances in hardware and software, online availability of data and increasing appreciation of value-adding as a consequence of reuse (Faniel and Zimmerman, 2011). Readiness to share data, however, differs between disciplines: for some, sharing data is common, whereas for others it is not (Tenopir et al., 2011, Hampton et al., 2013). Ecology has historically been a highly individual endeavour and efforts to develop a culture of data sharing have been slow (Jones et al., 2006, Hampton et al., 2013). Providing timely information efficiently requires researchers not only to continue their practice (collection of new data) but also to focus on the reusability of those data (Hampton et al., 2013).

The dispersion, heterogeneity, and provenance of data present real technological challenges for data acquisition and use. Ecosystem science data are characteristically widely dispersed, collected from multiple sites (Reichman, 2011, Marx, 2013) and over differing timescales. The data are often located on individual storage devices that are known only to the collecting team, and only discoverable through personal contact. Ecosystem science data are characteristically heterogeneous. Field ecologists collect biotic and abiotic information across scales, time, and space in response to the nature and behaviour of the animals and plants they study. In consequence, ecosystem data have highly variable terminologies, measurements and experimental outputs (Jones et al., 2006, Reichman, 2011). Tracking and recording provenance is critical to enable researchers to identify suitable data sets and enhances the transparency of scientific outcomes (Reichman et al., 2011).

One of the ways to overcome these challenges is to improve collaborative processes. Establishing trust between parties is critical, and will enhance data sharing, productivity and the generation of useful, robust outcomes (Luna-Reyes et al., 2008). A key aspect of establishing this trust is the establishment of mutually acceptable Intellectual Property Agreements (e.g. Perkmann et al., 2013, Hertzfeld et al., 2006). Once trust is established, research scientists are likely to be more motivated by opportunities for intellectual challenge, peer recognition and collaboration led by scientific peers than by financial reward or other incentives (Crane, 1969, Stern, 2004, Lam, 2011). Current data management practices would be greatly enhanced by a renewed focus on developing and implementing new solutions (Tenopir et al., 2011), which should include solutions based around these motivating factors of collaborative research.

For an individual scientist, managing one's own data is challenging, let alone discovering, manipulating and managing others' data. Participants at a synthesis centre usually concentrate on the analysis phase of the research data lifecycle (Fig. 1), and generally lack the competence or skills to publish their data for future re-use (Costello, 2009).

In ACEAS, the deposition phase received specific attention, as one of the primary requirements of funding was that the synthesised data were published. Establishing and ensuring ownership of data are critical before analysis can proceed, and in order to deposit data, they must be described properly so that they can be discovered and interpreted in the future. It is the task of the chosen repository to preserve the data.

For the purposes of this study the data workflow has been divided into four components, more meaningful to the analysis and synthesis process (Fig. 2):

  1. Identification and acquisition (discover)

  2. Collation and integration

  3. Analysis and synthesis

  4. Publication and visualisation (deposition)

Data quality assurance is required throughout, but particularly for steps 1 and 4.

The ACEAS working group activity is described against a background of the four-step data workflow (Fig. 2).

The Australian Centre for Ecological Analysis and Synthesis (ACEAS) was established in 2009 to support ‘disciplinary and inter-disciplinary integration, synthesis and modelling of ecosystem data to aid in the development of evidenced-based environmental management strategies and policy at regional, state and continental scales’ (www.aceas.org.au). ACEAS was required, in addition, to foster trans-organisational synthesis, and the synthesised data were intended for publication. These goals presented a number of challenges, one of which was the questionable readiness of the community for such a step forward in data sharing and management. The connection with the Terrestrial Ecosystem Research Network (TERN: www.tern.org.au), a large data infrastructure observatory and repository for the ecosystem sciences, was unusual among global synthesis centres, as most have stood alone and developed their own substantial informatics teams (e.g. NESCent: Rodrigo et al., 2013). The combination of individual discipline-based researchers in relatively non-hierarchical groups with an associated large observational infrastructure was intended to create a new paradigm in ecosystem research, providing extended expertise and perspective on a range of topics.

Between November 2010 and May 2014, 42 working groups, the major focus of ACEAS activity, were supported through merit-based selection of applications from the ecosystem science and management community. These groups consisted of scientists, policy makers and managers coming together to solve challenging trans-disciplinary ecosystem problems. Projects were proposed by key investigators who nominated members according to the skills and attributes required, within the constraints of the ACEAS funding (group size, career-range, geographical, organisational representation). If successful, membership was further refined in discussion with the ACEAS Director.

The aims of ACEAS were:

  • to solve or investigate ecosystem problems, which although internationally relevant, required case studies centred in Australia;

  • to select projects on merit rather than by limiting topics;

  • to follow a science-to-policy-to-management theme in information capture and delivery, resulting in:

    • a high degree of inter-disciplinarity across ACEAS groups (remote sensing to stream water monitoring to genetics for conservation to ethno-ecosystem knowledge), and

    • a high degree of disciplinary and organisational heterogeneity within the groups.

ACEAS provided data management support through a data synthesis manager and a research assistant. Association with TERN provided additional capability through facilitated discovery of a variety of data types, expanded expertise in specialised data management, as well as a variety of data deposition and discovery options. Data sharing within ACEAS groups was based on data sharing agreements which enabled use of sensitive data to be limited to the collaboration only.

The data workflow (Fig. 2) was conducted against the background of the ACEAS working group process (Fig. 3). Interaction with ACEAS staff and with fellow group members started at application, but the face-to-face meetings were pivotal to the process. Groups generally met face to face twice, each time for a week, over a span of 12–18 months. It is important to appreciate that the ACEAS activity was itself a punctuation point in participants' research life, emerging from previous activity and feeding into the next stage of their research. Although an end-point of data delivery and publication required by ACEAS can be identified, this was, in actuality, rarely final.

A preliminary, non-cumbersome Data Management Plan (DMP) was required for the application, containing details of the datasets required, their availability, and plans for analysis and synthesis. Given the funding constraints, which limited the duration of support for groups and the number of meetings, most datasets had to be identified and available before funding was granted and indeed some funding was held over until this requirement was met. Despite this, quite often a large part of data identification occurred after the first meeting due to the crystallisation of the work through wider group discussion.

ACEAS placed considerable emphasis on the DMP's evolution during the course of the working group, not least because the identification of the data needed also evolved. The use of a DMP was designed not only to ensure robust data management by the group, but also to help ACEAS staff understand the requirements of the working group, identify any potential impediments in sourcing data, and allow appropriate resources to be allocated. ACEAS staff assisted in identifying and acquiring data, and if appropriate technical and statistical members were not in the group, the ACEAS director would suggest members and arrange for their inclusion.

Each working group had a designated data manager who was responsible for developing and implementing their DMP. The DMP contained details of the types of data used, data standards for formatting and metadata, directives for data storage, protocols for data access and other information about the longevity of the data (Box 1). Ideally, any impediments to data use arising from ownership were identified at this stage.
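The DMP elements listed above can be pictured as a simple record that a group's data manager fills in and revisits as the plan evolves. The following is a minimal sketch only: the class name, field names and the completeness check are illustrative assumptions and do not reproduce the actual ACEAS template (Box 1).

```python
from dataclasses import dataclass, field, fields


@dataclass
class DataManagementPlan:
    """Hypothetical container for the DMP elements named in the text."""
    data_types: list = field(default_factory=list)            # types of data used
    formatting_standards: list = field(default_factory=list)  # data standards for formatting
    metadata_standard: str = ""                                # e.g. "EML"
    storage_directives: str = ""                               # where and how data are stored
    access_protocols: str = ""                                 # who may access the data, and how
    longevity_notes: str = ""                                  # retention and preservation
    ownership_impediments: list = field(default_factory=list)  # may legitimately be empty

    def missing_elements(self):
        """Return the names of required elements that are still empty."""
        optional = {"ownership_impediments"}
        return [
            f.name
            for f in fields(self)
            if f.name not in optional and not getattr(self, f.name)
        ]


# A preliminary plan, as might accompany an application (values are made up).
plan = DataManagementPlan(data_types=["vegetation plots"], metadata_standard="EML")
print(plan.missing_elements())
```

A check of this kind is one way a living plan could flag what still needs to be agreed before the next meeting.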

Most of the data preparation and synthesis activities happened in Stage 2 (Fig. 3). Some of the common procedures undertaken in this stage were:

  • (i) ensuring appropriate compliance measures for data usage,

  • (ii) ensuring temporary storage and access to data for all the working group members,

  • (iii) making the data fit to use in the synthesis activity, and

  • (iv) describing the data adequately so that it was understood by all the working group members.

Group members frequently brought their own data, or data from their workplace released for the purpose of the working group activity. ACEAS provided a secure, unique username and password-protected temporary repository, a wiki (www.wikispaces.com), to host these datasets accessible to all members of the group. This security was very important for many participants, as the datasets they brought were often only released for the work of the group. The wiki, through the provision of a sophisticated web content management system, enabled the stored files to be annotated, and groups could send emails and blogs from the site. It provided a one-stop-shop bulletin board for members who could not attend meetings, and an aide memoire for those who did. It was a valuable tool for ACEAS to communicate with the groups and promoted continuity. The uptake of the wiki as a main communication and information-sharing medium in ACEAS was a challenge, especially when working group members had their own preferences, perhaps not as good, but with which they were familiar.

Due to the heterogeneous nature of the datasets commonly used, groups needed assistance in harmonising, re-formatting and transforming datasets so they could be collated and were ready for analysis. The ACEAS Research Assistant spent a considerable amount of time in collation and integration activities.
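To make the harmonisation step concrete, the sketch below renames columns from two differently structured sources to a shared vocabulary and converts units before the rows are combined. It is an illustration under stated assumptions: the column names, unit factors and sample rows are invented, and the source does not describe the tools ACEAS staff actually used.

```python
def harmonise(rows, column_map):
    """Rename columns to an agreed schema and convert units.

    column_map maps a lower-cased source column name to a tuple
    (target_name, factor). A factor of None leaves the value untouched;
    otherwise the value is converted to float and multiplied by the factor.
    Columns that are not in the map are dropped.
    """
    harmonised = []
    for row in rows:
        clean = {}
        for source_name, value in row.items():
            key = source_name.strip().lower()
            if key not in column_map:
                continue  # not part of the agreed schema
            target_name, factor = column_map[key]
            if factor is not None:
                value = float(value) * factor
            clean[target_name] = value
        harmonised.append(clean)
    return harmonised


# Two hypothetical contributions describing the same variables differently.
SITE_A = [{"Site": "A1", "Lat": "-27.5", "Height_cm": "150", "Observer": "x"}]
SITE_B = [{"site_id": "B1", "latitude": "-33.9", "height_m": "2.1"}]

MAP_A = {"site": ("site_id", None), "lat": ("latitude", 1.0), "height_cm": ("height_m", 0.01)}
MAP_B = {"site_id": ("site_id", None), "latitude": ("latitude", 1.0), "height_m": ("height_m", 1.0)}

combined = harmonise(SITE_A, MAP_A) + harmonise(SITE_B, MAP_B)
print(combined)
```

Keeping the mapping per source, rather than hard-coding conversions, means the mapping itself documents how each contributed dataset was transformed.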

Data analysis and synthesis occurred during, between and after the workshops. This was the stage at which the ACEAS team felt the groups would need the least informatics support, as this was their expert domain. Once the data were in the correct format, harmonised and accessible, the analysts in the group could take over.

ACEAS took Stage 4 (Fig. 2, Fig. 3) very seriously, as the mandate of linking science outcomes to management required some innovative approaches and disciplined communication skills. ACEAS developed a sustainable data publication infrastructure so that any synthesised data were appropriately described, the data creators received appropriate attribution, the data were captured and published in a format that improved reusability, and researchers could easily search for, discover and re-use the data.

All groups were required to deposit their derived, synthesised data for re-use. The Principal Investigator and/or designated data managers in the group, working closely with ACEAS staff, took responsibility for data deposition in a publicly accessible KNB-Metacat repository hosted by ACEAS, with a metadata feed provided to the TERN Data Discovery Portal (TDDP) (Guru et al., 2013). Since most ACEAS working group members were ecologists, Ecological Metadata Language (EML) was used to describe the datasets. The data were shared under open access licences according to the TERN data licensing policy (Bradshaw et al., 2012) and assigned Digital Object Identifiers (DOIs). At the time of publication, ACEAS had published 36 datasets accessible from http://aceas-data.science.uq.edu.au/knb.
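As a rough illustration of what describing a dataset in EML involves, the sketch below assembles a heavily simplified `dataset` fragment with Python's standard library. The element names follow EML conventions, but the fragment is not schema-valid on its own: a complete EML document wraps it in a namespaced root element with a package identifier and normally carries much more detail (contacts, coverage, attribute definitions). The function name and all values are placeholders, not ACEAS records.

```python
import xml.etree.ElementTree as ET


def build_dataset_metadata(title, creator_surname, abstract, licence):
    """Build a simplified, EML-style <dataset> element (illustrative only)."""
    dataset = ET.Element("dataset")
    ET.SubElement(dataset, "title").text = title

    # Attribution for the data creator.
    creator = ET.SubElement(dataset, "creator")
    name = ET.SubElement(creator, "individualName")
    ET.SubElement(name, "surName").text = creator_surname

    # Free-text description that lets others interpret the data.
    abstract_el = ET.SubElement(dataset, "abstract")
    ET.SubElement(abstract_el, "para").text = abstract

    # Licence statement governing re-use.
    rights = ET.SubElement(dataset, "intellectualRights")
    ET.SubElement(rights, "para").text = licence
    return dataset


record = build_dataset_metadata(
    title="Example synthesised dataset",          # placeholder
    creator_surname="Example",                    # placeholder
    abstract="Placeholder description of the derived data.",
    licence="Placeholder open access licence statement.",
)
print(ET.tostring(record, encoding="unicode"))
```

Even this small fragment covers three of the needs noted in the text: description, attribution and licensing.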

Because of the ACEAS imperative of transdisciplinarity (sensu Rice, 2013, Polk, 2014), to ensure all stakeholders were informed, reports for each meeting and a final report, or précis of the group's activities and achievements (including a heading, ‘How will this affect ecosystem science and management’) were also required. To make the data delivered more interpretable, where relevant, ACEAS also undertook to support groups to visualise the data delivered. As many of the data products had spatial contextual information, they lent themselves to publication on a Web Map Service (WMS), with the map features published through a Web Feature Service (WFS) accessible through http://aceas-data.science.uq.edu.au/portal, http://mammalviz.tern.org.au/, and the Indigenous Biocultural Knowledge working group's web delivery http://www.aibk.info. This latter web site was established to deliver data for the Australian Indigenous community so they could take over management of the site once the working group activity was finished. The data themselves are downloadable in a range of relevant file formats (SHP, GML, CSV, PDF). This method of delivery showcased the synthesis outcomes and promoted a culture of data sharing. At closure, twelve such web visualisations had been produced. An example of this delivery is shown in Fig. 4.
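For readers unfamiliar with WFS delivery, the sketch below shows how a client might compose a standard WFS 2.0.0 `GetFeature` request to download features from such a service. The endpoint and layer name are hypothetical and are not the ACEAS portal's actual addresses; the available `outputFormat` values depend on the server and are assumed here.

```python
from urllib.parse import urlencode


def wfs_getfeature_url(endpoint, layer, output_format=None, max_features=None):
    """Compose a WFS 2.0.0 GetFeature request URL (key-value-pair encoding)."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": layer,  # WFS 2.0.0 parameter name for the requested layer
    }
    if output_format:
        params["outputFormat"] = output_format  # server-specific value
    if max_features is not None:
        params["count"] = str(max_features)     # WFS 2.0.0 limit on returned features
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and layer, for illustration only.
url = wfs_getfeature_url(
    "https://example.org/geoserver/wfs",
    "aceas:example_layer",
    output_format="csv",
    max_features=10,
)
print(url)
```

The same request pattern, with a different `outputFormat`, is how one service can offer the data in several download formats.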

The data workflow is fundamental to the management of ecosystem data. The ability of the researcher to appreciate, understand and participate in all stages of the data workflow is uncertain, and in this paper the role of ACEAS in improving the awareness and ability of members of the ecosystem science and management community in data management is examined. This paper reports on the insights that both participants and staff gained from their ACEAS experience that inform our understanding of the attitudes, readiness and behaviour of the community at large.

Methods

The methodology used in this study fell into three main components: (i) an initial survey of the community at large (the potential users of the synthesis centre): ‘Community Survey’, (ii) a ‘working group analysis’ drawn from selected synthesis centre groups illustrative of each phase of the data workflow, and (iii) a survey of the ACEAS community's experience of the challenges and learning about data management from their working group experience: ‘ACEAS community survey’. Also at our disposal

Community survey

Seven hundred responses were obtained overall for the data questions. Not all replied to every question. Respondents were mainly environmental scientists (61%) and ecologists from academic institutions (mainly universities), and had a higher degree (Table 3).

Not all of these categories were collected for ACEAS participants (e.g. age or qualifications), but the profile was reflective of the larger community (Table 4).

Responses were positive to most questions (Fig. 5). The most positive response

Discussion

The community in which ACEAS was working was not dissimilar in its data collaboration profile to that surveyed by Tenopir et al. (2011) (Fig. 5), despite a significantly lower proportion of academics in the Australian cohort. Academics across the globe are subject to performance criteria highly dependent on citation rates, and this was reflected in the responses to the Tenopir et al. (2011) survey, and was a particular concern for the academic cohort within the Australian respondents (

Conclusion

To tackle the substantial environmental problems facing us today, researchers need to be adept in quantitative and qualitative transdisciplinary synthesis activities, working with environmental managers and policy makers. This is challenging, and does not occur spontaneously.

ACEAS working groups identified many issues that impeded their synthesis work. The culture of the ecosystem science and management community needs to be changed so standard practice and metadata standards are followed when

Acknowledgements

The authors wish to acknowledge the support of the Australian Centre for Ecological Analysis and Synthesis, a facility of the Terrestrial Ecosystem Research Network (www.tern.org.au), funded by the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS). Particular thanks to presenters Dr Vicki Thomson (University of Adelaide), Dr Pat Mitchell (University of Tasmania), Mr John Locke (Biocultural Consulting), Dr Phillip Clarke, Dr Ross Dwyer (University of

References (35)

  • C. Berkley et al. (2001). Metacat: a schema-independent XML database system.

  • A. Bradshaw et al. (2012). Data licensing: the Terrestrial Ecosystem Research Network (TERN) approach.

  • S. Carpenter et al. (2009). Accelerate synthesis in ecology and environmental sciences. Bioscience.

  • M.J. Costello (2009). Motivating online publication of data. Bioscience.

  • D. Crane (1969). Social structure in a group of scientists: a test of the “invisible college” hypothesis. Am. Sociol. Rev.

  • P. Driver et al. (2013). Ecological monitoring to support Water Sharing Plan evaluation, and protect wetlands of inland New South Wales, Australia. Ecol. Manag. Restor.

  • A.M. Ellison (2010). Repeatability and transparency in ecological research. Ecology.