1 Introduction

Weather narration for the blind is an extremely challenging objective, as blind users cannot visually check or confirm the information. Methodologically and technically, this project proposal can also be of great interest in other contexts. The automatic construction of weather narratives will be developed with a strong focus on previously studied techniques for narrating scenes and real-world dynamics to blind users in real time, as well as on techniques used in other projects. Most of the time, computer interfaces for the visually impaired are based on narratives of the information to be transmitted to the user. A simple example is Microsoft Narrator, which narrates the computer screen content, allowing visually impaired users to access Microsoft Windows. In other, more complex systems dedicated to blind users, narration plays a very important role, as it defines how information is presented and communicated to the user. In the systems BLAVIGATOR [1] and C4BLIND [2], an electronic white cane was developed in collaboration with the University of Texas (UT) (the project C4BLIND UTAPEXPL/EEISII/0043/2014). This cane uses several sensors, characteristic scene elements, and numerical models to detect the surrounding environment and to position the user in a specific context of the scene, as well as to identify the elements of the scene. With this knowledge, it is possible to assist the blind user in specific actions, e.g., navigating to a location or identifying an object. To assist the user effectively, it is necessary to extract meaningful information from this knowledge, according to the user’s intention and current context, which is communicated to the user as a narrative. This use case of a narrative to inform a blind user is particularly important, as it gives narration the role of replacing the sense of sight.

In a broader sense, and including other types of information, namely big data from scientific activities, there has been great progress in data acquisition, processing, and visualization technologies, making large amounts of data available to the general public, e.g., data on atmospheric activity, financial markets, consumer marketing, sports analysis, and the genome. These data and technologies are very useful for science, providing rich and detailed information, though a simple and eloquent narrative is desirable in some contexts and for some purposes.

The observation and recording of weather phenomena generate large amounts of data, covering long periods of time and wide geographic areas. These data are used to build numerical models, which, together with advanced computation techniques, provide forecasts of the evolution of weather systems. This activity is essential to understand our planet, as well as to plan human activities with greater predictability.

2 Literature Review

2.1 Natural Language Narration

The narration approach draws on the experience that the team at INESC TEC acquired in previous projects: SmartVision (PTDC/EIA/73633/2006) and Blavigator (RIPD/ADA/109690/2009) [1], funded by FCT, and C4Blind (UTAPEXPL/EEISII/0043/2014) [2], funded by the UT Austin Portugal Program. For the visually impaired, and given the dynamics of the environments, static descriptions (narratives) of urban environments are not good enough, so the project developed a mobile digital platform using noninvasive natural interfaces, well suited to meet user needs. A user-centric design methodology was used, in partnership with ACAPO (Association of Blind and Partially Sighted Portugal), to conduct extensive tests [3, 4].

Natural Language Generation (NLG) belongs to the artificial intelligence (AI) research field, as a subfield of computational linguistics, with the main objective of producing understandable texts in human language by means of computer algorithms. These algorithms take some form of nonlinguistic information as input and use knowledge about the language and the application domain to automatically produce a text message. The field is characterized by a variety of theoretical approaches, which has the effect of fragmenting the community and reducing communication and cooperative research efforts. Still, several reference architectures have been proposed for NLG systems, and the technology is mature enough to be introduced in commercial products [5, 6]. Some NLG systems have been created to produce textual weather reports from simulation forecast models and human-annotated forecast data. An excellent example is the system used by the Canadian Weather Services since 1993, which produces text reports from numerical weather simulation data annotated by a human forecaster [7,8,9]. An automated narrator system can be based on an NLG architecture, as proposed by Reiter and Dale [6].

Apache OpenNLP is a machine-learning-based toolkit for processing natural language text that supports the most common NLP tasks, both simple and advanced [10]. The OpenNLP CCG Library is an open source natural language processing library written in Java, which provides parsing and realization services based on Mark Steedman’s Combinatory Categorial Grammar (CCG) formalism [11]. The library makes use of multimodal extensions to CCG developed by Jason Baldridge [12].

2.2 Advanced Computing and E-Science

Weather systems data analysis, information extraction and visualization are resource-demanding activities regarding data and computation management. The Texas Advanced Computing Center (TACC) at the University of Texas (UT) provides a high-performance computing (HPC) cyberinfrastructure, designed to cope with the most demanding research projects. To do so, it has developed the Agave Platform, which provides an ecosystem of APIs, software development kits and tools to power cyberinfrastructure at the national research level and provide the foundation for the next generation of science gateways. Agave aims to reduce the development burden on science gateway engineers, as well as on individual scientists making use of computational resources, by providing a set of flexible, scalable software components that solve common problems across hybrid cloud and high-performance computing. At the heart of the platform is a set of APIs for managing the systems, applications, jobs and metadata involved in any computational experiment. The primary goal of Agave is to accelerate the development of web-enabled science projects and, thereby, drastically reduce the time to discovery [13,14,15].

2.3 Weather Systems Phenomena

Extratropical cyclones, atmospheric fronts and frontal systems are central components of weather (and hence climate) over much of the world. These frequent phenomena (every few days over many extratropical regions) are associated with the day-to-day weather conditions, including among others, precipitation, dramatic changes in temperature and wind (direction and speed) and extreme events [16]. In fact, wind extremes and heavy precipitation events occurring in the winter over land in the midlatitudes are almost always associated with extratropical cyclones (e.g. [17,18,19]). It is well known that the Azores Archipelago, due to its location, is prone to the occurrence of these extreme phenomena and associated hazards [19, 20].

With the development of computer technology and its rapid adoption by the atmospheric and climate sciences, it became clear that comprehensive cyclonic and frontal analysis would benefit greatly from automatic computer analysis [16, 21,22,23]. Algorithms for the efficient detection and tracking of features in spatiotemporal atmospheric data, provided on a four-dimensional structured grid, have been and continue to be developed at an increasing level of complexity. These may include the precise localization of genesis, lysis, merging and splitting events, and may allow the assessment of their natural and socioeconomic impacts. However, given the difficulty of arriving at an accepted definition of these weather systems, designing a numerical scheme to identify cyclones, fronts and frontal zones, or atmospheric rivers from 3D analyses is a task of even greater difficulty and is still an important research theme [24].

Long-term climate datasets are critical both for understanding climate variations and for evaluating their simulation in climate models. Since the 1990s, major national and international efforts have led to the creation of climate datasets, with reconstructions of meteorological and oceanographic fields (see, e.g. [25,26,27,28,29,30]), called retrospective analyses or ‘re-analyses’, which may span extended periods of time. Examples are re-analyses covering the whole 20th century (the Twentieth Century Reanalysis Project, 20CR, or the EU-funded ECMWF ERA-CLIM project, ERA-20C reanalysis) or the Last Millennium climate Reanalysis (LMR) project.

3 Plan and Methods

3.1 The Plan

To develop the project, we intend to execute three tasks, where tasks 1 and 2 overlap to a high degree, as represented in the Timeline figure.

In a first phase, as part of tasks 1 and 2, information related to weather systems will be researched, including the following:

  • Identification of the weather phenomena to narrate and to work on, including fronts, atmospheric rivers, and extratropical cyclones;

  • Identification of the available data sources and their parameters;

  • Identification of the information extraction methodologies for the selected phenomena.

At the same time, some narrative format mockups will be studied and tested, in order to design the narrative styles to be developed later. These mockups will be tested with an audience, as defined by the target audiences for the proof-of-concept demonstrators. Also in this first phase, and in parallel with the previously described work, an operation model will be designed to retrieve and process data, considering the data sources and the data characteristics.

In a second phase, overlapping task 2, the work will focus on designing the pipeline model and the specific components to be developed later.

In a third phase, overlapping task 2, the focus will be on software development, including the development of the necessary components, the development of the NLG, and the individual testing of these elements.

In a fourth phase, overlapping task 3 and part of task 2, the elements will be combined and tested as a full-featured system pipeline, and some adjustments and software refactoring will probably also be carried out.

A fifth and final phase, overlapping task 3 and part of task 2, will focus on assembling the system in specific configurations and with specific data sources, in order to produce the proof-of-concept narrative demonstrators.

3.2 The Methods

Supporting the Full Narration Process

The system to support the process will be designed as a pipeline, made up of several functional components, properly arranged for each specific narrative type. This approach was chosen in order to have functional independence and isolation of the several components, as well as to keep the system as flexible as possible. This solution provides a framework to which different developers can add components, as well as use the existing ones in combinations that suit their specific needs: type of narrative, raw data, weather phenomena, etc. In previous work, this narration paradigm has been useful to design interactions with users [31,32,33].

The system’s components will be developed as software parts that can be independently designed, implemented and tested. This degree of flexibility and the exploratory character of the project justify an agile approach to software development, which will be our main choice.

The final system assembly will be tested in the Agave system [8] at the UT TACC. Weather systems data are particularly demanding in terms of storage, bandwidth and processing. TACC and Agave are well matched to these big data science requirements.

Natural Language Narrative Construction

The narratives of weather phenomena, in natural language, will be produced by a system component of the type “Narration composition”, as described in task 2. This component will be implemented as a simple Natural Language Generator (NLG), to be developed using the OpenNLP CCG Library. The NLG will have a classic pipeline design, for which a set of document plans will be designed, according to a communicative goal for each specific context. These plans will define the information to be included in the text narrative, as well as provide a structure for a coherent text message. They will be further processed in a microplanning stage, where the plans for the individual sentences will be defined. A final stage of linguistic realization will render a text document, thus producing a text narrative meaningful for a specific context, as previously defined.
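As an illustration only, the three stages described above (document planning, microplanning and linguistic realization) can be sketched in a few lines of Python. The sketch uses plain string templates rather than the OpenNLP CCG Library, and the names Message, plan_document, microplan and realize, the goal labels, and the sample facts are all hypothetical placeholders, not part of the planned implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Message:
    """One fact selected for the narrative (hypothetical structure)."""
    subject: str
    attribute: str
    value: str


def plan_document(facts: dict, goal: str) -> list[Message]:
    """Document planning: select and order the facts for a communicative goal."""
    # Hypothetical content-selection rules: which facts each goal includes.
    wanted = {
        "general_public": ["location", "wind"],
        "students": ["location", "pressure", "wind"],
    }[goal]
    subject = facts["phenomenon"]
    return [Message(subject, key, facts[key]) for key in wanted if key in facts]


def microplan(messages: list[Message]) -> list[dict]:
    """Microplanning: choose a verb phrase and a referring expression per sentence."""
    verbs = {
        "location": "is located over",
        "pressure": "has a central pressure of",
        "wind": "brings winds of",
    }
    specs = []
    for index, message in enumerate(messages):
        # Full noun phrase on first mention, pronoun afterwards.
        reference = f"the {message.subject}" if index == 0 else "it"
        specs.append({"subject": reference,
                      "verb": verbs[message.attribute],
                      "object": message.value})
    return specs


def realize(specs: list[dict]) -> str:
    """Linguistic realization: render each sentence plan as capitalized text."""
    sentences = []
    for spec in specs:
        text = f"{spec['subject']} {spec['verb']} {spec['object']}."
        sentences.append(text[0].upper() + text[1:])
    return " ".join(sentences)


# Made-up sample values, for illustration only.
facts = {"phenomenon": "extratropical cyclone", "location": "the Azores",
         "pressure": "975 hPa", "wind": "90 km/h"}
narrative = realize(microplan(plan_document(facts, "general_public")))
print(narrative)
```

Changing the goal from "general_public" to "students" selects a different document plan from the same facts, which is the role the communicative goal plays in the design described above.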

Weather Systems Phenomena Data Extraction and Classification

The IDL team has extensive experience in developing and using objective detection and tracking algorithms, namely for extratropical cyclones [18, 22] and atmospheric rivers [23]. Thus, extratropical cyclones (ECs), associated frontal systems and atmospheric rivers (ARs) will be the weather systems used for this exploratory project. These algorithms are mostly applied to large gridded data sets that are produced by climate models (e.g. multidecadal period of atmospheric and ocean reanalysis data at different resolutions from several data centres [25]).
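A minimal sketch of what objective detection and tracking can look like on gridded data is given below. It is not the IDL algorithm from [18, 22]: it simply assumes that cyclone candidates are local minima of a mean sea level pressure field and links them between consecutive time steps by nearest neighbour on the grid. The function names, the 1000 hPa threshold and the search distance are illustrative assumptions, and the sketch ignores longitude wrap-around and spherical distances.

```python
from __future__ import annotations

import numpy as np


def find_pressure_minima(mslp: np.ndarray, max_pressure: float = 1000.0) -> list[tuple[int, int]]:
    """Return (row, col) of points below all 8 neighbours and not above max_pressure."""
    rows, cols = mslp.shape
    # Pad with +inf so that border points can still qualify as minima.
    padded = np.pad(mslp, 1, mode="constant", constant_values=np.inf)
    centre = padded[1:-1, 1:-1]
    is_min = centre <= max_pressure
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbour = padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            is_min &= centre < neighbour
    return [(int(r), int(c)) for r, c in zip(*np.nonzero(is_min))]


def link_tracks(prev: list[tuple[int, int]], curr: list[tuple[int, int]],
                max_dist: float = 3.0) -> dict[int, int]:
    """Greedy nearest-neighbour linking of centres between two time steps.

    Returns a mapping {index in prev: index in curr}; distances are in grid cells.
    """
    links: dict[int, int] = {}
    used: set[int] = set()
    for i, (r0, c0) in enumerate(prev):
        best, best_dist = None, max_dist
        for j, (r1, c1) in enumerate(curr):
            dist = float(np.hypot(r1 - r0, c1 - c0))
            if j not in used and dist <= best_dist:
                best, best_dist = j, dist
        if best is not None:
            links[i] = best
            used.add(best)
    return links


# Synthetic fields (hPa) with one low that moves one grid cell eastward.
field_t0 = np.full((6, 6), 1010.0)
field_t0[2, 2] = 980.0
field_t1 = np.full((6, 6), 1010.0)
field_t1[2, 3] = 978.0

centres_t0 = find_pressure_minima(field_t0)
centres_t1 = find_pressure_minima(field_t1)
links = link_tracks(centres_t0, centres_t1)
```

In this simplified picture, a centre in the earlier step with no link would correspond to lysis, and an unlinked centre in the later step to genesis.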

Extratropical Cyclones (ECs)

The structure and evolution of ECs, as viewed from the surface, have been described throughout the 20th century by the development of refined conceptual models (namely, the Norwegian and the Shapiro and Keyser cyclone models). Today, three-dimensional conceptual models of extratropical cyclones provide a framework for understanding their dynamical evolution (see the annex for details). Detailed analyses of the structure and evolution of individual extratropical cyclones suggest that, although there is no universal lifecycle of extratropical cyclones, some general cyclone characteristics can be identified [28, 30].

Atmospheric Rivers (ARs)

ARs are relatively narrow and elongated filaments of high water vapor transport, with their occurrence generally interpreted as large atmospheric water vapor transport events in the extra-tropics [26]. This phenomenon is associated with tropical moisture exports and often occurs in combination with the passage of extratropical cyclones [19]. The warm, moist air in the cyclones’ warm sector is swept up by the advancing cold front, leading to a filament of high specific moisture content, which is transported northward at the base of the warm conveyor belt ([19], Fig. 5). As the cyclones travel poleward, such bands of high humidity may trace back over large distances, extending from regions of high sea surface temperature in the subtropics into the midlatitudes, leading to strong precipitation along their path (e.g., storm “Xynthia”) [18]. Such structures transport more than 90% of the total midlatitude vertically integrated water vapor [26] and can lead to intense precipitation over different continental regions due to their interaction with the topography [28]. Recently, an international agreement has been reached regarding the relationships between ARs, warm conveyor belts, and tropical moisture exports [26]. The term warm conveyor belt refers to the zone of dynamically uplifted heat and vapor transport close to a midlatitude cyclone. This vapor is often transported to the warm conveyor belt by an AR and was earlier brought poleward to the extra-tropics by tropical moisture exports ([26], Fig. 1). The uplift associated with the warm conveyor belt typically leads to heavy rainfall. This generally marks the downwind end of an AR, unless the AR has experienced orographic uplift earlier on, causing rainout over mountain areas.

The impacts of ARs have been analyzed in detail for Western Europe, revealing the importance of ARs in extreme weather events, both in case studies and from a climatological perspective [23].

An objective identification of ARs has been developed by the IDL team by means of the vertically integrated horizontal water vapor transport (IVT), computed with both the NCEP–NCAR and ERA-Interim datasets [23].
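IVT, the vertically integrated horizontal water vapor transport, is commonly computed as the magnitude of the vertically integrated moisture flux, IVT = (1/g) sqrt[(∫ q u dp)² + (∫ q v dp)²], where q is specific humidity, u and v are the horizontal wind components and p is pressure. The sketch below evaluates this with a trapezoidal rule over the available pressure levels. The function name, the array layout and the choice of levels are assumptions; the integration limits and detection thresholds used in [23] are not stated here.

```python
import numpy as np

G = 9.80665  # standard gravity, m s^-2


def integrated_vapor_transport(q, u, v, p_levels_pa):
    """Return the IVT magnitude (kg m^-1 s^-1) on a (lat, lon) grid.

    q: specific humidity (kg/kg); u, v: wind components (m/s);
    all three shaped (level, lat, lon). p_levels_pa: 1D pressure levels in Pa.
    """
    p = np.asarray(p_levels_pa, dtype=float)
    order = np.argsort(p)  # integrate towards increasing pressure
    p = p[order]
    qu = (np.asarray(q) * np.asarray(u))[order]
    qv = (np.asarray(q) * np.asarray(v))[order]
    # Trapezoidal rule along the vertical axis.
    dp = np.diff(p)[:, None, None]
    flux_u = np.sum(0.5 * (qu[1:] + qu[:-1]) * dp, axis=0) / G
    flux_v = np.sum(0.5 * (qv[1:] + qv[:-1]) * dp, axis=0) / G
    return np.hypot(flux_u, flux_v)


# Synthetic, vertically uniform profile (illustrative values only).
levels = np.array([1000.0, 850.0, 700.0, 500.0, 300.0]) * 100.0  # hPa -> Pa
shape = (levels.size, 2, 3)
q = np.full(shape, 0.005)
u = np.full(shape, 10.0)
v = np.zeros(shape)
ivt = integrated_vapor_transport(q, u, v, levels)
```

For this uniform profile the integral reduces to q·u·Δp/g with Δp = 70000 Pa, which gives a simple check of the numerical integration.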

Weather Fronts

Weather fronts are another important feature associated with cyclones, heavy precipitation events and their impacts. An objective method, the Thermal Method [29], is used to detect and depict weather fronts over a gridded map. The method is based on the choice of a Thermal Frontal Parameter (θe), directly related to the location and intensity of the frontal zone. The link between frontal precipitation and extreme weather events will also be addressed by objectively evaluating the proportion of precipitation collocated with the fronts.
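One possible way to evaluate the proportion of precipitation collocated with fronts is sketched below. It assumes that the front detection step has already produced a boolean mask on the same grid as the precipitation field, and it counts precipitation as frontal when it falls within a square search window around any front point. The function name, the window shape and the radius are hypothetical choices, not the procedure of [29].

```python
import numpy as np


def frontal_precip_fraction(precip: np.ndarray, front_mask: np.ndarray, radius: int = 1) -> float:
    """Fraction of total precipitation within `radius` grid cells of a front point."""
    rows, cols = front_mask.shape
    padded = np.pad(front_mask.astype(bool), radius, mode="constant", constant_values=False)
    near_front = np.zeros((rows, cols), dtype=bool)
    # Grow the front mask by OR-ing shifted copies (square neighbourhood).
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            near_front |= padded[radius + dr:radius + dr + rows,
                                 radius + dc:radius + dc + cols]
    total = float(precip.sum())
    if total <= 0.0:
        return 0.0
    return float(precip[near_front].sum()) / total


# Synthetic example: a north-south front along column 2 of a 5 x 5 grid.
front_mask = np.zeros((5, 5), dtype=bool)
front_mask[:, 2] = True
precip = np.zeros((5, 5))
precip[:, 1] = 2.0  # rain band next to the front
precip[:, 4] = 1.0  # rain away from the front
fraction = frontal_precip_fraction(precip, front_mask, radius=1)
```

The result depends strongly on the search radius, so in practice this parameter would have to be chosen and justified together with the grid resolution.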

Conceptual Models

All these conceptual models are widely used in meteorology courses and textbooks throughout the world to illustrate the basic structure and evolution of weather systems, such as extratropical cyclones and atmospheric rivers. These conceptual models have been the basis for objective detection and tracking methods developed by IDL researchers [21,22,23], which will be used in this exploratory project.

To describe the structural evolution of ECs or ARs, a combination of surface observations, satellite imagery, radar data and model output is needed. These allow meteorologists to identify and describe the evolution of cyclonic flows such as the warm and cold conveyor belt flows and the dry intrusion. Atmospheric parameters such as horizontal wind (u, v), pressure velocity (ω), temperature and geopotential may be obtained at a grid resolution of 0.125° longitude × 0.125° latitude, at 37 isobaric levels from 1000 to 1 hPa, and at time intervals of 6 h or more.
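To give a feeling for the data volume implied by these figures, the following back-of-the-envelope calculation estimates the size of one time step. It assumes a regular global latitude-longitude grid that includes both poles, the five parameters listed above, and 4-byte floating point values without compression; these assumptions are ours and actual archive formats will differ.

```python
def grid_points(resolution_deg: float) -> tuple:
    """Number of (latitude, longitude) points on a regular global grid."""
    n_lon = round(360 / resolution_deg)
    n_lat = round(180 / resolution_deg) + 1  # both poles included
    return n_lat, n_lon


n_lat, n_lon = grid_points(0.125)
levels = 37            # isobaric levels, 1000 to 1 hPa
variables = 5          # u, v, omega, temperature, geopotential
bytes_per_value = 4    # assumption: 32-bit floats, uncompressed
steps_per_day = 24 // 6

values_per_step = n_lat * n_lon * levels * variables
bytes_per_step = values_per_step * bytes_per_value
bytes_per_day = bytes_per_step * steps_per_day

print(f"{n_lat} x {n_lon} grid points per level")
print(f"{bytes_per_step / 1e9:.2f} GB per time step")
print(f"{bytes_per_day / 1e9:.2f} GB per day at 6-hourly intervals")
```

Under these assumptions a single 6-hourly time step already amounts to roughly 3 GB, which illustrates why storage and bandwidth are a concern for multidecadal datasets.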

The Project Proposal

This exploratory project will be based mainly on reconstructions of meteorological and oceanographic fields, meaning different available reanalysis datasets (e.g. 6-hourly ERA-Interim 1979–2017 reanalysis over the Northern Hemisphere with a horizontal resolution of 0.75°; [25] and references therein). However, the methodology will be developed for any gridded data, and it will also be tested on atmospheric and ocean reanalysis data at different resolutions from several data centres – ECMWF, National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR), or the Japan Meteorological Agency (JMA) – as well as on simulations from six Coupled Model Intercomparison Project Phase 5 (CMIP5) global climate models (GCMs), to quantify possible changes during the current century, with emphasis on the Atlantic Ocean.

This project aims to build a general tool that can accommodate the future developments expected as reanalysis datasets become more diverse (atmosphere, ocean and land components), more complete (moving towards Earth-system coupled reanalysis), more detailed, and of longer timespans.

4 Tasks

The project is divided into 3 straightforward tasks.

4.1 Task 1 - Identification and Definition of Contexts and Phenomena

In this task we will identify and define the phenomena for which the narratives will be created, including atmospheric rivers, fronts, and extratropical cyclones, as well as the contexts for those narratives. A narrative provides insight and meaning to data in a specific context. Three user scenarios (contexts) will be adopted to develop the narration process. This set represents three very different types of narration, regarding the target audience and the message format:

  • Weather phenomena description, in a natural and conversational language, targeted at the general public. This narrative will provide a description or explanation of a phenomenon as it is represented by the data at an exact moment and location.

  • Weather system’s data description, as a metadata classification of the phenomena data, targeted at machine systems. The main goal is to use the narration process as a method to classify and index the original data, thus creating the metadata to be used by other machine systems, e.g., indexing and search engine systems, machine learning systems, image analysis, etc.

  • Weather phenomena progression, in a natural and scientific language, targeted at climate science students. This narrative describes the progression of a phenomenon during its occurrence, as it develops over time and location. It provides a didactic storyline that can be used in teaching or science dissemination, as a solo message or as part of a multimedia message, including video and computer-generated graphics.

According to the previously described contexts, a set of weather phenomena will be chosen for narration. The phenomena will be characterized regarding the detailed methods, including algorithms and criteria that might be used to identify their occurrence and development in the original climate and weather data. These methods will later be used to extract information from the data, which will be rendered and formatted according to the context and target usage, creating a meaningful message.

4.2 Task 2 - Development of the Narration Process

In this task the narration process will be developed, which will be applied to three demonstration scenarios, according to the previously described contexts.

The process will be designed as a pipeline, in which several components will be combined in order to have a fully functional process, starting with the raw data and concluding with a narrative. The types of components will be:

  • Data binding, to connect and bind to a raw data source;

  • Data processing, to process raw data from several sources and create usable datasets;

  • Information extraction, to search and extract information from data;

  • Narration composition, to create a narrative from the extracted information, giving it meaning according to a context.

In this task, several components of the previously defined types will be developed and then assembled in distinct configurations, in order to create three narratives with different characteristics, which will act as proofs of concept for this weather systems narration project.

The narratives will be created according to the phenomena and context previously defined in task 1.

The pipeline design approach provides flexibility and segregation in the different phases of the process by providing a framework for the independent development and lifecycle management of components regarding the following: distinct data sources, raw data formats, usable data sets, weather systems phenomena, and narration type.
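The pipeline idea described in this task can be sketched as a chain of independent functions, one per component type, assembled in different configurations. Everything in the sketch is a hypothetical stand-in: the function names, the record layout and the sample values are placeholders chosen for illustration, not the components to be developed.

```python
from __future__ import annotations

from typing import Any, Callable

Component = Callable[[Any], Any]


def build_pipeline(*components: Component) -> Component:
    """Chain components so that each one consumes the previous one's output."""
    def run(source: Any) -> Any:
        data = source
        for component in components:
            data = component(data)
        return data
    return run


def bind_data(source: dict) -> list[dict]:
    """Data binding: connect to a raw source (here, an in-memory dict)."""
    return source["records"]


def process_data(records: list[dict]) -> list[dict]:
    """Data processing: drop incomplete records and convert Pa to hPa."""
    return [{"time": r["time"], "pressure_hpa": r["pressure_pa"] / 100.0}
            for r in records if r.get("pressure_pa") is not None]


def extract_information(dataset: list[dict]) -> dict:
    """Information extraction: find the deepest pressure value in the dataset."""
    deepest = min(dataset, key=lambda r: r["pressure_hpa"])
    return {"event": "pressure minimum", **deepest}


def compose_narration(info: dict) -> str:
    """Narration composition: render the extracted information as text."""
    return (f"The lowest pressure, {info['pressure_hpa']:.0f} hPa, "
            f"was recorded at {info['time']}.")


# Made-up raw data for illustration.
source = {"records": [
    {"time": "00:00", "pressure_pa": 99000.0},
    {"time": "06:00", "pressure_pa": 97500.0},
    {"time": "12:00", "pressure_pa": None},
]}

# Two configurations built from the same components.
text_pipeline = build_pipeline(bind_data, process_data, extract_information, compose_narration)
metadata_pipeline = build_pipeline(bind_data, process_data, extract_information)

narrative = text_pipeline(source)
metadata = metadata_pipeline(source)
```

Because each component only depends on the output of the previous one, a component can be replaced (for example, a different data source or narration style) without touching the others, which is the flexibility argued for above.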

4.3 Task 3 - Integration, Tests and Evaluation

In this task, the previously developed components will be integrated, by assembling them into several pipeline configurations, in order to evaluate the complete narration process regarding its functionality as a fully automatic narrative creation system. First, the components will be individually tested, as independent functional units, after which they will be assembled and tested as a fully functional system. To produce the final narratives, the system will be assembled in distinct pipeline configurations, using the necessary components, according to the desired outcome.

The ultimate evaluation of the system will be carried out by assessing the narratives produced, according to the three contexts defined in task 1 and later implemented in task 2. In fact, the quality of these narratives will be an indicator of the overall quality and validity of the project’s proof-of-concept.

The final narrative deliverables to assess are the following: a weather phenomenon description, in a natural and conversational language; a weather system’s raw data metadata classification; and a weather phenomenon progression, in natural and scientific language.

Besides the evaluation of the final outcomes, other aspects of the project will also be evaluated regarding the functionality, sustainability and reproducibility of the automatic weather systems narration process. The process relies heavily on data and processing capacity, which can be provided by science infrastructures. Once the automatic weather systems narration process has been validated, the next step should be to tailor and streamline it for specific market application segments, in order to achieve the same positive outcomes while consuming a reasonable amount of resources, according to each application segment.

5 Conclusion

This exploratory project follows up on the work started by the CE4Blind project, extending the experience of the team to the domain of climate change. The proposal extends to supercomputing, the Big Data analytics domain, and climate and atmospheric sciences.

The final goal is to combine and study the areas of weather systems, data computation, narrative creation, and human-computer interfaces for the blind. A major milestone is the development of a working prototype, which may be tested and demonstrated outside the lab.