
Journal of Web Semantics

Volume 25, March 2014, Pages 16-23

Ztreamy: A middleware for publishing semantic streams on the Web

https://doi.org/10.1016/j.websem.2013.11.002

Highlights

  • We present a framework for scalably publishing semantic event streams on the Web.

  • We empirically show that our framework outperforms other well-known systems.

  • We publish the framework and benchmarking tools under a free software license.

Abstract

In order to make the semantic sensor Web a reality, middleware for efficiently publishing semantically-annotated data streams on the Web is needed. Such middleware should be designed to allow third parties to reuse and mash-up data coming from streams. These third parties should even be able to publish their own value-added streams derived from other streams and static data. In this work we present Ztreamy, a scalable middleware platform for the distribution of semantic data streams through HTTP. The platform provides an API for both publishing and consuming streams, as well as built-in filtering services based on data semantics. A key contribution of our proposal with respect to other related systems in the state of the art is its scalability. Our experiments with Ztreamy show that a single server is able, in some configurations, to publish a real-time stream to up to 40 000 simultaneous clients with delivery delays of just a few seconds, largely outperforming other systems in the state of the art.

Introduction

Nowadays, there are well-known best practices for publishing linked data on the Web in an interoperable manner, so that it can be retrieved, queried, browsed, etc. by applications [1]. However, these practices are principally aimed at publishing static data, and do not properly accommodate the vast amounts of time-dependent data about the physical world produced by sensors. This kind of dynamic data is not restricted to physical sensors: for example, there are social sensors that capture information published in the social sphere (social networks, blogs, etc.).

Data from sensors is usually processed in the form of streams. In [2] a data stream is defined as a real-time, continuous, ordered (implicitly by arrival time or explicitly by timestamp) sequence of items. Streams differ from stored data in several respects: they cannot normally be stored in their entirety, and the order in which the data is received cannot be controlled. Wrapping sensor data in semantic formats like RDF, and extending it with semantic annotations, facilitates data integration. As the linked data initiative does for static data, openly publishing this dynamic information on the Web, accompanied by proper semantic annotations, opens the door to independently-created applications and mashups. These applications can exploit information in unforeseeable ways by integrating sensor data, linking to sources of static information, etc. This kind of platform is called the semantic sensor Web [3]. Current research in this area is aimed at providing solutions to problems such as the annotation and transformation of data coming from sensors, integration of data from heterogeneous models, integration of sensor data with linked data (and static data in general), discovery of relevant streams, querying and reasoning on streams, provenance of data (e.g. identifying the quality or reliability of different streams, sensors, etc.), and large-scale distribution of streams.
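The ordered, non-storable nature of a stream can be sketched in Python. The classes below are purely illustrative (they are not part of Ztreamy), assuming each event carries a timestamp and an arbitrary payload, and that only a bounded window of recent events can be kept:

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    """An item of a stream, ordered explicitly by timestamp."""
    timestamp: float
    data: dict = field(compare=False)  # payload, e.g. annotated sensor readings


class StreamBuffer:
    """Keeps only the most recent `maxlen` events: a stream cannot
    normally be stored in its entirety, so older items are dropped."""

    def __init__(self, maxlen):
        self.maxlen = maxlen
        self._heap = []  # min-heap keyed on timestamp

    def push(self, event):
        heapq.heappush(self._heap, event)
        if len(self._heap) > self.maxlen:
            heapq.heappop(self._heap)  # discard the oldest event

    def events(self):
        """Return the buffered events in timestamp order."""
        return sorted(self._heap)
```

Note that `push` accepts events in any arrival order, mirroring the fact that the order in which stream data is received cannot be controlled.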

In this article we present Ztreamy, a middleware for publishing streams of semantically-annotated data on the Web. By using it, data sources can publish their streams, and applications can consume them. It also allows operations such as mirroring, joining, splitting and filtering these streams. There are other frameworks that can be used for publishing sensor data, such as DataTurbine [4]. Some have even been designed for RDF sensor streams, such as the Linked Stream Middleware (LSM) [5]. Our most important contributions with respect to the previous work are the scalability of the proposal and its use of HTTP, which make it available to a wide range of application environments. As we show in the evaluation, Ztreamy can handle much higher data rates and larger numbers of clients from a single server than other existing solutions. In addition, it provides mechanisms for broadcasting the streams from several servers when additional performance is needed. Another difference with respect to some of the available solutions is that it provides built-in services for manipulating data represented with the RDF data model. This reduces the effort needed to develop applications on top of it, because of the widespread availability of tools (query processors, data stores, reasoners, etc.) for RDF.
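The kind of content-based filtering such middleware can offer is illustrated by the minimal sketch below. The function name and data layout are hypothetical, not Ztreamy's actual API; it assumes events carry RDF-style (subject, predicate, object) triples and selects events by the objects they mention:

```python
def object_filter(events, objects):
    """Yield only the events containing at least one RDF-style triple
    whose object is in `objects` (filtering based on data semantics)."""
    wanted = set(objects)
    for event in events:
        if any(o in wanted for (_s, _p, o) in event["triples"]):
            yield event


events = [
    {"id": 1, "triples": [("ex:sensor1", "ssn:observes", "ex:Temperature")]},
    {"id": 2, "triples": [("ex:sensor2", "ssn:observes", "ex:NoiseLevel")]},
]

# Keep only the events about temperature observations
temperature_events = list(object_filter(events, {"ex:Temperature"}))
```

A derived, value-added stream could be built by republishing the filtered events, in the same spirit as the splitting and joining operations mentioned above.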

The rest of this paper is organized as follows. Section 2 describes a scenario that motivates the need for our proposal and its main requirements. Section 3 describes the design decisions behind Ztreamy. We evaluate its performance and compare it to other systems in Section 4. Section 5 discusses other existing middleware platforms for sensor networks. Finally, Section 6 concludes this article.


Motivation and requirements

In this section we introduce a motivating scenario based on the concept of smart cities, in which sensor networks play a key role. The SmartSantander project  [6] is one of several examples of research initiatives in this area. They have in common that a large number of physical sensors (traffic, weather, air quality, noise, etc.) are deployed across the city. They may be complemented by other kinds of dynamic data sources, such as the current status of public transport services, the schedule

The Ztreamy stream distribution platform

Our main objective is to make Ztreamy a scalable middleware platform for publishing and consuming semantically annotated data streams on the Web, for scenarios such as the one presented in the previous section. In this section we explain and justify how we have designed it from the points of view of architecture, data representation and data transport. Further implementation details are available in [7].
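On the transport side, streaming events over a long-lived HTTP response requires framing each event so that clients can split the continuous byte stream back into events. The sketch below illustrates one such header-plus-body framing; the field names are illustrative and not necessarily Ztreamy's exact wire format:

```python
def serialize_event(event_id, source_id, syntax, body):
    """Frame one event as text headers followed by the body, so that a
    client can delimit events inside a continuous HTTP response."""
    body_bytes = body.encode("utf-8")
    headers = (
        f"Event-Id: {event_id}\r\n"
        f"Source-Id: {source_id}\r\n"
        f"Syntax: {syntax}\r\n"
        f"Body-Length: {len(body_bytes)}\r\n"
        "\r\n"
    )
    return headers.encode("utf-8") + body_bytes


def parse_event(data):
    """Inverse of serialize_event: return (header fields, body bytes)."""
    head, _, rest = data.partition(b"\r\n\r\n")
    fields = dict(line.split(": ", 1)
                  for line in head.decode("utf-8").split("\r\n"))
    length = int(fields.pop("Body-Length"))
    return fields, rest[:length]
```

The `Syntax` field lets a single stream mix representations (e.g. RDF serializations alongside plain text), leaving interpretation of the body to the consumer.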

Performance evaluation

We carried out a series of experiments with Ztreamy and other systems in order to measure how their performance evolves as the number of clients connected to the stream and the data rate change. We used the following performance indicators:

  • CPU use: absolute amount of CPU time the server needed to process a given load. Because the experiments were designed so that a source sends data for an average duration of 100 s, the percentage of a core of the CPU used by the server can be approximated by
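With a fixed run duration of about 100 s, this approximation reduces to dividing the measured CPU time by the duration of the run; a minimal sketch of the computation:

```python
def cpu_core_percentage(cpu_time_s, experiment_duration_s=100.0):
    """Approximate percentage of one CPU core used by the server:
    total CPU time divided by the wall-clock duration of the run."""
    return 100.0 * cpu_time_s / experiment_duration_s

# e.g. 37 s of CPU time over a 100 s experiment uses about 37% of a core
```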

Related work

Xively (formerly COSM and Pachube) is a commercial service to which data consumers and producers connect to exchange real-time data. To the best of our knowledge, there is no public information about the engineering of their infrastructure.

Systems such as Global Sensor Networks (GSN) [9], DataTurbine [4] and BRTT Antelope are well-known platforms for gathering data from sensors. They support

Conclusions

In this paper we presented a scalable platform for publishing semantic streams on the Web. The platform is flexible, in the sense that diverse network layouts can be deployed depending on the needs of the application. Streams can be easily duplicated, aggregated and filtered. Applications written in diverse application environments can consume and publish streams by accessing platform servers through HTTP. A programming interface for Python is also available. Besides handling HTTP

References (18)


Cited by (31)

  • MEdit4CEP-SP: A model-driven solution to improve decision-making through user-friendly management and real-time processing of heterogeneous data streams

    2021, Knowledge-Based Systems
    Citation Excerpt :

    They have plans to apply ML techniques to detect patterns within the generated data. One of the main limitations of this proposal is that events can take up to 13 s to be processed, which is a considerable amount of time compared to our proposal, in which we process each event with an average time of 0.024 ms. In addition, the authors use Ztreamy [36] to stream the information, which they state can process 25 000 events per minute, a much smaller amount of data than what Esper or Kafka Streams can handle. In general, if we compare our SP architecture to these proposals, most of them use Kafka as a messaging system between components in their architecture, while most of their transformations and processing tasks could be more easily achieved using Kafka Streams, as we do.

  • Benchmarking real-time vehicle data streaming models for a smart city

    2017, Information Systems
    Citation Excerpt :

    We therefore consider that Apache Kafka is the most adequate alternative for our case study described in Section 3. In order to compare the efficiency of different infrastructures, two options were implemented and explored: a solution based on the Ztreamy framework [34] and another based on Apache Kafka Streams [35]. In both cases, the same data format was used to send the information from SmartDriver and to receive it from the Streaming Server.

  • Real-time data analytics and event detection for IoT-enabled communication systems

    2017, Journal of Web Semantics
    Citation Excerpt :

OpenIoT combines and enhances results from leading edge middleware projects, such as GSN and the Linked Sensor Middleware (LSM) [29,24]. Ztreamy is another middleware for IoT data acquisition, which can semantically annotate sensor observations using a given ontological representation [27]. Ztreamy facilitates publishing and consuming semantic sensor data streams over the Web.

  • Hybrid approach for selective delivery of information streams in data-intensive monitoring systems

    2016, Advanced Engineering Informatics
    Citation Excerpt :

Also, a quite active topic of research is increasing the flexibility of subscription mechanisms. For example, in [26] the authors propose a graph-based subscription mechanism, where the graphs model part of the system entities and simplify subscriptions for users interested in a certain type of information; the work described in [24] deals with semantically annotated data streams, and also proposes filtering mechanisms based on the semantics of published data. While publish/subscribe is a powerful approach to redirecting and delivering streams of information in distributed systems, it has some limitations when applied to selective delivery in monitoring systems.
