Ztreamy: A middleware for publishing semantic streams on the Web
Introduction
Well-established best practices exist for publishing linked data on the Web in an interoperable manner, so that applications can retrieve, query and browse it [1]. However, these practices are aimed principally at static data, and do not properly accommodate the vast amounts of time-dependent data about the physical world that sensors produce. Dynamic data of this kind is not restricted to physical sensors: social sensors, for example, capture information published in the social sphere (social networks, blogs, etc.).
Data from sensors is usually processed in the form of streams. In [2], a data stream is defined as a real-time, continuous, ordered (implicitly by arrival time or explicitly by timestamp) sequence of items. Streams differ from stored data in several respects: they cannot normally be stored in their entirety, and the order in which the data is received cannot be controlled. Wrapping sensor data in semantic formats such as RDF, and extending it with semantic annotations, facilitates data integration. As with the linked data initiative, openly publishing this dynamic information on the Web, accompanied by proper semantic annotations, opens the door to independently created applications and mashups. These applications can exploit the information in unforeseen ways by integrating sensor data, linking to sources of static information, etc. This kind of platform is called the semantic sensor Web [3]. Current research in this area aims to solve problems such as the annotation and transformation of data coming from sensors, integration of data from heterogeneous models, integration of sensor data with linked data (and static data in general), discovery of relevant streams, querying and reasoning on streams, provenance of data (e.g. identifying the quality or reliability of different streams and sensors), and large-scale distribution of streams.
In this article we present Ztreamy, a middleware for publishing streams of semantically annotated data on the Web. With it, data sources can publish their streams and applications can consume them. It also supports operations such as mirroring, joining, splitting and filtering these streams. Other frameworks can be used for publishing sensor data, such as DataTurbine [4]. Some have even been designed for RDF sensor streams, such as the Linked Stream Middleware (LSM) [5]. Our most important contributions with respect to previous work are the scalability of the proposal and its use of HTTP, which makes it available to a wide range of application environments. As we show in the evaluation, a single Ztreamy server can handle much higher data rates and many more clients than other existing solutions. In addition, Ztreamy provides mechanisms for broadcasting the streams from several servers when additional performance is needed. Another difference with respect to some of the available solutions is that it provides built-in services for manipulating data represented with the RDF data model. This reduces the effort needed to develop applications on top of it, because tools for RDF (query processors, data stores, reasoners, etc.) are widely available.
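To make the stream operations named above more concrete, the following is a minimal Python sketch of filtering, joining and splitting applied to in-memory sequences of timestamped events. It does not use Ztreamy or its programming interface: the `Event` class and the functions `filter_stream`, `join_streams` and `split_stream` are hypothetical names introduced only for illustration, and the sketch assumes that each input stream is already ordered by timestamp.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class Event:
    """Hypothetical stream item: a timestamp, a source id and a payload."""
    timestamp: float
    source: str
    body: str


def filter_stream(events: Iterable[Event],
                  predicate: Callable[[Event], bool]) -> Iterator[Event]:
    """Lazily yield only the events accepted by the predicate."""
    return (event for event in events if predicate(event))


def join_streams(*streams: Iterable[Event]) -> Iterator[Event]:
    """Merge several timestamp-ordered streams into one ordered stream.

    heapq.merge is lazy, so the inputs are never held in memory in full.
    """
    return heapq.merge(*streams, key=lambda event: event.timestamp)


def split_stream(events: Iterable[Event],
                 key: Callable[[Event], str]) -> Dict[str, List[Event]]:
    """Partition a finite stream into sub-streams, one per key value.

    This version materializes the result, so it only suits finite input;
    a real deployment would forward each event as it arrives instead.
    """
    parts: Dict[str, List[Event]] = {}
    for event in events:
        parts.setdefault(key(event), []).append(event)
    return parts
```

For instance, two sources could be combined with `join_streams(traffic, weather)`, the result restricted to one source with `filter_stream(joined, lambda e: e.source == "traffic")`, and per-source sub-streams recovered with `split_stream(joined, lambda e: e.source)`.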
The rest of this paper is organized as follows. Section 2 describes a scenario that motivates the need for our proposal and its main requirements. Section 3 describes the design decisions behind Ztreamy. We evaluate its performance and compare it to other systems in Section 4. Section 5 discusses other existing middleware platforms for sensor networks. Finally, Section 6 concludes this article.
Motivation and requirements
In this section we introduce a motivating scenario based on the concept of smart cities, in which sensor networks play a key role. The SmartSantander project [6] is one of several examples of research initiatives in this area. They have in common that a large number of physical sensors (traffic, weather, air quality, noise, etc.) are deployed across the city. They may be complemented by other kinds of dynamic data sources, such as the current status of public transport services, the schedule
The Ztreamy stream distribution platform
Our main objective is to make Ztreamy a scalable middleware platform for publishing and consuming semantically annotated data streams on the Web, for scenarios such as the one presented in the previous section. In this section we explain and justify its design from the points of view of architecture, data representation and data transport. Further implementation details are available in [7].
Performance evaluation
We carried out a series of experiments with Ztreamy and other systems in order to measure how their performance evolves as the number of clients connected to the stream and the data rate change. We used the following performance indicators:
- CPU use: absolute amount of CPU time the server needed to process a given load. Because the experiments were designed so that a source sends data for an average duration of 100 s, the percentage of a core of the CPU used by the server can be approximated by
Related work
Xively (formerly COSM and Pachube) is a commercial service to which data consumers and producers connect to exchange real-time data. To the best of our knowledge, there is no public information about the engineering of their infrastructure.
Systems such as Global Sensor Networks (GSN) [9], DataTurbine [4] and BRTT Antelope are well-known platforms for gathering data from sensors. They support
Conclusions
In this paper we presented a scalable platform for publishing semantic streams on the Web. The platform is flexible, in the sense that diverse network layouts can be deployed depending on the needs of the application. Streams can be easily duplicated, aggregated and filtered. Applications written in diverse application environments can consume and publish streams by accessing platform servers through HTTP. A programming interface for Python is also available. Besides handling HTTP
References
- et al., A middleware framework for scalable management of linked streams, J. Web Semant. (2012)
- et al., Binary RDF representation for publication and exchange (HDT), J. Web Semant. (2013)
- et al., The SSN ontology of the W3C semantic sensor network incubator group, J. Web Semant. (2012)
- et al., Linked data-the story so far, Int. J. Semant. Web Inf. Syst. (IJSWIS) (2009)
- et al., Issues in data stream management, SIGMOD Rec. (2003)
- et al., Semantic sensor web, IEEE Internet Comput. (2008)
- S. Tilak, P. Hubbard, M. Miller, T. Fountain, The ring buffer network bus (rbnb) dataturbine streaming data middleware...
- J. Galache, J. Santana, V. Gutierrez, L. Sanchez, P. Sotres, L. Munoz, Towards experimentation-service duality within a...
- et al., Ztreamy: Implementation Details, Tech. Rep. (2013)