1 Introduction

The quantification, prediction, and management of natural resources require multi-scale observations of near-surface environmental processes. Existing remote sensing and ground-based observations alone lack the spatial and/or temporal resolution needed to capture the heterogeneity in complex environments, like mountainous watersheds [1]. Densely distributed environmental sensor networks can overcome some of these limitations, but the concept remains largely unused for near-surface monitoring [2]. The slow adoption can be attributed to the unique challenges posed by field deployments. These include a lack of power provisions and network infrastructure, climatic and topographic impacts on radio frequency propagation, and mechanical issues related to humidity, moisture, pressure, and even wildlife damage. Commercial devices for environmental monitoring (as used in [3]) are usually the result of a dedicated and costly design process, and low-cost solutions usually consist of prototypes with general purpose hardware (e.g., Arduino, Raspberry Pi) [4]. The cost, replicability, and reliability of these devices prevent system scale-ups to hundreds or thousands of distributed sensors. Another limiting factor in large-scale deployments of environmental monitoring systems is the wireless connectivity of sensor nodes. Wireless networks of densely distributed sensors can provide data with an unprecedented spatial and temporal resolution for environmental research, providing new insights that improve the predictive understanding of ecosystems and their hydro-biogeochemical processes, and of natural hazards and their triggering processes. Real-time data transmission and remote access are useful for sensor management and for scheduling maintenance, reducing system downtime, and preventing loss of data. 
In the context of natural resource management and infrastructure monitoring, real-time data can drive crucial decision-making processes (e.g., water management, landslide early warning systems) [5, 6]. Despite the wide range of available low-power wide area network (LPWAN) technologies, the unique requirements for environmental monitoring applications result in a non-trivial wireless technology implementation. Although NB-IoT and LTE-M offer long range and low power consumption [7], these cellular internet of things (IoT) technologies do not provide a solution for environmental sensor networks in remote field sites due to their lack of coverage and operation in licensed spectrum. Alternatives in unlicensed spectrum include Sigfox and LoRaWAN [8]. For sensor nodes, these technologies offer an ultra-low power consumption. However, their base stations were never designed for power constrained applications, and backhaul connectivity is usually required [9]. Wielandt and Dafflon [10] presented an implementation of the LoRa physical layer in a novel LPWAN protocol stack that offers low power consumption for sensor nodes as well as base stations. Because all devices in the network are time synchronized, packet reception and transmission can be scheduled on each side of a link, resulting in an optimized power and spectrum usage through time division duplex (TDD) and frequency hopping spread spectrum (FHSS). The resulting network protocol avoids collisions in the network and allows the use of single-channel radios in the nodes and base stations, drastically reducing system complexity, cost, and power consumption of the latter [11]. Given the operation in unlicensed frequency bands, collisions with packets from external networks are still possible. However, the proposed network protocol is aimed towards environmental monitoring applications in remote locations where wireless technologies are generally absent.

In comparison to common internet-of-things devices with few sensors, environmental monitoring systems often produce larger amounts of data, requiring particular attention to data transmission and compression. Here, we adopt the network protocol from [10], and we add a presentation layer that handles data compression, effectively reducing network traffic and power consumption of nodes and base stations. Most LPWAN devices employ standard payload formats (e.g., CayenneLPP [12]) and/or compression techniques (e.g., Huffman encoding) [13]. For the presented implementation, a custom payload format and a lossless compression technique are proposed, taking advantage of the temporal and spatial similarity in environmental sensor data.

The proposed protocol stack is implemented in a custom hardware platform for densely distributed, vertically resolved environmental measurements. Each sensor node comprises a battery powered data logger that is connected to a sensor probe. The sensor probes consist of a tube with an array of cascaded, discrete sensors; here we present two use cases, one with temperature sensors, and one with alternating temperature sensors and accelerometers. When temperature sensors are used, the probes can be vertically deployed above the ground for snowpack monitoring, or below the ground for identifying subsurface thermal regimes, quantifying soil thermal parameters, and estimating heat and/or water fluxes. When temperature sensors are alternated with accelerometers, the probes can be used to measure soil deformation and related thermo-hydrological properties and processes (e.g., soil movement driven by permafrost thaw). The mechanical design, thermal modeling, and the use case scenarios of these sensor arrays have been described and validated by Dafflon et al. [14] and Wielandt et al. [15]. This paper focuses on the system design, the implementation of the novel network protocol stack and data compression techniques, and the effect on power consumption for sensor nodes and base stations.

2 Wireless Networks for Environmental Monitoring

Figure 1 illustrates the architecture and services in a real-time environmental monitoring application. Environmental sensors connect through LoRa to a low-power base station, which communicates over UART with a data publisher. The publisher is connected to a server through a wired or wireless communication technology (e.g., Ethernet, LTE, NB-IoT, WiFi). The server hosts Mosquitto [16], a service that supports the open publish/subscribe MQTT protocol [17], specifically designed for low-cost, low-power operation over bandwidth constrained networks. The incoming sensor data is processed by a Python script, which decodes all messages and feeds data into a real-time database. InfluxDB is an open-source database specifically targeted at handling time series data [18]. Grafana is the open-source service we suggest for data visualization [19]. The result is a real-time database and data visualization platform that can be accessed by users. Furthermore, users can use the MQTT service to send commands to the sensors for network and/or sensor management.
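As an illustration of the last step in this pipeline, the following is a minimal sketch of how one decoded sample could be formatted as an InfluxDB line-protocol record before being written to the database. The measurement name, the `node` tag, and the field names `t00`, `t01`, ... are hypothetical and are not taken from the described implementation; only the general line-protocol layout (measurement, tags, fields, nanosecond time stamp) is standard.

```python
def to_line_protocol(node_id, epoch_s, temperatures_c,
                     measurement="probe_temperature"):
    """Format one decoded sample as an InfluxDB line-protocol record.

    The measurement name, the node tag, and the field names are
    hypothetical placeholders chosen for this sketch.
    """
    # One field per sensor in the array, e.g. t00=1.5000000,t01=...
    fields = ",".join(
        f"t{idx:02d}={value:.7f}" for idx, value in enumerate(temperatures_c)
    )
    # Line protocol time stamps default to nanosecond precision.
    timestamp_ns = int(epoch_s) * 1_000_000_000
    return f"{measurement},node={node_id} {fields} {timestamp_ns}"


line = to_line_protocol("n17", 1_600_000_000, [1.5, -0.0078125])
print(line)
```

In practice, the string would be handed to whichever database client the processing script uses; that part is deliberately left out here.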

Figure 1

Architecture and services enabling real-time environmental monitoring. Environmental sensors connect through LoRa to a base station, which communicates over UART with a data publisher. The publisher is connected to a server with a real-time database and a visualization platform for users.

Most of the described protocols and technologies have been widely documented and their implementation requires little to no research efforts. However, the implementation of an LPWAN for remote environmental research applications poses a unique set of technical requirements that does not entirely align with current IoT technologies. Environmental sensors require accurate real-time clocks for regular time stamped measurements. Delays in data transmission are generally acceptable, but data loss should be avoided. Furthermore, remote network deployments require not only low-power sensor devices, but also low-power infrastructure.

LoRaWAN relies on LoRa as an underlying physical layer technology that provides long-range, low-power connectivity. However, LoRaWAN requires power intensive multi-channel base stations and backhaul connectivity [9]. The technology reduces collisions by using orthogonal spreading factors; however, this does not eliminate collisions in the network. Furthermore, its fair-use policy limits the number of transmissions per day, or at least requires a listen-before-talk approach [20]. These limitations can be overcome with a novel LoRa based network protocol stack, as presented in [10]. The presented stack schedules transmissions in a TDD implementation, preventing packet collisions. Pseudorandom network parameters (e.g., channel, bandwidth (BW), spreading factor (SF), code rate) are calculated on both sides of the link based on unique parameters (i.e., network name, device ID, time stamp). The protocol is particularly useful for environmental monitoring applications, since regular measurements, transmissions, and receptions can be implemented in the TDD scheme. Figure 2 illustrates the different states of a network with one base station and two sensor nodes. The base station can either sleep, wake up to receive data from sensors, or broadcast a network management packet that includes the current time and measurement interval. The sensor nodes are mostly asleep, but wake up for a variety of reasons:

  • When a scheduled network management packet is expected, the nodes will enter a LoRa receive (Rx) state.

  • Once every measurement interval, all (m) sensors in the array are read out quasi simultaneously and their data is added to a buffer (i.e. data aggregation).

  • After t measurement intervals, the buffer is full and a compression algorithm is executed, compressing the \(t\cdot m\) sensor values. The algorithm reduces the number of transmissions required, resulting in a lower overall power consumption for sensor nodes and base stations.

  • The compressed data is transmitted in as many packets as required (\(\le t\)). There is one transmission window per measurement interval for each node. When all compressed data is transmitted, no new transmissions will occur until a new compressed data set is available.
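The idea that both sides of a link compute the same pseudorandom parameters from shared knowledge (network name, device ID, time stamp) can be sketched as follows. This is not the derivation used in [10], which is not detailed here; SHA-256 is used purely as a stand-in pseudorandom function, and the channel count `N_CHANNELS` and the function name are assumptions made for illustration.

```python
import hashlib

N_CHANNELS = 64  # hypothetical number of channels in the hopping set


def link_parameters(network_name: str, device_id: int, time_slot: int):
    """Return (channel index, 8-bit signature) for one scheduled time slot."""
    seed = f"{network_name}|{device_id}|{time_slot}".encode("utf-8")
    # Stand-in pseudorandom function (an assumption, not the method of [10]).
    digest = hashlib.sha256(seed).digest()
    channel = digest[0] % N_CHANNELS
    signature = digest[1]  # a single byte, i.e. an 8-bit signature
    return channel, signature


# Node and base station evaluate the same function independently.
node_view = link_parameters("watershed-a", 42, 1_700_000_000)
base_view = link_parameters("watershed-a", 42, 1_700_000_000)
print(node_view == base_view)  # True
```

Because the inputs are known on both sides once clocks are synchronized, no parameter negotiation has to be transmitted over the air.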

$$\begin{aligned} T_S=\dfrac{2^{SF}}{BW} \end{aligned}$$
(1)
$$\begin{aligned} T_\mathrm {preamble}=(n_\mathrm {preamble} +4.25) \cdot T_S \end{aligned}$$
(2)
$$\begin{aligned} \begin{aligned} T_\mathrm {payload}=&T_S \cdot \bigg ( 8 + \max \bigg \{ 0, (CR+4) \\&\cdot \bigg \lceil \frac{8PL - 4SF + 28 + 16CRC - 20H_\mathrm {en}}{4SF -2R_\mathrm {opt}} \bigg \rceil \bigg \} \bigg ) \end{aligned} \end{aligned}$$
(3)
$$\begin{aligned} T_\mathrm {packet}=T_\mathrm {preamble} + T_\mathrm {payload} \end{aligned}$$
(4)
Figure 2

Scheme of TDD transmissions, sensor measurements, and data compression in a network with one base station and two sensor nodes (\(t=3\)).

Following FCC requirements [21], air time (\(T_\mathrm {packet}\)) is limited to 0.4 s. The air time depends on the duration of the packet’s preamble (\(T_\mathrm {preamble}\)) and payload (\(T_\mathrm {payload}\)). As expressed in (1-4), these values are determined by the duration of a LoRa symbol (\(T_S\)). For sensor data transmissions, SF is a trade-off between range and throughput. In this research, \(SF=9\) leads to a maximum payload (PL) of 66 bytes following (1-4) [22], assuming an 8-symbol preamble (\(n_\mathrm {preamble} = 8\)), \(BW = 125\mathrm {e}3\) Hz, 4/5 code rate (\(CR=1\)), cyclic redundancy check (CRC) enabled (\(CRC=1\)), explicit header (\(H_\mathrm {en}=0\)), and low data rate optimization disabled (\(R_\mathrm {opt}=0\)). The packet structure is laid out in Fig. 3. The packet starts with a 9-bit packet header, containing an 8-bit signature for authentication (a pseudorandom bit sequence based on the network name, device ID, and time stamp). The subsequent ‘TX_complete’ bit indicates if all compressed data has been transmitted, or if another packet will follow. When a transmission is complete (TX_complete = 1), the base station and the corresponding sensor node will sleep through the remaining time slots to conserve power, as indicated in Fig. 2. Because of the small header size (9 bits vs. 117 bits for LoRaWAN), payloads of up to 519 bits can be transmitted in each packet (66 bytes, or 528 bits, minus the 9-bit header). When the size of the compressed message exceeds 519 bits, it is split into multiple LoRa packets.

Figure 3

Packet structure for compressed sensor data. When the size of a compressed message exceeds 519 bits, it is split into multiple LoRa packets.
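The splitting of a compressed message over several packets can be sketched as below. Messages and packets are represented as strings of '0' and '1' characters for readability. The bit ordering, the absence of padding, and the reuse of a single signature value for every packet are simplifying assumptions of this sketch; in the described protocol the signature depends on the time stamp and would change from slot to slot.

```python
PACKET_BITS = 528                        # 66-byte LoRa payload
HEADER_BITS = 9                          # 8-bit signature + TX_complete bit
CHUNK_BITS = PACKET_BITS - HEADER_BITS   # 519 bits of compressed data


def split_message(message_bits: str, signature: int):
    """Split a compressed message into packet bit strings of <= 528 bits."""
    chunks = [message_bits[i:i + CHUNK_BITS]
              for i in range(0, len(message_bits), CHUNK_BITS)] or [""]
    packets = []
    for index, chunk in enumerate(chunks):
        # TX_complete is set only on the last packet of the message.
        tx_complete = "1" if index == len(chunks) - 1 else "0"
        packets.append(f"{signature:08b}" + tx_complete + chunk)
    return packets


packets = split_message("10" * 600, signature=0xA5)   # 1200-bit message
print([len(p) for p in packets])                      # [528, 528, 171]
```

A receiver can concatenate the chunks of consecutive packets until it sees TX_complete = 1, after which both ends can sleep through the remaining slots.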

Base station transmissions are performed with \(SF=12\) and \(BW = 500\mathrm {e}3\) Hz, given the limited amount of network management data and the importance of network range. Following (1-4), this results in a maximum packet size of 30 bytes, as presented in Fig. 4. The message starts with an 8-bit pseudorandom signature for identification, followed by a payload descriptor to identify the payload contents. In the most common scenario, the network management payload consists of a UTC time stamp (7 bytes, with epoch 00:00:00, 01/01/2000) and a measurement sample interval (2 bytes).
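The payload limits quoted for sensor and base station transmissions (66 and 30 bytes) can be checked numerically against (1-4). The sketch below evaluates the air time with the settings stated in the text (8-symbol preamble, \(CR=1\), \(CRC=1\), \(H_\mathrm {en}=0\)) and with low data rate optimization disabled, so that the denominator in (3) reduces to \(4SF\). The function names are chosen for illustration only.

```python
import math


def lora_airtime(payload_bytes, sf, bw_hz, n_preamble=8, cr=1, crc=1, h_en=0):
    """Packet air time in seconds following Eqs. (1)-(4), with R_opt = 0."""
    t_s = (2 ** sf) / bw_hz                                   # Eq. (1)
    t_preamble = (n_preamble + 4.25) * t_s                    # Eq. (2)
    numerator = 8 * payload_bytes - 4 * sf + 28 + 16 * crc - 20 * h_en
    n_symbols = 8 + max(0, (cr + 4) * math.ceil(numerator / (4 * sf)))
    t_payload = n_symbols * t_s                               # Eq. (3)
    return t_preamble + t_payload                             # Eq. (4)


def max_payload(sf, bw_hz, limit_s=0.4):
    """Largest payload in bytes whose air time stays within limit_s."""
    size = 0
    while lora_airtime(size + 1, sf, bw_hz) <= limit_s:
        size += 1
    return size


print(max_payload(9, 125e3))    # 66 (sensor node transmissions)
print(max_payload(12, 500e3))   # 30 (base station transmissions)
```

For \(SF=9\) and 125 kHz, a 66-byte payload gives an air time of about 0.390 s, while 67 bytes would already exceed the 0.4 s limit.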

Figure 4

Packet structure for network management transmissions.

3 A Low-Power Hardware Platform with Sensor Node and Base Station Functionality

Figure 5 presents a hardware platform that provides the previously described functionality and meets the requirements of a low-cost, low-power, and reliable system for densely distributed, wirelessly connected thermal and deformation sensor arrays. Each device consists of a main board that can be used as a data logger or wireless base station. In order to allow standalone as well as networked sensor deployments, two wireless technologies are implemented. Bluetooth low energy (BLE) is selected for local connectivity to a computer or smartphone for data offloading and sensor configuration. The choice is motivated by the low power consumption and widely demonstrated technological maturity of BLE [23]. For long-range networking, we use LoRa technology, as previously discussed. In order to implement these technologies, we selected an NRF52832 ARM Cortex M4 as a low-power system-on-chip (SoC) with BLE provisions for short-range communications. This SoC controls a LoRa modem (RFM95W) for single channel communications and can be programmed to operate as a base station or wireless data logger. The PCF2129AT real-time clock takes care of the time sensitive tasks as presented in Fig. 2. A 4 MB low-power flash chip is used for storing sensor data on-device, which enables data logger functionality in offline configurations or data redundancy in online data loggers. In base station mode, received sensor data is stored on an optional microSDHC memory card, and data is shared with an external data publisher over a serial interface, as presented in Fig. 1.

The temperature sensor array consists of a series of TMP117A sensors with a resolution of \(0.0078125\;^\circ\)C and a factory-assured accuracy of \(\pm 0.1\;^\circ\)C [24], which was further improved to \(\pm 0.015\;^\circ\)C using a calibration technique described in [14]. The high resolution and accuracy of the sensors enable observations of important environmental processes, e.g., freeze-thaw interfaces. For the temperature/deformation array, each temperature sensor is accompanied by an ADXL345 accelerometer, allowing deformation measurements with a 0.390 mm resolution and a 95% confidence interval of \(\pm 0.73~\mathrm {mm}\) per meter of probe length [15]. The sensors are connected to the SoC through an I\(^2\)C bus. In order to enable sensor arrays of up to 2 m long, the bus speed is limited to 100 kHz and an I\(^2\)C bus buffer with integrated current sources is used (TCA9803). Each temperature sensor on the bus is accompanied by a D-type flip-flop, creating a shift register along the sensor array that is used for individually addressing each sensor.
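To make the stated resolution concrete, the following minimal sketch converts a raw 2-byte TMP117 reading into degrees Celsius. It assumes the reading is a 16-bit two's-complement value with the most significant byte first; the function name is hypothetical.

```python
import struct

TMP117_LSB_C = 0.0078125   # degrees Celsius per count


def tmp117_to_celsius(raw_bytes: bytes) -> float:
    """Convert a 2-byte big-endian two's-complement reading to degrees C."""
    (counts,) = struct.unpack(">h", raw_bytes)   # signed 16-bit integer
    return counts * TMP117_LSB_C


print(tmp117_to_celsius(b"\x0c\x80"))   # 3200 counts  -> 25.0
print(tmp117_to_celsius(b"\xff\xff"))   # -1 count     -> -0.0078125
```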

All components in the system are selected for their operation in the 1.8 V - 3.6 V range, enabling the use of two AA batteries without a need for voltage regulators. Our application uses Energizer L91 Li/FeS\(_2\) cells, which have been rated at 3500 mAh for temperatures between \(-40\;^\circ\)C and \(60\;^\circ\)C [25]. The red blocks in Fig. 5 mark all the components that are directly connected to the power supply. The green blocks indicate the parts that are powered down with a load switch (TPS22919) whenever possible. This eliminates the impact of the microSDHC card and the entire sensor array on the device’s sleep power.

Figure 5

Block diagram of the hardware platform with LoRa sensor node or base station functionality for environmental sensor arrays. Red blocks are directly connected to the power supply, green blocks can be switched off.

4 Data Compression for Environmental Sensor Arrays

4.1 Use Case 1: Thermal Sensor Arrays

The thermal sensor arrays consist of a series of m TMP117A sensors [24]. Each sensor produces an output of 2 bytes, expressing the temperature in two’s complement format with a 0.0078125 \(^{\circ }\)C resolution. For each measurement, a time stamp (4 bytes) is stored, along with a 10-bit value of the battery voltage (stored as 2 bytes), followed by m temperature values of 2 bytes each. For a temperature probe with 17 sensors, this means that each sample takes up 40 bytes (4 + 2 + 17 × 2). Given the weight of LoRa transmissions in a device’s power budget [26] and the application’s tolerance towards delayed sensor data, we follow an approach of data aggregation over time, followed by compression, as proposed in [27] and illustrated in Fig. 2. Both data aggregation and compression can minimize energy consumption and network traffic, especially in tree-based network topologies where relay nodes reduce traffic by, e.g., eliminating redundant data [28]. In this research, a star topology is employed, limiting opportunities for tree-based data aggregation. Other aggregation strategies are application driven, e.g., event based reporting, feature extraction. The application domain for our technology is environmental science and model development, hence we adopt a regular data reporting strategy without data reduction. Table 1 shows the aggregated temperature data, which forms a data set that exhibits similarity over time and space, because the sensors are deployed in a linear array to measure near-surface temperature gradients, and measurements are repeated regularly. While LPWAN payloads are often transmitted with lossy compression [29] or no compression at all (e.g., CayenneLPP [12]), some studies have investigated lossless data compression in LPWAN networks [13, 30]. 
Dictionary based compression techniques (e.g., Huffman encoding) can be a solution for efficient lossless compression, but lost packets can impact decoding in the long term and the exchange of dictionaries would cause prohibitive overhead for our monitoring system. The required reliability and the limited number of dynamic, noisy, and high-resolution temperature data points would result in frequent dictionary updates [31]. In this work we employ a delta encoding scheme, which allows the exploitation of spatial and temporal similarity with limited computational efforts. Instead of transmitting the entire collection of 16-bit temperature values, only \(T_{1,1}\) is transmitted as a reference value, followed by a series of \(\Delta T\) values. First, the variation over space is captured by the \(\Delta T_{1,j}\) values, as expressed in (5). Next, the temporal variability is captured according to (6). k expresses the required number of bits for the spatial variability (\(\Delta T_{1,j}\)) values, and l represents the required number of bits for the temporal variability.

Table 1 Uncompressed temperature data points aggregated over time and space.
$$\begin{aligned} \Delta T_{1,j}=T_{1,j-1}-T_{1,j}\quad \quad j \in \{2,\dotsc ,m\} \end{aligned}$$
(5)
$$\begin{aligned} \Delta T_{i,j}=T_{i-1,j}-T_{i,j}\quad \quad i \in \{2,\dotsc ,t\} \quad j \in \{1,\dotsc ,m\} \end{aligned}$$
(6)

In order to transmit the compressed data set, a message is composed, as presented in Fig. 6. The first 4 bits represent a data format identifier that defines the message format and provides flexibility in terms of future sensors and compression protocols. Next, m, k, and l are specified (respectively 5, 4, and 4 bits), as well as the battery voltage at the time of compression \(V_{\mathrm {bat}}\) (10 bits). Finally, the compressed temperature data set is included, starting with a 16-bit value for \(T_{1,1}\). The total size of the message with compressed data (\(s_{\mathrm {message,temp}}\)) is determined by (7).

$$\begin{aligned} s_{\mathrm {message,temp}}=\;&4+5+4+4+10+16 \\&+ k \cdot (m-1) + l \cdot m \cdot (t-1) \quad \mathrm {[bits]} \end{aligned}$$
(7)
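The delta encoding in (5) and (6) and the message size in (7) can be illustrated with a short sketch operating on raw sensor counts. The way k and l are derived from the delta values is not spelled out above; here they are taken as the smallest two's-complement width that holds every delta, which is an assumption of this sketch, as are the function names. A decoder is included to show that the scheme is lossless.

```python
def signed_bits(values):
    """Smallest two's-complement width (>= 1 bit) that holds every value."""
    width = 1
    while any(not -(1 << (width - 1)) <= v < (1 << (width - 1)) for v in values):
        width += 1
    return width


def delta_encode(T):
    """T[i][j] is the raw count of sensor j at time step i (t rows, m columns)."""
    t, m = len(T), len(T[0])
    spatial = [T[0][j - 1] - T[0][j] for j in range(1, m)]          # Eq. (5)
    temporal = [[T[i - 1][j] - T[i][j] for j in range(m)]           # Eq. (6)
                for i in range(1, t)]
    k = signed_bits(spatial)
    l = signed_bits([d for row in temporal for d in row])
    size = 4 + 5 + 4 + 4 + 10 + 16 + k * (m - 1) + l * m * (t - 1)  # Eq. (7)
    return T[0][0], spatial, temporal, k, l, size


def delta_decode(reference, spatial, temporal):
    """Rebuild the full t x m table from the reference value and deltas."""
    first = [reference]
    for d in spatial:
        first.append(first[-1] - d)          # invert Eq. (5)
    rows = [first]
    for deltas in temporal:
        rows.append([prev - d for prev, d in zip(rows[-1], deltas)])  # invert Eq. (6)
    return rows


# Three time steps (t = 3) of a three-sensor array (m = 3), in raw counts.
T = [[3200, 3198, 3195],
     [3201, 3199, 3195],
     [3203, 3200, 3196]]
ref, spatial, temporal, k, l, size = delta_encode(T)
print(k, l, size)                                   # 3 2 61
print(delta_decode(ref, spatial, temporal) == T)    # True
```

In this toy example the 61-bit message replaces 9 × 16 = 144 bits of raw temperature values; the gain grows with m and t as long as the deltas stay small.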

Depending on m, k, and l, the length of the composed message can exceed the maximum length of a data packet. However, t time slots are available for transmitting the message, and the message can be split up into parts of \(\le 519\mathrm {~bits}\), as demonstrated in the packet structure in Fig. 3. The proposed compression algorithm is lossless, which means that all received data can be decoded without any loss of accuracy. However, under extreme conditions (e.g., a large number of sensors, significant spatial and/or temporal variability in measurements) the size of a compressed message could exceed the available network throughput (\(t \times 519\mathrm {~bits}\)). In this case all incoming data will be decodable, but the tail of the compressed message would be lost.

Figure 6

Message structure for compressed temperature data.

4.2 Use Case 2: Temperature/Deformation Sensor Arrays

The temperature/deformation probes contain an array of m pairs of TMP117A temperature sensors and ADXL345 accelerometers [15]. The accelerometers are used in static conditions, regularly sampling the Earth’s gravitational vector along three axes (x, y, z). These measurements enable the calculation of each sensor’s tilt and thus the shape of the entire probe and its deformation over time. Each temperature value is represented as a 2-byte value, and each accelerometer generates three 10-bit values (\(a_\mathrm {x}\), \(a_\mathrm {y}\), \(a_\mathrm {z}\)) to indicate the acceleration in two’s complement format with a resolution of 0.0039 g [32]. For a probe with 16 temperature/acceleration sensor pairs, this results in a total sample size of 736 bits (92 bytes). Given the maximum packet size of 519 bits, data compression not only reduces power consumption, but is actually required to ensure that all data can be transmitted using the proposed protocol. In order to compress the measurement values, t samples are accumulated over time, in analogy to the compression algorithm for thermal sensor arrays. Table 2 presents the resulting data set, with temperature values exhibiting the same temporal and spatial similarity as explained before, so we employ the same delta encoding scheme for these values. Acceleration values exhibit different characteristics over time and space, justifying a modified approach for data compression. Unlike temperatures, which exhibit daily variations, the soil movements of interest are usually small, slow, and not reversible, resulting in a lot of potential for data compression. However, spatial similarity is not guaranteed, as soil deformation can be highly heterogeneous. Therefore, we generate a set of reference values (\(a_{i,\mathrm {ref}, \mathrm {x}}\), \(a_{i,\mathrm {ref}, \mathrm {y}}\), \(a_{i,\mathrm {ref}, \mathrm {z}}\)) for all m accelerometers. 
These reference values are used to calculate each sensor’s delta values (\(\Delta a_{i,j,\mathrm {x}}\), \(\Delta a_{i,j,\mathrm {y}}\), \(\Delta a_{i,j,\mathrm {z}}\)). Given the slow variations in deformation measurements, reference values do not need to be updated in every message. Instead, reference values can be updated gradually and spread over multiple messages. The number of reference values included in each message is a trade-off between compression rate and possible data loss: having all reference values included in each message results in low compression rates, but it guarantees that every received message can be decoded. When reference values are spread over multiple messages, higher compression rates can be achieved, but a single packet loss can affect multiple message decodings. In this research, we arbitrarily chose to include the reference values of two accelerometers in each message. For an \(m=16\) probe with \(t=4\) and a sampling interval of 15 min, this means that the entire reference data set will be refreshed every 8 hours. The calculation of reference values and delta values is presented in (8). In order to determine which reference values to update and transmit, a daily sample count (c) is used. Any network device with a synchronized clock and knowledge of the sampling interval can calculate c, so this value does not need to be included in the message. For deformation measurements, the expected acceleration values are \(a_\mathrm {x} = \mathrm {1~g}\), \(a_\mathrm {y} = \mathrm {0~g}\) and \(a_\mathrm {z} = \mathrm {0~g}\) because the probes are deployed vertically in the soil [15]. We use these expected values in the calculation of reference values, which allows a reduction of the number of bits per reference value (p). The number of bits per delta value is defined as (q). Assuming that \(p\ge 3\) and \(q\ge 1\), each value can be encoded in the message header using just 3 bits, as presented in Fig. 7. 
This figure shows the data format (designated as ‘0001’) for temperature/deformation arrays, consisting of a 49-bit header, followed by the first measurement sample’s encoded temperature and acceleration values (which includes two accelerometers’ updated reference values, according to (8)). Subsequently, the temperature and acceleration values for the remaining samples are appended. This approach allows the receiver to start decoding the oldest measurement samples in messages that have not yet arrived completely (i.e., messages that have been split into multiple packets because they exceeded the 519-bit packet payload size). The total size of a message (\(s_\mathrm {message,acc}\)) is expressed by (9).

$$\begin{aligned} \begin{aligned}&\forall i \in \{ 1,...,t\} \\&\forall j \in \{ 1,...,m\} \end{aligned} {\left\{ \begin{array}{ll} {\left. \begin{array}{l} a_{i,\mathrm {ref}, \mathrm {x}} = \Delta a_{i,j,\mathrm {x}} = a_{i,j,\mathrm {x}} -1 \\ a_{i,\mathrm {ref}, \mathrm {y}} = \Delta a_{i,j,\mathrm {y}} = a_{i,j,\mathrm {y}} \\ a_{i,\mathrm {ref}, \mathrm {z}} = \Delta a_{i,j,\mathrm {z}} = a_{i,j,\mathrm {z}} \\ p = \underset{\begin{array}{c} u \in \{\mathrm {x,y,z}\} \\ i,j \end{array}}{\mathrm{max}}{(\lceil \log _{2} |\Delta a_{i,j,u} |\rceil +1 )} \end{array}\right\} } &{} {\begin{aligned} &{} \text {if i = 1 and} \\ &{} \lfloor \frac{j}{2} \rfloor \equiv \frac{c}{t} \left( \mathrm {mod} \frac{m}{2} \right) \\ &{} (\text {Ref. values})\end{aligned}} \\ {\left. \begin{array}{l} \Delta a_{i,j,\mathrm {x}} = a_{i,j,\mathrm {x}} -a_{i,\mathrm {ref}, \mathrm {x}} \\ \Delta a_{i,j,\mathrm {y}} = a_{i,j,\mathrm {y}} -a_{i,\mathrm {ref}, \mathrm {y}} \\ \Delta a_{i,j,\mathrm {z}} = a_{i,j,\mathrm {z}} -a_{i,\mathrm {ref}, \mathrm {z}} \\ q = \underset{\begin{array}{c} u \in \{\mathrm {x,y,z}\} \\ i,j \end{array}}{\mathrm{max}}{ ( \lceil \log _{2} |\Delta a_{i,j,u} |\rceil +1 )} \end{array}\right\} } &{} {\begin{aligned}&{} \mathrm {otherwise} \\ &{} (\Delta \; \mathrm {values})\end{aligned}} \end{array}\right. } \end{aligned}$$
(8)
$$\begin{aligned} s_{\mathrm {message,acc}}=&49 + k \cdot (m-1) + l \cdot m \cdot (t-1) \\&+2\cdot 3 \cdot p + (m\cdot t -2) \cdot 3 \cdot q\quad \mathrm {[bits]} \end{aligned}$$
(9)
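The bookkeeping behind (8) and (9) can be sketched as follows. The split of the 49-bit header into individual fields is inferred from the temperature message header (43 bits) plus 3 bits each for p and q, and should be read as an assumption. Likewise, the selection rule is a literal reading of the condition in (8) for even m, with c/t taken as an integer message index; function names and the example bit widths are hypothetical.

```python
def message_size_acc(m, t, k, l, p, q):
    """Message size in bits following Eq. (9)."""
    header = 4 + 5 + 4 + 4 + 3 + 3 + 10 + 16   # assumed field split, sums to 49
    temperature = k * (m - 1) + l * m * (t - 1)
    references = 2 * 3 * p                     # two accelerometers, three axes
    deltas = (m * t - 2) * 3 * q
    return header + temperature + references + deltas


def refresh_hours(m, t, interval_min, refs_per_message=2):
    """Time needed to refresh the reference values of all m accelerometers."""
    messages = m / refs_per_message
    return messages * t * interval_min / 60


def updated_accelerometers(c, m, t):
    """1-based indices j whose reference values are refreshed (even m only)."""
    groups = m // 2
    message_index = c // t   # assumes c advances by t samples per message
    return [j for j in range(1, m + 1)
            if (j // 2) % groups == message_index % groups]


print(refresh_hours(16, 4, 15))                      # 8.0
print(message_size_acc(16, 4, k=5, l=4, p=6, q=3))   # 910
print(updated_accelerometers(0, 16, 4))              # [1, 16]
```

With these hypothetical bit widths, a message covering four samples takes 910 bits, compared with 4 × 736 = 2944 bits for the uncompressed samples.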
Table 2 Uncompressed temperature and acceleration values aggregated over time and space.

5 Results

5.1 Data Compression

To evaluate the efficiency of the proposed compression algorithm and its impact on the battery life of nodes and base stations, three data sets were acquired. The spatial and temporal profiles for these use cases are considerably different, so we evaluate the compression algorithm for each scenario. Figure 8 presents a 10-month data series with 15 min measurement intervals for an above ground thermal sensor array (\(m=29\)) for snowpack monitoring in the East River watershed, Colorado [33]. The data are characterized by pronounced diurnal temperature cycles that fade under snow cover, enabling an estimation of snow thickness over time [34]. Figure 9 depicts subsurface temperature data (\(m=17\)) for the same location, measurement interval, and time frame as the snowpack sensor array. Diurnal temperature variations can be observed near the surface, but temperatures at greater depth show more temporal stability. Figure 10 shows the temperature and acceleration values for a subsurface probe deployed on the Seward Peninsula, near Nome, Alaska, measuring every 15 min over a 6-month period. The temperature data show diurnal variations near the surface, as well as a deepening freezing front over time. As the freezing front evolves, we can also observe a change in acceleration values, indicating soil deformation.

Table 3 Performance of the compression algorithm for two 10-month datasets, and the impact on power consumption.

For each data set, Table 3 compares several performance indicators for compressed and uncompressed data. The compressed data is evaluated as a function of t: as more data is aggregated over time, the efficiency of the compression algorithm can be expected to increase. However, higher values of t result in larger delays and a larger impact of packet loss. In order to assess the performance of the algorithm we consider the number of transmitted packets, and the total size of the data set in KB, which leads to an easier to assess compression ratio (compressed to uncompressed data set size).

Figure 7

Message structure for compressed temperature and acceleration data with \(i \in \{2,\cdots ,t\}\), \(j \in \{1,\cdots ,m\}\), \(u \in \{\mathrm {x,y,z}\}.\)  

Figure 8

Thermal data series for an above ground sensor array (\(m=29\)). The array was deployed at the \(300~\mathrm {km^2}\) East River watershed, Colorado, and temperatures were sampled every 15 min.

Figure 9

Thermal data series for a subsurface sensor array (\(m=17\)). The array was deployed at the East River watershed, Colorado, and temperatures were sampled every 15 min.

Figure 10

Data series for a subsurface sensor array for deformation and temperature sensing (\(m=19\)). The array was deployed at a watershed near Nome, Alaska, and data were sampled every 15 min.

Table 3 clearly indicates the impact of t for all data sets. Low values of t result in a strong compression of the data, but in the case of snowpack measurements this does not necessarily lead to a proportional decrease in transmitted packets. This can be explained by the potentially inefficient filling of LoRa packets: higher values of t lead to larger \(s_\mathrm {message}\) values, which means that more LoRa packets contain the maximum payload of 66 bytes. One can also remark that the impact of t diminishes as its value increases. Given the larger delays and the increased impact of packet loss, we advise a compromise of \(t \in \{3,\dotsc ,5\}\).

When we compare the different scenarios, it is clear that subsurface temperature data offers the greatest potential for compression. This can be attributed to the single data type (there is only thermal data and no acceleration measurements along x, y, and z axes), and the low spatial and temporal variability that can be observed in Fig. 9. The compression ratios for snowpack temperature measurements are higher because of the strong diurnal variation in above ground temperatures as shown in Fig. 8, leading to a larger entropy. The results for subsurface temperature/deformation measurements indicate compression ratios as low as 0.608. Despite the low spatial and temporal temperature variability, the compression ratio is significantly higher than for the other scenarios. This can be linked to the fact that each measurement contains three 10-bit acceleration values, in comparison to a single 16-bit temperature value. Although the temperature/acceleration data yield the highest compression ratios, the impact of the compression algorithm is the most significant for this application. In a scenario without compression the entire data set would require 39,730 packet transmissions, equalling two transmissions per measurement. The compression algorithm reduces the number of packet transmissions to less than one per measurement, not only improving battery life, but ensuring the usability of the proposed network protocol. In related work [30], compression ratios between 0.48 and 0.62 are reported. The difference in data sets prevents a direct comparison of algorithm performance, but the results presented in this paper demonstrate the potential of spatial and temporal similarity for compression purposes.

5.2 Power Consumption

The power consumption of the nodes and base stations is a function of the presented hardware layout, communication protocol, compression algorithm, and acquired data. In order to assess the battery life of each device in the network, a power profile was recorded for each system state. For the calculation of these profiles, we measured the DC current along the VCC line under a constant voltage of 3.3765 V using a Keithley DMM6500 6.5-digit digital multimeter. Figure 11 presents the results of these measurements, along with the required energy for each event. The sleep state is characterized by an average power consumption \(P_\mathrm {avg}\), which takes into account the 1 Hz power spikes that are associated with BLE advertising. The presented results emphasize the impact of LoRa communications on a device’s energy budget. The aggregated data cannot be reduced due to the nature of the application and network topology, but the presented data compression technique can reduce network traffic with a minimal energy impact of \(225 \upmu \mathrm {J}\) per compression, indicating the computational simplicity of the algorithm. The energy cost of a LoRa transmission is measured at \(174\mathrm {~mJ}\), which means that the compression algorithm pays off when the number of LoRa transmissions can be reduced by at least 1/772 (0.13%), a goal that is significantly exceeded as indicated in Table 3. As a result, trade-offs in the configuration of the compression algorithm in our applications are solely related to data delays and potential impacts of packet loss, not energy cost.
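As a rough check of this break-even point, the two rounded energies quoted above can be compared directly. The symbols \(E_\mathrm {comp}\) and \(E_\mathrm {Tx}\) are introduced here for illustration only:

\[ \frac{E_\mathrm {comp}}{E_\mathrm {Tx}} = \frac{225\,\upmu \mathrm {J}}{174\,\mathrm {mJ}} = \frac{225 \times 10^{-6}}{174 \times 10^{-3}} \approx 1.3 \times 10^{-3} \approx 0.13\% \]

In other words, one compression run costs about as much energy as 0.13% of a single LoRa transmission, so it pays off as soon as it avoids, on average, more than roughly one transmission in every 770 compression runs.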

Figure 11. Power profile for each system state, as presented in Fig. 2.

The acquired power profiles are used to estimate the battery life for each configuration of the compression algorithm in Table 3. For these calculations, we consider 15 min measurement intervals and do not take user-initiated BLE data transfers into account. The selected AA batteries provide an energy supply of 41,580 J [25]. Since the power consumption of base stations depends on the amount of network traffic [10], we assume a scenario of a single base station with 100 sensor nodes, all exhibiting the same compression ratios. The results in Table 3 demonstrate that thermal sensor nodes can last for \(>4.5\) years on a single pair of AA batteries without data compression. For temperature/deformation arrays (\(m=19\)), a battery life of 2.8 years can be expected, since each uncompressed measurement sample results in two LoRa packet transmissions. LoRa transmissions account for 70% of the energy, the sleep state is responsible for 27%, and sensor measurements use 2%. This indicates the potential benefit of the compression algorithm. As can be seen in Table 3, a node’s battery life can even be doubled under favorable conditions (subsurface temperature sensors with \(t=8\)). For base stations, the potential battery life improvement is even more significant. In the scenario without data compression, the battery life is only one month in networks of temperature sensors, or 16 days in networks of temperature/deformation sensors. For base stations, 98% of the energy budget is associated with the Rx state. This can be improved considerably if all sensor nodes in the network perform data compression. In the case of subsurface temperature arrays, base station battery life can be extended by a factor of up to 3.50. This enables autonomous operation in remote field sites for multiple months or even a year for base stations that use a small solar panel or multiple pairs of AA batteries.
One can remark that the compression algorithm has a stronger impact on the battery lifetime of the base stations than on that of the sensor nodes. This is explained by the higher relative importance of the sleep state in a sensor node’s power budget, whereas base station power consumption is almost solely attributed to the Rx state. In [29], Väänänen et al. present a LoRaWAN sensor platform with various lossy data compression methods. The platform exhibits a sleep power consumption of \(467{\upmu \mathrm {W}}\) (i.e., 6 times more than the platform presented in this paper), which results in a total battery lifetime of 492 days without compression. The best performing compression algorithm realizes a power savings factor of 1.28 (versus 1.99 in our research). Another solution is presented in [13], which reports a power savings factor of 1.45 for a lossless compression method. In related work, the power consumption of an iC880a LoRaWAN concentrator is reported. With a consumption of at least 1.44 W, such a concentrator would have a battery life of less than 8 hours when operating on a pair of AA batteries, which is significantly lower than any scenario presented in this paper.
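The node battery-life figures above can be approximated with a simple average-power model, sketched below. This is a simplified illustration and not the calculation used for Table 3. The battery energy, the LoRa transmission energy, and the 15 min interval come from the text. The sleep power of about \(78\,\upmu \mathrm {W}\) is an assumption inferred from the factor-of-6 comparison with the \(467\,\upmu \mathrm {W}\) platform of [29], and the measurement energy is assumed to be negligible. All function and parameter names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

BATTERY_ENERGY_J = 41_580.0   # pair of AA batteries, from the text
E_TX_J = 0.174                # energy per LoRa transmission, from the text
INTERVAL_S = 15 * 60          # 15 min measurement interval, from the text
P_SLEEP_W = 467e-6 / 6        # ASSUMPTION: about 78 uW, inferred from the comparison with [29]


def battery_life_years(
    packets_per_measurement: float,
    capacity_j: float = BATTERY_ENERGY_J,
    p_sleep_w: float = P_SLEEP_W,
    e_tx_j: float = E_TX_J,
    interval_s: float = INTERVAL_S,
    e_measure_j: float = 0.0,  # ASSUMPTION: measurement energy neglected
) -> float:
    """Estimate node battery life from an average-power model."""
    # Energy spent per measurement interval on transmissions and sensing,
    # spread over the interval, plus the continuous sleep power.
    p_avg_w = p_sleep_w + (packets_per_measurement * e_tx_j + e_measure_j) / interval_s
    return capacity_j / p_avg_w / SECONDS_PER_YEAR


if __name__ == "__main__":
    # One packet per measurement (temperature only) versus two packets
    # per measurement (temperature/deformation), both uncompressed.
    print(f"1 packet/measurement:  {battery_life_years(1.0):.1f} years")
    print(f"2 packets/measurement: {battery_life_years(2.0):.1f} years")
```

Under these assumptions, the sketch yields roughly 4.9 years for one packet per measurement and roughly 2.8 years for two packets per measurement, in line with the values reported above. Lowering `packets_per_measurement` below one shows how compression extends the estimated lifetime.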

6 Conclusions

Distributed environmental sensor arrays for above and below surface sensing require specific provisions in terms of power and connectivity. In this study, we presented a framework that covers the network protocol, a data compression technique, and a hardware platform that can be used as a wireless base station or a data logger for environmental sensor arrays. The presented wireless interface uses LoRa in a TDD implementation, enabling battery-operated base stations and eliminating limitations concerning packet collisions and fair use policies. A lossless data compression method was developed, relying on spatial and temporal similarity in the acquired data. This algorithm was implemented in a custom AA battery powered hardware platform and the power consumption was analyzed for various scenarios. As a wireless sensor node, the device exhibits a battery life of \(>4.5\) years, which can even be doubled with the proposed compression algorithm. As a LoRa base station, the platform’s battery life is limited to several weeks or months, depending on the network configuration. However, the proposed compression algorithm can increase the battery lifetime by a factor of up to 3.50, making distributed environmental sensor deployments a viable option. In future work, we will further investigate the impact of measurement sampling intervals, the number of nodes, and the number of sensors per node. We will also develop relay nodes and deploy a sensor network in a mountainous watershed to study performance as a function of topography, vegetation, and antenna siting.