1 Introduction

One of the primary problems in sensor networks is that of node localization [4, 9, 21]. For most systems, GPS is the primary means for localizing the network. But for systems where GPS is denied, another approach must be used. A way to achieve this is through the use of special nodes that can localize themselves without GPS, called anchors. Anchors can act as reference points through which other nodes may be localized. One step in localizing a node is to find the direction of an anchor with respect to the node. This problem is known as Radio Direction Finding (RDF or DF). Besides sensor networks, RDF has applications in diverse areas such as emergency services, radio navigation, localization of illegal, secret or hostile transmitters, avalanche rescue, wildlife tracking, indoor position estimation, tracking tagged animals, reconnaissance and sports [1, 13, 15, 16, 23, 25] and has been studied extensively both for military [8] and civilian [6] use.

Direction finding has also been studied extensively in academia. One of the most commonly used algorithms for radio direction finding using the signal received at an antenna array is called MUSIC [22]. MUSIC and related algorithms are based on the assumption that the signal of interest is Gaussian and hence they use second order statistics of the received signal for determining the direction of the emitters. Porat et al. [18] study this problem and propose the MMUSIC algorithm for radio direction finding. Their algorithm is based on the Eigen decomposition of a matrix of the fourth order cumulants. Another commonly used algorithm for determining the direction of several emitters is called the ESPRIT algorithm [19]. The algorithm is based on the idea of having doublets of sensors.

Recently, researchers have used both unsupervised and supervised learning algorithms for direction finding. Graefenstein et al. [10] used a robot with a custom built rotating directional antenna with fully characterized radiation pattern for collecting the RSSI values. These RSSI values were normalized and iteratively rotated and cross-correlated with the known radiation pattern for the antenna. The angle with the highest cross-correlation score was reported as the most probable angle of the transmitter. Zhuo et al. [27] used support vector machines with a known antenna model for classifying the directionality of the emitter at 3 GHz. Ito et al. [14] studied the related problem of estimating the orientation of a terminal based on its received signal strength. They measured the divergence of the signal strength distribution using Kulback-Leibler Divergence [17] and estimated the orientation of the receiver. In a related work, Satoh et al. [20] used directional antennas to sense the 2.4 GHz channel and applied Bayesian learning on the sensed data to localize the transmitters.

Fig. 1.
figure 1

Data processing system for learning algorithm

In this paper we use an uncalibrated receiver (directional) to sense the 2.4 GHz channel and record the resulting RSSI values from a directional as well as an omni-directional source. We then use feature engineering along with machine learning and data mining techniques (see Fig. 1) to learn the bearing information for the transmitter. Note that this is a basic ingredient of a DF system. Just replicating our single receiver with multiple receivers arranged in a known topology, can be used to determine the actual location of the transmitter. Hence pushing the boundary of this problem will help DF system designers incorporate our methods in their design. Moreover, such a system based only on learning algorithms would make them more accessible to people irrespective of their academic leaning. We describe the data acquisition system next.

2 Data Acquisition System

The data collection system is driven by an Intel NUC (Intel i7 powered) running Ubuntu 16.04. For sensing the medium, we use an uncharacterized COTS 2.4 GHz WiFi Yagi antenna. This antenna is used both as a transmitter as well as a receiver. For the receiver, the antenna is mounted on a tripod and attached to a motor so that it can be rotated as it scans the medium. We also use a TP-Link 2.4 GHz 15 dBi gain WiFi omni-directional antenna for transmission to ensure that the system is agnostic to the type of antenna being used for transmission. For both transmission and reception the antenna is connected to an Ettus USRP B210 software defined radio. To make the system portable and capable of being used anywhere we power the system with a 12V6Ah deep cycle LiFePO battery. A Nexus 5X smart-phone is used to acquire compass data from its on-board sensors and this data is used for calibrating the direction of the antenna at the start of each experiment.

Fig. 2.
figure 2

(a): The full yagi setup, (b): plate adapter, (c): Pan Gear system composed of motor and motor controller mounted on standing bracket, (d): motor controller, (e): B210 Software Defined Radio, (f): NUC compact computer, (g): StarkPower lithium ion battery and holder, (h): chain of connectors from B210 to antenna including a rotating SMA adapter located in standing bracket

There are two main components for our setup: the receiver and the transmitter. For our tests, we placed the receiver at the origin of the reference frame. The transmitter was positioned at various locations around the receiver. The transmitter was programmed to transmit at 2.4 GHz, and the receiver was used to sense the medium at that frequency as it rotated about its axis. Our experiments were conducted both indoors and outdoors.

For our analysis we consider one full rotation of the receiver as the smallest unit of data. Each full rotation is processed, normalized and considered as a unique data point that is associated with a given bearing to the transmitter. For each experiment we collected several rotations at a time, with the transmitter being fixed at a given bearing with respect to the receiver, by letting the acquisition system operate for a certain amount of time. We call each experiment a run and each run consists of several rotations.

There are two important aspects of the receiver that need to be controlled: the rotation of the yagi antenna and the sampling rate of the SDR. The rotation API has two important functions that define the phases of a run: first, finding north and aligning the antenna to this direction, so that every time the angles are recorded with respect to this direction; and second, the actual rotation, that makes the antenna move and at the same time uses the Ettus B210 for recording the spectrum. In the first phase the yagi is aligned to the magnetic north using the compass of the smart phone that we used in our system. In the second phase the yagi starts to rotate at a constant angular velocity. While rotating, the encoder readings are used to determine the angle of the antenna with respect to the magnetic north, and the RSSI values are recorded with respect to these angles. It should be noted that the angles from the compass are not used because the encoder readings are more accurate and frequent. The end of each rotation is determined based on the angles obtained from the encoder values.

In order to record the RSSI, we created a GNURadio Companion flow graph [2]. Our flow graph gets the I/Q values from the B210 (UHD: USRP Source) at a sample rate and center frequency of 2 MHz and 2.4 GHz respectively. We run the values through a high-pass filter to alleviate the DC bias. The data is then chunked into vectors of size 1024 (Stream to Vector), which is then passed through a Fast-Fourier Transform (FFT) and then flattened out (Vector to Stream). This converts the data from the time-domain to the frequency-domain. The details of the flow-graph is shown in Fig. 3.

Fig. 3.
figure 3

GNURadio companion flow graph

3 Data Analysis

Now we are ready to describe the algorithms used for processing the data. Our approach has three phases: the feature engineering phase takes the raw data and maps it to a feature space; the learning phase uses this representation of the data in the feature space to learn a model for predicting the direction of the transmitter; and finally we use a cross validation/testing phase to test the learned model on previously unseen data. We start with feature engineering.

3.1 Feature Engineering

As mentioned before, our data consists of a series of runs. Each run consists of several rotations and each rotation is vector of (angle, power) tuples. The length of this vector is dependent on the total time of a rotation (fixed for each run) and speed at which the SDR samples the spectrum, which varies. Typically, each rotation has around 2200 tuples. In order to use this raw data for further analysis we transformed each rotation into a vector of fixed dimension, namely \(k=360\). We achieved this by simulating a continuous mapping from angles to powers based on the raw data for a single rotation and by reconstructing the vector using this mapping for k evenly spaced points within the range 0 to \(2\pi \). The new set of rotation vectors denoted by R is a subset of \(\mathbb {R}^{k}\). For our analysis, we let \(k =360\) because each run is representative of a sampling from a circle.

Fig. 4.
figure 4

Example rotation with markers for the actual angle and two predicted angles using the max RSSI and Decision Tree methods

During the analysis of our data, we noticed a drift in one of the features (moving average max value which is defined below). This led us to believe that the encoder measurements were changing with time during a run (across rotations). Plotting each run separately revealed a linear trend with high correlation (Fig. 5). In order to correct the drift, we computed the least squares regression for the most prominent runs (runs which displayed very high correlation), averaged the slopes of the resulting lines, and used this value to negate the drift. The negation step was done on the raw data for each run because at the start of each run, the encoder is reset to zero. Once a run is corrected it can be split into rotations. Since each rotation vector can be viewed as a time-series, we use time series feature extraction techniques to map the data into a high dimensional feature space. Feature extraction from time series data is a well studied problem, and we use the techniques described by Christ et al. [5] to map the data into the feature space. In all there were 86 features that were extracted using the algorithm. In addition to the features extracted using this method, we also added a few others based on the idea of a moving average [12].

More precisely, we use the moving average max value, which is the index (angle) in the rotation vector where the max power is observed after applying a moving average filter. The filter takes a parameter d, the size of the moving average which for a given angle is computed by summing the RSSI values corresponding to the preceding d angles, the angle itself and the succeeding d angles. Finally this sum is divided by the total number of points (\(2d + 1\)), which is always odd. We use the moving average max value with filter sizes ranging from 3 to 45, using every other integer. This gives an additional 22 features, which brings the total to 108 features.

3.2 Learning Algorithms

Note that we want to predict the bearing (direction) of the transmitter with respect to the receiver for each rotation. As the bearing is a continuous variable, we formulate this as a regression problem. We use several regressors for predicting the bearing: (1) SVR: Support vector regression is a type of regressor that uses an \(\epsilon \)-insensitive loss function to determine the error [17]. We used the RBF kernel and GridSearchCV for optimizing the parameters with cross validation (2) KRR: Kernel ridge regression is similar to SVR but uses a squared-error loss function [17]. Again, we use the RBF kernel with GridSearchCV to get the optimal parameter (3) Decision Tree [3]: we used a max depth of 4 for our model and finally (4) AdaBoost with Decision Tree [7, 26]: short for adaptive boosting, uses another learning algorithm as a “weak learner”. We used a decision tree with max depth 4 as our “weak learner”.

Although we have a total of 108 features, not all of them will be important for prediction purposes. As a result, we try two different approaches for selecting the most useful features: (1) the first one ranks each feature through a scoring function, and (2) the second prunes features at each iteration and is called recursive feature extraction with cross-validation [11]. For the first we use the function SelectKBest, and for the later we used RFECV, both implemented in ScikitLearn.

We also use neural networks for the prediction task. We used Keras for our experiments, which is a high-level neural networks API, written in Python and capable of running on top of TensorFlow. We used the Sequential model in Keras, which is a linear stack of layers. The results on our dataset are described in Sect. 4.

4 Experiments and Results

In this section we present the results of our experiments. In total, we collected 1467 rotations (after drift correction) at 76 unique angles (an example of a rotation reduced to 360 vertices can be seen in Fig. 4). After reducing each rotation to 360 power values, we ran the dataset through the feature extractor, which produced 108 total features. We tried out several regressors, namely, (SVR, KR, DT, and AB) and strategies:- (moving average max value without learning, moving average max value with learning, SelectKBest, RFECV, and neural networks). The objective for this set of tests was to find the predictor that yielded the lowest mean absolute error (MAE), which is the average of the absolute value of each error from that test.

4.1 Regressors

We used the data from the feature selection phase to test a few regressors. For each regressor, we split the data (50% train, 50% test), trained and tested the model, and calculated the MAE. We ran this 100 times for each regressor and took the overall average to show which regressor preformed the best with all the features. The results from these tests are in Table 1.

Table 1. The average error for the each regressors over 100 runs with 50–50 split

From the results we can see that decision trees give the lowest MAE compared to the other regressors. We also noticed that decision trees ran the train/test cycle much faster than any other regressor. Based on these results, we decided to use the decision tree regressor for the rest of our tests.

4.2 Moving Average Max Value

One of the first attempts for formulating a reliable predictor was to use the moving average max value (MAMV). We considered using this feature by itself as a naive approach. We predict as follows: whichever index the moving average max value falls on is the predicted angle (in degrees). For our tests, we used a moving average with size 41 (MAMV-41), which was ranked the best using SelectKBest, for smoothing the angle data. Since no training was required, we used all the rotations to calculate the MAE. As seen in Table 2 the MAE was \(57.1^{\circ }\). Figure 6 shows the errors for each rotation, marking the inside and outside rotations, as well.

Fig. 5.
figure 5

Errors before drift correction. Lines are runs

Fig. 6.
figure 6

Errors after drift correction. Lines are runs

Fig. 7.
figure 7

Errors for MAMV-41 with Decision Tree learning. Even/odd for test/train split

Fig. 8.
figure 8

MAE vs. ranked features from SelectKBest using Decision tree over 1000 runs

Our next step was to use the decision trees with the MAMV-41 feature. We applied a 50/50 train/test split in the data and calculated the MAE for each run. We averaged and reported the MAE for all runs. The average MAE over 1000 runs was \(25.9^{\circ }\) (Table 2). Figure 7 shows a graph of errors for the train/test split where the odd index rotations were for training and the even index rotations were for testing.

4.3 SelectKBest

As mentioned before, we used SelectKBest to rank all the features. In order to get stable rankings, we ran this 100 times and averaged those ranks. Once we had the ranked list of features, we created a “feature profile” by iteratively adding the next best feature, running train/test with the decision tree regressor for that set of features, and recording the MAE. We repeated this process 1000 times and the results are shown in (Fig. 8). It is to be noted that the error does not change considerably for a large number of consecutive features but there are steep drops in the error around certain features. This is because many features are similar and using them for the prediction task does not change the error significantly. The first plateau consists mostly of MAMV features since they were ranked the best among all the other features. The first major drop is at ranked feature 24, which marks the start of the set of continuous wavelet transform coefficients (CWT). The second major drop is cause by the addition of the 2nd coefficient from Welch’s transformation [24] (2-WT) at ranked feature 46. Beyond that, no significant decrease in MAE is achieved by the inclusion of another feature. The best average MAE over the whole profile is \(15.7^{\circ }\) at 78 features (Table 2).

4.4 RFECV and Neural Network

We ran RFECV with a decision tree regressor using a different random state every time. RFECV consistently returned three features: MAMV-23, MAMV-41, and 2-WT. Using these three features, we trained and tested on the data with a 50/50 split 10000 times with the decision tree regressor. The average MAE was \(11.0^{\circ }\) (Table 2). Between RFECV and SelectKBest, there are four unique features which stand out among the rest. To be thorough, we found the average MAE for all groups of three and four from these four features. None of them were better than the original three features from RFECV.

Fig. 9.
figure 9

Neural net vs. RFECV performance. The x-axis represents the percentage of the dataset tested with the other partition being used for training (for example 5% tested means 95% of the dataset was used for training).

Table 2. Comparison of average error among predictor methods

For the neural network approach, we used all the features which were produced from the feature selection phase. We settled on a \(108 \Rightarrow 32 \Rightarrow 4 \Rightarrow 1\) layering. The average MAE over 100 runs with a 50/50 train/test split was \(15.7^{\circ }\) (Table 2). In order to show how the neural network stacked against feature selection, we performed an experiment showing each method’s performance versus a range of train/test splits. Figure 9 shows that RFECV with it’s three features performed better than our neural network at all splits.

4.5 Discussion

From our results in Table 2, we determined that RFECV was the best feature selector. The amount of time it takes to filter out significant features is comparable to SelectKBest, but RFECV produces fewer features which leads to lower training and testing times. Figure 9 shows that RFECV beats neural networks consistently for a range of train/test splits. There are a couple of possible reasons why RFECV performed better. The SelectKBest strategy ranks each feature independent of the other features, which means similar features will have similar ranks. As features are iteratively added, many consecutive features will be redundant. This is evident in Fig. 8 where the addition of similar features cause very little change in MAE creating plateaus in the plot. Our SelectKBest method was, in a way, good at finding some prominent features (where massive drops in MAE occurred), but not in the way we intended whereas RFECV was better at ranking diverse features.

5 Conclusion and Future Work

The main contribution of this paper is to show that using pure data mining techniques with the RSSI values, one can achieve good accuracies in direction-finding using COTS directional receivers. There are several directions that can be pursued for future work: (1) How accurately can cell phones locations be analyzed with the current setup? (2) Can we minimize the total size of our receive system? (3) How well does this system work for different frequencies and ranges around them? (4) When used in a distributed setting, how much accuracy can one achieve for localization, given k receivers operating at the same time (assuming the distances between them is known). In the journal version of this paper, we will show how to theoretically solve this problem, but extensive experimental results are lacking. In the near future, we plan to pursue some of these questions using our existing hardware setup.