doi:10.1016/S0167-9236(03)00083-6
Copyright © 2003 Elsevier B.V. All rights reserved.
Distribution forecasting of high frequency time series
Department of Computer Science, University of York, Heslington, York YO10 5DD, UK
Available online 9 July 2003.
Abstract
The availability of high frequency data sets in finance has allowed the use of very data-intensive forecasting techniques. An algorithm requiring fast k-NN type search has been implemented using AURA, a binary neural network based upon Correlation Matrix Memories. This work has also constructed probability distribution forecasts, the volume of data allowing this to be done in a nonparametric manner. In addition to standard statistical error measures, the implementation of simulations has allowed actual measures of profit to be calculated.
Author Keywords: Financial forecasting; Neural networks; Associative memories; Probability distribution forecasting; High frequency time series
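As a rough illustration of the nonparametric distribution forecast described in the abstract, the sketch below builds a histogram of the changes that followed the k most similar past windows. It is a minimal sketch under stated assumptions, not the authors' implementation: a brute-force Euclidean search stands in for the AURA/CMM recall, and the function name knn_distribution_forecast, its parameters, and the choice to bin the one-step change are all hypothetical.

```python
import math
from collections import Counter


def knn_distribution_forecast(history, window, k, bin_width):
    """Histogram forecast of the next change, built from the k most similar past windows.

    Assumption: brute-force Euclidean search replaces the AURA/CMM recall.
    Returns a dict mapping bin index -> probability, where bin i covers
    changes in [i * bin_width, (i + 1) * bin_width).
    """
    query = history[-window:]
    candidates = []
    # Every past window with a known successor is a candidate neighbour.
    for start in range(len(history) - window):
        past = history[start:start + window]
        successor = history[start + window]
        distance = math.dist(past, query)
        candidates.append((distance, successor - past[-1]))

    # Keep the k closest windows (ties broken by the stable sort order).
    candidates.sort(key=lambda c: c[0])
    neighbours = candidates[:k]

    # Count the observed changes per bin and normalise to probabilities.
    counts = Counter(math.floor(change / bin_width) for _, change in neighbours)
    total = sum(counts.values())
    return {b: n / total for b, n in sorted(counts.items())}


if __name__ == "__main__":
    series = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2]
    print(knn_distribution_forecast(series, window=2, k=2, bin_width=1))
```

No distributional form is imposed here: the forecast is simply the empirical frequency of outcomes among the neighbours, which is only sensible when the data set is large enough to populate the bins.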
Fig. 1. Training. The CMM is updated to learn the relationship between the input and the features (clusters). The frequency store holds a historical simulation for each cluster.
Fig. 2. Forecasting. A recall on the CMM produces the nearest neighbours from which the forecast is constructed. Each neighbour has its own distribution in the frequency store.
Fig. 3. The effect of the window size and number of bins on the ADA error measure. The x and y axes show the parameter values and the z axis shows the error value.
Fig. 4. The effect of the window size and number of bins on the Theil's U statistic error measure. The x and y axes show the parameter values and the z axis shows the error value.
Fig. 5. The effect of the window size and number of bins on the simulation return. The x and y axes show the parameter values and the z axis shows the return.
Fig. 6. The effect of lowering the recall threshold on forecast accuracy and the number of features used. The average given is the mean number of features used per forecast over the whole test set.
Fig. 7. Iterative extension using the distribution forecast. All possible values given by the previous forecast are treated as observations to make a possible forecast for one further step ahead. An average of all possible forecasts combines them into a single forecast. The average is weighted by the probability of the series following each path, as given by the distributions.
Fig. 8. The effect of extending k on forecast accuracy.
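The iterative extension summarised in the caption of Fig. 7 can be sketched as follows. This is a hedged illustration only: one_step is a hypothetical callable that returns a one-step distribution forecast as a dict of value -> probability, and extended_forecast is an assumed name, not one taken from the article. Each possible value is appended to the series as if it had been observed, the forecast is repeated, and the end values are averaged with weights equal to the probability of each path.

```python
def extended_forecast(history, one_step, steps):
    """Probability-weighted mean forecast `steps` ahead, enumerating every path.

    `one_step(series)` is an assumed interface: it returns a dict mapping
    each possible next value to its forecast probability.
    """
    def expand(series, path_probability, remaining):
        if remaining == 0:
            # End of a path: contribute its final value, weighted by the path probability.
            return path_probability * series[-1]
        total = 0.0
        for value, probability in one_step(series).items():
            # Treat the possible value as an observation and forecast one step further.
            total += expand(series + [value], path_probability * probability, remaining - 1)
        return total

    return expand(list(history), 1.0, steps)


if __name__ == "__main__":
    # Toy one-step forecaster: up one unit with probability 0.75, down one with 0.25.
    def toy(series):
        return {series[-1] + 1: 0.75, series[-1] - 1: 0.25}

    print(extended_forecast([10], toy, steps=2))
```

The number of paths grows exponentially with the number of steps, so a sketch like this is practical only for short horizons or coarse binning.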
Table 1. The effect of lowering the recall threshold on forecast accuracy for a range of parameter sets

Table 2. A comparison of different techniques for forecast extension

Table 3. Results of the check that the training distributions and test distributions are samples of the same distribution

The Mann–Whitney and Smirnov tests have been carried out for each feature.