This blog post and the related GitHub repository do not constitute trading advice, nor encourage people to trade automatically.
Using a neural network applied to the Deutsche Börse Public Dataset, we implemented an approach to predict future movements of stock prices using trends from the previous 10 minutes. Our motivation was to gain insights into this dataset and establish an architecture and approach from which we can iterate. We also intended for this work to help anyone seeking to get started with machine learning and data analysis with the Deutsche Börse Public Dataset.
We began by asking ourselves whether it is feasible to predict if a traded stock is about to go up or down in value, given information about its recent prices over time.
Fig 1. An example of minute-by-minute stock price movements over successive days [BMW stock on March 26, 2018]
Price movements of stocks exhibit noise from one moment to the next. While a given overall trend may be upwards, we can see temporary dips along the way. Predicting these fluctuations and guessing their direction is a key part of the challenge we set for ourselves. To achieve this, we applied neural network approaches to the Public Dataset (PDS) from Deutsche Börse. We chose this approach because of the inherent flexibility of neural networks in expressing multiple models, from very simple to advanced. At the same time, we acknowledged that a key weakness of neural networks is that they ignore outliers, which can have huge importance in this domain as “Black Swan” market events.
The PDS is a free and openly available data set consisting of aggregated trading information from the XETRA and EUREX trading engines, covering a range of security types, all detailed at minute-by-minute granularity and updated in near real-time. We chose to focus exclusively on equity securities, which are provided as part of the XETRA component of the PDS. Each stock has detailed per-minute trading information, including volumes and maximum, minimum, start and end prices. In effect, the data here is the same as that served to customers, only delayed by a few minutes and aggregated.
If we possess the ability to predict if a stock price will go up or down in the next minute based on an analysis of its historical behaviour, we would theoretically have one component of a trading strategy. However, being able to predict the price movement is not enough to make money algorithmically on the stock market. Other considerations are related to trading costs, slippage, tax, and risk management in general.
Initially, we chose to predict the end price at the next minute because the dataset is organized on a minute-by-minute basis and we wanted to be as close as possible to a “real-time” forecast. Through our analysis we found that predicting the direction of the end price one or more minutes in the future is difficult due to the presence of large amounts of noise. Our repositories contain supplementary notebooks detailing the performance of predicting distributional statistics, such as the mean end price, across time intervals. These are easier to predict than the end price and are more meaningful in practice.
Here, we describe our approach, from transforming the data and conducting exploratory data analysis to applying neural networks to the task of stock market prediction. This work is intended to serve as a basic introduction to the Public Dataset and to help inspire other projects based on it. It can be generalised to other data sets, including those of other stock markets and cryptocurrencies.
We used a highly simplified trading strategy and showed that it could generate positive “ideal” returns. Our return was defined simply as the difference between the current and previous price divided by the previous price, whereas in finance returns are typically measured through alpha, which depends on a market index or benchmark. We also did not take into consideration trading costs, slippage or integration with trading strategies. The outcome is therefore optimistic and may not be applicable in practice.
We consider our work only a starting point in a challenging field. We would appreciate any feedback, delivered by filing GitHub issues, regarding the methodology, notable omissions or tips on how to improve the code.
This work clearly does not constitute trading advice, nor does it encourage people to trade automatically.
We began by obtaining an extract of the data from the PDS AWS S3 bucket and examining its structure. The data comes with the following fields:
| Column Name | Data Description | Data Type |
| --- | --- | --- |
| ISIN | ISIN of the security | string |
| Mnemonic | Stock exchange ticker symbol | string |
| SecurityDesc | Description of the security | string |
| SecurityType | Type of security | string |
| Currency | Currency in which the product is traded | string |
| SecurityID | Unique identifier for each contract | int |
| Date | Date of trading period | date |
| Time | Minute of trading to which this entry relates | time (hh:mm) |
| StartPrice | Trading price at the start of the period | float |
| MaxPrice | Maximum price over the period | float |
| MinPrice | Minimum price over the period | float |
| EndPrice | Trading price at the end of the period | float |
| TradedVolume | Total value traded | float |
| NumberOfTrades | Number of distinct trades during the period | int |

Table 1. PDS XETRA Data Features
Using Python and the Pandas library, we established the steps needed to put the XETRA time series into a well-formatted data frame, and then created a data transformation pipeline to standardise the data output for any input from the PDS.
We used this pipeline to create a working data frame for analysis and prediction tasks, containing the top 50 stocks by trade volume, for days in January to March 2018, excluding those with no trades within the timeframe.
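A minimal sketch of such a pipeline, assuming the raw PDS XETRA CSV files are already available locally (the file handling, column filtering and the top-50 cut-off are illustrative, not our exact code):

```python
import pandas as pd

def load_xetra_frame(csv_paths, top_n=50):
    """Load raw PDS XETRA CSVs into one well-formatted data frame.

    Keeps only common-stock rows, builds a minute-resolution timestamp,
    and restricts to the top_n stocks by total traded volume.
    """
    df = pd.concat((pd.read_csv(p) for p in csv_paths), ignore_index=True)
    df = df[df["SecurityType"] == "Common stock"]
    # Combine the Date and Time columns into one minute-level timestamp.
    df["Timestamp"] = pd.to_datetime(df["Date"] + " " + df["Time"])
    # Rank stocks by total traded volume and keep only the most liquid.
    top = (df.groupby("Mnemonic")["TradedVolume"].sum()
             .nlargest(top_n).index)
    df = df[df["Mnemonic"].isin(top)]
    return df.set_index(["Mnemonic", "Timestamp"]).sort_index()
```

The same function can then be pointed at any day's worth of PDS files to produce a frame with a consistent shape for downstream analysis.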
After this blog post was prepared, a review by a trader brought to our attention that the data needs to be adjusted for splits, dividends and delistings to provide a better tradeable signal in backtesting. In future demos, we plan to take those suggestions into account.
In general, before performing any form of machine learning, we need to thoroughly understand the data. Since we are not financial market experts, we have to build up a picture of the data’s behaviour and characteristics from the ground up.
We expect that this analysis and the accompanying notebooks will be useful to non-experts, though experts may find them obvious. We also acknowledge that quants may expect to see autocorrelation plots and tests for seasonality and stationarity, but these have been left for future work, since they are of limited use for neural networks.
We did not account for splits or delistings in this work, since we used top stocks by volume throughout the study, which should have a low likelihood of such events.
We began by inspecting the main features: MaxPrice, MinPrice, StartPrice, EndPrice, TradedVolume and NumberOfTrades.
Fig 2: Price features for BMW stock from 8am to 3pm on March 26, 2018
Fig 3: TradedVolume for BMW stock from 8am to 3pm on March 26, 2018
Fig 4: NumberOfTrades for BMW stock from 8am to 3pm on March 26, 2018
The data was resampled into smoothed 15-minute intervals to make trends easier to observe, using the Pandas library to discretize the time series and compute median values. We did not compare this with an exponentially weighted moving average, since our only aim was to reveal easily interpretable patterns. After inspection, we found the following notable behaviour:
- In downward trends, EndPrice tends to be closer to MinPrice than MaxPrice, and below StartPrice.
- In upward trends, EndPrice tends to be closer to MaxPrice than MinPrice, and above StartPrice.
- In downward trends, StartPrice tends to be closer to MaxPrice than MinPrice.
- In upward trends, StartPrice tends to be closer to MinPrice than MaxPrice.
Fig 5: The relationship between MinPrice, MaxPrice and EndPrice
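The 15-minute median resampling used for these inspections can be sketched with Pandas as follows (a minimal version; the input is assumed to be a minute-level frame with a DatetimeIndex):

```python
import pandas as pd

def smooth_prices(minute_df, freq="15min"):
    """Resample a minute-level price frame into coarser buckets.

    Taking the median within each bucket smooths out minute-to-minute
    noise, so the underlying trend is easier to read.
    """
    return minute_df.resample(freq).median()

# Usage on a toy half-hour of data:
idx = pd.date_range("2018-03-26 08:00", periods=30, freq="min")
prices = pd.DataFrame({"EndPrice": range(30)}, index=idx)
smoothed = smooth_prices(prices)  # two 15-minute buckets of medians
```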
The behaviour above led us to generate a new feature capturing the direction of stock movement, which we called the Direction Feature (DF). Mathematically, it is defined as:
DF = (EndPrice[t-1] – MinPrice[t-1]) – (MaxPrice[t-1] – EndPrice[t-1])
DF = 2(EndPrice[t-1]) – MaxPrice[t-1] – MinPrice[t-1]
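As a minimal sketch (element-wise on floats or pandas Series; the sign convention follows the trading rules later in the post, where a positive DF means EndPrice sits nearer MaxPrice):

```python
def direction_feature(max_price, min_price, end_price):
    """Direction Feature for one minute of trading.

    Positive when EndPrice is closer to MaxPrice (a proxy for buying
    pressure), negative when it is closer to MinPrice (selling
    pressure), and zero when it sits exactly at the midpoint.
    """
    return (end_price - min_price) - (max_price - end_price)
```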
Looking at the Direction Feature, you can see that it just describes whether the EndPrice for a given minute’s stock trading is closer to the minute’s MaxPrice or the MinPrice. We believed that this metric could function as a simplistic but effective proxy for supply or demand, and therefore relate to upward or downward trends in price for a stock over time. After some research, we found out that this feature is similar to concepts in the pre-existing Accumulation/Distribution Line technical indicator and the well-known hammer trading strategy.
We found that, in this particular case, the Direction Feature correlates (0.33) with the rate of return in the next minute. We did not compare this performance to other existing indicators in financial technical analysis, such as RSI, MACD, A/D, Bollinger Bands and so on, but we wish to make such comparisons in the future.
As discussed earlier, the rate of return was simply defined as the relative amount of movement in the EndPrice from one minute to the next, also known as the simple One-Period Return for price, P, at time, t:
R = (P[t+1] – P[t])/P[t]
We used this definition because it was easy to interpret as a percent change and does not require financial knowledge. Moreover, we could easily normalize the data, which was important since we used multiple stocks.
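Computed with Pandas, the one-period return and its alignment to "the next minute" might look like this (a sketch, not our exact code):

```python
import pandas as pd

def one_period_return(end_price):
    """Simple one-period return R[t] = (P[t+1] - P[t]) / P[t].

    pct_change() gives the change relative to the *previous* value;
    shifting by -1 re-aligns it so the entry at time t describes the
    move over the following minute.
    """
    return end_price.pct_change().shift(-1)
```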
The strength of the correlation between the Direction Feature and the rate of return varied with the time period over which it was calculated. In other words, at certain times the Direction Feature correlated strongly with the rate of return, while at other times and with other stocks it did not. This is not unexpected given how many influences there are on a stock’s pricing. We hypothesize that this is due to market microstructure: while the discretization of our measurements is fixed at one minute, stocks move at different speeds at different times. We also found that the same Direction Feature has stronger and more stable correlations with the mean end price 30 minutes ahead. This confirms the difficulty of predicting the exact end price versus the mean price: the end price contains much more noise.
We designed a simple neural network approach using Keras & Tensorflow to predict if a stock will go up or down in value in the following minute, given information from the prior ten minutes. A notable difference from other approaches is that we pooled the data from all 50 stocks together and ran the network on a dataset without stock ids. The dataset consists of records where the predicted variable is the movement (up or down) and the features are extracted from the last 10 minutes. We found that feature normalization is critical.
We assessed not only how many correct predictions the model would achieve, but also how much overall return from a trading strategy it could make. We benchmarked it against a range of heuristic strategies to grant understanding of how well it performed against easily relatable, simpler alternatives. The total set of strategies employed were:
- A random trading strategy, where the decision to buy or sell for the following minute is randomly generated.
- An ‘always up’ strategy, where the price is always predicted to go upwards and so buying for the next minute always occurs.
- An ‘always down’ strategy, where the price is always predicted to go downwards and so selling for the next minute always occurs.
- An ‘omniscient’ strategy, where the price movement is always predicted correctly and buying and selling occurs appropriately. This gives the maximum possible gain from the market.
- A ‘Direction Feature-based’ (DF-based) strategy, where:
- If DF is positive (i.e. EndPrice is closer to the MaxPrice), buy stock for the following minute.
- If DF is negative (i.e. EndPrice is closer to the MinPrice), sell stock for the following minute.
- A strategy based on the neural network, which, for a given minute, gives the predicted direction of the next minute’s price movement using a prediction score. This score can have a value between -1 and 1. In general,
- If the score is >0, assume the price will go up in the following minute, and buy stock
- If the score is 0, assume the price will not change in the following minute, and do not trade
- If the score is <0, assume the price will go down in the following minute, and sell stock.
Each strategy, including the neural network, allocated 1 Eur at every minute for buying and selling actions. If the decision was correct, the return from the 1 Eur was calculated and added to a total return amount. If incorrect, the loss would be deducted from the total return. This, of course, is an extremely simplified approach that doesn’t employ a “portfolio” of shares or take into account transaction costs but can tell us how the strategies would perform relative to one another. Here, our goal was to give immediately meaningful interpretations of the predictions without employing back-testing. In future work we would like to compare these methods to established strategies such as the exponential moving average and baselines like “Buy and Hold” and MACD.
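The per-minute accounting described above can be sketched as follows (a simplification; `predictions` holds direction signals of +1 for buy, -1 for sell and 0 for no trade, and `returns` holds the realised next-minute returns):

```python
import numpy as np

def strategy_return(predictions, returns):
    """Total return from allocating 1 Eur per minute.

    A buy (+1) earns the next-minute return; a sell (-1) earns its
    negative; 0 means no trade. Correct calls add to the running
    total, incorrect calls subtract from it.
    """
    predictions = np.asarray(predictions, dtype=float)
    returns = np.asarray(returns, dtype=float)
    return float(np.sum(predictions * returns))
```

For example, buying ahead of a +1% move and selling ahead of a -2% move yields a total return of 0.03 Eur on the 2 Eur allocated.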
We created three data sets from the full set of data: training, validation and testing. The training set consisted of 60% of the data (120 days) and was used to train the model. The validation set, 5% of the data (10 days), was used to benchmark the training process without providing features to the model, with the aim of avoiding overfitting. The test set consisted of the remaining 35% of the data (70 days) and was used to evaluate the model on the prediction tasks. A common training-plus-validation set was created from the first 130 days, from which 10 days were sampled to create the validation set; the test set was created from the last 70 days of data. Our evaluation on the test set is therefore out-of-sample and walk-forward.
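A sketch of that chronological split, assuming an ordered list of trading days (the random sampling of validation days is illustrative):

```python
import random

def chronological_split(days, train_days=120, val_days=10):
    """Split an ordered list of trading days into train/val/test.

    The first train_days + val_days days form a common pool; val_days
    days are sampled from that pool for validation, and the remaining
    days form the out-of-sample, walk-forward test set.
    """
    pool = days[:train_days + val_days]
    val = set(random.sample(pool, val_days))
    train = [d for d in pool if d not in val]
    test = days[train_days + val_days:]
    return train, sorted(val), test
```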
We then processed the data, building on the experience gained previously and creating additional steps. We removed any rows where “N/A” values were present, which occurred where there were no trades, mostly at the beginning and end of a day’s trading. We then concatenated each day’s stock data to the end of the previous day’s trading to build a continuous time series. This did not represent the breaks in trading, but enabled us to build a simpler, more robust model. We also clipped extreme outliers, since values need to be bounded for neural networks to perform well. We recognised that the model would not be trained to represent extreme movements, but the trade-off was worthwhile, since we were more interested in predicting the direction of a movement than its exact value.
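A minimal sketch of these preprocessing steps (the clipping quantile and column name are illustrative):

```python
import pandas as pd

def preprocess(daily_frames, clip_quantile=0.001):
    """Drop no-trade rows, stitch days into one continuous series,
    and clip extreme outliers so values stay bounded for the network."""
    df = pd.concat([d.dropna() for d in daily_frames], ignore_index=True)
    lo = df["EndPrice"].quantile(clip_quantile)
    hi = df["EndPrice"].quantile(1 - clip_quantile)
    df["EndPrice"] = df["EndPrice"].clip(lo, hi)
    return df
```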
A feature set was then computed, which included:
- The Direction Feature (referred to below as the Indicator)
- Linear combinations of EndPrice, MinPrice and MaxPrice
- Rolling smoothed standard deviation of EndPrice, calculated using windowed data from the previous ten minutes.
- The sign (+ or -) of the calculated Return.
The neural network architecture was constructed as a simple 2-layer dense model, with an L2 penalty to prevent overfitting. More complicated models can be built and tested in the future. As the Indicator is itself a linear combination of component variables, it is also conceivable that the neural network could synthesize it as a feature in a hidden unit.
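A sketch of such an architecture in Keras (the layer sizes and L2 weight here are illustrative, not the values we used; the tanh output gives the prediction score in [-1, 1]):

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_model(n_features, hidden=32, l2=1e-4):
    """Simple 2-layer dense network with an L2 penalty on the weights;
    the single tanh output in [-1, 1] acts as the prediction score."""
    model = keras.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(hidden, activation="relu",
                     kernel_regularizer=regularizers.l2(l2)),
        layers.Dense(hidden, activation="relu",
                     kernel_regularizer=regularizers.l2(l2)),
        layers.Dense(1, activation="tanh"),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```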
| Trading Strategy | Return (Eur) | % of Max. Possible Return |
| --- | --- | --- |
Table 2. Performance of the trading strategies on the test set, measured and ranked by descending total returns.
The omniscient approach gave the maximum possible return obtainable from this data set, 1638.60 Euros. To make sense of this number, one should divide it by the number of stocks and days to obtain the average daily return per stock. The neural network strategy achieved 14.5% of the theoretical maximum, which is a highly encouraging result for a first step into this domain (this should not be read as generating 14.5% excess return, however). Notably, the Indicator-based approach achieved a return of 6% of the maximum, which implies that it could be a good predictor of stock movements in itself. Note that these numbers are not returns on the financial market but percentages of the theoretical maximum; they are only useful for comparing models with each other, not for deciding whether a model would be useful in a trading strategy. We used this simplified strategy to have a more interpretable metric than the error rate, and we do not recommend trading on it. We would ideally need to test this further on more time series and stocks before drawing strong conclusions about the performance of these approaches; however, these are promising initial results.
The Indicator and neural network strategies traded only on the minutes that matched the conditions for trading, not all minutes. The Indicator-based strategy traded 1,436,174 times, while the neural network traded 2,128,392 times, 48% more often, which indicates the neural network can take advantage of more situations than the Indicator-based strategy.
Table 3 gives a breakdown of the performance of the neural network approach bucketed by prediction score percentiles; so, for example, predictions within the top 10% of absolute prediction scores achieved an accuracy of 58.8% and covered 7.7% of all trades. This gives us an understanding of how the accuracy changes with the scores, which can be used for further performance tuning.
| Prediction Score Percentiles | Accuracy (%) | Percent Answered (%) | Achieved Normalised Returns (Eur) |
| --- | --- | --- | --- |
Table 3. Performance of the neural network at predicting stock movements
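The bucketing behind Table 3 can be sketched as follows (a simplified version that looks at a single top-percentile bucket; names are illustrative):

```python
import numpy as np

def accuracy_by_score_percentile(scores, correct, top_pct=10):
    """Accuracy restricted to the top_pct% of predictions by |score|.

    Returns (accuracy within the bucket, fraction of all trades the
    bucket covers).
    """
    scores = np.asarray(scores, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    cutoff = np.percentile(np.abs(scores), 100 - top_pct)
    mask = np.abs(scores) >= cutoff
    return float(correct[mask].mean()), float(mask.mean())
```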
Note that the Achieved Normalised Returns per trade are lower than typical transaction costs per trade. Clearly, this means that in reality we would be operating at a net loss. In practice, you would only trade when you expect the magnitude of the price change to be larger than the cost. With a more sophisticated trading strategy, this would be taken into account.
Additional work also showed that different stocks and different days exhibited different performance when it came to price movement predictions. It is not clear whether this is the result of some fundamental difference or just a phase of behaviour that some stocks experience. More work would be needed to determine this.
We have shown that we can use a neural network to predict future movements of stocks in the Deutsche Boerse Public Dataset and used this as the basis of a simplified trading strategy. The neural network model used here is intentionally simple, and there are a range of models and techniques that could yield better results.
Long Short-Term Memory (LSTM) and convolutional layers are promising candidates, which could allow us to make use of data far beyond the previous 10 minutes. Another approach would be to craft a network that uses the prices of other stocks, particularly those known to be correlated, to predict the value of each stock. Applying typical neural network tricks such as hyper-parameter tuning should also improve performance. Building a model that tunes itself to the variance of the features in the time series, rather than normalizing all features before feeding them to the network, would also be a potential route to follow.
Other ideas include making use of trading volumes and number of trades (as defined in the PDS dataset) as features in the model, using multiple input and multiple output models that can simultaneously learn to predict both volumes and prices and making use of global market trends from the past days or months as predictors.
To maximise returns, we believe that a lot of work can be done to build upon our simple trading strategy. An obvious area is to try predicting magnitudes of the price movements, along with uncertainties, thus allowing the strategy to place bigger bets when the magnitude is high and uncertainty is low. Implementing Monte Carlo slippage modelling and trading costs would also help develop a strategy better suited for real-world trading that could be fully back-tested. We would also move away from the simple definition of return employed here to use alpha, in line with current practices in the industry. Finally, we would seek to compare the strategies assessed here against established ones in the industry, to understand their performance against tried-and-tested approaches.
After having experimented thoroughly with different predictive features and seeing different effects over different trading days and stocks, we believe our work only shows a promising lower bound to the problem of predicting and understanding stock price movements. At the same time, it is an interesting test-bed for ideas in machine learning. Price movements operate on different scales for different stocks and trading days. The observable effects are non-linear with a turning point that reverses a local or global trend. This is in stark contrast to many machine learning problems where the data is stationary and many effects are additive.
We hope that this work inspires more projects for stock market prediction, especially using the DBG Public Dataset and its minute-by-minute granularity. We look forward to seeing further output from the Machine Learning community in this area.
During the exploratory phase, we used Seasonal Decomposition to begin understanding the stock data as a time series. This method deconstructs a given time series into 3 components:
- Trend, which relates how the data is changing over time once seasonality has been removed.
- Seasonal, which is the underlying pattern that a data series might follow if it has any seasonal component, such as a daily, weekly or monthly cycle.
- Residual, which reflects the remainder of the time series once the Trend and Seasonal components have been subtracted.
We applied this to the time series of individual stocks to examine their underlying nature.
Using price information, we found that, while the seasonal components typically show a clear daily cycle with strong underlying trends, the magnitudes of the residuals are relatively large. This confirmed that adopting a simple seasonal model for predicting prices would not yield good results.
Fig 6: Seasonal Decomposition Example
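We used an off-the-shelf seasonal decomposition, but the idea can be illustrated with a minimal hand-rolled additive version (assuming an integer-indexed series and a known cycle length; a library implementation handles edges and multiplicative models properly):

```python
import pandas as pd

def decompose(series, period):
    """Naive additive decomposition into the three components:
    trend (centred rolling mean), seasonal (mean of the detrended
    values at each phase of the cycle), and residual (the remainder).
    """
    trend = series.rolling(period, center=True).mean()
    detrended = series - trend
    seasonal = detrended.groupby(detrended.index % period).transform("mean")
    residual = series - trend - seasonal
    return trend, seasonal, residual
```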
Also during the exploratory phase, we sought to identify if stocks showed similarity in terms of their price trends over time and if they could be grouped according to this. It is known that stock price movements are often correlated with other stocks. We attempted a simple clustering of stocks by calculating the correlations of every stock’s EndPrice movements with those of others over time and visualizing the results. Our motivation was that, by using the price movements of other stocks, we could better predict the price movement of a target stock.
We built a pairwise correlation matrix for the stocks' time series by computing the Pearson correlation of each pair of stocks in our full data set and storing this in a 100×100 array. Using Singular Value Decomposition (SVD), we then reduced the number of dimensions describing each stock from 100 to 50. This gives a more “condensed” representation of the vector space and can help the clustering process produce better results. We applied the HDBSCAN clustering algorithm to the vectors, with a minimum cluster size of 2, and visualised the result in 2 dimensions using the t-SNE algorithm.
Fig 7: Clustering of Stocks Example
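The embedding step can be sketched with NumPy as follows (the HDBSCAN clustering and t-SNE visualisation that follow it are omitted here; shapes are illustrative):

```python
import numpy as np

def stock_embeddings(price_matrix, n_dims=50):
    """Compress each stock's correlation profile to n_dims via SVD.

    price_matrix has one row per stock and one column per minute.
    The pairwise-correlation matrix is factorised with SVD, and each
    stock is represented by its leading singular-vector coordinates,
    scaled by the singular values. These vectors are what a clusterer
    such as HDBSCAN would then consume.
    """
    corr = np.corrcoef(price_matrix)   # (n_stocks, n_stocks)
    u, s, _ = np.linalg.svd(corr)
    return u[:, :n_dims] * s[:n_dims]  # (n_stocks, n_dims)
```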
The algorithm produced 4 clusters, with 9 stocks classified as outliers. We see one large cluster totalling 74 stocks and a secondary cluster of 12 stocks; the 2 other clusters number just 4 and 2 stocks, respectively. Most stocks therefore exhibit similar behaviour, with the smaller clusters accounting for most of the rest. It may be possible to refine this approach to seek more fine-grained clusters, but these results were sufficient for our exploratory purposes.
As well as looking at the Indicator, we devised a high-throughput methodology to identify existing and generated features with predictive power over the rate of return of EndPrice in the following minute.
The approach was based on the idea of ranking features by their correlation with the rate of change in EndPrice in the following minute. To increase the breadth of the investigation, we generated new features by applying transformations to the features, such as “MinMax”, “Windowing” and “Rate of Change”, and ranked these as features in themselves. This resulted in generating around 3,000 features, and a similar number of comparisons.
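The core of the ranking can be sketched as follows (a minimal version; in practice the transformed features numbered in the thousands):

```python
import pandas as pd

def rank_features(features, target):
    """Rank candidate feature columns by the absolute Pearson
    correlation of each with the next-minute rate of return."""
    corrs = features.corrwith(target)
    return corrs.abs().sort_values(ascending=False)
```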
We observed that several features, such as volume, correlated with the absolute rate of return but not with the direction of the change. In other words, these features could be used to predict the magnitude of coming price changes but not their sign. However, nothing outperformed the Indicator based on EndPrice, MaxPrice and MinPrice. We had expected volumes to correlate with the rate of return in some way, since we knew that high-frequency traders often exploit the imbalance of buy and sell limit orders in the order book to anticipate price direction. This order book information is not part of the PDS; volume serves only as a rough proxy for it.
We found that stocks exhibited different correlations with the price, which indicates a range of behaviours in the stock time series. We also observed that most of the correlations, while strong for a particular day or stock, may flip entirely to the opposite sign for another day or another stock. This is the basis behind pairs trading, where if a linear combination of stocks is stationary, we can then form a strategy around trading a basket of them.
We are thankful to our Originate reviewers for their feedback, and to Ajay Mansukhani for his explanations of how trading works in practice.