Can Data Scientists Predict the Financial Markets?

It is evident that data science and machine learning have revolutionized most, if not all, industries, and it feels like every week a new state-of-the-art system breaks new ground. Yet one industry still seems to elude even the most talented data scientists: the financial markets. But how hard can it really be to predict stock markets, and how many hidden breakthroughs have there been in this field?

Algorithmic Stock Trading

Investopedia defines algorithmic trading as the process of using computers programmed to follow a defined set of instructions for placing trades, in order to generate profits at a speed and frequency that is impossible for a human trader. While this is the most generic and accurate definition of algorithmic trading, I can’t help but feel it is antiquated.

Algorithmic trading came about because it makes it possible to automate trading, reduce human error and trade at incredible speeds. It has been around for decades in many forms, the most common being high-frequency trading, where a computer can see when a consumer is placing an order and get its own order in a fraction of a second earlier to take advantage of the cheaper price. (Read Flash Boys by Michael Lewis to discover how everyday people are being ripped off. It is an incredible read.)

However, now that we are in 2017, I’d like to propose an alternative definition of algorithmic trading, centered on my belief about where it is heading in the next 10 years. Algorithmic trading, for me, is the leveraging of computing power and AI to learn complex systems and make intelligent, high-frequency trading decisions that are smarter than those humans could make. Renaissance Technologies is a good example.

Renaissance Technologies is the brainchild of James Simons, an award-winning mathematician and a code-breaker during the Cold War. Simons founded Renaissance in 1982 as a hedge fund that operates entirely on quantitative analysis and statistical methods, detecting correlations in the market and profiting from them. Its most famous achievement is the Medallion Fund, which is run mostly for the firm’s employees and has seen annualized returns of more than 35% over a 20-year span. From 1994 to 2014, it averaged annual returns of 71.8% before fees! To put that into perspective, if one had invested $1,000 at the time the fund was created, that investment would now be worth a mere $13,830,598. Moreover, the figure that shocked me the most was that in 2008, when the economic crisis struck, the Medallion Fund returned 98.2% that year, meaning that its model was not only crisis-proof, but also managed to turn a devastating crash in the markets into an extremely profitable affair. The details of the fund have been kept secret, since it is open only to Renaissance employees. But one wonders what kind of beastly mathematical model they have figured out and how automated the system is. No doubt the people working there are incredibly bright and able; some of them, such as Robert Mercer, developed some of the first statistical machine translation models at IBM.
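To get a feel for how quickly returns like these compound, here is a minimal sketch of the arithmetic. It assumes a constant annual return, which is a simplification; the fund's actual year-by-year returns are not given here, so the sketch does not try to reproduce the dollar figure above. The function names are my own, purely for illustration.

```python
import math

def compound(principal, annual_return, years):
    """Value of `principal` after `years` of growth at a constant annual return.

    `annual_return` is a fraction, e.g. 0.35 for 35% a year.
    """
    return principal * (1 + annual_return) ** years

def years_to_reach(principal, target, annual_return):
    """Years needed for `principal` to grow to `target` at a constant annual return."""
    return math.log(target / principal) / math.log(1 + annual_return)

# Illustration only: a constant 35% a year, sustained for 20 years.
value = compound(1_000, 0.35, 20)
print(f"$1,000 at 35% a year for 20 years: ${value:,.0f}")
```

At a constant 35% a year, $1,000 grows roughly 400-fold over 20 years, which shows how much of the final number comes from the length of the run rather than from any single spectacular year.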

Numerai: Leveraging Data Scientists

My inspiration for this article came when I read about Numerai, a new hedge fund that is getting a lot of attention in machine learning and AI circles. Thanks to breakthroughs in machine learning, machines now perform better at tasks such as translation and image classification, and predictive models in general are becoming scarily accurate. It is natural, then, that advances in machine learning are being applied to hedge funds. The obvious course of action would be to hire a team of data scientists to build a state-of-the-art machine learning model that outperforms the benchmark. What impressed me about Numerai, however, is that it does not hire data scientists; it crowdsources them. Its structure is reminiscent of a Kaggle competition, where data scientists download the data provided and submit their best models, often for cash prizes. Numerai makes financial data available to data scientists (with an amazing tweak, described under Homomorphic Encryption below), and they submit their best models, with the monthly leaders earning money for their efforts. Two things make Numerai incredibly interesting:

1. Homomorphic Encryption

Homomorphic encryption has been a hot topic in the data science community in recent years, as data privacy has moved into the spotlight, with Apple, for example, making a big deal about keeping iPhone data private. Financial data is sensitive by nature: every hedge fund and bank considers the data it collects a huge asset, and so is reluctant to hand it to the machine learning community for analysis. Furthermore, if they encrypted the data conventionally so that users could not recover the underlying information, models trained on the encrypted data would not transfer to the real, unencrypted data. That is, until homomorphic encryption arrived. Homomorphic encryption lets sensitive data be encrypted while the underlying STRUCTURE of the data is preserved: operations performed on the encrypted values correspond to operations on the original values. This makes a machine learning model trained on the encrypted data transferable. It is what allows Numerai to do what it does, and the technique has many more applications, for example where one wants to protect sensitive personal data while still gaining useful insights about the population as a whole. The co-founder of Numerai posted an article on Medium explaining this very well.
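To make the idea of "structure-preserving" encryption concrete, here is a toy sketch using textbook RSA, which happens to be homomorphic with respect to multiplication. This is not the scheme Numerai uses (I don't know the details of theirs), and with tiny primes and no padding it is completely insecure. The key values and function names are assumptions chosen purely for illustration.

```python
# Toy demonstration of a homomorphic property using textbook RSA.
# WARNING: tiny primes and no padding. Insecure, for illustration only.

P, Q = 61, 53   # small primes (illustrative values)
N = P * Q       # public modulus
E = 17          # public exponent
D = 2753        # private exponent, chosen so (E * D) % ((P - 1) * (Q - 1)) == 1

def encrypt(m):
    """Encrypt an integer 0 <= m < N with the public key."""
    return pow(m, E, N)

def decrypt(c):
    """Decrypt a ciphertext with the private key."""
    return pow(c, D, N)

def multiply_encrypted(c1, c2):
    """Multiply two ciphertexts without ever decrypting them."""
    return (c1 * c2) % N

a, b = 7, 6
product_cipher = multiply_encrypted(encrypt(a), encrypt(b))

# Whoever multiplied the ciphertexts never saw 7 or 6,
# yet the decrypted result is their product.
print(decrypt(product_cipher))  # prints 42
```

The point is that a computation done on the encrypted values carries over to the original values. Schemes used in practice support richer operations than this toy does, but the principle is the same.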

2. Ensemble Models

Imagine I gave you three models that each perform at around 70% accuracy, each giving slightly different predictions. The classical approach would be to take one of these models and try to improve it through better feature engineering or similar methods. However, it turns out that if you combine the predictions of these models and take the majority result, the combined predictions will be around 5% more accurate. The gain comes from the models making different mistakes: if their errors were completely independent, a majority vote of three 70% models would be right about 78% of the time, and because real models' errors overlap, the improvement in practice is smaller. This is the art of ENSEMBLE models. Every month, thousands of data scientists submit their best attempts at a model to Numerai. Now, should Numerai just take the best model of them all and be content with its performance? Absolutely not. Numerai uses an ensemble of the best models to achieve better performance than any of the models could individually. Wonderful, right?
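The majority-vote idea can be sketched in a few lines. The first function computes how often a majority of models is right under the idealized assumption that their errors are fully independent; the second applies a majority vote to actual predictions. The sample predictions are made up for illustration, and this is only the simplest form of ensembling, not a description of how Numerai combines its submissions.

```python
from collections import Counter
from math import comb

def majority_vote_accuracy(p, n_models):
    """Probability that a majority of n_models is correct, assuming each model
    is right with probability p and errors are independent (n_models odd)."""
    k_min = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )

def majority_vote(predictions):
    """Combine per-model label lists into one list by taking, for each sample,
    the label that most models predicted."""
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*predictions)]

# Idealized case: three independent models at 70% accuracy each.
print(round(majority_vote_accuracy(0.7, 3), 3))  # prints 0.784

# Made-up example: each model is wrong on a different sample.
truth   = [1, 0, 1, 1, 0]
model_a = [1, 0, 0, 1, 0]
model_b = [1, 1, 1, 1, 0]
model_c = [0, 0, 1, 1, 0]
print(majority_vote([model_a, model_b, model_c]) == truth)  # prints True
```

In the made-up example each model is only 80% accurate on its own, yet the vote gets every sample right, because no two models are wrong on the same sample. The more the models' mistakes overlap, the less an ensemble helps.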

In my opinion, the future of AI in finance looks a lot like what Numerai is doing. Homomorphic encryption will allow big banks and hedge funds to tap into the massive open community of data scientists without releasing any of their sensitive data. I believe it is only through collaboration in the AI and machine learning sphere that real breakthroughs will be made. The finance industry must be ready to open its doors to this wonderful community of geeks.

