In my work as a trader and portfolio manager I have built numerous trading strategies employing machine learning (ML) methods. My experience has been very much based on trial and error – trying to build practical solutions to real business problems and tackling challenges as they arise. I have an academic background in both physics and psychology, so I know something about numerical methods and cognitive neuroscience, but I am not a computer scientist or a professional software engineer. So what follows is aimed at numerate ML enthusiasts with an interest in finance, and perhaps some ML experts who are interested in the practical aspects of tackling complex social forecasting problems – like predicting markets.
Why pair markets and machine learning, you might ask? Well, the obvious answer is “surely a good ML algo in the right market is a licence to print money!”. I can assure you it’s not quite as simple as that, but it is definitely a fun thing to try. I’m not going to be posting a lot of code or giving step-by-step instructions on how to build a trading system, but if you’ve got to 1st or even 2nd base in this space yourself, hopefully I can help you reflect on and penetrate deeper into some of the issues (and if you’ve got to 3rd base, please feel free to enlighten me!). However, I think the real value of blogging about my experiences with ML is that the problem of market forecasting presents a set of challenges that encompass the frontiers of what is possible with machine learning.
That’s a bold statement, I hear you say! So let me explain. Right now, machine learning does some pretty cool stuff. It can drive cars, read your handwriting, understand your speech, recognise your face, translate speech, recommend a film or book to you, categorise your customer base, or even match you with your ideal partner. That’s all pretty amazing. We can divide these applications into two groups. The first group consists of more or less stationary problems. Driving, writing, and speaking are cooperative social behaviours. They rely on predictable, consistent behaviour, without which they would have no value. Applications of ML have to be accurate in these areas; otherwise not only would they have no value, but they would be a liability (causing car crashes, incorrect translations, etc.). Fortunately there exists a plethora of data for these problems (or if not, you can generate it yourself by driving around in a car equipped with sensors), and the stationary nature of these applications means your ML models are robust and rarely make mistakes. I still think autonomous driving has some nontrivial challenges with respect to handling unexpected and local issues, but generally speaking this class of problem can be robustly solved with ML.
The second class of problem is a lot fuzzier. Netflix is quite good at recommending TV shows to me, but often it gets it completely wrong. It’s the same with Amazon and books, but both get it right often enough to be useful. No one is going to die if a recommender algo makes a mistake, and that’s a good thing because they frequently do. These systems work in the realm of complex, dynamic, personal preferences. Accuracy in these applications is way below that of Google Translate, but it doesn’t really matter if in a particular instance, or even in the majority of cases, the algo is ‘wrong’, as long as there are a few solid hits. A secondary feature of this type of problem is that data can be a lot more scarce. A new subscriber to Netflix must rate numerous films during the ‘induction process’, and it takes a lot of purchases (and views) to build a usable recommender profile on Amazon.
So where do markets come into this? Like book preferences, markets are non-stationary social phenomena. Like history, markets rhyme, but don’t repeat. Why? Because unlike speaking and writing, they are competitive rather than cooperative behaviours. Market participants observe market history and adapt to compete. It’s like the changing nature of book preferences, only on steroids.
To make things more difficult, market data is relatively sparse. Yes, there are a lot of nice stock charts out there, but the number of ‘bits’ of information available is pitiful compared to the sensor data of an autonomous car, or the sum total of digitised and translated written media. Let’s ignore high-frequency trading for now – that’s not a social activity.
Like driving and translating, the consequences of incorrect forecasting are punitive. You lose money when you are wrong in the markets. If Amazon flashes up a cookbook and you hate cooking, who cares? Yes, there’s a tiny opportunity cost for Amazon but they will survive.
So ML algos used for financial forecasting have to be (reasonably) accurate because they are subject to punitive consequences, yet they operate in a competitive and changeable social nexus with relatively limited data. To me, this sounds very similar to the problem of hard AI, just in a smaller ‘sandbox’.
In short, financial markets offer ML researchers a rare window into a microcosm of competitive human behaviour with the numerical data to describe it. If you are interested in people and like mathematics, financial markets have always been the place to go. Similarly, if you are interested in producing an algo that has to predict and adapt to human behaviour, financial markets offer a dataset that chronicles real-time human strategic decision making as well as longer-term social dynamics in terms of optimism, pessimism, fear, greed, and so on.
I truly believe ML is in the process of radically changing the world. In many ways it’s more important than the internet. I’m not talking about sentient robots or the ‘singularity’, but pretty soon, anything electronic will 1) be connected, 2) have sensors, and 3) be able to adjust its operation to its environment. Whether it’s your fridge ordering you more milk, or your car taking you to work and dropping the kids off at school in the most energy-efficient way, ML algorithms will be making the decisions. On a grander scale, ML will manage electricity networks, water supply, farming, factories; you name it. ML is going to be huge, and for someone who likes coding, psychology, data, and problems to solve with them, it’s impossible not to want to be involved in the broader conversation with others in this field.