I’ve been working on some cross-sectional equity models, which are outside my usual domain of macro assets and credit. Rather than build a feature set composed of standard technical indicators (think TA-Lib) and fundamental data (earnings etc.), I decided to build some indicators from first principles and forget about fundamental data. So what did I find out?
- Standard technical indicators are great if you care about drawing charts that make sense to humans. They are fundamentally expositional methods. They do have predictive value, but they are optimised for the human visual cortex!
- If you start from first principles and ask yourself, “How can I create a vector that represents market factors in an informationally optimised format?”, you get very different indicators. They do not look pretty, and they are difficult to make sense of with the naked eye. They are not necessarily complex, but you will not see these data structures in any equities blog post or research report. Bloomberg has no function for them because they don’t look nice and you can’t even give them a snappy name.
- I’ve found that developing indicators in this way gives significantly better results on stock-picking algos than using standard indicators, and with a much smaller feature set (which means you can cross-validate, optimise etc. in much less time, especially with SVMs and ANNs).
- This is not the same as feature reduction via PCA or some other method; it is about choosing a different starting point for your feature generation.
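To make the contrast a bit more concrete, here is a minimal sketch of what a machine-oriented feature block might look like. To be clear, these are not the indicators I actually use; the function name `cross_sectional_rank_features`, the horizons and the synthetic prices are all assumptions made up for illustration. The idea is simply to rank each stock against its peers on a few return horizons and scale the ranks to a fixed range, which gives a small, dense matrix that means little on a chart.

```python
import numpy as np

def cross_sectional_rank_features(prices, horizons=(1, 5, 20)):
    """Hypothetical example: one row per stock, one column per horizon.

    prices: array of shape (n_days, n_stocks), most recent day last.
    """
    log_p = np.log(prices)
    cols = []
    for h in horizons:
        ret = log_p[-1] - log_p[-1 - h]      # h-day log return for each stock
        ranks = ret.argsort().argsort()      # 0..n-1 rank within the cross section
        cols.append(ranks / (len(ret) - 1) - 0.5)  # scale ranks to [-0.5, 0.5]
    return np.column_stack(cols)

# Synthetic random-walk prices for 8 stocks over 60 days (illustration only).
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(60, 8)), axis=0))

features = cross_sectional_rank_features(prices)
print(features.shape)  # (n_stocks, n_horizons)
```

Nothing here is a chartable indicator, and that is rather the point: each column is bounded, centred on zero and comparable across stocks, so a model can consume it directly.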
So what’s the takeaway from all of this?
When generating features for any machine learning problem, think like a machine, not a human. Don’t worry about explaining features to humans; think only about parsimoniously filling bandwidth across as few features as possible. Your pipeline might only make sense to a mathematician at certain points, but believe me, it will produce much more effective results.
The above may seem like a statement of the obvious to people coming fresh to machine learning from a maths / stats background, but for long-term domain experts, especially in finance, there is a deep culture of thinking about data and time series problems in certain ways. It’s important to see these ways of thinking as an artefact of the ‘expositional research’ paradigm and move beyond them.