Stop thinking like a human!

I’ve been working on some cross-sectional equity models, which are outside my usual domain of macro assets and credit. Rather than build a feature set composed of standard technical indicators (think TA-Lib) and fundamental data (earnings etc.), I decided to build some indicators from first principles and forget about fundamental data. So what did I find out?

  • Standard technical indicators are great if you care about drawing charts that make sense to humans. They are fundamentally expositional methods. They do have predictive value, but it is optimised for the human visual cortex!
  • If you start from first principles and ask yourself, “How can I create a vector that represents market factors in an informationally optimised format?”, you get very different indicators. They do not look pretty and are difficult to make sense of with the naked eye. They are not necessarily complex, but you will not see these data structures in any equities blog post or research report. Bloomberg has no function for them because they don’t look nice, and you can’t even give them a snappy name.
  • I’ve found that developing indicators in this way gives significantly better results on stock-picking algos than using standard indicators, and with a much smaller feature set (which means you can cross-validate, optimise, etc. in much less time, especially with SVMs and ANNs).
  • This is not the same as feature reduction via PCA or some other method; it is about choosing a different starting point for your feature generation.

So what’s the takeaway from all of this?

When generating features for any machine learning problem, think like a machine, not a human. Don’t worry about explaining features to humans; think only about parsimoniously filling bandwidth across as few features as possible. Your pipeline might only make sense to a mathematician at certain points, but believe me, it will produce much more effective results.
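The post doesn’t disclose the actual indicators, so what follows is only a hypothetical toy to make “filling bandwidth” concrete, assuming i.i.d. Gaussian daily log returns (simulated, not real data). It compares returns over overlapping look-back windows that all end today (the way most chart indicators are built) with returns over disjoint past segments, and measures redundancy as the share of variance captured by the first principal component of each feature set. The window lengths and the helper `first_pc_share` are illustrative choices of mine, not the author’s method.

```python
import numpy as np

# Hypothetical toy: simulated i.i.d. Gaussian daily log returns (an assumption,
# not real market data), used only to illustrate feature redundancy.
rng = np.random.default_rng(0)
T = 200_000
r = rng.normal(0.0, 0.01, size=T)
cum = np.concatenate(([0.0], np.cumsum(r)))  # cum[k] = sum of r[:k]

lookbacks = [5, 10, 20, 60]  # illustrative window lengths
L = max(lookbacks)
t = np.arange(L, T + 1)  # evaluation dates that have a full history


def first_pc_share(features):
    """Share of total variance captured by the first principal component of the
    standardised features (largest eigenvalue of the correlation matrix / trace)."""
    corr = np.corrcoef(features, rowvar=False)
    eig = np.linalg.eigvalsh(corr)  # eigenvalues in ascending order
    return eig[-1] / eig.sum()


# "Chart-style" features: returns over overlapping windows that all end at t.
overlapping = np.column_stack([cum[t] - cum[t - n] for n in lookbacks])

# Disjoint segments: [t-5, t), [t-10, t-5), [t-20, t-10), [t-60, t-20).
edges = [0] + lookbacks
disjoint = np.column_stack(
    [cum[t - a] - cum[t - b] for a, b in zip(edges[:-1], edges[1:])]
)

print(f"overlapping: {first_pc_share(overlapping):.2f}")
print(f"disjoint:    {first_pc_share(disjoint):.2f}")
```

Under these assumptions the overlapping windows are correlated at roughly sqrt(m/n) for look-backs m < n, so their first component should capture about two thirds of the variance, whereas the disjoint segments are uncorrelated and the share should sit close to one quarter: four features, four separate pieces of information. Real returns are not i.i.d., so treat this as intuition rather than evidence.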

The above may seem like a statement of the obvious to people coming fresh to machine learning from a maths/stats background, but among long-term domain experts, especially in finance, there is a deep culture of thinking about data and time series problems in certain ways. It’s important to see these ways of thinking as an artefact of the ‘expositional research’ paradigm and to move beyond them.
