The Zen of Machine Learning

In my last post I talked about building powerful feature sets by stepping outside industry-specific data conventions that are often historical artefacts (Stop thinking like a human!). It got me thinking more broadly about the psychology of feature selection.

The example I gave was the use of technical indicators in finance but you can read across to other domains. ‘Convention avoidance’ is one reason why the most successful quant hedge funds like to hire scientists and engineers unsullied by a formal training in finance.

In machine learning we often talk about model bias, sometimes in the context of overfitting but also when comparing different algos: “what type of results are we likely to get using KNNs vs linear regression?”. But there is another set of biases that needs to be overcome: behavioural biases that reside in the minds of machine learning engineers when designing and selecting features. These are some of them:

  • Data convention bias: What are the accepted metrics in your industry and why? Are they the result of convenience, traditional human data analytic methods (Excel, charts), the way the world worked 20 years ago?
  • Contextual blindness: Most data is produced in some broader context that humans subconsciously rely upon to make sense of it. In finance, a lot of time series data only makes sense when viewed in its entirety, in order and continuously. Any context-free data point is meaningless. Are you selecting or designing features that rely upon subconscious context that is not available to your algo? I find this to be the most common problem in finance and presume it’s a problem in other industries too.
  • Bias towards complexity: Complex domain-specific metrics / indicators may be very appealing to humans, but often make horrendous machine learning features. They frequently consist of overlapping and confounding informational dimensions which can be very difficult for machine learning algorithms to do anything useful with. Simple, dimensionally parsimonious, bandwidth-filling indicators are what is needed.
  • Data production bias: Who produces the data you use? What is the reason they produce it in the way they do? Do you simply accept it the way it is? If you dimensionally analyse the data, what dimensions are really relevant to your problem?
  • Human expositional bias: Is the data you are using the way it is so that it fits nicely on a chart or a human-readable table? Are its dimensions easily digestible by the human brain? Can it be labelled nicely? Can someone sell it as a ‘value added’ product in this format?
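
To make the contextual blindness point concrete, here is a minimal sketch of one way to put the missing context into the feature itself. It is not a recommended indicator. The function name `contextual_features`, the window length and both price series are made up for illustration. Two series end at the same price, so the raw price tells an algo nothing about how it got there, while a log return and a trailing z-score carry a little of that history explicitly.

```python
import math
from statistics import mean, pstdev

def contextual_features(prices, window=5):
    """Return (log_return, zscore) for each point that has a full trailing window.

    Hypothetical helper: both features are computed from past values only,
    so the context a human reads off a chart is written into the feature.
    """
    rows = []
    for i in range(window, len(prices)):
        history = prices[i - window:i]  # trailing window, excludes the current point
        log_return = math.log(prices[i] / prices[i - 1])
        spread = pstdev(history)
        # Guard against a flat window, where the z-score is undefined.
        zscore = (prices[i] - mean(history)) / spread if spread > 0 else 0.0
        rows.append((log_return, zscore))
    return rows

# Two made-up series that both end at a price of 105.
rising = [100, 101, 102, 103, 104, 105]
falling = [110, 109, 108, 107, 106, 105]

print(contextual_features(rising)[-1])   # positive return, price above its recent mean
print(contextual_features(falling)[-1])  # negative return, price below its recent mean
```

The last raw value is identical in both series, but the two feature rows have opposite signs, which is the information a human would have picked up from the chart without noticing.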

If you manage to navigate these biases, you may end up with a set of indicators that look ugly to the non-practitioner, but if you have found the Zen of machine learning, you will savour their parsimony and informational beauty. We can’t all practise Zen, so here are a few concrete methods to sharpen you up. Ask yourself:

  • Would my feature make sense to an alien who’s just landed on planet Earth? Or would the alien need to learn about the history of woollen sock sales in Canada from 2001-2015 as well as what wool is, what socks are…? No data is context free, but informational contingencies need to be minimised.
  • How would I explain my feature to a five-year-old? There needs to be an ‘axis of simplicity’ somewhere in the feature. If a child can’t grasp a salient aspect of the feature, it’s doubtful a decision tree or batch gradient descent can.
  • What are my emotions when I think about the feature? Is the reason you have told yourself you are using a feature the real reason, or are you trying to look / feel clever? Did you invent a ‘pet’ metric that you really want to see work because it appeals to you?
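
One rough way to put the ‘pet’ metric question to the data is to check how much of the metric is already explained by the simple features you have. The sketch below is only an illustration under stated assumptions. The feature names, the numbers and the helper `redundancy_report` are all hypothetical, and plain Pearson correlation is a stand-in for whatever redundancy measure suits your problem (it only catches linear overlap).

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def redundancy_report(candidate, existing):
    """Correlation of a candidate feature with each existing feature (hypothetical helper)."""
    return {name: round(pearson(candidate, values), 3)
            for name, values in existing.items()}

# Made-up simple features.
momentum = [0.2, -0.1, 0.4, 0.0, -0.3, 0.5]
volume_change = [1.0, 0.5, -0.2, 0.8, -0.6, 0.1]

# A made-up 'pet' metric that looks sophisticated but is mostly momentum in disguise.
pet_metric = [2 * m + 0.01 * v for m, v in zip(momentum, volume_change)]

report = redundancy_report(
    pet_metric, {"momentum": momentum, "volume_change": volume_change}
)
print(report)
```

Here the correlation with `momentum` comes out very close to 1, which suggests the pet metric adds almost nothing that the simpler feature does not already provide.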

All of the above should of course be integrated with your usual pipeline, but the takeaway here is that initial feature selection and/or generation is often the critical factor in the successful implementation of machine learning algos. Finding excellent features involves thinking outside the box, especially in competitive industries.

People spend a lot of time focussing on algo selection, quantitative methods such as dimensionality reduction, and other technical aspects, but seasoned practitioners should already have all of this integrated into a model development platform. The answer often lies in stepping back from the problem and the social framework in which it resides, with an awareness of your own psychological biases… perhaps in a Zen-like state of mind.

