Using Genetic Programming to evolve Trading Strategies
A friend and I recently worked together on a research assignment where we successfully used Genetic Programming (GP) to evolve solutions to a real world financial classification problem. This problem, called security analysis, involves determining which securities ought to be bought in order to realize a good return on investment in the future. To find a solution to this problem we used Genetic Programming to evolve a population of decision trees that could perform security analysis on sixty two of the technology stocks listed on the S&P 500. That is, we evolved decision trees capable of classifying those stocks according to whether they should be bought or sold short.
Security Analysis Decision Trees
During the study we evolved two types of security analysis decision trees. The first used only indicators from fundamental analysis and the second used only indicators from technical analysis. Fundamental analysis is a method of evaluating a security to measure its intrinsic value by examining related economic, financial and other qualitative and quantitative factors. Technical analysis is a method of evaluating securities by analyzing statistics generated by market activity.
A strategy for security analysis, regardless of whether it uses technical or fundamental indicators, will consist of a number of rules for making investment decisions. That strategy can be represented as a decision tree where the terminal nodes represent investment decisions and the functional nodes represent rules based either on technical or fundamental indicators. Due to this fact, many existing investment strategies are represented in the form of decision trees.
In total forty two different indicators were selected and used from both Technical analysis and Fundamental analysis. The evolved strategies were for a fixed holding period either three months, six months, nine months or twelve months long. The decision trees were then back-tested using market data from 2011 to 2013.
Genetic Programming
Genetic Programming is a specialization of a Genetic Algorithm. Genetic Algorithms are population based, meaning that they operate within a population consisting of many different individuals. Each individual is represented by a unique genotype (usually encoded as a vector). Genetic Algorithms model the process of genetic evolution through a number of operators including the selection operator which models survival of the fittest, the crossover operator which models sexual reproduction and the mutation operator which models the genetic mutations that occur randomly to individuals in a population. These operators, when combined, produce what computer scientists refer to as a Genetic Algorithm.
The difference between a Genetic Algorithm and the Genetic Programming Algorithm is the way in which individual genotypes are represented. In Genetic Algorithms genotypes are represented either as Strings or as Vectors whereas in Genetic Programming these genotypes are represented using tree data structures. The crossover operation on tree structures can happen in a few ways, either a sub-tree is swapped out, a leaf node is remove or changed, or the values of some node are adjusted. An illustration of this is shown below,

This diagram depicts the crossover strategy of a decision tree used by genetic programming for security analysis
After this study we concluded that Genetic Programming has great potential to evolve new strategies for security analysis and investment management provided that better functions for calculating fitness can be derived. Throughout our research study we saw that decision trees evolved using Genetic Programming were able to produce stock classifications that beat the average market return consistently over all four quarters. This is true for decision trees that used technical indicators as well as decision trees that used fundamental indicators. A number of other conclusions were derived from our research including the optimal sizes and level of heterogeneity for the decision trees and the value added by the different indicators and the performance of the strategies relative to one another. Some results are included below.
- Genetic Programming in Finance project (2013)
- Performance of security analysis decision trees vs. the market.
- Relationship between the size of the decision tree to the fitness
- Most popular indicators used in the final decision tree
- Average tree sizes per iteration
- Example Security Analysis Decision Tree
- Example Security Analysis Decision Tree
Conclusion
Two independent research reports were produced by myself and my friend. Both reports go into much more detail about our research study, the approach taken, our design and implementation, the testing strategies we used, our conclusions and recommendations for further research. You can also download a copy of the source code created during the implementation. For my colleagues more technical account of the project please click here.
-
[Comment copied from LinkedIn Computational Finance Group]
Very nice work. The write up is gorgeous as well.I only had a chance to glance at the report. Some statistics that would be good to look at: how does your GA portfolio compare to portfolios of the same assets. I'd look at two comparison portfolios: an equal weighted portfolio and an S&P style portfolio that is weighted by market capitalization.
As it turns out, it can be surprisingly hard to beat an equally weighted portfolio. Rebalance the portfolios quarterly, since some stocks will go up and some will go down (e.g., you want to keep the portfolio weights equal, as prices change). If your genetic algorithm beats these portfolios then you have "alpha" (excess return over the benchmark).
Of course alpha is not everything. You should look at the Expected Tail Loss (ETL) (also known as CVaR, Expected Shortfall) for both the GA portfolio and the "benchmark". If you have less risk for the same return then you can consider that you beat the benchmark. The ETL measure is a better measure than the Sharpe ratio when it comes to risk, since the Sharpe ratio measures variance, which is two sided. ETL only measures loss.
An observation: a problem with GA and neural nets (NN) is that they are black boxes. It is difficult to determine why they make the "choices" they do. So imagine you're a portfolio manager. Your GA or NN starts performing poorly. What steps can you take to address this? The problem is, all you can really do is retrain and you don't know if retraining will do better. Of course with a decision tree its not quite so bad, since at least you know what decisions it made. The problem is, if you're constantly tweeking it to make the "right" decisions then you've got a problem too.
These issues are reasons that you don't see these algorithms used that much (although they are used).
-
[Comment copied from LinkedIn Computational Finance Group]
I think the most reasonable approach for backtesting is to compare your results to what happens with random trading that still obeys whatever constraints you are imposing on the portfolio. This is discussed at:
http://www.portfolioprobe.com/2010/11/05/backtesting-almost-wordless/ -
[Response copied from LinkedIn Computational Finance Group]
Thanks Patrick, that is a good suggestion. I understand the approach because the concept of backtesting an algorithm against a random trading strategy is conceptually similar to testing a search algorithm against random search. Which is something I have done before. How popular would you say that backtesting strategy is? -
[Response copied from LinkedIn Computational Finance Group]
Stuart: Depressingly unpopular. But it has to start somewhere. -
[Comment copied from LinkedIn Computational Finance Group]
Looks good, what is the reason of using the 62 technology stocks, not 500 stocks? -
[Response copies from LinkedIn Computational Finance Group]
Hi JZ, that is a good question and I'm glad you asked it. We debated our approach and an external opinion would be much appreciated. We limited our test sample to just one sector because of two reasons:1) We believe that decision trees using Fundamental Indicators could vary drastically between different industries. This is because financial ratio's can vary between different industries and we thought that an investor using this approach would want to evolve decision trees for each sector independently.** and
2) We were only given three weeks to complete the assignment and we were concerned that adding more stocks would be too time consuming. This turned out to be an unfounded concern since our implementation could easily handle all of the 500 stocks on the S&P500 without any significant performance issues.
** Note: this doesn't apply to decision trees using Technical Analysis indicators
-
Extremely interesting. Well done Stuart.
-
It's an interesting exercise, but I don't see what advantage does GP have against simply training whole Decision Tree using some impurity measure. It seems it does the same just very inefficiently and probably with less accuracy too.
-
Hi Stewart,
I am trying re implement the GA in python. What are some python libraries you would recommend.
Thanks
-
Stuart,
Have you tried to trade your system live?
Lawrence
Comments