Fitness Landscape Analysis for Computational Finance
Some of the most interesting new research coming out of the Computational Intelligence Research Group (CIRG), which is applicable to numerous computational finance and machine learning optimization problems, is the development of fitness landscape analysis techniques. Fitness landscape analysis aims to characterize high dimensional fitness landscapes associated with optimization problems. This is being done so that researchers can make more informed decision regarding the selection of appropriate optimization algorithms.
This article discusses optimization problems and explains how they can become quite complex, it introduces some of the fitness landscape analysis techniques presented in the journal article, A survey of techniques for characterizing fitness landscapes and some possible ways forward by Dr Katherine Malan and Professor Andries Engelbrecht, and ends off by discussing why these techniques are relevant to computational finance. This article simplifies and (probably) does an injustice to the original, so please read the original if you find this interesting.
When taken to its logical conclusion, fitness landscape analysis could enable researchers to preemptively determine which classes of optimization algorithms might perform better on the problem at hand and possibly even preemptively determine what parameter configurations for a specific algorithms will perform the best - Key Takeaway
Optimization problems consist of an objective function which represents some quantity being optimized, a set of unknowns or variables which influence the value of the objective function, and possibly a set of constraints which restrict the values which may be assigned to the unknown variables. Optimization algorithms are search methods which find feasible values for the unknowns which maximize or minimize the objective function.
An example of an optimization problem is portfolio optimization. The objective function quantifies how good the portfolio is (fitness). Popular objective functions are measures of risk-adjusted return such as the Sharpe ratio. The unknown variables are weights which represent what proportion of the capital is allocated to each asset. The constraints could include that all of the capital is allocated (equality constraint), maximum and minimum weights for each asset (boundary constraints), and a minimum number of non-zero weights (cardinality constraint).
In addition to the shape of the objective function, which will be discussed shortly, there are many other things which make optimization problems more complex and more difficult to solve. These include the number of variables (the dimensionality of the problem), the types of variables (continuous vs. discrete), the number and types of constraints used, and the number of optimization criteria also known as multiobjective optimization.
What I am trying to say is that whilst a real world optimization problem may be trivial in its simplest form once it is made more realistic it can quite easily become non-trivial. Keeping with the portfolio optimization example, the problem becomes trickier when asset classes and individual assets are assigned boundary constraints, linear and non-linear transaction costs are included in the objective function, quadratic VaR constraints are included, and additional objectives or constraints relating to risk factors such as liquidity and credit are included.
Fitness landscape analysis defines the characteristics of fitness landscapes which make then more difficult to solve. It also provides us with useful metrics which may indicate whether or not an optimization problem exhibits those characteristics. Taken to its logical conclusion, fitness landscape analysis could enable researchers to preemptively determine which classes of optimization algorithms might perform better on the problem at hand and possibly even preemptively determine what parameters configurations for a specific algorithms will perform the best. That having been said, this goal is still a very long way away and much research still needs to happen.
Complexity and Fitness Landscape Analysis
The "shape" of the objective function has a large impact on the performance of different optimization algorithms. In their paper my supervisor identified eleven characteristics of objective functions which make them more or less difficult to solve. These characteristics included the degree of variable inter-dependency, noise, fitness distribution, fitness distribution in search space, the landscape structure of optima (modality), information to guide search and deception, global landscape structure (funnels), ruggedness, neutrality, symmetry, and searchability.
These characteristics are not independent so an increase in one characteristic may result in a change in one of the others. For example, increased noise will also increase the modality and ruggedness of the fitness landscape. Other researchers have argued that fitness landscapes can be fully explained by just ruggedness, smoothness, and neutrality. Whilst this may be true, considering problems from a different view-point can also shed some light.
The rest of this section uses a lot of benchmark optimization problems to explain these characteristics. Whilst benchmark problems are certainly useful for comparing algorithms I personally prefer not to use them because I think they don't mean anything and are arguably over-complicated. That having been said at the end of this article I will show that some real world optimization problems can also exhibit difficult characteristics.
Credit for many of the cool images for the benchmark functions goes to DEAP (Distributed Evolutionary Algorithms Package) for Python some of the others came from Wikipedia or Wolfram Alpha.
1. Degree of variable inter-dependency (including epistasis)
In genetics, epistasis refers to the degree of dependency between the individual genes (variables) in a chromosome (solutions are usually encoded as vectors). If the variables contribute independently to the fitness of the solution then the system has a low epistasis. On the other hand if the value of one variable affects other variables then the system has a high epistasis. This is also known as the non-linear separability of the variables.
When variables in an optimization problem are dependent on one another then it is impossible to find the optimal value of each variable independently of the others. Portfolio optimization is an example of a high-epistasis problem because any increase in an asset's weight must be offset by an equivalent decrease in one or more other securities weights. Some researchers believe that epistasis influences the difficulty of the optimization problem.
Techniques for determining epistasis include epistasis variance, the site-wise optimization measure, and bit-wise variance. Epistasis variance works by constructing a simple linear regression to approximate the fitness value given the solution vector. The difference between the output from the regression and the real fitness function acts a measure of non-linearity in the problem which may indicate non-linear separability and epistasis.
Noise is the presence of randomness in the objective function. In computational finance noise can easily be introduced when an objective function makes use of simulation methods and the random number generator is not seeded with the same seed for each objective function call. Large amounts of noise can 'confuse' gradient based and heuristic optimization algorithms whereas small amounts of noise has been shown to actually assist optimization algorithms. Evolutionary algorithms are widely believed to perform well in noisy environments.
The two images below show a simple quadratic fitness landscape without and with noise,
One approach to dealing with noisy fitness landscapes is to resample the objective function multiple times and take the average of the results. If the randomness in the objective function is Gaussian noise meaning that it has a mean of zero, then an average will be unbiased. Similarly showing that the standard-deviation of calls made to an objective function for a given set of inputs is non-zero indicates the presence of noise.
3. Fitness Distribution
A statistical analysis of the distribution of fitness values can be used to better understand the problem at hand. The biggest challenge with determining fitness distributions is that fitness values often needs to be sampled which can result in a sampling statistical bias. A useful fitness landscape analysis technique which uses fitness distributions to ascertain information about the optimization problem is the density of states technique.
The density of states techniques uses the Boltzmann sampling method to construct a probability density function of fitness values across the fitness landscape. This distribution can be used to classify optimization problems. According to the original paper, the technique can also be used to approximately determine the optimal fitness value for an unsolved problem. This is particularly useful for determining the effectiveness of different optimization algorithms. Additionally, characteristics of the probability density function can correlate to the characteristics of the fitness landscape meaning that it is useful for reasoning about the landscape.
For example, maximization problems which demonstrate a fast decay in the density of states (probability density functions with very high kurtosis) may indicate that there will be a fast decay in the probability of finding a better solution in the fitness landscape. This implies that the problem will be harder to solve.
4. Fitness Distribution in Search Space
Another method for characterizing optimization problems would be to measure how the fitness values are distributed across the search space. This differs from the fitness distribution technique in that the topology of the fitness landscape is preserved. In my opinion an obvious technique for doing this would be to try and use a Self Organizing Map (SOM). For binary representation problems the Hamming Distance In level (HDIL) and Hamming Distance Between a Level (HBDL) metrics can be used. For more information see this paper.
5. Modality and the landscape structure of optima
Unimodal fitness landscapes are ones for which only one local optima exists which is also the global optimum (such as the quadratic function). Multimodal fitness landscapes are ones for which many local optima exist one or more of which may be the global optimum. Local search algorithms are not able to find global minimum because of a lack of information to direct search outside of the local minimum. In other words, they can get trapped.
In addition to the number of optima the shape of the optima i.e. their depth and distribution can contribute to the difficulty of the fitness landscape for optimization algorithms. Fitness landscapes with a relatively small number of basins of attraction are called isolated. An example of a unimodal function is the Sphere benchmark function and two example of isolated multimodal functions (Needle-in-a-Haystack problems) are the Shekel and H1 Functions. For more information about benchmark functions click here, here, here, here, or here.
Techniques for determining the modality of the fitness landscape include this technique for calculating the number and distribution of local minima, the escape rate measure for measuring the sizes of basins of local optima for combinatorial optimization problems, and this technique for compressing the fitness landscape into a graph called a local optima network. The aforementioned local optima network serves as a characterization of a fitness landscape and the distribution of local optima in that fitness landscape. A simple example is shown below,
6. Information to guide search and deception
Some optimization problems produce a fitness landscapes which are structured in such a way that the solution is guided more easily towards the global optima. In order for optimization algorithms to perform well the fitness landscape should ideally provide high quality and quantity information to guide the search in the right direction. Deception is the presence of misleading information which may guide the algorithm away from the global optima.
The location and density of local optima in relation to the global optima impacts on the deceptiveness of the fitness landscape. Isolated multi-modal fitness landscapes such as the Shekel function (shown previously) are quite deceptive. That having been said the exact definition of deceptiveness will depend on the optimization algorithm being used and the heuristics or meta-heuristics it uses. What is deceptive to the Stochastic Gradient Descent Algorithm may not be deceptive to a Genetic Algorithm or a Particle Swarm Optimizer and vice versa.
7. Global Landscape Structure (Funnels)
A funnel is a global basin shape consisting of clustered local optima. Fitness landscapes can have singular or multiple funnels. Single and multiple funnel landscapes can both exhibit multiple modality. Examples of single funnel fitness landscape are the Rosenbrock function (unimodal), and Rastrigin function (multimodal with one funnel structure). An example of a multiple funnel, multimodal fitness landscape is the Schwefel function.
Funnels may represent a problem for algorithms which rely solely on local information because they may become trapped in sub-optimal funnels. A useful technique for estimating the presence of funnels in a fitness landscape is the dispersion metric. The dispersion metric requires that a distance metric such as Euclidean, Manhattan, or Minkowski distance exist between different solutions. Assuming this is true the metric works as follows:
"Given two sample points below the fitness threshold: if a decrease in the threshold results in an increase in the dispersion of the points from the sample that are below the threshold, then this would indicate the presence of multiple funnels in the landscape. Dispersion is simply the average pairwise distance between the solutions in the sample. The dispersion metric is equal to the dispersion of a subset of the fittest points from a sample minus the dispersion of a sample of other solutions." - K. Malan. For more information on this metric see this paper.
8. Ruggedness and Smoothness
Ruggedness refers to the number and distribution of local optima. If the fitness of relatively close solutions differ considerably this is probably due to a very large number and high density of local optima meaning that the fitness landscape is considered to be quite rugged. Generally speaking all optimization algorithms struggle with rugged fitness landscapes such as Griewank and Rastrigin functions (shown previously). One model for thinking about ruggedness is the NK Model. A challenging but smooth fitness landscape is the Himmelblau function.
A random walk (also called a stochastic process) is a path that consists of a succession of random steps of a pre-specific step size. We can estimate the ruggedness of a fitness landscape by constructing multiple random walks with different step sizes along the fitness landscape and sampling the fitness at each successive point. If the fitness landscape was smooth the fitness values would change relatively slowly whereas if the fitness landscape was rugged it would change relatively quickly. This behaviour can be measured using an autocorrelation function. For more information see Correlated and Uncorrelated Fitness Landscapes and how to tell the difference.
One challenge with this approach to determining ruggedness is that it depends a lot on the ability of the random walk to cover the search space. In other words, the random walk should ideally be representative of the fitness landscape. Simple unbiased random walks such as Brownian Motion do not satisfy this requirement so it it often favourable to use a progressive random walk instead. For more information see A progressive random walk algorithm for sampling continuous fitness landscapes. A comparison in 2D taken from this paper is shown below,
Alternative approaches to characterizing ruggedness are the first and second entropic measures of random walks through the fitness landscape and the amplitude spectra metric which is based on Fourier analysis.
Neutrality exists when successive neighboring solutions in a fitness landscape has equal or near equal fitness values. Areas of neutrality often appear as plateaus or ridges in the fitness landscape. Neutrality is a challenge for optimization algorithms because moving through a neutral portfolio of the fitness landscape can be misinterpreted as a local optima. Additionally, the gradient on flat surfaces is zero meaning that gradient-based algorithms could get struck in a sub-optimal area of the fitness landscape. Three benchmark functions which exhibit areas of neutrality are the Egg holder, Schaffer, and Easom benchmark functions.
Two techniques for identifying neutrality in fitness landscapes are neutral walks and neutral network analysis. A neutral walk starts in one location in the search space and generates all neural neighbours of that location for which the distance from the stating point increases. This process is repeated until there are no neutral neighbours which rules in the total distance from the starting location increasing. The degree of neutrality is measured by the number of steps in this random walk (for more information see this paper). Neutral network analysis only works for discrete fitness landscapes; it works by generating the set of all neutral networks. A neutral network is sub-set of solutions which have equal fitness values as measured by either the average neutrality ratio, average fitness gain, or non-improvable solution metrics (for more information see this paper).
Symmetry in a fitness landscape is when there are many points with the same fitness value. this leads to large sets of equivalence classes. Two types of symmetry are when the fitness landscape is symmetrical with respect to a dimension (axial bias) and when the fitness landscape is symmetrical with respect to any point a fixed distance from an optima. Some studies have shown that symmetry can cause a deterioration in performance of genetic algorithms possibly because the combination of two equally fit solutions results in inferior children solutions.
Last, but not least, searchability is simply the ability of an optimization algorithm to move towards an area of better fitness. The searchability depends largely on the optimization algorithm being used. In other words some fitness landscapes may be searchable by some optimization algorithms but not by others. This phenomenon has given rise to the famous 'No-Free-Lunch' theorems for search and optimization by Wolpert and Macready. These theorems argue that no one optimization algorithm is at all times superior to another. For example, whilst gradient descent may work well for training convex, continuous fitness landscapes (such as those found in Neural Networks) it would perform poorly on any fitness landscape which contained ridges, neutral areas, or multiple optima.
Applications to Computational Finance
The boring definition of computational finance that you will find in the dictionary is, "a branch of applied computer science which deals with problems of practical interest to finance". I personally dislike this definition because it doesn't tell you what computational finance is about so I prefer to tell people that computational finance involves the design, development, testing, and implementation of software which realizes quantitative financial models.
Despite the fact that fitness landscape analysis has been around for decades it is hardly ever applied in practice. Personally, I think that this may just be because there aren't any nice open source packages which make the techniques accessible to the general public ... which is really a shame considering that these techniques would be useful to related fields including, but not limited to, operations research, machine learning, and data science.
Robust Portfolio Optimization
My Masters research aims to determine the constraints and conditions under which portfolio optimization becomes sufficiently complex enough to warrant the use of computational intelligence optimization algorithms such as the Particle Swarm Optimization algorithm and Genetic Algorithms over tried-and-tested mathematical optimization techniques such as Levenberg-Marquardt and Sequential Least Squares Programming (SLSQP).
This research involves using fitness landscape analysis to investigate the impact of factors including, but not limited to, real world constraints such as liquidity, credit, and VaR, multiple objectives, fixed and floating rate transaction costs, simulation methods, additional risk factors such as foreign exchange, and the choice of objective function when using risk-adjusted return, on the difficulty of the robust portfolio optimization problem. This information should then allow me to comment on the suitability of more traditional mathematical optimization algorithms.
Stochastic Process Calibration
Another computational finance optimization problem which I recently encountered and believe to be rather challenging from a fitness landscape perspective is the calibration of complex stochastic processes such as the Merton Jump Diffusion Process. This was the Masters research topic of my friend and fellow quant, Wilson Mongwe. He was nice enough to share the maximum likelihood objective function for this optimization which he derived during his research ... I won't pretend to understand all of it but I can say confidently that it is complex,
This model has five parameters which are non-separable namely, sigma (volatility of returns), sigma jumps (volatility of jumps), mu (average daily return), mu jumps (average jump size), and lambda (instantaneous jump probability). The problem of model calibration is to find the optimal values for these parameters such that the returns generated by the model resemble the returns seen in the real world. In a way this is a kind of financial Turing test; if we can distinguish between the returns generated by the model and the market, the model fails.
To find the optimal parameters we would use an optimization algorithm or rather a whole bunch of optimization algorithms hoping one works. But we have no way of knowing what this characteristics this function exhibits in high dimensional space. To illustrate this it most probably demonstrates many of the challenging characteristics discussed in this article I have plotted the function four times with three of the parameters fixed each time,
Now whilst these fitness landscapes may not look as complex as some of the benchmark functions keep in mind that each image only represents three of the total five dimensions and that they are only summed from 1 through to 5 rather than infinity as specified in the model. To understand what the characteristics of this fitness landscape is we would need to apply the fitness landscape analysis techniques discussed in this article in all 5 dimensions.
The real benefit of fitness landscape analysis is that it could allow researchers to avoid the time-consuming process of searching for optimization algorithms and their parameter configurations which work via trial and error.
Fitness landscape analysis defines the characteristics of fitness landscapes which make then more difficult to solve and provides researchers with useful metrics which may indicate whether or not an optimization problem exhibits those characteristics. This information could be used by researchers in many fields including operations research, computational finance, machine learning, and data science to preempt which algorithms and which parameters for those algorithms would perform well on the optimization problem. I imagine that such relationships between the characteristics of the fitness landscape and the performance of any particular algorithm given a set of parameter values could be learnt by other machine learning models such as artificial neural networks.
The real benefit of fitness landscape analysis is that it allows researchers to side-step the time-consuming trial-and-error approach to optimization problems and focus instead on the optimization problem itself. That having been said, despite the fact that fitness landscape analysis has existed for many decades it is not widely used in practice. This state of affairs may be attributed to the lack of open source packages for fitness landscape analysis.
Epic charting Stuart! What tool did you make use of?
Interesting post...unfortunately I'm a bit missing a practical implentation to portfolio optimization. have you any reference?
May I use a picture or two for a book that I'm writing: particular the first picture in the section, 5. Modality and the landscape structure of optima?
I really enjoyed reading your article. It is well-written and easy to understand for a beginner like me.
If you could provide codes for fitness landscape analysis measures, that would really help me in my work. Thank you so much.