Clustering using Ant Colony Optimization
For many years entomologists have studied the behaviour of ant colonies and marveled at their ability to solve complex problems collectively. An example of this collective intelligence observed by entomologists is that ants leaving their colony will often follow very efficient routes between their colony and surrounding food sources. Another example is that ants form mounds of dead ants which distract predators from their colony.
Entomologists study ants to understand how they are able to perform such intelligent tasks. Computer scientists then use this information to derive algorithms which mimic the ants in order to replicate their intelligence. Artificial intelligence algorithms which mimic intelligent behaviour observed in nature are called nature-inspired algorithms. Ant colony optimization algorithms represent an interesting subset of nature-inspired algorithms.
Two popular ant algorithms are inspired by the two examples mentioned previously. The first algorithm is used to solve the classical shortest path problem and the second algorithm is used to perform dynamic clustering on large data sets. Both of these problems' complexity grows exponentially as the number of dimensions increases. Such problems are, in general, computationally hard and cannot be solved efficiently either analytically or by brute force.
Single Class Clustering
When ants form mounds of dead ants, also known as cemeteries, around their colony they are performing a simple form of clustering. Clustering is the problem of forming groups of similar items; in this example there is just one type, or class, of item, namely the dead ants. Each living ant helps the colony to form clusters of these dead ants by autonomously following the very simple two-step algorithm described below,
Step One ~ While walking around, if you encounter a dead ant and it is surrounded by many other dead ants, do nothing; otherwise, if it is not surrounded by many other dead ants, pick it up.
Step Two ~ While walking around, if you are carrying a dead ant, then for each step you take check whether you are surrounded by many other dead ants; if so, drop the dead ant here, otherwise do nothing.
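The two steps above can be sketched in a few lines of Python. Everything here (the grid size, ant count, step budget, and the "many neighbours" threshold of three) is an illustrative assumption rather than part of the original behaviour:

```python
import random

W = H = 20                        # grid dimensions (illustrative)
N_DEAD, N_ANTS, STEPS = 60, 10, 5000

random.seed(42)

# Scatter dead ants on the grid; a key in `grid` marks a cell holding a corpse.
grid = {}
while len(grid) < N_DEAD:
    grid[(random.randrange(W), random.randrange(H))] = True

def neighbours(x, y):
    """Count dead ants in the 8 cells around (x, y), wrapping at the edges."""
    return sum((dx, dy) != (0, 0) and ((x + dx) % W, (y + dy) % H) in grid
               for dx in (-1, 0, 1) for dy in (-1, 0, 1))

# Each living ant is [x, y, carrying]; carrying is True while holding a corpse.
ants = [[random.randrange(W), random.randrange(H), None] for _ in range(N_ANTS)]

for _ in range(STEPS):
    for ant in ants:
        ant[0] = (ant[0] + random.choice((-1, 0, 1))) % W
        ant[1] = (ant[1] + random.choice((-1, 0, 1))) % H
        cell = (ant[0], ant[1])
        crowded = neighbours(*cell) >= 3          # "many" neighbours (assumed)
        if ant[2] is None and cell in grid and not crowded:
            del grid[cell]                        # step one: pick up a loner
            ant[2] = True
        elif ant[2] and cell not in grid and crowded:
            grid[cell] = True                     # step two: drop it on a mound
            ant[2] = None

print(len(grid), "corpses on the grid")           # the rest are being carried
```

Note that each ant only ever inspects its own cell and immediate surroundings, which is what makes the behaviour fully decentralised.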
If you have many ants, essentially they will be walking around and either picking up ants, dropping ants, or doing nothing. Intuitively it makes sense that the ants would slowly start to form cemeteries of dead ants to lure away potential predators. This is called an emergent property. Emergence is when order (also known as a pattern) arises from the uncoordinated interactions of simpler entities that themselves do not exhibit such order.
Here is a video which demonstrates this emergent property,
The best thing about this algorithm is that the ants act autonomously, meaning that no central control of the ants is required. This implies that the algorithm, if implemented using concurrent programming, is highly scalable and is therefore quite relevant to large data sets. Notice that in the video there are two classes of items being clustered, red items and blue items. The simple algorithm mentioned previously can easily be extended to multiple classes.
Multiple Class Clustering
Clustering is only really useful when a given data set contains two or more classes of input patterns and you want to partition it into subsets. In order to do this you need a way of distinguishing patterns from one another. One approach is to use distance metrics, which measure the distance between any two patterns.
For continuous valued patterns Manhattan, Euclidean, and Minkowski distances work well. Another approach for measuring the distance between continuous valued patterns is to measure their correlation, cosine similarity is an example of this. For discrete valued patterns Hamming Distance and the Jaccard Coefficient are popular choices.
Minkowski distance is a generalized distance metric which can be equivalent to Manhattan or Euclidean distance. Given two patterns (vectors) $\mathbf{x}$ and $\mathbf{y}$, the Minkowski distance of order $p$ between $\mathbf{x}$ and $\mathbf{y}$ is given by,

$d(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$
$d(\mathbf{x}, \mathbf{y})$ is equivalent to the Manhattan distance between $\mathbf{x}$ and $\mathbf{y}$ when $p = 1$; the Euclidean distance between $\mathbf{x}$ and $\mathbf{y}$ when $p = 2$; and the Chebyshev distance between $\mathbf{x}$ and $\mathbf{y}$ in the limiting case where $p \to \infty$. For some more information on distance metrics please see the related post, Monte Carlo K-Means Clustering of Countries.
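As a quick sanity check, the Minkowski distance fits in a couple of lines of Python (a sketch; the function name is my own):

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length patterns."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x, y = (0.0, 0.0), (3.0, 4.0)
print(minkowski(x, y, 1))                     # Manhattan: 7.0
print(minkowski(x, y, 2))                     # Euclidean: 5.0
print(max(abs(a - b) for a, b in zip(x, y)))  # Chebyshev (p -> infinity): 4.0
```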
This concept is incorporated into the algorithm by specifying that the ants should pick up an item if it is surrounded by dissimilar items and drop an item if it is surrounded by similar items. The challenge now becomes determining how large the neighbourhood of each ant should be, and specifying the pick-up and drop rules probabilistically so that the algorithm is stochastic, which allows it to break down sub-optimal clusters.
Ant Colony Optimization
The ant colony optimization algorithm is defined by the pick up and drop off rules followed by the ants. Given a point in space these rules look at the surrounding points and determine the average similarity of the surrounding patterns either to the pattern at that point or to the pattern being carried by the ant. The number of surrounding points which are looked at is called the neighbourhood function. A neighbourhood function could be as complex as a copula or as simple as a radius which specifies the number of points in every direction to consider.
In the image above we see how the simple radius-based neighbourhood function works. Starting at the location of the ant, every point on the grid within a radius of $r$ blocks is looked at. For each block which contains an item, the dissimilarity between that item and either the item at the ant's location on the grid, or the item the ant is carrying, is calculated. These dissimilarities are summed and divided by the total number of points looked at which, for a square neighbourhood of radius $r$, is given by $(2r + 1)^2$. This can be done using a simple nested for-loop as shown below. Note that this could be done more quickly in parallel.
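A minimal sketch of that nested loop. The names `density`, `grid`, and `dist` are my own; `grid` is assumed to map (x, y) coordinates to patterns, and `dist` is assumed to already return values in $[0, 1]$, so the running normalization discussed next is omitted here:

```python
def density(grid, x, y, pattern, r, dist, w, h):
    """Average dissimilarity of `pattern` to the items within an r-block
    radius of (x, y) on a w-by-h grid that wraps around at its edges."""
    total = 0.0
    for dx in range(-r, r + 1):               # the nested for-loop
        for dy in range(-r, r + 1):
            if (dx, dy) == (0, 0):
                continue                      # skip the ant's own cell
            item = grid.get(((x + dx) % w, (y + dy) % h))
            if item is not None:
                total += dist(pattern, item)
    return total / ((2 * r + 1) ** 2)         # all (2r + 1)^2 points looked at

# Toy usage: 1-D patterns with an absolute-difference metric capped at 1.
dist = lambda a, b: min(abs(a - b), 1.0)
grid = {(1, 0): 0.9, (0, 1): 0.1}
print(round(density(grid, 0, 0, 0.1, 1, dist, 5, 5), 4))  # 0.8 / 9, ~0.0889
```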
This essentially calculates the average dissimilarity of the item at that point on the grid (or the item being carried) to the surrounding items. If this is high, then the item should be picked up (not dropped); if this is low, then the item should be dropped (not picked up). Let the output from this function be denoted by $d$, for density. Observe that this term is being normalized in the code by the maximum average distance seen to date. This is done to ensure that when we calculate the probabilities in the drop and pick-up rules $d$ is in the range $[0, 1]$.
The probability of dropping an item, $P_{drop}$, is,

$P_{drop} = (1 - d)^c$

The probability of picking up an item, $P_{pickup}$, is,

$P_{pickup} = d^c$
where $c$ is a parameter for controlling convergence. The larger $c$ is, the faster the ants will converge on a solution, because the less likely they are to drop or pick up an item for any given density. This is illustrated in the graphs below.
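The two rules are trivial to express in code. This sketch assumes the pick-up and drop probabilities take the forms $d^c$ and $(1 - d)^c$ respectively, which reproduce the behaviour described here: raising $c$ lowers both probabilities for any density strictly between 0 and 1:

```python
def p_pickup(d, c):
    """Probability of picking up an item given normalised dissimilarity d."""
    return d ** c

def p_drop(d, c):
    """Probability of dropping a carried item given dissimilarity d."""
    return (1.0 - d) ** c

# Raising c suppresses both probabilities at intermediate densities, so the
# clusters churn less and the colony converges faster.
for c in (1, 2, 5):
    print(c, p_pickup(0.5, c), p_drop(0.5, c))
```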
Now that we know how the ants will pick up and drop items, and how we measure the similarity between those items, the only remaining component is defining how the ants move around. This process is quite simple and is defined by two parameters, the number of ants, $a$, and the step size of each ant, $s$. Initially the ants are distributed uniformly at random across the grid. From that point on their next location (xy coordinate) is their current location plus a two-dimensional vector of random integers between $-s$ and $s$. The resulting coordinate is then taken modulo the height, $h$, and width, $w$, of the grid to generate a new location that is guaranteed to be on the grid.
Taking the modulo of the location essentially means that there are no edges on the 'grid' because it wraps around. This reduces the maximum possible distance between any two points on the grid to $\sqrt{(h/2)^2 + (w/2)^2}$. The resulting shape (a grid without edges on the top, bottom, or either side) is basically a torus, i.e. a three-dimensional doughnut. The ants move around this shape randomly, picking up and dropping items according to the pick-up and drop probability rules.
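The movement rule is a one-liner per axis; a sketch with assumed names, using the modulo wrap described above:

```python
import random

def move(x, y, s, w, h):
    """Step at most s cells in each direction, wrapping onto the torus."""
    return (x + random.randint(-s, s)) % w, (y + random.randint(-s, s)) % h

random.seed(0)
pos = (0, 0)
for _ in range(100):
    pos = move(*pos, s=3, w=25, h=25)
print(pos)  # always a valid on-grid coordinate
```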
Putting It Together
The final form of the algorithm ends up being something quite simple. Given a set of unordered input patterns, $D$, the algorithm consists of the steps presented below,
Initialize a grid of size $h \times w$
Randomly scatter the patterns in $D$ across the grid
Initialize $a$ ants in random locations
While stopping condition is not true
----For each ant:
--------Move to new location
--------If the location contains a pattern:
------------If the ant is not already carrying a pattern:
----------------Assign $d$ = density of the pattern at this location
----------------Assign $P_{pickup} = d^c$
----------------If $P_{pickup} \geq U(0, 1)$:
--------------------Pick up the pattern
--------Else if location contains no pattern:
------------If the ant is carrying a pattern:
----------------Assign $d$ = density of the carried pattern at this location
----------------Assign $P_{drop} = (1 - d)^c$
----------------If $P_{drop} \geq U(0, 1)$:
--------------------Drop the pattern
Identify the clusters on the grid using a cluster-finding algorithm
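The whole loop can be combined into a compact end-to-end sketch. Everything here is illustrative rather than the original implementation: two classes of one-dimensional patterns, a capped absolute-difference metric, $d^c$ and $(1 - d)^c$ probability rules, and small parameter values chosen for speed; a real implementation would use multi-dimensional patterns and a proper distance metric:

```python
import random

random.seed(7)
W = H = 15                       # grid dimensions
R, C, STEP = 1, 2, 1             # neighbourhood radius, convergence, step size
N_ANTS, ITERATIONS = 8, 10000

# Two classes of 1-D patterns, clustered around 0.0 and 1.0 respectively.
patterns = [random.gauss(0.0, 0.05) for _ in range(15)] + \
           [random.gauss(1.0, 0.05) for _ in range(15)]

grid = {}
for p in patterns:               # scatter each pattern onto a distinct cell
    cell = (random.randrange(W), random.randrange(H))
    while cell in grid:
        cell = (random.randrange(W), random.randrange(H))
    grid[cell] = p

def density(x, y, pattern):
    """Average dissimilarity of `pattern` to the items within radius R."""
    total = 0.0
    for dx in range(-R, R + 1):
        for dy in range(-R, R + 1):
            item = grid.get(((x + dx) % W, (y + dy) % H))
            if item is not None and (dx, dy) != (0, 0):
                total += min(abs(pattern - item), 1.0)
    return total / ((2 * R + 1) ** 2)

ants = [[random.randrange(W), random.randrange(H), None] for _ in range(N_ANTS)]

for _ in range(ITERATIONS):
    for ant in ants:
        ant[0] = (ant[0] + random.randint(-STEP, STEP)) % W
        ant[1] = (ant[1] + random.randint(-STEP, STEP)) % H
        cell = (ant[0], ant[1])
        if ant[2] is None and cell in grid:
            if density(*cell, grid[cell]) ** C >= random.random():
                ant[2] = grid.pop(cell)            # pick up a misplaced item
        elif ant[2] is not None and cell not in grid:
            if (1.0 - density(*cell, ant[2])) ** C >= random.random():
                grid[cell] = ant[2]                # drop it among similar items
                ant[2] = None

carried = sum(1 for a in ants if a[2] is not None)
print(len(grid), "patterns on the grid;", carried, "still being carried")
```

The final cluster-finding step is deliberately left out; any connected-component search over occupied neighbouring cells would do.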
One major benefit of the algorithm is that it can be run indefinitely, and new patterns can be added to $D$ or the patterns in $D$ can be updated over time, and the algorithm will adapt to the changing conditions quite well. This is why I refer to this as a dynamic clustering algorithm. That having been said, the algorithm does suffer from a number of shortcomings. The biggest shortcoming is the trade-off between run time and large clusters, as controlled by the neighbourhood radius $r$. The larger $r$ is, the larger the clusters will be and the less likely the ants are to produce multiple clusters of the same class; however, the larger $r$ is, the more time-consuming the calculation of $d$, and therefore of $P_{pickup}$ and $P_{drop}$, becomes. This could be largely offset with a good parallel implementation of the algorithm.
Additionally, this algorithm works only with memory-less ants, i.e. ants which move around randomly. The algorithm could be improved through the implementation of memory, which would bias the direction of each ant towards a spot on the grid where it remembers having seen a similar pattern in the past.
Python Implementation and Applications
One potential application of dynamic clustering in finance is automated portfolio construction. One challenge when constructing portfolios is picking investments which are dissimilar from one another so as to diversify the portfolio. One measure of this is correlation, but correlation is subject to a number of fatal flaws.
An alternative approach taken by a computational investor might involve clustering the universe of available securities according to their exposures to factors such as value, firm size, momentum, geographic location, etc. The wisdom here is that securities exposed to a similar array of factors should behave similarly and therefore exhibit stable, high correlations over time. The investor would then need to select the best investments from each cluster.
Another benefit to using Ant Colony Optimization to do this is simply that the investor would not need to specify the number of clusters up-front as they would with other clustering techniques such as K-Means Clustering.
This implementation was used to produce the clustering video shown previously. This was done by generating images of the grid using Matplotlib's matshow feature and then merging these images into a video.