Image Image Image Image Image Image Image Image Image Image

Turing Finance | July 16, 2024

Scroll to top



Lossless Compression Algorithms and Market Efficiency?

Lossless Compression Algorithms and Market Efficiency?

In Hacking The Random Walk Hypothesis we applied the NIST suite of cryptographic tests for randomness to binarized daily market returns. Overall the NIST suite failed on the data. This result was taken to mean that markets are not quite the "coin flipping competition" famously posited by some notable academics. That having been said, some of the tests in the suite did pass indicating some degree of noise in market returns.

One such test, the linear complexity test, caught my eye because it applies compression to test the randomness of a sequence. This idea - that randomness is related to compressibility - dates back to Claude Shannon's information theory. Which, through algorithmic information theory, is related to Alonzo Church and Alan Turing's computability theory. Being a computer scientist these topics fascinate me; almost more-so than the markets do.

In this article we will expand upon this line of thought by applying greedy lossless compression algorithms to binarized and de-trended market index returns and, spoiler alert, we will arrive at some unexpected conclusions. Following this article we shall continue our non-random walk down wall street by considering lossy compression algorithms (including auto-encoder neural networks) and then Lo's test for long term memory in stock markets


As mentioned in my last post, Stock Markets Do Not Follow Random Walks, the featured images for this series of articles are simply screenshots from some of my favourite Wall Street inspired films. This featured image is taken from the classic 1987 Wall Street movie and is, of course, portraying the infamous Gordon Gekko. 

Article Outline

The article is broken up into the following sub sections,

  1. The Efficient Market and Random Walk hypotheses.
  2. Algorithmic Information Theory and Randomness.
  3. Compression and Market Randomness; Challenges.
  4. Proposed Compression Test of Market Randomness.
  5. Overview of Lossless Compression Algorithms in R.
  6. Results obtained and their meaning(lessness)?
  7. Conclusions and further research efforts.
  8. Appendix A: Link to all of the source code.

The Efficient Market and Random Walk hypotheses

The strong form of the efficient market hypothesis states that markets accurately and instantaneously discount all publicly and privately-available information about a security into its price. As such, the best forecast of any securities price tomorrow is just its price today because its price today is necessarily correct.

Furthermore, this hypothesis implies that no careful study of past prices - or fundamental or economic information - will result in a consistently more accurate forecast of the securities price in the future. In an informationally efficient market the change in price from today to tomorrow depends upon unknowable information from tomorrow; which is why the change is random from today's perspective and a rational reaction from tomorrow's perspective. Only time-travellers and high frequency information arbitrageurs can hope to beat this market.

Despite my inclination towards more "complex" market hypotheses such as evolutionary and behavioural finance, I cannot argue that the efficient market hypothesis is without merit. It has merit for three reasons. Firstly, markets are damn difficult to forecast. Secondly, the average active manager does not beat the market consistently. And thirdly, the efficient market hypothesis has enabled exciting financial innovations such as derivatives securities. However, having merit does not imply that the hypothesis is true; it most probably isn't true.

The strongest criticism of the efficient market hypothesis originates from behavioural finance which challenges the underlying assumption that investors are rational, or rather, that they are rational enough to accurately price securities. Given all the available information about a security it's unlikely that market agents would agree on a price. Agents do not have homogeneous expectations in reality; as such, there is no intrinsic value in reality.

My favourite criticism of the efficient market hypothesis comes from evolutionary finance. According to evolutionary finance markets are complex systems consisting of heterogeneous agents which interact with one another and are subject to evolutionary forces. Through these interactions and as a result of evolutionary forces statistical properties emerge. Emergent properties of markets include randomness, momentum, value, and reversion but none of these properties are persistent or truer than the others. Their magnitudes change as the population of market agents adapt, evolve, and succumb. Such properties are, therefore, not anomalies (as they are called under the efficient markets hypothesis), they are expected emergent properties of markets.

In response to such criticism two variations of this hypothesis were proposed. The semi-strong form of the efficient market hypothesis relaxes the definition by stating that only publicly available information is discounted into a securities price and the weak-form of the efficient market hypothesis states that, at the very least, all historical price and volume data is discounted into a securities price. The key takeaway is that regardless of the definition you choose, in the absence side-information, the evolution of asset prices follows a Markov process; a random walk.

This theory, dubbed the random walk hypothesis, lies at the heart of almost all modern day asset pricing models which essentially assume that prices evolve according to a Markov process or random walk. Whether or not this is true has obvious and profound implications for you, me, and (almost) everybody else on this planet. Which might explain why academics and practitioners have spent decades studying the relative efficiency and / or randomness of markets. Our last article introduced one such test: Lo and MacKinlay's heteroskedastic variance ratio test. In this article we will consider, amongst others, the popular Lempel-Ziv compression test.

References for this discussion

If you enjoy the philosophy of markets as much as I do, here are a few of my favourite publications:

Random Walks

Non-random Walks

Algorithmic Information Theory and Randomness

The study of randomness is not an activity unique to quantitative analysis and economics, it is central to many fields including cryptography, quantum mechanics, and genetics. Therefore, it is quite unsurprising that we have borrowed and re-purposed ideas from other fields to argue in favour of, or against, the efficiency or randomness of markets. Nowhere is this more true than the field of algorithmic information theory, an amalgamation of computability theory and information theory (a.k.a a computer scientists dream). As Gregory Chaitin once stated,

"It [algorithmic information theory] is the result of putting Shannon's information theory and Turing's computability theory into a cocktail shaker and shaking vigorously."

As mentioned previously one of the ideas borrowed from algorithmic information theory by economists is the idea that random sequences are incompressible. Let's take some time to explain the thought process behind this ...

In the 1930's Alan Turing introduced an abstract model for computation called the automatic machine or, as it has come to be known today, the Turing Machine. I'm not going to explain the mechanism behind the Turing Machine (you can find that at Cambridge's website or Wolfram Mathworld) because it is not relevant to our discussion of randomness. All you need to know is this: "despite the model's simplicity, given any computer algorithm, a Turing machine can be constructed that is capable of simulating that algorithm's logic" - Wikipedia.


A model of a Turing Machine. Image source:


What is an algorithm? In it's most simple form an algorithm is a deterministic sequence of steps for transforming a set of inputs into a set of outputs. Consider the the Fibonacci sequence as an example. The sequence of numbers starts with 0, 1 and proceeds as follows: 1, 2, 3, 5, 8, 13, .... The algorithm for reproducing this sequence can be stated very simply: the next term in the sequence is just the sum of the previous two terms. Additionally it is extremely easy for a Turing Machine to emulate this logic; the resulting program - found here - is quite simple.

What is complexity? According to another great mathematician, Andrey Kolmorogov, the complexity of a sequence can be measured by the length of the shortest computer program that produces the sequence as output. If that program is shorter than the original sequence then we have compressed that sequence; we have encoded it.

What happens when we give our Turing Machine a random sequence, S? If the data is random that means that there is no logic that was used to produce it. If there is no logic there is nothing for the Turing Machine to simulate. Therefore, the only program that will produce the random sequence as an output, is the random sequence: print S

The program, print S, is longer than the random sequence, S. Therefore a random sequence necessarily has a Kolmorogov complexity greater than the original the sequence itself. This can be interpreted as saying: a random sequence is incompressible ... but here's the [halting] problem: how do we know for a fact that the program we have found which outputs the sequence is the shortest one possible? We would have had to have to tried every single possible program, of which there are an infinity, to make such a conclusion. It's an impossible task.

Put simply the Kolmorogov complexity of a sequence is incomputable and, as such, randomness is incomputable. We cannot prove that a sequence is random but, that doesn't mean that we can't test the probability that a sequence might be random. This is where all of our statistical tests for randomness come into play ... some of which use compression algorithms to test whether or not a statistically significant amount of compression is achievable.

Some of those who we have to thank for algorithmic information theory:
1955, Moston, Manchester, England, UK --- Doctor Prinz, of the Manchester firm Ferranti, sets one of its computers a chess problem. The computer, which is used at Manchester University, takes 15 minutes to respond to the problem but achieves checkmate in two moves. --- Image by © Hulton-Deutsch Collection/CORBIS

Claude Shannon


Andrey Kolmorogov


Alonzo Church


Alan Turing


Compression and Market Randomness; Challenges

There have been quite a few academics who have attempted to use compression algorithms to test the efficiency of markets. At face value this may seem like a well formulated idea, but the idea faces some unique challenges. 

Challenge 1 - Infinite vs. Finite Sequences

The first challenge (which isn't really a problem) is that our information theory definition of randomness applies as sequences approach infinite length. Any sufficiently finite sequence is likely to be slightly compressible on the basis of probability alone. As such when using compression to test relatively short sequences of market returns we are not interested in whether the algorithm was able to compress the returns, we are interested in whether the degree of compression is statistically significantly more than what is expected on a random sequence.

Challenge 2 - Market Drift

The second problem is that there are different types of randomness. The first type, often called the Martingale property or true, algorithmic, or Kolmorogov randomness, is the one most often associated with information theory and computer science (especially cryptography). The second type of randomness, called predictability or the Markov property, is most often associated with finance, stochastic processes, and the random walk hypothesis.

What this means is that we need to be extremely careful - admittedly, more careful than I have been in the past - when selecting our statistical tests for market randomness as well as when choosing our inputs. This is because a statistical test designed to test for true randomness will have a very different confidence interval than a statistical test designed to look for breakdowns in the Markov property. A simple explanation of this is given below.

Suppose we have two models. The first is a Brownian Motion model and the second is a Geometric Brownian Motion model which a non-zero drift. In this example the first model satisfies both the Martingale and Markov properties but the second model only satisfies the Markov property. As such if we applied tests designed to test for true randomness the output from the first model it should pass but output from the second model should fail.

This is a major problem for individuals wishing to refute the random walk hypothesis using compression because the hypothesis only really concerns itself with the excess predictability of the market above drift i.e. ones ability to "beat the market". Which is why academics such as Andrew Lo and Jonathan MacKinlay spent decades designing more powerful tests of market randomness such as the Heteroskedastic Variance Ratio Test.


Asset Prices Simulated using the Brownian Motion Stochastic Process - Many

Model 1 (Brownian Motion) - on average the paths go nowhere; zero drift. When binarized this model becomes an unbiased coin flipping sequence.

Asset Prices Simulated using the Geometric Brownian Motion Stochastic Process - Many

Model 2 (Geometric Brownian Motion) - on average the paths go up; non-zero drift. When binarized this model becomes an biased coin flipping sequence.

Challenge 3 - Stochastic Volatility & Jumps

The third problem is that the non-stationarity of volatility can cause certain returns to appear more or less frequently than they theoretically should. This problem is commonly called "fat tails" and manifests itself because of either stochastic volatility or discontinuities (jumps) in market returns. However, if we assume that the noise present in market returns is centred around zero, then we can show that regardless of jumps and stochastic volatility by binarizing the market returns we remove this problem all-together - see the next section.

So what do these challenges mean for researchers wishing to refute the random walk hypothesis using compression-based tests of randomness? It just means that we need to think about our methodology carefully.

Proposed Compression Test for Market Randomness

Academics have applied lossless compression algorithms to market returns in order to test or rank their efficiency for many years now. At the end of this section you will find links to relevant papers which take this approach.

Most of the papers concluded that developed markets exhibit high levels of efficiency whereas some developing markets exhibit excess compressibility and, therefore, some degree of inefficiency. Such inefficiencies can only be explained by specific return patterns having higher-than-expected probabilities through time.

Our testing methodology

The following testing methodology is followed for three well-known compression algorithms: {gzip, bzip2, and xz}, on a set of developed and developing market indices, and foreign exchange rates.

  1. Download the market index prices from
  2. Compute the logarithmic returns of the index and denote them as .
  3. Compute the mean logarithmic return (drift component), .
  4. De-trend by subtracting (drift) from .
  5. Binarize  by replacing >0 returns with 1's and <0 returns with 0's.
  6. Split  into overlapping discrete windows, , of 7 years each (+-450 data points).
  7. Convert each window, , into hexadecimal (4-day sub-sequences), .
  8. Compress each , and compute the compression ratio,
  9. Return the average compression ratio over all windows, .
  10. Compute the expected compression ratio using pseudo-random data, .
  11. If : market exhibits no excess compressibility which implies efficiency.
  12. If : market exhibits some excess compressibility which implies inefficiency.

The test proposed above is similar to those used in the literature - we use binarized returns and multiple lossless compression algorithms - but also dissimilar in some important ways - we de-trend market returns to remove the influence of drift and we compute the average compressibility using a walk-forward methodology.

Our test in R

Very little R code was required to construct this test 🙂 due to the memCompress and memDecompress functions which exist in R. Similar Python functions, encode and decode, exist for readers who favour Python.

Some relevant academic papers:
  1. Shmilovici, Armin, Yael Alon-Brimer, and Shmuel Hauser. "Using a stochastic complexity measure to check the efficient market hypothesis." Computational Economics 22.2-3 (2003): 273-284.
  2. Giglio, Ricardo, Raul Matsushita, and Sergio Da Silva. "The relative efficiency of stockmarkets." Economics Bulletin 7.6 (2008): 1-12.
  3. Risso, Wiston Adrián. "The informational efficiency: the emerging markets versus the developed markets." Applied Economics Letters 16.5 (2009): 485-487.
  4. Shmilovici, Armin, et al. "Measuring the efficiency of the intraday forex market with a universal data compression algorithm." Computational Economics 33.2 (2009): 131-154.
  5. Giglio, Ricardo, et al. "Algorithmic complexity theory and the relative efficiency of financial markets." EPL (Europhysics Letters) 84.4 (2008): 48005.


Overview of Lossless Compression Algorithms in R

A detailed discussion of each of the lossless compression algorithms available to R's memCompress and memDecompress functions is beyond the scope of this article. Below you will find a very brief summary of the three compression schemes used as well as references to their underlying compression algorithms.

gzip compression

gzip is based on the DEFLATE algorithm, which combines LZ1 and Huffman coding. The LZ1 algorithm, named after Abraham Lempel and Jacob Ziv, works by replacing repeated patterns in the sequence with references to a copy of that pattern from earlier in the uncompressed sequence. Each match is encoded using a length-distance pair which tells the decoder that: the next y characters is equal to the characters x distance characters behind it. The Huffman Coding algorithm, named after David Huffman, works by constructing an optimal prefix tree given the occurrences of patterns in the sequence. For a great visualization of this checkout this YouTube video.

bzip2 compression

bzip2 uses three mechanisms in its compression: the Burrows-Wheeler Transform, the Move-to-Front Transform, and then Huffman Coding. The Burrows-Wheeler Transformation is a reversible method for rearranging the sequence into runs of similar characters so that it can be more easily compressed by Run-Length-Encoding schemes or Move-to-Front encoding. Move-to-front transformations work as follows: letters in the sequence are replaced with their indices in a stack containing the most recently observed letters. As such, long sequences of identical letters are replaced by smaller integers whereas infrequent letters are replaced with larger integers. Lastly the transformed data is fed through the Huffman Coding algorithm as with the DEFLATE algorithm.

xz compression

xz implements the (very cool sounding) Lempel–Ziv–Markov chain algorithm (LZMA) which was developed in the 7z compression format. LZMA is a dictionary compression algorithm derived from the LZ1 algorithm. It's output is encoded using a model which makes a probability prediction of each bit, called an adaptive binary range coder

All of these algorithms are available natively in R via the memCompress and memDecompress functions. The biggest challenge in using these functions (at least for our purposes) was that the input needs to be given as a raw hexadecimal sequence. Converting daily, binarized returns into hexadecimal sequences is done by the function bin2rawhex. Essentially what this does is break our daily returns sequence into discrete, non-overlapping, four-day, sub-sequences which can then be converted directly into hexadecimal. It's not ideal, but it seems to work.

Results obtained and their meaning(lessness)?

For this article we ran our compression test on 51 global stock market indices from developed and developing markets and 12 liquid currency pairs. These sequences were tested on both a weekly and daily frequency.

The exact breakdown of tested assets is given below:

  • Dow Jones Industrial (USA): YAHOO/INDEX_DJI
  • S&P 500 Index (USA): YAHOO/INDEX_GSPC
  • S&P 400 Mid-cap Index (USA): YAHOO/INDEX_MID
  • S&P 600 Small-cap Index (USA): YAHOO/INDEX_SML
  • NASDAQ Composite (USA): YAHOO/NDAQ
  • Russel 1000 Index (USA): YAHOO/INDEX_RUI
  • Russel 2000 Index (USA): YAHOO/INDEX_RUT
  • Russel 3000 Index (USA): YAHOO/INDEX_RUA
  • NYSE AMEX Composite Index (USA): YAHOO/INDEX_XAX
  • Merval Index (Argentina): YAHOO/INDEX_MERV
  • Bovespa Index (Brazil): YAHOO/INDEX_BVSP
  • Inmex Index (Mexico): YAHOO/INDEX_MXX
  • NYSE ARCA Mexico Index (Mexico): YAHOO/INDEX_MXY
  • S&P TSX Index (Canada): YAHOO/INDEX_GSPTSE
  • ISEQ General Total Return Index (Ireland): YAHOO/INDEX_IEQR_IR
  • Athens Composite Index (Greece): YAHOO/INDEX_GD_AT
  • CAC40 Index (France): YAHOO/INDEX_FCHI
  • DAX Index (Germany): YAHOO/INDEX_GDAXI
  • OMX 20 Copenhagen (Denmark): YAHOO/INDEX_OMXC20_CO
  • SATRIX Top 40 (South Africa): GOOG/JSE_STX40
  • SATRIX Industrials (South Africa): GOOG/JSE_STXIND
  • SATRIX Financials (South Africa): GOOG/JSE_STXFIN
  • SATRIX Resources (South Africa): GOOG/JSE_STXRES
  • EGX 70 Price Index (Egypt) : YAHOO/INDEX_CCSI
  • Russia RTS Index (Russia): YAHOO/INDEX_RTS_RS
  • Swiss Market Index (Switzerland): YAHOO/INDEX_SSMI
  • Tel Aviv Top 100 Index (Israel): YAHOO/INDEX_TA100
  • Eurostoxx 50 (Europe): YAHOO/INDEX_STOXX50E
  • AEX Index (Netherlands): YAHOO/INDEX_AEX
  • BEL20 Index (Belgium): YAHOO/INDEX_BFX
  • ATX Index (Austria): YAHOO/INDEX_ATX
  • OMX30 Index (Sweden): YAHOO/INDEX_OMX
  • Nikkei 225 (Japan): YAHOO/INDEX_N225
  • Japan AMEX Index (Japan): YAHOO/INDEX_JPN
  • Shanghai Composite Index (China): YAHOO/INDEX_SSEC
  • CSI300 Index (China): YAHOO/SS_000300
  • Hang Send Index (Hong Kong): YAHOO/INDEX_HSI
  • KOSPI Composite Index (South Korea): YAHOO/INDEX_KS11
  • BSE 30 Sensitivity Index (SENSEX) (India): YAHOO/INDEX_BSESN
  • Nifty Fifty Index (India): YAHOO/INDEX_NSEI
  • Colombo All Share Index (Sri Lanka): YAHOO/INDEX_CSE
  • New Zealand Stock Exchange (New Zealand): YAHOO/INDEX_NZ50
  • Taiwan Weighted Index (Taiwan): YAHOO/INDEX_TWII
  • All Ordinaries Index (Australia): YAHOO/INDEX_AORD
  • S&P/ASX 200 Index (Australia): YAHOO/INDEX_AXJO
  • Jakarta Composite Index (Indonesia): YAHOO/INDEX_JKSE
  • Strait Times Index (Singapore): YAHOO/INDEX_STI
  • US Dollar to British Pound: CURRFX/USDGBP
  • US Dollar to Euro: CURRFX/USDEUR
  • British Pound to Euro: CURRFX/GBPEUR
  • US Dollar to South African Rand: CURRFX/USDZAR
  • US Dollar to Brazilian Real: CURRFX/USDBRL
  • US Dollar to Russian Ruble: CURRFX/USDRUB
  • Euro to South African Rand: CURRFX/EURZAR
  • Euro to Brazilian Real: CURRFX/EURBRL
  • Euro to Russian Ruble: CURRFX/EURRUB
  • British Pound to South African Rand: CURRFX/GBPZAR
  • British Pound to Brazilian Real: CURRFX/GBPBRL
  • British Pound to Russian Ruble: CURRFX/GBPRUB

All of these assets were compressed using gzip, bzip2, and xz lossless compression algorithms within the walk-forward averaging methodology explained earlier and, for once, the results are really easy to communicate ...

With the exception of the US Dollar to Russian Ruble foreign exchange rate, none of the assets demonstrated excess compressibility. In fact, none of the assets demonstrated any compressibility whatsoever.

Our results with gzip, bzip2, and xz - sophisticated compression algorithms - confirm our negative result obtained in Hacking The Random Walk Hypothesis with the linear complexity test. But they contradict our results obtained with NIST tests and our results obtained with Lo and MacKinlay's Heteroskedastic Variance Ratio Test. Why?

Hypothesis 1: Markets are efficient.

The first interpretation of this result is that markets, across the world, are relatively efficient; they appear quite random. This is the conclusion most of the academics in the papers above arrived at.


Hypothesis 2: Compression isn't appropriate.

The second interpretation of this result is that there is something fundamentally "wrong" with using lossless compression algorithms to test the relative efficiency or randomness of financial markets. 

Since I was unsatisfied with the results I decided to dig a bit deeper and develop a test for the compression test ... and what I found was really, really interesting ;-).

Testing the compression test

People who refute the random walk hypothesis, like me, are not arguing that markets are 100% deterministic; we are arguing that they are not 100% stochastic. In other words, we are arguing that markets have two faces: the signal and the noise. The signal manifests itself in the many hundreds of so-called anomalies: value, momentum, reversion, indexation, etc. and the noise manifests itself as apparent randomness and efficiency. They are two sides of the same coin; both emergent properties of a complex adaptive system of heterogeneous, imperfect agents. We then go one step further and claim that the amount of noise produced by the system does not drown out the signal and that it is, therefore, possible to beat the market by identifying such signals and exploiting them. 

With that in mind I set out to design a simple test of our compression test:

Consider the following toy market:  where

As we increase the size of the standard deviation of the noise component, , we see that the compressibility of the returns begins to degrade. When the standard deviation passes 0.01 the compression test almost-always fails:

toy market compression

Example outputs from our toy market for , and  look as follows:

According to our compression test all of these toy markets are random or efficient because they could not be compressed using the lossless compression algorithms gzip, bzip2, or xz. I must admit they look quite random too! But remember, they consist of both a signal and a noise so they are definitely not truly random.

Now, for added good measure, consider the following mean-reversion betting strategy / algorithm:

  • Inputs: window size, .
  • For time, , in time:
    • Compute the cumulative return of the past days excluding day , .
    • If < 0.0: go long 100% the asset (bet the asset goes up)
    • Else: go short 100% the asset (bet the asset goes down)

Below you will find two charts for each of the noise levels. The first chart will show thirty equity curves generated by our simple mean reversion betting strategy and the second chart will show the average toy market return (from above) against the average mean reversion betting strategy return on these markets. 

Now obviously this is a silly / extreme example but it highlights a contradiction:

If a market (even our toy market) is incompressible it "follows" that it is efficient but if it were truly efficient then there could not exist any betting strategy which consistently beats it, such as the above one. 

To resolve this we must accept that because we cannot prove randomness, the sequence is actually not random and our compression test has failed. Why? I think that the test is failing because it is more sensitive to noise than our trading strategy - my simulations indicate that in the case of our toy market the compression test in up to 10x more sensitive to noise than our trading strategy - this may be due to using greedy or lossless compression algorithms. 

The code for running this test of the compression test is shown below.

Conclusions and further research efforts

In this article we introduced the concept of using compression as a test for randomness and explained how such tests have been applied to the market to conclude that they are relatively efficient. We then developed a "meta test" for market randomness tests based on greedy lossless compression algorithms which indicates that such tests are highly sensitive to any noise. This may explain why they are contradicted by other randomness tests.

From this article we might want to conclude the following:

  1. After de-trending markets exhibit enough noise to make them incompressible using greedy, lossless compression algorithms such as gzip, bzip2, xz, and a linear feedback shift register. 
  2. If the sensitivity of the test to noise is greater than our trading strategies robustness in the presence of noise, the negative result does not imply that markets are efficient in the sense that they cannot be beaten.
  3. Our toy market shows that the sensitivity of the greedy lossless compression algorithms used to test market efficiency can be as much as 10x more sensitive to noise than a simple betting strategy.
  4. As such, some academic literature in support of the efficient market hypothesis presented in this article may inadvertently be applying a too-strict interpretation of non-randomness to market returns.
  5. Statistical tests for market randomness vary in terms of their size, power, and sensitivity. Multiple tests should be used in favour of single tests for more statistically significant and robust results.

As I mentioned in the introduction, further research efforts stemming from this article with include a study of lossy compression algorithms most notably, a stacked auto-encoder neural network. We will follow much the same methodology as we did here: look for excess compressibility beyond what would be expected from some finite pseudo-random sequence and then test how sensitive the lossy compression tests are to noise.

If you have any criticisms, counter-arguments, compliments, or suggestions for our upcoming article on lossy compression algorithms I would love to hear them in the comment section below :-). Thanks for reading!

Appendix A: All the Source Code

All of the source code required to run this analysis is available in R on GitHub here


  1. First off, I tip my hat to the brilliant awesomeness of the work on this blog. Seriously amazing.

    I'm no where near as educated in mathematics as you are, but I have an idea. If I understood correctly, your measure originates essentially from an arbitrary zero baseline. The measure is random from this point, but it is likely not random when measured from other points such as its daily velocity or moving averages or such. There may also be a timing component. Another possibility is that the data is not random but correlated to an external variable.

    A larger, stratified sample might also result in more insight. I think you are onto something here. Many years ago I worked on some simulation logic for market analysis and discovered that what most people think is correlation is simply different ways of measuring the exact same thing from different baselines, either temporal or some exchange rate for labor, resources, etc. The trick is to find the independent correlations. Sadly, the person leading my project just didn't get that and built a marvelously complex program that told him what he already knew and he went broke quickly.

    Caveat: I don't play in the market so I may be unknowingly stating the obvious.

    • Thank you Harvey, I always appreciate your comments and support.

      You are absolutely correct in pointing out that the conclusion regarding whether or not market returns are random depends a great deal on how you choose to transform and measure the data. In this article we derived a market which is incompressible i.e. efficient but at the same time beatable i.e. inefficient. Such contradictions are why we should consider as many different perspectives as possible. The NIST suite of cryptographic tests offers us many features, although admittedly there were flaws in my original methodology - such as not properly accounting for drift and stochastic volatility - but the journey is far from over. At the end of this year I will write a blog post which summarizes these non-random excursions and proposed some useful considerations for practising quantitative analysts.

      You are also correct to point out that randomness is a function of available information; in the presence of more or less information certain variables may become more or less predictable and therefore, by definition, more or less random. The real-world challenge is to find those omitted / causal variables (the independent correlations as you put it) which explain our target most accurately ... with respect to market efficiency, this distinction is covered up by the different forms of the efficient market hypothesis: strong, semi-strong, and weak. The weak form only considers the price variable.

      None of this is obvious. Once again, thanks for the comment, it made me think :-).

  2. Since lossless compression hasn't proven useful, what about lossy compression? Compress market data using something akin to jpeg or mpeg compression, and compare it to the compression of true noise (or random walk) signals.

    • Hi Matt, I enjoy your blog :-).

      I would disagree that lossless compression has not proven useful. I would argue that it is very useful to know that quite a fair chunk of the published academic literature in support of market efficiency does not actually imply market efficiency at all ;-). That conclusion may just help us financial professionals sleep a bit better knowing that are not (all) charlatans.

      Regarding your suggestion, I did mention that I will be following up with an article which uses lossy compression algorithms. I've written most of the code which uses an auto-encoder neural network to perform the compression. I looked at jpeg and mpeg compression and I don't think they will work; most of the packages was the input data in matrix format which doesn't work for markets?

      Let me know if you have any other suggestions. Keep up the good work!

      • Denis

        Hi Stuart,
        thanks very much for your posts, they are awesome!

        Your question about matrix format for markets wasn't for me still if you don't mind I would like to tell the idea how this question potentially can be solved.

        When you have a sequence of variable readings, say close price/returns/binarized returns etc. you may easily convert it into matrix (and it seems to me you already had it in your post when you referred to windows).
        say you have close prices:
        1.2; 1.23; 1.24; 1.21; 1.19; 1.18; 1.20; 1.17; 1.16; 1.13; 1.15

        by defining a window of size say 3 you may slice you sequence into matrix.
        we will have:
        1.2 1.23 1.24
        1.23 1.24 1.21
        1.24 1.21 1.19
        1.21 1.19 1.18
        1.19 1.18 1.20
        1.18 1.20 1.17
        1.20 1.17 1.16
        1.17 1.16 1.13
        1.16 1.13 1.15

        the size of the window to slice the sequence (in above example it is 3) you may consider as amount of memory you take into account.

        above method is used for example by Caterpillar-SSA method to decompose time series into components and check the noise, trend and oscillatory ones so to then reconstruct the time series from the required ones and to make forecast.

        further if above matrix containing vectors as rows may be considered as vector inputs you may define another window of arbitrary size to slice the output and have for example linear matrix equation of type Ax = y, where each input vector of size N corresponds to output vector of size M.

        For the above sequence let's define input slicing window of size 3 and output slicing window of size 2 and we will have input matrix x and output matrix y:
        input matrix x output matrix y
        1.2 1.23 1.24 1.21 1.19
        1.23 1.24 1.21 1.19 1.18
        1.24 1.21 1.19 1.18 1.20
        1.21 1.19 1.18 1.20 1.17
        1.19 1.18 1.20 1.17 1.16
        1.18 1.20 1.17 1.16 1.13
        1.20 1.17 1.16 1.13 1.15

        if you train and find something which converts/transfers x into y then you may feed it with next input vectors and get forecast of output vectors:
        x input vectors:
        1.17 1.16 1.13
        1.16 1.13 1.15

        referring to mpeg the analogue chain potentially can be we have a movie -> sequence of frames on the market and potentially could try compress this sequence of frames represented by binarized returns

        • Hey Denis, thanks for the awesome comment and apologies for the slow response.

          It took me two reads but I really like this idea, it makes perfect sense, and I think this will make it possible for me to try some of the matrix-based lossy compression algorithms in the follow-up article. Thank you for taking the time to write it down so clearly and for introducing me to Singular Spectrum Analysis :-). I really appreciate the guidance.

  3. Shane

    Hmmm. I just had my morning coffee(s) so this will be a stream of consciousness...

    On the right track I think with a de-noising auto encoder. As your dealing with time series, it also be worth kicking the tires on a generative recurrent neural network (GRU or a mutated LSTM perhaps). which attempts to output the next sequence.

    Also, what about smoothing the data with a Kalman filter and then attempting the lossless compression? For that matter what about just trying to predict the returns with a particle filter?

    Great read. This provided me some novel thoughts to apply to my own general analytic work.

    Looking forward to the auto-encoder..

    • Hey Shane, thanks for comment :-).

      Trying a generative recurrent neural network is a very good suggestion (just need to find a decent package) as is smoothing the data before applying the compression. I'll try to incorporate both into the next blog post. The plan for the next post is to move beyond the binarization approach and attempt to compress the raw, de-trended returns and compare those results against simulated data from two calibrated stochastic models which have fat tails, namely the Heston Model and the Merton Model. Smoothing ... or testing different sampling frequencies ... would add another interesting dimension.

      • Shane

        Hi Stuart wrt packages; and with the risk of introducing my personal bias, Keras is my preference as the best balance of ease of use versus flexibility in terms of activations, objectives, etc.

        They have quite the stable of recurrent layers that are fairly plug and play and a reasonable zoo of examples.

        But with your abilities you may just want to implement a bare bones vanilla RNN without any fancy mutations to observe the interactions. It's always good to look under the hood and with Keras (or any NN library really) that's non-trivial. In any event just sharing some package experience.


  4. Shane

    Just a quick follow up on the last comment, I intended the "your abilities" statement as a complement, in the sense that you would be able to lay out the math and code behind even the vanilla RNN, which is also quite involved (and I myself have never done).

    Just wanted to make sure it wasn't misconstrued as I probably phrased it poorly (again coffee).


    • Hey Shane,

      LOL! Don't stress 🙂 I wouldn't have taken it in the wrong way anyway.

      Thanks for the suggestion. I'll definitely take a look at Keras this weekend. As much as I prefer to implement my models from scratch - this is the only way I can be one hundred and fifty percent sure I understand them inside out - I don't currently have the time :-(. I like that Keras is in Python, it's been a few months since I coded in Python; I am afraid I might be getting rusty ha ha.

  5. R

    Hi, have you seen this interview? I find it very enjoyable and you would probably too. :>

    • Hey R, thanks for the comment! I have seen it and I enjoyed it very much. I think I will watch it again, thanks for reminding me about it :-).

  6. This series of posts is amazing! Thank you very much.

    The only problem is that the promised follow-up post seems nowhere to be found:

    "As I mentioned in the introduction, further research efforts stemming from this article with include a study of lossy compression algorithms most notably, a stacked auto-encoder neural network."

    Is it upcoming? Or did I miss something?

Submit a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.