Hacking the Random Walk Hypothesis

Hackers would make great traders. At a meta level, hackers and traders do the same thing: they find and exploit the weaknesses of a system. The difference is that hackers hack computers, networks, and even people for various good and bad reasons, whereas traders hack financial markets to make profits. One type of exploit which has always fascinated me is the attack on random number generators. You see, random number generators are used every day to encrypt data and communications, but if those random number generators are flawed they stop being cryptographically secure and hackers can exploit the vulnerabilities to decrypt the encrypted data and communications. For this reason random number generators need to pass robust sets of statistical tests for randomness, such as the NIST suite of cryptographic tests for randomness, to determine whether they are sufficient for cryptographic use. In this post we are going to subject various financial market returns to the NIST suite of tests and see whether or not we should be able to, in theory, hack the market.

Article Outline

This article is broken up into three sections. The first section presents background information about the random walk hypothesis and compares the statistical definition of randomness to the algorithmic definition. The second section outlines my Python implementation of the NIST test suite, including a brief explanation and source code for each test. The third and final section subjects a number of financial markets to these tests and ends off by concluding whether or not the markets are random and, if they are, commenting on the nature of that randomness. Depending on your familiarity with the subject matter and time constraints, you may wish to skip some sections. All of the code included below can be found in my repository, r4nd0m, on GitHub.

  1. Background Information
    1. The Random Walk Hypothesis
    2. Statistical vs. Algorithmic Randomness
  2. NIST Python Implementation
    1. Project Structure
    2. Binary Encoding Approach
    3. Test 01 - Frequency (Monobit)
    4. Test 02 - Block Frequency Test
    5. Test 03 - Runs Test
    6. Test 04 - Longest Runs Test
    7. Test 05 - Binary Matrix Rank Test
    8. Test 06 - Discrete Fourier Transform Test
    9. Test 07 - Non-overlapping Patterns Test
    10. Test 08 - Overlapping Patterns Test
    11. Test 09 - Universal Test
    12. Test 10 - Linear Complexity Test
    13. Test 11 - Serial Test
    14. Test 12 - Approximate Entropy
    15. Test 13 - Cumulative Sums
    16. Test 14 - Random Excursions
    17. Test 15 - Random Excursions Variant
    18. Walsh–Hadamard Transform (QuantAtRisk.com)
  3. Hacking the Market
    1. Experiment Set-up and Markets Considered
    2. Problems with these experiments
    3. Summary of Results
    4. Conclusions and Further Research
  4. A Personal Update on my New Job and Home
  5. Software Disclaimer

On another note I am thrilled to report that this Python implementation passes all of the unit tests specified in the NIST C documentation and, as a bonus, includes tonnes of comments. Given this fact, I hope that the code will be useful to real security researchers as well as to quantitative analysts and traders. That said, my Python implementation carries the same disclaimer as NIST's, so please take a moment to view the README file.

Background Information

This section introduces the random walk hypothesis and its importance to quantitative finance. It also discusses the two definitions of randomness, namely the statistical and the algorithmic.


The Random Walk Hypothesis

Many systems in the real world demonstrate the properties of randomness including, for example, the spread of epidemics such as Ebola, the behaviour of cosmic radiation, the movement of particles suspended in liquid, luck at the roulette table, and supposedly even the movement of financial markets as per the random walk hypothesis ... but before we get into the details of the random walk hypothesis, let's discuss a famous test by Professor Burton G. Malkiel. The following extract is taken from the Wikipedia page on the random walk hypothesis. It very succinctly describes the test that Professor Malkiel performed and the conclusions he drew from it.

"Burton G. Malkiel, an economics professor at Princeton University and writer of A Random Walk Down Wall Street, performed a test where his students were given a hypothetical stock that was initially worth fifty dollars. The closing stock price for each day was determined by a coin flip. If the result was heads, the price would close a half point higher, but if the result was tails, it would close a half point lower. Thus, each time, the price had a fifty-fifty chance of closing higher or lower than the previous day. Cycles or trends were determined from the tests. Malkiel then took the results in a chart and graph form to a chartist, a person who “seeks to predict future movements by seeking to interpret past patterns on the assumption that ‘history tends to repeat itself’”. The chartist told Malkiel that they needed to immediately buy the stock. Since the coin flips were random, the fictitious stock had no overall trend. Malkiel argued that this indicates that the market and stocks could be just as random as flipping a coin." - Wikipedia

In effect this is similar to "financial Turing tests" in which people familiar with the markets are asked to look at the time-series below and identify which one(s) are real market returns data and which one(s) are simulated using some random process. Have a go with the following three images. Each image may be either a real market return series or just the output from a calibrated Heston stochastic volatility model. Good luck guessing!

It's quite difficult, isn't it? This observation led early quantitative researchers to investigate whether or not stock market returns evolve randomly. The theory that market returns evolve randomly is called the random walk hypothesis. The theory dates back to the work of Jules Regnault and Louis Bachelier, who observed the characteristics of randomness in the returns of stock options. The theory was later formalized by Maurice Kendall and popularized in 1965 by Eugene Fama in his seminal paper, Random Walks In Stock Market Prices.

Despite the entertainment value of these tests, they really don't prove that markets are random at all. All they do is show that, to the human eye and in the absence of any additional information, market returns are indistinguishable from random processes. This conclusion does not, in and of itself, provide any useful information about the random characteristics of markets. Furthermore, the random walk hypothesis itself suffers from some shortcomings:

  1. It lumps all markets into one homogeneous bucket and does not differentiate between them.
  2. It cannot explain many empirical examples of people who have consistently beaten the market.
  3. It is based on the statistical definition of randomness, not the algorithmic definition, meaning that:
    1. It does not distinguish between local and global randomness.
    2. It does not deal with the idea of the relativity of randomness.

The remainder of this article will explain how elements of the algorithmic definition of randomness, namely local versus global randomness and the relativity of randomness, could potentially explain the discrepancy between the implications of the random walk hypothesis and the empirically observed long-term out-performance of some investors. It will also compare different markets using the NIST suite of statistical tests for randomness.

Whether we love it or hate it, and regardless of whether it is right or wrong, we cannot deny that the widespread adoption of the random walk hypothesis by quantitative analysts in the industry has led to major advances in the field of quantitative finance, especially for derivative security and structured product valuations.

Back to the article outline


Algorithmic vs. Statistical Definitions of Randomness

Any function whose outputs are unpredictable is said to be stochastic (random). Similarly, any function whose outputs are predictable is said to be deterministic (non-random). To complicate matters, many deterministic functions can appear to be stochastic. For example, most of the random number generators we use when programming are actually deterministic functions whose outputs appear to be stochastic. Most random number generators are not truly random which is why they are labelled either pseudo or quasi random number generators.

In order to test the validity of the random walk hypothesis we need to determine whether or not the outputs generated by the market (our function) are stochastic or deterministic. In theory there is an algorithmic and a statistical approach to this problem, but in practice only the statistical approach is ever used (for good reasons).

Back to the article outline

The Algorithmic Approach

Computability theory, also known as recursion theory or Turing computability after the late Alan Turing, is a branch of theoretical computer science which deals with the concept of computable and non-computable functions. Computable functions can be reasoned about in terms of algorithms, in the sense that a function is computable if, and only if, there exists an algorithm which can replicate that function. In other words, there is an algorithm that, when given any valid input to the function, will always return the correct corresponding output.

If randomness is the property of unpredictability, then the output of a random function can never be accurately predicted. Logically this implies that all random processes are non-computable functions, because no algorithm which accurately replicates them could exist. The famous Church-Turing thesis states that a function is computable if and only if it is computable by a Turing machine (equivalent formalisms also exist).

Turing machines are hypothetical devices which work by manipulating an alphabet of symbols on an infinite strip of tape according to a table of rules. Turing machines are able to simulate any algorithm and are therefore, given an unbounded amount of time, theoretically able to compute any and all computable functions.

Turing Machine

So what? Well, because of the link between computability and randomness, in order to prove or disprove the random walk hypothesis all one would need to do is use a Turing machine to determine whether or not an algorithm which replicates the market (our function) exists; in other words, an algorithm which, when given any valid input, will always return the correct corresponding output ... Alas, nothing in this world is simple, because this approach to proving (or disproving) the random walk hypothesis runs headlong into the halting problem.

The halting problem involves determining whether or not a given program will halt or continue to run indefinitely. This problem has been proven to be undecidable, meaning that it is not, in general, possible to know up-front whether or not a program will halt. As such, the challenge with using a Turing machine to try and find an algorithm which replicates the stock market is that, if the market is actually random, no replicating algorithm exists.

This implies that the Turing machine would need to try every possible algorithm before halting which would literally take forever. As such it is, for all intents and purposes, impossible to prove that the market is truly random. 

Despite this fact, these observations have given rise to a very interesting field called algorithmic information theory, which concerns itself with the relationship between computability theory and information theory. Algorithmic information theory defines different types of randomness; the most popular definition is Martin-Löf randomness, which asserts that in order for a sequence to be truly random it must:

  • Be incompressible. Compression involves finding some loss-less representation of the information which uses fewer bits. For example, the infinitely long binary string, 01010101 ..., can be expressed more concisely as 01 repeated infinitely, whereas the infinitely long binary string, 0110000101110110101 ..., has no apparent pattern and therefore cannot be compressed to anything shorter than the exact same binary string, 0110000101110110101 .... This is equivalent to saying that if the Kolmogorov complexity of the string is greater than or equal to the length of the string, then the sequence is algorithmically random.
  • Pass statistical tests for randomness. There are many statistical tests for randomness which deal with testing the difference between the distribution of the sequence versus the expected distribution of any sequence which was assumed to be random. This is discussed in more detail in the statistical approach section below. 
  • And be impossible to make money off of. This is an interesting concept which simply argues that if a set of martingales can be constructed on a sequence which is always expected to succeed, then the sequence is not random. A martingale is a model of a fair game where knowledge of past events never helps predict the mean of the future winnings. One might be tempted to think that the existence of the momentum and mean-reversion anomalies in financial markets must therefore disprove the random walk hypothesis ... well, sort of. The problem with these anomalies is that they are not persistent and tend to break down or exhibit cyclical behaviour. As such, if anything they disprove the local random walk hypothesis (this is introduced below).

Another interesting differentiation made by both the statistical and algorithmic definitions of randomness is local versus global randomness. If a sequence appears random in the long run it is said to be globally random, despite the fact that finite blocks of the sequence may appear to be non-random at times.

For example, the sequence 01010100011010101110000000001... appears to be random, but if we break the sequence up into four blocks [010101][0001101010111][000000000][1...] the first and third blocks do not appear random. As such, the sequence is said to exhibit global randomness but only partial local randomness.

Relating this distinction back to the random walk hypothesis, we should be able to distinguish between the local and the global random walk hypothesis. The global random walk hypothesis would state that in the long run markets appear to be random, whereas the local random walk hypothesis would state that for some minimum period of time the market will appear to be random. This view of the world is, at least in my opinion, consistent with the empirical observations of anomalies such as the value, momentum, and mean-reversion factors, especially when we acknowledge that these factors tend to exhibit cyclical behaviour. In other words, as with the sequence shown earlier, markets exhibit global randomness but during finite periods of time local randomness breaks down.


Markets exhibit global randomness but local randomness tends to break down in some dimensions of time; these breakdowns manifest themselves as anomalies e.g. the value, momentum, and mean-reversion factors.


Unfortunately I have not seen this distinction made anywhere; it is my own opinion on how the empirical observations of individuals beating the market could be reconciled with the random walk hypothesis. Another distinction made by the algorithmic definition of randomness is that randomness is relative to information.

In the absence of information many systems may appear random despite the fact that they are deterministic; in statistics the missing information plays the role of a confounding variable. A good example of this is a random number generator, which only appears random in the absence of the seed used to produce its sequence. Another, more interesting example is that market returns might appear to be random on their own but, in the presence of earnings reports and other fundamental indicators, that apparent randomness could break down.


In the absence or presence of information markets may appear less or more random i.e. the global and local randomness of the markets is relative to the quality and quantity of available information about the markets. 


These two theories are impossible to prove but they are what I personally believe about the markets (in addition to my belief that they are not random, but rather appear random just like many other complex adaptive systems). The remainder of this article leaves the world of theoretical computer science, information theory, and economics behind and focuses instead on what can realistically be achieved, namely the application of statistical tests for randomness to markets in order to identify potential trading opportunities / attractive markets.

Back to the article outline.

The Statistical Approach

A sequence is said to be statistically random when it does not exhibit any observable patterns. This does not imply true randomness (i.e. unpredictability) because most pseudo-random number generators, whose outputs are completely predictable given the seed, are still said to be statistically random. Generally speaking, a sequence is labelled statistically random if it is able to pass a battery of tests for randomness such as the NIST suite. Most of these tests involve testing whether the distribution of outputs produced by the supposedly random system is close enough to the distribution expected from a truly random sequence. Most of these tests are specifically designed to test for uniform randomness; as such, they are almost always applied to binary sequences (bit strings).

Back to the article outline

NIST Python Implementation

This section provides implementation-level details on the Python implementation of the NIST suite of tests for randomness. All of the code can be found in my GitHub repository, r4nd0m. In the development process much help was derived from the C implementation of the test suite by NIST, as well as an earlier implementation of the suite in Python 2.66 by Ilja Gerhardt. Unfortunately I struggled to get Ilja's code to run because of all the changes made between Python 2.66 and Python 3.4, which is another reason why I decided to reimplement it.

Back to the article outline


Project Structure

The whole project is broken up into six classes and a main script which pulls everything together. Each class contains standalone code for performing one or more of the functions required to actually apply the NIST suite to historical market returns. The classes in the project include:

  1. r4nd0m - this is the main script which is used to pull everything together.
  2. RandomnessTester - this class contains all of the NIST tests. Each test has been implemented in such a way that it is static and can, therefore, be copied out of the class and used elsewhere. Note, however, that some of the tests depend on the scipy.special, scipy.fftpack, scipy.stats, numpy, os, and copy packages in Python. Also note that the binary matrix rank test depends on the BinaryMatrix class.
  3. BinaryMatrix - this class encapsulates the algorithm specified in the NIST documentation for calculating the rank of a binary matrix. This is not the same as the SVD method used to compute the rank of a matrix which is why the scipy.linalg package couldn't be used. This class could be more pythonic (pull requests welcome).
  4. BinaryFrame - this class, as the name suggests, is just a way of converting a pandas DataFrame to a dictionary of binary strings with the same column names. This dictionary and the decimal to binary conversion methods are encapsulated in this class. RandomnessTester simply takes in a BinaryFrame object and applies all of the NIST tests to each one of the binary strings contained in the dictionary.
  5. QuandlInterface and Argument - these two classes work together to allow you to interface with the Quandl.com API and download and join lists of datasets. Interesting dataset lists can be found in the MetaData folder of the project and your personal auth token can be stored in a .private.csv local file on your computer. You can find out more about this in the README.MD file for the GitHub repository online.
  6. Colours - this class just makes things look cool in the console.

The UML diagram below shows how the project is structured (as of 13/09/2015).

r4nd0m architecture

Back to the article outline


Binary Encoding Approach

Discretization involves converting continuous variables into discrete variables. The most popular discretization approach is binning. Binning involves classifying the output from continuous variables into a mutually exclusive set of "bins", each of which represents a possible interval within which the random variable may fall.

Since we don't know the range of market returns (our continuous variable), we simply bin them into positive returns and negative returns, which are represented as a 1 bit and a 0 bit respectively. To avoid bias we add a special case for returns equal to exactly 0.0%; these returns are represented as both a 0 and a 1, i.e. the bit string 01:

where b(i) represents the bit in the bit string at index i and r(t) represents the return generated by the security being discretized at time t. If the security goes up as often as it goes down, then the number of zeros and ones in the resulting bit string should be uniformly distributed. This is discussed in more detail in the next section.
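To make the encoding concrete, here is a minimal sketch of this discretization, assuming daily returns arrive as a pandas Series; the function name and the toy example are illustrative, and the production version of this logic lives in the BinaryFrame class.

```python
import numpy as np
import pandas as pd

def discretize(returns: pd.Series) -> str:
    """Encode returns as bits: 1 for an up day, 0 for a down day,
    and the unbiased pair '01' for a day with a return of exactly 0.0%."""
    bits = []
    for r in returns.dropna():
        if r > 0.0:
            bits.append("1")
        elif r < 0.0:
            bits.append("0")
        else:
            bits.append("01")  # a flat day contributes one 0 and one 1
    return "".join(bits)

# Example: daily log returns computed from a toy price series
prices = pd.Series([100.0, 101.0, 100.5, 100.5, 102.0])
log_returns = np.log(prices / prices.shift(1))
print(discretize(log_returns))  # -> '10011'
```

Note that a flat day adds two bits, so the resulting bit string can be slightly longer than the number of trading days.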

Two additional conversion methods were implemented and tested which involved converting basis points and floating point returns into binary respectively. The problem with these is that they introduce biases which result in even strong cryptographic random number generators failing numerous tests in the suite.

Back to the article outline


Test 01 - Frequency (Monobit)

This test looks at the proportion of 0 bits to 1 bits in a given binary sequence. Assuming the sequence is uniformly distributed, this proportion should be close to 1, meaning that the number of 1 bits is approximately equal to the number of 0 bits. In this context, because a 0 bit represents a down day and a 1 bit represents an up day, this test checks whether down and up days on the market are uniformly distributed over time, i.e. equally probable.

A failure of the monobit test may indicate something about the relative profitability of simple buy and hold strategies because, if the market is more likely to go up than down, then simply buying and holding the market is likely to be a profitable trading strategy. This assumes, however, that the average size of an up-day in percentage terms is greater than or equal to the average size of a down-day in percentage terms.

The Gist below contains a standalone Python method which realizes the monobit test for randomness. It has been tested to 1,000,000 bits on the binary expansions of well-known mathematical constants.
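As a quick reference, here is a minimal sketch of the monobit statistic written directly from the formula in the NIST SP 800-22 documentation; the function name and toy input are illustrative, not the r4nd0m code.

```python
from math import erfc, sqrt

def monobit(bits: str) -> float:
    """Frequency (monobit) test: returns the P-value for the hypothesis
    that 0s and 1s are equally likely in the bit string."""
    n = len(bits)
    # Convert bits to +/-1 and sum them; a large |s_n| suggests a bias.
    s_n = sum(1 if b == "1" else -1 for b in bits)
    s_obs = abs(s_n) / sqrt(n)
    return erfc(s_obs / sqrt(2))

print(monobit("1011010101"))  # ~0.53 -> no evidence of bias in this toy string
```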

For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:

[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.

[2] Chung, Kai Lai, and Farid AitSahlia. Elementary probability theory: with stochastic processes and an introduction to mathematical finance. Springer Science & Business Media, 2012.

[3] Jim Pitman, Probability. New York: Springer-Verlag, 1993 (especially pp. 93-108). 

Back to the article outline


Test 02 - Block Frequency Test

This test is a generalization of the monobit test which looks at the proportion of 0 bits to 1 bits in N blocks of size M extracted from a given binary sequence. Assuming uniform randomness, each block should contain approximately M/2 ones and M/2 zeros, i.e. each block is filled with, on average, as many 1 bits as 0 bits. In our context this test looks at the probability of up and down days occurring over finite "windows" or periods of time.

The Gist below contains a standalone Python method which realizes the block frequency test for randomness. It has been tested to 1,000,000 bits on the binary expansions of well-known mathematical constants.
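Similarly, here is a minimal sketch of the block frequency statistic based on the NIST SP 800-22 formula; the block size M = 128 is just a common choice and, like the function name, is illustrative.

```python
from scipy.special import gammaincc

def block_frequency(bits: str, m: int = 128) -> float:
    """Block frequency test: checks whether the proportion of 1s within
    each M-bit block is close to 1/2, as expected under uniform randomness."""
    n = len(bits)
    num_blocks = n // m
    chi_squared = 0.0
    for i in range(num_blocks):
        block = bits[i * m:(i + 1) * m]
        pi = block.count("1") / m          # proportion of ones in the block
        chi_squared += 4.0 * m * (pi - 0.5) ** 2
    # P-value from the regularized upper incomplete gamma function (igamc in the NIST docs)
    return gammaincc(num_blocks / 2.0, chi_squared / 2.0)
```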

For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:

[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.

[2] Maclaren, Nick. "Cryptographic Pseudo-random Numbers in Simulation." Fast Software Encryption. Springer Berlin Heidelberg, 1994.

[3] Knuth, Donald E. "Seminumerical Algorithms, The art of computer programming, Vol. 2." (1981).

[4] Abramowitz, Milton, and Irene A. Stegun. Handbook of mathematical functions: with formulas, graphs, and mathematical tables. No. 55. Courier Corporation, 1964.

Back to the article outline


Test 03 - Runs Test

Arguably the most well-known test for randomness, the runs test focusses on the number of runs which appear in a binary sequence. A run is defined as an uninterrupted sequence of identical bits; a run of length k is an uninterrupted sequence of exactly k identical bits. The runs test checks whether the number of runs in the sequence is significantly different from what is expected under the assumption of uniform randomness. The number of runs also indicates how quickly the sequence oscillates between 0 and 1 bits.

In this context, a run is a sequence of consecutive trading days wherein the market was up or down. As such, performance on this test may imply something about either the momentum or mean reversion anomalies. Previous academic studies have shown that the market has more runs than expected and therefore the market is said to oscillate more quickly between "up runs" and "down runs" than would be expected from a random sequence. 

The Gist below contains a standalone Python method which realizes the runs test for randomness. It has been tested to 1,000,000 bits on the binary expansions of well-known mathematical constants.
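Again, a minimal sketch of the runs statistic based on the NIST SP 800-22 formula (illustrative names, not the r4nd0m code):

```python
from math import erfc, sqrt

def runs(bits: str) -> float:
    """Runs test: checks whether the sequence oscillates between 0s and 1s
    more or less often than a uniform random sequence would."""
    n = len(bits)
    pi = bits.count("1") / n
    # NIST prerequisite: the monobit frequency must be close enough to 1/2.
    if abs(pi - 0.5) >= 2.0 / sqrt(n):
        return 0.0
    # Total number of runs = number of bit changes + 1.
    v_obs = 1 + sum(1 for i in range(n - 1) if bits[i] != bits[i + 1])
    numerator = abs(v_obs - 2.0 * n * pi * (1.0 - pi))
    denominator = 2.0 * sqrt(2.0 * n) * pi * (1.0 - pi)
    return erfc(numerator / denominator)
```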

For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:

[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.

[2] Gibbons, Jean Dickinson, and Subhabrata Chakraborti. Nonparametric statistical inference. Springer Berlin Heidelberg, 2011.

[3] Godbole, Anant P., and Stavros G. Papastavridis. Runs and patterns in probability: Selected papers. Vol. 283. Springer Science & Business Media, 1994.

Back to the article outline


Test 04 - Longest Runs (in a block) Test

As mentioned in the runs test (test 03), a run is any uninterrupted sequence of identical bits. The previous test looks at the number of runs and checks whether this is statistically significantly different from the expected number of runs in a true random binary sequence. The longest runs (of ones in a block) test checks whether the longest run of 1 bits in M-bit blocks is statistically significantly longer (or shorter) than what would be expected from a true random binary sequence. Since a significant difference in the length of the longest run of 1's implies a significant difference in the length of the longest run of 0's, only one test is run.

In this context, this test measures whether or not the number of subsequent up-days in a certain window of time is consistent with what would be produced if market returns were random and generated by a coin flip. A failure of this test may indicate the presence of either momentum or strong mean reversion. Momentum is a property of financial markets which makes the probability of the market having an up day (or down day), given a previous run of up days (or down days) higher. Mean reversion is the opposite - it is a property of financial markets which makes the probability of the market having an up day (or down day), given a previous run of up days (or down days) lower.

The Gist below contains a standalone Python method which realizes the longest runs test for randomness. It has been tested to 1,000,000 bits on the binary expansions of well-known mathematical constants.

For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:

[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.

Back to the article outline


Test 05 - Binary Matrix Rank Test

The Binary Matrix Rank test is quite interesting because it takes the binary sequence and transforms it into a sequence of matrices. The test then works by testing the linear dependence between those matrices. A set of vectors is said to be linearly dependent if one or more of the vectors in the set can be defined as a linear combination of the other vectors. This is related to the concept of dimension, in that if a linearly dependent matrix were added to the set it would not result in any increased dimensionality because it would fall within the span of the previous matrices. For more information on this concept check out the Khan Academy videos.

Relating the concept of linear dependence back to the financial markets context is quite difficult, but let me give it a shot anyway. Each binary matrix represents a window of trading days, wherein each row of the matrix represents a shorter sub-window of days. If the first matrix, M1, were linearly dependent on the next matrix, M2, then we would be saying that the returns in the second window can be expressed as a function of the returns in the first window, meaning that whether the markets went up or down in the first window would determine whether or not they went up or down in the second. Again this can relate back to either mean-reversion or momentum, albeit in a more convoluted way.

The Gist below contains a standalone Python method which realizes the binary matrix rank test for randomness. It has been tested to 1,000,000 bits on the binary expansions of well-known mathematical constants.

The Gist below contains a standalone Python class for computing the binary rank of a matrix. It is very similar to the C implementation of the NIST suite and could probably be numpy-fied with some help from my readers.

For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:

[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.

[2] George Marsaglia, DIEHARD: a battery of tests of randomness. 

[3] Kovalenko, I. N. "Distribution of the linear rank of a random matrix." Theory of Probability & Its Applications 17.2 (1973): 342-346.

[4] Marsaglia, George, and Liang-Huei Tsay. "Matrices and the structure of random number sequences." Linear algebra and its applications 67 (1985): 147-156.

Back to the article outline


Test 06 - Discrete Fourier Transform Test

The Discrete Fourier Transform test, more widely known as the Spectral test, is one of the best known tests for random number generators. The test was developed by Donald Knuth and was used to identify some of the biggest shortcomings of the popular linear congruential family of pseudo random number generators. This test works by transforming the binary sequence from the time dimension to the frequency dimension using the discrete Fourier transform. Once this has been done the test looks for periodic features (patterns which are near to each other) whose presence would indicate a deviation from true randomness. The image below shows how this test can be used to expose the shortcomings of the aforementioned linear congruential generator family:

RANDU LCG Fail

The image above shows the hidden lattice structure which existed in the RANDU linear congruential generator. This random number generator was widely used in the 1960s - https://en.wikipedia.org/wiki/RANDU

The Gist below contains a standalone Python method which realizes the discrete Fourier transform test for randomness. It has been tested to 1,000,000 bits on the binary expansions of well-known mathematical constants.
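A minimal sketch of the spectral statistic, following the threshold and variance terms in the current revision of the NIST SP 800-22 documentation; the function name is illustrative.

```python
import numpy as np
from scipy.special import erfc

def spectral(bits: str) -> float:
    """Discrete Fourier transform (spectral) test: looks for periodic
    features in the +/-1 version of the bit string."""
    n = len(bits)
    x = np.array([1.0 if b == "1" else -1.0 for b in bits])
    s = np.fft.fft(x)
    modulus = np.abs(s[:n // 2])                 # first half of the spectrum
    threshold = np.sqrt(np.log(1.0 / 0.05) * n)  # 95% peak-height threshold
    n0 = 0.95 * n / 2.0                          # expected number of peaks below the threshold
    n1 = float(np.sum(modulus < threshold))      # observed number of peaks below the threshold
    d = (n1 - n0) / np.sqrt(n * 0.95 * 0.05 / 4.0)
    return float(erfc(abs(d) / np.sqrt(2.0)))
```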

For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:

[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.

[2] Knuth, D. "The Art of Computer Programming 1: Fundamental Algorithms 2: Seminumerical Algorithms 3: Sorting and Searching." (1968).

[3] Yang, Li-juan, Bai-hua Zhang, and Xu-zhen Ye. "Fast Fourier transform and its applications." Opto-Electronic Engineering 31 (2004): 1-7.

[4] Kim, Song-Ju, Ken Umeno, and Akio Hasegawa. "Corrections of the NIST statistical test suite for randomness." arXiv preprint nlin/0401040 (2004)

Back to the article outline


Test 07 - Non-overlapping Patterns Test

Unlike the previous two tests, the non overlapping and the overlapping patterns tests (tests 07 and 08), are highly applicable to quantitative finance and, in particular, the development of trading strategies. The non-overlapping patterns test concerns itself with the frequency of pre-specified patterns in the binary string. If this frequency is statistically significantly different from what is expected from a true random sequence, this would indicate that the sequence is non-random and the probability of the pattern we are testing for occurring is too high or too low.

For example, let's assume that we believed the market was more likely than a true random sequence to exhibit the following pattern 101100011111 ... namely the Fibonacci pattern. We would then use this test and the next test to either accept or reject our hypothesis within a given level of confidence. Fantastic! The reason why I like these two tests is that they give us quantitative analysts a solid theoretical way of testing the significance of presupposed patterns in market returns. These tests can be applied to concepts such as the Elliott Wave Theory and the Fractal Market Hypothesis. I'm not saying I believe in these things; what I'm saying is we can test for them.

Fractal Wave

Testing for "Fractal Waves" just became a whole lot easier.

The Gist below contains a standalone Python method which realizes the non-overlapping patterns test for randomness. It has been tested to 1,000,000 bits on the binary expansions of well-known mathematical constants.
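To make this concrete, here is a minimal sketch of the non-overlapping template statistic from the NIST documentation; the default template, block count, and function name are illustrative assumptions, and you could just as easily plug in a pattern like the one discussed above.

```python
from scipy.special import gammaincc

def non_overlapping_template(bits: str, template: str = "000000001",
                             num_blocks: int = 8) -> float:
    """Counts non-overlapping occurrences of `template` in each block and
    compares the counts with what a uniform random sequence would produce."""
    n, m = len(bits), len(template)
    block_size = n // num_blocks
    counts = []
    for j in range(num_blocks):
        block = bits[j * block_size:(j + 1) * block_size]
        count, i = 0, 0
        while i <= block_size - m:
            if block[i:i + m] == template:
                count += 1
                i += m          # non-overlapping: skip past the match
            else:
                i += 1
        counts.append(count)
    mu = (block_size - m + 1) / 2 ** m
    sigma_sq = block_size * (1.0 / 2 ** m - (2 * m - 1) / 2 ** (2 * m))
    chi_squared = sum((c - mu) ** 2 for c in counts) / sigma_sq
    return gammaincc(num_blocks / 2.0, chi_squared / 2.0)
```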

For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:

[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.

[2] Barbour, Andrew D., Lars Holst, and Svante Janson. Poisson approximation. Oxford: Clarendon Press, 1992.

Back to the article outline


Test 08 - Overlapping Patterns Test

The difference between the overlapping patterns test and the non-overlapping patterns test, apart from some of the calculations as can be seen in the NIST suite documentation, is the way that the patterns are searched for. In the non-overlapping patterns test, once a pattern of length m has been matched, you skip to the end of the match and continue searching from there, whereas with the overlapping patterns test you continue searching from the next bit. For more of my thoughts on this test, please see the previous test, the non-overlapping patterns test.

The Gist below contains a standalone Python method which realizes the overlapping patterns test for randomness. It has been tested to 1,000,000 bits on the binary expansions of well-known mathematical constants.

For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:

[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.

[2] Chrysaphinou, O., and S. Papastavridis. "A limit theorem on the number of overlapping appearances of a pattern in a sequence of independent trials."Probability Theory and Related Fields 79.1 (1988): 129-143.

[3] Hamano, Kenji, and Toshinobu Kaneko. "Correction of overlapping template matching test included in NIST randomness test suite." IEICE transactions on fundamentals of electronics, communications and computer sciences 90.9 (2007): 1788-1792.

Back to the article outline


Test 09 - Universal Test

Maurer's universal statistical test is an interesting statistical test which checks whether the distance (measured in bits) between repeating patterns is as would be expected from a uniform random sequence. The theoretical idea behind this test and the linear complexity test is that truly random sequences are not supposed to be compressible. To explain this concept, consider the fact that most pseudo-random number generators have a period and, when that period is exhausted, the bits in the sequence begin to repeat. When this happens, the distance between matching patterns decreases and eventually the universal test would deem the sequence non-random. Unfortunately, the test requires a very significant amount of data to be statistically significant. In fact, working with daily returns data we would need about 1500 years worth of daily data before we could apply this test!
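That figure is easy to sanity-check: the smallest recommended parameter choice for this test in the NIST documentation needs a sequence of roughly 387,840 bits, while the daily discretization above yields roughly one bit per trading day (about 252 per year).

```python
# Rough sanity check of the data requirement quoted above (figures approximate).
min_bits_required = 387_840   # approximate NIST minimum for the universal test (block length L = 6)
bits_per_year = 252           # ~252 trading days per year -> ~252 bits per year
print(min_bits_required / bits_per_year)  # ~1539 years of daily data
```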

The Gist below contains a standalone Python method which realizes the universal test for randomness. It has been tested to 1,000,000 bits on the binary expansions of well-known mathematical constants.

For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:

[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.

[2] Ziv, Jacob, and Abraham Lempel. "A universal algorithm for sequential data compression." IEEE Transactions on information theory 23.3 (1977): 337-343.

[3] Maurer, Ueli M. "A universal statistical test for random bit generators."Advances in Cryptology-CRYPT0’90. Springer Berlin Heidelberg, 1991. 409-420.

[4] Coron, Jean-Sébastien, and David Naccache. "An accurate evaluation of Maurer's universal test."

[5] Gustafson, Helen, et al. "A computer package for measuring the strength of encryption algorithms." Computers & Security 13.8 (1994): 687-697.

[6] Ziv, J. "Compression, tests for randomness and estimating the statistical model of an individual sequence." Sequences. Springer New York, 1990.

Back to the article outline


Test 10 - Linear Complexity Test

The linear complexity test is one of the most interesting tests in the NIST suite from a pure computer science point of view. Like the universal test, the linear complexity test is concerned with the compressibility of the binary sequence. The linear complexity test aims to check this by approximating a linear feedback shift register (LFSR) for the binary sequence. A shift register is a cascade of flip-flops which can be used to store state information. An LFSR is a shift register whose input bit is a linear function of its previous state.

The linear complexity test works as follows. Firstly, the shortest LFSR which generates the sequence is found using the Berlekamp–Massey algorithm, and then the length of that LFSR is compared against what would be expected from a true uniform random sequence. The thinking here is that if the LFSR is significantly shorter than expected, then the sequence is compressible and therefore non-random. The image below shows an LFSR at work.

LFSR-F4

The Gist below contains a standalone Python method which realizes the linear complexity test for randomness. It has been tested to 1,000,000 bits on the binary expansions of well-known mathematical constants.

For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:

[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.

[2] Gustafson, Helen, et al. "A computer package for measuring the strength of encryption algorithms." Computers & Security 13.8 (1994): 687-697.

Back to the article outline


Test 11 - Serial Test

The serial test is similar to the overlapping patterns test, except that instead of testing for one m-bit pattern it considers every possible m-bit pattern (of which there are 2^m) and computes their frequencies. The idea behind this test is that random sequences exhibit uniformity, meaning that each pattern should appear approximately as many times as every other pattern. If this is not the case, and some of the patterns appear significantly too few or too many times, then the sequence is deemed non-random. In this context we could use the test to identify those patterns which appear "too frequently" and bet on them occurring again.

The Gist below contains a standalone Python method which realizes the serial test for randomness. It has been tested to 1,000,000 bits on the binary expansions of well-known mathematical constants.

For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:

[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.

[2] Good, I. J. "The serial test for sampling numbers and other tests for randomness." Mathematical Proceedings of the Cambridge Philosophical Society. Vol. 49. No. 02. Cambridge University Press, 1953.

[3] Knuth, D. "The Art of Computer Programming 1: Fundamental Algorithms 2: Seminumerical Algorithms 3: Sorting and Searching." (1968).

Back to the article outline


Test 12 - Approximate Entropy

Generally speaking, approximate entropy is a statistical technique used to determine how unpredictable (random) the fluctuations in time-series data are. The approximate entropy test is similar to the serial test (test 11) in that it also focusses on the frequency of all possible overlapping m-bit patterns across the binary sequence. The difference is that this test compares the frequency of overlapping blocks of two consecutive or adjacent lengths (m and m+1) against the expected result for a random sequence. To be more specific, the frequencies of all possible m-bit and (m+1)-bit patterns are computed and the Chi-squared statistic is then used to determine whether or not the difference between these two observations implies that the sequence is non-random.

The Gist below contains a standalone Python method which realizes the approximate entropy test for randomness. It has been tested to 1,000,000 bits on the binary expansions of well-known mathematical constants.
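A minimal sketch of the approximate entropy statistic, following the NIST SP 800-22 formula; the default block length m = 2 and the function name are illustrative.

```python
from math import log
from scipy.special import gammaincc

def approximate_entropy(bits: str, m: int = 2) -> float:
    """Approximate entropy test: compares the frequencies of overlapping
    m-bit and (m+1)-bit patterns against those of a random sequence."""
    n = len(bits)

    def phi(block_len: int) -> float:
        # Wrap the sequence so every position starts an overlapping block.
        padded = bits + bits[:block_len - 1]
        counts = {}
        for i in range(n):
            pattern = padded[i:i + block_len]
            counts[pattern] = counts.get(pattern, 0) + 1
        return sum(c / n * log(c / n) for c in counts.values())

    ap_en = phi(m) - phi(m + 1)
    chi_squared = 2.0 * n * (log(2.0) - ap_en)
    return gammaincc(2 ** (m - 1), chi_squared / 2.0)
```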

For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:

[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.

[2] Pincus, Steve, and Burton H. Singer. "Randomness and degrees of irregularity."Proceedings of the National Academy of Sciences 93.5 (1996): 2083-2088.

[3] Pincus, Steve, and Rudolf E. Kalman. "Not all (possibly)“random” sequences are created equal." Proceedings of the National Academy of Sciences 94.8 (1997): 3513-3518.

[4] Rukhin, Andrew L. "Approximate entropy for testing randomness." Journal of Applied Probability 37.1 (2000): 88-100.

Back to the article outline


Test 13 - Cumulative Sums

The next three tests (the cumulative sums test, random excursions test, and random excursions variant test) are my personal favourites from the NIST suite because they deal directly with the concept of a random walk. As some of you will know random walks (stochastic processes) are a topic which I have written about previously and plan to continue writing about over the next few years as I develop the Mzansi stochastic processes R package.

The cumulative sums test turns the supposedly random binary sequence into a random walk by replacing each 0-bit with (-1) and each 1-bit with (+1) and then calculating the cumulative sum at each point along the sequence. The test then checks whether the absolute maximum cumulative sum at any point along this sequence is within the range of what would be expected from a true uniform random sequence. This process is illustrated below:

Cumulative Sums

Relating this idea back to the markets, if the absolute peaks of the walk (which may represent the tops of bull markets or the bottoms of bear markets) are significantly higher or lower than what would be expected, then this could indicate the presence of various market phenomena including bull and bear markets, cycles, mean-reversion, or momentum.

The Gist below contains a standalone Python method which realizes the cumulative sums test for randomness. It has been tested to 1,000,000 bits on the binary expansions of well-known mathematical constants.
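A minimal sketch of the forward cumulative sums statistic, following the NIST SP 800-22 formula and using the standard normal CDF from scipy; the function name is illustrative.

```python
import numpy as np
from scipy.stats import norm

def cumulative_sums(bits: str) -> float:
    """Cumulative sums (forward) test: checks whether the maximum excursion
    of the +/-1 random walk is larger than randomness would allow."""
    n = len(bits)
    x = np.array([1 if b == "1" else -1 for b in bits])
    z = np.max(np.abs(np.cumsum(x)))        # largest absolute partial sum

    term_one = 0.0
    for k in range(int(np.floor((-n / z + 1) / 4)), int(np.floor((n / z - 1) / 4)) + 1):
        term_one += (norm.cdf((4 * k + 1) * z / np.sqrt(n))
                     - norm.cdf((4 * k - 1) * z / np.sqrt(n)))
    term_two = 0.0
    for k in range(int(np.floor((-n / z - 3) / 4)), int(np.floor((n / z - 1) / 4)) + 1):
        term_two += (norm.cdf((4 * k + 3) * z / np.sqrt(n))
                     - norm.cdf((4 * k + 1) * z / np.sqrt(n)))
    return 1.0 - term_one + term_two
```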

For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:

[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.

[2] Spitzer, Frank. Principles of random walk. Vol. 34. Springer Science & Business Media, 2013.

[3] Révész, Pál. Random walk in random and non-random environments. Singapore: World Scientific, 2005.

Back to the article outline


Test 14 - Random Excursions

As with the cumulative sums test, the random excursions test deals with the concept of a random walk. The difference, however, is that the random excursions test introduces the concepts of cycles and states. A cycle is any sub-sequence of the walk which starts and ends at 0, and a state is the level of the random walk at each step. The states considered in this test are -4, -3, ..., +3, +4. The test determines whether or not the number of times each state is visited in each cycle is as would be expected from a uniform random binary sequence. This is illustrated below,

Random Excursions Explanation

The Gist below contains a standalone Python method which realizes the random excursions test for randomness. It has been tested to 1,000,000 bits on the binary expansions of well-known mathematical constants.

For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:

[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.

[2] Baron, Michael, and Andrew L. Rukhin. "Distribution of the number of visits of a random walk." Stochastic Models 15.3 (1999): 593-597.

[3] Spitzer, Frank. Principles of random walk. Vol. 34. Springer Science & Business Media, 2013.

[4] Révész, Pál. Random walk in random and non-random environments. Singapore: World Scientific, 2005.

Back to the article outline


Test 15 - Random Excursions Variant

The random excursions variant test, as the name suggests, is a variant of the random excursions test. It differs in that it does not partition the random walk into cycles, and it computes the total number of times the random walk visits a wider range of states extending from -9, -8, ... to ..., +8, +9. Whereas the previous test returns 8 P-values (one for each state), the random excursions variant test returns 18 P-values.

The Gist below contains a standalone Python method which realizes the random excursions variant test for randomness. It has been tested to 1,000,000 bits on the binary expansions of well-known mathematical constants.

For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:

[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.

[2] Baron, Michael, and Andrew L. Rukhin. "Distribution of the number of visits of a random walk." Stochastic Models 15.3 (1999): 593-597.

[3] Spitzer, Frank. Principles of random walk. Vol. 34. Springer Science & Business Media, 2013.

[4] Révész, Pál. Random walk in random and non-random environments. Singapore: World Scientific, 2005.

Back to the article outline


Walsh–Hadamard Transform

Another interesting test not covered by the NIST suite is the Walsh–Hadamard Transform. This test for randomness was written about previously by friend and fellow quant, Pawel Lachowicz, over at QuantAtRisk.com. For more information on this test please see his posts (below) and upcoming book, Python for Quants; a minimal sketch of the transform itself follows the links.

  1. Walsh–Hadamard Transform and Tests for Randomness of Financial Return-Series
  2. Fast Walsh–Hadamard Transform in Python
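For readers who want to experiment before diving into Pawel's posts, here is a minimal sketch of the fast Walsh–Hadamard transform itself (the statistical tests built on top of it are described in the links above); the input length must be a power of two and the function name is my own.

```python
def fwht(values):
    """Fast Walsh-Hadamard transform (input length must be a power of 2)."""
    a = list(values)
    h = 1
    while h < len(a):
        for i in range(0, len(a), h * 2):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y   # butterfly step
        h *= 2
    return a

# Example on a +/-1 encoded sequence of length 8
print(fwht([1, -1, 1, 1, -1, 1, 1, 1]))
```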

Back to the article outline

Hacking the Market

At this point in this article we have accomplished two things. Firstly, we have touched on the theory of randomness and the random walk hypothesis and, secondly, we have introduced a set of fifteen cryptographic tests which seek to identify non-randomness in binary sequences. These tests look either for biases, runs, patterns, or some combination thereof. Put simply, each one of these characteristics improves the predictability of the sequence and reduces the randomness of the sequence. In information security terms this makes the random number generator being tested less cryptographically secure; and in quantitative finance terms this makes the market more attractive. The final part of this article will see whether or not our tests can do better than humans at picking up random versus non-random sequences and hopefully conclude whether or not markets are really random.


Experiment Set-up and Markets Considered

Whilst many considerations went into the experiments I have run for this blog post, no experiment is perfect and every experiment leaves some margin for an incorrect conclusion to be drawn. These issues are discussed in the next section.

Firstly, here are the stock exchange indices I looked at:

  1. The DJIA Index from 1900 to 2015 (daily) ~ 115 Years
  2. The S&P 500 Index from 1950 to 2015 (daily) ~ 65 Years
  3. The Hang Seng Index from 1970 to 2015 (daily) ~ 45 Years
  4. The Shanghai Composite Index from 1990 to 2015 (daily) ~ 25 Years

And in addition to these I looked at some other interesting asset classes:

  1. Gold Prices in US Dollars from 1970 to 2015 (daily) ~ 45 Years
  2. The Currency Exchange Rates of USD vs GBP from 1990 to 2015 (daily) ~ 25 Years
  3. Europe Brent Crude Oil Spot Price FOB from 1990 to 2015 (daily) ~ 25 Years

Secondly, the NIST test suite works on samples of the original data. As such, the returns are discretized and then sliced up into windows. The window lengths I considered were 3, 5, 7, and 10 years.

Thirdly, there are two ways to produce the windows, namely as overlapping windows or as non-overlapping windows. Whilst the former is better because it shows the walk-forward randomness of the market, it also affects the quality of the aggregate P-values computed because the windows are not independent. The general procedure used to produce the windows is shown below (a minimal code sketch follows the list and its footnote) and can also be found in the code:

  1. Download the data from Quandl.com starting on 01-01-YYYY and ending on 01-01-2015
  2. Use discretization to turn the returns data into a binary sequence
  3. Calculate the number of binary digits in the sequence, n
  4. Determine the number of years, Y, covered by the sequence, e.g. 45 years
  5. Specify the length of each window in years, W
  6. Determine how many binary digits fall into each window
  7. Calculate the number of samples, N, where:
    1. N = Y - W + 1 for overlapping samples (the window slides forward one year at a time), and
    2. N = ⌊Y / W⌋ for independent samples
  8. Partition the binary sequence into the N samples
  9. Apply each test (40 tests in total) to each sample and calculate its P-value
  10. Determine which samples passed: if the P-value < 0.005 (99.5% confidence) then the sample is deemed non-random
  11. If at least 90% of the samples pass then the data set passes the test **
  12. Calculate the aggregate P-value over all samples using a Chi-squared test (this isn't used in the results)
  13. Calculate a "score" for each data set for each window length. The score equals the percentage of tests which passed.

** Note that the NIST documentation recommends that 96% of all samples pass in order to pass the test, not 90%. I lowered this requirement (meaning it was easier for the market to look random) because of the relatively low number of samples producible when using the non-overlapping sampling method on the shorter data sets.
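A minimal sketch of the windowing and scoring loop described above; the function names, the one-year slide assumed for overlapping windows, and the dictionary of tests are illustrative assumptions rather than the exact r4nd0m code.

```python
import numpy as np

def window_scores(bits: str, years: int, window_years: int,
                  tests: dict, alpha: float = 0.005, overlapping: bool = False):
    """Slice a bit string into windows and report, per test, the fraction of
    windows whose P-value is at or above the significance level alpha."""
    window_bits = int(len(bits) * window_years / years)
    # Assumed: overlapping windows slide forward by roughly one year of bits.
    step = max(1, window_bits // window_years) if overlapping else window_bits
    samples = [bits[i:i + window_bits]
               for i in range(0, len(bits) - window_bits + 1, step)]
    results = {}
    for name, test in tests.items():
        p_values = np.array([test(sample) for sample in samples])
        pass_rate = float(np.mean(p_values >= alpha))
        results[name] = {"pass_rate": pass_rate, "passed": pass_rate >= 0.90}
    return results

# e.g. window_scores(discretize(returns), years=45, window_years=5,
#                    tests={"monobit": monobit, "runs": runs})
```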

Last, but not least, I included two simulated comparison data sets. The first data set consists of binary digits produced by applying the same discretization strategy to the output of NumPy's Mersenne Twister generator. The Mersenne Twister is one of the best pseudo-random number generators out there, but it is still not cryptographically secure! The second data set consists of binary digits produced from the SIN function; this exemplifies how a truly predictable, non-random sequence performs. These are just benchmarks.

The results section just includes a tabulated summary of the results for each data set using the two sampling methods. For more detailed outputs, go open the code up and play around with it. The output when running the tests in PyCharm (my preferred Python IDE) looks like this (warning: spoiler alert!!).

Each number represents a P-value. Red P-values are below the threshold of 0.005 and green values are above it. Each column represents approximately one window, which in this image is 5 years. The first column after the test name indicates whether the test has passed (PASS!) or failed (FAIL!). The number of tests passed tells us how random a sequence is, whereas the types of test failed tell us about potential biases in the sequence.

As can be seen below, almost all of the samples passed all of the tests and the conclusion for each test was a pass because more than 90% of the samples passed each test.

Hacking the Random Walk Hypothesis


As can be seen below, almost all of the samples failed all of the tests and the conclusion for many of the tests was a fail because fewer than 90% of the samples passed.

Hacking The Random Walk Hypothesis


As can be seen below, quite a few of the samples failed some of the tests, and the conclusion for some of the tests was a fail because < 90% of the samples passed each test.

Hacking The Random Walk Hypothesis

Back to the article outline


Problems with these experiments

As I mentioned previously, no experiment is without its flaws. The best we can do as researchers is be transparent, open-source, and honest. So here are some of the problems I have with my own research:

  1. Some of these tests require a lot more data than the market has produced (unless minute or tick data were used) meaning that their statistical significance is less than ideal. More data should be used! - I wish I had more data. If somebody out there has tick data please further my research and get in touch!
  2. The NIST suite tests only for uniform randomness; this does not mean the markets could not be normally (or otherwise) distributed and still random. This is true to some extent, and for die-hard efficient market and random walk hypothesis junkies this argument will certainly help you sleep tonight ;). Personally, I think that these hypotheses are ideological rather than real. Markets exhibit patterns which these tests are identifying; the simplest of such patterns is a general propensity to go up rather than down.
  3. Arbitrarily chosen time-periods (starting on the 1st of January of each year) and significance level (0.005). The tests should be applied over a much more robust set of samples which start every month or quarter (rather than every year) into the future; a sketch of generating such start dates follows this list. The significance level didn't have much of an impact on the conclusions because, whether it was set to 0.001, 0.005, or 0.05, the market still failed some of the tests during interesting periods e.g. 1954 - 1959.
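As a rough illustration of point 3, generating a denser, monthly grid of window start dates is straightforward; below is a minimal pandas sketch (the date range and the five-year window are illustrative only).

```python
import pandas as pd

# Monthly window starts instead of the yearly, 1st-of-January starts used in the experiment.
starts = pd.date_range("1950-01-01", "2010-01-01", freq="MS")  # "MS" = month start
window = pd.DateOffset(years=5)                                # illustrative 5-year window
windows = [(start, start + window) for start in starts]
print(len(windows), windows[:2])
```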

Now that that is out of the way, let's wrap this up with a summary of results, some conclusions, a list of further research efforts which I would like to see come out of this little project, and my personal news.

Back to the article outline


Summary of Results

Results obtained for when using non-overlapping samples

The table below shows the scores computed for each data set, against the scores obtained by the two benchmarks. When the number of samples dropped below 5, no results were recorded. 

Non Overlapping Sample Results

Results obtained when using overlapping samples

The table below shows the scores computed for each data set, against the scores obtained by the two benchmarks.

Overlapping Results

Observations made from the tables

  1. The scores for the data sets lie between the scores of the two benchmarks, meaning that markets are less random than a Mersenne Twister and more random than a SIN function, but still not random.
  2. The scores for the data sets vary quite a lot with:
    1. Dimension - the size of the window has a big impact in some cases, and
    2. Uniqueness - markets are not equally random, some appear more random than others.
  3. The scores for the benchmarks are consistently good for the Mersenne Twister (>90% of tests passed on average) and poor for the SIN graph (10 - 30% of tests passed on average).

Back to the article outline


Conclusions and Further Research

At the beginning of this article I recounted a story of how Professor Burton Malkiel (the author of one of my all-time favourite books, A Random Walk Down Wall Street) presented a random walk constructed from successive coin flips to an unlucky chartist. When the chartist responded that the "stock" was a good buy, Professor Malkiel likened the stock market to a coin-flipping contest and advocated a passive buy-and-hold investment strategy. Whilst I have the utmost respect for the professor, I believe that this conclusion was erroneous, because all his test really tells us is that, in the eyes of a chartist, there is no distinction between a coin-flipping contest and the market. However, that does not imply that to the eyes, or rather the algorithms, of a Quantitative Trader there is no distinction between a coin-flipping contest and the market. In this article I have shown that whilst I may not personally (with my own two eyes) be able to tell the difference, the NIST suite of cryptographic tests for randomness sure as hell can. This is because markets are, quite simply, not random.


Markets are, quite simply, not random.


Additionally, two slightly more sophisticated conclusions can be drawn from these results:

  1. Assuming randomness is not binary, one could conclude that not all markets are made equally "random". Some of the markets, namely the foreign exchange rate between the USD and GBP currencies and the S&P 500 Index, exhibit much lower levels of randomness than others such as the Hang Seng Index.
  2. The apparent randomness of the markets, unlike that of a strong pseudo-random number generator, appears to be affected by the time dimension. In other words, certain window sizes cause markets to appear less random. This may indicate the presence of cyclical non-random behaviours in the markets, e.g. regimes.

Furthermore, the theoretical discussion and commentary have highlighted some major flaws in the random walk hypothesis; most notably that randomness is relative: in the presence of new or additional information (e.g. fundamental or economic data), the apparent randomness of the market may break down.

Another positive point about these conclusions is that they are consistent with the empirical evidence of individuals and firms who have been able to consistently beat the market over decades, a feat which would not be possible if markets were in fact random walks. On that high note, allow me to end off this article with a link to a much better, anecdotal argument by Warren Buffett about the supposed coin-flipping contest he has been winning.

Back to the article outline

A Personal Update

As some of you will already know, I have been awarded a fantastic opportunity to work as a Quantitative Strategist at a young Hedge Fund called NMRQL which focusses on applying cutting-edge quantitative and machine learning models to the markets. The fund is being run by Thomas Schlebusch (ex CIO of Sanlam International Investments) and is chaired by Michael Jordaan (ex CEO of First National Bank). I am lucky to be their first code-monkey / quant. Please get in touch with Thomas or myself should you wish to invest in the fund, the minimum investment amount is R1m (one-million-rand). I also now live in Stellenbosch in the heart of the Western Cape winelands.

Stellenbosch

Software Disclaimer

This software was developed by Stuart Reid. Stuart Reid assumes no responsibility whatsoever for its use by other parties, and makes no guarantees, expressed or implied, about its quality, readability, or any other characteristic. Usage of this implementation should acknowledge this repository and the original NIST implementation.

Comments

  1. schaal

    nice work. I'm officially impressed.

    • Thanks 🙂 this one was quite a lot of work; but it was fun.

  2. Ekrem Çağlar

    Outstanding study with elegant presentation!

    Helped me a lot. Thanks.

    • It's a pleasure; glad to hear it helped.

  3. Rafeh Qazi

    This was a great article! I really enjoyed reading it. There was tons of stuff that went right over my head, but I love having exposure to such information! I will definitely come back for more.

    • Thanks Rafeh, I'm happy to hear that you'll be coming back. If this stuff interests you, you should also check out the NIST documentation because it is considerably more detailed and of a very high quality. I noticed today that some of their websites are down, but the Google cache system is still working. Enjoy!

  4. Harvey

    Geez... wow. I am awed. Kudos, Stuart, kudos! I will be reading this again and again, and I see your code as incredibly interesting and useful in the study of this and other processes.

    Looking forward to your book. If you aren't writing one, you should be.

    • Thanks Harvey, not working on any books yet but it is definitely one of my lifelong goals :-).

    • If you use the code in any other domains, please do let me know. That's cool!

  5. WM

    Hello, good article.

    I am personally not a believer in global randomness, so it's no surprise that the percentage of passed tests decreased as the time window increased. You can take several markets and notice the long-term, non-random trends by eye.

    I think it would be interesting to test the same experiment in very small time windows, which would also be interesting to high frequency traders. Take millisecond tick data for 1 year and make your windows on the order of seconds, minutes, hours, etc.

    Thank you!

    • Thanks. Couldn't have said it better myself; like I said in the article the challenge comes in getting enough data to keep the results statistically significant. So, if you have high frequency data at your disposal let me know and I'd be happy to help with replicating the experiment on lower time-dimensions. I think you are onto a really fantastic idea. Nice one!

      • WM

        A good source I often use is the millisecond-level tick data from Dukascopy. Tick downloaders such as Tick Story Lite or Strategy Quant's Tick Data Downloader connect directly to Dukascopy's database. They both download and process the tick data into CSV format, with some options such as weekend data, time zone conversion, etc.

        Not every market in Dukascopy has reliable data for all time periods (Brent and WTI, for example, are not good), but EUR.USD seems to be very reliable, with tick data going back several years.

        If I have time someday it would be interesting to adapt the experiment to such data. Otherwise, the idea is now at least up in the air 🙂

  6. This is top quality work. Congratulations and good luck with your new job.

    I have done some work with runs tests and I have noticed that results depend on timeframe and look-back period. For example, with a look-back period of n bars, in daily timeframe the result may be random state but in weekly timeframe it may be a trend. The fractal nature of markets allows for randomness inside trend and vice versa.

    Also, for runs tests, I have noticed that as the look-back period increases, the results show trend no matter what. For daily data in the S&P 500 and for look-back periods greater than 100, the p-value goes close to 0 and stays there, indicating trend.

    Do you have similar experience with these tests?

    Best.

    • Hi Michael! First off, great blog. Secondly, let me get back to you later tonight. As I mentioned in the article, the frequency of observations is likely to affect the results, and I think you are 100% right, but I would like to run it through my program and see what the results are. Doing so is quite easy; I just need to collapse the returns to a weekly frequency using the Quandl API. I didn't look at lower frequencies because you need many data-points for some of the tests to get statistically significant results. That said, the runs test doesn't require a lot of data points, so for this test it should work fine.
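For anyone who wants to try the weekly collapse themselves, a minimal pandas sketch might look like the following; it assumes a DatetimeIndex-ed series of daily closes and is not the exact Quandl-based code referred to above.

```python
import numpy as np
import pandas as pd

def weekly_log_returns(daily_prices: pd.Series) -> pd.Series:
    """Collapse a DatetimeIndex-ed series of daily closes to weekly log returns."""
    weekly_prices = daily_prices.resample("W").last()  # last observed close in each week
    return np.log(weekly_prices / weekly_prices.shift(1)).dropna()

# Hypothetical usage:
# weekly = weekly_log_returns(prices)  # `prices` is a daily close series
# bits = (weekly > 0).astype(int)      # the same sign-based discretization as for daily data
```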

  7. Greg Jaxon

    I'm interested in the relationship between liquidity and entropy.
    Just yesterday, WSJ featured a chart showing how the ask-bid spread peaks at the market open and takes about a half-hour to decay to the usually steady state of high liquidity throughout the day. I think the data points you've examined differ in their information content. Some are a more accurate measure of "price" than others. What I'd expect is to find the highest liquidity market samples seem the most locally random. The question then is whether, at their points of least liquidity, they become inefficient enough to show a "hackable signal".

    • That's a really interesting idea ... given the feedback from this article I will likely do a follow-up article. I'd like to investigate this idea, so I'm sending you an email. Thanks again!

  8. Picatris

    You miss one very important and critical thing. Market randomness is different from other types of randomness. Why? because it is adaptable. As soon as a "weakness" appears it is immediately exploited. You mention the Malkiel book, but you did not get that part. He points out a large number of systems that worked until they were discovered and published. So this is not exactly like tossing a fair coin. The current market is "more random than random" so it is not unexpected that any tests will demonstrate it is not random. [I would be really surprised if they would, actually, but that's another story]

    The market is random as it is not predictable [the basic definition of randomness]. If it was predictable [thus not random] it would very soon adapt so that it would be again unpredictable. This meta-randomness has to fail all randomness tests as it is a characteristic of a different type.

    • Thanks for the comment :-). I like your argument and I agree with some aspects of it. Where we disagree is really semantics, because what you are describing is not randomness, it is actually chaos. As I mentioned in the article, I like to believe that markets are actually complex adaptive systems which do exhibit the properties of randomness from time to time, but that does not imply that they are random walks. This is supported by the fact that market returns are not produced by some random number generator; they are produced by supply and demand driven by human beings / robots. This supports the argument that markets are complex adaptive systems.

      For more information on this I'd like to refer you to some of my previous work and some papers:

      1. Perfect Imperfection, Agent Based Models
      2. Agent-based Computational Economic Models (ACE)
      3. The Adaptive Market Hypothesis by Andrew Lo

      The only other comment I have is that some patterns have existed for many decades and are highly stable. For example, the anomalies / patterns I mentioned in the article, namely drift, momentum, and mean-reversion, have been observed and studied for many decades. People have made, and are still making, tonnes of money off of these simple concepts and have managed to beat the market for decades. These empirical observations also support the belief that markets are not random because, by the definition of a martingale, it should not be possible to devise a strategy which consistently beats a random walk in the long run.

      Anyway, thanks for the comment. I like the debate, it's more important to me than the article.

      • Picatris

        Thank you for your answer. The question of semantics is important, but the substance of my [hopefully] constructive criticism is not addressed. Obviously the market is a chaotic, highly adaptive process, but it is also, by its nature, random. And by "random" I mean unpredictable. No matter what tests of randomness may say, markets will still be unpredictable, no matter which forces control them, whether algorithms for high-frequency trading or large corporations with hidden agendas.

        Yet you say "People have made, and are still making, tonnes of money off of these simple concepts and have managed to beat the market for decades." Of that I have never seen any proofs, as the works of Malkiel and Taleb amply debate, so I would hardly qualify them as evidence. This said, it does not mean that there are no such methods, but if they exist they are not up to scrutiny and it is far from clear if they would pass the test of time.

      • This is an interesting discussion. Here I would like to concentrate on this interesting statement:

        "For example, the anomalies / patterns I mentioned in the article namely drift, momentum, and mean-reversion have been observed and studied for many decades. People have made, and are still making, tonnes of money off of these simple concepts and have managed to beat the market for decades."

        Drift: Many people have profited from it but also many others have lost a lot of money.

        Momentum: Some authors claim that momentum effects were due to some random events, such as the dot com bubble.

        http://papers.ssrn.com/sol3/papers.cfm?abstract_id=968176

        My own research shows that momentum profits are not significant and can be due to luck. In particular, profits from time-series momentum after the 1990s top are due to luck. This is also related to a switch from momentum to mean-reversion:

        http://www.priceactionlab.com/Blog/2015/08/momersion-indicator/

        Mean-reversion: There are signs that mean-reversion is fading out after 2013.

        One problem is that hypothesis testing is conditioned on historical data. If the sample is not representative, then there is error when we find evidence against the null. It could be that we need an order of magnitude of more data to perform accurate hypothesis testing and that would lead to rejection of the alternative.

        Therefore, my conclusion is that markets are random, that our tests are fooled by non-representative samples due to other random events that disturb randomness (an oxymoron, but it could be true), and that opportunities exist in random markets due to temporary anomalies that appear and disappear and are related to similar patterns in participant reactions.

        I have identified price patterns and published them in books that stayed profitable for 10 years. Everything is documented in my blog. I used algos that made tons of real money but suddenly stopped working. Popular mean-reversion and momentum algos will also stop working at some point without any warning. Like this simple mean-reversion system:

        http://www.priceactionlab.com/Blog/2015/09/wr2-system/

        Thus, we are fooled by randomness into believing there is no randomness due to limited samples, but that can work to our advantage, as Stuart has written someplace (I can't find it now), if we can identify a distribution with a positive mean and skew (and maybe leptokurtosis, although this is debatable).

        Best.

  9. Chiedza

    Hi Stuart! Congratulations on your new job. It sounds like a very exciting opportunity. Great article - very well written, and interesting conclusions. I will be sharing this with my quant friends.

    • Hey Chiedza, fantastic thank you, I'd be interested to hear what they say. I hope you are keeping well!

  10. I mentioned Reid's fascinating posting in my own "Technically Speaking" blog on the Newsmax Insiders Page: http://www.newsmax.com/richardgrigonis/malkiel-random-sequence-wall-street/2016/01/05/id/708263/

  11. Guilherme

    Excellent post! I hope I find myself the time to work on your code and read some of the papers.

    • Thanks Guilherme! Shout if you run into any problems, but be warned I respond very slowly to emails 🙂

  12. Michael Graczyk

    I believe it would be better to use a random market model for comparison, instead of the Mersenne Twister output. For example, you could include a row in your final tables for data produced by the Heston stochastic volatility model of the same length as the market data. You can even tune/calibrate the model so that your randomness tests are very specifically answering your original question, that is "Can these randomness tests differentiate real and random data even when a human cannot?".

    I believe that would be a more accurate demonstration of your point, otherwise you're sort of comparing apples to oranges. Great post though, I enjoyed reading!

    • Hi Michael! I couldn't agree more. I came to this exact same realization sometime last year while reading Lo and MacKinlay's papers. You live and you learn :-). The "problem" with the NIST suite (in the context of markets) is that it is testing for pure randomness ... which is a stricter form of randomness than most Markov processes (this was discussed briefly in my latest post: Stock Market Prices Do Not Follow Random Walks). My goal for this year is to publish an in-depth set of results on numerous stochastic process models as well as thousands of de-trended and discretized real-world financial time series. I'd like to also contrast the results of the NIST suite against more "purely quantitative" measures of randomness like the variance ratio test and the Lempel-Ziv compression algorithm (a rough sketch of the latter is shown below). It's in the works; my biggest constraint is time. Thanks again for the comment Michael!
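To give a flavour of those "purely quantitative" measures, below is a minimal, simplified phrase-counting sketch of Lempel-Ziv complexity for a binary string; it is a rough proxy only, not the exact metric or implementation planned for the follow-up.

```python
def lempel_ziv_complexity(bits):
    """Count the distinct phrases produced by a simple left-to-right Lempel-Ziv
    parsing of a binary string. Random strings produce many phrases; repetitive
    strings produce few, so a low count flags structure."""
    s = "".join(str(b) for b in bits)
    phrases = set()
    start, length = 0, 1
    while start + length <= len(s):
        phrase = s[start:start + length]
        if phrase in phrases:
            length += 1          # keep extending until the phrase is new
        else:
            phrases.add(phrase)  # record the new phrase and start the next one
            start += length
            length = 1
    return len(phrases)

# Hypothetical usage: compare a market bit string against a shuffled copy of itself.
# print(lempel_ziv_complexity(market_bits), lempel_ziv_complexity(shuffled_bits))
```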

  13. Łukasz

    This post is highly significant 😉 Thanks.

