Hacking the Random Walk Hypothesis
Hackers would make great traders. At a meta level, hackers and traders do the same thing: they find and exploit the weaknesses of a system. The difference is that hackers hack computers, networks, and even people for various good and bad reasons, whereas traders hack financial markets to make profits. One type of exploit which has always fascinated me is the attack on random number generators. You see, random number generators are used every day to encrypt data and communications, but if the random number generators are flawed then they stop being cryptographically secure and hackers can exploit those vulnerabilities to decrypt the encrypted data and communications. For this reason random number generators need to pass robust sets of statistical tests for randomness, such as the NIST suite of cryptographic tests for randomness, before they are considered sufficient for cryptographic uses. In this post we are going to subject various financial market returns to the NIST suite of tests and see whether or not we should be able to, in theory, hack the market.
Article Outline
This article is broken up into three sections. The first section presents background information about the random walk hypothesis and compares the statistical definition of randomness to the algorithmic definition. The second section outlines my Python implementation of the NIST test suite, including a brief explanation and source code for each test. The third and final section subjects a number of financial markets to these tests and ends by concluding whether or not the markets are random and, if they are random, comments on the nature of that randomness. Depending on your familiarity with the subject matter and time constraints, you may wish to skip some sections. All of the code included below can be found in my repository, r4nd0m, on GitHub.
 Background Information
 NIST Python Implementation
 Project Structure
 Binary Encoding Approach
 Test 01  Frequency (Monobit)
 Test 02  Block Frequency Test
 Test 03  Runs Test
 Test 04  Longest Runs Test
 Test 05  Binary Matrix Rank Test
 Test 06  Discrete Fourier Transform Test
 Test 07  Nonoverlapping Patterns Test
 Test 08  Overlapping Patterns Test
 Test 09  Universal Test
 Test 10  Linear Complexity Test
 Test 11  Serial Test
 Test 12  Approximate Entropy
 Test 13  Cumulative Sums
 Test 14  Random Excursions
 Test 15  Random Excursions Variant
 Walsh–Hadamard Transform (QuantAtRisk.com)
 Hacking the Market
 A Personal Update on my New Job and Home
 Software Disclaimer
On another note, I am thrilled to report that this Python implementation passes all of the unit tests specified in the NIST C documentation and, as a bonus, includes tonnes of comments. Given this fact, I hope that the code will be useful to real security researchers as well as quantitative analysts and traders. That said, my Python implementation carries the same disclaimer as NIST, so please take a moment to view the README file.
Background Information
This section introduces the random walk hypothesis and its importance to quantitative finance. It also discusses the two definitions of randomness, namely the statistical and the algorithmic definitions.
The Random Walk Hypothesis
Many systems in the real world demonstrate the properties of randomness including, for example, the spread of epidemics such as Ebola, the behaviour of cosmic radiation, the movement of particles suspended in liquid, luck at the roulette table, and supposedly even the movement of financial markets as per the random walk hypothesis ... but before we get into the details of the random walk hypothesis, let's discuss a famous test by Professor Burton G. Malkiel. The following extract is taken from the Wikipedia page on the random walk hypothesis. It very succinctly describes the test that Professor Malkiel performed and the conclusions he drew from this test.
"Burton G. Malkiel, an economics professor at Princeton University and writer of A Random Walk Down Wall Street, performed a test where his students were given a hypothetical stock that was initially worth fifty dollars. The closing stock price for each day was determined by a coin flip. If the result was heads, the price would close a half point higher, but if the result was tails, it would close a half point lower. Thus, each time, the price had a fifty-fifty chance of closing higher or lower than the previous day. Cycles or trends were determined from the tests. Malkiel then took the results in a chart and graph form to a chartist, a person who “seeks to predict future movements by seeking to interpret past patterns on the assumption that ‘history tends to repeat itself’”. The chartist told Malkiel that they needed to immediately buy the stock. Since the coin flips were random, the fictitious stock had no overall trend. Malkiel argued that this indicates that the market and stocks could be just as random as flipping a coin." (Wikipedia)
In effect this is similar to "financial Turing tests" in which people familiar with the markets are asked to take a look at the time series below and identify which one(s) are real market returns data, and which one(s) are simulated using some random process. Have a go with the following three images. Each image may be either a real market return series or just the output from a calibrated Heston stochastic volatility model. Good luck guessing!
It's quite difficult, isn't it? This observation led early quantitative researchers to investigate whether or not stock market returns evolve randomly. The theory that market returns evolve randomly is called the random walk hypothesis. This theory dates back to the work of Jules Regnault (1863) and Louis Bachelier (1900), who observed the characteristics of randomness in the returns of stock options. The theory was later formalized by Maurice Kendall and popularized in 1965 by Eugene Fama in his seminal paper, Random Walks In Stock Market Prices.
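Malkiel's coin-flip experiment described above is easy to reproduce. The following is a minimal sketch using only the Python standard library; the function name `coin_flip_prices`, the 250-day horizon, and the seed are illustrative assumptions rather than anything taken from the original experiment.

```python
import random


def coin_flip_prices(days, start=50.0, step=0.5, seed=None):
    """Simulate a stock whose daily close moves up or down by `step` on a fair coin flip."""
    rng = random.Random(seed)
    prices = [start]
    for _ in range(days):
        # Heads: close half a point higher. Tails: close half a point lower.
        move = step if rng.random() < 0.5 else -step
        prices.append(prices[-1] + move)
    return prices


path = coin_flip_prices(days=250, seed=1)
print(path[:5], "...", path[-1])
```

Plotting a few of these paths next to real price charts is a quick way to run your own version of the financial Turing test.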
Despite the entertainment value of these tests, they really don't prove that markets are random at all. All they do is prove that to the human eye market returns, in the absence of any additional information, are indistinguishable from random processes. This conclusion does not, in and of itself, provide any useful information about the random characteristics of markets. Furthermore, the random walk hypothesis itself suffers from some shortcomings:
 It lumps all markets into one homogeneous bucket and does not differentiate between them.
 It cannot explain many empirical examples of people who have consistently beaten the market.
 It is based on the statistical definition of randomness not the algorithmic definition meaning that,
 It does not distinguish between local and global randomness.
 It does not deal with the idea of the relativity of randomness.
The remainder of this article will explain how elements of the algorithmic definition of randomness, namely local versus global randomness and the relativity of randomness, could potentially explain the anomaly between the implications of the random walk hypothesis and empirically observed long-term investor market outperformance. It will also compare different markets using the NIST suite of statistical tests for randomness.
Whether we love it or hate it, and regardless of whether it is right or wrong, we cannot deny that the widespread adoption of the random walk hypothesis by quantitative analysts in the industry has led to major advances in the field of quantitative finance, especially for derivative security and structured product valuations.
Algorithmic vs. Statistical Definitions of Randomness
Any function whose outputs are unpredictable is said to be stochastic (random). Similarly, any function whose outputs are predictable is said to be deterministic (nonrandom). To complicate matters, many deterministic functions can appear to be stochastic. For example, most of the random number generators we use when programming are actually deterministic functions whose outputs appear to be stochastic. Most random number generators are not truly random which is why they are labelled either pseudo or quasi random number generators.
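To make the point about pseudo random number generators concrete, here is a small sketch using Python's built-in `random` module. The helper name `draw_bits` and the seed values are hypothetical; the only thing being illustrated is that the same seed always reproduces the same "random" bits.

```python
import random


def draw_bits(seed, n=16):
    """Return n pseudo random bits from a generator initialised with `seed`."""
    rng = random.Random(seed)
    return "".join(str(rng.getrandbits(1)) for _ in range(n))


first = draw_bits(seed=2015)
second = draw_bits(seed=2015)

# The two strings are identical: the generator is a deterministic function of its seed.
print(first)
print(second)
print(first == second)
```

Without knowledge of the seed the output looks stochastic; with it, every bit is predictable.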
In order to test the validity of the random walk hypothesis we need to determine whether or not the outputs generated by the market (our function) are stochastic or deterministic. In theory there is an algorithmic and a statistical approach to this problem, but in practice only the statistical approach is ever used (for good reasons).
The Algorithmic Approach
Computability theory, also known as recursion theory or Turing computability after the late Alan Turing, is a branch of theoretical computer science which deals with the concept of computable and noncomputable functions. Computable functions can be reasoned about in terms of algorithms, in the sense that a function is computable if, and only if, there exists an algorithm which can replicate that function. In other words there is an algorithm that when given any valid input to the function will always return the correct corresponding output.
If randomness is the property of unpredictability, this means that the output from the function can never be accurately predicted. Logically this then implies that all random processes are noncomputable functions because no algorithm which accurately replicates that function could exist. The famous Church-Turing thesis states that a function is computable if and only if it is computable by a Turing machine (other methods also exist).
Turing machines are hypothetical devices which work by manipulating an alphabet of symbols on an infinite strip of tape according to a table of rules. Turing machines are able to simulate any algorithm and are therefore, when given an infinite amount of time, theoretically able to compute any and all computable functions.
So what? Well, because of the link between computability and randomness, in order to prove or disprove the random walk hypothesis all one would need to do is use a Turing machine to determine whether or not an algorithm which replicates the market (our function) exists. In other words, an algorithm which, when given any valid input, will always return the correct corresponding output ... Alas, nothing in this world is simple because this approach to proving (or disproving) the random walk hypothesis runs headlong into the halting problem.
The halting problem basically involves determining whether or not a program will halt or if it will continue to run indefinitely. This problem has been proven to be unsolvable, meaning that it is not possible to know upfront whether or not a program will halt. As such, the challenge with using a Turing machine to try and find an algorithm which replicates the stock market is that if the market is actually random no replicating algorithm exists.
This implies that the Turing machine would need to try every possible algorithm before halting which would literally take forever. As such it is, for all intents and purposes, impossible to prove that the market is truly random.
Despite this fact, these observations have given rise to a very interesting field called algorithmic information theory, which concerns itself with the relationship between computability theory and information theory. Algorithmic information theory defines different types of randomness. The most popular definition is Martin-Löf randomness, which asserts that in order for a sequence to be truly random it must:
 Be incompressible. Compression involves finding some lossless representation of the information which uses less information. For example, the infinitely long binary string, 01010101 ..., can be expressed more concisely as 01 repeated infinitely, whereas the infinitely long binary string, 0110000101110110101 ..., has no apparent pattern and therefore cannot be compressed to anything less than the exact same binary string, 0110000101110110101 .... This is equivalent to saying that if the Kolmogorov complexity of the string is greater than or equal to the length of the string, then the sequence is algorithmically random.
 Pass statistical tests for randomness. There are many statistical tests for randomness which deal with testing the difference between the distribution of the sequence versus the expected distribution of any sequence which was assumed to be random. This is discussed in more detail in the statistical approach section below.
 And be impossible to make money off of. This is an interesting concept which simply argues that if a set of martingales can be constructed on a sequence which is always expected to succeed, then the sequence is not random. A martingale is a model of a fair game where knowledge of past events never helps predict the mean of the future winnings. One might be tempted to think that the existence of the momentum and mean-reversion anomalies in financial markets must therefore disprove the random walk hypothesis ... well, sort of. The problem with these anomalies is that they are not persistent and tend to break down or exhibit cyclical behaviour. As such, if anything, they disprove the local random walk hypothesis (this is introduced below).
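The incompressibility criterion above can be illustrated, very loosely, with an off-the-shelf compressor. Kolmogorov complexity itself is not computable, so the sketch below uses `zlib` purely as a crude upper-bound proxy. The helper name `compression_ratio` and the sample sizes are assumptions, and the patterned string is stored as ASCII characters rather than packed bits.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)


patterned = b"01" * 50_000        # "0101..." repeated: a short description exists
noisy = os.urandom(100_000)       # bytes from the operating system's entropy source

print("patterned:", compression_ratio(patterned))
print("noisy:    ", compression_ratio(noisy))
```

The patterned input shrinks to a tiny fraction of its size while the noisy input does not shrink at all. Failing to compress a sequence with `zlib` does not prove that it is random; it only shows that this particular compressor found no pattern.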
Another interesting differentiation made by both the statistical and algorithmic definitions of randomness is local versus global randomness. If a sequence appears random in the long run it is said to be globally random, despite the fact that finite blocks of the sequence may appear to be nonrandom at times.
For example, the sequence 01010100011010101110000000001... appears to be random, but if we break the sequence up into four blocks [010101][0001101010111][000000000][1...] the first and third blocks do not appear random. As such, the sequence is said to exhibit global randomness but only partial local randomness.
Relating this distinction back to the random walk hypothesis, we should be able to distinguish between the local and the global random walk hypothesis. The global random walk hypothesis would state that in the long run markets appear to be random, whereas the local random walk hypothesis would state that for some minimum period of time the market will appear to be random. This view of the world is, at least in my opinion, consistent with the empirical observations of anomalies such as the value, momentum, and mean-reversion factors, especially when we acknowledge that these factors tend to exhibit cyclical behaviour. In other words, as with the sequence shown earlier, markets exhibit global randomness but during finite periods of time local randomness breaks down.
Markets exhibit global randomness but local randomness tends to break down in some dimensions of time; these breakdowns manifest themselves as anomalies e.g. the value, momentum, and reversion factors.
Unfortunately I have not seen this distinction made anywhere; it is my own opinion on how the empirical observations of individuals beating the market and the random walk hypothesis could be reconciled. Another distinction made by the algorithmic definition of randomness is that randomness is relative to information.
In the absence of information many systems may appear random, despite the fact that they are deterministic. In statistics the missing information is known as a confounding variable. A good example of this is a random number generator, which only appears random in the absence of the seed being used to produce that random sequence. Another, more interesting example is that market returns might appear to be random on their own, but in the presence of earnings reports and other fundamental indicators, that apparent randomness could break down and become nonrandom.
In the absence or presence of information markets may appear less or more random i.e. the global and local randomness of the markets is relative to the quality and quantity of available information about the markets.
These two theories are impossible to prove but they are what I personally believe about the markets (in addition to my belief that they are not random, but rather appear random just like many other complex adaptive systems). The remainder of this article leaves the world of theoretical computer science, information theory, and economics behind and focuses instead on what can realistically be achieved, namely the application of statistical tests for randomness to markets in order to identify potential trading opportunities / attractive markets.
The Statistical Approach
A sequence is said to be statistically random when it does not exhibit any observable patterns. This does not imply true randomness, i.e. unpredictability, because most pseudo random number generators, whose outputs are completely predictable (when given a seed), are said to be statistically random. Generally speaking a sequence is labelled statistically random if it is able to pass a battery of tests for randomness such as the NIST suite. Most of these tests involve testing whether the distribution of outputs produced by the supposedly random system is close enough to the distribution expected from a truly random sequence. Most of these tests are specifically designed to test for uniform randomness; as such, they are almost always applied to binary sequences (bit strings).
NIST Python Implementation
This section provides implementation-level details on the Python implementation of the NIST suite of tests for randomness. All of the code can be found in my GitHub repository, r4nd0m. In the development process much help was derived from the C implementation of the NIST test suite by NIST as well as an earlier implementation of the NIST test suite in Python 2.66 by Ilja Gerhardt. Unfortunately I struggled to get Ilja's code to run because of all the changes made between Python 2.66 and Python 3.4, which is another reason why I decided to reimplement.
Project Structure
The whole project is broken up into six classes and a main script which pulls everything together. Each class contains standalone code for performing one or more of the functions required to actually apply the NIST suite to historical market returns. The classes in the project include:
 r4nd0m  this is the main script which is used to pull everything together.
 RandomnessTester  this class contains all of the NIST tests. Each test has been implemented in such a way that it is static and can, therefore, be copied out of the class and used elsewhere. Note, however, that some of the tests depend on the scipy.special, scipy.fftpack, scipy.stats, numpy, os, and copy packages in Python. Also note that the binary matrix rank test depends on the BinaryMatrix class.
 BinaryMatrix  this class encapsulates the algorithm specified in the NIST documentation for calculating the rank of a binary matrix. This is not the same as the SVD method used to compute the rank of a matrix which is why the scipy.linalg package couldn't be used. This class could be more pythonic (pull requests welcome).
 BinaryFrame  this class, as the name suggests, is just a way of converting a pandas DataFrame to a dictionary of binary strings with the same column names. This dictionary and the decimal to binary conversion methods are encapsulated in this class. RandomnessTester simply takes in a BinaryFrame object and applies all of the NIST tests to each one of the binary strings contained in the dictionary.
 QuandlInterface and Argument  these two classes work together to allow you to interface with the Quandl.com API and download and join lists of datasets. Interesting dataset lists can be found in the MetaData folder of the project and your personal auth token can be stored in a .private.csv local file on your computer. You can find out more about this in the README.MD file for the GitHub repository online.
 Colours  this class just makes things look cool in the console.
The UML diagram below shows how the project is structured (as of 13/09/2015).
Binary Encoding Approach
Discretization involves converting continuous variables into discrete variables. The most popular discretization approach is binning. Binning involves classifying the output from continuous variables into a mutually exclusive set of "bins", each of which represents a possible interval within which the random variable may fall.
Since we don't know the range of market returns (our continuous variable), we simply bin them into positive returns and negative returns, which are represented as a 1 bit and a 0 bit respectively. To avoid bias we add a special case for returns equal to exactly 0.0%: these returns are represented as both a 0 and a 1, i.e. the bit string 01:

$$b_t = \begin{cases} 1 & \text{if } r_t > 0 \\ 0 & \text{if } r_t < 0 \\ 01 & \text{if } r_t = 0 \end{cases}$$

where $b_t$ represents the bit in the bit string at index $t$ and $r_t$ represents the return generated by the security being discretized at time $t$. If the security goes up as often as it goes down, then the number of zeros and the number of ones in the resulting bit string should be approximately equal. This is discussed in more detail in the next section.
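The sign-based encoding described above can be sketched in a few lines. This is a minimal illustration and not the BinaryFrame code from r4nd0m; the function name `returns_to_bits` and the sample returns are made up for the example.

```python
def returns_to_bits(returns):
    """Encode returns as a bit string: up -> '1', down -> '0', exactly flat -> '01'."""
    bits = []
    for r in returns:
        if r > 0:
            bits.append("1")
        elif r < 0:
            bits.append("0")
        else:
            # A zero return contributes one bit of each kind so that it adds no bias.
            bits.append("01")
    return "".join(bits)


print(returns_to_bits([0.012, -0.004, 0.0, 0.007, -0.01]))  # prints 100110
```

Note that a zero return lengthens the bit string by two bits, so the bit index and the time index drift apart whenever flat days occur.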
Two additional conversion methods were implemented and tested which involved converting basis points and floating point returns into binary respectively. The problem with these is that they introduce biases which result in even strong cryptographic random number generators failing numerous tests in the suite.
Test 01  Frequency (Monobit)
This test looks at the proportion of 0 bits to 1 bits in a given binary sequence. Assuming the sequence is uniformly distributed this proportion should be close to 1 meaning that the number of 1 bits was approximately equal to the number of 0 bits. In this context, because a 0 bit represents a down day and a 1 bit represents an up day, this test checks whether down and up days on the market are uniformly distributed over time i.e. equally probable.
A failure of the monobit test may indicate something about the relative profitability of simple buy and hold strategies because, if the market is more likely to go up than down, then simply buying and holding the market is likely to be a profitable trading strategy. This assumes, however, that the average size of an up day in percentage terms is greater than or equal to the average size of a down day in percentage terms.
The Gist below contains a standalone Python method which realizes the monobit test for randomness. It has been tested to 1,000,000 bits on the binary expansions of $\pi$, $e$, $\sqrt{2}$, and $\sqrt{3}$.
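Independently of the r4nd0m code, the statistic itself is small enough to sketch here. The following follows the description in the NIST documentation [1]: map the bits to +1 and -1, sum them, normalise by the square root of the length, and convert to a P-value with the complementary error function. The function name `monobit_p_value` is an assumption; the ten-bit input is the worked example from the NIST documentation.

```python
import math


def monobit_p_value(bits: str) -> float:
    """Frequency (monobit) test P-value for a string of '0' and '1' characters."""
    n = len(bits)
    # Map 1 -> +1 and 0 -> -1, then sum: a balanced sequence sums to roughly zero.
    s_n = sum(1 if b == "1" else -1 for b in bits)
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


p = monobit_p_value("1011010101")
print(round(p, 6))  # prints 0.527089
```

In the NIST suite a sequence is treated as non-random for a given test when the P-value falls below 0.01, so this short example passes.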
For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:
[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.
[2] Chung, Kai Lai, and Farid AitSahlia. Elementary probability theory: with stochastic processes and an introduction to mathematical finance. Springer Science & Business Media, 2012.
[3] Jim Pitman, Probability. New York: Springer-Verlag, 1993 (especially pp. 93-108).
Test 02  Block Frequency Test
This test is a generalization of the Monobit test which looks at the proportion of 0 bits to 1 bits in blocks of size $M$ extracted from a given binary sequence. Assuming uniform randomness, the number of 0 bits and the number of 1 bits should each be approximately $M/2$ per block, i.e. each block is filled with, on average, as many 1 bits as 0 bits. In our context this test looks at the probability of up and down days occurring over finite "windows" or periods of time.
The Gist below contains a standalone Python method which realizes the block frequency test for randomness. It has been tested to 1,000,000 bits on the binary expansions of $\pi$, $e$, $\sqrt{2}$, and $\sqrt{3}$.
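As a rough, dependency-free sketch of the statistic (not the r4nd0m implementation), the block frequency test sums the squared deviation of each block's proportion of ones from one half and converts the result to a P-value with the regularized upper incomplete gamma function. The helper `igamc` below is a simple series approximation written for this example; in practice `scipy.special.gammaincc` would be the natural choice. The function names are assumptions and the input is the worked example from the NIST documentation [1].

```python
import math


def igamc(a, x, terms=500):
    """Regularized upper incomplete gamma Q(a, x), via the power series for P(a, x)."""
    if x <= 0:
        return 1.0
    term = total = 1.0 / a
    for k in range(1, terms):
        term *= x / (a + k)
        total += term
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def block_frequency_p_value(bits: str, block_size: int) -> float:
    """Block frequency test P-value; trailing bits that do not fill a block are discarded."""
    n_blocks = len(bits) // block_size
    chi_sq = 0.0
    for i in range(n_blocks):
        block = bits[i * block_size:(i + 1) * block_size]
        proportion = block.count("1") / block_size
        chi_sq += (proportion - 0.5) ** 2
    chi_sq *= 4.0 * block_size
    return igamc(n_blocks / 2.0, chi_sq / 2.0)


print(round(block_frequency_p_value("0110011010", 3), 6))  # prints 0.801252
```

The series used in `igamc` converges slowly for large arguments, so it is only suitable for small illustrations like this one.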
For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:
[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.
[2] Maclaren, Nick. "Cryptographic Pseudorandom Numbers in Simulation." Fast Software Encryption. Springer Berlin Heidelberg, 1994.
[3] Knuth, Donald E. "Seminumerical Algorithms, The art of computer programming, Vol. 2." (1981).
[4] Abramowitz, Milton, and Irene A. Stegun. Handbook of mathematical functions: with formulas, graphs, and mathematical tables. No. 55. Courier Corporation, 1964.
Test 03  Runs Test
Arguably the most well known test for randomness, the runs test focusses on the number of runs which appear in a binary sequence. A run is defined as an uninterrupted sequence of identical bits. A run of length $k$ is an uninterrupted sequence of exactly $k$ identical bits. The runs test checks whether the number of runs in the sequence is significantly different from what is expected from a sequence under the assumption of uniform randomness. The number of runs also indicates the frequency with which the sequence oscillates between 0 and 1 bits.
In this context, a run is a sequence of consecutive trading days wherein the market was up or down. As such, performance on this test may imply something about either the momentum or mean reversion anomalies. Previous academic studies have shown that the market has more runs than expected and therefore the market is said to oscillate more quickly between "up runs" and "down runs" than would be expected from a random sequence.
The Gist below contains a standalone Python method which realizes the runs test for randomness. It has been tested to 1,000,000 bits on the binary expansions of $\pi$, $e$, $\sqrt{2}$, and $\sqrt{3}$.
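For readers who just want the gist of the calculation, here is a minimal sketch based on the NIST description [1], separate from the r4nd0m code. It counts the total number of runs (one more than the number of places where adjacent bits differ) and compares that count with the value expected given the observed proportion of ones. The function name `runs_p_value` is an assumption; the ten-bit input is the worked example from the NIST documentation.

```python
import math


def runs_p_value(bits: str) -> float:
    """Runs test P-value for a string of '0' and '1' characters."""
    n = len(bits)
    pi = bits.count("1") / n
    # Prerequisite: the sequence must first look balanced, otherwise the test is
    # not applicable and a P-value of 0.0 is reported.
    if pi in (0.0, 1.0) or abs(pi - 0.5) >= 2.0 / math.sqrt(n):
        return 0.0
    # Total number of runs = 1 + number of positions where adjacent bits differ.
    v_obs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    numerator = abs(v_obs - 2.0 * n * pi * (1.0 - pi))
    denominator = 2.0 * math.sqrt(2.0 * n) * pi * (1.0 - pi)
    return math.erfc(numerator / denominator)


print(round(runs_p_value("1001101011"), 6))  # prints 0.147232
```

A sequence that flips on every bit (too many runs) and a sequence made of a few long stretches (too few runs) both produce small P-values.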
For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:
[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.
[2] Gibbons, Jean Dickinson, and Subhabrata Chakraborti. Nonparametric statistical inference. Springer Berlin Heidelberg, 2011.
[3] Godbole, Anant P., and Stavros G. Papastavridis. Runs and patterns in probability: Selected papers. Vol. 283. Springer Science & Business Media, 1994.
Test 04  Longest Runs (in a block) Test
As mentioned in the runs test (test 03), a run is any uninterrupted sequence of identical bits. The previous test looks at the number of runs and checks whether this is statistically significantly different from the expected number of runs from a true random binary sequence. The longest run (of ones in a block) test checks whether the longest run of 1 bits in M-bit blocks is statistically significantly longer (or shorter) than what would be expected from a true random binary sequence. Since a significant difference in the length of the longest run of 1's implies a significant difference in the length of the longest run of 0's, only one test is run.
In this context, this test measures whether or not the number of consecutive up days in a certain window of time is consistent with what would be produced if market returns were random and generated by a coin flip. A failure of this test may indicate the presence of either momentum or strong mean reversion. Momentum is a property of financial markets which makes the probability of the market having an up day (or down day), given a previous run of up days (or down days), higher. Mean reversion is the opposite: it is a property of financial markets which makes the probability of the market having an up day (or down day), given a previous run of up days (or down days), lower.
The Gist below contains a standalone Python method which realizes the longest runs test for randomness. It has been tested to 1,000,000 bits on the binary expansions of $\pi$, $e$, $\sqrt{2}$, and $\sqrt{3}$.
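The core of this test is a tally of the longest run of ones in each block. The sketch below, which is not taken from r4nd0m, handles only the smallest configuration in the NIST documentation [1] (8-bit blocks with four categories) and stops at the chi-square statistic; the P-value would then be obtained from the regularized upper incomplete gamma function with 3/2 as its first argument. The function names and the 32-bit sample input are assumptions.

```python
def longest_run_of_ones(block: str) -> int:
    """Length of the longest uninterrupted run of '1' characters in the block."""
    return max(len(run) for run in block.split("0"))


def longest_run_chi_square(bits: str):
    """Category counts and chi-square statistic for 8-bit blocks.

    Categories: longest run <= 1, == 2, == 3, >= 4, with the expected
    probabilities tabulated for 8-bit blocks in the NIST documentation.
    """
    block_size = 8
    probabilities = (0.2148, 0.3672, 0.2305, 0.1875)
    n_blocks = len(bits) // block_size
    counts = [0, 0, 0, 0]
    for i in range(n_blocks):
        block = bits[i * block_size:(i + 1) * block_size]
        longest = longest_run_of_ones(block)
        # Clamp the run length into the range 1..4 and use it as a category index.
        counts[min(max(longest, 1), 4) - 1] += 1
    chi_sq = sum((c - n_blocks * p) ** 2 / (n_blocks * p)
                 for c, p in zip(counts, probabilities))
    return counts, chi_sq


counts, chi_sq = longest_run_chi_square("11110000" "10101010" "11001100" "11100010")
print(counts, round(chi_sq, 4))
```

In market terms each block is a window of eight trading days, and the tally records the longest winning streak within each window.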
For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:
[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.
Test 05  Binary Matrix Rank Test
The Binary Matrix Rank test is quite interesting because it takes the binary sequence and transforms it into a sequence of matrices. The test then works by testing the linear dependence between those matrices. A set of vectors is said to be linearly dependent if one or more of the vectors in the set can be defined as a linear combination of the other vectors. This is related to the concept of dimension, in that if a linearly dependent matrix were added to the set it would not result in any increased dimensionality because it would fall within the span of the previous matrices. For more information on this concept check out the Khan Academy videos.
Relating the concept of linear dependence back to the financial markets context is quite difficult, but let me give it a shot anyway. Each matrix represents a window of days wherein each row of the matrix represents a consecutive stretch of days within that window. If the first matrix were linearly dependent on the next matrix, then we would be saying that the returns in the second window can be expressed as a function of the returns in the first window, meaning that whether the markets went up or down in the first window would determine whether or not they went up or down in the second. Again this can relate back to either mean-reversion or momentum, albeit in a more challenging way.
The Gist below contains a standalone Python method which realizes the binary matrix rank test for randomness. It has been tested to 1,000,000 bits on the binary expansions of $\pi$, $e$, $\sqrt{2}$, and $\sqrt{3}$.
The Gist below contains a standalone Python class for computing the binary rank of a matrix. It is very similar to the C implementation of the NIST suite and could probably be numpyfied with some help from my readers.
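To show what "binary rank" means in practice, here is a compact sketch that is deliberately different from the BinaryMatrix class: it stores each row as an integer and eliminates with XOR, which is addition over GF(2). The function names are assumptions, and the 20-bit input with 3 by 3 matrices is the worked example from the NIST documentation [1]. Only the ranks are computed here, not the final P-value.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a matrix given as a list of equal-length bit strings."""
    pivots = []
    for value in (int(row, 2) for row in rows):
        for pivot in pivots:
            # XOR with the pivot only if that clears the pivot's leading bit.
            value = min(value, value ^ pivot)
        if value:
            pivots.append(value)
    return len(pivots)


def matrix_ranks(bits: str, n_rows: int, n_cols: int):
    """Split the bit string into disjoint n_rows x n_cols matrices and rank each one."""
    size = n_rows * n_cols
    ranks = []
    for start in range(0, len(bits) - size + 1, size):
        chunk = bits[start:start + size]
        rows = [chunk[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
        ranks.append(gf2_rank(rows))
    return ranks


print(matrix_ranks("01011001001010101101", 3, 3))  # prints [2, 3]
```

The first matrix has two identical rows and therefore loses a dimension; the last two bits of the input do not fill a matrix and are discarded.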
For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:
[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.
[2] George Marsaglia, DIEHARD: a battery of tests of randomness.
[3] Kovalenko, I. N. "Distribution of the linear rank of a random matrix." Theory of Probability & Its Applications 17.2 (1973): 342-346.
[4] Marsaglia, George, and Liang-Huei Tsay. "Matrices and the structure of random number sequences." Linear algebra and its applications 67 (1985): 147-156.
Test 06  Discrete Fourier Transform Test
The Discrete Fourier Transform test, more widely known as the Spectral test, is one of the best known tests for random number generators. The spectral test was popularized by Donald Knuth and was used to identify some of the biggest shortcomings of the popular linear congruential family of pseudo random number generators. This test works by transforming the binary sequence from the time domain to the frequency domain using the discrete Fourier transform. Once this has been done the test looks for periodic features (repetitive patterns which are near to each other) whose presence would indicate a deviation from true randomness. The image below shows how this test can be used to expose the shortcomings of the aforementioned linear congruential generator family:
The Gist below contains a standalone Python method which realizes the discrete Fourier transform test for randomness. It has been tested to 1,000,000 bits on the binary expansions of $\pi$, $e$, $\sqrt{2}$, and $\sqrt{3}$.
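The mechanics of the test can be sketched without any third-party packages, at the cost of speed. The version below follows the steps in the NIST documentation [1] but uses a naive O(n²) discrete Fourier transform built on `cmath`, so it is only sensible for short inputs; a real implementation would use an FFT routine such as `numpy.fft.fft`. The function name `spectral_p_value` is an assumption and this is not the r4nd0m code.

```python
import cmath
import math


def spectral_p_value(bits: str) -> float:
    """Discrete Fourier transform (spectral) test P-value, naive O(n^2) version."""
    n = len(bits)
    x = [1.0 if b == "1" else -1.0 for b in bits]
    # Magnitudes of the first n/2 Fourier coefficients of the +1/-1 sequence.
    magnitudes = []
    for k in range(n // 2):
        s_k = sum(x_j * cmath.exp(-2j * math.pi * k * j / n)
                  for j, x_j in enumerate(x))
        magnitudes.append(abs(s_k))
    # Under randomness, 95% of the peaks should fall below this threshold.
    threshold = math.sqrt(math.log(1.0 / 0.05) * n)
    n_expected = 0.95 * n / 2.0
    n_observed = sum(1 for m in magnitudes if m < threshold)
    d = (n_observed - n_expected) / math.sqrt(n * 0.95 * 0.05 / 4.0)
    return math.erfc(abs(d) / math.sqrt(2.0))


print(spectral_p_value("1001010011"))
```

Too many peaks above the threshold suggest a periodic component, which in market terms would correspond to some form of cyclicality in the sequence of up and down days.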
For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:
[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.
[2] Knuth, D. "The Art of Computer Programming 1: Fundamental Algorithms 2: Seminumerical Algorithms 3: Sorting and Searching." (1968).
[3] Yang, Lijuan, Baihua Zhang, and Xuzhen Ye. "Fast Fourier transform and its applications." Opto-Electronic Engineering 31 (2004): 17.
[4] Kim, Song-Ju, Ken Umeno, and Akio Hasegawa. "Corrections of the NIST statistical test suite for randomness." arXiv preprint nlin/0401040 (2004).
Test 07  Nonoverlapping Patterns Test
Unlike the previous two tests, the non-overlapping and overlapping patterns tests (tests 07 and 08) are highly applicable to quantitative finance and, in particular, to the development of trading strategies. The non-overlapping patterns test concerns itself with the frequency of prespecified patterns in the binary string. If this frequency differs significantly from what is expected of a true random sequence, this would indicate that the sequence is non-random and that the pattern we are testing for occurs too often or too rarely.
For example, let's assume that we believed the market was more likely than a true random sequence to exhibit the following pattern, 101100011111 ..., namely the Fibonacci pattern. We would then use this test and the next test to either accept or reject our hypothesis at a given level of confidence. Fantastic! The reason why I like these two tests is that they give us quantitative analysts a solid theoretical way of testing the significance of presupposed patterns in market returns. These tests can be applied to concepts such as the Elliott Wave Theory and the Fractal Market Hypothesis. I'm not saying I believe in these things; what I'm saying is that we can test for them.
The Gist below contains a standalone Python method which realizes the non-overlapping patterns test for randomness. It has been tested to 1,000,000 bits on the binary expansions of π, e, √2, and √3.
For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:
[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.
[2] Barbour, Andrew D., Lars Holst, and Svante Janson. Poisson approximation. Oxford: Clarendon Press, 1992.
Test 08 - Overlapping Patterns Test
The difference between the overlapping patterns test and the non-overlapping patterns test, apart from some of the calculations (as can be seen in the NIST suite documentation), is the way that the patterns are matched. In the non-overlapping patterns test, once a pattern of length m has been matched you skip to the end of the pattern and continue searching, whereas with the overlapping patterns test you continue searching from the next bit. For more of my thoughts on this test, please see the previous test, the non-overlapping patterns test.
The Gist below contains a standalone Python method which realizes the overlapping patterns test for randomness. It has been tested to 1,000,000 bits on the binary expansions of π, e, √2, and √3.
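The difference between the two matching rules is easiest to see on a tiny example. The sketch below only illustrates the counting step; the function names are hypothetical, and the actual NIST tests go further by counting matches within blocks and comparing the counts against their expected distribution.

```python
def count_non_overlapping(bits, pattern):
    """Count matches, jumping past each match before searching again."""
    count, i, m = 0, 0, len(pattern)
    while i <= len(bits) - m:
        if bits[i:i + m] == pattern:
            count += 1
            i += m  # skip to the end of the matched pattern
        else:
            i += 1
    return count


def count_overlapping(bits, pattern):
    """Count matches, always sliding the window forward by one bit."""
    m = len(pattern)
    return sum(1 for i in range(len(bits) - m + 1) if bits[i:i + m] == pattern)


bits = "1111011110"
print(count_non_overlapping(bits, "11"))  # 4
print(count_overlapping(bits, "11"))      # 6
```

The same sequence and pattern give different counts under the two rules, which is why the two tests need different reference distributions.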
For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:
[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.
[2] Chrysaphinou, O., and S. Papastavridis. "A limit theorem on the number of overlapping appearances of a pattern in a sequence of independent trials." Probability Theory and Related Fields 79.1 (1988): 129-143.
[3] Hamano, Kenji, and Toshinobu Kaneko. "Correction of overlapping template matching test included in NIST randomness test suite." IEICE transactions on fundamentals of electronics, communications and computer sciences 90.9 (2007): 1788-1792.
Test 09 - Universal Test
Maurer's universal statistical test is an interesting statistical test which checks whether the distance (measured in bits) between repeating patterns is as would be expected from a uniform random sequence. The theoretical idea behind this test and the linear complexity test is that random sequences are not supposed to be compressible. To explain this concept, consider the fact that most pseudorandom number generators have a period and when that period is exhausted the bits in the sequence begin to repeat. When this happens, the distance between matching patterns would decrease and eventually the universal test would deem the sequence non-random. Unfortunately, the test requires a very large amount of data to be statistically meaningful. In fact, working with daily returns data we would need about 1,500 years' worth of daily data before we could apply this test!
The Gist below contains a standalone Python method which realizes the universal test for randomness. It has been tested to 1,000,000 bits on the binary expansions of π, e, √2, and √3.
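The 1,500 year figure can be sanity-checked with some back-of-the-envelope arithmetic. The NIST documentation lists 387,840 bits as the smallest recommended input for the universal test (block length L = 6). The other two inputs are assumptions made for this illustration: one bit per daily return and roughly 252 trading days per year.

```python
# Smallest input size the NIST documentation recommends for the universal test (L = 6).
MIN_BITS_L6 = 387_840
# Assumption: one bit per daily return and about 252 trading days in a year.
TRADING_DAYS_PER_YEAR = 252

years_needed = MIN_BITS_L6 / TRADING_DAYS_PER_YEAR
print(round(years_needed))  # 1539
```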
For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:
[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.
[2] Ziv, Jacob, and Abraham Lempel. "A universal algorithm for sequential data compression." IEEE Transactions on information theory 23.3 (1977): 337-343.
[3] Maurer, Ueli M. "A universal statistical test for random bit generators." Advances in Cryptology-CRYPTO '90. Springer Berlin Heidelberg, 1991. 409-420.
[4] Coron, Jean-Sébastien, and David Naccache. "An accurate evaluation of Maurer's universal test."
[5] Gustafson, Helen, et al. "A computer package for measuring the strength of encryption algorithms." Computers & Security 13.8 (1994): 687-697.
[6] Ziv, J. "Compression, tests for randomness and estimating the statistical model of an individual sequence." Sequences. Springer New York, 1990.
Test 10 - Linear Complexity Test
The linear complexity test is one of the most interesting tests in the NIST suite from a pure Computer Science point of view. Like the universal test, the linear complexity test is concerned with the compressibility of the binary sequence. The linear complexity test aims to check this by approximating a linear feedback shift register (LFSR) for the binary sequence. A shift register is a cascade of flip-flops which can be used to store state information. An LFSR is a shift register whose input bit is a linear function of the previous state.
The linear complexity test works as follows. Firstly, the shortest LFSR is approximated using the Berlekamp–Massey algorithm and then the length of that LFSR is compared against what would be expected from a true uniform random sequence. The thinking here is that if the LFSR is significantly shorter than expected, then the sequence is compressible and therefore non-random. The image below shows an LFSR at work.
The Gist below contains a standalone Python method which realizes the linear complexity test for randomness. It has been tested to 1,000,000 bits on the binary expansions of π, e, √2, and √3.
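To make the first step more concrete, here is a minimal sketch of the Berlekamp–Massey algorithm over binary sequences. It is an illustration only (the function name `linear_complexity` is hypothetical and this is not the r4nd0m code), and it covers just the LFSR length; the full test then compares the lengths found for many blocks against their expected distribution.

```python
def linear_complexity(bits):
    """Length of the shortest LFSR that generates `bits` (a '0'/'1' string)."""
    s = [int(b) for b in bits]
    n = len(s)
    conn = [0] * n  # current connection polynomial
    prev = [0] * n  # connection polynomial before the last length change
    conn[0] = prev[0] = 1
    length, last_change = 0, -1
    for i in range(n):
        # Discrepancy between the bit the LFSR predicts and the actual bit.
        d = s[i]
        for j in range(1, length + 1):
            d ^= conn[j] & s[i - j]
        if d == 1:
            saved = conn[:]
            shift = i - last_change
            # Correct the polynomial using the shifted older polynomial.
            for j in range(n - shift):
                conn[j + shift] ^= prev[j]
            if length <= i // 2:
                length = i + 1 - length
                last_change = i
                prev = saved
    return length


# Each bit from the fifth onwards is the XOR of the bits 4 and 3 places before it.
print(linear_complexity("1101011110001"))  # 4
```

Thirteen bits that can be regenerated by a register of length 4 are highly compressible, which is exactly the kind of structure the test is looking for.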
For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:
[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.
[2] Gustafson, Helen, et al. "A computer package for measuring the strength of encryption algorithms." Computers & Security 13.8 (1994): 687-697.
Test 11 - Serial Test
The serial test is similar to the overlapping patterns test, except that instead of testing for one m-bit pattern it considers every possible m-bit pattern (of which there are 2^m) and computes their frequencies. The idea behind this test is that random sequences exhibit uniformity, meaning that each pattern should appear approximately as many times as every other pattern. If this is not the case, and some of the patterns appear significantly too few or too many times, then the sequence is deemed non-random. In this context we could use the test to identify those patterns which appear "too frequently" and bet on them occurring again.
The Gist below contains a standalone Python method which realizes the serial test for randomness. It has been tested to 1,000,000 bits on the binary expansions of π, e, √2, and √3.
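As a rough illustration of the calculation, the sketch below counts every overlapping m-bit pattern (wrapping around the end of the sequence) and turns the counts into the two P-values described in the NIST documentation. The function names are hypothetical, this is not the r4nd0m code, and to stay within the standard library the incomplete gamma function is only implemented for integer shape parameters, which restricts the sketch to m >= 3.

```python
import math
from itertools import product


def psi_squared(bits, m):
    """Psi-squared statistic over all overlapping m-bit patterns."""
    n = len(bits)
    padded = bits + bits[:m - 1]  # wrap around the end of the sequence
    counts = {"".join(p): 0 for p in product("01", repeat=m)}
    for i in range(n):
        counts[padded[i:i + m]] += 1
    return (2 ** m / n) * sum(c * c for c in counts.values()) - n


def igamc_int(a, x):
    """Upper regularised incomplete gamma function for a positive integer a."""
    return math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(a))


def serial_test(bits, m):
    """Return the two P-values of the serial test (this sketch needs m >= 3)."""
    psi_m, psi_m1, psi_m2 = (psi_squared(bits, m - j) for j in range(3))
    delta1 = psi_m - psi_m1
    delta2 = psi_m - 2 * psi_m1 + psi_m2
    p1 = igamc_int(2 ** (m - 2), delta1 / 2)
    p2 = igamc_int(2 ** (m - 3), delta2 / 2)
    return p1, p2


print(serial_test("0011011101", 3))
```

The `counts` dictionary inside `psi_squared` is also the natural place to look if you want to see which specific patterns are over-represented.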
For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:
[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.
[2] Good, I. J. "The serial test for sampling numbers and other tests for randomness." Mathematical Proceedings of the Cambridge Philosophical Society. Vol. 49. No. 02. Cambridge University Press, 1953.
[3] Knuth, D. "The Art of Computer Programming 1: Fundamental Algorithms 2: Seminumerical Algorithms 3: Sorting and Searching." (1968).
Test 12 - Approximate Entropy
Generally speaking, approximate entropy is a statistical technique used to determine how unpredictable (random) the fluctuations in time-series data are. The approximate entropy test is similar to the serial test (test 11) in that it also focusses on the frequency of all possible overlapping m-bit patterns across the binary sequence. The difference is that this test compares the frequency of overlapping blocks of two consecutive lengths (m and m+1) against the expected result for a random sequence. To be more specific, the frequencies of all possible m-bit and (m+1)-bit patterns are computed and the chi-squared statistic is then used to determine whether or not the difference between these two observations implies that the sequence is non-random.
The Gist below contains a standalone Python method which realizes the approximate entropy test for randomness. It has been tested to 1,000,000 bits on the binary expansions of π, e, √2, and √3.
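The calculation can be sketched in a few lines. This is an illustrative version following the formulas in the NIST documentation, not the r4nd0m code; the function names are hypothetical and the incomplete gamma helper only handles the integer shape parameters that arise here.

```python
import math
from collections import Counter


def phi(bits, m):
    """Sum of f * ln(f) over the relative frequencies f of overlapping m-bit blocks."""
    n = len(bits)
    padded = bits + bits[:m - 1]  # wrap around the end of the sequence
    counts = Counter(padded[i:i + m] for i in range(n))
    return sum((c / n) * math.log(c / n) for c in counts.values())


def igamc_int(a, x):
    """Upper regularised incomplete gamma function for a positive integer a."""
    return math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(a))


def approximate_entropy_test(bits, m):
    """Return (approximate entropy, P-value) for block length m >= 1."""
    n = len(bits)
    ap_en = phi(bits, m) - phi(bits, m + 1)
    # A random sequence has approximate entropy close to ln(2).
    chi_squared = 2 * n * (math.log(2) - ap_en)
    return ap_en, igamc_int(2 ** (m - 1), chi_squared / 2)


print(approximate_entropy_test("0100110101", 3))
```

The further the approximate entropy falls below ln(2), the more regular the sequence is and the smaller the P-value becomes.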
For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:
[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.
[2] Pincus, Steve, and Burton H. Singer. "Randomness and degrees of irregularity." Proceedings of the National Academy of Sciences 93.5 (1996): 2083-2088.
[3] Pincus, Steve, and Rudolf E. Kalman. "Not all (possibly) “random” sequences are created equal." Proceedings of the National Academy of Sciences 94.8 (1997): 3513-3518.
[4] Rukhin, Andrew L. "Approximate entropy for testing randomness." Journal of Applied Probability 37.1 (2000): 88-100.
Test 13 - Cumulative Sums
The next three tests (the cumulative sums test, random excursions test, and random excursions variant test) are my personal favourites from the NIST suite because they deal directly with the concept of a random walk. As some of you will know, random walks (stochastic processes) are a topic which I have written about previously and plan to continue writing about over the next few years as I develop the Mzansi stochastic processes R package.
The cumulative sums test turns the supposedly random binary sequence into a random walk by replacing each 0 bit with -1 and each 1 bit with +1 and then calculating the cumulative sum at each point along the sequence. The test then checks whether the absolute maximum cumulative sum at any point along this sequence is within the range of what would be expected from a true uniform random sequence. This process is illustrated below,
Relating this idea back to the markets, if the absolute peaks (which may represent the tops of bull markets or the bottoms of bear markets) are significantly higher or lower than what would be expected, then this could indicate the presence of various market phenomena including bull and bear markets, cycles, mean-reversion, or momentum.
The Gist below contains a standalone Python method which realizes the cumulative sums test for randomness. It has been tested to 1,000,000 bits on the binary expansions of π, e, √2, and √3.
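Here is a minimal sketch of the forward version of the test, following the P-value formula in the NIST documentation. It is an illustration rather than the r4nd0m code: the function names are hypothetical, and truncating the summation limits toward zero is an assumption on my part about how the limits should be rounded.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def cumulative_sums_test(bits):
    """Forward cumulative sums test; returns (z, P-value)."""
    n = len(bits)
    # Build the random walk: 0 -> -1, 1 -> +1, then accumulate.
    walk, position = [], 0
    for b in bits:
        position += 1 if b == "1" else -1
        walk.append(position)
    z = max(abs(s) for s in walk)  # largest excursion from zero
    root_n = math.sqrt(n)
    # Summation limits are truncated toward zero (an assumption).
    total1 = sum(
        normal_cdf((4 * k + 1) * z / root_n) - normal_cdf((4 * k - 1) * z / root_n)
        for k in range(int((-n / z + 1) / 4), int((n / z - 1) / 4) + 1)
    )
    total2 = sum(
        normal_cdf((4 * k + 3) * z / root_n) - normal_cdf((4 * k + 1) * z / root_n)
        for k in range(int((-n / z - 3) / 4), int((n / z - 1) / 4) + 1)
    )
    return z, 1.0 - total1 + total2


print(cumulative_sums_test("1011010111"))
```

The NIST suite also runs the same calculation backwards (starting from the last bit), which can be done here by passing in the reversed string.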
For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:
[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.
[2] Spitzer, Frank. Principles of random walk. Vol. 34. Springer Science & Business Media, 2013.
[3] Révész, Pál. Random walk in random and nonrandom environments. Singapore: World Scientific, 2005.
Test 14 - Random Excursions
As with the cumulative sums test, the random excursions test deals with the concept of a random walk. The difference, however, is that the random excursions test introduces the concepts of cycles and states. A cycle is any subsequence of the walk which starts and ends at 0, and a state is the level of the random walk at each step. The states considered in this test are -4, -3, ..., +3, +4. The test determines whether or not the number of times each state is visited in each cycle is as would be expected from a uniform random binary sequence. This is illustrated below,
The Gist below contains a standalone Python method which realizes the random excursions test for randomness. It has been tested to 1,000,000 bits on the binary expansions of π, e, √2, and √3.
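The cycle and state bookkeeping is the part of this test which tends to cause confusion, so here is a minimal sketch of just that step. The function name is hypothetical, this is not the r4nd0m code, and the final chi-squared comparison of the visit counts against their expected distribution is left out.

```python
def excursion_visits(bits, states=(-4, -3, -2, -1, 1, 2, 3, 4)):
    """Split the random walk into cycles and count visits to each state per cycle."""
    position = 0
    cycles = []
    current = dict.fromkeys(states, 0)
    for b in bits:
        position += 1 if b == "1" else -1
        if position == 0:
            # The walk is back at zero, so the current cycle is complete.
            cycles.append(current)
            current = dict.fromkeys(states, 0)
        elif position in current:
            current[position] += 1
    # The walk is treated as returning to zero at the end, closing any open cycle.
    if position != 0:
        cycles.append(current)
    return cycles


cycles = excursion_visits("0110110101")
print(len(cycles))               # 3
print([c[1] for c in cycles])    # [0, 1, 3]
```

For this ten bit sequence the walk is -1, 0, 1, 0, 1, 2, 1, 2, 1, 2, which gives three cycles, and state +1 is visited zero, one, and three times in them respectively.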
For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:
[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.
[2] Baron, Michael, and Andrew L. Rukhin. "Distribution of the number of visits of a random walk." Stochastic Models 15.3 (1999): 593-597.
[3] Spitzer, Frank. Principles of random walk. Vol. 34. Springer Science & Business Media, 2013.
[4] Révész, Pál. Random walk in random and nonrandom environments. Singapore: World Scientific, 2005.
Test 15 - Random Excursions Variant
The random excursions variant test, as the name suggests, is a variant of the random excursions test which differs in that it does not partition the random walk, and it computes the total number of times the random walk visits a wider range of states extending from -9, -8, ... to ..., +8, +9. Whereas the previous test returns 8 P-values (one for each state), the random excursions variant test returns 18 P-values.
The Gist below contains a standalone Python method which realizes the random excursions variant test for randomness. It has been tested to 1,000,000 bits on the binary expansions of π, e, √2, and √3.
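Because this variant only needs the total number of visits to each state, it is short enough to sketch in full. The version below follows the P-value formula in the NIST documentation; the function name is hypothetical and this is not the r4nd0m code.

```python
import math


def random_excursions_variant(bits):
    """Return (number of cycles, {state: P-value}) for states -9..-1 and +1..+9."""
    position, visits = 0, {}
    for b in bits:
        position += 1 if b == "1" else -1
        visits[position] = visits.get(position, 0) + 1
    # One cycle per return to zero, plus one if the walk ends away from zero.
    cycles = visits.get(0, 0) + (1 if position != 0 else 0)
    p_values = {}
    for state in range(-9, 10):
        if state == 0:
            continue
        total_visits = visits.get(state, 0)
        spread = math.sqrt(2 * cycles * (4 * abs(state) - 2))
        p_values[state] = math.erfc(abs(total_visits - cycles) / spread)
    return cycles, p_values


cycles, p_values = random_excursions_variant("0110110101")
print(cycles, len(p_values))  # 3 18
```

Each state gets its own P-value, so a sequence can look perfectly random around zero and still fail for the states further out.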
For more information about this test, including a fantastic mathematical description of how to calculate it, worked examples, and recommendations on minimum input sizes please see the following references:
[1] Rukhin, Andrew, et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Booz-Allen and Hamilton Inc Mclean Va, 2001.
[2] Baron, Michael, and Andrew L. Rukhin. "Distribution of the number of visits of a random walk." Stochastic Models 15.3 (1999): 593-597.
[3] Spitzer, Frank. Principles of random walk. Vol. 34. Springer Science & Business Media, 2013.
[4] Révész, Pál. Random walk in random and nonrandom environments. Singapore: World Scientific, 2005.
Walsh–Hadamard Transform
Another interesting test not covered by the NIST suite is the Walsh–Hadamard Transform. This test for randomness was written about previously by friend and fellow quant, Pawel Lachowicz, over at QuantAtRisk.com. For more information on this test please see his posts (below) and upcoming book, Python for Quants.
 Walsh–Hadamard Transform and Tests for Randomness of Financial Return-Series
 Fast Walsh–Hadamard Transform in Python
Hacking the Market
At this point in this article we have accomplished two things. Firstly, we have touched on the theory of randomness and the random walk hypothesis and, secondly, we have introduced a set of fifteen cryptographic tests which seek to identify nonrandomness in binary sequences. These tests look either for biases, runs, patterns, or some combination thereof. Put simply, each one of these characteristics improves the predictability of the sequence and reduces the randomness of the sequence. In information security terms this makes the random number generator being tested less cryptographically secure; and in quantitative finance terms this makes the market more attractive. The final part of this article will see whether or not our tests can do better than humans at picking up random versus nonrandom sequences and hopefully conclude whether or not markets are really random.
Experiment Setup and Markets Considered
Whilst many considerations went into the experiments I have run for this blog post, no experiment is perfect, and each leaves a margin for incorrect conclusions to be drawn. These issues are discussed in the next section.
Firstly, here are the stock exchange indices I looked at:
 The DJIA Index from 1900 to 2015 (daily) ~ 115 Years
 The S&P 500 Index from 1950 to 2015 (daily) ~ 65 Years
 The Hang Seng Index from 1970 to 2015 (daily) ~ 45 Years
 The Shanghai Composite Index from 1990 to 2015 (daily) ~ 25 Years
And in addition to these I looked at some other interesting asset classes:
 Gold Prices in US Dollars from 1970 to 2015 (daily) ~ 45 Years
 The Currency Exchange Rates of USD vs GBP from 1990 to 2015 (daily) ~ 25 Years
 Europe Brent Crude Oil Spot Price FOB from 1990 to 2015 (daily) ~ 25 Years
Secondly, the NIST test suite works on samples of the original data. As such, the returns are discretized and then sliced up into windows. The window lengths I considered were 3, 5, 7, and 10 years.
Thirdly, there are two ways to produce the windows, namely as overlapping windows or as non-overlapping windows. Whilst the former is better because it shows the walk-forward randomness of the market, it also affects the quality of the aggregate P-values computed because the windows are not independent. The general procedure used to produce the windows is shown below and can also be found in the code:
 Download the data from Quandl.com starting on 01-01-YYYY and ending on 01-01-2015
 Use discretization to turn the returns data into a binary sequence
 Calculate the number of binary digits in the sequence
 Determine the number of years in the sequence, e.g. 45 years
 Specify the length of each window in years
 Determine how many binary digits are in each window
 Calculate the number of samples, which differs between overlapping samples and independent samples
 Partition the binary sequence into samples
 Apply each test (40 tests) to each sample, and calculate the P-value
 Determine which samples passed. If the P-value < 0.005 (99.5% confidence) then the sample is deemed non-random
 If the percentage of samples which passed is greater than or equal to 90% then pass the test **
 Calculate the aggregate P-value over all samples using a chi-square test (this isn't used)
 Calculate a "score" for each data set for each window. The score equals the percentage of tests which passed.
** Note that the NIST documentation recommends that 96% of all samples pass in order to pass the test, not 90%. I lowered this requirement (meaning it was easier for the market to look random) because of the relatively low number of samples producible when using the nonoverlapping sampling method on the shorter data sets.
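The windowing and scoring steps above can be sketched as follows. This is an illustration rather than the code in the repository: the function names are hypothetical, and I am assuming that overlapping windows advance one year at a time and that every year contributes the same number of bits.

```python
def make_windows(bits, years, window_years, overlapping):
    """Slice a bit string covering `years` years into windows of `window_years` years."""
    bits_per_year = len(bits) // years
    window_bits = bits_per_year * window_years
    # Assumption: overlapping windows advance by one year, independent ones by a full window.
    step = bits_per_year if overlapping else window_bits
    return [
        bits[start:start + window_bits]
        for start in range(0, len(bits) - window_bits + 1, step)
    ]


def passes(p_values, alpha=0.005, required=0.90):
    """A test passes if at least `required` of its samples have P-value >= alpha."""
    passed = sum(1 for p in p_values if p >= alpha)
    return passed / len(p_values) >= required


def score(p_values_by_test):
    """Percentage of tests which passed for one data set and window length."""
    results = [passes(p_values) for p_values in p_values_by_test.values()]
    return 100.0 * sum(results) / len(results)


bits = "01" * 1260  # ten hypothetical years of 252 daily bits
print(len(make_windows(bits, 10, 5, overlapping=False)))  # 2
print(len(make_windows(bits, 10, 5, overlapping=True)))   # 6
```

The example shows why the independent sampling method runs out of samples so quickly on the shorter data sets: ten years of data yields only two independent five year windows but six overlapping ones.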
Last, but not least, I included two simulated datasets for comparison. The first dataset consists of binary numbers produced by applying the same discretization strategy to the output of the NumPy Mersenne Twister algorithm. The Mersenne Twister is one of the best pseudorandom number generators out there but it is still not cryptographically secure! The second dataset consists of binary numbers produced from the SIN function. This really exemplifies how a truly predictable, non-random sequence would perform. These are just benchmarks.
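To give a feel for what these two benchmarks look like, here is a rough sketch using only the standard library. Python's `random` module is also a Mersenne Twister, so it stands in for NumPy here; the discretization rule (1 for a positive return, 0 otherwise), the use of Gaussian "returns", and the period of the sine wave are all assumptions made for this illustration.

```python
import math
import random


def discretize(returns):
    """Assumed encoding: 1 for a positive return, 0 otherwise."""
    return "".join("1" if r > 0 else "0" for r in returns)


def mersenne_twister_bits(n, seed=0):
    """Benchmark 1: discretized pseudorandom 'returns'."""
    rng = random.Random(seed)  # CPython's random module is a Mersenne Twister
    return discretize(rng.gauss(0.0, 1.0) for _ in range(n))


def sine_bits(n, period=64):
    """Benchmark 2: discretized 'returns' of a sine wave (hypothetical period)."""
    values = [math.sin(2 * math.pi * t / period) for t in range(n + 1)]
    return discretize(b - a for a, b in zip(values, values[1:]))


print(mersenne_twister_bits(32))
print(sine_bits(64))
```

The sine benchmark produces long unbroken runs of ones and zeros, which is why it fails so many of the tests.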
The results section just includes a tabulated summary of the results for each data set using the two sampling methods. For more detailed outputs, open up the code and play around with it. The output when running the tests in PyCharm (my preferred Python IDE) looks like this (warning, spoiler alert!!).
Each number represents a P-value. Red P-values are below the 0.005 threshold, and green values are above it. Each column in this image represents one window of approximately 5 years. The first column after the test name indicates whether the test has passed (PASS!) or failed (FAIL!). The number of tests passed tells us how random a sequence is, whereas the types of test failed tell us about potential biases in those sequences.
As can be seen below almost all of the samples passed all of the tests and the conclusion for each test was a pass because > 90% of the samples passed each test.
As can be seen below almost all of the samples failed all of the tests and the conclusion for many of the tests was a fail because < 90% of the samples passed each test.
As can be seen below quite a few of the samples failed some of the tests and the conclusion for some of the tests was a fail because < 90% of the samples passed each test.
Problems with these experiments
As I mentioned previously, no experiment is without its flaws. The best we can do as researchers is be transparent, open-source, and honest. So here are some of the problems I have with my own research:
 Some of these tests require a lot more data than the market has produced (unless minute or tick data were used), meaning that their statistical significance is less than ideal. More data should be used! I wish I had more data. If somebody out there has tick data, please further my research and get in touch!
 The NIST suite tests only for uniform randomness; failing it does not mean the markets cannot be normally, or otherwise, distributed and still random. This is to some extent true and for die-hard efficient market and random walk hypothesis junkies this argument will certainly help you sleep tonight ;). Personally, I think that these hypotheses are ideological and not real at all. Markets exhibit patterns which these tests are identifying. The simplest of such patterns is a general propensity to go up rather than down.
 The time periods (starting on the 1st of January of each year) and significance level (0.005) were chosen arbitrarily. The tests should be applied over a much more robust set of samples which start every month, or quarter (not year) into the future. The P-value threshold didn't have too much impact on the conclusions because whether set to 0.001, 0.005, or 0.05, the market still failed some of the tests during interesting periods e.g. 1954 - 1959.
Now that that is out of the way, let's wrap this up with a summary of results, some conclusions, a list of further research efforts which I would like to see come out of this little project, and my personal news.
Summary of Results
Results obtained when using non-overlapping samples
The table below shows the scores computed for each data set, against the scores obtained by the two benchmarks. When the number of samples dropped below 5, no results were recorded.
Results obtained when using overlapping samples
The table below shows the scores computed for each data set, against the scores obtained by the two benchmarks.
Observations made from the tables
 The scores for the data sets lie between the scores of the two benchmarks, meaning that markets are less random than a Mersenne Twister and more random than a SIN function, but still not random.
 The scores for the data sets vary quite a lot with,
 Dimension - the size of the window has a big impact in some cases, and
 Uniqueness - markets are not equally random, some appear more random than others.
 The scores for the benchmarks are consistently good for the Mersenne Twister (>90% of tests passed on average) and poor for the SIN graph (10 - 30% of tests passed on average).
Conclusions and Further Research
At the beginning of this article I recounted a story of how Professor Burton Malkiel (the author of one of my all-time favourite books, A Random Walk Down Wall Street) presented a random walk constructed from successive coin flips to an unlucky chartist. When the chartist responded that the "stock" was a good buy, Professor Malkiel likened the stock market to a coin-flipping contest and advocated a passive buy-and-hold investment strategy. Whilst I have the utmost respect for the professor, I believe that this conclusion was erroneous because all his test really tells us is that, in the eyes of a chartist, there is no distinction between a coin-flipping contest and the market. However, that does not imply that to the eyes, or rather the algorithms, of a Quantitative Trader there is no distinction between a coin-flipping contest and the market. In this article I have shown that whilst I may not personally (with my own two eyes) be able to tell the difference, the NIST suite of cryptographic tests for randomness sure as hell can. This is because markets are, quite simply, not random.
Additionally two slightly more sophisticated conclusions from these results can be drawn:
 Assuming randomness is not binary, one could conclude that not all markets are made equally "random". Some of the markets, namely the foreign exchange rate between the USD and GBP currencies and the S&P 500 Index, exhibit much lower levels of randomness than others such as the Hang Seng Index.
 The apparent randomness of the markets, unlike that of a strong pseudorandom number generator, appears to be affected by the time dimension. In other words, certain window sizes cause markets to appear less random. This may indicate the presence of cyclical non-random behaviours in the markets e.g. regimes.
Furthermore, the theoretical discussion and commentary has highlighted some major flaws in the random walk hypothesis; most notably that randomness is relative and in the presence of new or additional information (e.g. fundamental or economic data) the apparent randomness of the market may break down.
Another positive point about these conclusions is that they are consistent with the empirical evidence of individuals and firms who have been able to consistently beat the market over decades, a feat which would not be possible if markets were in fact random walks. On that high note, allow me to end off this article with a link to a much better, anecdotal argument by Warren Buffett about the supposed coin-flipping contest he has been winning.
A Personal Update
As some of you will already know, I have been awarded a fantastic opportunity to work as a Quantitative Strategist at a young Hedge Fund called NMRQL which focusses on applying cutting-edge quantitative and machine learning models to the markets. The fund is being run by Thomas Schlebusch (ex CIO of Sanlam International Investments) and is chaired by Michael Jordaan (ex CEO of First National Bank). I am lucky to be their first code-monkey / quant. Please get in touch with Thomas or myself should you wish to invest in the fund; the minimum investment amount is R1m (one million rand). I also now live in Stellenbosch in the heart of the Western Cape winelands.
Software Disclaimer
This software was developed by Stuart Reid. Stuart Reid assumes no responsibility whatsoever for its use by other parties, and makes no guarantees, expressed or implied, about its quality, reliability, or any other characteristic. Usage of this implementation should acknowledge this repository, and the original NIST implementation.
nice work. I'm officially impressed.

Outstanding study with elegant presentation!
Helped me a lot. Thanks.

This was a great article! I really enjoyed reading it. There was tons of stuff that went right over my head, but I love having exposure to such information! I will definitely come back for more.

Geez... wow. I am awed. Kudos, Stuart, kudos! I will be reading this again and again, and I see your code as incredibly interesting and useful in the study of this and other processes.
Looking forward to your book. If you aren't writing one, you should be.

Hello, good article.
I am personally not a believer of global randomness, so it's no surprise that the percentage of passed tests decreased as the time window increased. You can take several markets and notice by eye the long-term, non-random trends.
I think it would be interesting to test the same experiment in very small time windows, which would also be interesting to high frequency traders. Take millisecond tick data for 1 year and make your windows on the order of seconds, minutes, hours, etc.
Thank you!

This is top quality work. Congratulations and good luck with your new job.
I have done some work with runs tests and I have noticed that results depend on timeframe and lookback period. For example, with a lookback period of n bars, in daily timeframe the result may be random state but in weekly timeframe it may be a trend. The fractal nature of markets allows for randomness inside trend and vice versa.
Also, for runs tests, I have noticed that as the lookback period increases, the results show trend no matter what. For daily data in S&P 500 and for lookback period greater than 100, the p-value goes close to 0 and stays there indicating trend.
Do you have similar experience with these tests?
Best.

I'm interested in the relationship between liquidity and entropy.
Just yesterday, WSJ featured a chart showing how the ask-bid spread peaks at the market open and takes about a half-hour to decay to the usually steady state of high liquidity throughout the day. I think the data points you've examined differ in their information content. Some are a more accurate measure of "price" than others. What I'd expect is to find the highest liquidity market samples seem the most locally random. The question then is whether, at their points of least liquidity, they become inefficient enough to show a "hackable signal".
You miss one very important and critical thing. Market randomness is different from other types of randomness. Why? because it is adaptable. As soon as a "weakness" appears it is immediately exploited. You mention the Malkiel book, but you did not get that part. He points out a large number of systems that worked until they were discovered and published. So this is not exactly like tossing a fair coin. The current market is "more random than random" so it is not unexpected that any tests will demonstrate it is not random. [I would be really surprised if they would, actually, but that's another story]
The market is random as it is not predictable [the basic definition of randomness]. If it was predictable [thus not random] it would very soon adapt so that it would be again unpredictable. This metarandomness has to fail all randomness tests as it is a characteristic of a different type.

Hi Stuart! Congratulations on your new job. It sounds like a very exciting opportunity. Great article - very well written, and interesting conclusions. I will be sharing this with my quant friends.

I mentioned Reid's fascinating posting in my own "Technically Speaking" blog on the Newsmax Insiders Page: http://www.newsmax.com/richardgrigonis/malkielrandomsequencewallstreet/2016/01/05/id/708263/

Excellent post! I hope I find myself the time to work on your code and read some of the papers.

I believe it would be better to use a random market model for comparison, instead of the Mersenne Twister output. For example, you could include a row in your final tables for data produced by the Heston stochastic volatility model of the same length as the market data. You can even tune/calibrate the model so that your randomness tests are very specifically answering your original question, that is "Can these randomness tests differentiate real and random data even when a human cannot?".
I believe that would be a more accurate demonstration of your point, otherwise you're sort of comparing apples to oranges. Great post though, I enjoyed reading!