# Stock Market Prices Do Not Follow Random Walks

Because volatility seems to cluster in real life as well as in the markets, it has been a while since my last article. Sorry about that. Today we will be taking our first giant leap along A Non-Random Walk down Wall Street.

### The Non-Random Walk Series

A Non-Random Walk Down Wall Street is the cheeky title of an academically challenging textbook written by Lo and MacKinlay in response to the best-selling Wall Street classic, A Random Walk Down Wall Street, written by Professor Burton Malkiel. A Non-Random Walk Down Wall Street is a collection of papers which challenge the prevailing random walk hypothesis. Despite its somewhat dated results and mathematically unforgiving style, it's an impressive textbook which has inspired me to write a series of articles about it.

This series of articles has the following goals:

1. Bolster or invalidate my original findings using the NIST test suite;
2. Translate Lo and MacKinlay's papers and tests into more intuitive terms;
3. Extend the results to the present day to determine whether they are still relevant;
4. Extend the results to emerging markets, with a strong focus on South Africa; and
5. Bridge the theoretical and practical gap between machine learning and market randomness.

Whew!

This series is also inspired by many of the thoughtful comments I received after I published my post about the random walk hypothesis, Hacking The Random Walk Hypothesis. So please keep the comments coming.

P.S. Simply because market randomness tests don't exactly make for great WordPress featured images, the featured images for this series of articles will be screenshots from all of my favourite Wall Street inspired films. This article's featured image is from the most recent Wall Street inspired film to hit the box office: The Big Short. If you haven't seen it yet, do yourself a big favour and go watch it. Afterwards, if you want more information you can always suffer through an earlier, non-technical article of mine: A Recipe for the 2008 Financial Crisis.

### Article Outline

This series will début with Lo and MacKinlay's first paper: Stock Markets Do Not Follow Random Walks: Evidence from a Simple Specification Test. In this paper Lo and MacKinlay exploited the fact that, under a Geometric Brownian Motion model with stochastic volatility, variance estimates are linear in the sampling interval to devise a statistical test for the random walk hypothesis. This post covers the theory and application of this test.

This post is broken up into the following sections:

1. Efficiency, The Markov Property, and Random Walks
2. Variants of the Random Walk Hypothesis
3. Stochastic Model Specification
4. Stochastic Model Calibration
5. Variance Ratio Properties and Statistics
6. A Heteroskedasticity-consistent Variance Ratio Test
7. Results obtained on Simulated Asset Prices
8. Results obtained on Real Asset Prices

### Efficiency, The Markov Property, and Random Walks

The random walk hypothesis is a popular theory which purports that stock market prices cannot be predicted and evolve according to a random walk. This hypothesis is a logical consequent of the weak form of the efficient market hypothesis, which states that future prices cannot be predicted by analyzing prices from the past.

To a statistician the assertion that future prices cannot be predicted by analyzing prices from the past goes by a different name: the Markov property or, more intuitively, memorylessness. Any time series which satisfies the Markov property is called a Markov process and Random Walks are just a type of Markov process.

The idea that stock market prices may evolve according to a Markov process or, rather, random walk was proposed in 1900 by Louis Bachelier, a young scholar, in his seminal thesis entitled The Theory of Speculation. In his thesis he proposed using Brownian motion, a Markov (and Martingale) process, to model stock options. That said, it wasn't really until 1973, when the Black-Scholes formula for derivatives pricing was published, that the idea gained traction.

Physical Brownian Motion. Sourced from: https://www.youtube.com/watch?v=R5t-oA796to

Since then the use of stochastic processes for derivatives pricing has become industry standard. That having been said, the philosophical question regarding whether or not stock market prices really evolve according to a random walk or, at the very least, according to the popular stochastic processes used in industry today, remains. To paraphrase Queen, we are left wondering: is this [The Random Walk Hypothesis] real life? Is this just fantasy?

Is this the real life? Is this just fantasy?

Personally my mind rebels against the theory because it is too elegant; too simple. I like complexity; I like chaos. So I choose to spend my spare time learning more about the theory of randomness and I enjoy trying to find ways to test the conventional wisdom ... and hopefully someday learn to beat the market consistently ;-).


### Variants of the Random Walk Hypothesis

A typical test of the random walk hypothesis involves three steps. First off you assume that asset prices do evolve according to a random walk and you select an appropriate stochastic model. Secondly, you define which statistical properties you would therefore expect to see in asset prices. And lastly, you test whether or not the asset prices exhibit the expected properties. If the asset prices don't exhibit the expected properties, then the assets don't evolve according to the model of the random walk hypothesis you assumed they did to begin with.

It's not good enough to simply state that market returns aren't random, you need to also specify what type of random they aren't.

Admittedly the fact that I didn't follow this process exactly in my previous article on Hacking The Random Walk Hypothesis was its biggest shortcoming. Luckily a supportive statistician on Reddit helped me see the light: it is not good enough to simply state that market returns aren't random, you need to also specify what type of random they aren't. In light of this, I have defined three popular forms of the random walk hypothesis below.

RW1: The first and strongest form of the random walk hypothesis assumes that the random disturbance, $\epsilon_{t}$, is independent and identically distributed (IID). This corresponds to the Geometric Brownian Motion Model wherein volatility of the random disturbance, $\epsilon_{t}$, allows only for homoskedastic increments (constant $\sigma$). Under this hypothesis, variance is a linear function of time (discussed in more detail in the next section).

RW2: The second, weaker form of the random walk hypothesis relaxes the identically distributed assumption and assumes that the random disturbance, $\epsilon_{t}$, is independent and not identically distributed (INID). This corresponds to the Heston Model wherein the volatility of $\epsilon_{t}$ also allows for unconditional heteroskedastic increments. Under this hypothesis, variance is a non-linear function of time (discussed in the next section).

RW3: The third and weakest form of the random walk hypothesis relaxes the independence assumption meaning that it allows for conditional heteroskedastic increments in $\epsilon_{t}$. This corresponds to some random walk process wherein the volatility either has some sort of non-linear structure (it is conditional on itself) or it is conditional on another random variable. Stochastic processes which employ ARCH (Autoregressive Conditional Heteroscedasticity) and GARCH (Generalized AutoRegressive Conditional Heteroscedasticity) models of volatility belong to this category.

In other words, any successful refutation of the random walk hypothesis must, ultimately, be model dependent. Furthermore, that model must clearly fall within the spectrum above. It just so happens that the weaker the form of the random walk hypothesis, the harder it is to disprove and the more powerful your statistical tests need to be.

Any successful refutation of the random walk hypothesis must, ultimately, be model dependent - paraphrased from Lo and MacKinlay's book.

To illustrate this point consider how easy it would be to show that some asset's prices don't evolve according to Brownian Motion but, on the other hand, how difficult it would be to show that the same asset's prices don't evolve according to some stochastic process without independent increments and with conditional heteroskedasticity!

My favourite comic strip, Dilbert, gets it. Sourced from: http://dilbert.com/strip/2001-10-25


### Stochastic Model Specification

The model on which we will base our statistical test of the random walk hypothesis is a stochastic log-price process driven by a drift component and a stochastic volatility component. We also define a simple model of stochastic volatility which is used later on to produce the results for simulated asset prices. That having been said, the reader should please note that the test is robust to most forms of unconditional heteroskedasticity. In other words, we will be testing the second variant of the random walk hypothesis, RW2.

##### Stochastic Log Price Process

Let $S_{t}$ denote the price of some asset at time $t$ and define $X_{t} = \ln(S_{t})$ as the log-price process. This log-price process satisfies the Markov property and is given by the following recurrence relation:

$X_{t} = \mu + X_{t-1} + \epsilon_{t}$

where $\mu$ represents the drift component and $\epsilon_{t}$ is a random disturbance sampled from some distribution with an expected value of zero, $E[\epsilon_{t}] = 0$. In the results on simulated data section we show the results of the test on two versions of the above model: one with homoskedastic increments, which is essentially Geometric Brownian Motion (this model relates to RW1); and another with unconditional heteroskedastic increments, which is essentially Geometric Brownian Motion with Stochastic Volatility (this model relates to RW2).

##### The Homoskedasticity (RW1) and Heteroskedasticity (RW2) Versions

In the homoskedastic version of the model the random disturbances, $\epsilon_{t}$, are sampled from a Gaussian distribution,

$\epsilon_{t} \sim \mathcal{N}(0, \sigma_{0}^2)$

This corresponds directly to the Geometric Brownian Motion model. In the heteroskedastic version of the model the random disturbances are sampled from a Gaussian distribution with a stochastic $\sigma_{0}^2$,

$\epsilon_{t} \sim \mathcal{N}(0, \sigma_{t}^2)$ where $\sigma_{t}^2 \sim \mathcal{N}(0, \frac{\sigma_{0}^2}{2})$ (in practice the draw must be floored at, or reflected about, zero so that the variance remains non-negative)

This is essentially Geometric Brownian Motion with stochastic volatility, however, I would like to stress that this is not the same as the Heston model which I have worked with before. It is a simplification.

Nevertheless, the desired effects of stochastic volatility, namely fatter-tailed distributions and higher volatility, are clearly present, as can be seen from the two density plots and the two time series plots below.

Comparison of the distribution of random disturbances sampled from the homoskedastic model (blue) against the heteroskedastic model (red). As can be seen the heteroskedastic distribution has fatter tails.

Comparison of the two sequences of random disturbances sampled from the homoskedastic model (blue) against the heteroskedastic model (red). As can be seen, the heteroskedastic sequence has many more large disturbances.

At this point you may be wondering what all the fuss is about. Well, we want a test for the random walk hypothesis which passes (i.e. concludes that the market is random) even if the returns demonstrate heteroskedastic increments and large drifts. Why? Because both of these properties are widely observed in most historical asset price data (just ask Nassim Taleb) and neither invalidates the fundamental principle underpinning the random walk hypothesis, namely the Markov property (the unforecastability of future asset prices given past asset prices).

In the next section we will show how the parameters $\mu$ and $\sigma_{0}^{2}$ can be estimated. In the section following that we define Lo and MacKinlay's variance ratio test, which is robust in the presence of drift and heteroskedasticity in $\epsilon_{t}$ but is still sensitive to autocorrelated increments in $X$. Then the results will be shown.

##### R Code for the Stochastic Model

Given values for $\mu$ and $\sigma_{0}^2$, the R functions below work together to produce log-price, price, and discrete returns processes of any length. These processes are later used to test the calibration and the variance ratio test.

Using this code is quite simple. To chart fifteen asset price paths, each five years long, with and without stochastic volatility, I would just need to type the following commands into the R command prompt.
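The original listings didn't survive the journey to this page, so below is a minimal sketch of what they might have looked like. The function names, the daily parameterization (252 trading days per year), and the reflection of the variance draws about zero (the specification above technically permits negative variances) are my own assumptions, not the original code.

```r
# A sketch of the simulation code; names and parameterization are my
# own assumptions, not the original listing.

# Generate one log-price path X_t = mu + X_{t-1} + e_t over `days` steps.
# With stochastic = TRUE each day's variance is itself random (my reading
# of the simplified stochastic-volatility model; the normal draw is
# reflected about zero so that variances stay non-negative).
generate_log_prices <- function(days, mu, sigma0_sq,
                                stochastic = FALSE, X0 = log(100)) {
  if (stochastic) {
    sigma_sq <- abs(rnorm(days, mean = 0, sd = sqrt(sigma0_sq / 2)))
  } else {
    sigma_sq <- rep(sigma0_sq, days)
  }
  X0 + cumsum(mu + rnorm(days, mean = 0, sd = sqrt(sigma_sq)))
}

# Convert log prices into prices, and prices into discrete returns.
log_prices_to_prices <- function(X) exp(X)
prices_to_returns   <- function(S) diff(S) / head(S, -1)

# Chart fifteen five-year (~1260 trading day) paths, without and with
# stochastic volatility.
set.seed(42)
paths_homo <- sapply(1:15, function(i)
  generate_log_prices(5 * 252, mu = 0.1 / 252, sigma0_sq = 0.15^2 / 252))
paths_sv <- sapply(1:15, function(i)
  generate_log_prices(5 * 252, mu = 0.1 / 252, sigma0_sq = 0.15^2 / 252,
                      stochastic = TRUE))
par(mfrow = c(1, 2))
matplot(exp(paths_homo), type = "l", lty = 1,
        main = "Homoskedastic", ylab = "Price")
matplot(exp(paths_sv), type = "l", lty = 1,
        main = "Heteroskedastic", ylab = "Price")
```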

And I would get something like this out in the plots.

### Stochastic Model Calibration

The key to understanding how the variance ratio test works is to understand the different ways in which the parameters $\mu$ and $\sigma_{0}^2$ can be calibrated using their maximum likelihood estimators.

##### Maximum Likelihood Estimator for $\mu$

The parameter $\mu$ represents the component of daily returns which is attributable to drift. Given a log price process, $X$, containing $2n + 1$ observations, the maximum likelihood estimate, $\hat{\mu}$, is given by,

$\hat{\mu} = \frac{1}{2n}\sum^{2n}_{k=1} \big(X_{k} -X_{k-1}\big)$ or rather,

$\hat{\mu} = \frac{1}{2n} \big(X_{2n} - X_{0}\big)$

This can be computed in R using the following function,
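The original R listing isn't shown here; a minimal sketch with a function name of my own choosing might look as follows.

```r
# Maximum likelihood estimate of the drift: the average one-period
# change in log prices, which telescopes to (X_last - X_first) / (2n).
estimate_mu <- function(X) {
  (tail(X, 1) - X[1]) / (length(X) - 1)
}
```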

And below is a graph illustrating how the estimates for $\mu$ get more accurate as the number of observations in the log price process gets larger (more data = more accurate). The dots indicate the true value of $\mu$, which was set to 0.1, and the lines indicate estimates of $\mu$ from generated log price processes with stochastic volatility ranging in length from 1 year up to 50 years. Ideally we want as much data as possible in our tests, which is why the results at the end are limited to assets which have been trading, or have been tracked, for at least ten years.

##### Maximum Likelihood Estimator for $\sigma_{0}^{2}$ with discrete samples

There are a few ways to estimate $\sigma_{0}^{2}$. You could use every observation in $X$ and compute the variance of the one-period increments, or you could sample every second point along $X$ and compute the variance of the two-period increments.

We can express this idea in terms of a sampling interval, $q$, whereby every $q^{th}$ observation is used to estimate $\sigma_{0}^{2}$. When $q = 1$ we use every observation, when $q = 2$ we use every second observation, and so on and so forth. Given a log price process, $X$, containing $2n + 1$ observations and a sampling interval, $q$, the unbiased maximum likelihood estimator for the parameter $\sigma_{0}^{2}$ is given by,

$\hat{\sigma}_{0}^{2}(q) \equiv \frac{1}{nq - 1} \sum^n_{k=1} \big(X_{qk} - X_{qk-q} - q\hat{\mu}\big)^2$

This can be computed in R using the following function,
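The original function isn't shown here; a self-contained sketch (function name my own) follows, with the drift estimate computed inline so the snippet stands alone.

```r
# Maximum likelihood estimate of sigma_0^2 using every q-th observation,
# i.e. non-overlapping q-period increments. Function name is my own.
estimate_sigma_sq <- function(X, q = 1) {
  n_incr <- length(X) - 1                 # one-period increments available
  mu_hat <- (tail(X, 1) - X[1]) / n_incr  # drift estimate
  n <- floor(n_incr / q)                  # number of q-period increments
  idx <- q * (1:n) + 1                    # X_{qk} (X_0 is X[1] in R)
  incr <- X[idx] - X[idx - q]             # non-overlapping q-period increments
  sum((incr - q * mu_hat)^2) / (n * q - 1)
}
```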

And, as with $\mu$, below is a graph illustrating how the estimates for $\sigma_{0}^{2}$ get more accurate as the number of observations in the log price process gets larger (more data = more accurate).

##### Maximum Likelihood Estimator for $\sigma_{0}^{2}$ with overlapping samples

The problem with the above estimator for $\sigma_{0}^{2}$ is that as we increase our sampling interval we reduce the number of observations we have available to use to estimate $\sigma_{0}^{2}$ ... and this results in a deterioration of the estimate.

A significant improvement on this estimator can be obtained by using overlapping samples. Whilst this does bias the estimator, Monte Carlo simulations done by the authors at publication (and by myself) show that the bias is negligible and that the estimates of $\sigma_{0}^{2}$ using overlapping samples are, more often than not, more accurate.

Given a log price process, $X$, containing $2n + 1$ observations and a sampling interval, $q$, the overlapping unbiased maximum likelihood estimator for the parameter $\sigma_{0}^{2}$ is given by,

$\hat{\sigma}_{0}^{2}(q) \equiv \frac{1}{m} \sum^{nq}_{k=q}\big(X_{k} - X_{k-q} - q\hat{\mu}\big)^2$ where

$m = q(nq - q + 1)\bigg(1 - \frac{q}{nq}\bigg)$

This can be computed in R using the following function,
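Again the original listing isn't shown here; a self-contained sketch (function name my own) might look like this.

```r
# Overlapping-samples estimate of sigma_0^2: uses every q-period
# difference X_k - X_{k-q}, normalized by m to correct for the bias
# introduced by the overlap. Function name is my own.
estimate_sigma_sq_overlap <- function(X, q) {
  nq <- length(X) - 1                  # total one-period increments
  mu_hat <- (tail(X, 1) - X[1]) / nq   # drift estimate
  m <- q * (nq - q + 1) * (1 - q / nq)
  incr <- X[(q + 1):(nq + 1)] - X[1:(nq + 1 - q)]  # overlapping increments
  sum((incr - q * mu_hat)^2) / m
}
```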

And as with the previous estimators we see that as our sample size increases the estimates improve. It is hard to see in this graph, but the error is smaller than the previous estimator for $\sigma_{0}^{2}$.

The critical characteristic of our two models is that variance is linear in the sampling interval: the variance of two-period increments ($q = 2$) should be twice as large as the variance of one-period increments ($q = 1$). The estimators take this into consideration, so whether you estimate $\sigma_{0}^{2}$ with $q=1$, $q=2$, or even $q=16$, you should get similar estimates. This observation is the heart of the variance ratio test.
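A quick sanity check of this property (my own illustration, not from the original article): for a simulated random walk, the raw variance of two-period increments should come out at roughly double the one-period variance.

```r
# Variance scales linearly with the sampling interval for a random
# walk: two-period increments have ~2x the one-period variance.
set.seed(1)
X <- cumsum(rnorm(100000, mean = 0, sd = 0.1))  # driftless random walk
v1 <- var(diff(X, lag = 1))  # variance of one-period increments (~0.01)
v2 <- var(diff(X, lag = 2))  # variance of two-period increments (~0.02)
ratio <- v2 / v1             # should be close to 2
```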

### Variance Ratio Properties and Statistics

Because estimates of $\sigma_{0}^{2}$ at different sampling intervals should converge to the same true value, we can define two test statistics whose expected value under the model is zero. These statistics are called variance ratios.

##### $M_d(q)$: differences using the overlapping samples estimator.

This statistic is computed as the difference between the estimate for $\sigma_{0}^{2}$ given a sampling interval $q$ and the estimate for $\sigma_{0}^{2}$  given a sampling interval of 1. Given that $\hat{\sigma}^2_{0}(q)$ and $\hat{\sigma}^2_{0}(1)$ should converge, the expected value of $\hat{M}_{d}(q)$ is zero. Mathematically this variance ratio is expressed as follows,

$\hat{M}_{d}(q) \equiv \hat{\sigma}^2_{0}(q) -\hat{\sigma}^2_{0}(1)$

And in R we can compute $\hat{M}_{d}(q)$ with the following function,
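The original function isn't shown here; the sketch below (names my own) inlines the overlapping estimator from the calibration section so that it stands alone.

```r
# Overlapping-samples estimator of sigma_0^2, as defined in the
# calibration section (names are my own).
sigma_sq_overlap <- function(X, q) {
  nq <- length(X) - 1
  mu_hat <- (tail(X, 1) - X[1]) / nq
  m <- q * (nq - q + 1) * (1 - q / nq)
  incr <- X[(q + 1):(nq + 1)] - X[1:(nq + 1 - q)]
  sum((incr - q * mu_hat)^2) / m
}

# M_d(q): the difference between the q-interval and 1-interval variance
# estimates; its expected value is ~0 under the random walk model.
m_d <- function(X, q) sigma_sq_overlap(X, q) - sigma_sq_overlap(X, 1)
```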

Now let's test to see whether or not this is true. Below is a graph of the computed $\hat{M}_{d}(q)$ statistics for $q = 2$ and $q = 4$ for 500 randomly generated ten year long log price processes with random values for $\mu$ and random values for $\sigma$ with and without stochastic volatility. As can be seen, the values for $\hat{M}_{d}(q)$ are always close to zero.

One observation that can be made is that as the sampling interval increases there is a perceived "degradation" in performance of the statistic; this is actually expected as the limiting distribution of the statistic widens as we increase $q$. A much more significant observation is that the statistic with and without stochastic volatility for any $q$ is almost indistinguishable, which speaks to the robustness of the statistic.

##### $M_r(q)$: ratios using the overlapping samples estimator.

This statistic is computed as the ratio of the estimate for $\sigma_{0}^{2}$ given a sampling interval $q$ to the estimate for $\sigma_{0}^{2}$ given a sampling interval of 1, minus one. As with $M_d(q)$ the expected value of this statistic is zero,

$\hat{M}_{r}(q) \equiv \frac{\hat{\sigma}^2_{0}(q)}{\hat{\sigma}^2_{0}(1)} - 1$

And in R we can compute $\hat{M}_{r}(q)$ with the following function,
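Again a self-contained sketch with my own names, inlining the overlapping estimator so the snippet runs on its own.

```r
# Overlapping-samples estimator of sigma_0^2, as defined in the
# calibration section (names are my own).
sigma_sq_overlap <- function(X, q) {
  nq <- length(X) - 1
  mu_hat <- (tail(X, 1) - X[1]) / nq
  m <- q * (nq - q + 1) * (1 - q / nq)
  incr <- X[(q + 1):(nq + 1)] - X[1:(nq + 1 - q)]
  sum((incr - q * mu_hat)^2) / m
}

# M_r(q): the ratio of the q-interval to 1-interval variance estimates,
# minus one; its expected value is ~0 under the random walk model.
m_r <- function(X, q) sigma_sq_overlap(X, q) / sigma_sq_overlap(X, 1) - 1
```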

Let us again test to see whether or not this is true. Below is a graph of the computed $\hat{M}_{r}(q)$ statistics for $q = 2$ and $q = 4$ for 500 randomly generated ten year long log price processes with random values for $\mu$ and random values for $\sigma$ with and without stochastic volatility. As can be seen, the values for $\hat{M}_{r}(q)$ are always close to zero.

As with the $M_{d}(q)$ statistic, we see a widening of the limiting distribution and an almost indistinguishable difference between the log price processes with and without stochastic volatility. We also see that this statistic is more sensitive than the differences statistic. This is a good characteristic.

### A Heteroskedasticity-consistent Variance Ratio Test

I should warn you: this is the part of the article where things get a bit hairy 😯. It took me quite a few readings of Lo and MacKinlay's paper for the ideas below to click and for me to implement them in R. I've tried my best to explain them.

Because most academics and practitioners agree that the volatility of asset prices does fluctuate over time, Lo and MacKinlay wanted to make the variance ratio robust to changing variances, a.k.a. heteroskedasticity and stochastic volatility. Thus any rejection of the random walk hypothesis using their test would not be due to the stochastic nature of volatility or long-run drifts, but rather due to the presence of autocorrelation in the increments of $X$.

Any rejection of the random walk hypothesis using Lo and MacKinlay's test is not due to the stochastic nature of volatility or long-run drifts

The reason why they wanted to focus on testing for the presence of autocorrelation in the increments of $X$ is because this indicates whether or not $X$ satisfies the Markov property and, as per the logic laid out at the start of this article, whether or not future prices could (at least in theory) be partially forecasted using historical prices. To achieve this objective we start with a rather simple observation, namely:

"As long as the increments are uncorrelated, even in the presence of heteroskedasticity, the variance ratio must still approach unity as the number of observations increase without bound. This is because the variance of the sum of uncorrelated increments must still equal the sum of the variances." - Lo and MacKinlay

That having been said, how quickly these variance ratios tend toward unity and their asymptotic variance depends on the nature of the heteroskedasticity present in $X$. In light of this there are two options: either you can specify the model of heteroskedasticity which you are testing for upfront (limiting); or you can make some simplifying assumptions about the nature of the heteroskedasticity present in $X$ in order to generalize the test to any model of heteroskedasticity which satisfies the simplifying assumptions (more widely applicable).

Lo and MacKinlay opted for the latter approach, assumed that the model of heteroskedasticity has a finite variance, and developed their heteroskedastic-consistent variance ratio test accordingly. What this means is that the statistical test is valid for most forms of stochastic volatility used in mathematical finance but not all. In particular the variance ratio test they defined is not applicable to models of heteroskedasticity from the Pareto-Levy family. The assumptions made by Lo and MacKinlay for the null hypothesis are shown in the box below,

We consider the following null hypothesis $H^*$:

(A1) For all $t$, $E[\epsilon_{t}] = 0$, and $E[\epsilon_{t} \epsilon_{t - \tau}] = 0$ for any $\tau \neq 0$

Explanation: $X_{t}$ contains uncorrelated increments across all periods of time.

(A2) $\{\epsilon_{t}\}$ is $\phi$-mixing with coefficients $\phi(m)$ of size $\frac{r}{2r - 1}$ or is $\alpha$-mixing with coefficients $\alpha(m)$ of size $\frac{r}{r-1}$, where $r > 1$, such that for all $t$ and for any $\tau \geq 0$, there exists some $\delta > 0$ for which:

$E|\epsilon_{t} \epsilon_{t-\tau}|^{2(r+\delta)} < \Delta < \infty$

Explanation: The increments may be dependent (for example, through stochastic volatility), but the dependence dies out quickly enough, and the moments are sufficiently bounded, for the asymptotic results to hold.

(A3) $\lim_{nq\to \infty}\frac{1}{nq}\sum^{nq}_{t=1} E(\epsilon_{t}^2) = \sigma_{0}^2 < \infty$

Explanation: The second moment is finite. This is assumed because, if it were not, the variance ratio would no longer be well defined; we cannot compute a variance ratio if the variance can be infinite.

(A4) For all $t$, $E(\epsilon_{t} \epsilon_{t-j} \epsilon_{t} \epsilon_{t-k}) = 0$ for any non-zero $j$ and $k$ where $j \neq k$

Explanation: The cross products of non-contemporaneous increments have zero expectation, which ensures that the sample autocorrelations at different lags are asymptotically uncorrelated and simplifies the variance of the test statistic.

I know how heavy that looks, but the intuition is simple. Our null hypothesis states that $X$ possesses uncorrelated increments but allows for general forms of stochastic volatility, provided that the second moments and the estimated values for $\sigma^2_{0}$ are finite. Assuming this is true, we are correct in stating that:

$\hat{M}_r(q)$ approaches zero under $H^*$

Given this, Lo and MacKinlay derive the asymptotic variance of $M_{r}(q)$, from which the distribution of expected values for $M_{r}(q)$ under heteroskedasticity can be calculated and used to test the random walk hypothesis:

Denote by $\delta(j)$ and $\theta(q)$ the asymptotic variances of $\hat{\rho}(j)$ (the autocorrelation coefficient) and $\hat{M}_{r}$, respectively. Then under the null hypothesis $H^*$:

(1) The statistics $\hat{M}_{r}$ and $\hat{M}_{d}$ all converge almost surely to zero for all $q$ as $n$ increases without bound.

(2) The following is a heteroskedasticity-consistent estimator of $\hat{\delta}(j)$:

$\hat{\delta}(j) = \frac{nq \sum^{nq}_{k = j + 1} \big(X_{k} -X_{k-1} - \hat{\mu} \big)^2 \big(X_{k-j} -X_{k-j-1} - \hat{\mu}\big)^2}{\Bigg[ \sum^{nq}_{k=1} \big( X_{k} -X_{k-1} - \hat{\mu} \big)^2 \Bigg]^2}$

(3) And the following is a heteroskedasticity-consistent estimator of $\theta(q)$:

$\hat{\theta}(q) \equiv \sum^{q-1}_{j=1} \Big[ \frac{2(q-j)}{q} \Big]^2\hat{\delta}(j)$

Given a log-price process $X$ and a sampling interval, $q$, the following R function can be used to estimate $\theta(q)$ (the asymptotic variance of the $M_{r}(q)$ variance ratio statistic),
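The original function isn't shown here; a self-contained sketch (names my own) which follows the definitions of $\hat{\delta}(j)$ and $\hat{\theta}(q)$ above might look like this.

```r
# Heteroskedasticity-consistent estimate of theta(q), the asymptotic
# variance of the M_r(q) statistic (function names are my own).
theta_hat <- function(X, q) {
  nq <- length(X) - 1
  mu_hat <- (tail(X, 1) - X[1]) / nq
  d <- diff(X) - mu_hat        # demeaned one-period increments
  denom <- sum(d^2)^2
  # delta_hat(j), the asymptotic variance of the lag-j autocorrelation.
  delta_hat <- function(j)
    nq * sum(d[(j + 1):nq]^2 * d[1:(nq - j)]^2) / denom
  j <- 1:(q - 1)
  # Weighted sum of the delta_hat(j) terms.
  sum((2 * (q - j) / q)^2 * sapply(j, delta_hat))
}
```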

And when we have $\hat{\theta}(q)$, given the log price process, $X$, and the sampling interval, $q$, we can go ahead and standardize the $M_{r}(q)$ statistic to arrive at the final standardized test statistic, $z^*(q)$. Finally!

$z^*(q) = \frac{\sqrt{nq} \hat{M}_{r}(q)}{\sqrt{\hat{\theta}(q)}}$

The above $z^*$-score can be computed using the following R function,
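A self-contained sketch of the whole calculation (names my own), combining the $M_{r}(q)$ statistic and $\hat{\theta}(q)$ estimator defined above:

```r
# The standardized z*(q) statistic: asymptotically N(0, 1) under the
# heteroskedasticity-robust null hypothesis (names are my own).
z_star <- function(X, q) {
  nq <- length(X) - 1
  mu_hat <- (tail(X, 1) - X[1]) / nq
  # Overlapping variance estimates at sampling intervals q and 1.
  sig_sq <- function(qq) {
    m <- qq * (nq - qq + 1) * (1 - qq / nq)
    incr <- X[(qq + 1):(nq + 1)] - X[1:(nq + 1 - qq)]
    sum((incr - qq * mu_hat)^2) / m
  }
  m_r <- sig_sq(q) / sig_sq(1) - 1
  # Asymptotic variance theta_hat(q).
  d <- diff(X) - mu_hat
  denom <- sum(d^2)^2
  delta_hat <- function(j)
    nq * sum(d[(j + 1):nq]^2 * d[1:(nq - j)]^2) / denom
  j <- 1:(q - 1)
  theta <- sum((2 * (q - j) / q)^2 * sapply(j, delta_hat))
  sqrt(nq) * m_r / sqrt(theta)
}
```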

And since $z^*(q)$ is still asymptotically standard normal, we can use the common significance levels to check whether or not the value of $z^*(q)$ for any given asset is statistically significant. If it is, then we can be 95% or 99% confident that the asset prices were not generated by a Geometric Brownian Motion model with stochastic volatility; that there is some statistically significant autocorrelation in $X$; and, most importantly, that the asset probably doesn't evolve according to a random walk, meaning there may be some level of forecastability in $X$.

Sourced from: http://www.sciencecartoonsplus.com/gallery/math/index.php#

P.S. If the above explanation (which is simplified when compared to the full derivation done by Lo and MacKinlay) did not make much sense, then I would like to direct you to their original paper. And if you are interested in the size and power of the statistic for finite samples, Lo and MacKinlay wrote a follow-up article about this topic as well:

1. Lo, Andrew W., and A. Craig MacKinlay. "Stock market prices do not follow random walks: Evidence from a simple specification test." Review of Financial Studies 1.1 (1988): 41-66.
2. Lo, Andrew W., and A. Craig MacKinlay. "The size and power of the variance ratio test in finite samples: A Monte Carlo investigation." Journal of Econometrics 40.2 (1989): 203-238.

### Results obtained on Simulated Asset Prices

Before we get to the results on real asset prices it is a very good idea to validate our implementation of Lo and MacKinlay's test by computing $z^*(q)$ for randomly generated log price processes with and without stochastic volatility. If the resulting distribution of $z^*$-scores is normally distributed (which we will test using the Shapiro-Wilk test) then we can be quite confident that our code works and that it is, hopefully, free of any serious bugs.

The graph below shows the density of 2,500 random numbers sampled from a normal distribution (red) versus the density of $z^*$-scores computed for 2,500 log price processes with the following parameters:

1. A random number of years uniformly distributed between 5 and 25 years,
2. A randomly selected $\mu$ uniformly distributed between -0.25 and 0.25,
3. A randomly selected $\sigma^2_{0}$ uniformly distributed between 0.05 and 0.75,
4. Homoskedastic increments in $\epsilon_{t}$ i.e. constant volatility, and
5. Computed using a sampling interval of two, $q = 2$
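The simulation loop itself might look something like the condensed, self-contained sketch below. The function names are my own, it reuses the $z^*(q)$ calculation from the previous section, and it only checks a couple of hundred paths so that it runs quickly.

```r
# Monte Carlo check: z*-scores computed on simulated random walks
# should be approximately standard normal.
z_star <- function(X, q) {
  nq <- length(X) - 1
  mu_hat <- (tail(X, 1) - X[1]) / nq
  sig_sq <- function(qq) {
    m <- qq * (nq - qq + 1) * (1 - qq / nq)
    incr <- X[(qq + 1):(nq + 1)] - X[1:(nq + 1 - qq)]
    sum((incr - qq * mu_hat)^2) / m
  }
  m_r <- sig_sq(q) / sig_sq(1) - 1
  d <- diff(X) - mu_hat
  denom <- sum(d^2)^2
  delta_hat <- function(j)
    nq * sum(d[(j + 1):nq]^2 * d[1:(nq - j)]^2) / denom
  j <- 1:(q - 1)
  theta <- sum((2 * (q - j) / q)^2 * sapply(j, delta_hat))
  sqrt(nq) * m_r / sqrt(theta)
}

set.seed(2016)
z_scores <- replicate(200, {
  years    <- runif(1, 5, 25)
  mu       <- runif(1, -0.25, 0.25) / 252  # daily drift
  sigma_sq <- runif(1, 0.05, 0.75) / 252   # daily variance (homoskedastic)
  days <- round(years * 252)
  X <- cumsum(c(0, mu + rnorm(days, 0, sqrt(sigma_sq))))
  z_star(X, 2)
})
# Shapiro-Wilk should not reject normality of the z*-scores.
p_value <- shapiro.test(z_scores)$p.value
```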

The two p-values computed using the Shapiro-Wilk test were 0.0545 and 0.3476 for the normally distributed random variable and the computed $z^*$-scores respectively; neither rejects normality. And lastly, the two graphs below show the QQ-plot of the normally distributed random variable (right) against the QQ-plot of the computed $z^*$-scores. They both look reasonable to me.

And below we have exactly the same simulation results except that they are with stochastic volatility applied to the log price process. The type of stochastic volatility is the one defined under the model specification section.

The two p-values computed using the Shapiro-Wilk test were 0.6363 and 0.7011 for the normally distributed random variable and the computed $z^*$-scores respectively; neither rejects normality. And lastly, the two graphs below show the QQ-plot of the normally distributed random variable (right) against the QQ-plot of the computed $z^*$-scores. Again, they both look reasonable to me.

What can we tell from the above results? First, we know that if asset prices are generated by a Brownian Motion model with drift and stochastic volatility then they are, more likely than not, going to be marked as random walks by this test (at the 95% or 99% confidence level). Second, we can be more confident that my code works. Testing quant code is hard, so sometimes you have to resort to Monte Carlo simulations!

### Results obtained on Real Asset Prices

The results below are broken up into two sub-sections:

1. Results obtained on 50 stock market indices from around the world,
2. Results obtained on the current S&P 500 constituent assets.

##### Methodology followed in producing results

For each section the following methodology was followed for $q = 2$ and $q = 4$:

• If possible, extract the adjusted close price, otherwise, extract the close price and represent this as $S$. This was done to mitigate the effect of stock splits which can create "discontinuities".
• Compute the log price process as $X=ln(S)$.
• Check for infinite values, replace these with NA (missing) values.
• Omit any and all NA (missing) values from the log price process.
• If we are testing the results on individual stocks then:
  • Check if the number of historical days exceeds 10 years. If true, then take the past 10 years as a subset and discard the earlier data. This was done to avoid any small-cap data related issues; it should be noted that the conclusions drawn are the same whether or not you follow this step.
• Estimate the value of $\mu$ and $\sigma^{2}_{0}$ for the log price process.
• Simulate a log price process using $\mu$ and $\sigma^{2}_{0}$, $X_{sim}$.
• Compute and store the $z^{*}$-score of $X$ and $X_{sim}$.
• Plot the densities of the computed $z^{*}$-scores.
• Check for normality and then draw conclusions.

##### Results obtained on 50 global Stock Market indices

The first, and most condemning, set of results was obtained on fifty stock market indices from around the world. The indices considered cover the Americas (South and North), Europe, Africa, and the Middle East, as well as the South Pacific and Asia. For $q=2$, 21 of the stock market indices tested (42%) had statistically significant $z^{*}$-scores at the 95% confidence level and 11 (22%) had statistically significant $z^{*}$-scores at the 99% confidence level. The distribution of computed $z^{*}$-scores is shown below,

The red plot shows the density of z*-scores computed on simulated assets with the same mu and sigma as the stock market indices. The blue plot shows the density of z*-scores computed on the indices themselves. A large portion of the indices fall outside of the expected distribution and are therefore deemed to be not random.

In order of significance, the least likely to be random stock market indices were: the SATRIX Financials Index (ETF), South Africa ($z^{*} = +5.61$); the Jakarta Composite Index, Indonesia ($z^{*} = -5.01$); the INMEX Index, Mexico ($z^{*} = -4.47$); the Colombo All Share Index, Sri Lanka ($z^{*} = -4.45$); the RTS Index, Russia ($z^{*} = -4.35$); the Nifty Fifty Index, India ($z^{*} = -3.75$); the Straits Times Index, Singapore ($z^{*} = -3.69$); the NYSE ARCA Mexico Index, Mexico ($z^{*} = -3.34$); the BEL 20 Index, Belgium ($z^{*} = -3.11$); the BSE 30 Sensitivity Index, India ($z^{*} = -2.83$); and the Dow Jones Industrial Average, United States ($z^{*} = +2.62$). The full set of results is available as a CSV file.

For $q=4$ the distribution of $z^{*}$-scores was almost identical and 9 of the 50 indices had statistically significant $z^{*}$-scores. Three of the above indices fell out and, interestingly, the Russell 1000 Index from the United States fell in with a $z^{*}$-score of $+2.77$. The full set of results for q = 4 is available as a CSV file as well.

For the statistically significant indices, this result means that we are 99% certain that they were not generated by a Brownian motion model with drift and/or stochastic volatility. We can also say that they almost surely don't evolve according to a random walk. This will be discussed in more detail in the conclusion.

##### Results obtained on the current constituents of the S&P 500

The next set of results is for the past ten years of daily prices for 484 of the current constituents of the S&P 500. Some stocks were removed because data was not available on Yahoo! Finance, and others were removed due to data-related issues. For $q=2$, 142 of the 484 stocks (30%) had statistically significant $z^{*}$-scores at the 95% confidence level, and 88 of the 484 stocks (18%) had statistically significant $z^{*}$-scores at the 99% confidence level!

The red plot shows the density of z*-scores computed on simulated stocks with the same mu and sigma as the stocks on the S&P 500. The blue plot shows the density of z*-scores computed on the S&P 500 stocks themselves. A large portion of the stocks fall outside of the expected distribution and are therefore deemed to be not random.

The results here differ from the results on the stock market indices in one major way: the $z^{*}$-scores for stock market indices are skewed to the left of the mean whereas the $z^{*}$-scores for the constituents of the S&P 500 are skewed to the right of the mean. This observation is discussed in some detail in the conclusion.

I can hear you asking, "so which stock is the least random?" Below I have listed the top ten least random stocks according to our heteroskedasticity-consistent variance ratio test at $q=2$ over the past ten years:

1. Public Storage, PSA ($z^{*} = 5.80$) - an American international self storage company head-quartered in Glendale, California that is run as a real estate investment trust (REIT).
2. Boston Properties, Inc., BXP ($z^{*} = 5.74$) - a self-managed American real estate investment trust based in Boston, Massachusetts. It mostly acquires, develops, and manages Class A office space.
3. Equity Residential Common Share, EQR ($z^{*} = 5.61$) - a member of the S&P 500, a publicly traded real estate investment trust based in Chicago, Illinois.
4. Simon Property Group, Inc., SPG ($z^{*} = 5.58$) - an American commercial real estate company, ranked #1 in the United States as the largest real estate investment trust.
5. Plum Creek Timber Company, Inc., PCL ($z^{*} = 5.52$) - a timberland owner and manager, as well as a forest products, mineral extraction, and property development company.
6. Welltower, Inc., HCN ($z^{*} = 5.44$) - a real estate investment trust that primarily invests in assisted living facilities and other forms of housing and medical facilities for senior citizens.
7. Vornado Realty Trust, VNO ($z^{*} = 5.23$) - a New York based real estate investment trust. It is the inheritor of real estate formerly controlled by companies including Two Guys and Alexander's.
8. Kimberly-Clark Corporation, KMB ($z^{*} = 5.19$) - an American personal care corporation that produces mostly paper-based consumer products.
9. Federal Realty Investment Trust, FRT ($z^{*} = 5.14$) - an equity based real estate investment trust that owns and operates retail properties.
10. Aflac Incorporated, AFL ($z^{*} = 5.01$) - an American insurance company and the largest provider of supplemental insurance in the United States.

If you are wondering why so many REITs (Real Estate Investment Trusts) are in this list, so am I. I have listed a few hypotheses in the conclusion section but I welcome any of your thoughts as well.

Once again, the results for $q=4$ were similar to the results for $q=2$; 182 stocks out of the 484 (37.6%) had statistically significant $z^{*}$-scores at the 95% level and 116 out of 484 (23.4%) stocks had statistically significant $z^{*}$-scores at the 99% level. The results for q = 2 and the results for q = 4 are available as CSV files.

# Remarks and Conclusions

In this article we have taken our first step along A Non-Random Walk Down Wall Street. We have understood and implemented the heteroskedasticity-consistent variance ratio test defined by Lo and MacKinlay in their seminal paper, Stock market prices do not follow random walks: Evidence from a simple specification test. The goal of this article was to make this powerful test more accessible to researchers and practitioners. So wherever possible I have focused on the intuition behind the test rather than its derivation, because the intuition isn't hard to grasp.

##### The intuition ... in a nutshell:
1. Using our estimator it doesn't matter whether we estimate variance using every observation or every second observation (suitably rescaled), because both converge to the same estimate as the number of observations grows.
2. Therefore the ratio of the estimate of variance using every second observation to the estimate of variance using every observation (for example) approaches one, i.e. the ratio minus one approaches zero, as the number of observations grows.
3. If we assume that increments are uncorrelated and the model of heteroskedasticity has a finite variance, then the above statement is true for random walks with constant volatility or stochastic volatility.
4. Given that the ratio of variances minus one approaches zero, we can construct a statistical test around it which is sensitive to autocorrelation but insensitive to drift and stochastic volatility. And, lastly,
5. We can then use that statistical test to test the reasonableness of the random walk hypothesis; any failure of the test is a direct statement about the memory (Markov property) of the time series being tested.
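The convergence at the heart of the test can be checked numerically: for a random walk, the variance of two-period increments divided by twice the variance of one-period increments approaches one (so the ratio minus one, the quantity the test statistic is built on, approaches zero). A toy Python illustration under an assumed stochastic volatility model, not the article's R code:

```python
import numpy as np

# Simulate a long random walk with drift and stochastic volatility:
# each step is mu + sigma_t * N(0, 1), where sigma_t is itself random.
rng = np.random.default_rng(42)
n = 100_000
sigma_t = np.abs(rng.normal(0.01, 0.003, n))              # stochastic per-step volatility
X = np.cumsum(0.0005 + sigma_t * rng.standard_normal(n))  # log price path

var_1 = np.var(np.diff(X))          # variance estimated from every observation
var_2 = np.var(X[2::2] - X[:-2:2])  # variance estimated from every second observation

ratio = var_2 / (2.0 * var_1)       # approaches one for a random walk
print(round(ratio, 3))
```

Despite the drift and the randomly varying volatility, the printed ratio sits very close to one; it is autocorrelation in the increments, not drift or heteroskedasticity, that pushes it away from one.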

In this article we implemented Lo and MacKinlay's test in R (all of the code is publicly available), verified that it works as expected on model data, and then applied it to two real-world data sets: 50 global stock market indices and 484 constituents of the S&P 500. For the global stock market indices all of the available historical data was used, and for the individual stocks the past ten years' worth of daily prices were used. Assuming these assets evolved according to a random walk with drift and stochastic volatility, we would have expected to see:

• Between 0 and 1 stock market indices with statistically significant $z^{*}$-scores at the 99% level for sampling intervals of 2 and 4 (1% of 50 on average).
• Between 4 and 5 stocks with statistically significant $z^{*}$-scores at the 99% level for sampling intervals of 2 and 4 (1% of 484 on average).

Instead, what we observed is that:

• 11 and 9 stock market indices had statistically significant $z^{*}$-scores for sampling intervals of 2 and 4 respectively (22% and 18%), and
• 88 and 116 stocks had statistically significant $z^{*}$-scores for sampling intervals of 2 and 4 respectively (18% and 24%).
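To put those counts in perspective: if each test were an independent 1% false positive under the null, the chance of seeing this many significant results would be vanishingly small. A quick back-of-the-envelope binomial tail calculation in Python (note the independence assumption is generous, since markets are correlated, so treat these numbers as illustrative only):

```python
import math

def binom_tail(n, k, p):
    """P[X >= k] for X ~ Binomial(n, p): chance of k or more false positives."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 11 of 50 indices significant at the 99% level, versus an expected 0.5
print(binom_tail(50, 11, 0.01))
# 88 of 484 stocks significant at the 99% level, versus an expected ~4.8
print(binom_tail(484, 88, 0.01))
```

Both probabilities are astronomically small, which is why the observed counts are so hard to square with the random walk null.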

As such, we can quite confidently conclude that the stock market indices and stocks tested do not evolve according to a random walk with drift and stochastic volatility (this is particularly true for the assets which failed the test but could also be true for the others) and therefore: stock markets do not follow random walks.

Stock markets do not follow random walks.

This statement is as true today as it was 28 years ago in 1988, when Lo and MacKinlay reached the same conclusion using weekly returns data for a number of stocks and a broad-based market index in the United States.

During our investigation two unanticipated, but interesting, results were obtained:

1. The distribution of $z^{*}$-scores for market indices was skewed to the left whereas the distribution of $z^{*}$-scores for individual stocks was skewed to the right, and
2. A large number of the top-ten least random stocks on the S&P 500 over the past ten years according to our test have been listed REITs (Real Estate Investment Trusts).

For these two observations I offer up the following two, untested, hypotheses:

1. The autocorrelation of stock market indices is typically positive whereas for individual stocks autocorrelation is typically negative. This might explain the skewness of the $z^{*}$-score distribution.
2. The period tested contains the single largest real-estate market crash since the 1929 Great Depression, namely the 2008 Financial Crisis. As such, the apparent non-randomness of these stocks in particular may be a time-period artefact related to the financial crisis. Thoughts are welcome in the comment section below.
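On the link between autocorrelation and the sign of the $z^{*}$-score, there is a simple identity worth keeping in mind: for stationary returns $r_t$ with variance $\sigma^2$ and first-order autocorrelation $\rho_1$,

```latex
\mathrm{Var}(r_t + r_{t+1})
  = 2\,\mathrm{Var}(r_t) + 2\,\mathrm{Cov}(r_t, r_{t+1})
  = 2\sigma^2 (1 + \rho_1)
\quad\Longrightarrow\quad
VR(2) = \frac{\mathrm{Var}(r_t + r_{t+1})}{2\,\mathrm{Var}(r_t)} = 1 + \rho_1
```

so positively autocorrelated returns push the variance ratio above one ($z^{*} > 0$) and negatively autocorrelated returns push it below one ($z^{*} < 0$).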

And that marks another nail in the coffin for the random walk hypothesis. That said, I really don't think this should come as a surprise. For decades academics and practitioners have been discovering anomalies in price and returns data which simply shouldn't exist under the random walk hypothesis. In a follow up post I hope to elaborate on some of these anomalies and question whether they are actually "anomalies" in an otherwise efficient system ... or rather the emergent statistical properties of a complex adaptive system (the adaptive market hypothesis).

# Appendix A: Why R?

Why not R? I am a huge Python fan. I have been using Python for five years now. That said, I don't understand this "rivalry" between R and Python. The truth of the matter is that they are both great programming languages and, ultimately, what tool you use should be determined by your use case, not just your comfort zone. As such, some of my blog posts this year may be in Python, others may be in R, and yet some more still may be in C++, Julia, Scala, Rust, Go, or whatever I feel like. And that is because Computer Science and Quantitative Finance are not about programming languages or technologies; they are about the idea that you can apply scientific thought to the markets.

Computer Science is no more about computers than astronomy is about telescopes - Edsger Dijkstra


1. It was indeed worth it, Gordon, to wait some extra time for your next TuringFinance post.

Marvelous piece, again.

• Thanks Michal! 😀

2. Exceptional work here. I just read the AMH. One question I would have is now that more and more trading becomes algorithmic and algorithms will have a different behavior. How do we make sure that the evolutionary perspective will continue to hold? It would be interesting to see what autocorrelation looks like in the "algorithmic era". I don't know how to define that, but it would be recent. I'm not just referring to HFT but the gradual trend towards machine learning of course. I know these systems are man made and emotional responses can lead to pulling the plug on a strategy, etc. However, I still have a notion that as more and more hedge funds rely on, and trust these models that they may deviate from human behavioral patterns. That's not to say they won't have inefficiencies and behave as "fish out of water" occasionally (or frequently), but that landscape looks decidedly different in my mind's eye.

Looking forward to the follow on work.

P.S. If you're looking for votes on what to write the next experiments in, mine would be for Scala 🙂

• Thanks Shane!

I prefer the AMH to EMH for two reasons. Firstly, AMH doesn't shy away from real-world complexity and, secondly, AMH diagnoses so-called "inefficiencies" as emergent properties. Autocorrelation is an example of a small but consistent emergent property that I suspect will exist so long as there are a sufficient number of agents in the system which are acting upon past prices. This is regardless of whether they are using historical prices in trading (using past prices to forecast future prices ... or training neural networks to do so), risk management (using past prices to calibrate and stress test models), or pricing models (using past data to test the accuracy of a pricing model). Most efficient market proponents forget that finance isn't just about buying and selling stocks any more.

The real estate bubble formed in 2008 was a nice example of a complex emergent property of the global economy and the stock market which arose from the seemingly unrelated simple behaviours of many agents ... many of which were supported by "evidence" contained in past price data. To be more specific, the behaviours which resulted in the bubble were (1) government deregulation, (2) the FED keeping interest rates low post the dotcom bubble, (3) the popularity of securitization and its products (MBSs, CDOs, etc.) among mutual funds and retirement funds, (4) insurance and re-insurance by banks and other financial institutions, (5) moral hazard by home-lenders, and most importantly to my point (6) the use of the Gaussian Copula to estimate default probabilities in baskets of mortgage-backed securities ... a model which was designed around and fitted to historical default probabilities and housing prices. So the bubble wasn't random and it wasn't a massive conspiracy of the banks either, it was an emergent property and the subsequent crash was inevitable, not unpredictable. That having been said, it was very difficult to see and that's why so few people managed to short the bubble successfully.

I'm not sure what the future will look like. But I can tell you that three things concern me: (1) as you mentioned, the rise of algorithmic trading (see the flash crash), (2) the proliferation of passive investment strategies, index trackers, and index-linked products (zero diversity), and (3) the venture capital bubble in the USA (see the 1990s). That said, I am 99% sure that the markets will never be efficient and stock market prices will never follow a random walk. Which is not to say that randomness couldn't also be an emergent property of the stock market for finite periods of time. All I am saying is that if we waited another 28 years and re-ran the analysis the markets would be as random as they are now or perhaps even more so, and the reasons would be the same: markets just aren't efficient.

3. Well done indeed! More quantitative evidence of what people have long known; that markets are not perfectly random in the Gaussian sense. So, there is the famous Keynes quote: "Markets can remain irrational longer than you can remain solvent."

Let me ask the obvious. How does one use this research to build a trading methodology, or perhaps even identify individual trades?

• Hi Stryder, thanks for the comment. First off, as stated throughout the article, this test extends to many more models of market randomness than the original Gaussian i.i.d. specification. Secondly, it is a conflict of interest for me to blog about trading strategies because I work for a hedge fund where I develop proprietary quantitative trading strategies every day. My advice: get creative and the trading strategies will follow!

4. hi stuart
awesome post - thx for it.
do the conclusions only hold for wkly data? if we looked at mthly or annual
data would findings change?
many thx

• Hi D, thanks for the comment! The above results are actually generated on daily data from Yahoo! finance, but I see where the confusion comes from and I should have been more clear in the article. The original test was done on weekly data (probably because it was done way back in 1988) but in my analysis I used daily data.

Anyway, I ran the results again on discrete weekly returns (just for you) and the conclusion remains the same: we see more stocks and stock market indices with statistically significant results than we would expect, and the distribution of z*-scores for stock market indices is skewed to the left, whereas for stocks it is skewed to the right:

- Distribution of z* scores for Stock Market Indices (weekly returns) and CSV file
- Distribution of z* scores for S&P 500 assets (weekly returns) and CSV file.

As expected we get fewer statistically significant results because we have 1/5th of the sample size. I think it would be valuable to write a follow-up post where I rerun the analysis from this post and some of the results from the previous post on rolling weekly, monthly, and annual returns. I'd just need to check the independence assumptions.

5. Stuart.

Great article. You've taken a complex topic and presented it in a very accessible way, not to mention demonstrated how Lo and MacKinlay's work can be applied in practice. Thanks also for sharing your code - very useful indeed.

I subscribe to the theory that markets can't be efficient simply because of the breadth of literature identifying past and current pricing anomalies. However, I intuited (not backed up by anything other than anecdotal evidence and my own limited experience) that the more developed a market, the more efficient it would likely be. It follows that, liquidity constraints aside, emerging markets present significant trading opportunities. I was surprised that you found that the Dow Jones Index made the list of the indices least likely to be random. Having said that, all the other indices listed were emerging markets.

I would imagine that institutional traders would limit their participation in emerging markets due to capacity constraints, but I wonder if these markets are profitable stomping grounds for retailers and smaller portfolios. I haven't traded emerging markets, but it's something else to explore.

Finally, in response to Stryder's request for trading ideas: the first thing that jumps out quite obviously is to use Stuart's randomness tests as a market filter. For example, in deciding which markets to trade, you could exclude any that fail the non-randomness test. It may be interesting to apply these tests to markets using a rolling window of data in order to draw conclusions about the potential temporal characteristics of randomness, however you'd more than likely run into issues related to not having enough data for any window size worth considering.

Thanks again Stuart, I very much enjoyed your post.

• Thanks Kris, love your blog as well!

I completely agree with your thoughts. I think there is something to be said for efficiency and bigger markets have much deeper liquidity and should be more efficient and, therefore, more random. As such, I was also surprised to see the DJIA pop up in the list of statistically significant results on the indices side. South Africa is an emerging market and from my own anecdotal experiences I would agree with your assertion that emerging markets might offer more trading opportunities but that those opportunities are smaller due to the size of the market and the lack of liquidity.

That said, I don't think that this logic extends to individual assets. Individual assets are non-random everywhere ;-). Just ask the big guns: Simons, Buffett, etc.

6. Hi Stuart,

Excellent article. At first glance though, it looks as though your code for M_d(q) and M_r(q) are backwards from their formulaic representations. In sigmaA you use q = 1 while in sigmaB you use q = q, where in the formulas above the code, it appears to be the opposite. Hope that helps!

• Hey Ryan, thanks for the comment. You are right, well spotted! It's actually the code which is the wrong way around so I've updated the GitHub gists. Luckily those functions were only coded up for illustrative purposes; the calibrateAsymptoticVariance function (and its helper function) contain all the logic required to compute the z* scores. Thanks again.

7. Hi Stuart,

Excellent work! Thank you for explaining a new statistical tool to us and providing some good pieces of code and graphics. This definitely gives us some strong empirical evidence against the Markov property of financial time series.

I would like to mention that some interpretations of the EMH postulate that market processes should be martingales, which are not equivalent to Markov processes, even though a large class of financial models belong to both classes (Brownian motion being our greatest example).

Some general martingales would be part of your third kind of random walk, thus they could be a possible alternative for us to reject the Null Hypothesis. It would be interesting to go one step further and test this weaker class of random walk.

PS: In your first definition of a random walk with stochastic volatility, I think you have made a mistake when you defined sigma squared as a Gaussian random variable. In the following parts, you implemented what I consider a fair model, where sigma, not sigma squared, is normally distributed.

• Hi Guilherme, thanks for the comment. I would love to take credit but the statistical test isn't new. The credit for the idea is due solely to Lo and MacKinlay (1988). My objective is simply to make it a bit more accessible.

A Markov process is a process without memory, whereas a martingale is a process whose expected future value, given the past, equals its current value. So I would say that a martingale is a stricter form of the random walk hypothesis. It is also applicable to models of volatility and interest rates more so than equity (what was considered here). But perhaps I am wrong in that interpretation. Nevertheless, along with Pareto and fractal models of stock market price evolution, this would be an interesting topic to explore. Hopefully I will find some time to cover those topics in the future :-).

P.S. I will double check the notation and include it in an errata / updates section which I will be publishing over the weekend. Thanks again!

8. Man. Those maths are so beautiful, I wanna return to math. Thanks for the post.

• Pleasure :-). It took me a while to get the maths in the original paper, but when I did, I just had to write it up. It is beautiful.

9. Hello Stuart,

Thank you for the post. Really interesting topic and results. Wanted to add my 2 cents to the discussion.

Results produced in your post say that the observed data is not compatible with the random walk models in consideration. The results don't say anything about the probability that the model itself is true or not. These models can still be of use in some applications in quant finance. It depends on the problem you are solving with them.

In the real asset results section, it is incorrect to say that some stocks are less likely to be random based on the magnitude of their z-scores. The magnitude of a z-score does not reflect the size of an effect or the importance of a result. It is not correct to say that e.g. Public Storage is less likely to be random than Aflac Incorporated. High z-scores are weak evidence against H0. Conversely, large effects may produce low z-scores. Let alone the selection bias in the investment universe presented in this section.

As mentioned, the fact that you see the data as incompatible with the random walk model as you defined it says nothing about the model. You might define a thousand more random walk models (and dozens of them exist in the quant world), and test them against data. In that case, you would get some data compatible with some of them, and some data - not really. Some models of random walk may have so many degrees of freedom and be so "sophisticated" that any data would be easily explained by them. And you see that we have a problem of induction here.

It is interesting how biased we all are in terms of drawing conclusions from limited information. We take time series of unknown properties. We take a model. We test it with data. Sometimes data is compatible with the model, sometimes it is not. We don’t test initial model assumptions. We tend to draw overgeneralized conclusions about the model, while in reality statistical method gives us information about the data. We tend not to see the problem of induction.

In the end, these subtle moments in judgement make the difference. Your post is named "Stock Market Prices Do Not Follow Random Walks", which is basically not a correct statement. The produced results don't allow us to jump to this conclusion. Instead, what we see is that some of the data tested is not compatible with typical basic random walk models used by modern alchemists (quants). That fact may support other judgements about markets and/or quant researchers themselves. This is it.

In the end, I would like to share it with you and your readers the recent ASA's statement on statistical significance misuse and bad science http://amstat.tandfonline.com/doi/suppl/10.1080/00031305.2016.1154108

Citing ASA’s statement, we should remember that scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold (arbitrary?).

Best regards,
Pavel.

• Hey Pavel,

Thanks for the insightful comment -- worth much more to me than just 2 cents :-).

You are quite right in your initial assessment: this refutation of the random walk hypothesis, like any other refutation, is model specific and I have not tested every possible model of a random walk. You are also correct that the choice of p-value threshold is rather arbitrary in the scientific community and in this work as well. And regarding your statement that "magnitude of z-scores does not reflect the size of an effect or the importance of a result" - you're right, this is true. The z-scores are more interesting than they are useful, which is why I didn't recommend using them.

Having conceded those points I think that the conclusion, "stock market prices do not follow random walks" still stands for four very simple reasons which I hope you will appreciate:

Firstly, testing every possible model specification is impossible because it is easy to respond to each and every refutation of the random walk hypothesis with an even more convoluted and more nuanced random walk model which does pass even the most sophisticated randomness tests (that's what has been happening for 30 years). But as with all disputed theories there always comes a point where the adapted theory becomes more complex than the simple, unmistakable conclusion that the theory is, quite simply, wrong. We have reached that point with the random walk hypothesis: countless researchers and academics have challenged the theory and attempted to bury it, but those statisticians who support it are now adept at responding with ever more convoluted models which undermine empirical results and statistical tests. This is exactly what happened in the late 1800s when scientists kept coming up with exciting new theories like Lorentz contraction to defend their misguided theory of absolute space. So whilst it would be easy for me to make conclusion A: "stock market prices must therefore follow a more idealized random walk specification", it is even easier to make conclusion B: "stock market prices do not follow random walks". Ultimately A and B are empirically equivalent, but theory B has fewer assumptions. All we are assuming with B is that there is some degree of predictability in stock market prices - what's so crazy about that? Anyway, as history has taught us, it is the simplest theories which usually triumph [1]. That's Occam's razor for you.

The second, third, and fourth reasons are quite trivial and make for a less exciting debate, but they must be said:

The second reason is that this test extends to a large class of random walk models and, as discussed in the article, is really looking for statistically significant autocorrelations in the returns structure. If such autocorrelations exist then this represents a violation of all forms of the random walk hypothesis because the walks are then quite literally "not random", i.e. they contain some degree of predictability (however small it may be) given past returns. The third reason is that the original (1988) and the replicated results are statistically significant at many different thresholds: 95%, 99%, 99.9%; and every universe of securities I have applied these tests to contains a statistically significant number of statistically significant z-scores. In other words, all markets contain assets which are not strictly unpredictable. The evidence speaks for itself. And, lastly, the fourth reason is that there is zero "selection bias" in the above analysis because the indexes are indexes - no biases there - and the random walk hypothesis, if true, should apply equally to each and every asset, including every possible subset of those assets, even a subset of stocks which have survived to today (the S&P 500 right now). The inclusion of more assets will not magically shrink the z-scores of the assets I have already run the analysis on, so the conclusion still stands and can only be bolstered by the inclusion of more assets in the universe. This isn't a trading strategy. Speaking practically, I don't have an unbiased dataset, but if you do I welcome you to prove me wrong :-).

Anyway, getting back to the philosophy of science I must admit that I disagree with your comment on a much deeper level because I despise the nihilistic approach to statistics and the science of uncertain systems which many people seem to have adopted in recent years. Granted, statistical significance testing has flaws - what method doesn't? - but isn't throwing our arms up in exasperation and doing nothing more flawed than using statistical significance testing? And is never coming to a conclusion and therein not furthering the scientific body of knowledge really better than potentially believing the wrong conclusion for some period of time? Even when we recognize that this is the very nature of science to be wrong? I don't think so. Which is why I have no problem using statistical significance testing which has worked for all the great scientific minds who came before me. I will, however, do my very best to make sure that all the assumptions are transparent.

Furthermore, please note that despite my conclusions I have never once argued that the falseness of the random walk hypothesis undermines its usefulness to quants. I use random walks every day, I just know not to believe them. As they say, all models are wrong but some models are useful. The random walk hypothesis is one such model: wrong, but useful.

Lastly, I just wanted to point out that your comment, whilst great for provoking critical thoughts for my readers, offers no practical advice whatsoever and is, therefore, not actionable. So if you did have any suggestions for further analysis or perhaps a better methodology for performing the above analysis I would really love to hear it. Thanks again for your comment Pavel.