# The Promise of Computing

You would be forgiven for thinking that Moore's law is a law like Newton's laws. It really does seem that as surely as an apple will fall to the ground, so too shall our computers, phones, tablets, and (now) watches capacity increase year-after-year at an exponential rate ... but Moore's law is not a law, it is the promise of computing.

And it is the kind of promise that is hard to keep. The future of computation has always been dependent on the many smart men and women who dedicate their lives to making possible the impossible. Ubiquitous computing, and all that comes with it, only exists because those people overcame the many thousands of theoretical and engineering problems which stood in the way of the next generation of computers.

I recently had the privilege to meet a bunch of these smart people who are trying to push the boundaries of computing through the manipulation of quantum systems a.k.a quantum computing. After this mind-bending encounter I wrote this essay because of the many spooky correlations I see between the current state of quantum computing and the state of classical computing from the 1930's to the mid 1960's.

These people - physics Professors and PhD students from all over the world - congregated at a stunning little beach resort just north of Durban in South Africa to talk candidly about the intersection of quantum computing and machine learning; an area which is now known as "quantum machine learning". The conference was hosted by the University of KZN's Center of Quantum Technology. In particular Seth Lloyd, an MIT professor, pioneer of theoretical quantum computing, and co-creator of the HHL quantum algorithm which lies at the heart of many of published quantum machine learning algorithms since 2008, was in attendance ... I was a just a lucky fly on the wall.

At this point in time quantum machine learning - as grandiose and singularity-inducing as it sounds - is easily summed up as: "writing algorithms for computers which do not exist". This seems quite ridiculous but let's not forget that the same thing could have been said about John McCarthy's conference on "Artificial Intelligence" at Dartmouth in 1956. It took 40 years after this conference for a computer to beat Chess and a further 20 years to beat Go ... despite the fact that many of the fundamental principles were already well-known in the 1950's.

It was this characteristic of the workshop - its ability to talk seriously about algorithms designed to run on computers that do not exist and which may never exist - that made such an impression on me and inspired this essay. What follows is a noisy - in the Information Theory sense - history of classical computation from Hilbert's program to the personal computer as well as the more recent history and current state of quantum computation.

My aim with this essay is to illustrate that, despite all the challenges, quantum computation has the potential to revolutionize computation and machine learning ... but only if we give it the kind of support our predecessors gave classical computation in the previous century. The mantle has been passed and it's up to us to live up to it.

Before I get into the essay I have an announcement and a disclaimer to make. Sorry I have not been blogging regularly, I am working furiously on my Master's Thesis: "An Exact Test of Market Efficiency using Online Machine Learning". Secondly, I am not all-knowing so if (when) you find errors please let me know in the comment section below; and if you find this interesting, I've included a massive list of detailed references at the end of this essay.

Update: I have changed a few things around in the quantum computing section, and will probably do so again shortly, thanks to some extremely insightful inputs from Ryan Sweke.

# Computation "before" computers

It's impossible to pinpoint the beginning of a concept like computation. One might be tempted to start with Alan Turing and his Automatic Machine - later renamed the Universal Turing Machine (UTM) - but that would gloss over too many important nuances of computation. Nevertheless we must draw a line in the sand and start somewhere. So for the sake of some great analogies further down the line, the beginning of our story is 1675 London.

St Paul's Cathedral, characterized by its massive stone domes, is a breathtaking, architectural masterpiece. At the time of construction working out what shape would make the strongest possible arch between two pillars was a challenging problem. Spherical domes, whilst aesthetically pleasing, are not strong structures and could not have supported the weight of the Lantern at the top of the dome.

Luckily an early analog computing device called a rope can solve the problem rather easily. If you hang a slack rope between the two pillars it will assume a specific shape called a catenary. This shape, when inverted, corresponds exactly to the required thrust to build the strongest arch. In physical terms we can say that the rope found its lowest energy state. We can also say that the rope is used to compute the desired quantities. In this specific instance, a rope is acting as a computer.

To be more specific, the rope is acting as an analog computer. An analog computer is one which solves a mathematical problem by means of a physical analogy. A wind tunnel is another example of an analog computer which is still used today to simulate the distribution of air pressure over and under new aircraft designs' wings.

Credit for the rope example goes to Scott Aaronson whose blog, Shtetl-Optimized, has been a great inspiration to me over the past few years. For more information about the example please see this page from plus magazine and this Wikipedia entry on the catenary or just drop me a mail.

The simple idea which our rope computer conveys is that devices can be designed to solve specific mathematical problems. These devices are not universal - meaning that they can solve every computable problem (more on that later) - they are dedicated. Long before the first general purpose computer was built and even before the idea of a general purpose computer even existed there were many dedicated computational devices.

For the sake of completeness I will mention four very famous dedicated computational devices. The first, and oldest known analog computing device from 100 BC, is the Antikythera Mechanism. This Mechanism which predicted astronomical positions and even eclipses. The second is the Analytical Engine developed by far-ahead-of-his-time visionary, Charles Babbage, and programmed by the world's first programmer, Ada Lovelace. The third, is the Differential Analyzer, an analog computer designed to solve complex differential equations such as those that arise when determining the trajectories of missiles or rockets. Lastly, the fourth dedicated quantum computational device is the D-Wave Quantum Computer, but I will get back to that towards the end of this essay.

Dedicated computational devices are nifty, but just imagine how frustrating it would be if almost every mathematical problem you came across needed a different physical device to be solved. This seems alien to us now because we have grown up surrounded by general purpose computers. But as early as the 19th century the hunt was on for systems, whether mechanical or purely logical, which could do more. In the early 1930's these systems finally materialized as Alan Turing's Universal Turing Machine and Alonzo Church's Lambda Calculus.

# Consistent? Complete? Decidable?

The chain of ideas that lead Alan Turing to his discovery of Universal Turing Machines and Church to his discovery of the Lambda Calculus can be traced back to the most towering figure of mathematics in the 19th century: David Hilbert. Hilbert pioneered many fields of mathematics including invariance theory (the same field Jim Simons is known for), functional analysis, and the axiomatization of geometry. Hilbert was a mathematical badass.

That said, I think that Hilbert's biggest contribution came from the direction he gave mathematics under "Hilbert's Program", the goal of which was to reduce all of mathematics to a formal system which was:

1. Consistent ~ no contradictions can be obtained within this formal system,
2. Complete ~ all true propositions can be proven within this formal system, and
3. Decidable ~ there exists an algorithm for determining the truth or falsity of any proposition.

In so doing Hilbert hoped to eradicate the many inconsistencies and contradictions which plagued all fields of mathematics. An example of a contradiction is the proposition that "this statement is false". This was an ambitious goal which set the research agenda of a generation of mathematicians, logicians, and philosophers. In particular two British mathematicians, Bertrand Russell and Alfred Whitehead, answered Hilbert's call.

Russell and Whitehead coauthored a famous three-volume, book called Principia Mathematica (often denoted as PM). The goal of PM was to construct a mathematical formalism consisting of axioms and inference rules within which all mathematical truths could be proven. PM was trying to meet Hilbert's challenge and for many years after it was first published, scholars believed it had achieved - or at least almost achieved - this lofty goal.

It is worth mentioning exactly how PM attempted to meet this goal. Essentially PM made certain methods in mathematics "illegal". These operations included metamathematical propositions (propositions about propositions such as "this statement is false") and recursion. The rationale for this was that these two simple operations appeared to lie at the heart of all the known mathematical inconsistencies and contradictions.

.

A lovely extract from PM which proves that 1 + 1 = 2.

.

Enter Kurt Friedrich Gödel. Gödel was an Austrian logician, mathematician, and philosopher who belonged to the famous Vienna Circle. In 1931 Gödel published two groundbreaking Incompleteness Theorems which proved that any mathematical formalism powerful enough to describe the arithmetic of natural numbers could neither be complete nor could it demonstrate its own consistency. Gödel showed us that there are mathematical truths which cannot be proven and in so doing he also showed us that contradictions and inconsistencies are not extraneous to mathematics, they are an integral part of it. Truth out runs proof. Contradictions are unavoidable.

The amazing thing about Gödel's proof is that he did it within the formalism defined by Russell and Whitehead in PM. To (over)simplify it, Gödel derived a formula which claims that it is unprovable in PM. Thus if the formula were provable, it would also be false, which means the system is not consistent and that provable statements are not always true. This was achieved using a unique and influential mathematical coup de grâce called Gödel numbering.

With Gödel's proof two-thirds of Hilbert's dream had been crushed. All that remained was the hope that the formalism would, at the very least, be decidable meaning that any true but unprovable proposition would be known to the true but unprovable from the outset. This is the problem which gave birth to the computer.

# Computability and the Church-Turing Thesis

In the early 1930's, on opposite sides of the Atlantic, two great mathematicians were working on a disproof of Hilbert's Entscheidungsproblem which, when translated from German, means The Decision Problem. These two men, Alonzo Church and Alan Turing, were putting the final nail in coffin for Hilbert's program with two distinct, but mathematically equivalent, systems of computation:  Lambda (λ) calculus and Turing Machines.

Lambda Calculus is a formal system devised by Alonzo Church which uses two operators, variable binding and substitution, to model computation. Around the time it was published Alan Turing was independently working on another model of computation which he called the Automatic Machine. Upon hearing of the work by Church, Turing left King's College London temporarily to study at the Institute for Advanced Study (IAS) with Church. There Turing proved Lambda Calculus and the Automating Machine are equivalent and later published his seminal paper entitled: "On Computable Numbers with an Application to the Entscheidungsproblem". The contents of this paper formed the basis of the theory of Computability also fondly known as the Church-Turing thesis.

.

.

So what is computability? Computability is simply the ability to solve a problem in a mechanical or algorithmic way. Therefore, computable problems are those problems for which a mechanism or algorithm exists to solve it.

The problem of building the strongest arch between two pillars is a computable problem because we can hang a rope between the pillars to get the desired quantities. Likewise, the problem of determining the distribution of pressure over an airplane's wings is a computable problem because we can build a model of the plane and put it in a wind tunnel to get the desired quantities. But there are many problems ropes and wind tunnels cannot solve ...

Church's and Turing's shared genius was to develop systems which were able to solve any computable function. Put simply, they proved that everything that can be computed can be computed with a Turing Machine or λ-calculus. Turing and Church went on to demonstrate that there exist uncomputable problems and it is this observation which holds the key to the proof that any mathematical formalism powerful enough to describe the arithmetic of natural numbers is not consistent, complete, nor is it decidable ...

Consider the following. We create a Turing Machine, $O$ for oracle, which returns true or false if any Turing Machine halts or does not halt. Then we create a Turing Machine, $D$ for Deceiver, which halts if $O$ halts. What would happen if we asked $O$ whether $D$ halts? This causes a contradiction because $D$ halts if $O$ halts and $O$ can only return true if $D$ halts. A contradiction usually means that our assumptions were not valid. In this case, the existence of the Oracle $O$ is invalid because we can always write a Turing Machine which $O$ cannot decide whether or not it halts. As such, $O$, cannot exist! This problem is called The Halting Problem and it proves that any mathematical formalism powerful enough to describe the arithmetic of natural numbers cannot be decidable because there is simply no way of knowing (deciding upfront) that the program will or will not run forever.

I realize that this is a confusing explanation - most proofs by contradiction are somewhat confusing - so if you are confused perhaps any of these answers will help: answer 0answer 1, answer 2, answer 3,  and answer 4.

For those of you who are as interested in the mathematical foundations of Computer Science as I am ... and who spent as much time in traffic as I do ... I highly recommend listening to copies of the following audio-books. They cover this topic in much more detail than I could hope to in this short essay:

I finished this audio book earlier this year. It is easy listening and spends a considerable amount of time talking about the theoretical side of computer science and the engineering challenges faced by early digital computers.

I am currently listening to this book. It's a really fascinating story of information through the ages and elaborates on a lot of theoretical computer science topics in an easy to understand way. I highly recommend it

# So what exactly is a Turing Machine?

I didn't elaborate on how a Turing Machine actually works in the previous section because it isn't particularly relevant to the theory of computability. There are actually many formal systems equally powerful to the Turing Machine. These systems include λ-calculus, μ-recursive functions, and even Artificial Neural Networks! The important point is that these systems can compute anything which is computable, they are universal.

However, Turing Machines are special because they are inherently mechanical and bear some resemblance to early digital computers. So without further ado ... here's an explanation of a Turing Machine.

.

A diagram representing the architecture of a Turing Machine

An example of a finite set of instructions for a Turing Machine (Busy Beaver Algorithm)

A Turing Machine operates over an infinitely long tape which is divided into discrete cells. A simple one-tape Turing Machine consists of a finite, non-empty set of states, $Q$ and a finite, non-empty set of tape symbols, $\Gamma$. The set of input or output symbols, $\Sigma$, is usually just the set of tape symbols with a blank symbol $b$ included. The Turing Machine also has an initial state, $q_0$, and a set of final or accepting states, $F$, which belong to the set of all states. The brain of the Turing Machine is a transition function, $\delta$, which allows it to transition between the different states. Depending on what state the Turing Machine is in, and what symbols it scans, it can shift left or right along the tape and write a symbol to the tape. The Turing Machine halts when it is in an accepting state.

The tape symbols in a classical Turing Machine are usually binary i.e. $\Gamma = \{0, 1\}$ and the states that the Turing Machine may exist in are classical discrete states e.g. $Q = \{A, B, C\}$. Later I will explain that a Quantum Turing Machine is very similar to a classical Turing Machine except that it operates over quantum states rather than classical states. Just like a Classical Turing Machine a Quantum Turing Machine transitions through various "states" until it reaches an accepting state at which point it halts and the solution has been found.

A prerequisite for the computability proofs by Turing and Church is that the tape be infinitely long. Since no tape can be infinitely long a Turing Machine, despite being mechanical, is a theoretical construct. All real-world implementations of Turing Machines a.k.a. all computers are therefore technically Linearly Bounded Finite State Machines (FSM) also known as a Restricted Turing Machine. However, any sufficiently large Finite State Machine can be used to simulate just about any Turing Machine meaning that you and I don't notice a difference.

In fact there are many different kinds of Turing Machines including: Alternating Turing Machines, Non-deterministic Turing Machines, Probabilistic Turing Machines, and Quantum Turing Machines which I will discuss later on. The study of what types of Turing machines can solve what types of problems is called computational complexity. Generally speaking there are a handful of computational complexity classes you need to know about,

1. $P$ ~ The set of decision problems (problems with a yes or no answer) that can be solved by a Deterministic Turing Machine in polynomial time. These problems are called tractable.
2. $NP$ ~ The set of decision problems that can be solved by a Non-deterministic Turing Machine in polynomial time. A typical NP problem asks: "Are there any solutions that satisfy certain constraints?".
3. $\#P$ ~ The set of counting problems associated with the decision problems in the set NP. In other words a typical $\#P$ problem asks: "How many solutions that satisfy certain constraints?"
4. $NP-Hard$ ~ The class of problems that are at least as hard as the hardest problem in the set of all NP problems. These problems are called "intractable" e.g. The Travelling Salesman Problem.

Quantum Turing Machines introduce a few more complexity classes which, to put it simply, allow them to solve certain NP problems in P time. As such, large-scale quantum computers will have far-reaching implications which I will mention later on. For a list of complexity classes see the Complexity Zoo curated by Scott Aaronson.

But for now let's talk about the early classical computers!

There are at least two YouTube channels which I think everyone interested in mathematics and computer science should watch: Computerphile and Numberphile. The videos are almost always entertaining and have been the source of some of my more inspired ideas.

# The Computer Race! ... and Hybrids?

The race to build the first stored program digital computer, like many other significant inventions, sprang forth from a military use case: breaking codes. At the beginning of World War II the Germans were using an advanced system to encrypt their communications called the Enigma. The Enigma machine was an electromechanical device which implemented a polyalphabetic cipher along with some other annoying complexities.

Mathematically a cipher is just a function for turning ordinary messages, called plaintext, into encrypted messages, called ciphertext. So long as this function is computable, a computer can - at least theoretically - solve it. I say theoretically because the computational complexity of the solution may essentially make it unsolvable. Modern encryption relies on the complexity of a problem called integer factorization for their security ...

Cryptanalysts, researchers who specialize in the making and breaking of codes, knew this and therefore endeavoured to design computational devices which could decrypt the ciphertext. Typically these devices were built to look for patterns, in a similar way to how quants look for patterns in the market. The first computational device for breaking the Enigma was developed by the Polish Cipher Bureau and was called the cryptologic bomb.

After the invasion of Poland the notes and designs for the cryptologic bomb were taken to the UK Government Code and Cypher School (GC&CS) at Bletchley Park. Alan Turing, who was working for GC&CS at the time, extended the design of the cryptologic bomb and produced a new computational device for breaking the Enigma called The Bombe! This story was documented in a movie called The Imitation Game after another paper written by Turing which actually has nothing to do with WW2 code breaking, but does sound pretty cool.

.

The Enigma

The Bombe

.

A little fact that I found mildly interesting was that Alan Turing travelled to the United States to assist with their Bombe project. The US's ability during the war to produce so many Bombes gave them an edge over the Axis forces. On this trip Alan Turing spent time at Bell Laboratories and happened to have lunch everyday with another iconic figure in Computer Science, Claude Shannon, the inventor of Information Theory

An important point is that the Bombe was a computational device dedicated to the breaking of Enigma codes. Just like our arch-computing rope and wind-tunnels, it couldn't be used to solve other problems. By contrast a stored program computer is one which can be configured to solve different problems. As such, a stored program computer is general purpose but not universal, since the memory of a stored program computer is finite.

The first stored program digital computer was also developed at GC&CS for breaking codes and was called the Colossus. The Colossus first worked in December 1943 and was operational in 1944, far too late to make a meaningful difference in WW2. The Colossus project was a closely guarded military secret at the time so the ENIAC, completed in 1945, is often awarded the title "first stored program digital computer".

Actually around this time there were at least five major projects on the go to build stored program computers: the Collosus at GC&CS, the Electronic Numerical Integrator And Computer (ENIAC) in the US Army,  the Electronic Delay Storage Automatic Calculator (EDSAC) by the University of Cambridge, the Electronic Discrete Variable Automatic Computer (EDVAC), the Automatic Computing Engine (ACE) by the National Physics Laboratory in the UK, and the Manchester Mark 1 computer at the University of Manchester.

Instead of focussing on their differences like most historical essay do, I am going to focus on their similarities and the shared technical challenges that these devices faced. Later you will notice the similarities between these computers the current problems facing large-scale general purpose quantum computers.

##### Competing architectures

During this period there were a LOT of different possible architectures and design choices available when trying to build a stored program digital computer. The different options available included, but were not limited to:

1. Binary or Decimal Arithmetic? The Colossus computer used binary arithmetic and  the original ENIAC computer used decimal arithmetic however its successor, the EDVAC, changed to binary arithmetic.
2. Delay lines or Cathode Ray Tubes (CRT)? A number of possible memory stores existed include slower and cheaper delay lines used in the EDSAC and faster and more expensive Williams cathode ray tubes used in the Manchester Mark 1 computer. The William's tube was the first random-access digital storage device.
3. Distributed computation or Accumulators? The ACE computer, designed by Alan Turing, employed distributed processing, whilst the ENIAC and EDVAC computers, designed by Von Neumann, had a centralised structure consisting of many accumulators which could do specific functions such as addition.
4. Purely digital or Hybrid Analog-Digital? Hybrid computers used digital and analog methods together with one another. At this stage it wasn't immediately obvious that digital computing would "win out" over analog. In fact, work on analog computers like the Differential Analyzer continued into the 1960's at NACA/NASA.

There were a lot of ideas floating around about how exactly one should build a computer. For a fascinating example of one such architecture see the somewhat controversial "First Draft of a Report on the EDVAC" by John Von Neumann. Many of the ideas floating around had also never been tried before and were certainly not guaranteed to work. One particularly crazy idea was the use of Gin in delay lines because of its "acoustic properties".

##### Difficult to manufacture and use

It was extremely difficult to manufacture computer components for a few reasons. Firstly, because the components were specialized and not used in many other places so many of the components were hand made! Secondly, many of the components were being built for the first time so information about how well they worked and how they would perform over extended periods of time in conjunction with other components was not available.

In addition to being hard to manufacture these early computers were also notoriously difficult to program. "Programming" these computers usually involved plugging wires into different locations on a massive plug board and hoping you got it right. An interesting fact about early programming was that it was done primarily by woman.

Early programmers working on the ENIAC.

##### Noise and Scalability

Another problem facing early stored program computers was noise. Noise is defined as "irregular fluctuations that accompany a transmitted electrical signal but are not part of it and tend to obscure it." Shannon showed us that the fundamental problem of all communications - whether by African drums, smoke signals, telegrams, telephone, or email - is the reproduction of the message at the end of the channel it was sent through. This is a challenge because all channels are noisy and whatever data is sent through them will, to some extent, be corrupted.

For this reason significant effort was directed towards the construction of error correcting codes and signal amplification. These techniques use redundancy and re-transmission to mitigate the adverse effects of noise. In the 1950's this was a problem a user might conceivably face. Today this problem is solved and few people when writing software on computers are concerned about memory corruption due to channel noise.

Early stored program computers were built using valves and vacuum tubes. The smallest such devices were roughly the size of the top of your thumb! What this meant was that computers were extremely large and often took up entire rooms. Furthermore the valves and vacuum tubes were not exactly reliable. The size of these computers had one noteworthy impact: the amplification of noise. This is due to the fact that the longer the channel over which a message is communicated, the more difficult it is to communicate that message without error.

Early "Miniature" Vacuum Tubes

##### Limited Use Cases

Lastly, the imagined use cases of these early stored program machines were rather narrow. Imagined use cases included cryptography, simulation of physical systems, calculation of difficult mathematical quantities, and abstract mathematics. Alan Turing's paper, "Computing Machinery and Intelligence" which postulated that computers could be used to imitate human intelligence was perhaps the most creative use case. I doubt any of the early pioneers of computing would have believed that computers would become so ubiquitous and used for everything from self-driving cars to playing Pokémon GO ... I'm stuck at level 22 if you were wondering 😉 .

My point here is quite simply that had you been alive and intimately familiar with technology in the 1940's you would have had a great many reasons to be skeptical of these "giant electronic brains". Why? Because they were not a sure thing. There were many uncertainties and risks associated with computing so dedicating your life to computers was as much an act of faith as it was an act of sheer bloody mindedness.

# The Transistor!

At the heart of all modern electronics is the transistor. A transistor is a switch which can occupy one of two states: on or off, 1 or 0. Transistors are easier to produce than vacuum tubes and can also be made a lot smaller. The transistor was first practically implemented in 1947 by John Bardeen, Walter Brattain, and William Shockley who were working at Bell Laboratories at the time. They received the Nobel Prize in Physics in 1956.

In addition to the transistor, the Integrated Circuit was developed in 1958 which allowed sets of electronic circuits to be printed on chips of semiconductor material. As such, by 1959 computers were being designed to work on small printed circuits of transistors. This made for smaller, more powerful, and more reliable computers.

These two technologies, the transistor and the integrated chip, are the foundations of the next greatest invention in the history of classical computing: the microprocessor. A microprocessor essentially implemented the full architecture of a 1950's computer onto a single integrated chip. Intel introduced its first 4-bit microprocessor called the 4004 in 1971 and its 8-bit microprocessor called the 8008 in 1972. These microprocessors had up to a hundred transistors onboard. A 2014 Quad Core i7 Haswell has 1,400,000,000 processors.

The number of transistors on different microprocessors from the Intel 4004 to today.

However, as many of you will know. We have hit the limits of how small we can reliably make transistors. As such, most of the advances in the past few years have come from increasing the number of cpu's on a chip and not the raw speed of any one given CPU. This fact has led many people to claim that "Moore's Law is dead".

This is overly pessimistic and akin to saying "progress is dead" or - if you extrapolate too far - "Science is Dead". Nothing could be further from the truth. There are still many new discoveries to be made and much room for improvement. Plus, whilst classical computing was taking over the world a quiet revolution has been brewing and its name is quantum computing ... but before I get into that let me tell you a bit about the quantum revolution.

# Meanwhile ... the Quantum Revolution

At the turn of the 20th century classical physics was faced by some serious problems, observations and experimental evidence which could not be explained by the laws of physics as they existed then.

The first problem was that if atoms did exist, they shouldn't last long. Remember the model of the atom you learnt about in school? In that model the electrons were orbiting the nucleus in a circular motion. Objects in a circular motion are accelerating due to the change in direction ... but Maxwell's equations of electromagnetism predict that accelerating electric charges emit energy in the form of light. Since the electrons are losing energy they should, according to classical physics, spiral into the nucleus. And all that should happen in an instant, meaning that atoms like you learnt about in school should not exist according to classical physics.

The second problem was that atoms should emit light of all colours, not just discrete spectral lines unique to each element as had been observed. The third problem was called the "Ultraviolet Catastrophe" which was prediction that an ideal blackbody would emit energy, firstly, at all frequencies and, secondly, an infinite amount of it! This contradicted the law of conservation of energy. Lastly, the photoelectric effect - the instantaneous ejection of electrons from a material when light of different frequencies is shone onto it - was not consistent with classical physics because, firstly, it should not occur instantaneously, and secondly, it should depend on the intensity of the light not its frequency. Put simply physics was a mess, kind of like finance today will all of its "anomalies".

A visual explanation of the photoelectric effect, an effect which could not be explained by classical physics. In the photoelectric effect two strange things occur (1) electrons are emitted from the material immediately and (2) the frequency of the light, not its brightness (intensity), that determines the energy of the electrons that are ejected.

.

The transition to quantum started with Max Planck in 1900 who showed that the ultraviolet catastrophe could be resolved by assuming that atomic vibrations are quantized, meaning that they only occur in multiples of some constant, $h$. The resulting formula, $E = hf$, measures the energy, $E$, of an atomic particle vibrating at frequency, $f$Planck’s constant is a measure of the fundamental “graininess” of our universe at small scales.

The transition to quantum continued in 1905 - the same year he wrote seminal papers on Special Relativity and Brownian Motion - when Einstein famously explained the photoelectric effect. Einstein declared that the energy in a light wave is not uniformly distributed, it is concentrated in particle-like quanta which we now call "photons". Furthermore, the energy of a photon is quantized according to Planck's formula, $E = hf$. Einstein explained that the photoelectric occurs because it takes a certain minimum amount of energy to eject an electron from a material. Therefore, if the light shone onto the material had too low of a frequency (it was redder) then fewer electrons will be omitted. But if the light is a higher frequency the electrons only need to be hit by one photon to be ejected.

In 1913 Niels Bohr finalized the transition to quantum when he proposed the Bohr model of atom. In Bohr's model the atomic orbits of electrons are quantized meaning that only certain discrete orbits are allowed. These orbits correspond to the discrete values of an electron's energy measured in units of Planck's constant. Bohr proposed that the angular momentum, $L$, of an orbiting electron is given by $L = \frac{h}{2p}$, where $p$ is the electron's momentum. In Bohr's model electrons don't radiate electromagnetic waves - they don't lose energy - so they don't crash into the nucleus. They stay in orbit and emit electromagnetic waves if, and only if, electrons jump between orbits.

Profiles of Max Planck, Albert Einstein, and Neils Bohr

These contradictions came down to one unintuitive difference between classical physics and the soon-to-be-developed quantum mechanics. In classical physics the world is continuous and deterministic, but in quantum mechanics the world is discontinuous (made up of little bits) and is, as a result, stochastic.

# Quantum Weirdness

An unintended consequence of quantization is non-determinism a.k.a randomness and uncertainty. When we see objects in the real world, our eyes are actually absorbing the photons which have reflected from that object. Planck's constant sets the minimum change in energy that can occur and Einstein tells us that Planck's constant is also equal to the minimum energy of the electromagnetic wave, a photon, itself. So what happens if we want to observe an electron? The only way to do this is to bounce a photon off of it but by doing that we influence the electron, we transfer energy to it. We can't observe the subatomic world without also influencing it.

.

##### The first weird thing: Superposition

This observation lies at the heart of Heisenberg's famous uncertainty principle which states that we cannot simultaneously measure both a particles velocity and position. Suppose we wanted to measure the position of an electron with high precision. We would do this by bouncing one photon off of it. In order to get an accurate position measurement we need to use a photon with a high frequency because the "wiggling" of the wave is confined to a smaller region. However, according to the equation $E = hf$ if the photon has a high frequency then it has a high energy so it transfers a lot of energy to the electron thereby substantially altering its velocity.

This phenomenon has been experimentally verified time and time again using the famous double slit experiment. In this experiment electrons are sent through a microscopic double slit one-at-a-time. When the electrons are left to their own devices - i.e. they are not observed - they behave like waves and form an interference pattern.

But when we try to observe the electrons as they travel through - by bouncing photons off of them - they stop behaving like a wave and start behaving like a particle and no interference pattern is formed! This phenomenon is called wave-particle duality. The electron is both a wave and a particle. It is in a superposition.

.

A diagram showing the two possible outcomes of the double slit experiment without and with observation.

.

Erwin Schrödinger explained this with his wave mechanics. Under this description each particle is described by a wave equation - a partial differential equation - for which each solution corresponds to the probability of an events (a combination of space and time). As such, the wave equation gives us the probabilities of a particle being at any point in space at any given moment in time. However, when we measure a particle in superposition it gives us a definitive answer. At that moment the superposition ceases to exist and the particle exists in a definitive state. The wave function us said to have collapsed and one of the possible events has occurred. This all comes down to Planck's constant which tells us that observation is not side-effect free. The act of observation is a tangible thing.

Think of Planck's constant as limiting the precision with which we can measure the world. Because $h$ is so tiny, this uncertainty has a negligible effect on measurements done in the real world. That's why classical physics works. However, at the subatomic scale this makes a big difference ... it causes randomness and uncertainty.

.

##### The second weird thing: Entanglement

Entanglement is another strange quantum phenomenon in which pairs of particles interact in such a way that the quantum state of each particle cannot be described independently of the others, even when the particles are separated by a large distance. The quantum state must be described for the system as a whole.

Consider a pair of electrons which are produced by a decaying particle and fly apart in opposite directions. All electrons have a property called spin or angular momentum. Since the decayed particle had no spin the electrons must have opposite spins. So if you measure the spin of one of the electrons you immediately know the spin of the other. The two electrons are entangled and remain so regardless of the distance between them.

Entanglement was the subject of some controversy in physics. Einstein, Podolsky, and Rosen argued that entanglement implies that there exists an underlying objective and deterministic reality. However Bohr disagreed with this and argued that the observation of one half of the entangled particles in one location caused the other half to collapse in a different location instantaneously i.e. faster than the speed of light. Einstein called this "spooky action at a distance". However, this is not a violation of General Relativity - which states that information cannot travel faster than the speed of light - because no classical information is communicated.

This debate was pretty much settled by John Stewart Bell who came up with a really ingenious experiment to test whether or not the spin of the particles was pre-determined. The video below explains this better than I can,

.

Now we can get to the fun part :-). How can we take all of these cool (and really weird) phenomena and make a quantum computer out of them? Spoiler alert: scientists are not 100% sure, but progress is being made!

I spent a lot of time in traffic ;-). So for those of you who not only find theoretical computer science interesting but also physics, and in particular, quantum mechanics, I recommend the following audio books:

# Feynman's Conjecture and Quantum Computation

Richard Feynman was an American theoretical physicist who did a lot of work to champion quantum mechanics. In 1965 Feynman received the Nobel Prize in Physics for his work on quantum electrodynamics and in 1982 Feynman proposed a Universal Quantum Simulator after proving that a classical Turing Machine would experience an exponential slowdown when trying to simulate any quantum phenomena. In 1985 David Deutsch extended this ideas and proposed the first theoretical model of a Universal Quantum Turing Machine which is essentially an extension of classical Turing Machine which allows it to take advantage of superposition and entanglement.

.

##### Concept one: The Qubit

A quantum bit (qubit) is a two state quantum mechanical system meaning that it consists of two independent quantum states let's call them $|1\rangle$ and $|0\rangle$ for simplicity. The qubit may be in either $|1\rangle$ or $|0\rangle$ or any state in between, $|\phi\rangle = \alpha |1\rangle+\beta |0\rangle$. Here $\alpha$ and $\beta$ are probability amplitudes and must satisfy the constraint that $|\alpha|^2+|\beta|^2 = 1$. Put simply, $\alpha$ and $\beta$, when squared, are the probabilities of the qubit being either $|1\rangle$ or $|0\rangle$. Just like a particle can be in two places at once according to Schrödinger's wave equations, so too can our qubit be. If you were wondering this notation is called Bra-Ket or Dirac notation. It was introduced by Paul Dirac in 1939.

I am sure you are wondering "how is that even possible?". Good question! When we are talking about a bit in classical computer science we are actually talking about a transistor which is either on or off. There is a physical analog for what we are talking about when we talk about bits. In exactly the same way when we talk about qubits we are talking about things in the real world which exist in superposition. Examples include the Josephson JunctionTrapped Ions, and Electrons which can have up or down spin. A short list can be found here.

A qubit can be built by trapping charged atomic particles, such as ions, in free space using electromagnetic fields.

.

##### Concept two: Hilbert Spaces

Remember when I said that David Hilbert - whose program to find a complete, consistent, and decidable formal systems for all of mathematics indirectly lead to the discovery of the computer - a was mathematical badass? Well in addition to his program he generalized Euclidean geometry - the mathematics of distance and space in two and three dimensions - to any finite and even infinite dimensions. These high dimensional spaces are now called Hilbert Spaces and they are used in  to describe the states that quantum mechanical systems may occupy. The Hilbert Space which describes a qubit is 2-dimensional, the Hilbert Space which describes $n$ qubits is $2^n$-dimensional. This is what is known as the "exponential explosion" and really explains why we need quantum computing.

To illustrate this a little bit more technically consider the example of two vectors on a cartesian plane (2-dimensional Euclidean space). These two vectors, let's call then $\textbf{X}$ and $\textbf{Y}$, have coordinates $(X_1, X_2)$ and$(Y_1, Y_2)$. To calculate the dot product (inner produce) of these two vectors we just sum the product of their coordinates, $(\textbf{X},\textbf{Y})=\sum^{2}_{i=1}X_{i}Y_{i}$. To calculate the norm of the vector (its length) we calculate the square root of the sum of the coordinates squared, $|\textbf{X}|=\sqrt{\sum^{2}_{i=1}X_{i}^{2}}$. One popular Hilbert space, $\mathcal{H}$, is a generalization of this to $\infty$-dimensions. In this case the vectors are given as, $\textbf{X} = (X_{1}, X_{2}, ..., X_{\infty})$ and $\textbf{Y} = (Y_{1}, Y_{2}, ..., Y_{\infty})$ and the inner product of $\textbf{X}$ and $\textbf{Y}$ is given by, $(\textbf{X},\textbf{Y})=\sum^{\infty}_{i=1}X_{i}Y_{i}$ and the norm is given by, $|\textbf{X}|=\sqrt{\sum^{\infty}_{i=1}X_{i}^{2}}$. Or, in the much cleaner bracket notation, the inner product is simply given by $\langle X | Y \rangle$ and the norm by $\langle X | X \rangle$. A Hilbert space is a space where the norm is bounded from above by infinity, $\langle X | X \rangle < \infty$ and is complete.

##### A Better Explanation

Ryan also sent me the following comment over email with a much better explanation of Hilbert Spaces. So I thought I would rather include it here and award him the credit.

Technically speaking a Hilbert space is just any "complete, inner product space", meaning that it is any vector space (including finite vector spaces) equipped with an inner product, under which it is complete with respect to the norm induced by the inner product. In other words, almost all vector spaces which you can equip with an inner product are Hilbert spaces!

The Hilbert space which describes n qubits is $2^n$ dimensional (to be precise the Hilbert space is $\mathbb{C}^{2^n}$). This is called the "exponential explosion" and is exactly why we need quantum computers! To be more precise, the size of the Hilbert space grows exponentially with the number of particles (qubits), and therefore to simulate such systems on classical computers requires resources which grow exponentially.

The dimension of the Hilbert space (which is basically just an augmented vector space) of a many particle quantum system grows exponentially with the number of particles. This is because of the tensor product structure of many particle systems. Therefore, to describe the state of an $n$-qubit system you have to provide $2^n$ complex numbers - the amplitudes of all the basis vectors of the vector space, and the square roots of the associated probabilities.

This is intractable on classical computers as $n$ grows even to modestly large values - even 20 qubits is hard for a classical computer to simulate without some very clever tricks!

I realize that may seem quite complex, so here is the good news ... you don't really need to know the maths to get the concept behind a Quantum Turing Machine and, by extension, a Quantum Computer. You just need to know that these nice things called Hilbert Spaces, $\mathcal{H}$, are used to describe the states that quantum mechanical systems may occupy like like the set of $\{A, B, C\}$ described the states our Classical Turing Machine could occupy earlier.

.

##### Concept three: A Quantum Turing Machine

A Turing Machine operates over an infinitely long tape which is divided into discrete cells. A simple one-tape Turing Machine consists of a finite, non-empty set of states, $Q$ and a finite, non-empty set of tape symbols, $\Gamma$. The set of input or output symbols, $\Sigma$, is usually just the set of tape symbols with a blank symbol $b$ included. The Turing Machine also has an initial state, $q_0$, and a set of final or accepting states, $F$, which belong to the set of all states. The brain of the Turing Machine is a transition function, $\delta$, which allows it to transition between the different states. Depending on what state the Turing Machine is in, and what symbols it scans, it can shift left or right along the tape and write a symbol to the tape. The Turing Machine halts when it is in an accepting state.

Now we can describe a Quantum Turing Machine with three tapes. The first and last tapes are for the input and output which may be classical inputs and the second tape stores intermediary calculations between input and output. Our Quantum Turing Machine replaces the set of states it can occupy with a Hilbert Space, $Q$ and it also replaces the symbols on the secondary tape with another Hilbert Space, $\Gamma$, where the blank symbol represents the zero vector. The Quantum Turing Machine also has an initial state which may be quantum of classical and a set of final or accepting states, $F$. The brain of the Quantum Turing Machine is a transition function, $\delta$, which preserves the Hilbert Space. In order to preserve the Hilbert Space the transitions must be a collection of unitary matrices. The Quantum Turing Machine halts when it is in an accepting state. See, it's pretty similar to a Classical Turing Machine.

That is except for one fundamental difference ... A Turing Machine operates over classical states and a Quantum Turing Machine can operate over superpositions of classical states. This is what gives Quantum Turing Machines a massive speed advantage over classical Turing Machines. Remember that the study of computational complexity is the study of what problems can be solved by what kinds of Turing Machines? Well Quantum Turing Machines can solve some NP-hard problems in P time. The two most important quantum complexity classes are $BQP$ and $QMA$:

1. $BQP$ ~ The set of decision problems solvable by a Quantum Turing Machine in polynomial time, with a bounded error probability of at most 1/3 for all instances. Integer Factorization belongs to $BQP$ because of Shor's algorithm and classical $NP$ which implies that there are some intermediary complexity classes.
2. $QMA$ ~ The set of decision problems which are not solvable by a Quantum Turing Machine in polynomial time. This is the complexity class we need post-quantum encryption to be in.

# What's the Big Deal?

Quantum computers, if implemented, would be exponentially - or at least quadratically - better at performing certain computations. The two biggest quantum algorithms are Shor's Algorithm and Grover's Algorithm. Shor's Algorithm would allow us to factorize integers in polynomial time which would make all modern encryption schemes - which rely on it being intractable - insecure. Grover's Algorithm would allow us to find elements in an unordered list (or entries in a database) in fewer than $N$ operations where $N$ is the length of the list. To be more specific we would be able to find entries in $\sqrt{N}$ which is a quadratic speed-up. Searching unstructured data would never be the same again.

Harrow, Hassidim and Lloyd recently showed us that you can solve sets of linear equations faster using a quantum computer. Solving linear equations is a fundamental computer science problem which lies at the heart of most machine learning and optimization methods. Their algorithm, called the HHL algorithm - of as Seth nonchalantly puts it, the Holy Hell algorithm - could revolutionize the way we tackle machine learning, data science, and even quantitative finance problems. For a theoretical example see this paper entitled: "Quantum Support Vector Machine for Big Data Classification" by Patrick Rebentrost. That's pretty exciting right?

.

.

But here is the catch. There are no quantum computers. Nobody has been able to scale a quantum computer beyond about 10 qubits and we need roughly 40 for quantum computing to start making a dent. So without further ado, let's take a quick look at current generation real-world quantum computers!

# Current Generation Quantum Computers

Let me first mention some of the big accomplishments in this field. In 2001, a paltry 15 years ago, researchers first demonstrated Shor's algorithm on a 7-qubit computer. This quantum computer was an example of a Nuclear Magnetic Resonance (NMR) quantum computer. In 2005 and 2009  the first semiconductor chip ion trap and solid-state quantum processor were built. In 2011, D-Wave Systems announced the D-Wave One quantum computing device with a 128 qubit processor. In 2011 it was proven that quantum computers can be made with a separate RAM as per the Von Neumann architecture. In 2013 Google and NASA announced that they were launching the Quantum Artificial Intelligence Lab (QuAIL). In 2014 some information came out that the NSA was running  a research program to develop a quantum computer. And for the first time, this year, a programmable quantum computer was developed.

Note that I called the D-Wave a quantum computing device not a universal quantum computer. Essentially the D-Wave is a device which can solve hard combinatoric optimization problems using a process called quantum annealing which is analogous to simulated annealing in Artificial Intelligence. How does Quantum Annealing work? It works exactly like our rope computer I described at the beginning of this essay. Just like the rope which tends towards its lowest energy state which corresponds to an optimal arch design so too can quantum systems be designed to optimize certain quantities when moving towards a low energy state. Nevertheless, Quantum Annealing does offer the potential to construct an Adiabatic Quantum Computer. For more information about their systems I highly recommend watching their videos.

Despite all this progress nobody can claim that they have successfully built a scalable general-purpose, quantum computer and the reasons for this are all too similar to the challenges faced by early digital computers.

##### Competing architectures ... and hybrids again?

There are a number of different technologies you could use to potentially build universal quantum computers. Some of the competing technologies including superconducting qubits (the technology Google is betting on), ion traps, Nuclear Magnetic Resonance (NMR), Linear OpticsQuantum Dots, and many others. It isn't clear which technology is the best, by the two biggest areas of research seem to be superconducting qubits and ion traps. This uncertainty of new technologies is very similar to the uncertainty that was experienced when building early digital computers. Another option which has been gaining some traction is the idea of hybrid classical-quantum computers. This reminds me a great deal of the argument for hybrid analog-digital computers that was made in the 1940's and 50's.

.

##### Difficult to manufacture and use

Cooling on a D-Wave

Some quantum phenomena - such as those in superconducting qubits - are only observed at extremely low temperatures. In fact, the temperatures I am talking about are colder than outer space! Naturally this makes things manufacturing certain types of quantum computers quite difficult because you end up needing to build an entire room full of "refrigerators" dedicated to keeping your quantum chip cool!

That having been said, using a quantum computer would probably not be as difficult as using an early classical computer for one simple reason: the internet. It is entirely possible that Quantum Computers will never sit on our desks or in our pockets. Quantum Computers may only be accessible via an API sitting on a cloud exactly like IBM is doing now. But to play devils advocate, the exact same argument was proposed in relation to early digital computers which gave their users access to a thin client and carefully managed computer resources using a time sharing system ;-).

.

##### Noise and scalability

The single biggest problem facing the development of a large-scale quantum computer is, in my opinion, analogous to noise in early digital computers and it's called quantum decoherence. Quantum systems are extremely sensitive to external influences because of them existing in the subatomic realm. Light particles can interfere with the wave functions causing them to lose information to their environments which causes the quantum weirdness we spoke about earlier to stop. Coherence is a fundamental property of quantum mechanics, and is a prerequisite to quantum computing. Nobody has worked out how to effectively scale up a quantum computer.

A visual explanation of quantum decoherence.

.

##### Limited Use Cases

Lastly, the use cases of a quantum computer are not - again in my opinion - fully understood. Just like during the 1940's and 1950's when classical computers were being developed we have narrow vision. All of the best applications of quantum computers which have been discovered or thought of so far, with the exception of the HHL algorithm which is only 7 years only, have to do with breaking codes. Now doesn't that sounds awfully familiar?

# Some Concluding Thoughts

The theoretical foundations which lead to the discovery of the Universal Turing Machine began in the 19th century with visionaries like Babbage and Hilbert. It was only decades later, in 1936, that the theoretical idea of the Universal Turing Machine was introduced by Alan Turing. In 1946, ten years later, the first digital computer was developed. From the 1940's to the early 1960's progress was hard earned and the applications of computers were confined. It was the introduction of the transistor, integrated circuit, and finally the microprocessor in 1971 which kickstarted Moore's law and gave rise to the ubiquity we see today. Classical computing was an overnight success that took the better part of 40 years to happen despite the impetus of World War 2 and the Cold War.

Likewise the theoretical foundations which led to the discovery of the Universal Quantum Turing Machine began with Hilbert, Max Planck, Einstein, Bohr, Heisenberg, and Schrödinger from 1900 to the 1930's. It was only decades later, in 1982, that esteemed professor Richard Feynman proposed the first theoretical model of a Universal Quantum Simulator. These ideas led David Deutsch to the discovery of the Universal Quantum Turing Machine in 1985 only 31 years ago. In the past thirty years researchers have managed to devise numerous architectures for quantum computers and in 2001 implemented the first quantum computer. Right now I think it is safe to say that quantum computing is waiting on its microprocessor; the technology that will take it a giant leap forward.

Is history repeating itself? I think so. There is no physical law in the universe which prohibits large-scale quantum computers. Therefore it is either possible, or in failing we will make one of the greatest discoveries of modern science: that everything we know is wrong. Given the weight of evidence in favour of quantum mechanics, I'm optimistic that science will prevail and we will find that elusive technology needed to take us a giant leap forward into the age of quantum computing. Moore's Law is not a physical law like Newton's Law's of motion. It is a simple promise: so long as there are unreasonable men, women, researchers, scientists, and companies in this world we will never stop moving forward. I would like to end off with a relevant quote by George Bernard Shaw,

.

The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man. -- George Bernard Shaw

# Lots of References

If you managed to get this far then kudos to you!  😀 Here is a long list of references.

##### Theoretical Computer Science + Quantum Computing

If you notice anything I really should have mentioned please let me know in the comment section below. And if you find any mistakes, please also let me know so that I can update the essay. Thanks for reading!

1. Wow, Stuart, again a marvelous piece to read. So, you went into QC. Maybe you would like a few Quantum Computing related ideas from Q4Q >>> http://www.quantumforquants.org/question/complexity-classes-are-we-on-the-cusp-of-change/

Welcome to the challenges of the EXPTIME / EXPSPACE corner of the Complexity-ZOO.

• Thanks Michal, good to hear from you again! Yip, I started getting interested in it earlier this year and I thought it was time to write something about it! I didn't know about quantumforquants.org, I am looking at it now and it's really cool. Thanks for the link!

2. I would love to know why so many people did not imagine current computing use cases once a universal form of computation was proven. I read a research article using the Delphi method which predicted writing or reading information from the brain will never occur. It was written in the 60's or a bit later. I guess my consternation is this: how can the brain solve uncomputable problems? It can't or perhaps it is a class of problem solving machinery we have yet to comprehend. This is not the mainstream view yet empirically it appears true.