The Promise of Computing
You would be forgiven for thinking that Moore's law is a law like Newton's laws. It really does seem that as surely as an apple will fall to the ground, so too shall our computers, phones, tablets, and (now) watches capacity increase yearafteryear at an exponential rate ... but Moore's law is not a law, it is the promise of computing.
And it is the kind of promise that is hard to keep. The future of computation has always been dependent on the many smart men and women who dedicate their lives to making possible the impossible. Ubiquitous computing, and all that comes with it, only exists because those people overcame the many thousands of theoretical and engineering problems which stood in the way of the next generation of computers.
I recently had the privilege to meet a bunch of these smart people who are trying to push the boundaries of computing through the manipulation of quantum systems a.k.a quantum computing. After this mindbending encounter I wrote this essay because of the many spooky correlations I see between the current state of quantum computing and the state of classical computing from the 1930's to the mid 1960's.
These people  physics Professors and PhD students from all over the world  congregated at a stunning little beach resort just north of Durban in South Africa to talk candidly about the intersection of quantum computing and machine learning; an area which is now known as "quantum machine learning". The conference was hosted by the University of KZN's Center of Quantum Technology. In particular Seth Lloyd, an MIT professor, pioneer of theoretical quantum computing, and cocreator of the HHL quantum algorithm which lies at the heart of many of published quantum machine learning algorithms since 2008, was in attendance ... I was a just a lucky fly on the wall.
At this point in time quantum machine learning  as grandiose and singularityinducing as it sounds  is easily summed up as: "writing algorithms for computers which do not exist". This seems quite ridiculous but let's not forget that the same thing could have been said about John McCarthy's conference on "Artificial Intelligence" at Dartmouth in 1956. It took 40 years after this conference for a computer to beat Chess and a further 20 years to beat Go ... despite the fact that many of the fundamental principles were already wellknown in the 1950's.
It was this characteristic of the workshop  its ability to talk seriously about algorithms designed to run on computers that do not exist and which may never exist  that made such an impression on me and inspired this essay. What follows is a noisy  in the Information Theory sense  history of classical computation from Hilbert's program to the personal computer as well as the more recent history and current state of quantum computation.
My aim with this essay is to illustrate that, despite all the challenges, quantum computation has the potential to revolutionize computation and machine learning ... but only if we give it the kind of support our predecessors gave classical computation in the previous century. The mantle has been passed and it's up to us to live up to it.
Before I get into the essay I have an announcement and a disclaimer to make. Sorry I have not been blogging regularly, I am working furiously on my Master's Thesis: "An Exact Test of Market Efficiency using Online Machine Learning". Secondly, I am not allknowing so if (when) you find errors please let me know in the comment section below; and if you find this interesting, I've included a massive list of detailed references at the end of this essay.
Update: I have changed a few things around in the quantum computing section, and will probably do so again shortly, thanks to some extremely insightful inputs from Ryan Sweke.
Computation "before" computers
It's impossible to pinpoint the beginning of a concept like computation. One might be tempted to start with Alan Turing and his Automatic Machine  later renamed the Universal Turing Machine (UTM)  but that would gloss over too many important nuances of computation. Nevertheless we must draw a line in the sand and start somewhere. So for the sake of some great analogies further down the line, the beginning of our story is 1675 London.
St Paul's Cathedral, characterized by its massive stone domes, is a breathtaking, architectural masterpiece. At the time of construction working out what shape would make the strongest possible arch between two pillars was a challenging problem. Spherical domes, whilst aesthetically pleasing, are not strong structures and could not have supported the weight of the Lantern at the top of the dome.
Luckily an early analog computing device called a rope can solve the problem rather easily. If you hang a slack rope between the two pillars it will assume a specific shape called a catenary. This shape, when inverted, corresponds exactly to the required thrust to build the strongest arch. In physical terms we can say that the rope found its lowest energy state. We can also say that the rope is used to compute the desired quantities. In this specific instance, a rope is acting as a computer.
To be more specific, the rope is acting as an analog computer. An analog computer is one which solves a mathematical problem by means of a physical analogy. A wind tunnel is another example of an analog computer which is still used today to simulate the distribution of air pressure over and under new aircraft designs' wings.
Credit for the rope example goes to Scott Aaronson whose blog, ShtetlOptimized, has been a great inspiration to me over the past few years. For more information about the example please see this page from plus magazine and this Wikipedia entry on the catenary or just drop me a mail.
The simple idea which our rope computer conveys is that devices can be designed to solve specific mathematical problems. These devices are not universal  meaning that they can solve every computable problem (more on that later)  they are dedicated. Long before the first general purpose computer was built and even before the idea of a general purpose computer even existed there were many dedicated computational devices.
For the sake of completeness I will mention four very famous dedicated computational devices. The first, and oldest known analog computing device from 100 BC, is the Antikythera Mechanism. This Mechanism which predicted astronomical positions and even eclipses. The second is the Analytical Engine developed by faraheadofhistime visionary, Charles Babbage, and programmed by the world's first programmer, Ada Lovelace. The third, is the Differential Analyzer, an analog computer designed to solve complex differential equations such as those that arise when determining the trajectories of missiles or rockets. Lastly, the fourth dedicated quantum computational device is the DWave Quantum Computer, but I will get back to that towards the end of this essay.
Dedicated computational devices are nifty, but just imagine how frustrating it would be if almost every mathematical problem you came across needed a different physical device to be solved. This seems alien to us now because we have grown up surrounded by general purpose computers. But as early as the 19th century the hunt was on for systems, whether mechanical or purely logical, which could do more. In the early 1930's these systems finally materialized as Alan Turing's Universal Turing Machine and Alonzo Church's Lambda Calculus.
Consistent? Complete? Decidable?
The chain of ideas that lead Alan Turing to his discovery of Universal Turing Machines and Church to his discovery of the Lambda Calculus can be traced back to the most towering figure of mathematics in the 19th century: David Hilbert. Hilbert pioneered many fields of mathematics including invariance theory (the same field Jim Simons is known for), functional analysis, and the axiomatization of geometry. Hilbert was a mathematical badass.
That said, I think that Hilbert's biggest contribution came from the direction he gave mathematics under "Hilbert's Program", the goal of which was to reduce all of mathematics to a formal system which was:
 Consistent ~ no contradictions can be obtained within this formal system,
 Complete ~ all true propositions can be proven within this formal system, and
 Decidable ~ there exists an algorithm for determining the truth or falsity of any proposition.
In so doing Hilbert hoped to eradicate the many inconsistencies and contradictions which plagued all fields of mathematics. An example of a contradiction is the proposition that "this statement is false". This was an ambitious goal which set the research agenda of a generation of mathematicians, logicians, and philosophers. In particular two British mathematicians, Bertrand Russell and Alfred Whitehead, answered Hilbert's call.
Russell and Whitehead coauthored a famous threevolume, book called Principia Mathematica (often denoted as PM). The goal of PM was to construct a mathematical formalism consisting of axioms and inference rules within which all mathematical truths could be proven. PM was trying to meet Hilbert's challenge and for many years after it was first published, scholars believed it had achieved  or at least almost achieved  this lofty goal.
It is worth mentioning exactly how PM attempted to meet this goal. Essentially PM made certain methods in mathematics "illegal". These operations included metamathematical propositions (propositions about propositions such as "this statement is false") and recursion. The rationale for this was that these two simple operations appeared to lie at the heart of all the known mathematical inconsistencies and contradictions.
.
.
Enter Kurt Friedrich Gödel. Gödel was an Austrian logician, mathematician, and philosopher who belonged to the famous Vienna Circle. In 1931 Gödel published two groundbreaking Incompleteness Theorems which proved that any mathematical formalism powerful enough to describe the arithmetic of natural numbers could neither be complete nor could it demonstrate its own consistency. Gödel showed us that there are mathematical truths which cannot be proven and in so doing he also showed us that contradictions and inconsistencies are not extraneous to mathematics, they are an integral part of it. Truth out runs proof. Contradictions are unavoidable.
The amazing thing about Gödel's proof is that he did it within the formalism defined by Russell and Whitehead in PM. To (over)simplify it, Gödel derived a formula which claims that it is unprovable in PM. Thus if the formula were provable, it would also be false, which means the system is not consistent and that provable statements are not always true. This was achieved using a unique and influential mathematical coup de grâce called Gödel numbering.
With Gödel's proof twothirds of Hilbert's dream had been crushed. All that remained was the hope that the formalism would, at the very least, be decidable meaning that any true but unprovable proposition would be known to the true but unprovable from the outset. This is the problem which gave birth to the computer.
Computability and the ChurchTuring Thesis
In the early 1930's, on opposite sides of the Atlantic, two great mathematicians were working on a disproof of Hilbert's Entscheidungsproblem which, when translated from German, means The Decision Problem. These two men, Alonzo Church and Alan Turing, were putting the final nail in coffin for Hilbert's program with two distinct, but mathematically equivalent, systems of computation: Lambda (λ) calculus and Turing Machines.
Lambda Calculus is a formal system devised by Alonzo Church which uses two operators, variable binding and substitution, to model computation. Around the time it was published Alan Turing was independently working on another model of computation which he called the Automatic Machine. Upon hearing of the work by Church, Turing left King's College London temporarily to study at the Institute for Advanced Study (IAS) with Church. There Turing proved Lambda Calculus and the Automating Machine are equivalent and later published his seminal paper entitled: "On Computable Numbers with an Application to the Entscheidungsproblem". The contents of this paper formed the basis of the theory of Computability also fondly known as the ChurchTuring thesis.
.
.
So what is computability? Computability is simply the ability to solve a problem in a mechanical or algorithmic way. Therefore, computable problems are those problems for which a mechanism or algorithm exists to solve it.
The problem of building the strongest arch between two pillars is a computable problem because we can hang a rope between the pillars to get the desired quantities. Likewise, the problem of determining the distribution of pressure over an airplane's wings is a computable problem because we can build a model of the plane and put it in a wind tunnel to get the desired quantities. But there are many problems ropes and wind tunnels cannot solve ...
Church's and Turing's shared genius was to develop systems which were able to solve any computable function. Put simply, they proved that everything that can be computed can be computed with a Turing Machine or λcalculus. Turing and Church went on to demonstrate that there exist uncomputable problems and it is this observation which holds the key to the proof that any mathematical formalism powerful enough to describe the arithmetic of natural numbers is not consistent, complete, nor is it decidable ...
Consider the following. We create a Turing Machine, for oracle, which returns true or false if any Turing Machine halts or does not halt. Then we create a Turing Machine, for Deceiver, which halts if halts. What would happen if we asked whether halts? This causes a contradiction because halts if halts and can only return true if halts. A contradiction usually means that our assumptions were not valid. In this case, the existence of the Oracle is invalid because we can always write a Turing Machine which cannot decide whether or not it halts. As such, , cannot exist! This problem is called The Halting Problem and it proves that any mathematical formalism powerful enough to describe the arithmetic of natural numbers cannot be decidable because there is simply no way of knowing (deciding upfront) that the program will or will not run forever.
I realize that this is a confusing explanation  most proofs by contradiction are somewhat confusing  so if you are confused perhaps any of these answers will help: answer 0, answer 1, answer 2, answer 3, and answer 4.
For those of you who are as interested in the mathematical foundations of Computer Science as I am ... and who spent as much time in traffic as I do ... I highly recommend listening to copies of the following audiobooks. They cover this topic in much more detail than I could hope to in this short essay:
I finished this audio book earlier this year. It is easy listening and spends a considerable amount of time talking about the theoretical side of computer science and the engineering challenges faced by early digital computers.
I am currently listening to this book. It's a really fascinating story of information through the ages and elaborates on a lot of theoretical computer science topics in an easy to understand way. I highly recommend it
So what exactly is a Turing Machine?
I didn't elaborate on how a Turing Machine actually works in the previous section because it isn't particularly relevant to the theory of computability. There are actually many formal systems equally powerful to the Turing Machine. These systems include λcalculus, μrecursive functions, and even Artificial Neural Networks! The important point is that these systems can compute anything which is computable, they are universal.
However, Turing Machines are special because they are inherently mechanical and bear some resemblance to early digital computers. So without further ado ... here's an explanation of a Turing Machine.
.
A diagram representing the architecture of a Turing Machine
A Turing Machine operates over an infinitely long tape which is divided into discrete cells. A simple onetape Turing Machine consists of a finite, nonempty set of states, and a finite, nonempty set of tape symbols, . The set of input or output symbols, , is usually just the set of tape symbols with a blank symbol included. The Turing Machine also has an initial state, , and a set of final or accepting states, , which belong to the set of all states. The brain of the Turing Machine is a transition function, , which allows it to transition between the different states. Depending on what state the Turing Machine is in, and what symbols it scans, it can shift left or right along the tape and write a symbol to the tape. The Turing Machine halts when it is in an accepting state.
The tape symbols in a classical Turing Machine are usually binary i.e. and the states that the Turing Machine may exist in are classical discrete states e.g. . Later I will explain that a Quantum Turing Machine is very similar to a classical Turing Machine except that it operates over quantum states rather than classical states. Just like a Classical Turing Machine a Quantum Turing Machine transitions through various "states" until it reaches an accepting state at which point it halts and the solution has been found.
A prerequisite for the computability proofs by Turing and Church is that the tape be infinitely long. Since no tape can be infinitely long a Turing Machine, despite being mechanical, is a theoretical construct. All realworld implementations of Turing Machines a.k.a. all computers are therefore technically Linearly Bounded Finite State Machines (FSM) also known as a Restricted Turing Machine. However, any sufficiently large Finite State Machine can be used to simulate just about any Turing Machine meaning that you and I don't notice a difference.
What about Complexity?
In fact there are many different kinds of Turing Machines including: Alternating Turing Machines, Nondeterministic Turing Machines, Probabilistic Turing Machines, and Quantum Turing Machines which I will discuss later on. The study of what types of Turing machines can solve what types of problems is called computational complexity. Generally speaking there are a handful of computational complexity classes you need to know about,
 ~ The set of decision problems (problems with a yes or no answer) that can be solved by a Deterministic Turing Machine in polynomial time. These problems are called tractable.
 ~ The set of decision problems that can be solved by a Nondeterministic Turing Machine in polynomial time. A typical NP problem asks: "Are there any solutions that satisfy certain constraints?".
 ~ The set of counting problems associated with the decision problems in the set NP. In other words a typical problem asks: "How many solutions that satisfy certain constraints?"
 ~ The class of problems that are at least as hard as the hardest problem in the set of all NP problems. These problems are called "intractable" e.g. The Travelling Salesman Problem.
Quantum Turing Machines introduce a few more complexity classes which, to put it simply, allow them to solve certain NP problems in P time. As such, largescale quantum computers will have farreaching implications which I will mention later on. For a list of complexity classes see the Complexity Zoo curated by Scott Aaronson.
But for now let's talk about the early classical computers!
There are at least two YouTube channels which I think everyone interested in mathematics and computer science should watch: Computerphile and Numberphile. The videos are almost always entertaining and have been the source of some of my more inspired ideas.
The Computer Race! ... and Hybrids?
The race to build the first stored program digital computer, like many other significant inventions, sprang forth from a military use case: breaking codes. At the beginning of World War II the Germans were using an advanced system to encrypt their communications called the Enigma. The Enigma machine was an electromechanical device which implemented a polyalphabetic cipher along with some other annoying complexities.
Mathematically a cipher is just a function for turning ordinary messages, called plaintext, into encrypted messages, called ciphertext. So long as this function is computable, a computer can  at least theoretically  solve it. I say theoretically because the computational complexity of the solution may essentially make it unsolvable. Modern encryption relies on the complexity of a problem called integer factorization for their security ...
Cryptanalysts, researchers who specialize in the making and breaking of codes, knew this and therefore endeavoured to design computational devices which could decrypt the ciphertext. Typically these devices were built to look for patterns, in a similar way to how quants look for patterns in the market. The first computational device for breaking the Enigma was developed by the Polish Cipher Bureau and was called the cryptologic bomb.
After the invasion of Poland the notes and designs for the cryptologic bomb were taken to the UK Government Code and Cypher School (GC&CS) at Bletchley Park. Alan Turing, who was working for GC&CS at the time, extended the design of the cryptologic bomb and produced a new computational device for breaking the Enigma called The Bombe! This story was documented in a movie called The Imitation Game after another paper written by Turing which actually has nothing to do with WW2 code breaking, but does sound pretty cool.
.
.
A little fact that I found mildly interesting was that Alan Turing travelled to the United States to assist with their Bombe project. The US's ability during the war to produce so many Bombes gave them an edge over the Axis forces. On this trip Alan Turing spent time at Bell Laboratories and happened to have lunch everyday with another iconic figure in Computer Science, Claude Shannon, the inventor of Information Theory.
An important point is that the Bombe was a computational device dedicated to the breaking of Enigma codes. Just like our archcomputing rope and windtunnels, it couldn't be used to solve other problems. By contrast a stored program computer is one which can be configured to solve different problems. As such, a stored program computer is general purpose but not universal, since the memory of a stored program computer is finite.
The first stored program digital computer was also developed at GC&CS for breaking codes and was called the Colossus. The Colossus first worked in December 1943 and was operational in 1944, far too late to make a meaningful difference in WW2. The Colossus project was a closely guarded military secret at the time so the ENIAC, completed in 1945, is often awarded the title "first stored program digital computer".
Actually around this time there were at least five major projects on the go to build stored program computers: the Collosus at GC&CS, the Electronic Numerical Integrator And Computer (ENIAC) in the US Army, the Electronic Delay Storage Automatic Calculator (EDSAC) by the University of Cambridge, the Electronic Discrete Variable Automatic Computer (EDVAC), the Automatic Computing Engine (ACE) by the National Physics Laboratory in the UK, and the Manchester Mark 1 computer at the University of Manchester.
Instead of focussing on their differences like most historical essay do, I am going to focus on their similarities and the shared technical challenges that these devices faced. Later you will notice the similarities between these computers the current problems facing largescale general purpose quantum computers.
Competing architectures
During this period there were a LOT of different possible architectures and design choices available when trying to build a stored program digital computer. The different options available included, but were not limited to:
 Binary or Decimal Arithmetic? The Colossus computer used binary arithmetic and the original ENIAC computer used decimal arithmetic however its successor, the EDVAC, changed to binary arithmetic.
 Delay lines or Cathode Ray Tubes (CRT)? A number of possible memory stores existed include slower and cheaper delay lines used in the EDSAC and faster and more expensive Williams cathode ray tubes used in the Manchester Mark 1 computer. The William's tube was the first randomaccess digital storage device.
 Distributed computation or Accumulators? The ACE computer, designed by Alan Turing, employed distributed processing, whilst the ENIAC and EDVAC computers, designed by Von Neumann, had a centralised structure consisting of many accumulators which could do specific functions such as addition.
 Purely digital or Hybrid AnalogDigital? Hybrid computers used digital and analog methods together with one another. At this stage it wasn't immediately obvious that digital computing would "win out" over analog. In fact, work on analog computers like the Differential Analyzer continued into the 1960's at NACA/NASA.
There were a lot of ideas floating around about how exactly one should build a computer. For a fascinating example of one such architecture see the somewhat controversial "First Draft of a Report on the EDVAC" by John Von Neumann. Many of the ideas floating around had also never been tried before and were certainly not guaranteed to work. One particularly crazy idea was the use of Gin in delay lines because of its "acoustic properties".
Difficult to manufacture and use
It was extremely difficult to manufacture computer components for a few reasons. Firstly, because the components were specialized and not used in many other places so many of the components were hand made! Secondly, many of the components were being built for the first time so information about how well they worked and how they would perform over extended periods of time in conjunction with other components was not available.
In addition to being hard to manufacture these early computers were also notoriously difficult to program. "Programming" these computers usually involved plugging wires into different locations on a massive plug board and hoping you got it right. An interesting fact about early programming was that it was done primarily by woman.
Noise and Scalability
Another problem facing early stored program computers was noise. Noise is defined as "irregular fluctuations that accompany a transmitted electrical signal but are not part of it and tend to obscure it." Shannon showed us that the fundamental problem of all communications  whether by African drums, smoke signals, telegrams, telephone, or email  is the reproduction of the message at the end of the channel it was sent through. This is a challenge because all channels are noisy and whatever data is sent through them will, to some extent, be corrupted.
For this reason significant effort was directed towards the construction of error correcting codes and signal amplification. These techniques use redundancy and retransmission to mitigate the adverse effects of noise. In the 1950's this was a problem a user might conceivably face. Today this problem is solved and few people when writing software on computers are concerned about memory corruption due to channel noise.
Early stored program computers were built using valves and vacuum tubes. The smallest such devices were roughly the size of the top of your thumb! What this meant was that computers were extremely large and often took up entire rooms. Furthermore the valves and vacuum tubes were not exactly reliable. The size of these computers had one noteworthy impact: the amplification of noise. This is due to the fact that the longer the channel over which a message is communicated, the more difficult it is to communicate that message without error.
Limited Use Cases
Lastly, the imagined use cases of these early stored program machines were rather narrow. Imagined use cases included cryptography, simulation of physical systems, calculation of difficult mathematical quantities, and abstract mathematics. Alan Turing's paper, "Computing Machinery and Intelligence" which postulated that computers could be used to imitate human intelligence was perhaps the most creative use case. I doubt any of the early pioneers of computing would have believed that computers would become so ubiquitous and used for everything from selfdriving cars to playing Pokémon GO ... I'm stuck at level 22 if you were wondering 😉 .
My point here is quite simply that had you been alive and intimately familiar with technology in the 1940's you would have had a great many reasons to be skeptical of these "giant electronic brains". Why? Because they were not a sure thing. There were many uncertainties and risks associated with computing so dedicating your life to computers was as much an act of faith as it was an act of sheer bloody mindedness.
The Transistor!
At the heart of all modern electronics is the transistor. A transistor is a switch which can occupy one of two states: on or off, 1 or 0. Transistors are easier to produce than vacuum tubes and can also be made a lot smaller. The transistor was first practically implemented in 1947 by John Bardeen, Walter Brattain, and William Shockley who were working at Bell Laboratories at the time. They received the Nobel Prize in Physics in 1956.
In addition to the transistor, the Integrated Circuit was developed in 1958 which allowed sets of electronic circuits to be printed on chips of semiconductor material. As such, by 1959 computers were being designed to work on small printed circuits of transistors. This made for smaller, more powerful, and more reliable computers.
These two technologies, the transistor and the integrated chip, are the foundations of the next greatest invention in the history of classical computing: the microprocessor. A microprocessor essentially implemented the full architecture of a 1950's computer onto a single integrated chip. Intel introduced its first 4bit microprocessor called the 4004 in 1971 and its 8bit microprocessor called the 8008 in 1972. These microprocessors had up to a hundred transistors onboard. A 2014 Quad Core i7 Haswell has 1,400,000,000 processors.
However, as many of you will know. We have hit the limits of how small we can reliably make transistors. As such, most of the advances in the past few years have come from increasing the number of cpu's on a chip and not the raw speed of any one given CPU. This fact has led many people to claim that "Moore's Law is dead".
This is overly pessimistic and akin to saying "progress is dead" or  if you extrapolate too far  "Science is Dead". Nothing could be further from the truth. There are still many new discoveries to be made and much room for improvement. Plus, whilst classical computing was taking over the world a quiet revolution has been brewing and its name is quantum computing ... but before I get into that let me tell you a bit about the quantum revolution.
Meanwhile ... the Quantum Revolution
At the turn of the 20th century classical physics was faced by some serious problems, observations and experimental evidence which could not be explained by the laws of physics as they existed then.
The first problem was that if atoms did exist, they shouldn't last long. Remember the model of the atom you learnt about in school? In that model the electrons were orbiting the nucleus in a circular motion. Objects in a circular motion are accelerating due to the change in direction ... but Maxwell's equations of electromagnetism predict that accelerating electric charges emit energy in the form of light. Since the electrons are losing energy they should, according to classical physics, spiral into the nucleus. And all that should happen in an instant, meaning that atoms like you learnt about in school should not exist according to classical physics.
The second problem was that atoms should emit light of all colours, not just discrete spectral lines unique to each element as had been observed. The third problem was called the "Ultraviolet Catastrophe" which was prediction that an ideal blackbody would emit energy, firstly, at all frequencies and, secondly, an infinite amount of it! This contradicted the law of conservation of energy. Lastly, the photoelectric effect  the instantaneous ejection of electrons from a material when light of different frequencies is shone onto it  was not consistent with classical physics because, firstly, it should not occur instantaneously, and secondly, it should depend on the intensity of the light not its frequency. Put simply physics was a mess, kind of like finance today will all of its "anomalies".
.
The transition to quantum started with Max Planck in 1900 who showed that the ultraviolet catastrophe could be resolved by assuming that atomic vibrations are quantized, meaning that they only occur in multiples of some constant, . The resulting formula, , measures the energy, , of an atomic particle vibrating at frequency, . Planck’s constant is a measure of the fundamental “graininess” of our universe at small scales.
The transition to quantum continued in 1905  the same year he wrote seminal papers on Special Relativity and Brownian Motion  when Einstein famously explained the photoelectric effect. Einstein declared that the energy in a light wave is not uniformly distributed, it is concentrated in particlelike quanta which we now call "photons". Furthermore, the energy of a photon is quantized according to Planck's formula, . Einstein explained that the photoelectric occurs because it takes a certain minimum amount of energy to eject an electron from a material. Therefore, if the light shone onto the material had too low of a frequency (it was redder) then fewer electrons will be omitted. But if the light is a higher frequency the electrons only need to be hit by one photon to be ejected.
In 1913 Niels Bohr finalized the transition to quantum when he proposed the Bohr model of atom. In Bohr's model the atomic orbits of electrons are quantized meaning that only certain discrete orbits are allowed. These orbits correspond to the discrete values of an electron's energy measured in units of Planck's constant. Bohr proposed that the angular momentum, , of an orbiting electron is given by , where is the electron's momentum. In Bohr's model electrons don't radiate electromagnetic waves  they don't lose energy  so they don't crash into the nucleus. They stay in orbit and emit electromagnetic waves if, and only if, electrons jump between orbits.
These contradictions came down to one unintuitive difference between classical physics and the soontobedeveloped quantum mechanics. In classical physics the world is continuous and deterministic, but in quantum mechanics the world is discontinuous (made up of little bits) and is, as a result, stochastic.
Quantum Weirdness
An unintended consequence of quantization is nondeterminism a.k.a randomness and uncertainty. When we see objects in the real world, our eyes are actually absorbing the photons which have reflected from that object. Planck's constant sets the minimum change in energy that can occur and Einstein tells us that Planck's constant is also equal to the minimum energy of the electromagnetic wave, a photon, itself. So what happens if we want to observe an electron? The only way to do this is to bounce a photon off of it but by doing that we influence the electron, we transfer energy to it. We can't observe the subatomic world without also influencing it.
.
The first weird thing: Superposition
This observation lies at the heart of Heisenberg's famous uncertainty principle which states that we cannot simultaneously measure both a particles velocity and position. Suppose we wanted to measure the position of an electron with high precision. We would do this by bouncing one photon off of it. In order to get an accurate position measurement we need to use a photon with a high frequency because the "wiggling" of the wave is confined to a smaller region. However, according to the equation if the photon has a high frequency then it has a high energy so it transfers a lot of energy to the electron thereby substantially altering its velocity.
This phenomenon has been experimentally verified time and time again using the famous double slit experiment. In this experiment electrons are sent through a microscopic double slit oneatatime. When the electrons are left to their own devices  i.e. they are not observed  they behave like waves and form an interference pattern.
But when we try to observe the electrons as they travel through  by bouncing photons off of them  they stop behaving like a wave and start behaving like a particle and no interference pattern is formed! This phenomenon is called waveparticle duality. The electron is both a wave and a particle. It is in a superposition.
.
.
Erwin Schrödinger explained this with his wave mechanics. Under this description each particle is described by a wave equation  a partial differential equation  for which each solution corresponds to the probability of an events (a combination of space and time). As such, the wave equation gives us the probabilities of a particle being at any point in space at any given moment in time. However, when we measure a particle in superposition it gives us a definitive answer. At that moment the superposition ceases to exist and the particle exists in a definitive state. The wave function us said to have collapsed and one of the possible events has occurred. This all comes down to Planck's constant which tells us that observation is not sideeffect free. The act of observation is a tangible thing.
Think of Planck's constant as limiting the precision with which we can measure the world. Because is so tiny, this uncertainty has a negligible effect on measurements done in the real world. That's why classical physics works. However, at the subatomic scale this makes a big difference ... it causes randomness and uncertainty.
.
The second weird thing: Entanglement
Entanglement is another strange quantum phenomenon in which pairs of particles interact in such a way that the quantum state of each particle cannot be described independently of the others, even when the particles are separated by a large distance. The quantum state must be described for the system as a whole.
Consider a pair of electrons which are produced by a decaying particle and fly apart in opposite directions. All electrons have a property called spin or angular momentum. Since the decayed particle had no spin the electrons must have opposite spins. So if you measure the spin of one of the electrons you immediately know the spin of the other. The two electrons are entangled and remain so regardless of the distance between them.
Entanglement was the subject of some controversy in physics. Einstein, Podolsky, and Rosen argued that entanglement implies that there exists an underlying objective and deterministic reality. However Bohr disagreed with this and argued that the observation of one half of the entangled particles in one location caused the other half to collapse in a different location instantaneously i.e. faster than the speed of light. Einstein called this "spooky action at a distance". However, this is not a violation of General Relativity  which states that information cannot travel faster than the speed of light  because no classical information is communicated.
This debate was pretty much settled by John Stewart Bell who came up with a really ingenious experiment to test whether or not the spin of the particles was predetermined. The video below explains this better than I can,
.
Now we can get to the fun part :). How can we take all of these cool (and really weird) phenomena and make a quantum computer out of them? Spoiler alert: scientists are not 100% sure, but progress is being made!
Feynman's Conjecture and Quantum Computation
Richard Feynman was an American theoretical physicist who did a lot of work to champion quantum mechanics. In 1965 Feynman received the Nobel Prize in Physics for his work on quantum electrodynamics and in 1982 Feynman proposed a Universal Quantum Simulator after proving that a classical Turing Machine would experience an exponential slowdown when trying to simulate any quantum phenomena. In 1985 David Deutsch extended this ideas and proposed the first theoretical model of a Universal Quantum Turing Machine which is essentially an extension of classical Turing Machine which allows it to take advantage of superposition and entanglement.
.
Concept one: The Qubit
A quantum bit (qubit) is a two state quantum mechanical system meaning that it consists of two independent quantum states let's call them and for simplicity. The qubit may be in either or or any state in between, . Here and are probability amplitudes and must satisfy the constraint that . Put simply, and , when squared, are the probabilities of the qubit being either or . Just like a particle can be in two places at once according to Schrödinger's wave equations, so too can our qubit be. If you were wondering this notation is called BraKet or Dirac notation. It was introduced by Paul Dirac in 1939.
I am sure you are wondering "how is that even possible?". Good question! When we are talking about a bit in classical computer science we are actually talking about a transistor which is either on or off. There is a physical analog for what we are talking about when we talk about bits. In exactly the same way when we talk about qubits we are talking about things in the real world which exist in superposition. Examples include the Josephson Junction, Trapped Ions, and Electrons which can have up or down spin. A short list can be found here.
.
Concept two: Hilbert Spaces
Remember when I said that David Hilbert  whose program to find a complete, consistent, and decidable formal systems for all of mathematics indirectly lead to the discovery of the computer  a was mathematical badass? Well in addition to his program he generalized Euclidean geometry  the mathematics of distance and space in two and three dimensions  to any finite and even infinite dimensions. These high dimensional spaces are now called Hilbert Spaces and they are used in to describe the states that quantum mechanical systems may occupy. The Hilbert Space which describes a qubit is 2dimensional, the Hilbert Space which describes qubits is dimensional. This is what is known as the "exponential explosion" and really explains why we need quantum computing.
To illustrate this a little bit more technically consider the example of two vectors on a cartesian plane (2dimensional Euclidean space). These two vectors, let's call then and , have coordinates and. To calculate the dot product (inner produce) of these two vectors we just sum the product of their coordinates, . To calculate the norm of the vector (its length) we calculate the square root of the sum of the coordinates squared, . One popular Hilbert space, , is a generalization of this to dimensions. In this case the vectors are given as, and and the inner product of and is given by, and the norm is given by, . Or, in the much cleaner bracket notation, the inner product is simply given by and the norm by . A Hilbert space is a space where the norm is bounded from above by infinity, and is complete.
A Better Explanation
Ryan also sent me the following comment over email with a much better explanation of Hilbert Spaces. So I thought I would rather include it here and award him the credit.
Technically speaking a Hilbert space is just any "complete, inner product space", meaning that it is any vector space (including finite vector spaces) equipped with an inner product, under which it is complete with respect to the norm induced by the inner product. In other words, almost all vector spaces which you can equip with an inner product are Hilbert spaces!
The Hilbert space which describes n qubits is dimensional (to be precise the Hilbert space is ). This is called the "exponential explosion" and is exactly why we need quantum computers! To be more precise, the size of the Hilbert space grows exponentially with the number of particles (qubits), and therefore to simulate such systems on classical computers requires resources which grow exponentially.
The dimension of the Hilbert space (which is basically just an augmented vector space) of a many particle quantum system grows exponentially with the number of particles. This is because of the tensor product structure of many particle systems. Therefore, to describe the state of an qubit system you have to provide complex numbers  the amplitudes of all the basis vectors of the vector space, and the square roots of the associated probabilities.
This is intractable on classical computers as grows even to modestly large values  even 20 qubits is hard for a classical computer to simulate without some very clever tricks!
I realize that may seem quite complex, so here is the good news ... you don't really need to know the maths to get the concept behind a Quantum Turing Machine and, by extension, a Quantum Computer. You just need to know that these nice things called Hilbert Spaces, , are used to describe the states that quantum mechanical systems may occupy like like the set of described the states our Classical Turing Machine could occupy earlier.
.
Concept three: A Quantum Turing Machine
A Turing Machine operates over an infinitely long tape which is divided into discrete cells. A simple onetape Turing Machine consists of a finite, nonempty set of states, and a finite, nonempty set of tape symbols, . The set of input or output symbols, , is usually just the set of tape symbols with a blank symbol included. The Turing Machine also has an initial state, , and a set of final or accepting states, , which belong to the set of all states. The brain of the Turing Machine is a transition function, , which allows it to transition between the different states. Depending on what state the Turing Machine is in, and what symbols it scans, it can shift left or right along the tape and write a symbol to the tape. The Turing Machine halts when it is in an accepting state.
Now we can describe a Quantum Turing Machine with three tapes. The first and last tapes are for the input and output which may be classical inputs and the second tape stores intermediary calculations between input and output. Our Quantum Turing Machine replaces the set of states it can occupy with a Hilbert Space, and it also replaces the symbols on the secondary tape with another Hilbert Space, , where the blank symbol represents the zero vector. The Quantum Turing Machine also has an initial state which may be quantum of classical and a set of final or accepting states, . The brain of the Quantum Turing Machine is a transition function, , which preserves the Hilbert Space. In order to preserve the Hilbert Space the transitions must be a collection of unitary matrices. The Quantum Turing Machine halts when it is in an accepting state. See, it's pretty similar to a Classical Turing Machine.
That is except for one fundamental difference ... A Turing Machine operates over classical states and a Quantum Turing Machine can operate over superpositions of classical states. This is what gives Quantum Turing Machines a massive speed advantage over classical Turing Machines. Remember that the study of computational complexity is the study of what problems can be solved by what kinds of Turing Machines? Well Quantum Turing Machines can solve some NPhard problems in P time. The two most important quantum complexity classes are and :
 ~ The set of decision problems solvable by a Quantum Turing Machine in polynomial time, with a bounded error probability of at most 1/3 for all instances. Integer Factorization belongs to because of Shor's algorithm and classical which implies that there are some intermediary complexity classes.
 ~ The set of decision problems which are not solvable by a Quantum Turing Machine in polynomial time. This is the complexity class we need postquantum encryption to be in.
What's the Big Deal?
Quantum computers, if implemented, would be exponentially  or at least quadratically  better at performing certain computations. The two biggest quantum algorithms are Shor's Algorithm and Grover's Algorithm. Shor's Algorithm would allow us to factorize integers in polynomial time which would make all modern encryption schemes  which rely on it being intractable  insecure. Grover's Algorithm would allow us to find elements in an unordered list (or entries in a database) in fewer than operations where is the length of the list. To be more specific we would be able to find entries in which is a quadratic speedup. Searching unstructured data would never be the same again.
Harrow, Hassidim and Lloyd recently showed us that you can solve sets of linear equations faster using a quantum computer. Solving linear equations is a fundamental computer science problem which lies at the heart of most machine learning and optimization methods. Their algorithm, called the HHL algorithm  of as Seth nonchalantly puts it, the Holy Hell algorithm  could revolutionize the way we tackle machine learning, data science, and even quantitative finance problems. For a theoretical example see this paper entitled: "Quantum Support Vector Machine for Big Data Classification" by Patrick Rebentrost. That's pretty exciting right?
.
.
But here is the catch. There are no quantum computers. Nobody has been able to scale a quantum computer beyond about 10 qubits and we need roughly 40 for quantum computing to start making a dent. So without further ado, let's take a quick look at current generation realworld quantum computers!
Current Generation Quantum Computers
Let me first mention some of the big accomplishments in this field. In 2001, a paltry 15 years ago, researchers first demonstrated Shor's algorithm on a 7qubit computer. This quantum computer was an example of a Nuclear Magnetic Resonance (NMR) quantum computer. In 2005 and 2009 the first semiconductor chip ion trap and solidstate quantum processor were built. In 2011, DWave Systems announced the DWave One quantum computing device with a 128 qubit processor. In 2011 it was proven that quantum computers can be made with a separate RAM as per the Von Neumann architecture. In 2013 Google and NASA announced that they were launching the Quantum Artificial Intelligence Lab (QuAIL). In 2014 some information came out that the NSA was running a research program to develop a quantum computer. And for the first time, this year, a programmable quantum computer was developed.
Note that I called the DWave a quantum computing device not a universal quantum computer. Essentially the DWave is a device which can solve hard combinatoric optimization problems using a process called quantum annealing which is analogous to simulated annealing in Artificial Intelligence. How does Quantum Annealing work? It works exactly like our rope computer I described at the beginning of this essay. Just like the rope which tends towards its lowest energy state which corresponds to an optimal arch design so too can quantum systems be designed to optimize certain quantities when moving towards a low energy state. Nevertheless, Quantum Annealing does offer the potential to construct an Adiabatic Quantum Computer. For more information about their systems I highly recommend watching their videos.
Despite all this progress nobody can claim that they have successfully built a scalable generalpurpose, quantum computer and the reasons for this are all too similar to the challenges faced by early digital computers.
Competing architectures ... and hybrids again?
There are a number of different technologies you could use to potentially build universal quantum computers. Some of the competing technologies including superconducting qubits (the technology Google is betting on), ion traps, Nuclear Magnetic Resonance (NMR), Linear Optics, Quantum Dots, and many others. It isn't clear which technology is the best, by the two biggest areas of research seem to be superconducting qubits and ion traps. This uncertainty of new technologies is very similar to the uncertainty that was experienced when building early digital computers. Another option which has been gaining some traction is the idea of hybrid classicalquantum computers. This reminds me a great deal of the argument for hybrid analogdigital computers that was made in the 1940's and 50's.
.
Difficult to manufacture and use
Some quantum phenomena  such as those in superconducting qubits  are only observed at extremely low temperatures. In fact, the temperatures I am talking about are colder than outer space! Naturally this makes things manufacturing certain types of quantum computers quite difficult because you end up needing to build an entire room full of "refrigerators" dedicated to keeping your quantum chip cool!
That having been said, using a quantum computer would probably not be as difficult as using an early classical computer for one simple reason: the internet. It is entirely possible that Quantum Computers will never sit on our desks or in our pockets. Quantum Computers may only be accessible via an API sitting on a cloud exactly like IBM is doing now. But to play devils advocate, the exact same argument was proposed in relation to early digital computers which gave their users access to a thin client and carefully managed computer resources using a time sharing system ;).
.
Noise and scalability
The single biggest problem facing the development of a largescale quantum computer is, in my opinion, analogous to noise in early digital computers and it's called quantum decoherence. Quantum systems are extremely sensitive to external influences because of them existing in the subatomic realm. Light particles can interfere with the wave functions causing them to lose information to their environments which causes the quantum weirdness we spoke about earlier to stop. Coherence is a fundamental property of quantum mechanics, and is a prerequisite to quantum computing. Nobody has worked out how to effectively scale up a quantum computer.
.
Limited Use Cases
Lastly, the use cases of a quantum computer are not  again in my opinion  fully understood. Just like during the 1940's and 1950's when classical computers were being developed we have narrow vision. All of the best applications of quantum computers which have been discovered or thought of so far, with the exception of the HHL algorithm which is only 7 years only, have to do with breaking codes. Now doesn't that sounds awfully familiar?
Some Concluding Thoughts
The theoretical foundations which lead to the discovery of the Universal Turing Machine began in the 19th century with visionaries like Babbage and Hilbert. It was only decades later, in 1936, that the theoretical idea of the Universal Turing Machine was introduced by Alan Turing. In 1946, ten years later, the first digital computer was developed. From the 1940's to the early 1960's progress was hard earned and the applications of computers were confined. It was the introduction of the transistor, integrated circuit, and finally the microprocessor in 1971 which kickstarted Moore's law and gave rise to the ubiquity we see today. Classical computing was an overnight success that took the better part of 40 years to happen despite the impetus of World War 2 and the Cold War.
Likewise the theoretical foundations which led to the discovery of the Universal Quantum Turing Machine began with Hilbert, Max Planck, Einstein, Bohr, Heisenberg, and Schrödinger from 1900 to the 1930's. It was only decades later, in 1982, that esteemed professor Richard Feynman proposed the first theoretical model of a Universal Quantum Simulator. These ideas led David Deutsch to the discovery of the Universal Quantum Turing Machine in 1985 only 31 years ago. In the past thirty years researchers have managed to devise numerous architectures for quantum computers and in 2001 implemented the first quantum computer. Right now I think it is safe to say that quantum computing is waiting on its microprocessor; the technology that will take it a giant leap forward.
Is history repeating itself? I think so. There is no physical law in the universe which prohibits largescale quantum computers. Therefore it is either possible, or in failing we will make one of the greatest discoveries of modern science: that everything we know is wrong. Given the weight of evidence in favour of quantum mechanics, I'm optimistic that science will prevail and we will find that elusive technology needed to take us a giant leap forward into the age of quantum computing. Moore's Law is not a physical law like Newton's Law's of motion. It is a simple promise: so long as there are unreasonable men, women, researchers, scientists, and companies in this world we will never stop moving forward. I would like to end off with a relevant quote by George Bernard Shaw,
.
The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.  George Bernard Shaw
Lots of References
If you managed to get this far then kudos to you! 😀 Here is a long list of references.
History of Computer Science and Maths
 Audio book / book: Alan Turing, The Enigma
 Audio book / book: The Information: A History, A Theory, A Flood
 Timeline: ComputerHistory.org Timeline of Computer History
 Article: Stanford, The Modern History of Computing
 Paper: Alan Turing, On Computable Numbers
 YouTube Channel: ComputerPhile and NumberPhile
 Awesome Webite: Institute for Advanced Study Ideas
Quantum Mechanics and Physics
 Audio book / book: Quantum: A Guide for the Perplexed
 Audio book / course: The Great Courses: Einstein's Relativity and the Quantum Revolution
 Audio book / course: The Great Courses: Albert Einstein: Physicist, Philosopher, Humanitarian
 Introductory Book: Quantum Mechanics: The Theoretical Minimum
 Introductor Video Series: Quantum Computing for the Determined
 Paper: Quantum Theory From Five Simple Axioms (recent discovery, quite nice)
 Paper: Quantum Information Meets Quantum Matter
Theoretical Computer Science + Quantum Computing
 Blog: ShtetlOptimized  Scott Aaronson
 Book: Michael Sipser, Introduction to the Theory of Computation
 Book: Thomas H. Cormen, Introduction to Algorithms, 3rd Edition
 DWave Video Series: Quantum Annealing Explained
 Paper / Review: Quantum algorithms: An overview
 Conferences: Quantum Technology Lab
 Reference List: Quantum Computing Algorithms and their speedups
If you notice anything I really should have mentioned please let me know in the comment section below. And if you find any mistakes, please also let me know so that I can update the essay. Thanks for reading!

Wow, Stuart, again a marvelous piece to read. So, you went into QC. Maybe you would like a few Quantum Computing related ideas from Q4Q >>> http://www.quantumforquants.org/question/complexityclassesareweonthecuspofchange/
Welcome to the challenges of the EXPTIME / EXPSPACE corner of the ComplexityZOO.
( as always, already looking forward to read your next fresh thoughts, Stuart )

I would love to know why so many people did not imagine current computing use cases once a universal form of computation was proven. I read a research article using the Delphi method which predicted writing or reading information from the brain will never occur. It was written in the 60's or a bit later. I guess my consternation is this: how can the brain solve uncomputable problems? It can't or perhaps it is a class of problem solving machinery we have yet to comprehend. This is not the mainstream view yet empirically it appears true.

Loosely connected: https://arxiv.org/abs/1608.08225

Great post, took me about 2 weeks to read but was superbly interesting.
Comments