The Man Who Invented Modern Probability

Chance encounters in the life of Andrei Kolmogorov.

If two statisticians were to lose each other in an infinite forest, the first thing they would do is get drunk. That way, they would walk more or less randomly, which would give them the best chance of finding each other. However, the statisticians should stay sober if they want to pick mushrooms. Stumbling around drunk and without purpose would reduce the area of exploration, and make it more likely that the seekers would return to the same spot, where the mushrooms are already gone.

Such considerations belong to the statistical theory of “random walk” or “drunkard’s walk,” in which the future depends only on the present and not the past. Today, random walk is used to model share prices, molecular diffusion, neural activity, and population dynamics, among other processes. It is also thought to describe how “genetic drift” can result in a particular gene—say, for blue eye color—becoming prevalent in a population. Ironically, this theory, which ignores the past, has a rather rich history of its own. It is one of the many intellectual innovations dreamed up by Andrei Kolmogorov, a mathematician of startling breadth and ability who revolutionized the role of the unlikely in mathematics, while carefully negotiating the shifting probabilities of political and academic life in Soviet Russia.
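
For readers who want to see the drunkard's walk in action, here is a minimal Python sketch. It assumes a walker on a square grid who steps north, south, east, or west with equal probability; the function names and the grid setup are illustrative choices, not anything taken from Kolmogorov's work. Each step looks only at the current position, which is the sense in which the future ignores the past.

```python
import random

STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # east, west, north, south

def random_walk_2d(steps, rng):
    """Return the final (x, y) position after `steps` random moves."""
    x = y = 0
    for _ in range(steps):
        dx, dy = rng.choice(STEPS)  # depends only on where we are now
        x += dx
        y += dy
    return x, y

def mean_squared_distance(steps, trials, seed=0):
    """Average squared distance from the start over many independent walks."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x, y = random_walk_2d(steps, rng)
        total += x * x + y * y
    return total / trials

if __name__ == "__main__":
    print(mean_squared_distance(steps=100, trials=2000))
```

For this simple walk the expected squared distance from the start equals the number of steps, so the printed value should land near 100. The walker drifts away only as the square root of the time elapsed, which is why drunken wandering keeps revisiting old ground.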

As a young man, Kolmogorov was nourished by the intellectual ferment of post-revolutionary Moscow, where literary experimentation, the artistic avant-garde, and radical new scientific ideas were in the air. In the early 1920s, as a 17-year-old history student, he presented a paper to a group of his peers at Moscow University, offering an unconventional statistical analysis of the lives of medieval Russians. It found, for example, that the tax levied on villages was usually a whole number, while taxes on individual households were often expressed as fractions. The paper concluded, controversially for the time, that taxes were imposed on whole villages and then split among the households, rather than imposed on households and accumulated by village. “You have found only one proof,” was his professor’s acid observation. “That is not enough for a historian. You need at least five proofs.” At that moment, Kolmogorov decided to change his concentration to mathematics, where one proof would suffice.

It is oddly appropriate that a chance event drove Kolmogorov into the arms of probability theory, which at the time was a maligned sub-discipline of mathematics. Pre-modern societies often viewed chance as an expression of the gods’ will; in ancient Egypt and classical Greece, throwing dice was seen as a reliable method of divination and fortune telling. By the early 19th century, European mathematicians had developed techniques for calculating odds, and distilled probability to the ratio of the number of favorable cases to the number of all equally probable cases. But this approach suffered from circularity—probability was defined in terms of equally probable cases—and only worked for systems with a finite number of possible outcomes. It could not handle countable infinity (such as a game of dice with infinitely many faces) or a continuum (such as a game with a spherical die, where each point on the sphere represents a possible outcome). Attempts to grapple with such situations produced contradictory results, and earned probability a bad reputation.

Reputation and renown were qualities that Kolmogorov prized. After switching his major, Kolmogorov was initially drawn into the devoted mathematical circle surrounding Nikolai Luzin, a charismatic teacher at Moscow University. Luzin’s disciples nicknamed the group “Luzitania,” a pun on their professor’s name and the famous British ship that had sunk in the First World War. They were united by a “joint beating of hearts,” as Kolmogorov described it, gathering after class to exalt or eviscerate new mathematical innovations. They mocked partial differential equations as “partial irreverential equations” and finite differences as “fine night differences.” The theory of probability, lacking solid theoretical foundations and burdened with paradoxes, was jokingly called the “theory of misfortune.”

It was through Luzitania that Kolmogorov’s evaluation of probability took on a more personal turn. By the 1930s, the onset of Stalinist terror meant anyone could expect a nighttime knock on the door by the secret police, and blind chance seemed to rule people’s lives. Paralyzed by fear, many Russians felt compelled to participate in denunciations, hoping to increase their chance of survival. Bolshevik activists among the mathematicians, including Luzin’s former students, accused Luzin of political disloyalty and castigated him for publishing in foreign countries. Kolmogorov, having published abroad himself, may have realized his own vulnerability. He had already displayed an apparent readiness to make political compromises for the sake of his career, accepting a position as a research institute director when his predecessor was imprisoned by the Bolshevik regime for supporting religious freedom. Now Kolmogorov joined the critics and turned against Luzin. Luzin was subject to a show trial by the Academy of Sciences and lost all official positions, but surprisingly escaped being arrested and shot by the Russian authorities. Luzitania was gone, sunk by its own crew.

The moral dimension of Kolmogorov’s decision aside, he had played the odds successfully and gained the freedom to continue his work. In the face of his own political conformity, Kolmogorov presented a radical and, ultimately, foundational revision of probability theory. He relied on measure theory, a fashionable import to Russia from France. Measure theory represented a generalization of the ideas of “length,” “area,” or “volume,” allowing the measure of various weird mathematical objects to be taken when conventional means did not suffice. For example, it could help calculate the area of a square that has had an infinite number of holes punched in it, been cut into an infinite number of pieces, and been scattered over an infinite plane. In measure theory, it is still possible to speak of the “area” (measure) of this scattered object.

Kolmogorov drew analogies between probability and measure, resulting in five axioms, now usually formulated in six statements, that made probability a respectable part of mathematical analysis. The most basic notion of Kolmogorov’s theory was the “elementary event,” the outcome of a single experiment, like tossing a coin. All elementary events formed a “sample space,” the set of all possible outcomes. For lightning strikes in Massachusetts, for example, the sample space would consist of all the points in the state where lightning could hit. A random event was defined as a “measurable set” in a sample space, and the probability of a random event as the “measure” of this set. For example, the probability that lightning would hit Boston would depend only on the area (“measure”) of this city. Two events occurring simultaneously could be represented by the measure of the intersection of their sets; conditional probabilities by dividing measures; and the probability that one of two incompatible events would occur by adding measures (that is, the probability that either Boston or Cambridge would be hit by lightning is the sum of their individual probabilities, and so is proportional to the sum of their areas).
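
The lightning example can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the state and the two cities are drawn as axis-aligned rectangles with invented coordinates, lightning is taken to be equally likely per unit area, and the names `Rect`, `prob`, `prob_given`, and `prob_either` are hypothetical. The point is only to show probability behaving like area: intersect sets for joint events, divide measures for conditional probability, and add measures for incompatible events.

```python
from fractions import Fraction

class Rect:
    """Axis-aligned rectangle; its area plays the role of the measure."""

    def __init__(self, x0, y0, x1, y1):
        self.x0, self.y0, self.x1, self.y1 = x0, y0, x1, y1

    def area(self):
        width = max(self.x1 - self.x0, 0)
        height = max(self.y1 - self.y0, 0)
        return Fraction(width * height)

    def intersect(self, other):
        """Overlap of two rectangles (zero area if they do not overlap)."""
        return Rect(max(self.x0, other.x0), max(self.y0, other.y0),
                    min(self.x1, other.x1), min(self.y1, other.y1))

# Invented geometry: a 10-by-5 "state" containing two disjoint "cities".
STATE = Rect(0, 0, 10, 5)
BOSTON = Rect(6, 1, 8, 3)
CAMBRIDGE = Rect(5, 3, 6, 4)
EAST = Rect(5, 0, 10, 5)  # eastern half of the state

def prob(event):
    """Probability of an event = its measure relative to the sample space."""
    return event.intersect(STATE).area() / STATE.area()

def prob_given(event, condition):
    """Conditional probability, obtained by dividing measures."""
    return prob(event.intersect(condition)) / prob(condition)

def prob_either(a, b):
    """Probability that one of two incompatible events occurs."""
    assert prob(a.intersect(b)) == 0, "events must not overlap"
    return prob(a) + prob(b)

if __name__ == "__main__":
    print(prob(BOSTON), prob(CAMBRIDGE), prob_either(BOSTON, CAMBRIDGE))
    print(prob_given(BOSTON, EAST))
```

Exact fractions are used instead of floating-point numbers so that the additivity check is an equality, not an approximation.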

The Paradox of the Great Circle was a major mathematical conundrum that Kolmogorov’s conception of probability finally put to rest. Assume aliens landed randomly on a perfectly spherical Earth and the probability of their landing was equally distributed. Does this mean that they would be equally likely to land anywhere along any circle that divides the sphere into two equal hemispheres, known as a “great circle”? It turns out that the landing probability is equally distributed along the equator, but is unevenly distributed along the meridians, with the probability increasing toward the equator and decreasing at the poles. In other words, the aliens would tend to land in hotter climates. This strange finding might be explained by the circles of latitude getting bigger as they get closer to the equator, yet this result seems absurd, since we can rotate the sphere and turn its equator into a meridian. Kolmogorov showed that the great circle has a measure zero, since it is a curve and its area is zero. This explains the apparent contradiction in conditional landing probabilities by showing that these probabilities could not be rigorously calculated.
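
The uneven spread along the meridians is easy to check numerically. The sketch below is a simulation under stated assumptions, not a reconstruction of Kolmogorov's argument: landing points are drawn uniformly on a unit sphere by normalizing three independent Gaussian coordinates (a standard technique), and the helper names are invented for this example. It compares a band covering one third of the latitude range with a band covering one third of the longitude range.

```python
import math
import random

def random_point_on_sphere(rng):
    """Uniform point on the unit sphere via a normalized Gaussian vector."""
    while True:
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        r = math.sqrt(x * x + y * y + z * z)
        if r > 0:
            return x / r, y / r, z / r

def latitude_deg(point):
    """Latitude in degrees, from -90 (south pole) to 90 (north pole)."""
    z = max(-1.0, min(1.0, point[2]))  # guard against rounding error
    return math.degrees(math.asin(z))

def longitude_deg(point):
    """Longitude in degrees, from -180 to 180."""
    return math.degrees(math.atan2(point[1], point[0]))

def share_within(values, limit):
    """Fraction of values whose absolute size is below `limit`."""
    return sum(1 for v in values if abs(v) < limit) / len(values)

rng = random.Random(0)
points = [random_point_on_sphere(rng) for _ in range(20000)]

# Within 30 degrees of the equator: one third of the latitude range.
lat_share = share_within([latitude_deg(p) for p in points], 30)
# Within 60 degrees of the prime meridian: one third of the longitude range.
lon_share = share_within([longitude_deg(p) for p in points], 60)

if __name__ == "__main__":
    print(lat_share, lon_share)
```

The longitude band should capture about a third of the landings, as a uniform spread would predict, while the latitude band should capture about half, because the share of the sphere's area within 30 degrees of the equator is sin(30°) = 1/2.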

Having crossed from the very real world of Stalinist purges into the ephemeral zone of zero-measure conditional probabilities, Kolmogorov was soon plunged back into reality. During the Second World War, the Russian government asked Kolmogorov to develop methods for increasing the effectiveness of artillery fire. He showed that, instead of trying to maximize the probability of each shot hitting its target, in certain cases it would be better to fire a fusillade with small deviations from perfect aim, a tactic known as “artificial dispersion.” The Moscow University Department of Probability Theory, of which he had become the head, also calculated ballistic tables for low-altitude, low-speed bombing. In 1944 and 1945, the government awarded Kolmogorov two Orders of Lenin for his wartime contributions, and after the war, he served as a mathematics consultant for the thermonuclear weapons program.
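
Why deliberately spoiling one's aim can help is easier to see in a toy model. The one below is an illustration only, not Kolmogorov's wartime calculation, and all of its numbers are assumptions: the target is the interval from -1 to 1 on a line, the gunners' aim carries an unknown systematic error shared by the whole volley, and each shot scatters around that aim point by an adjustable amount.

```python
import random

def hit_probability(dispersion, shots=10, trials=5000, seed=0):
    """Estimate the chance that at least one shot in a volley hits.

    Toy one-dimensional model (every number is an assumption):
    - the target is the interval [-1, 1];
    - the aim is off by a systematic error drawn once per volley
      from a normal distribution with standard deviation 3;
    - each shot scatters around the aim point with standard
      deviation `dispersion`.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        aim_error = rng.gauss(0, 3)  # shared by every shot in the volley
        if any(abs(rng.gauss(aim_error, dispersion)) <= 1
               for _ in range(shots)):
            successes += 1
    return successes / trials

tight = hit_probability(dispersion=0.1)   # every shot lands near the aim point
spread = hit_probability(dispersion=3.0)  # shots deliberately scattered

if __name__ == "__main__":
    print(tight, spread)
```

In this model a tight volley succeeds only when the shared aiming error happens to be small, so all ten shots tend to hit or miss together. Scattering the shots makes each one less accurate but lets the volley cover the region where the target might be, and the estimated chance of at least one hit rises substantially.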

But Kolmogorov’s interests inclined him in more philosophical directions, too. Mathematics had led him to believe that the world was both driven by chance and fundamentally ordered according to the laws of probability. He often reflected on the role of the unlikely in human affairs. Kolmogorov’s chance meeting with fellow mathematician Pavel Alexandrov on a canoeing trip in 1929 began an intimate, lifelong friendship. In one of the long, frank letters they exchanged, Alexandrov chastised Kolmogorov for the latter’s interest in talking to strangers on the train, implying that such encounters were too superficial to offer insight into a person’s real character. Kolmogorov objected, taking a radical probabilistic view of social interactions in which people acted as statistical samples of larger groups. “An individual tends to absorb the surrounding spirit and to radiate the acquired lifestyle and worldview to anyone around, not just to a select friend,” he wrote back to Alexandrov.

Music and literature were deeply important to Kolmogorov, who believed he could analyze them probabilistically to gain insight into the inner workings of the human mind. He was a cultural elitist who believed in a hierarchy of artistic values. At the pinnacle were the writings of Goethe, Pushkin, and Thomas Mann, alongside the compositions of Bach, Vivaldi, Mozart, and Beethoven—works whose enduring value resembled eternal mathematical truths. Kolmogorov stressed that every true work of art was a unique creation, something unlikely by definition, something outside the realm of simple statistical regularity. “Is it possible to include [Tolstoy’s War and Peace] in a reasonable way into the set of ‘all possible novels’ and further to postulate the existence of a certain probability distribution in this set?” he asked, sarcastically, in a 1965 article.

Yet he longed to find the key to understanding the nature of artistic creativity. In 1960 Kolmogorov armed a group of researchers with electromechanical calculators and charged them with the task of calculating the rhythmical structures of Russian poetry. Kolmogorov was particularly interested in the deviation of actual rhythms from classical meters. In traditional poetics, the iambic meter is a rhythm consisting of an unstressed syllable followed by a stressed syllable. But in practice, this rule is rarely obeyed. In Pushkin’s Eugene Onegin, the most famous classical iambic poem in the Russian language, almost three-fourths of its 5,300 lines violate the definition of the iambic meter, and more than a fifth of all even syllables are unstressed. Kolmogorov believed that the frequency of stress deviation from the classical meters offered an objective “statistical portrait” of a poet. An unlikely pattern of stresses, he thought, indicated artistic inventiveness and expression. Studying Pushkin, Pasternak, and other Russian poets, Kolmogorov argued that they had manipulated meters to give “general coloration” to their poems or passages.

To measure the artistic merit of texts, Kolmogorov also employed a letter-guessing method to evaluate the entropy of natural language. In information theory, entropy is a measure of uncertainty or unpredictability, corresponding to the information content of a message: the more unpredictable the message, the more information it carries. Kolmogorov turned entropy into a measure of artistic originality. His group conducted a series of experiments, showing volunteers a fragment of Russian prose or poetry and asking them to guess the next letter, then the next, and so on. Kolmogorov privately remarked that, from the viewpoint of information theory, Soviet newspapers were less informative than poetry, since political discourse employed a large number of stock phrases and was highly predictable in its content. The verses of great poets, on the other hand, were much more difficult to predict, despite the strict limitations imposed on them by the poetic form. According to Kolmogorov, this was a mark of their originality. True art was unlikely, a quality probability theory could help to measure.
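
A rough computational stand-in for the guessing game is sketched below. It is not the procedure Kolmogorov's group used, which relied on human volunteers; it estimates entropy from letter counts in the text itself, using Shannon's formula. The two function names are invented for this example. The first treats letters as independent; the second asks how uncertain the next letter is once the previous one is known, which is closer in spirit to guessing what comes next.

```python
import math
from collections import Counter

def letter_entropy(text):
    """Shannon entropy in bits per letter, from single-letter frequencies."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def next_letter_entropy(text):
    """Entropy in bits of the next letter, given the previous letter.

    Estimated from the counts of adjacent letter pairs in `text`.
    """
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    n = len(text) - 1
    return -sum(c / n * math.log2(c / firsts[a])
                for (a, _b), c in pairs.items())

if __name__ == "__main__":
    stock_phrase = "abababab"  # perfectly predictable once it gets going
    print(letter_entropy(stock_phrase), next_letter_entropy(stock_phrase))
```

For the repetitive string, the single-letter entropy is one bit (two equally common letters), but the next-letter entropy is zero: once you know the current letter, there is nothing left to guess. Text built from stock phrases behaves similarly, in a less extreme way.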

Kolmogorov scorned the idea of placing War and Peace in a probabilistic sample space of all novels, but he could express its unpredictability by calculating its complexity. Kolmogorov conceived complexity as the length of the shortest description of an object, or the length of an algorithm that produces an object. Deterministic objects are simple, in the sense that they can be produced by a short algorithm: say, a periodic sequence of zeroes and ones. Truly random, unpredictable objects are complex: any algorithm reproducing them would have to be as long as the objects themselves. For example, irrational numbers (those that cannot be written as fractions) almost surely have no pattern in the numbers that appear after the decimal point. Therefore, most irrational numbers are complex objects, because they can be reproduced only by writing out the actual sequence. This understanding of complexity fits with the intuitive notion that there is no method or algorithm that could predict random objects. It is now crucial as a measure of the computational resources necessary to specify an object, and finds multiple applications in modern-day network routing, sorting algorithms, and data compression.
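
The contrast between simple and complex objects can be glimpsed with an ordinary compression tool. One caveat matters here: the true shortest description of an object cannot be computed in general, so the sketch below uses the compressed size from Python's standard `zlib` module as a crude upper bound, not as the complexity itself. The helper name `compressed_length` and the two sample sequences are choices made for this illustration.

```python
import random
import zlib

def compressed_length(data: bytes) -> int:
    """Size in bytes after zlib compression at the highest setting."""
    return len(zlib.compress(data, 9))

# A periodic sequence of zeroes and ones: 10,000 characters.
periodic = b"01" * 5000

# 10,000 pseudo-random bytes. Strictly speaking these come from a short
# program plus a seed, so their true complexity is small; a general-purpose
# compressor just cannot discover that.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(10000))

if __name__ == "__main__":
    print(len(periodic), compressed_length(periodic))
    print(len(noisy), compressed_length(noisy))
```

The periodic sequence shrinks to a tiny fraction of its length, since "repeat 01 five thousand times" is a short description. The noisy sequence barely shrinks at all, because the compressor finds no pattern to exploit.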

By Kolmogorov’s own measure, his life was a complex one. By the time he died, in 1987 at the age of 84, he had not only weathered a revolution, two World Wars, and the Cold War, but had also left few mathematical fields untouched, with innovations that extended well beyond the confines of academe. Whether his random walk through life was of the inebriated or mushroom-picking variety, its twists and turns were neither particularly predictable nor easily described. His success at capturing and applying the unlikely had rehabilitated probability theory, and had created a terra firma for countless scientific and engineering projects. But his theory also amplified the tension between human intuition about unpredictability and the apparent power of the mathematical apparatus to describe it.

For Kolmogorov, his ideas neither eliminated chance, nor affirmed a fundamental uncertainty about our world; they simply provided a rigorous language to talk about what cannot be known for certain. The notion of “absolute randomness” made no more sense than “absolute determinism,” he once remarked, concluding, “We can’t have positive knowledge of the existence of the unknowable.” Thanks to Kolmogorov, though, we can explain when and why we don’t.