Bayes and the Math of Uncertainty

Probability, as it was understood in the 18th century, worked in one direction. If you know the composition of a deck of cards, you can calculate the probability of drawing an ace. If you know the bias of a coin, you can calculate the probability of getting heads ten times in a row. This is forward probability: from known causes to predicted effects.

But the practical problems of science and daily life run the other direction. You observe effects and want to infer causes. You see ten heads in a row and want to know whether the coin is biased. You observe symptoms and want to know the disease. You collect data and want to know which theory is correct. This is inverse probability: from observed effects to probable causes.

The first person known to have solved a version of this inverse problem was Thomas Bayes, an English Presbyterian minister whose single most important work was published after his death. His theorem, largely ignored for decades, eventually became one of the most powerful and widely used results in all of mathematics.

The Reverend Bayes

Thomas Bayes was born around 1701 in London and was ordained as a Presbyterian minister. He served as minister at the Mount Sion chapel in Tunbridge Wells, Kent. He was elected a Fellow of the Royal Society in 1742, apparently on the basis of his mathematical abilities, though he had published very little.

Bayes was not a professional mathematician. He was a clergyman who pursued mathematics as an intellectual interest. His published works during his lifetime included a defense of Newton’s calculus against a philosophical attack by Bishop Berkeley, but nothing on probability.

After Bayes died in 1761, his friend Richard Price discovered among his papers an essay titled “An Essay towards solving a Problem in the Doctrine of Chances.” Price edited the essay and presented it to the Royal Society in 1763. It contained what we now call Bayes’ theorem.

The Theorem

Bayes’ theorem can be stated in a single equation: P(H|E) = P(E|H) × P(H) / P(E), where H is a hypothesis and E is evidence. In words: the probability of a hypothesis given the evidence equals the probability of the evidence given the hypothesis, multiplied by the prior probability of the hypothesis, divided by the total probability of the evidence.

Each component has a name and a role:

P(H|E) is the posterior probability: how likely the hypothesis is after seeing the evidence. This is what we want to know.
P(E|H) is the likelihood: how likely the evidence would be if the hypothesis were true. This is usually calculable from the hypothesis itself.
P(H) is the prior probability: how likely the hypothesis was before seeing the evidence. This represents our initial state of knowledge.
P(E) is the marginal likelihood: the total probability of the evidence under all possible hypotheses. This acts as a normalizing constant.

The theorem tells you how to update your beliefs in light of new evidence. You start with a prior belief (how likely you think a hypothesis is before seeing any data), you collect evidence, and the theorem tells you how to adjust your belief to produce a posterior (how likely the hypothesis is after seeing the data).
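To make the roles of the four components concrete, here is a minimal Python sketch of the update for a finite set of competing hypotheses. The function name `posterior`, the two coin hypotheses, the equal priors, and the biased coin's 90% heads rate are all illustrative assumptions, not anything taken from Bayes' essay.

```python
def posterior(priors, likelihoods):
    """Return P(H|E) for every hypothesis H.

    priors:      dict mapping hypothesis name -> P(H); values should sum to 1
    likelihoods: dict mapping hypothesis name -> P(E|H)
    """
    # P(E): total probability of the evidence, summed over all hypotheses.
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    if evidence == 0:
        raise ValueError("the evidence has zero probability under every hypothesis")
    # Bayes' theorem, applied to each hypothesis in turn.
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}


# Assumed setup: the coin is either fair or lands heads 90% of the time,
# and before flipping we consider both possibilities equally likely.
priors = {"fair": 0.5, "biased": 0.5}

# Evidence: ten heads in a row. P(E|H) follows directly from each hypothesis.
likelihoods = {"fair": 0.5 ** 10, "biased": 0.9 ** 10}

result = posterior(priors, likelihoods)
print(result)
```

Under these assumed numbers, ten heads in a row shifts almost all of the belief onto the biased coin. Dividing by the summed evidence is what makes the posteriors add up to 1.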

A Medical Example

The power of Bayes’ theorem becomes clear through a concrete example. Suppose a medical test for a rare disease is 99% accurate (it correctly identifies 99% of people who have the disease and correctly clears 99% of people who do not). Suppose the disease affects 1 in 10,000 people. You test positive. What is the probability that you actually have the disease?

Most people guess 99%. The correct answer, from Bayes’ theorem, is about 1%.

The reasoning: out of 10,000 people, about 1 has the disease and will (probably) test positive. But about 100 of the remaining 9,999 healthy people will also test positive (the 1% false positive rate applied to a large population). So of the approximately 101 people who test positive, only 1 actually has the disease. The probability of having the disease given a positive test is about 1/101, roughly 1%.

The prior probability (1 in 10,000) is so low that even a highly accurate test produces mostly false positives. Bayes’ theorem captures this quantitatively. Without it, the base rate fallacy (ignoring how rare the disease is) leads to wildly overestimating the significance of a positive test.
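The same arithmetic can be checked in a few lines of Python. This is a sketch that uses exact fractions and reads "99% accurate" as 99% sensitivity and 99% specificity, as in the example above. The function and variable names are illustrative.

```python
from fractions import Fraction


def p_disease_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) from Bayes' theorem."""
    # P(positive and diseased) = P(positive | diseased) * P(diseased)
    true_positive = sensitivity * prevalence
    # P(positive and healthy) = P(positive | healthy) * P(healthy)
    false_positive = (1 - specificity) * (1 - prevalence)
    # Dividing by P(positive), the sum of both routes to a positive result.
    return true_positive / (true_positive + false_positive)


prevalence = Fraction(1, 10_000)   # the disease affects 1 in 10,000 people
sensitivity = Fraction(99, 100)    # 99% of sick people test positive
specificity = Fraction(99, 100)    # 99% of healthy people test negative

answer = p_disease_given_positive(prevalence, sensitivity, specificity)
print(answer)          # exactly 1/102
print(float(answer))   # just under 1%
```

The exact value is 1/102, in line with the rough head count of one true positive among about a hundred false ones.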

Laplace and the Rule of Succession

The development of Bayesian thinking did not end with Bayes. Pierre-Simon Laplace independently developed a more general version of Bayes’ theorem in the 1770s and applied it to a wide range of problems in astronomy, demography, and jurisprudence. Laplace was the first to use the theorem systematically in scientific inference, and much of what is now called “Bayesian statistics” is really Laplacian.

Laplace formulated the famous rule of succession: if an event has occurred n times in succession without failure, the probability that it will occur again is (n+1)/(n+2). Using this rule, Laplace estimated the probability that the Sun would rise tomorrow, given that it had risen every day in recorded history. The answer was extremely close to 1, but not exactly 1, a whimsical application that illustrates the Bayesian principle that no amount of evidence makes a conclusion absolutely certain.
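The rule is simple enough to state as a one-line function. The sketch below uses the standard generalization to s successes in n trials, (s+1)/(n+2), which reduces to the formula above when every trial has succeeded. The function name and the sample values of n are illustrative choices, not Laplace's own figures.

```python
from fractions import Fraction


def rule_of_succession(successes, trials):
    """Laplace's estimate of P(next trial succeeds), assuming a uniform prior."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(successes + 1, trials + 2)


# An unbroken run of n successes: the estimate approaches 1 but never reaches it.
for n in (0, 1, 10, 1_000_000):
    p = rule_of_succession(n, n)
    print(n, p, float(p))
```

With no observations at all, the rule returns 1/2. After ten successes in a row it returns 11/12, and even after a million it still falls short of certainty.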

The Long Neglect

Despite Laplace’s work, Bayesian methods fell out of favor in the 19th and early 20th centuries. The main objection was the prior probability. Critics argued that the choice of prior was subjective: different people could start with different priors and reach different conclusions from the same data. This seemed unscientific. How could a method that depended on the analyst’s initial beliefs be objective?

The alternative, developed by Ronald Fisher, Jerzy Neyman, and Egon Pearson in the early 20th century, was frequentist statistics. Frequentist methods avoid priors entirely. They define probability as the long-run frequency of events and evaluate hypotheses through procedures (significance tests, confidence intervals) that have guaranteed error rates regardless of the analyst’s beliefs.

Frequentist statistics dominated the 20th century and remains the default framework taught in most introductory statistics courses. But the Bayesian approach never disappeared entirely, and by the late 20th century, it began a dramatic revival.

The Bayesian Revival

Several developments drove the return of Bayesian methods. First, the growth of computing power made Bayesian calculations practical. Bayes’ theorem is conceptually simple, but applying it to complex problems often requires integrating over high-dimensional spaces, a task that is intractable by hand for most problems but feasible with modern computers. Markov chain Monte Carlo (MCMC) methods, whose core algorithms date back to the 1950s, were widely adopted by statisticians around 1990 and made it possible to apply Bayesian inference to models far too complex for analytical solutions.
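To give a flavor of how MCMC sidesteps the integral, here is a minimal random-walk Metropolis sampler for a deliberately simple problem: the heads probability of a coin, with a uniform prior. The data (7 heads in 10 flips), the step size, the sample count, and all names are assumptions chosen for illustration. Real applications use far more sophisticated samplers.

```python
import math
import random


def log_posterior(theta, heads, flips):
    """Unnormalized log posterior of a coin's heads probability (uniform prior)."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero prior probability outside (0, 1)
    return heads * math.log(theta) + (flips - heads) * math.log(1.0 - theta)


def metropolis(heads, flips, n_samples=20_000, step=0.2, burn_in=1_000, seed=0):
    """Draw samples from the posterior with a random-walk Metropolis sampler."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, flips)
    samples = []
    for i in range(n_samples + burn_in):
        # Symmetric proposal: a uniform jump around the current value.
        proposal = theta + rng.uniform(-step, step)
        candidate = log_posterior(proposal, heads, flips)
        # Accept with probability min(1, posterior ratio). The normalizing
        # constant P(E) cancels in the ratio, so it is never computed.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:
            samples.append(theta)
    return samples


samples = metropolis(heads=7, flips=10)
estimate = sum(samples) / len(samples)
print(estimate)  # should land near the exact posterior mean, 8/12
```

For this toy problem the exact answer is known (the posterior mean is (7+1)/(10+2)), which makes it a convenient check on the sampler. The point of MCMC is that the same loop still works when no closed form exists.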

Second, the philosophical objections to priors weakened. Statisticians recognized that frequentist methods also involve subjective choices (the choice of test, the significance level, the definition of the hypothesis) and that Bayesian priors, when chosen carefully, can encode genuine prior knowledge (from previous experiments, physical constraints, or expert judgment) in a transparent and mathematically rigorous way.

Third, Bayesian methods proved highly effective in many practical applications. Spam filtering, speech recognition, machine translation, medical diagnosis, drug development, climate modeling, and cosmological parameter estimation all benefit from the Bayesian framework’s ability to combine prior knowledge with new data and to quantify uncertainty in a natural way.

Bayes in the Modern World

Today, Bayesian methods are everywhere in science and technology, often invisibly.

Many email spam filters are built on a naive Bayes classifier: the filter assigns a probability that a message is spam based on the words it contains, updating the probability with each word using Bayes’ theorem and treating the words as if they were independent of one another (the “naive” assumption). The filter learns from labeled examples (spam and not-spam) and continuously updates its model as new patterns emerge.
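A toy version of such a filter fits in a few dozen lines. The sketch below is an illustration under stated assumptions, not how any particular mail provider works: the four training messages are invented, the labels "spam" and "ham" are a naming choice, and add-one smoothing is one common way to handle words seen in only one class.

```python
import math
from collections import Counter


def train(messages):
    """Count words per label. messages: list of (text, label) pairs,
    where label is "spam" or "ham" and both labels appear at least once."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocabulary


def p_spam(text, model):
    """Posterior probability that `text` is spam under the naive Bayes model."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    log_score = {}
    for label in ("spam", "ham"):
        # Start from the log prior: the fraction of training messages with this label.
        score = math.log(label_counts[label] / total_messages)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Likelihood of the word given the label, with add-one smoothing.
            count = word_counts[label][word]
            score += math.log((count + 1) / (n_words + len(vocabulary)))
        log_score[label] = score
    # Normalize the two scores into a probability (the P(E) step).
    top = max(log_score.values())
    weight = {label: math.exp(s - top) for label, s in log_score.items()}
    return weight["spam"] / (weight["spam"] + weight["ham"])


# Invented training data, for illustration only.
model = train([
    ("win cash now", "spam"),
    ("cheap pills win big", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
])

print(p_spam("win cash", model))
print(p_spam("monday meeting", model))
```

Working in log probabilities avoids numerical underflow when a message contains many words, which is why the scores are summed rather than multiplied.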

Medical diagnosis systems use Bayesian networks to combine symptoms, test results, and patient history to estimate the probability of various conditions. The explicit use of priors (the base rate of each disease in the relevant population) helps avoid the base rate fallacy illustrated in the testing example above.

Machine learning uses Bayesian principles extensively. Bayesian optimization is used to tune the hyperparameters of neural networks. Bayesian neural networks quantify uncertainty in their predictions. Gaussian processes, a Bayesian method, are used for regression and classification in settings where data is scarce.

In cosmology, Bayesian methods are used to estimate the parameters of the universe (its age, expansion rate, dark matter content) from observations of the cosmic microwave background and the large-scale distribution of galaxies. The Bayesian framework naturally handles the combination of multiple data sets and the quantification of parameter uncertainties.

The Mathematics of Belief

Bayes’ theorem is sometimes described as the mathematics of learning. It formalizes the process by which rational agents update their beliefs in response to evidence. Start with what you know (the prior). Observe the world (the evidence). Calculate how the evidence changes what you know (the posterior). Then use the posterior as the prior for the next round of evidence.
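The loop of turning each posterior into the next prior can be sketched with the medical test from earlier in the article. The assumption added here, and it is a strong one, is that repeated tests are independent given the patient's true condition. The function name `update` and the choice of three tests are illustrative.

```python
from fractions import Fraction


def update(prior, p_evidence_if_true, p_evidence_if_false):
    """One round of Bayes' theorem for a single yes/no hypothesis."""
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1 - prior))


# Start from the base rate: 1 in 10,000 people has the disease.
belief = Fraction(1, 10_000)
history = [belief]

# Three positive results in a row, assumed independent given the true condition.
for _ in range(3):
    # A positive result has probability 99/100 if sick and 1/100 if healthy.
    belief = update(belief, Fraction(99, 100), Fraction(1, 100))
    history.append(belief)  # this posterior is the prior for the next test

for step, b in enumerate(history):
    print(step, b, f"{float(b):.4f}")
# history is [1/10000, 1/102, 99/200, 9801/9902]
```

One positive test leaves the probability near 1%, a second lifts it to just under 50%, and a third to about 99%. Each piece of evidence moves the belief, and how far it moves depends on where the belief started.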

This iterative process of belief updating is not just a statistical technique. It is a model of rational thought itself. The philosopher of science Karl Popper argued that science progresses through conjecture and refutation. The Bayesian framework offers one way to quantify this process: conjectures are assigned priors, observations supply the evidence, and refutation (or confirmation) shows up as a lower (or higher) posterior.

The mathematical traditions that underlie Bayesian inference (probability theory, combinatorics, and analysis) run from Bernoulli through Laplace to Gauss, whose contributions to statistics (the method of least squares, the normal distribution) are documented in his handwritten notebooks. Gauss himself used Bayesian reasoning in his astronomical calculations, estimating the orbits of celestial bodies from incomplete and noisy data.

The Minister’s Legacy

Thomas Bayes published almost nothing during his lifetime. His most important work was found after his death and presented by a friend. It was ignored for decades, revived by Laplace, challenged by frequentists, and eventually rehabilitated by computers, practical success, and philosophical reflection.

Today, Bayes’ theorem is one of the most widely applied results in all of mathematics. It powers the algorithms that filter your email, recommend your purchases, diagnose your illnesses, and map the universe. It is the formal answer to the question that every scientist, every doctor, and every thinking person asks: given what I have seen, what should I now believe?

A Presbyterian minister in 18th century England asked how to reason backward from effects to causes. The answer he found, a simple equation relating prior beliefs to new evidence, turned out to be one of the most useful ideas in the history of human thought.
