It was known in antiquity that there is no “largest” prime number, and hence that there must be infinitely many primes. Euclid proved this in Book IX of his *Elements*. Suppose the set S of primes were finite, and consider the number N = 1 + ∏_{p∈S} p. Since S is finite, N is a finite number. N is not divisible by any p ∈ S, because dividing N by such a p always leaves remainder 1. N can’t itself be prime, since then N would be a member of S, which is impossible for the reason just given. So N must be divisible by some prime not in S, but that contradicts the definition of S. This contradiction proves S must be an infinite set.
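Euclid’s construction is easy to test numerically. The sketch below is only an illustration (the function name is mine, not Euclid’s): it forms N = 1 + ∏ p over a finite list of primes and checks that every prime in the list leaves remainder 1.

```python
from math import prod

def euclid_witness(primes):
    """Given a finite list of primes, return N = 1 + product(primes).

    No prime in the list can divide N, since each leaves remainder 1,
    so every prime factor of N lies outside the list.
    """
    n = 1 + prod(primes)
    assert all(n % p == 1 for p in primes)
    return n

# The classic example: 2*3*5*7*11*13 + 1 = 30031 = 59 * 509, which shows
# that N itself need not be prime; only its prime factors must be new.
print(euclid_witness([2, 3, 5, 7, 11, 13]))  # 30031
```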

Interestingly enough, Euler’s product formula for ζ(s) gives an independent proof that there are infinitely many primes. It is known that the infinite sum ∑_{n} 1/n diverges, i.e., is not a finite number. By a standard result of calculus, one has the “Taylor series” expansion:

-log(1-x) = ∑_{1≤n<∞} x^{n}/n

(Here, “log” always means the natural logarithm, with base e = 2.718281828… .) But this expansion is valid only for -1 ≤ x < 1, since the left hand side is not finite for x = 1. In fact, at x = 1 the partial sum with N terms is approximately log N, which grows arbitrarily large as N does, albeit very slowly. More precisely, there is a constant γ such that

γ = lim_{N→∞} [(∑_{1≤n≤N} 1/n) – log N]

This too was proven by Euler, and γ is known as “Euler’s constant”. It has been computed that γ = 0.5772157… . (Incidentally, it’s an open question whether γ is a rational number, though it very likely isn’t.) Consequently, the sum ∑_{n} 1/n^{s} must become arbitrarily large as real s > 1 approaches 1. But by Euler’s formula for ζ(s), that sum would approach the product ∏_{p} (1 – 1/p)^{-1} as s approaches 1, and this product is finite if there are only finitely many primes. The contradiction implies there must be infinitely many primes.
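The convergence of the harmonic sum minus log N to γ can be seen numerically. The sketch below (plain Python, nothing assumed beyond the standard library; the function name is illustrative) tabulates the difference for a few values of N.

```python
from math import log

def harmonic_minus_log(n):
    """Partial harmonic sum H_n = 1 + 1/2 + ... + 1/n, minus log(n).

    By Euler's result, this difference converges to gamma = 0.5772157...
    """
    h = sum(1.0 / k for k in range(1, n + 1))
    return h - log(n)

# The difference settles toward gamma as N grows (the error is roughly 1/(2N)).
for n in (10, 1000, 100000):
    print(n, harmonic_minus_log(n))
```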

So already the function ζ(s) tells us something about the distribution of primes. Now, given that there are infinitely many primes, we want to ask ourselves how are they distributed. Simply looking at a table of primes suggests that the primes come farther and farther apart the larger they are. We would like to have a better quantification of this observation.

Intuitively, the reason that primes become sparser as they grow larger is that in some sense it becomes more “difficult” for a large number to be prime: there is an ever increasing supply of smaller numbers (primes among them) which could divide it. The ancient Greeks knew how to construct tables of primes using what is known as the “sieve of Eratosthenes”, after its inventor, who lived in the 3rd century BCE. The method of the sieve is to list all numbers up to whatever limit one wishes to deal with. Then, starting with the prime 2, cross out every second number thereafter, every third number after 3, every fifth number after 5, and so on, continuing until the next uncrossed number exceeds the square root of the largest number in the list. The numbers which remain (apart from 1) are guaranteed to be prime.
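The sieve is straightforward to implement. Here is a minimal Python sketch (the function name is mine) that follows the square-root cutoff described above.

```python
def sieve(limit):
    """Sieve of Eratosthenes: return the list of all primes <= limit.

    Crossing out only needs to run while p*p <= limit, which is the
    square-root cutoff: any composite n <= limit has a prime factor
    no larger than sqrt(n).
    """
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]  # 0 and 1 are not prime
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Multiples below p*p were already crossed out by smaller primes.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n in range(2, limit + 1) if is_prime[n]]

print(sieve(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```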

One can ask what is the probability that any given number in the list will be crossed out at some point in the process. Of course, for any particular number, it is either prime or it isn’t, so the probability is either 0 or 1. But suppose we don’t know in advance which numbers are which and we simply pick one at random. What can we say about the probability it is prime, knowing only its size?

Suppose the number is X. The probability it is divisible by 2 is 1/2. The probability it is divisible by 3 is 1/3, and so the probability that it is *not* divisible by 3 is 2/3. In general, for any prime p ≤ X, the probability that X is divisible by p is 1/p, and the probability that it is not divisible by p is 1 – 1/p. Therefore, the probability that X is prime is the probability that it is not divisible by *any* prime p ≤ X, which is the product of all the separate probabilities, which we define as G(X) = ∏_{p≤X} (1 – 1/p) – *provided* the probabilities of divisibility by each prime are independent of each other. *A priori* we have no good reason to think the probabilities are entirely independent, and we will see that a slight adjustment is necessary to get the correct probability.
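This heuristic product is easy to compute. The sketch below (illustrative; the helper names are my own) evaluates G(X) by sieving the primes up to X and compares it with 1/log(X). The two differ by a roughly constant factor, which is precisely the adjustment alluded to above.

```python
from math import log

def _sieve(limit):
    """Primes up to limit via the sieve of Eratosthenes."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
    return [n for n in range(2, limit + 1) if is_prime[n]]

def G(x):
    """Heuristic probability that a number of size x is prime, treating
    divisibility by each prime p <= x as independent events."""
    prob = 1.0
    for p in _sieve(x):
        prob *= 1.0 - 1.0 / p
    return prob

# G(x) falls short of 1/log(x) by a roughly constant factor.
print(G(1000), 1 / log(1000))
```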

We are interested in the total number of primes ≤ X, and so we give it a name: we will call this number π(X). A very rough approximation to it is the count of numbers ≤ X times the probability that each is prime. We would like to find a simple formula f(X) such that the ratio of π(X) to f(X) for large X is 1. Mathematicians call this notion “asymptotic equivalence” and even have a notation for it. We write π(X) ∼ f(X) provided π(X)/f(X) → 1 as X → ∞. Note that this is not to say that the *difference* π(X) – f(X) tends to 0 as X → ∞, which would be a much stronger statement. We are willing to settle for something a little weaker, namely that the *ratio* tends to 1 in the limit.

Does that product expression for the probability G(X) look familiar? It sure as heck does. It is a finite form of the product in Euler’s formula for the zeta function (or rather, its reciprocal, to be precise). We also recall that there is a relationship between the finite sum ∑_{1≤n≤X} 1/n and log(X) which involves Euler’s constant γ. Is there a similar relationship between the finite product and log(X)? Indeed there is. It was discovered by Franz Mertens in 1874, and so it’s called Mertens’ theorem:

1/G(X) = ∏_{p≤X} (1 – 1/p)^{-1} ∼ e^{γ} log(X)
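Mertens’ theorem can also be checked numerically. The sketch below (illustrative names, standard library only) compares the finite product with e^{γ} log(X) and prints the ratio, which drifts toward 1.

```python
from math import log, exp

GAMMA = 0.5772156649  # Euler's constant, truncated

def sieve(limit):
    """Primes up to limit via the sieve of Eratosthenes."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
    return [n for n in range(2, limit + 1) if is_prime[n]]

def mertens_ratio(x):
    """Ratio of prod_{p<=x} (1 - 1/p)^(-1) to e^gamma * log(x).

    Mertens' theorem says this ratio tends to 1 as x grows.
    """
    product = 1.0
    for p in sieve(x):
        product *= 1.0 / (1.0 - 1.0 / p)
    return product / (exp(GAMMA) * log(x))

for x in (100, 10000, 1000000):
    print(x, mertens_ratio(x))
```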

(Mertens also made a conjecture, called the Mertens conjecture which, if true, would have implied the Riemann hypothesis. Unfortunately, it was proved false in 1984, as we shall see later.) When a careful analysis of probabilities is done, it turns out that e^{γ} is exactly the factor that must multiply G(X) to give the probability that X is prime. (e^{γ} is about 1.782, so the estimate wasn’t wildly off.) In other words, the suspicion is that

π(X) ∼ e^{γ} G(X) X

and so by Mertens’ theorem

π(X) ∼ X / log(X)

This formula can be interpreted to say that the probability that a given number less than X is prime is about 1/log(X). Alternatively, it says that the average spacing between primes less than X is about log(X).
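The quality of the approximation π(X) ∼ X/log(X) can be seen by direct count. This sketch (illustrative, built on a simple sieve) prints the ratio π(X)/(X/log X), which approaches 1 quite slowly.

```python
from math import log

def sieve(limit):
    """Primes up to limit via the sieve of Eratosthenes."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
    return [n for n in range(2, limit + 1) if is_prime[n]]

def prime_count(x):
    """pi(x): the number of primes <= x."""
    return len(sieve(x))

# pi(1000) = 168, while 1000/log(1000) is about 144.8; the ratio
# creeps toward 1 as X grows, but only very slowly.
for x in (1000, 100000, 1000000):
    print(x, prime_count(x) / (x / log(x)))
```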

The argument here is *by no means* a rigorous proof. The result is in fact *true*, but it is a very difficult theorem, now called the prime number theorem. C. F. Gauss (1777-1855) suspected (perhaps as early as 1792 by his own account) that the probability any integer X is prime is about 1/log(X), and hence that π(X) ∼ X / log(X). But a rigorous proof did not come until 1896, when Jacques Hadamard (1865-1963) and C. J. de la Vallée-Poussin (1866-1962) independently came up with proofs. Although various proofs of the prime number theorem have been given since then, none are especially easy.

Interestingly enough, Riemann himself seems not to have worked much on this problem, even though his famous conjecture turns out to be equivalent to a statement about just how precise the approximation of the prime number theorem actually is. To understand something of the nature of this deep connection, we have to talk more about the zeta function.