Motivation: generating prime numbers

Almost all of modern cryptography involves large prime numbers: RSA, ElGamal, DSA, even some Elliptic-curve cryptosystems. The problem of generating large primes suitable for use in cryptography is unfortunately often one passed over by introductory material on the subject.

Prime numbers are often generated by the use of primality tests: given a number, they produce a judgement as to if the number is prime or composite. Surprisingly enough, the process of generating a large random odd number and checking its primality is enough: using the Euler Criterion, the number of primes until a certain number is approximately Generated by LaTeX. So (as an example) the number of primes representable by a 128-bit integer is about: Generated by LaTeX When compared to the amount of values a 128-bit integer, roughly Generated by LaTeX, this implies that a random 128-bit integer has about a 1% chance of being prime. If the primality test is fast, then this is clearly a very feasible way to generate primes. 1

So, let's talk about one algorithm: the Fermat primality test.

The Fermat primality test

The idea behind the so-called Fermat primality test is extremely simple, and uses a theorem that gives the test its name:

Fermat's Little Theorem

Let Generated by LaTeX be a prime, and Generated by LaTeX an integer such that Generated by LaTeX. Then Generated by LaTeX.

The actual algorithm itself: given a positive integer number Generated by LaTeX, pick a random integer Generated by LaTeX such that Generated by LaTeX, and then calculate Generated by LaTeX. Clearly, if Generated by LaTeX, then the number Generated by LaTeX is not a prime, by the contrapositive of Fermat's Little Theorem.

Or, more formally:

Fermat Primality Test

Input: Positive integer Generated by LaTeX.

Output: "Composite" or "Prime".

  1. Pick Generated by LaTeX at random.
  2. Compute Generated by LaTeX.
  3. If Generated by LaTeX then return "Composite".
  4. Compute Generated by LaTeX.
  5. If Generated by LaTeX then return "Prime".
  6. Otherwise return "Composite".

The GCD calculation is in place just in case the random number Generated by LaTeX picked is actually a factor of Generated by LaTeX. The chances of this occuring are low enough that it may seem fruitless, but it will be useful to refer to later on.

Examples

Here's an example: let Generated by LaTeX (which is prime), and pick Generated by LaTeX. Then note that Generated by LaTeX, which is a simple verification of Fermat's Little Theorem.

For a more interesting example, try Generated by LaTeX. Since Generated by LaTeX, clearly the test should fail. Pick Generated by LaTeX and then we see that Generated by LaTeX. But if we choose Generated by LaTeX, it is also interesting to note that Generated by LaTeX -- which appears to verify that 129 is prime, even though we know it is not!

This leads to an important fact about the Fermat primality test: it sometimes lies! It is important to note, however, that it never lies about a number not being prime. If the Fermat test claims a number is not prime, it definitely is not. However, if it says it might be prime, then it is entirely possible that it is lying.

At first, this seems a deathblow to the Fermat test. A test that doesn't always work seems to be not a very useful test. But a ray of hope remains: what if it lies only with an extremely low probability?

Probability of lying

We seem to have a bit of a difficulty now. How can we determine if the Fermat test lies for a given input? Well, here's a place to start. How about we define the list of composite numbers for which a given Generated by LaTeX results in a Fermat lie?

Fermat liars

A number Generated by LaTeX is a Fermat liar for a composite number Generated by LaTeX iff Generated by LaTeX. The set of Fermat liars for a composite positive integer Generated by LaTeX is the set Generated by LaTeX.

This gives us something to think about: the chance for the Fermat test to lie is related to the size of the set Generated by LaTeX. If Generated by LaTeX is small, the chances of lying are small, and the same goes vice-versa. But how small?

That's the question, really. Here's something to consider: remember that GCD test in the psuedocode for the Fermat test above? If we discard everything that has GCD that is not one, then, well . . . we get that Generated by LaTeX, don't we?

Let's try something, then. Using Generated by LaTeX and Generated by LaTeX again from before, Generated by LaTeX. So Generated by LaTeX. Perhaps elements of the multiplicative group? But with Generated by LaTeX, the same thing happens; the GCD is 1.

But this leads to an interesting fact: we only care about elements of Generated by LaTeX. Anything outside will be discarded by the GCD test. But now since Generated by LaTeX by construction, we really note that Generated by LaTeX.

Here's a theorem to get you thinking:

Lagrange's Theorem

Let Generated by LaTeX be a group, and Generated by LaTeX a subgroup of Generated by LaTeX. Then Generated by LaTeX.2

Can we form a subgroup out of Generated by LaTeX? Let's try it, shall we? Using Generated by LaTeX from before, note that Generated by LaTeX is such that Generated by LaTeX, meaning that Generated by LaTeX. But we see that Generated by LaTeX, and Generated by LaTeX. (Since Generated by LaTeX, it is clearly invertible and hence Generated by LaTeX. So it seems that Generated by LaTeX may be a subgroup, from a trivial verification example.

The subgroup test in abstract algebra states that a set Generated by LaTeX is a subgroup of a group Generated by LaTeX iff

Well, clearly Generated by LaTeX, and so we can knock off the first condition. The second gives a little more trouble, but soon falls as soon as we notice that if Generated by LaTeX and Generated by LaTeX, then Generated by LaTeX, as Generated by LaTeX is a commutative group. So Generated by LaTeX, and the second condition applies.

The third condition requires a touch more thought, but is solved just as easily as one finds that Generated by LaTeX

So we see that Generated by LaTeX is indeed a subgroup of Generated by LaTeX! This is a very interesting result: with a little bit of finesse, we can actually get a fairly decent estimate of the probability of lying here.

For the moment, just assume that Generated by LaTeX. So there is an element in Generated by LaTeX that is Generated by LaTeX, since by Lagrange's Theorem we see that Generated by LaTeX, we can conclude that at the very most, the probability of lying is Generated by LaTeX.

The deathblow to the Fermat test still seems intact, until we consider the possibility of running the Fermat test multiple times. Theoretically, if we were to choose the integer Generated by LaTeX at random, then each test would be independent. If we use basic probability theory, and consider executing the test Generated by LaTeX times (say Generated by LaTeX), the chance of all tests lying (and hence the chance of having a composite number) is of course Generated by LaTeX. For Generated by LaTeX, that's approximately Generated by LaTeX -- a probability so small as to be essentially zero.

But, you may say, this is all relying on the fact that Generated by LaTeX! Are there numbers Generated by LaTeX for which this is the case?

Introducing: Carmichael numbers

Sadly, the answer is yes, and these numbers are known as the Carmichael numbers, named after a fellow by the name of Robert Carmichael. Even worse news: there are an infinite number of Carmichael numbers, so we can't just store a list of known Carmichaels and avoid them. They are, fortunately, extremely sparse 3 and hence really are not that much of a problem.

So, what is a Carmichael number, exactly?

Carmichael numbers

A Carmichael number is a positive integer Generated by LaTeX that satisfies:

  1. Generated by LaTeX is composite,
  2. Generated by LaTeX is square-free,
  3. For all primes Generated by LaTeX, Generated by LaTeX.

Here's a quick sketch of a proof of why Carmichael numbers defeat the Fermat primality test, for completeness' sake:

Proof:

Let Generated by LaTeX be a Carmichael number. Then because Generated by LaTeX is composite, we know that for some primes Generated by LaTeX and some positive integers Generated by LaTeX, Generated by LaTeX In particular, because Generated by LaTeX is square-free, Generated by LaTeX, we see that Generated by LaTeX

Let Generated by LaTeX be arbitrary. Then decompose Generated by LaTeX into values Generated by LaTeX as follows: Generated by LaTeX Because Generated by LaTeX, we know that Generated by LaTeX (group closure), and thus that Generated by LaTeX. Hence there is no Generated by LaTeX.

So now, because Generated by LaTeX, we can use the Chinese Remainder Theorem to state that there is some unique number Generated by LaTeX such that Generated by LaTeX. But Generated by LaTeX satisfies this: because Generated by LaTeX, and hence Generated by LaTeX (where Generated by LaTeX) by Fermat's Little Theorem.

So we may conclude that Generated by LaTeX, and hence the arbitrarily-chosen number Generated by LaTeX is a Fermat liar for Generated by LaTeX (because Generated by LaTeX), as required. Generated by LaTeX

So: yes, we do have to worry about Carmichael numbers with the Fermat test, but not really to any significant degree. As noted before, their occurence is low enough that for all but the most paranoid of situations, the Fermat test is really quite sufficient.

Conclusion

So: the Fermat primality test is certainly a simple algorithm with some very interesting properties. It performs well enough in most situations, and also has the advantage of being extremely quick. I'm not suggesting that the next time you need a prime number, you should implement the Fermat test -- there are plenty of good libraries out there that will do this for you -- but it seems like a good idea to at least be aware of the general process.

Happy hacking,

- ethereal


  1. One may ask the question 'What happens at larger bit-sizes?' The chance is actually Generated by LaTeX; hence the chance scales about inverse-linearly with the number of bits. 

  2. There are many equivalent ways of stating Lagrange's Theorem. This just happens to be the most convinient for this context. 

  3. They seem to appear roughly one every fifty billion numbers. There is a list of the known Carmichael numbers up to Generated by LaTeX, for the curious.