The dignity of science seems to demand that every aid to
[the resolution of composite numbers into their prime factors]
be zealously cultivated. Carl Friedrich Gauss (1777-1855)
Disquisitiones Arithmeticae #329

Jevons Number. Factoring
8616460799 is now an easy task.

Challenges help tell
special-purpose and general-purpose methods apart.

Special cases of a priori
factorizations are helpful to number theorists.

Trial division may be used
to weed out the small prime factors of a number.

"Can the reader say what two numbers multiplied together will produce the number 8616460799 ?
I think it unlikely that anyone but myself will ever know."
(Principles of Science, p. 123)

The number 8616460799 became known as
Jevons' Number.
If was first factored in 1903 by
Derrick Norman Lehmer
(1867-1938).
Yet, Jevons' Number can now be factored in a few seconds with
a handheld calculator
(this would take a tiny fraction of a second on any desktop computer).

8616460799 = 89681 . 96079

A once formidable challenge has become trivial.
This may serve as a warning against our collective foolishness when we rely
on the apparent difficulty of the factorization problem
to safeguard confidential transactions in the computer age.

(2006-05-15) A fuzzy classification of factoring algorithms...
Distinguishing between special-purpose and general-purpose methods.

In recent years, cryptographic applications have become popular which depend critically on the
difficulty of retrieving two large primes from their product.
This has led to the distinction between two broad classes of factorization methods:

A special-purpose factorization algorithm (usually)
discovers promptly only factors with
a known property (e.g., p is small, or p-1 is smooth)._{ }

Other factorization algorithms are called general-purpose.

We could try to place this distinction on a firm mathematical
footing by stating that the worst running time of a special-purpose algorithm
depends on the size of an unknown prime factor p
(or on some other characteristic of it, like the
so-called smoothness of p-1 or p+1).
By contrast, the worst time for a general-purpose algorithm would
depend only on the size of the number to factor...
This is the right idea but it won't do theoretically "as is",
because the "least special" prime factors of numbers of bounded size
may be used to establish a generous overall upper bound
on the running time of any dubiously defined "special-purpose" algorithm
(provided it's "general" enough to discover any factor, albeit slowly).
Nevertheless, the distinction remains of great practical importance,
to evaluate what state-of-the-art methods can and cannot do.

Cryptographers can defeat efficient special-purpose algorithms by
avoiding "special" primes in their ciphers.
To demonstrate the robustness of ciphers against
general-purpose algorithms, they have been offering
well-publicized prizes
for factoring products of selected large primes.
This served two related purposes:

Offer all
innovators in the domain an alternative to dishonest rewards.

Reassure all clients about the enduring security of the technology.

RSA Laboratories decided to discontinue such challenges in 2007,
stating that
"the industry has a considerably more advanced understanding of the cryptanalytic strength
of common symmetric-key and public-key algorithms".
Well, "more advanced" than what ?
The grammatical flaw reveals the lack of a basis for such confidence.
(My being alive does not prove my immortality, does it?)
I find it somewhat
disturbing that an entire industry based on unproven conjectures
has ceased to monitor the margin which exists between the toughest codes
that can be currently broken and the basic encryption schemes they actually sell...

(2005-05-01) Special Cases
Special numbers often have special (preliminary) factorizations.

If you are into number theory, rather than cryptography, the numbers you'd like to factor
often have special forms and, possibly, special factorizations.
You may want to investigate this before running any factorization algorithm.

For example, the following
Aurifeuillian (preliminary) factorization
often pops up when x is a power of 2:

In either case, the first factor has a similar factorization
when y = 2z^{ 3} etc.

Thus, we have two preliminary factors of 2^{n}+1 when
n is congruent to 2 modulo 4,
we've 6 of them when n is congruent to 6 modulo 12, and so forth...

(2005-04-28) Factoring by trial division:
What you've been taught...
The smallest prime divisor of a composite number n
can't exceed Ön.

The best way to find small prime factors is simply to try them all out...
However, unless a prior sequence of small primes is available,
it's simpler to use a sequence of trial divisors (starting with 2) which includes
all primes without ruling out composite numbers.
For example, we may try 2 and all odd numbers. We may also
try 2, 3, 5 and integers not multiple of these, namely
(for k = 0,1,2,3...):

In such an increasing sequence of trial divisors,
the first number (p) which is found to divide n is necessarily prime
(since any divisor of p would have been found earlier as a divisor of n).
Once such a divisor is found, the search may be resumed
for the lesser number n/p, considering only divisors at least equal to p.

We abort the process when the trial divisor exceeds the square root of
whatever number remains to be factored,
since that number is then proven to be prime.

If you fail to find small factors, it's probably a good idea to
try a primality test
before going any further (preferably using some other method of factorization).
If the number n is prime [beyond a reasonable doubt] the search for its smallest prime
factor may end with n itself, well before the trial divisor
reaches Ön.

With this added primality test, trial division
finds all factors of a
composite number in a time roughly proportional to its
second-largest prime factor.

The running time of the r
method (discussed below)
is roughly proportional to the square root of that.
In the factorization of large numbers, trial division is only useful as a
first step whose purpose is to remove the smallest
prime factors.

(2006-05-25) Allowed and Disallowed Prime Factors
Some forms of numbers only allow factors obeying certain congruences.

In some special cases, the above trial divisions can be
restricted to only certain allowed trial divisors.
Some examples are important theoretically or historically.

The odd primes which divide n^{2}-2
must be congruent to +1 or -1 modulo 8.
Besides 2 and 3, prime factors of n^{2}-3
are congruent to +1 or -1 modulo 12.
Similar statements can be summarized in a tabular form^{ }:

Number

Possible Prime Factors

n^{2} + 1

2, 4k+1

n^{2} - 2

2, 8k+1, 8k+7

n^{2} + 2

2, 8k+1, 8k+3

n^{2} - 3

2, 3, 12k+1, 12k+11

n^{2} + 3

2, 3, 12k+1, 12k+7

n^{2} - 5

2, 5, 10k+1, 10k+9

n^{2} + 5

2, 5, 20k+1, 20k+3,
20k+7, 20k+9

n^{2} - 6

2, 3, 24k+1,
24k+5, 24k+19, 24k+23

n^{2} + 6

2, 3, 24k+1,
24k+5, 24k+7, 24k+11

Those results are direct consequences of the
law of quadratic reciprocity.
Historically, they were discovered before the general law
was proved.

Consider a prime factor p of a Mersenne number
2^{q}-1, for some odd prime q.
In "modulo p" arithmetic, 2 raised to the power of q is unity,
so the order
of 2 divides the prime q and is thus equal to it.
By Fermat's Little Theorem,
the order of 2 must divide (p-1). In other words,
q must divides p-1. So, p is 2kq+1.

Furthermore, since 2^{(p-1)/2} is thus congruent to
1 modulo p, we know
that 2 is a quadratic residue modulo p.
So, p divides a number which is 2 units below a square and must therefore
be congruent to +1 or -1 modulo 8 (as stated above).

For example,
with q = 31, a prime factor is necessarily congruent to either
1 or 63 modulo 248.
This leaves only 373 factors to try (below the square root of 2147483647)
to show that this 31^{st }
Mersenne number is prime.
Some time before 1772, that's exactly how Leonhard Euler
proved the primality of 2147483647. That 10-digit integer would remain the
largest known prime for the better part of a century...

(2005-04-30) Ultimately Periodic Sequences
Picture their indices on a r shape (Greek letter "rho").

A sequence
U_{0},
U_{1},
U_{2} ...
is called ultimately periodic
with preperiod m
and period l when
U_{n+l} = U_{n} for any
n at least equal to m,
but not for n = m-1
(if m ¹ 0).

If each term of a sequence is a function of the previous one, and if only finitely
many values are allowed, then the sequence is necessarily
ultimately periodic.
(HINT: Since only finitely many values are possible,
one will eventually be met again.
This marks the beginning of a regular cycle in a recursive sequence.)

Recursively-defined sequences with finitely many values :

Such a sequence (U) is defined by its initial value U_{0}
and its recursion relation
U_{n+1} = f (U_{n })
over a finite set of p allowed values, like {0,1,2,3 ... p-1}.

The recursive definition of U endows it with a nice property:

{ U_{n} = U_{n+q} }
Û
{ n ³ m and
l | q }

This may be used [with n = q] to show that there's a unique
index n between m (included)
and m+l (excluded)
such that U_{n} = U_{2n}.
(Note that l divides n.)

n := 0 ; x := U_{0} ; y := x repeat
n := n+1 ; x := f (x)
; y := f ( f (y))
until x = y

The above computation of n requires just 3n calls
to the function f.
No more than 2 or 3 values of U are ever stored in the process...

That method is known as Floyd's cycle-finding algorithm.
It was credited by D.E. Knuth
(in 1969) to
Robert W. Floyd
(1936-2001).
It's also called the "tortoise and hare" algorithm: The tortoise
is the slow-moving variable (x) and the hare is the fast-moving variable (y).

I can't resist making a few incidental remarks
(inspired by D.E. Knuth):

A fairly useless expression for
the index n thus computed is:

n =
l max ( 1,
é^{ m}/_{l }ù )

Once n is known, we observe that:

m is the smallest i such that
U_{n+i} = U_{i}
l is the smallest i such that
U_{n+i} = U_{n}

We need not check both of these conditions beyond the point where the
first one is met (which gives directly the preperiod m):
If the second one has already been met, we're done with
the period (l) as well.
Otherwise, we have m < l and must have
l = n
(since l divides
n < m+l < 2l).
All told, we may obtain
m and l by calling
the function f only
2m additional times,
using the following pseudocode
(the above has "initialized" n and
y = U_{n }).

i := 0 ; x := U_{0} ; z := y ;
l := 0 while x ¹ z
i := i+1 ; x := f (x)
; z := f (z)
if z = y and
l = 0
then l := i
wend m := i
if l = 0 then l := n

Application to Factorization Involving a
Polynomialf :

Modulo p, the value of a polynomial f
is congruent to its value modulo pq.

So, if p is a factor of a large integer N, we see that
terms of like indices in the following sequences are congruent
modulo p,
provided U_{0} and V_{0} are equal:

U_{n+1}

=

f
(_{ }U_{n }) [ mod p ]

V_{n+1}

=

f (_{ }V_{n })
[ mod N ]

Thus, if we have U_{i} = U_{j} modulo p,
then V_{i}- V_{j}
is divisible by p too, and we may retrieve p,
or some multiple D of p which divides N, via:

D = gcd ( N , V_{i}- V_{j} )
[ with i ¹ j ]

To turn this into a true factorization technique, we assume no prior knowledge of p
(and/or U). Instead, we just rely on the above result D.
When D = 1, the pair of indices (i,j) is not "magic" and we keep
searching for other possible pairs, following some predetermined scheme
(the above would call for j = 2i for successive values of i,
but there are other ways).
In the rare case where D = N,
we must abort a doomed search.
Otherwise, D is a nontrivial factor of N.

If D = N, a fresh start with a new polynomial is called for.
This is also recommended if D isn't prime and needs to be factored further...
On the other hand, if
N' = N/D isn't prime
it's a very good idea to search for the factors of N'
by resuming the same sequence, modulo N'.

If we look for indices (i,j) in the form i = n and j = 2n,
then the above shows n to be less than m+l,
which is itself usually on the order of
Öp.
We may thus expect factors proportional to the square of the time spent
looking for them.

Although the algorithm discussed next uses a different scheme,
it's enlightening to analyze this simple one, using the very popular polynomial
f (x) = x^{ 2 }+1 :

If a prime number p first appears on the n^{th} row of the
above table, it's called a
primitive factor of the corresponding
term in the tabulated sequence and p would be found to divide
any large integer in n steps of the related process.
However, p could possibly
(albeit rarely) remain buried
within a composite divisor whose prime factors are all
primitive factors of the same term...

The last partial factorization tabulated (n = 9)
is that of a 46377-digit number.
For n = 20, we're dealing with a 194516604724-digit integer,
whose divisors include the sixth power of 677 and the fourth power of 1265129.
For n = 100, we're facing a
really big number, divisible by the twelfth power
of 1265129...
Fortunately, we only use this sequence modulo
the integer to be factored.

The prime p = 1096967 doesn't show up until n = 4032.
All lesser primes appear for n less than or equal
to 3680 (which is needed for p = 967129).

Primitive Factors of a Term in a Sequence :

Long before the notion became relevant to the factorization algorithms
discussed here,
number theorists had defined a primitive factor of a term
in a sequence as a prime divisor of that term which does not
divide earlier terms in the sequence.
(A prime number is a primitive factor of at most
one term in a sequence.)

Such things are worth studying for some sequences with special
divisibility properties (see an example
elsewhere on this site and
also the use of Zsigmondy's Theorem to
prove Wedderburn's Theorem).

The locution "primitive factor" must be taken as a whole; it's peculiar
to this context. It should not be confused
with a primitive root or a primitive element (which
is an element that generates a cyclic group).

(2005-04-30) Pollard's r (rho)
Factoring Method
The name comes from the shape of the letter r,
as discussed above.

In a given amount of time, Pollard's r method may find
factors with about twice as many digits as those obtained by
trial division.
The previous article introduces the main idea.
What follows is just a slightly refined procedure...

Using the above notations,
our previous approach was to discover (at the cost
of 3n calls
to a polynomial function f )
the least index n such that:

U_{n} = U_{2n}
which happens when
n ³ m and
l | n

Instead, we may look for a 2-power k and a lesser q
satisfying the condition:

U_{k} = U_{k+q} with
k = 2^{i } ³ q > 0
which happens when
k ³ m and
l | q

This can be done with only k+q calls to the function
f (we just store the value corresponding to
the latest index which was a 2-power and compare the running
value to that).
The least satisfactory value of k is less than 2n
while the least satisfactory value of q is no more than n
(since n rounded up to a power of two is a satisfactory value of k
with q = n).
Therefore, k+q is always less than 3n, which is to say that
every factor
discovered by the previous approach is discovered by the newer one with
fewer calls to the function f.

Let's just give a terse
UBASIC implementation of this newer version.
For the sake of simplicity and educational value,
this does not check the primality of the divisors printed
(such composite divisors can't be factored by the same procedure,
but might be broken down with a different polynomial in the inner loop).

input "Integer to be factored";N
U=1 : REM U is the value at index K (K is a power of 2)
K=1
V=U : REM V is the value at index K+Q (the "running" value)
loop
for Q=1 to K
V=(1+V^2)@N : REM x@y stands for "x modulo y"
D=gcd(V-U,N) : REM This bottleneck can be bypassed !
if D>1 then print D : N=N\D : if N=1 then end
next Q
U=V
K=2*K
endloop

That's almost as simple as trial division...
The composite numbers which can't be factored this way are those
whose factors all make their
first
appearance in the following table on the same row
(e.g., 217 = 7 . 31).
An ordered list of such "exceptions" is given below.

W_{0} = 1 and
W_{n+1} =
W_{n}^{2 }+1

k+q

Factorization of
W_{k+q }- W_{k}
( k = 2^{i } ³ q > 0 )

Here are the composite numbers for which the above version
of the r-method
won't find any factor:
4, 8, 25, 125, 169, 217, 289, 485, 703, 817, 1027, 1219, 1241, 1469, 1591, 1681, 1943,
2425, 2461, 2983, 3587, 3683, 3749, 4913, 5371, 5497, 5671, 5809, 5947, 6751, 6757,
8149, 8509, 8639, 8927, 9167, 11581, 12125, 12533, 12667, 12997, 13261 ...

Since the r-method is normally used
only
after small primes have been removed by trial division,
such "exceptions" are rare enough in practice.

(2005-04-30)
Pollard's P-1 Method

This approach was devised by John M. Pollard in 1974.
It can find quickly the prime factors p for which p-1
is smooth (i.e., composed of small factors).
It's based on the following remark:

If b is divisible by p-1, then
a^{b}-1 is divisible by a^{p-1}-1,
which is itself divisible by p, if p is prime
(by Fermat's little theorem).

Without looking for the utmost efficiency at first,
we may thus factor a large integer N
using fast exponentiation modulo N,
in the following way:
Consider an arbitrary integer a. Either we're in luck
and gcd(a-1,N) is
a nontrivial factor of N, or else we raise a
to some power and try again with the result...
The above remark tells us that we'll find a nontrivial factor this way,
as soon as the product of the successive exponents happens to be
a multiple of p-1 for some (unknown) prime factor p of N.
(We may succeed earlier because our starting point was already a relevant
power modulo N.)

Consider some arbitrary integer a between 2 and N-1.
Note that gcd(a-1,N)
Raise it successively to the power of 2,3,4,5,6,7...
After m steps, the .../...

(2005-04-30) Williams' P+1 Method

This method was proposed by Hugh C. Williams in 1982.

This approach was first proposed by
H.W. Lenstra, Jr. in 1985.

(2005-04-30) The Many Successful Variants of Dixon's Method
Combining small square residues modulo n (or kn) to find a factor of n.

This method is intended for the factorization of a fairly large number n,
preferably once smaller factors have been removed, using preliminary methods
which are more efficient at weeding out small or
medium-sized divisors...

As far as we know, no one has yet devised a more efficient approach for large factors.
Frustrating as this may be, this is the best we have. For now.

The basic idea is to find many numbers whose squares are congruent to
relatively small integers modulo n
(or modulo kn, for some optimized multipler k, if the
continued fraction approach is used).
Hopefully, such "small" numbers can be fully factored using only a so-called
factor base...

The factor base is a predetermined set of small primes
[together with the negative unit (-1) whenever negative residues are expected].
In some versions, one larger prime is allowed in each factorization,
in the hope that it will pop up again in the same capacity when factoring another
residue (when this happens, the two factorizations can be effectively
combined into a single factorization over the factor base only).

A nontrivial solution of x^{ 2}
º y^{ 2}
can be constructed using a large enough set of those;
that yields a factor of n as either gcd(n,x-y) or gcd(n,x+y).