home | index | units | counting | geometry | algebra | trigonometry & functions | calculus
analysis | sets & logic | number theory | recreational | misc | nomenclature & history | physics

Final Answers
© 2000-2014   Gérard P. Michon, Ph.D.

Prime Factorizations

 Carl Friedrich Gauss 
 (1777-1855)
The dignity of science seems to demand that every aid to [the resolution of composite numbers into their prime factors] be zealously cultivated.
Carl Friedrich Gauss (1777-1855)  Disquisitiones Arithmeticae  #329
 border
 border  border
 border

Related articles on this site:

Related Links (Outside this Site)

Home page  of  John M. Pollard  (and his famous papers on  Number Theory).
Factoring:  State of the Art and Predictions  (1995).
Factoring Theory  by  Paul Herman.
Prime Factorization Algorithms  by  Eric W. Weisstein  (MathWorld).
(Former) RSA Factoring Challenge from RSA Laboratories  (cf. Wikipedia)

Calculators :

Factorization using the Elliptic Curve Method  by Dario Alejandro Alpern.

Wikipedia :   Quadratic Sieve (QS)   |   SNFS   |   Number Field Sieve (NFS)

DMOZ: Factoring

Books :

Edouard Lucas and primality testing  by Hugh Cowie Williams  (1998).
 
border
border

Factoring into Primes


(2009-01-17)   Jevons' Number :   8616460799
A once formidable factorization challenge is now a trivial task.

William Stanley Jevons (1835-1882) once wrote:

"Can the reader say what two numbers multiplied together will produce the number 8616460799 ?  I think it unlikely that anyone but myself will ever know."   (Principles of Science, p. 123)

The number  8616460799  became known as  Jevons' Number.  If was first factored in 1903 by Derrick Norman Lehmer (1867-1938).  Yet,  Jevons' Number  can now be factored in a few seconds with a  handheld calculator  (this would take a tiny fraction of a second on any desktop computer).

8616460799   =   89681 . 96079

A once formidable challenge has become trivial.  This may serve as a warning against our collective foolishness when we rely on the apparent difficulty of the factorization problem to safeguard confidential transactions in the computer age.


(2006-05-15)   A fuzzy classification of factoring algorithms...
Distinguishing between special-purpose and general-purpose methods.

In recent years, cryptographic applications have become popular which depend critically on the difficulty of retrieving two large primes from their product.  This has led to the distinction between two broad classes of factorization methods:

  • special-purpose  factorization algorithm (usually) discovers promptly only factors with a known property  (e.g., p is small, or p-1 is smooth). 
  • Other factorization algorithms are called  general-purpose.

We could  try  to place this distinction on a firm mathematical footing by stating that the worst running time of a special-purpose algorithm depends on the size of an unknown prime factor  p  (or on some other characteristic of it, like the so-called  smoothness  of  p-1  or  p+1).  By contrast, the worst time for a general-purpose algorithm would depend only on the size of the number to factor...  This is the right idea but it won't do theoretically "as is", because the "least special" prime factors of numbers of bounded size may be used to establish a generous overall upper bound on the running time of any dubiously defined "special-purpose" algorithm  (provided it's "general" enough to discover any factor, albeit slowly).  Nevertheless, the distinction remains of great  practical  importance, to evaluate what state-of-the-art methods can and cannot do.

Cryptographers can defeat efficient  special-purpose  algorithms by avoiding "special" primes in their ciphers.  To demonstrate the robustness of ciphers against  general-purpose  algorithms, they have been offering well-publicized prizes for factoring products of selected large primes.  This served two related purposes:

  • Offer  all  innovators in the domain an alternative to dishonest rewards.
  • Reassure  all  clients about the enduring security of the technology.

RSA Laboratories  decided to discontinue such challenges in 2007, stating that  "the industry has a considerably more advanced understanding of the cryptanalytic strength of common symmetric-key and public-key algorithms".  Well, "more advanced" than  what ?  The grammatical flaw reveals the lack of a basis for such confidence.  (My being alive does not prove my immortality, does it?)  I find it somewhat disturbing that an entire industry based on  unproven conjectures  has ceased to monitor the margin which exists between the toughest codes that can be currently broken and the basic encryption schemes they actually sell...


(2005-05-01)   Special Cases
Special numbers often have special (preliminary) factorizations.

If you are into number theory, rather than cryptography, the numbers you'd like to factor often have special forms and, possibly, special factorizations.  You may want to investigate this before running any factorization algorithm.

For example, the following Aurifeuillian (preliminary) factorization often pops up when  x  is a power of 2:

4 x 4 + 1   =   ( 2 x 2 - 2x + 1)  ( 2 x 2 + 2x + 1)

Don't rush either factor into a factoring algorithm yet...  If  x = 2y 3, then:

( 2 x 2 + 2x + 1)   =   ( 2 y 2 - 2y + 1)  ( 4 y 4 + 4 y 3 + 2 y 2 + 2y + 1)
( 2 x 2 - 2x + 1)   =   ( 2 y 2 + 2y + 1)  ( 4 y 4 - 4 y 3 + 2 y 2 - 2y + 1)

In either case, the first factor has a similar factorization when  y = 2z 3  etc.

Thus, we have two preliminary factors of  2n+1  when  n  is congruent to  2  modulo 4,  we've 6 of them when  n  is congruent to 6 modulo 12, and so forth...


(2005-04-28)   Factoring by trial division:  What you've been taught...
The smallest prime divisor of a composite number n can't exceed Ön.

The best way to find small prime factors is simply to try them all out...  However, unless a prior sequence of small primes is available, it's simpler to use a sequence of trial divisors (starting with 2) which includes all primes without ruling out composite numbers.  For example, we may try 2 and all odd numbers.  We may also try  2, 3, 5  and integers not multiple of these, namely  (for k = 0,1,2,3...):

30k+7, 30k+11, 30k+13, 30k+17, 30k+19, 30k+23, 30k+29, 30k+31

In such an  increasing  sequence of trial divisors, the first number (p) which is found to divide n is  necessarily  prime  (since any divisor of p would have been found earlier as a divisor of n).  Once such a divisor is found, the search may be resumed for the lesser number  n/p, considering only divisors at least equal to  p.

We abort the process when the trial divisor exceeds the square root of whatever number remains to be factored, since that number is then  proven  to be prime.

If you fail to find small factors, it's probably a good idea to try a  primality test  before going any further (preferably using some other method of factorization).  If the number n is prime [beyond a reasonable doubt] the search for its smallest prime factor may end with n itself, well before the trial divisor reaches Ön.


With this added primality test,  trial division  finds  all  factors of a composite number in a time roughly proportional to its  second-largest  prime factor.

The running time of the  r  method  (discussed below)  is roughly proportional to the square root of that.  In the factorization of large numbers, trial division is only useful as a  first step  whose purpose is to remove the smallest prime factors.


(2006-05-25)   Allowed and Disallowed Prime Factors
Some forms of numbers only allow factors obeying certain congruences.

In some special cases, the above trial divisions can be restricted to only certain  allowed  trial divisors.  Some examples are important theoretically or historically.

The odd primes which divide   n2-2   must be congruent to +1 or -1 modulo 8.  Besides 2 and 3, prime factors of   n2-3   are congruent to +1 or -1 modulo 12.  Similar statements can be summarized in a tabular form :

 Number Possible Prime Factors
n2 + 12,   4k+1
n2 - 22,   8k+1,   8k+7
n2 + 22,   8k+1,   8k+3
n2 - 32,   3,   12k+1,   12k+11
n2 + 32,   3,   12k+1,   12k+7
n2 - 52,   5,   10k+1,   10k+9
n2 + 52,   5,   20k+1,   20k+3,   20k+7,   20k+9
n2 - 6  2,   3,   24k+1,   24k+5,   24k+19,   24k+23  
n2 + 6  2,   3,   24k+1,   24k+5,   24k+7,   24k+11  

Those results are direct consequences of the  law of quadratic reciprocity.  Historically, they were discovered  before  the general law was proved.

The prime factors of  Mersenne numbers :

Consider a prime factor  p  of a Mersenne number  2q-1,  for some odd prime  q.  In "modulo p" arithmetic,  2  raised to the power of q is unity, so the order of  2  divides the prime q and is thus equal to it.  By Fermat's Little Theorem, the order of  2  must divide  (p-1).  In other words,  q  must divides p-1.  So, p is 2kq+1.

Furthermore, since  2(p-1)/2  is thus congruent to  1  modulo p, we know that 2 is a  quadratic residue  modulo p.  So, p divides a number which is 2 units below a square and must therefore be congruent to +1 or -1 modulo 8  (as stated above).

So, by the Chinese remainder theorem, p can only have 2 values modulo 8q.

For example, with  q = 31, a prime factor is necessarily congruent to either  1  or  63  modulo  248.  This leaves only 373 factors to try (below the square root of 2147483647)  Leonhard Euler 
 (1707-1783) to show that this  31st  Mersenne number is prime.  Some time before 1772, that's exactly how Leonhard Euler proved the  primality  of  2147483647.  That 10-digit integer would remain the largest known prime for the better part of a century...

 The Rho Shape
(2005-04-30)   Ultimately Periodic Sequences
Picture their indices on a r shape  (Greek letter "rho").

A sequence  U0, U1, U2 ... is called  ultimately periodic  with  preperiod  m and period l when  Un+l = Un  for any  n  at least equal to m, but not for  n = m-1  (if m ¹ 0).

If each term of a sequence is a function of the previous one, and if only finitely many values are allowed, then the sequence is necessarily  ultimately periodic.  (HINT:  Since only finitely many values are possible, one will eventually be met again.  This marks the beginning of a regular cycle in a recursive sequence.)

Recursively-defined sequences with finitely many values :

Such a sequence (U) is defined by its initial value   U0  and its recursion relation  Un+1 =  f (U)   over a finite set of p allowed values, like {0,1,2,3 ... p-1}.

The recursive definition of  U  endows it with a nice property:

{ Un = Un+q }   Û   { n ³ m   and   l | q }

This may be used [with n = q] to show that there's a  unique  index n between m (included) and m+l (excluded) such that Un = U2n.  (Note that l divides n.)

n := 0   ;   x := U0   ;   y := x
repeat     n := n+1   ;   x := f (x)   ;   y := ff (y))     until     x = y

The above computation of  n  requires just  3n  calls to the function  f.  No more than 2 or 3 values of  U  are ever stored in the process...

That method is known as Floyd's cycle-finding algorithm.  It was credited by D.E. Knuth (in 1969) to Robert W. Floyd (1936-2001).  It's also called the "tortoise and hare" algorithm:  The tortoise is the slow-moving variable (x) and the hare is the fast-moving variable (y).
I can't resist making a few incidental remarks  (inspired by D.E. Knuth):

A fairly  useless  expression for the index  n  thus computed is:

n   =   l  max ( 1, é m/ l ù )

Once  n  is known, we observe that:

m  is the smallest i such that  Un+i = Ui
l   is the smallest i such that  Un+i = Un

We need not check both of these conditions beyond the point where the first one is met (which gives directly the preperiod m):  If the second one has already been met, we're done with the period (l) as well.  Otherwise, we have m < l  and must have  l = n  (since l divides  n < m+l < 2l).  All told, we may obtain  m  and  l  by calling the function  f  only  2m  additional times,  using the following pseudocode  (the above has "initialized"  n  and  y = U).

i := 0   ;   x := U0   ;   z := y   ;   l := 0
while   x ¹ z
    i := i+1   ;   x := f (x)   ;   z := f (z)
    if   z = y   and   l = 0   then   l := i
wend
m := i
if   l = 0   then   l := n

Application to Factorization Involving a  Polynomial  f :

Modulo p, the value of a polynomial  f  is congruent to its value modulo pq...

If p is a factor of a large integer N, we observe that terms of like indices in the following sequences are congruent modulo p, provided U0 and V0 are equal:

Un+1  =   f ( Un )   [ mod p ]
Vn+1= f ( Vn )   [ mod N ]

Therefore, if we have  Ui = Uj  modulo p,  then  Vi- Vj  is divisible by  p  too, and we may retrieve  p,  or some multiple  D  of  p  which divides  N,  via:

D   =   gcd ( N , Vi- Vj )         [ with  i ¹ j ]

To turn this into a true factorization technique, we assume no prior knowledge of p  (and/or U).  Instead, we just rely on the above result D.  When D = 1, the pair of indices (i,j) is not "magic" and we keep searching for other possible pairs, following some predetermined scheme  (the above would call for  j = 2i  for successive values of i, but there are other ways).  In the rare case where  D = N, we must abort a doomed search.  Otherwise, D is a nontrivial factor of N.

If D = N, a fresh start with a new polynomial is called for.  This is also recommended if D isn't prime and needs to be factored further...  On the other hand, if  N' = N/D  isn't prime it's a very good idea to search for the factors of  N'  by resuming the  same  sequence, modulo N'.

If we look for indices (i,j) in the form  i = n and j = 2n,  then the above shows n to be less than m+l, which is itself   usually on the order of Öp.  We may thus expect factors proportional to the  square of the time spent looking for them.

Although the algorithm discussed next uses a different scheme, it's enlightening to analyze this simple one, using the very popular polynomial  f (x) =  x 2 +1 :

W0 = 1   and    Wn+1 =  Wn2 +1
 n Factorization of   W2n - Wn
13
22 5 . 3 . 7
33 3 . 5 3 . 7 . 31 . 97 . 2957
42 8 . 3 . 7 . 11 . 13 3 . 19 . 31 . 37 . 79 . 113 . 163 . 9103 . 16369 . 37633 . 3898927 . 21617899
53 . 7 . 19 . 31 . 37 . 83 . 151 . 283 . 563 . 677 3 . 821 . 13249 . 19687 . 81707 . 152777 . 459007 . 5729593 . 1571825191 . 30009484129 . 45341337176538211 . 1429595747719217881879 .
2358937632593897968703206264666894551643096860994612426386412205627
62 11 . 3 3 . 5 4 . 7 2 . 11 . 17 3 . 19 . 23 . 31 . 37 . 43 . 53 . 73 . 97 . 107 . 163 . 211 . 239 . 313 . 2039 . 2957 . 8429 . 9949 . 10861 . 13339 . 38747 . 45833 3 . 459007 . 791801 . 2822069 . 15607909 . 17008487 . 250212983 . 298814209 . 327430417 . 384223559   ...   etc.
73 . 7 . 19 . 31 . 37 . 41 3 . 43 . 131 . 157 . 313 . 317 . 1277 3 . 2789 . 41179 . 459007 . 4012193 3 . 15607909 . 34658977 . 58348273 . 83570923 . 1497946091 . 1961655181   ...   etc.
82 14 . 3 . 7 . 11 . 13 4 . 19 . 23 . 29 . 31 . 37 . 43 . 53 . 79 . 107 . 113 . 157 . 163 . 173 . 197 . 239 . 313 . 1451 . 1877 . 2153 . 2887 . 3391 . 3511 . 3539 . 3919 . 5669 . 7121 3 . 8737 . 9103 . 9949 . 13759 . 16369 . 37633 . 38747 . 459007 . 621113 3 . 677791 . 1188017 . 2331751 . 3898927 . 7418347 . 14350313 . 15607909 . 21617899 . 27975743 . 45004633 . 1847659213 . 1961655181 . 2468240317   ...   etc.
93 4 . 5 5 . 7 . 19 2 . 31 . 37 . 43 . 59 . 97 . 101 . 157 . 271 . 313 . 509 3 . 743 . 983 . 1033 . 1187 . 2039 . 2957 . 3067 . 3511 . 8429 . 13441 . 27191 . 35449 . 56813 3 . 97381 . 459007 . 7248457 . 15607909 . 17008487 . 21298769 3 . 27308977 . 45004633 . 250212983 . 298814209 . 299155907 . 384223559 . 1735227853 . 1847659213 . 1961655181 . 2321360039   ...   etc.

If a prime number  p  first appears on the nth row of the above table, it's called a  primitive factor  of the corresponding term in the tabulated sequence and  p  would be found to divide any large integer in  n  steps of the related process.  However,  p  could possibly  (albeit rarely)  remain  buried  within a composite divisor whose prime factors are all  primitive factors  of the same term...

The last  partial  factorization tabulated  (n = 9)  is that of a 46377-digit number.  For n = 20, we're dealing with a 194516604724-digit integer, whose divisors include the sixth power of 677 and the fourth power of 1265129.  For n = 100, we're facing a  really  big number, divisible by the twelfth power of 1265129...  Fortunately, we only use this sequence modulo the integer to be factored.

The prime  p = 1096967  doesn't show up until  n = 4032.  All  lesser primes appear for  n  less than or equal to  3680  (which is needed for  p = 967129).

Primitive Factors of a Term in a Sequence :

Long before the notion became relevant to the factorization algorithms discussed here, number theorists had defined a  primitive factor  of a term in a sequence as a  prime  divisor of that term which does not divide earlier terms in the sequence.  (A prime number is a  primitive factor  of at most one term in a sequence.) 

Such things are worth studying for some sequences with special divisibility properties  (see an example elsewhere on this site and also the use of  Zsigmondy's Theorem  to prove Wedderburn's Theorem).
 
The locution "primitive factor" must be taken as a whole; it's peculiar to this context.  It should not be confused with a primitive root or a primitive element  (which is an element that generates a cyclic group).


(2005-04-30)   Pollard's r (rho) Factoring Method
The name comes from the shape of the letter r, as discussed above.

In a given amount of time, Pollard's  r  method may find factors with about  twice  as many digits as those obtained by trial division.  The previous article introduces the main idea.  What follows is just a slightly refined procedure...

Using the above notations, our previous approach was to discover  (at the cost of  3n  calls to a polynomial function  f )  the  least  index  n  such that:

Un  =  U2n
which happens when   n ³ m   and   l | n

Instead, we may look for a 2-power  k  and a lesser  q  satisfying the condition:

Uk  =  Uk+q     with   k = 2 ³ q > 0
which happens when   k ³ m   and   l | q

This can be done with only  k+q  calls to the function  f  (we just store the value corresponding to the latest index which was a 2-power and compare the running value to  that).  The least satisfactory value of  k  is less than  2n  while the least satisfactory value of  q  is no more than  n  (since  n  rounded up to a power of two is a satisfactory value of  k  with  q = n).  Therefore, k+q  is always less than  3n,  which is to say that  every  factor discovered by the previous approach is discovered by the newer one with  fewer  calls to the function  f.

Let's just give a terse  UBASIC  implementation of this newer version.  For the sake of simplicity and educational value, this does not check the primality of the divisors printed  (such composite divisors can't be factored by the same procedure, but might be broken down with a different polynomial in the inner loop).

input "Integer to be factored";N

U=1 : REM U is the value at index K   (K is a power of 2)
K=1
V=U : REM V is the value at index K+Q (the "running" value)
loop
   for Q=1 to K
      V=(1+V^2)@N : REM  x@y  stands for "x modulo y"
      D=gcd(V-U,N) : REM This bottleneck can be bypassed !
      if D>1 then print D : N=N\D : if N=1 then end
   next Q
   U=V
   K=2*K
endloop

That's almost as simple as trial division...  The composite numbers which can't be factored by this procedure are those whose factors  all  make their  first  appearance in the following table on the  same  row.  One of the first examples is  217 = 7 . 31.  An ordered list of such "exceptions" is given below.

W0 = 1   and    Wn+1 =  Wn2 +1
 k+q Factorization of   Wk+q - Wk       ( k = 2 ³ q > 0 )
33
42 3 . 3
53 . 7 . 31
62 6 . 3 . 7 . 11 . 31
73 3 . 5 3 . 7 . 31 . 97 . 2957
82 6 . 7 . 11 . 13 2 . 31 . 79 . 113 . 16369 . 3898927
93 . 7 . 19 . 31 . 37 . 43 . 157 . 313 . 459007 . 15607909 . 1961655181 . 143281730359
102 12 . 3 . 7 . 11 . 19 . 23 . 31 . 37 . 43 . 53 . 107 . 157 . 163 . 239 . 313 . 9949 . 13759 . 38747 . 459007 . 15607909 . 1961655181 . 143281730359 . 1603600833843161189
113 3 . 5 4 . 7 . 19 . 31 . 37 . 43 . 59 . 97 . 157 . 313 . 2039 . 2957 . 8429 . 97381 . 459007 . 15607909 . 17008487 . 250212983 . 298814209 . 384223559 . 1961655181 . 2321360039 . 143281730359 . 465953130701 . 6342197990153740801 . 13827015174505923592330541921025424182389
122 12 . 3 . 7 . 11 . 13 3 . 19 . 23 . 31 . 37 . 43 . 53 . 79 . 107 . 113 . 157 . 163 . 173 . 197 . 239 . 313 . 1877 . 9103 . 9949 . 13759 . 16369 . 37633 . 38747 . 459007 . 1188017 . 3898927 . 7418347 . 14350313 . 15607909 . 21617899 . 2468240317 . 1961655181 . 143281730359 . 213125722525297 . 4714802990227327 . 1603600833843161189 ... etc.

Here are the composite numbers for which the above version of the r-method won't find  any  factor:   4, 8, 25, 125, 169, 217, 289, 485, 703, 817, 1027, 1219, 1241, 1469, 1591, 1681, 1943, 2425, 2461, 2983, 3587, 3683, 3749, 4913, 5371, 5497, 5671, 5809, 5947, 6751, 6757, 8149, 8509, 8639, 8927, 9167, 11581, 12125, 12533, 12667, 12997, 13261 ...

However, since the r-method is normally used  only  after small primes have been removed by trial division, such "exceptions" are rarely encountered in practice.


(2005-04-30)   Pollard's P-1 Method

This approach was devised by John M. Pollard in 1974.  It can find quickly the prime factors  p  for which  p-1  is  smooth  (i.e., composed of   small  factors).  It's based on the following remark:

If  b  is divisible by  p-1,  then  ab-1  is divisible by  ap-1-1,  which is itself divisible by  p, if p is prime  (by Fermat's little theorem).

Without looking for the utmost efficiency at first, we may thus factor a large integer  N  using fast exponentiation modulo N,  in the following way:  Consider an arbitrary integer  a.  Either we're in luck and  gcd(a-1,N)  is a nontrivial factor of N, or else we raise  a  to some power and try again with the result...  The above remark tells us that we'll find a nontrivial factor this way, as soon as the product of the successive exponents happens to be a multiple of p-1 for some (unknown) prime factor p of N.  (We may succeed earlier because our starting point was already a relevant power modulo N.)

 Come back later, we're
 still working on this one...

Consider some arbitrary integer  a  between 2 and N-1.  Note that gcd(a-1,N) Raise it successively to the power of 2,3,4,5,6,7...  After m steps, the .../...

 Come back later, we're
 still working on this one...


(2005-04-30)   Williams' P+1 Method

This method was proposed by  Hugh C. Williams  in 1982.

 Come back later, we're
 still working on this one...


(2005-04-30)   Lenstra's Elliptic Curve Method   (ECM)

This approach was first proposed by  H.W. Lenstra, Jr.  in 1985.

 Come back later, we're
 still working on this one...


(2005-04-30)   The Many Successful Variants of Dixon's Method
Combining small square residues modulo n (or kn) to find a factor of n.

This method is intended for the factorization of a fairly large number n, preferably once smaller factors have been removed, using preliminary methods which are more efficient at weeding out small or medium-sized divisors...

As far as we know, no one has yet devised a more efficient approach for large factors.  Frustrating as this may be, this is the best we have.  For now.

The basic idea is to find many numbers whose squares are congruent to relatively small integers modulo n  (or modulo kn, for some optimized multipler k, if the continued fraction approach is used).  Hopefully, such "small" numbers can be fully factored using only a so-called  factor base...

The  factor base  is a predetermined set of small primes  [together with the negative unit (-1) whenever negative residues are expected].  In some versions, one larger prime is allowed in each factorization, in the hope that it will pop up again in the same capacity when factoring another residue  (when this happens, the two factorizations can be effectively combined into a single factorization over the  factor base  only).

A nontrivial solution of   x 2 º y 2   can be constructed using a large enough set of those; that yields a factor of n as either gcd(n,x-y) or gcd(n,x+y).

 Come back later, we're
 still working on this one...

Continued Fractions :

Quadratic Sieve :

Multiple Polynomial Quadratic Sieve (MPQS)

Special Number Field Sieve (SNFS)

[General] Number Field Sieve (NFS or GNFS)

 Come back later, we're
 still working on this one...

border
border
visits since May 12, 2005
 (c) Copyright 2000-2014, Gerard P. Michon, Ph.D.