8/26/13

Growing prime factorization plants

Inspired by a wonderful  Factorization Diagrams and Dancing Factors, I've come up with a little experiment in visualization of prime factorization for integers. 

A la "math is beautiful" and all that.

The idea is that prime factors of a number are drawn as "plants". Every prime factor contributes to a branching point in the "plant", the branches have "fruits" at the tips, and the total amount of fruits is thus equal to the number itself.

For example, here's how factorization of 20 (2x2x5) and 54 (2x3x3x3) would look like:


I've hacked together a little program in C++,  it's available on github. Tested on Ubuntu Linux and Windows 7 (there is a project file for Visual Studio 2010, but you'll need to download and compile freeglut yourself in order for it to build).

The code there is as inefficient as it can get (so please don't take it as an example of how to write proper OpenGL programs), but for the sake of experiment it worked just fine.

Going a bit more into depth, at the heart of the whole thing is, of course, the integer factorization itself.

To keep things simple, it was done here with a very basic Trial Division method:

template <typename T>
void prime_factors(T val, std::vector& factors, std::vector* primes_cache = NULL)
{
    std::vector _primes;
    std::vector& primes = primes_cache ? *primes_cache : _primes;

    factors.clear();

    T max_to_check = (T)sqrt((double)val) + 1;
    find_primes(max_to_check, primes);
    const int np = primes.size();

    //  do trial division on the primes one by one
    T cval = val;
    int pidx = 0;
    while (pidx < np)
    {
        const T cp = primes[pidx];
        if (cp > max_to_check) break;
        while (cval % cp == 0) 
        {
            factors.push_back(cp);
            cval /= cp;
        }
        pidx++;
    }

    if (cval > 1) factors.push_back(cval);
}


The list of primes (find_primes) is obtained with an Eratosthenes Sieve. With the latter I went a little bit overboard (alright... it just was a kind of fun), and added a few optimizations, even though it was not needed for the particular problem at hand:
  • the primes less than 1000 are precomputed
  • only odd numbers get sieved out
  • a bit array is used for the "sieved out" status
  • sieving is done in "segments", in order to use less memory and improve cache coherence
  • for the next prime cp all the numbers less than square of cp will not get checked, since they have been already sieved out by smaller primes before
So here's the function itself:

template <typename T>
void find_primes(T up_bound, std::vector& out_primes)
{
    if (out_primes.size() > 0 && out_primes.back() >= up_bound) 
    {
        //  already have the needed prime range
        return;
    }

    const T BAKED_PRIMES[] = {
        2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 
        101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 
        211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 
        307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 
        401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 
        503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 
        601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 
        709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 
        809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 
        907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997 };
    const int NUM_BAKED_PRIMES = sizeof(BAKED_PRIMES)/sizeof(BAKED_PRIMES[0]);

    //  search for the upper bound inside the baked range
    const T* ptop = std::lower_bound(BAKED_PRIMES, BAKED_PRIMES + NUM_BAKED_PRIMES, up_bound + 1);
        
    if (ptop < BAKED_PRIMES + NUM_BAKED_PRIMES)
    {
        //  got all the needed primes from the baked range
        out_primes = std::vector(BAKED_PRIMES, ptop);
        return;
    }

    if (out_primes.size() < NUM_BAKED_PRIMES)
    {
        //  take all the backed primes range before started sieving
        out_primes = std::vector(BAKED_PRIMES, BAKED_PRIMES + NUM_BAKED_PRIMES);
    }

    //  do the Sieve of Eratosthenes on the rest of the range, segment by segment
    const T SEGMENT_SIZE = 512;
    const T SIEVED_ARR_SIZE = (SEGMENT_SIZE + 7)/8;
    char sieved[SIEVED_ARR_SIZE]; // bit array of the "sieved" flags

    T last_sieved = out_primes.back();
    while (last_sieved < up_bound - 1)
    {
        const T num_sieved   = std::min(SEGMENT_SIZE, (up_bound - last_sieved)/2);
        const T first_sieved = last_sieved + 2;
        last_sieved = (num_sieved - 1)*2 + first_sieved;
    
        //  reset the "sieved" flags
        const T sieved_bytes = (num_sieved + 7)/8;
        std::fill(sieved, sieved + sieved_bytes, 0);

        //  sieve out the composites from the given range using the previously found primes 
        //  (only the odd numbers are considered)
        const T max_to_check = (T)sqrt((double)last_sieved) + 1;
        for (int i = 1, e = out_primes.size(); i < e; i++)
        {
            const T cp = out_primes[i];
            if (cp > max_to_check) break;
            const T fs   = std::max(first_sieved, cp*cp); // number to start sieving from
            const T fd   = ((fs + cp - 1)/cp)*cp;         // first dividend in the range
            const T fdo  = fd + cp*(1 - (fd&1));          // first odd dividend in the range
            const T di   = (fdo - first_sieved)/2;        // index of the first odd dividend
            for (T j = di; j < num_sieved; j += cp) 
            {
                //  set the corresponding "sieved" bit
                sieved[j/8] |= (1 << (j%8));
            }
        }

        //  collect all the primes (the numbers with the "sieved" bit not set)
        for (T i = 0; i < sieved_bytes; i++) 
        {
            if (sieved[i] == 0xFF) continue; // no primes in the whole byte
            for (int j = 0; j < 8; j++)
            {
                const T n = (i*8 + j);
                if (!(sieved[i]&(1 << j)) && n < num_sieved)
                {
                    out_primes.push_back(n*2 + first_sieved);
                }
            }
        }
    }
}
      
It's a good example of how a simple piece of logic can eventually turn into an incomprehensible abomination. With all the evolution, history, bizarre optimizations and personality factors.

Anyway, getting to the visualization part, here's how the numbers from 1 to 16 look like:

 

From the prime_factors function we get the list of numbers, which is already sorted from the smallest primes to the biggest ones (which naturally helps in our visualization, by avoiding too much branching early on).

Then, we are going recursively from "leaves" (that is, from the last number in the primes list towards the first one), starting with a simple branch and a "fruit" at its tip:


With every prime number P from the list we repeat the same operation: 
  • all the branches we've got so far are cloned P times
  • a new common root branch is added
  • the parameters for the P cloned branches' are computed, such that their bases are neatly arranged along the semi-circle at the top of the newly added root branch
  • each one from the P cloned branches is transformed by a composition of:
    • a non-uniform scale: the scale about X axis is such that the base can fit together with other bases on the root's top semi-circle; scale about Y axis is fixed
    • a shift upwards, so that the base lies on the root's top semi-circle
    • rotation with the corresponding angle, so that all the P cloned branches are laid out evenly
    • another shift upwards, to give space for the new root branch

It's worth noticing, that going leaves-to-root gives the tree building algorithm complexity of O(N*logN), while going in opposite direction could probably made with O(N), since we'd only need to transform every branch once. But that would only matter if we needed to optimize the algorithm, which at the moment we don't.

Because the scale that we apply on every step is non-uniform, the branches get "skewed", and this effect gets more profound towards the leaves. This may give the whole thing a rather bizarre look (which, let's say, is fine from the point of view of "aesthetics"):


In case when the prime number is too big, the "branches" are represented just with "fruits", and they are arranged into a "bud" (or "inflorescence"), which is using Fermat's spiral layout. Here's how 347 (a prime number) looks like:


The buds will be also drawn at the higher levels, albeit the non-uniform scale makes them look somewhat weird (and it's not obvious how exactly to scale the fruit radii, so I am just doing some kind of an average scale). That's what 1908 (2x2x3x3x53) turns into:


The fruits are colored randomly, using a "bipolar" green-purple color map... just because it looked funky this way (and also because I think that Kenneth Moreland's work is cool).

In the test program, it can be toggled on/off via 'c' button:


So that's about it, we've got our "factorization plants".

Turns out, prime factorization is a fascinating problem, which is important in the computing world, being, for example, at the core of widely used cryptography algorithms. It's a complex problem as well.

Getting a little bit of a visual insight into it feels like a little tiny step into understanding the problem. Even though in reality it's just some amusing pictures.

But hey, it looks nice, so who cares.

1 comment:

onkel dagobert said...

Well done.
I could stare at those trees for hours.
They'd really make a great screensaver.