On April I published an entry explaining how to calculate PI using the most basic Monte Carlo algorithm and now, using the Middle Earth cluster I can do it parallelized.

Parallelizing Monte Carlo is a very simple task because of it’s random and independent nature and this basic monte carlo is even easier. I can just run exactly the same routine as before on all nodes and at the end sum everything and divide by the number of nodes. To achieve that, I just changed the main.cc file to use MPI, quite simple indeed.

The old main.cc just called the function and returned the value:

    area_disk(pi, max_iter, delta);
    cout << "PI = " << pi << endl;

But now, the new version should know whether it’s the main node or a computing node. After, all computing nodes should calculate the area and the main node should gather and sum.

    /* Nodes, compute and return */
    if (myrank) {
        area_disk(tmp, max_iter, delta);
        MPI_Send( &tmp, 1, MPI_DOUBLE, 0, 17, MPI_COMM_WORLD );

    /* Main, receive all areas and sum */
    } else {
        for (int i=1; i < size; i++) {
            MPI_Recv( &tmp, 1, MPI_DOUBLE, i, 17, MPI_COMM_WORLD, &status );
            pi += tmp;
        }
        pi /= (size-1);
        cout << "PI = " << pi << endl;
    }

On MPI, myrank says your node number and size shows you the total number of nodes. On the most basic MPI program, if it’s zero you’re the main node, otherwise you’re a computing node.

All computing nodes calculate the area and MPI_Send the result to the main node. The main node waits for all responses on the main loop and sum the temporary result tmp to pi and at the end divide by the number of computing nodes.

Benefits:

This monte carlo is extremely basic and very easy to parallelize. As this copy is run over N computing nodes and there’s no dependency between them you should achieve an increase in speed of over N times the non-parallel one.

Unfortunately, this algorithm is so slow and inaccurate that even running on 9 computing nodes (ie. 9 times faster) it’s still wrong on the third digit.

The slowness is due to the algorithm’s stupidity but the inaccuracy is due to the lack of a really good standard random number generators. Almost all machines yielded results far from the 5-digit answer on M_PI macro of C standard library and the result was also far from it. Also, there are so many other ways of calculating PI that are so much faster that it wouldn’t be a good approach ever!

The good thing is that it was just to show a distributes monte carlo algorithm working… 😉