MPI stands for Message Passing Interface. It is a standard for running programs across the nodes of a cluster, with a message-passing library handling the communication among them. It's very powerful and is now the de facto standard for parallel programs.

Normally I’d choose LAM MPI, as I always have in the past, but this time I wanted to test MPICH, another well-known MPI implementation.

But I found that the MPICH version packaged for Ubuntu is rather old, the online documentation is completely different from what I had installed, and I couldn’t find any documentation in the Ubuntu packages themselves. (For instance, my config file was Apache-like while the new one is XML, so I couldn’t even start the service.)

Well, I guess the best always wins. That’s the third time I’ve chosen LAM over MPICH, and for exactly the same reason: installation and documentation.

Installing LAM MPI was very simple. On the master node (gandalf) I installed:

$ sudo apt-get install lam-runtime lam4c2 lam4-dev

And on the execution nodes, just the runtime:

$ sudo apt-get install lam-runtime lam4c2

MPEasy

A while ago I developed a set of scripts to help run and sync a LAM MPI cluster when you don’t yet have a shared disk for the cluster (still my case). They’re designed especially for home clusters, and for the early days of a more serious cluster when you haven’t had time to set up a shared disk yet. 😉

Installing MPEasy is easy: download the tarball, unpack it into some directory, and set the environment variables in your startup script:

In .bashrc:

export MPEASY=~/mpeasy
export PATH=$PATH:$MPEASY/bin

In .cshrc:

setenv MPEASY ~/mpeasy
setenv PATH ${PATH}:${MPEASY}/bin
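If you use bash, a quick way to confirm the setup took effect is a small check like the one below. This is just a sketch: `check_mpeasy_env` is a name made up for this example and is not part of MPEasy.

```shell
# Hypothetical helper (not shipped with MPEasy): verifies that MPEASY is set
# and that $MPEASY/bin appears somewhere in PATH.
check_mpeasy_env() {
    if [ -z "${MPEASY:-}" ]; then
        echo "MPEASY is not set"
        return 1
    fi
    # Wrap PATH in colons so the match also works for the first and last entry.
    case ":$PATH:" in
        *":$MPEASY/bin:"*)
            echo "ok: $MPEASY/bin is on PATH"
            ;;
        *)
            echo "$MPEASY/bin is not on PATH"
            return 1
            ;;
    esac
}
```

Open a new shell (or source your .bashrc again) and call `check_mpeasy_env`; it returns non-zero if either variable is missing.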

Then put the node list, one host per line, in $MPEASY/conf/lam_hosts. After that, just running:

$ bw_start

should start your MPI cluster. You can then try some MPI tests: go to the $MPEASY/devel/test directory and compile hello.c:

$ mpicc -o hello hello.c

Then you need to sync the current devel directory to all nodes:

$ bw_sync
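To give an idea of what a sync step like this involves when there is no shared disk, here is a rough sketch. It is not MPEasy's actual bw_sync code. It assumes rsync over ssh is available, that every node uses the same directory path, and that the host file has one host per line, possibly followed by attributes and with `#` comments. The function name `sync_devel` and the `DRY_RUN` variable are invented for the example.

```shell
# Hypothetical stand-in for a sync step; NOT the real bw_sync implementation.
# Copies a directory to the same path on every host listed in a host file.
sync_devel() {
    hosts_file=$1   # e.g. "$MPEASY/conf/lam_hosts"
    src_dir=$2      # e.g. "$MPEASY/devel"

    # The first field of each line is the host name; the rest is ignored.
    # The "|| [ -n ... ]" part handles a last line without a trailing newline.
    while read -r host _rest || [ -n "$host" ]; do
        # Skip blank lines and '#' comment lines.
        case "$host" in
            ''|'#'*) continue ;;
        esac

        if [ -n "${DRY_RUN:-}" ]; then
            # Only show what would be done.
            echo "rsync -az $src_dir/ $host:$src_dir/"
        else
            # </dev/null keeps ssh from swallowing the rest of the host file.
            rsync -az "$src_dir/" "$host:$src_dir/" < /dev/null
        fi
    done < "$hosts_file"
}
```

Running `DRY_RUN=1 sync_devel "$MPEASY/conf/lam_hosts" "$MPEASY/devel"` prints one rsync command per node without copying anything, which is a safe way to check the host list first.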

And run:

$ bw_run 10 $MPEASY/devel/test/hello

You should be able to do the same with all the other programs in that directory. Just remember to sync before running; otherwise the nodes will have an outdated version and you’ll run into problems. In a shared-disk environment this wouldn’t be an issue, of course.
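If you keep forgetting the sync step, one option is a tiny wrapper in your .bashrc that always syncs first. This is only a sketch: `bw_sync_run` is a made-up name, and it assumes bw_sync exits with a non-zero status when the sync fails, which I haven't verified here.

```shell
# Hypothetical convenience wrapper (not part of MPEasy): sync, then run.
# All arguments are passed through to bw_run unchanged.
bw_sync_run() {
    # Only run if the sync reported success (assumes bw_sync's exit
    # status reflects failure).
    bw_sync && bw_run "$@"
}
```

With that in place, `bw_sync_run 10 $MPEASY/devel/test/hello` does the same as calling bw_sync and bw_run one after the other.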