My current effort is to drive ARM hardware into the data centre, through HPC deployments, for the exascale challenge. Current Top500 clusters start at around 0.5 petaflops, but the top 10 sit between 10 and 90 petaflops and draw close to 20MW of combined power. Scaling up 10 to 100 times in performance towards exascale would, at that efficiency, require a whole power plant (coal or nuclear) for each cluster. So getting to ExaFLOPS involves not only getting server-grade ARM chips into a rack mount, but baking the whole ecosystem (PCB, PCIe, DRAM, firmware, drivers, kernel, compilers, libraries, simulations, deep learning, big data) so that it can deliver at least 10x the performance of existing clusters at, hopefully, a fraction of the power budget.
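To make the power argument concrete, here is a rough back-of-envelope sketch. It assumes, purely for illustration, a cluster delivering 90 petaflops at 20MW and asks how much power one exaflops would need if the flops-per-watt figure stayed the same. The pairing of those two numbers and the function name are assumptions, not measurements of any real machine.

```python
# Back-of-envelope estimate: power needed to reach 1 exaflops if the
# efficiency (flops per watt) of an existing cluster does not improve.
# The 90 PF / 20 MW pairing is an illustrative assumption, not a measurement.

PETA = 1e15
EXA = 1e18
MEGA = 1e6


def power_at_scale(current_flops, current_watts, target_flops):
    """Return the watts needed for target_flops at unchanged flops-per-watt."""
    flops_per_watt = current_flops / current_watts
    return target_flops / flops_per_watt


needed = power_at_scale(90 * PETA, 20 * MEGA, 1 * EXA)
print(f"{needed / MEGA:.0f} MW")  # prints "222 MW"
```

Under these assumptions the answer is a little over 220MW, which is why better efficiency, and not just more racks, has to be part of the plan.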
The first step, then, is to find a system that can glue most of those things together, so we can focus on real global solutions rather than on individual pieces. It also means we need to fix all problems in the right places, rather than deferring that to external projects because it’s not our core business. At Linaro we tend to look at computing problems from a holistic point of view, so if things don’t work the way they should, we make it our job to go and fix the problem where it’s most meaningful: finding the appropriate upstream project and submitting pull requests there, then back-porting the fixes internally and gluing them together into a proper solution.
And that’s why we came to OpenHPC, a project that aims to facilitate HPC cluster deployment. Essentially, it’s a package repository that glues functional groups together using meta-packages, in addition to recipes and documentation on how to set up a cluster, and a lively community that deploys OpenHPC clusters across different architectures, operating systems and provisioning styles.
The latest version of OpenHPC, 1.3.2, has just been released with some additions proposed and tested by Linaro, such as Plasma, Scotch and SLEPc, as well as LLVM with Clang and Flang. But while that works well on x86_64 clusters for now, and they have passed all tests on AArch64, installing a new cluster on AArch64 with automatic provisioning still needs some modification. That’s why it’s still in Tech Preview.
For those who don’t know, warewulf is a cluster provisioning service. As such, it is installed on a master node, which then keeps a database of all the compute nodes, resource files, operating system images and everything else that is necessary to get the compute nodes up and running. While you can install the nodes by hand, then install a dispatcher (like Slurm) on each node, warewulf makes that process simple: it creates an image of a node’s installation, produces an EFI image for PXE boot and lets the nodes discover themselves as they come up.
The OpenHPC documentation explains step by step how you can do this (by installing OpenHPC’s own meta-packages and running a few configuration tasks), and if all goes well, every compute node that boots in PXE mode will soon find the DHCP server, then the TFTP server, and will get its disk-less live installation painlessly. That is, of course, if the PXE process works.
While running this on a number of different ARM clusters, I realised that I was getting an error:
ERROR: Could not locate Warewulf's internal pxelinux.0! Things might be broken!
While that doesn’t sound good at all, I learnt that people in the ARM community knew about that all along and were using a Grub hack (to create a simple Grub EFI script to jump start the TFTP download and installation). It’s good that it works in some way, but it’s the kind of thing that should just work, or people won’t even try anything further. Turns out the PXELinux folks haven’t bothered much with AArch64 (examples here and here), so what to do?
One of the great strengths of Linaro is that it has a bunch of maintainers of the core open source projects most people use, so I was bound to find someone who knew what to do, or at least whom to ask. As it turns out, two EFI gurus (Leif and Ard) work on my extended team, and we began discussing alternatives when Eric (Arm Research, also an OpenHPC collaborator) tipped us off that warewulf would be completely replacing PXELinux with iPXE, in view of his own efforts in getting that working.
After a few contributions and teething issues, I managed to get iPXE booting on ARM clusters as smoothly as I would have done for x86_64.
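The underlying idea, picking a network bootloader that actually exists for the node’s architecture instead of assuming pxelinux.0 everywhere, can be sketched roughly as follows. This is not warewulf’s code: the function, the table and the file names are hypothetical and only illustrate the selection logic.

```python
# Hypothetical sketch: choose a network boot file per node architecture.
# The table and file names are illustrative assumptions, not warewulf's
# real layout or API.

BOOTLOADERS = {
    # PXELinux is assumed to be available for x86_64 only.
    "pxelinux": {"x86_64": "pxelinux.0"},
    # iPXE is assumed to provide an image for both architectures.
    "ipxe": {"x86_64": "ipxe-x86_64.efi", "aarch64": "ipxe-arm64.efi"},
}


def pick_boot_file(arch, preferred=("ipxe", "pxelinux")):
    """Return (loader, filename) for the first loader that supports arch."""
    for loader in preferred:
        filename = BOOTLOADERS[loader].get(arch)
        if filename is not None:
            return loader, filename
    raise LookupError(f"no network bootloader available for {arch}")


print(pick_boot_file("aarch64"))
print(pick_boot_file("x86_64", preferred=("pxelinux",)))

# With PXELinux as the only option, an AArch64 node has nothing to boot,
# which mirrors the situation behind the error message quoted earlier.
try:
    pick_boot_file("aarch64", preferred=("pxelinux",))
except LookupError as err:
    print("error:", err)
```

Preferring a loader that covers both architectures keeps the provisioning path identical on x86_64 and AArch64, which is the practical benefit of the move to iPXE described above.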
Even though it works perfectly well, OpenHPC 1.3.2 was released without it.
That’s because it uses an older snapshot of warewulf, while I was using the bleeding edge development branch, which was necessary due to all the fixes we had to do while making it work on ARM.
This has prompted me to replicate their validation build strategy so that I could replace OpenHPC’s own versions of warewulf in-situ, ignoring dependencies, which is not a very user-friendly process. And while I have validated OpenHPC (ran the test suite, all green) with the development branch of warewulf (commit 74ad08d), it is not a documented or even recommended process.
So, despite working better than the official 1.3.2 packages, we’re still in Tech Preview state, and will be until we can pack the development branch of warewulf past that commit into OpenHPC. Given that it has been reported to work on both AArch64 and x86_64, I’m hoping it will be ready for 1.3.3 or 1.4, whatever comes next.