Linaro’s Dev Box

10 years ago, when I joined Arm, I imagined that we’d all be using Arm desktops soon. After a while working there, I realised this wasn’t really in anyone’s plans (at least none visible to us mere developers), so I kind of accepted that truth.

But as time passed and the 64-bit architecture came along, phones really didn’t seem to benefit from the bump in address space or integer arithmetic (it was actually worse, power-consumption-wise), so I began to realise that my early hopes weren’t so unfounded.

But when I left Arm around 2011 for a high-performance computing group, I realised how complicated it would be to move all of the x86_64/PPC high-performance computing world to Arm, and that planted a seed in my brain that led me to join the HPC SIG at Linaro last year.

But throughout that journey, I realised I still didn’t have what I wanted in the first place: an Arm desktop. I’m not alone in that feeling, by any means. Enthusiasts have been building Beagle/Panda/RaspberryPi “computers” for a long time, we have had Arm Chromebooks for a while, and we even used them in our LLVM CI for 3 good years. But they were either severely under-powered to the point of uselessness, or the OS was by far the restricting factor (looking at you, ChromeOS).

So, when Martin told me we were going to build a proper system, with PCIe, GB network, DRAM, SATA in a compatible form factor (MicroATX), I was all in. Better still, we had the dream team of Leif/Ard/Graeme looking at the specs and fixing the bugs, so I was fairly confident we would get something decent at the end. And indeed, we have.

In September 2016, Linus Torvalds told David Rusling:

“x86 is still the one I favour most and that is because of the PC. The infrastructure is there and it is open in a way no other architecture is.”

Well, the new Arm devbox is ATX format, with standard DIMMs, SATA disks (SSD and spinning), GB Ethernet port (and speed), PCIe (x8+x1+x1) and has open bootloaders, kernels and operating systems. I believe we have delivered on the request.

Synquacer Developer Box

Dev box with a 1080p monitor, showing YouTube in a browser, 24 cores idling, cpuinfo and lspci outputs, as well as some games…

The dev box itself is pretty standard (and that’s awesome!), and you can see the specs for yourself here. We got a few boxes to try out, plus some spare hardware to try them with, so after a week or so we had tried every combination possible, and apart from a few bugs (which we fixed along the way), everything worked well enough. For more news on the box itself, have a look here and here. Also, here’s the guide on how to install it. Not unlike other desktops.

Even the look is not unlike other desktops, although as I’ll explain later, I’d prefer if I could buy the board on its own, rather than the whole box.

The good

Building LLVM on 20 or all cores doesn’t seem to push the power consumption that much… The GPU is active on idle, with about 12W spent on it and an estimated 5W lost in the inefficient PSU.

I tried four GPUs: an NVidia GT210, GT710, GTX1050Ti and an old AMD card (which didn’t work on UEFI for lack of any standard firmware). The box comes with the 710, which (obviously) works out-of-the-box. But so does the 210. The 1050Ti works well on UEFI and the framebuffer, but (on Debian at least) you need to install firmware-misc-nonfree, which has to be done either from a terminal with the 710 in place or over serial first; it then works on the next boot.

We tried a large number of DIMMs, with and without ECC, and they all seem to work, up to 16GB. We are limited to 4GB per DIMM, but that’s a firmware issue and we’re fixing it; it will come in the next update. Also, on the subject of firmware updates, no need to get your JTAG probes out. On Debian, it works just like any other desktop:
$ sudo apt install fwupd
$ sudo fwupdmgr refresh
$ sudo fwupdmgr update
$ sudo reboot

Another nice thing is the HTTP installer. Of course, as expected from a desktop, downloading an ISO from your preferred distro and booting from it works out-of-the-box, but in case you’re lazy and don’t want to dd stuff onto a USB stick, we bundled an HTTP install from an ISO “on the cloud”. This is an experimental feature, so salt, pepper and all, but the lesson here is simple: on boot, you’ll be pleasantly redirected to a BIOS screen, with options to boot from whatever device, including HTTP net-inst and USB stick.
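If you do go the USB route, the usual dd dance applies. A minimal sketch, wrapped in a function for safety — the ISO name and the /dev/sdX device are placeholders, not anything specific to this box:

```shell
#!/usr/bin/env bash
# write_iso: dd an ISO onto a block device and sanity-check the result.
# The device name is a placeholder -- ALWAYS check with lsblk first,
# because dd will happily destroy the wrong disk.
write_iso() {
  local iso="$1" dev="$2"
  dd if="$iso" of="$dev" bs=4M conv=fsync status=progress
  # the first bytes of the device must now match the ISO
  cmp -n "$(stat -c %s "$iso")" "$iso" "$dev"
}
```

Run as root against the real device, e.g. `write_iso debian.iso /dev/sdX`.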

Folks managed to run Debian (Stretch and Buster) and Fedora, and they all work without issues. Though, for the GTX1050Ti you’ll need Buster, because the Nouveau driver that supports it is 1.0.15, which is not in Stretch. I did a dist-upgrade from Stretch and it worked without incident. A full install with a desktop environment (Cinnamon, Gnome or LXDE) has also worked out-of-the-box.

The box builds GCC, LLVM, Linux and a bunch of other software we set it to build (much easier with more than 4GB of RAM), and it accepts multiple PCIe NICs, so you can also run it as a home server, router or firewall. I haven’t tried 10GbE on that board, but I know those cards work on Arm (in our HPC Lab), so they should work just as well on the Synquacer box.

The not so bad

Inside my server, 8GB RAM, SSD, GT210 (no need for graphics) and a PCIe NIC.

While a lot works out of the box, and that’s a first in consumer Arm boards, not everything works perfectly well, and some things need a bit of fine tuning. Disregarding the need for more / better hardware in the box (you’ll eventually have to buy more RAM and an SSD), there are a few other things that you may need to fiddle with.

For example, while Nouveau works out-of-the-box, it does need the following config in its module to get to full speed (seems specific to older cards):

$ echo 'options nouveau config=NvClkMode=auto' | sudo tee /etc/modprobe.d/nouveau.conf
$ sudo update-initramfs -u

Without this, the GPU works perfectly well, but it’s not fast enough. With it, I could play Nexuiz at 30fps on “normal” specs, Armagetron at 40fps with all the bells and whistles, and a capped 30fps on Minetest with all options set. SuperTuxKart gives me 40fps on the LEGO level, but only 15 on “under the sea”, and that’s very likely because of its abuse of transparency.

This is not stellar, of course, but we’re talking about the Nouveau driver, which is known to be slower than the proprietary NVidia drivers, on a GT710. Those games are the ones we had packages for on Debian/Arm, and they’re not the most optimised, OpenGL-wise, so all in all, not bad numbers.

Then there’s the problem of too many CPUs for too little RAM. I keep coming back to this point because it’s really important. For a desktop, 4GB is enough. For a server, 8GB is enough. But for a build server (my case), it really isn’t. As compilers get more complicated and programs get larger, the amount of RAM used by the compiler and linker can easily pass 1GB per process. On laptops and desktops with 8 cores and 16GB of RAM, that was never a problem, but when the numbers flip to 24 cores and 4GB, it gets ridiculous.

Even 8GB gave me trouble compiling LLVM, because I want a sweet spot between using as many cores as possible and not swapping. This is a trial-and-error process, and with build times on the scale of hours, it’s a really boring one. And it is only valid until the code grows or you update the compiler. With 8GB, -j20 seems to be that sweet spot, but I still used 3GB of swap anyway. Each DIMM like the one in the box goes for around £40, so that’s another £120 to make it into a better builder. I’d happily trade the GPU for more RAM.

LLVM builds in just over an hour with -j20 and 2 concurrent link jobs (low RAM), which is acceptable, but not impressive. Most of my problems have been RAM shortage and swapping, so I’ll re-do the test with 16GB and see how it goes, but I’m expecting nothing less than 40min, which is still twice as long as my old i7-4720HQ. It’s 1/3 of the clock, in-order, and it has 3x the number of threads, so I was expecting closer timings. Will update with more info when I’m done.
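For what it’s worth, my search for that sweet spot can be sketched as a small script. The ~1.5GB-per-job figure is my rule of thumb, not an official number, and the link-job cap uses LLVM’s own LLVM_PARALLEL_LINK_JOBS CMake option (which needs the Ninja generator):

```shell
#!/usr/bin/env bash
# max_jobs: pick a -j value from MemTotal, assuming ~1.5GB per compile job.
# Pass a MemTotal value in kB as $1, or let it read /proc/meminfo.
max_jobs() {
  local mem_kb=${1:-$(awk '/MemTotal/ {print $2}' /proc/meminfo)}
  local jobs=$(( mem_kb / (1536 * 1024) ))
  [ "$jobs" -lt 1 ] && jobs=1
  echo "$jobs"
}

# Then configure LLVM with a hard cap on the RAM-hungry link steps:
#   cmake -G Ninja ../llvm -DCMAKE_BUILD_TYPE=Release \
#         -DLLVM_PARALLEL_LINK_JOBS=2
#   ninja -j "$(max_jobs)"
```

On an 8GB box this heuristic lands close to the -j5 I’d use for RAM-heavy C++ files; for plain C you can push well past it, which is why -j20 still mostly works.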

The ugly

The first thing that comes to mind is that I have to buy the whole package, at the salty price of $1210, with hardware that is, putting it mildly, outdated.

It has a case with a huge plastic cover meant for big Intel heat-sinks, of which the Synquacer needs none. It’s also too small for some GPUs and too full of loose parts to allow easy maintenance. No clips, just screws and hard pushes.

The disk is a 1TB, standard WD Blue, which is fine, but honestly, in 2018, I’d expect an SSD. A run-of-the-mill SanDisk 120GB SSD comes at the same price, and despite being 1/8 of the space, I’d have preferred it any day. For not much more you could get a 240GB one, which is good enough for almost all desktop uses, especially on a box that won’t be your main one.

It can cope with 64GB of RAM (albeit, right now, firmware limits it to 16GB), but the box comes with only 4GB. This may seem fine when talking about Intel laptops with 4 cores, but the Synquacer has a whopping 24 of them. Even under-powered (1GHz A53s), make -j will create a number of threads that will make the box crawl and die when the linking starts. 8GB would have been the minimum I’d recommend for that hardware.

Finally, the SoC. I have had zero trouble with it. It doesn’t overheat, it doesn’t crash, there are no kernel panics, no sudden errors or incompatibility. It reports all its features and Linux is quite happy with it. But it’s an A53. At 1GHz. I know, there are 24 of them, which is amazing when building software (provided you have enough RAM), but pretty useless as a standard desktop.

When I was using the spinning disk, starting Cinnamon was a pain: at least 15 seconds staring at the screen after login. Then I moved to the SSD and it got considerably faster, at about 5 seconds. With 8GB of RAM barely used, I blame the CPU. It’s not bad bad, but even a slight overclock (say, to 1.5GHz) would have been an improvement.

I understand, power consumption and heating are issues, and the whole board design would have to be re-examined to match, but it’s worth it, in my view. I’d have been happier with half the cores at twice the clock.

Finally, I’d really like to be able to purchase the board alone, so I can put it in the case I already have, with the GPU/disk/RAM I want. To me, it doesn’t make much sense to ship a case and a spinning disk halfway across the world just so I can throw them away and buy new ones.

Conclusion

Given that Linux for Arm has been around for at least a decade, it’s no surprise that it works well on the Synquacer box. The surprise is PCIe x8 working with NVidia cards and running games on open source drivers without crashing. The surprise is that I could connect a large number of DIMMs, GPUs, disks and PCIe network cards without a single glitch.

I have been following the developer team working on the problems I reported early on, and I found a very enthusiastic (and extremely competent) bunch of folks (Linaro, Socionext, Arm), who deserve all the credit for making this a product I would want to buy. Though, I’d actually buy the board, if it came on its own, not the entire box.

It works well as an actual desktop (browser, mail, YouTube and what have you), as a build server for Arm (more RAM and there you go) and as a home server (NAS, router, firewall). So, I’m quite happy with the results. The setbacks were far fewer and far less severe than I was expecting, or even hoping for (and I’m a pessimist), so thumbs up!

Now, just get one of those to Linus Torvalds and Gabe Newell, and we have successfully started the “year of the Arm desktops”.

Trashing Chromebooks

At Linaro, we do lots of toolchain tests: GCC, LLVM, binutils, libraries and so on. Normally, you’d find a fast machine where you could build toolchains and run all the tests, integrated with some dispatch mechanism (like Jenkins). Normally, you’d have a vast choice of hardware to choose from for each form-factor (workstation, server, rack mount), and you’d pick the fastest CPUs and a fast SSD with enough space for the huge temporary files that toolchain testing produces.

tcwg-rack

The only problem is, there aren’t any ARM rack-servers or workstations. In the ARM world, you either have many cheap development boards, or one very expensive (100x more) professional development board. Servers, workstations and desktops are still non-existent. Some have tried (Calxeda, for example) and failed. Others are trying with ARMv8 (the new 32/64-bit architecture), but all of those are under heavy development, so not even Alpha quality.

Meanwhile, we need to test the toolchain, and we have been doing it for years, so waiting for a stable ARM server was not an option, and still isn’t. A year ago I took on the task of finding the most stable development board that is fast enough for toolchain testing and filling a rack with them. Easier said than done.

Choices

Amongst the choices available, Panda, Beagle, Arndale and Odroid boards were the obvious candidates. After initial testing, it was clear that Beagles, with only 500MB of RAM, were not able to compile anything natively without some major refactoring of the build systems involved. So, while they’re fine for running remote tests (SSH execution), they have very little use for anything else related to toolchain testing.

panda

Pandas, on the other hand, have 1GB of RAM and can compile any toolchain product, but the timing is a bit on the wrong side. Taking 5+ hours to compile a full LLVM+Clang build, a full bootstrap with testing would take a whole day. For background testing of the architecture, that’s fine, but for regression tracking and investigative work, they’re useless.

With the Arndales, we haven’t had such luck. They’re either unstable or deprecated months after release, which makes it really hard to acquire them in any meaningful volume for contingency and scalability plans. That left us with the Odroids.

arndale

HardKernel makes very decent boards, with fast quad-A9 and octa-A15 chips, 2GB of RAM and a big heat sink. Compilation times were in the right ballpark (40~80 min), so they’re good both for catching regressions and for bootstrapping toolchains. But they had the same problem as every other board we tried: instability under heavy load.

Development boards are built for hobby projects and prototyping. They can normally reach quite high frequencies (1~2 GHz), but they are designed for low-power, stand-by usage most of the time. Toolchain testing, however, involves building the whole compiler and running the full test-suite on every commit, which puts them at 100% CPU usage, 24/7. Since build times are around an hour or more, by the time one build finishes, other commits have come in and need to be tested, making it a non-stop job.

CPUs are designed to scale down their frequency when they get too hot, so throughout normal testing they stay stable at their operating temperature (~60C); adding a heat sink only lets them run at higher frequencies for the same temperature, so it won’t solve the temperature problem.

The issue is that, after running for a while (a few hours, days, weeks), the compilation jobs start to fail randomly (the infamous “internal compiler error”), in different places of different files every time. This is clearly not a software problem, but if it were the CPU’s fault, it’d have happened a lot earlier, since the CPU reaches its operating temperature seconds after the test starts, yet the failures only appear hours or days into a full-time run. The same argument rules out any trouble in the power supply, since it should have failed at the beginning, not days later.

What the heat sink doesn’t solve, however, is the board’s overall temperature, which gets quite hot (40C~50C) and has negative effects on other components, like the SD reader and the card itself, or the USB port and the stick itself. Those boards can’t boot from USB, so we must use SD cards for the system, and even using a USB external hard drive with a powered USB hub, we still see the failures, which hints that the SD card is failing under high load and high temperature.

According to SanDisk, their SD cards should be fine in that temperature range, but other parties might be at play, like the kernel drivers (which aren’t built for that kind of load). What pointed me to the SD card in the first place was that, when running solely on the SD card (for system and build directories), the failures appear sooner and more often than when running the builds on a USB stick or drive.

Finally, with the best failure rate at one per week, none of those boards is fit to be a build slave.

Chromebook

That’s when I found the Samsung Chromebook. I had one for personal testing and it was really stable, so amidst all that trouble with the development boards, I decided to give it a go as a buildbot slave, and after weeks running smoothly, I had found what I was looking for.

The main difference between development boards and the Chromebook is that the latter is a product. It was tested not just for its CPU, or memory, but as a whole. Its design evolved with the results of the tests, and it became more stable as it progressed. Also, Linux drivers and the kernel were made to match, fine tuned and crash tested, so that it could be used by the worst kind of users. As a result, after one and a half years running Chromebooks as buildbots, I haven’t been able to make them fail yet.

But that doesn’t mean I have stopped looking for an alternative. Chromebooks are laptops, and as such, they’re built with a completely different mindset from a rack machine, and the list of modifications needed to make them fit the environment wasn’t short. Rack machines need to boot when powered up, give 100% of their power to the job and distribute heat efficiently under 100% load for very long periods of time. Precisely the opposite of a laptop design.

Even though they don’t fail the jobs, they did give me a lot of trouble: having to boot manually, overheating the batteries, and having no easy way to set up a Linux image deployable via network boot. The steps to fix those issues are listed below.

WARNING: Anything below will void your warranty. You have been warned.

System settings

To get your Chromebook to boot anything other than ChromeOS, you need to enter developer mode. With that, you’ll be able not only to boot from SD or USB, but also change your partition and have sudo access on ChromeOS.

With that, you go to the console (CTRL+ALT+->), log in as user chronos (no password) and set the boot process as described in the link above. You’ll also need to run sudo crossystem dev_boot_signed_only=0 to be able to boot anything you want.

The last step is to make your Linux image boot by default, so when you power up the machine it boots Linux, not ChromeOS. Otherwise, you’ll have to press CTRL+U on every boot, and remote booting via PDUs will be pointless. You do that via cgpt.

You need to find the partition that ChromeOS boots from by listing all of them and seeing which one booted successfully:

$ sudo cgpt show /dev/mmcblk0

The right partition will have the information below appended to the output:

Attr: priority=0 tries=5 successful=1

If it has tries, and was successful, this is probably your main partition. Move it back down the priority order (to 6th place) by running:

$ sudo cgpt add -i [part] -P 6 -S 1 /dev/mmcblk0

And you can also set the SD card’s partition to priority 0 by doing the same thing on mmcblk1.

With this, installing Linux on an SD card might get you booting Linux by default on the next boot.

Linux installation

You can choose from a few distributions to run on the Chromebooks; I have tested both Ubuntu and Arch Linux, which work just fine.

Follow those steps, insert the SD card in the slot and boot. You should get the Developer Mode screen and, after waiting long enough, it should beep and boot directly into Linux. If it doesn’t, it means your cgpt meddling was unsuccessful (been there, done that) and you’ll need a bit more fiddling. You can press CTRL+U for now to boot from the SD card.

After that, you should have complete control of the Chromebook, and I recommend adding your daemons and settings to the boot process (init.d, systemd, etc.): turn on the network, start the SSH daemon and any other services you require (like buildbots). It’s also a good idea to change the governor to performance, but only if you’re going to use it for full-time heavy load, and especially if you’re going to run benchmarks. For the latter, though, you can do that on demand, and don’t need to leave it on from boot time.
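As an illustration of the daemons-at-boot part, a minimal systemd unit for a buildbot slave could look like the sketch below — the service name, user and paths are all hypothetical, so adjust them to your install:

```
$ sudo tee /etc/systemd/system/buildslave.service <<'EOF'
[Unit]
Description=Buildbot slave (hypothetical example)
After=network-online.target

[Service]
User=buildbot
ExecStart=/usr/bin/buildslave start --nodaemon /home/buildbot/slave
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
$ sudo systemctl enable buildslave.service
```

With Restart=on-failure and the enable step, the worker comes back on its own after a power cycle, which is exactly what a PDU-driven rack needs.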

To change the governor:
$ echo [scale] | sudo tee /sys/bus/cpu/devices/cpu[N]/cpufreq/scaling_governor

scale above can be one of performance, conservative, ondemand (the default), or any other governor your kernel supports. If you’re about to run benchmarks, switch to performance first and back to ondemand afterwards. Use cpu[N] as the CPU number (starting at 0) and do it for all CPUs, not just one.
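Doing that by hand for every CPU gets old quickly, so a small loop helps. ROOT is overridable here only so the script can be dry-run against a fake sysfs tree; writing to the real files needs root:

```shell
#!/usr/bin/env bash
# set_governor: write a cpufreq governor to every CPU's scaling_governor.
# ROOT defaults to the real sysfs path; run as root for that to work.
set_governor() {
  local gov="$1" root="${ROOT:-/sys/bus/cpu/devices}" g
  for g in "$root"/cpu*/cpufreq/scaling_governor; do
    [ -e "$g" ] || continue
    echo "$gov" > "$g"
  done
}
```

For example, `set_governor performance` before a benchmark run and `set_governor ondemand` after.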

Other interesting scripts are to get the temperatures and frequencies of the CPUs:

$ cat thermal
#!/usr/bin/env bash
ROOT=/sys/devices/virtual/thermal
for dir in $ROOT/*/temp; do
  temp=`cat $dir`
  temp=`echo $temp/1000 | bc -l | sed 's/0\+$/0/'`
  device=`dirname $dir`
  device=`basename $device`
  echo "$device: $temp C"
done

$ cat freq
#!/usr/bin/env bash
ROOT=/sys/bus/cpu/devices
for dir in $ROOT/*; do
  if [ -e $dir/cpufreq/cpuinfo_cur_freq ]; then
    freq=`sudo cat $dir/cpufreq/cpuinfo_cur_freq`
    freq=`echo $freq/1000000 | bc -l | sed 's/0\+$/0/'`
    echo "`basename $dir`: $freq GHz"
  fi
done

Hardware changes

batteries

As expected, the hardware was also not ready to behave like a rack server, so some modifications were needed.

The most important thing you have to do is to remove the battery. First, because you won’t be able to boot it remotely with a PDU if you don’t, but more importantly, because the heat from constant usage will destroy the battery. Not just make it stop working, which we don’t care about, but slowly release gases and bloat, which can be a fire hazard.

To remove the battery, follow the iFixit instructions here.

Another important change is to remove the lid magnet that tells the Chromebook not to boot when power is applied. The iFixit post above doesn’t mention it, but it’s as simple as prying the monitor bezel open with a sharp knife (no screws), locating the small magnet on the left side and removing it.

Stability

With all these changes, the Chromebook should be stable for years. It’ll be possible to power-cycle it remotely (if you have a remotely switchable PDU), boot directly into Linux and start all your services with no human intervention.

The only thing you won’t have is serial access to re-flash it remotely if all else fails, as you can with most (all?) rack servers.

Contrary to common sense, the Chromebooks are a lot better as build slaves than any development board I ever tested, and in my view that’s mainly due to the amount of testing they have gone through, given that they’re consumer products. Now I need to test the new Samsung Chromebook 2, since it’s got the new Exynos Octa.

Conclusion

While I’d love to have more options, different CPUs and architectures to test, it seems that the Chromebooks will be the go-to machines for the time being. And with all the glory going to ARMv8 servers, we may never see an ARMv7 board run stably on a rack.

DRM in external drives?

Western Digital thinks that bundling an external hard drive with crappy software that won’t allow you to share your own videos, music and photos counts as security.

A friend of mine has this disk and uses it for everything, including legal music bought over the internet and other DRM-free MP3s (like my band’s songs). It works a charm on his Mac and on my Linux box; I didn’t even know the drive had restrictions.

It’s quite easy for the newbie geek to avoid the DRM (especially if he/she uses Linux or Mac, which is common), but the non-geek consumer will probably give up on the whole thing and buy a new one. If only the hardware industry would just stop and think for a second…

I wonder if the same security consultant WD used is the one behind biometric passwords… or probably they just did the same security course at Microsoft…

UPDATE: a very good article by BBC.