At Linaro, we do lots of toolchain tests: GCC, LLVM, binutils, libraries and so on. Normally, you’d find a fast machine where you could build toolchains and run all the tests, integrated with some dispatch mechanism (like Jenkins). Normally, you’d have a vast choice of hardware to chose from, for each different form-factor (workstation, server, rack mount) and you’d pick the fastest CPUs and a fast SSD disk with space enough for the huge temporary files that toolchain testing produces.
The only problem is, there aren’t any ARM rack-servers or workstations. In the ARM world, you either have many cheap development boards, or one very expensive (100x more) professional development board. Servers, workstations and desktops are still non-existent. Some have tried (Calxeda, for ex.) but they have failed. Others are trying with ARMv8 (the new 32/64-bit architecture), but all of them are under heavy development, so not even Alpha quality.
Meanwhile, we need to test the toolchain, and we have been doing it for years, so waiting for a stable ARM server was not an option and still isn’t. A year ago I took the task of finding the most stable development board that is fast enough for toolchain testing and fill a rack with it. Easier said than done.
Amongst the choices I had, Panda, Beagle, Arndale and Odroid boards were the obvious candidates. After initial testing, it was clear that Beagles, with only 500MB or RAM, were not able to compile anything natively without some major refactoring of the build systems involved. So, while they’re fine for running remote tests (SSH execution), they have very little use for anything else related to toolchain testing.
Pandas, on the other hand, have 1GB or RAM and can compile any toolchain product, but the timing is a bit on the wrong side. Taking 5+ hours to compile a full LLVM+Clang build, a full bootstrap with testing would take a whole day. For background testing on the architecture, it’s fine, but for regression tracking and investigative work, they’re useless.
With the Arndales, we haven’t had such luck. They’re either unstable or deprecated months after release, which makes it really hard to acquire them in any meaningful volumes for contingency and scalability plans. We were left then, with the Odroids.
HardKernel makes very decent boards, with fast quad-A9 and octa-A15 chips, 2GB of RAM and a big heat sink. Compilation times were in the right ball park (40~80 min) so they’re good for both regression catching and bootstrapping toolchains. But they had the same problem as every other board we tried: instability under heavy load.
Development boards are built for hobby projects and prototyping. They normally can get at very high frequencies (1~2 GHz) and are normally designed for low-power, stand-by usage most of the time. But toolchain testing involves building the whole compiler and running the full test-suite on every commit, and that puts it on 100% CPU usage, 24/7. Since the build times are around an hour or more, by the time that the build finishes, other commits have gone through and need to be tested, making it a non-stop job.
CPUs are designed to scale down the frequency when they get too hot, so throughout the normal testing, they stay stable at their operating temperatures (~60C), and adding a heat sink only makes it go further on frequency and keeping the same temperature, so it won’t solve the temperature problem.
The issue is that, after running for a while (a few hours, days, weeks), the compilation jobs start to fail randomly (the infamous “internal compiler error”) in different places of different files every time. This is clearly not a software problem, but if it were the CPU’s fault, it’d have happened a lot earlier, since it reaches the operating temperature seconds after the test starts, and only fails hours or days after they’re running full time. Also, that same argument rules out any trouble in the power supply, since it should have failed in the beginning, not days later.
The problem that the heat sink doesn’t solve, however, is the board’s overall temperature, which gets quite hot (40C~50C), and has negative effects on other components, like the SD reader and the card itself, or the USB port and the stick itself. Those boards can’t boot from USB, so we must use SD cards for the system, and even using a USB external hard drive with a powered USB hub, we still see the failures, which hints that the SD card is failing under high load and high temperatures.
According to SanDisk, their SD cards should be ok on that temperature range, but other parties might be at play, like the kernel drivers (which aren’t build for that kind of load). What pointed me to the SD card is the first place was that when running solely on the SD card (for system and build directories), the failures appear sooner and more often than when running the builds on a USB stick or drive.
Finally, with the best failure rate at 1/week, none of those boards are able to be build slaves.
That’s when I found the Samsung Chromebook. I had one for personal testing and it was really stable, so amidst all that trouble with the development boards, I decided to give it a go as a buildbot slave, and after weeks running smoothly, I had found what I was looking for.
The main difference between development boards and the Chromebook is that the latter is a product. It was tested not just for its CPU, or memory, but as a whole. Its design evolved with the results of the tests, and it became more stable as it progressed. Also, Linux drivers and the kernel were made to match, fine tuned and crash tested, so that it could be used by the worst kind of users. As a result, after one and a half years running Chromebooks as buildbots, I haven’t been able to make them fail yet.
But that doesn’t mean I have stopped looking for an alternative. Chromebooks are laptops, and as such, they’re build with a completely different mindset to a rack machine, and the number of modifications to make it fit the environment wasn’t short. Rack machines need to boot when powered up, give 100% of its power to the job and distribute heat efficiently under 100% load for very long periods of time. Precisely the opposite of a laptop design.
Even though they don’t fail the jobs, they did give me a lot of trouble, like having to boot manually, overheating the batteries and not having an easy way to set up a Linux image easily deployable via network boot. The steps to fix those issues are listed below.
WARNING: Anything below will void your warranty. You have been warned.
To get your Chromebook to boot anything other than ChromeOS, you need to enter developer mode. With that, you’ll be able not only to boot from SD or USB, but also change your partition and have
sudo access on ChromeOS.
With that, you go to the console (CTRL+ALT+->), login with user
chronos (no password) and set the boot process as described on the link above. You’ll also need to set
sudo crossystem dev_boot_signed_only=0 to be able to boot anything you want.
The last step is to make your Linux image boot by default, so when you power up your machine it boots Linux, not ChromeOS. Otherwise, you’ll have to press CTRL+U every boot, and remote booting via PDUs will be pointless. You do that via
You need to find the partition that boots on your ChromeOS by listing all of them and seeing which one booted successfully:
$ sudo cgpt show /dev/mmcblk0
The right partition will have the information below appended to the output:
Attr: priority=0 tries=5 successful=1
If it had tries, and was successful, this is probably your main partition. Move it back down the priority order (6-th place) by running:
$ sudo cgpt add -i [part] -P 6 -S 1 /dev/mmcblk0
And you can also set the SD card’s part to priority 0 by doing the same thing over
With this, installing a Linux on an SD card might get you booting Linux by default on next boot.
Follow those steps and insert the SD card in the slot and boot. You should get the Developer Mode screen and waiting for long enough, it should beep and boot directly on Linux. If it doesn’t, means your
cgpt meddling was unsuccessful (been there, done that) and will need a bit more fiddling. You can press CTRL+U for now to boot from the SD card.
After that, you should have complete control of the Chromebook, and I recommend adding your daemons and settings during the boot process (inid.d, systemd, etc). Turn on the network, start the SSD daemon and other services you require (like buildbots). It’s also a good idea to change the governor to
performance, but only if you’re going to use it for full time heavy load, and especially if you’re going to run benchmarks. But for the latter, you can do that on demand, and don’t need to leave it on during boot time.
To change the governor:
$ echo [scale] | sudo tee /sys/bus/cpu/devices/cpu[N]/cpufreq/scaling_governor
scale above can be one of performance, conservative, ondemand (default), or any other governor that your kernel supports. If you’re doing before benchmarks, switch to performance and then back to ondemand. Use cpuN as the CPU number (starts on 0) and do it for all CPUs, not just one.
Other interesting scripts are to get the temperatures and frequencies of the CPUs:
$ cat thermal
for dir in $ROOT/*/temp; do
temp=`echo $temp/1000 | bc -l | sed 's/0\+$/0/'`
echo "$device: $temp C"
$ cat freq
for dir in $ROOT/*; do
if [ -e $dir/cpufreq/cpuinfo_cur_freq ]; then
freq=`sudo cat $dir/cpufreq/cpuinfo_cur_freq`
freq=`echo $freq/1000000 | bc -l | sed 's/0\+$/0/'`
echo "`basename $dir`: $freq GHz"
As expected, the hardware was also not ready to behave like a rack server, so some modifications are needed.
The most important thing you have to do is to remove the battery. First, because you won’t be able to boot it remotely with a PDU if you don’t, but more importantly, because the head from constant usage will destroy the battery. Not just as in make it stop working, which we don’t care, but it’ll slowly release gases and bloat the battery, which can be a fire hazard.
To remove the battery, follow the iFixit instructions here.
Another important change is to remove the lid magnet that tells the Chromebook to not boot on power. The iFixit post above doesn’t mention it, bit it’s as simple as prying the monitor bezel open with a sharp knife (no screws), locating the small magnet on the left side and removing it.
With all these changes, the Chromebook should be stable for years. It’ll be possible to power cycle it remotely (if you have such a unit), boot directly into Linux and start all your services with no human intervention.
The only think you won’t have is serial access to re-flash it remotely if all else fails, as you can with most (all?) rack servers.
Contrary to common sense, the Chromebooks are a lot better as build slaves are any development board I ever tested, and in my view, that’s mainly due to the amount of testing that it has gone through, given that it’s a consumer product. Now I need to test the new Samsung Chromebook 2, since it’s got the new Exynos Octa.
While I’d love to have more options, different CPUs and architectures to test, it seems that the Chromebooks will be the go to machine for the time being. And with all the glory going to ARMv8 servers, we may never see an ARMv7 board to run stably on a rack.