Linaro’s Dev Box

10 years ago, when I joined Arm, I imagined that we’d all be using Arm desktops soon. After a while working there, I realised this wasn’t really in anyone’s plans (at least none visible to us mere developers), so I kind of accepted that truth.

But as time passed and the 64-bit architecture came along, phones really didn’t seem to benefit from the bump in address space or integer arithmetic (power-wise, it was actually worse), and I began to realise that my early hopes weren’t so unfounded after all.

When I left Arm around 2011 for a high-performance computing group, I realised how complicated it would be to move all of the x86_64/PPC high-performance computing world over to Arm, and that planted a seed in my brain that led me to join the HPC SIG at Linaro last year.

But throughout that journey, I realised I still didn’t have what I wanted in the first place: an Arm desktop. I’m not alone in that feeling, by any means. Enthusiasts have been building Beagle/Panda/RaspberryPi “computers” for a long time, and we have had Arm Chromebooks for a while, and even used them in our LLVM CI for 3 good years. But they were either severely under-powered to the point of uselessness, or the OS was by far the restricting factor (looking at you, ChromeOS).

So, when Martin told me we were going to build a proper system, with PCIe, Gigabit networking, DRAM and SATA in a compatible form factor (MicroATX), I was all in. Better still, we had the dream team of Leif/Ard/Graeme looking at the specs and fixing the bugs, so I was fairly confident we would get something decent at the end. And indeed, we have.

In September 2016, Linus Torvalds told David Rusling:

“x86 is still the one I favour most and that is because of the PC. The infrastructure is there and it is open in a way no other architecture is.”

Well, the new Arm devbox is ATX format, with standard DIMMs, SATA disks (SSD and spinning), a Gigabit Ethernet port (and speed), PCIe (x8+x1+x1), and open bootloaders, kernels and operating systems. I believe we have delivered on the request.

SynQuacer Developer Box

Dev box with a 1080p monitor, showing YouTube in a browser, 24 cores idling, cpuinfo and lspci outputs, as well as some games…

The dev box itself is pretty standard (and that’s awesome!), and you can see the specs for yourself here. We got a few boxes to try out, plus plenty of spare hardware to try with them, so after a week or so we had tried every combination possible, and apart from a few bugs (which we fixed along the way), everything worked well enough. For more news on the box itself, have a look here and here. Also, here’s the guide on how to install it. Not unlike other desktops.

Even the look is not unlike other desktops, although as I’ll explain later, I’d prefer if I could buy the board on its own, rather than the whole box.

The good

Building LLVM on 20 or all 24 cores doesn’t seem to push the power consumption that much… The GPU is active at idle, accounting for about 12W, with an estimated 5W lost to an inefficient PSU.

I tried four GPUs: an NVidia GT210, GT710 and GTX1050Ti, plus an old AMD card (which didn’t work on UEFI for lack of any standard firmware). The box comes with the 710, which (obviously) works out-of-the-box. But so does the 210. The 1050Ti works well on UEFI and framebuffer, but (on Debian at least) you need to install firmware-misc-nonfree, which has to be done either from a terminal with the 710 installed or over serial first; it then works on the next boot.

We tried a large number of DIMMs, with and without ECC, and they all seem to work, up to 16GB. We are limited to 4GB per DIMM, but that’s a firmware issue and we’re fixing it; the fix will come in the next update. Also, on the subject of firmware updates, no need to get your JTAG probes out. On Debian, just do it like on any other desktop:
$ sudo apt install fwupd
$ sudo fwupdmgr refresh
$ sudo fwupdmgr update
$ sudo reboot

Another nice thing is the HTTP installer. Of course, as expected from a desktop, downloading an ISO from your preferred distro and booting from it works out-of-the-box, but in case you’re lazy and don’t want to dd stuff onto a USB stick, we bundled an HTTP install from an ISO “on the cloud”. This is an experimental feature, so take it with a grain of salt, but the lesson here is simple: on boot, you’ll be pleasantly redirected to a BIOS screen, with options to boot from whatever device you like, including HTTP net-inst and USB stick.

Folks have managed to run Debian (Stretch and Buster) and Fedora, and they all work without issues. Though, for the GTX1050Ti you’ll need Buster, because the Nouveau driver that supports it is 1.0.15, which is not in Stretch. I did a dist-upgrade from Stretch and it worked without incident. A full install with a desktop environment (Cinnamon, Gnome or LXDE) has also worked out-of-the-box.

The box builds GCC, LLVM, Linux and a bunch of other software we put it to work on (much easier with more than 4GB of RAM), and it accepts multiple PCIe NICs, so you can also run it as a home server, router or firewall. I haven’t tried 10GbE on that board, but I know those cards work on Arm (in our HPC Lab), so they should work just as well on the SynQuacer box.

The not so bad

Inside my server: 8GB RAM, an SSD, a GT210 (no need for graphics) and a PCIe NIC.

While a lot works out of the box, and that’s a first for consumer Arm boards, not everything works perfectly well, and some things need a bit of fine tuning. Disregarding the need for more / better hardware in the box (you’ll eventually have to buy more RAM and an SSD), there are a few other things you may need to fiddle with.

For example, while Nouveau works out-of-the-box, it does need the following module option to reach full speed (this seems specific to older cards):

$ echo 'options nouveau config=NvClkMode=auto' | sudo tee /etc/modprobe.d/nouveau.conf
$ sudo update-initramfs -u

Without this, the GPU works perfectly well, but it’s not fast enough. With it, I could play Nexuiz at 30fps on “normal” settings, Armagetron at 40fps with all the bells and whistles, and a capped 30fps on Minetest with all options set. SuperTuxKart gives me 40fps on the LEGO level, but only 15 on “under the sea”, very likely because of its abuse of transparency.

This is not stellar, of course, but we’re talking about the Nouveau driver, which is known to perform worse than the proprietary NVidia drivers, on a GT710. Those games are the ones we had Debian/Arm packages for, and they’re not the most optimised, OpenGL-wise, so these aren’t bad numbers at all.

Then there’s the problem of too many CPUs for too little RAM. I keep coming back to this point because it’s really important. For a desktop, 4GB is enough. For a server, 8GB is enough. But for a build server (my case), it really isn’t. As compilers get more complicated and programs get larger, the amount of RAM used by the compiler and linker can easily pass 1GB per process. On laptops and desktops with 8 cores and 16GB of RAM, that was never a problem, but when the numbers flip to 24 cores by 4GB, it gets ridiculous.

Even 8GB gave me trouble trying to compile LLVM, because I wanted a sweet spot between using as many cores as possible and not swapping. This is a trial and error process, and with build times on the scale of hours, it’s a really boring one, and it’s only valid until the code grows or you update the compiler. With 8GB, -j20 seems to be that sweet spot, but I still got 3GB into swap anyway. Each DIMM like the one in the box goes for around £40, so it’s another £120 to make it into a better builder. I’d happily trade the GPU for more RAM.
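A rough starting point for that trial and error can be sketched as a shell heuristic. This is just my rule of thumb, not something any build system does for you: assume each compile job can peak at around 1GB, and cap -j by whichever is smaller, the core count or the gigabytes of RAM.

```shell
# Rule-of-thumb job count: assume ~1GB peak RAM per compile job,
# so never run more jobs than we have gigabytes of RAM.
cores=$(nproc)
mem_gb=$(awk '/MemTotal/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
jobs=$(( mem_gb < cores ? mem_gb : cores ))
if [ "$jobs" -lt 1 ]; then jobs=1; fi   # always run at least one job
echo "make -j$jobs"
```

On the 24-core, 4GB box this suggests -j4, which is more conservative than the -j20 I settled on, because in practice not every job peaks at 1GB at the same time; treat it as a floor and nudge upwards while watching swap.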

LLVM builds in just over an hour with -j20 and 2 concurrent link jobs (low RAM), which is acceptable, but not impressive. Most of my problems have been RAM shortage and swapping, so I’ll re-do the test with 16GB and see how it goes, but I’m not expecting better than 40min, which is still twice as long as my old i7-4720HQ takes. It’s 1/3 of the clock, in-order, with 3x the number of threads, so I was expecting closer timings. Will update with more info when I’m done.
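For reference, limiting link parallelism separately from compile parallelism is something LLVM’s own CMake setup supports when you use the Ninja generator, via the LLVM_PARALLEL_COMPILE_JOBS and LLVM_PARALLEL_LINK_JOBS options. A configure along these lines (the source path is a placeholder) is roughly what the “-j20 with 2 concurrent link jobs” setup looks like:

```shell
# Configure LLVM so up to 20 compile jobs run in parallel, but only
# 2 link jobs at a time (linking is where the RAM goes). Requires
# the Ninja generator; ../llvm is a placeholder for the source tree.
cmake -G Ninja ../llvm \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_PARALLEL_COMPILE_JOBS=20 \
  -DLLVM_PARALLEL_LINK_JOBS=2
ninja
```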

The ugly

The first thing that comes to mind is that I have to buy the whole package, at the salty price of $1210, with hardware that is, to put it mildly, outdated.

It has a case with a huge plastic cover meant for big Intel heat-sinks, which the SynQuacer doesn’t need. It’s also too small for some GPUs and too full of loose parts to make maintenance easy. No clips, just screws and hard pushes.

The disk is a 1TB, standard WD Blue, which is fine, but honestly, in 2018, I’d expect an SSD. A run-of-the-mill 120GB SanDisk SSD comes at the same price, and despite holding 1/8 of the space, I’d have preferred it any day. For not much more you could get a 240GB one, which is good enough for almost all desktop uses, especially on a box that won’t be your main one.

It can cope with 64GB of RAM (albeit, right now, firmware limits it to 16GB), but the box comes with only 4GB. That may seem fine for an Intel laptop with 4 cores, but the SynQuacer has a whopping 24 of them. Even under-powered (1GHz A53s), make -j will create a number of threads that will make the box crawl and die when the linking starts. 8GB is the minimum I’d recommend for that hardware.

Finally, the SoC. I have had zero trouble with it. It doesn’t overheat, it doesn’t crash, there are no kernel panics, no sudden errors or incompatibility. It reports all its features and Linux is quite happy with it. But it’s an A53. At 1GHz. I know, there are 24 of them, which is amazing when building software (provided you have enough RAM), but pretty useless as a standard desktop.

When I was using the spinning disk, starting Cinnamon was a pain: at least 15 seconds staring at the screen after login. Then I moved to the SSD and it got considerably faster, down to about 5 seconds. With 8GB of RAM barely used, I blame the CPU. It’s not terrible, but a slight overclock (even to 1.5GHz) would have been a real improvement.

I understand that power consumption and heat are issues, and the whole board design would have to be re-examined to match, but it would be worth it, in my view. I’d have been happier with half the cores at twice the clock.

Finally, I’d really like to be able to purchase the board alone, so I can put it in the case I already have, with the GPU/disk/RAM I want. It doesn’t make much sense to ship a case and a spinning disk halfway across the world just so I can throw them away and buy new ones.

Conclusion

Given that Linux for Arm has been around for at least a decade, it’s no surprise that it works well on the SynQuacer box. The surprise is PCIe x8 working with NVidia cards and running games on open source drivers without crashing. The surprise is that I could connect a large number of DIMMs, GPUs, disks and PCIe network cards without a single glitch.

I have been following the developer team working on the problems I reported early on, and I found a very enthusiastic (and extremely competent) bunch of folks (Linaro, Socionext, Arm), who deserve all the credit for making this a product I would want to buy. Though, I’d actually buy the board, if it came on its own, not the entire box.

It works well as an actual desktop (browser, mail, YouTube and what have you), as a build server for Arm (add more RAM and there you go) and as a home server (NAS, router, firewall). So, I’m quite happy with the results. The setbacks were far fewer and far less severe than I was expecting, or even hoping for (and I’m a pessimist), so thumbs up!

Now, just get one of those to Linus Torvalds and Gabe Newell, and we have successfully started the “year of the Arm desktops”.

FreeCell puzzles solver API

This is a little pet project I did a while ago: an API for writing FreeCell puzzle solvers.

The idea is to provide a basic validation engine and board management (much like my old chess validation), so people can write FreeCell solvers on top of it. It has basic board setup (of multiple sizes), move validation, and a basic Solver class, which you must derive from to create your own solvers.

There’s even a BruteForceSolver that can solve a few small boards, which gives you an idea of how to create your own solvers. However, the API is not yet simple enough for children to start playing with it, and that’s the real goal of this project: to get kids interested in solving complex optimisation problems in an easy way.

FreeCell is a perfect game for this. Most boards can be solved (only a handful have been proven, by exhaustion, not to be solvable), some moves can be rolled back, and you can freely bring cards that have already been placed into the foundations (their final destination) back into the game.

It’s out of the scope of this project to produce a full-featured graphical interface for kids, but making the API simple enough that they can grasp the concepts without dragging themselves through the idiosyncrasies of C++ is important.

Compiler optimisations

The reason I did this was to make some of the optimisations compiler engineers work on more appealing to non-compiler engineers, or to children with a taste for complex problems. But what does this have to do with compilers? The analogy is a bit far-fetched and somewhat reversed, but it’s interesting nevertheless and was worth a shot.

Programming languages are transformed into graphs inside the compiler, which should represent the intentions of the original programmer. These graphs are often optimised multiple times until you end up with a stream of instructions representing the source code in the machine language.

Ignore for now the optimisations on those graphs, and focus on the final part: selecting machine instructions that can represent that final graph in the machine language (assembly). This selection can pick any assembly instruction at will, but it has to put them in a very specific order to preserve the semantics (not just the syntax) of the original program. Since many instructions have side effects, pipeline interactions, or special flags set or cleared, it’s not trivial to produce correct code without checking and re-checking all conditions every time. This is a known complex optimisation problem and can be responsible for changes in speed or code size of orders of magnitude.

What does this have to do with the FreeCell puzzle? Well, in FreeCell, you have a number of cards and you have to put them in a specific order, just like assembly instructions. But here the analogy is reversed: the “sequence” is trivial, but the “instructions” are hard to get right.

There are other similarities. For example, you have four free cells, and each can only hold one card at a time. They are similar to registers, and manipulating them gives you a good taste of how hard register allocation is when building the assembly result. But in this case, it’s much harder to spill (move data back to memory, or here, cards back to the cascades), since there are strict rules on how cards can move.

Reusing cards from the foundations is similar to expanding single instructions into a sequence of them to circumvent pipeline stalls. In real compilers, you could expand a multiply+add (very useful for digital signal processing) into two instructions, multiply and add, if that gives you an advantage in special cases on certain chips. In FreeCell, you can put a 9 on top of a 10 to move an 8 from another cascade and free up a card you need to clean up your free cells (registers).

I’m sure you can find many more similarities, even if you have to skew the rules a bit (or completely reverse them), but that’s not the point. The point is to interest people into complex optimisation techniques without the hassle of learning a whole new section of computer science, especially if that section puts fear in most people in the first place.

Humble Bundle

I’m not normally one to do reviews or ads, but this one is well worth doing. The Humble Bundle is an initiative hosted by Wolfire studio, in which five other studios (2D Boy, Bit Blot, Cryptic Sea, Frictional Games and, recently joined, Amanita Design) put their award-winning indie games into a bundle alongside two charities (the EFF and Child’s Play), and you pay whatever you want, to be shared amongst them.

All games work on Linux and Mac (as well as Windows), are of excellent quality (I loved them) and would cost around 80 bucks separately. The average price paid for the bundle is around $8.50, but some people have already paid $1000. Funny, though, that they’re now splitting the average per platform: Linux users pay, on average, $14 while Windows users pay $7, with Mac in between. A clear message to professional game studios out there, isn’t it?

About the games: they’re the type that are always fun to play and don’t try to be more than they should. There are no state-of-the-art 3D graphics, blood, bullets or zillions of details, but they’re solid, consistent and plain fun. I already had World of Goo (from 2D Boy) and loved it. All the rest I discovered with the bundle, and I have to say I was not expecting them to be this good. The only bad news is that you have only one more day to buy them, so hurry and get your bundle while it’s still available.

The games

World of Goo: Maybe the most famous of all; it’s even available for the Wii. It’s addictive and family friendly, with many tricks and very clever levels to play. It’s a very simple concept: balls stick to other balls and you have to reach the pipe to save them. But what they’ve done with that simple concept is a powerful and very clever combination of physical properties that gives the game an extra challenge. What most impressed me was the way physics is embedded in the game. Things have weight and momentum, sticks break if the momentum is too great, some balls weigh less than air and float, while others burn on contact with fire. A masterpiece.

Aquaria: I thought this would be the least interesting of all, but I was wrong. Very wrong. The graphics and music are very nice and the physics of the game is well built, but the way the game builds up is the best part. It’s a mix of Ecco and Loom, where you’re a sea creature (a mermaid?) who has to sing songs to gain powers or to interact with the game. The more you play, the more you discover and the more powerful you become. Really clever, and a bit more addictive than I was expecting… 😉

Gish: You are a tar ball (not the Unix tar, though) and have to go through tunnels full of dangers to find your tar girl (?). The story is silly, but the game is fun. You can turn slippery or sticky to interact with the maze and some elements that have simple physics, which adds some fun. There are also some enemies to make it more difficult. Sometimes it’s a bit annoying, when it depends more on luck (getting the timing of many things right in a row) than on actual logic or skill. The save system is also not the best: I was on the fourth level and asked for a reset (to restart the fourth level), but it reset the whole thing and sent me back to the first level, which I’m not playing again. The music is great, though.

Lugaru HD: A 3D Lara Croft-style bloody kung-fu bunny. The background story is there more out of the necessity of having one than for actual relevance. The idea is to go around skirmishing, cutting jugulars, sneaking and knocking down characters as you go along. The 3D graphics are not particularly impressive and the camera is not innovative, but the game has some charm for those who like fights for the sake of fighting. Funny.

Penumbra: If you like being scared, this is your game. It’s rated 16+ and you can see very little while playing. But you can hear things growling and your own heart beating, and the best part is when you see something that scares the hell out of you, and in your despair you give away your hideout. The graphics are good, simple but well cared for. The effects (blurs, fades, night vision, fear) are very well done and in sync with the game and story. The interface is pretty simple and impressively easy, making the game much more fun than the traditional FPSes I’ve played so far. The best part is that you don’t fight: you hide and run. It reminds me of Thief, where fighting is the last thing you want to do, with the difference that in Thief you could; in this one, you’re a puss. If you fight, you’ll most likely die.

Samorost 2: It’s a Flash game; that’s all I know. Flash is not particularly stable on any platform, and especially unstable on Linux, so I couldn’t make it run on the first attempt. For me, and most gamers I know, a game has to just work. This is why it’s so hard to play early open source games: you’re looking for a few minutes of fun, not for fiddling with your system. I have spent more time writing this paragraph than trying to play Samorost, and I will only try again when I upgrade my Linux (hoping the Flash problem goes away by itself). A pity.

Well, that’s it. Go and get your Humble Bundle; it’s well worth it, and you help some other people in the process. Helping indie studios is very important to me. First, it levels the playing field and helps them grow. Second, they tend to be much more platform independent, and decent games for Linux are scarce. Last, they tend to have the best ideas. Most game studios license one or two game engines and create dozens of similar games with them, hoping to get more value for their money, and they tend to stick with the ideas that currently sell, instead of innovating.

By buying the bundle you are, at the very least, helping to have better games in the future.