The LLVM compilation infrastructure

I’ve been playing with LLVM (Low-Level Virtual Machine) lately and have produced a simple compiler for a simple language.

The LLVM compilation infrastructure (much more than a simple compiler or virtual machine), is a collection of libraries, methods and programs that allows one to create a simple, robust and very powerful compilers, virtual machines and run-time optimizations.

As GCC, it’s roughly separated into three layers: the front-end, which parses the files and produce intermediate representation (IR), the independent optimization layer, which acts on the language-independent IR and the back-end, which turns the IR into something executable.

The main difference is that, unlike GCC, LLVM is extremely generic. While GCC is struggling to fit broader languages inside the strongly C-oriented IR, LLVM was created with a very extensible IR, with a lot of information on how to represent a plethora of languages (procedural, object-oriented, functional etcetera). This IR also carries information about possible optimizations, like GCC’s but to a deeper level.

Another very important difference is that, in the back-end, not only code generators to many platforms are available, but Just-In-Time compilers (somewhat like JavaScript), so you can run, change, re-compile and run again, without even quitting your program.

The middle-layer is where the generic optimizations are done on the IR, so it’s language-independent (as all languages wil convert to IR). But that doesn’t mean that optimizations are done only on that step. All first-class compilers have strong optimizations from the time it opens the file until it finishes writing the binary.

Parser optimizations normally include useless code removal, constant expression folding, among others, while the most important optimizations on the back-end involve instruction replacement, aggressive register allocation and abuse of hardware features (such as special registers and caches).

But the LLVM goes beyond that, it optimizes during run-time, even after the program is installed on the user machine. LLVM holds information (and the IR) together with the binary. When the program is executed, it profiles automatically and, when the computer is idle, it optimizes the code and re-compile it. This optimization is per-user and means that two copies of the same software will be quite different from each other, depending on the user’s use of it. Chris Lattner‘s paper about it is very enlightening.

There are quite a few very important people and projects already using LLVM, and although there is still a lot of work to do, the project is mature enough to be used in production environments or even completely replace other solutions.

If you are interested in compilers, I suggest you take a look on their website… It’s, at least, mind opening.

40 years and full of steam

Unix is turning 40 and BBC folks wrote a small article about it. What a joy to remember when I started using Unix (AIX on an IBM machine) around 1994 and was overwhelmed by it.

By that time, the only Unix that ran well on a PC was SCO and that was a fortune, but there were some others, not as mature, that would have the same concepts. FreeBSD and Linux were the two that came into my sight, but I have chosen Linux for it was a bit more popular (therefore could get more help).

The first versions I’ve installed didn’t even had a X server and I have to say that I was happier than using Windows. Partially because of all the open-source-free-software-good-for-mankind thing, but mostly because Unix has a power that is utterly ignored by other operating systems. It’s so, that Microsoft used good bits from FreeBSD (that allows it via their license) and Apple re-wrote its graphical environment to FreeBSD and made the OS X. The GNU folks certainly helped my mood, as I could find all power tools on Linux that I had on AIX, most of the time even more powerful.

The graphical interface was lame, I have to say. But in a way it was good, it reminded me of the same interface I used on the Irix (SGI’s Unix) and that was ok. With time, it got better and better and in 1999 I was working with and using it at home full time.

The funny thing is that now, I can’t use other operating systems for too long, as I start missing some functionalities and will eventually get locked, or at least, extremely limited. The Mac OS is said to be nice and tidy, and with a full FreeBSD inside, but I still lacked agility on it, mainly due to search and installation of packages and configuration of the system.

I suppose each OS is for a different type of person… Unix is for the ones that like to fine-tune their machines or those that need the power of it (servers as well) and Mac OS is for those that need something simple, with the biggest change as the background colour. As for the rest, I fail to see a point, really.