Emergent behaviour

There is a lot of attention to emergent behaviour nowadays (ex. here, here, here and here), but it’s still on the outskirts of science and computing.


For millennia, science has isolated each single behaviour of a system (or system of systems) to study it in detail, than join them together to grasp the bigger picture. The problem is that, this approximation can only be done with simple systems, such as the ones studied by Aristotle, Newton and Ampere. Every time scientists were approaching the edges of their theories (including those three), they just left as an exercise to the reader.

Newton has foreseen relativity and the possible lack of continuity in space and time, but he has done nothing to address that. Fair enough, his work was much more important to science than venturing throughout the unknowns of science, that would be almost mystical of him to try (although, he was an alchemist). But more and more, scientific progress seems to be blocked by chaos theory, where you either unwind the knots or go back to alchemy.

Chaos theory exists for more than a century, but it was only recently that it has been applied to anything outside differential equations. The hyper-sensibility of the initial conditions is clear on differential systems, but other systems have a less visible, but no less important, sensibility. We just don’t see it well enough, since most of the other systems are not as well formulated as differential equations (thanks to Newton and Leibniz).

Neurology and the quest for artificial intelligence has risen a strong interest in chaos theory and fractal systems. The development in neural networks has shown that groups and networks also have a fundamental chaotic nature, but more importantly, that it’s only though the chaotic nature of those systems that you can get a good amount of information from it. Quantum mechanics had the same evolution, with Heisenberg and Schroedinger kicking the ball first on the oddities of the universe and how important is the lack of knowledge of a system to be able to extract information from it (think of Schroedinger’s cat).

A network with direct and fixed thresholds doesn’t learn. Particles with known positions and velocities don’t compute. N-body systems with definite trajectories don’t exist.

The genetic code has some similarities to these models. Living beings have far more junk than genes in their chromosomes (reaching 98% of junk on human genome), but changes in the junk parts can often lead to invalid creatures. If junk within genes (introns) gets modified, the actual code (exons) could be split differently, leading to a completely new, dysfunctional, protein. Or, if you add start sequences (TATA-boxes) to non-coding region, some of them will be transcribed into whatever protein they could make, creating rubbish within cells, consuming resources or eventually killing the host.

But most of the non-coding DNA is also highly susceptible to changes, and that’s probably its most important function, adapted to the specific mutation rates of our planet and our defence mechanism against such mutations. For billions of years, the living beings on Earth have adapted that code. Each of us has a super-computer that can choose, by design, the best ratios for a giving scenario within a few generations, and create a whole new species or keep the current one adapted, depending on what’s more beneficial.

But not everyone is that patient…


Sadly, in my profession, chaos plays an important part, too.

As programs grow old, and programmers move on, a good part of the code becomes stale, creating dependencies that are hard to find, harder to fix. In that sense, programs are pretty much like the genetic code, the amount of junk increases over time, and that gives the program resistance against changes. The main problem with computing, that is not clear in genetics, is that the code that stays behind, is normally the code that no one wants to touch, thus, the ugliest and most problematic.

DNA transcriptors don’t care where the genes are, they find a start sequence and go on with their lives. Programmers, we believe, have free will and that gives them the right to choose where to apply a change. They can either work around the problem, making the code even uglier, or they can go on and try to fix the problem in the first place.

Non-programmers would quickly state that only lazy programmers would do the former, but more experienced ones will admit have done so on numerous occasions for different reasons. Good programmers would do that because fixing the real problem is so painful to so many other systems that it’s best to be left alone, and replace that part in the future (only they never will). Bad programmers are not just lazy, some of them really believe that’s the right thing to do (I met many like this), and that adds some more chaos into the game.

It’s not uncommon to try to fix a small problem, go more than half-way through and hit a small glitch on a separate system. A glitch that you quickly identify as being wrongly designed, so you, as any good programmer would do, re-design it and implement the new design, which is already much bigger than the fix itself. All tests pass, except the one, that shows you another glitch, raised by your different design. This can go on indefinitely.

Some changes are better done in packs, all together, to make sure all designs are consistent and the program behaves as it should, not necessarily as the tests say it would. But that’s not only too big for one person at one time, it’s practically impossible when other people are changing the program under your feet, releasing customer versions and changing the design themselves. There is a point where a refactoring is not only hard, but also a bad design choice.

And that’s when code become introns, and are seldom removed.


The power of networks is rising, slower than expected, though. For decades, people know about synergy, chaos and emergent behaviour, but it was only recently, with the quality and amount of information on global social interaction, that those topics are rising again in the main picture.

Twitter, Facebook and the like have risen so many questions about human behaviour, and a lot of research has been done to address those questions and, to a certain extent, answer them. Psychologists and social scientists knew for centuries that social interaction is greater than the sum of all parts, but now we have the tools and the data to prove it once and for all.

Computing clusters have being applied to most of the hard scientific problems for half a century (weather prediction, earthquake simulation, exhaustion proofs in graph theory). They also took on a commercial side with MapReduce and similar mechanisms that have popularised the distributed architectures, but that’s only the beginning.

On distributed systems of today, emergent behaviour is treated as a bug, that has to be eradicated. In the exact science of computing, locks and updates have to occur in the precise order they were programmed to, to yield the exact result one is expecting. No more, no less.

But to keep our system out of emergent behaviours, we invariably introduce emergent behaviour in our code. Multiple checks on locks and variables, different design choices for different components that have to work together and the expectancy of precise results or nothing, makes the number of lines of code grow exponentially. And, since that has to run fast, even more lines and design choices are added to avoid one extra operation inside a very busy loop.

While all this is justifiable, it’s not sustainable. In the long run (think decades), the code will be replaced or the product will be discontinued, but there is a limit to which a program can receive additional lines without loosing some others. And the cost of refactoring increases with the lifetime of a product. This is why old products don’t get too many updates, not because they’re good enough already, but because it’s impossible to add new features without breaking a lot others.

Distant future

As much as I like emergent behaviour, I can’t begin to fathom how to harness that power. Stochastic computing is one way and has been done with certain level of success here and here, but it’s far from easy to create a general logic behind it.

Unlike Turing machines, emergent behaviour comes from multiple sources, dressed in multiple disguises and producing far too much variety in results that can be accounted in one theory. It’s similar to string theory, where there are several variations of it, but only one M theory, the one that joins them all together. The problem is, nobody knows how this M theory looks like. Well, they barely know how the different versions of string theory look like, anyway.

In that sense, emergent theory is even further than string theory to be understood in its entirety. But I strongly believe that this is one way out of the conundrum we live today, where adding more features makes harder to add more features (like mass on relativistic speeds).

With stochastic computing there is no need of locks, since all that matter is the probability of an outcome, and where precise values do not make sense. There is also no need for NxM combination of modules and special checks, since the power is not in the computation themselves, but in the meta-computation, done by the state of the network, rather than its specific components.

But that, I’m afraid, I won’t see in my lifetime.

FreeCell puzzles solver API

This is a little pet project I did a while ago. It’s a FreeCell puzzle‘s solver API.

The idea is to provide a basic validation engine and board management (pretty much like my old chess validation), so people can write FreeCell solvers on top of it. It has basic board setup (of multiple sizes), movement legalisation, and a basic Solver class, which you must derive to create your own solvers.

There’s even a BruteFroceSolver that can solve a few small boards, and that gives you an idea on how to create your own solvers. However, the API is not clear enough yet that children could start playing with it, and that’s the real goal of this project: to get kids interested in solving complex optimisation problems in an easy way.

Freecell is a perfect game for it. Most boards can be solved (only a handful of them were proven – by exhaustion – not solvable), some movements can be rolled back and you can freely re-use cards that have already been placed into the foundations (their final destination) back in the game again.

It’s out of the scope of this project to produce a full-featured graphic interface for kids, but making the API easy enough so they understand the concepts without dragging themselves into idiosyncrasies of C++ is important.

Compiler optimisations

The reason why I did this was to make some of the optimisations compiler engineers have to do more appealing to non-compiler engineers or children with a taste for complex problems. But what does this have to do with compilers? The analogy is a bit far-fetching and somewhat reverse, but it’s interesting nevertheless and it was worth the shot.

Programming languages are transformed into graphs inside the compiler, which should represent the intentions of the original programmer. This graphs are often optimised multiple times until you end up with a stream of instructions representing the source code in the machine language.

Ignore for now the optimisations on those graphs, and focus on the final part: selecting machine instructions that can represent that final graph in the machine language (assembly). This selection can pick any assembly instruction at will, but it has to put them into a very specific order to represent the same semantics (not just syntactic) of the original program. Since many instructions have side effects, pipeline interactions, special flags set or cleaned up, it’s not trivial to produce correct code if you don’t check and re-check all conditions every time. This is a known complex optimisation problem and can be responsible for changes in speed or code size in orders of magnitude.

What does it have to do with the Freecell puzzle? Well, in Freecell, you have a number of cards and you have to put them in a specific order, just like assembly instructions. But in this case, the analogy is reverse: the “sequence” is trivial, but the “instructions” are hard to get.

There are other similarities. For example, you have four free cells, and they can only hold one value at a time. They are similar to registers, and manipulating them, gives you a good taste of how hard it is to do register scheduling when building the assembly result. But in this case, it’s much harder to spill (move the data back to memory, or in this case, card back to cascades), since there are strict rules on how to move cards around.

Reusing cards from the foundations is similar to expanding single instructions into a sequence of them in order to circumvent pipeline stalls. In real compilers you could expand a multiply+add (very useful for digital signal processing) into two instructions: multiply and add, if that gives you some advantage on special cases on special chips. In Freecell, you can use a 9 on top of a 10, to move an 8 from another cascade and free up a card that you need to clean up your freecells (registers).

I’m sure you can find many more similarities, even if you have to skew the rules a bit (or completely reverse them), but that’s not the point. The point is to interest people into complex optimisation techniques without the hassle of learning a whole new section of computer science, especially if that section puts fear in most people in the first place.

2011 European User Group Meeting

For all LLVM users in Europe that didn’t catch the announcement on the main list, be warned that we’re organising the European LLVM User Group Meeting in September, 16th, in London. All information necessary to register and submit your work is on the link.

It’s free of charge and we’ll also provide a happy-hour after the meeting for an extended networking. If you’re just curious about LLVM or work with it day to day or is thinking of starting your PhD with LLVM, please register and show up. You won’t regret. 😉

More information will be available on the link above and through the main mailing list. Stay tuned!

Task Driven Computing

Ever since Moore’s idea became a law (by providence), and empires were built upon this law, little has been thought about the need for such advancements. Raw power is considered to many the only real benchmark to what a machine can be compared to others. Cars, computers and toasters are all alike in those matters, and are only as good as their raw throughput (real or not).

With the carbon footprint disaster, some people began to realise (not for the correct reasons) that maybe we don’t actually need all that power to be happy. Electric cars, low-powered computers and smart-appliances are now appealing to the final consumer and, for good or bad, things are changing. The rocketing growth of the mobile market (smartphones, netbooks and tablets) in recent years is a good indicator that the easily seduced consumer mass has now being driven towards leaner, more efficient machines.

But, how lean are we ready to go? How much raw power are we willing to give away. In other words, how far goes the appeal that the media push on us to relinquish those rights bestowed by Moore? It seems not so much, with all chip companies fighting for a piece of the fat market (as well as the lean, but).

What is the question, anyway?

Ever since that became a trend, the question has always been: “how lean can we make our machine without impacting on usability?”. The focus so far has only been on creating smarter hardware, to a lesser extent (and only recently) reducing the unneeded fat of operating systems and applications, but no one ever touches the fundamental question: “Do we really need all that?“.

The questions is clearly cyclic. For example, you wouldn’t need a car if the public transport was decent. You wouldn’t need health insurance if the public health system was perfect, and so on. With computing is the same. If you rely on a text editor or a spreadsheet, it has to be fast and powerful, so you can finish your work on time (and not get fired). If you are a developer and have to re-compile your code every so often, you need a damn good (in CPU and memory) computer to make it as painless as possible. Having a slow computer can harm the creative process that involves all tasks around it, and degrade the quality of your work to an unknown quantity.

Or does it?

If you didn’t have to finish your work quicker, would you still work the same way? If you didn’t have to save your work, or install additional software, just because the system you’re working on only works on a particular type of computer (say, only available on your workplace). If you could perform tasks as tasks and not a whole sequence of meaningless steps and bureaucracy, would you still take that amount of time to finish your task?

Real world

Even though the real world is not that simple, one cannot take into account the whole reality on each investigation. Science just doesn’t work that way. To be effective, you take out all but one variable and test it. One by one, until you have a simplified picture, a model of reality. If on every step you include the whole world, the real world, in your simulations, you won’t get far.

There is one paper that touched some of these topics back in 2000, and little has changed since then. I dare to say that it actually got worse. With all these app stores competing for publicity and forcing incompatibility with invisible boundaries, has only made matters worse. It seems clear enough for me that the computing world, as far I can remember (early 80’s) was always like that and it’s not showing signs of change so far.

The excuse to keep doing the wrong thing (ie. not thinking clearly about what a decent system is) was always because “the real world is not that simple”, but in fact, the only limitation factor has been the greed of investors who cannot begin to understand that a decent system can bring more value (not necessarily money) than any quickly designed and delivered piece of software available today.

Back in the lab…

Because I don’t give a fig to what they think, I can go back to the lab and think clearly. Remove greed, profit and market from the table. Leave users, systems and what’s really necessary.

Computers were (much before Turing)  meant to solve specific problems. Today, general purpose computers create more problems than they solve, so let’s go back to what the problem is and lets try to solve it without any external context: Tasks.

A general purpose computer can perform a task in pretty much the same way as any other, after all, that’s why they’re called “general purpose”. So the system that runs on it is irrelevant, if it does not perform the task, it’s no good. A good example of that are web browsers. Virtually every browser can render a screen, and show surprisingly similar results. A bad example is a text editor, which most of them won’t even open another’s documents, and if they do, the former will do all in its power to make the result horrid in the latter.

Supposing tasks can be done seamlessly on any computer (lets assume web pages for the moment), than does the computer only computes that task, or is it doing other things as well?

All computers I know of will be running, even if broken, until they’re turned off. Some can increase and decrease their power consumption, but they’ll still be executing instructions to the world’s end. According to out least-work principle (to execute tasks), this is not particularly relevant, so we must take that out of our system.

Thus, such a computer can only execute when a task is requested, it must complete that task (and nothing else more), and stop (really, zero watts consumption) right after that.

But this is madness!

A particular task can take longer to execute, yes. It’ll be more difficult to execute simultaneous tasks, yes. You’ll spend more cycles per particular task than usual, yes! So, if you still thinking like Moore, than this is utter madness and you can stop reading right now.

Task Driven Computing

For those who are still with me, let me try to convince you. Around 80% of my smartphone’s battery is consumed by the screen. The rest is generally spend on background tasks (system daemons) and only about 5% on real tasks. So, if you could remove 95% of your system’s consumption, you could still take 20x more power consumption for your tasks and be even.

Note that I didn’t say “20x the time”, for that’s not necessarily true. The easiest way to run multiple tasks at the same time is to have multiple CPUs in a given system. Today that doesn’t scale too well because the operating systems have to control them all, and they all just keep running (even when idle) and wasting a huge amount of power for nothing.

But if your system is not designed to control anything, but to execute tasks, even though you’ll spend more time per task, you’ll have more CPUs working on tasks and less on background maintenance. Also, once the task is done, the CPU can literally shut down (I mean, zero watts) and wait for the next task. There is no idle cost, there is no operational code being run to multi-task or to protect memory or avoid race-conditions.


Of course, that’s not as easy as it sounds. Turning on and off CPUs is not that trivial, running tasks with no OS underneath (and expecting them to communicate) is not an easy task, and fitting multiple processors into a small chip is very expensive. But, as I said earlier, I’m not concerned with investors, market or money, I’m concerned with technology and it’s real purpose.

Also, the scaling is a real problem. Connection Machines were built and thrown away, clusters have peak performance way above their average performance levels, and multi-core systems are hard to work with. Part of that is real, the interconnection and communication parts, but the rest was artificially created by operating systems to solve new problems in an old way, just because it was cheaper, or quicker, or easier.

Back in the days…

I envy the time of the savants, when they had all the time and money in the world to solve the problems of nature. Today, the world is corrupted by money and even the most prominent minds in science are corrupt by it, trying to be the first to do such and such, protecting research from other peers just to claim a silly Nobel prize or to be world famous.

The laws of physics had led us into it, we live in the local minima of the least energetic configuration possible, and that’s here, now. To get our of any local minima we need a good kick, something that will take us out in a configuration of a more energetic configuration, but with enough luck, we’ll fall into another local minima that is less energetic than this one. Or, we we’re really the masters of the universe, maybe we can even live harmoniously in a place of local maxima, who knows!?

C++ class sizes #2

In my previous post about class sizes, we visited the empty classes and how their derived classes change in size. Now, we’re going to look what the ABI have to say about tail padding in inherited classes.

Again, for normal (POD) structures, everything is as it looks like, same as in C. But for more complex (non-POD) classes, things change a bit. The Itanium C++ ABI has a rather complex rule for defining the offset and size of class members, but each target-specific ABI (like ARM’s) can have additional rules that make it very complex to define the size of a class member.


So, first, a brief explanation of POD. POD is a concept in C++ that states, for the purpose of layout, a C++ structure (that is more than simple C structures) are just like plain old data. This normally means that the structure behaves like data, ie. it can be created on zero-initialized memory, copied with memcpy without any side effect, etc.

But not being a POD doesn’t always means you have a complex virtual class with multiple inheritance. Sometimes a simple change can transform it into a non-POD, like adding a simple empty constructor to the base class.

For example, in both C and C++, the structure below has 8 bytes:

struct Foo {
  int a;
  char b;

because the structure alignment is the same as the most aligned member, which is int. Assuming int is 32 bits, the second member has size one but aligns to four, yielding a size 8 for the final structure.

In C++, you can derive from that structure, and the derived class is normally implemented by having the base class as sort of a member inside it.

So, the C++ struct Bar below:

struct Foo {
	int a;
	char b;

struct Bar : Foo {
	char c;

is similar (for layout purposes) to a C construct:

struct Bar {
	struct {
		int a;
		char b;
	char c;

So, both in C and in C++, the size of Bar is 12 bytes. Now, doing a slight change in Foo, we’re going to break that:

struct Foo {
	int a;
	char b;
	Foo() {}

struct Bar : Foo {
	char c;

You may be surprised to know that the size of Bar is still 8 bytes.

non-POD structures

Adding the constructor is one way of killing the POD-ness of a structure, but anything that adds some C++ flavour to it is enough. For instance, Bar itself is a non-POD, because it was derived from another structure. Adding simple methods (non constructors) is fine, but adding a virtual method makes it a non-POD (as you have to add the virtual table). Anything that leaves some room for implementation detail will be considered a violation of the POD contract and will transform your structure to a non-POD.

Ok, so what does that mean? If it just means your structures will be smaller, well, I’ll add a constructor to all my structures! But, no, the only change in in the tail padding. If you want to compress your structures, use packed structures, the tail padding rules are too complex to deal with them and still maintain your code clean.

So, the original Foo structure had 8 bytes. Four for the first int, Four for the char. But the char only used one of those four, so you have a 3-byte tail. When inheriting from a non-POD structure, the C++ ABI allows you to use the tail padding of the base class, providing your new type fits in there. In our example, the new type was char that fits in the 3-byte tail of the base class, so the final layout uses two of the four bytes and still have a 2-byte tail.

Further inheritance of the same type (adding a single char member) will still pack at the end of the base structure.


Every time you think you have understood the rules of C++ layout, go back to bit-fields and you’ll always be surprised.

Lets have a look at this example for a moment:

struct Foo {
	char a;
	int b : 8;

struct Bar : Foo {
	int c : 8;

The first member is a char, so one byte. The second is an int, so 4 bytes and the final layout is 5 bytes, aligned to int goes to 8 bytes, right? Well, no. Even without the constructor, the size of that structure is 4 bytes.

Compilers are smart enough (and the ABI allows them to be) to figure that the member b will have only 8 bits, so there is no point in wasting 6 more bytes just to follow the declared type.

You may not be surprised, now, to know that the size of Bar is still 4 bytes, since the member c in Bar gets tail-padded into the base Foo.

But what about this:

struct Bar  {
	struct {
		char a;
		char a1;
		char a2;
		int b : 4;
	int c : 4;

The three chars push the final member to the last byte in the block, of which only 4 bits are really used, so you have 4 bits of tail-padding. The second member c of Bar gets tail-padded into those four bits, since they fit, and the final size of that structure is 4 bytes.

I hope to have convinced you to never try to be smart and convert your structures and classes to arrays or pointers. Even disregarding virtual inheritance (which is probably coming in another post), the C++ ABI for structure layout is too complex to rely on and use it in your own code. Worse, because this is not in the standard, concrete implementations are free to change the details and sometimes, still following the generic ABI. You’ll find a few cases in the ARM ABI where they differ slightly from the ABI without actually being incompatible, sometimes not that lucky.

The final lesson is: be smart, use the language features and rely on the compiler to optimise things for you. If you’re not happy with the optimization, fill a bug rather than hack your code.

C++ class sizes

In a recent struggle to define the class size in C++, I thought would be good to share some of my findings, since there isn’t much about it on the net. The empty class size is all over the place, neatly put in Stroustroup’s FAQ, but the other weird cases were less common to find.

For obvious reasons, the C++ standard is pretty vague about class sizes and memory layout. This is more of an ABI issue than a language standard, but the problem is that even the Itanium C++ ABI is a bit vague on that point.

The C++ standard allows the memory layout to be defined by the ABI and position the class members on where it fits more naturally on the target, but there is one specific clause: an object cannot have zero size. While most layout decisions are target specific, this is strictly a language decision. If objects were allowed to have zero bytes in size, an array of 1000 zero-sized objects would occupy no space at all and it would be impossible to choose one from the other.

Empty classes

First, a background on empty classes. An empty class is somewhat useless in most cases, but there is one case in which it plays an important role: type safety.

If you want to group objects that have nothing in common (to use in an algorithm), or to force a specific class of template parameters, or even differentiate between exceptions, even if they don’t have any data on it (just to catch them differently), you can use an empty class and it does the job pretty well. Thus, using and deriving from empty classes is somewhat a common enterprise in C++.

Empty class size

So, what happens when you declare an empty type?

class Empty {

What is the size that objects of this type will have? It cannot be zero, as the standard forbids, so it has to be something. The best choice is one byte, since no architecture will benefit from having less (please, don’t think of bit-fields!) and if alignment is a problem, most (all?) compilers will adjust the byte to a word aligned block in memory.

Obviously, an empty class that derives from another empty class also has to follow the same rule:

class Empty {

class NowWhat : public Empty {

In this case, both empty and NowWhat have size 1. But if NowWhat has data on it, will the extra byte still be there? On both standard and ABI, there is nothing saying that it shouldn’t, but also nothing saying it should. What’s important here is the reason why you need the extra byte: the address of different objects must be different.

When the derived class already has data, its objects will already be at different locations when laid out in memory, so there is no necessity of the extra byte.

class Empty {

class NowWhat : public Empty {
  int a;

Now, Empty still has 1 byte, but NowWhat has 4, not 5. Some people consider this an optimization, I just think it’s following the rules by what the standard requires.

The reverse case is much simpler. If the derived class is the empty one (for whatever reason), the size is already non-zero, so the classes below have both the same sizes (4):

class Full {
	int a;

class Empty : public Full {

Multiple Inheritance

Multiple inheritance brings a bit of colour to this problem. The C++ standard requires that two objects always occupy different memory locations (so they compare differently by their pointers and don’t allow non-empty zero-sized arrays). So what is the size of the Derived class below?

class Empty {

class M1 : public Empty {

class M2 : public Empty {

class Derived : public M1, M2 {

Before, we could make the NowWhat class with only one byte because that would be sufficient to discriminate it in an array, but what about comparison. If you cast a Derived object to Empty via M1 you get a different (logical) object than if you do it via M2, so they must compare differently. For that, you need to add another byte, and return the first address when via one class and the second via the other.

Empty members

With single and multiple inheritance nailed down, let’s think of a more complicated case.

class Empty {

class Another : public Empty {

class NotSoMuch : public Empty {
	Another a;

What happening here is that the first item of the derived class is, in fact, of an empty class type, that also derives from the same class as the type.

See, this last bit is important, because if you replace Another a; by another empty class type (say Empty2), the size of the derived class will still be 1. But in this case above, this is not true. The size of NotSoMuch is actually 2.

The field Another a; has size one, as any empty class objects, so what’s the second byte for? The second byte is a padding, at the beginning of the class, to avoid type conflicts.

Type conflicts

When deriving from empty classes, the ABI (2.4-II-3) states that you should always try to put the derived members at zero offset and, in case of type conflict, move down (alignment-wise) until you don’t have any more type conflicts. The question is, what is a type conflict in this case? In the multiple inheritance, the type conflict was clear (the diamond inheritance), but here, not so much.

Since both the class and its first member can be converted to Empty types (as pointers, for instance), you can have two Empty objects that, if compared must return different addresses. In the code below, n is (a pointer to) the derived object. Itself is converted to an Empty pointer, stored in en.

	NotSoMuch *n = new NotSoMuch();
	Empty *en = (Empty*)n;
	Empty *ea = (Empty*)&n->a;

If the member Another a; was in the offset zero of the class, the pointer ea would be in the same address as en, and on a comparison for equality, it’d return true.

The type conflict in this case is that, while both en and ea are pointers to an Empty type, the former gets there via NotSoMuch and the latter via Another. According to the C++ standard, they must be different, thus, return different addresses.

Again, if the empty class member is not the first element, none of that happens. The example below has size 4 (instead of 5), because the two similar (but different) Empty pointers would now be separated by 4 bytes, thus not violating the standard.

class Empty {

class Another : public Empty {

class NotSoMuch : public Empty {
	int i;
	Another a;


Template code is also susceptible to this problem, of course, and the iostream library is full of such examples. The problem is not so much for the abuse of empty classes (that doesn’t happen that much in STL), it’s just that non-virtual classes that only have member functions and no data fields are considered empty classes. And since templates are a good replacement for most virtual inheritance (and its inherent run-time complexity and slowness), basic libraries have to make extensive use of templates to avoid slowing down every single C++ program.

It’s true that the iostream library is particularly heavy and cumbersome, but it could be much worse.

The additional problem with templates (for the compiler, at least), is figuring out if there is a type conflict. While templates are already expanded by the time the basic C++ compiler kicks in, it’s complicated to carry the information on the source code about inheritance graph to every single template variation and specialization. Not to mention the new standard, where C++ is getting a variadic templates mechanism

I hope this is one more reason for you to avoid C-style pointer casts (especially void*) and only use C++ static_cast and dynamic_cast. Even reinterpret_cast is dangerous, but at least you can grep them on your source files and identify them one by one.

Computer Science vs Software Engineering

The difference between science and engineering is pretty obvious. Physics is science, mechanics is engineering. Mathematics is (ahem) science, and building bridges is engineering. Right?

Well, after several years in science and far too much time in software engineering that I was hoping to tell my kids when they grow up, it seems that people’s beliefs are much more exacerbated about the difference, if there’s any, than their own logic seems to imply.


General beliefs that science is more abstract fall apart really quickly when you compare maths to physics. There are many areas of maths (statistics, for example) that are much more realistic and real world than many parts of physics (like string theory and a good part of cosmology). Nevertheless, most scientists will turn their noses up at or anything that resembles engineering.

From different points of view (biology, chemistry, physics and maths), I could see that there isn’t a consensus on what people really consider a less elaborate task, not even among the same groups of scientists. But when faced with a rejection by one of their colleagues, the rest usually agree on it. I came to the conclusion that the psychology of belonging to a group was more important than personal beliefs or preferences. One would expect that from young schoolgirls, not from professors and graduate students. But regardless of the group behaviour, there still is that feeling that tasks such as engineering (whatever that is) are mundane, mechanical and more detrimental to the greater good than science.

Real World

On the other side of the table, the real world, there are people doing real work. It generally consists of less thinking, more acting and getting things done. You tend to use tables and calculators rather than white boards and dialogue, your decisions are much more based on gut feelings and experience than over-zealously examining every single corner case and the result of your work is generally more compact and useful to the every-day person.

From that perspective, (what we’re calling) engineers have a good deal of prejudice towards (what we called) scientists. For instance, the book Real World Haskell is a great pun from people that have one foot on each side of this battle (but are leaning towards the more abstract end of it). In the commercial world, you don’t have time to analyse every single detail, you have a deadline, do what you can with that and buy insurance for the rest.

Engineers also produce better results than scientists. Their programs are better structured, more robust and efficient. Their bridges, rockets, gadgets and medicines are far more tested, bullet-proofed and safe than any scientist could ever hope to do. It is a misconception that software engineers have the same experience than an academic with the same time coding, as is a misconception that engineers could as easily develop prototypes that would revolutionise their industry.

But even on engineering, there are tasks and tasks. Even loathing scientists, those engineers that perform a more elaborate task (such as massive bridges, ultra-resistant synthetic materials, operating systems) consider themselves above the mundane crowd of lesser engineers (building 2-bed flats in the outskirts of Slough). So, even here, the more abstract, less fundamental jobs are taken at a higher level than the more essential and critical to society.

Is it true, then, that the more abstract and less mundane a task is, the better?


Since the first thoughts on general purpose computing, there is this separation of the intangible generic abstraction and the mundane mechanical real world machine. Leibniz developed the binary numeral system, compared the human brain to a machine and even had some ideas on how to develop one, someday, but he ended up creating some general-purpose multipliers (following Pascal’s design for the adder).

Leibniz would have thrilled in the 21th century. Lots of people in the 20th with the same mindset (such as Alan Turin) did so much more, mainly because of the availability of modern building techniques (perfected for centuries by engineers). Babbage is another example: he developed his differential machine for years and when he failed (more by arrogance than anything else), his analytical engine (far more elegant and abstract) has taken his entire soul for another decade. When he realised he couldn’t build it in that century, he perfected his first design (reduced the size 3 times) and made a great specialist machine… for engineers.

Mathematicians and physicists had to do horrible things (such as astrology and alchemy) to keep their pockets full and, in their spare time, do a bit of real science. But in this century this is less important. Nowadays, even if you’re not a climate scientist, you can get a good budget for very little real applicability (check NASA’s funded projects, for example). The number of people working in string theory or trying to prove the Riemann hypothesis is a clear demonstration of that.

But computing is still not there yet. We’re still doing astrology and alchemy for a living and hoping to learn the more profound implications of computing on our spare time. Well, some of us at least. And that comes to my point…

There is no computer science… yet

The beginning of science was marked by philosophy and dialogue. 2000 years later, man kind was still doing alchemy, trying to prove the Sun was the centre of the solar system (and failing). Only 200 years after that that people really started doing real science, cleansing themselves from private funding and focusing on real science. But computer science is far from it…

Most computer science courses I’ve seen teach a few algorithms, an object oriented language (such as Java) and a few courses on current technologies (such as databases, web development and concurrency). Very few of them really teach about Turin machines, group theory, complex systems, other forms of formal logic and alternatives to the current models. Moreover, the number of people doing real science on computing (given what appears on arXiv or news aggregation sites such as Ars Technica or Slashdot) is probably less than the number of people working with string theory or wanting a one-way trip to Mars.

So, what do PHDs do in computer science? Well, novel techniques on some old school algorithms are always a good choice, but the recent favourite has been breaking the security of the banking system or re-writing the same application we all already have, but for the cloud. Even the more interesting dissertations like memory models in concurrent systems, energy efficient gate designs are all commercial applications at most.

After all, PHDs can get a lot more money in the industry than remaining at the universities, and doing your PHD towards some commercial application can guarantee you a more senior position as a start in such companies than something completely abstract. So, now, to be honestly blunt, we are all doing alchemy.

Interesting engineering

Still, that’s not to say that there aren’t interesting jobs in software engineering. I’m lucky to be able to work with compilers (especially because it also involves the amazing LLVM), and there are other jobs in the industry that are as interesting as mine. But all of them are just the higher engineering, the less mundane rocket science (that has nothing of science). But all in all, software engineering is a very boring job.

You cannot code freely, ignore the temporary bugs, ask the user to be nice and have a controlled input pattern. You need a massive test infrastructure, quality control, standards (which are always tedious), and well documented interfaces. All that gets in the way of real innovation, it makes any attempt of doing innovation in a real company a mere exercise of futility and a mild source of fun.

This is not exclusive of the software industry, of course. In the pharmaceutical industry there is very little innovation. They do develop new drugs, but using the same old methods. They do need to get new medicines, more powerful out of the door quickly, but the massive amount of tests and regulation they have to follow is overwhelming (this is why they avoid as much as possible doing it right, so don’t trust them!). Nevertheless, there are very interesting positions in that industry as well.

When, then?

Good question. People are afraid of going out of their area of expertise, they feel exposed and ridiculed, and quickly retract to their comfort area. The best thing that can happen to a scientist, in my opinion, is to be proven wrong. For me, there is nothing worse than being wrong and not knowing. Not many people are like that, and the fear of failure is what keeps the industry (all of them) in the real world, with real concerns (this is good, actually).

So, as far as the industry drives innovation in computing, there will be no computer science. As long as the most gifted software engineers are mere employees in the big corporations, they won’t try, to avoid failure, as that could cost them their jobs. I’ve been to a few companies and heard about many others that have a real innovation centre, computer laboratory or research department, and there isn’t a single one of them that actually is bold enough to change computing at its core.

Something that IBM, Lucent and Bell labs did in the past, but probably don’t do it any more these days. It is a good twist of irony, but the company that gets closer to software science today is Microsoft, in its campus in Cambridge. What happened to those great software teams of the 70’s? Could those companies really afford real science, or were them just betting their petty cash in case someone got lucky?

I can’t answer those questions, nor if it’ll ever be possible to have real science in the software industry. But I do plea to all software people to think about this when they teach at university. Please, teach those kids how to think, defy the current models, challenge the universality of the Turin machine, create a new mathematics and prove Gödel wrong. I know you won’t try (by hubris and self-respect), but they will, and they will fail and after so many failures, something new can come up and make the difference.

There is nothing worse than being wrong and not knowing it…

Zero-cost exception handling in C++

C++ exception handling is a controversial subject. The intricate concepts are difficult to grasp, even to seasoned programmers but, in complexity, nothing compares to its implementation. I can’t remember any real-world (commercial or otherwise) product that makes heavy use of exception handling. Even STL itself only throws a couple of them and only upon terminal failures.

Java exceptions, on the other hand, are much more common. Containers throw them at will (ex. OutOfBounds) and creating your own exceptions is very easy, developers are encouraged to use them. But still, some C++ projects’ coding standards hint (like LLVM’s) or specifically disallow (like Google’s) the use of exceptions. The Android NDK, for example, doesn’t even have (nor plans to) support exceptions.

Some of the cons commonly listed are the encouragement of bad style (throwing at will, like in Java), compilation problems with old code and all the gotchas that exception handling can bring to your code, making it a nightmare to understand the real flow of your program and to figure out the critical DON’Ts of exception handling (like throwing from inside a destructor). For a deeper discussion on that subject, read Sutter’s book, for the implementation details of exception handling from a compiler point of view, keep reading.

Three distant worlds

Exception handling is not just a language construction, like if, it’s an intricate interaction between three distant worlds. All the complications that exception handling brings to us have to be implemented by a three-layer infrastructure: the user code, the run-time library and the compiler. The communication between them has to be absolutely perfect, one bit different and the control flow goes south, you start executing random code and blow up your program for good.

Not only it’s three very different places, but the code is written by three different kinds of people, at different times. The only formal document that binds them together is the ABI (ex. the Itanium C++ ABI), which is, to say the least, vague. Luckily, the guys who write the run-time libraries are close enough to the guys writing the compiler (and hopefully, but not necessarily, the same writing the ABI).

In the end, if the compiler implementation is correct, the user only has to care about the rules imposed by the C++ standard. But that’s not too comforting, the C++ standard is complex and frequently driven by previous implementations, rather than a fix and clear rule of thumb. Most users don’t precisely understand why you can’t throw from inside a destructor or why throwing a newly created object is calling terminate on their code, they just avoid it at all costs.

Dirty and expensive

The easy exception handling implementation is called SetJump/LongJump. It literally sets jumps between your code and exception handling code (created by the compiler). If an exception was thrown, the flow jumps directly to the exception handling code. This is very easy, but extremely expensive. Usually, if you follow good practices, you’d expect that your program would spend most of the time doing real computation and only occasionally handling exceptions. But if for every action you create a bunch of exception handling data, you’re wasting a lot of resources to check for something that might not even happen during the whole run of your program.

When an exception happens, you need to go back down the call graph and inspect if any of those functions (including the one you just thrown) is catching that exception. This is called unwinding the stack, and in a SetJump/LongJump exception handling this is done by inspecting if any LongJump was called by a series of SetJump checks. The problem is, that those checks are done even if an exception was not thrown at all.


So, how is it possible to only execute exception handling code when an exception really happen? Simple, let the compiler/library do it for you. 😉

The whole idea of a zero-cost exception handling is to create a set of functions (like __cxa_throw, __cxa_begin_catch) that will deal with the exception handling flow by sending it to a different path, the EH path. Under normal situations, your code flow will be identical with or without exceptions. The flow could never reach the EH blocks and it’s only because the compiler knows those blocks are in the EH path that they don’t get discarded as unreachable code, since there is no explicit flow between user code and EH code.

Only the run-time library knows how to get there, and that’s helped by a table, normally in a read-only data section, specific to exception handling (ex. the .eh_frame ELF section), created by the compiler when reading the user’s exception flow. This is the core interaction between user, compiler and library codes.

When an exception is thrown via __cxa_allocate_exception + __cxa_throw, the first function allocates space and prepare the object to be thrown and the second starts reading the exception handling table. This table contains a list of exceptions that the particular piece of code is catching (often sorted by address, so it’s easy to do a binary search) and the code that will catch the exception.

The code that catches exceptions is called the personality routine. The personality routine library call enables you to use multiple zero-cost EH mechanisms in the same program. Normally, the table entry contains a pointer to the routine and some data it’ll use while unwinding, but some ABIs (like ARM’s EHABI) can add specially crafted tables for their purposes (like having the unwind code as 16-bits instructions embedded in the data section).

All in all, the general flow is the same: the personality routine will find the EH block (also called landing pads, unreachable from normal code) and jump there, where the catching happens. That code will get the exception allocated before the throw and make it available to the user code (inside the catch block) to do what the user requested. If there is no entry in the EH table that catches that exception, the stack will be unwound again and the library will read the caller’s EH table, until it reaches a table that does catch it or terminate if none does.


So far so good, but what happens with the local variables and the temporaries (stack-based data)? If they have non-trivial destructors, they must be called, which could chain a lot of cleaning up to do. Clean-up code is part in the unreachable blocks (when you’re cleaning up EH stuff) and part in normal code, for automatic destruction at the end of a lexical block. Although they do similar things, their purpose is completely different.

Normal flow clean-up blocks are always called and always clean up all objects. But EH clean-up code has to clean up only the objects that have been allocated so far, in the current lexical block or in any parent until the root block of the function. If you can imagine deeply nested blocks, creating objects at will, with a throw inside in the increment part of a for loop, where the object beeing incremented has a non-trivial destructor, you got the picture.

What happens if you throw an exceptions when other was being caught? Here is where the complexity met reality and the boundaries were settled. If you are unwinding one exception, and have a flow being dealt with by the run-time library, and another exception is thrown, you can’t distinguish which exception you are handling. Since there is space for one exception been unwounded at a time, whenever a second exception is thrown, you have to terminate.

Of course, you could argue to create multiple lanes for exception handling, where in every exception handling is handled separately, so you know that one exception was caused when throwing another. That may be easy to implement in Java (or other interpreted/bytecode languages), because the VM/interpreter has total control of the language, but as stated earlier, exception handling in C++ is just a contract between three parties: compiler, run-time library and the user, and often these three parties are completely different groups in completely different space-time realities.


This is just a very brief, shallow demonstration of how complex exception handling can be. Many other pitfalls appear, especially when dealing with specific user code running on a specific hardware platform with a specific binary configuration for the object file. It’s impossible to test every single combination (including room temperature, or annual rainfall forecast) to make it an implementation final and weird bugs creep up every once in a while.

Enhancing the ABI to allow that is possible, but the complexities of doing so, together with the very scarce use of exception handling in C++, makes it a very unlikely candidate to be changed in the near future. It’d have to be done in so many levels (C++ standard, ABIs, compilers, run-time libraries) that it’s best left alone.

Probabilistic processor and the end of locks

Every now and then, when I’m bored with normal work, I tend to my personal projects. For some reason, that tends to be mid-year, every year. So, here we go…

Old ideas, new style

This time was an old time project, to break the parallelization barrier without the use of locks. I knew simple boolean logic on regular Turing machines wouldn’t do, as they rely on the fact that the tape is only written by one head at a time. If you have more than one head writing to the same place, the behaviour is undefined. I’ve looked into several other technologies…

Some were just variations of Turing machines, like molecular, ternary and atomtronic computers. Others were simply interesting toys, like the programmable water, but there were two promising classes: quantum computers (taking a bit too long to impress) and non-deterministic Turing machine (NTM), that could get around some concurrency problems quite well.

Quantum computers are nice, but they’re more like vector computing on steroids. You can operate on several qbits at once, maybe even use non-locality to speed up communications, but until all that reaches the size of a small building, we need something different than yet-another n-core frying-pan Intel design. That may be non-deterministic machines…

Non-deterministic Turing machines are exactly like their deterministic counterparts, but the decision process can take different paths given the same input. So, instead of

if (a == 10) do_something;

you have

if (probability(a) > threshold) do_something;.

The problem with that is that you still can’t ignore different processes writing to the same data, since if a process happens to read it (even if by chance), the data you get from it is still being concurrently changed, therefore, still undefined. In other words, even if the state change is probabilistic, the data is still deterministic.

So, how can you make sure that, no matter how many processes are writing to a piece of data at random times, you still have meaningful data?

Well, if the reading part is not interested into one particular value of that data, but say, only in the probability of that data be a certain value, then it doesn’t matter in which order the writes are performed, as long as they’re all writing about the same thing. That can be viewed as a non-deterministic register, not a non-deterministic Turing machine. You can still have a deterministic machine reading and writing on non-deterministic registers, that you still won’t need locks for those values.

Probabilistic registers

Suppose you want to calculate an integral. You can use several very good deterministic algorithms (like the Romberg’s method), but if you create producers to get the values and consumers to reduce it to a sum, you’ll need a synchronized queue (ie. locks) to make sure the consumer is getting ALL the values. If you loose some values, you might get a completely wrong result.

If instead, you create random generators, getting the value of any (multi-dimensional) function from random values, and accumulate them into a non-deterministic register (say by updating the average of the last values, like recharging a capacitor that keeps discharging), when the consumer process read the value from it, it’ll get an average of the previously written values. Because the writes are all random, and if you keep only the average of the last N writes, you know how much data is being filled in and what’s the average value for that block.

In essence, you’re not storing any particular value, but information enough to re-create the original data. Since specific values are not important in this case, the order in what they appear (or if they appear at all) is completely irrelevant.

In a way, it’s similar to quantum mechanics, where you have the distribution of probabilities but not the values themselves. With a few tricks you can extract every information from those distributions. This is also similar to what Monte Carlo methods are. Taking the probabilistic side of computations in order to achieve a rough image of the problem quicker than by brute-force, when the complexity of the problem would be non-polynomial and the problem size would be big enough to need a few billion years to calculate.

Here are some examples of a non-det register.

A new logic

Obviously, current deterministic algorithms cannot run on such machine. All of them are based on the fact that specific data is stored in a specific place. If you compress a file and uncompress it back again, it’s just impossible to work with probable values, your original file will be completely undecipherable. Also, boolean logic cannot operate on such registers, Operating an AND between probabilities or a XOR of two averages doesn’t make any sense. You need a different logic, something that can make sense of probable inputs and analog streams. Fortunately, Bayesian logic is just the thing for such machines.

But the success of Bayesian logic in computing is not enormous. First, it’s not simple to create algorithms using a network of probabilities, and very few mundane algorithms can benefit from it to be worth the trouble. (Optimor labs is a good example). But with the advent of this new machine, a different logic has to be applied and new algorithms must be developed to make use of it.

Probably by mixing deterministic with non-deterministic hardware and creating a few instructions that would use the few non-deterministic registers (or accumulators) available. Most likely they would be used for inter-process communication in the beginning, but as you increase the number of those registers you can start building complex Bayesian networks or also neural networks in the registers themselves. That would open the door to much more complex algorithms, probably most of them unpredictable or found by trial and error (genetic approach?), but with time, people would begin to think that way and that would be the dawn of the non-deterministic computing.

Eureka moment

The probabilistic register was an Eureka moment, when I finally implemented it and saw that my simple monte-carlo or my neuron model was really a three-line code, but that didn’t last long…

A few days ago I read the news that DARPA has developed a fully featured probabilistic processor, taking random computation to the masses, even using Bayesian logic in place of boolean logic! The obvious place for a “public trial” was spam filtering (though I think the US military is not that interested into filtering spam…), but according to them, the sky is the limit. I do have to agree…

Anyway, I wouldn’t have 5 spare years nor $18mi in the bank to implement that project like they did, so in a way I’m glad that it turned out to be a good idea after all. I just hope that they make good use of it (ie. not use it for Carnivore 2.0), and they realize that not only the military have uses for it, or anti-spam filters, for that matter. We (them?) should develop a whole new set of algorithms to work on that logic and make good use of a hardware that is not a simple progression of current technology (to keep Moore’s law going), but could be a game changing in the next decades.

I could re-focus on defining the basic blocks for that processor’s new logic, but I’m sure DARPA has much better personnel to do that. So, what’s next, then? I’ll tell you next year…