Declaration of Internet Freedom

We stand for a free and open Internet.

We support transparent and participatory processes for making Internet policy and the establishment of five basic principles:

  • Expression: Don’t censor the Internet.
  • Access: Promote universal access to fast and affordable networks.
  • Openness: Keep the Internet an open network where everyone is free to connect, communicate, write, read, watch, speak, listen, learn, create and innovate.
  • Innovation: Protect the freedom to innovate and create without permission. Don’t block new technologies, and don’t punish innovators for their users’ actions.
  • Privacy: Protect privacy and defend everyone’s ability to control how their data and devices are used.

Don’t get it? Then you should inform yourself about the power of the Internet and what governments around the world have been doing to it.

Good starting places are: Avaaz, Ars Technica, Electronic Frontier Foundation, End Software Patents, Piratpartiet and the excellent Case for Copyright Reform.

Source: http://www.internetdeclaration.org/freedom

K-means clustering

Clustering algorithms can be used with many types of data, as long as you have a way to distribute them in a space that has a concept of distance. Vectors are the obvious choice, but not everything can be represented as N-dimensional points. Another way to plot data, much closer to real data, is to allow a large number of binary axes, like tags. You can then cluster by the number of tags the entries share, with the distance (only meaningful relative to the other entries) being the proportion of shared tags against non-shared ones.
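Just to make the tag idea concrete, here is a minimal sketch of one such distance. The Jaccard-style measure and the function name are my choice for this post, not anything taken from the examples below:

#include <algorithm>
#include <iterator>
#include <set>
#include <string>

// Distance between two tag sets: 1 - (shared tags / total distinct tags).
// 0 means identical tag sets, 1 means nothing in common.
double tagDistance(const std::set<std::string> &a,
                   const std::set<std::string> &b) {
    std::set<std::string> common;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(common, common.begin()));
    double total = a.size() + b.size() - common.size(); // size of the union
    if (total == 0)
        return 0.0; // both empty: treat them as identical
    return 1.0 - common.size() / total;
}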

An example of tag clustering can be seen on Google News; an example of clustering in Euclidean space can be seen in the image above (full code here). The clustering code is very small, and the result is impressive for such simple code. But the devil is in the details…

Each group of red dots is generated randomly from a given central point (N randomly distributed points inside a circle of radius R centred at C). Each centre is randomly placed, and sometimes the groups collide (as you can see in the image), but that’s part of the challenge. To find the groups, and their centres, I throw random points (with no knowledge of the groups’ centres) and iterate until I find all the groups.

The iteration is very simple and consists of two steps (a small C++ sketch of both follows the list):

  1. Assignment Step: For each point, assign it to the nearest mean. This is why you need the concept of distance, and that’s the tricky part. With Cartesian coordinates, it’s simple.
  2. Update Step: Calculate the real mean of all points assigned to each mean point, and move the mean point there. This is basically moving the supposed (randomly guessed) mean to its rightful place.
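Here is the promised sketch. It is not the code linked above, just a minimal Cartesian version of the two steps, assuming 2D points and a caller that seeds the means at random and loops until the returned movement is small enough:

#include <cmath>
#include <vector>

struct Point { double x, y; };

static double dist(const Point &a, const Point &b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// One full k-means pass: returns how much the means moved, so the caller
// can stop when the movement drops below some threshold.
double kMeansStep(const std::vector<Point> &points, std::vector<Point> &means) {
    // Assignment step: for each point, find the nearest mean.
    std::vector<size_t> owner(points.size());
    for (size_t p = 0; p < points.size(); ++p) {
        size_t best = 0;
        for (size_t m = 1; m < means.size(); ++m)
            if (dist(points[p], means[m]) < dist(points[p], means[best]))
                best = m;
        owner[p] = best;
    }

    // Update step: move each mean to the centroid of the points it owns.
    double moved = 0.0;
    for (size_t m = 0; m < means.size(); ++m) {
        Point sum = {0.0, 0.0};
        size_t count = 0;
        for (size_t p = 0; p < points.size(); ++p)
            if (owner[p] == m) {
                sum.x += points[p].x;
                sum.y += points[p].y;
                ++count;
            }
        if (count == 0)
            continue; // a mean that owns no points keeps its old position
        Point next = {sum.x / count, sum.y / count};
        moved += dist(means[m], next);
        means[m] = next;
    }
    return moved;
}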

On the second iteration, the means, which were randomly selected at first, are now closer to a set of points. Not necessarily points in the same cluster, but the cluster that has more points assigned to a given mean will slowly steal it from the others, since it’ll have more weight when the mean is updated in step 2.

If all goes right, the means will slowly move towards the centre of each group, and you can stop when they no longer move much after the update step.

Many problems will arise in this simplified version, for sure. For instance, if a mean sits exactly in between two groups, both pull it towards their centres with equally strong forces; the mean never moves, and the algorithm thinks it has found its group when in fact it has found two. Or a group may be so large that it ends up with two or more means assigned to it, splitting it into many groups.

To overcome these deficiencies, some advanced forms of k-means take into account the shape of the group during the update step, sometimes called soft k-means. Other heuristics can be added as further steps to make sure there aren’t two means too close to each other (relative to their groups’ sizes), or big gaps between points of the same group, but that kind of heuristic tends to be expensive (quadratic in the size of the group), since it examines every point of a group in relation to the other points of the same group.

All in all, it’s still an impressive performance for such a simple algorithm. Next in line, I’ll try clustering data distributed among many binary axes and see how k-means behaves.

Emergent behaviour

There is a lot of attention on emergent behaviour nowadays (e.g. here, here, here and here), but it’s still on the outskirts of science and computing.

Science

For millennia, science has isolated each single behaviour of a system (or system of systems) to study it in detail, then joined them together to grasp the bigger picture. The problem is that this approximation can only be done with simple systems, such as the ones studied by Aristotle, Newton and Ampère. Every time scientists approached the edges of their theories (including those three), they left the rest as an exercise to the reader.

Newton foresaw relativity and the possible lack of continuity in space and time, but he did nothing to address it. Fair enough, his work was much more important to science than venturing through those unknowns would have been; it would have been almost mystical of him to try (although he was an alchemist). But more and more, scientific progress seems to be blocked by chaos theory, where you either unwind the knots or go back to alchemy.

Chaos theory has existed for more than a century, but only recently has it been applied to anything outside differential equations. The hypersensitivity to initial conditions is clear in differential systems, but other systems have a less visible, though no less important, sensitivity. We just don’t see it well enough, since most other systems are not as well formulated as differential equations (thanks to Newton and Leibniz).

Neurology and the quest for artificial intelligence have raised strong interest in chaos theory and fractal systems. The development of neural networks has shown that groups and networks also have a fundamentally chaotic nature, but more importantly, that it’s only through the chaotic nature of those systems that you can get a good amount of information out of them. Quantum mechanics had the same evolution, with Heisenberg and Schrödinger kicking the ball first on the oddities of the universe and on how important the lack of knowledge of a system is to being able to extract information from it (think of Schrödinger’s cat).

A network with direct and fixed thresholds doesn’t learn. Particles with known positions and velocities don’t compute. N-body systems with definite trajectories don’t exist.

The genetic code has some similarities to these models. Living beings have far more junk than genes in their chromosomes (roughly 98% of the human genome is junk), but changes in the junk parts can often lead to invalid creatures. If junk within genes (introns) gets modified, the actual code (exons) could be spliced differently, leading to a completely new, dysfunctional protein. Or, if you add start sequences (TATA boxes) to a non-coding region, some of it will be transcribed into whatever protein it can make, creating rubbish within cells, consuming resources or eventually killing the host.

But most of the non-coding DNA is also highly susceptible to changes, and that’s probably its most important function: it is adapted to the specific mutation rates of our planet and is our defence mechanism against such mutations. For billions of years, the living beings on Earth have been adapting that code. Each of us carries a super-computer that can choose, by design, the best ratios for a given scenario within a few generations, and create a whole new species or keep the current one adapted, depending on what’s more beneficial.

But not everyone is that patient…

Programming

Sadly, in my profession, chaos plays an important part, too.

As programs grow old and programmers move on, a good part of the code becomes stale, creating dependencies that are hard to find and harder to fix. In that sense, programs are pretty much like the genetic code: the amount of junk increases over time, and that gives the program resistance against changes. The main problem with computing, which is not as clear in genetics, is that the code that stays behind is normally the code no one wants to touch, and thus the ugliest and most problematic.

DNA transcriptors don’t care where the genes are; they find a start sequence and get on with their lives. Programmers, we believe, have free will, and that gives them the right to choose where to apply a change. They can either work around the problem, making the code even uglier, or they can go in and try to fix the problem at its source.

Non-programmers would quickly state that only lazy programmers would do the former, but more experienced ones will admit to having done so on numerous occasions, for different reasons. Good programmers do it because fixing the real problem would be so painful to so many other systems that it’s best left alone, with that part replaced in the future (only it never is). Bad programmers are not just lazy; some of them really believe it’s the right thing to do (I’ve met many like this), and that adds some more chaos to the game.

It’s not uncommon to try to fix a small problem, get more than half-way through and hit a small glitch in a separate system. A glitch that you quickly identify as a design flaw, so you, as any good programmer would, re-design it and implement the new design, which is already much bigger than the original fix. All tests pass, except one, which shows you another glitch raised by your new design. This can go on indefinitely.

Some changes are better done in packs, all together, to make sure all designs are consistent and the program behaves as it should, not necessarily as the tests say it would. But that’s not only too big for one person at one time, it’s practically impossible when other people are changing the program under your feet, releasing customer versions and changing the design themselves. There is a point where a refactoring is not only hard, but also a bad design choice.

And that’s when code becomes introns, and is seldom removed.

Networks

The power of networks is rising, though slower than expected. For decades, people have known about synergy, chaos and emergent behaviour, but only recently, with the quality and amount of information on global social interaction, have those topics come back into the big picture.

Twitter, Facebook and the like have raised many questions about human behaviour, and a lot of research has been done to address those questions and, to a certain extent, answer them. Psychologists and social scientists have known for centuries that social interaction is greater than the sum of its parts, but now we have the tools and the data to prove it once and for all.

Computing clusters have been applied to most of the hard scientific problems for half a century (weather prediction, earthquake simulation, proofs by exhaustion in graph theory). They have also taken on a commercial side with MapReduce and similar mechanisms that have popularised distributed architectures, but that’s only the beginning.

In the distributed systems of today, emergent behaviour is treated as a bug that has to be eradicated. In the exact science of computing, locks and updates have to occur in the precise order they were programmed to, to yield the exact result one is expecting. No more, no less.

But to keep our systems free of emergent behaviour, we invariably introduce emergent behaviour into our code. Multiple checks on locks and variables, different design choices for components that have to work together and the expectation of precise results or nothing make the number of lines of code grow exponentially. And, since all that has to run fast, even more lines and design choices are added to avoid one extra operation inside a very busy loop.

While all this is justifiable, it’s not sustainable. In the long run (think decades), the code will be replaced or the product will be discontinued, but there is a limit to how many lines a program can gain without losing some others. And the cost of refactoring increases with the lifetime of a product. This is why old products don’t get many updates: not because they’re already good enough, but because it’s impossible to add new features without breaking a lot of others.

Distant future

As much as I like emergent behaviour, I can’t begin to fathom how to harness that power. Stochastic computing is one way, and has been done with a certain level of success here and here, but it’s far from easy to create a general logic behind it.

Unlike Turing machines, emergent behaviour comes from multiple sources, dressed in multiple disguises and producing far too much variety in results to be accounted for by one theory. It’s similar to string theory, where there are several variations but only one M-theory, the one that joins them all together. The problem is, nobody knows what this M-theory looks like. Well, they barely know what the different versions of string theory look like, anyway.

In that sense, emergent theory is even further from being understood in its entirety than string theory. But I strongly believe that this is one way out of the conundrum we live in today, where adding more features makes it harder to add more features (like mass at relativistic speeds).

With stochastic computing there is no need for locks, since all that matters is the probability of an outcome, and precise values make no sense. There is also no need for NxM combinations of modules and special checks, since the power is not in the computations themselves, but in the meta-computation done by the state of the network, rather than by its specific components.

But that, I’m afraid, I won’t see in my lifetime.

FreeCell puzzles solver API

This is a little pet project I did a while ago: a FreeCell puzzle solver API.

The idea is to provide a basic validation engine and board management (pretty much like my old chess validation), so people can write FreeCell solvers on top of it. It has basic board setup (of multiple sizes), movement legalisation, and a basic Solver class, which you must derive from to create your own solvers.

There’s even a BruteForceSolver that can solve a few small boards, which gives you an idea of how to create your own solvers. However, the API is not yet clear enough for children to start playing with it, and that’s the real goal of this project: to get kids interested in solving complex optimisation problems in an easy way.
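To give an idea of the shape of such an API, here is a rough sketch of what deriving a solver could look like. The names below (Board, Move, legalMoves, isSolved) are illustrative placeholders for this post, not necessarily the ones the actual API uses:

#include <vector>

class Board;  // board setup, cascades, freecells, foundations
class Move;   // a single, already-validated card movement

class Solver {
public:
    virtual ~Solver() {}

    // Derived solvers implement the search strategy; the engine provides
    // the board state, legal-move generation and validation.
    virtual bool solve(Board &board) = 0;

protected:
    std::vector<Move> legalMoves(const Board &board) const;
    bool isSolved(const Board &board) const;
};

// A brute-force solver would then derive from Solver and recurse over
// legalMoves() until isSolved() returns true or the move list is exhausted.
class MyBruteForce : public Solver {
public:
    virtual bool solve(Board &board) { /* depth-first search goes here */ return false; }
};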

FreeCell is a perfect game for it. Most boards can be solved (only a handful have been proven, by exhaustion, not to be solvable), some movements can be rolled back, and you can freely bring cards that have already been placed in the foundations (their final destination) back into the game.

It’s out of the scope of this project to produce a full-featured graphical interface for kids, but making the API easy enough for them to understand the concepts without dragging themselves into the idiosyncrasies of C++ is important.

Compiler optimisations

The reason I did this was to make some of the optimisations compiler engineers have to do more appealing to non-compiler engineers, or to children with a taste for complex problems. But what does this have to do with compilers? The analogy is a bit far-fetched and somewhat reversed, but it’s interesting nevertheless and it was worth a shot.

Programming languages are transformed into graphs inside the compiler, which should represent the intentions of the original programmer. These graphs are often optimised multiple times until you end up with a stream of instructions representing the source code in the machine language.

Ignore for now the optimisations on those graphs, and focus on the final part: selecting machine instructions that can represent that final graph in the machine language (assembly). This selection can pick any assembly instruction at will, but it has to put them into a very specific order to preserve the semantics (not just the syntax) of the original program. Since many instructions have side effects, pipeline interactions, or special flags that get set or cleared, it’s not trivial to produce correct code without checking and re-checking all conditions every time. This is a known complex optimisation problem and can be responsible for changes of orders of magnitude in speed or code size.

What does it have to do with the FreeCell puzzle? Well, in FreeCell, you have a number of cards and you have to put them in a specific order, just like assembly instructions. But in this case, the analogy is reversed: the “sequence” is trivial, but the “instructions” are hard to get.

There are other similarities. For example, you have four free cells, and they can only hold one value at a time. They are similar to registers, and manipulating them gives you a good taste of how hard register scheduling is when building the assembly result. But in this case, it’s much harder to spill (move the data back to memory, or here, cards back to the cascades), since there are strict rules on how cards can be moved around.

Reusing cards from the foundations is similar to expanding single instructions into a sequence of them in order to circumvent pipeline stalls. In real compilers you could expand a multiply-add (very useful for digital signal processing) into two instructions, a multiply and an add, if that gives you an advantage in special cases on special chips. In FreeCell, you can put a 9 on top of a 10 to move an 8 from another cascade and free up a card that you need to clean up your free cells (registers).

I’m sure you can find many more similarities, even if you have to skew the rules a bit (or completely reverse them), but that’s not the point. The point is to get people interested in complex optimisation techniques without the hassle of learning a whole new section of computer science, especially one that scares most people in the first place.

2011 European User Group Meeting

For all LLVM users in Europe who didn’t catch the announcement on the main list, be warned that we’re organising the European LLVM User Group Meeting on September 16th, in London. All the information necessary to register and submit your work is at the link.

It’s free of charge and we’ll also provide a happy hour after the meeting for extended networking. If you’re just curious about LLVM, work with it day to day or are thinking of starting your PhD with LLVM, please register and show up. You won’t regret it. 😉

More information will be available on the link above and through the main mailing list. Stay tuned!

Task Driven Computing

Ever since Moore’s idea became a law (by providence), and empires were built upon this law, little thought has been given to the need for such advancements. Raw power is considered by many to be the only real benchmark for comparing one machine with another. Cars, computers and toasters are all alike in those matters, and are only as good as their raw throughput (real or not).

With the carbon footprint disaster, some people began to realise (not for the correct reasons) that maybe we don’t actually need all that power to be happy. Electric cars, low-powered computers and smart appliances are now appealing to the final consumer and, for good or bad, things are changing. The rocketing growth of the mobile market (smartphones, netbooks and tablets) in recent years is a good indicator that the easily seduced consumer mass is now being driven towards leaner, more efficient machines.

But how lean are we ready to go? How much raw power are we willing to give away? In other words, how far does the media’s appeal go in convincing us to relinquish those rights bestowed by Moore? Not that far, it seems, with all the chip companies fighting for a piece of the fat market (as well as the lean one).

What is the question, anyway?

Ever since that became a trend, the question has always been: “how lean can we make our machines without impacting usability?”. The focus so far has been on creating smarter hardware and, to a lesser extent (and only recently), on reducing the unneeded fat of operating systems and applications, but no one ever touches the fundamental question: “Do we really need all that?“

The question is clearly cyclic. For example, you wouldn’t need a car if public transport were decent. You wouldn’t need health insurance if the public health system were perfect, and so on. With computing it’s the same. If you rely on a text editor or a spreadsheet, it has to be fast and powerful, so you can finish your work on time (and not get fired). If you are a developer and have to re-compile your code every so often, you need a damn good computer (in CPU and memory) to make it as painless as possible. Having a slow computer can harm the creative process that surrounds all those tasks, and degrade the quality of your work by an unknown amount.

Or does it?

If you didn’t have to finish your work quicker, would you still work the same way? What if you didn’t have to save your work, or install additional software, just because the system you’re working on only runs on a particular type of computer (say, one only available at your workplace)? If you could perform tasks as tasks, and not as a whole sequence of meaningless steps and bureaucracy, would you still take that amount of time to finish them?

Real world

Even though the real world is not that simple, one cannot take the whole of reality into account in each investigation. Science just doesn’t work that way. To be effective, you take out all but one variable and test it. One by one, until you have a simplified picture, a model of reality. If at every step you include the whole world, the real world, in your simulations, you won’t get far.

There is one paper that touched on some of these topics back in 2000, and little has changed since then. I dare say it has actually got worse. All these app stores competing for publicity and forcing incompatibility through invisible boundaries have only made matters worse. It seems clear enough to me that the computing world, for as long as I can remember (the early 80s), has always been like that, and it’s not showing signs of change so far.

The excuse for keeping on doing the wrong thing (i.e. not thinking clearly about what a decent system is) has always been that “the real world is not that simple”, but in fact, the only limiting factor has been the greed of investors, who cannot begin to understand that a decent system can bring more value (not necessarily money) than any quickly designed and delivered piece of software available today.

Back in the lab…

Because I don’t give a fig about what they think, I can go back to the lab and think clearly. Remove greed, profit and market from the table. Leave users, systems and what’s really necessary.

Computers were (long before Turing) meant to solve specific problems. Today, general-purpose computers create more problems than they solve, so let’s go back to what the problem is and try to solve it without any external context: tasks.

A general-purpose computer can perform a task in pretty much the same way as any other; after all, that’s why they’re called “general purpose”. So the system that runs on it is irrelevant: if it does not perform the task, it’s no good. A good example of that is web browsers. Virtually every browser can render a page, and show surprisingly similar results. A bad example is text editors, most of which won’t even open another’s documents, and if they do, the former will do all in its power to make the result look horrid in the latter.

Supposing tasks can be done seamlessly on any computer (let’s assume web pages for the moment), then does the computer only compute that task, or is it doing other things as well?

All computers I know of will keep running, even if broken, until they’re turned off. Some can increase and decrease their power consumption, but they’ll still be executing instructions until the world’s end. According to our least-work principle (execute tasks, nothing more), this is not particularly relevant, so we must take it out of our system.

Thus, such a computer can only execute when a task is requested, it must complete that task (and nothing more), and stop (really, zero watts of consumption) right after that.

But this is madness!

A particular task can take longer to execute, yes. It’ll be more difficult to execute simultaneous tasks, yes. You’ll spend more cycles per task than usual, yes! So, if you’re still thinking like Moore, then this is utter madness and you can stop reading right now.

Task Driven Computing

For those who are still with me, let me try to convince you. Around 80% of my smartphone’s battery is consumed by the screen. The rest is generally spent on background tasks (system daemons), and only about 5% on real tasks. So, if you could remove 95% of your system’s consumption, each task could consume 20 times more power and you’d still break even.

Note that I didn’t say “20x the time”, for that’s not necessarily true. The easiest way to run multiple tasks at the same time is to have multiple CPUs in a given system. Today that doesn’t scale too well, because the operating system has to control them all, and they all just keep running (even when idle), wasting a huge amount of power for nothing.

But if your system is designed not to control anything, but to execute tasks, then even though you’ll spend more time per task, you’ll have more CPUs working on tasks and fewer on background maintenance. Also, once a task is done, the CPU can literally shut down (I mean, zero watts) and wait for the next task. There is no idle cost; there is no operational code being run to multi-task, protect memory or avoid race conditions.

Problems

Of course, it’s not as easy as it sounds. Turning CPUs on and off is not trivial, running tasks with no OS underneath (and expecting them to communicate) is not an easy task, and fitting multiple processors onto a small chip is very expensive. But, as I said earlier, I’m not concerned with investors, the market or money; I’m concerned with technology and its real purpose.

Also, scaling is a real problem. Connection Machines were built and thrown away, clusters have peak performance way above their average performance levels, and multi-core systems are hard to work with. Part of that is real, the interconnection and communication parts, but the rest was artificially created by operating systems solving new problems in an old way, just because it was cheaper, or quicker, or easier.

Back in the days…

I envy the time of the savants, when they had all the time and money in the world to solve the problems of nature. Today, the world is corrupted by money, and even the most prominent minds in science are corrupted by it, trying to be the first to do such and such, protecting research from their peers just to claim a silly Nobel prize or to be world famous.

The laws of physics have led us into it: we live in the local minimum of the least energetic configuration possible, and that’s here, now. To get out of any local minimum we need a good kick, something that will take us into a more energetic configuration, and with enough luck, we’ll fall into another local minimum that is less energetic than this one. Or, if we’re really the masters of the universe, maybe we can even live harmoniously at a local maximum, who knows!?

C++ class sizes #2

In my previous post about class sizes, we visited empty classes and how their derived classes change in size. Now, we’re going to look at what the ABI has to say about tail padding in inherited classes.

Again, for normal (POD) structures, everything is as it looks, the same as in C. But for more complex (non-POD) classes, things change a bit. The Itanium C++ ABI has a rather complex rule for defining the offset and size of class members, and each target-specific ABI (like ARM’s) can have additional rules that make it even harder to define the size of a class member.

Plain-Old-Data

So, first, a brief explanation of POD. POD (plain old data) is a concept in C++ stating that, for the purposes of layout, a C++ structure (which is more than a simple C structure) behaves just like plain old data. This normally means that the structure behaves like raw data, i.e. it can be created in zero-initialised memory, copied with memcpy without any side effects, and so on.

But not being a POD doesn’t always mean you have a complex virtual class with multiple inheritance. Sometimes a simple change can transform it into a non-POD, like adding an empty constructor to the base class.

For example, in both C and C++, the structure below has 8 bytes:

struct Foo {
  int a;
  char b;
};

because the structure’s alignment is that of its most aligned member, which is the int. Assuming int is 32 bits, the second member has size one but aligns to four, yielding a size of 8 for the final structure.

In C++, you can derive from that structure, and the derived class is normally implemented by having the base class as sort of a member inside it.

So, the C++ struct Bar below:

struct Foo {
	int a;
	char b;
};

struct Bar : Foo {
	char c;
};

is similar (for layout purposes) to a C construct:

struct Bar {
	struct {
		int a;
		char b;
	};
	char c;
};

So, both in C and in C++, the size of Bar is 12 bytes. Now, doing a slight change in Foo, we’re going to break that:

struct Foo {
	int a;
	char b;
	Foo() {}
};

struct Bar : Foo {
	char c;
};

You may be surprised to know that the size of Bar is still 8 bytes.

non-POD structures

Adding the constructor is one way of killing the POD-ness of a structure, but anything that adds some C++ flavour to it is enough. For instance, Bar itself is a non-POD, because it derives from another structure. Adding simple methods (non-constructors) is fine, but adding a virtual method makes it a non-POD (as you have to add the virtual table). Anything that leaves room for implementation detail is considered a violation of the POD contract and will turn your structure into a non-POD.

Ok, so what does that mean? If it just means your structures will be smaller, well, I’ll add a constructor to all my structures! But no, the only change is in the tail padding. If you want to compress your structures, use packed structures; the tail-padding rules are too complex to exploit while keeping your code clean.

So, the original Foo structure had 8 bytes: four for the int, four for the char. But the char only used one of those four, so you have a 3-byte tail. When inheriting from a non-POD structure, the C++ ABI allows you to use the tail padding of the base class, provided your new member fits in there. In our example, the new member was a char, which fits in the 3-byte tail of the base class, so the final layout uses two of the four bytes and still has a 2-byte tail.

Further inheritance of the same type (adding a single char member) will still pack at the end of the base structure.
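A quick way to check this on your own machine (the definitions are repeated so the snippet stands alone; the numbers assume an Itanium-style C++ ABI, as used by GCC and Clang, and other ABIs may differ):

#include <cstdio>

struct Foo {
    int a;
    char b;
    Foo() {}
};

struct Bar : Foo {
    char c;
};

struct Baz : Bar {
    char d;  // still fits in Foo's original tail padding
};

int main() {
    // On GCC/Clang (Itanium C++ ABI) this prints 8 8 8: both extra chars
    // are packed into the 3-byte tail of Foo.
    printf("%zu %zu %zu\n", sizeof(Foo), sizeof(Bar), sizeof(Baz));
    return 0;
}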

Bit-fields

Every time you think you have understood the rules of C++ layout, go back to bit-fields and you’ll always be surprised.

Let’s have a look at this example for a moment:

struct Foo {
	char a;
	int b : 8;
	Foo(){}
};

struct Bar : Foo {
	int c : 8;
};

The first member is a char, so one byte. The second is an int, so 4 bytes, and the final layout is 5 bytes, which aligned to int goes to 8 bytes, right? Well, no. Even without the constructor, the size of that structure is 4 bytes.

Compilers are smart enough (and the ABI allows them to be) to figure out that the member b will only have 8 bits, so there is no point in wasting 6 more bytes just to honour the declared type.

You may not be surprised, now, to know that the size of Bar is still 4 bytes, since the member c in Bar gets tail-padded into the base Foo.
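Again, easy to verify (same caveat about the ABI; I’ve renamed the structs so the snippet stands on its own):

#include <cstdio>

struct FooBits {
    char a;
    int b : 8;
    FooBits() {}
};

struct BarBits : FooBits {
    int c : 8;
};

int main() {
    // Expect 4 4 with GCC/Clang: b shares FooBits' first word, and c is
    // packed into the remaining bits of the base's tail.
    printf("%zu %zu\n", sizeof(FooBits), sizeof(BarBits));
    return 0;
}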

But what about this:

struct Bar  {
	struct {
		char a;
		char a1;
		char a2;
		int b : 4;
	};
	int c : 4;
};

The three chars push the final member to the last byte in the block, of which only 4 bits are really used, so you have 4 bits of tail padding. The second member, c, of Bar gets tail-padded into those four bits, since it fits, and the final size of that structure is 4 bytes.

I hope to have convinced you never to try to be smart and convert your structures and classes to arrays or pointers. Even disregarding virtual inheritance (which is probably coming in another post), the C++ ABI for structure layout is too complex to rely on in your own code. Worse, because this is not in the standard, concrete implementations are free to change the details while still following the generic ABI. You’ll find a few cases in the ARM ABI where it differs slightly from the generic one without actually being incompatible; sometimes you’re not that lucky.

The final lesson is: be smart, use the language features and rely on the compiler to optimise things for you. If you’re not happy with the optimisation, file a bug rather than hack your code.

C++ class sizes

In a recent struggle to define class sizes in C++, I thought it would be good to share some of my findings, since there isn’t much about it on the net. The empty class size is all over the place, neatly covered in Stroustrup’s FAQ, but the other weird cases were harder to find.

For obvious reasons, the C++ standard is pretty vague about class sizes and memory layout. This is more of an ABI issue than a language standard issue, but the problem is that even the Itanium C++ ABI is a bit vague on that point.

The C++ standard allows the memory layout to be defined by the ABI, positioning the class members where they fit most naturally on the target, but there is one specific clause: an object cannot have zero size. While most layout decisions are target-specific, this is strictly a language decision. If objects were allowed to have zero size, an array of 1000 zero-sized objects would occupy no space at all and it would be impossible to tell one element from another.

Empty classes

First, a background on empty classes. An empty class is somewhat useless in most cases, but there is one case in which it plays an important role: type safety.

If you want to group objects that have nothing in common (to use in an algorithm), or to constrain a specific class of template parameters, or even to differentiate between exceptions, even if they don’t carry any data (just so you can catch them differently), you can use an empty class and it does the job pretty well. Thus, using and deriving from empty classes is a somewhat common enterprise in C++.

Empty class size

So, what happens when you declare an empty type?

class Empty {
};

What size will objects of this type have? It cannot be zero, as the standard forbids it, so it has to be something. The best choice is one byte, since no architecture will benefit from having less (please, don’t think of bit-fields!), and if alignment is a problem, most (all?) compilers will pad the byte up to a word-aligned block in memory.

Obviously, an empty class that derives from another empty class also has to follow the same rule:

class Empty {
};

class NowWhat : public Empty {
};

In this case, both Empty and NowWhat have size 1. But if NowWhat has data in it, will the extra byte still be there? Neither the standard nor the ABI says it shouldn’t be, but nothing says it should, either. What’s important here is the reason why you need the extra byte: the addresses of different objects must be different.

When the derived class already has data, its objects will already be at different locations when laid out in memory, so there is no need for the extra byte.

class Empty {
};

class NowWhat : public Empty {
  int a;
};

Now, Empty still has 1 byte, but NowWhat has 4, not 5. Some people consider this an optimisation; I just think it’s following the rules of what the standard requires.
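If you want to see it for yourself, here is a tiny test (the sizes assume the Itanium C++ ABI, e.g. GCC or Clang):

#include <cstdio>

class Empty {
};

class NowWhat : public Empty {
    int a;
};

int main() {
    // With the Itanium C++ ABI (GCC, Clang) this prints 1 4: the empty
    // base contributes no extra byte once the derived class has data.
    printf("%zu %zu\n", sizeof(Empty), sizeof(NowWhat));
    return 0;
}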

The reverse case is much simpler. If the derived class is the empty one (for whatever reason), the size is already non-zero, so the two classes below have the same size (4):

class Full {
	int a;
};

class Empty : public Full {
};

Multiple Inheritance

Multiple inheritance brings a bit of colour to this problem. The C++ standard requires that two distinct objects always occupy different memory locations (so that they compare differently by their pointers, and there can be no non-empty zero-sized arrays). So what is the size of the Derived class below?

class Empty {
};

class M1 : public Empty {
};

class M2 : public Empty {
};

class Derived : public M1, M2 {
};

Before, we could make the NowWhat class with only one byte because that was sufficient to discriminate objects in an array, but what about comparison? If you cast a Derived object to Empty via M1, you get a different (logical) object than if you do it via M2, so they must compare differently. For that, you need to add another byte, and return the first address when converting via one class and the second via the other.
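The same can be verified directly. The snippet below uses public inheritance for M2 just so the casts compile, and the sizes again assume GCC or Clang:

#include <cstdio>

class Empty {
};

class M1 : public Empty {
};

class M2 : public Empty {
};

class Derived : public M1, public M2 {
};

int main() {
    Derived d;
    // The two Empty sub-objects must live at different addresses, so the
    // casts below yield different pointers (and sizeof(Derived) is 2 on
    // GCC/Clang with the Itanium C++ ABI).
    Empty *viaM1 = static_cast<M1*>(&d);
    Empty *viaM2 = static_cast<M2*>(&d);
    printf("%zu %s\n", sizeof(Derived), viaM1 == viaM2 ? "same" : "different");
    return 0;
}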

Empty members

With single and multiple inheritance nailed down, let’s think of a more complicated case.

class Empty {
};

class Another : public Empty {
};

class NotSoMuch : public Empty {
	Another a;
};

What’s happening here is that the first member of the derived class is, in fact, of an empty class type, and that type also derives from the same base class as the enclosing type.

See, this last bit is important, because if you replace Another a; with a member of some other empty class type (say Empty2), the size of the derived class will still be 1. But in the case above, this is not true. The size of NotSoMuch is actually 2.

The field Another a; has size one, like any empty class object, so what’s the second byte for? The second byte is padding, at the beginning of the class, to avoid a type conflict.

Type conflicts

When deriving from empty classes, the ABI (2.4-II-3) states that you should always try to put the derived members at offset zero and, in case of a type conflict, move down (alignment-wise) until there are no more type conflicts. The question is, what is a type conflict in this case? With multiple inheritance, the type conflict was clear (the diamond inheritance), but here, not so much.

Since both the class and its first member can be converted to Empty (as pointers, for instance), you can have two Empty objects that, if compared, must return different addresses. In the code below, n is (a pointer to) the derived object. It is converted to an Empty pointer, stored in en.

	NotSoMuch *n = new NotSoMuch();
	Empty *en = (Empty*)n;
	Empty *ea = (Empty*)&n->a;

If the member Another a; were at offset zero of the class, the pointer ea would hold the same address as en, and a comparison for equality would return true.

The type conflict in this case is that, while both en and ea are pointers to an Empty type, the former gets there via NotSoMuch and the latter via Another. According to the C++ standard, they must be different objects, and thus have different addresses.
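Putting the whole thing together in a compilable snippet (the member is public here only so the example builds; the sizes again assume GCC or Clang):

#include <cstdio>

class Empty {
};

class Another : public Empty {
};

class NotSoMuch : public Empty {
public:
    Another a;
};

int main() {
    NotSoMuch n;
    Empty *en = &n;    // the Empty base of NotSoMuch, at offset 0
    Empty *ea = &n.a;  // the Empty base of the member, pushed to offset 1
    // GCC/Clang print 2 and "different": the member is moved down one byte
    // so the two Empty sub-objects don't share an address.
    printf("%zu %s\n", sizeof(NotSoMuch), en == ea ? "same" : "different");
    return 0;
}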

Again, if the empty class member is not the first element, none of that happens. In the example below, no extra padding byte is needed: the int comes first, so the two similar (but different) Empty pointers are already separated by 4 bytes and the standard is not violated (the total size is 8: the 4-byte int plus the 1-byte member, rounded up to int alignment).

class Empty {
};

class Another : public Empty {
};

class NotSoMuch : public Empty {
	int i;
	Another a;
};

Templates

Template code is also susceptible to this problem, of course, and the iostream library is full of such examples. The problem is not so much the abuse of empty classes (that doesn’t happen that much in the STL); it’s that non-virtual classes that only have member functions and no data fields are considered empty classes. And since templates are a good replacement for most virtual inheritance (and its inherent run-time complexity and slowness), basic libraries have to make extensive use of templates to avoid slowing down every single C++ program.

It’s true that the iostream library is particularly heavy and cumbersome, but it could be much worse.

The additional problem with templates (for the compiler, at least) is figuring out whether there is a type conflict. While templates are already expanded by the time the core C++ compiler kicks in, it’s complicated to carry the inheritance-graph information from the source code to every single template variation and specialisation. Not to mention the new standard, where C++ is getting a variadic template mechanism.

I hope this is one more reason for you to avoid C-style pointer casts (especially to void*) and to only use C++’s static_cast and dynamic_cast. Even reinterpret_cast is dangerous, but at least you can grep for them in your source files and identify them one by one.

Computer Science vs Software Engineering

The difference between science and engineering is pretty obvious. Physics is science, mechanics is engineering. Mathematics is (ahem) science, and building bridges is engineering. Right?

Well, after several years in science and far too much time in software engineering (a story I was hoping to tell my kids when they grow up), it seems that people’s beliefs about the difference, if there is any, are much more exacerbated than their own logic seems to imply.

Beliefs

General beliefs that science is more abstract fall apart really quickly when you compare maths to physics. There are many areas of maths (statistics, for example) that are much more grounded in the real world than many parts of physics (like string theory and a good part of cosmology). Nevertheless, most scientists will turn their noses up at anything that resembles engineering.

From different points of view (biology, chemistry, physics and maths), I could see that there isn’t a consensus on what people really consider a less elaborate task, not even among the same groups of scientists. But when faced with a rejection by one of their colleagues, the rest usually agree with it. I came to the conclusion that the psychology of belonging to a group is more important than personal beliefs or preferences. One would expect that from young schoolgirls, not from professors and graduate students. But regardless of the group behaviour, there is still that feeling that tasks such as engineering (whatever that is) are mundane, mechanical and contribute less to the greater good than science.

Real World

On the other side of the table, the real world, there are people doing real work. It generally consists of less thinking, more acting and getting things done. You tend to use tables and calculators rather than whiteboards and dialogue, your decisions are based much more on gut feelings and experience than on over-zealously examining every single corner case, and the result of your work is generally more compact and useful to the everyday person.

From that perspective, (what we’re calling) engineers have a good deal of prejudice towards (what we’re calling) scientists. For instance, the book Real World Haskell is a great pun from people who have one foot on each side of this battle (but are leaning towards the more abstract end of it). In the commercial world, you don’t have time to analyse every single detail: you have a deadline, so do what you can with it and buy insurance for the rest.

Engineers also produce better results than scientists. Their programs are better structured, more robust and more efficient. Their bridges, rockets, gadgets and medicines are far more tested, bullet-proofed and safe than anything a scientist could ever hope to build. It is a misconception that software engineers have the same experience as an academic with the same time spent coding, just as it is a misconception that engineers could as easily develop prototypes that would revolutionise their industry.

But even in engineering, there are tasks and tasks. Even while loathing scientists, those engineers who perform more elaborate tasks (massive bridges, ultra-resistant synthetic materials, operating systems) consider themselves above the mundane crowd of lesser engineers (building two-bed flats in the outskirts of Slough). So, even here, the more abstract, less fundamental jobs are held in higher regard than the ones more essential and critical to society.

Is it true, then, that the more abstract and less mundane a task is, the better?

Computing

Since the first thoughts on general-purpose computing, there has been this separation between the intangible generic abstraction and the mundane mechanical real-world machine. Leibniz developed the binary numeral system, compared the human brain to a machine and even had some ideas on how to build one, someday, but he ended up creating some general-purpose multipliers (following Pascal’s design for the adder).

Leibniz would have thrived in the 21st century. Lots of people in the 20th with the same mindset (such as Alan Turing) did so much more, mainly because of the availability of modern building techniques (perfected over centuries by engineers). Babbage is another example: he developed his difference engine for years, and when he failed (more through arrogance than anything else), his analytical engine (far more elegant and abstract) consumed his entire soul for another decade. When he realised he couldn’t build it in that century, he perfected his first design (reducing its size threefold) and made a great specialist machine… for engineers.

Mathematicians and physicists had to do horrible things (such as astrology and alchemy) to keep their pockets full and, in their spare time, do a bit of real science. But in this century that is less important. Nowadays, even if you’re not a climate scientist, you can get a good budget for very little real applicability (check NASA’s funded projects, for example). The number of people working on string theory or trying to prove the Riemann hypothesis is a clear demonstration of that.

But computing is still not there yet. We’re still doing astrology and alchemy for a living and hoping to learn the more profound implications of computing in our spare time. Well, some of us at least. And that brings me to my point…

There is no computer science… yet

The beginning of science was marked by philosophy and dialogue. Two thousand years later, mankind was still doing alchemy, trying to prove that the Sun was the centre of the solar system (and failing). Only 200 years after that did people really start doing real science, cleansing themselves of private funding and focusing on real science. But computer science is far from it…

Most computer science courses I’ve seen teach a few algorithms, an object-oriented language (such as Java) and a few courses on current technologies (such as databases, web development and concurrency). Very few of them really teach about Turing machines, group theory, complex systems, other forms of formal logic and alternatives to the current models. Moreover, the number of people doing real science in computing (judging by what appears on arXiv or news aggregation sites such as Ars Technica or Slashdot) is probably smaller than the number of people working on string theory or wanting a one-way trip to Mars.

So, what do PhDs do in computer science? Well, novel techniques for some old-school algorithms are always a good choice, but the recent favourites have been breaking the security of the banking system or re-writing the same application we all already have, but for the cloud. Even the more interesting dissertations, like memory models in concurrent systems or energy-efficient gate designs, are at most commercial applications.

After all, PhDs can get a lot more money in industry than by remaining at the universities, and doing your PhD towards some commercial application can guarantee you a more senior starting position in such companies than something completely abstract. So, to be honestly blunt, we are all doing alchemy now.

Interesting engineering

Still, that’s not to say that there aren’t interesting jobs in software engineering. I’m lucky to be able to work with compilers (especially because it also involves the amazing LLVM), and there are other jobs in the industry that are as interesting as mine. But all of them are just the higher engineering, the less mundane rocket science (which has nothing of science in it). All in all, software engineering is a very boring job.

You cannot code freely, ignore the temporary bugs, or ask the user to be nice and provide a controlled input pattern. You need a massive test infrastructure, quality control, standards (which are always tedious) and well-documented interfaces. All that gets in the way of real innovation; it makes any attempt at innovation in a real company a mere exercise in futility and a mild source of fun.

This is not exclusive to the software industry, of course. In the pharmaceutical industry there is very little innovation. They do develop new drugs, but using the same old methods. They do need to get new, more powerful medicines out of the door quickly, but the massive amount of tests and regulation they have to follow is overwhelming (which is why they avoid doing it right as much as possible, so don’t trust them!). Nevertheless, there are very interesting positions in that industry as well.

When, then?

Good question. People are afraid of going outside their area of expertise; they feel exposed and ridiculed, and quickly retreat to their comfort zone. The best thing that can happen to a scientist, in my opinion, is to be proven wrong. For me, there is nothing worse than being wrong and not knowing it. Not many people are like that, and the fear of failure is what keeps the industry (all of them) in the real world, with real concerns (this is good, actually).

So, as long as industry drives innovation in computing, there will be no computer science. As long as the most gifted software engineers are mere employees in the big corporations, they won’t try, to avoid failure, as that could cost them their jobs. I’ve been to a few companies, and heard about many others, that have a real innovation centre, computer laboratory or research department, and there isn’t a single one of them that is actually bold enough to change computing at its core.

That is something IBM, Lucent and Bell Labs did in the past, but probably don’t do any more these days. It’s a good twist of irony, but the company that gets closest to software science today is Microsoft, at its campus in Cambridge. What happened to those great software teams of the 70s? Could those companies really afford real science, or were they just betting their petty cash in case someone got lucky?

I can’t answer those questions, nor whether it’ll ever be possible to have real science in the software industry. But I do plead with all software people to think about this when they teach at university. Please, teach those kids how to think, defy the current models, challenge the universality of the Turing machine, create a new mathematics and prove Gödel wrong. I know you won’t try (out of hubris and self-respect), but they will, and they will fail, and after so many failures, something new can come up and make the difference.

There is nothing worse than being wrong and not knowing it…