Going ARM

Last week I attended the Arm Research Summit, and on Tuesday there was a panel session on using Arm hardware for performance, an effort known as GoingArm. You can watch the whole panel on YouTube, from 2:12 onward (~1h30). All the videos are available on the website.

It was interesting to hear the opinions of people on the front lines of Arm HPC, as much as the types of questions that came up, and it made me believe even more strongly that now is the time to re-think general computing. For a long time we have been coasting on the fact that cranking up the power and squeezing down the components were making programmers’ lives easier. Scaling up to multiple cores was probably the only big change most people had to deal with, and even so, not many people can actually take real advantage of it.


Trashing Chromebooks

At Linaro, we do lots of toolchain testing: GCC, LLVM, binutils, libraries and so on. Normally, you’d find a fast machine where you could build toolchains and run all the tests, integrated with some dispatch mechanism (like Jenkins). Normally, you’d have a vast choice of hardware to choose from for each form factor (workstation, server, rack mount), and you’d pick the fastest CPUs and a fast SSD with enough space for the huge temporary files that toolchain testing produces.

[image: tcwg-rack]

The only problem is, there aren’t any ARM rack servers or workstations. In the ARM world, you either have many cheap development boards, or one very expensive (100x more) professional development board. Servers, workstations and desktops are still non-existent. Some have tried (Calxeda, for example) and failed. Others are trying with ARMv8 (the new 32/64-bit architecture), but all of them are under heavy development, so not even alpha quality.

Meanwhile, we need to test the toolchain, and we have been doing it for years, so waiting for a stable ARM server was not an option and still isn’t. A year ago I took on the task of finding the most stable development board that was fast enough for toolchain testing and filling a rack with it. Easier said than done.

Choices

Amongst the choices I had, Panda, Beagle, Arndale and Odroid boards were the obvious candidates. After initial testing, it was clear that the Beagles, with only 500MB of RAM, were not able to compile anything natively without some major refactoring of the build systems involved. So, while they’re fine for running remote tests (SSH execution), they have very little use for anything else related to toolchain testing.

[image: panda]

The Pandas, on the other hand, have 1GB of RAM and can compile any toolchain product, but the timing is on the wrong side: at 5+ hours for a full LLVM+Clang build, a full bootstrap with testing would take a whole day. For background testing on the architecture that’s fine, but for regression tracking and investigative work, they’re useless.

With the Arndales, we haven’t had such luck. They’re either unstable or deprecated months after release, which makes it really hard to acquire them in any meaningful volume for contingency and scalability plans. We were left, then, with the Odroids.

[image: arndale]

HardKernel makes very decent boards, with fast quad-A9 and octa-A15 chips, 2GB of RAM and a big heat sink. Compilation times were in the right ballpark (40~80 min), so they’re good both for catching regressions and for bootstrapping toolchains. But they had the same problem as every other board we tried: instability under heavy load.

Development boards are built for hobby projects and prototyping. They can normally run at fairly high frequencies (1~2 GHz), but they’re designed for low-power, mostly stand-by usage. Toolchain testing, however, involves building the whole compiler and running the full test suite on every commit, and that puts them at 100% CPU usage, 24/7. Since build times are around an hour or more, by the time a build finishes other commits have gone through and need to be tested, making it a non-stop job.

CPUs are designed to scale down their frequency when they get too hot, so throughout normal testing they stay stable at their operating temperature (~60C). Adding a heat sink only lets them run at a higher frequency while keeping the same temperature, so it doesn’t solve the temperature problem.

The issue is that, after running for a while (a few hours, days, weeks), the compilation jobs start to fail randomly (the infamous “internal compiler error”), in different places in different files every time. This is clearly not a software problem, but if it were the CPU’s fault, it would have happened a lot earlier, since the CPU reaches its operating temperature seconds after the test starts, while the failures only appear hours or days into a full-time run. The same argument rules out any trouble with the power supply, since it should have failed at the beginning, not days later.

The problem that the heat sink doesn’t solve, however, is the board’s overall temperature, which gets quite hot (40C~50C) and has negative effects on other components, like the SD reader and the card itself, or the USB port and the stick itself. Those boards can’t boot from USB, so we must use SD cards for the system, and even using a USB external hard drive with a powered USB hub, we still see the failures, which hints that the SD card is failing under high load and high temperature.

According to SanDisk, their SD cards should be fine in that temperature range, but other factors might be at play, like the kernel drivers (which aren’t built for that kind of load). What pointed me to the SD card in the first place was that, when running solely on the SD card (for system and build directories), the failures appear sooner and more often than when running the builds on a USB stick or drive.

In the end, with the best failure rate at about one per week, none of those boards can serve as build slaves.

Chromebook

That’s when I found the Samsung Chromebook. I had one for personal testing and it was really stable, so amidst all that trouble with the development boards, I decided to give it a go as a buildbot slave, and after weeks of running smoothly, I had found what I was looking for.

The main difference between development boards and the Chromebook is that the latter is a product. It was tested not just for its CPU or memory, but as a whole. Its design evolved with the results of those tests, and it became more stable as it progressed. Also, the Linux drivers and the kernel were made to match, fine-tuned and crash-tested, so that it could be used by the worst kind of users. As a result, after one and a half years running Chromebooks as buildbots, I haven’t been able to make them fail yet.

But that doesn’t mean I have stopped looking for an alternative. Chromebooks are laptops, and as such they’re built with a completely different mindset from a rack machine, and the list of modifications needed to make them fit the environment wasn’t short. Rack machines need to boot when powered up, give 100% of their power to the job and distribute heat efficiently under 100% load for very long periods of time. Precisely the opposite of a laptop design.

Even though they don’t fail the jobs, they did give me a lot of trouble: having to boot them manually, overheating batteries, and no easy way to set up a Linux image deployable via network boot. The steps to fix those issues are listed below.

WARNING: Anything below will void your warranty. You have been warned.

System settings

To get your Chromebook to boot anything other than ChromeOS, you need to enter developer mode. With that, you’ll be able not only to boot from SD or USB, but also to change your partitions and get sudo access on ChromeOS.

With that done, you go to the console (CTRL+ALT+->), log in as user chronos (no password) and set up the boot process as described in the link above. You’ll also need to run sudo crossystem dev_boot_signed_only=0 to be able to boot anything you want.

The last step is to make your Linux image boot by default, so that when you power up the machine it boots Linux, not ChromeOS. Otherwise you’ll have to press CTRL+U on every boot, and remote booting via PDUs will be pointless. You do that via cgpt.

You need to find the partition that ChromeOS boots from by listing them all and seeing which one booted successfully:

$ sudo cgpt show /dev/mmcblk0

The right partition will have the information below appended to the output:

Attr: priority=0 tries=5 successful=1

If it has tries left and was successful, this is probably your main partition. Move it back down the priority order (6th place) by running:

$ sudo cgpt add -i [part] -P 6 -S 1 /dev/mmcblk0

You can also set the SD card’s partition to priority 0 by doing the same thing on /dev/mmcblk1.
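
For example, mirroring the command above (a guess on my part, following the same flags; [part] here is whatever partition number your Linux kernel lives on):

$ sudo cgpt add -i [part] -P 0 -S 1 /dev/mmcblk1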

With this, installing Linux on an SD card should get you booting Linux by default on the next boot.

Linux installation

You can choose from a few distributions to run on the Chromebooks; I have tested both Ubuntu and Arch Linux, and they work just fine.

Follow their installation steps, insert the SD card in the slot and boot. You should get the Developer Mode screen and, if you wait long enough, it should beep and boot directly into Linux. If it doesn’t, it means your cgpt meddling was unsuccessful (been there, done that) and needs a bit more fiddling. You can press CTRL+U for now to boot from the SD card.

After that, you should have complete control of the Chromebook, and I recommend adding your daemons and settings to the boot process (init.d, systemd, etc.). Turn on the network, start the SSH daemon and any other services you require (like buildbots). It’s also a good idea to change the governor to performance, but only if you’re going to use it for full-time heavy load, and especially if you’re going to run benchmarks. For the latter, though, you can do that on demand and don’t need to set it at boot time.

To change the governor:
$ echo [scale] | sudo tee /sys/bus/cpu/devices/cpu[N]/cpufreq/scaling_governor

scale above can be one of performance, conservative, ondemand (the default), or any other governor your kernel supports. If you’re running benchmarks, switch to performance before and back to ondemand afterwards. Use cpuN for the CPU number (starting at 0) and do it for all CPUs, not just one; a small helper script for that is sketched below.
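
Since the command has to be repeated for every CPU, a small helper in the same style as the scripts below saves some typing. This is a sketch of my own, not from the original setup; the governor names available depend on your kernel:

$ cat governor
#!/usr/bin/env bash
# Usage: governor <performance|ondemand|conservative|...>
# Sets the given governor on every CPU that exposes cpufreq.
scale=$1
ROOT=/sys/bus/cpu/devices
for dir in $ROOT/*; do
  if [ -e $dir/cpufreq/scaling_governor ]; then
    echo $scale | sudo tee $dir/cpufreq/scaling_governor > /dev/null
    echo "`basename $dir`: `cat $dir/cpufreq/scaling_governor`"
  fi
done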

Other handy scripts report the temperatures and frequencies of the CPUs:

$ cat thermal
#!/usr/bin/env bash
# Print each thermal zone's temperature in Celsius
ROOT=/sys/devices/virtual/thermal
for dir in $ROOT/*/temp; do
  temp=`cat $dir`
  temp=`echo $temp/1000 | bc -l | sed 's/0\+$/0/'`
  device=`dirname $dir`
  device=`basename $device`
  echo "$device: $temp C"
done

$ cat freq
#!/usr/bin/env bash
# Print each CPU's current frequency in GHz (cpuinfo_cur_freq is in kHz)
ROOT=/sys/bus/cpu/devices
for dir in $ROOT/*; do
  if [ -e $dir/cpufreq/cpuinfo_cur_freq ]; then
    freq=`sudo cat $dir/cpufreq/cpuinfo_cur_freq`
    freq=`echo $freq/1000000 | bc -l | sed 's/0\+$/0/'`
    echo "`basename $dir`: $freq GHz"
  fi
done

Hardware changes

[image: batteries]

As expected, the hardware was also not ready to behave like a rack server, so some modifications are needed.

The most important thing you have to do is remove the battery. First, because you won’t be able to boot it remotely with a PDU if you don’t, but more importantly, because the heat from constant usage will destroy the battery. Not just make it stop working, which we wouldn’t mind, but slowly release gases and bloat it, which is a fire hazard.

To remove the battery, follow the iFixit instructions here.

Another important change is to remove the lid magnet, which tells the Chromebook not to boot while the lid is closed. The iFixit post above doesn’t mention it, but it’s as simple as prying the monitor bezel open with a sharp knife (no screws), locating the small magnet on the left side and removing it.

Stability

With all these changes, the Chromebook should be stable for years. It’ll be possible to power-cycle it remotely (if you have a PDU), boot directly into Linux and start all your services with no human intervention.

The only thing you won’t have is serial access to re-flash it remotely if all else fails, as you can with most (all?) rack servers.

Contrary to common sense, the Chromebooks are much better build slaves than any development board I have ever tested, and in my view that’s mainly due to the amount of testing they have gone through as a consumer product. Now I need to test the new Samsung Chromebook 2, since it’s got the new Exynos Octa.

Conclusion

While I’d love to have more options, different CPUs and architectures to test, it seems that the Chromebooks will be the go-to machine for the time being. And with all the glory going to ARMv8 servers, we may never see an ARMv7 board running stably in a rack.

Emergent behaviour

There is a lot of attention on emergent behaviour nowadays (for example here, here, here and here), but it’s still on the outskirts of science and computing.

Science

For millennia, science has isolated each single behaviour of a system (or system of systems) to study it in detail, then joined them together to grasp the bigger picture. The problem is that this approximation only works for simple systems, such as the ones studied by Aristotle, Newton and Ampere. Every time scientists approached the edges of their theories (including those three), they just left the rest as an exercise for the reader.

Newton foresaw relativity and the possible lack of continuity in space and time, but did nothing to address it. Fair enough, his work was far more important to science than venturing into those unknowns; it would have been almost mystical of him to try (although he was an alchemist). But more and more, scientific progress seems to be blocked by chaos theory, where you either unwind the knots or go back to alchemy.

Chaos theory has existed for more than a century, but it is only recently that it has been applied to anything outside differential equations. The hypersensitivity to initial conditions is clear in differential systems, but other systems have a less visible, though no less important, sensitivity. We just don’t see it well enough, since most other systems are not as well formulated as differential equations (thanks to Newton and Leibniz).

Neurology and the quest for artificial intelligence have raised a strong interest in chaos theory and fractal systems. The development of neural networks has shown that groups and networks also have a fundamentally chaotic nature, but more importantly, that it’s only through the chaotic nature of those systems that you can get a good amount of information out of them. Quantum mechanics went through the same evolution, with Heisenberg and Schroedinger kicking the ball first on the oddities of the universe and on how important the lack of knowledge about a system is for extracting information from it (think of Schroedinger’s cat).

A network with direct and fixed thresholds doesn’t learn. Particles with known positions and velocities don’t compute. N-body systems with definite trajectories don’t exist.

The genetic code has some similarities to these models. Living beings have far more junk than genes in their chromosomes (up to 98% of the human genome), but changes in the junk parts can often lead to invalid creatures. If junk within genes (introns) gets modified, the actual code (exons) could be spliced differently, leading to a completely new, dysfunctional protein. Or, if you add start sequences (TATA boxes) to a non-coding region, some of it will be transcribed into whatever protein it can make, creating rubbish within cells, consuming resources or eventually killing the host.

But most of the non-coding DNA is also highly susceptible to changes, and that’s probably its most important function: it is tuned to the specific mutation rates of our planet and to our defence mechanisms against such mutations. For billions of years, the living beings on Earth have been adapting that code. Each of us carries a super-computer that can choose, by design, the best ratios for a given scenario within a few generations, and create a whole new species or keep the current one adapted, depending on what’s more beneficial.

But not everyone is that patient…

Programming

Sadly, in my profession, chaos plays an important part, too.

As programs grow old and programmers move on, a good part of the code becomes stale, creating dependencies that are hard to find and harder to fix. In that sense, programs are pretty much like the genetic code: the amount of junk increases over time, and that gives the program resistance against changes. The main problem in computing, which is not as clear in genetics, is that the code that stays behind is normally the code no one wants to touch, and thus the ugliest and most problematic.

The DNA transcription machinery doesn’t care where the genes are; it finds a start sequence and gets on with its life. Programmers, we believe, have free will, and that gives them the right to choose where to apply a change. They can either work around the problem, making the code even uglier, or they can go and try to fix the problem at its source.

Non-programmers would quickly state that only lazy programmers would do the former, but more experienced ones will admit to having done so on numerous occasions, for different reasons. Good programmers do it because fixing the real problem would be so painful for so many other systems that it’s best left alone, with that part to be replaced in the future (only it never is). Bad programmers are not just lazy; some of them really believe it’s the right thing to do (I’ve met many like this), and that adds some more chaos to the game.

It’s not uncommon to start fixing a small problem, get more than half-way through and hit a small glitch in a separate system. A glitch that you quickly identify as being wrongly designed, so you, as any good programmer would, re-design it and implement the new design, which is already much bigger than the fix itself. All tests pass except one, which shows you another glitch, raised by your new design. This can go on indefinitely.

Some changes are better done in packs, all together, to make sure all designs are consistent and the program behaves as it should, not necessarily as the tests say it would. But that’s not only too big for one person at one time; it’s practically impossible when other people are changing the program under your feet, releasing customer versions and changing the design themselves. There is a point where a refactoring is not only hard, but also a bad design choice.

And that’s when code becomes introns, and is seldom removed.

Networks

The power of networks is rising, though slower than expected. For decades people have known about synergy, chaos and emergent behaviour, but it is only recently, with the quality and amount of information on global social interaction, that those topics have risen back into the main picture.

Twitter, Facebook and the like have raised many questions about human behaviour, and a lot of research has been done to address those questions and, to a certain extent, answer them. Psychologists and social scientists have known for centuries that social interaction is greater than the sum of its parts, but now we have the tools and the data to prove it once and for all.

Computing clusters have been applied to most of the hard scientific problems for half a century (weather prediction, earthquake simulation, exhaustion proofs in graph theory). They also took on a commercial side with MapReduce and similar mechanisms that have popularised distributed architectures, but that’s only the beginning.

In today’s distributed systems, emergent behaviour is treated as a bug that has to be eradicated. In the exact science of computing, locks and updates have to occur in the precise order they were programmed to, to yield the exact result one is expecting. No more, no less.

But to keep our systems free of emergent behaviour, we invariably introduce emergent behaviour into our code. Multiple checks on locks and variables, different design choices for different components that have to work together, and the expectation of precise results or nothing, make the number of lines of code grow exponentially. And, since it all has to run fast, even more lines and design choices are added to avoid one extra operation inside a very busy loop.

While all of this is justifiable, it’s not sustainable. In the long run (think decades), the code will be replaced or the product will be discontinued, but there is a limit to how many lines a program can gain without losing some others. And the cost of refactoring increases with the lifetime of a product. This is why old products don’t get many updates: not because they’re good enough already, but because it’s impossible to add new features without breaking a lot of others.

Distant future

As much as I like emergent behaviour, I can’t begin to fathom how to harness that power. Stochastic computing is one way, and it has been done with a certain level of success here and here, but it’s far from easy to create a general logic behind it.

Unlike Turing machines, emergent behaviour comes from multiple sources, dressed in multiple disguises, and produces far too much variety in results to be accounted for by one theory. It’s similar to string theory, where there are several variations of it, but only one M-theory, the one that joins them all together. The problem is, nobody knows what this M-theory looks like. Well, they barely know what the different versions of string theory look like, anyway.

In that sense, emergent theory is even further than string theory from being understood in its entirety. But I strongly believe that this is one way out of the conundrum we live in today, where adding more features makes it harder to add more features (like mass at relativistic speeds).

With stochastic computing there is no need for locks, since all that matters is the probability of an outcome, and precise values do not make sense. There is also no need for NxM combinations of modules and special checks, since the power is not in the computations themselves, but in the meta-computation done by the state of the network, rather than by its specific components.

But that, I’m afraid, I won’t see in my lifetime.

Task Driven Computing

Ever since Moore’s idea became a law (by providence), and empires were built upon that law, little thought has been given to the need for such advancements. Raw power is considered by many to be the only real benchmark by which one machine can be compared to another. Cars, computers and toasters are all alike in those matters, and are only as good as their raw throughput (real or not).

With the carbon footprint disaster, some people began to realise (not for the right reasons) that maybe we don’t actually need all that power to be happy. Electric cars, low-powered computers and smart appliances are now appealing to the end consumer and, for good or bad, things are changing. The rocketing growth of the mobile market (smartphones, netbooks and tablets) in recent years is a good indicator that the easily seduced consumer mass is now being driven towards leaner, more efficient machines.

But how lean are we ready to go? How much raw power are we willing to give away? In other words, how far does the media’s appeal go in getting us to relinquish those rights bestowed by Moore? Not very far, it seems, with all the chip companies fighting for a piece of the fat market (as well as the lean one).

What is the question, anyway?

Ever since that became a trend, the question has always been: “how lean can we make our machine without impacting usability?”. The focus so far has been on creating smarter hardware and, to a lesser extent (and only recently), on reducing the unneeded fat of operating systems and applications, but no one ever touches the fundamental question: “Do we really need all that?“.

The question is clearly cyclic. For example, you wouldn’t need a car if public transport were decent. You wouldn’t need health insurance if the public health system were perfect, and so on. With computing it’s the same. If you rely on a text editor or a spreadsheet, it has to be fast and powerful, so you can finish your work on time (and not get fired). If you are a developer and have to re-compile your code every so often, you need a damn good computer (in CPU and memory) to make it as painless as possible. Having a slow computer can harm the creative process around every task and degrade the quality of your work by an unknown amount.

Or does it?

If you didn’t have to finish your work quicker, would you still work the same way? What if you didn’t have to save your work, or install additional software, just because the system you’re working on only runs on a particular type of computer (say, one only available at your workplace)? If you could perform tasks as tasks, and not as a whole sequence of meaningless steps and bureaucracy, would you still take the same amount of time to finish them?

Real world

Even though the real world is not that simple, one cannot take the whole of reality into account in every investigation. Science just doesn’t work that way. To be effective, you take out all but one variable and test it. One by one, until you have a simplified picture, a model of reality. If at every step you include the whole world, the real world, in your simulations, you won’t get far.

There is one paper that touched on some of these topics back in 2000, and little has changed since then. I dare say it has actually got worse. All these app stores competing for publicity and forcing incompatibility through invisible boundaries have only made matters worse. It seems clear enough to me that the computing world, as far back as I can remember (the early 80s), was always like that, and it shows no signs of changing so far.

The excuse for keeping on doing the wrong thing (ie. not thinking clearly about what a decent system is) was always that “the real world is not that simple”, but in fact the only limiting factor has been the greed of investors, who cannot begin to understand that a decent system can bring more value (not necessarily money) than any quickly designed and delivered piece of software available today.

Back in the lab…

Because I don’t give a fig about what they think, I can go back to the lab and think clearly. Remove greed, profit and market from the table. Leave users, systems and what’s really necessary.

Computers were (long before Turing) meant to solve specific problems. Today, general-purpose computers create more problems than they solve, so let’s go back to what the problem is and try to solve it without any external context: tasks.

A general-purpose computer can perform a task in pretty much the same way as any other; after all, that’s why they’re called “general purpose”. So the system that runs on it is irrelevant: if it does not perform the task, it’s no good. A good example of that is web browsers: virtually every browser can render a page, with surprisingly similar results. A bad example is text editors, most of which won’t even open one another’s documents, and when they do, the originating editor will have done all in its power to make the result look horrid in the other.

Supposing tasks can be done seamlessly on any computer (let’s assume web pages for the moment), does the computer compute only that task, or is it doing other things as well?

All computers I know of will keep running, even if broken, until they’re turned off. Some can increase and decrease their power consumption, but they’ll still be executing instructions until the world’s end. According to our least-work principle (execute tasks, nothing more), all of that is irrelevant, so we must take it out of our system.

Thus, such a computer can only execute when a task is requested, it must complete that task (and nothing more), and it must stop (really, zero watts of consumption) right after that.

But this is madness!

A particular task can take longer to execute, yes. It’ll be more difficult to execute simultaneous tasks, yes. You’ll spend more cycles per task than usual, yes! So, if you’re still thinking like Moore, then this is utter madness and you can stop reading right now.

Task Driven Computing

For those who are still with me, let me try to convince you. Around 80% of my smartphone’s battery is consumed by the screen. The rest is generally spent on background tasks (system daemons), and only about 5% on real tasks. So, if you could remove that 95% of the system’s consumption, your tasks could consume 20x more power and you would still break even.

Note that I didn’t say “20x the time”, for that’s not necessarily true. The easiest way to run multiple tasks at the same time is to have multiple CPUs in a given system. Today that doesn’t scale too well, because the operating system has to control them all, and they all just keep running (even when idle), wasting a huge amount of power for nothing.

But if your system is not designed to control anything, but to execute tasks, even though you’ll spend more time per task, you’ll have more CPUs working on tasks and fewer on background maintenance. Also, once the task is done, the CPU can literally shut down (I mean, zero watts) and wait for the next task. There is no idle cost, and no operational code being run to multi-task, protect memory or avoid race conditions.

Problems

Of course, it’s not as easy as it sounds. Turning CPUs on and off is not trivial, running tasks with no OS underneath (and expecting them to communicate) is not easy, and fitting multiple processors into a small chip is very expensive. But, as I said earlier, I’m not concerned with investors, the market or money; I’m concerned with technology and its real purpose.

Scaling is also a real problem. Connection Machines were built and thrown away, clusters have peak performance way above their average performance levels, and multi-core systems are hard to work with. Part of that is real, the interconnection and communication parts, but the rest was artificially created by operating systems to solve new problems in an old way, just because it was cheaper, or quicker, or easier.

Back in the days…

I envy the time of the savants, when they had all the time and money in the world to solve the problems of nature. Today, the world is corrupted by money, and even the most prominent minds in science are corrupted by it, trying to be the first to do such and such, protecting research from their peers just to claim a silly Nobel prize or to become world famous.

The laws of physics have led us into this: we live in the local minimum of the least energetic configuration possible, and that’s here, now. To get out of any local minimum we need a good kick, something that will take us into a more energetic configuration, and with enough luck we’ll fall into another local minimum that is less energetic than this one. Or, if we’re really the masters of the universe, maybe we can even live harmoniously at a local maximum, who knows!?

Probabilistic processor and the end of locks

Every now and then, when I’m bored with normal work, I tend to my personal projects. For some reason, that tends to be mid-year, every year. So, here we go…

Old ideas, new style

This time it was an old project: breaking the parallelisation barrier without the use of locks. I knew simple boolean logic on regular Turing machines wouldn’t do, as they rely on the fact that the tape is only written by one head at a time. If you have more than one head writing to the same place, the behaviour is undefined. So I looked into several other technologies…

Some were just variations of Turing machines, like molecular, ternary and atomtronic computers. Others were simply interesting toys, like programmable water. But there were two promising classes: quantum computers (taking a bit too long to impress) and non-deterministic Turing machines (NTMs), which could get around some concurrency problems quite well.

Quantum computers are nice, but they’re more like vector computing on steroids. You can operate on several qubits at once, maybe even use non-locality to speed up communications, but until all that reaches the size of a small building, we need something different from yet another n-core frying-pan Intel design. That something may be non-deterministic machines…

Non-deterministic Turing machines are exactly like their deterministic counterparts, but the decision process can take different paths given the same input. So, instead of

if (a == 10) do_something;

you have

if (probability(a) > threshold) do_something;

The problem with that is that you still can’t ignore different processes writing to the same data: if a process happens to read it (even if by chance), the data it gets is still being concurrently changed and is therefore still undefined. In other words, even if the state change is probabilistic, the data is still deterministic.

So, how can you make sure that, no matter how many processes are writing to a piece of data at random times, you still have meaningful data?

Well, if the reading part is not interested in one particular value of that data but, say, only in the probability of the data being a certain value, then it doesn’t matter in which order the writes are performed, as long as they’re all writing about the same thing. That can be viewed as a non-deterministic register, rather than a non-deterministic Turing machine. You can still have a deterministic machine reading and writing non-deterministic registers, and you won’t need locks for those values.

Probabilistic registers

Suppose you want to calculate an integral. You can use several very good deterministic algorithms (like Romberg’s method), but if you create producers to get the values and consumers to reduce them to a sum, you’ll need a synchronised queue (ie. locks) to make sure the consumer gets ALL the values. If you lose some values, you might get a completely wrong result.

If, instead, you create random generators, getting the value of some (multi-dimensional) function at random points, and accumulate them into a non-deterministic register (say, by updating the average of the last values, like recharging a capacitor that keeps discharging), then when the consumer process reads the value, it gets an average of the previously written values. Because the writes are all random, if you keep only the average of the last N writes, you know roughly how much data has come in and what the average value for that block is.

In essence, you’re not storing any particular value, but enough information to re-create the original data. Since specific values are not important in this case, the order in which they appear (or whether they appear at all) is completely irrelevant.

In a way, it’s similar to quantum mechanics, where you have the distribution of probabilities but not the values themselves, and with a few tricks you can extract all the information you need from those distributions. It is also similar to what Monte Carlo methods do: take the probabilistic side of computation to get a rough image of the problem more quickly than brute force would, when the complexity of the problem is non-polynomial and the problem size is big enough to need a few billion years to calculate.

Here are some examples of a non-det register.
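
To make this more concrete, here is a toy sketch of my own (not one of the linked examples, and a gross simplification): several unsynchronised shell processes throw random points at the unit square and append their samples to a shared “register”, while the reader only ever asks for the average of whatever is there, so the order of the writes (or even losing a few of them) doesn’t matter and no locks are needed:

$ cat nondet-pi
#!/usr/bin/env bash
# Four writers sample points in the unit square with no synchronisation;
# the "register" is just an append-only file, and reading it means taking
# the average of the samples, which is insensitive to write ordering.
DATA=`mktemp`

writer() {
  for i in `seq 500`; do
    # one small append per sample: 1 if the point falls inside the quarter circle
    awk -v s=$RANDOM$i 'BEGIN { srand(s); x = rand(); y = rand();
                                if (x*x + y*y <= 1.0) print 1; else print 0 }' >> $DATA
  done
}

writer & writer & writer & writer &
wait

# "reading the register": the average of all samples, whatever their order
awk '{ sum += $1 } END { printf "pi ~ %.3f from %d samples\n", 4 * sum / NR, NR }' $DATA
rm -f $DATA

The point is not performance (it’s dreadful), but that no producer ever waits for another one: the consumer extracts the information it needs from the distribution of the samples, not from any particular value.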

A new logic

Obviously, current deterministic algorithms cannot run on such a machine. All of them are based on the fact that specific data is stored in a specific place. If you compress a file and decompress it again, it’s just impossible to work with probable values: your original file would be completely undecipherable. Boolean logic cannot operate on such registers either: an AND between probabilities or a XOR of two averages doesn’t make any sense. You need a different logic, something that can make sense of probable inputs and analog streams. Fortunately, Bayesian logic is just the thing for such machines.

But the success of Bayesian logic in computing has not been enormous. First, it’s not simple to create algorithms using a network of probabilities, and very few mundane algorithms benefit from it enough to be worth the trouble (Optimor Labs is a good example). But with the advent of this new machine, a different logic has to be applied and new algorithms must be developed to make use of it.

Probably by mixing deterministic with non-deterministic hardware and creating a few instructions that would use the few non-deterministic registers (or accumulators) available. Most likely they would be used for inter-process communication in the beginning, but as you increase the number of those registers, you can start building complex Bayesian networks, or even neural networks, in the registers themselves. That would open the door to much more complex algorithms, probably most of them unpredictable or found by trial and error (a genetic approach?), but with time people would begin to think that way, and that would be the dawn of non-deterministic computing.

Eureka moment

The probabilistic register was a Eureka moment: when I finally implemented it, I saw that my simple Monte Carlo and my neuron model were really three lines of code each. But that didn’t last long…

A few days ago I read the news that DARPA has developed a fully featured probabilistic processor, taking random computation to the masses, even using Bayesian logic in place of boolean logic! The obvious place for a “public trial” was spam filtering (though I think the US military is not that interested in filtering spam…), but according to them, the sky is the limit. I have to agree…

Anyway, I don’t have 5 spare years nor $18M in the bank to implement that project like they did, so in a way I’m glad it turned out to be a good idea after all. I just hope they make good use of it (ie. not use it for Carnivore 2.0), and that they realise it’s not only the military, or anti-spam filters for that matter, that have uses for it. We (they?) should develop a whole new set of algorithms to work with that logic and make good use of hardware that is not a simple progression of current technology (to keep Moore’s law going), but could be a game changer in the coming decades.

I could re-focus on defining the basic blocks for that processor’s new logic, but I’m sure DARPA has much better personnel to do that. So, what’s next, then? I’ll tell you next year…

40 years and full of steam

Unix is turning 40, and the BBC folks wrote a small article about it. What a joy to remember when I started using Unix (AIX on an IBM machine) around 1994 and was overwhelmed by it.

At that time, the only Unix that ran well on a PC was SCO, and that cost a fortune, but there were some others, not as mature, that followed the same concepts. FreeBSD and Linux were the two that came into my sight, and I chose Linux because it was a bit more popular (and therefore I could get more help).

The first versions I installed didn’t even have an X server, and I have to say that I was happier than when using Windows. Partially because of the whole open-source-free-software-good-for-mankind thing, but mostly because Unix has a power that is utterly ignored by other operating systems. So much so that Microsoft used good bits from FreeBSD (whose license allows it), and Apple rebuilt its system on top of FreeBSD to make OS X. The GNU folks certainly helped my mood, as I could find on Linux all the power tools I had on AIX, most of the time even more powerful ones.

The graphical interface was lame, I have to say. But in a way that was good: it reminded me of the interface I used on Irix (SGI’s Unix), and that was ok. With time it got better and better, and by 1999 I was working with it and using it at home full time.

The funny thing is that now I can’t use other operating systems for too long, as I start missing some functionality and eventually get stuck, or at least extremely limited. Mac OS is said to be nice and tidy, with a full FreeBSD inside, but I still lacked agility on it, mainly when searching for and installing packages and configuring the system.

I suppose each OS is for a different type of person… Unix is for the ones that like to fine-tune their machines or that need its power (servers as well), and Mac OS is for those that need something simple, where the biggest change is the background colour. As for the rest, I fail to see the point, really.

Who needs Microsoft’s FAT?

Hydrogenated fat, saturated fat and cholesterol have long been enemies of the public, but recently a new type of fat has been added to the list: FAT.

Microsoft has filed a patent suit against TomTom over the FAT implementation in their Linux-based satnavs. This is a bit of a long story, and Microsoft is not tired yet. Probably because of their recent losses on patents, they’re trying to squeeze some profit out of them.

Luckily, there is hope. The guys at End Software Patents can see some light at the end of the tunnel. It looks like the Bilski case could set a precedent for rejecting that lawsuit (and many other stupid patents they’re claiming), based on whether mathematical algorithms (software) are tangible when they’re not tied to any concrete implementation (hardware).

This was how it was done in the US until the first case that wasn’t attached to any particular hardware got through, and then, with the final revision in 1998, they could patent even cake recipes.

Why not ditch it for good?

So, FAT is rubbish, 30 years old with close to zero evolution since then, so why keep it? It’s true that there are many other filesystems around that are much faster, safer, better optimised and better designed, but FAT still has its market: embedded devices. Because it’s simple and stupid, it’s quite easy to support on very small machines with little RAM and CPU power. It’s also lightweight and fits small flash cards and USB storage well. But the biggest reason to keep it is a different one: Microsoft has supported it since its birth.

Would you buy an SD card that needed a driver installed to make it work? What would be the point?

Yet again, because of market domination (and not technical merit), Microsoft forced rubbish down everyone’s throats and kept it alive for longer than anyone expected. And now they’re trying to collect the profits by suing everyone that followed them for decades. What a nice way to say thank you!

Speaking of which, not only are they happy to sue companies for using Linux (TomTom in this case, and many others during the FAT fight), they’re also asking for the open-source community’s help to make Visual Studio 2010 a better product. Isn’t that nice? How lovely is the American way of life; I guess the world will never be able to thank them enough.

Vista is no more

It still hasn’t gone to meet its maker, but it was also not as bad as it could have been.

After Windows Vista was launched with more PR and DRM than any other Windows, Microsoft hoped to continue its domination of the market. Maybe afraid of the steep increase of Linux on desktops (Ubuntu has a great role in that) and other market pressures, they rushed Vista out with so many bugs and security flaws, so slow and with such a big memory and CPU footprint, that not many companies really wanted to change their whole infrastructure only to see it drown a little later.

The Chinese government ditched it for XP because it was not stable enough to run the Olympics, only to find out that the alternative didn’t help at all.

All that crap helped Linux (especially Ubuntu) a lot in jumping onto the desktop world. Big companies shipping Linux on lots of desktops and laptops, all netbooks with Linux as a primary option, lay people now using Linux as they would any other desktop OS. So, is it just because Vista is so bad? No. Not at all. Linux got really user-friendly over the last five to ten years, and it’s now as easy as any other.

Vista is so bad that Microsoft had to keep supporting Windows XP; they’re rushing again with Windows 7, and probably (hopefully) they’ll make the same mistakes again. It got so bad that the Free Software Foundation’s BadVista campaign is officially closing down for good. For good as in: victory!

Yes, victory, because in one year they could show the world how bad Vista really is and how good the other options are. Of course, they were talking about Linux and all the free software around it, including the new gNewSense platform they’re building, but the victory is greater than that. The biggest message is that Windows is not the only solution for desktops and, most of the time, it’s the worst.

In conjunction with the DefectiveByDesign guys, they also showed how Vista (together with Sony, Apple, Warner et al) can completely destroy your freedom, privacy and entertainment. They were so successful in their quest that they’re closing their doors to spend time (and donors’ money) on more important (and pressing) issues.

Now they’re closing down, but that doesn’t mean the problem is over. The idea is to stabilise the market. Converting all Windows and Mac users to Linux wouldn’t be right; after all, each person is different. But the big challenge is to have users that need (or want) a Mac use a Mac. Those who need Windows and can afford to pay for all the extra software to protect their computer (but not their privacy) can use it. For developers the real environment is Unix, and they should be able to get a good desktop and good development tools as well. It’s, at least, fair.

But what the majority of users really want is a computer to browse the web, print some documents and send emails, and for that any of the three is good enough. All three are easy to install (or come pre-installed), all three have all the software you need, and most operations and configurations are easy or automatic. It’s becoming more a choice of style and design than anything else.

Now that Apple has got rid of all the DRM crap, and Spore was such a fiasco that EA is selling games without DRM, the word is getting out. It’s a matter of time before this becomes a minor problem, too. Would DefectiveByDesign retire as well? I truly hope so.

As an exercise for the reader, go to the Google home page and search for the terms “windows vista“. You’ll see the BadVista website on the first page. If you search for “DRM”, you’ll see the DefectiveByDesign web page as well. This is big: it means that lots and lots of websites point to those sites when talking about those subjects!

If you care enough, have a Google account and use the personalised Google search, you could search for those terms and press the up-arrow on those results to push them even higher in the ranking. Can we make both of them number one? I’ve done my part already.

Why not the primary key?

Recently I ran into an amusing situation with Oracle (again), where the primary key was not used even when explicitly requested…

The query was:

select name from table where table_id = 1;

Of course, table_id was the primary key. Astonishingly, Oracle performed a FULL TABLE SCAN.

WTF?

When things like that happen you think there’s something quite wrong with the database, so I decided to ask the DBAs what was going on. The answer was something like:

“It may happen if Oracle decides to. Even though you have created an index, there is no guarantee it will be used for every query; the optimizer decides which path is best.”

Seriously, if Oracle decides NOT to use the primary key for that query, there is something really wrong with the whole thing. I couldn’t think of a situation where that might be even close to valid! A friend who knows Oracle much better than I do pictured two extreme cases where it could happen:

  1. There are very few records in the table, so the table data fits in a single data block. Reading the index root block (one data block) and then the one table data block is more expensive than just reading that one table data block.
  2. The index is in a tablespace with a different block size, residing on very slow disks, and the buffer cache for the non-default block size is hugely under-sized. The cost of reading the index plus the table data might then be higher than just reading the table data. It’s a bit unrealistic, but I’ve witnessed stupid things like this.

Let’s face it, the first scenario is just too extreme to be true. If your table has only one data block, you’d be better off using email than a database. And in the second scenario, why would anyone put indexes on a much slower disk? Also, if the index is big, the data will be proportionally big too, so there is no gain in doing a full table scan anyway.

Later on, he found out what the problem was by digging into the configuration parameters:

  • The production database (working fine) had:
    optimizer_index_caching = 95
    optimizer_index_cost_adj = 10

  • The development database (Oracle default values) had:
    optimizer_index_caching = 0
    optimizer_index_cost_adj = 100

I don’t quite understand exactly what they mean to Oracle, but optimizer_index_caching = 0 seems a bit too radical a default to me.

In the end (and after more iterations than I’d have liked), it was fixed, but what really pissed me off was getting the pre-formatted answer that “Oracle knows best which path to take” without anyone even looking at how stupid that decision was in the first place. This blind confidence in Oracle drives me nuts…

gzip madness

Just another normal day here at the EBI: I changed a variable called GZIP from local to global (via export) and got a very nice surprise: all my gzipped files had gzip itself as a header!!!

Let me explain… I have a makefile that, among other things, gzips some files. So I created a variable called GZIP that stands for “gzip --best --stdout”, and in my rules I do:

%.foo : %.bar
        $(GZIP) < $< > $@

So far so good; it always worked. But I had a few makefiles redefining the same command, so I thought: why not make an external include file with all the shared variables? I could use include for the makefiles, but I also needed some of those variables in shell scripts, so I decided to use “export VARIABLE” for all make variables (otherwise they aren’t picked up by the sub-shells) and called it a day. That’s when everything started failing…

gzip environment

After a while digging into the problem (I was blaming poor LSF for it), I found that when I didn’t have the GZIP variable defined, all went well; but the moment I defined GZIP=”/bin/gzip --best --stdout”, even a plain call to gzip produced corrupted output (ie. it had the gzip binary as a header).

A quick look at gzip’s manual gave me the answer… GZIP is the environment variable in which gzip looks for its default options. So, if you set GZIP=”--best --stdout”, every time you call gzip it’ll use those parameters by default.

So, by putting “/bin/gzip” inside that variable, I was effectively always running the following command:

$ /bin/gzip /bin/gzip --best --stdout < a.bar > a.foo

and putting a compressed copy of the gzip binary, together with a.bar’s data, into a.foo.
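
Once you know the cause, the fix is simple: keep only options in GZIP (which is all gzip expects there) and put the command itself in a differently named variable. A sketch of the idea in plain shell (the same applies in the makefile, with the command variable not exported):

export GZIP="--best --stdout"   # options only: gzip will pick these up by default
GZ=/bin/gzip                    # the program itself lives in another variable
$GZ < a.bar > a.foo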

What a mess a simple environment variable can make…