Emergent behaviour

There is a lot of attention on emergent behaviour nowadays (e.g. here, here, here and here), but it's still on the outskirts of science and computing.

Science

For millennia, science has isolated each single behaviour of a system (or system of systems) to study it in detail, then joined them together to grasp the bigger picture. The problem is that this approximation only works for simple systems, such as the ones studied by Aristotle, Newton and Ampère. Every time scientists approached the edges of their theories (including those three), they simply left the rest as an exercise to the reader.

Newton foresaw relativity and the possible lack of continuity in space and time, but he did nothing to address it. Fair enough: his work was far more important to science than venturing through the unknown, and it would have been almost mystical of him to try (although he was an alchemist). But more and more, scientific progress seems to be blocked by chaos theory, where you either unwind the knots or go back to alchemy.

Chaos theory has existed for more than a century, but only recently has it been applied to anything outside differential equations. The hyper-sensitivity to initial conditions is clear in differential systems, but other systems have a less visible, though no less important, sensitivity. We just don't see it well enough, since most other systems are not as well formulated as differential equations (thanks to Newton and Leibniz).
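
As a minimal illustration of that hyper-sensitivity (a standard textbook example, my addition, not from the original post): the logistic map is a one-line deterministic rule, yet two almost identical starting points diverge completely within a few dozen iterations.

```python
# A standard illustration of sensitivity to initial conditions: the
# logistic map x' = r * x * (1 - x) in its chaotic regime (r = 4).
r = 4.0
a, b = 0.300000, 0.300001  # two starting points, one millionth apart

for step in range(1, 51):
    a = r * a * (1 - a)
    b = r * b * (1 - b)
    if step % 10 == 0:
        print(f"step {step:2d}: a={a:.6f}  b={b:.6f}  diff={abs(a - b):.6f}")
# By around step 30 the two trajectories are completely uncorrelated,
# even though the rule is fully deterministic.
```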

Neurology and the quest for artificial intelligence have raised strong interest in chaos theory and fractal systems. Developments in neural networks have shown that groups and networks also have a fundamentally chaotic nature, and, more importantly, that it's only through the chaotic nature of those systems that you can extract a good amount of information from them. Quantum mechanics had the same evolution, with Heisenberg and Schrödinger kicking the ball first on the oddities of the universe and on how important a lack of knowledge about a system is to extracting information from it (think of Schrödinger's cat).

A network with direct and fixed thresholds doesn’t learn. Particles with known positions and velocities don’t compute. N-body systems with definite trajectories don’t exist.

The genetic code has some similarities to these models. Living beings have far more junk than genes in their chromosomes (junk reaches 98% of the human genome), but changes in the junk parts can often lead to invalid creatures. If the junk within genes (introns) gets modified, the actual code (exons) can be spliced differently, leading to a completely new, dysfunctional protein. Or, if you add start sequences (TATA boxes) to a non-coding region, some of it will be transcribed into whatever protein it can make, creating rubbish within cells, consuming resources or eventually killing the host.
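
To make the splicing point concrete, here is a toy sketch (my own illustration, with a made-up sequence and only a few entries of the standard codon table): shifting where the coding region starts by a single base yields entirely different codons, and hence a different protein.

```python
# Toy illustration: reading the same DNA with the frame shifted by one
# base produces completely different codons, hence a different protein.
CODONS = {  # tiny subset of the standard genetic code
    "ATG": "Met", "GCT": "Ala", "GAA": "Glu", "TGA": "STOP",
    "TGG": "Trp", "CTG": "Leu", "AAT": "Asn", "GAT": "Asp",
}

def translate(dna: str) -> list:
    """Translate successive 3-base codons; unknown codons become '???'."""
    return [CODONS.get(dna[i:i + 3], "???") for i in range(0, len(dna) - 2, 3)]

seq = "ATGGCTGAATGA"
print(translate(seq))      # ['Met', 'Ala', 'Glu', 'STOP']
print(translate(seq[1:]))  # one-base shift: ['Trp', 'Leu', 'Asn']
```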

But most of the non-coding DNA is also highly susceptible to change, and that's probably its most important function: it is adapted to the specific mutation rates of our planet and acts as our defence mechanism against such mutations. For billions of years, the living beings on Earth have been tuning that code. Each of us carries a super-computer that can choose, by design, the best ratios for a given scenario within a few generations, and create a whole new species or keep the current one adapted, depending on what's more beneficial.

But not everyone is that patient…

Programming

Sadly, in my profession, chaos plays an important part, too.

As programs grow old and programmers move on, a good part of the code becomes stale, creating dependencies that are hard to find and harder to fix. In that sense, programs are pretty much like the genetic code: the amount of junk increases over time, and that gives the program resistance to change. The main problem with computing, which is not as clear in genetics, is that the code that stays behind is normally the code that no one wants to touch, and thus the ugliest and most problematic.

DNA transcription machinery doesn't care where the genes are; it finds a start sequence and gets on with its life. Programmers, we believe, have free will, and that gives them the right to choose where to apply a change. They can either work around the problem, making the code even uglier, or they can go in and try to fix the problem at its source.

Non-programmers would quickly state that only lazy programmers would do the former, but more experienced ones will admit to having done so on numerous occasions, for different reasons. Good programmers do it because fixing the real problem would be so painful to so many other systems that it's best left alone, with that part to be replaced in the future (only it never will be). Bad programmers are not just lazy: some of them really believe it's the right thing to do (I've met many like this), and that adds some more chaos to the game.

It's not uncommon to start fixing a small problem, get more than half-way through and hit a small glitch in a separate system. A glitch that you quickly identify as a design flaw, so you, as any good programmer would, re-design it and implement the new design, which is already much bigger than the original fix. All tests pass except one, which shows you another glitch, raised by your new design. This can go on indefinitely.

Some changes are better done in packs, all together, to make sure all designs are consistent and the program behaves as it should, not necessarily as the tests say it would. But that's not only too big for one person at one time; it's practically impossible when other people are changing the program under your feet, releasing customer versions and changing the design themselves. There is a point where a refactoring is not only hard, but also a bad design choice.

And that's when code becomes introns, and is seldom removed.

Networks

The power of networks is rising, though more slowly than expected. For decades, people have known about synergy, chaos and emergent behaviour, but it is only recently, with the quality and amount of information on global social interaction, that those topics have returned to the main picture.

Twitter, Facebook and the like have raised many questions about human behaviour, and a lot of research has been done to address those questions and, to a certain extent, answer them. Psychologists and social scientists have known for centuries that social interaction is greater than the sum of its parts, but now we have the tools and the data to prove it once and for all.

Computing clusters have been applied to most of the hard scientific problems for half a century (weather prediction, earthquake simulation, exhaustion proofs in graph theory). They also took on a commercial side with MapReduce and similar mechanisms that popularised distributed architectures, but that's only the beginning.

In today's distributed systems, emergent behaviour is treated as a bug that has to be eradicated. In the exact science of computing, locks and updates have to occur in the precise order they were programmed to, to yield the exact result one is expecting. No more, no less.

But to keep our systems free of emergent behaviour, we invariably introduce emergent behaviour into our code. Multiple checks on locks and variables, different design choices for components that have to work together, and the expectation of precise results or nothing, make the number of lines of code grow exponentially. And, since all that has to run fast, even more lines and design choices are added to avoid one extra operation inside a very busy loop.
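
A minimal sketch of what I mean (a toy example of my own, not from any particular codebase): a two-line counter grows several times over once you add the locking and double-checking machinery that exact results demand.

```python
import threading

# Naive version: two lines, but racy when called from several threads.
counter = 0

def bump_naive():
    global counter
    counter += 1  # read-modify-write: loses updates under concurrency

# The "exact results" version: the same behaviour, buried under defensive
# machinery. Lazy init with a double-checked lock, plus a lock on every
# update; each extra check is one more place to get it wrong.
_lock = threading.Lock()
_ready = False

def bump_safe():
    global counter, _ready
    if not _ready:              # fast-path check outside the lock
        with _lock:
            if not _ready:      # double-check inside the lock
                counter = 0
                _ready = True
    with _lock:
        counter += 1
```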

While all this is justifiable, it's not sustainable. In the long run (think decades), the code will be replaced or the product will be discontinued, but there is a limit to how many lines a program can gain without losing some others. And the cost of refactoring increases with the lifetime of a product. This is why old products don't get many updates: not because they're good enough already, but because it's impossible to add new features without breaking a lot of others.

Distant future

As much as I like emergent behaviour, I can't begin to fathom how to harness that power. Stochastic computing is one way, and it has been done with a certain level of success here and here, but it's far from easy to create a general logic behind it.

Unlike Turing machines, emergent behaviour comes from multiple sources, dressed in multiple disguises and producing far too much variety in results to be accounted for in one theory. It's similar to string theory, where there are several variations of it but only one M-theory, the one that joins them all together. The problem is, nobody knows what this M-theory looks like. Well, they barely know what the different versions of string theory look like, anyway.

In that sense, emergent theory is even further than string theory from being understood in its entirety. But I strongly believe this is one way out of the conundrum we live in today, where adding more features makes it harder to add more features (like mass at relativistic speeds).

With stochastic computing there is no need for locks, since all that matters is the probability of an outcome, and precise values do not make sense. There is also no need for N×M combinations of modules and special checks, since the power is not in the computations themselves but in the meta-computation, done by the state of the network rather than by its specific components.
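
To give a flavour of computing with probabilities instead of precise values, here is the classic stochastic-computing trick (a standard textbook example, my addition): encode each number in [0, 1] as the probability of a 1 in a random bitstream, and multiplication becomes a plain AND gate, with no locks and no exact intermediate state.

```python
import random

def to_stream(p: float, n: int) -> list:
    """Encode p in [0, 1] as a bitstream where P(bit == 1) = p."""
    return [1 if random.random() < p else 0 for _ in range(n)]

def decode(stream: list) -> float:
    """Recover the encoded value as the fraction of 1 bits."""
    return sum(stream) / len(stream)

n = 100_000
a, b = to_stream(0.5, n), to_stream(0.4, n)
product = [x & y for x, y in zip(a, b)]  # an AND gate is a multiplier
print(decode(product))  # ~0.20, accurate only in probability
```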

But that, I’m afraid, I won’t see in my lifetime.

Computer Science vs Software Engineering

The difference between science and engineering is pretty obvious. Physics is science, mechanics is engineering. Mathematics is (ahem) science, and building bridges is engineering. Right?

Well, after several years in science and far too much time in software engineering (which I was hoping to tell my kids about when they grow up), it seems that people's beliefs about the difference, if there is any, are much more exacerbated than their own logic would imply.

Beliefs

The general belief that science is more abstract falls apart really quickly when you compare maths to physics. There are many areas of maths (statistics, for example) that are much more grounded in the real world than many parts of physics (like string theory and a good part of cosmology). Nevertheless, most scientists will turn their noses up at anything that resembles engineering.

From different points of view (biology, chemistry, physics and maths), I could see that there isn't a consensus on what people really consider a less elaborate task, not even among the same groups of scientists. But when one of their colleagues rejects something, the rest usually go along with it. I came to the conclusion that the psychology of belonging to a group was more important than personal beliefs or preferences. One would expect that from young schoolgirls, not from professors and graduate students. But regardless of the group behaviour, there is still that feeling that tasks such as engineering (whatever that is) are mundane, mechanical and less beneficial to the greater good than science.

Real World

On the other side of the table, the real world, there are people doing real work. It generally consists of less thinking, more acting and getting things done. You tend to use tables and calculators rather than whiteboards and dialogue, your decisions are based much more on gut feeling and experience than on over-zealously examining every single corner case, and the result of your work is generally more compact and useful to the everyday person.

From that perspective, (what we're calling) engineers have a good deal of prejudice towards (what we're calling) scientists. For instance, the book Real World Haskell is a great pun from people who have one foot on each side of this battle (but are leaning towards the more abstract end of it). In the commercial world, you don't have time to analyse every single detail; you have a deadline, so you do what you can by then and buy insurance for the rest.

Engineers also produce better results than scientists. Their programs are better structured, more robust and more efficient. Their bridges, rockets, gadgets and medicines are far more tested, bullet-proofed and safe than anything a scientist could ever hope to build. It is a misconception that software engineers have the same experience as an academic with the same time spent coding, just as it is a misconception that engineers could as easily develop prototypes that would revolutionise their industry.

But even within engineering, there are tasks and tasks. Even while loathing scientists, the engineers who perform the more elaborate tasks (massive bridges, ultra-resistant synthetic materials, operating systems) consider themselves above the mundane crowd of lesser engineers (building two-bed flats on the outskirts of Slough). So, even here, the more abstract, less fundamental jobs are held in higher regard than the ones more essential and critical to society.

Is it true, then, that the more abstract and less mundane a task is, the better?

Computing

Since the first thoughts on general-purpose computing, there has been this separation between the intangible generic abstraction and the mundane mechanical real-world machine. Leibniz developed the binary numeral system, compared the human brain to a machine and even had some ideas on how to build one someday, but he ended up creating general-purpose multipliers (following Pascal's design for the adder).

Leibniz would have thrived in the 21st century. Lots of people in the 20th with the same mindset (such as Alan Turing) did so much more, mainly because of the availability of modern building techniques (perfected over centuries by engineers). Babbage is another example: he developed his difference engine for years, and when he failed (more through arrogance than anything else), his analytical engine (far more elegant and abstract) consumed his soul for another decade. When he realised he couldn't build it in that century, he perfected his first design (reducing its size threefold) and made a great specialist machine... for engineers.

Mathematicians and physicists used to have to do horrible things (such as astrology and alchemy) to keep their pockets full and, in their spare time, do a bit of real science. This century, that matters less. Nowadays, even if you're not a climate scientist, you can get a good budget for very little real applicability (check NASA's funded projects, for example). The number of people working on string theory or trying to prove the Riemann hypothesis is a clear demonstration of that.

But computing is still not there yet. We're still doing astrology and alchemy for a living and hoping to learn the more profound implications of computing in our spare time. Well, some of us, at least. And that brings me to my point...

There is no computer science… yet

The beginning of science was marked by philosophy and dialogue. Two thousand years later, mankind was still doing alchemy and trying to prove the Sun was the centre of the solar system (and failing). Only 200 years after that did people really start doing science, freeing themselves from private patronage and focusing on real questions. Computer science is still far from that...

Most computer science courses I've seen teach a few algorithms, an object-oriented language (such as Java) and a few modules on current technologies (such as databases, web development and concurrency). Very few of them really teach about Turing machines, group theory, complex systems, other forms of formal logic or alternatives to the current models. Moreover, the number of people doing real science in computing (judging by what appears on arXiv or news aggregators such as Ars Technica or Slashdot) is probably smaller than the number of people working on string theory or wanting a one-way trip to Mars.

So, what do PhDs do in computer science? Well, novel techniques for some old-school algorithms are always a good choice, but the recent favourites have been breaking the security of the banking system or re-writing the same application we all already have, but for the cloud. Even the more interesting dissertations, like memory models in concurrent systems or energy-efficient gate designs, are at most commercial applications.

After all, PhDs can earn a lot more money in industry than by remaining at a university, and doing your PhD towards some commercial application can guarantee you a more senior starting position at such companies than something completely abstract. So, to be honestly blunt, we are all doing alchemy now.

Interesting engineering

Still, that's not to say there aren't interesting jobs in software engineering. I'm lucky to work with compilers (especially because it also involves the amazing LLVM), and there are other jobs in the industry that are as interesting as mine. But all of them are just the higher engineering, the less mundane rocket science (which has none of the science). All in all, software engineering is a very boring job.

You cannot code freely, ignore the temporary bugs, or ask the user to be nice and provide controlled input. You need a massive test infrastructure, quality control, standards (which are always tedious) and well-documented interfaces. All of that gets in the way of real innovation; it makes any attempt at innovation in a real company a mere exercise in futility and a mild source of fun.

This is not exclusive to the software industry, of course. In the pharmaceutical industry there is very little innovation. They do develop new drugs, but using the same old methods. They do need to get new, more powerful medicines out of the door quickly, but the massive amount of testing and regulation they have to follow is overwhelming (which is why they avoid doing it properly as much as possible, so don't trust them!). Nevertheless, there are very interesting positions in that industry as well.

When, then?

Good question. People are afraid of going outside their area of expertise; they feel exposed and ridiculed, and quickly retreat to their comfort zone. The best thing that can happen to a scientist, in my opinion, is to be proven wrong. For me, there is nothing worse than being wrong and not knowing it. Not many people are like that, and the fear of failure is what keeps industry (all of it) in the real world, with real concerns (which is good, actually).

So, as long as industry drives innovation in computing, there will be no computer science. As long as the most gifted software engineers are mere employees of big corporations, they won't try, to avoid failure, as it could cost them their jobs. I've been to a few companies, and heard about many others, that have a real innovation centre, computer laboratory or research department, and not a single one of them is bold enough to change computing at its core.

That is something IBM, Lucent and Bell Labs did in the past, but probably don't do any more these days. In a good twist of irony, the company that gets closest to software science today is Microsoft, at its campus in Cambridge. What happened to those great software teams of the 70s? Could those companies really afford real science, or were they just betting their petty cash in case someone got lucky?

I can't answer those questions, nor whether it will ever be possible to have real science in the software industry. But I do plead with all software people to think about this when they teach at university. Please, teach those kids how to think, to defy the current models, challenge the universality of the Turing machine, create a new mathematics and prove Gödel wrong. I know you won't try (out of hubris and self-respect), but they will, and they will fail, and after so many failures something new can come up and make the difference.

There is nothing worse than being wrong and not knowing it…

Silly project of the week: molecular dynamics

This week's project is a molecular dynamics simulation. Don't get too excited: it's not using any state-of-the-art algorithms, nor is it assembling 3-dimensional structures of complex proteins. I began with a simple carbon chain using only Coulomb's law in a spring-mass system.

The molecule I’m using is this:

[Image: Molecular Dynamics]

The drawing program is quite simple and won't work for most molecules, but for 2-dimensional simple molecules (max. 3 connections per atom) it kind of works.

Later on, when the program runs, each atom "pushes" all the others electrically and the springs "pull" them back. A good way to solve that is to write Newton's second law with both forces, m·d²x/dt² = q₁·q₂/x² − k·x (where x is a vector), and integrate numerically using Runge-Kutta.

But this is my first OpenGL program, so I decided to go easy on the model and actually see it pseudo-working with an iterative simulation following the same equations above. This picture is a frame after a few iterations.

Quoting its page: "As this simulation is not using any differential solution, the forces grow and grow until the atom becomes unstable and breaks apart. Some Runge-Kutta is required to push the realism further."
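
For the record, here is a minimal sketch of what that Runge-Kutta step could look like for a 1-D, two-atom version of the model above (my own illustration, with made-up constants, not the actual project code):

```python
# RK4 integration of m * x'' = q1*q2 / x**2 - k*x for the separation x
# between two atoms: Coulomb repulsion pushes, the spring pulls back.
Q1 = Q2 = 1.0   # charges (illustrative units)
K = 5.0         # spring constant
M = 1.0         # mass
DT = 0.001      # time step

def accel(x: float) -> float:
    return (Q1 * Q2 / x**2 - K * x) / M

def rk4_step(x: float, v: float, dt: float):
    """One classic Runge-Kutta 4 step for x' = v, v' = accel(x)."""
    k1x, k1v = v, accel(x)
    k2x, k2v = v + dt/2 * k1v, accel(x + dt/2 * k1x)
    k3x, k3v = v + dt/2 * k2v, accel(x + dt/2 * k2x)
    k4x, k4v = v + dt * k3v, accel(x + dt * k3x)
    x += dt/6 * (k1x + 2*k2x + 2*k3x + k4x)
    v += dt/6 * (k1v + 2*k2v + 2*k3v + k4v)
    return x, v

x, v = 1.0, 0.0  # start at rest, 1 unit apart
for _ in range(5000):
    x, v = rk4_step(x, v, DT)
print(f"separation after 5 time units: {x:.4f}")  # oscillates, stays bounded
```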

UPDATE:

The webpage of the fully-functional prototype is HERE.

What is bioinformatics anyway?

After almost three years in the field, I'm pretty sure I have no idea. A few months ago I thought I knew, and wrote an essay about software quality in bioinformatics, but I have now figured out that, even though those things might make sense to the rest of us, for bioinformatics they don't.

Wikipedia (which has much higher quality than many papers I've read) defines software as "a general term used to describe a collection of computer programs, procedures and documentation that perform some tasks on a computer system". It also defines programming as "the process of writing, testing, debugging/troubleshooting, and maintaining the source code of computer programs".

So, everyone who writes programs (let's forget about documentation, tests, maintenance, etc. for now) is a programmer. But a computer programmer IS NOT a software engineer. Programmers can write as much code as they want, but without formal definitions, metrics, good design decisions and practices, tests, documentation and so on, they are as useless as ants without pheromones.

Quick tip: whenever you see a job advert for a software engineer at a bioinformatics institute, beware: it generally means a developer to maintain random code and make random changes in random environments.

So what?

I might not have a clue what bioinformatics is, but now I'm pretty sure what it ISN'T: software engineering. You will find a huge amount of code, scripts, programs and databases, but rarely a fair piece of software. Therefore, my previous ideas might be valid for software quality, but not at all for bioinformatics.

Don't get me wrong, I know some bioinformaticians (and programmers around them) who understand the basic ideas about software and quality and why we should have them, but the whole structure, the scientific community, the people who give them money, have no idea whatsoever what software really is or where it fits in the loop.

Still, bioinformaticians are getting half-programming and half-biology degrees, in two fields each of which has more to know than all of humanity's brains added up could hold. How is it possible (and fair) to put those poor guys to work in such sub-human conditions, without any guidance or quality control, without any clue, in fact, as to what they should really be doing in the first place?

Some of them come out pretty well, so well that they abandon the field and go to work at better companies, with much better software strategies, proper engineering, scientific development in the right place (sandboxes) and production code written by real engineers with solid experience in mission-critical environments.

In the end, this leaves bioinformatics (to be fair, the informatics part only) in the hands of people inexperienced in all sorts of fields and at all levels: students writing production software, people who have never seen a mission-critical environment coordinating databases, filesystems and development, with one bad decision after another.

Is it just a rant, then?

No, not really. It's a liberation. For a while I struggled to understand the motives behind those weird decisions. I knew that, in every industry, you have a whole set of values, and people can sometimes take completely awkward decisions that turn out to be the right ones. I've seen it happen when moving between jobs, especially when I worked at Yahoo! (big company, big culture). But with time, the awkward decisions still sounded awkward, even after considering all the new information I had.

Other people got fed up with all this and left, one after the other. I talked to them, and the answer was always the same: random (generally bad) decisions, egos of astronomical proportions and zero technical knowledge on all sides. Now I'm leaving for good, and you won't need to ask me why, will you?

I generally need a very good reason to leave a workplace. I was feeling out of place but couldn't leave without a very good reason; now I have a good bunch of them...

A liberation indeed!

Is there a way out?

Seriously, no. In 10 years, definitely no. In 15, quite likely no. In 20, maybe... but things must start changing now!

Being optimistic, and assuming they stop running around like headless chickens, they would still need strong guidance, which is virtually impossible given the strong egos of scientists in general. Bioinformatics has existed for decades already; who is the software engineer who will tell them they're doing it all wrong?

Besides, the people who grant them money (governments) have no clue about software engineering (nor should they), and they will keep sending money every year as long as, in the reports, they appear to be doing great things. In fact, most of it could have been done in a few weeks by two or three people prepared to compromise.

Who doesn't want a job where they can do almost nothing at all, get paid every month without even the remotest fear of losing it, and still pretend they're doing great things? Whoever says no to this and starts working for real gets a really bad reputation... While this win-win situation keeps going, there is little or zero chance of doing real work in the field, and bioinformatics is doomed to constant failure and ineffectiveness.

Lastly, it's not a specific problem where you can just change a couple of people and everything will be all right, as many believe. This is nobody's fault; it's just the way the two fields, biology and informatics, joined together some decades ago, and it was never straightened out. If there is a way out, I'd be very glad to see it, and I will congratulate those who manage to find it, but this is much more politics than software development, and I am, very luckily, just a programmer...

Book: Flat and Curved Space Times

The first time I read this book was during my special relativity course at university. I couldn't understand a thing the teacher was saying (probably because his explanations were always "you won't be able to understand that"), and I needed to replace the 35% grade I got in the first exam to complete the course.

Hopeless as I was, I headed to the library in search of a magical book (other classmates were equally helpless) and found this one. The magic in it is that, instead of forcing the Lorentz transformations down your throat first and only then explaining the basic principles of relativity, it starts by simply showing the topology of the space and assuming that the speed of light is constant (pretty much the same path Einstein took in the first place).

So, the first chapter has no equations whatsoever, only graphics with light waves going back and forth, from which it derives the light cones automagically: what happens to the "world" at high speeds and how it affects our sense of reality. It goes on through all the kinematic principles using only Newton's equations and gamma. The Lorentz transformations only appear in the fourth chapter.
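
For readers who haven't met gamma: it is the only factor those chapters need, and it falls straight out of the constancy of the speed of light. A standard light-clock derivation (my addition, not a quote from the book):

```latex
% A light pulse bounces across a clock of height $L$ moving at speed $v$.
% In the clock's frame a tick takes $t_0 = 2L/c$; in ours, the pulse
% travels the hypotenuse, so $(c t / 2)^2 = L^2 + (v t / 2)^2$, giving
\[
  t = \frac{t_0}{\sqrt{1 - v^2/c^2}} = \gamma\, t_0,
  \qquad
  \gamma = \frac{1}{\sqrt{1 - v^2/c^2}}.
\]
```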

After that, not only could I understand relativity as a whole, but I also got a 90% grade in the final exam! It's an old book (1988), but time has no meaning for a very good book, especially on a subject that hasn't changed much in recent decades.

I recommend it to physicist-wannabes as well as lay people with little background in maths, and if your teacher is as hopeless as mine was, ignore him and read this book.

Click here for the US version.

How close is nano-computing?

In September, Sunny Bains wrote Why Nano still macro?, and since then I've been thinking about it every once in a while.

Recently, a study at the University of California showed how to create a demodulator using nanotubes. So far there have been advances in memory containers (such as this and that) and also batteries, but all of them, as Sunny notes, try to build small structures following the design of big things.

Quantum computation nowadays has exactly the same problem: quantum effects in a classical assembly, big, clumsy and very expensive. If it took a quantum effect (the transistor) to make classical computation cheap and available, what will it take to make quantum computers cheap? A superstring effect? Something messing around with the Calabi–Yau shape of the six additional dimensions?

Anyway, back to nanotech: building a nano-battery is cool, but using ATP as the primary source of energy would be much cooler! Using the available nano-gears and nanotubes to make a machine is also cool, but creating a single 2,3 Turing machine (recently proven universal) would be way better!
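
To give a sense of how little machinery "a single 2,3 Turing machine" implies, here is a generic simulator sketch. The rule table below is a toy example of my own for illustration; it is NOT Wolfram's actual 2-state, 3-symbol rule set, only shaped like it (six rules in total).

```python
# A generic Turing machine simulator. A 2-state, 3-symbol machine needs
# only six rules like these; the table below is illustrative, not the
# actual universal 2,3 machine.
from collections import defaultdict

RULES = {  # (state, symbol) -> (new state, symbol to write, head move)
    ("A", 0): ("B", 1, +1),
    ("A", 1): ("A", 2, -1),
    ("A", 2): ("B", 0, +1),
    ("B", 0): ("A", 2, -1),
    ("B", 1): ("B", 0, +1),
    ("B", 2): ("A", 1, -1),
}

tape = defaultdict(int)  # blank tape of 0s, unbounded in both directions
state, head = "A", 0
for _ in range(20):      # run 20 steps and show what the head left behind
    state, tape[head], move = RULES[(state, tape[head])]
    head += move

print(state, dict(sorted(tape.items())))
```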

Once you have an extremely simple processor like that, a nano-modem, some storage and ATP as food, you can do whatever you want for as long as you like inside any living being on Earth. Add a few gears to make a propeller and you're mobile! 😉

Of course it's not that simple, but most of the time, to state that something is viable means exactly the same as to say that it's classic, as in boring and clumsy and expensive and brute-force... well, you get the idea...