What I don’t miss about Java

Disclaimer: This is not a rant

I spent the last year working with Java, and it was not at all bad. But while Java has its moments and does shine at times, I always felt a bit out of place using it. In fact, when I moved back to C++, unlike when I moved to Java, I felt I actually wasn’t missing much…

Last year, while writing Java at work, I felt compelled more often than usual to write C++ programs at home. Even simple programs that would have been better served by a scripting language all came out in C++.

Recently, working full time with C++, I noticed I’m doing very little home development and definitely not doing any Java. So, what did I miss about C++ that I don’t miss about Java?

Expressiveness: While functional languages are much more expressive than C++, there are few languages less expressive than Java. Java encourages childish programming, such as forcing you to call everything through methods rather than operators. By taking away explicit pointers, operator overloading and other dangerous things from C++, you end up repeating yourself quite a lot, and it’s very hard to understand the logic afterwards, when all you have is bloat.

While Java’s designers tried to do away with pointers and operators, they couldn’t. We still have null references (throwing NullPointerExceptions) and the fake operators (toString(), hashCode(), compareTo()) that can easily be overridden to change the expected behaviour in much the same way as C++ operators, just in “method” notation.
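
To make that concrete, here’s a minimal sketch (the Money class below is made up): equals(), hashCode(), compareTo() and toString() drive collections, sorting and printing exactly the way overloaded operators do in C++, so a careless override changes behaviour just as silently.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical example: the "fake operators" in method notation.
class Money implements Comparable<Money> {
    final long cents;

    Money(long cents) { this.cents = cents; }

    // equals()/hashCode() play the role of operator== :
    // a careless override silently breaks every hash-based container.
    @Override public boolean equals(Object o) {
        return o instanceof Money && ((Money) o).cents == this.cents;
    }
    @Override public int hashCode() { return Long.hashCode(cents); }

    // compareTo() plays the role of operator< for sorted containers.
    @Override public int compareTo(Money other) {
        return Long.compare(this.cents, other.cents);
    }

    // toString() acts as the "print operator".
    @Override public String toString() { return cents / 100.0 + " GBP"; }
}

public class FakeOperators {
    public static void main(String[] args) {
        Set<Money> wallet = new HashSet<>();
        wallet.add(new Money(100));
        wallet.add(new Money(100));          // deduplicated only because equals/hashCode agree
        System.out.println(wallet.size());   // 1
        System.out.println(new Money(250));  // 2.5 GBP
    }
}
```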

In the end, you can still do some of the bad things, just not all of them. So they took away the dangers by taking away functionality, without properly redesigning what C++ got wrong.

Abuse of Object Orientation: While in Ruby everything is an object, in Java almost everything can be. Every class silently derives from Object, but primitive types do not. So you have the wrapper objects (Integer et al.) which are automatically converted to and from primitives in ways that are subtle, hard to predict and carry a real performance cost (see auto-boxing).
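
A small illustration of both the subtle conversions and the cost; the timings are only indicative and will vary from JVM to JVM:

```java
public class Boxing {
    public static void main(String[] args) {
        // Small values come from the Integer cache, larger ones don't:
        Integer a = 127, b = 127;
        Integer c = 128, d = 128;
        System.out.println(a == b);      // true  (same cached object)
        System.out.println(c == d);      // false (two distinct objects, reference comparison)
        System.out.println(c.equals(d)); // true  (value comparison)

        // Boxing in a tight loop: every += allocates a new Integer behind the scenes.
        Integer boxedSum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < 10_000_000; i++) boxedSum += i;
        System.out.println("boxed sum " + boxedSum + " in "
                + (System.nanoTime() - start) / 1_000_000 + " ms");

        int primitiveSum = 0;
        start = System.nanoTime();
        for (int i = 0; i < 10_000_000; i++) primitiveSum += i;
        System.out.println("primitive sum " + primitiveSum + " in "
                + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```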

Not just performance, but the language design is, again, incomplete.

Most OO programmers (mainly Java ones) complain a lot about Perl OO. They say Perl (or Python for that matter) has no proper OO, since everything is a hash and there is no concept of protection.

While Java objects and members are strongly typed, and you have the concept of protection, it’s way too easy to transform Java OO into Perl OO with reflection.

Of course, with C++ you can cast things to void pointers, mess around in memory and so on, but fetching objects by name and stripping away private protection in a “safe”, sanctioned way is simply wrong. It’s like giving loaded guns to children and telling them where the safety catch is.
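
This is all it takes (the Vault class is made up; on recent JDKs the module system may add a warning or require an extra flag for code in other modules, but the idea stands):

```java
import java.lang.reflect.Field;

class Vault {
    private String secret = "the launch codes";  // supposedly protected
}

public class NotSoPrivate {
    public static void main(String[] args) throws Exception {
        Vault vault = new Vault();

        // Fetch the member by name and switch off the protection, Perl-style.
        Field field = Vault.class.getDeclaredField("secret");
        field.setAccessible(true);

        System.out.println(field.get(vault));   // prints "the launch codes"
        field.set(vault, "whatever I want");    // and we can overwrite it too
        System.out.println(field.get(vault));
    }
}
```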

Abuse of Design Patterns: Java developers are encouraged to use design patterns, to the point of stupidity. The first thing I learnt about design patterns is that their misuse is actually an anti-pattern.

Properties are important when requirements change often, not when they’re static. Factories are useful when the objects created may differ or be customized, not for never-changing one-object construction. Still, most libraries (all?) will have Factories, Properties and so on, just for the sake of Design Patterns Compliance™.
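
To make the point concrete, here’s a deliberately exaggerated, entirely made-up example of the ceremony I mean:

```java
// Pattern-compliant version: an interface plus a factory
// for an object that is only ever constructed one way.
interface Greeter { String greet(); }

class EnglishGreeter implements Greeter {
    public String greet() { return "Hello"; }
}

class GreeterFactory {
    public static Greeter createGreeter() { return new EnglishGreeter(); }
}

public class FactoryOverkill {
    public static void main(String[] args) {
        // Three types and a level of indirection...
        Greeter g = GreeterFactory.createGreeter();
        System.out.println(g.greet());

        // ...where a single line would have done, since nothing ever varies:
        System.out.println(new EnglishGreeter().greet());
    }
}
```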

Admittedly, one of the strengths of Java development is that everyone is encouraged to do things the same way. No Larry Wall style; all factory workers, doing their share in the big picture. While this is good for big, quick projects at companies with high turnover (like consultancies), it’s horrible for start-ups or more creative development.

Half-implemented features: Templates are an issue. There is no real template mechanism in Java. The so-called generics (like the cheap version of the real meds) are erased at compile time, so there is no type safety left at runtime; underneath, it’s just syntactic sugar over lists of Objects and casts.

That generates a lot of misunderstanding and a lot of bad code that slips through because the syntax looks correct, and would have been caught if the types were actually checked all the way down.
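
A short sketch of the kind of thing that slips through: the compiler only warns about the raw list, and the failure shows up later, far away from the actual bug.

```java
import java.util.ArrayList;
import java.util.List;

public class ErasedGenerics {
    public static void main(String[] args) {
        List<String> names = new ArrayList<>();

        // A raw reference to the same list: the compiler only emits an
        // "unchecked" warning here, it does not stop us.
        List raw = names;
        raw.add(42);                    // an Integer sneaks into a List<String>

        // After erasure both views are plain Lists of Object, so the type
        // error only surfaces here, as a ClassCastException at runtime,
        // nowhere near the line that actually caused it.
        String first = names.get(0);    // throws ClassCastException
        System.out.println(first);
    }
}
```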

Again, an incomplete design for the sake of backward compatibility with old code and old VMs.

Performance: Running in a VM is already a bad start for performance, but a good compiler and a well-tuned JIT environment can claw most of it back by intelligently removing unused code, re-optimizing hot code at run-time and using profiling results to improve branch prediction.

While the JVM does some of this, it also introduces several problems that take the advantage away and put it back at the bottom of the class. Auto-boxing and generics create a lot of useless boxing and casting, which can be a big performance hit. Very few Java programmers really care about it, and the compiler doesn’t do a good job of reducing the impact or even warning the programmer.

I often see Java developers scoff at performance concerns. The phrase I hear most is “a programmer shouldn’t care about memory footprint or performance, only about business logic”. That, together with the fact that almost all universities now teach Java in undergraduate courses, kinda frightens me a bit.

Strong dependency on IDEs: Borland made quite a lot of money out of C++ IDEs in the ’90s, but most C++ programmers I know still use Vim or Emacs. On the other hand, every Java programmer I know uses Eclipse, IntelliJ or something of the sort.

This is not just about ease of use (code completion, syntax colouring, hints, navigation); it’s about speeding up the development process by taking boiler-plate code generation and refactoring off your hands.

IDEs are capable of writing complete pieces of code, refactoring and rewriting things (even behind your back). The programmers don’t care, the code becomes bloated, unintelligible and forgotten. Not to mention the tendency of IDEs, and of people following the IDE style, to use certain patterns for everything, like Properties where simple structures would suffice (see above, Abuse of Design Patterns).

False Guarantees: The big selling point of Java, besides cheap cross-platform development, is its apparent safety and ease of use. But it isn’t safe or easy, on so many levels…

The abuses and problems related above are only part of the story. The garbage collector is another…

Good garbage collection routines can help the initial development of a program, and they do take away from lazy programmers the job of managing their own memory, but the Java garbage collector became a beast, with incomprehensible command-line options, unpredictable behaviour and a total lack of control over it. You’re rendered hostage to its whims.

Not to mention the heap management, which won’t adapt to the memory actually available on the machine. I mean, if you want to make memory management easy for programmers (having gone to all that trouble to build a garbage collector), you could have gone a bit further, actually figured out how much memory is available and used it politely.

Add to that the fact that pointers and operators are still there, just in disguise, and you have a language that is not that much simpler than C++, at a huge price in performance and weirdness.

Undocumented APIs: Java claims to be platform independent, yet it has quite a few available (but undocumented) APIs for platform-specific functionality (like signals). Still, Sun (now Oracle) reserves the right to change them whenever they wish, and there’s little you (or anyone) can do about it.

And that takes us to the final point:

Standards (or lack thereof): Sun did a nice job at many things (mostly hardware and OSs), but they screwed up badly when it came to supporting software. There was no real standard: IBM and even Microsoft created their own JVMs (which were better than Sun’s, by the way) without any final definition of the standard API. Back in the Java 1.1 days, it was possible to be platform-agnostic yet VM-specific on the very same platform!

Conclusion: Java was meant to be an easy language, but it turns out that it’s deceitful enough to be just as bad as any other. And recent changes are making it worse.

Programmers are losing the ability to understand how the machine works, how their languages behave and, more importantly, what the implications of their actions are.

Why spend time understanding the fiddling some people did with Java when you can spend the same time understanding how machines actually work, and therefore be able to use any programming language you want?

Some argue that Java is the new Cobol and will disappear the same way… I tend to agree…

Barrelfish

Minix seems to be inspiring more operating systems nowadays. Microsoft Research is investing in a micro-kernel (they call it a multi-kernel, as there are slight differences) called Barrelfish.

Despite the Microsoft involvement, it’s BSD-licensed. The mailing list looks pretty empty, the last snapshot is half a year old and I couldn’t find an svn repository, but that’s still more than I would expect from Microsoft anyway.

Multi-kernel

The basic concept is actually very interesting. The idea is to take multi-core, hybrid machines to the extreme and still be able to run a single OS across them, pretty much the same way some cluster solutions do (OpenMPI, for instance), but on a single machine. The idea is far from revolutionary: it’s a natural evolution of the multi-core trend, the cluster solutions that have been available for years, and a fancy OS design (the micro-kernel) that everyone learns in CS degrees.

What’s the difference, then? For one thing, the idea is to abstract everything away. CPUs will be just another piece of hardware, like network or graphics cards. The OS will have the freedom to ask the GPU to do MP floating-point calculations, for instance, if it feels that’s going to benefit total execution time. It’ll also be able to accept different CPUs in the same machine, Intel and ARM for instance (like the Dell Latitude z600), or have different GPUs, nVidia and ATI, and still use all the hardware.

With Windows, Linux and Mac today, you either use the nVidia driver or the ATI one. You also normally don’t have hybrid-core machines, and you absolutely can’t recover if one of the cores fails. This is not the case with cluster solutions, and Barrelfish’s idea is to bring precisely that to a single machine. In theory, you could do energy control (enabling and disabling cores), crash recovery when one of the cores fails but not the others, or plug and play graphics or network cards and even different CPUs.

Imagine you have an ARM netbook that is great for browsing, but you want to play a game on it. You get your nVidia card and a coreOcta 10GHz on USB4 and plug them in. The OS recognizes the new hardware, loads the drivers and lets you play your game. Battery life goes down, so once you’re done with the game, you just unplug the cards and continue browsing.

Scalability

So, how is it possible for Barrelfish to be that malleable? The key is communication. Shared memory is great for single-process, threaded code and acceptable for multi-process OSs with a small number of concurrent processes accessing the same region of memory. Most modern OSs can handle many concurrent processes, but those processes rarely access the same data at the same time.

Normally, processes are single-threaded or run a very small number of threads (dozens). More than that is so difficult to control that people usually fall back on other means, such as client/server, or just go out and buy more hardware. In clusters, there is no way to use shared memory: accessing memory on another computer over the network is just plain stupid, and even if you use shared memory within each node and client/server between nodes, you’re bound to have trouble. This is why MPI solutions are so popular.

In Barrelfish there’s no shared memory at all. Processes communicate with each other via messages and duplicate content (rather than share it). There is an obvious associated cost (memory and bus traffic), but the lock-free semantics are worth it. It also gives Barrelfish another freedom: to choose a communication protocol generic enough that each piece of hardware is completely independent of all the others, and plug’n’play becomes seamless.
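
Barrelfish itself is written in C, but the principle is language-agnostic. Here is a toy sketch of the same idea in Java (the names are made up): the two workers never touch shared state; they exchange copies of the data over a channel, so no locks are needed.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy illustration: instead of locking a shared structure, each worker
// owns its data and the "cores" exchange copies as messages.
public class MessagePassing {
    record Message(int[] payload) {}   // content is duplicated, never shared

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Message> channel = new ArrayBlockingQueue<>(16);

        Thread producer = new Thread(() -> {
            int[] data = {1, 2, 3, 4};
            try {
                // Send a copy; the producer keeps mutating its own version freely.
                channel.put(new Message(data.clone()));
                data[0] = 99;
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        Thread consumer = new Thread(() -> {
            try {
                Message msg = channel.take();
                // No locks needed: nobody else can touch this copy.
                System.out.println(java.util.Arrays.toString(msg.payload()));  // [1, 2, 3, 4]
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}
```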

Challenges

It all seems fantastic, but there’s a long road ahead. First, message passing scales much better than shared memory, but today most machines don’t have enough cores to make it worth it. Message passing also introduces other problems that are not easily solvable: bus traffic and storage requirements increase considerably, and messages are not that generic in nature.

Some companies are famous for not adhering to standards (Apple comes to mind), so a standard hardware IPC framework would be quite hard to achieve. And even with a pure software IPC API, different companies would still use slightly modified versions to suit their specific needs, and complexity would rise exponentially.

Another problem is where the hypervisor will live. Having a distributed control centre is cool and scales amazingly well, but its complexity scales too. On a hybrid-core machine, you have to run different instructions, in different orders, with different optimizations and communication. Choosing one core to deal with the scheduling and administration of the system is much easier, but leaves a single point of failure.

Finally, going the multi-hybrid-independent route is hugely complex, even for a multi-year project with lots of funding (M$) and clever people working on it. After all, if micro-kernels were really that useful, Tanenbaum would have won the argument with Linus. But the future holds what the future holds, and reality (as well as hardware and some selfish vendors) can change. The multi-kernel might be possible, and even easier to implement, in the future.

This seems to be what the Barrelfish team is betting on, and I’m with them on that bet. Even if it fails miserably (as Minix did), some of its concepts could still end up in real-world operating systems (as Minix’s did), whatever that will mean in 10 years. Being serious about parallelism is the only way forward; sticking with 40-year-old concepts is definitely not.

I’m still hoping for non-deterministic computing, though, but that’s an even longer shot…

Gtk example

Gtk, the graphical toolkit behind Gnome, is very simple to use. It doesn’t have an all-in-one IDE such as KDevelop, which is very powerful and complete, but it features a simple and functional interface designer called Glade. Once you have the widgets and signals in place, filling in the blanks is easy.

As an example, I wrote a simple dice-throwing application, which took me about an hour from installing Glade to publishing it on the website. Basically, my route was to apt-get install glade, open it, create a few widgets, assign some callbacks (signals) and generate the C source code.

After that, the file src/callbacks.c contains all the signal handlers you have to implement. Adding just a bit of code, and browsing this tutorial to get the function names, was enough to get it running.

Glade generates all the autoconf/automake files, so it was extremely easy to compile and run the mock window right at the beginning. The code I added afterwards was less than I would have written for a console-based application doing the same thing. Also, because of the code generation, I was afraid Glade would replace my already-modified callbacks.c when I changed the layout; I was really pleased to see it was smart enough not to mess up my changes.

My example is not particularly good looking (I’m terrible at design), but that wasn’t the intention anyway. It’s been 7 years since I last built a graphical interface myself, and I had never done anything with Gtk before, so it shows how easy the library is to use.

Just bear in mind a few concepts of GUI design and you’ll have very little problems:

  1. Widget arrangement is normally not fixed by default (to allow window resizing). So either work out how tables, frames, boxes and panes work (which is a pain) or use fixed positions and disallow window resizing (as I did),
  2. Widgets don’t do anything by themselves; you need to assign them callbacks. Most signals have meaningful names (resize, toggle, set focus, etc.), so it’s not difficult to find them and create callbacks for them,
  3. Side effects (numbers appearing at the press of a button, for instance) are not easily done without global variables, so don’t be picky about that from the start. Work your way towards a global context later on, when the interface is stable and working (I didn’t even bother).

If you’re looking for a much better dice rolling program for Linux, consider using rolldice, probably available via your package manager.

SLC 0.2.0

My pet compiler is now sufficiently stable for me to advertise it as a product. It should deal well with the most common cases if you follow the syntax, as there are some tests to assure minimum functionality.

The language is very simple, called “State Language”. It has only global variables and static states. The first state is always executed first; the rest run only if you call goto state;. If you exit a state without branching, the program quits. This behaviour is consistent with the State pattern, and that’s why it’s implemented this way. You can solve any computer problem with state machines, so this new language should be able to tackle them all.

The expressions are very simple, only binary operations, no precedence. Grouping is done with brackets and only the four basic arithmetic operations can be used. This is intentional, as I don’t want the expression evaluator to be so complex that the code will be difficult to understand.
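
This is not SLC syntax (see the project page for that); it’s just a minimal Java sketch of the same execution model: global variables, a handful of states, and explicit gotos between them, with the program quitting when a state exits without branching.

```java
// A sketch of the execution model, not of SLC itself.
public class StateMachine {
    enum State { START, COUNT, DONE }

    static int counter = 0;   // "global" variable shared by all states

    public static void main(String[] args) {
        State current = State.START;      // the first state always runs first
        while (current != null) {         // exiting without branching quits
            current = switch (current) {
                case START -> State.COUNT;           // goto COUNT;
                case COUNT -> {
                    counter = counter + 1;
                    yield counter < 5 ? State.COUNT  // goto COUNT;
                                      : State.DONE;  // goto DONE;
                }
                case DONE -> {
                    System.out.println("counted to " + counter);
                    yield null;                      // no goto: program quits
                }
            };
        }
    }
}
```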

As with all the code I write in my spare time, this one has an educational purpose. I learn by writing and, hopefully, teach by providing the source, comments and posts, and by leaving it available on the internet so people can find it through search engines.

It should work on any platform you can compile it for (currently only Linux and Mac binaries are provided), but the binaries are still huge (several megabytes) because they have all the LLVM libraries statically linked into them.

I’m still working on it and will update the status here at every .0 release. For the next release I hope to have binary operations in if statements, string printing and all PHI nodes calculated.

The LLVM compilation infrastructure

I’ve been playing with LLVM (Low-Level Virtual Machine) lately and have produced a simple compiler for a simple language.

The LLVM compilation infrastructure (much more than a simple compiler or virtual machine) is a collection of libraries, methods and programs that allows one to create simple, robust and very powerful compilers, virtual machines and run-time optimizers.

Like GCC, it’s roughly separated into three layers: the front-end, which parses the files and produces the intermediate representation (IR); the optimization layer, which acts on the language-independent IR; and the back-end, which turns the IR into something executable.

The main difference is that, unlike GCC, LLVM is extremely generic. While GCC struggles to fit broader languages into its strongly C-oriented IR, LLVM was created with a very extensible IR, carrying a lot of information on how to represent a plethora of languages (procedural, object-oriented, functional, etc.). This IR also carries information about possible optimizations, like GCC’s, but to a deeper level.

Another very important difference is that, in the back-end, not only are code generators for many platforms available, but also just-in-time compilers (somewhat as for JavaScript), so you can run, change, re-compile and run again without even quitting your program.

The middle layer is where the generic optimizations are done on the IR, so it’s language-independent (as all languages are converted to IR). But that doesn’t mean optimizations happen only at that step. All first-class compilers apply strong optimizations from the moment they open the file until they finish writing the binary.

Parser optimizations normally include useless code removal, constant expression folding, among others, while the most important optimizations on the back-end involve instruction replacement, aggressive register allocation and abuse of hardware features (such as special registers and caches).

But LLVM goes beyond that: it can optimize at run-time, even after the program is installed on the user’s machine. LLVM can keep its information (and the IR) together with the binary. When the program is executed, it is profiled automatically and, when the computer is idle, the code is re-optimized and re-compiled. This optimization is per-user, which means two copies of the same software can become quite different from each other, depending on how each user uses them. Chris Lattner’s paper about it is very enlightening.

There are quite a few very important people and projects already using LLVM, and although there is still a lot of work to do, the project is mature enough to be used in production environments or even completely replace other solutions.

If you are interested in compilers, I suggest you take a look at their website… It is, at the very least, mind-opening.

FSF Settles Suit Against Cisco

The long dispute with Cisco has finally come to an agreement. For me, that means two things: first, they’re not trolls sucking money from the big corps for stupid patent infringement, as some might fear. Second, they’re very patient, understanding and sometimes a bit too naive.

Why the fear?

When building embedded systems, or when you’re very close to the hardware (as Cisco is), using open source software can be a wise decision, as it’s quite likely to be stable and taken care of by a good bunch of good people. Even though there are several ways of keeping your own software independent, so that it isn’t virally infected by the GPL, that’s not always possible, and you may have to reinvent the wheel because of it.

It’s not only GPL, patents can also cause a whole lot of damage, and it seems that TomTom has decided to go head first with the Linux community.

So, although the fear is understandable, it’s more hysteria than anything based on actual facts. The FSF hasn’t had much to show in court, and that adds to the lawyers’ uncertainty, but it’s in cases like Cisco’s that they show far more maturity than most companies have shown recently, even mature companies like Microsoft.

Richard Stallman

The FSF is not only Stallman. Even though he’s the boss, the organization includes a long list of people, sponsors and advisers (and now interns). It’s one thing to fear what RMS will do when he finds you using GPL code in your kitchen scale, but what the FSF, as an organization, actually does is a completely different matter.

The Cisco case had been going on for several years. They offered help, they asked politely, they warned about the potential dangers and so on. A lot happened before they actually filed the suit, and they settled it nicely. This shows they’re not just waiting for the next infringement to take you down; they actually care about their (and your) freedom.

The day the FSF starts acting stupid is the day people will walk away. It’s not like with Microsoft, where you have no option; there are plenty of options out there: software, licences, partners, advisers, programmers, etc. GNU/Linux is not the only decent open source operating system; the BSDs are as good, sometimes better, especially in the embedded space.

The year of Linux

Every year since 1995 has been the year of Linux. For me it always was, but I can’t say the same for the rest of the world. Recently, Linux (and other open source software) has played an important role in defining the future of mankind, and more and more the Linux community feels that it’s built on their sweat and blood.

There is a great chance it’ll become the platform of all things in a very short time-frame. Cars, mobile phones, PDAs, netbooks, laptops, desktops, servers, clusters, spaceships. One platform to rule them all and in the darkness bind them, but if they play dumb, their glory might never see daylight.

Lots of people disagree with the new revisions of the GPL license; they feel it bites the hand that feeds it. Many companies feed back into open source regularly, and that change kind of broke the synergy. I personally think it’s excellent for some cases, but not for all. For instance, development tools should not be restricted, especially when it comes to platforms they can’t reach. Opening the platform is an obvious way around it, but not everything can be exposed, and they can’t figure out every implementation detail.

Drivers might also have trouble with GPLv3, for the same reason. Again, there are ways around it: the FSF recently opened a backdoor allowing proprietary plug-ins if they’re blessed, but that might not suit every case.

Solution?

Sorry, not today. Stick to FreeBSD if you can’t cope with GPLv3, find a way to coexist with the GCC exception, and provide the source code you have to provide. If it’s not your core business, you could donate your code to the community, make it GPL too and treat your program as enabling technology, provided, of course, your code doesn’t expose any patent or trade secret.

So, well, yeah. Each case is a different case, that’s the problem of being in the long tail.

MySQL down the drain?

Almost 10 years ago, MySQL became a great open source database, part of the LAMP platform (Perl, not PHP), and had everything it needed to compete with the big players over the following years.

Back then they were making major releases, each with a huge set of new features, almost once a year. The community was happily using it, developing it and integrating it with other products. But it was around 2005 that things started going bad…

Back in 2005, when I was still in the loop, I have to say I wasn’t impressed with the progress the database was making. Nor was I impressed with the view the board was giving big companies (such as Yahoo!) on what was a good bet and what wasn’t.

After release 5.0 (still the production release, irrespective of what Sun says), there wasn’t any major development until Sun acquired MySQL, and only then did they release 5.1, which they’d better not have.

In the old days, MySQL became famous for not implementing foreign keys and transactions, something every other database had, because of speed concerns. That decision became the core of the company and allowed other storage engines (such as InnoDB and BerkeleyDB, which did have those features) to be plugged in, making it very easy to plan your database using only the features you needed, where you needed them.

Who’s to blame?

I’m not sure it has anything to do with Oracle buying InnoDB and Sleepycat (and now buying Sun, which owns MySQL). Even with all the politics of Oracle slowly buying MySQL in pieces, I don’t believe that’s the whole story. I see much more of an internal conflict and a lack of vision (probably from a lack of guts to keep taking weird decisions and succeeding) than anything else.

Now MySQL is going down the same drain InnoDB and Sleepycat went, but with a twist: the source code is still GPL. Sun screwed up MySQL in a way I didn’t think was possible; Oracle will do it much more efficiently. Even if they keep playing the good guys, it is definitely the end.

Don’t take only my word for it: my good friend and MySQL guru Jeremy Cole is taking himself out of the loop to avoid the useless politics, and Steven (Computerworld) also cannot see how any of the companies involved will get anything out of this deal.

Is there a light at the end?

Could Monty’s fork become a new MySQL without all the fuss? Could he, the odd guy with odd ideas, put MySQL back on the map? I do hope so, but it will cost MySQL its place in the hall of fame. They’ll need to start over, probably fail again once they get back there, and restart…

It’ll be fun to watch. At least MySQL has a GPL license, which always eases forks and future development. Long live the open source revolution!

UPDATE:

Two excellent articles about the same issue from The Register and Ars Technica.

When refactoring goes wrong

Recently I had to implement a very simple feature that would cross the boundaries between a few components. As in any good software, communication between the components is done via public interfaces, and this case was no exception.

Unfortunately, some core interfaces would need to be changed and I knew I was looking for trouble. Nevertheless, I started it anyway and, in the beginning, it was not as bad as I thought. Lots of changes, of course, but nothing too complex. But the devil is in the details…

Each refactoring pointed to another refactoring that was needed: having done the first, I would either have to do the second or hack around it, so I did it. The trouble is that it didn’t stop at the second; it went on and on, each refactoring uncovering another and another. In the end, the state of the program and of the unit tests was hardly stable.

I’ve reached a cycle, where refactoring A would break B, B would break C and C would break A again in a different way. The snake was biting its own tail…

That taught me some very important lessons:

  1. Sometimes a simple refactoring can cost you a week and still have to be rolled back. In this case I believe I couldn’t have done things differently; I had no way of knowing how bad it was before actually poking around the code.
  2. When you face this situation, the best thing you can do is take notes on your changes and roll back. Trying to fix it, especially when you’ve already changed quite a lot of unit tests, is a recipe for disaster.
  3. Examine your notes and decide which refactoring needs to be done first. It will probably be the reverse of the order in which you stumbled on them. When in doubt, start from the last and work your way up the stack.
  4. Never do more than one refactoring at a time. When faced with this situation, stop, take notes, roll back and do the last refactoring first. Test everything and commit before you start the next step.

The last item is especially true when more developers are actively changing the code. This will give them time to adapt their changes to yours and adapt your changes to theirs.

These lessons, I believe, hold irrespective of the version control system you use. I know Git has some pretty impressive conflict resolution when merging code, but I doubt it will successfully merge high-level concepts (such as object-orientation principles and design patterns). If you can’t tell whether a test is right or wrong, how could the version control system?

Closed source development

While closed source development has its niche (and a very important one), it does feel a bit weird.

I’m now working on low-level development (debuggers) at ARM, one of the things I like most but also an area where it’s rare to find good-quality open source development (with the exception of the GNU tools, of course). A portion of your work does go back to the community (via open standards and limited support for the open tools), but it’s not easy to find a job writing code exclusively for GDB or GCC.

What I’m finding weirder is that the documentation you need is seldom on the Internet (Google or Usenet). The good side is that the people who created the standards and tools are at your doorstep, so it’s quite easy to get hold of them when you need something off the charts. But that’s normally true with open source as well.

The other weird thing is knowing what you can talk about and what you can’t. I have no idea which parts of my current project are public, so I just don’t talk about any of it. But I think that’s just a matter of getting used to it, just as I did before. Besides, although at the EBI I could show my (or anybody else’s) source code, I don’t think anybody ever cared that much.

Lastly, licences. It’s so easy when you develop GPL or LGPL (or similar) software: just write whatever you want, use whatever library you need and put a GPLv3 tag on your code. That’s it. Simple as that. Now I have to think about the impact of each library on the license of what I write, and that’s something I never wanted to have to care about…

Also, if a document is GPLed, whatever you derive from it has to be GPLed too. If it’s version 3, everything you write from it (including the company’s previous ideas) becomes GPLv3 as well. That’s a big nuisance. I do understand GPLv3 for code, and even apply it to my own source code, but it does annoy a lot when applied to documents.

Although weird for some reasons, it’s not bad at all. I have many more reasons to love my new job: an excellent team, a great environment and impressive code quality, which, for me, is a must.

Who’s afraid of the big bad code?

What would Bruce Schneier say about the magic list of the 25 biggest errors in code that normally lead to security flaws, which the NSA is putting together with Microsoft and Symantec?

Don’t get me wrong, putting out a list of bad practices is a fantastic effort, that’s for sure. It makes programmers more aware of the dangers and, as the article itself says, lets newbies learn from experience before getting into a new field.

But the way (lay) people take it makes it so magical that the practical value of such a list is greatly reduced.

Order and size of the list

I understand that the order must mean something, but what? Is it ordered by the number of attacks in the last 12 months? By the sum of all reported losses caused by each error? By the number of such errors found in common code (in those companies’ code, of course)? Or by some other subjective “importance” factor from a bunch of “Security Experts”?

Also, why 25? Why not 30? Who says that the 25th is so important to show up in the list and not the 26th?

Real-world

We programmers know about most of them, know the problems they pose and normally how to fix them. We often want to fix them, but that normally requires some refactoring and now it’s time to implement those features that our client needs for the demo, right? We can think about that later… can we? Will we?

Then the NSA decides to make this a priority for the country and claims it is a national security problem. Big companies like fancy terms and will strive to adopt any new standard that shows up in the market.

Then down comes the VP of engineering, who says:

“We need to make sure every programmer knows how to write code that is free of the top 25 errors.”

Done: he can now put up the GIF image from the NSA saying his company’s software is secure against all odds, according to the NSA and DHS.

Now, coders and technicians, tell me: Would any editor, IDE or compiler ever be able to spot those errors with 100% accuracy?

“Then we need to make sure every programming team has processes in place to find and fix these problems [in existing code] and has the tools needed to verify their code is as free of these errors,”

Of course not, but they will try. Microsoft will put a beta feature into Visual C++, other companies will tell their clients that their software is being tested with the new product, and the clients will buy it; after all, who are they to say anything about the matter?

Protect against who?

Now, after so much time and effort, with 30+ companies and government departments working hard to come up with a (quite good) list of the most common errors that lead to security flaws, what is it all for?

“The real dedicated serial attacker will probably find a way in even if all these errors were removed. But a high school hacker with malicious intent – ankle-biters if you will – would be deterred from breaking in.”

WHAT?!?! All that to stop script kiddies? For heaven’s sake, I thought they were serious about this… Well, maybe I expected too much from the NSA… again…

(Note: quotes from original article, ipsis litteris)