Software bloat makes me sad

Jul 03, 2015

and maybe it should

Most software is bloated, meaning: it’s taking both a lot more space to store (both in memory and on disk) and more time to run than required. A few examples:

  1. The system requirements of the popular Linux distribution Ubuntu state that: “we all know that it is recommended to have 2048 MiB RAM to properly run a day to day Ubuntu” and its “Live CD” has grown so big that it no longer actually fits on a CD.

  2. The “text editor we’ve always wanted” that’s currently being built at GitHub (atom.io) has no support for 20% of use-cases involving ‘large files’ (defined as files larger than a whopping 2MB).

  3. Running the “lightweight” editor Vim in combination with a few small extensions may lead to your machine locking up for a full minute.

The above examples are in no way meant to single out certain pieces of software; they made this list only because of my own recent experiences with them. So rather than looking at these examples in particular, consider the general points:

  1. The magnitude of bloat is almost incomprehensible. Many programs have grown more than a thousand-fold compared to their equivalents of two decades ago. The same CD that hasn’t been able to fit Ubuntu since 2011 still fits approximately 150,000 pages of unformatted English text without any compression.

  2. The burden of selecting software that is not bloated is entirely on the user. The default is bloated; if you want the unbloated version, you’ll have to work (search) for it yourself. And in many cases (e.g. anything that needs a web browser) such a search may not even be fruitful.

  3. There are very few tools available to help the user select unbloated software. Very few packages make any claims about their storage and runtime characteristics at all (in fact, when composing the above list an internet search was of very little help for exactly this reason), except those that are specifically created with the purpose of not being bloated.

  4. Over time, the battle against bloat is always lost. Even Ubuntu, which has traditionally presented itself (besides other things) as a method to extract a few extra life-years out of old hardware, is mentioned in the list above. In other words: it’s only less bloated than the alternatives.
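To put the 150,000-page figure from the first point in perspective, here is a minimal back-of-the-envelope sketch. The class and method names are made up for illustration, and the capacity is an assumption: a standard 80-minute CD-ROM with 360,000 sectors of 2,048 data bytes each. It simply derives how many bytes per page that figure allows.

```java
public class CdCapacity {
    // Assumption: an 80-minute CD-ROM holds 80 * 60 * 75 = 360,000 sectors,
    // each carrying 2,048 bytes of user data.
    static final long CD_BYTES = 360_000L * 2_048L;

    // Whole bytes available per page if the disc holds the given number of pages.
    static long bytesPerPage(long pages) {
        return CD_BYTES / pages;
    }

    public static void main(String[] args) {
        System.out.println("CD capacity in bytes: " + CD_BYTES);
        System.out.println("Bytes per page at 150,000 pages: " + bytesPerPage(150_000));
    }
}
```

Under these assumptions each page gets a little under 5 KB, which is a generously filled page of plain one-byte-per-character text.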

Philosophical underpinnings

So why is software so bloated? I’m not quite sure, and this article is an attempt to get closer to the answer rather than to present the one and final truth about the matter. However, I do have at least some idea.

Firstly, software is bloated because software developers believe writing bloated software is actually the right thing to do. By this I mean that, generally, software developers believe some instance of the following basic argument:

  • Constructing software without care for its performance characteristics is easier than with the extra consideration of “not making things bloated”.
  • Software developer time is expensive.
  • Hardware is cheap, and is becoming cheaper all the time.
  • Therefore, it is best to think about optimizing the performance of software only once it becomes a noticeable problem.

In many cases, this is combined with the “Agile” belief that “requirements will surely change, therefore it’s more important than anything (including writing something non-bloated) to get something out the door”.

Some real-life examples of this philosophy are “Make it work, make it right, make it fast” (though the last step is often forgotten), the often misunderstood “Premature optimization is the root of all evil” and this article by Joel Spolsky.

In other words: software bloat is not an accident, but rather the logical outcome of the belief-set of the field of software development. It’s not that we’re trying to keep a check on bloat and failing; it’s that we’re not even trying at all.

Lack of tools

This fundamental lack of care for building unbloated software is directly reflected in the lack of readily available tools which have performance as a focus. 1

Of course, tools exist which have performance as their focus. The generally available ones, however, all focus on “testing” or “analyzing”. In other words, they are empirical in nature: they measure performance in general, or measure which particular parts of the software are bloated. This fits with the earlier observation that the reduction of software bloat only receives attention once it becomes a “noticeable problem”. Surely there is a place for empirical tools and methods in general. However, if they form the prime method of understanding artifacts of our own creation, we must surely become suspicious.

Of particular interest are the tools we have for decomposition & composition: breaking complex systems into parts that can be understood, and building systems out of such parts. Many tools for decomposition exist, such as dividing a system into different processes, modules, libraries, classes, methods etc. However, performance considerations generally cannot be specified at the interfaces across which we decompose. On the contrary, they are often explicitly considered to be an implementation detail that we must abstract away from.

To wit: the fact that the Java List interface may be implemented using either a LinkedList or an ArrayList, implementations which have vastly different performance characteristics, is (rightly) claimed as a victory for abstraction. However, the obvious consequence of this is that users of such a general interface cannot know what the performance consequences of their actions will be. This means that, with regard to performance, they will have to study the actual implementation, defeating the purpose of formalizing the interface. Furthermore: because the system is decomposed, studying the actual implementation may no longer be sufficient once the implementation changes. And because such formalisms are lacking at every single boundary, bad decisions with regard to bloat can propagate through systems without being noticed and may be introduced at any point in time.
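The List example can be made concrete with a small sketch. The class and helper names below are hypothetical; only the java.util types are real. The method sumByIndex is correct for any List, yet each get(i) is constant-time on an ArrayList and a linear walk on a LinkedList, so the same loop is linear in one case and quadratic in the other. Nothing in its signature says so. The closest thing the standard library offers is the RandomAccess marker interface, which must be checked at runtime.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

public class ListCost {
    // Sums a list by index. Works for every List implementation, but the
    // cost of get(i) depends entirely on the implementation behind it.
    static long sumByIndex(List<Integer> list) {
        long total = 0;
        for (int i = 0; i < list.size(); i++) {
            total += list.get(i);
        }
        return total;
    }

    // Appends 0..n-1 to the given list and returns it.
    static List<Integer> fill(List<Integer> list, int n) {
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    // Times one run of sumByIndex; a rough illustration, not a benchmark.
    static void report(List<Integer> list) {
        long start = System.nanoTime();
        long sum = sumByIndex(list);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println(list.getClass().getSimpleName()
                + ": sum=" + sum
                + ", micros=" + micros
                + ", RandomAccess=" + (list instanceof RandomAccess));
    }

    public static void main(String[] args) {
        int n = 50_000; // arbitrary size, chosen only to make the gap visible
        report(fill(new ArrayList<>(), n));
        report(fill(new LinkedList<>(), n));
    }
}
```

Both calls return the same sum; only the running time differs, and that difference is visible only to someone who measures it or already knows which implementation sits behind the interface.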

In short, the lack of formalisms means that the only thing preventing a system from becoming more bloated is the care taken by all individual programmers to construct an unbloated system. Given the lack of focus on containing bloat, and the many other pressures on programmers to deliver, this is not a hopeful position to be in.

The case against bloat

So why is bloat so problematic in the first place? Shouldn’t I stop “moaning” about this non-existent problem, as Mr. Spolsky suggests? A few points:

Firstly, for any usage that actually matters, performance will in fact, at some point, become a problem when this approach is taken. In other words: any successful application will at some point run into performance problems. Given the assumption of useful software, this means that the performance problems need to be solved at that point in time.

This raises the question “wouldn’t it have been easier not to create the problem in the first place?” It would seem obvious that the answer to this is indeed “yes”. In a scenario of “doing it right at the first try” none of the costs associated with the repeated work of re-understanding the problem space (potentially even by different persons) will be incurred. So at least in scenarios where success is expected, it’s clearly best to “optimize” from the start.

The usual counter to this is that the actual (as opposed to imagined) bottlenecks will only become apparent after intense usage. I find this hard to believe. This would imply that either the constructed software or the usage scenarios are hardly understood (though a cynical observer would rule out neither option). In fact, in my experience the process of finding such bottlenecks on running systems is itself quite time-consuming - time which cannot be spent actually reducing bloat. Case in point: which part of Ubuntu must we shave off, now that we notice it doesn’t fit on a CD anymore? Apparently this question is sufficiently hard to have remained open for the past four years.

Secondly, having bloat in itself creates all kinds of new complexity, such as managing more hardware (since bloated software won’t fit on a single machine), the introduction of caching layers, shifting parts of the program into background processes (hiding parts of the slowness from your user) or even simply the introduction of splash screens. (By the way, note the irony of a proud blog-post which deals with the performance optimization of an artifact whose sole purpose is to “entertain” the user while she’s waiting for the actual program to become available.)

Finally, there is the emotional argument: bloated software simply makes me sad. As a user, at best, slow software takes a certain directness and feeling of control away from me. At worst, the experience is one of outright frustration and powerlessness. As professional creators of software, this poor user experience, multiplied by millions of users, is “on us”. Furthermore, the implication of bloated software is that we, as a field, have no idea what we’re doing. From such a position it is hard to feel professional pride.

Making an emotional argument is always dangerous, even more so in such a cerebral field. However, I choose to stay angry about the problem, until we can make it go away.

  1. I should put up a big disclaimer here: perhaps such tools do exist in particular specializations, such as real-time software or games. In any case, they have not seen major uptake in the general field of software development.