Programming design style – keeping it simple

Someone needs to come up with a slick name for “designed to fail during test instead of production”, or for the more common “soft” style of programming. When we write code, we assume we screwed up somewhere, an assumption based on years of bitter experience. (Well, we assume someone screwed up: maybe in the tools, maybe in the spec, maybe in hardware that comes with 10G of obscure errata, or maybe even, now and then, someone else in the company.) In any case, we test and fix. We design code to fail at the slightest suspicion that something is not right, so that our tests can catch the failure.

Soft programming is full of palliatives that end up causing catastrophes. Here’s an example: out-of-memory process killers in page allocators. The idea is that when memory becomes too tight, some randomly chosen process (or heuristically chosen one, which comes to the same thing) is terminated to make room for the others and to prevent deadlock. Somewhere down the line, the OOM Killer will kill exactly the wrong process at exactly the wrong moment, because, contrary to what OS developers think, the applications are the purpose of the whole system. That somewhere down the line is improbable but inevitable. So OOM Killers make it unlikely that tests of OOM conditions will find the worst case.

That seems exactly wrong. You should want your tests to find every possible flaw as soon as possible, so that you can root them out of your released code. Is that so wrong?
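One way to make out-of-memory conditions fail during test rather than in production is fault injection: instead of waiting for a heuristic to pick a victim at an unpredictable moment, the test deliberately fails each allocation in turn, so every out-of-memory path is exercised deterministically. The sketch below is an illustration of that general technique, not the author's method; `FaultInjectingAllocator`, `build_table`, and `sweep_failure_points` are hypothetical names, and Python's `MemoryError` stands in for whatever failure signal a real allocator would use.

```python
class FaultInjectingAllocator:
    """Hypothetical test allocator that fails the Nth request on purpose."""

    def __init__(self, fail_at=None):
        self.fail_at = fail_at  # 1-based index of the request to fail; None = never fail
        self.calls = 0

    def alloc(self, size):
        self.calls += 1
        if self.fail_at is not None and self.calls == self.fail_at:
            # Fail loudly and deterministically so the test sees it.
            raise MemoryError(f"injected failure on allocation #{self.calls}")
        return bytearray(size)


def build_table(allocator, n):
    """Hypothetical code under test: allocates n small buffers and lets any
    allocation failure propagate instead of hiding it."""
    buffers = []
    for _ in range(n):
        buffers.append(allocator.alloc(16))
    return buffers


def sweep_failure_points(n):
    """Fail allocation 1, then 2, then 3, ... until a run completes with no
    injected failure. Returns the allocation indices that failed cleanly."""
    clean_failures = []
    k = 1
    while True:
        allocator = FaultInjectingAllocator(fail_at=k)
        try:
            build_table(allocator, n)
        except MemoryError:
            clean_failures.append(k)
            k += 1
            continue
        # No injected failure fired: every allocation point has been covered.
        break
    return clean_failures
```

For example, `sweep_failure_points(3)` runs `build_table` four times and reports `[1, 2, 3]`: each of the three allocation sites was forced to fail once, and the fourth run confirms there are no others. The point of the design is that the failure happens at a chosen place, on every run, instead of at a place some heuristic picks when the system is already in trouble.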
