Andy Tanenbaum writes a defense of microkernels that (1) misses the content of Linus Torvalds’ critique, (2) ignores the most relevant paper on software development, David Parnas’ Software Jewels paper, and (3) pretends RTLinux does not exist. The problem with microkernels is that they are not modular. The problem with all the “nearly ready for prime time” microkernels is that they are not real products. And the problem with many new academic OS projects is that they don’t have much new in them.

Parnas asks why we don’t see more elegant, simple, kludge-free “jewels” in systems software and he, rather gently, explains the compatibility requirements, time pressures, and other constraints that make “clean and lean” so elusive. Parnas is just too nice to hammer the point home. Anyone can write a small, fast, clean operating system that doesn’t do anything useful. Over and over, researchers stumble on the discovery that as you add minor inessential feature after minor inessential feature (all in response to some importuning victim who actually needs to do something with your creation), you recreate the “bloat” that wasn’t needed when your software didn’t do anything. In fact, the natural result of making a nice, clean, simple, lean, fast OS into something useful is the creation of another implementation of POSIX. VMS and then Windows NT are very POSIX-like, despite the intentions of the designers, and so is Mach in OS X, and so are many traditional RTOSs. After all this time, this should not be so surprising anymore.

Linus Torvalds’ response to Tanenbaum is pretty clear, but Tanenbaum misses the point:

Linus also made the point that shared data structures are a good idea. Here we disagree. If you ever took a course on operating systems, you no doubt remember how much time in the course and space in the textbook was devoted to mutual exclusion and synchronization of cooperating processes. When two or more processes can access the same data structures, you have to be very, very careful not to hang yourself. It is exceedingly hard to get this right, even with semaphores, monitors, mutexes, and all that good stuff.

My view is that you want to avoid shared data structures as much as possible. Systems should be composed of smallish modules that completely hide their internal data structures from everyone else. They should have well-defined ‘thin’ interfaces that other modules can call to get work done. That’s what object-oriented programming is all about–hiding information–not sharing it. I think that hiding information (a la Dave Parnas) is a good idea.

And this gets both Torvalds and Parnas wrong. Information hiding is only good design when the hidden information is not needed by the software it is hidden from! If you hide information that you need to share, you’re just wasting time. A great example of real modularity is the splitting off of the command interpreter (first in CTSS (corrected)) from the kernel. This split is possible because the designers recognized that there is very little information exchange between kernel and interpreter. An example of a fake module is the attempt of most microkernels to split virtual memory paging and storage caching. The most elegant and efficient message passing interface that can be imagined cannot fix the problem that too much information must be shared to make the resulting monstrosity run properly. Here’s Torvalds:

The fundamental result of access space separation is that you can’t share data structures. That means that you can’t share locking, it means that you must copy any shared data, and that in turn means that you have a much harder time handling coherency. All your algorithms basically end up being distributed algorithms.

And Tanenbaum’s response is that sharing data structures is hard! Well, yeah – many engineering systems are complicated and hard to do right. During the 30 years of development of CTSS, Multics, UNIX, Plan 9, and Linux, many components were split off into modules. What remains cannot be broken into decoupled parts in any obvious way. Let no man or woman split what Dennis and Ken have wrought without a damn good reason. Until the hardware changes or someone discovers something new about software structure in operating systems, complaining that this kernel model is complicated seems as useful as complaining that petroleum refineries are too big and filled with chemicals. The POSIX model is a high-performing engineering design – it works well, and it is just perverse to assume its success is due only to the ineptness of everyone else.
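Torvalds’ point about copying and coherency can be made concrete with a toy sketch. This is only an illustration, not code from any real kernel: the class names, the message format, and the dirty-page table are all hypothetical. The first class models a pager and a storage cache in one address space, sharing a table under one lock. The second models a cache server in its own address space that can only answer requests with copies of its state.

```python
import threading


class SharedPageCache:
    """One address space: pager and cache use the same table under one lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.dirty = {}  # page number -> dirty flag

    def mark_dirty(self, page):
        with self.lock:
            self.dirty[page] = True

    def pages_to_flush(self):
        with self.lock:
            return sorted(p for p, d in self.dirty.items() if d)


class CacheServer:
    """Separate address space: owns the table and replies with copies."""

    def __init__(self):
        self._dirty = {}

    def handle(self, msg):
        kind, page = msg
        if kind == "mark_dirty":
            self._dirty[page] = True
            return None
        if kind == "snapshot":
            return dict(self._dirty)  # the reply is a copy, not a reference
        raise ValueError(kind)


def demo():
    # Shared structure: the pager holds a reference, so it sees later updates.
    shared = SharedPageCache()
    pager_view = shared
    shared.mark_dirty(7)
    seen_shared = pager_view.pages_to_flush()

    # Message passing: the pager holds a copy taken before the update.
    server = CacheServer()
    pager_copy = server.handle(("snapshot", None))
    server.handle(("mark_dirty", 7))
    seen_copied = sorted(p for p, d in pager_copy.items() if d)

    return seen_shared, seen_copied
```

In this sketch the pager with the shared table sees page 7 as dirty, while the pager working from a copy does not. Keeping that copy current needs some extra protocol (invalidation messages, versioning, re-fetching), which is the sense in which the algorithms become distributed algorithms.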

It would be very interesting to study a production operating system like Linux or Windows XP and try to discover hidden modularity – functions that appear to not need to be bound to each other. Do such functions exist? Perhaps – one was discovered only a decade ago. The basis of RTLinux is the recognition that “real-time” can be separated from a time-sharing kernel and put into a module that can operate in a decoupled manner. The result was immediately useful but RTCore, the RTLinux real-time kernel, does not correspond to any of the traditional “modules” advocated by microkernelistas. There is no reason to suppose more such unconventional modules cannot be found. If we keep going back to the same boring list of “modules” that are familiar to generations of students forced to look at that ridiculous layer cake picture of an operating system, however, we’ll keep getting the same lack of results and the same “almost as fast as, almost complete” projects.
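The kind of decoupling described above can be illustrated with a toy dispatcher. It assumes the commonly described RTLinux arrangement, in which the time-sharing kernel runs as the lowest-priority task of the real-time core and its interrupts are held back until no real-time work is pending. Everything here (`RTCore`, `tick`, the task names) is hypothetical and is not the RTCore API.

```python
class RTCore:
    """Toy dispatcher: real-time tasks always run before the general-purpose kernel."""

    def __init__(self, gp_kernel):
        self.rt_tasks = []          # (priority, name, callable); lower number runs first
        self.gp_kernel = gp_kernel  # treated as the lowest-priority task
        self.pending_irqs = []      # interrupts held back from the general-purpose kernel

    def add_rt_task(self, priority, name, fn):
        self.rt_tasks.append((priority, name, fn))
        self.rt_tasks.sort(key=lambda t: t[0])

    def raise_irq(self, irq):
        # Record the interrupt; it is delivered only after real-time work is done.
        self.pending_irqs.append(irq)

    def tick(self, ready):
        """Run one scheduling step; `ready` is the set of runnable real-time task names."""
        trace = []
        for _prio, name, fn in self.rt_tasks:
            if name in ready:
                fn()
                trace.append(name)
        # Only now does the general-purpose kernel run and see the queued interrupts.
        irqs, self.pending_irqs = self.pending_irqs, []
        self.gp_kernel(irqs)
        trace.append("gp_kernel")
        return trace


def demo_rt():
    log = []
    core = RTCore(gp_kernel=lambda irqs: log.append(("gp", list(irqs))))
    core.add_rt_task(2, "logger", lambda: log.append("logger"))
    core.add_rt_task(1, "motor", lambda: log.append("motor"))
    core.raise_irq("disk")
    trace = core.tick(ready={"motor", "logger"})
    return trace, log
```

The point of the sketch is how little the two sides need to know about each other: the real-time core needs only a way to run the general-purpose kernel and a queue of deferred interrupts, which is very different from the shared state that a split between paging and caching would require.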

Note: RTLinux has been the property of Wind River Systems since 2007.

Note 2: Here’s a more abstract look at modularity.

Note 3: 2019 – I should have also said something about loadable kernel modules.

Microkernels and why academic OS research is boring

3 thoughts on “Microkernels and why academic OS research is boring”

  • July 7, 2009 at 12:57 am

    [the problem with all the “nearly ready for prime time” microkernels is that they are not real products]
    Hello Mr Yodaiken,
    How about QNX? Cisco IOS XR is based on this OS, and maybe many other products are too.

    Thanks

  • July 7, 2009 at 11:07 am

    QNX has always been the exception that proves the rule – but it’s not a classical microkernel. The POSIX process server lives in the same address space as the microkernel and combines memory management, process management, and some file system work.