On Microkernels
Yes, another showerthought-inspired TL;DR blog. But “DR” is in the name of the blog, so it's not exactly as if I'm bothering you with this.
Anyway. Microkernels. Good idea? Or bad idea? Torvalds was right... or wrong?
Well, both. Let's have a talk about what they are first.
In the 1980s, microkernels were considered THE FUTURE. Virtually every respected academic pointed out that shoving lots of complicated code into one computer program that your entire computer relied upon was insanely stupid. They argued that it should be relatively easy to split your kernel into “servers”, single-purpose programs that each managed a specific part of the system. For example, you might have a Disk Operating System that applications use to access files. That Disk Operating System might talk to Handler processes that manage the file systems on each device. And those in turn might talk to the Device Drivers. Each part of what in modern terms would be a kernel becomes a separate program that can, at worst, crash without taking down the rest of the system. The servers – the DOS, the File System Handlers, the Device Drivers – would all be processes running under a very simple kernel (the microkernel! Geddit?!) that would schedule processes, manage memory, and provide ways for the servers to communicate with one another.
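To make that concrete, here's a rough sketch in C of the kind of message passing those servers would do through the kernel. None of the names or structures come from any real system – the “kernel” here is just a toy mailbox array inside one process – but it shows the shape of the thing: a request going from the DOS down to a driver and the reply coming back, one copied message at a time.

/* A minimal, entirely hypothetical sketch of server-to-server messaging.
   A real microkernel would provide send/receive as system calls between
   separate processes; here the "kernel" is just a mailbox per server.   */
#include <stdio.h>
#include <string.h>

enum { MSG_READ_BLOCK, MSG_READ_DONE };
enum { DOS_SERVER, FS_HANDLER, DISK_DRIVER };

struct message {
    int  type;        /* what is being asked for                 */
    int  block;       /* which disk block                        */
    char data[512];   /* payload, copied along with the message  */
};

static struct message mailbox[3];   /* one slot per server        */
static int has_mail[3];

static void send_to(int server, const struct message *m)
{
    mailbox[server] = *m;           /* the whole message, data and all, gets copied */
    has_mail[server] = 1;
}

/* The device driver "server": fills the requested block with fake data. */
static void disk_driver_step(void)
{
    if (!has_mail[DISK_DRIVER]) return;
    struct message m = mailbox[DISK_DRIVER];
    has_mail[DISK_DRIVER] = 0;
    memset(m.data, 'A' + (m.block % 26), sizeof m.data);
    m.type = MSG_READ_DONE;
    send_to(FS_HANDLER, &m);
}

/* The file system handler: forwards the request, relays the reply. */
static void fs_handler_step(void)
{
    if (!has_mail[FS_HANDLER]) return;
    struct message m = mailbox[FS_HANDLER];
    has_mail[FS_HANDLER] = 0;
    if (m.type == MSG_READ_BLOCK)
        send_to(DISK_DRIVER, &m);
    else
        send_to(DOS_SERVER, &m);
}

int main(void)
{
    struct message req = { .type = MSG_READ_BLOCK, .block = 7 };
    send_to(FS_HANDLER, &req);   /* the "DOS" asks for block 7        */

    fs_handler_step();           /* the request travels down...       */
    disk_driver_step();          /* ...the driver does the work...    */
    fs_handler_step();           /* ...and the reply travels back up  */

    if (has_mail[DOS_SERVER])
        printf("DOS got block %d, first byte '%c'\n",
               mailbox[DOS_SERVER].block, mailbox[DOS_SERVER].data[0]);
    return 0;
}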
(Microkernels should not be confused with hypervisors, which are thin kernels intended to run multiple operating systems – though much of the early hype about microkernels overlapped, with advocates pointing out that in theory you could create “personalities”. In the example above, in addition to the Disk Operating System you could have a completely different API server, one providing a view of the world that looked like Unix. And that server could talk to its own set of handlers, or to the same set.)
Academics generally agreed that microkernels were the only acceptable design for modern operating systems. Against this were more traditional operating systems, Unix being the obvious example, which had a single monolithic kernel with all the file systems and device drivers compiled into it. (Systems like CP/M and MS DOS weren't really advanced enough to be part of the discussion.)
Microkernels enter the real world
Academia brought us MINIX and Mach during the 1980s. Mach was the basis of several commercial projects such as MkLinux and, more successfully, XNU (the kernel of NEXTSTEP and Mac OS X), but those commercial projects weren't microkernels; they were always hybrid kernels – kernels where most of the servers are integrated into a single space where they can freely communicate with one another, at the cost of security.
The commercial world in turn tried to implement the concept but inevitably failed. Many readers will have read my description of how microkernels work above, with its “DOS” and “Handlers” and “Device Drivers”, and immediately thought of AmigaOS, which was structured like a microkernel-based system but wasn't one. At first sight it's easy to see why: the Amiga had no memory management chip, so it literally wasn't possible to sandbox the different components. But in reality the problems were deeper than that. AmigaOS demonstrated that you could get good performance out of a microkernel-style operating system if the different components could quickly and easily communicate with one another. In AmigaOS, a device driver could talk to a handler just by sending it, via the kernel, the address of, say, a block of data it had just loaded from disk. Suddenly that handler had 512 bytes of data available to do whatever it needed with. But that's not compatible with how memory management is done in modern CPUs. Modern CPUs are about sandboxing processes: sending 512 bytes from one process to another means rather more than simply sending a four-byte address; it involves either reconfiguring the memory maps of both processes to see the same 512-byte block of RAM, or asking the kernel to copy that data byte by byte. These are expensive operations. AmigaOS only worked because there was no memory management as we know it, just a giant shared block of memory everything used. And because memory was shared, a crash by one device driver could, actually, take the entire system down, rather than just affect access to the device involved.
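Here's the difference in sketch form. Both functions are made up purely for illustration – the point is what has to cross the boundary between driver and handler, not any real API: in the flat-memory case the “message” is a four-byte address, in the protected case the kernel has to move (or remap) the whole 512 bytes.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 512

/* Amiga-style: one flat address space shared by everything. "Sending"
   the block to the handler means handing over its address – four bytes
   on a 68000 – and the handler reads the data in place, instantly.    */
static const uint8_t *amiga_style_send(const uint8_t *block)
{
    return block;                   /* the whole message is one pointer */
}

/* MMU-style: the handler lives in its own address space, so the kernel
   must copy the data into a buffer the handler can actually see (or
   remap pages, which is also far from free).                           */
static void mmu_style_send(uint8_t *handler_buffer, const uint8_t *block)
{
    memcpy(handler_buffer, block, BLOCK_SIZE);      /* 512 bytes moved  */
}

int main(void)
{
    static uint8_t disk_block[BLOCK_SIZE];    /* data the driver loaded */
    static uint8_t handler_copy[BLOCK_SIZE];  /* the handler's own view */
    disk_block[0] = 42;

    const uint8_t *shared = amiga_style_send(disk_block);
    mmu_style_send(handler_copy, disk_block);

    printf("shared view: %u, copied view: %u\n", shared[0], handler_copy[0]);
    return 0;
}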
This expense in the end crippled a series of other commercial projects that almost certainly looked like a good idea at the time: elegant, modular, exactly the type of thing every programmer starts to develop only to realize will never work once they start coding. A big question for me in the 1980s was why Acorn lumbered the amazing ARM-based computers they created with a crappy, third-rate operating system descended from Acorn's 8-bit BBC OS, “MOS”. The answer is... they did try to create a modern microkernel-based OS for it, called ARX, and immediately got stuck. Despite running on one of the world's fastest microcomputer environments, the system had performance issues that its creators couldn't get around. The moment the elegant design hit reality, it failed, and Arthur/MOS's “good enough” environment was expanded into RISC OS, which used cooperative multitasking and other kludges to make something useful out of a woefully underpowered base.
On the other side of the Atlantic, other companies enthusiastically writing next-generation operating systems hit the same issues. Apple started, then co-funded, then walked away from, Taligent. DEC kept Dave Cutler busy with MICA, which didn't go anywhere. Finally, Microsoft, which had been working on a more traditional system with IBM (OS/2), for various reasons hired Dave Cutler away from DEC after MICA's cancellation to develop Windows NT.
The latter came the closest of any commercial microkernel-based operating system to achieving some level of success. In practice, though, Microsoft didn't feel comfortable making Windows NT its primary operating system, despite high levels of compatibility from NT 4 onwards, until the early 2000s – at which point the system was no longer a classic microkernel system, with many essential services, including the graphics drivers (!), integrated into the main kernel.
So why the failures?
At first sight, it's easy to blame the failure of microkernels on performance issues. But that's not actually what happened. There were two bigger issues. The first was that most commercial microkernel projects were part of a much bigger attempt to build an elegant, well-designed operating system from scratch; the microkernel was only one component.
But the second was modern memory management. At some point in the 1980s, the major makers of microcomputer CPUs started to release advanced, secure memory management systems for their existing CPUs. Motorola and Intel both grafted virtual memory onto their existing architectures by allowing operating systems to rearrange the addressable memory as needed. This was all that was needed for Unix to work, and Unix was considered the most advanced operating system a personal computer user would want to run.
And yes, Unix somehow managed to be both a very big deal and an irrelevance in the personal computing world. Microsoft, acknowledging that MS DOS 1.0 was little more than a CP/M-like program loader, saw DOS's future as converging with Xenix, its own Unix variant. The press described anything with multitasking, from AmigaOS to OS-9, as “Unix-like”, no matter how unlike Unix it was, because Unix was seen as The Future.
So from the point of view of the big CPU makers, a simple memory remapping system was “good enough” for the most advanced operating systems envisaged as running on their chips. There was another factor behind both Intel and Motorola designing MMUs this way: Motorola had designed a very successful 32-bit ISA for its CPUs that programmers adored, while Intel's segmented approach had proven to be a failure, propped up only by IBM's decision to include the 8088 in its PC. Intel was focused on making a pure 32-bit ISA for its next generation of processors, while Motorola saw no need to change its ISA and saw MMUs as something that could be bolted onto the architecture of a 68000-based system. By the time it became important, neither saw any value in taking a risk and introducing architectures that integrated memory management with their ISAs.
Why is this important? Well, go back to the AmigaOS description earlier. In the Amiga, the pseudo-microkernel was fast because servers only needed to send each other addresses to transmit large amounts of data. On the 68000 ISA there is no way to graft security onto this system – you can't validate a pointer or the memory it points to. But in the mid-1960s and early 1970s, hardware memory management systems were devised that allowed exactly this kind of thing. The approach is called Capability Addressing. Capabilities are pointers to blocks of memory, typically with permissions associated with them (like a file). Creating new capabilities is a privileged operation: you can't just use some pointer arithmetic to create one. Storing a capability in memory requires the CPU to have some way to flag that value as being a capability, typically an extra bit for every word of memory. This way programs can load and store capabilities in memory without risking normal data being read as a pointer, or vice versa.
A capability architecture would be perfect for an operating system like AmigaOS. It would, with relatively small modifications, be secure. The different servers would communicate by passing capabilities instead of raw pointers, and if one crashed, it wouldn't be able to write to memory not allocated to it, because it wouldn't hold any capabilities pointing at that memory.
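Here's a purely software sketch of the idea, in C. On a real capability machine the tag bit, the bounds check and the permission check would be enforced by the CPU on every access; modelling them as a struct and a helper function is only meant to show why a server holding no capability for a block simply can't touch it.

/* A software model of a hardware idea – every name here is hypothetical. */
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

#define CAP_READ  1
#define CAP_WRITE 2

struct capability {
    uint8_t *base;      /* start of the block this capability grants   */
    size_t   length;    /* how much of it                               */
    unsigned perms;     /* CAP_READ / CAP_WRITE                         */
    int      tag;       /* the "this really is a capability" bit        */
};

/* Only the (hypothetical) kernel may mint capabilities.                 */
static struct capability kernel_grant(uint8_t *base, size_t len, unsigned perms)
{
    struct capability c = { base, len, perms, 1 };
    return c;
}

/* Every access goes through checks the hardware would normally perform. */
static int cap_read(const struct capability *c, size_t offset, uint8_t *out)
{
    if (!c->tag || !(c->perms & CAP_READ) || offset >= c->length)
        return -1;                        /* fault: no valid capability  */
    *out = c->base[offset];
    return 0;
}

int main(void)
{
    static uint8_t disk_block[512] = { 42 };

    /* The driver is granted read access to its block and passes the
       capability – not a raw pointer – on to the handler.               */
    struct capability to_handler = kernel_grant(disk_block, sizeof disk_block, CAP_READ);

    uint8_t byte;
    if (cap_read(&to_handler, 0, &byte) == 0)
        printf("handler read %u through its capability\n", (unsigned)byte);

    /* A server that was never granted a capability holds nothing it
       could even attempt the access with.                               */
    struct capability none = { 0 };       /* tag is 0: not a capability  */
    if (cap_read(&none, 0, &byte) != 0)
        printf("access without a capability faults\n");

    return 0;
}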
The problem, of course, is that no popular CPUs support capabilities, and most of those that did were also considered failures. Intel tried to produce such a system in the early 1980s, the iAPX 432, which was not part of its 80x86 family. It was chronically slow. And the 1980s were not the time to produce such a chip: the extra bit required for each 32-bit (at minimum) pointer would have been considered cost-prohibitive at a time when computers came with hundreds of kilobytes of RAM.
It would be remiss of me not to mention another theoretical possibility: managed code. With managed code, programs are compiled to an intermediate language that can be proven “secure” – that is, unable to access resources it hasn't explicitly been given access to. The two most famous examples are the Java Virtual Machine and .NET. Both systems have problems, however: their garbage collectors require the memory of the machine they're running on to be locked for indeterminate amounts of time while they account for what's in use (a process called “marking”), though it's worth mentioning that Rust's alternative approach to memory management suggests a VM could be built with better real-time behavior. Another problem was that during the 1980s C became the standard applications development language, with personal computers not being taken seriously unless they were capable of running it: but the high-level approach of a VM is at serious odds with C's low-level memory management, making it all but impossible to create an efficient C compiler for such an environment.
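For anyone who hasn't met “marking” before, here's a toy sketch – made-up structures, no real collector. The important bit is that mark() has to walk every reachable object, and the object graph mustn't change underneath it, which is why the simple collectors of the era stopped the whole program while they did it.

/* A toy mark phase: anything reachable from a root gets flagged as live. */
#include <stddef.h>

struct object {
    int            marked;
    struct object *children[2];     /* references this object holds       */
};

static void mark(struct object *obj)
{
    if (obj == NULL || obj->marked)
        return;
    obj->marked = 1;                /* reachable: keep it                  */
    for (size_t i = 0; i < 2; i++)
        mark(obj->children[i]);     /* follow every reference it holds     */
}

int main(void)
{
    struct object leaf = { 0, { NULL, NULL } };
    struct object root = { 0, { &leaf, NULL } };
    mark(&root);                    /* anything left unmarked is garbage   */
    return 0;
}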
So, TL;DR, it wasn't that microkernels were wrong; it's that the technology choices of the 1980s and 1990s, the decades when it mattered most, made microkernels inefficient and difficult to implement. By the time memory prices had fallen to the point that a CPU architecture optimized for microkernels would have been viable, the world had standardized on operating systems and system architectures that weren't compatible with the concept.
The failures of the 1980s were mostly because developers were being overly ambitious and didn't have the right architectures to work with in the first place.
All of which is a shame. I'd love my primary OS to be like AmigaOS is/was, but with security.