#?.info

I live in Florida. Florida has bad weather. You may have seen it on the news. Our home lost Internet (we get it from Comcast) due to Hurricane Milton, which isn't the first time. I've been looking for a backup Internet solution for some years now, especially as I work from home. Finally, T-Mobile has stepped up with a $20/month thing called T-Mobile Home Internet Backup. It's capped at 130G a month, but is otherwise identical to their regular 5G Home Internet service.

So I'm trying it.

And... I'm in two minds.

Let's start with the positives: it's fast. Very fast. I get around 400Mbps down, and 40Mbps up. That's close to Comcast for the downloads, and double (!) for the uploads. Latency is similarly comparable.

The main negative is that it's crude, uncustomizable, and isn't user friendly for its intended purpose.

The basic system is supplied as a combined 5G modem and router with no options. No, really. You can set the “admin” password (though you can't use it for anything) and choose the SSID and password for the Wifi network it creates. But you cannot:

  • Turn off the Wifi (there are two Ethernet ports so it's not like Wifi is necessary)
  • Turn off the DHCP server
  • Set up reserved IPs
  • Port forward (well, it's probably CGNAT anyway)

If you plug in your own router, it will do nothing to accommodate you. It won't turn off the Wifi. It won't give your router any IPv6 features (DHCP-PD is not provided, I'm not sure any IPv6 functionality is provided even if you connect directly.) You'll certainly have issues with certain types of application.

The unit does have undocumented features: you can download an app called HINT Control which will allow you to turn off the Wifi or restrict it to a single frequency band. If you're feeling more adventurous there's a web app called KVD Admin you can install, if you have the environment to run it, that'll make the unit feel a little more like a regular router. The options, though, are the same as in the app above; advanced features such as port forwarding or IPv6 support are still out of the question.

It's marketed as a backup but it doesn't act like one.

I am a nerd, so I know my needs are more substantial than Joe Smartphone, but it's not clear to me that even users with simpler needs will find it anything but a jarring experience when used as a backup.

The ideal implementation for something like this would sit between someone's router and their home network. Unfortunately that's not generally practical: most people use a combined Wifi/router/gateway, which means you can't just slip something in that reroutes packets when the main Internet is down.

So T-Mobile's solution to this is essentially to give up. Rather than putting any effort into allowing end users to integrate the T-Mobile system into their existing home network, the apparent assumption is that you'll simply go through every single device you have – your laptops, your tablets, your smart TV, or any other “smart” devices (heaven help us) you rely on, your Alexa hub, your security camera, etc – and reconfigure them to use the T-Mobile wireless access point. And when the main Internet comes back up, you'll somehow notice and reverse the process, reconfiguring every one of your devices back again.

How it should work

An end user, whether nerd me or Joe Sixpack, actually wants this to act more like a switch, where flipping one way means our Wifi router is routing Internet access via the regular route, and flipping the other way routes it via T-Mobile instead.

In an age when every ISP insists on sending customers preconfigured all-in-one boxes with limited customizability, it's tough to offer that in a user friendly way. But at the same time, you feel T-Mobile could have at least tried by providing a solution that meets the majority of configurations. One obvious way would be for T-Mobile's Wifi to be configurable as a pass through. People would connect to it by default, but instead of providing DHCP and routing to the Internet, it would normally pass packets on to the customer's gateway. When the switch is flipped however, it would intercept packets for the gateway and route them via T-Mobile instead.

And obviously, if you're like me, and want to use your own Wifi router, you could plug your router (and the rest of your network) into the T-Mobile box.

How I configured it

For now, my options are limited. I put an old router in front of the T-Mobile box to prevent its DHCP server from touching my network directly, and my ISC DHCP server (I don't use my router's) can be configured if needed to change the default gateway to that router should I lose access to the Internet via Comcast. I can also do it manually, or on a per-machine basis.
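For the curious, the failover “switch” is just the routers option in dhcpd.conf. A minimal sketch, with example addresses (192.168.1.1 standing in for the main router, 192.168.1.2 for the old router in front of the T-Mobile box):

subnet 192.168.1.0 netmask 255.255.255.0 {
        range 192.168.1.100 192.168.1.199;
        option routers 192.168.1.1;      # normal: default gateway is the Comcast router
        # option routers 192.168.1.2;    # outage: swap the comment to route via T-Mobile
}

Swap which routers line is commented out, restart dhcpd, and clients pick up the new gateway as their leases renew.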

This is, to be honest, not ideal. It means every device has to be forced to renew its DHCP lease when there's an Internet issue. Some devices do this easily: I can get my laptop to pick up the new gateway just by reconnecting it to the Wifi.
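If reconnecting isn't an option, most machines will also let you force a renewal from a shell; for example (the interface name is whatever your system uses):

# dhclient -r eth0 && dhclient eth0

Windows has the equivalent ipconfig /release and ipconfig /renew.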

A possible option would be to have an intermediary act as the default router, but I haven't yet figured out the best approach to doing that. It seems extremely inefficient.

Final thoughts

The system continues the industry's desire to control how we use the Internet and ensure it fits into the bizarrely limited world view of those who market ISP services. Which is a shame because the same desire for a “simple Internet” that “just works” also cripples T-Mobile's intended customer base for this specific service. You can't beat the price though, and if you're prepared to duct tape a bunch of kludges together, you can make it work.

I made a strange decision, one I didn't think I'd make, when building two new servers. Instead of arranging the SATA SSDs I bought as a raidset, I just set them up individually. Different VMs are on different drives. There's a reason for it, and I'm not entirely sure it was the right decision, but I didn't get any feedback suggesting I was fundamentally wrong.

When are you supposed to use RAID?

“RAID is not a back-up”

It's one of those glib phrases that's used in discussions about RAID when people are asking whether they should use it. It's also, very obviously, wrong (except in the case of RAID 0, of course.) RAID is primarily about preventing data loss, which it does by duplicating data. That's literally a back-up. And for many people, that's the same improvement they'd get from copying all their data to another drive on a regular basis, except that RAID results in less data loss than that strategy because it's automatic and immediate.

There are situations RAID doesn't handle. But there are also situations “copying your disk” doesn't handle either. Both react badly to the entire building catching fire, unless you're in the habit of mailing your back-up disks to Siberia each time you make a back-up, or are ploughing through your monthly 1TB Comcast quota shoving the data, expensively, into the cloud. On the other hand, RAID is pretty bad at recovering from someone typing 'rm -rf /', though that happens less frequently than people think it does. And arguably, a good file system lets you recover from that too. And a good back-up system should somehow recognize when you intentionally remove a file.

“RAID is about uptime and high availability”

...which RAID does by automatically backing everything up in real time. Also, that's 100% true of RAID on an enterprise grade server. But if you shoved five disks into your whitebox PC (keyword: into) and configured them as a RAID 6 set (because someone told you RAID 5 is bad; we'll get to that in a moment), then it's more about predictable uptime, in the sense that you'll be able to say “Well, I'm going to need to turn off the computer to fix this bad disk, but I can wait until things are quiet before doing it.”

Anyway, it's kind of about uptime and, more importantly, high availability. And it's high availability that you're probably actually after. And RAID only has a limited impact on high availability generally.

“RAID makes some things faster, and other things slower”

This bit's awkward because, while it's been technically true in the past that RAID's spreading of data across multiple disks has helped with speed, assuming a fast controller that can talk to multiple disks at once, it's always been a six of one, half a dozen of the other kind of situation. You probably read files more than you write them, and most RAID configurations speed up the former at the expense of the latter, but overall it's unclear what kind of advantage you'd gain from using it. RAID 1 or “RAID 10” were once probably the best options if you were just looking for ways to speed up a slow server. But technologies change. We have SSDs now. It is very unlikely RAIDing SSDs makes any difference to read speeds whatsoever.

Conclusion: RAID is ultimately a way to make one component of your system, albeit a fairly important one, more reliable and less prone to data loss. But it's considered inadequate even when it works, and requires augmentation by other systems to prevent data loss.

How are technology changes affecting RAID?

Disk sizes, error rates, and matrixing

Until the early 2010s, the major technology change concerning storage was the exponential increase in capacity regular hard disks were seeing. Around 2010 or so, experts started to warn computer users that continuing to use RAID 5 might be a problem, because RAID controllers would mark an entire disk as bad if they saw even one unrecoverable read failure. Disk capacities had grown faster than their reliability per bit, so the chance of there being a bad sector on your disk grew with its capacity.

The scenario RAID 5 opponents worried over was essentially that one disk would fail, and then when recovering the data to add to another disk, the RAID controller would notice a problem with one or other of the other drives. This might make the data in question unrecoverable assuming the original failed drive was completely offline, which it might be if the entire disk failed, or in some silly circumstances where the RAID controller wants to be snotty about disks it found errors on, or if there are only three hotswap bays. Regardless, most (all?) RAID controllers will refuse to continue at that point because there's no way to recover without losing data, possibly a lot of data because of the matrixing algorithms used in RAID 5.

RAID 6 requires two redundant drives instead of one, and so, in 2010, was considered better than RAID 5. But recently we've been hearing similar concerns expressed about RAID 6. At the time of writing there is no standard RAID 7, with three redundant drives, but several unofficial implementations. A potential option is just to use RAID 1 with more than two drives, with a controller that can tolerate occasional, non-overlapping, errors.

These concerns are rarely expressed for RAID 1/10, but these have the same potential issue. A disk fails, you replace it, the controller tries to replicate the drive and finds a bad sector. But in theory, less data would be lost under that scenario because of the lack of matrixing.

SMR/Shingled drives

Post-2010, the major innovation in hard disk technology has been “shingling”. As disk capacities increase, the size of magnetic tracks on those disks inevitably decreases, until it becomes difficult to even make robust read/write heads capable of writing tracks that small. Shingling is the great hope: instead of making the heads smaller, you keep them the same size, but you write and rewrite several tracks in succession as a single operation – writing each track so it overlaps with the track next to it. This means your big ass head might be wide enough to write three tracks at once, but you can still get multiple tracks in the same space.

For example, suppose you group seven tracks together. You can write them like this:

First pass:

111??????

Second pass:

1222?????

Third (and so on) passes:

12333????
123444???
1234555??
12345666?
123456777

Now if you hadn't used shingling, then in that same space you'd only have fit three tracks:

111222333

But, due to magic, you have seven! Hooray!

Let's not kid ourselves: this is a really good, smart idea. It almost certainly improves reliability (you don't want to shrink disk heads too much or massively increase the number of heads and platters – the latter adds cost as well as increasing the number of things that can fail.) But it does come at the expense of write speeds. Because every time you write to a drive like this, the drive has to (in the above case) read seven tracks into memory, patch your changes into that image, and then rewrite the entire thing. It can't just locate the track and sector and write just that sector, because it'll overwrite the sectors on the neighbouring tracks when it does it.

A less controversial technology that has a similar, albeit nowhere near as bad, effect is Advanced Format, often known as 4K. This increases the drive's native sector size from 512 bytes to 4K, essentially eliminating seven out of every eight inter-sector gaps on the drive, space which can be used to store data instead. Because operating systems have assumed 512 byte sectors since the mid-1980s, most drives emulate 512 byte sectors by... reading an entire 4K sector into memory, patching it with the 512 bytes just written, and writing it out as needed. This is less of a problem than shingling because (1) operating systems can just natively use 4K sectors if they're available and (2) consecutive sectors usually contain consecutive data, so normally, during a copy, the drive can wait a moment and will see an entire 4K sector's worth of data arrive, meaning it doesn't have to read and patch anything.

Shingled drives however don't have anything that would make their approach more efficient. They end up being very slow, and during a RAID recovery process, many RAID controllers will simply give up and assume a fault with a drive if it's shingled.

Shingled drives are rapidly becoming the default for hard disks. Despite needing more on-board memory, they end up being cheaper per terabyte, for obvious reasons.

Solid State Drives (SSD)

Probably the single biggest innovation of the last 20 years has been the transition to SSDs as the primary storage medium. SSDs have very few disadvantages compared with HDDs. They're faster (so fast that the best way to use an SSD is to wire it directly to the computer's PCIe bus, which is essentially what an NVMe “M.2” drive does), they use less power, and they're far more reliable. Their negatives are that they're more expensive per terabyte (though costs are coming down, and are comparable with 2.5” HDDs at the time of writing) and they have a “write limit”. The write limits are becoming less and less of a problem, though much of that is because operating system vendors have become more sensitive to how SSDs should be written to.

SSDs are RAIDable but there's a price. RAID usually astronomically increases the number of writes. Naively you'd expect at least a doubling of the number of writes, purely because of the need for redundancy. This isn't an issue with RAID 1 as you have double the disks, but for RAID 5 you have at most 1.5x as many disks, and RAID 6 isn't much better.

But, aside from RAID 1, it isn't “double” the writes. Each write in a RAID 5 or RAID 6 environment requires also updating error correction data, and while some of that can be buffered and done as a single operation, that does substantially increase the number of writes per underlying write.
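As a rough worked example (the classic read-modify-write parity update; exact behaviour depends on the controller and its cache):

RAID 5 small write:  read old data + read old parity
                     new parity = old parity XOR old data XOR new data
                     write new data + write new parity  =  2 device writes per logical write
RAID 6 small write:  as above, plus a second parity block  =  3 device writes per logical write

Full-stripe writes and write-back caches soften this, but the amplification is real.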

For HDDs this isn't much of a problem: HDD life is mostly determined by the drive's general environment, and drives will literally live for decades if cared for properly, no matter how much you write to them. For SSDs, though, each write is a reduction in the drive's remaining lifetime. And given most RAID users would initially start with a similar set of SSDs, that also increases the chance of more than one SSD failing in a short space of time, reducing the chance of recovering a RAID set when one drive fails.

Ironically, SSDs should be easier to recover from than HDDs in this set of circumstances because a typical SSD, upon noticing its write count is almost up, will fall back to a read-only mode. So your data will be there, it's just your RAID controller may or may not be able to recover it because it was built for an earlier time.

Preempting a predicted reply about the above: many argue SSDs don't have write limit problems in practice because manufacturers are getting better at increasing write limits and they haven't had a problem in the whole five years they had one in their PC. Leaving aside issues with anecdotal data, remember that most operating systems have been modified to be more efficient and avoid unnecessary writes, and remember that RAID is inherently not efficient and makes many redundant writes (that's the 'R' in RAID). The performance of an SSD in a RAID 5 or RAID 6 set will not be similar to its performance in your desktop PC.

Alternative approaches to high availability

A common misunderstanding about high availability is that it's a component level concern, rather than an application level concern. But ultimately as a user you're only interested in the following:

  • The application needs to be up when I want to use it
  • I don't want to lose my work when using it.

RAID only secures one part of your application stack, in that it makes it less likely your application will fail due to a disk problem. But in reality, any part of the computer the application runs on could fail. It might become unpowered due to a UPS failure or a prolonged power outage, or because the PSU develops a fault. The system's CPU might overheat and burn out, perhaps because of a fan failure. While enterprise grade servers contain some mitigations to reduce the chance of these types of failure happening, the reality is that there are many failure modes, and RAID is designed to deal with just one of them, and increasingly it's bad at it.

But let's look at the structure of an average application: It serves web pages. All those web pages are generated from content stored in a database. For images and other semi-static assets, you can put them in a Minio bucket. You'd have nginx in front of all of this doing reverse proxying.

For the application and nginx, you are looking at virtually no file system modifications except when deploying updates. So you could easily create two nginx servers and two application servers (assuming the application tolerates running as multiple instances; most will, and are designed that way to begin with.) You don't actually need RAID for either server: if one suffers a disk crash, already unlikely, you can just switch over to the back-up server and clone a new one while you wait.

For the database, both MySQL/MariaDB and PostgreSQL, the two major open source databases, support replication. This means you can stand up a second server of the same type, and with a suitable configuration the two will automatically sync with one another. Should one server suffer a horrific disk crash, you can switch over to the back-up, and stand up a new server to replace it.
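As a concrete illustration, here's a minimal sketch of PostgreSQL streaming replication. The role name, addresses, and the Debian PostgreSQL 16 data directory are assumptions you'd adapt; MariaDB has its own equivalent. On the primary, create a replication role:

# sudo -u postgres psql -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';"

Allow the standby to connect by adding a line like this to the primary's pg_hba.conf and reloading:

host    replication     replicator      10.0.0.2/32     scram-sha-256

Then, on the standby, with an empty data directory, clone the primary and mark it as a replica:

# sudo -u postgres pg_basebackup -h 10.0.0.1 -U replicator -D /var/lib/postgresql/16/main -R -P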

For your assets server, you can also do replication just as you can with the database.

One huge advantage of this approach over trying to make your server hardware bullet proof is that you don't have to have everything in the same place. Your Minio server can be a $1/month VPS somewhere. Your PostgreSQL primary and replica can be on different PCs in your house. Your clones of your nginx and application servers can be on Azure and Amazon respectively. You could even have your primary servers running from unRAIDed SSDs, and secondary servers on a big slow box running classic HDDs in a RAIDed configuration.

This is how things should work. But there's a problem: some applications just don't play well in this environment. There are very few email servers, for example, that are happy storing emails in databases or blob storage. Wordpress needs a lot of work to get it to support blob storage for media assets, though it's not impossible to configure it to do so if you know what you're doing.

But if you do this properly, RAID isn't just unnecessary; it actively complicates your system and makes it more expensive.

Final thoughts

This is a summary of the above. You can Ctrl-F to find the justification for each point if you skimmed the above and don't understand why I came to the conclusions I have come to, but I came to them, so suck it up:

  1. RAID is viable right now, but within the next 5-10 years it seems likely to become mostly obsolete, and a danger to anyone who blindly uses it without limiting themselves to those specific instances where it's necessary, and favoring large numbers of small drives in a RAID 1 configuration for those instances.
  2. RAID is still useful if you're using it to create redundant, highly available storage out of relatively small disks. I've seen numbers as low as 2TB quoted for RAID 5, but I'm sure you can go a little bigger than that.
  3. Other than RAID 1, RAID with SSDs is probably an unwise approach.
  4. Going forward, you should be choosing applications that rarely access the file system, preferring instead to use storage servers like databases, blob storage APIs, etc, that in turn support replication. If you have to run some legacy application that doesn't understand the need to do this, then use RAID, but limit what you use it for. (If it helps, a Raspberry Pi 5 with 4GB of RAM and a suitably large amount of storage can probably manage MariaDB, PostgreSQL, and Minio replication servers. None of these servers are heavy on RAM.)
  5. Moving away from RAID, forced or not, means you must, again, look at your back-up strategy. From a home user's point of view: consider adding a hotswap drive slot to your server and an automated process that mounts that drive, copies stuff to it, and unmounts it every night (see the sketch below). You probably have a ton of unused SATA HDDs anyway, right?
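Something along these lines is all it takes, as a minimal sketch; the disk label, mount point, and source directories are assumptions you'd change, and you'd want better error handling and logging before trusting it:

#!/bin/sh
# nightly-backup.sh: mount the hotswap disk, copy, unmount. Run from cron.
set -e
DISK=/dev/disk/by-label/backup     # give the backup disk a filesystem label so the device name doesn't matter
MNT=/mnt/backup

mount "$DISK" "$MNT"
rsync -a --delete /home /etc /srv "$MNT/"
umount "$MNT"

A crontab line like 0 3 * * * /usr/local/sbin/nightly-backup.sh runs it at 3am each night.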

These instructions refer to Xen 4.17 with a Debian “Bookworm” 12 dom0 and the bundled version of PyGrub. It's entirely possible that by the time you read this PyGrub will have gained btrfs support. But if you tried it and got errors, read on!

The problem

While some of Xen, notably xen-create-image, kinda sorta supports btrfs, that's not true of PyGrub, the Xen bootloader for PVs and PVHs. This means you're left with fairly ugly options if you want to use btrfs with Xen, including resorting to HVMs. And if you have to use HVMs, then why use Xen at all? There are virtualization platforms with much better support than Xen that do things the HVM way.

I said “kinda sorta” for xen-create-image because when I created a Debian 12 image with it (--dist bookworm) it was unbootable (well, it booted read-only) because the fstab contained a non-btrfs option in the entry for “/”. We'll get to that in a moment.

So, before you begin, I must warn you that the solution I'm about to propose, while modifiable for whatever the devil it is you plan to do, requires eschewing xen-create-image for the heavy lifting and doing all the steps it does manually. But to make things easier, we'll at least create a template using xen-create-image.

So what I want you to do first is create an image somewhere, using EXT4 as the file system, using xen-create-image, with the same name as the domU you intend to create. Go into it, kick the tires, make sure it works and has network connectivity, make any modifications you need to get it into that state, and then come back here. Oh, and if it's a bookworm image, be prepared to fix networking: /etc/network/interfaces assumes an 'eth0' device. The easiest fix is to change all references to it to 'enX0' (assuming that's the Xen Ethernet device you booted with.)
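For reference, the sort of invocation I mean is below; treat it as a sketch, because the exact flags vary between xen-tools versions, and the volume group, sizes, and networking here are examples:

# xen-create-image --hostname=testbox --dist=bookworm --fs=ext4 \
      --lvm=vg-example --size=20Gb --swap=2Gb \
      --memory=2Gb --vcpus=2 --dhcp --pygrub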

Other notes:

  • We're essentially doing a set of commands that all require root. Rather than sudo everything, I'm assuming you 'sudo -s' to get a root shell. It's much easier.
  • I'm assuming the use of volume groups, specifically using a group called 'vg-example', and am using 'testbox' as the hostname. The general flow should translate to other storage backends, but will require different commands. You can change everything as needed as you go, testbox to whatever you call the machine, etc.

Back up the image

OK, shutdown your VM, and then mount the image somewhere. Typically you'll see lines in your /etc/xen/testbox.cfg that read something like:

disk        = [
                  'phy:/dev/vg-example/testbox-disk,xvda2,w',
              ]

There may be two lines if you created a swap volume. Take the path for the root file system anyway, and mount it somewhere, say, /mnt. (If you mount somewhere else, change /mnt to your mount point accordingly when following these instructions. Likewise, vg-example and testbox-disk should be changed to whatever you see in the disk line. If you're not using volume groups, adjust accordingly; you can mount disk images using the loopback device, mount -o loop /path/to/disk.img /mnt, for example.) So:

# mount /dev/vg-example/testbox-disk /mnt

Now do the following:

Edit /mnt/etc/fstab and change the line that mounts root to assume btrfs and put in some good btrfs options, for example:

/dev/xvda2 / btrfs defaults,noatime,compress=lzo 0 1

cd to /mnt/boot, and type this:

# ln -s . boot

That bit of magic will help pygrub find everything, because ultimately we're going to create a separate boot partition, and pygrub will mount it and think it's root, so it'll get confused there's no “boot/grub/grub.cfg” file unless you put that softlink there.

Finally, cd to /mnt and type:

# tar zcf /root/testbox-image.tgz -S .

(The -S is something I type by habit, it just makes sure sparse files are compressed and decompressed properly. If you're paranoid you might want to look up the other options for tar such as those handling extended attributes. Or use a different archiver like cpio. But the above is working for me.)

Then umount the image, and delete it, eg:

# umount /mnt
# lvremove vg-example/testbox-disk

(If you're not using volume groups, skip the lvremove: just use rm if it's a disk image, or some other method if it's not a disk.)

Creating the first domU

OK, so we have a generic image we can use for both this specific domU and new domUs in future (if you plan to create a whole bunch of Debian 12 images, there's no need to duplicate it. I'll explain later how to do this.)

First, let's create the two partitions we need, for root and boot. I'm going to assume 'vg-example' is the volume group for this, but nothing requires you to use this volume group, or volume groups at all. If you're not using volume groups, do the equivalent with whatever system you have. You can create disk images using dd, eg # dd if=/dev/zero of=image.img iflag=fullblock bs=1M count=100 && sync for a 100M file, and use losetup (eg # losetup /dev/loop1 image.img) to create a virtual device so you can mkfs it and mount it.

# lvcreate -L 512M -n testbox-boot vg-example
# lvcreate -L 512G -n testbox-root vg-example

This creates our two new partitions, root and boot. Note they don't have to be part of the same volume group. You could even, probably, make one a disk image file and the other a volume group partition.

Now create the file systems on both:

# mkfs.btrfs /dev/vg-example/testbox-root
# mkfs.ext4 /dev/vg-example/testbox-boot

Now mount the main root:

# mount /dev/vg-example/testbox-root /mnt

Create the mount point for boot inside the root (this is important):

# mkdir /mnt/boot

Mount the boot partition:

# mount /dev/vg-example/testbox-boot /mnt/boot

Finally, unpack the image into it:

# cd /mnt
# tar zxf /root/testbox-image.tgz

After you're done (if you need to do anything at all), just umount boot and root in that order:

# cd ; umount /mnt/boot ; umount /mnt

Finally, modify your .cfg file, so load it into your favorite editor.

# vi /etc/xen/testbox.cfg

Modify the disk = [ ] section to look more like this:

disk        = [
                  'phy:/dev/vg-example/testbox-boot,xvda3,w',
                  'phy:/dev/vg-example/testbox-root,xvda2,w'
              ]

If your original had a swap partition, leave the entry there.

OK, moment of truth: boot using

# xl create /etc/xen/testbox.cfg -c

It should come up with the Grub menu. And then it should boot into your VM. And when you log into your VM, everything should be working as it was before you made your modifications.

Creating clones from the original archive

This is easy too. Keep that archive around. For this guide we'll keep most things the same but call the new domU you're creating clonebox.

Before you begin, copy across /etc/xen/testbox.cfg to /etc/xen/clonebox.cfg. You can edit it as you go, but at least begin by changing all references of testbox to clonebox, and change the MAC address. You can easily get one from https://dnschecker.org/mac-address-generator.php: use 00163E as the prefix (I have no connection with the makers of that tool, it seems to work OK, just want to save you a search engine result.) Finally allocate a new IP address, assuming you're not using DHCP.

Now, create the two partitions we need, for root and boot. You can create a swap partition too if your original had one.

As earlier, I'm going to assume 'vg-example' is the volume group and we're using volume groups.

# lvcreate -L 512M -n clonebox-boot vg-example
# lvcreate -L 512G -n clonebox-root vg-example
# mkfs.btrfs /dev/vg-example/clonebox-root
# mkfs.ext4 /dev/vg-example/clonebox-boot

(For the second lvcreate, change the -L to whatever you need it to be. A useful alternative to know is “-l '100%FREE'” – yes, lowercase L – which will give the remainder of the volume group to that partition.)

If you created partitions whose device names do not match what's in /etc/xen/clonebox.cfg, modify the disk = [] lines in that file to use the new device paths (eg /dev/vg-example2/clonebox-root if you use a different volume group).

Mount it as before, adding the boot mountpoint, and unarchive the original archive:

# mount /dev/vg-example/clonebox-root /mnt
# mkdir /mnt/boot
# mount /dev/vg-example/clonebox-boot /mnt/boot
# cd /mnt
# tar zxf /root/testbox-image.tgz

Now before you unmount things, you'll need to adjust the image.

The default Debian image is borked and creates an /etc/network/interfaces file with the wrong Ethernet device. When you were fixing it earlier, you either fixed /etc/network/interfaces, or you created a file called /etc/systemd/network/10-eth0.link that maps eth0 to the device with the right MAC address. The former is probably easier, but if you did the latter, update /mnt/etc/systemd/network/10-eth0.link with the new MAC address.
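If you went the .link route, the file looks roughly like this; treat it as a sketch, and use whatever MAC address you put in clonebox.cfg:

[Match]
MACAddress=00:16:3e:aa:bb:cc

[Link]
Name=eth0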

If you're using static IPs, you'll also need to change /mnt/etc/network/interfaces and include the IP for this domU.

Regardless of everything else, you'll also definitely need to modify these files:

  • /mnt/etc/hostname
  • /mnt/etc/hosts
  • /mnt/etc/mailname

Something else you probably want to do is regenerate the SSH host keys on the system. To do this, chroot into the file system, delete the old keys, and run ssh-keygen -A:

# chroot /mnt /bin/bash
# cd /etc/ssh
# rm ssh_host*key*
# ssh-keygen -A
# exit

Once you're done, unmount:

# cd ; umount /mnt/boot ; umount /mnt

Save your new .cfg file, and then:

# xl create /etc/xen/clonebox.cfg -c

If it comes up, log in, poke around, make sure networking's working etc.

Convert Xen PVs to PVHs

After you get a PV domU working, it's time to see if it'll work as a PVH. PVHs are like PVs but use a more efficient memory management technique. They're still under development and considered experimental technology after approximately a decade of development, largely because, well, Kevin's out sick, and everyone else has been hired by RedHat to work on KVM. So use at your own risk. But if you've ever heard someone tell you PVs are old hat, and the hotness is HVMs, and thrown up a little in the back of your mouth because why use Xen then, well PVHs are the things that are more efficient than both, using some of the lessons learned developing HVMs while keeping the efficiencies of having operating systems work with the hypervisor as with PV.

So, edit your /etc/xen/testbox.cfg or /etc/xen/clonebox.cfg or whatever, and add the line “type = 'pvh'” somewhere. Save. Do an xl create on that .cfg with -c and check it comes up (in PV mode I find the kernel writes messages to the console, while in PVH it doesn't, so wait a minute or two before declaring it broken.)

How to get out of the console

Control-]. Both the xl create commands have a -c to attach the console so you can see what's happening, but if you're not used to that, well, CTRL-] is the thing to use to escape from that.

OK, so this one's tough to describe. I had an issue where, because I wanted to have a VM use SSD storage, I wanted to use the btrfs file system. Turns out Xen's PV/PVH system is not btrfs friendly. xen-create-image will happily do it, but pygrub won't boot from a btrfs file system.

There are obvious workarounds. One is to boot the kernel directly, but then you have to pull the kernel and initrd.img files out of the guest's file system. Every time you do an update, you'll have to do this again. If the guest has the same OS as the dom0 then, I guess, you can use that kernel and just make sure you update both at the same time. But it's not really best practices, is it?

Another workaround you can do is create a separate boot partition. This isn't as easy as it sounds, as you'll need to point pygrub at it and somehow convince it the files in /boot are there, because it's looking for /boot/grub/grub.cfg, not [root of device]/grub/grub.cfg. You can do that with a softlink of course (cd /boot ; ln -s . boot) which is a bit hacky, but it should work. But using xen-create-image is going to be a little more complicated if you go down this route. If you're interested in going down this route, I have instructions here.

Finally there's the approach I'm experimenting with, which may also solve other problems you've had. And that's why I'm experimenting with it: it does solve other problems.

What if... the domU's file system was actually native to the dom0 rather than just a block device there?

Well, you can do this with NFS. You can actually tell Linux to boot from an NFS share served from the dom0's NFS server, and have that NFS share be part of the dom0's file system. There are all kinds of reasons why this is Good, Actually™:

  • You can pick your own file system, hence the description above
  • You can have two or more domUs share storage rather than having to shut them down and manually resize their partitions every time they get too big.
  • Backing up becomes much easier as the dom0 has access to everything and is likely the VM any attached back-up media will mount on.
  • Very easy to clone VMs, there's this command called “cp” you can use.
  • A domU crashing will not destroy its own file system (probably)

There are, of course, downsides to this approach:

  • It's probably slightly slower.
  • There's some set up involved.
  • You better make sure your dom0 is locked down because it has all the data open to view in its own file system. Of course, technically it already does have the data available, just not visible in the file system.
  • Most importantly, some stuff just doesn't like NFS.

To clarify on the last point, Debian (at least) can only boot using NFS version 3 or lower. Several modern file system features, notably extended attributes, require version 4.2 (in theory a modified, non-default, initrd.img can be built that supports later NFSes, if I figure it out I'll update the instructions here.) The lack of extended attributes meant I couldn't run LXC on the host and install more recent Ubuntus, though older Ubuntus worked. Despite my best efforts I found certain tools, including a bind/Samba AD combination, just failed in this environment, becoming unresponsive.

The technique I'm going to describe is probably useless to HVM-only users. But you people should probably migrate to KVM anyway, so who cares.

So what do you need to do?

These instructions assume Debian “Bookworm” 12 is your dom0. If you're using something like XenServer or XCP-ng or whatever it calls itself today this won't be very useful, but PVs are deprecated on those platforms anyway, and PVHs didn't work at all from what I remember. They also assume use of a Linux kernel based domU.

Setting up the environment

So on your dom0, do this as root:

# apt-get install nfs-kernel-server rpcbind

Now there's a little set-up involved. Go edit /etc/default/nfs-kernel-server and change RPCMOUNTDOPTS="--manage-gids" to RPCMOUNTDOPTS="--manage-gids=no".

Now also edit /etc/nfs.conf: look for the line manage-gids=y and change it to manage-gids=n. There are at least two manage-gids=xxx lines in there; change any that aren't commented out.
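For reference, the relevant stanza ends up looking something like this (a sketch; your nfs.conf will have plenty of other entries):

[mountd]
manage-gids=n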

Both are necessary because, with manage-gids enabled, NFS starts using the /etc/passwd and /etc/group files of the machine it's running on to figure out group permissions, and this breaks group ownership unless the client and server have identical passwd and group files. (Off topic, but why is this the default? It'll break NFS shares in 99% of cases.)

Now restart the services:

# systemctl restart rpcbind nfs-kernel-server

Next up, create somewhere you can put all these NFS shares. You can change the below, just make sure to change it everywhere else when I give commands that refer to it later:

# mkdir /fs4xen

The final thing we're going to do is create an internal network for the domUs to access the NFS server with. We're keeping this separate from whatever network you've already created for the domUs to talk to the outside world, to avoid the NFS server being reachable from outside.

Edit your /etc/network/interfaces file and add this:

auto fsnet
iface fsnet inet static
        bridge_ports none
        address 172.16.20.1/16

(These are instructions for Debian but I'm aware many Ubuntu users will use Debian instructions as a starting point: If you're using Ubuntu, you'll want to do the Netplan equivalent. You should already have an example bridge in your current /etc/netplan/xxx file, the one used for allowing your domUs to access the outside world. This is more or less the same, except you don't need to bridge to an external port.)
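Something like the following, as a rough netplan sketch; the file name and addressing are examples:

# /etc/netplan/60-fsnet.yaml
network:
  version: 2
  bridges:
    fsnet:
      interfaces: []
      addresses: [172.16.20.1/16]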

That's the initial set-up out of the way. How do we create an image that uses it?

Creating domUs that boot from NFS

First create your PV (we'll upgrade it to a PVH later). Do it however you normally do; xen-create-image will usually do most of the work for you. But for the sake of argument, let's pretend we're left with a configuration file for the domU that looks a bit like this when stripped of comments:

bootloader = 'pygrub'
vcpus       = '4'
memory      = '16384'
root        = '/dev/xvda2 ro'
disk        = [
                  'phy:/dev/vg-example/testbox-root,xvda2,w',
              ]
name        = 'testbox'
vif         = [ 'ip=10.0.2.1,mac=00:01:02:03:04:05' ]
on_poweroff = 'destroy'
on_reboot   = 'restart'
on_crash    = 'restart'

If this configuration is supported by your current version of Xen (for example, you used a regular file system), then you might want to test it works, just to make sure. But shut it down immediately afterwards before continuing.

Verify it isn't running with xl list.

Allocate a new IP address on the internal NFS network; I'm going to use 172.16.20.2 for this example. (I mean, manually allocate it: just pick an IP that isn't in use yet.)

Now, on the dom0, add the domU's root file system to your fstab and exports:

# mkdir /fs4xen/testbox
# echo '/dev/vg-example/testbox-root  /fs4xen/testbox     btrfs   noatime,compress=lzo    0       0' >> /etc/fstab
# mount /fs4xen/testbox
# echo '/fs4xen/testbox     172.16.20.2(rw,no_root_squash,sec=sys)' >> /etc/exports
# exportfs -a

If you get any errors when you enter the mount command, check /etc/fstab has the right parameters for your file system. I've assumed btrfs here. You should know roughly what the right parameters are anyway!

You may also get warnings when you type 'exportfs -a' about something called 'subtree_check', you can ignore those. Anything else you should go back and fix.

Finally, we rewrite the Xen .cfg file. This is what the new file would look like based upon everything else here. Use your favorite editor to make the changes and back up the original in case something goes wrong:

vcpus       = '4'
memory      = '16384'
kernel='/fs4xen/testbox/vmlinuz'
root='/dev/nfs'
extra=' rw elevator=noop nfsroot=172.16.20.1:/fs4xen/testbox,vers=3 ip=172.16.20.2:172.16.20.1::255.255.0.0::enX1:off:::'
ramdisk='/fs4xen/testbox/initrd.img'
name        = 'testbox'
vif         = [ 'ip=10.0.2.1,mac=00:01:02:03:04:05',
                'ip=172.16.20.2,mac=00:01:02:03:04:06,bridge=fsnet' ]
on_poweroff = 'destroy'
on_reboot   = 'restart'
on_crash    = 'restart'

Now just boot it with:

# xl create /etc/xen/testbox.cfg -c

and verify it boots and you can log in. Assuming you can, go in and create a file somewhere you have permissions (not /tmp as that's rarely exported), then switch to another session on your Xen dom0, and verify it exists in the same place under /fs4xen/testbox.

If everything's fine, you can exit the console using Control-] in the usual way.

Turn a PV into a PVH

PVs are the default domU type created by xen-create-image, not least because they're supported on even the worst hardware. But they're considered obsolete, largely because AMD64 introduced better ways to sandbox virtualized operating systems. The new hotness is PVH which uses the newer processor features, but takes advantage of the aspects of PVs that made Xen wonderful and efficient to begin with.

Here's how to turn a PV into a PVH:

  1. Edit the .cfg file in your favorite editor
  2. Add the line “type = 'pvh'” somewhere in the file. Remove any other “type=” lines.
  3. Save
  4. Start the domU, using 'xl create /etc/xen/testbox.cfg -c', and verify it boots to a login prompt.

If it doesn't boot, it may be a hardware issue, or the guest operating system itself might have an issue with PVHs. The most common “hardware” issue is that either your CPU doesn't support virtualization, or it does but it's been disabled in the BIOS. So check your BIOS for virtualization settings.
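A couple of quick checks you can run from the dom0 before blaming the BIOS; these are standard Xen commands, though the exact output varies by system:

# xl info | grep virt_caps
# xl dmesg | grep -i -E 'vmx|svm|hvm'

The virt_caps line should mention hvm if the hypervisor can see VT-x/AMD-V; if it doesn't, the Xen boot log from the second command may give a hint as to why.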

Yes, another showerthought inspired TL;DR blog. But “DR” is in the name of the blog so it's not exactly as if I'm bothering you with this.

Anyway. Microkernels. Good idea? Or bad idea? Torvalds was right... or wrong?

Well, both. Let's have a talk about what they are first.

In the 1980s, Microkernels were considered THE FUTURE. Virtually every respected academic pointed out that shoving lots of complicated code into one computer program that your entire computer relied upon was insanely stupid. They argued that it should be relatively easy to split out your kernel into “servers”, single purpose programs that managed specific parts of the system. For example, you might have a Disk Operating System that applications use to access files. That Disk Operating System might talk to Handler processes that manage the file systems on each device. And those in turn might talk to the Device Drivers. Each part of what in modern terms would be a kernel talking to programs that can, at worst, crash without taking down the rest of the system. The servers – DOS, the File System Handlers, the Device Drivers, would all be processes running under a very simple kernel (the microkernel! Geddit?!) that would schedule processes, manage memory, and provide ways for the servers to communicate with one another.

(Microkernels should not be confused with hypervisors, which are thin kernels intended to run multiple operating systems. Though much of the early hype about microkernels overlapped, with advocates pointing out that in theory you could create “personalities”. For example, in addition to having the Disk Operating System above, you could have a completely different API server, with that server providing a view of the world that looked like Unix. And that server could talk to its own set of handlers or the same set.)

Academics generally agreed that microkernels were the only acceptable design for modern operating systems. Against this were more traditional operating systems, Unix being an obvious example, which had a single monolithic kernel with all the file systems and device drivers compiled into it. (Systems like CP/M and MS DOS weren't really advanced enough to have a discussion about.)

Microkernels enter the real world

Academia brought us MINIX and Mach during the 1980s. Mach was the basis of several commercial projects such as MkLinux and, more successfully, XNU (the kernel of NEXTSTEP and Mac OS X) but those commercial projects weren't microkernels, they were always hybrid kernels – kernels where most of the servers were integrated into a single space where they could freely communicate with one another at the cost of security.

The commercial world in turn tried to implement the concept, but inevitably failed. Many readers will have read my description of how microkernels work above, with its mentions of “DOS” and “handlers” and “device drivers”, and immediately thought of AmigaOS, which was structured like a microkernel-based system but wasn't one. At first sight it's easy to see why it wasn't: the Amiga had no memory management hardware, so it literally wasn't possible to sandbox the different components. But in reality the problems were deeper than that. AmigaOS demonstrated that you could get good performance out of a microkernel operating system if the different components could quickly and easily communicate with one another. In AmigaOS, a device driver could talk to a handler just by sending it, via the kernel, the address of, say, where it had just loaded some data from disk. Suddenly that handler had 512 bytes of data available to do with whatever it needed to do.

But that's not compatible with how memory management is done in modern CPUs. Modern CPUs are about sandboxing processes: sending 512 bytes from one process to another means rather more than simply sending a four byte address; it involves either reconfiguring the memory map of both processes to see the same 512 byte block of RAM, or asking the kernel to copy that data byte by byte. These are expensive operations. AmigaOS only worked because there was no memory management as we know it, just a giant shared block of memory everything used. And because memory was shared, a crash by one device driver could, actually, take the entire system down, rather than just affect access to the device involved.

This expense in the end crippled a series of other commercial projects that almost certainly looked like a good idea at the time: elegant, modular, exactly the type of thing every programmer starts to develop only to realize will never work once they start coding. A big question for me in the 1980s was why Acorn lumbered the amazing ARM-based computers they created with a crappy, third-rate operating system descended from Acorn's 8-bit BBC OS, “MOS”. The answer is... they did try to create a modern microkernel-based OS for it, called ARX, and immediately got stuck. Despite running on one of the world's fastest microcomputer environments, the system had performance issues that the creators couldn't get around. The moment the elegant design hit reality, it failed, and Arthur/MOS's “good enough” environment was expanded into RISC OS, which used cooperative multitasking and other kludges to make something useful out of a woefully underpowered base.

On the other side of the Atlantic, other companies enthusiastically writing next generation operating systems also had the same issues. Apple started, then co-funded, then walked away from, Taligent. DEC was keeping Dave Cutler busy with MICA, which didn't go anywhere. Finally Microsoft, which was working on a more traditional system with IBM (OS/2), for various reasons hired Dave Cutler away from DEC after MICA's cancellation to develop Windows NT.

Windows NT came the nearest of any commercial microkernel-based operating system to achieving some level of success. In practice though, Microsoft didn't feel comfortable making Windows NT its primary operating system, despite high levels of compatibility from NT 4 onwards, until the early 2000s, at which point the system was no longer a classic microkernel system, with many essential services, including the graphics drivers (!), integrated into the main kernel.

So why the failures?

At first sight, it's easy to blame the failure of microkernels on performance issues. But that's not actually what happened. There are two bigger issues. The first was that most commercial microkernel projects were part of a much bigger attempt to build an elegant, well designed operating system from scratch; the microkernel was only one component.

But the second was modern memory management. At some point in the 1980s, the major makers of microcomputer CPUs started to release advanced, secure, memory management systems for their existing CPUs. Motorola and Intel both grafted virtual memory systems onto their existing systems by allowing operating systems to rearrange the addressable memory as needed. This was all that was needed for Unix to work, and Unix was considered the most advanced operating system a personal computer user would want to run.

And yes, Unix somehow managed to be both a very big deal and an irrelevance in the personal computing world. Microsoft, acknowledging MS DOS 1.0 was little more than a CP/M like program loader, saw DOS's future as converging with Xenix, its Unix fork. The press described anything with multitasking, from AmigaOS to OS-9, as “Unix-like”, no matter how unlike Unix it was, because Unix was seen as The Future.

So from the point of view of the big CPU makers, a simple memory remapping system was “good enough” for the most advanced operating systems envisaged as running on their chips. There was another factor behind both Intel and Motorola designing MMUs this way: Motorola had designed a very successful 32-bit ISA for its CPUs that programmers adored. Intel's segmented approach had proven to be a failure, propped up only by IBM's decision to include the 8088 in its PC. Intel was focusing on making a pure 32 bit ISA for its next generation of processors, while Motorola saw no need to change its ISA, and saw MMUs as something that could be bolted on to the architecture of a 68000-based system. By the time it became important, neither saw any value in taking a risk and introducing architectures that would integrate memory management with their ISAs.

Why is this important? Well, go back to the AmigaOS description earlier. In the Amiga, the pseudo-microkernel was fast because servers only needed to send each other addresses to transmit large amounts of data between them. On the 68000 ISA there is no way to graft security onto this system – you can't validate a pointer or the memory it points to. But in the mid-1960s and early 1970s, hardware memory management systems were devised that allowed exactly this kind of thing. The system is called Capability Addressing. Capabilities are pointers to blocks of memory, typically with permissions associated with them (like a file.) Creating new capabilities is a privileged operation: you can't just use some pointer arithmetic to create one. Storing a capability in memory requires the CPU to have some way to flag that value as being a capability, typically an extra bit for every word of memory. This way programs can load and store capabilities in memory without risking reading normal data as a pointer or vice versa.

A capability architecture would be perfect for an operating system like AmigaOS. It would, with relatively small modifications, be secure. The different servers would be able to communicate by passing capabilities instead of pointers. If one crashes, it wouldn't be able to write to memory not allocated to it because it wouldn't have any capabilities in its memory space that point at that data.

The problem, of course, is that no popular CPUs support capabilities, and most of those that did were considered failures. Intel tried to produce such a system in the early 1980s, the iAPX 432, which was not part of their 80x86 family. It was chronically slow. And the 1980s were not the time to produce such a chip: the extra bit required for each 32 bit (at minimum) pointer would have been considered cost prohibitive at a time when computers came with hundreds of kilobytes of RAM.

It would be remiss of me not to mention there was also another theoretical possibility: managed code. In managed code, programs are compiled to an intermediate language, which can be proven “secure” – that is, unable to access resources it hasn't been given direct access to. The two most famous examples are the Java Virtual Machine and .NET. Both systems have problems however: their garbage collectors require the memory of the machines they're running on be locked for indeterminate amounts of time while they account for what's in use (a process called “marking”), though it's worth mentioning that Rust's ownership-based alternative to garbage collection suggests a VM could be built with better real time behavior. Another problem was that during the 1980s C became the standard applications development language, with personal computers not being taken seriously unless they were capable of running it: but the high level approach of a VM is at serious odds with C's low level memory management, making it impossible to create an efficient C compiler for such an environment.

So, TL;DR, it wasn't that microkernels were wrong, it's that the technology choices of the 1980s and 1990s, the time when it was most important, made microkernels inefficient and difficult to implement. By the time memory prices had fallen to a point that a CPU architecture optimized for microkernels would have been viable, the world had standardized on operating systems and system architectures that weren't compatible with the concept.

The failures of the 1980s were mostly because developers were being overly ambitious and didn't have the right architectures to work with in the first place.

All of which is a shame. I'd love my primary OS to be like AmigaOS is/was, but with security.

I have no raw figures; I have tried to get them from the Internet. All I have are memories, and the biggest is of a colleague distributing Ubuntu CDs at work during the mid-2000s. Ubuntu was the next big thing, it was said. And when I tried it, I had to admit, I couldn't blame anyone for saying so.

Ubuntu in the 2000s was a first class OS. It had the following massive features:

  • It ran on pretty much anything powerful enough. The installer was first rate, booted in the most trying conditions, and installed an image with your wireless, sound and accelerated video all ready to use. Well, OK, ATI cards needed that fglrx thing to get acceleration, and I can't remember exactly how you installed it, but I do know it wasn't hard.
  • It ran GNOME 2. For those who are wondering why that was a good thing, GNOME 2 was basically an intuitive user interface that was somewhat more obvious and consistent than Windows, and maybe a small step back from Mac OS X. It was customizable but...
  • ...it had sane defaults everywhere. The default Ubuntu desktop at that time was easy to understand.

Did you have to drop into the command line to do anything? That depended. You sometimes did in the same way you sometimes have to with Windows or Mac OS X. You have an obscure set of technical conditions, and you need to debug something or configure something equally obscure, and just like Mac OS X and Windows you'd have to use the “nerd” user interface. But an “average” user who just wanted a web browser and office suite would not ever need to do that.

So it wasn't surprising that, anecdotally (like I said, it seems to be tough getting any concrete figures; Statcounter claims a 0.65% market share for “Linux” in 2009, but I don't trust them as far as I can throw them, and more importantly they have no pre-2009 figures online, making it hard to show growth during that period. It's also contradicted by other information I'm finding on the web), Ubuntu started to drive installs of GNU/Linux. People really seemed to like it. I even heard of major figures in the Mac world switching at the time. Ubuntu was the OS everyone wanted; it “just worked”.

So what happened in the 2010s to halt this progress? Everything changed? Yes.

And by everything, I mean Ubuntu.

Ubuntu decided to change its user interface from GNOME 2 to Unity. In part this was driven by the GNOME team themselves who, for whatever reason, decided the GNOME 2 user interface was obsolete and they should do something different.

I'm not necessarily opposed to this thinking, except the “obsolete” part, but what neither party (Canonical, authors of Ubuntu and the Unity user interface, and the GNOME team) did was go about this with an understanding of the impact on existing users. Namely:

  • The user interfaces they proposed were in most cases radically different from GNOME 2. So existing users wanting to upgrade would find they would literally have to learn how to use their computers again.
  • The user interfaces proposed only partially used the paradigms that everyone had gotten used to and trained on during the 1990s. GNOME 3 in particular switched to a search model for almost everything. Unity was a little more standard, but launching infrequently used applications in both environments was confusing. These user interfaces were only slightly closer to what had become standard in the 1990s than the new mobile touchscreen UIs that doubtless had influenced their authors.

To understand how massive a problem this was, look at Apple and Microsoft's experience with user interface refreshes.

Apple does it right

Let's start with Apple, because Apple didn't fail:

In the 1990s and early 2000s, Apple switched from their 1980s MacOS operating system to the NEXTSTEP derived Mac OS X. NEXTSTEP and MacOS were nothing alike from a user interface point of view, making shipping NEXTSTEP with new Macs a non-starter. So Apple took pains to rewrite the entire NEXTSTEP user interface system to make it look and feel as close as possible to contemporary MacOS.

The result was Rhapsody. Rhapsody had some “feel” issues in the sense of buttons not quite responding the same way they did in MacOS, some things were in a different place, and running old MacOS applications felt clumsy, but a MacOS user could easily switch to Rhapsody and while they would be aware they were running a new operating system, they knew how to use it out of the box.

Rhapsody was well received by those who got it (it was released in beta form to developers, and sold for a while as Mac OS X Server 1.0), but from Apple's point of view, they still had time to do better. So they gave the operating system's theme an overhaul, creating Aqua. But the overhaul was far more conservative than people give Apple credit for:

  • If something was recognizably a button in Rhapsody/MacOS, it was recognizably a button in Aqua.
  • If something was a window in Rhapsody/MacOS, it was recognizably a window in Aqua.
  • If you did something by dragging it or clicking it or poking your tongue out at it in Rhapsody/MacOS, you'd do the same thing in Aqua.
  • If it was in the top left corner in Rhapsody/MacOS, it was in the top left corner in Aqua. Positions generally stayed the same.

...and so on. The only major new user interface element they added was a dock. Which could even be hidden if the user didn't like it.

So the result, when Apple finally rolled this out, was an entirely new operating system with a modern user interface that looked fantastic and was completely usable by people used to the old one.

Microsoft “pulls an Ubuntu/GNOME” but understands how to recover

In some ways saying Apple did it right and Microsoft didn't is unfair, because Microsoft has done operating system upgrades correctly more times than you might imagine. And they even once managed a complete GNOME-style UI overhaul that actually succeeded: replacing Windows 3.x's UI with Windows 95's UI. They got away with it that time for a variety of reasons:

  • Windows 3.x was really hard to use. Nobody liked it.
  • The new Windows 95 user interface was a composite UI based upon Mac OS, Amiga OS, GEM, Windows 1.x, OS/2, and so on. It was instantly familiar to most people who had used graphical mouse-driven user interfaces before.
  • In 1995, there were still people using DOS. Windows 3.x was gaining acceptance but wasn't universally used.

Since then, from 1995 to 2012, Microsoft managed to avoid making any serious mistakes with the user interface. They migrated NT to the 95 UI with Windows NT 4. They gave it an (in my view ugly) refresh with Windows XP, which was a purely visual clean-up similar to, though not as radical as, the Rhapsody to Aqua user interface changes I noted above. But like Rhapsody to Aqua, no serious changes in the user interface paradigm were made.

They did the same thing with Vista/7, creating a clean, composited UI that was really quite beautiful, yet, again, kept the same essential paradigms, so a Windows 95 user could easily switch to Windows 7 without having to relearn anything.

Then Microsoft screwed up. Convinced, as many in the industry were at the time, the future was touch user interfaces and tablets, they released Windows 8, which completely revamped the user interface and changed how the user interacted with the computer. They moved elements around, they made things full screen, they made things invisible.

Despite actually being very nice on a tablet, and despite PC manufacturers pushing 2 in 1 devices hard on the back of Windows 8's excellent touch screen support, users revolted and refused to have anything to do with it.

Windows 8 generated substantial panic at Microsoft, resulting in virtually all the user interface changes being taken out of Windows 10, its major successor. Windows 10 itself was rushed out, with early versions being buggy and unresponsive. But compared to Windows 7, the user interface changes were far less radical. It retained the Windows 7 task bar, the start menu, and buttons were where you'd expect them. A revised preferences system was introduced that... would have been controversial if it wasn't for the fact that earlier versions of Windows had a fragmented collection of half-written preferences systems anyway. A notifications bar was introduced, but it wasn't particularly radical.

But windows, buttons, etc, all operated the same way they did in Windows 7 and its predecessors.

What is NOT the reason Ubuntu ceased to be the solution in the 2010s

Amazingly, I've heard the argument Ubuntu failed because the underlying operating system is “too nerdy”. It isn't. It's no more nerdy than Mac OS X, which was based on a similar operating system.

Mac OS X is based on a kernel called XNU, which in turn is based on a heavily modified version of a kernel called Mach, plus a userland that's a combination of – let's call it user interface code – and BSD. There are some other small differences like the system to manage daemons (in old school BSD this would have been the traditional BSD init), but nothing that you'd immediately notice as an end user.

All versions of GNU/Linux, including Ubuntu, are based on a kernel called Linux, and a userland that's a combination of the GNU project and some other projects like X11 (the core windowing system) and GNU projects like GNOME (which provides the rest of the UI.) There are multiple distribution specific changes to things like, well, the system to manage daemons.

So both are a kernel (XNU or Linux), a Unix-like userland (BSD or GNU), and then some other stuff that was bolted on.

XNU and Linux are OS kernels designed as direct replacements for the Unix kernel. They're open source, and they exist for slightly different reasons, XNU's Mach underpinnings being an academic research project, and Linux being Linus Torvalds' effort to get a MINIX-style kernel and the GNU userland working on his 386 computer.

BSD and GNU are similar projects that ultimately did the same things as each other but for very different reasons. They're both rewrites of Unix's userland, that started as enhancements, and ultimately became replacements. In BSD's case it's just a project to enhance Unix that grew into a replacement because of frustration at AT&T's inability to get Unix out to a wider audience. In GNU's case, it was always the plan to have it replace Unix, but it started as an enhancement because it's easier to build a replacement if you don't have to do the whole thing at once.

So... that's all nerd stuff right? Sure. But dig into both OSes and you'll find they're pretty much built the same way. A nice friendly user interface bolted onto Unix-like underpinnings that'll never be friendly to non-nerds. So saying Ubuntu failed because it's too nerdy is silly. Mac OS X would have failed for the same reason if that were true. The different origins of the two do not change the fact that they're similar implementations of the same underlying concept.

So what did Ubuntu do wrong and what should it have done?

The entire computer industry at this point seems to be obsessed with changing things for the sake of it, to make it appear they're making progress. In reality, changes should be small, and cosmetic changes are better than major paradigm changes, both for users and for (for want of a better term) marketing reasons. The latter are bad for users, and don't necessarily help “marketing” as much as marketing people think they do.

Ubuntu failed to make GNU/Linux take off because it clumsily changed its entire user interface in the early 2010s for no good reason. This might have been justifiable if:

  • The changes were cosmetic as they were for the user interfaces in Windows 95 vs XP vs Vista/7 vs 10/11, and Rhapsody vs Aqua. They weren't.
  • The older user interface it was replacing was considered user unfriendly (like the replacement of Windows 3.1's with 95.) It was, in fact, very popular and easy to use.
  • The older user interface prevented progress in some way. If this is the reason, the apparent progress GNOME 3+ and Unity enabled has yet to be identified.
  • The older user interface was harder for users migrating from other systems to get used to than its replacements. This is laughably untrue.

Radically changing a user interface is a bad idea. It makes existing users leave unless forced to stay. And unless it's sufficiently closer to the other user interfaces people are using, it won't attract new users. It was a colossal misstep on GNOME and Canonical's part.

GNOME 3/Unity should, to put it bluntly, have had the same fundamental paradigm as GNOME 2. Maybe with an optional dock, but not the dock-and-search focused system they put in instead.

Where both teams should have put their focus is simple modernization of the look, with larger changes confined to less frequently used parts of the system, or to the internals needed to attract developers. I'm not particularly pro-Flatpak (and Snap can die a thousand deaths), but making it easier to install third party applications (applications not in repositories) would also have addressed one of the few holes in Ubuntu where other operating systems did better. There's a range of ways of doing this that don't involve sandboxing things and forcing developers to ship and maintain all the dependencies of their applications, such as:

  • Identifying a core subset of packages that will only ever be replaced by backward compatible versions in the foreseeable future and will always be installed by default, then encouraging static linking for libraries outside of that set, even making static linking the default. (glibc and the GTK libraries are obvious examples of the former – libraries that should be fully supported going forward with complete backward compatibility – while more obscure libraries and those that have alternatives, image file parsers being an example, should be statically linked by default.)
  • Supporting signed .DEBs
  • Making it easy to add a third party repository while sandboxing it (to ensure only relevant packages are ever loaded from it) and authenticating the identity of the maintainer at the time it's added. (Canonical's PPA system is a step in the right direction, but it does force the repos to be hosted by Canonical.)
  • Submitting kernel patches that allow for more userland device drivers (giving them a stable ABI)

Wait! This is all “nerd stuff”. But non-nerds don't need to know it: from their perspective, they just need to know that if they download an application from a website, it'll “just work”, and continue to work when they install GreatDistro 2048.1 in 24 years.

What is NOT the solution?

The solution is not an entirely different operating system, because any operating system that gets the same level of support as GNU/Linux will find itself making the same mistakes. To take an example off the top of my head – no particular reason to select this one except that it's a well regarded open source OS that's not GNU/Linux – ooh, how about Haiku, the OS inspired by BeOS?

Imagine Haiku becoming popular. Imagine who will be in charge of it. Will these people be any different to those responsible for GNOME and Canonical's mistakes?

No.

Had Haiku been the basis of Ubuntu in the 2000s, it's equally possible that Haiku would have suffered an unnecessary user interface replacement “inspired” by the sudden touch screen device craze. Why wouldn't it? It happened to GNOME and Ubuntu. It happened to Windows for crying out loud. Haiku didn't go there not because it's inherently superior but because it was driven by BeOS loving purists in the time period in question. If Haiku became popular, it wouldn't be driven by BeOS loving purists any more.

Frankly, I don't want Haiku to become popular for that reason – it'd ruin it. I'd love, however, for using fringe platforms to be more practical...

Been using this today:

https://cambridgez88.jira.com/wiki/spaces/OZVM/overview

The Z88 was the last computer released by Sir Clive Sinclair (under the Cambridge Computer name, as Amstrad by then had bought the rights to the Sinclair name.) The Z88 was an A4-paper (that's “Like Letter-size but sane” to 'murricans) sized slab-style laptop computer. By slab-style I mean the screen and keyboard were built into a single rectangular slab; it didn't fold like a modern laptop. It was Z80 based, had solid state memory, and a 640x64 monochrome (supertwist LCD) display which looked gorgeous. There was 32k of battery backed RAM, but my understanding is functionality was very limited unless you put in a RAM expansion – limited base memory was, the Spectrum aside, something of a Sinclair trademark. In classic Sinclair style it had a rubber “dead flesh” keyboard, though there was a justification given – that the keyboard was “quiet” – and that was probably legitimately a selling point.

Sir Clive had a dream dating back to the early 1980s that everyone should have a portable computer that was their “main” computer. The idea took shape during the development of the ZX81, and was originally the intended use of the technologies that went into the QL. Some of the weirder specifications of the QL, such as its 512x256 screen being much wider than the viewable area of most TVs, came from Sinclair's original intention to use a custom CRT with a Fresnel lens set up as the main display for the machine. Early on it was found that the battery life of the portable computer designed around the ZX83 chips was measured in minutes, and the idea was discarded. (I believe, from Tebby's account, that the ZX83 chips remained unchanged because they started to have difficulty getting new ULA designs tested.)

So... after selling up to Amstrad, Sinclair tried one last time and made a Z80-based machine. He discarded both Microdrives (which weren't energy efficient, and I suspect belonged to Amstrad at this point) and his cherished flat screen CRT technologies (which were widely criticized) and finally adopted LCDs. And at that point it looks like everything came together. There were still issues – the machine needed energy efficient static RAM which did (and does) cost a small fortune, so the Z88 had limited storage in its base form. Flash was not a thing in 1988, EEPROMs were expensive and limited, but more conventional EPROMs (which used UV light to reset them) were affordable storage options.

So, with a combination wordprocessor/spreadsheet (PipeDream), BASIC, a calendar/clock, and file management tools, the computer was finally genuinely useful.

I never got a Z88 as I was still a teenager at the time and the cost was still out of my league. When I got my QL it was 80GBP (on clearance at Dixons) which I just had enough savings for. Added a 25GBP monitor a few months later. But that gives you some idea of the budget I was on during the height of the original computer boom.

Anywho, IIRC the Z88 ended up being around 200GBP and the media was even more expensive, which would have been a hell of a gamble for me at the time given that, despite Sir Clive's intentions, it was far from a desktop replacement. It had limited programmability – it came with BBC BASIC (not SuperBASIC, as Amstrad now had the rights to that) but otherwise development was expensive. And a 32K Z80 based computer in 1988 was fairly limited.

But I really would have gotten one had I had the money. I really loved the concept.

The emulator above comes as a Java package that requires an older version of Java to run. It wouldn't start under OpenJDK 17 (as shipped with Debian 12), but I was able to download JDK 6 from Oracle's archive (https://www.oracle.com/java/technologies/javase-java-archive-javase6-downloads.html), which ran fine from the directory I installed it into without having to mess with environment variables.

Anyway, a little glimpse into what portable computing looked like in the 1980s, pre-smartphones and clamshell laptops.

See also:

There's also the ill-fated Commodore LCD, a 6502 KERNAL based system designed by Bil Herd. It wasn't a slab, having a fold out screen, but was similar in concept. It was killed by an idiotic Commodore manager who asked Radio Shack whether Commodore should enter the market with a cheap laptop, and who believed the Radio Shack executive he spoke to when said exec told him there wasn't a market. Radio Shack was, of course, selling the TRS-80 Model 100 at the time, and making money hand over fist.

Final comment: these types of slab computer weren't the only “portable computers” in the 1980s. Excluding luggables (which weren't true portables in the sense they couldn't run without a mains power connection), and a few early attempts at clamshell laptops, there were also “pocket computers”. Made mostly by Casio and Sharp, these were miracles of miniaturization, usually with only a few kilobytes of memory at most and a one or two line alphanumeric LCD display. I had a Casio PB-80 which had about 500 bytes of usable memory. (IIRC they called bytes “steps”, reflecting the fact these things were designed by their manufacturer's programmable calculator divisions) They did have full versions of BASIC, and arguably their modern successors are graphing calculators. These devices were nice, but their lack of any communications system or any way to load/save to external media made them limited for anything beyond really simple games and stock calculator functions.

So again, a set of random thoughts. But it culminates with wondering whether the official story behind at least one of the major UI changes of the 21st Century isn't... bollocks.

History of the GUI

So let's go back to 1983. Apple releases a computer called the Lisa. It's slow, expensive, and has a few hardware flaws, notably the stretched (non-square) pixels of the screen, which seemed fine during design but became an obvious problem later on. But to many, it's the first glimpse of the modern GUI. Drop down menus. Click and double click. Icons. Files represented by icons. Windows representing documents. Lisa Office was, by all accounts (I've never used it), the pioneer that set the stage for everything that came afterwards. Apple is often accused of stealing from Xerox, and certainly the base concepts came from Doug Engelbart's pioneering work and Xerox's subsequent development of office workstations, but the Lisa neatly fixed most of the issues, and packaged everything in a friendly box.

The Mac came out a year later, and while the Mac is often described as a low cost version of the Lisa, that's not really fair to the Mac team. They were, for the most part, developing their system at the same time as the Lisa team, and the two swapped ideas with one another. The Mac contained hardware changes such as 1:1 pixels that never made it into the Lisa, cut a sizable number of things down so they'd work on lower capacity hardware, and introduced an application-centric user interface compared to the Lisa's more document-centric approach.

Meanwhile Microsoft and Digital Research tried their hands at the same thing, Microsoft influenced primarily by the Lisa, and DR by the Mac, with the latter's GEM system coming out almost exactly a year after the Mac, and Microsoft's Windows, after a lot of negotiations and unusual development choices, coming out nearly a year after that.

The cat was out of the bag, and virtually every 16/32 bit computer after the Macintosh came with a Mac/Lisa inspired GUI from 1985 onwards. There are too many to name, and I'd offend fans of {$random 1980s 16/32 bit platform} by omitting their favorite trying to list them all, but there were a lot of choices, a lot of great and not so great decisions made, some were close to the Mac, others were decidedly different, though everyone adopted the core window, icon, pointer, mouse, scrollbars, drop downs, etc, concepts, from NeXT to Commodore.

By the early 1990s, most mainstream GUIs, Windows and NEXTSTEP excepted, were very similar, and in 1995, Microsoft's Windows 95 brought Windows to within spitting distance of that common user interface. The start menu, the task bar, and the decision to put menus at the top of every window instead of the top of the screen distinguished Windows from the others, but it was close enough that someone who knew how to use an Amiga, ST, or Mac could use a Windows PC, and vice versa, without effort.

Standardization

But what made these UIs acting in a similar way useful wasn't cross platform standardization, it was cross application standardization. Even before Windows 95, there was an apex to the learning curve that everyone could reach. If you knew how to use Microsoft Excel, and you knew what a word processor was, you could use Microsoft Word. You could also use WordPerfect. You could also use Lotus 1-2-3, at least the Windows version when it finally came out.

This was because, despite differences in features, you operated each in the same way. The core applications built a UI from similar components. Each document had its own window. The menu was ordered in approximately the same way. The File menu allowed you to load and save, the Edit menu allowed block copying, and if there was a Format menu, it let you change roman to italics, and so on. Tools? You could reasonably guess what was there.

Underneath the menu or, in the Mac's case, usually as moveable “palettes”, were toolbars, which were frequently customizable. The blank sheet of paper on one let you create a new document, the picture of the floppy disk saved it. The toolbar with the bold B button, underlined U button, and a drop down with a list of fonts allowed you to quickly adjust formatting. So you didn't have to go into the menus for the most common options.

The fact was all programs worked that way. It's hard to believe in 2024, because most developers have lost sight of why that was even useful. To an average dev in 2024, doing the same thing as another program is copying it. But to a dev in 1997, it meant you could release a complex program, doing complex and difficult to understand things, that people already knew how to use.

Microsoft breaks everything

You may have noticed that's just not true any more. Technically both, say, Chrome and Firefox still have the regular drop down menus, but they've gone to great lengths to hide them, and encourage people to use an application specific “hamburger menu” instead. And neither has a toolbar. The logic is something like “save screen space and also make it work like the mobile version”, but nobody expects the latter, and “saving screen space” is... well, an argument for another time.

(Side note: I've been arguing for a long time among fellow nerds that the Mac's “top of screen” rather than “top of window” approach to menus is the superior option (I'm an old Amiga user), and explained Fitts's Law to them, and how putting a menu at the top of the screen makes it easy to quickly select menu options, while trying to do the same when the menu is at the top of a window is fiddly, and usually the response comes back “Oh so you're saying it saves screen space? Pffft who needs to, we all have 20” monitors now”, and I shake my head in annoyance at the fact nobody reads anything any more, not even things they think they're qualified to reply to. Dumbasses. No wonder GUIs have gone to hell. Anywho...)
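
(For the record, the screen edge argument has nothing to do with screen space. Fitts's Law, in its usual formulation, estimates the time to hit a target as

T = a + b log2(1 + D/W)

where D is the distance to the target and W is the target's depth along the direction of travel. A menu bar at the top of the screen has, in effect, unbounded W, because the pointer stops at the edge no matter how hard you throw it; a menu bar at the top of a window is a strip a couple of dozen pixels deep that has to be approached carefully. Same distance, wildly different acquisition time. That's the whole argument.)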

Anyway, while it's kind of relevant that nerds don't appear to understand why UIs are designed the way they are, and aren't interested in finding out, that's not the point I was making. The point was that if “we all have 20” monitors now so have plenty of space” is some kind of excuse for wasting desktop space, then refusing to have a menu in the first place can't be justified on that basis either.

But Google and Mozilla are just following a trend. The trend wasn't set by either (though they're intent on making things worse), and wasn't even set by the iDevices when Apple announced them (though those have given devs excuses to make custom UIs for their apps.) It was set by Microsoft, in 2007, with the introduction of The Ribbon.

The Ribbon is an Office 2007 feature where menus and toolbars have been thrown out and replaced by a hard coded, giant, toolbarish thing. Functions are very, very roughly categorized, and then you have to scan the entire thing to find the one you want, on the ribbon tab you think it might appear on, because they've been placed in no particular order.

It is, hands down, the single worst new UX element ever introduced in the post-1980s history of GUIs. Not only do you now need to learn how to use an application that uses it, because your knowledge of how other similar applications work no longer applies, but you can spend your whole life not realizing basic functionality exists because it's hidden behind a component in the ribbon that's not immediately relevant.

And learning how to use Excel, and knowing how a word processor works (maybe you used Office 2003?) brings you no closer to knowing how to use Microsoft Word if you use a ribbon version.

Microsoft was roundly criticized for this UI innovation, and a good thing too, but Microsoft decided, rather than responding to criticism, to dig in their heels and wait for everything to blow over. They published usability studies that claimed it was more productive, but it's unclear how that could possibly be true. The claim was also made that it was based upon earlier usability studies: users, it was claimed, used the toolbar for almost everything and hardly ever touched the menus!

Well, no sugar Sherlock. Most toolbars are set up by default to have the most frequently used features on them. And for many menu options, users remember the keyboard shortcuts, so they use those. So of course people will rarely dig into the menus. The menus are there to make it easy to find every feature, not just the most frequently used ones, so it stands to reason they'd be rarely used if they're only being opened to find infrequently used functionality!

My personal theory though is that this wasn't a marketing department making a bad choice and wanting to stand by it to save face. This was a deliberate decision by Microsoft to push through a UI change that would intentionally make even Office harder to use. After all, where would the harm have been in supporting both user interfaces? Chrome and Firefox do it, and there was nothing in the Ribbon that couldn't have been triggered by a menu.

Anti-Trust and the importance of Office

The work that led to the Ribbon almost certainly started shortly after Microsoft's anti-trust problems concluded, and during a phase where they were under even more anti-trust scrutiny. Until the 2001 Bush administration, Microsoft had been butting heads with the DoJ, culminating in Judge Jackson's finding-of-fact that Microsoft had illegally used its market position to force out competitors.

While Microsoft's issues with Digital Research/Caldera (DR DOS) and IBM (OS/2) were highlights of the FoF, the issues that had sparked intervention were related to their attempts to dominate web browsers and Microsoft's decision to integrate a web browser into their operating system. Microsoft had made that decision in order to own the web, tying what should have been an open standard into the Windows APIs. By 1999, Internet Explorer had an even more extreme hold on Internet usage than Chrome does today, with many websites banning other browsers, and many more simply broken in anything that wasn't IE. These weren't obscure websites nobody needed to use either; I personally recall being blocked from various banking and governmental websites at the time.

In 2000, the courts ruled in favor of a break up of Microsoft into an applications company and an operating system company. In 2001, this was overturned, but a sizable reason for the appeals court doing so was the judge's conduct rather than the merits of the case. Bush's DoJ stopped pushing for a break-up in late 2001, largely in line with Bush's opposition to anti-trust actions, and Microsoft was given more or less free rein, outside of an agreement to be more open with their APIs.

From Microsoft's point of view, “winning” the anti-trust case must have been bittersweet because of the way it was won. The findings of fact were upheld throughout the legal proceedings, and Microsoft only avoided break-up because they successfully wound up the judge enough for him to behave unprofessionally, and because they waited out the clock and were able to get the end of the legal case overseen by a more sympathetic government. There were no guarantees the same thing would happen next time.

It's not clear exactly when Microsoft started to judge Office as being more important than Windows to their future, but certainly in the 2000s we saw the beginning of changes of attitude that made it clear Microsoft was trying to find a way forward that was less reliant on Windows. Office was the most obvious second choice – most corporate PCs run Office, as do a sizable number of non-corporate PCs. Even Macs run Office. Office had a good reputation, it was (and is) extremely powerful. And because of its dominance of the wordprocessing and spreadsheets market, the files it generated were themselves a form of lock-in. If you wanted to interact with another Word user, you needed to use Word. There were third party wordprocessors that tried to support Word's file format, but it turned out supporting the format was only half the problem: if your word processor didn't have the exact same document model that Word did, then it would never be able to successfully import a Word document or export one that would look the same in Word as it would in your third party wordprocessor.

But until 2006, Office's dominance due to file incompatibility wasn't certain. In 2000, Microsoft had switched to a more open file format, and in 2006, under pressure from the EU, had published the complete specification. Critics at the time complained it was too complicated (the entire format is 6,000 pages), but bear in mind this includes the formats for all applications under the Office umbrella.

Two decades later, compatibility from third party applications remains elusive, most likely because of internal model conflicts. But it wasn't clear in the early 2000s that publishing the Office file formats wouldn't be enough to allow rivals to interoperate within the Office ecosystem.

The importance of UI balkanization

So, faced with the belief that third parties were about to create Office clones that would cut a significant chunk of Microsoft's revenue, and knowing that they couldn't use the operating system any more to simply force people to use whatever applications Microsoft wanted users to buy, Microsoft took an extreme and different approach – destroying the one other aspect of interoperability that is required for users to move from one application to another: familiarity.

As I said above, in the late 1990s, if you knew Word, you knew how to use any wordprocessor. If you knew Excel, and you knew about wordprocessing, you could use Word. The standardization of menus and toolbars had seen to that.

To kill the ability of customers to move from a Microsoft wordprocessor to a non-Microsoft wordprocessor, Microsoft needed to undermine that standardization. In particular, it needed a user interface where there was no standard, intuitive, way to find advanced functionality. While introducing such a kludgy, unpleasant, user interface was unpopular, Microsoft had the power to impose such a thing in the early 2000s, as its office suite was a near monopoly. Customers would buy Office with the ribbon anyway, because they didn't have any choice. And with the right marketing, they could even make it sound as if the changes were a positive.

Hence the Ribbon. Until you actually try to use it, it doesn't look unfriendly, making it very easy to market. And for, perhaps, the most popular wordprocessing features, it's no worse than a toolbar. But learning it doesn't help you learn the user interface of any other application. Anyone introduced to wordprocessing through the Ribbon version of Word will have no idea how to use LibreOffice, even if LibreOffice has a ribbon. The user interface will have to be relearned.

Note that Microsoft didn't merely introduce the Ribbon as an optional user interface. Firefox and Chrome, to this day, still have the ability to bring up a traditional menu in addition to their hamburger menu because they know end users benefit from it. It's just, inexplicably, hidden (use the ALT key!) But in Word, there is no menu, there's nothing to make it easier for end users to transition to the ribbon or keep doing things the way they always did, despite the ease with which Microsoft could have implemented that.

We forgot everything

Microsoft's success in foisting the Ribbon on the world basically messed up user interfaces from that point onwards. With the sacred cow of interoperable user interfaces slaughtered, devs started to deprecate standardization and introduce “new” ways to do things that ignored why the old ways had been developed in the first place. Menus have been replaced with buttons, scrollbars have been replaced by... what the hell are those things... and there's little or no logic behind any of the changes beyond “It's new so it doesn't look old”. Many of the changes have been implemented to be “simpler”, but in most cases the aesthetic is all that's been simplified; finding the functionality a user wants is harder than ever before.

It would have helped if devs had realized at the time that Microsoft had done this for all the wrong reasons. It's not as if most of them trust Microsoft or believe they do things for the right reasons.

I started watching a lot of videos on retrocomputing recently. Well, the era they call retro I call “when I learned what I know now”. The 1980s was a fun time, as far as computers were concerned. There was variety, and computer companies were trying new things.

The most jarring thing I watched, though, was a review of the Timex Sinclair 2068, essentially the US version of the Sinclair Spectrum, which – as you'd imagine from the subject – was a very American view of why that computer failed. And the person reviewing the 2068 felt it failed because it represented poor value compared to... the Commodore VIC 20?

Which, now I've spent some time thinking about it, I think I understand the logic of. But it wasn't easy. You see, when I was growing up the school yard arguments were not about the ZX Spectrum vs the VIC 20, but its vastly superior sibling, the Commodore 64. And both sides had a point, or so it seemed at the time.

The principal features of the ZX Spectrum were:

  • A nice BASIC. That was considered kind of important then, even in a world where actually the primary purpose of the computer was gaming. Everyone understood that in order for people to get to the point they were writing games in the first place, the computer had to be nice to program.
  • 48k of RAM, of which 41-42k was available to programmers.
  • A fixed, single graphics mode of 256x192, with each 8x8 pixel block allowed to use two colours picked from a palette of 8 (each available in a brighter variant, giving 15 distinct colours in practice).
  • An awful keyboard. There was a revision called the Spectrum+ that had a slightly better keyboard based on the Sinclair QL's (but not really like the QL's, the QL's had a lighter feel to it.)
  • A beeper type sound device, driven directly by the CPU
  • Loading and saving from tape.
  • A single crude expansion slot that was basically the Z80's pins on an edge connector.

The Commodore VIC 20 had 5k of RAM, 3.5k available. It had a single raw text mode, 22 columns by 23 rows, with each character position allowed to have two colours. It did allow characters to be user defined. BASIC was awful. Expansion was sort of better: it had a serial implementation of IEEE488 that was broken, a cartridge port, and a serial port. Like the Spectrum it was designed to load and save programs primarily from tape. Despite the extra ports, it just wasn't possible to do 90% of the things a Spectrum could do, so I'm baffled the reviewer saw fit to compare the two. They were only similar in terms of price. And the VIC 20 was way cheaper than the Spectrum in the UK.

The Commodore 64, on the other hand, was, on paper, superior:

  • OK, BASIC wasn't. It was the same version as the VIC 20.
  • 64k of RAM. Now we're getting somewhere.
  • A mix of graphics and text modes, including a “better than ZX Spectrum” mode which used a similar attribute system for 8x8 blocks of pixels, but had a resolution of 320x200 and which supported sprites. And programmers could also drop the resolution to 160x200 and have four colours per 8x8 cell.
  • A great keyboard
  • A dedicated sound processor, the famous SID
  • Loading and saving from tape.
  • That weird serial implementation of IEEE488 that the VIC 20 had, with the bug removed... but with a twist.
  • Cartridge, and a port for hooking up a modem. And a monitor port. And, well, ports.

So if the C64 was so much technically better, why the schoolyard arguments? Other than kids “not knowing” because they didn't understand the technical issues, or wanting to justify their parents getting the slightly cheaper machine? Well, it was because the details mattered.

  • Both systems had BASIC, but Commodore 64 BASIC was terrible.
  • The extra 16k of RAM was a nice to have, but in the end both machines were in the same ballpark. (Oddly the machine in the UK considered to be superior to both, the BBC Micro, only had 32k.)
  • Programmers loved the 160x200 4 colour mode. It meant there was less “colour clash”, an artifact issue resulting from limiting the palette per character cell. But oddly, the kids were split on that. Most preferred higher resolution graphics over fewer colour clash issues. So even though the Commodore 64 was superior technologically, it was encouraging programmers to do things that were unpopular. One factor there was that most kids were hooking the computer up to their parents' spare TV, which was usually monochrome.
  • The keyboard really didn't matter, to kids. Especially given the computer was being used to play games, and Sinclair's quirky keyword input system and proto-IDE was arguably slightly more productive for BASIC programming than a “normal” keyboard in a world full of new typists.
  • Both computers loaded and saved from tape, but the Spectrum used commodity cassette recorders and loaded and saved programs at a higher speed, around 1500bps vs 300bps.
  • The IEEE488 over serial thing was... just not under consideration. Floppy drives were an expensive luxury that didn't take off until the 16 bit era in the UK when it came to home computers. But, worse, the Spectrum actually ended up being the better choice if random access storage was important to you. Sinclair released a system called the ZX Microdrive, similar to the American stringy-floppy concept (except smaller! Comparable to 2-3 full size SD cards stacked on top of one another), where the drives and interface for the Spectrum came to less than 100GBP (and additional drives were somewhere in the region of 50GBP.) The Commodore floppy drives, on the other hand, cost 300-500GBP each. Worse, they were slower than they'd been on the VIC 20 (about as slow as the cassette drive no less!), despite the hardware bug being fixed, because the computer couldn't keep up with the incoming data.
  • Cartridge ports should also have been a point in Commodore's favour, but for some reason cartridges were very expensive compared to software on tape. (I didn't learn until the 2000s that cartridges were actually cheaper to make.)
  • The other ports were for things kids just weren't interested in. Modems? In Britain's metered phone call system they just weren't going to be used by anyone under the age of 25. Monitors? TVs are cheaper and you can watch TV shows on them!

Over time many of these issues were resolved. Fast loaders improved the Commodore 64 software loading times, though the Spectrum had them too. But in the mean time, the kids didn't see the two platforms as “Cheap Spectrum vs Technically Amazing C64”, they were seen as equals, and to be honest, I don't think it was completely unfair in that context they were seen that way. There's no doubt the C64, with its sound and sprites, was the superior machine, but the slow cassette interface and expensive and broken peripheral system undermined the machine. As did programmers using features the kids didn't like.

Go across the pond and, sure, nobody would compare the TS2068 with the C64. Americans weren't using tape drives with their C64s. But I'm still not sure why they'd compare the TS2068 to the VIC 20 either.

The Spectrum benefited from its fairly lightweight, limited spec. Not only did it undercut the more advanced C64 on price, it also meant it didn't launch with as many unsolvable hardware bugs. The result was Sinclair and third parties could sell the add-ons needed to make the Spectrum equal or better its otherwise technically superior rivals, and the entire package still ended up costing less. Meanwhile, the feature set at launch was closer to what the market – kids who just wanted a cheap computer to hook up to their parents' spare TV set to play games – wanted.

All of which said, the TS2068 probably didn't fail because Americans were comparing it to the VIC 20, so much as it being released late and the home computer market being already decided by that point. Word of mouth mattered and nobody would have been going into a computer store in 1984 undecided about what computer to buy. Timex Sinclair had already improved the TS2068 over the Spectrum by adding a dedicated sound chip, and could have added sprites, and maybe even integrated the microdrives into the system, and fixed the keyboard, and not added much to the cost (the microdrives were technologically simpler than cassette recorders, so I suspect would have cost under $10 each to add) and the system would still have bombed. It was too late, the C64 and Apple II/IBM PC dominated the popular and high ends of the US market respectively, there wasn't any space for another home computer.

Finally set up WriteFreely to do my long form blogging which, hopefully, will mean I can write longer stuff of the type most people will skip over. Once I figured out why it didn't work the first time, it seems to work fine. My own platform is one I want to share with friends, so there are multiple complications: it's behind a reverse proxy, and I'm using Keycloak to supply SSO.

The only issue I have with what I've configured is that registration is still a “process”: you don't automatically get dropped into the system the first time you log in with OpenID Connect.

For those interested, my Keycloak OpenID-Connect configuration required the following:

[app]
...
single_user           = false
open_registration     = true
disable_password_auth = true

[oauth.generic]
client_id          = (client id from Keycloak)
client_secret      = (Client secret from Keycloak)
host               = https://(keycloak prefix)/realms/(realm)
display_name       = Virctuary Login
callback_proxy     = 
callback_proxy_api = 
token_endpoint     = /protocol/openid-connect/token
inspect_endpoint   = /protocol/openid-connect/userinfo
auth_endpoint      = /protocol/openid-connect/auth
scope              = profile email
allow_disconnect   = false
map_user_id        = preferred_username
map_username       = preferred_username
map_display_name   = name
map_email          = email

In the above, (client id) and (client secret) come from the client configuration I set up for WriteFreely in Keycloak. For the Keycloak prefix, if your Keycloak still serves its URIs under /auth (and you haven't reverse proxied that part away), you'll need the host to look something like domain/auth; otherwise just the domain, eg:

host = https://login.example.social/auth/realms/example/
host = https://login.example.social/realms/example/

In terms of use, I'm still getting used to WriteFreely. The formatting takes some getting used to: it's a mixture of raw HTML (the fixed font blocks above are in HTML <PRE> tags) and Markdown. In theory Markdown supports fixed font blocks too, but I can't get it to work. The fact you can always resort to raw HTML is good though, and only an issue if you actually need to use < anywhere...
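
For reference, the classic Markdown spelling of a fixed font block is a four-space indent, and most modern renderers also accept the fenced form – whether either survives WriteFreely's particular Markdown library is something I can only suggest you try, given my own mixed results:

    this line, indented by four spaces, should come out in a fixed font

```
a fenced block: three backticks on their own line, the content,
then three backticks again to close it
```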

One other thing: for some reason WriteFreely's installation instructions include this block in their example reverse proxy configuration:

location ~ ^/(css|img|js|fonts)/ {
    root /var/www/example.com/static;
    # Optionally cache these files in the browser:
    # expires 12M;
}

This breaks everything: unless you've actually copied WriteFreely's static files to that path, requests for CSS, images, Javascript and fonts get served from a directory that doesn't have them instead of being proxied to WriteFreely. Either remove the block, or replace it with something that proxies those paths and adds some smart caching – a sketch of the latter is below. Another default configuration snafu is that the built-in configurator has WriteFreely listening on localhost if you tell it you're using a reverse proxy, but there's absolutely no reason for it to assume the reverse proxy is on the same computer. So when you edit your config afterwards, change “bind” from localhost to [::] if you're using an external reverse proxy.
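
The sketch below assumes WriteFreely is reachable from nginx at 127.0.0.1:8080; adjust the address and port to match whatever your “bind” and port settings actually are. The idea is simply to proxy the static paths like everything else and let the browser do the caching:

location ~ ^/(css|img|js|fonts)/ {
    # Hand the static assets to WriteFreely like every other request...
    proxy_pass http://127.0.0.1:8080;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    # ...and ask browsers to cache them rather than serving them from
    # a local directory that may not exist:
    expires 12M;
}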