Setting up a Xen PV/PVH domU with btrfs
These instructions refer to Xen 4.17 with a Debian “Bookworm” 12 dom0 and the bundled version of PyGrub. It's entirely possible that by the time you read this PyGrub will have gained btrfs support. But if you tried it and got errors, read on!
The problem
While some of Xen's tooling, notably xen-create-image, kinda sorta supports btrfs, PyGrub, the Xen bootloader for PVs and PVHs, doesn't. This means you're left with fairly ugly options if you want to use btrfs with Xen, including falling back to HVMs. And if you have to use HVMs then why use Xen at all? There are virtualization platforms with much better support than Xen that do things the HVM way.
I said “kinda sorta” for xen-create-image because when I created a Debian 12 image with it (--dist bookworm) it was unbootable (well, it booted read-only) because the fstab contained a non-btrfs option in the entry for “/”. We'll get to that in a moment.
So, before you begin, I must warn you that the solution I'm about to propose, while modifiable for whatever the devil it is you plan to do, requires eschewing xen-create-image for the heavy lifting and doing all the steps it does manually. But to make things easier, we'll at least create a template using xen-create-image.
So what I want you to do first is create an image somewhere with xen-create-image, using ext4 as the file system and giving it the same name as the domU you intend to create. Go into it, kick the tires, and make sure it works and has network connectivity, making whatever modifications you need to get it into that state. Then come back here. Oh, and if it's a bookworm image, be prepared to fix networking: /etc/network/interfaces assumes an 'eth0' device. The easiest fix is to change all references to it to 'enX0' (assuming that's the Xen Ethernet device you booted with).
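The eth0 fix can be done with one sed call inside the domU. This is a minimal sketch: fix_iface is a made-up helper name, and the \b word boundary is a GNU sed extension (which is what Debian ships).

```shell
# Rename every whole-word "eth0" in an interfaces file to another device name.
# Usage: fix_iface FILE [NEW_NAME]   (NEW_NAME defaults to enX0)
# fix_iface is a hypothetical helper, not part of Xen or Debian.
fix_iface() {
    file=$1
    new=${2:-enX0}
    # \b is a GNU sed word boundary, so names like "veth0" are left alone.
    sed -i "s/\beth0\b/$new/g" "$file"
}
```

Inside the domU you'd run it as fix_iface /etc/network/interfaces, after checking with "ip link" which device name you actually booted with.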
Other notes:
- We're essentially doing a set of commands that all require root. Rather than sudo everything, I'm assuming you 'sudo -s' to get a root shell. It's much easier.
- I'm assuming the use of volume groups, specifically using a group called 'vg-example', and am using 'testbox' as the hostname. The general flow should translate to other storage backends, but will require different commands. You can change everything as needed as you go, testbox to whatever you call the machine, etc.
Back up the image
OK, shut down your VM, and then mount the image somewhere. Typically you'll see lines in your /etc/xen/testbox.cfg that read something like:
disk = [ 'phy:/dev/vg-example/testbox-disk,xvda2,w', ]
There may be two entries if you created a swap partition. Either way, take the path for the root file system and mount it somewhere, say, /mnt. If you mount it somewhere else, change /mnt to your mount point when following these instructions. Likewise, vg-example and testbox-disk should be changed to whatever you see in the disk line. If you're not using volume groups, adjust accordingly; you can mount disk images using the loopback device, for example mount -o loop /path/to/disk.img /mnt. With the example above:
# mount /dev/vg-example/testbox-disk /mnt
Now do the following:
Edit /mnt/etc/fstab and change the line that mounts root to assume btrfs and put in some good btrfs options, for example:
/dev/xvda2 / btrfs defaults,noatime,compress=lzo 0 1
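If you'd rather script that edit than do it by hand, something like the following should work. It's only a sketch: set_root_btrfs is a name I made up, and it writes the same "0 1" trailing fields as the example line above.

```shell
# Rewrite the root ("/") entry of an fstab file to use btrfs.
# Usage: set_root_btrfs FSTAB [OPTIONS]
# set_root_btrfs is a hypothetical helper name.
set_root_btrfs() {
    fstab=$1
    opts=${2:-defaults,noatime,compress=lzo}
    tmp=$(mktemp) || return 1
    # Only touch non-comment lines whose second field (the mount point) is "/".
    awk -v opts="$opts" '
        $1 !~ /^#/ && $2 == "/" { print $1, "/", "btrfs", opts, 0, 1; next }
        { print }
    ' "$fstab" > "$tmp" && cat "$tmp" > "$fstab"
    rm -f "$tmp"
}
```

With the image mounted as described, that would be set_root_btrfs /mnt/etc/fstab. Every other line, comments included, is passed through unchanged.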
cd to /mnt/boot, and type this:
# ln -s . boot
That bit of magic helps pygrub find everything. Ultimately we're going to create a separate boot partition, and pygrub will mount it and think it's root, so it'll get confused that there's no “boot/grub/grub.cfg” file unless you put that symlink there.
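To see why the link helps, you can try it in a scratch directory that stands in for the boot partition. This is just an illustration: boot_link_demo is a made-up name and the grub.cfg it writes is a placeholder.

```shell
# Build a scratch "boot partition" under DIR and add the self-referencing link.
# Usage: boot_link_demo DIR
# boot_link_demo is a hypothetical helper for illustration only.
boot_link_demo() {
    mkdir -p "$1/grub"
    echo '# placeholder grub.cfg' > "$1/grub/grub.cfg"
    # Same ln command as in the article, run from the top of the boot file system.
    ( cd "$1" && ln -s . boot )
}
```

After boot_link_demo /tmp/fakeboot, the path /tmp/fakeboot/boot/grub/grub.cfg resolves to /tmp/fakeboot/grub/grub.cfg. A bootloader that treats the partition's top level as "/" and looks for boot/grub/grub.cfg gets the same benefit.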
Finally, cd to /mnt and type:
# tar zcf /root/testbox-image.tgz -S .
(The -S is something I type by habit; it tells tar to handle sparse files efficiently. If you're paranoid you might want to look up the other options for tar, such as those handling extended attributes. Or use a different archiver like cpio. But the above is working for me.)
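For the paranoid, here is a sketch of a more careful invocation, assuming GNU tar (the version Debian ships). archive_tree is a made-up wrapper name; --numeric-owner and --xattrs are GNU tar options that store raw UID/GID numbers and extended attributes respectively.

```shell
# Archive a directory tree with numeric owners, sparse-file handling and
# extended attributes. Assumes GNU tar.
# Usage: archive_tree SRC_DIR OUT.tgz   (OUT.tgz must live outside SRC_DIR)
# archive_tree is a hypothetical wrapper name.
archive_tree() {
    tar -C "$1" --numeric-owner --xattrs -S -czf "$2" .
}
```

Used as archive_tree /mnt /root/testbox-image.tgz, it produces the same kind of archive as the command above. If you create the archive with --xattrs, pass --xattrs again when extracting.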
Finally, umount the image, and then delete it. eg:
# cd ; umount /mnt
# lvremove vg-example/testbox-disk
(If you're not using volume groups, ignore the last line, just use rm if it's a disk image, or some other method if it's not a disk.)
Creating the first domU
OK, so we have a generic image we can use for both this specific domU and new domUs in future (if you plan to create a whole bunch of Debian 12 images, there's no need to duplicate it. I'll explain later how to do this.)
First, let's create the two partitions we need, for root and boot. I'm going to assume 'vg-example' is the volume group for this, but nothing requires you to use this, or even to use volume groups. If you're not using volume groups, do an equivalent with whatever system you have. You can create disk images using dd (eg # dd if=/dev/zero of=image.img iflag=fullblock bs=1M count=100 && sync for a 100M file) and use losetup (eg # losetup /dev/loop1 image.img) to create a virtual device so you can mkfs it and mount it.
# lvcreate -L 512M -n testbox-boot vg-example
# lvcreate -L 512G -n testbox-root vg-example
This creates our two new partitions, root and boot. Note they don't have to be part of the same volume group. You could even, probably, make one a disk image file and the other a volume group partition.
Now create the file systems on both:
# mkfs.btrfs /dev/vg-example/testbox-root
# mkfs.ext4 /dev/vg-example/testbox-boot
Now mount the main root:
# mount /dev/vg-example/testbox-root /mnt
Create the mount point for boot inside the root (this is important):
# mkdir /mnt/boot
Mount the boot partition
# mount /dev/vg-example/testbox-boot /mnt/boot
Finally create the image:
# cd /mnt
# tar zxf /root/testbox-image.tgz
After you're done (if you need to do anything at all), just umount boot and root in that order:
# cd ; umount /mnt/boot ; umount /mnt
Finally, modify your .cfg file. Load it into your favorite editor:
# vi /etc/xen/testbox.cfg
Modify the disk = [ ] section to look more like this:
disk = [ 'phy:/dev/vg-example/testbox-boot,xvda3,w', 'phy:/dev/vg-example/testbox-root,xvda2,w' ]
If your original had a swap partition, leave the entry there.
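If you build a lot of these, a small helper that prints the disk line avoids typos in the volume names. A sketch only: disk_line is a made-up name, and it assumes the article's <hostname>-boot / <hostname>-root naming with the boot volume listed first. As far as I can tell the boot volume needs to come first because pygrub reads the first disk in the list, but treat that as my assumption.

```shell
# Print the disk = [ ... ] line for a given volume group and hostname.
# Usage: disk_line VOLUME_GROUP HOSTNAME
# disk_line is a hypothetical helper; the naming scheme is the article's.
disk_line() {
    vg=$1
    host=$2
    # Boot volume first (xvda3), then the btrfs root (xvda2).
    printf "disk = [ 'phy:/dev/%s/%s-boot,xvda3,w', 'phy:/dev/%s/%s-root,xvda2,w' ]\n" \
        "$vg" "$host" "$vg" "$host"
}
```

Running disk_line vg-example testbox prints a line you can paste into /etc/xen/testbox.cfg. Add any swap entry by hand.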
OK, moment of truth: boot using
# xl create /etc/xen/testbox.cfg -c
It should come up with the Grub menu. And then it should boot into your VM. And when you log into your VM, everything should be working as it was before you made your modifications.
Creating clones from the original archive
This is easy too. Keep that archive around. For this guide we'll keep most things the same but call the new domU you're creating clonebox.
Before you begin, copy across /etc/xen/testbox.cfg to /etc/xen/clonebox.cfg. You can edit it as you go, but at least begin by changing all references of testbox to clonebox, and change the MAC address. You can easily get one from https://dnschecker.org/mac-address-generator.php: use 00163E as the prefix (I have no connection with the makers of that tool, it seems to work OK, just want to save you a search engine result.) Finally allocate a new IP address, assuming you're not using DHCP.
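If you'd rather not visit a website for the MAC address, the copy-and-rename step can be sketched in shell. The function names gen_xen_mac and clone_cfg are made up, and the sed expression assumes the MAC appears as mac=... inside the vif line, as in the configs xen-create-image wrote for me. Check yours before trusting it.

```shell
# Print a random MAC address with the 00:16:3e prefix.
# gen_xen_mac and clone_cfg are hypothetical helper names.
gen_xen_mac() {
    # od prints three random bytes as hex, e.g. " 4f a0 1c".
    set -- $(od -An -N3 -tx1 /dev/urandom)
    printf '00:16:3e:%s:%s:%s\n' "$1" "$2" "$3"
}

# Copy a domU config, renaming the host and swapping in a fresh MAC.
# Usage: clone_cfg OLD_CFG NEW_CFG OLD_NAME NEW_NAME
clone_cfg() {
    mac=$(gen_xen_mac)
    # Assumes the MAC is written as mac=XX:XX:XX:XX:XX:XX in the vif line.
    sed -e "s/$3/$4/g" \
        -e "s/mac=[0-9A-Fa-f:]*/mac=$mac/" "$1" > "$2"
}
```

For this guide that would be clone_cfg /etc/xen/testbox.cfg /etc/xen/clonebox.cfg testbox clonebox. The IP address still has to be changed by hand.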
Now, create the two partitions we need, for root and boot. You can create a swap partition too if your original had one.
As earlier, I'm going to assume 'vg-example' is the volume group and we're using volume groups.
# lvcreate -L 512M -n clonebox-boot vg-example
# lvcreate -L 512G -n clonebox-root vg-example
# mkfs.btrfs /dev/vg-example/clonebox-root
# mkfs.ext4 /dev/vg-example/clonebox-boot
(For the second one, change the -L size to whatever you need it to be. A useful alternative to know is “-l '100%FREE'” (yes, lowercase L), which gives the remaining free space in the volume group to that partition.)
If you created partitions whose device names do not match what's in /etc/xen/clonebox.cfg, modify the disk = [ ] line in the latter file to use the new device paths (eg /dev/vg-example2/clonebox-root if you use a different volume group).
Mount it as before, adding the boot mountpoint, and unarchive the original archive:
# mount /dev/vg-example/clonebox-root /mnt
# mkdir /mnt/boot
# mount /dev/vg-example/clonebox-boot /mnt/boot
# cd /mnt
# tar zxf /root/testbox-image.tgz
Now before you unmount things, you'll need to adjust the image.
The default Debian image is borked and creates an /etc/network/interfaces file with the wrong Ethernet device. When you were fixing it earlier, you either fixed /etc/network/interfaces, or you created a file called /etc/systemd/network/10-eth0.link that maps eth0 to the device with the right MAC address. The former is probably easier, but if you did the latter, update /mnt/etc/systemd/network/10-eth0.link with the new MAC address.
If you're using static IPs, you'll also need to change /mnt/etc/network/interfaces and include the IP for this domU.
Regardless of everything else, you'll also definitely need to modify these files:
- /mnt/etc/hostname
- /mnt/etc/hosts
- /mnt/etc/mailname
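Those three edits are all the same search-and-replace, so here is a rough sketch of doing them in one go. rename_host is a made-up helper, it relies on GNU sed for -i and \b, and it only swaps the bare hostname, so check /mnt/etc/hosts afterwards in case an IP address needs changing too.

```shell
# Replace the OLD hostname with NEW in the identity files under ROOT.
# Usage: rename_host ROOT OLD NEW
# rename_host is a hypothetical helper name.
rename_host() {
    root=$1
    old=$2
    new=$3
    for f in etc/hostname etc/hosts etc/mailname; do
        # Skip files this image does not have (e.g. no mailname).
        [ -f "$root/$f" ] || continue
        # \b keeps the match to whole words (GNU sed).
        sed -i "s/\b$old\b/$new/g" "$root/$f"
    done
}
```

For the clone in this guide: rename_host /mnt testbox clonebox.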
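Something else you probably want to do is replace the SSH host keys on the system, so the clone doesn't share them with the original. To do this, you can chroot into the file system, delete the old keys, and then run ssh-keygen -A, which generates any host keys that are missing: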
# chroot /mnt /bin/bash
# cd /etc/ssh
# rm ssh_host*key*
# ssh-keygen -A
# exit
Once you're done, unmount:
# cd ; umount /mnt/boot ; umount /mnt
Save your new .cfg file, and then:
# xl create /etc/xen/clonebox.cfg -c
If it comes up, log in, poke around, make sure networking's working etc.
Convert Xen PVs to PVHs
After you get a PV domU working, it's time to see if it'll work as a PVH. PVHs are like PVs but use a more efficient memory management technique. They're still under development and considered experimental technology after approximately a decade of development, largely because, well, Kevin's out sick, and everyone else has been hired by RedHat to work on KVM. So use at your own risk. But if you've ever heard someone tell you PVs are old hat, and the hotness is HVMs, and thrown up a little in the back of your mouth because why use Xen then, well PVHs are the things that are more efficient than both, using some of the lessons learned developing HVMs while keeping the efficiencies of having operating systems work with the hypervisor as with PV.
So, edit your /etc/xen/testbox.cfg or /etc/xen/clonebox.cfg or whatever, and add the line “type = 'pvh'” somewhere. Save. Do xl create /etc/xen/.cfg and check it comes up (in PV mode I find the kernel writes messages to the console, while in PVH it doesn't, so wait a minute or two before declaring it broken).
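If you're converting several domUs, the edit can be scripted. A minimal sketch: set_pvh is a made-up name, and it assumes a plain one-setting-per-line .cfg like the ones xen-create-image writes. It replaces an existing type line or appends one, so it is safe to run twice.

```shell
# Set type = 'pvh' in a domU config: replace an existing type line,
# or append one if there is none.
# Usage: set_pvh CFG
# set_pvh is a hypothetical helper name.
set_pvh() {
    if grep -Eq '^[[:space:]]*type[[:space:]]*=' "$1"; then
        sed -i -E "s/^[[:space:]]*type[[:space:]]*=.*/type = 'pvh'/" "$1"
    else
        printf "type = 'pvh'\n" >> "$1"
    fi
}
```

For example, set_pvh /etc/xen/clonebox.cfg. Keep a copy of the working PV config around so you can go back if the PVH boot fails.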
How to get out of the console
Control-]. Both the xl create commands have a -c to attach the console so you can see what's happening, but if you're not used to that, well, CTRL-] is the thing to use to escape from that.