Setting up a Xen PV/PVH with NFS
OK, so this one's tough to describe. I ran into it because I wanted a VM to use SSD storage, and for that I wanted the btrfs file system. It turns out Xen's PV/PVH system is not btrfs friendly: xen-create-image will happily create the image, but pygrub won't boot from a btrfs file system.
There are obvious workarounds. One is to boot the kernel directly, but then you have to pull the kernel and initrd.img files out of the guest's file system, and every time you do an update you'll have to do it again. If the guest runs the same OS as the dom0 then, I guess, you can use the dom0's kernel and just make sure you update both at the same time. But it's not really best practice, is it?
Another workaround is to create a separate boot partition. This isn't as easy as it sounds: you'll need to point pygrub at it and somehow convince it the files in /boot are there, because it's looking for /boot/grub/grub.cfg, not [root of device]/grub/grub.cfg. You can do that with a symlink (cd /boot ; ln -s . boot), which is a bit hacky, but it should work. Using xen-create-image is going to be a little more complicated if you go down this route, though. If you're interested, I have instructions here.
Finally, there's the approach I'm experimenting with, which may also solve other problems you have. In fact, that's why I'm experimenting with it: it does solve other problems.
What if... the domU's file system was actually native to the dom0 rather than just a block device there?
Well, you can do this with NFS. You can tell Linux to boot from an NFS share served by the dom0's NFS server, and have that NFS share be part of the dom0's file system. There are all kinds of reasons why this is Good, Actually™:
- You can pick your own file system, hence the description above
- You can have two or more domUs share storage rather than having to shut them down and manually resize their partitions every time they get too big.
- Backing up becomes much easier, as the dom0 has access to everything and is likely where any attached back-up media will be mounted.
- Very easy to clone VMs, there's this command called “cp” you can use.
- A domU crashing will not destroy its own file system (probably)
There are, of course, downsides to this approach:
- It's probably slightly slower.
- There's some set up involved.
- You'd better make sure your dom0 is locked down, because it has all the data open to view in its own file system. Of course, technically it already has access to the data as block devices, just not visible in its file system.
- Most importantly, some stuff just doesn't like NFS.
To clarify the last point: Debian (at least) can only boot from NFS version 3 or lower. Several modern file system features, notably extended attributes, require version 4.2. (In theory a modified, non-default initrd.img can be built that supports later NFS versions; if I figure it out, I'll update the instructions here.) The lack of extended attributes meant I couldn't run LXC on the host and install more recent Ubuntus, though older Ubuntus worked. Despite my best efforts, I found certain tools, including a bind/Samba AD combination, just failed in this environment, becoming unresponsive.
The technique I'm going to describe is probably useless to HVM-only users. But you people should probably migrate to KVM anyway, so who cares.
So what do you need to do?
These instructions assume Debian “Bookworm” 12 is your dom0. If you're using something like XenServer or XCP-ng or whatever it calls itself today, this won't be very useful, but PVs are deprecated on those platforms anyway, and PVHs didn't work at all from what I remember. They also assume a Linux-based domU.
Setting up the environment
So on your dom0, do this as root:
# apt-get install nfs-kernel-server rpcbind
Now there's a little setup involved. Edit /etc/default/nfs-kernel-server and change RPCMOUNTDOPTS="--manage-gids" to RPCMOUNTDOPTS="--manage-gids=no".
Now also edit /etc/nfs.conf, look for the line manage-gids=y, and change it to manage-gids=n. There are at least two manage-gids=xxx lines in there; change any that aren't commented out.
Both changes are necessary because, with that option enabled, the NFS server uses its own /etc/passwd and /etc/group files to figure out permissions, and this breaks group ownership unless the client and server have exactly the same passwd and group files. (Off topic, but why is this the default? It'll break NFS shares in 99% of cases.)
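For reference, after both edits the relevant bits looked roughly like this on my system; the exact layout of these files varies between releases, so treat this as a sketch.

In /etc/default/nfs-kernel-server:

RPCMOUNTDOPTS="--manage-gids=no"

In /etc/nfs.conf, under the [mountd] section (and anywhere else it appears uncommented):

[mountd]
manage-gids=n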
Once both files are changed, restart the services:

# systemctl restart rpcbind nfs-kernel-server
Next up, create somewhere you can put all these NFS shares. You can change the below, just make sure to change it everywhere else when I give commands that refer to it later:
# mkdir /fs4xen
The final thing we're going to do is create an internal network the domUs will use to reach the NFS server. We're keeping this separate from whatever network you've created to let the domUs talk to the outside world, to avoid the NFS server being accessible from outside.
Edit your /etc/network/interfaces file and add this:
auto fsnet
iface fsnet inet static
    bridge_ports none
    address 172.16.20.1/16
(These are instructions for Debian but I'm aware many Ubuntu users will use Debian instructions as a starting point: If you're using Ubuntu, you'll want to do the Netplan equivalent. You should already have an example bridge in your current /etc/netplan/xxx file, the one used for allowing your domUs to access the outside world. This is more or less the same, except you don't need to bridge to an external port.)
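For Ubuntu users, here's a minimal sketch of what that Netplan equivalent might look like. The file name is made up and the exact YAML layout is an assumption on my part, so adapt it to whatever is already in your /etc/netplan directory:

# /etc/netplan/60-fsnet.yaml (hypothetical file name)
network:
  version: 2
  bridges:
    fsnet:
      # no physical ports attached; this bridge is internal only
      addresses:
        - 172.16.20.1/16

Apply it with 'netplan apply' as usual.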
That's the initial set-up out of the way. How do we create an image that uses it?
Creating domUs that boot from NFS
First, create your PV (we'll upgrade it to a PVH later). Do it however you normally do; xen-create-image will usually do most of the work for you. For the sake of argument, let's pretend we're left with a configuration file for the domU that looks a bit like this when stripped of comments:
bootloader = 'pygrub'
vcpus = '4'
memory = '16384'
root = '/dev/xvda2 ro'
disk = [
    'phy:/dev/vg-example/testbox-root,xvda2,w',
]
name = 'testbox'
vif = [ 'ip=10.0.2.1,mac=00:01:02:03:04:05' ]
on_poweroff = 'destroy'
on_reboot = 'restart'
on_crash = 'restart'
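For what it's worth, a config like that is roughly what an xen-create-image run along these lines would leave you with. The distribution, disk size, and volume group here are assumptions of mine, so check xen-create-image(8) for the exact options your version supports:

# xen-create-image --hostname=testbox --lvm=vg-example --fs=btrfs \
      --dist=bookworm --size=20G --memory=16G --vcpus=4 \
      --ip=10.0.2.1 --pygrub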
If this configuration is supported by your current version of Xen (for example, you used a regular file system), then you might want to test it works, just to make sure. But shut it down immediately afterwards before continuing.
Verify it isn't running with xl list.
Allocate a new IP address on the internal NFS network, I'm going to use 172.16.20.2 for this example. (I mean, manually allocate it, just look for an IP that isn't in use yet.)
Now mount the domU's root file system on the dom0, adding it to the dom0's fstab and exports:
# mkdir /fs4xen/testbox
# echo '/dev/vg-example/testbox-root /fs4xen/testbox btrfs noatime,compress=lzo 0 0' >> /etc/fstab
# mount /fs4xen/testbox
# echo '/fs4xen/testbox 172.16.20.2(rw,no_root_squash,sec=sys)' >> /etc/exports
# exportfs -a
If you get any errors when you enter the mount command, check that the /etc/fstab line has the right parameters for your file system; I've assumed btrfs here. You should know roughly what the right parameters are anyway!
You may also get warnings from 'exportfs -a' about something called 'subtree_check'; you can ignore those. Anything else you should go back and fix.
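If you want to double-check the export is actually live before booting anything from it, showmount (from the nfs-common package) will list what the server is currently exporting, and /fs4xen/testbox should show up in its output:

# showmount -e localhost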
Finally, we rewrite the Xen .cfg file. This is what the new file would look like based upon everything else here. Use your favorite editor to make the changes and back up the original in case something goes wrong:
vcpus = '4'
memory = '16384'
kernel = '/fs4xen/testbox/vmlinuz'
root = '/dev/nfs'
extra = ' rw elevator=noop nfsroot=172.16.20.1:/fs4xen/testbox,vers=3 ip=172.16.20.2:172.16.20.1::255.255.0.0::enX1:off:::'
ramdisk = '/fs4xen/testbox/initrd.img'
name = 'testbox'
vif = [
    'ip=10.0.2.1,mac=00:01:02:03:04:05',
    'ip=172.16.20.2,mac=00:01:02:03:04:06,bridge=fsnet',
]
on_poweroff = 'destroy'
on_reboot = 'restart'
on_crash = 'restart'
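In case the ip= string in extra= looks like line noise: it's the kernel's standard nfsroot boot parameter (documented in the kernel's nfsroot documentation), and in this example it breaks down roughly as follows:

ip=<client-ip>:<server-ip>:<gateway>:<netmask>:<hostname>:<device>:<autoconf>:...
   client-ip = 172.16.20.2   (the domU's address on the fsnet bridge)
   server-ip = 172.16.20.1   (the dom0's NFS server)
   gateway   =               (empty; no routing needed on an internal bridge)
   netmask   = 255.255.0.0
   hostname  =               (empty)
   device    = enX1          (the second vif, as the guest kernel names it)
   autoconf  = off           (everything is static, no DHCP/BOOTP)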
Now just boot it with:
# xl create /etc/xen/testbox.cfg -c
and verify it boots and you can log in. Assuming you can, go in and create a file somewhere you have permission to (not /tmp, as that's rarely exported), then switch to another session on your Xen dom0 and verify it exists in the same place under /fs4xen/testbox.
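If you want a concrete test (the file name here is just an example):

On the domU:
# touch /root/nfs-test

On the dom0:
# ls -l /fs4xen/testbox/root/nfs-test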
If everything's fine, you can exit the console using Control-] in the usual way.
Turning a PV into a PVH
PVs are the default domU type created by xen-create-image, not least because they're supported on even the worst hardware. But they're considered obsolete, largely because x86-64 processors later gained hardware virtualization features that provide better ways to sandbox guest operating systems. The new hotness is PVH, which uses those newer processor features but keeps the aspects of PV that made Xen wonderful and efficient to begin with.
Here's how to turn a PV into a PVH:
- Edit the .cfg file in your favorite editor
- Add the line "type = 'pvh'" somewhere in the file (see the snippet after this list). Remove any other "type =" lines.
- Save
- Start the domU using 'xl create /etc/xen/testbox.cfg -c' and verify it boots to a login prompt.
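For reference, with that one addition the top of the config from earlier would now start something like this, with the rest of the file unchanged:

type = 'pvh'
vcpus = '4'
memory = '16384'
kernel = '/fs4xen/testbox/vmlinuz'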
If it doesn't boot, it may be a hardware issue, or the guest operating system itself might have an issue with PVHs. The most common “hardware” issue is that either your CPU doesn't support virtualization, or it does but it's been disabled in the BIOS. So check your BIOS for virtualization settings.