quiz: a tool for rapid OpenZFS development

My preferred programming style is highly interactive and exploratory, which requires a fast edit-compile-test cycle. But then a year ago I started working on OpenZFS, which is a big chunk of kernel code. Kernel code means developing on either real hardware or a VM, both of which tend to be awkward to get fresh code onto; and being kernel code, it’s pretty easy to crash, wedge or otherwise damage the kernel, requiring a reboot.

I wanted a way to work on OpenZFS just like I’d work on any other program - make a change, compile it, run it and see what happens. I’d heard a little about microVMs being able to boot a kernel to userspace in a fraction of a second, so I decided to look into it more to see if I could make something like that work for me.

As it turns out, it can be made to work and work well. The unbelievably stupidly-named quiz is what I came up with, and now I’m compiling and running OpenZFS and test programs many tens of times a day, on VMs that live for seconds at a time.

Micro machines 🔗

First, we need a VM that we can start and stop quickly. The venerable QEMU has a “microvm” machine type designed explicitly to start quickly. It does this by leaving out things you’d normally find in a real computer or a full machine emulation, like the PCI bus. Fewer devices means fewer things to initialise at boot.

Then, we need a custom kernel. The typical kernel that comes with your favourite Linux distribution has support for everything, and will spend a load of time hunting around to figure out what hardware exists. Since we know exactly what hardware will be there, we can save a lot of time by only including support for it.

The other thing we need to do is build everything we need into the kernel binary. A generic kernel is usually accompanied by a minimal root filesystem, the “initrd”, which has all the driver modules and startup scripts inside it. This is all in service of hardware detection, which we already don’t need, and it adds more time while the initrd runs and eventually transfers control to the “real” root filesystem. By compiling all that into the kernel, we only need the kernel file itself to get started.

However, we still keep module support, because we want to load OpenZFS later as a module, rather than recompiling the kernel every time.
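The shape of such a config is roughly this kind of fragment. This is illustrative, not the exact options quiz’s configs set: virtio-mmio devices for the microvm, 9p and overlayfs for the filesystem (more on those below), module support kept on, and the startup files built into the kernel image:

```text
# Illustrative kernel config fragment; quiz's real per-version configs differ.
CONFIG_MODULES=y                 # keep module loading, so OpenZFS can be a module
CONFIG_VIRTIO=y
CONFIG_VIRTIO_MMIO=y             # microvm exposes its devices over virtio-mmio
CONFIG_VIRTIO_BLK=y              # storage
CONFIG_NET_9P=y
CONFIG_NET_9P_VIRTIO=y
CONFIG_9P_FS=y                   # mount host directories over 9p
CONFIG_OVERLAY_FS=y              # layer the root filesystem
CONFIG_INITRAMFS_SOURCE="initramfs.list"   # build the startup files into the kernel
```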

As for devices, we actually don’t need much - CPUs, memory, a console, and some storage (which we’ll get to later). It ends up being a pretty hefty command line but nothing that isn’t easily scripted.
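For flavour, the invocation quiz scripts is along these lines. This is a hand-written sketch, not the exact command quiz builds - CPU and memory sizes, paths and the mount tag are all placeholders:

```shell
# Sketch of a microvm boot; paths and sizes are illustrative.
qemu-system-x86_64 \
    -M microvm,accel=kvm \
    -cpu host -smp 2 -m 2G \
    -no-user-config -nodefaults \
    -display none -serial stdio \
    -fsdev local,id=sys,path=./system,security_model=none \
    -device virtio-9p-device,fsdev=sys,mount_tag=quiz-system \
    -kernel vmlinux \
    -append "console=ttyS0 reboot=t"
```

With no PCI bus on the microvm machine, devices like the 9p share use the plain virtio-mmio variants (virtio-9p-device rather than virtio-9p-pci).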

quiz ships with a program quiz-prepare-kernel that takes care of downloading a kernel, building it, and getting it ready to go. There’s also a set of kernel configs for different kernel minor versions to make them work well in the quiz VM environment.

Taken together, we can boot a machine to /sbin/init in a second or two. Neat!

host$ ./quiz
[quiz] 20240303-16:22:21 starting microvm
[    1.072287] quiz: starting user program
[INFO  tini (1)] Spawned child process '/bin/bash' with pid '511'
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell

A big ball of files 🔗

Even though we don’t need an initrd, we still need files inside our VM - we need something to run! In a regular VM, that’s likely to be a plain old disk image, but we’re not building a regular VM; we’re building something that provides an environment to quickly test our code, and then disappear without a trace. We need something a bit more complicated.

It’s actually a question of separating the data we need in our VM by some notion of “permanence”. quiz does this by building the complete filesystem out of “layers”, using Linux’s filesystem overlay feature.

At the bottom, we need all the basic trappings of a unix system to just operate things at all - a shell and standard tools, basic system services, debug tools, and so on. This layer is effectively immutable - we can build it once, and reuse it over and over again.

We use mmdebstrap to create the bottom layer. It builds a very minimal Debian base system with a bunch of additional packages we need, and emits an ext4 image. Into that we add a tiny “stage 1” /sbin/init, which creates the layered filesystem out of all the pieces, and then jumps into the second stage to do the rest of the setup and run whatever test program we’re running. This two-stage process is mostly there because the disk image is a little awkward to regenerate, so we want to reduce how often we need to do it. The disk image itself is mostly there because it’s the easiest thing to get out of mmdebstrap, and means we don’t have to worry about things like permissions.

Above that, we need places to put the “local” programs that we modify between runs. This includes the outputs from our OpenZFS build with whatever we’re working on, and any test programs and other stuff we might want to modify on the host but have present inside the VM.

The “middle” layers are just regular files in host directories, exposed to the VM as 9pfs shares that QEMU creates and the guest kernel mounts. They’re kept separate because each has a different purpose, and is managed by a different process on the host.

Finally, at the top, we put a ramdisk, where any and all modifications to the filesystem go. This allows the VM to produce output, write logs and “modify” existing files, and disappear without a trace afterwards.
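Assembling these layers inside the guest comes down to a handful of mounts. Here’s a rough sketch of the idea, not quiz’s actual stage 1 - the mount tag, paths and directory names are all made up for illustration:

```shell
# Sketch only: mount tag, paths and names here are invented for illustration.
mkdir -p /layers/local /upper /newroot
mount -t 9p -o trans=virtio local /layers/local   # a "middle" host share
mount -t tmpfs tmpfs /upper                       # top layer: all writes land here
mkdir -p /upper/data /upper/work
mount -t overlay overlay \
    -o lowerdir=/layers/local:/base,upperdir=/upper/data,workdir=/upper/work \
    /newroot                                      # /base: the bottom ext4 image
exec switch_root /newroot /sbin/init              # hand over to the next stage
```

The ordering of lowerdir entries matters: earlier entries shadow later ones, so the “local” build outputs win over the base system where they overlap.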

The init1 script in the bottom image assembles all these pieces before bouncing into init2 to start the system. Here, it’s enough to show that there’s just one apparent filesystem:

root@quiz:/# df -a
Filesystem     1K-blocks  Used Available Use% Mounted on
overlay          1018224   112   1018112   1% /
none                   0     0         0    - /proc
none                   0     0         0    - /sys
none             1015744     0   1015744   0% /dev

All this gives the programs inside the VM a nice, uncomplicated filesystem that they don’t need to know anything about - no special paths or off-limits areas, and no restrictions on writing. This is good; we want to test things “for real”, without any special treatment.

Building blocks 🔗

In its simplest mode, the VM just drops into a shell. That’s nice, but not very useful. I want to be able to run a program almost exactly as I would if I ran it locally, which means just being able to type it. But also, I want to be able to run different programs, and that can mean different environments, and so we need a way to set those up. This is all the job of the main quiz program on the host, and the init2 script inside the VM.

init2 does some basic common setup, like setting the hostname, starting udev, and some other environment bits. It ends by starting tini, an init that performs the duties required of PID 1, which then runs whatever program we asked for, or bash if we didn’t.

If we do request a program, quiz creates a tiny script fragment /.quiz/run before QEMU is started, and puts the requested program and args in it. init2 looks for this, and asks tini to run it if it exists:

host$ ./quiz cat /proc/version
[quiz] 20240303-16:40:39 creating run script
[quiz] 20240303-16:40:39 starting microvm
[    0.868304] quiz: starting user program
[INFO  tini (1)] Spawned child process '/bin/sh' with pid '511'
+ cat /proc/version
Linux version 5.10.170 (robn@lucy) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #5 SMP Thu Jan 18 18:59:45 AEDT 2024
[INFO  tini (1)] Main child exited normally (with status '0')
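The mechanism is simple enough to sketch in a few lines of shell. The file name here stands in for /.quiz/run, and the real init2 does rather more than this:

```shell
#!/bin/sh
# Sketch of the run-script handoff; "quiz-run-demo" stands in for /.quiz/run.
run=./quiz-run-demo

# Host side: quiz writes the requested command into the fragment before boot.
printf '%s\n' 'cat /proc/version' > "$run"

# Guest side: init2 checks for the fragment and has tini run it (plain sh here).
if [ -f "$run" ]; then
    sh -x "$run"     # -x echoes the command as it runs, as in the session above
else
    exec bash        # no run script: drop into a shell instead
fi
```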

“Profiles” can extend this further. Adding a profile triggers additional setup in quiz on the host, and then more setup inside the VM. The two halves work together to get the environment ready before the user program or the shell is called.

The zfs profile is very simple. In quiz itself, it calls depmod to ensure the various module dep files are properly updated for the latest build (we can’t run it inside the VM, because the modules are in the system layer, and so read-only). Then, from init2, we do modprobe zfs. All this is to make sure that OpenZFS is ready to go when the program/shell starts:

$ ./quiz -p zfs
[quiz] 20240303-16:42:43 including profile: zfs
[quiz] 20240303-16:42:43 starting microvm
[    1.138229] quiz: starting profile init: zfs
[    1.443530] spl: loading out-of-tree module taints kernel.
[    2.649652] zfs: module license 'CDDL' taints kernel.
[    2.649707] Disabling lock debugging due to kernel taint
[    3.589363] ZFS: Loaded module v2.2.99-365_g8f2f6cd2a (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
[    3.590452] quiz: starting user program
[INFO  tini (1)] Spawned child process '/bin/bash' with pid '526'
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
root@quiz:/# lsmod
Module                  Size  Used by
zfs                  6864896  0
spl                   172032  1 zfs

Of course, we need block devices to create pools with, so the memdev profile is useful. This creates some memory-backed block devices, ready to go once the user command runs:

$ ./quiz -p zfs,memdev zpool create tank raidz2 quizm0 quizm1 quizm2 quizm3 quizm4 quizm5 '&&' zpool status
[quiz] 20240303-17:02:20 including profile: zfs
[quiz] 20240303-17:02:20 including profile: memdev
[quiz] 20240303-17:02:20 creating run script
[quiz] 20240303-17:02:20 starting microvm
[    1.061698] quiz: starting profile init: zfs
[    1.271540] spl: loading out-of-tree module taints kernel.
[    2.465834] zfs: module license 'CDDL' taints kernel.
[    2.465903] Disabling lock debugging due to kernel taint
[    3.399311] ZFS: Loaded module v2.2.99-365_g8f2f6cd2a (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
[    3.400475] quiz: starting profile init: memdev
[    3.550705] quiz: memdev: created quizm0
[    3.759659] quiz: memdev: created quizm1
[    3.930080] quiz: memdev: created quizm2
[    4.110221] quiz: memdev: created quizm3
[    4.350369] quiz: memdev: created quizm4
[    4.578329] quiz: memdev: created quizm5
[    4.579062] quiz: starting user program
[INFO  tini (1)] Spawned child process '/bin/sh' with pid '561'
+ zpool create tank raidz2 quizm0 quizm1 quizm2 quizm3 quizm4 quizm5
+ zpool status
  pool: tank
 state: ONLINE
config:

	NAME        STATE     READ WRITE CKSUM
	tank        ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    quizm0  ONLINE       0     0     0
	    quizm1  ONLINE       0     0     0
	    quizm2  ONLINE       0     0     0
	    quizm3  ONLINE       0     0     0
	    quizm4  ONLINE       0     0     0
	    quizm5  ONLINE       0     0     0

errors: No known data errors
[INFO  tini (1)] Main child exited normally (with status '0')

There’s a blockdev variant that creates backing files on the host and then plumbs them through; that’s useful when we need devices that are larger, have higher latency, or similar. I get a ton of mileage from just zfs,memdev though.
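One way a memdev-style profile could make its devices is with the kernel’s brd ramdisk driver. This is a hypothetical sketch of the shape, not necessarily what quiz actually does, and the count, size and names are illustrative:

```shell
# Six 256 MiB RAM-backed block devices via the brd driver (rd_size is in KiB).
modprobe brd rd_nr=6 rd_size=$((256 * 1024))
for i in 0 1 2 3 4 5; do
    ln -s "/dev/ram$i" "/dev/quizm$i"   # friendly names, as in the session above
done
```

Because they’re backed by RAM in the guest, they vanish with the VM - exactly the “without a trace” behaviour we want.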

Change the system 🔗

So we have everything we need to play with our OpenZFS-in-development, but we still need to get our changes in there. In theory, this is simple: OpenZFS is built with Autotools, so all we have to do is give it the system share as an alternate install prefix, and off we go.

Well, that’s the idea anyway. It’s not quite that simple. Technically we’re doing a form of cross-compile, because even though the architecture is the same, the build and run hosts are different. OpenZFS currently can’t be cross-compiled reliably, but at least in this case the build and run environments are similar enough that we can get what we need with a bunch of extra options.

Both configure and make install need adjusting, so there’s a helper script quiz-build-zfs to take care of those.
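The adjustments are roughly this shape. These flags are illustrative rather than quiz-build-zfs verbatim - --with-linux is a real OpenZFS configure option for pointing at an alternate kernel tree, and DESTDIR is the standard Autotools way to redirect an install, here into a share the VM sees:

```shell
# Illustrative only; quiz-build-zfs arranges the real flags and paths.
./autogen.sh
./configure \
    --with-linux="$HOME/quiz/kernel/build" \
    --enable-debug --enable-debuginfo
make -j5
make install DESTDIR="$HOME/quiz/zfs-share"   # land the result in a 9p share
```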

I don’t love this part, to be honest, but doing better will take improvements to OpenZFS’ build system. I have some ideas for tighter integration, though I haven’t thought them through; for the moment this is fine.

Start to finish 🔗

So once the quiz environment is up and running, a typical OpenZFS development session is:

$ ./autogen.sh
$ ~/quiz/quiz-build-zfs configure --enable-debug --enable-debuginfo
[hack hack hack]
$ make -j5
$ ~/quiz/quiz-build-zfs make install
$ ~/quiz/quiz -p zfs,memdev zpool create tank quizm0 ...

It’s a few more steps than I’d like, but it rarely gets in my way, and I add little tweaks and changes as I go. It’s made me hugely productive without needing any special hardware or long boot times, and I’m pretty pleased with it!

I’ve got a ton more ideas. I have the start of a multiarch version working, so that I can run QEMU for a different architecture. This is important because OpenZFS aims to be endian-agnostic, so being able to test on a big-endian architecture is a huge help.

I also want to do a similar setup for FreeBSD, so I can check OpenZFS changes for both major platforms. This is rather more complicated; if cross-compiling for another architecture is a challenge, cross-compiling for another OS entirely is even more exciting. Still, at the end of the day it’s all just code, and it’s just a matter of getting the options right.

Try it yourself! 🔗

If you’re doing OpenZFS or any kind of kernel dev on Linux, then quiz might be just the thing! It’s a bit rough in places, but I’m using it hundreds of times a week and the proof is in the eating: every single commit I have in OpenZFS since the start of last year has seen the inside of a quiz VM at least once! You can get it from my GitHub.

If you do try it, I’d love to hear about your experience, and what you’d like to see from it. I’m not sure how committed I am to making it useful for people that aren’t me, but on the other hand, if it helps to make kernel development accessible to more people, then I’m all for it!

And if you like this or anything else you see from me, consider putting some money on it. Open source is cool, but it’s not cheap!