once more with feeling
I finished implementing the read side of the block cache yesterday, but I haven’t even tried to compile it yet. I really don’t like it. The logic seems to make sense, but it just feels wrong. I’ve learnt to trust that feeling. So I did the unthinkable instead. I actually did a little research into block/buffer caches to see what has come before.
Turns out I was on the right track initially. Tanenbaum’s book described a scheme very similar to what I had originally devised on my own. Basically, we allocate a pile of block structures, which are the raw block data plus flags and other metadata. The blocks are stored in linked lists hanging off a hashtable whose key is the bottom N bits of the block number. There’s also a doubly-linked list of recently used blocks.
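To make that concrete, here’s a rough C sketch of the kind of structures I mean. The names and sizes are just illustrative assumptions, not my actual code:

```c
/* Illustrative sketch only: names, sizes and layout are assumptions. */
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define BLOCK_SIZE   4096
#define HASH_BITS    6
#define HASH_BUCKETS (1 << HASH_BITS)   /* bucket = bottom N bits of the block number */

struct block {
    uint64_t      number;           /* block number on disk */
    uint8_t       data[BLOCK_SIZE]; /* raw block data */
    unsigned      use_count;        /* pinned while a caller is using it */
    bool          dirty;            /* needs writing back before reuse */
    struct block *hash_next;        /* singly-linked hash chain */
    struct block *lru_prev;         /* doubly-linked recently-used list */
    struct block *lru_next;
};

struct block *hash_table[HASH_BUCKETS]; /* chains keyed by the low bits of the number */
struct block *lru_head, *lru_tail;      /* least recently used block at the tail */

static inline unsigned hash_bucket(uint64_t number)
{
    return (unsigned)(number & (HASH_BUCKETS - 1));
}
```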
When a block is asked for, the hash bucket index is extracted from the block number, and the corresponding block list is searched for the block. If it’s found, its use count is incremented and the block is returned. If not, it has to be loaded from disk: a block gets recycled from the recently-used list, and if the data in it is dirty, it gets written out first. Then we read the new data from disk and mark the block as used.
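The lookup path would look something like this, continuing the structures from the sketch above. disk_read(), disk_write() and the two list helpers are hand-waved assumptions rather than real interfaces:

```c
/* Assumed helpers, not real APIs: */
void disk_read(uint64_t number, uint8_t *buf);
void disk_write(uint64_t number, const uint8_t *buf);
struct block *lru_remove_tail(void);   /* pops an unpinned victim off the LRU list */
void hash_remove(struct block *b);     /* unlinks a block from its old hash chain */

struct block *get_block(uint64_t number)
{
    unsigned bucket = hash_bucket(number);

    /* 1. Search the hash chain for a cached copy. */
    for (struct block *b = hash_table[bucket]; b != NULL; b = b->hash_next) {
        if (b->number == number) {
            b->use_count++;            /* cache hit: pin it and hand it back */
            return b;                  /* (a real version would also pull it off the LRU list) */
        }
    }

    /* 2. Miss: recycle the least recently used block, writing it back if dirty. */
    struct block *victim = lru_remove_tail();
    if (victim->dirty)
        disk_write(victim->number, victim->data);
    hash_remove(victim);

    /* 3. Load the requested block from disk and re-index it. */
    disk_read(number, victim->data);
    victim->number    = number;
    victim->dirty     = false;
    victim->use_count = 1;

    victim->hash_next  = hash_table[bucket];
    hash_table[bucket] = victim;
    return victim;
}
```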
Earlier this seemed unworkable to me, but it’s actually fairly standard, at least as a first cut. The important bit is the low-level process running through the block list and regularly pruning blocks and writing things to disk. Fortunately the whole thing is much easier to understand, so hopefully it won’t take me long to write and the code will be readable this time. Initially I’ll implement it for single-block I/O, and later extend it to do multiple blocks where possible. This is also reasonably well established in OS theory - the concept is usually called “read-ahead” or “precaching”.
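The background flushing pass could be as simple as this, reusing the sketch structures above. The scheduling glue that would run it periodically from a low-priority task is left out, and as before this is only an illustration:

```c
/* Periodic flush pass: walk every cached block and write back anything
 * dirty that nobody is currently using. */
void sync_blocks(void)
{
    for (unsigned i = 0; i < HASH_BUCKETS; i++) {
        for (struct block *b = hash_table[i]; b != NULL; b = b->hash_next) {
            if (b->dirty && b->use_count == 0) {
                disk_write(b->number, b->data);
                b->dirty = false;
            }
        }
    }
}
```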
Anyway, that’s all. Back to my relaxing weekend - rebuilding my garage PC and playing Mario Kart :)