a global problem

Christmas and New Year festivities are over, and I enjoyed them thoroughly. I spent some awesome time with both sides of my family, played some cricket and soccer, played some Wii, ate way too much several times, and scored a nice pile of DVDs and t-shirts. In the long drives between various parties and dinners I’ve had a lot of time to ponder a WebKit problem, which I document here :)

WebCore has some functions that arrange for a simple timer to be implemented. Its very basic; there’s three functions: one to set a function to call when the timer goes off, one to set an absolute time that the timer should go off, and one to disable the currently set timer. This simple interface is used by the higher-level Timer class which can be instantiated multiple times. It handles coordinating the current timers and making sure the system timer is requested at the proper interval.

I did a first implementation of this using timer.device directly, but it really didn’t feel right. The interface has no provisions for initialising or finalising the timer, so I hacked it such that the first call would open the timer device if it wasn’t already open. I ignored the finalisation for the time being, and started looking at how to arrange triggering the timer.

We’re back to the old problem that AROS basically does not have any provisions for signals/interrupts that preempt the running process in the process context (actually, task exceptions can, but they’re too low-level for our purposes and don’t work properly under AROS anyway). When timer.device fires, it pushes a message onto the IO request port, which either raises a signal (MP_SIGNAL port) or calls a function directly from the scheduler context (MP_SOFTINT port). There’s also MP_CALL and MP_FASTCALL ports; these are the same as MP_SOFTINT for our purposes.

Having a soft interrupt that calls the timer callback doesn’t work, as it would cause us to do large amounts of work inside the scheduler which is bad for system performance. Having a signal requires the main process to Wait() for that signal and then call the timer callback. The main loop is controlled by the application and by Zune, both things we have no control over.

I confirmed via #webkit that the timer callback is indeed supposed to be called from UI main loop. Studying the MUI docs and the Zune code, it seems that it is possible to have the Zune main loop setup a timer and trigger the callback itself using MUIM_Application_AddInputHandler. This is perfect for our needs, as it removes any need for initialisation and finalisation in the shared timer code itself.

The only thing that has to be arranged then is for the shared code to get hold of the application object to setup the timer. The application object is created and controlled by the application, of course, but there is only ever supposed to be one of them per application, and I can’t think of a good reason why there should ever be more than one. Its easy to get hold of this object from any Zune object inside the application, via the _app() macro, with the slight quirk that its only available when the object is actually attached to the application object. We can detect that well enough though and defer calls into WebKit until we’re attached, so all that remains is to grab the application object, stow a pointer to it in a global variable, and then have the shared timer code use that variable.

This all took me a few hours to work out, and then I happily went off to do Christmas things. Over the next couple of days, the nagging seed of doubt that I had in the beginning grew into some kind of spooky pirahna flower thing. This morning while hanging clothes out to dry I finally understood the issue. Its all to do with how global variables work, and its has much greater implications for this project than just getting hold of the Zune application object.

Lets think about what happens when you load a program into memory. Forgetting about the details of the loader doing relocations, setting up space for variables, etc and the program startup making shared libraries available, effectively you just have the system allocating a chunk of memory, loading the program from disk into that memory, and then running the code within it. Space for and initial values for global variables are all held within that chunk of memory, and only the program code knows where they are and what they’re for. Nothing else on the system can reasonably access them so there’s nothing to worry about.

A shared library is essentially the same as this, except that it is only ever loaded into memory once. When a second program requests it, the systems checks if the library is already in memory, and if it is arranges for the program to use it. This is where things can get complicated. The big chunk of memory contains some things that are sharable because they can be considered read-only - things like program code, const data, and so on. Regular global variables are generally not sharable, as you generally don’t want changes made by one process to be seen by another.

In systems that have a MMU, the usual way that this is dealt with is to make a copy of the global data somewhere else in memory, and then map it into the process address space at the appropriate location. That is, process share the read-only parts of the shared library, but have their own copies of the writable areas. (In practice its quite a bit more complicated, but this is the general idea).

AROS, like AmigaOS before it, has all processes, libraries and anything else coexisting in the same memory space. Shared libraries pretty much don’t use global data. There is no support for MMUs so the kind of copying and remapping descibed above is impossible. If per-process data is required, then various techniques are employed explicitly by the shared library author - per-opener library bases, data access arbitration using semaphores, and so on. That works fine, because the author is fully aware of these limitations when he designs and implements the library.

Its worth noting that this problem is not isolated to AROS, but to every system where a MMU is not available. uClinux has had the same issue in the past and dealt with it in a couple of different ways.

Now lets look at what I’m trying to do. My goal is and has always been to make WebKit a shared library (actually a Zune custom class, though as far as the OS is concerned its the same thing). WebKit and its dependencies all make use of global variables as necessary, and assume that their globals are isolated to a single process, which is a reasonable assumption given that basically every system out there that WebKit currently runs on works this way. For AROS though, this is a huge problem.

The cheap way out is to just ignore the whole mess by producing a static libWebKit.a and requiring any applications to link it statically. This is essentially what I’m doing now. It works well enough, but currently the (non-debug) library weighs in at a touch under 18MB, and thats with barely any AROS-specifics implemented. For every WebKit-using application you have running, thats at least 18MB of duplicated code that you have to hold in memory. There’s also all the usual issues with static linkage: greater disk usage, no ability to upgrade just the library and have all its users get the update, and so on.

The least favourable option would be to rewrite all the parts of WebKit and its dependencies that use global variables and either find a way to remove them or otherwise move them into a per-process context. This is horrendously difficult to do and would pretty much remove any hope of contributing the code back to its upstream sources, which I consider an imperative for this project. So lets say no more about it.

The only other option is to add support to the OS to do the appropriate remapping stuff. This is no small undertaking either, but I think as time goes on, its a very good thing for us to have. I haven’t investigated it in depth, but in addition to actually implementing the stuff in the loader, its also necessary to make some changes to the way modules are held in memory and shared between users.

Currently a module can exist in memory and be used as-is by multiple users without too much effort. Because there’s no global data, sharing a module is as simple as incrementing a use count, so that the module isn’t purged from memory ahead of time.

When sharing an object with global data, in the absence of a MMU, its necessary to allocate new global data for each opener and do its relocations each time. This requires keeping a record of the required relocations. There’s also the issue of constructing the global offset table and the procedure linkage tables, and making sure the pointer to the GOT is carried around the application appropriately. Work that will be usefel here is Staf Verhaegen’s current project on library bases and preserving the %ebx register. Of course this will all have to integrate nicely with that.

Then there’s also the matter of detecting when to use all this new stuff over the standard loading and linking code. I think I can make that as simple as requiring all code to be shared in this way be position-independent (ie compiled with -fPIC). Code compiled in this way is incompatible with the standard load method anyway, and for this type of shared object its far simpler to implement this whole mess if PIC is enabled. If it is, then detecting which type to use should be as simple as looking for the presence of the .got section in the object.

Thats about as far as my thinking on the matter has come. The shared timer stuff that originally provoked all this is working happily, but if WebKit is ever to be a shared object on AROS, all this will need to be revisited. Because its such a huge undertaking I’m going to leave it until after the WebKit and Traveller are in some kind of usable state. At that time I’ll look at handing off care of the web browser to someone else for a little while and work on this stuff instead.