when good libraries go bad

So after the compiler shenanigans of last week, I finally managed to write some actual code on Friday. I started with just calls to SDL_Init() and SDL_Quit(), but the compile blew up in my face. The problem was that I was linking with -lSDL, which would have been fine except that AROS has its own libSDL for SDL apps running inside AROS. The linker found that one first, which is not at all what I wanted. Even if it had found the right one, I suspect we’d then be looking at namespace clashes for anyone wanting to run an SDL app inside AROS.

After a bit of thought, it seemed to me that the only way out was to not link against the system libSDL at all, but instead to load it at runtime using dlopen() and friends. This can work, but it isn’t without its problems, as loading a library is not the same as linking against it.

When you write code, you call lots of functions that live somewhere other than in your .c file. When you compile your .c file, the compiler leaves placeholders for all of those functions in the resulting .o file. Linking is the process of pulling in a pile of objects (.o), object archives (.a, also known as static libraries) and shared libraries (.so), and updating all the placeholders to point to the right bits of code.

When you link with a shared library, the linker replaces the function placeholders with stubs that refer to a library file that exists somewhere on disk. When you run the program, the runtime linker (ld.so on Linux) looks through it, finds all the stubs, loads all the needed libraries and then fills in all the pieces to make a fully working program.

The idea is simple. By not having to carry a full copy of every required library with every program, program binaries are smaller and so use less disk space. Additionally, it’s possible for the runtime linker to keep only a single copy of a shared library in memory and point all programs at it, so you save memory when there are lots of programs running. The downside to the whole mess is the increased complexity of linking, the need for the runtime linker to find all the pieces (via /etc/ld.so.conf, LD_LIBRARY_PATH and ld’s -rpath option), the fact that programs can’t be copied around as easily because they depend on their libraries being present, and so on. You don’t notice this most of the time because we have smart tools to take care of all this stuff.

So back to AROS. dlopen() is not a linker. It merely opens a shared library and allows you to get at pointers inside it. You can obtain a pointer to a function, and then use that pointer to call the function inside the library. So this is possible:

    void *handle = dlopen("libSDL.so", RTLD_NOW | RTLD_LOCAL);
    void *fn = dlsym(handle, "SDL_Init");

The problem here is that a shared library carries no prototypes: dlsym() just hands back an address, so we have no idea how to pass arguments to the function. We could build the stack by hand (assuming we knew the arguments), but then we’d lose the benefit of the compiler doing type and prototype checking.
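The usual way around this is to give the address a function pointer type carrying the prototype yourself, so the compiler can check calls again. Here is a minimal sketch, assuming a POSIX system with <dlfcn.h> (older glibc also needs -ldl at link time). The helper name call_int_fn is hypothetical, and libc’s abs() stands in for an SDL function so the sketch runs without SDL installed; passing NULL as the library name opens the running program itself, whose global scope includes libc. For SDL_Init the pointer type would instead be int (*)(Uint32).

```c
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* The prototype we expect the symbol to have. dlsym() cannot check this,
 * so getting it wrong is undefined behaviour rather than a compile error. */
typedef int (*int_fn)(int);

/* Look up an int(int) function by name and call it with arg.
 * lib == NULL means "the running program and everything already loaded".
 * Returns 0 on success, -1 if the library or symbol can't be found. */
int call_int_fn(const char *lib, const char *name, int arg, int *result)
{
    void *handle = dlopen(lib, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    dlerror(); /* clear any stale error so the check below is meaningful */
    void *sym = dlsym(handle, name);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym: %s\n", err);
        dlclose(handle);
        return -1;
    }

    /* ISO C has no void * -> function pointer conversion; POSIX requires
     * the two to share a representation, so copying the bits is safe. */
    int_fn fn;
    memcpy(&fn, &sym, sizeof fn);

    *result = fn(arg); /* the argument is now checked against int_fn */
    dlclose(handle);
    return 0;
}
```

For example, call_int_fn(NULL, "abs", -5, &r) stores 5 in r, while an unknown symbol name makes it return -1.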

The normal home for prototypes is the header files that come with the library. The problem here is that they declare the functions as real “first-class” functions. If we called them, the compiler would leave a placeholder for each one, which would never get resolved because we never link with -lSDL. That’s a build failure. Still, we clearly need the headers, as they have all the prototype information, as well as other things we’ll need, like structure definitions.

Another problem we have is that we’re going to need many, many functions from this library. libSDL has almost 200 functions. While we won’t need all of them, we can expect to need a fair few, so we need a prototype and a call to dlsym() for each one.

All this really has to be brute-forced. The method is to create a giant struct with room for many, many function pointers, and then, for each wanted function, call dlsym() and store the result in the struct. The pointers can be declared with the same names as the first-class functions (struct members live in their own namespace, so they don’t clash with the header’s declarations), and each gets a full prototype. An example is SDL_SetVideoMode, which has the prototype:

    SDL_Surface * SDL_SetVideoMode (int width, int height, int bpp, Uint32 flags);

We can create storage for a function pointer with the same prototype like so:

    SDL_Surface * (*SDL_SetVideoMode) (int width, int height, int bpp, Uint32 flags);
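To check the namespace claim, here is a small self-contained sketch. The names hypothetical_init and hypothetical_quit are made up and stand in for header declarations such as SDL_Init and SDL_Quit. Because nothing defines, calls or takes the address of the declared functions, the file links without them. The second member uses __typeof__, a GCC/Clang extension (not necessarily what soruntime does), to copy the prototype straight from the declaration instead of retyping it.

```c
#include <stdint.h>

/* Declarations as they might appear in a library header (hypothetical
 * names). Nothing here defines, calls or takes the address of them, so
 * the object file is left with no unresolved references. */
int hypothetical_init(uint32_t flags);
void hypothetical_quit(void);

/* Struct members have their own namespace, so they can reuse the
 * function names without clashing with the declarations above. */
struct lib_funcs {
    int (*hypothetical_init)(uint32_t flags);         /* prototype written by hand */
    __typeof__(hypothetical_quit) *hypothetical_quit; /* GCC/Clang: copied from the declaration */
};

/* Local stand-ins so the sketch runs without a real library behind it. */
static int live = 0;
static int fake_init(uint32_t flags) { live = 1; return flags == 0 ? 0 : -1; }
static void fake_quit(void) { live = 0; }

/* Returns 1 if calls through the struct reach the stand-ins as expected. */
int demo_lib_funcs(void)
{
    struct lib_funcs f = { fake_init, fake_quit }; /* compiler checks both types */
    int rc = f.hypothetical_init(0);
    int was_live = live;
    f.hypothetical_quit();
    return rc == 0 && was_live == 1 && live == 0;
}
```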

Once we have a struct with all the function pointers declared and initialised, we’d call a function through it like so:

    struct sdl_funcs *funcs = <allocate and initialise>;
    funcs->SDL_SetVideoMode(640, 480, 16, 0);

The “allocate and initialise” portion of that is a loop that runs through all the function names (stored in a big array), calls dlsym() on each and stows the returned pointer in the struct.
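One way that loop might look is sketched below. The struct, table and function names (demo_funcs, demo_syms, demo_funcs_open, demo_funcs_close) are hypothetical, the caller supplies the storage rather than the code allocating it, and a few libc functions (abs, labs, atoi) stand in for SDL ones so it runs anywhere; passing NULL to dlopen() opens the running program, which can see libc. Each table entry pairs a symbol name with the offset of its pointer in the struct, so a single loop fills every member.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <string.h>

/* Function table: one prototyped pointer per wanted function. */
struct demo_funcs {
    void *handle;
    int (*abs)(int j);
    long (*labs)(long j);
    int (*atoi)(const char *nptr);
};

/* Name of each wanted symbol and where its pointer lives in the struct. */
static const struct {
    const char *name;
    size_t offset;
} demo_syms[] = {
    { "abs",  offsetof(struct demo_funcs, abs)  },
    { "labs", offsetof(struct demo_funcs, labs) },
    { "atoi", offsetof(struct demo_funcs, atoi) },
};

/* Open the library and fill every pointer. Returns 0 on success, -1 on
 * failure (in which case the struct is left zeroed). */
int demo_funcs_open(struct demo_funcs *f, const char *lib)
{
    memset(f, 0, sizeof *f);
    f->handle = dlopen(lib, RTLD_NOW | RTLD_LOCAL);
    if (f->handle == NULL)
        return -1;

    for (size_t i = 0; i < sizeof demo_syms / sizeof demo_syms[0]; i++) {
        void *sym = dlsym(f->handle, demo_syms[i].name);
        if (sym == NULL) { /* missing symbol: undo everything and fail */
            dlclose(f->handle);
            memset(f, 0, sizeof *f);
            return -1;
        }
        /* POSIX guarantees void * and function pointers share a
         * representation, so the bits can be copied into the member. */
        memcpy((char *)f + demo_syms[i].offset, &sym, sizeof sym);
    }
    return 0;
}

/* Release the library and clear all the pointers. */
void demo_funcs_close(struct demo_funcs *f)
{
    if (f->handle != NULL)
        dlclose(f->handle);
    memset(f, 0, sizeof *f);
}
```

After a successful demo_funcs_open(&f, NULL), a call such as f.atoi("42") goes through the looked-up pointer with full prototype checking. Failing the whole open on the first missing symbol is one design choice; a loader could instead leave optional entries NULL.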

All this is heaps of setup, but it works very well. To help with it, I’ve written a script called soruntime. It takes a shared library and one or more header files as input. It scans the library (using nm), extracts the names of all the functions that the library provides, and then expands the headers (using cpp -E), looking for prototypes for those functions. Once it finds them, it outputs a header file with the library struct (i.e. all the prototypes) and a code file with functions to set up and tear down a library.
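soruntime itself is a script and isn’t reproduced here. As a rough illustration of its first step only, this hypothetical C helper (nm_defined_function is a made-up name) picks exported function names out of `nm -D` output, assuming the common three-column layout of address, type letter and name, where defined global functions are marked `T` and undefined symbols have no address.

```c
#include <stdio.h>
#include <string.h>

/* Parse one line of "nm -D" output, e.g.
 *   "000000000001a2b0 T SDL_Init"   defined function in the text section
 *   "                 U malloc"     undefined: provided by another library
 * Copies the symbol name into name and returns 1 only for defined, global
 * text ('T') symbols; returns 0 for everything else. */
int nm_defined_function(const char *line, char *name, size_t name_len)
{
    char addr[64], sym[256];
    char type;

    if (sscanf(line, "%63s %c %255s", addr, &type, sym) != 3)
        return 0;
    /* Undefined symbols have no address column, so the first field would
     * be the type letter rather than a hex address. */
    if (strspn(addr, "0123456789abcdefABCDEF") != strlen(addr))
        return 0;
    if (type != 'T')
        return 0;
    if (strlen(sym) >= name_len) /* not enough room for the name */
        return 0;
    strcpy(name, sym);
    return 1;
}
```

Feeding each line of `nm -D libSDL.so` through a filter like this would yield the list of names to look for in the cpp -E output; weak ('W') symbols and versioned names are deliberately ignored in this sketch.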

I’m currently integrating this into my source tree for the SDL HIDD. It could (and probably will) be extended to the X11 HIDD as well, which will provide some uniformity and mean that if we ever do get an X server ported to AROS, there will be no clashes.

Another thought. With a HIDD that lets an AROS program or driver ask the host to load a shared library and provide access to it, the graphics HIDDs would no longer have to be compiled into the kernel and could instead be standard pieces “inside” AROS. If the UnixIO HIDD were extended to provide better file access features, the other HIDDs (parallel, serial, and the emul filesystem handler) could be modified to use it and thus also be moved into AROS-space. This would give a tight kernel with basically no dependencies. I’ve started stubbing out a hostlib.hidd which will expose dlopen() and friends to AROS for just this purpose.