i'm doing it wrong

On the way home today I started poking at the bitmap class, looking for fallback methods that were calling PutImage() or PutPixel() multiple times, triggering multiple flushes and slowing things down. I found that most images are drawn by a function called PutAlphaImage() which, as the name suggests, blits a rectangle to the bitmap, taking the alpha channel in the source data into account. The fallback method does its usual line-at-a-time operation, applying the alpha itself. As usual, it works, but it's slow.

The one thing I didn’t want to do was copy the superclass code wholesale, just changing the PutImage() and GetImage() calls to direct pixelbuffer access. It would work, but that kind of code duplication is really bothering me (and I’m considering a more permanent fix for that problem, which I’ll write about later). So I started to read through the SDL documentation to find out what it could do with blits and alpha things.

The way I ended up implementing it was to create an SDL surface out of the source pixelbuffer passed to PutAlphaImage(), using SDL_CreateRGBSurfaceFrom(). This function is pretty simple - you pass a pointer to the raw memory data, the dimensions of the data, and the RGB and alpha masks, and you get a surface back. For PutAlphaImage(), the masks are fixed, so they can be hardcoded. Once the surface is obtained, it can be blitted to the target surface using SDL_BlitSurface(), and then discarded. Creating a surface from existing data is an exceptionally lightweight operation, as the original data is used directly - no copying is done. Freeing the surface leaves the original data intact, so really it's just allocating and destroying an SDL_Surface.
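For the curious, here's roughly what that looks like (a sketch against the SDL 1.2 API; the function name, its arguments and the ARGB mask values are illustrative assumptions, not the actual driver code):

```c
#include <SDL.h>

/* Sketch: blit a 32-bit ARGB pixel buffer onto a target surface, letting
 * SDL apply the per-pixel alpha. The ARGB masks here are an assumption;
 * the real PutAlphaImage() hardcodes whatever layout it's defined to take. */
static void blit_alpha_image(SDL_Surface *target, void *pixels,
                             int width, int height, int pitch,
                             int dst_x, int dst_y)
{
    /* Wrap the existing memory in an SDL_Surface -- no pixel copy happens here */
    SDL_Surface *src = SDL_CreateRGBSurfaceFrom(pixels, width, height, 32, pitch,
                                                0x00ff0000,   /* R */
                                                0x0000ff00,   /* G */
                                                0x000000ff,   /* B */
                                                0xff000000);  /* A */
    if (src == NULL)
        return;

    /* Make sure the per-pixel alpha is honoured during the blit */
    SDL_SetAlpha(src, SDL_SRCALPHA, SDL_ALPHA_OPAQUE);

    SDL_Rect dst_rect = { (Sint16)dst_x, (Sint16)dst_y, 0, 0 };
    SDL_BlitSurface(src, NULL, target, &dst_rect);

    /* Frees only the surface header; the caller's pixel data is untouched */
    SDL_FreeSurface(src);
}
```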

By letting SDL do the copy, you get all the benefits of the SDL video engine, which at its core is a hand-optimised blitter, with special SSE2/Altivec/etc versions that get used where appropriate. Basically, it's faster than any code I'm ever going to write, and it shows - icons and decorations, the two big users of PutAlphaImage(), now appear instantly.

So I committed that and went looking for more speedups. I noticed that windows were drawing a little more slowly than I liked. When a window appears, the window outline is drawn first, then the theme pixmaps, scrollbars, etc blitted over the top. The outline draws a line at a time (which you can see with debugging on), while the pixmaps go fast thanks to the above changes. I traced this code and, as expected, found multiple calls to PutImage(), but this time they were coming from... PutImage() itself.

This threw me for a moment until I looked at my PutImage() implementation. Currently it does what most of the other drivers do: it checks the pixel format of the data to be blitted, and if it's Native (the same format as the target surface) or Native32 (the same format as the target surface, but with every pixel in 32 bits, so they need to be “compressed” as they're copied to surfaces with lower depths), it copies the data across directly. Anything else gets referred to the line-at-a-time superclass method, which does the appropriate format conversion. This is what was happening in this case.
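In rough shape the dispatch looks something like this (a simplified sketch, not the driver's real code - the format names, the 16-bit target depth and the fallback hook are all stand-ins for the graphics.hidd plumbing):

```c
#include <string.h>
#include <stdint.h>

/* Hypothetical stand-ins for the standard pixel formats the real dispatch keys on */
enum std_pixfmt { STDPF_NATIVE, STDPF_NATIVE32, STDPF_OTHER };

/* Simplified shape of the current PutImage() dispatch. Native data is copied
 * straight across; Native32 is squeezed down to the surface depth as it goes
 * (a 16bpp RGB565 target is assumed purely for illustration); anything else
 * is handed to the slow line-at-a-time superclass fallback. */
static void put_image(uint8_t *dst, int dst_pitch, int dst_bpp,
                      const uint8_t *src, int src_pitch,
                      int width, int height, enum std_pixfmt fmt,
                      void (*superclass_fallback)(void))
{
    if (fmt == STDPF_NATIVE) {
        /* Same layout as the target surface: one memcpy per row */
        for (int y = 0; y < height; y++)
            memcpy(dst + y * dst_pitch, src + y * src_pitch, width * dst_bpp);
    } else if (fmt == STDPF_NATIVE32) {
        /* Same colours, but one pixel per 32-bit word: compress each pixel
         * to the target depth while copying (here: xRGB8888 -> RGB565) */
        for (int y = 0; y < height; y++) {
            const uint32_t *s = (const uint32_t *)(src + y * src_pitch);
            uint16_t *d = (uint16_t *)(dst + y * dst_pitch);
            for (int x = 0; x < width; x++) {
                uint32_t p = s[x];
                d[x] = (uint16_t)(((p >> 8) & 0xf800) |
                                  ((p >> 5) & 0x07e0) |
                                  ((p >> 3) & 0x001f));
            }
        }
    } else {
        /* Everything else: the line-at-a-time superclass method, which
         * converts the format itself -- this is the slow path being hit */
        superclass_fallback();
    }
}
```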

My great revelation was that SDL does pixel format conversion natively when blitting, and it's almost certainly going to be faster than anything graphics.hidd can do, even without the line-at-a-time overhead. All I have to do is supply the appropriate masks for the pixel formats, which are easily obtained from the bitmap object's PixFmt attribute.
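Something like this, in other words (again a sketch rather than the real code - the pixfmt_desc struct is a made-up stand-in for whatever the PixFmt attribute actually exposes):

```c
#include <SDL.h>

/* Hypothetical mirror of the fields the bitmap's PixFmt attribute provides */
struct pixfmt_desc {
    int    depth;                       /* bits per pixel */
    Uint32 rmask, gmask, bmask, amask;  /* channel masks */
};

/* Sketch: wrap the source data with the masks taken from its pixel format
 * description and let SDL's blitter convert it to the target format. */
static int put_image_any_format(SDL_Surface *target, void *pixels,
                                int width, int height, int pitch,
                                const struct pixfmt_desc *pf,
                                int dst_x, int dst_y)
{
    SDL_Surface *src = SDL_CreateRGBSurfaceFrom(pixels, width, height,
                                                pf->depth, pitch,
                                                pf->rmask, pf->gmask,
                                                pf->bmask, pf->amask);
    if (src == NULL)
        return -1;

    /* SDL converts between pixel formats as part of the blit, so this
     * replaces the line-at-a-time superclass conversion */
    SDL_Rect dst_rect = { (Sint16)dst_x, (Sint16)dst_y, 0, 0 };
    SDL_BlitSurface(src, NULL, target, &dst_rect);

    SDL_FreeSurface(src);
    return 0;
}
```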

Time to stop waffling and write some code :)