A code generation bug

It took about six hours, but I finally found the problem that is making Windows builds of Pioneer unusable: it’s a code generation bug in GCC/MinGW 4.7.0. Unfortunately I’ve been unable to reduce it to a test case, so I’m left having to ask gcc-users to use Pioneer itself to find the problem. I’ll try to find time to write it up soon (which of course means it won’t get done). In the meantime, I’m downgrading MXE to GCC 4.6.3 to get the cross-builds working again.

The bad call is at GeoSphere.h:67: m_terrain->GetColor(...). m_terrain is correctly set to an instanced Terrain object, but once we arrive in GetColor(), this points to the object’s vtable (so m_terrain->Terrain._vtbl, if you speak GDB). It doesn’t crash because GetColor() only reads the (now random) data in m_fracdef, but that data is far enough out of reasonable bounds that each call takes seconds to return. As a result, generating the initial patches on game start takes several minutes, even for a simple system (e.g. Lave).

I had to drop down to assembly to see what was going on. From what I can tell, the caller is assuming the old GCC calling convention for member functions (with this pushed onto the stack) for the call to GetColor(), but as of GCC 4.7.0 MinGW has switched to the MSVC __thiscall convention (with this passed in ECX) by default, which is what GetColor() is expecting. That’s not the whole story though, because the caller still correctly expects GetColor() to clean up the stack before returning (which is why it didn’t just blow up). So it looks like some bizarre edge case causes GCC to choose the wrong calling convention when setting up the call, but not for the cleanup.

Unfortunately it’s been impossible to reproduce. This unlucky scenario involves virtual inheritance, templates, a static library link, and references and doubles as arguments. That’s a lot of variables, and I’ve tried several combinations in tests to reduce it for a bug report, but I haven’t been able to make it happen.
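For anyone who fancies having a go at a reduction, here is a rough sketch of the kind of single-file test that combines the ingredients above. It is a guess at the shape of the problem, not Pioneer’s code: every type here (Terrain, TerrainBase, TerrainImpl, FracDef, SimpleFractal, Sphere) is a hypothetical stand-in, and the static library link can’t be expressed in one file (you would build TerrainImpl into a .a and link it into the caller). The idea is that each Terrain records its own address at construction, so GetColor() can check whether the `this` it actually received matches.

```cpp
#include <cassert>

// Hypothetical stand-ins only: none of these types are Pioneer's real classes.

// Counts calls where GetColor() received an unexpected `this` pointer.
int g_badThisCount = 0;

// Parameter block read by GetColor(), loosely analogous to m_fracdef.
struct FracDef {
    double amplitude = 1.0;
    double frequency = 2.0;
};

// Virtual base class, to bring virtual inheritance into the mix.
struct TerrainBase {
    virtual ~TerrainBase() {}
};

struct Terrain : virtual public TerrainBase {
    FracDef m_fracdef;
    // Address of this Terrain subobject, captured at construction time.
    const void *m_self = this;

    // Virtual call taking a reference and doubles as arguments.
    virtual double GetColor(const double &p, double height, double width) const = 0;
};

// Template layer, standing in for a terrain specialised on its colour fractal.
template <typename ColorFractal>
struct TerrainImpl : public Terrain {
    double GetColor(const double &p, double height, double width) const override {
        // Compare the Terrain subobject address we were called with against
        // the one recorded in the constructor. A caller using the wrong
        // calling convention would hand us some other pointer here.
        const Terrain *self = this;
        if (static_cast<const void *>(self) != m_self) {
            ++g_badThisCount;
            return 0.0;
        }
        return ColorFractal::Apply(m_fracdef, p, height, width);
    }
};

// Trivial colour function so the call has some real work to do.
struct SimpleFractal {
    static double Apply(const FracDef &def, const double &p, double height, double width) {
        return def.amplitude * (p + height * def.frequency) / width;
    }
};

// Caller that only sees the base-class pointer, like GeoSphere's m_terrain.
struct Sphere {
    const Terrain *m_terrain;
    double Shade(double p) const { return m_terrain->GetColor(p, 0.5, 4.0); }
};

// Returns true if GetColor() saw the expected `this` and produced the expected value.
bool this_pointer_matches() {
    g_badThisCount = 0;
    TerrainImpl<SimpleFractal> terrain;
    Sphere sphere{&terrain};
    double color = sphere.Shade(1.0);  // 1.0 * (1.0 + 0.5 * 2.0) / 4.0 == 0.5
    return g_badThisCount == 0 && color == 0.5;
}
```

With a well-behaved compiler this_pointer_matches() returns true. The interesting experiment would be building it with 32-bit MinGW GCC 4.7.0, splitting the pieces across a static library and the caller, and seeing whether it ever returns false.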

For now I’m sitting on it until I can figure out what to do with it. I’ve reverted the build environment to GCC 4.6.3, which has the old, working calling convention, and it’s all good for now. Hopefully someone somewhere bumps into it and it gets fixed all by itself!