Broken wire protocol
Always interesting to find weird broken shit in wire protocols. This is from a hopscotch debugging session:
$ curl -I http://www.ibm.com/social-business/us/en/newway/images/email/ipad_image.jpg
HTTP/1.1 200 OK
...
epKe-Alive: timeout=10, max=76
All other headers are correct.
So I started poking around a bit:
$ curl -I http://www.ibm.com/social-business/us/en/
HTTP/1.1 200 OK
...
Kp-eeAlive: timeout=10, max=81
$ curl -I http://www.ibm.com/
HTTP/1.1 302 Moved Temporarily
Server: AkamaiGHost
Dunno where the problem is. It doesn’t actually affect me. It did make me smile, though that might just be memories of working with Lotus Domino and its myriad protocol brokenness. Maybe it’s all IBM software? ;)
Update 2025-04-03 (yes, really, 11+ years later)
I received a lovely email from Mordy Ovits, who has lately been brave enough to tackle my entire back catalogue of blog posts. They were kind enough to tell me exactly what I was seeing here. Here is their explanation, lightly adapted with their permission:
It’s middleware boxes, typically load balancers, disabling HTTP headers that interfere with their work. The first clue is that the “brokenness” is only in HTTP headers (e.g. Keep-Alive, Connection) relating to connection management, a LB’s bailiwick. When the LB sees an HTTP header it doesn’t like, it disables the header by breaking it.
They go on:
The LBs in question are packet devices, mostly working on individual datagrams. They’re a little smarter than just packet-by-packet, but they are decidedly not working on a fully decoded TCP stream that can be manipulated at will and then written to another TCP socket towards the client of the LB.
The actual optimization being done is in the second clue: the precise way the headers are mangled. In your post you noticed that the Keep-Alive header was mangled two different ways:
epKe-Alive: timeout=10, max=76
Kp-eeAlive: timeout=10, max=81
It’s mangled by swapping two adjacent 16-bit words. Which 16-bit words get swapped depends on alignment, so you saw two slightly different swaps, but they’re both 16-bit swaps. LBs do it this way, instead of just overwriting the data or removing it, because it means they do not need to recalculate the TCP checksum in the packet! The IP and TCP checksums work by summing the 16-bit words. Since the order in which the words are added has no effect on the result (commutative property), swapping 16-bit words does not invalidate the TCP checksum like naive overwriting or deletion would. Less per-packet work for the LB to do.
That’s actually pretty clever! I’ve seen (and used) “no-op” bit flips, overwrites, etc to change behaviour when the length of the data can’t be changed, but I don’t think I’d ever considered it in relation to checksums.
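A minimal Python sketch makes this easy to check. It is not from the email and it is not how any particular load balancer is implemented. It assumes an RFC 1071-style checksum computed over the payload bytes only (a real TCP checksum also covers the TCP header and a pseudo-header). The `X-Pad` prefixes are hypothetical, there only so the header starts at an even or an odd offset, and the helper names are made up for illustration.

```python
def internet_checksum(data: bytes) -> int:
    """Ones' complement of the ones' complement sum of big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def swap_adjacent_words(data: bytes, offset: int) -> bytes:
    """Swap the two adjacent 16-bit words that start at `offset`."""
    first = data[offset:offset + 2]
    second = data[offset + 2:offset + 4]
    return data[:offset] + second + first + data[offset + 4:]


def mangle_aligned(prefix: bytes, header: bytes) -> bytes:
    """Swap the first two 16-bit-aligned words lying wholly inside the header."""
    start = len(prefix)
    offset = start + (start % 2)  # round up to the next even offset
    return swap_adjacent_words(prefix + header, offset)


HEADER = b"Keep-Alive: timeout=10, max=76\r\n"

# Hypothetical preceding bytes, chosen only to control the header's alignment.
EVEN_PREFIX = b"X-Pad: 1\r\n"   # 10 bytes: header starts on a word boundary
ODD_PREFIX = b"X-Pad: 12\r\n"   # 11 bytes: header starts mid-word

for prefix in (EVEN_PREFIX, ODD_PREFIX):
    original = prefix + HEADER
    mangled = mangle_aligned(prefix, HEADER)
    # Same words, different order: the checksum does not change.
    assert internet_checksum(mangled) == internet_checksum(original)
    print(mangled[len(prefix):].decode().strip())
# prints:
# epKe-Alive: timeout=10, max=76
# Kp-eeAlive: timeout=10, max=76

# A naive overwrite of the same four bytes does change the checksum.
overwritten = EVEN_PREFIX + b"XXXX" + HEADER[4:]
assert internet_checksum(overwritten) != internet_checksum(EVEN_PREFIX + HEADER)
```

With the header on a word boundary the swap lands on "Ke" and "ep", and one byte later it lands on "ee" and "p-", which reproduces both of the manglings I saw.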
Thank you, Mordy!