don't you die on me!

I use IRC a lot, both for work and personal stuff. I use bip, an IRC proxy, to keep me in my channels all the time and log stuff so that I never miss a thing. I run it on my home server and connect to it from XChat from work, home or wherever else I happen to be. It works well.

I also use IRC from my phone, using AndChat. I connect directly to the networks and channels I’m interested in with that. It works very nicely and lets me keep track of things as I move around, which happens a lot. Unfortunately it’s at the mercy of the madness that is mobile connectivity, but that’s hardly its fault.

Lately though, I’ve had a problem. AndChat has been unable to hold a connection to Freenode. It will connect fine, but then after a little while if I go to send something, I find the connection has actually dropped in the background. AndChat dutifully reconnects, but by that time I’ve lost any conversation that was happening. It also means that the other people in the channel see lots of connects and disconnects from me. It’s fairly normal for IRC, but it looks messy and I’m not keen on that.

The thing that I found curious through all of this was that my connection to work’s IRC server never dropped. So it’s likely not AndChat at fault, but something lower down. I have been upgrading the Android version on my phone quite a bit, trying to find the “best” community version of ICS for it. It’s likely there’s a change there.

After a lot of searching and piecing things together, the conclusion I’ve come to is that this particular build of Android (at least, maybe all 4.0.4 builds) doesn’t send TCP keepalives as often as previous versions did. Whatever interval is set is longer than the connection idle timeout set by my service provider. That is not a problem for work’s IRC server as it sends keepalives far more regularly. Freenode however does not seem to send any at all.

I pointed the phone at the bip proxy for both services, and both connections now drop. This appears to confirm my suspicions, and unfortunately also shows that bip doesn’t send keepalives. Happily it’s open-source, so I can fix it. Into the code we go!

The way to enable keepalives on a socket is quite simple:

int on = 1;
setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));

Keepalives have three parameters: the time the connection has to be idle before keepalives start, the interval between keepalives, and the number of keepalives sent without response before the connection is declared invalid. These parameters are defined at the OS level for all sockets, and on Linux default to 7200 seconds, 75 seconds and 9 probes. That’s right, two hours idle before starting keepalives. Not at all suitable for what we need.
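
To put those numbers in perspective, here is a rough sketch of the arithmetic in C. The function name keepalive_detect_seconds is made up for illustration, and the formula (the idle time plus one interval per unanswered probe) is an approximation of the kernel's behaviour, not an exact model of it.

```c
/* Hypothetical helper: roughly how many seconds pass between the last
 * traffic on a connection and the kernel declaring the peer dead.
 * Assumes the first probe goes out after `idle` seconds and each of the
 * `count` unanswered probes costs one further `interval`. */
long keepalive_detect_seconds(long idle, long interval, long count)
{
    return idle + interval * count;
}
```

With the Linux defaults that comes to 7200 + 75 * 9 = 7875 seconds, or a bit over two hours and eleven minutes before a dead peer is noticed.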

There’s no standard interface for changing these parameters on a per-socket basis, but Linux has helpfully provided its own socket options to allow Linux applications to do this. I’m hardly concerned with portability for this hack, so these are exactly what we need:

int idle = 60;
int interval = 60;
int count = 5;

setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval));
setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count));

That is, start sending keepalives after one minute idle, send every minute after that, and five missed responses mean the connection is dead. These seemed like reasonable numbers. I don’t want to ping too often as each ping makes the phone do work and thus use a little bit of battery. This seems to be working very well though!
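
Putting the two snippets together, a minimal sketch of a helper might look like the following. The name enable_keepalive and its signature are invented for illustration and are not taken from bip's source; the TCP_KEEP* options are Linux-specific, so this won't build as-is elsewhere.

```c
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT (Linux) */
#include <sys/socket.h>   /* setsockopt, SOL_SOCKET, SO_KEEPALIVE */

/* Hypothetical helper, not part of bip: turn keepalives on for socket `s`
 * and override the three system-wide defaults for this socket only.
 * Returns 0 on success, or -1 if any setsockopt call fails (errno is
 * left as set by the failing call). */
int enable_keepalive(int s, int idle, int interval, int count)
{
    int on = 1;

    if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
        return -1;
    if (setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval)) < 0)
        return -1;
    if (setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) < 0)
        return -1;

    return 0;
}
```

Calling enable_keepalive(s, 60, 60, 5) on a connected socket applies the numbers above. A return of -1 means one of the setsockopt calls failed, and errno says why.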