A crash course in SASL and DIGEST-MD5 for XMPP

Introduction

XMPP requires the use of the SASL DIGEST-MD5 mechanism to authenticate clients. The RFC itself is difficult to follow in places; however, the functionality that a client must implement in order to authenticate to a DIGEST-MD5-aware server is minimal.

It is the goal of this document to show the basics of SASL and DIGEST-MD5. Hopefully this will be enough to get SASL newcomers up and running.

Note that this document is not intended to be any kind of authoritative documentation on XMPP, SASL or DIGEST-MD5. If you’re in doubt, read the relevant specs. This document assumes that you’ll never want to use the channel integrity and encryption features that DIGEST-MD5 provides.

Utility code

Clients need a few utility functions available:

MD5 hashing function
Base64 encoder
Base64 decoder
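
As a rough illustration, these utilities are available in the Python standard library. The wrapper names below (md5_raw, md5_hex, b64_encode, b64_decode) are invented for this sketch and are not part of any XMPP library.

```python
import base64
import hashlib


def md5_raw(data: bytes) -> bytes:
    """Return the 16-octet MD5 digest of data."""
    return hashlib.md5(data).digest()


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as 32 lower-case hex digits."""
    return hashlib.md5(data).hexdigest()


def b64_encode(data: bytes) -> str:
    """Base64-encode data and return it as an ASCII string."""
    return base64.b64encode(data).decode("ascii")


def b64_decode(text: str) -> bytes:
    """Base64-decode text, ignoring any whitespace around or inside it."""
    # Element content may be wrapped or indented, so drop whitespace first.
    return base64.b64decode("".join(text.split()))
```

Both digest forms are needed later: the raw 16 octets feed into another hash, while the hex form is what ends up in the response.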

Startup

After the XML stream has been set up, the server will send a list of stream features that it supports. In amongst this will be a list of SASL mechanisms that are supported:

<stream:features>
  <mechanisms xmlns='urn:ietf:params:xml:ns:xmpp-sasl'>
    <mechanism>DIGEST-MD5</mechanism>
    <mechanism>PLAIN</mechanism>
    <mechanism>KERBEROS_V4</mechanism>
  </mechanisms>
</stream:features>

If DIGEST-MD5 is not listed, the rest of this document will be of little use. However, it’s reasonable to assume that most servers will offer DIGEST-MD5, since XMPP requires servers to implement it, and it has no known security holes.
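
A client might check the offered mechanisms along these lines. This is only a sketch: it assumes the mechanisms element has already been pulled out of the stream as a standalone string (a real client would normally use a streaming parser), and the function name offered_mechanisms is hypothetical.

```python
import xml.etree.ElementTree as ET

SASL_NS = "urn:ietf:params:xml:ns:xmpp-sasl"


def offered_mechanisms(mechanisms_xml: str) -> list:
    """Return the mechanism names listed inside a <mechanisms/> element."""
    root = ET.fromstring(mechanisms_xml.strip())
    tag = "{%s}mechanism" % SASL_NS
    return [m.text.strip() for m in root.iter(tag) if m.text]


example = """
<mechanisms xmlns='urn:ietf:params:xml:ns:xmpp-sasl'>
  <mechanism>DIGEST-MD5</mechanism>
  <mechanism>PLAIN</mechanism>
  <mechanism>KERBEROS_V4</mechanism>
</mechanisms>
"""

if "DIGEST-MD5" not in offered_mechanisms(example):
    raise RuntimeError("server does not offer DIGEST-MD5")
```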

To initiate the authentication exchange, the client sends a SASL authentication request, selecting DIGEST-MD5 as the desired mechanism:

    <auth xmlns='urn:ietf:params:xml:ns:xmpp-sasl' mechanism='DIGEST-MD5'/>

Step one - challenge from server

The server will respond by sending a challenge, something like this:

<challenge xmlns='urn:ietf:params:xml:ns:xmpp-sasl'>
    cmVhbG09ImNhdGFjbHlzbS5jeCIsbm9uY2U9Ik9BNk1HOXRFUUdtMmhoIi
    xxb3A9ImF1dGgiLGNoYXJzZXQ9dXRmLTgsYWxnb3JpdGhtPW1kNS1zZXNz
</challenge>

The contents of this challenge are encoded using Base64, and might look like this when decoded:

realm="cataclysm.cx",nonce="OA6MG9tEQGm2hh",qop="auth",charset=utf-8,algorithm=md5-sess

The challenge can then be parsed into a number of comma-separated key-value pairs (though not all of them are required, as indicated below). Brief descriptions are as follows:

realm (zero or more occurrences)

If present (one or more), a list of authentication realms that the client may attempt to authenticate into. If not present, the client should ask the user to specify a realm.

nonce (one occurrence)

An opaque string that is generated by the server. The client must send this back to the server as part of its response, which helps to prevent replay attacks. If the server doesn’t send a nonce, the client should abort the exchange.

qop (zero or one occurrence)

A list of “quality of protection” values supported by the server. One of the returned values should be “auth”, to indicate the server supports authentication. Other values (“auth-int”, “auth-conf”) may be returned, but are outside the scope of this document. If the value “auth” is not present, the client should abort the exchange.

charset (zero or one occurrence)

If present, specifies that the server supports UTF-8 encoding for the username and password. If not present, the username and password must be encoded with ISO 8859-1. This is only needed for backward compatibility with HTTP Digest, and it’s unlikely that an XMPP server will ever not send it. If there’s more than one, the client should abort the exchange.

algorithm (one occurrence)

Required for backward compatibility with HTTP Digest (which can use other algorithms besides MD5). It can be ignored, but if the server doesn’t send it, or sends it more than once, the client should abort the exchange.
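
The decoding and the checks described above could be sketched in Python roughly as follows. The function names (parse_directives, decode_challenge, check_challenge) are made up for this illustration. Treating a missing qop as “auth” is an assumption taken from a reading of RFC 2831, not something this document states.

```python
import base64
import re

# key=value, where value is either a quoted string or a bare token.
_DIRECTIVE = re.compile(
    r'\s*([A-Za-z-]+)\s*=\s*(?:"((?:[^"\\]|\\.)*)"|([^,"\s]*))\s*(?:,|$)'
)


def parse_directives(text: str) -> dict:
    """Parse 'k=v,k="v"' text into a dict mapping each key to a list of values."""
    directives = {}
    pos = 0
    while pos < len(text):
        m = _DIRECTIVE.match(text, pos)
        if m is None:
            raise ValueError("malformed directive near offset %d" % pos)
        key = m.group(1).lower()
        if m.group(2) is not None:
            # Quoted value: undo backslash escapes.
            value = re.sub(r"\\(.)", r"\1", m.group(2))
        else:
            value = m.group(3)
        directives.setdefault(key, []).append(value)
        pos = m.end()
    return directives


def decode_challenge(b64_text: str) -> dict:
    """Base64-decode the content of a <challenge/> and parse its directives."""
    raw = base64.b64decode("".join(b64_text.split()))
    return parse_directives(raw.decode("utf-8"))


def check_challenge(d: dict) -> None:
    """Raise ValueError if the client should abort the exchange."""
    if len(d.get("nonce", [])) != 1:
        raise ValueError("expected exactly one nonce")
    if len(d.get("algorithm", [])) != 1:
        raise ValueError("expected exactly one algorithm")
    if len(d.get("charset", [])) > 1:
        raise ValueError("charset sent more than once")
    if len(d.get("qop", [])) > 1:
        raise ValueError("qop sent more than once")
    # Assumption: an absent qop is treated as "auth".
    qop_values = [q.strip() for q in d.get("qop", ["auth"])[0].split(",")]
    if "auth" not in qop_values:
        raise ValueError("server does not offer qop=auth")
```

Values are kept in lists so that repeated directives (several realms, or an illegal second nonce) remain visible to the checks.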

Step two - response from client

The client is required to respond with the appropriate credentials (Base64 encoded, of course). An example response might be:

<response xmlns='urn:ietf:params:xml:ns:xmpp-sasl'>
    dXNlcm5hbWU9InJvYiIscmVhbG09ImNhdGFjbHlzbS5jeCIsbm9uY2U9Ik
    9BNk1HOXRFUUdtMmhoIixjbm9uY2U9Ik9BNk1IWGg2VnFUclJrIixuYz0w
    MDAwMDAwMSxxb3A9YXV0aCxkaWdlc3QtdXJpPSJ4bXBwL2NhdGFjbHlzbS
    5jeCIscmVzcG9uc2U9ZDM4OGRhZDkwZDRiYmQ3NjBhMTUyMzIxZjIxNDNh
    ZjcsY2hhcnNldD11dGYtOCxhdXRoemlkPSJyb2JAY2F0YWNseXNtLmN4L2
    15UmVzb3VyY2Ui
</response>

The decoded form of this is:

username="rob",realm="cataclysm.cx",nonce="OA6MG9tEQGm2hh",cnonce="OA6MHXh6VqTrRk",nc=00000001,qop=auth,digest-uri="xmpp/cataclysm.cx",response=d388dad90d4bbd760a152321f2143af7,charset=utf-8,authzid="rob@cataclysm.cx/myResource"

The meanings of these values are as follows:

username (one occurrence)

The user’s name in the specified realm, encoded according to the value of the “charset” directive sent by the server. Must be present once, or the authentication fails.

realm (zero or one occurrence)

The authentication realm that this user’s account is in. This is required if the server specified realms in the first challenge, and should be set to one of those realms. If this is missing, it will be set to the empty string.

nonce (one occurrence)

The string specified by the server in the nonce option during the first challenge. If missing or specified more than once, authentication fails.

cnonce (one occurrence)

An opaque string that is generated by the client. The server will send this back to the client as part of future challenges, which helps to prevent replay attacks. If missing or specified more than once, authentication fails.

nc (one occurrence)

This is the hexadecimal count of the number of responses (including this one) that the client has sent with the nonce value in this request. This is used to help the server detect replays during subsequent authentication, and is not used here. Set it to “00000001”.

serv-type (component of digest-uri)

Indicates the type of service. This should be set to “xmpp”. It is not sent as a directive of its own, only as part of digest-uri.

host (component of digest-uri)

The DNS hostname or IP address for the service requested (though the DNS hostname is preferred). For XMPP, this should be set to the hostname of the remote server. Like serv-type, it is only sent as part of digest-uri.

digest-uri (one occurrence)

The full name of the service that the client is trying to connect to, formed from the serv-type and host options. For example, the XMPP service on “jabber.org” would have a digest-uri value of “xmpp/jabber.org”.

response (one occurrence)

A string of 32 hex digits (with the alphabetic characters lower-cased) that proves the user knows the password (see below for details of how to compute this). If missing or specified more than once, authentication fails.

charset (zero or one occurrence)

If present, specifies that the client has used UTF-8 encoding for the username and password. If not present, the username and password must be encoded in ISO 8859-1.

authzid (one occurrence)

The “authorization ID”, encoded in UTF-8. This should be the full JID (including resource) of the session that the client wishes to start once authentication is complete.
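
Assembling and encoding these directives might look something like the sketch below. The helper names (new_cnonce, build_response, encode_response) are hypothetical, the response value is assumed to have been computed already, and UTF-8 is assumed throughout.

```python
import base64
import secrets


def new_cnonce() -> str:
    """Generate a fresh, unpredictable client nonce."""
    return secrets.token_urlsafe(16)


def _quoted(value: str) -> str:
    """Wrap value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_response(username, realm, nonce, cnonce, digest_uri,
                   response_hex, authzid=None) -> str:
    """Return the plain-text directive list for the client's response."""
    parts = [
        "username=" + _quoted(username),
        "realm=" + _quoted(realm),
        "nonce=" + _quoted(nonce),
        "cnonce=" + _quoted(cnonce),
        "nc=00000001",          # first (and only) use of this nonce
        "qop=auth",
        "digest-uri=" + _quoted(digest_uri),
        "response=" + response_hex,
        "charset=utf-8",
    ]
    if authzid is not None:
        parts.append("authzid=" + _quoted(authzid))
    return ",".join(parts)


def encode_response(text: str) -> str:
    """Base64-encode the directive list for use inside <response/>."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")
```

The order of the directives simply follows the example above; the sketch does not rely on any particular order being required.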

Computing the response value

This is where the magic happens. The value of the response directive is computed as follows:

Create a string of the form “username:realm:password”. Call this string X.
Compute the 16 octet MD5 hash of X. Call the result Y.
Create a string of the form “Y:nonce:cnonce:authzid”. Call this string A1.
Create a string of the form “AUTHENTICATE:digest-uri”. Call this string A2.
Compute the 32 hex digit MD5 hash of A1. Call the result HA1.
Compute the 32 hex digit MD5 hash of A2. Call the result HA2.
Create a string of the form “HA1:nonce:nc:cnonce:qop:HA2”. Call this string KD.
Compute the 32 hex digit MD5 hash of KD. Call the result Z.

The resultant string Z should be sent to the server as the value of the “response” directive.
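
The steps above translate fairly directly into code. The following is a minimal sketch, assuming UTF-8 for all strings; the function name compute_response is invented here, and making authzid optional (leaving it out of A1 when it is not supplied) is an assumption based on RFC 2831 rather than on the steps as written.

```python
import hashlib


def compute_response(username, realm, password, nonce, cnonce, digest_uri,
                     authzid=None, nc="00000001", qop="auth") -> str:
    """Return the 32 hex digit value for the "response" directive."""
    enc = "utf-8"
    # X = username:realm:password, Y = raw 16-octet MD5 of X.
    x = ("%s:%s:%s" % (username, realm, password)).encode(enc)
    y = hashlib.md5(x).digest()
    # A1 = Y:nonce:cnonce[:authzid]. Y is raw bytes, not hex.
    a1 = y + (":%s:%s" % (nonce, cnonce)).encode(enc)
    if authzid is not None:
        a1 += b":" + authzid.encode(enc)
    # A2 = AUTHENTICATE:digest-uri
    a2 = ("AUTHENTICATE:" + digest_uri).encode(enc)
    ha1 = hashlib.md5(a1).hexdigest()
    ha2 = hashlib.md5(a2).hexdigest()
    # KD = HA1:nonce:nc:cnonce:qop:HA2, Z = hex MD5 of KD.
    kd = ":".join([ha1, nonce, nc, cnonce, qop, ha2]).encode(enc)
    return hashlib.md5(kd).hexdigest()
```

The most common mistake is to use the hex form of Y when building A1. Y has to be the raw 16 octets, which is why A1 is assembled as bytes rather than as a string.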

Step three - challenge from server

The server will now authenticate the client based on the presented credentials. If the authentication failed, the server will return a failure element, for example:

<failure xmlns='urn:ietf:params:xml:ns:xmpp-sasl'>
  <not-authorized/>
</failure>

If it was successful, the server will send a final challenge:

<challenge xmlns='urn:ietf:params:xml:ns:xmpp-sasl'>
    cnNwYXV0aD1lYTQwZjYwMzM1YzQyN2I1NTI3Yjg0ZGJhYmNkZmZmZA==
</challenge>

This challenge, when decoded, will contain a single directive “rspauth”. This directive is not useful for the purposes of this document, and may be safely ignored.

Step four - response from client

The client should respond with an empty response:

<response xmlns='urn:ietf:params:xml:ns:xmpp-sasl'/>

Step five - result from server

At this point, the server should inform the client of successful authentication, like so:

<success xmlns='urn:ietf:params:xml:ns:xmpp-sasl'/>

Starting a session

The connection is now authenticated. The client must now request an IM session by sending the following:

<iq type='set' id='sess_1'>
  <session xmlns='urn:ietf:params:xml:ns:xmpp-session'/>
</iq>

Assuming that your username/realm is allowed to start a session as the authzid that was specified during the SASL handshake, the session start request will succeed:

<iq type='result' id='sess_1'/>

The session then continues as normal - get roster, send presence, etc.