woes.

Donnerstag, 16. Januar 2014,

Update 7: ejabberd has seen the light
Update 6: the author of yaxim does a much more complete analysis
Update 5: I've looked at more sending clients and more servers here
Update 4: Added another sending client to the list of tested software here
Update 3: I've written a new post linking to a few reactions here
Update 2: this post has gotten a few reactions. There will be a followup-post soon-ish
Update: made some more inline links to the bug reports

Intro (what is this about)

So for a long time I've had an XMPP (historically also known as Jabber) account on jabber.ccc.de. There are a lot of reasons for preferring xmpp over other protocols for instant messaging but I don't really feel like repeating stuff that has been said a million times.

Anyway, long after most other people I got myself a smartphone 3 years ago and ever since have struggled with the fact that even though in theory, xmpp should be much greater than all the closed alternatives, in reality and for mobile use it often isn't, which is why one can evangelise all one wants, people will continue using e.g. WhatsApp not just because of ignorance but because, sadly, it just works.

Now what do I mean by "it just works": It is well known that security-wise, WhatsApp is a nightmare, but, and this is an important point, messages reach their destination most of the time, and if they don't, there is some way the user finds out about that.

The problem with the open version of mobile xmpp (I presume WhatsApp is just some kind of modified xmpp internally) is that this isn't the case here.

A scenario. Alice wants to talk to Bob. Alice might be sitting in front of her computer or use her mobile phone, it doesn't matter, what matters is Bob is using his mobile phone. Now if Alice sends a message and Bob loses his network connectivity for a few seconds before receiving this message, the message is just lost when using xmpp. This comes as a surprise to a lot of people, since after all, xmpp is TCP/IP based, but it is still the case, since xmpp doesn't use any confirmation mechanism. This is, afaict, because the standard was written at a time when it was thought that the main use would be users that are all connected to the server via short, high-speed LANs, probably in the same building or so. Also, it takes a while for an xmpp server to realise that one of its users has dropped off the net, so Alice might see a that Bob has logged off .. but that will happen up to 15 minutes after he actually lost connectivity.

Bob might come back online after a few seconds, but, this is the awful part, neither the server nor Alice will ever know that he never received the message that was sent while he was away (because neither even saw that he was away). And if Bob has his mobile phone in his pocket he will also not notice. So there might be messages before and after the lost message that he does receive. It can become quite confusing.

There are extensions to the XMPP protocol which are labelled "XEP-something" where "something" is a number. Those extensions (there are quite a lot of them) are listed at this link. Some of those were created explicitly with the goal of solving the problem I describe above. The first interesting one here is XEP-0184: Message Delivery Receipts which allows clients to tell each other "yep, got that message". But the most important one is XEP-0198: Stream Management, which allows a client to tell the server "oh hey, I was gone for a few seconds/minutes there, got any messages for me?"

Overview of the current situation

So, case closed, right? Everybody implements XEP-0198 in their software and things are peachy!

Well no. Because they don't. Implement it, that is.

For this to work, the client (the app on the mobile phone) as well as the server have to support XEP-0198.

daemon software

In the server world, the situation is:

  • ejabberd, written in Erlang claims to be the "worlds most popular XMPP application server". Offers its own version of "mobile support", but only in the non-free "Enterprise Edition". The Open Source "Community Edition" has neither that proprietory implementation nor XEP-0198. A full breakdown of the XEPs that are supported is here. There is a longstanding bug in the ejabberd issue tracker that quite clearly shows that the developers are not interested in implementing this.
    Note that jabber.ccc.de uses ejabberd.
  • There is a fork of ejabberd (presumably of the community edition) on github, with the explicit purpose of adding XEP-0198 to ejabberd. The last development on this fork seems to have happened three years ago.
  • prosody, written in Lua, has support for XEP-0198 in a plugin which implements part of the spec (namely, what is still missing is the possibility for the server to commit lost messages to offline storage, currently it only implements the "send an error if receiver is offline for a time longer than timeout").

There are of course other jabber daemons but those are the ones I looked at.

client software

mobile clients for Android

I mainly looked at clients for Android since an Android phone is what I have access to.

Popular clients include

  • xabber. Does not support XEP-0198. There is a bug in the xabber-bugtracker that doesn't have any comments from developers and isn't assigned to anyone.
  • beem. Does not support XEP-0198. There is a bug in the beem bugtracker which was opened two years ago and has no activity beyond being changed from "bug" to "feature request" 9 months ago.
  • chatsecure, formerly known as Gibberbot, is probably the most feature-complete xmpp-app for Android, at least according to its specs. Supports OTR as well as XEP-0198.
  • yaxim supports XEP-0198 as well as XEP-0184.

linux clients

There isn't really any necessity for a client that doesn't use mobile networking or is stationary to implement XEP-0198, but that doesn't mean that we can ignore those clients. According to XEP-0198, if a server detects that a client is gone for good, it may react "by either returning an error to the sender or committing the stanza to offline storage."

That "error to the sender" is a stanza with the same message-ID as the original message, no content and is of type="error". One would presume that a client receiving such an error message would endeavour to inform its user but alas, this is not the case. I looked at the following clients:

  • BitlBee is a gateway that translates various chat protocols (including XMPP) to IRC, thereby allowing the user to have them all in his/her favourite IRC client.
  • psi pure XMPP client, graphical.
  • pidgin Multi-IM client, graphical.

my findings

I used my own ARCH Linux server and installed prosody and the XEP-0198 plugin on it. It has to be noted that the mod_smacks-plugin currently in AUR is not the current one and therefore doesn't work. Installing it by hand is easy, since it's just one file. Then I used one account for the desktop clients and one for the mobile clients to test scenarios of loss of connectivity for different lengths of time, as well as mobile to mobile chats.

I simulated loss of connectivity on the phone by first deactivating mobile networking (to take away the fallback option) and then turning off the wifi. What should happen is that when one re-enables wifi after a short time, the client should tell the server that he's back from an outage and should receive queued messages. A remote chat party shouldn't see the disconnect/reconnect. When the outage goes on for longer (in my tests at least 10 minutes, up to 15 minutes), the server realises something is amiss and starts his countdown. After the countdown is over (5 minutes default), the sender receives an error message from the server.

sending side

Of the senders tested (Bitlbee, Psi, Pidgin, Yaxim for mobile to mobile), initially none showed the error message to the user (see bug reports below).

Yaxim fixed that bug on the same day and now shows a red cross over the message concerned as well as an extra message in the chat stream that reads "recipient-unavailable(-1)" and has a red background.

BitlBee acknowledged the bug on IRC and discussed how they might fix it. Bitlbee has the problem that up untill now it doesn't generate message IDs and it is (AFAIK) not easy to change the way a line that has already been written to the IRC client is displayed after the fact. The discussion went in the direction of basically putting a timestamp into the message-ID, thereby allowing a message of the type "your message from 'time' could not be delivered" to be generated upon error. Since at least one developer has tested the problem together with me I presume the bug is being worked upon.

Psi have so far not reacted to the bug report.

Pidgin have so far not reacted to the bug report.

server side

Prosody implements XEP-0198 correctly, as far as I can see. Nevertheless, I would prefer if the second option ("or committing the stanza to offline storage") were available as well since that would increase comfort for the user (no need to resend messages). After discussion in the chat with Prosody developers, I filed a wishlist-bug for this.

receiving side

Yaxim works as advertised (after fixing the bug ). The version that incorporates these bug fixes should make its way to the Play Store "later this week".

Chatsecure works as advertised some of the time. And then it doesn't. And then it has trouble reconnecting. And then it takes 5 minutes to reconnect. And then it sometimes doesn't see contacts as online even though they clearly are and nothing can make it see the error of its ways. So even though on paper it should be the best client out there, it made my blood boil with anger ...

Xabber just plain doesn't support XEP-0198 so no point complaining about it losing messages. At least it quickly reconnects after connection loss.

Beem is extremely bad in that it not only fails to support XEP-0198, a loss of connectivity also causes it to display an error message and quit, not reconnecting automatically. So unless the user intervenes every single time that connectivity is lost, it won't work at all.

Conclusion

For now, Yaxim seems to be the best client out there (but be very aware that it doesn't encrypt your messages end-to-end, so unless you trust all the servers in between yourself and the recipient to correctly implement encryption and not store your messages, treat it as unsafe for sensitive content!). If it supported OTR as well, it would be perfect. The yaxim bugtracker has a bug about OTR and another one about AGP.

Prosody as server works as advertised. If you don't want to install one yourself, Yaxim offers an open Prosody server that supports XEP-0198 at yax.im.

Chatsecure could be great if they manage to fix all the annoyances detailed above.

Xabber is no contender unless it implements XEP-0198. Same for Beem.

I did not investigate the situation for iPhones since I don't own or have access to an iPhone. I would be happy to hear about it though.

bugs found and reported during this investigation

texts and people that had an influence on this post

  • Here is a great explanation on why XEP-0198 is needed.
  • people in the #yaxim channel on freenode were very responsive and helpful.
  • people in the #bitlbee channel on oftc were very responsive and helpful as well as delightfully snarky.
  • people in the prosody community chat (see here) were very responsive and helpful.
  • Peter Schwindt, on Twitter as @vautee, admin of jabber.ccc.de, has been discussing this issue with me for quite some time now and had great influence on me finally deciding to sit down and actually do some real tests besides complaining that "things don't work".