Concurrency Issues in SASLAuthentication.java

I’m simply trying to connect to an XMPP server with SASL authentication (using the Smack library), but I keep on getting the exception:

Resource binding not offered by server:

at org.jivesoftware.smack.SASLAuthentication.bindResourceAndEstablishSession(SASLA uthentication.java:416)

at org.jivesoftware.smack.SASLAuthentication.authenticate(SASLAuthentication.java: 331)

at org.jivesoftware.smack.XMPPConnection.login(XMPPConnection.java:395)

at com.benbria.xmpp.ServerClient.connect(ServerClient.java:32)

at com.benbria.xmpp.ServerClient.run(ServerClient.java:67)

at com.benbria.xmpp.ServerClient.main(ServerClient.java:81)

Looking into the code,

private String bindResourceAndEstablishSession(String resource) throws XMPPException {
        // Wait until server sends response containing the <bind> element
        synchronized (this) {
            if (!resourceBinded) {
                try {
                    wait(30000);
                }
                catch (InterruptedException e) {
                    // Ignore
                }
            }
        }         if (!resourceBinded) {
            // Server never offered resource binding
            throw new XMPPException("Resource binding not offered by server");
        }

so it seems that it sits and waits on the wait block, and times-out and throws the exception since resourceBinded is false.

The interesting part, the only place I see resourceBinded being set is in the bindingRequired method:

void bindingRequired() {
        synchronized (this) {
            resourceBinded = true;
            // Wake up the thread that is waiting in the #authenticate method
            notify();
        }
    }

The problem is if the wait in the bindResourceAndEstablishSession method is holding the lock on the instance, the call to bindingRequired can never wake up the initial wait. Meaning unless the bindingRequired is called first, the bindResourceAndEstablishSession will always wait 30s, even worse, if the bindingRequired loses the race condition for setting the resourceBinded to true, an exception gets called and the resource isn’t binded, which is what I think I am seeing when I run a debugger on the code.

The synchronization block in that part of the code seems to have been there for years, so I hope I’m just missing something. So if anyone can enlighten me, please do.

Thanks,

Dan

Hi Dan,

dule schrieb:

The problem is if the wait in the bindResourceAndEstablishSession method is holding the lock on the instance, the call to bindingRequired can never wake up the initial wait.

I think this not correct, (see Why wait() within synchronized context?). I also debuged it and it doesn’t lock on my machine. While the bindResourceAndEstablishSession method was waiting the bindingRequired was processed and resumed the wait.

But sorry, I don’t have an idea what goes wrong on your startup procedure.

Thanks Guenther. You are correct, I didn’t realize that a wait would release the lock.

So I did a little further debugging and it still seems to be a concurrency problem of another nature. I took a stack dump of all the threads (the signifcant stacktraces are shown below, the rest removed for brevity):

Full thread dump Java HotSpot(TM) Client VM (11.3-b02 mixed mode): "Smack Packet Reader (0)" daemon prio=6 tid=0x0b47b000 nid=0x1104 waiting for monitor entry [0x0bb2f000..0x0bb2fd14]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.jivesoftware.smack.XMPPConnection.getAccountManager(XMPPConnection.java:561)
        - waiting to lock <0x02e9b700> (a org.jivesoftware.smack.XMPPConnection)
        at org.jivesoftware.smack.PacketReader.parseFeatures(PacketReader.java:464)
        at org.jivesoftware.smack.PacketReader.parsePackets(PacketReader.java:309)
        at org.jivesoftware.smack.PacketReader.access$000(PacketReader.java:44)
        at org.jivesoftware.smack.PacketReader$1.run(PacketReader.java:76) "main" prio=6 tid=0x003a7000 nid=0x10b8 in Object.wait() [0x0090f000..0x0090fe50]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x02ea6858> (a org.jivesoftware.smack.SASLAuthentication)
        at org.jivesoftware.smack.SASLAuthentication.bindResourceAndEstablishSession(SASLAuthentication.java:406)
        - locked <0x02ea6858> (a org.jivesoftware.smack.SASLAuthentication)
        at org.jivesoftware.smack.SASLAuthentication.authenticate(SASLAuthentication.java:331)
        at org.jivesoftware.smack.XMPPConnection.login(XMPPConnection.java:395)
        - locked <0x02e9b700> (a org.jivesoftware.smack.XMPPConnection)
        at com.sandbox.xmpp.ServerClient.connect(ServerClient.java:32)
        at com.sandbox.xmpp.ServerClient.run(ServerClient.java:67)
        at com.sandbox.xmpp.ServerClient.main(ServerClient.java:81)

Although bindResourceAndEstablishSession does release it’s lock on the instance of SASLAuthentication, it still holds the lock to XMPPConnection, preventing the Packet Reader thread from parsing through the stream features. From what I’ve read, this problem is called a nested monitor lockout.

<stream:features>
        <register xmlns="http://jabber.org/features/iq-register" />
        <bind xmlns="urn:ietf:params:xml:ns:xmpp-bind" />
        <session xmlns="urn:ietf:params:xml:ns:xmpp-session" />
      </stream:features>

org.jivesoftware.smack.PacketReader.parseFeatures(PacketReader.java:464):

else if (parser.getName().equals("register")) {
        connection.getAccountManager().setSupportsAccountCreation(true);
    }

So when trying to parse the stream features, it begins parsing the register element, but the connection.getAccountManager() gets blocked trying to acquire the XMPPConnection instance lock held by the main thread waiting in bindResourceAndEstablishSession, and so cannot parse the bind element in time to wake the main thread.

Now did I overlook anything?

Dan

I’m trying to write a patch to fix it myself, but I don’t have a good enough understanding of the XMPPConnection class to make certain decisions.

As a temporary fix, I’ve put a second if-guard in the XMPPConnection.getAccountManager method to prevent the method from locking when it doesn’t need to (should be more efficient on repeated access to the method too, since it doesn’t need to synchronized beyond the first time).

public AccountManager getAccountManager() {
  if (accountManager == null) {
    synchronized (this) {
      if (accountManager == null) {
        accountManager = new AccountManager(this);
      }
    }
  }   return accountManager;
}

This only works if the original synchronize was to prevent multiple accountManagers from being created. If it was used for other reasons (e.g., preventing access to accountmanager while a login is in progress), then this breaks things…

And the only reason this fixes my problem is because accountManager get’s initialized at an earlier step before the stream features with my bind element, before the XMPPConnection.login method ever takes the lock. Technically there may still be a scenario for a nested monitor lock if the accountManager is not initialized at a prior step.

Is it possible for someone with JIRA create access to open a ticket for this problem?

Thanks,

Dan

As i said in live chat, i can create a ticket. But i’m waiting for a strong confirmation and a clear description of the bug. Now it looks like you are not sure if your fix isnt breaking anything. So, i’ll wait, maybe Guenther will reply. Anyway, the ticket won’t help much, as Jive developers dont have time to work on Smack/Openfire.

Hi Dan,

>       <stream:features>
>         <register xmlns="http://jabber.org/features/iq-register" />
>         <bind xmlns="urn:ietf:params:xml:ns:xmpp-bind" />
>         <session xmlns="urn:ietf:params:xml:ns:xmpp-session" />
>       </stream:features>

this looks for me a little bit strange, the stream supports no authentication? Neither non SASL nor a SASL mechanism is listed, I’m not sure what this means. Can you send more details, maybe a log of all send and received stanzas? Do you develop a Jabber server or what server do you use? It would be nice if I’m able to reproduce your problem.

I’m running against a Tigase server (4.1.0). The authentication features is offered at an earlier stage, here’s the full the conversation:

<?xml version="1.0"?>
<stream:stream to="dev.brass.local" xmlns="jabber:client"
xmlns:stream="http://etherx.jabber.org/streams" version="1.0">   <?xml version='1.0'?>
  <stream:stream xmlns='jabber:client' xmlns:stream='http://etherx.jabber.org/streams'
  from='dev.brass.local' id='3ade629b-24f9-42ec-bf7f-218051c3c400' version='1.0' xml:lang='en'>
    <stream:features>
      <auth xmlns="http://jabber.org/features/iq-auth" />
      <register xmlns="http://jabber.org/features/iq-register" />
      <starttls xmlns="urn:ietf:params:xml:ns:xmpp-tls" />
      <mechanisms xmlns="urn:ietf:params:xml:ns:xmpp-sasl">
        <mechanism>PLAIN</mechanism>
        <mechanism>DIGEST-MD5</mechanism>
        <mechanism>CRAM-MD5</mechanism>
        <mechanism>ANONYMOUS</mechanism>
      </mechanisms>
      <session xmlns="urn:ietf:params:xml:ns:xmpp-session" />
    </stream:features>
        <auth mechanism="DIGEST-MD5" xmlns="urn:ietf:params:xml:ns:xmpp-sasl"></auth>
        <challenge xmlns="urn:ietf:params:xml:ns:xmpp-sasl">
    cmVhbG09ImRldi5icmFzcy5sb2NhbCIsbm9uY2U9IjJPcjNESFpQUUNkSkU5TEFhUGJGQUdGTlp1SzBBcm1qVER0ZHRJL1kiLHFvcD0iYXV0aCIsY2hhcnNldD11dGYtOCxhbGdvcml0aG09bWQ1LXNlc3M=</challenge>
        <response xmlns="urn:ietf:params:xml:ns:xmpp-sasl">
    Y2hhcnNldD11dGYtOCx1c2VybmFtZT0iYmxhemVjYXN0IixyZWFsbT0iZGV2LmJyYXNzLmxvY2FsIixub25jZT0iMk9yM0RIWlBRQ2RKRTlMQWFQYkZBR0ZOWnVLMEFybWpURHRkdEkvWSIsbmM9MDAwMDAwMDEsY25vbmNlPSJhbWJhWTNxNXZYWTdjbFJXaTcvTVJFNDFuT2huYjB2QjZZb1NhaFYzIixkaWdlc3QtdXJpPSJ4bXBwL2Rldi5icmFzcy5sb2NhbCIsbWF4YnVmPTY1NTM2LHJlc3BvbnNlPTk4YjBjM2QyN2EyNTI0YTIyNmQ0ZmM4N2JmYjBjNzI3LHFvcD1hdXRoLGF1dGh6aWQ9ImJsYXplY2FzdCI=</response>
        <success xmlns="urn:ietf:params:xml:ns:xmpp-sasl">
    cnNwYXV0aD0wMDlkYTQzYzk2ZTkzMThkOWNiYmYyODczMzhhMTUxZg==</success>
        <stream:stream to="dev.brass.local" xmlns="jabber:client"
    xmlns:stream="http://etherx.jabber.org/streams" version="1.0">
          <?xml version='1.0'?>
      <stream:stream xmlns='jabber:client' xmlns:stream='http://etherx.jabber.org/streams'
      from='dev.brass.local' id='3ade629b-24f9-42ec-bf7f-218051c3c400' version='1.0' xml:lang='en'>
        <stream:features>
          <register xmlns="http://jabber.org/features/iq-register" />
          <starttls xmlns="urn:ietf:params:xml:ns:xmpp-tls" />
          <bind xmlns="urn:ietf:params:xml:ns:xmpp-bind" />
          <session xmlns="urn:ietf:params:xml:ns:xmpp-session" />
        </stream:features>
      </stream:stream>
    </stream:stream>
  </stream:stream>
</stream:stream>

Thanks Guenther for taking a look at this.

Dan

Hi Dan,

Yes I can confirm the concurrency issue during the login process with a Tigase Jabber server. This bug also prevents to use Spark with Tigase (with enabled in-band registration) since Spark is based on Smack. I also don’t know so much about the XMPPConnection class and the smack login implementation that I can advice your patch. It fixes the login bug with Tigase but maybe there is a more adequate solution. If I have more free time the next days I will look at the login process in more detail.

Thanks for your bug report and patch!

I’ve checked the XMPPConnection and the AccountManager in more detail. In my opinion there is no need for the synchronization lock in the XMPPConnection#getAccountManager method at all since every critical operation is synchronized more fine-grained within the AccountManager itself (which is better ). Also the creation of the AccountManager doesn’t need this protection because in the worst case the info about the account creation service get lost and the AccountManager ask the server again. And for this worst case I don’t see a realistic scenario how this can happen and even if it happened, it would be no problem.

Did I overlooked something and there is a reason against the attached patch?

Best regards

Günther
getAccountManager.patch (575 Bytes)

Created the JIRA ticket SMACK-271.

Awesome, thanks for the time you put into this!

I’m a bit confused, was this patch supposed to fix the issue? I agree it fixes the immediate deadlock on the manager, but I’m still seeing concurrency problems in the original snipit so cannot login to my tigase server :

synchronized (this) {
if (!resourceBinded) {
try {
wait(30000);
}
catch (InterruptedException e) {
// Ignore
}
}
}

  •    if (!resourceBinded) {
          // Server never offered resource binding
          throw new XMPPException("Resource binding not offered by server");
      }*
    

if I step through this, resourceBinded will be true in the sync block, but false by the time we reach the 2nd* if (!resourceBinded)… *

If I manage to time it just right I can almost debug my way into a successful login, but its always thwarted by the* if (sessionSupported).*. that lies a bit further down.

any idea on eta for a fix for this?

Thanks

That’s odd. The only place resourceBinded is set to false is in the init() of that class. Meaning for the flag to switch from true to false would require that method to be called, which AFAIK is only in XMPPConnection.shutdown and the constructor of SASLAuthentication. I can’t imagine a flow path for that to occur. Could you put breakpoints and see if either are called after the sync block but before the 2nd check (i.e., try to locate when and where resourceBinded is being set to false).

I’ve committed my patch to the subversion trunk version.

sorry for the late response. I agree that the scenario I described does not seem possible, and after rebooting my problem vanished, so I’m putting it down to a very confused Eclipse.

Thanks for the update.

I now have a spring ServiceExporter using hessian over xmpp based on hikage’s spring/smack xmpp template (for now, component or s2s server option would be better) and working very well. If anyone else is interested I could tidy it up and make it available somewhere.

this problem has returned
i just checked out the latest trunk (in order to try the new pubsub stuff)

and this problem exists again. I can login against tigase with my older version, but not with the latest trunk.

also - I was unable to edit this site in chrome. had to switch to firefox.