Unreliable LDAP authentication?

I just set up a 3.6.4 Openfire install on Ubuntu 9.04, authenticating against Active Directory. About 25% of login attempts fail, and sometimes it fails several times before a login succeeds. I also have an existing 3.1.1 Wildfire/Openfire install on Ubuntu 6.06 that has never had this problem; it authenticates every time.

Any ideas? Has anyone else had similar issues?

I’m experiencing the same problem. I’ve had a very stable Openfire 3.6.3 instance with 1,000 concurrent users authenticating against Active Directory for months. But on a test instance of Openfire 3.6.4 I’m getting intermittent login failures.

The only LDAP-related fix in 3.6.4 is JM-1516, so maybe that change introduced the bug.

http://www.igniterealtime.org/builds/openfire/docs/latest/changelog.html

What information can I provide to help identify the bug?

Start with the error log from Openfire. I don’t really know what this patch fixed or how the whole thing works; it was just a suggestion for someone with Java skills to look in that direction.

Here is the error log.

We’re running on a 64-bit RHEL 5.3 system, using the Openfire RPM package (with the bundled JRE).
error_log_censored.zip (7007 Bytes)

Has anyone resolved this issue yet? I am having the same problem.

Any news on this? I’m surprised more people haven’t run into this problem. We’re still using 3.1 and stuck there until 3.6.4 can authenticate every time.

I’ve been trying to debug this problem for several weeks (off and on) so that I can deploy a new server on our internal network. Same symptoms: we’re using Active Directory and logins fail intermittently. I used Network Monitor to watch the packets and verified that when a login fails, the Windows server never receives a request. So the failure comes from a low-level network connectivity problem or from a bug in how the Openfire server sends the request, not from a genuine authentication failure on the server or from processing its reply.

I’ve tried several things I thought might fix the problem, but it kept coming back. It’s hard to verify that an intermittent problem is actually fixed, since sometimes everything works fine for days and then fails again.

Today I hit on a new theory that I’m just testing now. Figured I’d post it just in case “this is it” and it works for someone else.

I realized that our WS2003 system actually has a name that resolves to two separate IP addresses: 192.168.50.2 (the right one) and 192.168.50.25 (the one that’s allocated by RRAS for VPN connections). DNS returns both addresses, and the order flip-flops from query to query.

Port 389 (LDAP) seems to listen on both addresses, but there is a gateway/firewall between the Openfire server and the Windows server that has a hole punched for port 389 on the primary static IP only. I never opened that port for the RRAS address, and that wouldn’t even be practical, since RRAS gets that address from DHCP and it might change from time to time. So any LDAP connection attempt to the RRAS address would fail. My theory is that Openfire does a DNS lookup for the hostname, sometimes gets the RRAS IP address, tries to connect, finds the port blocked, and fails.
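If you want to check whether your own network looks like this, here is a minimal Python sketch of the test: resolve the name, then try a plain TCP connection to port 389 on every address it returns. It only checks reachability (no LDAP bind is attempted), the 3-second timeout is an arbitrary choice, and `resolve_ipv4` and `probe_addresses` are hypothetical helpers, not anything from Openfire.

```python
import socket


def resolve_ipv4(hostname, port=389):
    """Return every distinct IPv4 address the name resolves to, in resolver order."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    for *_, sockaddr in infos:  # for AF_INET, sockaddr is (ip, port)
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def probe_addresses(addresses, port=389, timeout=3.0):
    """Map each address to True if a TCP connection to `port` succeeds, else False.

    A firewalled address usually shows up as a timeout and a closed port as a
    refusal; both are reported as False.
    """
    results = {}
    for address in addresses:
        try:
            with socket.create_connection((address, port), timeout=timeout):
                results[address] = True
        except OSError:  # refusals, timeouts and unreachable hosts
            results[address] = False
    return results
```

For example, call `probe_addresses(resolve_ipv4("dc1.example.local"))`, where `dc1.example.local` is a placeholder for your domain controller’s FQDN. One address coming back True and another False would match the situation described above. Running `resolve_ipv4` a few times also shows whether the order changes between queries.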

The solution (if this is actually the issue here) is to change the ldap.host parameter to the primary IP address of the Windows server rather than using the FQDN and thus bypass the round-robin DNS.
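To see why pinning the IP helps, here is a toy model in Python. It assumes the DNS server rotates its answer by one position on every query, that the client only ever tries the first address, and that nothing caches the answer. Real resolvers and caches behave differently, so the numbers illustrate the mechanism only and are not a prediction of the failure rate reported in this thread. The two addresses are the ones from the post above.

```python
from itertools import islice


def round_robin_answers(addresses):
    """Yield DNS answers forever, rotating the record list by one per query."""
    records = list(addresses)
    offset = 0
    while True:
        yield records[offset:] + records[:offset]
        offset = (offset + 1) % len(records)


def first_address_failure_rate(addresses, reachable, queries=100):
    """Fraction of attempts that fail when the client uses only the first answer."""
    answers = islice(round_robin_answers(addresses), queries)
    failures = sum(1 for answer in answers if answer[0] not in reachable)
    return failures / queries


# Only the static address has port 389 open through the firewall.
REACHABLE = {"192.168.50.2"}

# Lookup by hostname: both records come back, in alternating order.
print(first_address_failure_rate(["192.168.50.2", "192.168.50.25"], REACHABLE))  # 0.5
# ldap.host set to the static IP: there is only one possible "answer".
print(first_address_failure_rate(["192.168.50.2"], REACHABLE))  # 0.0
```

Under these assumptions every query that puts the RRAS address first is a failed login, while the pinned IP can never pick the blocked address.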

I just implemented this and it’s working fine right now. I’m not 100% certain this fixes it, however, due to the intermittent nature of this error. I’d love feedback from you guys if you find that your networks have similar configurations and this is a possible explanation for your problems.

Thanks for sharing your findings. I think you may be on to something. We have the same type of round-robin DNS setup for our domain controllers.

I just made the change you suggested, so I’ll let you know how it goes. It seems to be working right now though.

If you are running AD as your LDAP source, you can set the LDAP host to the domain name instead of a particular server; Openfire will then authenticate against the nearest AD domain controller. Also, if you have a server with two NICs serving different functions, you can unlink the NICs so they get separate names in DNS and prevent invalid lookups. There is no reason for one name to resolve to two separate IP addresses; in fact, that will cause many issues.
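With the AD domain name as the host, the name normally resolves to the address of every domain controller, so the client again gets several addresses back. The Python sketch below shows the general technique of trying each address in turn until one accepts a TCP connection, so that one unreachable record costs a timeout rather than a failed login. It only illustrates the technique; nothing in this thread establishes whether Openfire 3.6.4 retries this way, and `connect_first_reachable` is a hypothetical helper.

```python
import socket


def connect_first_reachable(endpoints, timeout=3.0):
    """Return a connected socket for the first (address, port) pair that answers.

    Each failure is recorded; if no endpoint accepts a connection, one
    ConnectionError listing all of them is raised.
    """
    failures = []
    for address, port in endpoints:
        try:
            return socket.create_connection((address, port), timeout=timeout)
        except OSError as exc:  # refused, timed out or unreachable: try the next one
            failures.append(f"{address}:{port} ({exc})")
    raise ConnectionError("no endpoint accepted a connection: " + "; ".join(failures))
```

For example, `connect_first_reachable([(ip, 389) for ip in socket.gethostbyname_ex("example.local")[2]])`, with `example.local` standing in for your AD domain. Python’s own `socket.create_connection` already walks through every `getaddrinfo` result when given a hostname; the explicit loop is only there to make the fallback visible.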

This fixed it. Thanks ddeppner!

Outstanding detective work!!! Looks like it fixed it for me as well. Thank you so much for sharing your findings!

Having Openfire authenticate against the nearest AD domain controller seems like a very good design. What would the syntax be for using the domain instead of the hostname for LDAP authentication against AD? Would you mind posting a quick example?

It worked fine for a while, then stopped. I think there is a bug, but I can’t say for sure. SSO worked fine for us for a few days and then failed; no settings changed, it just failed.