New Wildfire implementation showing Java memory errors

Hi,

As you are using Windows, it should help to specify "-Xss128k -Xoss128k" along with "-Xmx512m" in the wildfire-service.vmoptions file. Lower values (96k or 64k) may be better for getting more threads, but no one has tested them as far as I know.

You could also use two or more connection managers if setting Xss does not help.

LG

Thanks for the tip.

We have added the following items to the wildfire-service.vmoptions file and restarted the process:

-Xms512m

-Xmx1024m

-Xss128k

-Xoss128k

Upon startup (and now), the Java memory usage is showing at

267.11 MB of 986.12 MB (27.1%) used

We are still getting the errors - this time with only 328 or so sessions.

The machine has 2 CPUs and 2 GB of memory. The OS shows memory usage as very low, so it's not running out at the OS level.

You mention taking the Xss and Xoss values even lower - can you explain the relationship of those to Xmx and the sessions?

How would the connection managers help with this?

Thanks much for the help - I'm really in the hot seat here…

Pat

Hi Pat,

Visit http://www.tagtraum.com/gcviewer-vmflags.html or similar pages for an explanation of the parameters.

Xss: Sets the maximum native stack size for any thread

Xoss: Sets the maximum Java stack size for any thread

Increasing Xmx will not help you to solve your problem, as you don't get normal OutOfMemory errors but "unable to create new native thread" errors.

The java.exe process divides its memory into the Java heap, which you set with Xmx, and the native heap; I have no idea about the native heap's size or whether it is fixed. Increasing Xmx (the Java heap) may also steal some native heap memory, so a lower Xmx could help.

As every thread uses some native heap, it makes sense to decrease Xss to 96k; this should help as long as the value is not too small.
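To see why, here is a rough back-of-the-envelope calculation (the overhead figure is my assumption, since the native heap size is not documented): on 32-bit Windows a process gets about 2 GB of user address space, and every thread reserves Xss of it for its stack, so roughly

max threads ≈ (2048 MB - Xmx - JVM/native overhead) / Xss

With your -Xmx1024m and, say, 256 MB of overhead, a default-sized stack (often 1 MB reserved on Windows) allows only about (2048 - 1024 - 256) / 1 ≈ 768 threads, while -Xss128k allows about (2048 - 1024 - 256) / 0.125 ≈ 6,100. That is why lowering Xss (or Xmx) buys you more threads.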

LG

Thanks again for the feedback.

There is an interesting article here: http://support.bea.com/application_content/product_portlets/support_patterns/wls/InvestigatingOutOfMemoryMemoryLeakProblemsPattern.html#Java

Lots to digest there…

I just assumed that Wildfire under Windows would handle my modest (albeit not small) implementation out of the box. We were going to stress test, but it was very hard to set up a test AD domain to handle the authentication part.

Do you think it would be prudent to swing this all over to a Linux implementation? I see notes about people running much larger Wildfire servers.

I guess I'll have to plan some trial changes and see how it goes - but it's not pretty. My user community is ready to abandon the Wildfire implementation altogether at this point - they can be so fickle! (I love it when my pursuit of Jabber nirvana ends up turning my user community off of Jabber.)

Thanks again.

Pat

A bit of an update.

We turned off the memory tweaks - just removed the file - to get back to square one and watch.

In addition, we completely disabled the subscription plugin we had been using, as well as the external components that were not in use.

We have lasted longer, with only intermittent memory errors. We have 560-580 active connections, with only 14 memory errors. Interestingly, they came as two series of 7 events within minutes of each other, followed by periods of no errors.

I did notice that these errors are also showing up in the stderror.log file in the log directory, with additional detail. The entries in this log have a one-to-one relationship with the memory items in the error log. Perhaps they hold additional 'clues' for those better at Java environments.

Exception in thread "Client SR - 13346867" java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method)
	at java.lang.Thread.start(Unknown Source)
	at com.sun.jndi.ldap.Connection.<init>(Unknown Source)
	at org.jivesoftware.wildfire.ldap.LdapManager.checkAuthentication(LdapManager.java:346)
	at org.jivesoftware.wildfire.ldap.LdapAuthProvider.authenticate(LdapAuthProvider.java:93)
	at org.jivesoftware.wildfire.auth.AuthFactory.authenticate(AuthFactory.java:127)
	at org.jivesoftware.wildfire.net.SASLAuthentication.doPlainAuthentication(SASLAuthentication.java:336)
	at org.jivesoftware.wildfire.net.SASLAuthentication.handle(SASLAuthentication.java:172)
	at org.jivesoftware.wildfire.net.SocketReadingMode.authenticateClient(SocketReadingMode.java:117)
	at org.jivesoftware.wildfire.net.BlockingReadingMode.readStream(BlockingReadingMode.java:136)
	at org.jivesoftware.wildfire.net.BlockingReadingMode.run(BlockingReadingMode.java:62)
	at org.jivesoftware.wildfire.net.SocketReader.run(SocketReader.java:123)
	at java.lang.Thread.run(Unknown Source)

Thanks again.

Pat

Hi,

"com.sun.jndi.ldap" might seem to speak for itself, but you should have only one or two threads with an LDAP connection.

A full thread dump sent to Gato may help him check what's going on.

See http://tmitevski.users.mcs2.netarray.com/trace.do and http://www.adaptj.com/root/main/tracehowtos for documentation and a program that shows how to get one.
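Roughly, the procedure on Windows is this (a sketch; exact file names depend on how the service was installed): if Wildfire was started by hand in a console window, pressing Ctrl+Break in that window makes the JVM print a full thread dump to stdout. If it runs as a service there is no console, so the dump usually lands wherever stdout is redirected (the logs directory), or you can attach a tool like the AdaptJ StackTrace program above to the process by PID and capture the stacks that way.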

LG

Hey Pat,

Which Wildfire version are you using? What other plugins are you using? That information plus a thread dump is what we need to find out what is going on.

Thanks,

– Gato

Howdy all.

I'll work to get the thread dump - I don't have the JDK installed on the server.

Is this process invasive (i.e., will the server go off-line)?

Install particulars:

Wildfire 3.0.1

Plugins installed:

Broadcast - active

Presence service - active

Search - active (makes me wonder if this puppy isn't making tons-o-LDAP-calls)

Subscription - inactive (we turned this off to see if it was making things worse)

User Import - inactive (won't work with LDAP/AD)

Database: MS SQL 8.00.0760

Authentication to Active Directory:

(config stanza: )

Thanks again. Thread dump tomorrow.

BTW - some interesting links we've dug up. One has a nice table about threads and the memory settings.

http://forum.java.sun.com/thread.jspa?threadID=605782&messageID=3360044

http://jroller.com/page/rreyelts/20040909

Pat

Hey Pat,

Could you remove inactive plugins so they are not dangling in memory?

Just trying to isolate variables.

Thanks,

– Gato

Hi all,

We experienced the same problems about one year ago. We found that sometimes the -Xss option is unable to set the stack size lower than 1 MB if you are using a JRE. If you use the JDK binaries of java, all is fine. This information may not be up to date, but we are still using JDKs and all is fine.

Thx,

Tim

OK - done, although the Subscription one is having issues unloading.

Repeated in the log over and over:

2006.08.25 08:59:09 [org.jivesoftware.wildfire.container.PluginManager$PluginMonitor.run(PluginManager.java:872)] Error unloading plugin subscription. Will attempt again momentarily.

I should probably let the plug-in developer know about that one.

P

Hello superhelpers…

OK - I've got StackTrace running, and have made my first attempt at a thread dump.

Something tells me this is not what you want. I must say that I have no experience coding in Java, so please excuse my ignorance. I did not send a Ctrl-Break, as I did not want it to kill the server process. If I need to do that, I'll need to schedule the outage.

Here is what I obtained.

Full thread dump Java HotSpot™ Client VM (1.5.0_07-b03 mixed mode, sharing):

Pat

Hi,

I bet that this is not a stacktrace of Wildfire.

If you can use Ctrl+Break, then use it. There are some buggy database drivers around which close all connections, but normally it works fine. Writing all the stacks to disk is just a matter of I/O, nothing else. So if you have 500 threads, you may write two MB to disk; this will take some time.

After startup with 20 threads it will take just a second.

LG

Hello all.

I finally have what I think is a thread dump.

It's somewhat big (13k lines), so I won't post it here. To whom should I send this?

I can make it available via ftp if that would help.

Thanks!

Pat

gaston@jivesoftware.com should be fine.

Well, I sent several thread/memory dumps along but we haven't made any progress.

It just seems that I can't get the Java process to grab enough memory, even though it is out there to be had.

(from the dump:

======

Memory

======

Used: 171515408 (~164MB)

Free: 5890544 (~6MB)

Total: 177405952 (~169MB)

Max: 517013504 (~493MB)

)

If anyone has any suggestions, I'd love to hear them - I'm pretty much out of ideas short of abandoning Wildfire.

Are there other people successfully running this in a Windows environment with 1000+ concurrent connections? Everything I've read says it should handle 1000-2000 with no problem, yet I can't get over 560.

Are the bigger implementations doing this under Linux rather than Windows?

It looks like if I wanted to use the connection manager, I would need a separate machine? Is that accurate? If so, that's… total overkill, IMHO. My system should easily handle this application (my old jabberd 1.4 box is a five-year-old desktop and it handles the load).

Pat

An update for anyone still paying attention to this lost thread…

It would seem that the biggest 'obstacle' has to do with the NT authentication.

The Wildfire server opens a separate LDAP connection for every user authenticating, and holds it open for 5 or more minutes.

So - if I have 400+ users authenticating at once (say, after a restart), that's an LDAP thread for each that lingers and lingers, plus the standard C2S thread.
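Back-of-the-envelope (my numbers are rough): 560 C2S threads plus a few hundred lingering LDAP threads plus the server's own worker threads is easily 1,000 threads. If each reserves a default-sized stack (often around 1 MB on Windows), that is roughly 1 GB of address space for stacks alone, which on a 32-bit JVM with a 512 MB heap leaves essentially nothing - consistent with "unable to create new native thread" showing up around the 560-connection mark.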

I've come to the conclusion that enterprise-level Wildfire with NT/LDAP authentication just isn't ready for prime time.

It sounds like you're getting to the bottom of the issue. One thing that would be good to try is enabling LDAP connection pooling in the conf file. You can read more about connection pooling at:

http://java.sun.com/products/jndi/tutorial/ldap/connect/pool.html
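For reference, the setting would presumably live in the <ldap> section of wildfire.xml; the property name below is an assumption based on later documentation of the same feature, so double-check it against the LDAP guide for your Wildfire version:

<ldap>
  ...
  <!-- assumed property name; verify for your version -->
  <connectionPoolEnabled>true</connectionPoolEnabled>
</ldap>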

Connection pooling will help with LDAP operations that load user or group data. It will not help with authentication operations since a unique connection has to be used for every auth check (an LDAP limitation).
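To make that concrete: an LDAP "simple bind" check in JNDI looks roughly like the sketch below (a minimal illustration, not Wildfire's actual code; the host name is made up). Because the bind principal is the user being verified, the connection cannot be shared across users, and each new JNDI LDAP connection also starts a background reader thread - which is exactly the Thread.start call inside com.sun.jndi.ldap.Connection in the stack trace above.

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.InitialDirContext;

public class LdapAuthSketch {
    /** Returns true if the bind succeeds, i.e. the password is valid. */
    static boolean authenticate(String userDn, String password) {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://ad.example.com:389"); // assumed host/port
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        // The bind principal is the user being checked, so this connection
        // is unique to that user and cannot come from a shared pool.
        env.put(Context.SECURITY_PRINCIPAL, userDn);
        env.put(Context.SECURITY_CREDENTIALS, password);
        try {
            // Creating the context opens the connection (spawning a reader
            // thread) and performs the bind; close it as soon as we know.
            new InitialDirContext(env).close();
            return true;
        } catch (NamingException e) {
            return false; // bad credentials or connection failure
        }
    }
}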

However, I'm very confused as to why connections are being held open. They should be closed as soon as authentication is done. Can you paste in a few traces of the threads that are holding LDAP connections?

Regards,

Matt

Thanks.

We'll look into connection pooling, but since it seems to be the authentication requests that are lingering, it might not have much impact.

What type of trace do you want to see? A packet-level trace (Wireshark) or something out of the stack trace tool?

One interesting part of this - and one I'm looking into more - is that not only does it keep active TCP/IP LDAP connections, but those then sit in the CLOSE_WAIT state for a very long time.

We are starting to wonder if the fact that we have a firewall between the Wildfire server and the LDAP/AD server is in play.
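For what it's worth, CLOSE_WAIT on the Wildfire side means the other end (the AD server, or a firewall in between) closed the connection and the local process has not yet called close() on its socket. A quick way to keep an eye on this from the Wildfire box with standard Windows tools (assuming LDAP on the default port 389):

netstat -ano | findstr :389

That lists the LDAP connections, their states, and the owning PID, so you can confirm they belong to the Java process.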

Also: one person suggested we actually compile Wildfire locally with the JDK. Does that suggestion make sense to you?

Thanks

P

What type of trace do you want to see? A packet-level trace (Wireshark) or something out of the stack trace tool?

I want to see if there's anything in the thread dump that points to an LDAP connection left open. That will tell us if it's a problem in Java-land or something in the OS or firewall.

One interesting part of this - and one I'm looking into more - is that not only does it keep active TCP/IP LDAP connections, but those then sit in the CLOSE_WAIT state for a very long time.

We are starting to wonder if the fact that we have a firewall between the Wildfire server and the LDAP/AD server is in play.

Interesting. That could definitely be a potential issue. Is it possibly also an operating system problem?

Also: one person suggested we actually compile Wildfire locally with the JDK. Does that suggestion make sense to you?

It doesn't make any sense to me. In fact, I'm not sure what they are talking about.

-Matt