I am stress testing our Openfire (OPF) 3.6.2 setup and finding that, at around 250 concurrent logged-in users in a chat room with Chat Room Logging and Message Archiving enabled, the server degrades over time and throws OutOfMemoryErrors as follows. I increased the memory allocated to the Openfire Java process by adding the appropriate arguments to the startup (init.d) script as follows, but got the same result. Any ideas how to fix this?
FYI, my load testing consisted of a Java client based on Smack. Each new instance of the client joined the same MUC and sent a message every 11 seconds, with chat session logging enabled as a server setting. I used three hosts to create three test scenarios with 100, 75, and 60 unique client connections each. At 100 (x3 = 300 concurrent) and 75 (x3 = 225) connections, Java memory grew until failure. Scaling back to 60 (x3 = 180) is presently showing a stable load test. After researching similar posts in the forum, I ended up adjusting the JVM params in /etc/sysconfig/openfire as follows, and so far it is working fine.
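For reference, the RPM init script reads JVM arguments from the OPENFIRE_OPTS variable in that file. The exact values from my setup aren't reproduced here; this is only an illustrative sketch of the kind of edit, with hypothetical heap sizes:

```shell
# /etc/sysconfig/openfire -- heap sizes below are illustrative placeholders,
# not the actual values used; tune -Xms/-Xmx to your host's available RAM.
OPENFIRE_OPTS="-Xms512m -Xmx1024m"
```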
This could be a queue that is growing faster than Openfire can drain it. I think Gato posted a plugin to monitor the queues and to size them, but I can't find the thread or document right now.
Indeed, what you posted here was very helpful. I found his post; he refers to "the load stats plugin to collect information about the MINA queues/buffers" as well as analyzing the memory dump. Even with the reduced load, the server crashes within the hour. I deleted all non-essential plugins and am retesting while reviewing this info.
The server is not holding up under stress testing and is dropping clients as usage goes up. My latest test case shows that 150 concurrent clients, sending messages at 11-second intervals, will eventually degrade the server to about 37 connections with high memory overhead (~75%). The client processes running on external nodes are not exiting. Server log messages follow.
2008.12.19 18:43:08 No ACK was received when sending stanza to: org.jivesoftware.openfire.nio.NIOConnection@eb3e01 MINA Session: (SOCKET, R: /192.168.0.212:39288, L: /192.168.1.226:5222, S: 0.0.0.0/0.0.0.0:5222)
Out of curiosity, can you provide some more details on your Openfire setup? What sort of machine you're running it on, which database you're using, etc. Also, is there a particular reason you're focusing your tests on MUCs (multi-user chats)? Having 150 users in a single room, each sending a message every eleven seconds, generates 110,000+ messages a minute once each message is fanned out to all 150 occupants, so if your machine isn't able to process all those messages in a timely manner they're going to back up and will consume all your machine's memory.
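Back-of-the-envelope, assuming each occupant's message is reflected by the room to all 150 occupants:

```java
public class MucFanout {
    public static void main(String[] args) {
        int occupants = 150;
        double intervalSec = 11.0;
        // messages entering the room per minute (one per occupant per interval)
        double inboundPerMin = occupants * (60.0 / intervalSec);
        // each inbound message is broadcast to every occupant in the room
        long outboundPerMin = Math.round(inboundPerMin * occupants);
        System.out.println("outbound stanzas/min: " + outboundPerMin); // prints 122727
    }
}
```

That ~122,000 stanzas per minute is what the server must serialize and write out, which is why the queues back up when the machine falls behind.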
Sure, our application is for user support. Typically a user creates a troubleshooting ticket and logs in to a MUC, where one or more support staff can then collaborate and message while recording conversations for a log transcript. My aim is to be confident that many concurrent MUC sessions can be supported without crashing, to know how well Openfire scales performance-wise as more and more MUC sessions are initiated, and to find the upper limits that may cause the server to crash. Knowing these details will be very important. Honestly, I am a little mentally fatigued, and now that you've explained what my test case is doing, it makes sense to revise my load-testing scenario.
import java.util.Date;

import org.jivesoftware.smack.ConnectionConfiguration;
import org.jivesoftware.smack.PacketListener;
import org.jivesoftware.smack.XMPPConnection;
import org.jivesoftware.smack.filter.PacketFilter;
import org.jivesoftware.smack.packet.Packet;
import org.jivesoftware.smackx.muc.MultiUserChat;

public class XMPPLoadTester {

    // Prints every packet the connection receives.
    static public class ChatListener implements PacketListener {
        public void processPacket(Packet packet) {
            System.out.println(packet.toXML());
        }
    }

    public static void main(String[] args) {
        try {
            // args[0] - host
            // args[1] - uid
            // args[2] - pwd
            // args[3] - chat room
            PacketFilter myFilter = new PacketFilter() {
                public boolean accept(Packet packet) {
                    return true; // accept everything
                }
            };
            ConnectionConfiguration config = new ConnectionConfiguration(args[0], 5222);
            XMPPConnection conn1 = new XMPPConnection(config);
            ChatListener cl = new ChatListener();
            conn1.connect();
            conn1.addPacketListener(cl, myFilter);
            conn1.login(args[1], args[2]);
            MultiUserChat chat = new MultiUserChat(conn1, args[3] + "@conference.experts-exchange.com");
            chat.join(args[1]);
            String msg;
            for (int i = 0; i < 10000; ++i) {
                msg = new Date() + " " + Math.random();
                chat.sendMessage(msg);
                Thread.sleep(11000); // one message every 11 seconds
            }
            conn1.disconnect();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
With this code you will have trouble generating useful load.
1st, you only send messages, so you cannot verify that the messages are also received; the results are therefore of limited use.
2nd, you will have problems scaling. What hardware do you use to start this program 100 times? Even if you reuse objects, you are still using a client library that does not scale very well, so you may have difficulty using a Smack-based load test program to cause significant load for Openfire. Using one MUC rather than p2p messaging is probably a way to generate significant load, but I wonder if this is a normal setup.
You may also want to join the MUC with a standard client during your load test; with more than one message per second you may have problems keeping track of the ongoing conversation.
I took a look at Tsung for load testing, and boy was I not thrilled with its Erlang dependency. I modified the Smack tester to add an incoming message sink, and that did produce a useful load test without the server crashing. I just hope some malicious community member does not decide to write a client that fakes a known good agent to crash production servers.
import java.util.Date;

import org.jivesoftware.smack.ConnectionConfiguration;
import org.jivesoftware.smack.PacketListener;
import org.jivesoftware.smack.XMPPConnection;
import org.jivesoftware.smack.packet.Packet;
import org.jivesoftware.smackx.muc.MultiUserChat;

public class XMPPLoadTester {

    // Incoming message sink: consumes (and prints) room traffic so the
    // client actually reads what the server sends back.
    static public class ChatListener implements PacketListener {
        public void processPacket(Packet packet) {
            System.out.println(packet.toXML());
        }
    }

    public static void main(String[] args) {
        try {
            // args[0] - host
            // args[1] - uid
            // args[2] - pwd
            // args[3] - chat room
            ConnectionConfiguration config = new ConnectionConfiguration(args[0], 5222);
            XMPPConnection conn1 = new XMPPConnection(config);
            conn1.connect();
            conn1.login(args[1], args[2]);
            MultiUserChat chat = new MultiUserChat(conn1, args[3] + "@conference.mydomain.com");
            chat.join(args[1]);
            ChatListener cl = new ChatListener();
            chat.addMessageListener(cl);
            String msg;
            for (int i = 0; i < 10000; ++i) {
                msg = new Date() + " " + Math.random();
                chat.sendMessage(msg);
                Thread.sleep(11000); // one message every 11 seconds
            }
            conn1.disconnect();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
A modest P4 was making that many connections with good load numbers, and this unit is the slowest of the three. I only need to show this will handle a decent load with long-term stability.
OK, FYI: 100 connections on a P4, with load average 0.04, 0.21, 0.22. I'll let it run for as long as possible and watch for load spikes or breakage.
Ah, load did spike up to 250+ … looks like I need a new game plan. [Follow-up note: reducing the number of clients to 50 per machine works. There's now a total of 144 connections from three machines (50, 50, and 44).]
OK, I took a look at the JHat heap histogram from the heap dump obtained via the -XX:+HeapDumpOnOutOfMemoryError Java flag, and found the culprit was turning on Conversation Log for the MUC, as follows.
Class                                                          Instance Count   Total Size
class [C                                                       476751           47412974
class org.jivesoftware.openfire.muc.spi.ConversationLogEntry   424379           11882612
class [B                                                       16237            11616574
class java.lang.String                                         477956           7647296
Where, for a sample 'class [C' (char array) instance:
References to this object:
com.sun.jmx.mbeanserver.OpenConverter$IdentityConverter@0xacc31988 (20 bytes) : field openClass
java.util.WeakHashMap$Entry@0xacc31948 (36 bytes) : field referent
java.io.ObjectStreamField@0xacc84678 (29 bytes) : field type
Any ideas why this is not flushing or how to force a flush (e.g. garbage collection)?
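For anyone reproducing this, a sketch of how the dump and histogram above can be obtained (the dump path and PID below are hypothetical; the JVM names the file itself):

```shell
# Added to the Openfire JVM arguments (illustrative):
#   -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp
# On the next OOM the JVM writes a file like /tmp/java_pid1234.hprof.
# Browse it with jhat, giving jhat itself enough heap to parse the dump:
jhat -J-mx1024m /tmp/java_pid1234.hprof
# then open http://localhost:7000/ and follow "Show heap histogram"
```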
Indeed, there is a way to flush the conversation log objects: Group Chat -> Group Chat Settings -> Other Settings, see the "Flush interval (seconds)" and "Batch size" settings. I'm testing to see if this resolves the issue.
[Update: I'm seeing this activity in the logs when the server gets to 97% memory usage; afterwards, memory usage drops to around 45%.]
Latest update: Openfire crashed reporting "java.lang.OutOfMemoryError: Java heap space" as before. The JHat heap dump reports 6,343,271 instances (consuming 50,746,168 bytes, about 48 MB) of java.util.concurrent.ConcurrentLinkedQueue$Node. The test case, based on the Smack library, ran approximately 4 hours with 144 connections across 50 MUCs, with roughly 2 to 3 users per room.
nohup.log:
java.lang.OutOfMemoryError: Java heap space
72677206 [client-18] WARN org.jivesoftware.openfire.nio.ClientConnectionHandler - [/10.0.0.2:57053] Unexpected exception from exceptionCaught handler.
Bah … it seems setting 'Don't Show History' was necessary for my load testing. I am guessing that every time a new message was sent, the history ring buffer needed updating, and this was eventually hosing the server. I am retesting to verify. The load, Java CPU, and memory usage have all dropped considerably!
P.S. Still not using Tsung; still using the Smack-based test programs mentioned earlier.
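As a client-side alternative to the server-wide "Don't Show History" setting, the Smack 3.x API also lets each client request zero history stanzas when joining. A minimal sketch, assuming the same args layout as the testers above (host names and credentials are placeholders):

```java
import org.jivesoftware.smack.SmackConfiguration;
import org.jivesoftware.smack.XMPPConnection;
import org.jivesoftware.smackx.muc.DiscussionHistory;
import org.jivesoftware.smackx.muc.MultiUserChat;

public class NoHistoryJoin {
    public static void main(String[] args) throws Exception {
        // args[0] - host, args[1] - uid, args[2] - pwd, args[3] - chat room
        XMPPConnection conn = new XMPPConnection(args[0]);
        conn.connect();
        conn.login(args[1], args[2]);
        MultiUserChat chat = new MultiUserChat(conn, args[3] + "@conference.mydomain.com");
        DiscussionHistory history = new DiscussionHistory();
        history.setMaxStanzas(0); // ask the room for no history on join
        chat.join(args[1], null, history, SmackConfiguration.getPacketReplyTimeout());
        chat.sendMessage("joined without history");
        conn.disconnect();
    }
}
```

This only changes what each joining client requests; the room still keeps its history ring server-side, so it complements rather than replaces the server setting.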