Kill -3 openfire-pid not working

Hi,

I have been having CPU and memory issues on our Openfire server (running on OpenSUSE 10.2, 2 GB RAM) for a few weeks now. So far our solution has been to simply restart Openfire every now and then to “solve” the problem. The problems occur even under fairly light load (around 100 concurrent users).

We are now taking another shot at actually solving the problem.

I have run top and identified one thread consuming about 70% CPU and two other threads consuming about 20%.

I tried doing a kill -3 openfire-pid to get a stack trace and match it against the thread IDs from top. The problem is that no nohup.out is being generated. I have used this command before on the same machine and it worked perfectly. How can I get a stack trace? Could one of the threads be the main thread, blocking the signal because it is busy?
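
For reference, this is roughly the procedure I am following; openfire-pid and tid-from-top are placeholders, and it assumes Openfire was started under nohup so the JVM writes its thread dump to nohup.out in the working directory:

top -H -p openfire-pid        # per-thread view; note the TID of the thread eating the CPU
kill -3 openfire-pid          # SIGQUIT makes the JVM append a full thread dump to its stdout, i.e. nohup.out
printf '%x\n' tid-from-top    # convert the decimal TID from top to hex
grep "nid=0x" nohup.out       # the nid field in the dump is that native thread ID in hex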

Henrik

OK, I was able to solve this myself. It looks like I had accidentally deleted the nohup.out file. So I restarted the server, and after a while I got the same CPU/memory problem.

The thread I identified is:

“Thread-859” daemon prio=10 tid=0x0891b400 nid=0x79fc runnable [0xc7fa7000..0xc7fa7f30]
java.lang.Thread.State: RUNNABLE
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuilder.append(Unknown Source)
at pl.mn.communicator.packet.HexDump.hexDump(HexDump.java:60)
at pl.mn.communicator.packet.HexDump.hexDump(HexDump.java:30)
at pl.mn.communicator.packet.GGUtils.prettyBytesToString(GGUtils.java:31)
at pl.mn.communicator.packet.handlers.PacketChain.sendToChain(PacketChain.java:74)
at pl.mn.communicator.DefaultConnectionService$ConnectionThread.decodePacket(DefaultConnectionService.java:412)
at pl.mn.communicator.DefaultConnectionService$ConnectionThread.handleInput(DefaultConnectionService.java:361)
at pl.mn.communicator.DefaultConnectionService$ConnectionThread.run(DefaultConnectionService.java:337)

The other two threads seem to be the garbage collector

“GC task thread#0 (ParallelGC)” prio=10 tid=0x0805f400 nid=0x6796 runnable

“GC task thread#1 (ParallelGC)” prio=10 tid=0x08060400 nid=0x6797 runnable

It looks like the first thread is allocating and freeing lots of memory while keeping the CPU busy. I can see the changing memory consumption in the web console, too: it jumps from 200 MB to 400 MB and back within a few seconds.
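
That matches the top of the stack: every time an append outgrows the StringBuilder's current capacity, expandCapacity() calls Arrays.copyOf() and reallocates the whole backing array. Just to illustrate the pattern, here is a purely hypothetical sketch (not the actual pl.mn.communicator HexDump code, which I have not looked at) of the kind of loop that produces exactly this churn:

public class HexDumpChurn {

    // Naive hex dump: an unsized StringBuilder has to grow via repeated
    // expandCapacity()/Arrays.copyOf() calls as the output gets longer.
    static String hexDump(byte[] data) {
        StringBuilder sb = new StringBuilder();   // default capacity is only 16 chars
        for (byte b : data) {
            sb.append(String.format("%02x ", b)); // ~3 output chars per input byte
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] packet = new byte[4 * 1024 * 1024]; // pretend a huge "packet" arrived
        while (true) {
            // Each call allocates and drops megabytes of char[]: one thread stays
            // RUNNABLE copying arrays, and the GC threads stay busy cleaning up.
            hexDump(packet);
        }
    }
}

If the library hex-dumps every incoming packet (e.g. for debug logging) and one connection keeps delivering large or garbled packets, that alone could explain both the busy thread and the 200-400 MB memory swings.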

Does anyone have a clue what this thread is doing? It seems to be having trouble with some incoming packets.

Looks like this is part of the GaduGadu transport of the gateway. We just disabled GaduGadu support and will see whether this fixes the problem.