History | Log In     View a printable version of the current page.  
Issue Details (XML | Word | Printable)

Key: JM-343
Type: Improvement Improvement
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Daniel Henninger
Reporter: Matt Tucker
Votes: 20
Watchers: 12
Operations

If you were logged in you would be able to see more operations.
Openfire

Improve connection pool recovery logic

Created: 07/22/05 10:45 AM   Updated: 03/26/08 08:33 AM
Component/s: Core, Database
Affects Version/s: 2.2.0 Beta 2
Fix Version/s: 3.4.5

Time Tracking:
Not Specified

Resolution Date: 01/12/08 07:55 PM


 Description  « Hide
Improve the connection pool recovery logic if possible. More info at:

http://www.jivesoftware.org/forums/thread.jspa?threadID=15165



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Gaston Dombiak - 07/22/05 11:41 AM
Matt,

Last week I was thinking about this same issue and discussed an idea with Bill. Basically, the idea was to create a DbCommand class that will encapsulate the logic of getting a connection from the pool, executing the query and returning the connection to the pool. When trying to execute the query a try/catch block would be used to detect SocketExceptions so if a socket exception was caught then a new connection from the pool will be obtained and the previous one is discarded. We may use a times_to_retry_limit if we want or just retry until a good connection is found thus cleaning up dead connections in the pool.

I would use an optimistic approach instead of a pesimistic one. Where optimistic approach = assume that the connection is good, try to execute the query and if the socket is dead then try to recover. A perimistic approach would be to assume that the connection is dead so a check-query should be performed before obtaining or using a connection from the pool. Note: The optimistic approach has an important overhead.

The DbCommand class would have 2 methods that subclasses would need to implement. #configureStatement and #processResultSet. A nice side effect of this class is that we won't need to remember to use a tr/catch/finally block to get and return connections to the pool and remember to close the result set and statement.


Gaston Dombiak - 07/22/05 11:43 AM
Err: Note: The optimistic approach has an important overhead. --> Note: The pesimistic approach has an important overhead.

Matt Tucker - 07/22/05 12:19 PM
Ack, I'd really like to stay away from the DbCommand approach. As soon as you start wrapping JDBC with another API the complexity can grow very quickly. In fact, we had a DbCommand approach in Jive Messenger originally and it was a bear to use from an API perspective. Another way we could approach the DbCommand logic is to implement some error handling in the ConnectionWrapper. I would propose:
  • Detect errors in ConnectionWrapper so that the connection doesn't get put back into the pool when "closed".
  • Have a health check monitor process that explicitly tries to execute a command in the database for each connection on some periodic basis.

I think the combination of those two things would handle the majority of cases for determining bad connections.


Elias Baixas - 03/03/07 09:27 AM
Hello, I've done this quickhack to workaround this issue, you can find it in
http://www.igniterealtime.org/forum/thread.jspa?threadID=25011&tstart=0
it uses the "pessimistic" approach and times_to_retry is set to 10.

best regards,

Elias


Daniel Henninger - 01/08/08 02:12 PM
FYI, actual reference in last comment is now at: http://www.igniterealtime.org/community/thread/25011

No clue where/what the original is though. =(


Daniel Henninger - 01/08/08 02:20 PM

Daniel Henninger - 01/11/08 04:22 PM
Matt provides the following ideas of possible 3rd party replacements for our home brew setup:

http://commons.apache.org/dbcp/
http://sourceforge.net/projects/c3p0
http://proxool.sourceforge.net/

So far I'm quite fond of the last one.


Daniel Henninger - 01/12/08 07:55 PM
We have migrated to proxool in lieu of our home brew setup. Looks to be very mature, working like a charm so far, I'm very impressed.

Matt Tucker - 01/13/08 08:01 PM
Can we show some simple additional stats on the database page? Check out the following:

http://proxool.sourceforge.net/admin.html

It would be super sweet to show the current info as well as a graph over time. Maybe the graphing would be in the future though.


Matt Tucker - 01/13/08 08:03 PM
One more thing: it would be awesome to test the following:

Start MySQL
Start Openfire
Verify everything working
Stop Mysql
Verify things not working due to no db
Start Mysql
See if Openfire can recover due to improved connection pool logic


Daniel Henninger - 01/13/08 10:04 PM
Hi Matt! I actually already am including the web servlet! Though now that you mention it, I might need to make a config tweak to trigger the graphing to actually occur. Good test suggestion!

Matt Tucker - 01/14/08 01:18 AM
Rather than include the web servlet, it seems like it would be much better to incorporate that same data into the existing database page.

Daniel Henninger - 01/14/08 10:24 AM
Hrm. Good point. I had planned on doing that "in the future" but might as well do it now instead of using the servlet. =)

BTW I ran your test and wow. it didn't miss a beat! (I took down MySQL, almost immediately did it detect problems, things broke, etc, and then I started MySQL back up and it immediately was happy again) I also did a rude test -> i kill -9'd mysql out from under openfire and it didn't detect the mysql shutdown immediately in terms of .. immediately show that mysql died, but the first database request caught it. (and I imagine the every 30 second check would have seen it) Brought up MySQL, all is well again immediately.

I'm quite pleased with it's behavior!

I'll look into improving the servlet.

BTW, looks like the ClearSpace folk aren't particularly interested for now, sounds like they solved their problems with their stuff. Bruce did tell me that JNDI has some sort of it's own connection pooling, but I'm not sure it has the powerful connection testing that this has. (proxool is testing the connection before and after it's used to see if it's still a valid connection, it seems to be very robust in that sense ... the caveat of course is that' it's doing extra "SELECT 1" commands, but I explicitly picked something super quick like that so it shouldn't be a noticable performance hit.

Speaking of which, at some point I'd like to investigate where we could use some indexes and such to improve our database interactions altogether. Guus found at least one instance where we were missing a possible important index that sped things up quite a bit.


James Mc Millan - 03/26/08 08:28 AM
Hi Guys

Just to let you know, we upgraded to 3.5.0rc1 from 3.3.2 and we had major issues with the Proxool connection pooling, and we had to give it an unreasonably high number of connections in order for it to work (most of the time). It seems to hold onto connections for longer than it should and not release them.

The good news is switching to c3p0 (an easy implementation of ConnectionProvider) solved the problem for us. So, if anyone else has problems, switching to c3p0 is simple enough. At the moment though, we don't have statistics on the database page, which throws an exception as it is hardcoded to look for Proxool.

Thanks
James


Daniel Henninger - 03/26/08 08:33 AM
There's an improvement coming to the current pool implemention in RC2.