Clustering Questions

So I’m attempting to implement clustering with JGroups and JBossCache.

Two questions that I have not found an answer for:

Do the servers require a shared database?

Is the existing “Clustering” admin page a dummy? I can’t seem to select the “Enabled” radio button on that page. If so, can I replace it with my own clustering admin page? What is the proper “pageID” to override that page?

Thanks in advance.

UPDATE: I’ve posted the initial code and started a new discussion… See:

http://www.igniterealtime.org/community/thread/38847

Hey Tom,

Do the servers require a shared database?

Yes. No matter which clustering solution you rely on, you will always need to have the same database. Moreover, if you want a solution with no single point of failure then your database should also be clustered.

Is the existing “Clustering” admin page a dummy?

Although I created that page, I have already forgotten its details. Anyway, I’m almost sure that it has a main page that can potentially be reused when using other clustering solutions, and then there is an advanced page that is 100% Oracle Coherence specific. In other words, you might need to modify the main page and get rid of the advanced page.

Regards,

– Gato

Hi Gaston, Thanks for your reply.

I think I’m not seeing a button to enable clustering b/c it says “clustering is not available on this system. Install a plugin.” My plugin did not install correctly the first time (apparently my ‘openfire.xml’ was supposed to be called ‘plugin.xml.’)

Anyway, now that the plugin is loading, I’m getting some activity, but I can’t seem to get OpenFire to use my clustering…

2009.06.09 11:02:40 [org.jivesoftware.util.cache.CacheFactory.startClustering(CacheFactory.java:568)] Unable to start clustering - continuing in local mode
java.lang.ClassNotFoundException: com.jivesoftware.util.cache.CoherenceClusteredCacheFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.jivesoftware.util.cache.CacheFactory.startClustering(CacheFactory.java:562)
at org.jivesoftware.openfire.cluster.ClusterManager.startup(ClusterManager.java:257)
at org.jivesoftware.openfire.cluster.ClusterManager.setClusteringEnabled(ClusterManager.java:307)
at com.enernoc.rnd.openfire.cluster.JBossClusterPlugin.initializePlugin(JBossClusterPlugin.java:57)
at org.jivesoftware.openfire.container.PluginManager.loadPlugin(PluginManager.java:448)

Here is the code in JBossClusterPlugin.initializePlugin:

    JiveGlobals.setProperty(CacheFactory.CLUSTERED_CACHE_PROPERTY_NAME,
         "com.enernoc.rnd.openfire.cluster.cache.ClusteredCacheFactory");
    ExternalizableUtil.getInstance().setStrategy( new ExternalUtilStrategy() );
    XMPPServer.getInstance().getRoutingTable().setRemotePacketRouter( new ClusterPacketRouter() );
    XMPPServer.getInstance().setRemoteSessionLocator( new ClusteredSessionLocator() );
   
    XMPPServer.getInstance().setNodeID( NodeID.getInstance( masterWatcher.getLocalAddress().toString().getBytes() ) );       
   
    ClusterManager.setClusteringEnabled(true);
    ClusterManager.startup();
    CacheFactory.startClustering();

So – am I setting the wrong property? Should I not be attempting to enable and start up clustering from the plugin? I’m setting the cache property name to my cache factory class, so I’m not sure why OpenFire is still looking for the Coherence plugin…

Thanks in advance.

Hmm… Okay this is interesting… I’m tracing through the OpenFire source code and I came across this bit:

  • CacheFactory static initializer
  • CacheFactory.startClustering()

Now, correct me if I’m wrong, but this would suggest that once that static initializer block is run, it is impossible to change the ClusteredCacheFactory class. So the only way to solve this is to either depend on classloading order, or add a JiveGlobals property and then restart the servers, right?
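The behaviour described above can be reproduced with a minimal, self-contained sketch. This is not Openfire code: `GLOBALS`, `FactoryHolder`, the property key, and the class names are hypothetical stand-ins for JiveGlobals and CacheFactory, and the sketch assumes the factory class name is read only once, in a static initializer.

```java
import java.util.Properties;

class StaticInitDemo {
    // Hypothetical stand-in for a global property store such as JiveGlobals.
    static final Properties GLOBALS = new Properties();
    // Hypothetical property key and class names, for illustration only.
    static final String KEY = "demo.clustered.factory.class";
    static final String DEFAULT_FACTORY = "com.example.DefaultClusteredFactory";

    // Hypothetical stand-in for a factory that captures its configuration
    // in a static initializer, which runs exactly once per class loader.
    static class FactoryHolder {
        static final String FACTORY_CLASS_NAME;
        static {
            FACTORY_CLASS_NAME = GLOBALS.getProperty(KEY, DEFAULT_FACTORY);
        }
        static String startClustering() {
            // Always uses the value captured at class initialization time.
            return FACTORY_CLASS_NAME;
        }
    }

    /** Returns the factory name seen before and after changing the property. */
    static String[] run() {
        String before = FactoryHolder.startClustering(); // first use triggers the static initializer
        GLOBALS.setProperty(KEY, "com.example.MyClusteredFactory");
        String after = FactoryHolder.startClustering();  // the new property value is never re-read
        return new String[] { before, after };
    }

    public static void main(String[] args) {
        String[] result = run();
        System.out.println(result[0]); // com.example.DefaultClusteredFactory
        System.out.println(result[1]); // com.example.DefaultClusteredFactory
    }
}
```

If the real factory works this way, a property set from a plugin only takes effect when it is already present before the class is first initialized, which is consistent with the problem disappearing after a restart.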

While I’m on the topic, can I suggest that both ClusterManager and CacheFactory should not be static at all, but instances retrievable via the XMPPServer instance? Ugh… Not to complain, but so much use of statics and Class.forName really smells… OpenFire could really benefit from a dependency injection framework.

So far, I’m able to get my clustering plugin to initialize, but I can’t log in to the admin console; I get this exception in the browser when I attempt to log in:

java.lang.NoClassDefFoundError: Could not initialize class org.jivesoftware.openfire.lockout.LockOutManager$LockOutManagerContainer
     at org.jivesoftware.openfire.lockout.LockOutManager.getInstance(LockOutManager.java:58)
     at org.jivesoftware.openfire.auth.AuthFactory.authenticate(AuthFactory.java:154)

Any idea where that would come from ?? It sounds like a classloading issue…

EDIT: This seems to have gone away after restarting the server(s)… Working on other issues now…

Message was edited by: tomstrummer

I’d suggest searching the jars for this class and making sure it’s on the classloader path. I am interested in helping with this effort, but one question is why implement it using JGroups and JBossCache as opposed to ZooKeeper (http://hadoop.apache.org/zookeeper/)?

Actually, I started with Zookeeper, since I’ve used the framework before. I got most of the way through the implementation, then realized that the synchronous tasks which require a response are… a little awkward in ZK. I imagine the implementation would involve adding a task ID that gets sent back with a ‘response’ node – which isn’t ideal.
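To make the task-ID idea concrete, here is a minimal sketch of request/response correlation in plain Java. It does not use the ZooKeeper API at all; `TaskCorrelationDemo`, `submit`, and `onResponse` are hypothetical names, and the "response node" is simulated by a direct method call.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class TaskCorrelationDemo {
    // Tasks that have been sent out and are still waiting for a response, keyed by task ID.
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    /** Registers a task under the given ID and returns a future for its result. */
    CompletableFuture<String> submit(String taskId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(taskId, future);
        return future;
    }

    /**
     * Called when a response carrying a task ID arrives (in ZK this might be
     * triggered by a watch on a 'response' node). Returns false if no task
     * with that ID is waiting.
     */
    boolean onResponse(String taskId, String result) {
        CompletableFuture<String> future = pending.remove(taskId);
        return future != null && future.complete(result);
    }

    public static void main(String[] args) {
        TaskCorrelationDemo demo = new TaskCorrelationDemo();
        String taskId = UUID.randomUUID().toString();
        CompletableFuture<String> result = demo.submit(taskId);
        demo.onResponse(taskId, "done"); // simulate the remote node answering
        System.out.println(result.join()); // done
    }
}
```

The caller blocks on the future to get synchronous behaviour, which is the extra bookkeeping that makes this approach feel awkward compared to a framework with built-in synchronous messaging.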

Plus, caching with ZK felt a little awkward since the key needs to be serialized and deserialized as well as the value… But node names can only be strings – so that means having two node trees (one for serialized keys, the other for values) or something similar.

Although I’m not terribly familiar with JGroups, I found out that their “views” can be used for master election – i.e. view ordering is consistent, so the first address can be considered the master. And you get a “view changed” event, i.e. if the master leaves the cluster. JGroups also has support for synchronous as well as async multicast messaging, and JBossCache is a natural fit for Openfire’s cache interface.
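The "first address in the view is the master" rule can be sketched without any JGroups dependency. In the sketch below the view is simulated as an ordered list of strings, and `ViewElectionDemo`, `master`, and `isMaster` are hypothetical names; in a real plugin the member list would presumably come from the view that JGroups delivers in its view-change callback, and that wiring is omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ViewElectionDemo {
    /** Treats the first member of the ordered view as the master. */
    static String master(List<String> view) {
        if (view.isEmpty()) {
            throw new IllegalArgumentException("view has no members");
        }
        return view.get(0);
    }

    /** True if the local node is the first member of the view. */
    static boolean isMaster(String localAddress, List<String> view) {
        return localAddress.equals(master(view));
    }

    public static void main(String[] args) {
        // Simulated view; every node is assumed to see the members in the same order.
        List<String> view = new ArrayList<>(Arrays.asList("node-a", "node-b", "node-c"));
        System.out.println(master(view)); // node-a

        // Simulated "view changed" event: the master leaves the cluster.
        view.remove("node-a");
        System.out.println(master(view));             // node-b
        System.out.println(isMaster("node-b", view)); // true
    }
}
```

Because every node applies the same rule to the same ordered view, no extra election messages are needed; each node just re-evaluates the rule whenever a new view arrives.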

I’m having a little trouble now getting Openfire to use my clustering plugin, but it looks like I’m on the right track. Will post some source code soon.

Hey Tom,

Glad to hear that you are making good progress. You seem to be on the right track.

– Gato

How can I collaborate? My knowledge is dated on how to proceed on collaborative dev. Things I am aware of are sf.net, launchpad, darcs, bzr and so on outside of ignite’s source control system (which we have no commit access). Basically all I need is a place to grab your updates, make my own, and make it available.

Give me a little time to make sure it’s at least somewhat functional. Then I’ll set up a project page with source code access.

I am also interested in helping with this.

After grabbing the latest tarball (http://github.com/tomstrummer/openfire-jboss-clustering/tree/master#), extracting it and running mvn build install I get:


T E S T S

Running com.enernoc.rnd.openfire.cluster.MultiUserTest
Error! A startup class specified in smack-config.xml could not be loaded: org.jivesoftware.smackx.ServiceDiscoveryManager
Error! A startup class specified in smack-config.xml could not be loaded: org.jivesoftware.smackx.XHTMLManager
Error! A startup class specified in smack-config.xml could not be loaded: org.jivesoftware.smackx.muc.MultiUserChat
Error! A startup class specified in smack-config.xml could not be loaded: org.jivesoftware.smackx.filetransfer.FileTransferManager
Error! A startup class specified in smack-config.xml could not be loaded: org.jivesoftware.smackx.LastActivityManager
Error! A startup class specified in smack-config.xml could not be loaded: org.jivesoftware.smackx.ServiceDiscoveryManager
Error! A startup class specified in smack-config.xml could not be loaded: org.jivesoftware.smackx.XHTMLManager
Error! A startup class specified in smack-config.xml could not be loaded: org.jivesoftware.smackx.muc.MultiUserChat
Error! A startup class specified in smack-config.xml could not be loaded: org.jivesoftware.smackx.filetransfer.FileTransferManager
Error! A startup class specified in smack-config.xml could not be loaded: org.jivesoftware.smackx.LastActivityManager
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.024 sec <<< FAILURE!
Running com.enernoc.rnd.openfire.cluster.cache.JBossCacheTest
Aug 18, 2009 9:44:41 AM org.jboss.cache.jmx.PlatformMBeanServerRegistration registerToPlatformMBeanServer
INFO: JBossCache MBeans were successfully registered to the platform mbean server.
Aug 18, 2009 9:44:41 AM org.jgroups.JChannel init
INFO: JGroups version: 2.7.0.GA
Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 1.32 sec <<< FAILURE!

Results :

Tests in error:
com.enernoc.rnd.openfire.cluster.MultiUserTest
testSimpleMapOperations(com.enernoc.rnd.openfire.cluster.cache.JBossCacheTest)
testSimpleMapOperations(com.enernoc.rnd.openfire.cluster.cache.JBossCacheTest)

Tests run: 3, Failures: 0, Errors: 3, Skipped: 0

[INFO] ------------------------------------------------------------------------
[ERROR] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] There are test failures.

Please refer to /usr/local/src/tomstrummer-openfire-jboss-clustering-35c8913a255cd9663c19c8899f3b46e041552ab5/target/surefire-reports for the individual test results.
[INFO] ------------------------------------------------------------------------
[INFO] For more information, run Maven with the -e switch
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 35 seconds
[INFO] Finished at: Tue Aug 18 09:44:41 PDT 2009
[INFO] Final Memory: 14M/28M
[INFO] ------------------------------------------------------------------------

I’ve already done:

mvn install:install-file -DgroupId=org.igniterealtime.openfire -DartifactId=openfire -Dversion=3.6.4 -Dpackaging=jar -DgeneratePom=true -Dfile=/usr/share/openfire/lib/openfire.jar

required to build the maven-openfire plugin before this clustering plugin.

I don’t have much experience building Java stuff (this is my first), so maybe there’s a CPAN-ish ‘install this class bundle’ command I should be using for Maven, but I would have expected the openfire.jar to have the Smack classes.

The last activity shown on GitHub is dated June 28th, 2009. Are you still working on this code? It looks very promising.

Thanks,

–John

The test failures may be because your local network configuration does not match the test settings. See

http://github.com/tomstrummer/openfire-jboss-clustering/blob/35c8913a255cd9663c19c8899f3b46e041552ab5/src/main/resources/udp.xml

and

http://github.com/tomstrummer/openfire-jboss-clustering/blob/35c8913a255cd9663c19c8899f3b46e041552ab5/src/main/resources/udp2.xml

The relevant values (bind.address, mcast_addr, and mcast_port) should be configurable via system properties, but mcast_addr and mcast_port need to differ between those two files. I’ve been using the default values since they’re correct for my local network settings. The easiest way to get started is to insert the correct values directly into those two XML files.

If you can’t infer what the correct values should be for your network, you’ll have to dig into the JGroups documentation to figure it out unfortunately.

See: http://www.jgroups.org/manual/html/index.html

and http://www.jgroups.org/manual/html/protlist.html#d0e2901

You can also look at the unit test output under target/surefire-reports/*.txt to see what the error messages were.

As for the status of the project: it was developed and minimally tested as part of a research effort. It’s not being worked on at the moment, as other priorities have demanded my attention. But the intent is to eventually use it in production so it won’t be abandoned (unless possibly if the OpenFire project really dies, since its health still seems a bit questionable.) Hope this helps.

If you have further questions, you might want to start another thread. This one is getting long.