The Hazelcast plugin adds support for running multiple redundant Openfire servers together in a cluster. By running Openfire as a cluster, you can distribute the connection load among several servers, while also providing failover in the event that one of your servers fails. This plugin is a drop-in replacement for the original Openfire clustering plugin, using the open source Hazelcast In-Memory Data Grid data distribution framework in lieu of an expensive proprietary third-party product.
The current Hazelcast release is version 3.9.2.
XMPP is designed to scale in ways that are similar to email. Each Openfire installation supports a single XMPP domain, and a server-to-server (S2S) protocol as described in the XMPP specification is provided to link multiple XMPP domains together. This is known as federation. It represents a powerful way to "scale out" XMPP, as it allows an XMPP user to communicate securely with any user in any other such federated domain. These federations may be public or private as appropriate. Federated domains may exchange XMPP stanzas across the Internet (WAN) and may even discover one another using DNS-based service lookup and address resolution.
By contrast, clustering is a technique used to "scale up" a single XMPP domain. The server members within a cluster all share an identical configuration. Each member will allow any user within the domain to connect, authenticate, and exchange stanzas. Clustered servers all share a single database, and are also required to be resident within the same LAN-based (low latency) network infrastructure. This type of deployment is suitable to provide runtime redundancy and will support a larger number of users and connections (within a single domain) than a single server would be able to provide.
For very large Openfire deployments, a combination of federation and clustering will provide the best results. Whereas a single clustered XMPP domain will be able to support tens or even hundreds of thousands of users, a federated deployment will be needed to reach true Internet scale of millions of concurrent XMPP connections.
To create an Openfire cluster, you should have at least two Openfire servers, and each server must have the Hazelcast plugin installed. To install Hazelcast, simply drop the hazelcast.jar into $OPENFIRE_HOME/plugins along with any other plugins you may have installed. You may also use the Plugins page from the admin console to install the plugin. Note that all servers in a given cluster must be configured to share a single external database (not the Embedded DB).
By default during the Openfire startup/initialization process, the servers will discover each other by exchanging UDP (multicast) packets via a configurable IP address and port. However, be advised that many other initialization options are available and may be used if your network does not support multicast communication (see Configuration below).
After the Hazelcast plugin has been deployed to each of the servers, use the radio button controls located on the Clustering page in the admin console to activate/enable the cluster. You only need to enable clustering once; the change will be propagated to the other servers automatically. After refreshing the Clustering page you will be able to see all the servers that have successfully joined the cluster.
Note that Hazelcast and the earlier clustering plugins (clustering.jar and enterprise.jar) are mutually exclusive. You will need to remove any existing older clustering plugin(s) before installing Hazelcast into your Openfire server(s).
With your cluster up and running, you will now want some form of load balancer to distribute the connection load among the members of your Openfire cluster. There are several commercial and open source alternatives for this. For example, if you are using the HTTP/BOSH Openfire connector to connect to Openfire, the Apache web server (httpd) plus the corresponding proxy balancer module (mod_proxy_balancer) could provide a workable solution. Some other popular options include the F5 LTM (commercial) and HAProxy (open source), among many more.
A simple round-robin DNS configuration can help distribute XMPP connections across multiple Openfire servers in a cluster. While popular as a lightweight and low-cost way to provide basic scalability, this approach is not considered adequate for true load balancing, nor does it provide high availability (HA) from a client perspective.
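To illustrate why round-robin DNS alone does not give clients high availability, here is a minimal Java sketch of the fallback a client would have to perform itself: DNS keeps listing a failed member, so the client must try each published address in turn. The class name, the method name and the host name xmpp.example.com are hypothetical; 5222 is the standard XMPP client port.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

/** Sketch of client-side failover across all addresses published for one DNS name. */
final class RoundRobinFailover {

    /**
     * Resolves every address published for the host and returns the first one
     * that accepts a TCP connection on the port, or null if none does.
     */
    static InetAddress firstReachable(String host, int port, int timeoutMillis)
            throws UnknownHostException {
        for (InetAddress address : InetAddress.getAllByName(host)) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(address, port), timeoutMillis);
                return address;
            } catch (IOException e) {
                // DNS still lists this member, but it is not accepting connections.
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Pass a round-robin name such as the hypothetical xmpp.example.com;
        // without arguments the sketch probes localhost.
        String host = args.length > 0 ? args[0] : "localhost";
        InetAddress member = firstReachable(host, 5222, 2000);
        System.out.println(member == null ? "no member reachable" : "connect to " + member);
    }
}
```

A load balancer with health checks performs this probing on behalf of every client, which is why it is the preferred option for production clusters.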
The process of upgrading the Hazelcast plugin requires a few additional steps when compared with a traditional plugin due to the cross-server dependencies within a running cluster. Practically speaking, all the members of the cluster need to be running the same version of the plugin to prevent various errors and data synchronization issues.
NOTE: This upgrade procedure is neat and tidy, but will incur a brief service outage.
NOTE: Using this approach you should be able to continue servicing XMPP connections during the upgrade.
NOTE: Use this approach if you only have access to the Openfire console. Note however that users may not be able to communicate with each other during the upgrade (if they are connected to different servers).
There are several configuration options built into the Hazelcast plugin as Openfire system properties:
The Hazelcast plugin uses the XML configuration builder to initialize the cluster from the XML file described above. By default the cluster members will attempt to discover each other via multicast at the following location:
...
<join>
    <multicast enabled="false"/>
    <tcp-ip enabled="true">
        <member>of-node-a.example.com:5701</member>
        <member>of-node-b.example.com:5701</member>
    </tcp-ip>
    <aws enabled="false"/>
</join>
...
Please refer to the Hazelcast reference manual for more information.
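When members are listed explicitly for TCP/IP discovery, a quick connectivity check from each server can help rule out firewall or DNS problems before clustering is enabled. The following Java sketch is not part of the plugin; the class and method names are hypothetical, and the member strings simply reuse the example host names and port shown in the configuration above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a pre-flight check for "host:port" cluster member entries. */
final class MemberPortCheck {

    /** Returns true if a TCP connection to "host:port" succeeds within the timeout. */
    static boolean isReachable(String member, int timeoutMillis) {
        int colon = member.lastIndexOf(':');
        String host = member.substring(0, colon);
        int port = Integer.parseInt(member.substring(colon + 1));
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers unresolvable hosts, refused connections and timeouts.
            return false;
        }
    }

    /** Checks every member and keeps the results in the order given. */
    static Map<String, Boolean> checkAll(List<String> members, int timeoutMillis) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (String member : members) {
            results.put(member, isReachable(member, timeoutMillis));
        }
        return results;
    }

    public static void main(String[] args) {
        // Example member entries copied from the configuration snippet.
        List<String> members = Arrays.asList(
                "of-node-a.example.com:5701",
                "of-node-b.example.com:5701");
        checkAll(members, 2000).forEach((member, ok) ->
                System.out.println(member + " -> " + (ok ? "reachable" : "NOT reachable")));
    }
}
```

Note that a member's port only accepts connections once the Hazelcast plugin is running on that server, so an unreachable result for a stopped node is expected.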
Hazelcast is quite sensitive to delays caused by long-running GC cycles, which are typical of servers running a default JVM configuration. In most cases it will be preferable to activate the concurrent garbage collector (CMS) or the newer G1 garbage collector to minimize blocking within the JVM. When using CMS, you may be able to counter the effects of heap fragmentation by using JMX to invoke System.gc() when the cluster is relatively idle (e.g. overnight). This temporarily interrupts the concurrent GC algorithm in favor of a full collection that collects and compacts the heap.
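As one possible way to trigger such a collection over JMX, the sketch below invokes the standard "gc" operation of the platform Memory MBean (java.lang:type=Memory), which is equivalent to calling System.gc() in the target JVM. It assumes that remote JMX access has been enabled on the Openfire JVM, which this document does not cover; the class name and any service URL you pass in are hypothetical. Without arguments it runs against its own JVM as a demonstration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Sketch: request a full GC through the standard Memory MBean. */
final class JmxGcTrigger {

    /** Invokes the "gc" operation of the java.lang:type=Memory MBean. */
    static void requestGc(MBeanServerConnection connection) throws Exception {
        ObjectName memory = new ObjectName(ManagementFactory.MEMORY_MXBEAN_NAME);
        connection.invoke(memory, "gc", null, null);
    }

    /** Reads the currently used heap (in bytes) from the same MBean. */
    static long heapUsed(MBeanServerConnection connection) throws Exception {
        MemoryMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                connection, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
        return bean.getHeapMemoryUsage().getUsed();
    }

    private static void run(MBeanServerConnection connection) throws Exception {
        System.out.println("heap used before: " + heapUsed(connection) + " bytes");
        requestGc(connection);
        System.out.println("heap used after:  " + heapUsed(connection) + " bytes");
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No URL given: demonstrate against this JVM's own MBean server.
            run(ManagementFactory.getPlatformMBeanServer());
            return;
        }
        // args[0] is a JMX service URL for the remote Openfire JVM (assumption:
        // remote JMX is enabled there), in the usual form
        // service:jmx:rmi:///jndi/rmi://<host>:<port>/jmxrmi
        try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(args[0]))) {
            run(connector.getMBeanServerConnection());
        }
    }
}
```

If the target JVM was started with -XX:+DisableExplicitGC, the request is ignored, so check the JVM options before scheduling this kind of job.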
In addition, the runtime characteristics of your Openfire cluster will vary greatly depending on the number and type of clients that are connected, and which XMPP services you are using in your deployment. However, note that because many of the objects allocated on the heap are of the short-lived variety, increasing the proportion of young generation (eden) space may also have a positive impact on performance. As an example, the following OPENFIRE_OPTS have been shown to be suitable in a three-node cluster of servers (four CPUs each), supporting approximately 50k active users:
OPENFIRE_OPTS="-Xmx4G -Xms4G -XX:NewRatio=1 -XX:SurvivorRatio=4 -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:CMSFullGCsBeforeCompaction=1 -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly -XX:+PrintGCDetails -XX:+PrintPromotionFailure"
This GC configuration will also emit helpful GC diagnostic information to the console to aid further tuning and troubleshooting as appropriate for your deployment.
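To make the effect of -XX:NewRatio and -XX:SurvivorRatio in the example above more concrete, the following sketch works through the arithmetic for a 4 GB heap. It relies on the HotSpot definitions of these flags (NewRatio is the ratio of old to young generation; SurvivorRatio is the ratio of eden to one survivor space, and there are two survivor spaces). The class name is hypothetical, and the JVM aligns the actual sizes, so treat the figures as approximate.

```java
/** Sketch: approximate generation sizes implied by NewRatio and SurvivorRatio. */
final class HeapLayout {

    static final long MIB = 1024L * 1024L;

    /** NewRatio = old / young, so young = heap / (NewRatio + 1). */
    static long youngBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    /** SurvivorRatio = eden / survivor, and young = eden + 2 survivor spaces. */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    /** Eden is whatever remains of the young generation after both survivor spaces. */
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        long heap = 4096L * MIB;            // -Xmx4G -Xms4G
        long young = youngBytes(heap, 1);   // -XX:NewRatio=1
        long old = heap - young;
        long survivor = survivorBytes(young, 4); // -XX:SurvivorRatio=4
        long eden = edenBytes(young, 4);

        // Prints: young=2048 MiB, old=2048 MiB, eden=1365 MiB, survivor=341 MiB (each)
        System.out.println("young=" + young / MIB + " MiB, old=" + old / MIB
                + " MiB, eden=" + eden / MIB + " MiB, survivor=" + survivor / MIB + " MiB (each)");
    }
}
```

In other words, this configuration dedicates half of the heap to the young generation, with roughly two thirds of that half available as eden for short-lived objects.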