Bob McWhirter | 06:02pm UTC, 23 December 2008
Background on Clustering HTTP
JBoss clusters, as we all know. You can fire up a farm of AS5 worker nodes, and they'll find each other (sometimes with the help of a Gossip router). They'll share HTTP sessions and such, through the magic of JBoss Cache.
But then you end up with a farm of distinct HTTP listeners out there, each with their own IP address. So we jam a proxy out front, normally, which dispatches requests to any one of the workers on the farm.
But then your proxy has to know all about the workers. With many generic solutions, that means maintaining a list of worker nodes, normally by hand or with a ball of bash scripts.
Let me introduce JBoss mod_cluster, though, which goes a long way to making clustering a simple, happy, joyous event.
While JBoss mod_cluster has a few different modes of operation, from standalone to HA, using HTTP or AJP to chat with the back-ends, we'll be looking at the top-of-the-line implementation, since I get to play with all the big toys.
First, as the name implies, mod_cluster is a module for Apache httpd. In fact, it's a set of modules that work with mod_proxy and mod_proxy_ajp.
The Apache httpd configuration can be super-simple:
I turn off proxy server advertising because multicast isn't available to me (see below).
In the HA mode, mod_cluster takes advantage of the fact that your cluster knows itself. A worker is responsible for providing the entire cluster view to the front-end httpd processes. It also informs the cluster itself of the view of the proxy front-ends.
Then mod_cluster pipes requests through mod_proxy_ajp dynamically to find their way to a worker. You don't have to maintain worker lists yourself or through bash voodoo any more. The front-end chats AJP with the workers, so things flow efficiently. Add nodes, remove nodes, have nodes crash (never!), and the proxy responds.
The source distribution seems to include httpd and a lot of its own dependencies. I was able to compile just the modules, which work fine with the stock Apache httpd on Fedora 10. I'll be publishing an RPM shortly, also.
Some of the magic involves multicast, which I've decided is a tool of the devil. It seems to be disabled by default in VMware, and is permanently disabled on EC2. So it might as well not exist.
With mod_cluster, the default is for the proxies to advertise over multicast so that workers can find them initially. This is awesome if your environment supports it. Mine doesn't.
But the AS5 portion of mod_cluster realizes that sometimes you can't use multicast. So you can provide a list of proxies in the mod-cluster service within AS5. I've decided to go with property substitution and modifying my JBoss boot script to check for $JBOSS_PROXY_LIST in /etc/jboss-as5.conf. This gets passed on in and consumed at AS5's boot time. Basically:
<!-- Configure this node's communication with the load balancer -->
<bean name="HAModClusterConfig" class="org.jboss.modcluster.config.ha.HAModClusterConfig" mode="On Demand">
  <!-- Comma separated list of address:port listing the httpd servers
       where mod_cluster is running. -->
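For the boot-script side, something like the following sketch would do. It is only an illustration: the function name `proxy_list_opt` and the system property `jboss.modcluster.proxyList` are assumptions, and the property name has to match whatever placeholder the bean's proxy-list property actually substitutes.

```shell
#!/bin/sh
# Sketch of a boot-script fragment. Names marked "assumed" are illustrative only.

# Print the extra JVM option carrying the proxy list, or nothing if none is set.
# $1: path to the config file (for example /etc/jboss-as5.conf)
proxy_list_opt() {
    conf_file="$1"
    JBOSS_PROXY_LIST=""
    # Source the config file if it is readable; it is expected to set JBOSS_PROXY_LIST.
    if [ -r "$conf_file" ]; then
        . "$conf_file"
    fi
    if [ -n "$JBOSS_PROXY_LIST" ]; then
        # Assumed property name; it must match the placeholder used in the bean.
        printf '%s\n' "-Djboss.modcluster.proxyList=$JBOSS_PROXY_LIST"
    fi
}

# Possible use inside the boot script, before the JVM is launched:
# JAVA_OPTS="$JAVA_OPTS $(proxy_list_opt /etc/jboss-as5.conf)"
```

If the variable is missing or empty, nothing is added to the JVM options, so a node on a multicast-friendly network can still fall back to advertisement.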
In an EC2 environment, my puppet recipe will grab the proxy list from the boot metadata and reset my /etc/jboss-as5.conf appropriately, perhaps.
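The puppet recipe itself isn't shown, but the rewrite step could be sketched in plain shell as a stand-in. The function name `set_proxy_list` is hypothetical, and the usage comment assumes the instance user-data contains nothing but the comma-separated proxy list, which is a convention you would have to set up yourself.

```shell
#!/bin/sh
# Sketch only: a plain-shell stand-in for the config rewrite, not the puppet recipe.

# Set JBOSS_PROXY_LIST in a config file, replacing any existing assignment.
# $1: comma-separated address:port list
# $2: config file path (for example /etc/jboss-as5.conf)
set_proxy_list() {
    list="$1"
    conf_file="$2"
    tmp_file="$conf_file.tmp.$$"
    # Keep every line except an existing JBOSS_PROXY_LIST assignment.
    if [ -f "$conf_file" ]; then
        grep -v '^JBOSS_PROXY_LIST=' "$conf_file" > "$tmp_file" || true
    else
        : > "$tmp_file"
    fi
    # Append the new assignment, quoted so the file stays safe to source.
    printf 'JBOSS_PROXY_LIST="%s"\n' "$list" >> "$tmp_file"
    mv "$tmp_file" "$conf_file"
}

# Possible use on EC2, assuming user-data holds only the proxy list:
# set_proxy_list "$(curl -s http://169.254.169.254/latest/user-data)" /etc/jboss-as5.conf
```

Writing to a temporary file and moving it into place means a boot script reading the config never sees a half-written file.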
You may argue that we've just replaced the maintenance of a worker list with the maintenance of a proxy list. Which is somewhat true. But the proxy list tends to be smaller, more static, and less crashy. Workers tend to grow, shrink and crash more often. And if you do have multicast available to you, mod_cluster will sprinkle magic end-to-end, and no list maintenance is required at all.
Overall, mod_cluster is definitely another useful tool for running Java apps in scalable environments.