Question

I'm relatively new to Java EE and have already begun to hear about the many different types of systems that can be clustered:

  • Virtual Machines (i.e. "that appliance is a cluster of VMs...")
  • Application servers, such as Tomcat, JBoss or GlassFish (i.e. "We're running clustered JBoss...")
  • Clustering APIs like Terracotta
  • Databases, like Oracle ("clustered database")
  • Cloud applications ("A cloud is basically a cluster...")

Wikipedia defines "clustering" as:

A computer cluster consists of a set of loosely connected computers that work together so that in many respects they can be viewed as a single system.

I'm wondering how clustering works for each of these "cluster types/methods" (mentioned above) and how they relate to one another.

For instance, if one could benefit from having a clustered application, he/she would probably put it on a clustered app server and then throw a cluster manager into the mix (again, like Terracotta).

But because the term "clustering" seems to be used in vague/ambiguous ways, I'm not seeing how each of these ties into the other ones, or if they even do. Thanks in advance to any brave StackOverflowers out there who can help me make sense of this interwoven terminology!

Solution

To me, clustering implies a number of qualities in a system, but it boils down to fault tolerance -- server, networking, and data persistence. There are both loosely and tightly coupled systems and all flavors in between. Tightly coupled systems have the clustering performed at a level close to the hardware. Many of the old clustering systems were more tightly coupled, with the applications often not recognizing that they were clustered.

Loosely coupled systems are the norm these days, with a large degree of the fault tolerance accomplished entirely at the software level. Systems in the cluster only need to share network connectivity to accomplish fault tolerance. Usually there are specialized load balancers, implemented in dedicated hardware or sometimes just in software, which route requests to the various cluster servers.
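To make the load balancer idea concrete, here is a minimal software-only sketch in Java. It is an illustration under assumptions, not how any particular product works: the class name RoundRobinBalancer, the node names, and the markDown/markUp calls (standing in for a real health check) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin balancer that skips nodes marked as down. */
class RoundRobinBalancer {
    private final List<String> nodes;
    // Nodes currently considered unhealthy; a real balancer would fill this from health checks.
    private final Set<String> down = ConcurrentHashMap.newKeySet();
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    void markDown(String node) { down.add(node); }

    void markUp(String node) { down.remove(node); }

    /** Returns the next healthy node, or throws if every node is down. */
    String route() {
        // Try each node at most once per call, starting from the rotating counter.
        for (int i = 0; i < nodes.size(); i++) {
            int idx = Math.floorMod(next.getAndIncrement(), nodes.size());
            String candidate = nodes.get(idx);
            if (!down.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no healthy nodes in the cluster");
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(List.of("app-1", "app-2", "app-3"));
        lb.markDown("app-2"); // simulate a failed server
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.route()); // app-2 is never chosen
        }
    }
}
```

The point of the sketch is that the servers themselves share nothing but the network: the fault tolerance comes from the router noticing a dead node and sending traffic elsewhere.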

All of the examples you mentioned have some sort of "clustering". It would take a very long answer to describe the details of how each of the architectures accomplishes this. For me, the differences are what comes "for free" when you use the architecture, and how much work you will have to do to get it to work optimally.

How you mix and match the solutions you've mentioned depends on what your architecture looks like and what your requirements are. You can have a Terracotta store for local high-speed persistence and the cloud for the rest. You can use GlassFish as your application server and utilize Terracotta as your persistence layer.

Here are my thoughts about the technologies you listed:

  • Cloud applications ("A cloud is basically a cluster...")

Cloud applications are obviously the easiest to work with. Your only job from an architecture standpoint is to pick a good cluster provider. Certainly Amazon and Google will do it "right" in terms of fault tolerance and data integrity. There are many other players that probably do it "good enough" and are cheaper. You program to their APIs, which come with their own set of limitations and expenses. One problem with cloud applications is that it will most likely be very hard to switch to a new provider. Again, you might have some [large] portion of your application running on cloud servers and have some local systems for your stricter latency requirements. The trend is to put most production functions in the cloud, or at least start that way until you get too big or need some services they can't provide.

  • Clustering APIs like Terracotta
  • Databases, like Oracle ("clustered database")
  • JBoss

These 3 systems provide their own clustering capabilities. They may require you to do a lot of machine and service layer configuration to get the system running well in a production environment. I hear good things about Terracotta, which is a distributed persistence layer. I've used JGroups a lot, which sits under JBoss, and it can be tricky to get running right, but JBoss may also have some good default configurations/documentation. Oracle is most likely going to be the hardest to cluster right. DBAs make a lot of money tweaking Oracle configurations.

  • Virtual Machines (i.e. "that appliance is a cluster of VMs...")
  • Application servers, such as Tomcat, GlassFish

These are the most amorphous to define in terms of clustering. Some VMs are considered "clustered" in that they share networking hardware and power backplanes, but they are certainly not clusters in the sense that cloud computing is. As mentioned, there are some clustered hardware solutions that are very custom and will require a lot of specific domain knowledge to get running well.

I have very little experience with application servers such as Tomcat and GlassFish. We have our own clustering software on top of JGroups and run Jetty entirely. Application servers are not, in themselves, "clustered", but packages such as JBoss and Terracotta run on top of them to provide clustering, and they have internal projects which have clustering software written for them.

Hope some of this helps.

OTHER TIPS

Here's a quick whack at it. How you cluster depends on what your goals are. Here are some thoughts that also tie in to GlassFish.

  • A cluster enables multiple instances to be managed as one since they share a common configuration. If you make a change to a configuration, such as defining a new resource, then all instances that belong to a cluster inherit that change. Deploying an application to a cluster deploys it to all instances of that cluster.
  • A cluster provides service availability. If one instance fails, deployed applications are still available on other instances.
  • A cluster can offer session availability. If an instance dies while a user has items in their shopping cart, then another instance can take ownership of handling that user's session such that the shopping cart contents are still there. The user never knows a backend server has failed.
  • With GlassFish, HTTP session state can be managed by GlassFish (built-in), delegated to a Coherence grid, or the application can manage state itself (using Terracotta, a database, etc). The benefit of using the built-in capability is that it works out of the box and has gone through stress testing, QA, etc. The benefit of externalizing is that you can potentially get better scalability since you decouple session management and application logic. Externalizing lets the JVM focus on executing business logic, and uses less heap space since backup sessions exist elsewhere. Oracle has tested / QA'd externalizing to the Coherence grid, and it is a formal feature of the commercial Oracle GlassFish Server. If you roll your own via a database, then you need to manage & QA it yourself.
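
The session availability point above can be sketched in plain Java. This is only an illustration under assumptions: SessionBackupStore is a hypothetical in-memory stand-in for whatever external store is used (a data grid, Terracotta, a database), ServerInstance is a hypothetical server, and neither reflects the actual GlassFish or Coherence APIs. It also assumes sticky sessions, so only one instance owns a given session at a time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for an external session store (grid, database, ...). */
class SessionBackupStore {
    private final Map<String, Map<String, String>> sessions = new ConcurrentHashMap<>();

    void save(String sessionId, Map<String, String> attrs) {
        sessions.put(sessionId, new HashMap<>(attrs)); // copy so the backup is independent
    }

    Map<String, String> load(String sessionId) {
        Map<String, String> attrs = sessions.get(sessionId);
        return attrs == null ? new HashMap<>() : new HashMap<>(attrs);
    }
}

/** Hypothetical server instance that backs up every session write. */
class ServerInstance {
    private final SessionBackupStore backup;
    // Sessions this instance currently holds in its own memory.
    private final Map<String, Map<String, String>> local = new HashMap<>();

    ServerInstance(SessionBackupStore backup) {
        this.backup = backup;
    }

    void put(String sessionId, String key, String value) {
        // Take ownership of the session (restoring it from the backup if needed), then write through.
        Map<String, String> attrs = local.computeIfAbsent(sessionId, backup::load);
        attrs.put(key, value);
        backup.save(sessionId, attrs);
    }

    String get(String sessionId, String key) {
        return local.computeIfAbsent(sessionId, backup::load).get(key);
    }

    public static void main(String[] args) {
        SessionBackupStore store = new SessionBackupStore();
        ServerInstance a = new ServerInstance(store);
        ServerInstance b = new ServerInstance(store);

        a.put("session-1", "cart", "book");
        // Instance a "dies" here; the load balancer sends the user's next request to b.
        System.out.println(b.get("session-1", "cart")); // the cart contents survive
    }
}
```

Whether the backup lives inside the application server or in an external grid is exactly the built-in versus externalized trade-off described in the last bullet.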
Licensed under: CC-BY-SA with attribution