New Ehcache Replication Mechanism

The 2010 benchmarking and optimization cycle is coming to an end, so it is time to write something about what we have been doing over the last few months.

Today we will talk about an EE-only feature: a new Ehcache cluster replication mechanism. Since this is only available in the EE version of the portal, I won't go into implementation details too much; just some general concepts about what the problem is and how we fixed it.

Ehcache supports clustering; by default it uses an RMI replication mechanism, which forms a point-to-point communication graph. As you can guess, this kind of structure cannot scale to a large cluster with many nodes, because each node has to send the same event N-1 times, once to each of the other nodes. When N gets big, this becomes a network traffic disaster.
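To make the fan-out concrete, here is a minimal sketch (my own illustration, not code from the portal) comparing the per-event message counts of point-to-point replication against a single multicast datagram:

```java
// Back-of-envelope comparison of replication fan-out strategies.
public class ReplicationFanout {

    // Point-to-point RMI: the originating node sends one copy to each peer.
    static int rmiMessagesPerEvent(int clusterSize) {
        return clusterSize - 1;
    }

    // UDP multicast: one datagram reaches every peer on the group address.
    static int multicastMessagesPerEvent(int clusterSize) {
        return 1;
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 10, 50}) {
            System.out.printf(
                "N=%d: RMI sends %d messages per event, multicast sends %d%n",
                n, rmiMessagesPerEvent(n), multicastMessagesPerEvent(n));
        }
    }
}
```

Since every node is also a sender, cluster-wide traffic for point-to-point replication grows as N*(N-1) while multicast stays linear in the number of events.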



To make things even worse, Ehcache creates a replication thread for each cache entity. In a large system like Liferay Portal, it is very easy to have more than 100 cache entities, which means 100+ cache replication threads. Threads are expensive because they consume resources (memory and CPU), yet these threads are most likely sleeping over 99% of the time, since they only work when a cache entity needs to talk to remote peers. Setting aside heap memory (which is application dependent), just consider the stack memory footprint of those 100+ threads. On most platforms the default thread stack size is 2MB, which means 200+MB. If you include heap memory, the number may even reach 500MB (and that is for a single node!). Even though memory chips are cheap today, we still should not waste them, and a massive number of threads also causes frequent context-switch overhead.
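The stack arithmetic above can be sketched in a few lines (the 2MB default and the 100-thread count are the assumptions stated in the text, not measured values):

```java
// Back-of-envelope: stack memory reserved by mostly-idle replication threads.
public class ThreadFootprint {

    // Total stack reservation in bytes for a given thread count and
    // per-thread stack size in megabytes.
    static long stackBytes(int threadCount, long stackSizeMb) {
        return threadCount * stackSizeMb * 1024L * 1024L;
    }

    public static void main(String[] args) {
        // 100 replication threads at the common 2MB default stack size.
        long bytes = stackBytes(100, 2);
        System.out.println(bytes / (1024 * 1024) + " MB of stack reserved");
    }
}
```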

We need to fix both the 1-to-(N-1) network communication and the massive-thread bottleneck.

Liferay Portal has a facility called ClusterLink, which is basically an abstract communication channel; the default implementation uses JGroups' UDP multicast to communicate.

By using ClusterLink we can fix the 1-to-(N-1) network communication easily.
To reduce the number of replication threads, we provide a small group of dispatching threads dedicated to delivering cache cluster events to remote peers. Since all cache entities' cluster events go through one place on their way to the network, this gives us a chance to coalesce: if two modifications to the same cache object are close enough in time, we only need to notify remote peers once, saving some network traffic.
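A minimal sketch of the coalescing idea (my own simplification, not Liferay's actual implementation): modifications register a dirty key in a shared set, and a dispatch thread periodically drains the set, so repeated modifications to the same key that arrive between drains collapse into a single remote notification.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch: a single collection point for cache change events, enabling
// coalescing of duplicate keys before notifying remote peers.
public class CoalescingDispatcher {

    private final Set<String> dirtyKeys = new LinkedHashSet<>();

    // Called on any cache modification; duplicate keys collapse here.
    public synchronized void enqueue(String cacheKey) {
        dirtyKeys.add(cacheKey);
    }

    // Called by a dispatch thread: snapshot the pending keys so each
    // modified entry produces exactly one notification per drain cycle.
    public synchronized Set<String> drain() {
        Set<String> batch = new LinkedHashSet<>(dirtyKeys);
        dirtyKeys.clear();
        return batch;
    }

    public static void main(String[] args) {
        CoalescingDispatcher dispatcher = new CoalescingDispatcher();
        dispatcher.enqueue("com.example.User#1001");
        dispatcher.enqueue("com.example.User#1001"); // coalesced away
        dispatcher.enqueue("com.example.Group#2002");
        System.out.println(dispatcher.drain().size() + " notifications");
    }
}
```

Three modifications become two notifications because the two updates to the same key merge. Note that coalescing here only happens when events back up between drains, which matches the behavior described later in the comments.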



(Newer versions of Ehcache support a JGroups replicator, which can also fix the 1-to-(N-1) network communication, but it cannot fix the massive-thread problem and cannot coalesce.)

EE customers interested in this feature can contact our support engineers for more detailed information.

Blogs
Good article. Some years ago I saw a similar problem in an application that was using clustered Ehcache. Ehcache clustering can basically limit the size of the cluster. If you do JGroups right, you can get on top of that.

As I look at the source code and portal.properties of LR 6, I can see signs of ClusterLink. So is it going to be part of the LR 6 final?
Yes, it is part of 6, but EE only. For EE customers, our support team will provide detailed information on turning this feature on in a big cluster environment.
Hi Zhou, it would be interesting to find a detailed description of the EE features. Actually, it's not very clear what all the technical differences between CE and EE are.

Thanks,
D.
Hey Denis, for a general comparison you can refer to http://www.liferay.com/products/liferay-portal/download/ce-vs-ee. If you want to know more, please contact sales@liferay.com; our sales team can provide detailed documentation.
This is a great improvement.
I have not seen the set of properties exposed for ClusterLink, but reading this article: if we coalesce, does that mean we will not be sending real-time cache replication instructions to peers? Will there be some delay to collect "enough" updates to send together?
For ClusterLink settings, search for cluster.link.* in portal.properties. For the new Ehcache cluster configuration settings, contact our support team for instructions.

We only coalesce when the network cannot deliver the cache change events fast enough, that is, when the event generation rate is higher than the event replication rate. The events then queue up, which gives us a chance to coalesce. In that case you cannot update peers in real time anyway, and coalescing actually reduces traffic, helping the system recover from overload. We do not intentionally buffer events just for coalescing; as you said, that would introduce an artificial delay, which is not desired behavior. In real scenarios coalescing does not improve performance much (since usually the network is not that busy), but if your system generates cache change events unevenly, with random peaks, coalescing smooths those peaks. That is helpful for keeping the system stable.
What about Terracotta? What drawbacks are there to using that?
Terracotta is rather heavyweight compared to simply using Ehcache. If you already use Terracotta in your product, it is fine to let it handle cache replication automatically, but if not, it is not worth adding it just for cluster caching. Ehcache will do just fine.
Interesting. Even in version 5.1.2 it was possible to configure Ehcache to use multicast, and it seemed to work, including in CE. So what is the difference between this clustering and linking compared to multicast-configured Ehcache? Is it just the reduction in the number of threads, or does it bring other benefits as well?
By default, CE's Ehcache clustering uses RMI replication. Yes, of course you can configure it to use multicast or JGroups. The differences between that and this new mechanism are the number of replication threads, coalescing, and stability. If you only have a few caches and they do not change frequently, that is not a problem. But in a large system under heavy load, those threads will eat tons of memory, and a lot of "duplicated" cache evict events may be fired under peak loads. The multicast and JGroups configurations are complex and error prone; if you do not really know them, you can easily get the configuration wrong. To use this new mechanism, all you need to do is apply a patch and turn on a few flags. It is very easy; you won't have a chance to make a mistake. :)
Hi, ClusterLink seems to be available in CE with the right properties in portal-ext.properties.

Any hint about the actual differences between EE and CE?
We tried to apply the settings needed for ClusterLink in EE, but it doesn't seem to work very well. We are facing performance issues and need to clear the cache frequently. Is anyone currently using this successfully?
Please contact our support team; there was a known issue with cluster-wide cache eviction, and it has already been fixed. If you are seeing the same problem, support can provide you with a patch.
Hi Shuyang,

We are using Liferay 6.0 SP1 EE on JBoss. Can you provide the steps to use Ehcache with ClusterLink? I updated portal-ext.properties with the settings below, but Liferay clustering is still not working.


net.sf.ehcache.configurationResourceName=/myehcache/hibernate-clustered.xml
ehcache.multi.vm.config.location=/myehcache/liferay-multi-vm-clustered.xml

cluster.link.enabled=true

multicast.group.address["cluster-link-control"]=239.255.100.11
multicast.group.port["cluster-link-control"]=23301

multicast.group.address["cluster-link-udp"]=239.255.100.12
multicast.group.port["cluster-link-udp"]=23302

multicast.group.address["cluster-link-mping"]=239.255.100.13
multicast.group.port["cluster-link-mping"]=23303

cluster.link.channel.properties.control=UDP(bind_addr=172.16.18.70;mcast_addr=${multicast.group.address["cluster-link-control"]};mcast_port=${multicast.group.port["cluster-link-control"]};ip_ttl=8;mcast_send_buf_size=150000;mcast_recv_buf_size=80000):PING(timeout=2000;num_initial_members=3):MERGE2(min_interval=5000;max_interval=10000):FD_SOCK:VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(gc_lag=50;retransmit_timeout=300,600,1200,2400,4800;max_xmit_size=8192):UNICAST(timeout=300,600,1200,2400):pbcast.STABLE(desired_avg_gossip=20000):FRAG(frag_size=8096;down_thread=false;up_thread=false):pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;shun=false;print_local_addr=true)
cluster.link.channel.properties.transport.0=UDP(bind_addr=172.16.18.70;mcast_addr=${multicast.group.address["cluster-link-udp"]};mcast_port=${multicast.group.port["cluster-link-udp"]};ip_ttl=8;mcast_send_buf_size=150000;mcast_recv_buf_size=80000):PING(timeout=2000;num_initial_members=3):MERGE2(min_interval=5000;max_interval=10000):FD_SOCK:VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(gc_lag=50;retransmit_timeout=300,600,1200,2400,4800;max_xmit_size=8192):UNICAST(timeout=300,600,1200,2400):pbcast.STABLE(desired_avg_gossip=20000):FRAG(frag_size=8096;down_thread=false;up_thread=false):pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;shun=false;print_local_addr=true)

cluster.link.channel.system.properties=\
jgroups.bind_addr:172.16.18.70,\
jgroups.mping.mcast_addr:${multicast.group.address["cluster-link-mping"]},\
jgroups.mping.mcast_port:${multicast.group.port["cluster-link-mping"]},\
jgroups.mping.ip_ttl:8

cluster.link.autodetect.address=www.google.com:80
Hey Vijay,

The ClusterLink-based Ehcache replication feature is only available to EE customers because it uses an EE-only plugin. You cannot make it work without the plugin.

If you have an EE license, you can contact our support people for the detailed configuration steps.
Thanks a lot, Shuyang. One more question (it could be a dumb question; I'm new to Liferay).

In Liferay there is the property "cluster.link.enabled=true"; to my understanding this is for Lucene replication.

The ClusterLink mentioned in the article above is for Ehcache replication (like RMI replication or JGroups replication).

So, are ClusterLink for Ehcache and Lucene replication two separate things? Since the names are the same, I'm a little confused by the term "ClusterLink".
ClusterLink is a way for cluster nodes to communicate with each other; it is basically a communication channel, and you can use it for any purpose. For example, the ClusterLink-based Ehcache replication uses ClusterLink to do cache replication.

Other examples include Lucene replication writing and cluster-wide RPC invocation.

Please be aware that only the Ehcache replication plugin is EE-only; everything else is available to CE users.
Hi, Shuyang,

You said that "Liferay Portal has a facility called ClusterLink which is basicly an abstract communication channel, and the default impl is using JGroups' UDP multicast to communicate."

I am using the Liferay Portal 6.0 SP1 EE version. Does ClusterLink also support JGroups TCP unicast? If yes, how can I enable it? Thanks!
Before you do this, be aware that it will cause a performance regression on a large cluster with many nodes, because each message has to be sent over multiple TCP connections.

1) Add these to your portal-ext.properties:
cluster.link.enabled=true
ehcache.cluster.link.replication.enabled=true
cluster.link.channel.properties.control=tcp.xml
cluster.link.channel.properties.transport.0=tcp.xml

2) Add the peer IP info to your JVM system properties; for example, you can add it to Tomcat's setenv.sh/bat:
JAVA_OPTS="$JAVA_OPTS -Djgroups.tcpping.initial_hosts=192.168.1.200[7800],192.168.1.200[7801]"
Replace the IP and port info to fit your environment.

3) Deploy ehcache-cluster-web.war.

The tcp.xml is in the root directory of jgroups.jar (the default setting). If you need to customize it, copy it out, modify it, and then point your portal-ext.properties to the right file.