LF 6.2 CE GA6 ehcache clustering not working on Amazon EC2

Hello,
I've been trying to set up Ehcache unicast replication in an Amazon EC2 environment for LF 6.2 CE GA6. I followed all the documentation available on other sites and in the LF forums for JGroups unicast configuration, such as this: https://web.liferay.com/community/forums/-/message_boards/message/30737580
I am using S3_PING for JGroups discovery, and it appears to be working fine. I extracted tcp.xml from jgroups.jar, added singleton_name="liferay" and the S3_PING configuration, and saved it as tomcat/conf/unicast.xml.
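
For reference, the resulting stack looks roughly like this (bucket name and credentials are redacted placeholders, and everything below the discovery protocol is elided, so treat this as a sketch rather than my exact file):

<config>
    <TCP singleton_name="liferay"
         bind_port="7800" />
    <S3_PING location="my-s3-bucket"
             access_key="REDACTED"
             secret_access_key="REDACTED"
             timeout="2000"
             num_initial_members="2" />
    <!-- remaining protocols (MERGE2, FD_SOCK, GMS, etc.) kept as in the stock tcp.xml -->
</config>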

These are the ehcache clustering properties:
cluster.link.enabled=true
cluster.link.autodetect.address=dap-qa.rds.amazonaws.com:5432
cluster.link.channel.properties.control=${catalina.base}/conf/unicast.xml
cluster.link.channel.properties.transport.0=${catalina.base}/conf/unicast.xml
org.quartz.jobStore.isClustered=true
lucene.replicate.write=true
ehcache.bootstrap.cache.loader.factory=com.liferay.portal.cache.ehcache.JGroupsBootstrapCacheLoaderFactory
ehcache.cache.event.listener.factory=net.sf.ehcache.distribution.jgroups.JGroupsCacheReplicatorFactory
ehcache.cache.manager.peer.provider.factory=net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory
ehcache.multi.vm.config.location.peerProviderProperties=file=${catalina.base}/conf/unicast.xml
net.sf.ehcache.configurationResourceName.peerProviderProperties=file=${catalina.base}/conf/unicast.xml
#Debugging
ehcache.statistics.enabled=true
cluster.executor.debug.enabled=true
web.server.display.node=true


All the documentation I found uses ehcache.cache.manager.peer.provider.factory=net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory, but that fails in my version of LF: https://issues.liferay.com/browse/LPS-61289
According to that issue, the proposed change to ehcache.cache.manager.peer.provider.factory=com.liferay.portal.cache.ehcache.JGroupsCacheManagerPeerProviderFactory does not work either, so to be safe I simply downgraded ehcache.jar from 2.8.3 to 2.8.2 so that the unicast.xml configuration file can be read.
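
(The downgrade itself was just a jar swap along these lines; the jar name and path are from my bundle, so adjust for yours:)

# back up the bundled 2.8.3 jar and drop in the 2.8.2 one
cd /srv/app/portal/tomcat-7.0.62/webapps/ROOT/WEB-INF/lib
mv ehcache.jar ehcache-2.8.3.jar.bak
cp /tmp/ehcache-2.8.2.jar ehcache.jar
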
I also read that, as of 6.2, LF applies the hibernate-clustered.xml and liferay-multi-vm-clustered.xml configurations by default, so there is no need to specify them in the properties.
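
(If they did need to be set explicitly, my understanding is that it would be via these two properties, with the paths as they appear in the stock portal.properties; I haven't verified this on GA6, so treat it as an assumption:)

net.sf.ehcache.configurationResourceName=/ehcache/hibernate-clustered.xml
ehcache.multi.vm.config.location=/ehcache/liferay-multi-vm-clustered.xml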

On startup I see the JGroups channels being created:

21:12:41,997 INFO  [liferay-qa-1-startStop-1][LiferayCacheManagerPeerProviderFactory:72] portalPropertyKey ehcache.multi.vm.config.location.peerProviderProperties has value [file=/srv/app/portal/tomcat-7.0.62/conf/unicast.xml]
-------------------------------------------------------------------
GMS: address=liferay-qa-1-19945, cluster=liferay-multi-vm-clustered, physical address=10.10.2.10:41963
-------------------------------------------------------------------
21:12:57,448 INFO  [liferay-qa-1-startStop-1][ClusterBase:167] Autodetecting JGroups outgoing IP address and interface for dap-qa.rds.amazonaws.com:5432
21:12:57,451 INFO  [liferay-qa-1-startStop-1][ClusterBase:183] Setting JGroups outgoing IP address to 10.10.2.10 and interface to eth0
-------------------------------------------------------------------
GMS: address=liferay-qa-1-19576, cluster=liferay-channel-control, physical address=10.10.2.10:7800
-------------------------------------------------------------------
21:12:57,817 INFO  [liferay-qa-1-startStop-1][BaseReceiver:83] Accepted view [liferay-qa-1-19576|0] [liferay-qa-1-19576]
-------------------------------------------------------------------
GMS: address=liferay-qa-1-50505, cluster=liferay-channel-transport-0, physical address=10.10.2.10:7800
-------------------------------------------------------------------
21:12:57,934 INFO  [liferay-qa-1-startStop-1][BaseReceiver:83] Accepted view [liferay-qa-1-50505|0] [liferay-qa-1-50505]


When the second node starts, I can see it joining the cluster:

21:18:04,242 INFO  [Incoming-2,shared=liferay][BaseReceiver:83] Accepted view [liferay-qa-1-19576|1] [liferay-qa-1-19576, liferay-qa-2-38434]
21:18:04,488 INFO  [Incoming-2,shared=liferay][BaseReceiver:83] Accepted view [liferay-qa-1-50505|1] [liferay-qa-1-50505, liferay-qa-2-3018]
21:18:07,500 INFO  [Incoming-2,shared=liferay][DebuggingClusterEventListenerImpl:57] Cluster event JOIN
Cluster node {clusterNodeId=c93cb8fe-8519-48a4-a1e2-3133f0e2e8c1, portalProtocol=http, inetAddress=/10.10.4.10, port=8080}


The problem is that if I go to node 1 and move a portlet in the layout, node 2 does not reflect this change, or any other change I make to a portlet. I've tried all sorts of configurations I've found in forums and tech blogs.
Now, the first thing that seems weird to me is that there is no JGroups startup message for "hibernate-clustered". Should I see all four initialization messages (hibernate-clustered, liferay-multi-vm-clustered, liferay-channel-control and liferay-channel-transport-0)?

I have turned on debug-level logging for JGroups and Ehcache, and I do get cache events. If I move a portlet on node 1 I get:

21:56:23,662 DEBUG [http-bio-8080-exec-6][JGroupsCacheReplicator:48] Remove all elements called on com.liferay.portal.servlet.filters.cache.CacheUtil
21:56:23,667 DEBUG [http-bio-8080-exec-6][JGroupsCacheReplicator:48] Remove all elements called on com.liferay.portal.kernel.dao.orm.FinderCache.com.liferay.portal.model.impl.LayoutImpl.List1
21:56:24,354 DEBUG [liferay-multi-vm-clustered Async Replication Thread][JGroupsCachePeer:48] Sending 18 JGroupEventMessages from the asynchronous queue.


But I don't see any response or message on node 2. I suspect node 2 should display debugging messages about receiving the JGroups event from the queue. (For completeness, the debug logging was enabled roughly as shown below.)
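
This is approximately the portal-log4j-ext.xml override I used (placed in tomcat/webapps/ROOT/WEB-INF/classes/META-INF/; category names reconstructed from memory, so double-check them):

<?xml version="1.0"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
    <!-- Ehcache replication internals -->
    <category name="net.sf.ehcache">
        <priority value="DEBUG" />
    </category>
    <!-- JGroups channel traffic -->
    <category name="org.jgroups">
        <priority value="DEBUG" />
    </category>
</log4j:configuration>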

I've also seen this, which looks suspicious:

21:56:04,607 DEBUG [RuntimePageImpl-10][Cache:39] Initialised cache: com.liferay.portal.kernel.dao.orm.FinderCache.com.liferay.portlet.asset.model.impl.AssetLinkImpl.List2
21:56:04,608 DEBUG [RuntimePageImpl-10][RMICacheManagerPeerListener:48] Adding to RMI listener
21:56:04,610 DEBUG [RuntimePageImpl-10][RMICacheManagerPeerListener:39] 165 RMICachePeers bound in registry for RMI listener
21:56:04,610 DEBUG [RuntimePageImpl-10][ConfigurationHelper:39] CacheDecoratorFactory not configured for defaultCache. Skipping for 'com.liferay.portal.kernel.dao.orm.FinderCache.com.liferay.portlet.asset.model.impl.AssetLinkImpl.List2'.

Is that CacheDecoratorFactory message a problem?

Both nodes are in AWS security groups with all traffic enabled on all ports, and I can telnet from one node to the other on port 7800, the port JGroups is using.
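
(That check was simply the following, run from node 1, with the node addresses taken from the logs above:)

# from node 1 (10.10.2.10) to node 2 (10.10.4.10), on the JGroups port
telnet 10.10.4.10 7800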

So any ideas on what could be wrong? I have looked all over the Internet for a solution with no luck.

Thanks.