Configuring a Liferay cluster (and making it use unicast)

Introduction

Configuring a Liferay cluster is part experience and part black magic. Some information you can find online, some you only find out while working on it, and then there are things, like how to configure Ehcache to use unicast, that you only discover through blood, sweat and tears. This post describes how to set up a Liferay 6.1 cluster, first with Ehcache in multicast mode and then in unicast mode.
To get clustering to work in Liferay you need to make sure that all of the subsystems below are configured correctly:
  • Database
  • Indexing
  • Media gallery
  • Quartz
  • Cluster Link
  • Ehcache
 

Database

The first subsystem that needs to be configured for clustering, the database, is also one of the easiest to configure correctly. You just need to point each node in the cluster to the same database, either by using the same JNDI datasource

jdbc.default.jndi.name=jdbc/liferay

or by using the same JDBC configuration directly in your portal-ext.properties on each node

jdbc.default.driverClassName=com.mysql.jdbc.Driver
jdbc.default.url=jdbc:mysql://dbserver:3306/liferay_test?useUnicode=true&characterEncoding=UTF-8&useFastDateParsing=false&autoReconnectForPools=true
jdbc.default.username=dbuser
jdbc.default.password=Y@r3FiL

 

Indexing

For Liferay 6.1 the only reliable way to cluster the indexing is to use SOLR. For this you'll need to do 2 things: set up a separate SOLR server (or use an existing one) and deploy a correctly configured solr-web.war from the Marketplace to all cluster nodes.
 
Depending on which Liferay flavor you're using, CE or EE, this will be an easy process or a little bit more difficult. If you're running Liferay EE, the process is pretty straightforward as for that version there are solr-web versions available for SOLR 3 and 4. For Liferay CE it's a bit more complicated as there's only a relatively old WAR available for SOLR 1.4, which you'll need to upgrade yourself if you want to use newer SOLR versions with Liferay CE.
 
For this blog we're assuming that a dedicated Liferay SOLR instance will be used (but an additional core in an existing SOLR will also work). To set up SOLR, you can just follow the instructions on the SOLR site: https://wiki.apache.org/solr/SolrInstall. Once you have a default SOLR up and running, you'll need to add some configuration to it so Liferay can use it for indexing. This is done by replacing the existing schema.xml with the Liferay SOLR schema.xml that you can find in the WEB-INF/conf directory of the solr-web.war you downloaded.
 
If you're running on Liferay 6.1 CE and want to use a newer SOLR version than 1.4, you'll also need to change the schema.xml and possibly also the solr-spring.xml a bit to get it working. The version of the schema.xml that worked for us is:
<?xml version="1.0"?>
<schema name="liferay" version="1.1">
    <types>
        <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true" />
        <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true" />
        <fieldType name="integer" class="solr.IntField" omitNorms="true" />
        <fieldType name="long" class="solr.LongField" omitNorms="true" />
        <fieldType name="float" class="solr.FloatField" omitNorms="true" />
        <fieldType name="double" class="solr.DoubleField" omitNorms="true" />
        <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true" />
        <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true" />
        <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true" />
        <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true" />
        <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true" />
        <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
            <analyzer>
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
            </analyzer>
        </fieldType>
        <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
                <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
                <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
        </fieldType>
        <fieldType name="textTight" class="solr.TextField" positionIncrementGap="100" >
            <analyzer>
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false" />
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
                <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
        </fieldType>
        <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
            <analyzer>
                <tokenizer class="solr.KeywordTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.TrimFilterFactory" />
                <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
            </analyzer>
        </fieldType>
        <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />
    </types>
    <fields>
    <!--
        Had to add additional fields, 'dash' fields and 'copyfields' to make
        tables and sorting work correctly in certain Control Panel pages:
            
            first-name, last-name, screen-name, job-title and type
        
        otherwise you'll see the following errors in the SOLR log:

            Feb 15, 2013 11:20:23 AM org.apache.solr.common.SolrException log
            SEVERE: org.apache.solr.common.SolrException: can not sort on multivalued field: job-title
            at org.apache.solr.schema.SchemaField.checkSortability(SchemaField.java:160)
 
        http://www.liferay.com/community/forums/-/message_boards/message/21525098
        http://liferay-blogging.blogspot.be/2012/03/liferay-and-solr-solrexception-can-not.html
    -->
        <field name="comments" type="text" indexed="true" stored="true" />
        <field name="content" type="text" indexed="true" stored="true" />
        <field name="description" type="text" indexed="true" stored="true" />
        <field name="entryClassPK" type="text" indexed="true" stored="true"/>
        <field name="firstName" type="text" indexed="true" stored="true" />
        <field name="first-name" type="text" indexed="true" stored="true" />
        <field name="firstName_sortable" type="string" indexed="true" stored="true" />
        <field name="job-title" type="text" indexed="true" stored="true" />
        <field name="jobTitle_sortable" type="string" indexed="true" stored="true" />
        <field name="lastName" type="text" indexed="true" stored="true" />
        <field name="last-name" type="text" indexed="true" stored="true" />
        <field name="lastName_sortable" type="string" indexed="true" stored="true" />
        <field name="leftOrganizationId" type="slong" indexed="true" stored="true" />
        <field name="name" type="text" indexed="true" stored="true" />
        <field name="name_sortable" type="string" indexed="true" stored="true" />
        <field name="properties" type="string" indexed="true" stored="true" />
        <field name="rightOrganizationId" type="slong" indexed="true" stored="true" />
        <field name="screen-name" type="text" indexed="true" stored="true" />
        <field name="screenName_sortable" type="string" indexed="true" stored="true" />
        <field name="title" type="text" indexed="true" stored="true" />
        <field name="type" type="text" indexed="true" stored="true" />
        <field name="type_sortable" type="string" indexed="true" stored="true" />
        <field name="uid" type="string" indexed="true" stored="true" />
        <field name="url" type="string" indexed="true" stored="true" />
        <field name="userName" type="string" indexed="true" stored="true" />
        <field name="version" type="string" indexed="true" stored="true" />
        <!-- 
            http://liferay-blogging.blogspot.be/2012/03/liferay-and-solr-solrexception-can-not.html
        -->
        <field name="modified" type="text" indexed="true" stored="true" />
        <!-- 
            Added 'omitNorms' attribute on '*' to fix the following error: 
            Liferay side: 

                 12:07:05,844 ERROR [SolrIndexWriterImpl:55] org.apache.solr.common.SolrException: Bad Request

            SOLR side: 

                 Jul 30, 2012 12:07:05 PM org.apache.solr.common.SolrException log
                 SEVERE: org.apache.solr.common.SolrException: 
                 ERROR: [doc=PluginPackageIndexer_PORTLET_liferay/solr-web/6.1.0/war] cannot set an index-time boost, norms are omitted for field entryClassName: com.liferay.p
         -->
         <dynamicField name="*CategoryNames" type="string" indexed="true" multiValued="true" stored="true" />
         <dynamicField name="*CategoryIds" type="string" indexed="true" multiValued="true" stored="true" />
         <dynamicField name="expando/*" type="text" indexed="true" multiValued="true" stored="true" />
         <dynamicField name="web_content/*" type="text" indexed="true" stored="true" />
         <!--
             This must be the last entry since the fields element is an ordered set.
         -->
        <dynamicField name="*" type="string" indexed="true" multiValued="true" stored="true" omitNorms="false"/>
    </fields>
    <copyField source="firstName" dest="firstName_sortable" />
    <copyField source="first-name" dest="firstName_sortable" />
    <copyField source="job-title" dest="jobTitle_sortable" />
    <copyField source="lastName" dest="lastName_sortable" />
    <copyField source="last-name" dest="lastName_sortable" />
    <copyField source="name" dest="name_sortable" />
    <copyField source="screen-name" dest="screenName_sortable" />
    <copyField source="type" dest="type_sortable" />
    <uniqueKey>uid</uniqueKey>
    <defaultSearchField>content</defaultSearchField>
    <solrQueryParser defaultOperator="OR" />
</schema>

and for solr-spring.xml

<?xml version="1.0"?>
<beans default-destroy-method="destroy"
       default-init-method="afterPropertiesSet"
       xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:jee="http://www.springframework.org/schema/jee"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
       http://www.springframework.org/schema/jee  http://www.springframework.org/schema/jee/spring-jee.xsd
       http://www.springframework.org/schema/util  http://www.springframework.org/schema/util/spring-util-3.0.xsd">

    <bean class="com.liferay.portal.spring.context.PortletBeanFactoryPostProcessor" /> 
    
    <!-- Solr search engine --> 
    <bean id="com.liferay.portal.search.solr.server.BasicAuthSolrServer" class="com.liferay.portal.search.solr.server.BasicAuthSolrServer"> 
        <constructor-arg type="java.lang.String" value=" /> 
    </bean> 

    <bean id="com.liferay.portal.search.solr.SolrIndexSearcherImpl" class="com.liferay.portal.search.solr.SolrIndexSearcherImpl"> 
        <property name="solrServer" ref="com.liferay.portal.search.solr.server.BasicAuthSolrServer" /> 
        <property name="swallowException" value="true" /> 
    </bean> 

    <bean id="com.liferay.portal.search.solr.SolrIndexWriterImpl" class="com.liferay.portal.search.solr.SolrIndexWriterImpl"> 
        <property name="commit" value="true" /> 
        <property name="solrServer" ref="com.liferay.portal.search.solr.server.BasicAuthSolrServer" /> 
    </bean> 

    <bean id="com.liferay.portal.search.solr.SolrSearchEngineImpl" class="com.liferay.portal.kernel.search.BaseSearchEngine"> 
        <property name="clusteredWrite" value="false" /> 
        <property name="indexSearcher" ref="com.liferay.portal.search.solr.SolrIndexSearcherImpl" /> 
        <property name="indexWriter" ref="com.liferay.portal.search.solr.SolrIndexWriterImpl" /> 
        <property name="luceneBased" value="true" /> 
        <property name="vendor" value="SOLR" /> 
    </bean> 

    <!-- Configurator --> 
    <bean id="searchEngineConfigurator.solr" class="com.liferay.portal.kernel.search.PluginSearchEngineConfigurator"> 
        <property name="searchEngines"> 
            <util:map> 
                <entry key="SYSTEM_ENGINE" value-ref="com.liferay.portal.search.solr.SolrSearchEngineImpl" /> 
            </util:map> 
        </property> 
    </bean> 
</beans>

Once you have SOLR up and running with the new schema, you just need to tweak the solr-web.war a little bit before deploying it on all nodes, as it assumes SOLR is running on localhost:8080, which probably isn't the case. You can change this in the solr-spring.xml file that you can find in the WEB-INF/classes/META-INF directory of the WAR file. Just change the constructor-arg value of the bean with id com.liferay.portal.search.solr.server.BasicAuthSolrServer so it points to the correct server and port.

 

Media Gallery

Now that the document library and image gallery have been combined to form the media gallery in newer Liferay versions, the configuration to cluster it has also become simpler. To cluster the media gallery you have 2 options: database or file system. There was also an option to use Jackrabbit for this purpose, but this has been deprecated in Liferay 6.1.
 
Using the database is the simplest option as you only need to add one property to your portal-ext.properties file
dl.store.impl=com.liferay.portlet.documentlibrary.store.DBStore
This will automatically use the database that is already configured for Liferay to store all media items. As long as your database supports BLOBs of sufficient size to cover your media needs, this is an easy solution. But if your database has issues with large files, videos for example, you're better off with the second option: a common file system.
 
For this, you only need to configure a different store, usually either com.liferay.portlet.documentlibrary.store.FileSystemStore or com.liferay.portlet.documentlibrary.store.AdvancedFileSystemStore (which internally distributes files over more directories to work around limits on the number of files per directory). To use a file system store correctly, you'll also need to set the property dl.store.file.system.root.dir in your portal-ext.properties to a directory on the local filesystem of each node that is backed by a common file store (SAN, NAS, ...). The problem is that the Liferay documentation doesn't exactly define which kinds of file systems are supported or which functionality (locking, ...) they need to provide, so it can be a bit hit and miss finding one that works correctly.
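Since it is hit and miss whether a given shared file system behaves, a quick visibility check can be run on each node before pointing Liferay at it. The sketch below is only an illustration: the class name, method name and the ".marker" suffix are made up for this example, and it only shows that files written by one node are visible to another. It says nothing about locking. Point it at a scratch subdirectory of the shared mount rather than at the live store.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class SharedDirCheck {

    // Writes a marker file named after this node into the shared directory
    // and returns how many marker files are visible there afterwards.
    public static long writeMarkerAndCount(Path sharedDir, String nodeName) throws IOException {
        Path marker = sharedDir.resolve("node-" + nodeName + ".marker");
        Files.write(marker, nodeName.getBytes(StandardCharsets.UTF_8));
        try (Stream<Path> files = Files.list(sharedDir)) {
            return files
                    .filter(p -> p.getFileName().toString().endsWith(".marker"))
                    .count();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: SharedDirCheck <shared scratch dir> <node name>");
            return;
        }
        // Hypothetical usage: run once per node with a different node name.
        long visible = writeMarkerAndCount(Paths.get(args[0]), args[1]);
        System.out.println("marker files visible from this node: " + visible);
    }
}
```

Run it on the first node and then on the second: if the second run does not report one more marker file than the first, the nodes are not looking at the same directory.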
 
Another option is to use the com.liferay.portlet.documentlibrary.store.CMISStore if you have an Alfresco instance to spare or the com.liferay.portlet.documentlibrary.store.S3Store if you have Amazon S3 buckets available. The only problem with these can be speed as they're usually slower than using a file system.
 

Quartz

The Quartz job scheduler that's available in Liferay also needs to be clustered to prevent problems. This can be done by adding the following line to your portal-ext.properties file:

org.quartz.jobStore.isClustered=true

 

Cluster Link

In a Liferay cluster all nodes need to be able to talk to each other to keep each other up to date. To enable this, you just need to activate the JGroups based Cluster Link system that's available in Liferay by adding the following two properties to your portal-ext.properties file:

cluster.link.enabled=true
cluster.link.autodetect.address=dbserver:dbport
The second property is needed because otherwise the Cluster Link initialization during startup may fail: by default it tries to contact google.com:80, and internet access isn't available in every environment. You therefore need a host/port combination that is reachable from each node and that the Cluster Link can use to set itself up. The easiest option is the database server and port, which we already know and can access: use them to replace dbserver and dbport in the example above.
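To check up front that the chosen host and port can be reached from a node, a small probe like the one below can help. It is a sketch under an assumption: that the autodetection essentially opens a TCP connection to the configured address and looks at which local interface was used. The class and method names are made up, and "dbserver" and 3306 are just the placeholders from the JDBC example earlier in this post.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

public class AutodetectCheck {

    // Opens a TCP connection to host:port and returns the local address the
    // operating system picked for it, or null if the connection fails.
    public static InetAddress localAddressFor(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return socket.getLocalAddress();
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Placeholder host and port; use the values you put in
        // cluster.link.autodetect.address.
        InetAddress local = localAddressFor("dbserver", 3306, 2000);
        System.out.println(local == null
                ? "not reachable from this node"
                : "reachable, local address used: " + local.getHostAddress());
    }
}
```

If the probe prints a local address that is not the one you expect the node to use for cluster traffic, that is a hint to look at the routing or at the bind address settings described next.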
 
To make the JGroups Cluster Link work you'll also need to set the following system properties for your JVM (for example via the JAVA_OPTS of Tomcat):
  • -Djava.net.preferIPv4Stack=true
  • -Djgroups.bind_addr=<local IP> (replace <local IP> on each node with the actual IP address of the node) 
  • -Djgroups.tcpping.initial_hosts=<node 1>[7800],<node 2>[7800] (replace <node 1>, <node 2>, etc... on each node with the actual IP addresses of the corresponding nodes and add more values, separated with a comma if your cluster has more than 2 nodes)
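The initial_hosts value is easy to get wrong by hand once a cluster grows beyond two nodes. As an illustration, the following sketch builds the three options from a list of node addresses. The class name and the IP addresses are hypothetical; the port 7800 and the `<ip>[<port>]` format are the ones used in the list above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class InitialHosts {

    // Formats node addresses as "<ip>[<port>],<ip>[<port>]", the form
    // expected by -Djgroups.tcpping.initial_hosts.
    public static String initialHosts(List<String> nodeIps, int port) {
        return nodeIps.stream()
                .map(ip -> ip + "[" + port + "]")
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Hypothetical node addresses; replace with the real IPs of your nodes.
        List<String> nodes = Arrays.asList("192.168.5.2", "192.168.5.3");
        String localIp = nodes.get(0); // the node these options are generated for

        System.out.println("-Djava.net.preferIPv4Stack=true");
        System.out.println("-Djgroups.bind_addr=" + localIp);
        System.out.println("-Djgroups.tcpping.initial_hosts=" + initialHosts(nodes, 7800));
    }
}
```

Only the bind address differs per node; the initial_hosts value is the same everywhere.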

Ehcache: multicast

In a Liferay cluster the different Ehcache based caches on a node also need to be aware of other nodes so that correct and up to date information is shown on nodes after something is changed on one node. When your server environment supports multicast (some virtualization software has issues with this) and your system administrators allow you to use it, it is pretty easy to configure Ehcache to work in a cluster. Just add the following lines to your portal-ext.properties on each node:

net.sf.ehcache.configurationResourceName=/ehcache/hibernate-clustered.xml
ehcache.multi.vm.config.location=/ehcache/liferay-multi-vm-clustered.xml

When using multicast isn't possible you'll need to use the information in the next section of this blog.

Ehcache: unicast

In some server environments it might not be possible or allowed to use multicast. Unfortunately this is the default way of communication in a Liferay cluster and the only thoroughly documented way. So when we were faced with the task of setting up a cluster, while only using unicast, we had to do some Sherlock Holmes level investigations. After many hours of Googling, reading forums, trial and error, ... we were able to get it to work.

First off you need to create a JGroups configuration XML that will be the basis of the unicast setup. This XML is what actually sets up JGroups to use TCP instead of UDP. Once you have this file, you just need to reference it in a couple of properties and things will magically start working. Create an XML file with the content below, name it unicast.xml (the name is not important as long as you use the same value in the portal-ext.properties) and place it in the WEB-INF/classes directory of Liferay:

<!--
    TCP based stack, with flow control and message bundling. This is usually used when IP
    multicasting cannot be used in a network, e.g. because it is disabled (routers discard multicast).
    Note that TCP.bind_addr and TCPPING.initial_hosts should be set, possibly via system properties, e.g.
    
        -Djgroups.bind_addr=192.168.5.2 and -Djgroups.tcpping.initial_hosts=192.168.5.2[7800]
    
    author: Bela Ban
    version: $Id: tcp.xml,v 1.40 2009/12/18 09:28:30 belaban Exp $
-->
<config xmlns="urn:org:jgroups" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-2.8.xsd"> 
    <TCP singleton_name="liferay" 
         bind_port="7800" 
         loopback="true" 
         recv_buf_size="${tcp.recv_buf_size:20M}" 
         send_buf_size="${tcp.send_buf_size:640K}" 
         discard_incompatible_packets="true" 
         max_bundle_size="64K"
         max_bundle_timeout="30" 
         enable_bundling="true" 
         use_send_queues="true" 
         sock_conn_timeout="300" 
         timer.num_threads="4" 
         thread_pool.enabled="true" 
         thread_pool.min_threads="1" 
         thread_pool.max_threads="10" 
         thread_pool.keep_alive_time="5000" 
         thread_pool.queue_enabled="false" 
         thread_pool.queue_max_size="100" 
         thread_pool.rejection_policy="discard" 
         oob_thread_pool.enabled="true" 
         oob_thread_pool.min_threads="1" 
         oob_thread_pool.max_threads="8" 
         oob_thread_pool.keep_alive_time="5000" 
         oob_thread_pool.queue_enabled="false" 
         oob_thread_pool.queue_max_size="100" 
         oob_thread_pool.rejection_policy="discard"/> 

    <TCPPING timeout="3000" 
             initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7800],localhost[7801]}" 
             port_range="1" 
             num_initial_members="3"/> 

    <MERGE2 min_interval="10000" max_interval="30000"/> 
    <FD_SOCK/> 
    <FD timeout="3000" max_tries="3" /> 
    <VERIFY_SUSPECT timeout="1500" /> 
    <BARRIER /> 
    <pbcast.NAKACK use_mcast_xmit="false" gc_lag="0" retransmit_timeout="300,600,1200,2400,4800" discard_delivered_msgs="true"/> 
    <UNICAST timeout="300,600,1200" /> 
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="400K"/> 
    <pbcast.GMS print_local_addr="true" join_timeout="3000" view_bundling="true"/> 
    <FC max_credits="2M" min_threshold="0.10"/> 
    <FRAG2 frag_size="60K" /> 
    <pbcast.STREAMING_STATE_TRANSFER/> 
    <!-- <pbcast.STATE_TRANSFER/> --> 
</config>

Once you have this file in place you just need to add some additional configuration to your portal-ext.properties to configure the Liferay cluster link and Ehcache to use it:

cluster.link.channel.properties.control=unicast.xml
cluster.link.channel.properties.transport.0=unicast.xml
ehcache.bootstrap.cache.loader.factory=com.liferay.portal.cache.ehcache.JGroupsBootstrapCacheLoaderFactory
ehcache.cache.event.listener.factory=net.sf.ehcache.distribution.jgroups.JGroupsCacheReplicatorFactory
ehcache.cache.manager.peer.provider.factory=net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory
net.sf.ehcache.configurationResourceName.peerProviderProperties=file=/unicast.xml
ehcache.multi.vm.config.location.peerProviderProperties=file=/unicast.xml

 

More blogs on Liferay and Java via http://blogs.aca-it.be.

Very Good Coverage.

Thanks for sharing.

Ahamed Hasan
Author of Liferay Cookbook
http://mpowerglobal.com/download-cookbook
Thanks for sharing.

About indexing, I'd give Lucene a chance.

Even on a single node you have to tune "commit" and "optimize interval". But from 6.1 on, Liferay introduces a "cluster bootstrap" functionality that allows a node to grab up-to-date indexes from another node.
I have several 6.1 installations with high usage where Lucene performs well, with a little overhead from the "lucene replication messages".
I think Lucene vs Solr depends on portal contents and search requirements.

Cluster Link: if you use the default multicast mechanism you don't need -Djgroups*. The channels are controlled by "cluster.link.channel.properties.*" properties and the ports are declared with properties

EhCache Multicast. About "net.sf.ehcache.configurationResourceName" and "ehcache.multi.vm.config.location", why redeclare it with default values if you don't need to change cache configuration?
From 6.1 GA2 you can change ehcache configuration with a hook, so no restart is needed.

EhCache Unicast. I'm against placing the file inside ROOT/WEB-INF/classes. If you use Tomcat you can use "cluster.link.channel.properties.control=${catalina.base}/conf/unicast.xml"
We used SOLR as the customer made a master/slave SOLR cluster available, so why not use that. During testing we did indeed also try the cluster link lucene replication and that also did seem to do the trick. Regarding performance we didn't do any tests to see which solution is faster than the other.

Regarding your cluster link comments: I'll definitely try those out and see whether I can remove some unnecessary/unused ones. It's perfectly possible that some of these aren't needed anymore.

I declared the EhCache ones as that was what was indicated in the section about clustering in the Liferay manual. Will need to check.

We tried different locations and property values to get Liferay to pick up the EhCache Unicast file, but it wouldn't pick it up from other locations than WEB-INF/classes. As we already need a portal WAR overlay for several reasons this isn't a problem for us in this case, but I like your tip about using the catalina properties when on Tomcat.
Do you have an example of how you change this with a hook?
Do you want to do everything regarding clustering in 1 hook or do you want to know how to do a certain part in a hook? Without knowing one or the other there are a couple of problems: I fear that not all properties can be overridden using portal.properties in a hook, the Tomcat setenv.sh can't be changed with a hook and the unicast.xml file can't be loaded from the classpath of a hook.

What we do to bundle all these things (and other stuff) is to create one big deploy package using Maven. This package contains everything needed to build a certain version of the portal: a certain version of Liferay, the patching tool, a set of patches, Tomcat file overlays, properties files, additional JARs, all WAR modules, scripts, ... . We can then trigger one script that builds an environment with all these things. It makes a great subject for a blog post.

All the files that are needed or need to be changed are in this deploy package for us, but not in 1 hook. It might be possible somehow, but we haven't had the requirements yet to have it work like that.
Thanks for this article.
It's very helpful for the unicast section. I configured a cluster environment on Amazon and it seems to work fine.
I only have a doubt about the property org.quartz.jobStore.isClustered=true.

It seems unnecessary since Liferay 6.1 and higher, because it's forced when cluster.link.enabled=true in the QuartzSchedulerEngine class. Can you confirm?

My Liferay portal instance was started the first time as non-clustered (cluster.link.enabled=false).
I enabled clustering afterwards. In this case, do I need to stop Liferay, drop the QUARTZ_* tables and restart?

Thanks.

Riccardo
See Mauro's answer. The property is still in the configuration in my case because it is an evolution of a property file for clustering that comes from older, pre 6.1, versions. Usually Liferay gives a signal in the logs when a property isn't needed anymore and I usually act on those and remove them accordingly.

But based on your and Mauro's info I'll try to remove it and verify.
On EE 6.2 I get this error

14:48:36,460 INFO [localhost-startStop-1][LiferayCacheManagerPeerProviderFactory:76] portalPropertyKey ehcache.multi.vm.config.location.peerProviderProperties has value file=/unicast.xml
14:48:37,968 ERROR [localhost-startStop-1][JGroupsCacheManagerPeerProvider:120] Failed to create JGroups Channel, replication will not function. JGroups properties:_null [Sanitized]
java.lang.IllegalArgumentException: [JGRP00001] configuration error: the following properties in TCP are not recognized: {timer.num_threads=4}
When upgrading to a Liferay version that is newer than the one in my post you'll need to do a little extra work to get it running again:
- find the jgroups.jar file in ROOT/WEB-INF/lib
- unzip it and find the tcp.xml file
- rename it to unicast.xml and add 'singleton_name="liferay"' as the first attribute of the TCP tag inside the file

I had to do this recently for Liferay 6.2 too, encountered the same error and this is the adapted file that worked for me: https://dl.dropboxusercontent.com/u/7012383/liferay/unicast.xml
Thanks I'll give it a shot

What about this in unicast.xml? How does it work? Will it be replaced with the initial hosts specified via the Java startup command, i.e. the -D parameter?

initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7800],localhost[7801]}"
That's indeed the declaration that gets replaced: when the system property is set via -D its value is used, otherwise the default value after the colon is used.
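To make the `${name:default}` behaviour concrete, here is a small sketch that mimics it with plain Java. This is not the JGroups implementation, only an illustration of the semantics described above; the class name, the regular expression and the example addresses are my own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderDemo {

    // Matches ${name} and ${name:default}; the default part is optional.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    // Replaces each placeholder with the system property of that name,
    // falling back to the text after the colon when the property is not set.
    public static String resolve(String value) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String fallback = m.group(2) == null ? "" : m.group(2);
            String resolved = System.getProperty(m.group(1), fallback);
            m.appendReplacement(out, Matcher.quoteReplacement(resolved));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String declared = "${jgroups.tcpping.initial_hosts:localhost[7800],localhost[7801]}";

        // Without -Djgroups.tcpping.initial_hosts the default is used.
        System.out.println(resolve(declared));

        // Setting the property has the same effect as passing it with -D.
        System.setProperty("jgroups.tcpping.initial_hosts", "192.168.5.2[7800],192.168.5.3[7800]");
        System.out.println(resolve(declared));
    }
}
```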
I am trying this with a two node cluster running EE 6.1.3 GA3. During startup it gives the error below, a context initialization failure that seems to boil down to NoClassDefFoundError: org/jgroups/ChannelException

The jgroups.jar in this version of the portal has the ChannelException class in it. Any ideas?

Thanks,
mark

14:27:46,080 INFO [main][LiferayCacheManagerPeerProviderFactory:76] portalPropertyKey net.sf.ehcache.configurationResourceName.peerProviderProperties has value file=/unicast.xml
14:27:46,119 ERROR [main][ContextLoader:227] Context initialization failed
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'com.liferay.portal.spring.aop.ServiceBeanAutoProxyCreator#0' defined in class path resource [META-INF/base-spring.xml]: Cannot resolve reference to bean 'counterTransactionAdvice' while setting bean property 'methodInterceptor'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'counterTransactionAdvice' defined in class path resource [META-INF/base-spring.xml]: Cannot resolve reference to bean 'counterTransactionManager' while setting bean property 'platformTransactionManager'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'counterTransactionManager' defined in class path resource [META-INF/hibernate-spring.xml]: Cannot resolve reference to bean 'counterHibernateSessionFactory' while setting constructor argument; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'counterHibernateSessionFactory' defined in class path resource [META-INF/hibernate-spring.xml]: Invocation of init method failed; nested exception is java.lang.NoClassDefFoundError: org/jgroups/ChannelException
Hi Mark,

I haven't tried it yet on the GA3 version, but the exception you're getting, 'NoClassDefFoundError', doesn't mean it can't find the class (you'd get a ClassNotFoundException in that case), but that there was a problem loading the class. Usually the reason for this is a static initializer or constructor that threw an exception. If you're lucky the logging will have more information about this; if not, it's debugging time.

One possibility could be that the format of the jgroups file we need for unicast has been changed and doesn't provide a necessary parameter for the initialization of the class. Did you extract the tcp.xml from the jgroups.jar file as the base for your unicast.xml file?
I double checked against the tcp.xml and it's identical to the one above. The log file is just a chain of those same errors trying to create different beans. The final entry in the stack trace is:

Caused by: java.lang.ClassNotFoundException: org.jgroups.ChannelException
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1484)
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1329)
... 84 more

Pardon my ignorance, but I've not encountered the <property>=file=/unicast.xml syntax, is that correct?

mark
The only difference between the files should be that the one we'll use has 'singleton_name="liferay"' added to the TCP tag and that the file will be named unicast.xml and placed in the WEB-INF/classes directory of the Liferay WAR file. This file in the new location is then referenced 4 times in this section (which contains the file=/unicast.xml syntax) that should be part of your portal-ext.properties file:

cluster.link.channel.properties.control=unicast.xml
cluster.link.channel.properties.transport.0=unicast.xml
ehcache.bootstrap.cache.loader.factory=com.liferay.portal.cache.ehcache.JGroupsBootstrapCacheLoaderFactory
ehcache.cache.event.listener.factory=net.sf.ehcache.distribution.jgroups.JGroupsCacheReplicatorFactory
ehcache.cache.manager.peer.provider.factory=net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory
net.sf.ehcache.configurationResourceName.peerProviderProperties=file=/unicast.xml
ehcache.multi.vm.config.location.peerProviderProperties=file=/unicast.xml
Yes, I have all that in place as specified. I'll keep looking.
Thanks.
If I find some time during the weekend I'll also try it out on the version you're using.
Reporting back for anyone who treads this path in the future. All is well now. I had to get my jgroups information all synced up. EE 6.1.3 GA3 comes with jgroups 1.8. The fix pack liferay-fix-pack-platform-14-6130, however, updates it to jgroups 3.2! The exploded war file on my PC that I was using to look at the jgroups war file was the unpatched version. So I was trying to use an old configuration on the new version of jgroups. Once I realized that, extracted the proper tcp.xml file and made the changes, everything started up fine.

Thanks much for being responsive to my queries.
That will indeed produce the error message, as the XML file it finds comes from a totally different version. Thank you for sharing this; it will be good to know for future problems.
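
For anyone who wants to check quickly which generation of jgroups a given classpath actually contains, a rough sketch along these lines might help. It assumes, as the stack trace earlier in this thread suggests, that org.jgroups.ChannelException is present in the older jar and absent from the newer one. The class JGroupsProbe and its method names are made up for this example. Run it with the jgroups jar from the patched (not the unpatched) Liferay WAR on the classpath.

```java
// Hypothetical helper; the class and method names are made up for this sketch.
class JGroupsProbe {

    // Returns true if the named class can be located, without initializing it.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, JGroupsProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Rough verdict based on which classes are visible on the classpath.
    static String describe() {
        if (!isPresent("org.jgroups.JChannel")) {
            return "no jgroups found on the classpath";
        }
        // Assumption: ChannelException only exists in the older jgroups line.
        return isPresent("org.jgroups.ChannelException")
                ? "older jgroups (ChannelException present)"
                : "newer jgroups (ChannelException absent)";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the verdict doesn't match the jar you extracted your tcp.xml from, the configuration and the library are probably out of sync.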
Anyone on here have experience running Liferay in a GSLB setup?

In particular, does anyone have any experience configuring JGroups RELAY or RELAY2 for bridging remote Liferay clusters together? Does Liferay have any documentation on this, or does Liferay ever configure this for its customers?
@narf dark: I personally haven't done that (what is GSLB even?), but in theory it should work similarly. If you've got an XML file, possibly also one from the jgroups.jar file, that has a base configuration for RELAY/RELAY2, you can point Liferay to that one using similar portal-ext.properties settings.
Global server load balancing, i.e. the same logical Liferay installation across multiple globally separate data centers, all aware of one another.
@narf dark: thanks for the explanation, but sadly I haven't done something similar yet. If you're a Liferay EE customer I would however try to open a ticket and see if they can possibly help you, but otherwise you're on your own I fear.
[...] Liferay defines two primary JGroups channels for what Liferay calls “cluster link”. You enable this in your portal-ext.properties by setting cluster.link.enabled=true. By default all channels in... [...]
To anyone else trying to globally load balance a Liferay cluster across multiple data centers: I've done a fair amount of research on this, see this article:

http://bitsofinfo.wordpress.com/2014/05/21/clustering-liferay-globally-across-data-centers-gslb-with-jgroups-and-relay2/
[...] This article is a work in progress… and a long one. Jan Eerdekens states it correctly in his article, “Configuring a Liferay cluster is part experience and part black magic” …. however doing it... [...]