Pluggable Enterprise Search with Solr
Integrating Solr 1.4.1 with Liferay 6.1 GA2/GA3 #
Before we begin, it's important to note that this tutorial will cover integrating Solr 1.4.1 with Liferay 6.1 GA2 or Liferay 6.1 GA3. Liferay 6.2 GA1 (and above) can be configured to use later versions of Solr, but that is not within the scope of this article. This tutorial will also cover how to configure basic replication between master and slave Solr servers with Liferay. You can find Solr 1.4.1 in the archives of Apache here.
Downloading the Solr Search Engine Plugin and Verifying Deployment #
You can find the Solr Search Engine Plugin in Liferay's Marketplace; just be sure to download the original version (not Solr 3 Search Engine or Solr 4 Search Engine). Version 1.0.2 should be the latest released version of this plugin, and it's the one we'll be using for this tutorial. It is compatible with both Liferay 6.1 GA2 and Liferay 6.1 GA3.
Before we deploy the Solr Search Engine Plugin to our portal, we need to check our ports. By default, Liferay's Tomcat bundle runs on port 8080, and the Solr Search Engine Plugin is also set to connect to a Solr server on port 8080. Assuming you don't have Solr running on the same Tomcat instance as Liferay, one of the ports will need to change in order for the plugin to deploy correctly.
At this point, we only want to validate a successful deployment, so we're going to deploy the plugin without any Solr servers hooked up, just to see Liferay try to connect to a server that isn't there. Once we see the connection requests, we'll know that the plugin was successfully deployed.
Rather than change the ports for our Liferay server, we'll change the default port our Solr Search Engine Plugin wants to connect to before deploying. I used a file manager to view the contents of the .lpkg file without extracting it. On the first level, navigate into solr-web-6.1.20, and then into WEB-INF/classes/META-INF/ to find solr-spring.xml. Find this line:
<constructor-arg type="java.lang.String" value="http://localhost:8080/solr" />
and replace 8080 with a different open port number (e.g., 8983). Once you save the file, close the file manager and verify that the Solr Search Engine Plugin .lpkg file was updated. You can now deploy it to your Liferay portal, and once it is successfully deployed, you should see an endless list of errors like this:
INFO [HttpMethodDirector:439] I/O exception (java.net.ConnectException) caught when processing request: Connection refused
INFO [HttpMethodDirector:445] Retrying request
This is a good thing! The Solr Search Engine Plugin has been successfully deployed, and it's trying to connect to a port that we set to test Liferay's connection to Solr. Now we can move on to configuring actual Solr servers for Liferay to connect to.
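If you would rather script the port change than edit the file by hand, the substitution itself is small. The following is only a sketch: the helper name `set_solr_port` is hypothetical, and it works on the text of solr-spring.xml that you have already read from (and will write back to) the archive yourself.

```python
import re


def set_solr_port(spring_xml: str, old_port: int, new_port: int) -> str:
    """Return spring_xml with the port of every http://<host>:<old_port>/solr URL changed."""
    # Match only URLs that end in /solr so unrelated numbers are left alone.
    pattern = re.compile(rf'(http://[^\s"/:]+):{old_port}(/solr)')
    return pattern.sub(lambda m: f"{m.group(1)}:{new_port}{m.group(2)}", spring_xml)


# The line from solr-spring.xml shown earlier in this section.
line = '<constructor-arg type="java.lang.String" value="http://localhost:8080/solr" />'
print(set_solr_port(line, 8080, 8983))
```

Text that merely contains the number 8080 outside of a Solr URL is left unchanged, which makes the helper safe to run over the whole file.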
Setting Up the Master and Slave Solr Servers #
If you don't already have existing Solr servers, this tutorial will help you set up two basic servers. If you already have Solr servers, you can skip to the next section.
After you've downloaded the archived Solr 1.4.1 zip or tar.gz from Apache, extract it twice and rename the two resulting folders; we'll call them "Solr_master" and "Solr_slave." This tutorial sets up the two Solr servers in the same directory as the Liferay server. For example, in a folder called tutorial, we would have /tutorial/liferay-portal-6.1.20-ee-ga2, /tutorial/Solr_master, and /tutorial/Solr_slave. Once you've renamed your two servers, there are a couple of things we need to do to configure replication.
Replicating Solr #
The first thing we need to do is change the port number for the slave. In Solr_slave/example/etc, find jetty.xml and change the default port number:
<Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
to another open port (e.g., 8984). Solr_slave is now set to a different port than Solr_master (which should still be on the default 8983). Next, we need to modify solrconfig.xml for both servers:
In Solr_master/example/solr/conf/solrconfig.xml, uncomment these lines:
<requestHandler name="/replication" class="solr.ReplicationHandler" >
	<lst name="master">
		<str name="replicateAfter">commit</str>
		<str name="replicateAfter">startup</str>
		<str name="confFiles">schema.xml,stopwords.txt</str>
	</lst>
</requestHandler>
In Solr_slave/example/solr/conf/solrconfig.xml, uncomment these lines:
<requestHandler name="/replication" class="solr.ReplicationHandler" >
	<lst name="slave">
		<str name="masterUrl">http://localhost:8983/solr/replication</str>
		<str name="pollInterval">00:00:60</str>
	</lst>
</requestHandler>
You can visit http://wiki.apache.org/solr/SolrReplication for more detailed information on Solr replication.
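Once both servers are running (see the startup section further down), the replication handler can be queried over HTTP to confirm that the slave is following the master. The sketch below assumes the ports used in this tutorial and uses the handler's `details` and `indexversion` commands; the helper names are hypothetical, and `fetch_replication_info` only works against a live Solr server.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def replication_url(base_url: str, command: str) -> str:
    """Build a ReplicationHandler URL such as .../solr/replication?command=details."""
    return base_url.rstrip("/") + "/replication?" + urlencode({"command": command})


def fetch_replication_info(base_url: str, command: str = "details", timeout: float = 5.0) -> str:
    """Return the raw response body from the handler (needs a running Solr server)."""
    with urlopen(replication_url(base_url, command), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Assumed addresses: the master and slave ports chosen in this tutorial.
MASTER = "http://localhost:8983/solr"
SLAVE = "http://localhost:8984/solr"

print(replication_url(MASTER, "indexversion"))
print(replication_url(SLAVE, "details"))
```

Comparing the index version reported by the master with the one in the slave's details is a quick way to see whether the slave has caught up after a poll interval.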
Configuring solr-spring.xml to Connect to the Master and Slave Solr Servers #
Now we need to go back to solr-spring.xml in our deployed Solr Search Engine Plugin to modify the ports to connect to real servers. Before we do this, it might be a good idea to shut down Liferay. In /webapps/solr-web/WEB-INF/classes/META-INF/solr-spring.xml, make these changes:
<!-- Solr search engine -->
<bean id="com.liferay.portal.search.solr.server.BasicAuthSolrServerReader" class="com.liferay.portal.search.solr.server.BasicAuthSolrServer">
	<constructor-arg type="java.lang.String" value="http://localhost:8984/solr" />
</bean>
<bean id="com.liferay.portal.search.solr.server.BasicAuthSolrServerWriter" class="com.liferay.portal.search.solr.server.BasicAuthSolrServer">
	<constructor-arg type="java.lang.String" value="http://localhost:8983/solr" />
</bean>
<bean id="com.liferay.portal.search.solr.SolrIndexSearcherImpl" class="com.liferay.portal.search.solr.SolrIndexSearcherImpl">
	<property name="solrServer" ref="com.liferay.portal.search.solr.server.BasicAuthSolrServerReader" />
	<property name="swallowException" value="true" />
</bean>
<bean id="com.liferay.portal.search.solr.SolrIndexWriterImpl" class="com.liferay.portal.search.solr.SolrIndexWriterImpl">
	<property name="commit" value="false" />
	<property name="solrServer" ref="com.liferay.portal.search.solr.server.BasicAuthSolrServerWriter" />
</bean>
<bean id="com.liferay.portal.search.solr.SolrSearchEngineImpl" class="com.liferay.portal.kernel.search.BaseSearchEngine">
	<property name="clusteredWrite" value="false" />
	<property name="indexSearcher" ref="com.liferay.portal.search.solr.SolrIndexSearcherImpl" />
	<property name="indexWriter" ref="com.liferay.portal.search.solr.SolrIndexWriterImpl" />
	<property name="luceneBased" value="true" />
	<property name="vendor" value="SOLR" />
</bean>
<!-- Configurator -->
Modify the ports if necessary.
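In this configuration the reader bean points at the slave (8984) and the writer bean points at the master (8983), and it is easy to swap the two by accident. As a rough sanity check, the sketch below pulls the two URLs out of a Spring file. The `<beans>` wrapper in `SAMPLE` and the helper names are assumptions for illustration; the real file may declare XML namespaces, which the helper strips before comparing tag names.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for solr-spring.xml; only the two server beans are included.
SAMPLE = """
<beans>
  <bean id="com.liferay.portal.search.solr.server.BasicAuthSolrServerReader"
        class="com.liferay.portal.search.solr.server.BasicAuthSolrServer">
    <constructor-arg type="java.lang.String" value="http://localhost:8984/solr" />
  </bean>
  <bean id="com.liferay.portal.search.solr.server.BasicAuthSolrServerWriter"
        class="com.liferay.portal.search.solr.server.BasicAuthSolrServer">
    <constructor-arg type="java.lang.String" value="http://localhost:8983/solr" />
  </bean>
</beans>
"""


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def solr_server_urls(spring_xml: str) -> dict:
    """Map the short bean name (last part of the id) to its constructor URL."""
    urls = {}
    for element in ET.fromstring(spring_xml).iter():
        if local_name(element.tag) != "bean":
            continue
        short_id = element.get("id", "").rsplit(".", 1)[-1]
        for child in element:
            if local_name(child.tag) == "constructor-arg" and child.get("value"):
                urls[short_id] = child.get("value")
    return urls


print(solr_server_urls(SAMPLE))
```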
The last thing we need to do is copy our Solr Search Engine Plugin's schema.xml into both of our Solr servers. In /webapps/solr-web/WEB-INF/conf, you should find schema.xml. Copy it into Solr_master/example/solr/conf, replacing the existing schema.xml. Do the same for Solr_slave.
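The copy step can be scripted so that both servers are guaranteed to receive the same file. This is a minimal sketch, not part of the plugin: `install_schema` is a hypothetical helper, it additionally keeps a one-time backup named schema.xml.orig (my addition, not something the tutorial requires), and the demo runs against a throwaway directory tree rather than real servers.

```python
import shutil
import tempfile
from pathlib import Path


def install_schema(plugin_schema: Path, solr_homes: list) -> list:
    """Copy plugin_schema over example/solr/conf/schema.xml in each Solr folder."""
    written = []
    for home in solr_homes:
        target = Path(home) / "example" / "solr" / "conf" / "schema.xml"
        backup = target.with_name("schema.xml.orig")
        # Keep the stock schema once so the change can be undone.
        if target.exists() and not backup.exists():
            shutil.copy2(target, backup)
        shutil.copy2(plugin_schema, target)
        written.append(target)
    return written


# Demo in a temporary directory that mimics the layout used in this tutorial.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    plugin_schema = root / "solr-web" / "WEB-INF" / "conf" / "schema.xml"
    plugin_schema.parent.mkdir(parents=True)
    plugin_schema.write_text("<schema name='liferay'/>")
    for name in ("Solr_master", "Solr_slave"):
        conf = root / name / "example" / "solr" / "conf"
        conf.mkdir(parents=True)
        (conf / "schema.xml").write_text("<schema name='example'/>")
    written = install_schema(plugin_schema, [root / "Solr_master", root / "Solr_slave"])
    contents = [path.read_text() for path in written]
    backups = [path.with_name("schema.xml.orig").read_text() for path in written]

print(contents)
```

For a real installation, pass the path to the deployed plugin's schema.xml and the two Solr folders instead of the temporary ones.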
Start the Solr Servers and Restart Liferay #
To start the Solr servers, navigate to Solr_master/example and run "java -jar start.jar" and do the same for Solr_slave. You should now be able to access the Solr admin panel by hitting http://localhost:8983/solr/admin/ and http://localhost:8984/solr/admin. Start up Liferay. Your Liferay search is now automatically upgraded to use Solr. It is likely, however, that initial searches will come up with nothing: this is because you will need to reindex everything using Solr.
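Before opening the admin pages, it can help to confirm that each process is actually listening on its port. The sketch below is a plain TCP check under the port assignments used in this tutorial (8983, 8984, and Liferay's default 8080); `port_is_open` is a hypothetical helper and says nothing about whether Solr itself started cleanly.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


for name, port in (("Solr_master", 8983), ("Solr_slave", 8984), ("Liferay", 8080)):
    state = "listening" if port_is_open("localhost", port) else "not reachable"
    print(f"{name} on port {port}: {state}")
```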
To reindex Liferay, go to the Admin Portlet. Click the Server tab and then click the Execute button next to Reindex all search indexes. It may take a while, but Liferay will begin sending indexing requests to Solr for execution. When the process is complete, Solr will have a complete search index of your site, and will be running independently of all of your Liferay nodes. Installing the plugin to your nodes has the effect of overriding any calls to Lucene for searching. All of Liferay's search boxes will now use Solr as the search index. This is ideal for a clustered environment, as it allows all of your nodes to share one search server and one search index, and this search server operates independently of all of your nodes.
Small Optimization #
One small performance optimization you can make is in solrconfig.xml for each server. In both Solr_master/example/solr/conf/solrconfig.xml and Solr_slave/example/solr/conf/solrconfig.xml, uncomment and modify this <autoCommit> code:
<!--
	Perform a <commit/> automatically under certain conditions:
	maxDocs - number of updates since last commit is greater than this
	maxTime - oldest uncommited update (in ms) is this long ago
	Instead of enabling autoCommit, consider using "commitWithin"
	when adding documents. http://wiki.apache.org/solr/UpdateXmlMessages
-->
<autoCommit>
	<maxDocs>10000</maxDocs>
	<maxTime>10000</maxTime>
</autoCommit>
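With this setting, Solr commits pending updates automatically once more than 10,000 documents have accumulated or once the oldest uncommitted update is 10 seconds old. Batching commits this way should help Solr run a little faster.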
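To make the two thresholds concrete, here is a simplified model of the rule described in the configuration comment. It is only an illustration of the documented conditions, not Solr's actual implementation, and the function name and parameters are hypothetical.

```python
def should_auto_commit(pending_docs: int, oldest_pending_age_ms: int,
                       max_docs: int = 10000, max_time_ms: int = 10000) -> bool:
    """Model the autoCommit rule: commit when either threshold is crossed."""
    if pending_docs == 0:
        # Nothing has been added since the last commit.
        return False
    too_many = pending_docs > max_docs             # "greater than this"
    too_old = oldest_pending_age_ms >= max_time_ms  # "this long ago"
    return too_many or too_old


print(should_auto_commit(500, 2500))     # neither threshold reached
print(should_auto_commit(10001, 2500))   # maxDocs exceeded
print(should_auto_commit(500, 10000))    # maxTime reached
```

During a full reindex the document threshold is likely to trigger most commits, while during normal use the time threshold bounds how long a new document waits before it becomes searchable.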