Backup/Restore a Liferay Environment

Introduction

This page documents two important administrative functions: backing up (and restoring) Liferay, and setting up a copy of the data for development use.

Having a backup of the database is essential if you ever need to restore it after some sort of catastrophe, so the first thing we're going to cover is backing up the Liferay data.

Liferay uses a database for primary data storage, but also uses the filesystem to store things such as image files, documents, etc.

Since I'm using Liferay with a MySQL database running on a Linux box, these instructions focus on that implementation, but the details should be adaptable to any database and operating system that can host Liferay.

So first the plan: the current MySQL server is going to host a second database. The production database will be the one that the main Liferay instance uses, and the development database will be a relatively recent copy of the production database. The development database will be allowed to drift out of sync with production, because I don't want changes I'm in the middle of making overwritten just because cron thinks it's time for a refresh.

Create the Development Database

Create the development database, lpdev, with the following commands:

$ mysql -u root -p mysql
mysql> create database lpdev;
mysql> grant all privileges on lpdev.* to 'lportal'@'%' with grant option;
mysql> quit;

This will create the database lpdev and give all privileges to the lportal user.
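
If you want to double-check the new database and the grant, a quick verification (connecting as the same lportal user the production database already uses) looks like this:

$ mysql -u lportal -p lpdev
mysql> show grants;
mysql> quit;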

Create Backup Script

First we'll set up a backup script for the production database and data files. I'm going to create and use a local directory, /var/liferay/backups. This will centralize the backups to make them easy to use for restoration later on.

In my /etc/cron.daily folder I created a script liferay-backup.cron with the following contents:

#!/bin/sh
#
# Back up the Liferay data.
#

# Get a variable to store the current date.
date=`date -I`

#
# Backup the liferay mysql database.
#
mysqldump -u lportal -p<password> lportal | bzip2 -9 -c > /var/liferay/backups/lportal-$date.sql.bz2
#
# Also create a backup of the Liferay data files.
#
tar cjf /var/liferay/backups/lportal-$date.data.tar.bz2 -C /opt/liferay data

You will have to change the <password> tag, replacing it with your lportal user password. NOTE: There is no space between the -p and the password. If you don't have an /etc/cron.daily directory, you can schedule the script with a regular crontab entry instead.
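
For reference, the one-time setup around the script is just creating the backup directory and making the script executable so cron.daily will pick it up; the crontab line below is a sketch of the alternative, and the 4 a.m. run time is my assumption rather than anything required:

$ mkdir -p /var/liferay/backups
$ chmod 755 /etc/cron.daily/liferay-backup.cron

# crontab alternative, e.g. run the same script daily at 04:00:
0 4 * * * /etc/cron.daily/liferay-backup.cron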

This will create two files in the /var/liferay/backups directory: the SQL dump to recreate the database and a tar file to recreate the data directory. These files still live on the same Linux box as the database and the server, so you may want to copy them to an off-system datastore or removable media.
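
As one option for getting the files off the box, a couple of lines added to the end of the backup script (or a second cron job) could push the day's files to another host. The hostname and remote path below are placeholders, not part of my actual setup:

#
# Copy today's backups to an off-system host.
#
rsync -av /var/liferay/backups/ backup.example.com:/srv/backups/liferay/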

This backup script runs while Liferay is still active. There is the potential for database issues (capturing partially committed data) or filesystem issues (the Lucene indices being in a state of flux). Running the backup in the very early morning will help protect against this, but the potential is still there...
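
It's also worth spot-checking a backup occasionally rather than trusting it blindly. A quick sanity check might look something like this:

$ bzcat /var/liferay/backups/lportal-<date>.sql.bz2 | head
$ tar tjf /var/liferay/backups/lportal-<date>.data.tar.bz2 | head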

Recovering Liferay/Refreshing Development

The steps to recover Liferay are pretty much the same as those for refreshing the development environment; the only differences are the database named on the command line and the location where the data is extracted.

Restoration and refreshing should be done while Liferay is not running, as changing the data underneath a live instance could cause some serious application issues. But if you find yourself having to recover the database, it's a good bet that Liferay is not running anyway.
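
If your bundle is laid out like mine, stopping Tomcat before the restore and starting it again afterwards looks roughly like this (the tomcat-6.0.29 path is an assumption based on my bundle; adjust it for yours):

$ /opt/liferay/tomcat-6.0.29/bin/shutdown.sh
# ... run the restore or refresh commands below ...
$ /opt/liferay/tomcat-6.0.29/bin/startup.sh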

Recovering the Database

Recovering the database is a one-line command:

$ bzcat /var/liferay/backups/lportal-<date>.sql.bz2 | mysql -u lportal -p<password> lportal

Replace the <date> tag with a valid date from your backups. Dates will be formatted as YYYY-MM-DD. Replace the <password> tag with your lportal user password. NOTE: There is no space between the -p and the password.

Refreshing the Development Database

To refresh the development database, lpdev, you'd run the following command:

$ bzcat /var/liferay/backups/lportal-<date>.sql.bz2 | mysql -u lportal -p<password> lpdev

Do the replacements as indicated in the Recovering the Database section above.

Recovering the Data

The filesystem data has things such as the image gallery files, documents, etc. Our backup has all of these files, but you should clean out your data directory prior to expanding the backed-up data. This will ensure that you won't have any lingering data from before the restoration.

Recovery is done through the following commands:

$ /bin/rm -rf /opt/liferay/data/*
$ tar xjf /var/liferay/backups/lportal-<date>.data.tar.bz2 -C /opt/liferay

After starting Liferay, you're going to want to go to the Control Panel's Server Administration page and choose "Reindex all search indexes". This will rebuild all of the Lucene indices from the restored information in the database and the data files and leave them in a consistent state.

Refreshing the Development Data

You'll know your local Liferay data is out of sync when you start seeing broken image tags, etc., in your development portal. To fix these kinds of issues, you'll have to refresh the development data. Since we're doing development on a Windows box, refreshing the data is a bit more complicated. Basically you're going to complete the following steps (a command-line sketch follows the list):

  1. Get the lportal-<date>.data.tar.bz2 file from your Linux box to your Windows box.
  2. Delete the contents of the c:\liferay\bundle\data directory.
  3. Expand the archive in the c:\liferay\bundle directory (since the archive entries are already prefixed with the data directory, you should not expand it inside the data directory itself or you'll end up with c:\liferay\bundle\data\data and Liferay won't find it at startup).
  4. After starting Liferay, you're going to want to go to the Control Panel, Server Administration page and choose "Reindex all search indexes" to ensure the Lucene indices are consistent.
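
Here's a rough command-line sketch of steps 1-3, assuming you have scp available for the transfer and a tar/bzip2-capable shell on the Windows box (Cygwin, Git Bash, or similar); the user and hostname are placeholders for your own environment:

$ scp user@linuxbox:/var/liferay/backups/lportal-<date>.data.tar.bz2 /c/liferay/bundle/
$ rm -rf /c/liferay/bundle/data/*
$ tar xjf /c/liferay/bundle/lportal-<date>.data.tar.bz2 -C /c/liferay/bundle

Any archiver that understands .tar.bz2 (7-Zip, for example) works just as well; just mind the extraction directory so you don't end up with data\data.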

Conclusion

So now you're all set with your backup and recovery for Liferay. You can also refresh your development environment so that it matches production.

If you have just created your development database, you're going to want to run the Liferay backup script to create the SQL and data files, then follow the steps to refresh your development environment. Don't forget to change the database to lpdev in your portal-ext.properties file in c:\liferay\bundle\tomcat-6.0.29\webapps\ROOT\WEB-INF\classes so you access the right database in development.
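
For reference, the relevant JDBC entries in that portal-ext.properties would look something like the following; the host name and password are placeholders for your own values:

jdbc.default.driverClassName=com.mysql.jdbc.Driver
jdbc.default.url=jdbc:mysql://<your-mysql-host>/lpdev?useUnicode=true&characterEncoding=UTF-8&useFastDateParsing=false
jdbc.default.username=lportal
jdbc.default.password=<password>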

Comments
Very useful, thank you.
Just had to modify your backup script from "tar tjf ..." to "tar cjf ..." ;)
Not to nitpick but just wanted to point out that you changed "lportal" to "liferay" in your recovery section
Hi David,
I wanted to see how robust this backup strategy was so I did some testing and I have some questions.

Q1 - Regarding Liferay's file system, if I were to delete parts of the ROOT folder, would the data backup be able to handle that kind of modification/data loss to the server?

Q2 - The DB backup is good for retrieving deleted files that were uploaded using the document and media portlet, and when I purposely dropped lportal, the DB backup did help to bring up the original admin. But now my Control Panel/Admin bar at the top of the page is gone. Do you know how to recover fully so that Control Panel functionality is back?

Thanks,
David
The backup/recovery is meant to help protect you from data loss/corruption. It will not do anything to restore any of your webapps (including Liferay). So no, it's not going to handle question 1 at all. And probably the same for #2: if you restore but don't have the themes, hooks, ext, and other plugins, things may not work as you'd expect.

Should be easy to fix, though, if you just include webapps in your backup process, although some may argue that you should also grab the conf, the libs, the bin directory, ... It can easily turn into a slippery slope trying to gauge how much should be included in your nightly backup.

It would be easier to grab a backup of your complete tomcat directory before/after making a change (e.g. editing a config file or deploying a new plugin). This will minimize the amount of data that you're collecting for backup yet still leave you in a position to restore in case of fatal error.

But at the end of the day system admins will have their own perspectives on how much to back up and when...
I use Bacula (http://blog.bacula.org/), which can reach out to Linux, Mac, and Windows hosts to be backed up. In the central config (/etc/bacula/bacula-dir.conf) you can add a pre-job reference that runs a script on the host being backed up. The central config also specifies fileset definitions to include/exclude folders/files (so including the Liferay */data directory and database dumps from MySQL or PostgreSQL can be defined there).
Also, one other question I had was about the data recovery. According to the recovery steps, it seems the filesystem data resides in /opt/liferay/data/? But I thought everything was in the Liferay home directory, or even in liferayhomedir/data. Can you please explain why we are backing up and restoring files in /opt/liferay/data? Prior to checking out this page, I never had a /opt/liferay/data folder on my server. Thanks for all the help.
You can specify the location of everything using portal-ext.properties, but my installation had /opt/liferay as the LIFERAY_HOME var, so /opt/liferay/{data,deploy,logs,etc} are all valid.

Depending upon where your LIFERAY_HOME is, you'll have to adjust the script accordingly.
Thank you so much for your detailed explanation and also for your time, David. I really enjoyed this article as it gave me a good starting point for my backup strategy.
This is a blog that is informative while not trying to be generic (technically) for all scenarios. "This is how I do things in my xyz enviro" by an expert is valuable and honest. I can only imagine that from forum, to blog comment, to blog, to documentation is exponential wrt contributor time spent. Complain! Make it better... with sounds that people might share. /opt/*.
Hi David,

I have a question about the database copy. You suggest that the staged server database is a recent copy of the live server database, but in that case user accounts are copied as well, which implies that every user is able to access the staged server and may modify anything they want. Am I right? Isn't that a problem?