Hidden Dangers of Trying to Keep Liferay Up to Date

At Liferay Symposium North America 2017, I had a discussion with a few of our customers on the hidden dangers of rolling back Liferay changes.

There are multiple ongoing efforts to improve how we will handle the issue for the upcoming 7.1 release, such as LPS-76923. However, the 7.0 release is already affected, and DE-40 is fast approaching with another round of hidden dangers scheduled to arrive with it.

This blog post was drafted to provide a little more transparency around what is about to happen so that people who have a rollback plan actually have enough information to formulate a complete rollback plan.

With that being said, not everyone immediately realizes there is a problem that arises when you need to roll back the Liferay platform, so I'll start with an example from the side of customization, which may be easier to relate to.

Imagine that you've followed the guide on Creating Upgrade Processes for Modules, and you decide to create an upgrade process in order to transition to a new version. While it's a very complete article from the perspective of what you do to move forward, there's one thing the article doesn't try to answer: what happens if you need to go backwards?

If you've ever tried it, you'll find that there is actually little you can do.

And therein lies the problem: some people do not realize that this lack of options for rolling back changes applies to the Liferay platform in the same way it applies to customizations.

Core Schema Changes

Of course, it turns out that in some cases, Liferay has as much of a problem moving forward as it does moving backward.

Let's say you've skimmed through the javadocs linked in Development Reference, and you've decided that you want to take advantage of some newer API that's been made available in the later tags of the Core Portal Artifacts. Or maybe, you've browsed through our issue tracker or our release notes and decided that you need one of the fixes or changes. To do so, in theory all you do is rebuild Liferay from source at a revision that is equal to or later than the specific tag containing the API change you need.

But what happens if you decide that the newer commit contains changes that you dislike, and you want to roll back to an earlier version?

In theory, all you need to deal with is the changing API, assuming you started using that new API. However, in practice, one other thing you have to face much more prominently in the current Liferay release (at least, when compared to past releases) is schema changes.

Let's say that you started at a release that has 7.0.3 GA4 as its nearest parent tag, but you've decided that you want to roll forward to a new release where 7.0.4 GA5 is its nearest parent tag.

However, when you perform this update, Liferay's core schema version (essentially, a version number that describes the schema for service builder classes living in portal-impl) also changes. These updates are documented as new versions added to ReleaseInfo. Any time this happens, Liferay treats this as a schema change.

Core Schema in Pre-7 Releases

Early on in Liferay's history (5.1, 5.2, 6.0), Liferay code review allowed new point releases within the same branch to have schema changes. For example, if you were on 5.2 EE SP1, there might be a schema change if you attempted to go to 5.2 EE SP2. These schema changes would apply to the next release.

Of course, with schema changes happening in both a 5.2 branch and a 6.0 branch, you can imagine that depending on where you started within the 5.2 branch, you would need different processes to run in order to reach the correct final state in 6.0. So in order to ensure Liferay ran the correct upgrade processes, you would need to set the upgrade.processes values in portal properties for the upgrade, and there was documentation describing what those properties should be depending on where you started your upgrade.

Later on in Liferay's history (6.1, 6.2), Liferay decided that all of the different permutations were getting really unwieldy. Therefore, we added all the known paths to portal properties (something we later called "seamless upgrade") so that there wouldn't be mistakes from people following the documentation, and we strongly discouraged schema changes after the official release.

If there was something that seemed like it would need a schema change in order to fix, we invested a lot of extra effort into finding a way that would not require a schema change. If there was no way to address something without a schema change (for example, performance issues that required we reorganize how we store data in a column), we might hide the schema change inside of verify.processes. This resulted in an increase in the number of upgrade-like fixes that appeared as verify processes.

Core Schema in Post-7 Releases

For Liferay 7, we added a policy around verify processes where you needed to demonstrate that it's something that must be run for every release before you could make it a verify process.

Because we knew from past experience that sometimes schema changes might need to happen anyway, this also meant that we would need to re-allow actual schema changes within the same branch. If it weren't for the shared core of CE and DXP, this would have brought us back to Liferay's darker days of multiple variations in upgrade steps, but the shared core side-steps this particular problem.

However, what happens if you try to start up Liferay against a code base that assumes a newer Liferay core schema version?

Well, we also removed the logic which automatically ran core upgrades as you started up, which used to allow you to start up Liferay as long as you were switching to a newer version. As a result, whenever a version change happens, you need to consciously run the upgrade process in order to transition to that new version, giving you a stronger sense that significant changes are about to happen to your database.

However, there is one thing missing in all of this planning put into preventing startup on core schema version changes: it only works if you only use traditional CE releases, where the core schema version formally changes. If you are in an environment where that version number never changes, how many schema changes you've successfully run starts to get murky.

A particularly awkward variant of this limitation arises if you choose to use DXP rather than CE: your version number is fixed at 7.0.10 from the time you first installed or upgraded Liferay. As a result, Liferay will never force you to run additional core upgrade processes, even when more are added in subsequent Liferay releases.

Known Symptoms of Incomplete Schema Upgrades

If you upgraded to DXP before SP1 (November 1, 2016), or you upgraded with a hotfix that uses DE-7 (the fix pack corresponding to SP1, as noted in the Service Pack Matrix) or earlier as a baseline, you might be affected by the following issues due to missing schema updates:

  • LPS-66133: Discussion-enabled assets with zero comments cause performance issues
  • LPS-66599: MBDiscussion table has entries where groupId is 0, which prevents staging from working
  • LPS-44965: For a specific upgrade path (6.1.10 -> 6.1.20 -> 7.0.10), text columns are the wrong size on Oracle

If you upgraded to DXP before SP2 (March 28, 2017), or you upgraded with a hotfix that uses DE-12 or earlier as a baseline, you might be affected by the following issues due to missing schema updates:

  • LPS-68410: Certain text columns, like LayoutPrototype.description, allow fewer characters on SQL Server and Sybase than they would allow on other databases
  • LPS-68775: Organization.type_ column contains values not defined in the organizations.types portal property
  • LPS-69878: Group_.groupKey column is a CLOB instead of a VARCHAR on MySQL

If you upgraded to DXP before SP3 (May 11, 2017), or you upgraded with a hotfix that uses DE-14 or earlier as a baseline, you might be affected by the following issue due to missing schema updates:

  • LPS-70807: PortletPreferences table contains references to 1_WAR_kaleoformsportlet

Module Schema Changes

Just as before, let's say you've skimmed through the javadocs, and you've decided that you want to take advantage of some newer API that's been made available in the later tags of the Core Portal Artifacts. However, this time let's also assume LPS-72269 was recently applied to the core source code of Liferay, and you both started and ended on a commit that sits between 7.0.3 GA4 and 7.0.4 GA5.

So, you fast-forward your repository to a point in time which contains the API that you want, and you build Liferay from source. However, you encounter some behavior that you dislike, and so you decide to bring things back to how they used to be. To do so, you roll back to your original commit, and you build Liferay from source again.

When you start up Liferay after this rollback, something unexpected happens. You no longer see the "Navigation" option in the Control Panel side menu.

When Module Schema Changes Occur

Essentially, as long as the portal can start, and the module has all of its packages satisfied, its upgrades will run.

In theory, when you use only CE releases without building from source, the only time you will see a module schema change is when you acquire the new release containing the updated core schema version. As a result, Liferay will refuse to start due to the core version changes, and the only time module upgrades happen is when you manually run the upgrade process.

In practice, in fulfilling the dream of modularity, module schema versions change independently of the core schema version, and so it is theoretically possible for a module upgrade to happen without a core schema version change, as long as the release manager is configured to allow it (it is allowed by default).

Of course, because transitioning between actual CE releases prevents the portal from starting until the upgrade process is officially run, this really only affects two audiences: those who build from source between core schema changes, and those who choose to use DXP instead of CE.

In those situations, because there is no additional restriction on when a module upgrade will run beyond "can the portal start up", we have a situation where module schema upgrades will simply run as soon as the portal starts up and the module is deployed, and there will be no advance warning.

Transitive Component Dependencies

So how does all of this lead to the "Navigation" option disappearing? Well, between the commit you started from and the commit you tested, pull request #50722 was merged, which resulted in an updated Liferay-Require-SchemaVersion on the com.liferay.mobile.device.rules.service module.

As a result, after starting Liferay with the newer module, the database was automatically updated to reflect the module schema changes. Liferay recorded that this schema update occurred, and when you rolled back, the rolled-back version of com.liferay.mobile.device.rules.service declared that it provides an older schema version.

And this is where the problem arises. Liferay announces to the different modules that only code that knows how to work with the newer schema version should be run. (Of course, it does so very quietly, which is another point of contention; it just quietly fails instead of loudly complaining about it.)

This means that even though the bundle starts, none of its service builder components are made available as OSGi components. Through several layers of transitive dependencies, the code responsible for rendering the "Navigation" option in the Control Panel side menu no longer has its dependencies satisfied, and it disappears.
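To make the sequence above a little more concrete, here is a toy model of the gate, written in Python purely for illustration. It is not Liferay's code: the ModuleState class, the services_available function, the exact-match rule, and the version numbers 1.0.0 and 1.0.1 are all assumptions I made up for the sketch. Only the module name comes from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleState:
    """Hypothetical snapshot of one service module (not a Liferay class)."""

    symbolic_name: str
    # Schema version the database says has been reached for this module.
    recorded_schema_version: str
    # Liferay-Require-SchemaVersion declared by the bundle that is deployed.
    required_schema_version: str


def services_available(state: ModuleState) -> bool:
    """Toy gate: services are published only when both versions agree.

    Assumption: the comparison is an exact match. The real check lives
    inside Liferay and may be more nuanced than this.
    """
    return state.recorded_schema_version == state.required_schema_version


NAME = "com.liferay.mobile.device.rules.service"

# The version numbers below are invented for the illustration.
original = ModuleState(NAME, "1.0.0", "1.0.0")
updated = ModuleState(NAME, "1.0.1", "1.0.1")      # the upgrade ran on startup
rolled_back = ModuleState(NAME, "1.0.1", "1.0.0")  # old code, new schema

for label, state in [
    ("original", original),
    ("updated", updated),
    ("rolled back", rolled_back),
]:
    print(label, services_available(state))
```

The third case is the one from the story: the database remembers the newer schema, the rolled-back bundle only knows the older one, and so nothing depending on that module's services can be satisfied.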

Identifying Module Schema Changes

So how do you know if you're having problems that might be related to schema changes?

If you are building from source, you will need to know both the starting commit and the ending commit. From there, you can get a list of all the schema versions by scanning all the bnd.bnd files in the source for the Liferay-Require-SchemaVersion header, and use a diff tool in order to compare the differences.
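The scan described above can be sketched in a few lines of Python. This is a minimal sketch rather than a tool we ship: the function names are my own, and it assumes the header is written on a single line as "Liferay-Require-SchemaVersion: x.y.z" in each bnd.bnd file.

```python
import os
import re
from typing import Dict, Optional, Tuple

# Assumption: the header sits on one line, in the form "Name: value".
_HEADER = re.compile(
    r"^[ \t]*Liferay-Require-SchemaVersion[ \t]*:[ \t]*(\S+)", re.MULTILINE
)


def parse_schema_version(bnd_text: str) -> Optional[str]:
    """Return the required schema version found in bnd.bnd text, if any."""
    match = _HEADER.search(bnd_text)
    return match.group(1) if match else None


def scan_source_tree(root: str) -> Dict[str, str]:
    """Map each module directory (relative to root) to its schema version."""
    versions: Dict[str, str] = {}
    for directory, _subdirs, files in os.walk(root):
        if "bnd.bnd" not in files:
            continue
        path = os.path.join(directory, "bnd.bnd")
        with open(path, encoding="utf-8") as handle:
            version = parse_schema_version(handle.read())
        if version is not None:
            versions[os.path.relpath(directory, root)] = version
    return versions


def diff_versions(
    old: Dict[str, str], new: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return only the modules whose version differs, as (old, new) pairs."""
    changes = {}
    for module in sorted(set(old) | set(new)):
        if old.get(module) != new.get(module):
            changes[module] = (old.get(module), new.get(module))
    return changes
```

To use it, check out the starting commit and call scan_source_tree on the repository root, do the same for the ending commit, and pass both results to diff_versions. A module that appears with None on one side was added or removed between the two commits.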

If you are working with releases and fix packs, you can use a tool that I built after we discovered the problem between DE-26 and DE-27 (which was caused by LPS-72269 mentioned above) to understand just how often this actually happened. This link shows the differences between DE-29 and DE-30, which involved services that are fairly critical to Liferay's function: Liferay-Require-SchemaVersion Changes Since DXP Release

If you've recently attempted to roll back, and you're worried about whether you're affected, you can check its symptoms. While there are many reasons for Spring beans to not be registered as OSGi components (one of which we encountered in Troubleshooting Liferay from Source), if you're affected by a module schema change, you are definitely in a situation where the Spring beans are not registered as OSGi components.

For example, you can check by running dm wtf in the Gogo shell: because Felix's dependency manager is used to prevent the service builder services from being registered as OSGi components, everything that's failing to resolve should be reported as a result.

You can also have Liferay automatically report problems by creating a configuration file and adding unavailableComponentScanningInterval=60 to that file, which turns on scanning of unavailable Spring components, as described in Detecting Unresolved OSGi Components. You will also want to turn on INFO level logging by following the instructions on Adjusting Module Logging.

  • Between and including DE-24 and DE-30 (which includes 7.0.4 GA5), you would update the file com.liferay.portal.spring.extender.internal.configuration.SpringExtenderConfiguration.cfg and enable INFO logging on com.liferay.portal.spring.extender.internal.context which lives in the com.liferay.portal.spring.extender module
  • After and including DE-31, you would update the file com.liferay.portal.osgi.debug.spring.extender.internal.configuration.UnavailableComponentScannerConfiguration.cfg and enable INFO logging on com.liferay.portal.osgi.debug.spring.extender.internal which lives in the com.liferay.portal.osgi.debug.spring.extender module
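Because the file name and the logging category both depend on the fix pack level, it can help to write the two cases above down as a small lookup. The Python below is just that: a sketch with names of my own choosing (ScannerSettings, scanner_settings), covering only the fix pack ranges listed above. It says nothing about fix packs earlier than DE-24.

```python
from typing import NamedTuple


class ScannerSettings(NamedTuple):
    """Hypothetical holder for the two values that vary by fix pack."""

    config_file: str
    log_category: str


# DE-24 through DE-30 (which includes 7.0.4 GA5).
_SPRING_EXTENDER = ScannerSettings(
    "com.liferay.portal.spring.extender.internal.configuration."
    "SpringExtenderConfiguration.cfg",
    "com.liferay.portal.spring.extender.internal.context",
)

# DE-31 and later.
_OSGI_DEBUG = ScannerSettings(
    "com.liferay.portal.osgi.debug.spring.extender.internal.configuration."
    "UnavailableComponentScannerConfiguration.cfg",
    "com.liferay.portal.osgi.debug.spring.extender.internal",
)

# The single line that goes into whichever file applies.
CONFIG_BODY = "unavailableComponentScanningInterval=60\n"


def scanner_settings(fix_pack: int) -> ScannerSettings:
    """Pick the config file and INFO logging category for a DE-<n> level."""
    if fix_pack >= 31:
        return _OSGI_DEBUG
    if fix_pack >= 24:
        return _SPRING_EXTENDER
    raise ValueError("only DE-24 and later are covered by this sketch")
```

For example, scanner_settings(27).config_file gives the SpringExtenderConfiguration.cfg name, and CONFIG_BODY is what you would put inside it.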

Fixing Module Schema Changes

You may have encountered Liferay-Require-SchemaVersion in the article Creating Data Upgrade Processes for Modules: it attempts to prevent outdated code from running against an upgraded schema.

Of course, the mechanism is not cluster-aware: it doesn't prevent outdated nodes in a cluster from running older code against the newer schema, so this functionality really only helps with accidental bad deployments.

So now we run into a dilemma. Assuming you have no choice but to roll back (in other words, whatever issue you encountered in the updated Liferay requires the rollback), how can we get everything working again? Well, as noted at the beginning, there's very little you can do, but a little is better than nothing.

Restore the Older Schema

If you ask around, the recommended way is to restore a database backup that contains the older schema for your module. This will have the appropriate Release_ table entries in Liferay for it to know that this older schema exists for the module, and everything is known to work with the older module code.

However, you wind up losing all data from the time the backup was made up until today. Therefore, you will have to determine if this trade-off is actually acceptable.

Restore the Newer Bundle

One way is to deploy a bundle that is actually compatible with the upgraded schema, where it might be simplest to deploy just the updated module (if you're dealing with a release bundle where everything is in .lpkg files, the updated module needs to go into ${liferay.home}/osgi/marketplace/override).

However, this approach is risky for all the reasons we just experienced with TermsOfUseContentProvider in Keeping Customizations Up to Date with Liferay Source, and it can rapidly become unwieldy. If you're dealing with a release bundle, any of the modules you are forced to override can no longer be patched, which makes your installation unsupportable.

Even if you assume that you don't need supportability, if the service module we're redeploying implements a ProviderType interface and the package containing that interface has been updated, things get really messy really fast.

To start, you will need to deploy an updated version of the module that provides that interface, or use one of the Import-Package approaches described in the previous blog entry. If the service module we're redeploying also exports a package containing a ProviderType interface, you may have to deploy an updated version of everything that implements that interface so that the correct Import-Package versions are listed.

Ultimately, with this approach, you may have to deploy an updated version of all of the transitive dependencies as well, if any of these updated modules either provides or implements ProviderType interfaces. It may get to the point where you're redeploying large parts of the portal as an override in order to counteract package updates for any packages holding or any components implementing ProviderType interfaces.

Erase Module Tables

It's entirely possible that you don't actually care about the data in the affected module. For example, in the case of Mobile Device Rules, it might not have been a component you used at all, so you'd be perfectly comfortable erasing all of its data (because there is no data).

We started this blog post from the side of customization. From the customization side, if you'd run across the problem and tried to find a solution, you would eventually run into a tool named the DB Support Gradle Plugin. This tool provides a single command, cleanServiceBuilder, which does just one thing: "Cleans the Liferay database from the Service Builder tables and rows of a module."

In other words, a fairly straightforward way to get things working is to simply start over from scratch for that module.

If you have access to the Liferay source, you can simply navigate to the module that's failing to start, update the Gradle configuration so that the DB Support Gradle Plugin can function (though you may need to update it if you're using a database other than MySQL, due to a regression bug introduced with the fix to LPS-73124, resolved in LPS-76854), and use cleanServiceBuilder to erase all of that module's data.

Stopping Module Schema Changes

As described in Running the Upgrade Process, you can prevent module upgrades from running by adding autoUpgrade=false to com.liferay.portal.upgrade.internal.configuration.ReleaseManagerConfiguration.cfg. Normally this is done during an actual upgrade so you can run the module upgrade separately from the core schema upgrade, but it also has the side-effect that it prevents module upgrades from running if new upgrades are added without warning.
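If you manage several environments, you might script the creation of that file. The Python below is a minimal sketch under a couple of assumptions: the helper names are hypothetical, and it assumes OSGi configuration files are picked up from the osgi/configs folder under your Liferay home, so adjust the path if your installation is laid out differently.

```python
from pathlib import Path

RELEASE_MANAGER_CFG = (
    "com.liferay.portal.upgrade.internal.configuration."
    "ReleaseManagerConfiguration.cfg"
)


def release_manager_cfg_path(liferay_home: str) -> Path:
    """Where the file would live, assuming <liferay.home>/osgi/configs."""
    return Path(liferay_home) / "osgi" / "configs" / RELEASE_MANAGER_CFG


def render_cfg(auto_upgrade: bool) -> str:
    """Render the one-line file content for the autoUpgrade setting."""
    return "autoUpgrade={}\n".format("true" if auto_upgrade else "false")


def disable_auto_upgrade(liferay_home: str) -> Path:
    """Write autoUpgrade=false, creating the folder if it is missing."""
    path = release_manager_cfg_path(liferay_home)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_cfg(False), encoding="utf-8")
    return path
```

Calling disable_auto_upgrade with your Liferay home writes the file and returns its path, so you can log where it ended up. Note that this overwrites any existing content in that file.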

However, this doesn't actually prevent the problem of transitive component dependencies, because we now simply have the problem in reverse.

Just as is the case when you create a custom service builder module that doesn't have an upgrade process to a schema version, Liferay is declaring that it only knows how to work with the older schema version, but the code says that it only knows how to work with the newer schema version. As a result, none of the service builder components are made available as OSGi components.

So what happens is that components will stop working. However, at least it's very easy to roll back to the earlier version, as no module schema changes have actually happened.