For the last few months as I've been working with Liferay 7 CE / Liferay DXP, I've been a little stymied trying to manage the complexities of the new OSGi universe.
In Liferay 6.x, for example, an OOTB demo setup of Liferay comes with only 5 or 6 WAR files, and when the portal starts up, they all start up.
But with Liferay 7 CE and Liferay DXP, there are a lot of bundles in the mix. Liferay 7 CE GA3, for example, has almost 2,500 bundles in OSGi.
And when the portal starts up, most of these will also start. Some will not. Some might not be able to. Some can't start because they have unsatisfied dependencies.
But you're not going to know it.
Seriously, you won't know if something has failed to start when you restart your environment. There may or may not be something in the log. Someone might have stopped a bundle intentionally (or unintentionally) in the gogo shell without telling you. And with almost 2,500 bundles in there, it's going to be really hard to find the needle in the haystack, especially if you don't know whether there's a needle in there at all.
So I've been working on a new utility over the past few months to resolve the situation - TripWire.
TripWire scans the OSGi environment to gather information about deployed bundle statuses, bundle versions, and service components. It also scans the system and portal properties.
This scanning is done at two points: first when an administrator takes a snapshot (basically to persist a baseline for all comparisons), and second in a scheduled task that runs on the node to monitor for changes. The comparison scan can also be kicked off manually.
After installing TripWire and navigating to the TripWire control panel, you'll be prompted to capture an initial baseline scan:
Click the Take Snapshot card to see the system snapshot:
You can save the new baseline (to be compared against in the automated scans), export the snapshot (it downloads as an Excel spreadsheet), or cancel.
Each section expands to show captured details:
The funny-looking hash keys at the top? Those are hashes calculated from the scanned areas. By comparing the baseline hash against the scanned hash, TripWire knows quickly whether there is a variation between the baseline and the current scan.
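To make the hash comparison concrete, here is a minimal sketch in plain Java. It is not TripWire's code: the class and method names are made up for illustration, and SHA-256 over sorted `key=value` lines is my assumption, since the actual hashing scheme isn't described here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; none of these names come from TripWire itself.
class SnapshotHash {

    // Hash one scanned area (for example, bundle name -> state).
    // Entries are fed to the digest in sorted key order so that the same
    // content always produces the same hash, whatever the scan order was.
    static String hashOf(Map<String, String> scannedArea) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, String> entry : new TreeMap<>(scannedArea).entrySet()) {
                String line = entry.getKey() + "=" + entry.getValue() + "\n";
                digest.update(line.getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // A scan is consistent when its hash equals the stored baseline hash.
    static boolean matchesBaseline(String baselineHash, Map<String, String> currentScan) {
        return baselineHash.equals(hashOf(currentScan));
    }
}
```

The point of the design is that only one short string per scanned area has to be stored and compared; the detailed item-by-item comparison is only needed when the hashes differ.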
When you save the new baseline, the main page will reflect that the server is currently consistent with the baseline:
You can test the server to force a scan by clicking on the Test Server card:
TripWire supports dynamically creating exclusion rules to exclude items from being part of the scan. You might add an exclusion for a property value that you're not interested in monitoring, for example. Click on the Exclusions card and then on the Add New Exclusion Rule button:
The Camera drop down lists all of the current cameras used when taking a snapshot. Choose either a specific camera or the Any Camera option to allow for a match to any camera.
The Type drop down allows you to select either a Match, a Starts With, a Contains or a Regular Expression type for the exclusion rule.
The Value field is what to match against, and the Enabled slider allows you to disable a particular exclusion rule without deleting it.
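The rule fields described above could be modeled roughly like this. This is only a sketch under my own assumptions: the class, the `"*"` marker for Any Camera, matching against the item name, and treating a Regular Expression rule as a whole-string match are all illustrative choices, not TripWire's actual implementation.

```java
import java.util.regex.Pattern;

// Hypothetical model of an exclusion rule; not TripWire's real class.
class ExclusionRule {

    enum Type { MATCH, STARTS_WITH, CONTAINS, REGEX }

    // Assumed marker standing in for the "Any Camera" option.
    static final String ANY_CAMERA = "*";

    final String camera;
    final Type type;
    final String value;
    final boolean enabled;

    ExclusionRule(String camera, Type type, String value, boolean enabled) {
        this.camera = camera;
        this.type = type;
        this.value = value;
        this.enabled = enabled;
    }

    // Returns true when the named item from the given camera should be skipped.
    boolean excludes(String scanCamera, String itemName) {
        if (!enabled) {
            return false; // disabled rules never exclude anything
        }
        if (!ANY_CAMERA.equals(camera) && !camera.equals(scanCamera)) {
            return false; // rule is bound to a different camera
        }
        switch (type) {
            case MATCH:
                return itemName.equals(value);
            case STARTS_WITH:
                return itemName.startsWith(value);
            case CONTAINS:
                return itemName.contains(value);
            case REGEX:
                // Assumption: the pattern must match the whole item name.
                return Pattern.matches(value, itemName);
            default:
                return false;
        }
    }
}
```

With this model, `new ExclusionRule("System Properties", ExclusionRule.Type.STARTS_WITH, "catalina.", true)` would skip `catalina.home` from the System Properties camera but leave the same name alone if it came from another camera.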
Modifying the exclusion rules affects scans immediately and can result in failed scans:
By adding the rule to exclude any System Property that starts with "catalina.", scans now show the server to be inconsistent when compared to the baseline. At this point you can take a new baseline snapshot to approve the change, or you could disable the exclusion rule (basically reverting the change to the system) to restore baseline consistency.
TripWire uses Liferay notifications to alert subscribed administrators when the node is in an inconsistent state and when the node returns to a consistent state. For Liferay 7 CE, a subscribed administrator will only receive notifications about the single Liferay node. For Liferay DXP, subscribed administrators will receive notifications from every node that is out of sync with the baseline snapshot.
Notifications will be issued for every failed scan on every node until consistency is restored.
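The notification behavior boils down to a small decision made after each scan. The sketch below is my own illustration of that rule (the class and enum names are hypothetical, and delivery through Liferay notifications is left out entirely):

```java
import java.util.Optional;

// Hypothetical sketch of the "when do we notify" rule; not TripWire's code.
class ScanNotifier {

    enum Notice { INCONSISTENT, CONSISTENT_AGAIN }

    // Decide what, if anything, to send after a scan completes.
    static Optional<Notice> noticeFor(boolean previousScanConsistent, boolean currentScanConsistent) {
        if (!currentScanConsistent) {
            // Every failed scan produces a notification until consistency is restored.
            return Optional.of(Notice.INCONSISTENT);
        }
        if (!previousScanConsistent) {
            // The node just returned to a consistent state.
            return Optional.of(Notice.CONSISTENT_AGAIN);
        }
        // Consistent before and consistent now: stay quiet.
        return Optional.empty();
    }
}
```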
To subscribe to or unsubscribe from notifications, click on the Subscriptions card. If you are unsubscribed, the bell image will be grey; if you are subscribed, the bell will be blue and have a red notification number on it. Note that this number does not represent the number of notifications you might currently have. It is just a visual marker that you are subscribed for notifications.
TripWire supports setting configuration for the scanning schedule. Click on the Configuration card:
Using the Cameras tab, you can also choose the cameras to use in the snapshots and scans:
Normally I recommend enabling all but the Separate Service Status Camera (because this camera is quite verbose in the details it captures).
The Bundle Status Camera captures status for each bundle.
The Bundle Version Camera captures versions of every bundle.
The Configuration Admin Camera captures configuration changes from the control panel. Note that Configuration Admin only saves values that are different from the set of default values on each panel, so the details in this section will always be shorter than the actual set of configurations in effect for the portal.
The Portal Properties Camera captures changes to known Liferay portal properties (unknown properties are ignored). In a Liferay DXP cluster, some properties will need to be excluded using the Exclusion Rules since nodes will have separate, unique values that will never match a baseline.
The Service Status Camera captures counts of OSGi DS Services and their statuses.
The System Properties Camera captures changes to system properties from the JVM. Like the portal properties, in a Liferay DXP cluster some properties will need to be excluded using Exclusion Rules since nodes will have separate, unique values that will never match a baseline.
The Unsatisfied References Camera captures the list of bundles with unsatisfied references (preventing the bundles from starting). Any time a bundle has an unsatisfied reference, the bundle and its unsatisfied reference(s) will be captured by this camera.
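As a rough illustration of what a camera does, here is a sketch of capturing JVM system properties into a sorted snapshot, in the spirit of the System Properties Camera. The class name is hypothetical and this is plain Java rather than TripWire's actual camera code; the OSGi-based cameras would need the framework APIs, which I'm leaving out here.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical sketch of a "camera"; not TripWire's implementation.
class SystemPropertiesSnapshot {

    // Copy the properties into a sorted map so that two snapshots of the
    // same values always list their entries in the same order.
    static Map<String, String> capture(Properties properties) {
        Map<String, String> snapshot = new TreeMap<>();
        for (String name : properties.stringPropertyNames()) {
            snapshot.put(name, properties.getProperty(name));
        }
        return snapshot;
    }
}
```

Calling `SystemPropertiesSnapshot.capture(System.getProperties())` on two cluster nodes makes it easy to see why exclusions are needed: values such as `user.dir` or the host-specific paths will typically differ per node.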
The three email tabs configure who the notification emails are from and the consistent/inconsistent email templates.
For Liferay DXP clusters, TripWire uses the same baseline across all nodes in the cluster and reports on cluster node inconsistencies:
Clicking on the server link in the status area, you can review the server's report to see where the problems are:
Some of the additions and changes are due to unique node values and should be handled by adding new Exclusion Rules.
The Removals above show that one node in the cluster has Audience Targeting deployed but the other node does not. These are the kinds of inconsistencies that you may not be aware of from a cluster perspective, but they would result in your DXP cluster not serving the right content to all users. Identifying this kind of discrepancy quickly and easily will save you time, money and effort.
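The additions, changes and removals in a report like this can be thought of as a simple comparison of two maps. Here is a minimal sketch, assuming each snapshot is reduced to name/value pairs (for example, bundle name to version); the class and field names are hypothetical and not taken from TripWire.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of a baseline-vs-scan report; not TripWire's code.
class SnapshotDiff {

    final Set<String> additions = new TreeSet<>(); // in the scan, not in the baseline
    final Set<String> changes = new TreeSet<>();   // in both, but with different values
    final Set<String> removals = new TreeSet<>();  // in the baseline, missing from the scan

    static SnapshotDiff between(Map<String, String> baseline, Map<String, String> scan) {
        SnapshotDiff diff = new SnapshotDiff();
        for (Map.Entry<String, String> entry : scan.entrySet()) {
            String name = entry.getKey();
            if (!baseline.containsKey(name)) {
                diff.additions.add(name);
            } else if (!Objects.equals(baseline.get(name), entry.getValue())) {
                diff.changes.add(name);
            }
        }
        for (String name : baseline.keySet()) {
            if (!scan.containsKey(name)) {
                diff.removals.add(name);
            }
        }
        return diff;
    }

    // The node matches the baseline only when nothing was added, changed or removed.
    boolean isConsistent() {
        return additions.isEmpty() && changes.isEmpty() && removals.isEmpty();
    }
}
```

In these terms, a bundle deployed on the baseline node but absent from another node would land in `removals` for that other node's scan.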
For your cluster Exclusion Rules, your rule list will be quite long:
TripWire is available from the Liferay Marketplace:
- TripWire CE: https://web.liferay.com/marketplace/-/mp/application/87693826
- TripWire DXP: https://web.liferay.com/marketplace/-/mp/application/87618369
There is a cost for each version, but that is to offset the time and effort I have invested in this tool.
And while there may not seem to be an immediate return, the first time this tool saves you by identifying a node that is out of sync or an unauthorized change to your OSGi environment, it will save you time (in waiting for the change to be identified), effort (in having to sort through all of the gogo output and other details), user impressions (from cluster node sync issues) and most of all, money.