Available Now: TripWire

For the last few months as I've been working with Liferay 7 CE / Liferay DXP, I've been a little stymied trying to manage the complexities of the new OSGi universe.

In Liferay 6.x, for example, an OOTB demo setup of Liferay comes with only five or six WAR files, and when the portal starts up, they all start up.

But with Liferay 7 CE and Liferay DXP, there are a lot of bundles in the mix. Liferay 7 CE GA3, for example, has almost 2,500 bundles in OSGi.

And when the portal starts up, most of these will also start. Some will not. Some might not be able to. Some can't start because they have unsatisfied dependencies.

But you're not going to know it.

Seriously, you won't know if something has failed to start when you restart your environment. There may or may not be something in the log. Someone might have stopped a bundle intentionally (or unintentionally) in the gogo shell without telling you. And with almost 2,500 bundles in there, it's going to be really hard to find the needle in the haystack, especially if you don't know whether there's a needle in there at all.

So I've been working on a new utility over the past few months to resolve the situation - TripWire.

Features

TripWire scans the OSGi environment to gather information about deployed bundle statuses, bundle versions, and service components. TripWire also scans the system and portal properties.

This scanning is done at two points. The first is when an administrator takes a snapshot (basically to persist a baseline for all comparisons), and the second is a scheduled task that runs on the node to monitor for changes. The comparison scan can also be kicked off manually.

After installing TripWire and navigating to the TripWire control panel, you'll be prompted to capture an initial baseline scan:

Click the Take Snapshot card to see the system snapshot:

You can save the new baseline (to be compared against in the automated scans), export the snapshot (it downloads as an Excel spreadsheet), or cancel.

Each section expands to show captured details:

The funny-looking hash keys at the top? Those are hashes calculated from the scanned areas. By comparing the baseline hash against the scanned hash, TripWire knows quickly if there is a variation between the baseline and the current scan.
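To make the hash comparison concrete, here is a minimal sketch of the idea in plain Java. It is not TripWire's actual code: the class name `SnapshotHash`, the choice of SHA-256, and the representation of a scanned area as a name-to-value map are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of TripWire.
final class SnapshotHash {

    // Hashes one scanned area (name -> value). Entries are fed to the digest
    // in sorted key order so the result does not depend on collection order.
    // Assumes keys and values are non-null.
    static String hash(Map<String, String> area) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, String> e : new TreeMap<>(area).entrySet()) {
                digest.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between key and value
                digest.update(e.getValue().getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between entries
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            // SHA-256 must be available on every Java platform implementation.
            throw new IllegalStateException(ex);
        }
    }

    // Equal hashes mean the scanned area still matches the baseline.
    static boolean consistent(Map<String, String> baseline, Map<String, String> current) {
        return hash(baseline).equals(hash(current));
    }
}
```

With something like this, a scan only needs to walk the detailed entries when `consistent(baseline, current)` returns false, which is what makes the check quick.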

When you save the new baseline, the main page will reflect that the server is currently consistent with the baseline:

You can test the server to force a scan by clicking on the Test Server card:

Exclusions

TripWire supports dynamically creating exclusion rules to exclude items from being part of the scan.  You might add an exclusion for a property value that you're not interested in monitoring, for example. Click on the Exclusions card and then on the Add New Exclusion Rule button:

The Camera drop down lists all of the current cameras used when taking a snapshot. Choose either a specific camera or the Any Camera option to allow for a match to any camera.

The Type drop down allows you to select either a Match, a Starts With, a Contains or a Regular Expression type for the exclusion rule.

The value field is what to match against, and the Enabled slider allows you to disable a particular exclusion rule.
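The rule fields described above can be sketched as a small Java class. This is only an illustration of how such a rule might be evaluated, not TripWire's implementation: the class `ExclusionRule`, the `ANY_CAMERA` marker, and the decision that a Regular Expression rule must match the whole item name are all assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical model of an exclusion rule; names are illustrative only.
final class ExclusionRule {

    enum Type { MATCH, STARTS_WITH, CONTAINS, REGULAR_EXPRESSION }

    // Assumed marker meaning "Any Camera".
    static final String ANY_CAMERA = "*";

    private final String camera;
    private final Type type;
    private final String value;
    private final boolean enabled;

    ExclusionRule(String camera, Type type, String value, boolean enabled) {
        this.camera = camera;
        this.type = type;
        this.value = value;
        this.enabled = enabled;
    }

    // Returns true when the item captured by the given camera should be
    // left out of the scan.
    boolean excludes(String cameraName, String itemName) {
        if (!enabled) {
            return false; // disabled rules never exclude anything
        }
        if (!ANY_CAMERA.equals(camera) && !camera.equals(cameraName)) {
            return false; // rule is bound to a different camera
        }
        switch (type) {
            case MATCH:
                return itemName.equals(value);
            case STARTS_WITH:
                return itemName.startsWith(value);
            case CONTAINS:
                return itemName.contains(value);
            case REGULAR_EXPRESSION:
                // Assumption: the pattern must match the entire item name.
                return Pattern.matches(value, itemName);
            default:
                return false;
        }
    }
}
```

For example, `new ExclusionRule("System Properties", ExclusionRule.Type.STARTS_WITH, "catalina.", true)` would exclude `catalina.home` when it is captured by a camera named "System Properties" (the camera name string here is itself a placeholder).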

Modifying the exclusion rules affects scans immediately and can result in failed scans:

By adding the rule to exclude any System Property that starts with "catalina.", scans now show the server to be inconsistent when compared to the baseline. At this point you can take a new baseline snapshot to approve the change, or you could disable the exclusion rule (basically reverting the change to the system) to restore baseline consistency.

Notifications

TripWire uses Liferay notifications to alert subscribed administrators when the node is in an inconsistent state and when the node returns to a consistent state. For Liferay 7 CE, a subscribed administrator will only receive notifications about the single Liferay node. For Liferay DXP, subscribed administrators will receive notifications from every node that is out of sync with the baseline snapshot.

Notifications will be issued for every failed scan on every node until consistency is restored.

To subscribe to or unsubscribe from notifications, click on the Subscriptions card. If you are unsubscribed, the bell image will be grey. If you are subscribed, the bell will be blue and have a red notification number on it. Note that this number does not represent the number of notifications you might currently have; it is just a visual marker that you are subscribed to notifications.

Configuration

TripWire supports setting configuration for the scanning schedule. Click on the Configuration card:

Using the Cameras tab, you can also choose the cameras to use in the snapshots and scans:

Normally I recommend enabling all but the Separate Service Status Camera (because this camera is quite verbose in the details it captures).

The Bundle Status Camera captures status for each bundle.

The Bundle Version Camera captures versions of every bundle.

The Configuration Admin Camera captures configuration changes from the control panel.  Note that CA only saves values that are different from the set of default values on each panel, so the details on this section will always be shorter than the actual set of configurations saved for the portal.

The Portal Properties Camera captures changes to known Liferay portal properties (unknown properties are ignored). In a Liferay DXP cluster, some properties will need to be excluded using the Exclusion Rules since nodes will have separate, unique values that will never match a baseline.

The Service Status Camera captures counts of OSGi DS Services and their statuses.

The System Properties Camera captures changes to system properties from the JVM. Like the portal properties, in a Liferay DXP cluster some properties will need to be excluded using Exclusion Rules since nodes will have separate, unique values that will never match a baseline.

The Unsatisfied References Camera captures the list of bundles with unsatisfied references (which prevent the bundles from starting). Any time a bundle has an unsatisfied reference, the bundle and its unsatisfied reference(s) will be captured by this camera.

The three email tabs configure who the notification emails are from and the consistent/inconsistent email templates.

Liferay DXP

For Liferay DXP clusters, TripWire uses the same baseline across all nodes in the cluster and reports on cluster node inconsistencies:

Clicking on the server link in the status area, you can review the server's report to see where the problems are:

Some of the additions and changes are due to unique node values and should be handled by adding new Exclusion Rules.

The Removals above show that one node in the cluster has Audience Targeting deployed but the other node does not. These are the kinds of inconsistencies that you may not be aware of from a cluster perspective, but they would result in your DXP cluster not serving the right content to all users. Identifying a discrepancy like this quickly and easily, even once, will save you time, money and effort.
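The additions, changes and removals in a node report can be thought of as a three-way difference between the baseline and the node's scan. The sketch below shows one way to compute that in plain Java. It is an illustration under assumptions, not TripWire's code: the class `SnapshotDiff` and the name-to-value map representation are hypothetical.

```java
import java.util.Map;
import java.util.Objects;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical report model for illustration only.
final class SnapshotDiff {

    final SortedSet<String> additions = new TreeSet<>(); // on the node, not in the baseline
    final SortedSet<String> changes = new TreeSet<>();   // in both, but with different values
    final SortedSet<String> removals = new TreeSet<>();  // in the baseline, missing on the node

    // Compares a node's scan against the shared baseline.
    static SnapshotDiff compare(Map<String, String> baseline, Map<String, String> node) {
        SnapshotDiff diff = new SnapshotDiff();
        for (Map.Entry<String, String> e : node.entrySet()) {
            if (!baseline.containsKey(e.getKey())) {
                diff.additions.add(e.getKey());
            } else if (!Objects.equals(baseline.get(e.getKey()), e.getValue())) {
                diff.changes.add(e.getKey());
            }
        }
        for (String key : baseline.keySet()) {
            if (!node.containsKey(key)) {
                diff.removals.add(key);
            }
        }
        return diff;
    }

    // A node is consistent when nothing was added, changed or removed.
    boolean isConsistent() {
        return additions.isEmpty() && changes.isEmpty() && removals.isEmpty();
    }
}
```

In this model, a bundle that is deployed on the baseline node but missing from another node would end up in `removals` for that node, which mirrors the Audience Targeting case described above.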

For your cluster Exclusion Rules, your rule list will be quite long:

Conclusion

That's TripWire.

It is available from the Liferay Marketplace:

There is a cost for each version, but that is to offset the time and effort I have invested in this tool.

And while there may not seem to be an immediate return, the first time this tool saves you by identifying a node that is out of sync or an unauthorized change to your OSGi environment, it will save you time (in waiting for the change to be identified), effort (in having to sort through all of the gogo output and other details), user impressions (from cluster node sync issues) and most of all, money.

 

 

@#!@$%! awesome! This is the kind of plugin EVERYONE should consider. Anything that can save your organization money, or you as an individual from pulling your hair out, is worth its weight in gold. I've had this problem (environment config differences) plague me with past versions of Liferay, which as you said were much simpler. Anyone who has run into this type of problem under the new architecture will immediately appreciate the value this tool brings to the table. Great work David! Can't wait for the release!
Hi David,
This seems like a great tool indeed. However, I do have a question: does the tool somehow allow you to interact with the bundle status? Let's say one of the bundles stops for whatever reason and you would like to restart it, or on the other hand you want to stop it because you have identified that it's causing a general problem and you would rather live without it while it's being corrected. Can you do that? In other words, does it offer the flexibility of the gogo shell's "start bundle" and "stop bundle", and maybe other gogo shell actions?
If not, do you know of any tool that would allow this (besides the gogo shell itself, of course)? I am asking because we can't just give Telnet access to servers (especially in production): the people who can access the gogo shell are then not the Liferay developers and administrators in general, when it's the Liferay developers who would need this OSGi management tool.
Thanks for your help. I am considering your solution in any case.
Tanguy