Microbenchmarking Liferay Registry with JMH

I've been working with JMH (Java Microbenchmark Harness) [1], written by the OpenJDK/Oracle compiler engineers. JMH is a microbenchmark harness that takes care of all the concerns most developers can't really wrap their heads around and often neglect to take into consideration. It's discussed here [2] by its primary developer, and here [3] is a good write-up about it.

JMH is certainly a tool you'll want to bring into your toolbox if you care at all about understanding the performance of your applications (particularly down at the algorithm and language level).
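To give a sense of how little ceremony is involved, here's a minimal benchmark sketch (hypothetical class and method names; recent JMH versions use the @Benchmark annotation):

    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;

    public class HelloBenchmark {

        // JMH calls this method repeatedly and reports operations per
        // millisecond. Returning the result lets JMH consume it, so
        // dead-code elimination can't optimize the work away.
        @Benchmark
        @BenchmarkMode(Mode.Throughput)
        @OutputTimeUnit(TimeUnit.MILLISECONDS)
        public String helloWorld() {
            return "Hello " + System.nanoTime();
        }

    }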

It's a little tricky getting JMH set up in a pure Ant environment, but I can talk about that in another post.

Meanwhile, we've been working on an implementation of a generic "registry" library (a.k.a. liferay-registry) which is backed by the OSGi Service Registry.

My interest in JMH stems from wanting to make sure that this new registry implementation is close to being as fast as the one(s) currently in the portal. My goal was to reach at least 90% of the existing performance: the new registry has more features, but those shouldn't impose a significant performance degradation.

To baseline the results, I compared all implementations (existing and new) against a plain old Java array and a plain old ArrayList (list). The serviceTrackerCollection variant is the implementation from liferay-registry, which is backed by the registry itself for tracking registered implementations. Finally, the two uses of EventProcessorUtil were tested (a rough sketch of these benchmark shapes follows the list):

  • when a list of classNames is passed (eventProcessorUtil_process_classNames)
  • when a list of impls is pre-registered (eventProcessorUtil_process_registered)
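
To make the comparison concrete, here's a rough sketch of what the baseline variants boil down to (illustrative types and names only; see [4] for the actual code). Every variant processes the same fixed set of handlers, so the only thing that differs between benchmarks is the collection and its iterator:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Benchmark)
    public class EventsPerformanceSketch {

        // The same four no-op handlers back every variant, so the loop
        // shape is constant and only the collection/iterator differs.
        private Runnable[] _array;
        private List<Runnable> _list;

        @Setup
        public void setUp() {
            Runnable noop = new Runnable() {
                public void run() {
                }
            };

            _array = new Runnable[] {noop, noop, noop, noop};
            _list = new ArrayList<Runnable>(Arrays.asList(_array));
        }

        @Benchmark
        public void array() {
            for (Runnable runnable : _array) {
                runnable.run();
            }
        }

        @Benchmark
        public void list() {
            for (Runnable runnable : _list) {
                runnable.run();
            }
        }

    }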

Here are the outcomes of the JMH "throughput" benchmark (operations completed per unit of time, reported below in ops/ms) over 200 iterations with a concurrency of 4 (multi-threaded):

     [java] Benchmark                                                                  Mode Thr     Count  Sec         Mean   Mean error    Units
     [java] c.l.j.p.e.EventsPerformanceTest.array                                     thrpt   4       200    1    40868.206       68.055   ops/ms
     [java] c.l.j.p.e.EventsPerformanceTest.eventProcessorUtil_process_classNames     thrpt   4       200    1    16099.645       28.735   ops/ms
     [java] c.l.j.p.e.EventsPerformanceTest.eventProcessorUtil_process_registered     thrpt   4       200    1    32784.652       60.586   ops/ms
     [java] c.l.j.p.e.EventsPerformanceTest.list                                      thrpt   4       200    1    41045.476       82.463   ops/ms
     [java] c.l.j.p.e.EventsPerformanceTest.serviceTrackerCollection                  thrpt   4       200    1    41143.900       69.304   ops/ms
     [java] c.l.j.p.e.EventsPerformanceTest.serviceTrackerCollection_ToArray          thrpt   4       200    1    35950.619      223.727   ops/ms
     [java] c.l.j.p.e.EventsPerformanceTest.serviceTrackerCollection_ToArray_Typed    thrpt   4       200    1     8069.174       34.354   ops/ms
     [java] c.l.j.p.e.EventsPerformanceTest.serviceTracker_getServices                thrpt   4       200    1      986.573        2.329   ops/ms
     [java] c.l.j.p.e.EventsPerformanceTest.serviceTracker_getServices_Typed          thrpt   4       200    1      824.367        1.683   ops/ms
     [java] c.l.j.p.e.EventsPerformanceTest.serviceTracker_getTracked_values          thrpt   4       200    1     9243.379      282.008   ops/ms

See here [4] for code details.
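
For reference, a run configuration matching the numbers above can be expressed through the JMH runner API roughly like this (a sketch using current JMH option names; our build actually drives JMH from Ant):

    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.options.Options;
    import org.openjdk.jmh.runner.options.OptionsBuilder;

    public class EventsPerformanceRunner {

        public static void main(String[] args) throws Exception {
            Options options = new OptionsBuilder()
                .include(".*EventsPerformanceTest.*")
                .threads(4)                  // concurrency of 4
                .measurementIterations(200)  // 200 measured iterations
                .build();

            new Runner(options).run();
        }

    }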

We achieved a pretty significant improvement in performance over the original, thus ensuring that when we integrate the registry into the Liferay core shortly, it won't cause a performance degradation (and may actually bring a slight improvement).

Conclusion

JMH allowed us to deeply understand implementation details that were impacting the execution and concurrency of our new implementation. It would have been extremely painful to achieve the same kind of analysis without this type of tool. Thanks to the OpenJDK team for publishing it.

Finally, I've done all the heavy lifting necessary to integrate JMH into our build, with the goal of continuing to create more benchmarks and ensuring we provide the very best implementations we can to our community. So look for that to be introduced into the core in the coming weeks.

Comments
Nice, I love JMH - finally a good microbenchmark library. However, one thing I would do differently is NOT using loops in the tests (imho that's a benchmark anti-pattern ;)). For example, the benchmark "serviceTracker_getServices()" has an iterator loop to also process 4 events. Is this really what you want to measure, or **just** the call to getService()? If it is just a call to getService(), then remove the loop and use e.g. Blackhole to consume the returned service. Cheers!
The point of the test is to determine the performance of obtaining the "collection" of event handlers, and also the relative performance of their iterator implementations.

Not all iterators are equal.

In order to eliminate the effects a loop would have on the result, the loop size is fixed.

Finally, getService is specifically _not_ what I wanted to test since that is only to be used when you want to get the "one single" matching service rather than "all" matching, which is what you want for event handlers (we need to process every tracked event handler).

I wanted to compare it against our existing EventsProcessorUtil.process() method. You can't do that unless you account for iteration over the collection.

HTH
Got it. Still, I would not use the `for` loop in the benchmark tests. Loop size is not the only thing that affects the benchmark; the JVM might optimize it with loop unrolling. For example, that is why all previous benchmark tools that used a for loop to iterate benchmark code (like e.g. Google Caliper) are not correct. Instead, I would manually unroll the loop, or use just one element in the collection and explicitly get the first one, without any looping. That is all I wanted to say ;)
But it must use a loop! Otherwise I cannot compare directly to the original implementation (which internally uses a loop). Also, it must return "more than one" event listener, otherwise the collection behavior and iterators are not compared. If I want benchmarks which compare directly against the original impl, they must emulate the exact behaviour.

I completely understand the loop issue. But in this test I really do need the loop (I asked the developer of JMH directly whether the design was problematic, and he said "No"). The looping issue is mitigated by ensuring each loop is identically shaped, so that its unrolling does not affect the outcome of the test (identical unrolling in each benchmark means that cost is constant and therefore does not negatively impact the comparison).
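
For readers following along, the shape under discussion looks roughly like this (illustrative names; in current JMH versions Blackhole lives in org.openjdk.jmh.infra, and the tracker here is assumed to be initialized in an @Setup method with exactly four registered handlers):

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.infra.Blackhole;
    import org.osgi.util.tracker.ServiceTracker;

    public class LoopShapeSketch {

        // Assumed to be initialized in an @Setup method and to always
        // hold exactly four registered handlers.
        private ServiceTracker _serviceTracker;

        @Benchmark
        public void serviceTracker_getServices(Blackhole blackhole) {
            Object[] services = _serviceTracker.getServices();

            // The loop shape is identical across all variants, so any
            // unrolling cost is a constant; only the cost of obtaining
            // and iterating the collection differs between benchmarks.
            // Sinking each element into the Blackhole prevents dead-code
            // elimination from skewing the comparison.
            for (Object service : services) {
                blackhole.consume(service);
            }
        }

    }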