Increasing Capacity and Decreasing Response Times Using a Tool You're Probably Not Familiar With

Introduction

When it comes to Liferay performance tuning, there is one golden rule:

The more you offload from the application server, the better your performance will be.

This applies to all aspects of Liferay. Using Solr or Elasticsearch is always better than using the embedded Lucene. While PDFBox works, you get better performance by offloading that work to ImageMagick and Ghostscript.

You can get even better results by offloading work before it gets to the application server. What I'm talking about here is caching, and one tool I like to recommend for this is Varnish.

According to the Varnish site:

Varnish Cache is a web application accelerator also known as a caching HTTP reverse proxy. You install it in front of any server that speaks HTTP and configure it to cache the contents. Varnish Cache is really, really fast. It typically speeds up delivery with a factor of 300 - 1000x, depending on your architecture.

I've found the last claim to be a little extreme, but I can say for certain that Varnish can offer significant performance improvements.

Basically, Varnish is a caching appliance.  When an incoming request hits Varnish, it looks in its cache to see if the response has been rendered before. If it isn't in the cache, Varnish passes the request to the backend and stores the response (if possible) in the cache before returning it to the original requestor.  As additional matching requests come in, Varnish serves the response from the cache instead of sending the request to the backend for processing.

So there are two requirements that need to be met to get value out of the tool:

  1. The responses have to be cacheable.
  2. The responses must take time for the backend to generate.

As it turns out for Liferay, both of these are true.

So Liferay can actually benefit from Varnish, but we can't just make such a claim; we'll need to back it up w/ some testing.

The Setup

To complete the test I set up an Ubuntu VirtualBox instance w/ 12G of memory and 4 processors, and I pulled in a Liferay DXP FP 15 bundle (no performance tuning for JVM params, etc.). I also compiled Varnish 4.1.6 on the system. For both tests, Tomcat runs w/ 8G and Varnish runs w/ an allocation of 2G (even though Varnish is not used for the Tomcat-only test, I think it is "fairer" to keep the tests as similar as possible).

In the DXP environment I'm using the embedded Elasticsearch and HSQL for the database (not a production configuration, but both tests share the same baseline). I deployed the free Porygon theme from the Liferay Marketplace and set up a site based on the theme. The home page for the Porygon demo site has a lot of graphics and mixed content on it, so it makes a good representative page to test.

The idea here was not to focus on Liferay tuning, but to get a site up that was serving a bunch of mixed content. Then we measure a non-Varnish configuration against a Varnish configuration to see what impact Varnish has on performance.

We're going to test the configuration using JMeter and we're going to hit the main page of the Porygon demo site.

Testing And Results

JMeter was configured to use 100 users and loop 20 times.  Each test would touch the home page, the photography, science and review pages, and would also visit 3 article pages. JMeter was configured to retrieve all related assets synchronously to exaggerate the response time from the services.

Response Times

Let's dive right in with the response times for the test from the non-Varnish configuration:

Response Times Without Varnish

The runtime for this test was 21 minutes, 20 seconds. The 3 article pages are the lines near the bottom of the graph; the lines in the middle are for the general pages w/ the asset publishers and all of the extra details.

Next graph is the response times from the Varnish configuration:

Response Times With Varnish

The runtime for this test was 11 minutes, 58 seconds, a 44% reduction in test time, and it's easy to see that while the non-Varnish tests seem to float around the 14 second mark, the Varnish tests come in around 6 seconds.

If we rework the graph to adjust the y-axis to remove the extra whitespace we see:

Response Times With Varnish

The important part here for me was the lines for the individual articles. In the non-Varnish test, /web/porygon-demo/-/space-the-final-frontier?inheritRedirect=true&redirect=%2Fweb%2Fporygon-demo shows up around the 1 second response time, but with Varnish it hovers at the 3 second response time.  Keep that in mind when we discuss the custom VCL below.

Aggregate Response Times

Let's review the aggregate graphs from the tests.  First the non-Varnish graph:

Aggregate Without Varnish

This reflects what we've seen before: individual pages are served fairly quickly, while pages w/ all of the mixed content take significantly longer to load.

And the graph for the Varnish tests:

Aggregate With Varnish

At the same scale, it is easy to see that Varnish has greatly reduced the response times.  Adjusting the y-axis, we get the following:

Aggregate With Varnish

Analysis

So there are a few things that quickly jump out:

  • There was a 44% reduction in test runtime reflected by decreased response times.
  • There was a noticeable (though unmeasured) reduction in server CPU load since Liferay/Tomcat did not have to serve all traffic.
  • Since work is offloaded from Liferay/Tomcat, overall capacity is increased.
  • While some response times were greatly improved by using Varnish, others suffered.

The first three bullets are easy to explain.  As Varnish is able to cache "static" responses from Liferay/Tomcat, it can serve those responses from the cache instead of forcing Liferay/Tomcat to build a fresh response every time.  Having Liferay/Tomcat rebuild responses each time requires CPU cycles, so returning a cached response reduces the CPU load.  And since Liferay/Tomcat is not busy rebuilding the responses that now come from the cache, it is free to handle the requests that cannotot be served from the cache; basically, the overall capacity of Liferay/Tomcat is increased.

So you might be asking: if Varnish is so great, why do the single article pages suffer a response time degradation? Well, that is due to the custom VCL script used to control the caching.

The Varnish VCL

So if you don't know about Varnish, you may not be aware that caching is controlled by a VCL (Varnish Configuration Language) file. This file is closer to a script than a configuration file.

Normally Varnish operates by checking the backend response cache control headers; if a response can be cached, it will be, and if the response cannot be cached it won't. The impact of Varnish is directly related to how many of the backend responses can be cached.

You don't have to rely solely on the cache control headers from the backend to determine cacheability; this is especially true for Liferay. Through the VCL, you can actually override the cache control headers to make some responses cacheable that otherwise would not have been, and to make other responses uncacheable even when the backend says caching is acceptable.

So now I want to share the VCL script used for the test, but I'll break it up into parts to discuss the reasons for the choices that I made. The whole script file is attached to the blog for you to download.

In the sections below comments have been removed to save space, but in the full file the comments are embedded to explain everything in detail.

Varnish Initialization

vcl 4.0;

# imports required for the directors and std functions used below
import directors;
import std;

probe company_logo {
  .request =
    "GET /image/company_logo HTTP/1.1"
    "Host: 192.168.1.46:8080"
    "Connection: close";
  .timeout = 100ms;
  .interval = 5s;
  .window = 5;
  .threshold = 3;
}

backend LIFERAY {
  .host = "192.168.1.46";
  .port = "8080";
  .probe = company_logo;
}

sub vcl_init {
  new dir = directors.round_robin();
  dir.add_backend(LIFERAY);
}

So in Varnish you need to declare the backends to connect to.  In this example I've also defined a probe request used to verify the health of the backend.  For probes it is recommended to use a simple request that results in a small response; you don't want to overload the system with probe traffic.

Varnish Request

sub vcl_recv {
  ...
  if (req.url ~ "^/c/") {
    return (pass);
  }

  if (req.url ~ "/control_panel/manage") {
    return (pass);
  }
  ...
  if (req.url !~ "\?") {
    return (pass);
  }
  ...
}

The request handling basically determines whether to hash (look the request up in the cache) or pass (send the request directly to the backend w/o caching).

For all requests that start with the "/c/..." URI, we pass those to the backend.  They represent requests for /c/portal/login or /c/portal/logout and the like, so we never want to cache those regardless of what the backend might say.

Any control panel requests are also passed directly to the backend. We wouldn't want to accidentally expose any of our configuration details, now would we?

Otherwise the code tries to force hashing of binary files (mp3, image, etc.) where possible and conforms to typical VCL implementations.
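
For reference, that binary-file handling generally looks something like the following sketch. This is not the exact rule from the attached file; the extension list and the cookie stripping are illustrative assumptions:

```vcl
sub vcl_recv {
  # Force a cache lookup for static binary assets, stripping cookies
  # so they don't prevent a cache hit (sketch only; adjust the list
  # of extensions to match your portal's assets).
  if (req.url ~ "\.(png|jpg|jpeg|gif|ico|mp3|mp4|pdf|woff|woff2)(\?.*)?$") {
    unset req.http.Cookie;
    return (hash);
  }
}
```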

As for the last check of whether the URL contains a '?' character, I'll get to that in the conclusion...

Varnish Response

sub vcl_backend_response {

  if (bereq.url ~ "^/c/") {
    return (deliver);
  }
  
  if ( bereq.url ~ "\.(ico|css)(\?[a-z0-9=]+)?$") {
    set beresp.ttl = 1d;
  } else if (bereq.url ~ "^/documents/" && beresp.http.content-type ~ "image/*") {
    if (std.integer(beresp.http.Content-Length,0) < 10485760 ) {
      if (beresp.status == 200) {
        set beresp.ttl = 1d;
        unset beresp.http.Cache-Control;
        unset beresp.http.set-cookie;
      }
    }
  } else if (beresp.http.content-type ~ "text/javascript|text/css") {
    if (std.integer(beresp.http.Content-Length,0) < 10485760 ) {
      if (beresp.status == 200) {
        set beresp.ttl = 1d;
      }
    }
  }
  ...
}

The response handling also passes the /c/ type URIs back to the client w/o caching.

The most interesting part of this section is the testing for content type and altering caching as a result.  Normally VCL rules will look for some request for "/blah/blah/blah/my-javascript.js" by checking for the extension as part of the URI.

But Liferay really doesn't use these standard extensions.  For example, with Liferay you'll see a lot of requests like /combo/?browserId=other&minifierType=&languageId=en_US&b=7010&t=1494083187246&/o/frontend-js-web/liferay/portlet_url.js&.... These kinds of requests do not have the standard extension on them, so normal VCL matching patterns would discard the request as uncacheable. Using the VCL override logic above, the request will be treated as cacheable since it is just a request for some JS.

The same kind of logic applies to the /documents/ URI prefix; anything w/ this prefix is a fetch from the document library.  Full URIs look like /documents/24848/0/content_16.jpg/027082f1-a880-4eb7-0938-c9fe99cefc1a?t=1474371003732.  Again, since it doesn't end w/ a standard extension, the image might not be cached. The override rule above matches any /documents/-prefixed request with an image content type and treats it as cacheable.

Conclusion

So let's start with the easy ones...

  • Adding Varnish can decrease your response times.
  • Adding Varnish can reduce your server load.
  • Adding Varnish can increase your overall capacity.

Honestly, I was expecting that to be the whole list of conclusions. I had this sweet VCL script and performance times were just awesome. As a final test, I tried logging into my site with Varnish in place and, well, FAIL.  I could log in, but I didn't get the top bar, or access to the left or right sidebars, or any of those things.

I realized that I was actually caching the responses from the friendly URLs and, well, for Liferay those are typically dynamic pages.  There is logic in the theme template files that changes the content depending upon whether you are logged in or not.  Because my Varnish script was caching the pages when I was not logged in, after I logged in the page was coming from the cache and the necessary stuff was now gone.

I had to add the check for the "?" character in the request URL to determine whether it was a friendly URL.  If it was, I had to treat it as dynamic and send it to the backend for processing.

This leads to the poor performance on, for example, the single article display pages.  My first VCL was great, but it cached too much.  My addition for friendly URLs solved the login issue but now prevented caching pages that perhaps could have been cached, so I swung too far the other way; but since the general results were still awesome, I just went with what I had.

Now for the hard conclusions...

  • Adding Varnish requires you to know your portal.
  • Adding Varnish requires you to know your use cases.
  • Adding Varnish requires you to test all aspects of your portal.
  • Adding Varnish requires you to learn how to write VCL.

The VCL really isn't that hard to wrap your head around.  Once you get familiar with it, you'll be able to customize the rules to increase your cacheability factor without sacrificing the dynamic nature of your portal.  In the attached VCL, we add a response header indicating a cache HIT or MISS; this is quite useful for reviewing the responses from Varnish to see whether a particular response was cached (remember the first request will always be a MISS, so check again after a page refresh).
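
A HIT/MISS header of this kind is typically set in vcl_deliver. Here's a minimal sketch; the X-Cache header name is a common convention, and the attached file may differ in its details:

```vcl
sub vcl_deliver {
  # obj.hits counts cache hits for this object; zero means the
  # response was just fetched from the backend
  if (obj.hits > 0) {
    set resp.http.X-Cache = "HIT";
  } else {
    set resp.http.X-Cache = "MISS";
  }
}
```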

I can't emphasize the testing enough though.  You want to manually test all of your pages a couple of times, logged in and not logged in, logged in as users w/ different roles, etc., to make sure each UX is correct and that you're not bleeding views that should not be shared.

You should also do your load testing.  Make sure you're getting something out of Varnish and that it is worthwhile for your particular situation.

Note About SSL

Before I forget, it's important to know that Varnish doesn't really talk SSL, nor does it talk AJP.  If you're using SSL, you're going to want a web server sitting in front of Varnish to handle SSL termination.

And since Varnish doesn't talk AJP, you will have to configure HTTP connections from both the web server and the app server.

This points toward the reasoning behind my recent blog post about configuring Liferay to look at a header for the HTTP/HTTPS protocol.  In my environment I was terminating SSL at Apache and needed to use HTTP connectors to Varnish and again to Tomcat/Liferay.

Although it was suggested in a few of the comments that separate connections could be used to facilitate the HTTP and HTTPS traffic, those options would defeat some of the Varnish caching capabilities. You'd either have separate caches for each connection type (or perhaps no cache on one of them) or other unforeseen issues. Routing all traffic through a single pipe to Varnish ensures Varnish can cache the response regardless of the incoming protocol.

Update - 05/16/2017

Small tweak to the VCL script attached to the blog: I added rules to exclude all URLs under /api/* from being cached.  Those are basically your web service calls, and rarely would you really want to cache those responses.  Find the file named localhost-2.vcl for the update.
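
The added exclusion is along these lines (a sketch of the rule, not the exact contents of localhost-2.vcl):

```vcl
sub vcl_recv {
  # never cache web service calls
  if (req.url ~ "^/api/") {
    return (pass);
  }
}
```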

Blogs
Do you know by chance if it is perfectly safe to cache the whole theme? (Of course, theme deployments necessitate clearing the cache, no big deal)

It bugs me somewhat that Liferay appends a lot of stuff to various urls, e.g. the browserId:

aui.css?browserId=firefox&themeId=...

Currently I am just ignoring those parameters and caching theme css, js, ... unconditionally. I remember I did a quick look into the code once and came to the conclusion that it is safe. Especially with 7.0+, since the theme is built now with gulp. But I might have missed something.

I usually use nginx because I know it quite well and it has the advantage that it can handle SSL. With the Varnish solution I would still need to use an nginx, causing an extra hop. And I don't need the flexibility of Varnish; nginx is a quite powerful reverse proxy and cache server too. Of course, in case you have tested it and Varnish beats nginx, I am all ears. :-)
Well, the theme includes the FM templates and those, of course, are dynamic and are used during page construction in the portal; it's one of the reasons that I had to treat friendly URLs as uncacheable, because of the content generated into the pages depending upon whether you are logged in or not (and what roles you might have).

The extra params are sometimes important. For example, if you have to code up something to work against different browser types, the browser id allows you to manage that. The theme id is, of course, the selected theme to use for the page, and the others have their own purposes. Ignoring these args is only safe if you know that you don't need anything dynamic based on their values.

Choosing nginx is fine; I think it depends most upon a) which you might have more experience and/or documentation to support and b) which might offer you necessary flexibility in controlling caching.

For example, if I know that I'm using current user role to customize generated javascript, I know I can put in an appropriate rule for Varnish to skip caching that particular asset yet still cache the rest. Nginx might have the same available features, I'm just not that familiar with it.

If you are, I'd encourage you to write a blog post about how to front Liferay/Tomcat w/ Nginx as that may prove to be useful to the community.
I think you misunderstood me. I was talking about the files delivered by the paths /my-theme (LR 6.2) or /o/my-theme (LR 7). Those files seem to be static. The template files are never sent from there, so it doesn't matter what they contain.

I usually add a rule to nginx to cache the whole path unconditionally. Never had any problems with it.
As I said, I looked in the source code; LR does some search & replace in the files, but I deemed it harmless/cacheable. But still, it would be nice to have a confirmation. In my opinion, those browser parameters are legacy. But a second opinion would be nice.


I tried to cache pages too, but I couldn't do it because in most environments I can't tell if a user is authenticated. I would need something like the userId in a cookie or header to cache per user.


Btw.: Removing the cache directive on /documents is pretty dangerous. You allow proxies to cache the file. So, a company proxy might cache supersecretfile.doc and deliver it to anybody who requests it. Might be fine in your case, but in general you must not do that.

I usually cache only certain public folder paths and add them to the configuration one by one.


Nginx vs. Varnish: Well, Varnish is more powerful/flexible since it was developed to be a cache server. But Nginx is more than sufficient for most use cases. It's better than Apache IMHO; Apache is powerful too, but very, very difficult to configure. I guess your use case would be doable in Nginx; it's not too difficult to configure stuff on a "per path" basis.

About blogging: I will think about it, but usually things like blogging (and alas, forum) end up on the "when I have some spare time ..." stack.
Well, I can't tell you (or anyone) it is okay to cache based on theme path. It really does depend upon whether you have browser-specific stuff in your theme. Your theme might not have browser-specific stuff in it and therefore it may be okay to ignore the browser id, but the next five folks reading the blog and comments might have theme developers who have made such changes.

That's why I have the bullet item, "Adding Varnish requires you to know your portal." You can't just guess at whether to cache or not, you have to know.

Allowing caching on /documents can be dangerous, and my commented VCL points out the issue. Liferay determines permissions based upon who a user is and whether they have view rights. I added an override to cache images regardless of what Liferay specified, thus potentially ignoring Liferay security. Why? Well, when you look at the URLs that come in, not only do they have /documents/.../my-image.jpg, but the path continues with /<a big UUID value>/... So basically I'm allowing "security by obscurity". Is this something everyone should do on their portal? Well, this is also a case of needing to know your portal; you have to balance the ability to cache images against the possible security exposure that might come with it.

Note I was only overriding caching for /documents/ that had image/* mime types, so it was not set up to be a blanket cache override for all of your Docs and Media. That definitely is not going to be a good idea, unless of course you "know your portal" and have a basically public-only facing site with documents that are never "supersecretfile.doc".
Um, I still don't get it. How would I add browser-specific stuff to my theme and do something, e.g., with the browserId? All resources (css, js, ...) are delivered by Liferay, not by "me". Of course, I could write something like a servlet filter, but that's not part of the theme for me. Am I missing something here?

Not sure what you mean with the uuid part of the url. You can simply remove the uuid from the url and it will still work (well, most of the time, there are some special cases like moving the file to a different folder).

Anyway, I really like to cache the documents folder, it's great to deliver images from the webserver. They really excel there. I just wanted to emphasize that one has to be really careful with caching or manipulating cache headers. You have to make sure that you only cache "public" content.
I haven't traced through all of the code to see how it works, but if you check out AggregateFilter you can see that they are doing some magic to CSS when the browser is not IE. There may be other cases where CSS and/or JS are allowing for built-in modification by the core.

UUID can be stripped, but like I said, I'm relying on security by obscurity and didn't say this was a good fit for everyone. I think there can be valid cases where you may want to allow caching of images rather than forcing the portal to do security checks every time.

And I wholeheartedly agree w/ only caching public content, especially if you have sensitive images or docs to protect. Anytime you override Liferay's cache control headers, you really need to understand the ramifications and make sure it's an optimization that works for your particular portal implementation.

Caching the whole docs folder, though, carries a risk that other readers should take note of; Liferay does permission checks on whether something from Docs & Media is accessible, and if you let the web server/Varnish cache and return it on its own, you're bypassing the permission checks. Sometimes this can be okay, sometimes not so much.

You must know your portal to know what will work for you.
Mmmh, yes, I know. As far as I can tell, AggregateFilter removes just one style in gulp themes. I don't remember the details for LR 6.2, but we deemed it safe to cache the theme files unconditionally. Well, at least for our themes.

IMHO this whole browserId stuff is legacy and unnecessary with gulp. Add an autoprefixer and be done with it.
If you're doing a clean theme rewrite, this might be an option. But if you're dealing with a theme upgrade (or a skills upgrade for your team), you might need to keep relying upon the browser id handling.
Hello David,

I'm a rookie using Varnish and I have used your localhost(2).vcl file. I have some problems because the URLs are not cached; the "Age" value is always 0 and the "X-Cache" value is always a MISS.
I have debugged it and I have found that the code always does a return (pass) in this block:
# liferay friendly urls typically represent dynamic content.
# if the url is friendly, let's go straight to fetch
if (req.url !~ "\?") {
  return (pass);
}
My portal has URLs like http://localhost/es/web/guest/news or http://localhost/es/web/guest/services. I think these URL types should be cached, but the file has that code.
I do not understand this code; why are these URL types (http://localhost/es/web/guest/news) not cached?

Thank you very much!.
David,
Thanks a ton for this post and the reference .vcl file. We've found this to be rather effective in production environments. Although, we did come across an issue that may be worth reviewing for the reference .vcl attached. There are a few lines that we found to increase the overall header size dramatically when long URLs were in play, such as when editors are in the control panel. We removed the following lines.

- # Create a header to capture the full URL
- set req.http.X-Full-Uri = req.http.host + req.url;
- set req.http.X-Full-Url = req.url;

and modified line 210...

- hash_data(req.http.X-Full-Url);
+ hash_data(req.url);

It was hard to debug, but users were getting an Apache message stating "Bad Request: your browser sent a request that this server could not understand". This was happening when the URL string itself was roughly 2600 characters. The lines in this VCL above would take that URL length and add it two more times to the header, so our HTTP header as a whole was going over the default 8K limit that Apache sets for max header length.

Thanks again for the work here, please post back if the removal of those lines seems problematic to you in any way.
Thanks, Orin! I haven't hit that problem so far, but maybe I've just been lucky.

The X-* headers are optional; they are not necessary for the actual request handling. They can be useful if you're doing something on the backend with those original values.

I should note that I use Apache <-> Varnish <-> Tomcat, so the Varnish additions to the headers for me don't really affect Apache at all. But I don't claim that my setup is the best, it's just one that has worked for me thus far.

As you've found a good working solution, I'd say stick with it!
In my experience it is not a good choice to put a cache server in front of Liferay to process requests going to Liferay. Apart from theme resources, which "must" be available anonymously, all the other URIs could be controlled by Liferay permissions, and you have to really, really know how Liferay works to tune the URIs to cache.

I think the right approach is to use a CDN, which Liferay supports ootb.

You could set up a Varnish, or a Squid, as an "in-house CDN" and have it serve all the resources Liferay knows could be available anonymously.

Hi David,

first of all, thanks for this blog entry; it contains a lot of useful information needed to understand how Varnish works and how it can be configured.

Only one thing: I can't find the VCL script mentioned as a reference (localhost-2.vcl in the 05/16/2017 update). Did you remove it from the post? If so, can you please upload it again?

Thanks in advance.