Why we need a new liferay-npm-bundler (1 of 3)

What is the problem with bundler 1.x

This is the first article in a three-part series motivating and explaining the enhancements we have made to Liferay's npm bundler. You can learn more about it in its first release blog post.

How bundler 1.x works

As you all know, the bundler lets you package your JS files and npm packages inside Liferay OSGi bundles so that they can be used from portlets. The key feature is that you may use a standard npm development workflow and it will work out of the box without any need for complex deployments or setups.

To work its magic, the bundler grabs all your npm dependencies, puts them inside your OSGi bundle and transforms them as needed so they can run inside portlets. Among these transformations, one of the most important is converting from CommonJS to AMD, because Liferay uses an AMD-compliant loader to manage JS modules.

Once your OSGi bundle is deployed, every time a user visits one of your pages, the loader gets information about all deployed npm packages, resolves your dependency tree and loads all needed JS modules.

For example: say you have a portlet named my-portlet that, once started, loads a JS module called my-portlet/js/main, and that module ultimately depends on the isarray npm package to do its job. That would lead to a project containing these files (among others):


package.json
    {
        "name": "my-portlet",
        "version": "1.0.0",
        "dependencies": {
            "isarray": "^1.0.0"
        }
    }
node_modules/isarray
    package.json
        {
            "name": "isarray",
            "version": "1.0.1",
            "main": "index.js"
        }
META-INF/resources
    view.jsp
        <aui:script require="my-portlet@1.0.0/js/main"/>
    js
        main.js
            require('isarray', function(isarray) {
                console.log(isarray([]));
            });


Whenever you hit my-portlet's view page, the loader looks for the my-portlet@1.0.0/js/main JS module and loads it. That causes main.js to be executed and, when the require call runs (note that it is the AMD require, not the CommonJS one), the loader gets information from the server, which has scanned package.json, to determine the version number of the isarray package and find a suitable version among all those deployed. In this case, if only your bundle is deployed to the portal, main.js will get isarray@1.0.1, which is the version bundled inside your OSGi JAR file.

What if we deploy two portlets with shared dependencies

Now imagine that one of your colleagues creates a portlet named his-portlet which is identical to my-portlet, but because he developed it later, it bundles isarray at version 1.2.0 instead of 1.0.1. That would lead to a project containing these files (among others):


package.json
    {
        "name": "his-portlet",
        "version": "1.0.0",
        "dependencies": {
            "isarray": "^1.0.0"
        }
    }
node_modules/isarray
    package.json
        {
            "name": "isarray",
            "version": "1.2.0",
            "main": "index.js"
        }
META-INF/resources
    view.jsp
        <aui:script require="his-portlet@1.0.0/js/main"/>
    js
        main.js
            require('isarray', function(isarray) {
                console.log(isarray([]));
            });


In this case, whenever you hit his-portlet's view page, the loader looks, as before, for the his-portlet@1.0.0/js/main JS module and loads it. Then the require call is executed and the loader finds a suitable version. But now something has changed, because we have two versions of isarray deployed in the server:

  • 1.0.1 bundled inside my-portlet
  • 1.2.0 bundled inside his-portlet

So, which one is the loader giving to his-portlet@1.0.0/js/main? As we said, it gives the most suitable version among all those deployed. That means the newest version satisfying the semantic version constraints declared in package.json. And, in the case of his-portlet, that is version 1.2.0 because it satisfies the semantic version constraint ^1.0.0.

Looks like everything is working just as it did with my-portlet, doesn't it? Well, not really. Let's look at my-portlet again, now that we have two versions of isarray. In my-portlet's package.json file the semantic version constraint for isarray is ^1.0.0 too, so what will it get?

Of course: version 1.2.0 of isarray. That is because 1.2.0 satisfies ^1.0.0 better than 1.0.1 does. In fact, it's similar to what npm would do if you reran npm install in my-portlet: it would find a newer version in http://npmjs.com and update to it.
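The "newest deployed version that satisfies the constraint" rule can be sketched in a few lines. This is only an illustration under simplifying assumptions: `satisfiesCaret` handles just caret ranges of the form ^X.Y.Z with X >= 1, and `resolveBest` is a hypothetical name, not part of the loader's API.

```javascript
// Parses "1.2.0" into [1, 2, 0].
function parseVersion(version) {
    return version.split('.').map(Number);
}

// Compares two versions; returns a negative, zero, or positive number.
function compareVersions(a, b) {
    const pa = parseVersion(a);
    const pb = parseVersion(b);
    for (let i = 0; i < 3; i++) {
        if (pa[i] !== pb[i]) {
            return pa[i] - pb[i];
        }
    }
    return 0;
}

// Simplified caret check (assumption: range is "^X.Y.Z" with X >= 1):
// same major version, and not older than X.Y.Z.
function satisfiesCaret(version, range) {
    const base = range.slice(1);
    return parseVersion(version)[0] === parseVersion(base)[0] &&
        compareVersions(version, base) >= 0;
}

// Picks the newest deployed version that satisfies the range, or null if none does.
function resolveBest(deployedVersions, range) {
    const candidates = deployedVersions
        .filter(function (v) { return satisfiesCaret(v, range); })
        .sort(compareVersions);
    return candidates.length > 0 ? candidates[candidates.length - 1] : null;
}

// Only my-portlet is deployed:
console.log(resolveBest(['1.0.1'], '^1.0.0')); // 1.0.1

// After his-portlet is deployed too, every ^1.0.0 requester gets 1.2.0:
console.log(resolveBest(['1.0.1', '1.2.0'], '^1.0.0')); // 1.2.0
```

Note that the result depends on the list of deployed versions, not on which bundle asked, which is exactly why my-portlet's answer changes when his-portlet arrives.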

Also, this strategy leads to deduplication of the isarray package: if both my-portlet and his-portlet are placed on the same page, only one copy of isarray will be loaded in the JS interpreter.

But that's perfect! What's the problem, then?

Although this looks quite nice, it has some drawbacks. One can already be seen in the example: the developer of my-portlet was using isarray@1.0.1 in his local development copy when he bundled it. That means that all tests were done with that version. But then, because a colleague deployed another portlet with an updated isarray, his bundle switched to a different version which, even if it is declared semantically compatible, may lead to unexpected behaviours.

Not only that, but the fact that version 1.0.1 or 1.2.0 is loaded for my-portlet is not decided in any way by the developer of my-portlet and changes depending on what is being deployed in the server.

Those drawbacks are very easy to spot but, if we look in depth, we can find two subtler problems that may lead to instabilities and hard-to-diagnose bugs.

Transitive dependencies shake

Because the bundler 1.x solution performs aggressive semantic version resolution, the dependency graph of any project may lead to unexpected results depending on how semantic version constraints are declared. This is especially important for what I call framework packages, as opposed to library packages. This is not a formal definition, but by framework packages I mean npm packages that are supposed to call the project's code, while library packages are supposed to be called from the project's code.

When using library packages, a switch of version is not so bad, because it usually leads to using a newer (and thus more stable) version. That's the case of the isarray example above.

But when using frameworks, you usually have a bunch of packages that are supposed to cooperate together and be in specific versions. That, of course, depends on how the framework is structured and may not hold true for all of them, but it is definitely easier to have problems in a dependency graph where some subset of packages are tightly coupled than in one where every package is isolated and doesn't worry too much about the others.

Let's see an example of what I'm referring to: imagine you have a project using the Non Existing Wonderful UI Components framework (let's call it WUI). That framework is composed of 3 packages:

  1. component-core
  2. button
  3. textfield

Packages 2 and 3 depend on 1. Suppose that package 1 has a class called Widget from which Button (in package 2) and TextField (in package 3) extend. This is a common widget-based UI pattern; you get the idea. Now, let's suppose that Widget has this check somewhere in its code:

Widget.sayHelloIfYouAreAWidget = function(widget) {
    if (widget instanceof Widget) {
        console.log('Hey, I am a widget, that is wonderful!');
    }
};

The function tests whether some object extends from Widget by looking at its prototype chain, and says something if that holds true.

Now, say that we have two portlet projects again: my-portlet and his-portlet (not the ones we were using above, but two new portlet projects that use WUI) and their dependencies are set like this:

my-portlet@1.0.0
    ➥ button 1.0.0
    ➥ textfield 1.2.0
    ➥ component-core 1.2.0

his-portlet@1.0.0
    ➥ button 1.5.0
    ➥ textfield 1.5.0
    ➥ component-core 1.5.0

In addition, the dependencies of button and textfield are set like this:

button@1.0.0
    ➥ component-core ^1.0.0

button@1.5.0
    ➥ component-core ^1.0.0

textfield@1.2.0
    ➥ component-core ^1.0.0

textfield@1.5.0
    ➥ component-core ^1.0.0

If the two portlets are created at different times, depending on what is available at http://npmjs.com, you may get the following versions after npm install:

my-portlet@1.0.0
    ➥ button 1.0.0
        ➥ component-core 1.2.0
    ➥ textfield 1.2.0
        ➥ component-core 1.2.0
    ➥ component-core 1.2.0

his-portlet@1.0.0
    ➥ button 1.5.0
        ➥ component-core 1.5.0
    ➥ textfield 1.5.0
        ➥ component-core 1.5.0
    ➥ component-core 1.5.0

This assumes that the latest version of component-core available when npm install was run in my-portlet was 1.2.0, but then it was updated and by the time that his-portlet ran npm install the latest version was 1.5.0.

What happens when we deploy my-portlet and his-portlet?

Because the platform does aggressive deduplication, you will get the following dependency graphs:

my-portlet@1.0.0
    ➥ button 1.0.0
        ➥ component-core 1.5.0 (✨ note that it gets 1.5.0 because `his-portlet` is providing it)
    ➥ textfield 1.2.0
        ➥ component-core 1.5.0 (✨ note that it gets 1.5.0 because `his-portlet` is providing it)
    ➥ component-core 1.2.0 (✨ note that the project gets 1.2.0 because it explicitly asked for it)

his-portlet@1.0.0
    ➥ button 1.5.0
        ➥ component-core 1.5.0
    ➥ textfield 1.5.0
        ➥ component-core 1.5.0
    ➥ component-core 1.5.0

We are almost there. Now imagine that both my-portlet and his-portlet do this:

var btn = new Button();
Widget.sayHelloIfYouAreAWidget(btn);

Will it work as expected in both portlets? As you may have guessed, the answer is no. It will definitely work in his-portlet but in the case of my-portlet the call to Widget.sayHelloIfYouAreAWidget won't print anything because the instanceof check will be testing a Button that extends from Widget at component-core@1.5.0 against Widget at component-core@1.2.0 (because the project code is using that version, not 1.5.0) and thus will fail.
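The failing check is easy to reproduce outside the portal. The following sketch simulates two loaded copies of component-core with a hypothetical `loadComponentCore` factory (an assumption made for this example, since WUI does not exist); each call creates its own Widget constructor, just as two versions of the package would when loaded side by side.

```javascript
// Hypothetical factory simulating one loaded copy of component-core.
// Every call creates a brand new Widget constructor with its own prototype.
function loadComponentCore(version) {
    function Widget() {}

    Widget.version = version;

    // Same check as in the article, but returning a boolean instead of logging.
    Widget.isWidget = function (widget) {
        return widget instanceof Widget;
    };

    return { Widget: Widget };
}

// Two versions of component-core living in the same JS interpreter.
const core120 = loadComponentCore('1.2.0');
const core150 = loadComponentCore('1.5.0');

// After deduplication, button extends the Widget provided by component-core 1.5.0.
class Button extends core150.Widget {}

const btn = new Button();

// his-portlet's code uses Widget from 1.5.0: the check passes.
console.log(core150.Widget.isWidget(btn)); // true

// my-portlet's code uses Widget from 1.2.0: the check fails.
console.log(core120.Widget.isWidget(btn)); // false
```

The two Widget constructors are functionally identical here, yet `instanceof` compares prototype identity, so an object built from one copy never passes a check made against the other.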

I know this is a fairly complicated (and maybe unstable) setup that can ultimately be fixed by tweaking the framework dependencies or changing the code, but it is definitely a possible one. Not only that, but there's no way for a developer to know what is happening until he deploys the portlets and, even if a certain combination of portlets works now, it could fail later if a new portlet is deployed.

By contrast, in a scenario where the developer was using a standard bundler like webpack or Browserify, the final build would be predictable for both portlets and would work as expected, each one loading its own dependencies. The drawback would be that with standard bundlers there's no way to deduplicate and share dependencies between them.

Diverted peer dependencies

Let's see another case where the bundler 1.x cannot satisfy the build expectations. This time it's with peer dependencies. We will again use two projects named my-portlet and his-portlet with the following dependencies:

my-portlet@1.0.0
    ➥ a-library 1.0.0
    ➥ a-helper 1.0.0

his-portlet@1.0.0
    ➥ a-library 1.0.0
    ➥ a-helper 1.2.0

At the same time, we know that a-library has a peer dependency on a-helper ^1.0.0. That is:

a-library@1.0.0
    ➥ [peer] a-helper ^1.0.0

So, in both projects, the peer dependency is satisfied, as both a-helper 1.0.0 (in my-portlet) and 1.2.0 (in his-portlet) satisfy a-library's semantic version constraint ^1.0.0.

But now, what happens when we deploy both portlets to the server? Because the platform aggressively deduplicates, there will only be one a-library package in the system, making it impossible for it to depend on a-helper 1.0.0 and 1.2.0 at the same time. So the most rational decision, probably, is to make it depend on a-helper 1.2.0.

That looks OK for this case as we are satisfying semantic version constraints correctly, but we are again changing the build at runtime without any control on the developer side and that can lead to unexpected results.

However, there's a subtler scenario where the bundler doesn't know how to satisfy peer dependencies: when they appear in a transitive path.

So, for example, say that we have these dependencies:

my-portlet@1.0.0
    ➥ a-library 1.0.0
    ➥ a-sub-helper 2.0.0

a-library@1.0.0
    ➥ [peer] a-helper >=1.0.0

a-helper@1.0.0
    ➥ a-sub-helper 1.0.0

Now suppose that a-library requires a-sub-helper in one of its modules. In this case, when run in Node.js, a-library receives a-sub-helper at version 2.0.0, not 1.0.0. The peer dependency on a-helper plays no role in resolving a-sub-helper: because a-library does not declare a-sub-helper as a dependency and just relies on a-helper to provide it, a-sub-helper is simply resolved from the root of the project.
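The Node.js behaviour described here follows from the way it walks up the directory tree looking for node_modules folders. Below is a small simulation of that lookup. The `installed` map is an assumed on-disk layout for my-portlet (with a-helper's own a-sub-helper 1.0.0 nested under it), and `resolveVersion` is a hypothetical helper for this example; no real file system access is performed.

```javascript
// Returns the node_modules directories Node.js would search, nearest first,
// for a require() issued from a file inside `fromDir` (POSIX-style paths).
function nodeModulesPaths(fromDir) {
    const parts = fromDir.split('/').filter(Boolean);
    const dirs = [];
    for (let i = parts.length; i >= 0; i--) {
        // Node.js never looks inside node_modules/node_modules.
        if (i > 0 && parts[i - 1] === 'node_modules') {
            continue;
        }
        dirs.push('/' + parts.slice(0, i).concat('node_modules').join('/'));
    }
    return dirs;
}

// Assumed installation layout: package directory -> installed version.
const installed = new Map([
    ['/my-portlet/node_modules/a-library', '1.0.0'],
    ['/my-portlet/node_modules/a-sub-helper', '2.0.0'],
    ['/my-portlet/node_modules/a-helper', '1.0.0'],
    ['/my-portlet/node_modules/a-helper/node_modules/a-sub-helper', '1.0.0'],
]);

// Finds the version of `packageName` seen by code living in `fromDir`.
function resolveVersion(fromDir, packageName) {
    for (const dir of nodeModulesPaths(fromDir)) {
        const candidate = dir + '/' + packageName;
        if (installed.has(candidate)) {
            return installed.get(candidate);
        }
    }
    return null;
}

// a-library has no nested copy, so the lookup falls through to the project root.
console.log(resolveVersion('/my-portlet/node_modules/a-library', 'a-sub-helper')); // 2.0.0

// a-helper finds its own nested copy first.
console.log(resolveVersion('/my-portlet/node_modules/a-helper', 'a-sub-helper')); // 1.0.0
```

The answer depends purely on where the requiring file sits on disk, not on any declared constraint, which is the piece of information a loader without node_modules folders does not have.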

But this cannot be reproduced in Liferay, because it needs to know the semantic version constraints of each dependency package, as it doesn't have any node_modules folder in which to look up packages. We could fix it by injecting an extra dependency on a-sub-helper 2.0.0 into a-library's package.json, but that would work for this project only, not for all projects deployed in the server. That is because, as we saw in the previous deployment in this same section, there's only one a-library package for everybody, but at the same time we can have several projects where a-sub-helper resolves to a different version when required from a-library.

In fact, we used this technique for Angular peer dependencies by means of the liferay-npm-bundler-plugin-inject-angular-dependencies plugin, and it used to work if you only deployed one version of Angular. But things became muddier if you deployed several.

For these reasons, we needed a better model on which the whole solution could grow in the future. That need led to bundler 2.x where we have, hopefully, created a solid foundation for future development of the bundler.

If you liked this, head on to the next article where we explain how bundler 2.x addresses these issues.
