Mounting Multiple CMIS Repositories on Liferay 6.1

Last year, there was a huge buzz about a new protocol called CMIS that had just been released at 1.0.  Much like JCR, CMIS is a protocol that allows interoperability between document repositories of different systems (there is a good blog post about different use cases that can be found here).  The major advantage of CMIS over JCR is, well, it is not bound to Java.  So there are libraries for Python, PHP, .NET, etc.  It also runs on top of standard web protocols like AtomPub or WebServices.

(The first section is just a bit of history.  So if you just want to know what's happening in version 6.1, skip the first section.)

The Famed CMISHook

So, in Liferay 6.0, I was tasked with allowing Liferay to hook into CMIS as a means to store its document library data.  If you have seen our JCRHook implementation, it is basically the same concept -- the CMISHook was used as a means to store our document library's low-level data.  But then, if you went to your CMIS repository, you would see all these numbers show up in your system that make no sense whatsoever to the average user.  And if you changed anything in that repository, it would screw up Liferay because things were not synchronized.

Now WHY, some have asked, would you ever want something like this?  Doesn't this defeat the whole purpose of CMIS?  The answer to the second question is no.  The first question is much like asking why we have a JCRHook -- or an S3Hook, for that matter.  The main point is that, in some environments, you want to scale your system in a way that a simple FileSystemHook will just not do.  It is like asking if you want to store your files on a thumb drive, a network drive or your new Thunderbolt hard disk.  This is what CMISHook did -- it gave users another option.

Liferay 6.1: Mounting CMIS Repositories

Of course, in Liferay, we were aware that the simple CMISHook was just a first step.  The next step was to completely redesign the Document Library to support multiple repositories mounted for each document library portlet.  Sergio in the Madrid office and I have been working on this redesign and now we can show a little of what it looks like.  On a Mac, this is akin to mounting my iDisk or a network drive in my Finder.  But the thing is, this is not simply CMIS -- the vision for 6.1 EE is to mount many vendor-specific repositories like SharePoint and Documentum.  Instead of swapping your SATA drive for an SSD, you are given the option to use SATA + SSD + FireWire.  So in 6.1, multiple repositories can be mounted to one document library -- CMIS is just the first of these.

While this is still relatively new and only in trunk, let me show you how it works.

Step 1: Credentials

The first thing you need to know is credentials.  In order to log into a CMIS repository, we need to basically pass the credentials you used in Liferay through to CMIS.  So, you need to make sure to set your portal.properties to allow Liferay to store your password in the session:

session.store.password=true 

Next, you need to make sure that the way you log in to Liferay is the same as for your repository.  For most, this means that you need the same screen name.  So, in portal.properties, I have:

company.security.auth.type=screenName

Of course, what this means is that if I log in to, say, Nuxeo using "alex" and "secretpassword", then I have to log in to Liferay with those same credentials as well.  Most people would have some kind of LDAP or something like that anyhow, so that should be fine.  Without the same credentials, obviously, you will get a principal exception and your users will be complaining to you about why they can't see their data... you don't want that.

Step 2: Mounting Your Repository

OK, in the Document Library control panel, you will see an "Add Repository" action.  After clicking that, you will be given a form that looks like you are adding a folder -- but instead, you are adding a new repository.  (Incidentally, we made it so you can mount a repository in any folder in the Document Library -- the root level, or your sub-sub-subfolder.  As long as it is not in another third-party repository, you are fine.)

In this example, the repository type is set to CMIS AtomPub, but you can use WebServices if you like (it just has a whole lot more parameters to fill in).  For CMIS, you need to fill in every entry except the repositoryId, which is optional.  If you do not enter a repositoryId, Liferay will just look for the first repository it finds using the given parameters and use that -- many systems only have one.
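To make the repositoryId fallback concrete, here is a minimal sketch in plain Java.  It is not the actual Liferay or OpenCMIS code: the class name RepositoryIdResolver and its method are made up for illustration, and the list of IDs stands in for whatever the CMIS server advertises.

```java
import java.util.List;

// Hypothetical helper for illustration only; not part of Liferay or OpenCMIS.
final class RepositoryIdResolver {

    private RepositoryIdResolver() {
    }

    // Returns the repositoryId entered in the form when there is one;
    // otherwise falls back to the first repository the server advertises.
    static String resolve(String configuredId, List<String> availableIds) {
        if (configuredId != null && !configuredId.trim().isEmpty()) {
            return configuredId.trim();
        }

        if (availableIds.isEmpty()) {
            throw new IllegalStateException("Server exposes no repositories");
        }

        return availableIds.get(0);
    }
}
```

With a server that exposes a single repository, leaving the field blank and typing the ID by hand end up at the same place.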

Step 3: Enjoy!

OK, after doing that, Liferay will try to talk to the other server and verify its connection.  Assuming everything goes well, a new repository is added to your list of folders.  Below, I have mounted the same Alfresco server -- once via AtomPub and again via Web Services.

Now, automatically, what you will notice is all the data that is stored at the Alfresco repository has been linked into Liferay.  Just to verify, you can look at the files and folders in Liferay and compare them to Alfresco.  Here's Liferay:

And here's Alfresco:

Obviously, any CRUD operation you do on Alfresco will be reflected on Liferay and vice versa.

CMIS Superstar?

For many, CMIS is perhaps the best thing since sliced bread.  And in fact, as an interoperability protocol, it is pretty darn good.  HOWEVER, all protocols come with shortcomings.  Remember, CMIS is quite new (it is only at 1.0 right now) and has a lot of room for growth.  A couple of things to be aware of before you throw away that bread slicer you got for Christmas...

  • CMIS does not give vendor-neutral specifications for many features found in Liferay or other repositories... like workflow.  CMIS does not yet specify how a workflow is to be started, what its different stages are, etc.  This is something that is under discussion for CMIS v2.0.  So, if you noticed, I mentioned SharePoint and Documentum... both of which are supposed to support CMIS.  The reason we are building out vendor-specific repositories for 6.1 EE is that CMIS does not solve all the woes of integrating legacy systems into Liferay.
  • Another fundamental item that is not supported by CMIS is a vendor-neutral way of managing metadata.  There is no ad hoc metadata, tags/categories, etc.  Nuxeo, for example, stores its data as part of the CMIS file's properties using Dublin Core notation.  Alfresco, on the other hand, builds things in the "extended" space of CMIS properties and brings in its proprietary "aspects."  There are proposals out there for having a feature as fundamental as metadata included in at least CMIS v1.1 (see here and here).  But as of today, this does not exist.  We are hoping to build out some of these vendor-specific attributes on top of CMIS, but there are so many vendors out there that have their own way of supporting the standard.  Just Nuxeo and Alfresco alone have quite different implementations.
  • Another thing you will notice is speed... or the lack thereof.  If everything lives on another server and has to be translated into AtomPub or WebServices, sent over the wire, and translated back into that server's native format... yes, it will take a performance hit.  I mean, within our code, I try to cache as much information as possible, but it is still noticeably slower on my system (of course, I am running multiple servers on my non-Thunderbolt-equipped MacBook Pro).  It is like when I back up my wife's computer -- I always plug in and never do it over the WiFi.

Conclusion

All in all, CMIS is not a bad protocol.  In fact, it is an excellent protocol.  But, like all protocols, there is always a tension between the generic protocol and the genius of different vendors trying to solve different problems for their customers' needs.

For us, we have gotten quite a few requests to support legacy repositories.  We have no problem with that.  In fact, that is the whole point of a portal -- to be an aggregator of information from vastly different technologies.  However, it required a complete overhaul of the backend to do it (the 6.1 document library API is VERY different... but you can't tell, can you?).  Hence, the addition of CMIS as a separate repository in our document library just adds to the greater ecosystem Liferay supports.  Customers like that.  And therefore, we do too.

Give her a spin and let me know how it goes.  Thanks for reading.

Great news! It would be great to hear how the demand for the CMISHook looks after this new functionality is released. Would you be willing to make a prediction, perhaps? ;-)

Also, is anyone from Liferay engaged with the CMIS OASIS technical committee? Most of the limitations of CMIS 1.0 raised in the post are well known to the TC and being considered for inclusion in future versions, but it would be great to also have client implementers (such as Liferay) engaged in the process to ensure the spec meets their needs.
Very nicely explained, and I have to say it's very innovative. One doubt I have is about permissions. Will the permissions be managed on Liferay's end, with the repository just living up to its name and storing documents, or is it necessary to have permissions in the repository to access the documents?
@Peter The concerns I have raised in this post are well known in OASIS. Actually, if you look at the links I posted, they all link to OASIS public documents.

@Sandeep The CMIS protocol specifies a set of permissions that, within Liferay, is translated over. All the permissions are managed from the third-party repository's end.
Hi Alexander,

The CMIS performance can actually be pretty good.

I think you are using OpenCMIS under the hood. Reuse the Session object as much as you can. It does a lot of caching for you and tries to avoid unnecessary calls to the repository. Especially, the AtomPub binding benefits from it. OpenCMIS is thread-safe. It's no problem to share the Session or other CMIS objects across threads.
Also, only fetch the properties you really need. Some repositories send a lot of properties by default, which bloats the messages and reduces performance. Some CMIS properties have to be calculated on the repository side, which can take time. If you filter them out with an OpenCMIS OperationContext, you gain performance.
Have a look at the CMIS Workbench that is part of OpenCMIS. There is no noticeable delay when you navigate through the folder hierarchy.
@Peter I was part of it last year but there were some changes on our status that I'm not too clear on.

@Alexey Definitely. Give her a try and let me know what you think. You can email me directly if you like: alexander.chow { at } liferay.com.

@Florian Good to hear from you and your thoughts on performance. I am very much aware that running things via the workbench can be quite snappy. If you like, we can have more of a technical discussion offline (not sure if this is the best forum for clarifying technical problems) -- feel free to email me.

Briefly, for the OperationContext, though this is true, the bulk of the properties are needed. I mean, fundamentally, to the end user, we are trying to portray the third-party repository to look and feel like a Liferay repository that has many properties stored in the database and other things in the filesystem or something like that. So, there is not much filtering that can occur.

In regard to the reuse of sessions, the difficulty comes when we have a multi-user environment (as compared to the workbench). Each session must be configured differently for each user. So, if you have 100 logged-in users, you need to keep that many sessions open, with their respective caches. When things are managed entirely within Liferay's own repository, we have the ability to cache based on the roles of the users which, obviously, shrinks the number of individual caches stored. And when we load a page, we load not only the contents of a folder, but also statistics for its subfolders and sub-subfolder listings and other such things -- none of which are retrieved when we are using the workbench. Instead, we now require multiple queries across the wire to address this.
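To picture the bookkeeping involved, here is a rough sketch in plain Java of what "one session per logged-in user" looks like. It is not our actual code: PerUserSessionCache is a made-up name, and the type parameter S is a stand-in for whatever session object the CMIS client library hands back.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical sketch; S stands in for a per-user CMIS session object.
final class PerUserSessionCache<S> {

    private final ConcurrentMap<String, S> sessions = new ConcurrentHashMap<>();
    private final Function<String, S> sessionFactory;

    PerUserSessionCache(Function<String, S> sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    // Reuses the session already opened for this login, or opens one.
    S sessionFor(String login) {
        return sessions.computeIfAbsent(login, sessionFactory);
    }

    // Drops the session (and whatever it has cached) when the user logs out.
    void evict(String login) {
        sessions.remove(login);
    }

    // One entry per logged-in user: 100 users means 100 sessions in memory.
    int size() {
        return sessions.size();
    }
}
```

The map grows with the number of users rather than the number of roles, which is exactly the difference from caching inside Liferay's own repository.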

There are undoubtedly areas which we can improve on our end. Maybe we can configure it so a lot of these extra statistics and random information can be removed. But, at present, that is the current experience our users are used to without the same speed hit.

Anyhow, as I said, if you would like to discuss this further, drop me an email or maybe we can arrange a call.
Hi Alexander !

It is not clear to me whether we can configure multiple DL portlet instances, with each instance pointing (for example) to a separate Alfresco server...

In fact, in a possible scenario, we can aggregate in a single portal the contents contained in various Alfresco servers, of course instantiating the same quantity of DL portlets, each one in a different community...

Thank you for your clarification!

Ivano C.
Hi Ivano,

I am not quite sure I understand your question. But ultimately, in one DL portlet, you can point to multiple servers -- each mounted as a separate repository. In my example above I simulate having two Alfresco servers (one via CMIS AtomPub and another via CMIS Web Services) mounted to the same DL portlet.

Hope that clarifies things!

Alex
Hi Alex,

I wanted to give it a try, but it is impossible to even connect from liferay via alfresco share http://issues.liferay.com/browse/LPS-15543 ... I spent half a day on this issue. I simply can't figure that out.... whatever I tried I ended up with this error...

I'd really appreciate if you could take a look at it... I feel hopeless

http://issues.alfresco.com/jira/browse/ALF-7423
Hi Alex, got error when adding repository (at revision 74180, created account admin/admin with role Administrator, and log in as admin/admin).

04:46:03,851 ERROR [RepositoryServiceImpl:305] com.liferay.portal.security.auth.PrincipalException: org.apache.chemistry.opencmis.commons.exceptions.CmisRuntimeException: Unauthorized
com.liferay.portal.security.auth.PrincipalException: org.apache.chemistry.opencmis.commons.exceptions.CmisRuntimeException: Unauthorized
    at com.liferay.portal.repository.cmis.CMISRepository.processException(CMISRepository.java:1408)
    at com.liferay.portal.repository.cmis.CMISRepository.initRepository(CMISRepository.java:619)
    at com.liferay.portal.service.impl.RepositoryServiceImpl.createRepositoryImpl(RepositoryServiceImpl.java:415)
    at com.liferay.portal.service.impl.RepositoryServiceImpl.mountRepository(RepositoryServiceImpl.java:302)
Hi Alex. I'm the development lead for IBM's CMIS servers. Would you be interested in doing some interoperability testing if I could give you access to a test IBM CMIS P8 cloud system? I'm just looking for you to point your client at my public server and see if there are any compatibility issues. We do this with a lot of CMIS client vendors (e.g. Apache Chemistry) and we have found that it makes both clients and implementers better. Sometimes we even end up tightening up the spec as a result of gray areas that we uncover.

Looking forward to hearing from you. You will find my correct email address in your database under this account.

Thanks in advance,

Jay Brown
IBM
Hi Jay, Thanks. I want to test IBM CMIS P8, too. In LR 6.1, OpenCMIS Apache Chemistry should get supported 100%.

Eventually it would be nice that the CMIS of Day Software, Dennis Hamilton, EMC, FatWire, Microsoft, Open Text,
Oracle, and SAP could be tested in details in Liferay 6.1.
@Jakub In general, Liferay has provided a CMISHook since 6.0 and, in the upcoming 6.1 (not yet released), we provide mounting of multiple repositories -- CMIS being one of them via CMISRepository. I'll respond more in your message board post.

@Jonas From the looks of your exception, it looks like you have an authentication exception on the Alfresco end. Did you check their logs? Also, make sure you can hook into it using the CMIS workbench (http://incubator.apache.org/chemistry/cmis-workbench.html). If you can't there, you definitely won't be able to via Liferay.

@Jay That would be brilliant! P8 is one of the repositories I am tasked with testing against so that is exactly what I would need. Drop me an email -- alexander.chow { at } liferay.com.
Hi Alex,

Awesome feature! It's nice to be able to give end users the consistent look and feel of Liferay with a powerful backend repository like Alfresco.

However I am noticing the performance hit that you mentioned quite a bit. I have Alfresco running on a pretty hefty dedicated system and it still takes 5 to 10 seconds for files and folders to show up. The Alfresco process is pretty pegged whenever Liferay is pulling up files and folders. Hopefully some of the things Florian mentioned can help with this.

Thanks,
Jamie
@Jamie Florian and I have had quite a few emails about improving the performance. I have taken a few steps at boosting the performance which you should be able to see as of rev 74447. No matter what, the first time a logged in user accesses a given folder, it will take a little bit to load for the data to be cached.

I can't comment on how well Alfresco performs because I don't use Alfresco on a regular basis. But now, I have minimised the number of calls I am making over the wire so it may improve for you.

Let me know how things go.

Alex
Yeah minimising the number of round trips is the #1 priority from a performance perspective. If a single call is found to be slow then the usual Alfresco performance tuning and analysis process can be followed (a service Alfresco provides, fwiw).
Does anyone have experience with the combination of Liferay 6.1, Alfresco, CMIS and CAS? I think it is not possible because the password has to be saved... any ideas?
How will authentication to the external repository work if we use SSO, like CAS, with the portal? In this case the portal never gets to see the password, so cannot store it to send on to the repository.

The only thing I can think of is that somehow the external repository will forward the request to the CAS server and receive an auth ticket behind the scenes. This is a clear shot in the dark!

Other than this, I am very excited about this support and cannot wait to try it out!
@Hajo @Drew Yeah the question of CAS (and other SSO systems) is something that is in the back of my mind. The question really is whether we can either (1) use the ticket used to log into the portal or (2) create a proxy ticket to log into the third-party repository, probably the latter. Or, to put it another way, it is a question of whether the third-party repository supports CAS (proxy)tickets via CMIS... and I can throw in the hook for that (should be relatively simple, if it does).

I have tried exploring this with Alfresco, but haven't gotten too far... mainly because I don't know too much about how to get CAS running on Alfresco (not an Alfresco expert here) and my contact in Alfresco wasn't too clear on how CAS works with CMIS either. I also have been a bit strapped with some other work so haven't really been able to research this too much.

@Peter @Florian Any thoughts on this one?
@Alexander I completely understand the struggle with Alfresco CAS. The documentation is not very helpful, and neither are their engineers on the forums. I have been able to get Alfresco CASified, though. I would be willing to provide my documentation if you're interested. I am fairly certain that CASifying Alfresco also puts all web service URLs behind CAS. Although it's wishful thinking, I am hoping that if both Alfresco and Liferay are CASified and the user is already authenticated, attempting to access Alfresco via CMIS will automatically trigger a CAS ticket request and push through on its own. I haven't had a chance to test this yet, but plan to soon. Thanks for your work. It's really going to be slick!
@alex, look at this: http://issues.alfresco.com/jira/browse/ALF-7074

If you are willing to modify your cmishook, you should create a separate alfresco-cmishook!
@drew why don't you drop me an email. alexander.chow { at } liferay.com

@hajo thanks for the link. Will need to explore this further. Not sure if that is a CAS ticket or an Alfresco ticket. But I think we can probably create a generic authentication utility that Liferay can use to login to not only Alfresco but any other system.
@Alex, regarding managing metadata, custom types / JCR mixins, it is called "Secondary types" and it seems to be in development

http://tools.oasis-open.org/issues/browse/CMIS-713
@Alex, 3 months ago, I was about to implement BaseRepositoryImpl so that I would mount a custom JCR repository to DL, via this implementation. I said to myself that I better wait until it stabilizes or some documentation shows up.

In regard to BaseRepositoryImpl / BaseLocalRepositoryImpl :

Is it so, that CMISRepository doesn't implement the local interface, because ACL is handled on the repository side ?
So that if the third party repository hadn't its own authentication / authorization mechanism, it would implement both interfaces and the remote one would be a permission checking wrapper for the local one, right ?

In regard to sync of a 3rd party repository with RepositoryEntry, the idea is (in CMISRepository impl) having all documents synchronized, right ? It can be seen in CMISRepository.cacheFoldersAndFileEntries(); You mentioned that in this blog post. But shouldn't CMIS be only a mounting point ? I don't see a reason for keeping state of remote repository in RepositoryEntry table. Whatever CRUD I do, I want it to be reflected only in the remote repository, imho. What is the idea behind the synchronization ?

Not only CMIS repository, but even a custom JCR repository should be only "mounted" to DL. Why would one need to keep track of documentIDs ?
I suppose that the reasons have "integration nature", because DL uses DLAppService and it operates with FileEntries, so that addFileEntry() method of a third part repository must return FileEntry and there must be some tracking between FileEntry table and third party repository. For instance, right after file is added via DL, it is added to AssetPublisher.

I believe that a lot of LR developers would actually need to employ DL only as UI for their custom repositories. If you google around, you see that in most cases, people think about using their document store in LR. They've got a document store and they are searching for a portal to wrap it with.

Considering that hooks are meant only for scaling purposes. When used with LiferayRepository, the store must be empty, to be in sync with FileEntries. They can be used directly though (not necessarily via LiferayRepository).

It isn't clear to me, what third party repositories that implement BaseRepositoryImpl are meant for. Because it is very hard to think about the conflicts that may arise when you make changes into the third party repository not only via BaseRepositoryImpl implementation but even by some other means. It practically excludes the possibility to use DL as a user interface for custom document stores, which I think is what people would like to see. Though it is hard to imagine something like this if the state in the document store and FileEntry table is to be synchronized, no matter how perfect would the implementation be.
Hi Jakub,

The only things that are really synchronized are IDs. The cache you refer to is just for a quick lookup that gets reset for every HttpRequest. This is because each HttpRequest may access the same files or folders multiple times and we want to minimize excessive calls within a few milliseconds. Hence, there should not really be too much of a conflict between CRUD operations in Liferay vs. the third-party repository.
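To illustrate the idea (this is a simplified sketch in plain Java, not the actual CMISRepository code; RequestScopedCache and its methods are names I made up here), a lookup cache that only lives for one request can be as small as this, assuming one thread handles each request:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a cache that lives only for the current request.
final class RequestScopedCache {

    // One map per request-handling thread.
    private static final ThreadLocal<Map<String, Object>> CACHE =
        ThreadLocal.withInitial(HashMap::new);

    private RequestScopedCache() {
    }

    // Returns the cached entry, or calls the remote lookup once and keeps it.
    static Object get(String objectId, Function<String, Object> remoteLookup) {
        return CACHE.get().computeIfAbsent(objectId, remoteLookup);
    }

    // Called when the request ends (for example, from a servlet filter),
    // so that nothing stale survives into the next request.
    static void clear() {
        CACHE.remove();
    }
}
```

Because the map is thrown away with each request, a change made directly in the third-party repository is picked up on the very next page load.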

Hope that clarifies things.

Alex
I'm referring to the synchronized IDs. I'm just saying that for the sake of LR, DL might offer the possibility of being a UI for repositories... So that creation of a FileEntry could theoretically be part of the Repository implementation. That way, DL could be used as a UI for third-party repositories without any possible sync issues when creating new files. I know it is practically Utopia, because DL is built on FileEntries, it's not built for JCR stores... But I can't even figure out what happens if a file is removed from the repository (not via DL), but DL already has its ID :-)

The tight integration can be also seen for instance here : EditFileEntryAction.updateFileEntry() method has :

    AssetPublisherUtil.addAndStoreSelection(
        actionRequest, DLFileEntry.class.getName(),
        fileEntry.getFileEntryId(), -1);

Shouldn't this be part of the Repository implementation ? If I use my custom repository, it should decide itself about AssetPublishing right ?

I'm just trying to point this out because, over the past years, as I said, I've seen many people asking about how to utilize Liferay for their existing document stores. Imho DL is currently not transparent enough for doing this.
Hi Alex,
I have tried to do the same but am facing some issues with the visibility of the file. I have raised the issue at http://issues.liferay.com/browse/LPS-23408.
Please help me to fix the issue.

Thanks in advance.
Hi Alex,
Is there any news about testing against IBM's CMIS servers?

We have an IBM CMIS server in-house and I'm not able to connect to it.
AtomPub:
01:16:15,669 DEBUG [FullNameValidatorFactory:46] Return com.liferay.portal.security.auth.DefaultFullNameGenerator
01:16:15,671 WARN [DLFileEntryPersistenceImpl:5544] No DLFileEntry exists with the key {groupId=19, folderId=0, title=IBM}
01:16:15,751 WARN [RepositoryLocalServiceImpl:95] Unable to initialize CMIS session for repository with {repositoryId=10569}
01:16:15,752 DEBUG [InvokerPortletImpl:370] processAction for 20 takes 89 ms
01:16:15,753 DEBUG [InvokerFilterChain:113] Skip disabled filter class com.liferay.portal.servlet.filters.servletcontextinclude.ServletContextIncludeFilter

WebServices
01:22:42,188 DEBUG [FullNameValidatorFactory:46] Return com.liferay.portal.security.auth.DefaultFullNameGenerator
01:22:42,189 WARN [DLFileEntryPersistenceImpl:5544] No DLFileEntry exists with the key {groupId=19, folderId=0, title=ibm}
01:22:42,200 WARN [RepositoryLocalServiceImpl:95] org.apache.chemistry.opencmis.commons.exceptions.CmisConnectionException: Cannot initalize Web Services service object [org.apache.chemistry.opencmis.binding.webservices.RepositoryService]: 2 counts of InaccessibleWSDLException.

01:22:42,202 DEBUG [InvokerPortletImpl:370] processAction for 20 takes 22 ms

It seems to me that I'm making some principal error?
Thanks
Michael
Hi Michael,

I've worked with Jay Brown on IBM's P8 server in the past and have been able to get it to work. However, IBM has quite a few servers that have various levels of CMIS support. You may want to try using the CMIS workbench (http://chemistry.apache.org/java/developing/tools/dev-tools-workbench.html) against your server and perhaps even running the built-in TCK to make sure it works first. If it is a principal error, Liferay should be returning that.

Also, the forums are perhaps a better forum to discuss problems you may be having than on my blog. It widens the discussion to input from others.

Alex
Dear all,


interested if there is any possibility/plans to add a SVN repository of choice via LR document library?
Hi Natasha,

That's a great question. I don't think that is part of our use case but it is surely a very conceivable one. The framework, however, was built so one can fairly easily build a plugin. Somebody in the community has already done one for the file system.

Alex
Hi Alex,

I've successfully mounted an Alfresco 4.0 repository through the CMIS AtomPub service. I can browse folders and files in the Document and Media Portlet. I was expecting that searches could return hits from both the local (Liferay) and remote (Alfresco/CMIS) repositories, but actually only local files are found.

The question is, what about search capability when mounting a remote CMIS repository in the Liferay Document and Media Portlet?

Thanks,
Denis.
Hi Denis,

Liferay 6.1 supports searching both local and remote repositories. The search capabilities, however, are limited to the facilities that are provided by the protocol (in this case CMIS).

So, for instance, you can put a file named "Liferay 6.1.pdf" in both the local and remote repositories and see the differences in the searches. If you search for "Liferay", both should show up (I just tested it). If you are curious, the search is mainly done in the class BaseCmisSearchQueryBuilder.

There is a significant difference between 6.1 CE and EE where we overlooked performing a content search (searching against the actual file being uploaded as opposed to simply a name search). This was fixed just after CE was cut so did not make it until 6.1 EE unfortunately (see http://issues.liferay.com/browse/LPS-25066).

Hope that helps.

Alex
Hi Alexander,

Thanks for your clarification. I'm doing some more tests in my environment.

One more question. Can the whole Liferay repository be accessed through CMIS?
(e.g. using an application that uses CMIS to access Liferay contents).

WDYT ?

Thanks,
Denis.
Sounds good.

In regards to your question, we do have a ticket out there for Liferay to be exposed as a CMIS repository (http://issues.liferay.com/browse/LPS-10201). At the moment, this is in our backlog since other matters have trumped that in terms of priority.

Alex
@Alexander

Thanks for your quick answer.

By the way, as reported here (http://www.liferay.com/community/wiki/-/wiki/Main/CMIS+Repository#section-CMIS+Repository-FAQ), mounting a CMIS repository requires storing the user's password in the session.

Is this a limit of CMIS itself or (in particular with Alfresco) is there a workaround that allows a CMIS repository to work with SSO?

Thanks,
Denis.