
NoSQL, MongoDB, Cassandra, etc. and Liferay

Ray Augé, modified 13 years ago.

Liferay Legend Posts: 1197 Join Date: 08/02/05
Hey All,

Over the last several months the hype around NoSQL DB design has reached fever pitch.

At the same time, the hype around dynamic data modeling, web-based form design, dynamic schema design, metadata attachment and the like has been increasing as well (if not under those names, then as the general idea of creating things online dynamically with little coding).

Meanwhile, there have been concerns that certain aspects of Liferay's architecture may not be well suited to scaling to very large amounts of data, namely Custom Fields and Expandos in general. A SharePoint demo that Jon Lee (Liferay) gave during the developer retreat, and the ensuing discussion about its scalability, seemed to confirm that the model they use, which is much like ours for Expando, doesn't scale well when data sets become very large.

All this being said, Expando was designed not to be too tightly bound to a specific backend and it has no relations with any other portal domain entities.

As such, I've been thinking of prototyping a NoSQL adapter (effectively a ServiceWrapper hook) that we could plug in as the backend to make Expando scale to huge amounts of data (via some NoSQL DB implementation). With the coming User Data Lists/Workflow Forms, which should have adapters for storing in either WCM or Expando, it would allow our web-based data modeling to support huge data sets dynamically and without any portal changes. It would mean that things like custom fields automatically get stored in this new backend.
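
To make the adapter idea concrete, here is a minimal sketch in Python (chosen only for brevity; the portal itself is Java). Every name in it (`ExpandoBackend`, `DocumentBackend`, `ExpandoService`) is hypothetical and is not Liferay's actual API, and a plain dict stands in for a real NoSQL collection. It only illustrates that callers keep using the same service while the storage behind it is swapped.

```python
from typing import Any, Dict, Protocol


class ExpandoBackend(Protocol):
    """Hypothetical storage interface; NOT Liferay's real API."""

    def set_value(self, table: str, row_id: int, column: str, value: Any) -> None: ...

    def get_row(self, table: str, row_id: int) -> Dict[str, Any]: ...


class DocumentBackend:
    """Stores each virtual row as one document, grouped per virtual table.

    Assumption: a nested dict stands in for a NoSQL collection.
    """

    def __init__(self) -> None:
        self._collections: Dict[str, Dict[int, Dict[str, Any]]] = {}

    def set_value(self, table: str, row_id: int, column: str, value: Any) -> None:
        # A new column needs no schema change: it is just a new key.
        documents = self._collections.setdefault(table, {})
        documents.setdefault(row_id, {})[column] = value

    def get_row(self, table: str, row_id: int) -> Dict[str, Any]:
        # Return a copy so callers cannot mutate stored documents.
        return dict(self._collections.get(table, {}).get(row_id, {}))


class ExpandoService:
    """Caller-facing service; the backend is injected and can be replaced."""

    def __init__(self, backend: ExpandoBackend) -> None:
        self._backend = backend

    def add_custom_field_value(self, entity: str, entity_id: int, field: str, value: Any) -> None:
        self._backend.set_value(entity, entity_id, field, value)

    def get_custom_fields(self, entity: str, entity_id: int) -> Dict[str, Any]:
        return self._backend.get_row(entity, entity_id)


service = ExpandoService(DocumentBackend())
service.add_custom_field_value("User", 42, "favoriteColor", "blue")
service.add_custom_field_value("User", 42, "shoeSize", 44)
fields = service.get_custom_fields("User", 42)
```

In this sketch, code that stores custom fields never learns which backend is in use, which is the property the wrapper approach is after.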

OR

When developers create Expando tables programmatically, they are in effect creating new document types, tables, field sets (or whatever NoSQL nomenclature is used for defining the data sets) and storing them in highly scalable storage.

So if anyone asks whether we have plans for, or ideas about, how to use the NoSQL model in Liferay, this is one way I see it being used.

Thoughts?
Szymon Gołębiewski, modified 13 years ago.

RE: NoSQL, MongoDB, Cassandra, etc. and Liferay

Regular Member Posts: 246 Join Date: 08/06/09
I can only share our experience with a NoSQL database (it was Riak). We built a system that stored ads that users entered on one of our sites. On one node, Riak was unable to run a MapReduce over a set of 10,000 ads (users were getting timeout errors). So we tested different DB systems: MongoDB, CouchDB, PostgreSQL and MySQL. For one node and a flat data structure, MySQL was the fastest database. The problem was that if you want Riak to be as fast as MySQL, you have to set up lots of nodes. Our farm consisted of 5 servers, but 5 nodes were not enough.

So the question is: at what number of NoSQL nodes will those DBs be faster than MySQL (which, by the way, has pretty nice replication options out of the box)?
Ray Augé, modified 13 years ago.

RE: NoSQL, MongoDB, Cassandra, etc. and Liferay

Liferay Legend Posts: 1197 Join Date: 08/02/05
I would think that, given a flat data structure and only 10,000 items, adding the overhead of managing a separate DB (especially a NoSQL one) to an existing infrastructure would be a little out of place. I would not think twice about putting that in SQL and, in the worst case, adding an indexer on top to speed up searching. Heck, I wouldn't think too much about doing that using Expandos in Liferay (since those can also be indexed).

I think the problem comes up when the data grows by orders of magnitude and performance begins to show signs of degradation. As in, the number of records starts to reach the millions or more, and perhaps you still need some amount of dynamic behavior, such as the ability to add columns on the fly or add new tables (or documents, in NoSQL speak).

Homomorphic data models (where the schema is defined as data), such as Expando (and SharePoint Lists), start to degrade in performance because the number of real tables is fixed. Whether you have 1 virtual table or 1000, all the data is still in the same few real tables. This means contention builds as more and more different apps try to read from and write to those tables.
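
The "fixed number of real tables" point can be illustrated with a small sketch using Python's built-in `sqlite3`. The table and column names (`vtable`, `vcolumn`, `vvalue`) and the helper functions are made up for illustration and are a simplification, not Expando's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three real tables hold every virtual table, column, and value.
conn.executescript("""
    CREATE TABLE vtable  (table_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE vcolumn (column_id INTEGER PRIMARY KEY, table_id INTEGER, name TEXT);
    CREATE TABLE vvalue  (table_id INTEGER, column_id INTEGER, row_id INTEGER, data TEXT);
""")


def define_table(name, columns):
    """Create a virtual table: only rows are inserted, no DDL is executed."""
    table_id = conn.execute(
        "INSERT INTO vtable (name) VALUES (?)", (name,)
    ).lastrowid
    column_ids = {}
    for column in columns:
        cursor = conn.execute(
            "INSERT INTO vcolumn (table_id, name) VALUES (?, ?)", (table_id, column)
        )
        column_ids[column] = cursor.lastrowid
    return table_id, column_ids


def insert_row(table_id, column_ids, row_id, values):
    """Store one virtual row as one vvalue record per column."""
    conn.executemany(
        "INSERT INTO vvalue VALUES (?, ?, ?, ?)",
        [(table_id, column_ids[c], row_id, str(v)) for c, v in values.items()],
    )


# A thousand "apps" each define their own virtual table and write one row.
for n in range(1000):
    table_id, column_ids = define_table(f"app_{n}", ["title", "price"])
    insert_row(table_id, column_ids, 1, {"title": f"item {n}", "price": n})

real_tables = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
).fetchone()[0]
virtual_tables = conn.execute("SELECT COUNT(*) FROM vtable").fetchone()[0]
value_rows = conn.execute("SELECT COUNT(*) FROM vvalue").fetchone()[0]
```

Here `real_tables` stays at 3 while `virtual_tables` reaches 1000, so every write from every app lands in the single `vvalue` table.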

Homomorphic data models also suffer from additional limitations: you can't perform traditional DB operations on the data because the columns are not stored in such a way as to allow aggregate operations on them (sorting and filtering, for instance). As the number of usable SQL operations goes down, it begins to look more like a document repository than a SQL one, minus the optimizations that a document repo has (such as built-in indexing). What we've been doing to solve that problem in Liferay is adding the ability to index Expando data (in our own embedded indexer, Lucene, Solr) along with the ORM entities. That solved search (the Read of CRUD). The problem still lies in Create, Update, and Delete as scale increases, again due to contention on those few tables.
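
A small sketch of why sorting and filtering get awkward in such a model, again with `sqlite3`. It assumes a simplified, hypothetical `vvalue` table in which every value is stored as text, as generic value tables often do; it is not Expando's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical generic value table: one record per (row, column) pair.
conn.execute("CREATE TABLE vvalue (row_id INTEGER, column_name TEXT, data TEXT)")

rows = {
    1: {"title": "bike", "price": 250},
    2: {"title": "lamp", "price": 30},
    3: {"title": "sofa", "price": 900},
}
for row_id, values in rows.items():
    conn.executemany(
        "INSERT INTO vvalue VALUES (?, ?, ?)",
        [(row_id, name, str(value)) for name, value in values.items()],
    )

# Sorting the raw column compares text, so "250" sorts before "30".
naive_order = [
    r[0]
    for r in conn.execute(
        "SELECT data FROM vvalue WHERE column_name = 'price' ORDER BY data"
    )
]

# A correct "price < 500 ORDER BY price" needs one self-join per extra
# column plus a cast on every comparison.
pivoted = conn.execute("""
    SELECT t.data, CAST(p.data AS INTEGER)
    FROM vvalue AS t
    JOIN vvalue AS p ON p.row_id = t.row_id
    WHERE t.column_name = 'title'
      AND p.column_name = 'price'
      AND CAST(p.data AS INTEGER) < 500
    ORDER BY CAST(p.data AS INTEGER)
""").fetchall()
```

With this data, `naive_order` is `['250', '30', '900']` and `pivoted` is `[('lamp', 30), ('bike', 250)]`. Each additional column in the result adds another join against the same table, which is part of why pushing search into an external index is attractive.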

This is where the idea of using NoSQL comes in:

1 - This data has to be reliable (clustering, replication, backup)
2 - It has to be dynamic (add custom fields to entities on the fly, create new Data Lists on the fly, etc.)
3 - It should offer aggregate operations for sorting and filtering (at least close to what you'd expect from a Document repo with indexing)
4 - It has to scale.
5 - It has to perform well (indexing, MapReduce, etc.)

What I'm looking to figure out is:

1 - Can NoSQL do what Expando does? (i.e. Can we map Expando onto NoSQL? I think so!)
2 - At what point does the Expando backend need to be moved to a higher-scale architecture like NoSQL?
3 - How hard is it to write an Expando -> NoSQL adapter?
4 - Is it really worth the effort?
5 - Does anyone see value in doing that (would it make anyone feel more comfortable, make their job easier, and make them look like Wizards to their bosses when they say they've implemented NoSQL seamlessly into their infrastructure and gained X amount of benefit)?
Jonas X. Yuan, modified 13 years ago.

RE: NoSQL, MongoDB, Cassandra, etc. and Liferay

Liferay Master Posts: 993 Join Date: 27/04/07
Great! Thank you, Ray.
Marcelo Ruiz Camauër, modified 13 years ago.

RE: NoSQL, MongoDB, Cassandra, etc. and Liferay

Junior Member Posts: 78 Join Date: 09/05/06
Congratulations, this is a very interesting experiment or rather, enhancement!

My question is: would it be possible to replace the ENTIRE Liferay schema with a NoSQL DB?

There's one in particular, VoltDB, which has a great degree of compatibility with SQL (a pretty complete subset of it). Given that LR runs on so many types of DB engines, it probably could get ported to it without too much trouble... VoltDB sounds pretty good, and may enable really large-scale portals with large-scale customizability (i.e. Expandos) of user data...
Ray Augé, modified 13 years ago.

RE: NoSQL, MongoDB, Cassandra, etc. and Liferay

Liferay Legend Posts: 1197 Join Date: 08/02/05
The short answer is either:

1) "It's very highly doubtful."
2) "The undertaking might not be worth the effort."

It might be easier to ask "Could Hibernate be made to work on a non-SQL persistence backend?" That would have to happen before Liferay could even consider it in any way.

On the other hand, Liferay does have, as we've demonstrated, several places where we could leverage non-SQL persistence. Another that I can think of (besides those we've already mentioned) is perhaps another implementation of the DocLib Repository backend (Liferay 6.1 will support multiple backend repositories at the same time). I've read that MongoDB (and I imagine there are others) is ideally suited to storing large binary objects (like video) efficiently, with highly concurrent and extremely efficient streaming in either direction.
Marcelo Ruiz Camauër, modified 13 years ago.

RE: NoSQL, MongoDB, Cassandra, etc. and Liferay

Junior Member Posts: 78 Join Date: 09/05/06
Relational DBs can and do handle very large workloads, and there are a variety of ways of extending their scalability, but in general they are costly to implement (multiple slave DBs, many servers, etc.).

Large binary storage is one useful capability. Currently putting doc libraries in the DB is not a great strategy for really large-scale storage... it bogs down the whole DB and backup systems.

One feature of these NoSQL systems is their replication and resynchronization (and resiliency)... maybe they'd be useful for an LR system that can run locally and resync later when connected to the Web? You could do vertical apps with LR as the substrate then... and even in the First World you don't always have great connectivity outside the large cities... Maybe you could have a "P2P" portal?

The best strategy would be to map Hibernate to NoSQL and leave Liferay untouched.
Ján Ondrušek, modified 6 years ago.

RE: NoSQL, MongoDB, Cassandra, etc. and Liferay

New Member Posts: 13 Join Date: 26/04/09
Interestingly, Hibernate already supports NoSQL and there is a Jira task to research OGM. Please vote!
Ray Augé, modified 13 years ago.

RE: NoSQL, MongoDB, Cassandra, etc. and Liferay

Liferay Legend Posts: 1197 Join Date: 08/02/05
Committed to SVN (plugins/trunk).

See http://issues.liferay.com/browse/LPS-14646.