Sharding Databases - An Unorthodoxed Design

 I have been recently working on the very interesting task of making the Liferay database shardable.  What does this mean?  There is a good introduction to shards here, but basically it is a way of scaling your database horizontally.  For a given table or set of tables, you split up the data that is stored and fetched based on a given hash or something like that.  Instead of storing a gigabyte of user data on one database, you spread this data across.. let's say, a dozen shards.  This way you don't overload one database, have smaller queries (since each table has less data now), and have better overall throughput under load because all your IO is not going through one database server.  Google, Facebook, Wikipedia -- all these guys shard their databases.

Nice concept, but how does it work in Liferay?  Well, as sort of a quick initial use case, I tried to shard the system based on portal Instances.  This is great if you are working in an ASP (Application Service Provider) environment where you host multiple portals on a single running version of Liferay.  So, in a nutshell, what we do is we use Spring to intercept calls to the persistence tier, figure out what the companyId is (that which designates the Portal Instance), and switch out the DataSource underneath you before letting you continue.  When there is a need to make calls across all databases (e.g., UpgradeProcess), we catch it in Spring and make the appropriate call multiple times across each of the shards.  With that, you now have a sharded database!

In the future, we are looking to shard other things... Users, Communities, Layouts, Portlets, etc.  But for now, this is our first cut.

If you would like to try it out yourself, check out the wiki article.

Blogs
Hi,

I was considering following scenerio: Say ASP has several Portal instances. In one instance something goes wrong - due to user mistake and company requires to restore database from backup. Of course other companies are not even aware of this company existance and would not let restore database from backup.

This solution would work in such case??
Though it is not the intention of sharding, what you are describing is an interesting use case. If you were using different databases for each portal instance and something happens, theoretically yes, you should be able to just restore that DB from a backup.
I think we design the liferay default shard which store global settings.This is good for if you use custom authentication like ldap

and add other shard for users if user id end with 1,2,3 use shard1 , 4,5,6 use shard2 ....