
Database Sharding
Introduction #
Database sharding is a way of scaling your database horizontally. For a given table or set of tables, the rows are split across several databases (shards), and each read or write is routed to one shard based on a key, such as a hash of an identifier. Google, Facebook, and Wikipedia all use database sharding.
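As a rough illustration of routing by hash, here is a minimal Java sketch. The class name HashShardRouter and its methods are hypothetical and are not part of Liferay; Liferay itself assigns a shard per portal instance, as described in the sections below. The shard names are simply the ones used later on this page.

```java
import java.util.List;

// Hypothetical illustration of hash-based routing; not Liferay code.
final class HashShardRouter {

    private final List<String> shardNames;

    HashShardRouter(List<String> shardNames) {
        if (shardNames.isEmpty()) {
            throw new IllegalArgumentException("At least one shard is required");
        }
        this.shardNames = List.copyOf(shardNames);
    }

    // Maps a numeric key (for example a primary key) to a shard name.
    // floorMod keeps the index non-negative even if the hash is negative.
    String shardFor(long key) {
        int index = Math.floorMod(Long.hashCode(key), shardNames.size());
        return shardNames.get(index);
    }

    public static void main(String[] args) {
        HashShardRouter router =
            new HashShardRouter(List.of("default", "one", "two"));
        for (long key = 42; key < 45; key++) {
            System.out.println(key + " -> " + router.shardFor(key));
        }
    }
}
```

The same key always maps to the same shard, so a row written through the router can be found again by routing the read the same way. Note that changing the number of shards changes the mapping, which is one reason real systems often use more elaborate schemes.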
Benefits #
- No single database gets overloaded.
- Smaller queries, since each table now holds less data.
- Better overall throughput under load, because not all of your I/O goes through one database server.
Sharded Portal Instances #
Liferay supports sharding from version 5.2.3 onward, for distributing the data of multiple portal instances across databases. Here's how to set it up.
- After you have a normally working development environment, make sure your hosts file is set up to allow virtual hosting (for convenience, I will call the hosts abc1.com, abc2.com, abc3.com, etc.).
- Use the create-minimal SQL script (from our Downloads section) to create and populate three database schemas: lportal, lportal1, and lportal2. Be careful to use the scripts that match your Liferay Portal version; otherwise, it won't work. In 6.0, you can instead create several empty database schemas manually, and Liferay will populate them automatically when it starts up with sharding enabled.
- Note: By default, the configuration files are set up for three shards called default, one, and two, but you can configure more. All of this configuration lives in the file portal-impl/src/META-INF/shard-data-source-spring.xml. As you will see, this file is included in the spring.configs property in the next step. To modify it, write your own file in the Extension Environment and include that file in the property instead of the default one.
- In your portal-ext.properties, you will need to set the following:
  - Enable META-INF/shard-data-source-spring.xml and a bunch of other Spring configs (refer to the "Additional Settings" section below) under spring.configs.
  - Configure the JDBC schema settings for jdbc.default.*, jdbc.one.*, and jdbc.two.*.
  - Enable the shard names: shard.available.names=default,one,two
  - Don't forget to set the appropriate username and password for each schema.
- Start up the server and create several (2-3) portal instances (e.g., abc1.com, abc2.com, abc3.com).
- Using your favorite database browser, do a query on each of your schemas for the User_ table, and you will notice that the data is now distributed across different schemas. That's it!
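The portal-ext.properties settings from the steps above might look roughly like the fragment below. This is a sketch under assumptions: it assumes MySQL on localhost with the older Connector/J driver class, and the usernames and passwords are left blank as placeholders. Substitute the driver, URLs, and credentials for your own database. The spring.configs value is not repeated here; see "Additional Settings".

```properties
# Assumed example values: MySQL on localhost, schemas lportal, lportal1, lportal2.
# Fill in the username and password for each schema.

jdbc.default.driverClassName=com.mysql.jdbc.Driver
jdbc.default.url=jdbc:mysql://localhost/lportal?useUnicode=true&characterEncoding=UTF-8
jdbc.default.username=
jdbc.default.password=

jdbc.one.driverClassName=com.mysql.jdbc.Driver
jdbc.one.url=jdbc:mysql://localhost/lportal1?useUnicode=true&characterEncoding=UTF-8
jdbc.one.username=
jdbc.one.password=

jdbc.two.driverClassName=com.mysql.jdbc.Driver
jdbc.two.url=jdbc:mysql://localhost/lportal2?useUnicode=true&characterEncoding=UTF-8
jdbc.two.username=
jdbc.two.password=

# One name per jdbc.<name>.* block above.
shard.available.names=default,one,two
```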
Additional Settings #
Remember that default, one, and two are default values defined in portal-impl/src/META-INF/shard-data-source-spring.xml and portal-ext.properties. If you want to change them, make sure to propagate the changes to the Spring XML file as well as portal-ext.properties.
- If you want to manually select the shard (via the GUI when creating portal instances), you need to enable the following in your portal-ext.properties file:
  shard.selector=com.liferay.portal.dao.shard.ManualShardSelector
  Otherwise, the shard for the data will be chosen using a round-robin technique.
- Due to the nature of sharding across multiple data sources, it does not support transaction management by itself. In order to enable proper transaction management, you will need to configure JTA/XA. For example, see JTA-XA on Tomcat.
- The following Spring configs also need to be added to portal-ext.properties in version 6.1:
  spring.configs=\
      META-INF/base-spring.xml,\
      META-INF/hibernate-spring.xml,\
      META-INF/infrastructure-spring.xml,\
      META-INF/management-spring.xml,\
      META-INF/util-spring.xml,\
      META-INF/jpa-spring.xml,\
      META-INF/executor-spring.xml,\
      META-INF/audit-spring.xml,\
      META-INF/cluster-spring.xml,\
      META-INF/editor-spring.xml,\
      META-INF/jcr-spring.xml,\
      META-INF/ldap-spring.xml,\
      META-INF/messaging-core-spring.xml,\
      META-INF/messaging-misc-spring.xml,\
      META-INF/mobile-device-spring.xml,\
      META-INF/notifications-spring.xml,\
      META-INF/poller-spring.xml,\
      META-INF/rules-spring.xml,\
      META-INF/scheduler-spring.xml,\
      META-INF/scripting-spring.xml,\
      META-INF/search-spring.xml,\
      META-INF/workflow-spring.xml,\
      META-INF/counter-spring.xml,\
      META-INF/document-library-spring.xml,\
      META-INF/mail-spring.xml,\
      META-INF/portal-spring.xml,\
      META-INF/portlet-container-spring.xml,\
      META-INF/shard-data-source-spring.xml,\
      META-INF/ext-spring.xml
Please note that for version 6.0 of Liferay Portal, the following Spring configs should be added to portal-ext.properties instead of the ones listed above:
  spring.configs=\
      META-INF/base-spring.xml,\
      META-INF/hibernate-spring.xml,\
      META-INF/infrastructure-spring.xml,\
      META-INF/management-spring.xml,\
      META-INF/util-spring.xml,\
      META-INF/editor-spring.xml,\
      META-INF/jcr-spring.xml,\
      META-INF/messaging-spring.xml,\
      META-INF/scheduler-spring.xml,\
      META-INF/search-spring.xml,\
      META-INF/counter-spring.xml,\
      META-INF/document-library-spring.xml,\
      META-INF/lock-spring.xml,\
      META-INF/mail-spring.xml,\
      META-INF/portal-spring.xml,\
      META-INF/portlet-container-spring.xml,\
      META-INF/wsrp-spring.xml,\
      META-INF/mirage-spring.xml,\
      META-INF/shard-data-source-spring.xml
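For readers unfamiliar with the round-robin technique mentioned earlier in this section, the idea can be sketched in a few lines of Java. The class RoundRobinShardPicker below is hypothetical and is not Liferay's selector implementation; it only shows how successive portal instances would be spread evenly over the configured shard names.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of round-robin selection; not Liferay code.
final class RoundRobinShardPicker {

    private final List<String> shardNames;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinShardPicker(List<String> shardNames) {
        if (shardNames.isEmpty()) {
            throw new IllegalArgumentException("At least one shard is required");
        }
        this.shardNames = List.copyOf(shardNames);
    }

    // Returns the next shard name in order, wrapping around at the end.
    // floorMod keeps the index valid even after the counter overflows.
    String nextShard() {
        int index = Math.floorMod(counter.getAndIncrement(), shardNames.size());
        return shardNames.get(index);
    }

    public static void main(String[] args) {
        RoundRobinShardPicker picker =
            new RoundRobinShardPicker(List.of("default", "one", "two"));
        // Simulate creating four portal instances in a row.
        for (String host : List.of("abc1.com", "abc2.com", "abc3.com", "abc4.com")) {
            System.out.println(host + " -> " + picker.nextShard());
        }
    }
}
```

With three shards, the fourth instance wraps around to the first shard again, which is why creating two or three portal instances is enough to see data land in different schemas.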