Auto-tagging your assets

If you work at a company that produces a lot of content, you will recognize this problem.

How can I make sure that content is always tagged, and tagged in a consistent way?
If you think tagging is not relevant, think again.

Tagging can...

  • Provide a specific and relevant set of articles
  • Support your search with relevant context
  • Support your SEO efforts
  • Support related-article suggestions
  • Automatically display content on specific pages

To make tagging easy and consistent, you need business rules that apply the right tags automatically.

Since Liferay DXP uses Elasticsearch, we can make use of its percolator functionality.
The percolator is a reverse query engine. Instead of indexing documents and using queries to find the right documents, we do it the other way around: we store queries (the business rules) in the index, and we submit documents (articles or plain text) to find the business rules that match them. By adding the tag name as metadata to each business rule, we get auto-tagging: every rule that matches a piece of text tells us which tag to apply.
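
To make the reverse-search idea concrete, here is a minimal sketch of the three JSON bodies involved: the index mapping, a stored business rule, and the percolate search that submits text. It is not taken from the ElasticAutoTagger repo. The index name, the field names (`query`, `content`, `tag`) and the choice of a `match_phrase` rule are assumptions made for illustration, and the typeless mapping assumes Elasticsearch 7 or later.

```python
import json

# Hypothetical index name, chosen for this example only.
RULES_INDEX = "autotag-rules"


def rules_index_mapping():
    """Body for creating the rules index (PUT /<index>).

    `query` is the percolator field that holds the stored rules,
    `content` describes the text the rules will be run against,
    `tag` is the metadata that turns a matching rule into a tag.
    """
    return {
        "mappings": {
            "properties": {
                "query": {"type": "percolator"},
                "content": {"type": "text"},
                "tag": {"type": "keyword"},
            }
        }
    }


def business_rule(tag, phrase):
    """Body for storing one rule (PUT /<index>/_doc/<id>).

    The rule here is a simple phrase match; any query that
    Elasticsearch can percolate could be used instead.
    """
    return {
        "tag": tag,
        "query": {"match_phrase": {"content": phrase}},
    }


def percolate_request(text):
    """Body for finding the rules that match a piece of text
    (POST /<index>/_search)."""
    return {
        "query": {
            "percolate": {
                "field": "query",
                "document": {"content": text},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(business_rule("elasticsearch", "percolator query"), indent=2))
    print(json.dumps(percolate_request("How a percolator query works"), indent=2))
```

The search response lists the stored rules that matched, so reading the `tag` field of each hit gives the tags for the submitted text.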

When I started working on this concept, I first created a web service that could do everything I needed:

  • create a connection to Elasticsearch
  • create a business rule/query
  • get a list of rules
  • send some text and match it against the stored business rules/queries

Once this was working, I created a service module that is triggered whenever a new asset is created. It calls the web service to match the asset's text (title, summary, content or any other metadata) against the stored business rules/queries. It then receives the matching tags and adds them to the asset. The nice thing is that you can use it for basically any asset type you like.
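
The tagging step described above can be sketched roughly as follows. This is an illustration, not the code of the actual service module: the function names are hypothetical, the response is assumed to have the standard Elasticsearch search shape with a `tag` field stored on each rule, and treating tag names as case-insensitive is an assumption as well.

```python
def asset_text(title, summary="", content=""):
    """Join the asset fields that should be matched against the rules."""
    return "\n".join(part for part in (title, summary, content) if part)


def tags_from_hits(response):
    """Collect the distinct tag names of the matched rules, in hit order.

    Assumes each stored rule carries its tag name in `_source.tag`.
    """
    tags = []
    for hit in response.get("hits", {}).get("hits", []):
        tag = hit.get("_source", {}).get("tag")
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def merge_tags(existing, matched):
    """Add matched tags to the asset's existing tags without duplicates.

    Comparison is case-insensitive (an assumption of this sketch).
    """
    seen = {tag.lower() for tag in existing}
    merged = list(existing)
    for tag in matched:
        if tag.lower() not in seen:
            merged.append(tag)
            seen.add(tag.lower())
    return merged


# Hand-written sample in the shape of a search response, for illustration.
SAMPLE_RESPONSE = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"tag": "elasticsearch"}},
            {"_id": "2", "_source": {"tag": "search"}},
            {"_id": "3", "_source": {"tag": "elasticsearch"}},
        ]
    }
}
```

With the sample above, `merge_tags(["Search"], tags_from_hits(SAMPLE_RESPONSE))` keeps the existing "Search" tag and adds only "elasticsearch". Because the functions work on plain text and tag names, the same flow applies to any asset type.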

I've added a link to the GitHub repo below, along with a link to a video that shows how it works.

Feel free to ping me if you have questions.

github.com/jverweijL/ElasticAutoTagger
https://www.youtube.com/watch?v=mL6TUIQ6KvA&index=1&list=PLp6cS8SjamlPOMBiFZ17y1HocOmmFn1ex