Taxonomies and Folksonomies - Increasing search and retrieval capabilities in Knowledge Base Articles

Folksonomies are a user-driven approach to organizing content through tags, cooperative classification, and communication via shared metadata. The portal implements folksonomies via tags. A tag may be associated with many assets, and an asset may have many tags associated with it. This is what we call tagging content. A tag may also have many properties, each of which is a name-value pair.
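
To make the many-to-many relationship concrete, here is a minimal in-memory sketch in plain Java. It is not the portal's own asset model: the class name TagRegistry, its methods, and the "owner" property used below are all hypothetical names chosen for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model; the portal stores tags and assets differently.
class TagRegistry {
    // tag name -> ids of the assets carrying that tag
    private final Map<String, Set<Long>> assetsByTag = new HashMap<>();
    // asset id -> names of the tags on that asset
    private final Map<Long, Set<String>> tagsByAsset = new HashMap<>();
    // tag name -> property name -> property value
    private final Map<String, Map<String, String>> propertiesByTag = new HashMap<>();

    // Tagging content: records the link in both directions.
    void tag(long assetId, String tagName) {
        assetsByTag.computeIfAbsent(tagName, k -> new TreeSet<>()).add(assetId);
        tagsByAsset.computeIfAbsent(assetId, k -> new TreeSet<>()).add(tagName);
    }

    // A tag property is a simple name-value pair.
    void setProperty(String tagName, String name, String value) {
        propertiesByTag.computeIfAbsent(tagName, k -> new HashMap<>()).put(name, value);
    }

    Set<Long> assetsFor(String tagName) {
        return assetsByTag.getOrDefault(tagName, Collections.<Long>emptySet());
    }

    Set<String> tagsFor(long assetId) {
        return tagsByAsset.getOrDefault(assetId, Collections.<String>emptySet());
    }

    // Returns null when the tag has no such property.
    String property(String tagName, String name) {
        return propertiesByTag.getOrDefault(tagName, Collections.<String, String>emptyMap()).get(name);
    }
}
```

For example, after registry.tag(1L, "t1") and registry.tag(2L, "t1"), a call to registry.assetsFor("t1") returns both asset ids, while registry.tagsFor(1L) lists every tag on the first asset.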

Taxonomies are hierarchical structures of the kind used in scientific classification schemes. Although taxonomies are common, they can be difficult to implement. The portal implements taxonomies as vocabularies and category trees, which are used to tag and classify content.

Excerpted from the book: Liferay Portal 6 Enterprise Intranets (coming out soon)

This article introduces how to use taxonomies and folksonomies to increase search and retrieval capabilities in Knowledge Base articles.

Introduction

What is a knowledge base, or knowledge management (KM)? The Knowledge Base portlet allows authoring articles and organizing them in a hierarchy of navigable categories. It leverages Web Content articles, structures, and templates; allows rating and commenting on articles; allows adding a hierarchy of categories and tags to articles; exports articles to PDF and other formats; supports workflow; allows adding custom attributes (called custom fields); supports indexing and advanced search; allows using a rule engine; and so on.

In general, the Knowledge Base portlet provides two pieces: Articles, for managing knowledge base articles, and Article Aggregator, for publishing knowledge base articles.

Most importantly, indexing, folksonomies, and taxonomies are applied to Knowledge Base articles, so we can increase search and retrieval capabilities for them. Of course, the same solutions could be applied to any content, such as Web Content, Wiki articles, Blogs entries, Forum messages, Bookmarks entries, Calendar entries, Image Gallery images, Document Library documents, etc.

Solutions

Suppose that knowledge base articles are tagged with the following tags and categories (plus a category hierarchy). We use these tags and categories as an example.

 

Knowledge base articles can be searched or retrieved with the following features.

  • Indexing articles with title, description, content, tags and categories
  • Retrieve articles by tag
  • Retrieve articles by category
  • Retrieve articles by category hierarchy
  • Find related articles
  • Search by combination of tags, categories and category hierarchy

Indexing articles with title, description, content, tags and categories

Articles are indexed by title, description, content, tags, and categories, plus custom attributes.

Document document = new DocumentImpl();
document.addUID(PORTLET_ID, resourcePrimKey);
document.addKeyword(Field.COMPANY_ID, companyId);
document.addKeyword(Field.PORTLET_ID, PORTLET_ID);
document.addKeyword(Field.GROUP_ID, groupId);
document.addKeyword(Field.USER_ID, userId);
document.addText(Field.TITLE, title);
document.addText(Field.CONTENT, content);
document.addText(Field.DESCRIPTION, description);
document.addText("resourcePrimKey", String.valueOf(resourcePrimKey));
document.addKeyword(Field.ASSET_CATEGORY_NAMES, assetCategoryNames);
document.addKeyword(Field.ASSET_TAG_NAMES, assetTagNames);
document.addModifiedDate();
document.addKeyword(Field.ENTRY_CLASS_NAME, KBArticle.class.getName());
document.addKeyword(Field.ENTRY_CLASS_PK, resourcePrimKey);
ExpandoBridgeIndexerUtil.addAttributes(document, expandoBridge);

Retrieve articles by tag

Knowledge base articles can be retrieved by tag, such as t1, t2, etc., with the following approach.
<div class="page-tags">
    <liferay-ui:asset-tags-summary
        className="<%= KBArticle.class.getName() %>"
        classPK="<%= kbArticle.getResourcePrimKey() %>"
        message="tags"
        portletURL="<%= PortletURLUtil.clone(taggedPagesURL, renderResponse) %>"
    />
</div>

Retrieve articles by category

Knowledge base articles can be retrieved by category, such as B0, B1, etc., with the following approach.

<div class="page-categories">
    <liferay-ui:asset-categories-summary
        className="<%= KBArticle.class.getName() %>"
        classPK="<%= kbArticle.getResourcePrimKey() %>"
        portletURL="<%= PortletURLUtil.clone(categorizedPagesURL, renderResponse) %>"
    />
</div>

Retrieve articles by category hierarchy

Knowledge base articles can be retrieved by a category such as B0 together with its hierarchy. In effect, the portal or portlet searches for articles with the following expression:

B0 or B1 or B2

Behind the scenes, it uses the following portal property setting:
asset.categories.search.hierarchical=true

Set the above to false to specify that searching and browsing using categories should only show assets that have been assigned the selected category explicitly. When set to true, the children categories are also included in the search.

Or, specifically, set the following properties in the plugin:
kb.articles.categories.search.hierarchy.enabled=true
kb.articles.categories.search.hierarchy.approach=top-down

When the first property is set to true, the child categories or parent categories are also included in the search. Of course, you can set it to false to specify that searching and browsing using categories should only show assets that have been assigned the selected category explicitly.

The available approaches are top-down and bottom-up.

To find child categories, use the top-down approach; to find parent categories, use the bottom-up approach.
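
The effect of the two settings can be sketched in plain Java. This is only an illustration of the behaviour described here, not the portlet's actual code: the class CategoryHierarchy and its methods are hypothetical, and the tree (B0 with children B1 and B2) is inferred from the expression "B0 or B1 or B2".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; it only mimics the behaviour of the two properties.
class CategoryHierarchy {
    private final Map<String, String> parentOf = new HashMap<>();
    private final Map<String, List<String>> childrenOf = new HashMap<>();

    // Registers child as a direct subcategory of parent.
    void addChild(String parent, String child) {
        parentOf.put(child, parent);
        childrenOf.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    // Returns the categories to search for, given the two settings.
    Set<String> expand(String category, boolean hierarchyEnabled, String approach) {
        Set<String> result = new LinkedHashSet<>();
        if (!hierarchyEnabled) {
            result.add(category); // explicit assignments only
        } else if ("top-down".equals(approach)) {
            addDescendants(category, result);
        } else if ("bottom-up".equals(approach)) {
            addAncestors(category, result);
        } else {
            throw new IllegalArgumentException("Unknown approach: " + approach);
        }
        return result;
    }

    // Depth-first walk over the category and everything below it.
    private void addDescendants(String category, Set<String> result) {
        if (!result.add(category)) {
            return; // already visited
        }
        for (String child : childrenOf.getOrDefault(category, Collections.<String>emptyList())) {
            addDescendants(child, result);
        }
    }

    // Walks from the category up to the root of its tree.
    private void addAncestors(String category, Set<String> result) {
        String current = category;
        while (current != null && result.add(current)) {
            current = parentOf.get(current);
        }
    }

    public static void main(String[] args) {
        CategoryHierarchy tree = new CategoryHierarchy();
        tree.addChild("B0", "B1");
        tree.addChild("B0", "B2");
        System.out.println(tree.expand("B0", true, "top-down"));  // [B0, B1, B2]
        System.out.println(tree.expand("B2", true, "bottom-up")); // [B2, B0]
        System.out.println(tree.expand("B0", false, "top-down")); // [B0]
    }
}
```

The returned set is then combined with OR, which is how selecting B0 with the top-down approach becomes "B0 or B1 or B2".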

Find related articles

Related articles are a set of articles that share the same or similar tags or categories.
Suppose that one article is tagged with t1 and A0. Without the category hierarchy, its related articles are the articles matching the following expression of tags and categories.

T1 or A0

With the category hierarchy, related articles are the articles matching the following expression of tags and categories.

T1 or A0 or A1 or A2 or A3
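
As a rough sketch of how these two expressions could be evaluated, the following plain Java class finds every other article that shares at least one term with the given article. It does not use the portal API. The class RelatedArticleFinder, the article ids, and the assumption that A1, A2, and A3 are children of A0 are all made up for the example (the hierarchy is inferred from the expression above).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; tags and categories are treated uniformly as "terms".
class RelatedArticleFinder {
    // article id -> tags and categories assigned to it
    private final Map<String, Set<String>> termsByArticle = new LinkedHashMap<>();
    // category -> direct child categories
    private final Map<String, List<String>> childrenOf = new LinkedHashMap<>();

    void addArticle(String id, String... terms) {
        termsByArticle.put(id, new LinkedHashSet<>(Arrays.asList(terms)));
    }

    void addChildren(String parent, String... children) {
        childrenOf.put(parent, Arrays.asList(children));
    }

    // Returns the ids of all other articles matching "term1 or term2 or ...".
    List<String> related(String articleId, boolean withHierarchy) {
        // Build the OR expression, optionally expanded with descendant categories.
        Set<String> query = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(
            termsByArticle.getOrDefault(articleId, Collections.<String>emptySet()));
        while (!pending.isEmpty()) {
            String term = pending.poll();
            if (query.add(term) && withHierarchy) {
                pending.addAll(childrenOf.getOrDefault(term, Collections.<String>emptyList()));
            }
        }
        // An article is related when it shares at least one term with the query.
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : termsByArticle.entrySet()) {
            boolean sharesTerm = !Collections.disjoint(entry.getValue(), query);
            if (!entry.getKey().equals(articleId) && sharesTerm) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

With an article kb-1 carrying T1 and A0, another article carrying only A2 is found only when withHierarchy is true, because A2 enters the expression through the expansion of A0.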

Search by combination of tags, categories and category hierarchy

In most cases, we need to search articles by a combination of tags, categories, and category hierarchy in a dynamic expression, for example by tags t3 and t4 and categories C0, C1, C4, and B2, as

T3 and T4 and C0 and (C1 or C2 or C3) and (C4 or C5 or C6) and B2

In 6.0 or above, the portal provides AssetEntryQuery to generate dynamic queries easily with any combination of AND, OR, and NOT.

In 5.2 or below, only AND or OR is supported, plus "does not contain". Thus the above example can be represented as

a)    T3 and T4
b)    (C1 or C2 or C3) or (C4 or C5 or C6)
c)    a) and b)
or
a)    T3 and T4
b)    C0 and C1 and C4 and B2
c)    a) and b)

As you can see, 6.0 greatly improves dynamic queries.
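
To illustrate what such a dynamic expression means, here is a small self-contained sketch in plain Java. It deliberately does not call AssetEntryQuery. The class TermQuery and its methods are hypothetical stand-ins that evaluate an AND of OR-groups, with optional NOT terms, against the terms of a single article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for illustration; this is not the portal's AssetEntryQuery.
class TermQuery {
    // Every group must match; a group matches when any one of its terms is present.
    private final List<Set<String>> requiredGroups = new ArrayList<>();
    // None of these terms may be present.
    private final Set<String> excluded = new LinkedHashSet<>();

    // Adds "... and (t1 or t2 or ...)"; a single term gives a plain AND.
    TermQuery andAnyOf(String... terms) {
        requiredGroups.add(new LinkedHashSet<>(Arrays.asList(terms)));
        return this;
    }

    // Adds "... and not term".
    TermQuery andNot(String term) {
        excluded.add(term);
        return this;
    }

    boolean matches(Set<String> articleTerms) {
        if (!Collections.disjoint(excluded, articleTerms)) {
            return false; // an excluded term is present
        }
        for (Set<String> group : requiredGroups) {
            if (Collections.disjoint(group, articleTerms)) {
                return false; // no term of this OR-group is present
            }
        }
        return true;
    }
}
```

The example expression would then be built as new TermQuery().andAnyOf("T3").andAnyOf("T4").andAnyOf("C0").andAnyOf("C1", "C2", "C3").andAnyOf("C4", "C5", "C6").andAnyOf("B2"). An article carrying T3, T4, C0, C2, C6, and B2 matches, while one lacking any of C4, C5, or C6 does not.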


Prototype – An Implementation

Environment: Liferay portal 5.2.5 – EE 5.2 SP1; Knowledge base portlet. It would be easy to upgrade to 6.0.

Indexing articles with title, description, content, tags and categories

- Enter a keyword, and it will be searched against title, description, content, tags, and categories (and category hierarchy).

 

Retrieving articles by tag, categories and category hierarchy

Related articles


Search by Keywords, Status, Tags, Categories and Category hierarchy


Summary
As you can see, content (here, knowledge base articles) is indexed and searchable via folksonomies and taxonomies, especially the category hierarchy. These approaches would also be helpful for any content, such as Web Content, Wiki articles, Blogs entries, Forum messages, Bookmarks entries, Calendar entries, Image Gallery images, Document Library documents, etc.

Last but not least, I'd like to send special thanks to Frank Yu, Robert Chen, Peter Shin, Julio Camarero, Jorge Ferrer, and Bruno Farache, who did an amazing job making the Knowledge Base portlets a reality, and to everyone else who helped during development and provided feedback.

Blogs
Hello,
While not 100% related to the article, I am very interested in the possibility of adding a description to categories. I would like, for example, to implement a taxonomy similar to the one on isinet (http://science.thomsonreuters.com/mjl/scope/scope_ahci/). So, while each category would cover many related fields, it would require an explanation of the covered fields. Would it be readily available, or does it need custom development?

Thanks and great work again!
Hi Eduard, thanks for sharing. You may be interested in Ontology. The following is abstracted from the book: Liferay Portal 6 Enterprise Intranets.

Why not merge both kinds of tags through ontology?

As you can see, there are two kinds of tags: taxonomies and folksonomies. Both of them could be used as a way of organizing and aggregating content. Folksonomy is a way of classifying content by creating and managing tags to annotate and categorize it, while taxonomy is a hierarchical structure for classification.

In fact, taxonomies and folksonomies are different.

Taxonomies are a closed set of categories (also called tags) and vocabularies, created and organized in a hierarchical structure. They help standardization, especially when stored in the shared Global group to standardize categorization across all of the organizations. In contrast, folksonomies are an open set of tags, extended by the end user.

So could both be merged through ontology? Ontology, the study of entities and their relations, is less concerned with what is than with what is possible. The answer would be “yes”, and it is highly expected.