Faceted Search in Liferay 6.1

It's been so long since I've written anything that I've been feeling rather guilty. Luckily, we've recently been undertaking a huge effort to document features of Liferay, old and new.

One interesting but highly understated feature of Liferay 6.1 is the new Faceted Search support that I was lucky enough to get to work on. As I finished the first round of documentation (for my more eloquent peers to turn into a more polished and finished product), I thought that this would be a great bunch of info to place here for comment. It's a little more formal than a blog post should be, and probably much longer as well (there is even a toc!!!)... but what the hey!

 

Definitions

Before going through features, let us outline a set of definitions that are commonly used when discussing Faceted Search (or search in general).

indexed field: When we store documents in a search engine, we classify aspects of the document into fields. These fields represent the metadata about each document. Some typical fields are: name, creation date, author, type, tags, content, etc.

term : A term is a single value that can be searched which does not contain any whitespace characters. Terms may appear more than once in a document or appear in several documents, and are typically considered atomic units of search. Within the search engine, each indexed field (for example name) will have a list of known terms found within all the documents having that particular indexed field.

phrase: A phrase is a series of terms separated by spaces. The only way to use a phrase as a term in a search is to surround it with double quotes (").

multi value field: Some fields store more than one term at a time. For instance the "content" field may in fact contain hundreds of unique terms. Such fields are often referred to as "text" or "array" fields.

single value field: In contrast to multi value fields, there are also single value fields. Such fields always contain only a single term. These fields are often referred to as "token" or "string" fields.

frequency: The frequency value indicates how many times a term appears within a set of documents.

facet: A facet is a combination of the information about a specific indexed field, its terms and their frequency. Facets are typically named by the field in question.

term result list: When a facet displays its data, we call this the term result list.

frequency threshold: Some facets have a property called frequency threshold. This value indicates the minimum frequency a term must have in order to be shown. If the frequency threshold of a facet is set to 1, a term appearing 1 or more times will appear in the term result list.

max terms: Some facets have a property called max terms. This value indicates the maximum number of terms that will be included in the term result list regardless of how many actual matching terms are found for the facet. This is done to keep the user interface under control and not to overwhelm the user with too much information.

order: The order property determines the default ordering used for the term result list. There are two possible modes: Order Hits Descending and Order Value Ascending. The first, Order Hits Descending, means that results will be ordered by frequency in descending order. The second, Order Value Ascending, means that the results will be ordered by value (i.e. "term") in ascending order. Both modes fall back to the other mode as a secondary sort order when there are ties (i.e. many terms with the same frequency will always be sorted by "value").

range: A range defines an interval within which all the matching terms' frequencies are summed. This means that if a facet defines one range on the "creation time" field from 2008 to 2010, and another from 2011 to 2012, all matching documents having a creation time within one of these ranges will be counted as a sum for that range. Thus you may find 7 documents in the first range, and 18 documents in the second range. Ranges cannot be used with multi value fields.
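To make the frequency threshold, max terms and order definitions concrete, here is a minimal sketch in plain Java. It does not use any Liferay API: the class and method names are made up for illustration, and applying the threshold before the max terms cut-off is an assumption based on the definitions above, not a statement about how the portal does it internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these names come from the Liferay API.
final class TermResultListSketch {

	// Applies a frequency threshold, "Order Hits Descending" ordering and a
	// max terms limit to a map of term -> frequency.
	static List<String> termResultList(
		Map<String, Integer> frequencies, int frequencyThreshold,
		int maxTerms) {

		List<Map.Entry<String, Integer>> entries = new ArrayList<>();

		// Keep only the terms that meet the frequency threshold.
		for (Map.Entry<String, Integer> entry : frequencies.entrySet()) {
			if (entry.getValue() >= frequencyThreshold) {
				entries.add(entry);
			}
		}

		// Frequency descending; ties fall back to term value ascending.
		entries.sort((a, b) -> {
			int byHits = Integer.compare(b.getValue(), a.getValue());

			return (byHits != 0) ? byHits : a.getKey().compareTo(b.getKey());
		});

		// Stop once max terms have been collected.
		List<String> result = new ArrayList<>();

		for (Map.Entry<String, Integer> entry : entries) {
			if (result.size() == maxTerms) {
				break;
			}

			result.add(entry.getKey() + " (" + entry.getValue() + ")");
		}

		return result;
	}
}
```

With a threshold of 2 and max terms of 2, a term that appears only once is dropped first, and of the remaining terms only the two most frequent are listed.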

For End Users and Portal Administrators

Faceted search is a new feature in Liferay 6.1 (although some of the APIs were first introduced in Liferay 6.0 EE sp2). As such there is little relevance with previous versions other than in direct comparison with the old implementation of the Search Portlet which had no facet capabilities of any kind.

Although the new Faceted Search APIs are used transparently throughout the portal, primary exposure is surfaced through the new Search Portlet implementation.

What follows is a list of features provided by Faceted Search via the Search Portlet.

  1. Aggregation of assets into a single result set: Results from all portlets are returned as a single set and the relevance is normalized among the entire set, regardless of type (i.e. the best results among all types will be at the top). Searching has a more linear cost due to the fact that only a single query is performed. Searching is therefore faster, more intuitive, and more relevant.

    In previous versions of the portal, each portlet implemented its own search and returned a separate result set, which resulted in several issues:
    • Each portlet invoked its own query, and each portlet was called in turn, so a single portal request could generate up to N queries to the index, each with its own processing time. This led to increased time to produce the final view.
    • Depending on the order in which portlet searches were called, the most relevant results could end up near the bottom and, due to positioning, appear to have less value than those of portlets positioned physically higher up on the page; i.e. the relevance of results was not normalized across the total results of all portlets.
  2. Default facets: Asset Type, Asset Tags, Asset Categories, and Modified Time range facets are provided by default. These defaults make finding content on the most common facets simple and powerful. Facet details are displayed in the left column of the search portlet and provide information in context of the current search.
    • Asset Type: Performing a search for the term "htc" may return Asset Type facet details which appear as follows:



      The value in parentheses is the frequency of the term appearing on its left.

      You may notice that as you perform different searches, the Asset Type terms may disappear and re-appear. When a term does not appear it means: a) it was not found among the results, b) it did not meet the frequency threshold property, or c) it was beyond the maxTerms property (these properties are discussed in more detail later).
    • Asset Tags: If tags have been applied to any documents which appear in the result set, they may appear in the Asset Tags facet:



      Note: Not all tags may appear. In the example above, there are many more than the 10 tags that are listed, but the default configuration for this facet is to show the top 10 most frequently occurring terms as set by its maxTerms property.
    • Asset Categories: If categories have been applied to any documents which appear in the result set, they may appear in the Asset Categories facet:



      Note: Not all categories may appear. In the example above, there are many more than the 10 categories that are listed, but the default configuration for this facet is to show the top 10 most frequently occurring terms as set by its maxTerms property.
    • Modified Time: All documents appearing in the result set should have an indexed field called "modified" which indicates when the document was created (or updated). The Modified Time facet is a range facet which provides several pre-configured ranges as well as an option for the user to specify a range. All results in the subsequent query should then fall within this range.
  3. Drill down: The next feature allows refining results by selecting terms from each facet thereby adding more criteria to the search to narrow results (referred to as "drilling down" into the results).

    Clicking on terms adds them to the search criteria (currently only one term per facet). They are then listed in what is known as "token style" just below the search input box for convenience and clarity. Clicking any token's X removes it from the currently selected criteria.

    e.g. Selected the tag "liferay":


    e.g. Additionally, selected the type "Web Content":
  4. Advanced operations: These are supported directly in the search input box. Most of the advanced operations supported by Lucene are supported with only slight variations.

    For a full description of the Lucene syntax see: http://lucene.apache.org/core/old_versioned_docs/versions/3_0_3/queryparsersyntax.html

    Note: Many of the descriptions below are copied (almost word for word) from the above reference to account for the similarities but also to highlight the slight variations found between the two.
    • Searching in specific fields: By default, searches are performed against a long list of fields (this is different from Lucene which searches a single specific field by default). Sometimes you want results for a term within a particular field. This can be achieved using the field search syntax:

      <field>:<term>

      title:liferay

      Searching for a phrase within a field requires surrounding the term with double quotation marks:

      content:"Create categories"

      Note: The field is only valid for the term that it directly precedes, so the query

      content:Create categories

      will search for the term "Create" in the content field, and the term "categories" will be searched in "all" the default fields.
    • Wildcard Searches: The Search Portlet supports single and multiple character wildcard searches within single terms (not within phrase queries).

      To perform a single character wildcard search use the "?" symbol.

      To perform a multiple character wildcard search use the "*" symbol.

      The single character wildcard search looks for terms that match that with the single character replaced. For example, to search for "text" or "test" you can use the search:

      te?t

      Multiple character wildcard searches look for 0 or more characters. For example, to search for test, tests or tester, you can use the search:

      test*

      You can also use the wildcard searches in the middle of a term.

      te*t

      Note: You cannot use a "*" or "?" symbol as the first character of a search.
    • Fuzzy Searches: Search supports fuzzy searches based on the Levenshtein Distance, or Edit Distance algorithm. To do a fuzzy search use the tilde, "~", symbol at the end of a single word term.

      For example to search for a term similar in spelling to "roam" use the fuzzy search:

      roam~

      This search will find terms like foam and roams.

      An additional (optional) parameter can specify the required similarity. The value is between 0 and 1; with a value closer to 1, only terms with a higher similarity will be matched. For example:

      roam~0.8

      The default that is used if the parameter is not given is 0.5.
    • Range Searches: Ranges allow one to match documents whose field(s) values are between the lower and upper bound specified by the range. Ranges can be inclusive or exclusive of the upper and lower bounds. Sorting is done lexicographically.

      modified:[20020101000000 TO 20030101000000]

      This will find documents whose modified fields have values between 2002/01/01 and 2003/01/01, inclusive.

      Note: Liferay's date fields are always formatted according to the value of the property index.date.format.pattern. The format used should be a sortable pattern. The default date format pattern used is yyyyMMddHHmmss. So, when comparing or searching by dates, this format must be used.

      You can also use ranges with non-date fields:

      title:{Aida TO Carmen}

      This will find all documents whose titles are between Aida and Carmen, but not including Aida and Carmen.

      Inclusive range queries are denoted by square brackets. Exclusive range queries are denoted by curly brackets.

      Note: Ranges can only be applied to single value fields.
    • Boolean Operators: Boolean operators allow terms to be combined through logical operations. The Search Portlet supports AND, "+", OR, NOT and "-" as Boolean operators.

      Note: Boolean operators must be ALL CAPS.
      • The OR operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the OR operator is used. The OR operator links two terms and finds a matching document if either of the terms exist in a document. This is equivalent to a union using sets. The symbol || can be used in place of the word OR.

        To search for documents that contain either "liferay portal" or just "liferay" use the query:

        "liferay portal" liferay

        or

        "liferay portal" OR liferay
      • The AND operator matches documents where both terms exist anywhere in the text of a single document. This is equivalent to an intersection using sets. The symbol && can be used in place of the word AND.

        To search for documents that contain "liferay portal" and "Apache Lucene" use the query:

        "liferay portal" AND "Apache Lucene"
      • The "+" or required operator requires that the term after the "+" symbol exist somewhere in a field of a single document.

        To search for documents that must contain "liferay" and may contain "lucene" use the query:

        +liferay lucene
      • The NOT operator excludes documents that contain the term after NOT. This is equivalent to a difference using sets. The symbol ! can be used in place of the word NOT.

        To search for documents that contain "liferay portal" but not "Apache Lucene" use the query:

        "liferay portal" NOT "Apache Lucene"

        Note: The NOT operator cannot be used with just one term. For example, the following search will return no results:

        NOT "liferay portal"
      • The "-" or prohibit operator excludes documents that contain the term after the "-" symbol.

        To search for documents that contain "liferay portal" but not "Apache Lucene" use the query:

        "liferay portal" -"Apache Lucene"
    • Grouping: Search supports using parentheses to group clauses to form sub queries. This can be very useful if you want to control the boolean logic for a query.

      To search for either "liferay" or "apache" and "website" use the query:

      (liferay OR apache) AND website

      This eliminates any confusion and makes sure that website must exist and either term liferay or apache may exist.
    • Field Grouping: Search supports using parentheses to group multiple clauses to a single field.

      To search for a title that contains both the word "return" and the phrase "pink panther" use the query:

      title:(+return +"pink panther")
    • Proximity Searches and Term Boosting are not supported.
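
The date format noted under Range Searches can be turned into query text with standard Java date formatting. The sketch below is only an illustration: the class and method names are hypothetical, the pattern is the default yyyyMMddHHmmss mentioned above, and UTC is chosen here purely to make the output deterministic (which time zone the portal uses when indexing is not covered in this post).

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper; not part of the Liferay API.
final class ModifiedRangeQuerySketch {

	// Default value of index.date.format.pattern as described above.
	static final String DEFAULT_PATTERN = "yyyyMMddHHmmss";

	// Builds an inclusive range clause such as
	// field:[<from> TO <to>] using the sortable date pattern.
	static String inclusiveRange(String field, Date from, Date to) {
		SimpleDateFormat format = new SimpleDateFormat(DEFAULT_PATTERN);

		// Assumption: UTC, so the same input always yields the same text.
		format.setTimeZone(TimeZone.getTimeZone("UTC"));

		return field + ":[" + format.format(from) + " TO " +
			format.format(to) + "]";
	}
}
```

The resulting string can be typed (or passed) into the search input box like any other range query.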

Portlet Configuration

[Updated: Oct 16 2012] Search portlet configurations are scoped to the page, which means that Search Portlets on different pages will have their own configurations; this also includes any instances of the portlet embedded in themes or other templates.

Display Settings:

  • Basic: This represents the most basic way of controlling the visible facets.

    Display Asset Type Facet: Display or not.
    Display Asset Tags Facet: Display or not.
    Display Asset Categories Facet: Display or not.
    Display Modified Range Facet: Display or not.
  • Advanced: This mode gives ultimate control over the display of facets and is where the true power lies in the Search Portlet. However, it is not for the faint of heart and requires creating a configuration in JSON format. (Future versions of Liferay will include a user friendly user interface for configuration of facets.)

    In its default configuration, the Search Portlet configuration would equate to the following JSON text:
    {"facets": [
    	{
    		"className": "com.liferay.portal.kernel.search.facet.AssetEntriesFacet",
    		"data": {
    			"frequencyThreshold": 1,
    			"values": [
    				"com.liferay.portlet.bookmarks.model.BookmarksEntry",
    				"com.liferay.portlet.blogs.model.BlogsEntry",
    				"com.liferay.portlet.calendar.model.CalEvent",
    				"com.liferay.portlet.documentlibrary.model.DLFileEntry",
    				"com.liferay.portlet.journal.model.JournalArticle",
    				"com.liferay.portlet.messageboards.model.MBMessage",
    				"com.liferay.portlet.wiki.model.WikiPage",
    				"com.liferay.portal.model.User"
    			]
    		},
    		"displayStyle": "asset_entries",
    		"fieldName": "entryClassName",
    		"label": "asset-type",
    		"order": "OrderHitsDesc",
    		"static": false,
    		"weight": 1.5
    	},
    	{
    		"className": "com.liferay.portal.kernel.search.facet.MultiValueFacet",
    		"data": {
    			"maxTerms": 10,
    			"displayStyle": "list",
    			"frequencyThreshold": 1,
    			"showAssetCount": true
    		},
    		"displayStyle": "asset_tags",
    		"fieldName": "assetTagNames",
    		"label": "tag",
    		"order": "OrderHitsDesc",
    		"static": false,
    		"weight": 1.4
    	},
    	{
    		"className": "com.liferay.portal.kernel.search.facet.MultiValueFacet",
    		"data": {
    			"maxTerms": 10,
    			"displayStyle": "list",
    			"frequencyThreshold": 1,
    			"showAssetCount": true
    		},
    		"displayStyle": "asset_tags",
    		"fieldName": "assetCategoryTitles",
    		"label": "category",
    		"order": "OrderHitsDesc",
    		"static": false,
    		"weight": 1.3
    	},
    	{
    		"className": "com.liferay.portal.kernel.search.facet.ModifiedFacet",
    		"data": {
    			"ranges": [
    				{
    					"range": "[past-hour TO *]",
    					"label": "past-hour"
    				},
    				{
    					"range": "[past-24-hours TO *]",
    					"label": "past-24-hours"
    				},
    				{
    					"range": "[past-week TO *]",
    					"label": "past-week"
    				},
    				{
    					"range": "[past-month TO *]",
    					"label": "past-month"
    				},
    				{
    					"range": "[past-year TO *]",
    					"label": "past-year"
    				}
    			],
    			"frequencyThreshold": 0
    		},
    		"displayStyle": "modified",
    		"fieldName": "modified", 
    		"label": "modified",
    		"order": "OrderHitsDesc",
    		"static": false,
    		"weight": 1.1
    	}
    ]}
    
    The base definition consists of a JSON object with a field of type array named "facets":
    {"facets": []}
    This array must contain elements (which in JSON are called Objects) having the following mandatory structure:
    {
    	"className": ...,
    	"data": ...,
    	"displayStyle": ...,
    	"fieldName": ...,
    	"label": ...,
    	"order": ...,
    	"static": ...,
    	"weight": ...
    }
    
    • "className": This field must contain a string value which is the FQCN (fully qualified class name) of a Java class implementing the Facet interface. Liferay provides the following implementations by default:
      "com.liferay.portal.kernel.search.facet.AssetEntriesFacet"
      "com.liferay.portal.kernel.search.facet.ModifiedFacet"
      "com.liferay.portal.kernel.search.facet.MultiValueFacet"
      "com.liferay.portal.kernel.search.facet.RangeFacet"
      "com.liferay.portal.kernel.search.facet.ScopeFacet"
      "com.liferay.portal.kernel.search.facet.SimpleFacet"
      
    • "data": This field takes an arbitrary JSON "Object" (a.k.a. {}) for use by a specific facet implementation. As such, there is no fixed definition of the data field. Each implementation is free to structure it as needed.
    • "displayStyle": This field takes a value of type string and represents a particular template implementation which is used to render the facet. These templates are normally JSP pages (but can also be implemented as Velocity or Freemarker templates provided by a theme if the portal property theme.jsp.override.enabled is set to true). The method of matching the string to a JSP is simply done by prefixing the string with /html/portlet/search/facets/ and appending the .jsp extension.

      e.g. "displayStyle": "asset_tags"

      maps to the JSP

      /html/portlet/search/facets/asset_tags.jsp

      Armed with this knowledge a crafty developer could create custom display styles by deploying custom (new or overriding) JSPs using a JSP hook.
    • "fieldName": This field takes a string value and indicates the indexed field on which the facet will operate.

      e.g. "fieldName": "entryClassName"

      indicates that the specified facet implementation will operate on the entryClassName indexed field.

      Note: You can identify available indexed fields by checking the Search Portlet's Display Results in Document Form configuration setting and then expanding individual results by clicking the [+] to the left of their title.
    • "label": This field takes a string value and represents the language key that will be used for localizing the title of the facet when rendered.
    • "order": This field takes a string value. There are two possible values:

      "OrderValueAsc": This tells the facet to sort its results by the term values, in ascending order.

      "OrderHitsDesc": This tells the facet to sort its results by the term frequency, in descending order.
    • "static": This field takes a boolean value (true or false). A value of true means that the facet should not actually be rendered in the UI. It also means that, rather than using inputs dynamically applied by the end user, it should use pre-set values (stored in its "data" field). This allows for the creation of a pre-configured result domain. The default value is false.

      Image Search Example: Imagine you would like to create a pre-configured search that returns only images (i.e. the indexed field "entryClassName" would be com.liferay.portlet.documentlibrary.model.DLFileEntry and the indexed field "extension" should contain one of bmp, gif, jpeg, jpg, odg, png, or svg). We would need two static facets, one with "fieldName": "entryClassName" and another with "fieldName": "extension". This could be represented using the following facet configuration:
      {
      	"displayStyle": "asset_entries",
      	"static": true,
      	"weight": 1.5,
      	"order": "OrderHitsDesc",
      	"data": {
      		"values": [
      			"com.liferay.portlet.documentlibrary.model.DLFileEntry"
      		],
      		"frequencyThreshold": 0
      	},
      	"className": "com.liferay.portal.kernel.search.facet.AssetEntriesFacet",
      	"label": "asset-type",
      	"fieldName": "entryClassName"
      },
      {
      	"displayStyle": "asset_entries",
      	"static": true,
      	"weight": 1.5,
         	"order": "OrderHitsDesc",
      	"data": {
      		"values": [
      			"bmp", "gif", "jpeg", "jpg", "odg", "png", "svg"
      		],
      		"frequencyThreshold": 0
         	},
      	"className": "com.liferay.portal.kernel.search.facet.MultiValueFacet",
      	"label": "images",
      	"fieldName": "extension"
      }
      
      			
    • "weight": This field takes a floating point (or double) value and is used to determine the ordering of the facets in the facet column of the search portlet. Facets are positioned with the largest values at the top (yes, it's counterintuitive and perhaps should be reversed in future versions).
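
The "displayStyle" and "weight" rules described above are simple enough to sketch in a few lines of plain Java. This is not portal code: the class and method names are hypothetical, and the only facts taken from the text are the /html/portlet/search/facets/ prefix, the .jsp suffix, and the largest-weight-first ordering.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the Liferay API.
final class FacetDisplaySketch {

	static final String FACETS_PATH = "/html/portlet/search/facets/";

	// Maps a displayStyle value to the JSP used to render the facet.
	static String jspPath(String displayStyle) {
		return FACETS_PATH + displayStyle + ".jsp";
	}

	// Returns the labels ordered as the facet column would show them:
	// largest weight first.
	static List<String> orderByWeight(String[] labels, double[] weights) {
		List<Integer> indexes = new ArrayList<>();

		for (int i = 0; i < labels.length; i++) {
			indexes.add(i);
		}

		indexes.sort((a, b) -> Double.compare(weights[b], weights[a]));

		List<String> ordered = new ArrayList<>();

		for (int index : indexes) {
			ordered.add(labels[index]);
		}

		return ordered;
	}
}
```

Fed the labels and weights of the default configuration shown earlier (1.5, 1.4, 1.3, 1.1), this puts asset-type at the top and modified at the bottom.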

Other Settings

  • Display Results in Document Form: This configuration, if checked, will display each result with an expandable section you can reach by clicking the [+] to the left of the result's title. In Document Form, all of the result's indexed fields will be shown in the expandable section. This is for use in testing search behavior.

    Note: Even if enabled, for security reasons this ability is only available to the portal Administrator role because the raw contents of the index may expose protected information.
  • View in Context: This configuration, if checked, will produce results which have links that target the first identifiable application to which the result is native.

    For example, a Blog entry title will link (or attempt to link) to a Blogs Admin, Blogs, or Blogs Aggregator portlet somewhere in the current site. The exact method of location is defined by the result type's AssetRenderer implementation.
  • Display Main Query: This configuration, if checked, will output the complete query that was used to perform the search. This will appear directly below the result area.

  • Display Open Search Results: In previous versions of the portal, the Search Portlet was implemented as a collection of com.liferay.portal.kernel.search.OpenSearch implementation classes which were executed in series. Due to the subsequent re-design of the Search Portlet, the portal itself no longer relies on these implementations for its primary search. However, third party plugin developers may yet have Open Search implementations which they would like to continue to use. This configuration, if checked, will enable the execution of these third party Open Search implementations and results will appear below the primary portal search.

    Note: It is highly recommended that third parties re-design their search code to implement com.liferay.portal.kernel.search.Indexer or more simply to extend com.liferay.portal.kernel.search.BaseIndexer. Thus it will be possible to aggregate custom assets with native portal assets.

For Developers

Key Classes

When implementing a customized search, many of the following API classes are important:

com.liferay.portal.kernel.search.SearchContext
com.liferay.portal.kernel.search.SearchContextFactory
com.liferay.portal.kernel.search.facet.config.FacetConfiguration
com.liferay.portal.kernel.search.facet.config.FacetConfigurationUtil
com.liferay.portal.kernel.search.facet.util.FacetFactoryUtil
com.liferay.portal.kernel.search.facet.Facet
com.liferay.portal.kernel.search.Indexer
com.liferay.portal.kernel.search.IndexerRegistryUtil
com.liferay.portal.kernel.search.BaseIndexer
com.liferay.portal.kernel.search.FacetedSearcher
com.liferay.portal.kernel.search.SearchEngineUtil
com.liferay.portal.kernel.search.Hits
com.liferay.portal.kernel.search.Document
com.liferay.portal.kernel.search.facet.collector.FacetCollector
com.liferay.portal.kernel.search.facet.collector.TermCollector

We'll briefly go through the general organization of the above to understand where each class fits into the greater scheme.

SearchContext

The first thing required is to set up a context within which to perform a search. The context defines things like the company instance to search, the current user invoking the search, etc. This task is handled by the com.liferay.portal.kernel.search.SearchContext class. Since this class has a wide variety of context properties to deal with, the most effective way to get one is to call the getInstance(HttpServletRequest request) method of the com.liferay.portal.kernel.search.SearchContextFactory class.

SearchContext searchContext = SearchContextFactory.getInstance(request);

Context Properties

Once you have a SearchContext instance, you can then populate values like the pagination style, start and end:

searchContext.setAttribute("paginationType", "more");
searchContext.setEnd(mainSearchSearchContainer.getEnd());
searchContext.setStart(mainSearchSearchContainer.getStart());

There are a number of other SearchContext properties that can be set. See the javadocs for a complete list.

Setting up Facets

After we have set up all the appropriate SearchContext properties, we are ready to add the Facets for which we want to collect information. We can add Facets either programmatically or through configuration. Programmatically adding facets allows the developer to tightly control how the search is used. The following example shows how to add two facets using some provided Facet classes:

Facet assetEntriesFacet = new AssetEntriesFacet(searchContext);
assetEntriesFacet.setStatic(true);
searchContext.addFacet(assetEntriesFacet);

Facet scopeFacet = new ScopeFacet(searchContext);
scopeFacet.setStatic(true);
searchContext.addFacet(scopeFacet);

Note: The above two Facet implementations are not re-usable in that they always operate on specific indexed fields: entryClassName, and groupId (and scopeGroupId) respectively. Other implementations can be re-used with any indexed fields as demonstrated previously in the Image Search Example.

As shown previously, facets can also be set up using a JSON definition. Using a JSON definition allows for the highest level of flexibility since the configuration can be changed at run-time. These definitions are parsed by the static method load(String configuration) on the com.liferay.portal.kernel.search.facet.config.FacetConfigurationUtil class. This method reads the JSON text and returns a list of com.liferay.portal.kernel.search.facet.config.FacetConfiguration instances.

List<FacetConfiguration> facetConfigurations = FacetConfigurationUtil.load(searchConfiguration);

for (FacetConfiguration facetConfiguration : facetConfigurations) {
	Facet facet = FacetFactoryUtil.create(searchContext, facetConfiguration);

	searchContext.addFacet(facet);
}

Facets as Filters

It should be noted that Facets are always created with reference to the SearchContext. Since facets also behave as the dynamic filter mechanism for narrowing search results, having the SearchContext allows a Facet implementation to observe and react to context changes, such as looking for specific parameters which affect its behavior.

Indexer Implementations

The next step involves obtaining a reference to an indexer implementation. The implementation obtained determines the type of results returned from the search.

With respect to searching, there are two categories of Indexer implementations: Asset Specific Searchers and Aggregate Searchers.

Asset Specific Searchers

As the name implies, Asset Specific Searchers always deal with only one specific type of asset. These are the implementations that are provided by developers when creating/designing custom Asset types. Liferay provides the following Asset Specific Searchers:

com.liferay.portal.plugin.PluginPackageIndexer
com.liferay.portlet.blogs.util.BlogsIndexer
com.liferay.portlet.bookmarks.util.BookmarksIndexer
com.liferay.portlet.calendar.util.CalIndexer
com.liferay.portlet.documentlibrary.util.DLIndexer
com.liferay.portlet.journal.util.JournalIndexer
com.liferay.portlet.messageboards.util.MBIndexer
com.liferay.portlet.softwarecatalog.util.SCIndexer
com.liferay.portlet.usersadmin.util.OrganizationIndexer
com.liferay.portlet.usersadmin.util.UserIndexer
com.liferay.portlet.wiki.util.WikiIndexer

A developer tells the portal about Indexer implementations by declaring them in their liferay-portlet.xml file.

<indexer-class>com.liferay.portlet.calendar.util.CalIndexer</indexer-class>

Any number of such implementations may be provided.

Obtaining a reference to an Asset Specific Indexer requires calling either the getIndexer(Class<?> clazz) or getIndexer(String className) method on the com.liferay.portal.kernel.search.IndexerRegistryUtil class.

Indexer indexer = IndexerRegistryUtil.getIndexer(PluginPackage.class);

Aggregate Searchers

Aggregate Searchers can return any of the asset types in the index according to the SearchContext and/or facet configuration. Liferay only provides a single aggregate searcher implementation:

com.liferay.portal.kernel.search.FacetedSearcher

Obtaining a reference to this searcher simply involves calling the static getInstance() method of the same class.

Indexer indexer = FacetedSearcher.getInstance();

Note: When implementing Indexers it is highly recommended to extend the com.liferay.portal.kernel.search.BaseIndexer class.

SearchEngineUtil

Internally each Indexer will make calls to the SearchEngineUtil which handles all the intricacies of the engine implementation. For the purpose of this document, we won't delve into the internals of SearchEngineUtil. Suffice it to say that all traffic to and from the search engine implementation passes through this class, so when debugging problems it is often beneficial to enable debug level logging on this class.

Performing the Search

Once an Indexer instance has been obtained, searches are performed by calling its search(SearchContext searchContext) method.

Hits & Documents

The result of the search method is an instance of the com.liferay.portal.kernel.search.Hits class.

Hits hits = indexer.search(searchContext);

This object contains any search results in the form of an array (or list) of com.liferay.portal.kernel.search.Document instances.

Document[] docs = hits.getDocs();

OR

List<Document> docs = hits.toList();

The results display typically involves iterating over this array. Each Document is effectively a hash map of the indexed fields and values.
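To make the "hash map of fields" idea concrete, here is a minimal sketch of such an iteration. It does not use the portal API: a plain Map stands in for each Document, and the class name HitsSketch, the helper doc(...) and the fallback labels are all hypothetical. The field names "title" and "entryClassName" are only illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: each Map plays the role of one search Document.
final class HitsSketch {

	// Builds one display line per document from two illustrative fields.
	static List<String> summarize(List<Map<String, String>> docs) {
		List<String> lines = new ArrayList<>();

		for (Map<String, String> doc : docs) {
			// A field that was never indexed for this document is simply absent.
			String title = doc.getOrDefault("title", "(untitled)");
			String type = doc.getOrDefault("entryClassName", "(unknown type)");

			lines.add(title + " [" + type + "]");
		}

		return lines;
	}

	// Convenience factory for a two-field document (assumption, not portal API).
	static Map<String, String> doc(String title, String entryClassName) {
		Map<String, String> fields = new LinkedHashMap<>();

		fields.put("title", title);
		fields.put("entryClassName", entryClassName);

		return fields;
	}
}
```

With the real API the loop body would read the same fields from each Document instead of from a Map.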

Facet Rendering

Facet rendering is done by getting Facets from the SearchContext after the search has completed and passing each to a template as defined by the FacetConfiguration:

Map<String, Facet> facets = searchContext.getFacets();
List<Facet> facetsList = ListUtil.fromCollection(facets.values());
facetsList = ListUtil.sort(
	facetsList, new PropertyComparator("facetConfiguration.weight", false, false));

for (Facet facet : facetsList) {
	if (facet.isStatic()) {
		continue;
	}

	FacetConfiguration facetConfiguration = facet.getFacetConfiguration();
	request.setAttribute("search.jsp-facet", facet);

%>

	<liferay-util:include page='<%= "/html/portlet/search/facets/" + facetConfiguration.getDisplayStyle() + ".jsp" %>' />

<%
}
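The ordering and skipping logic of that loop can be sketched without the portal classes. FacetStub and FacetOrderSketch below are hypothetical stand-ins that keep only the three properties the loop relies on. The sketch also assumes that the first false passed to PropertyComparator means "not ascending", i.e. the heaviest facet is rendered first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a facet and its configuration.
final class FacetStub {

	final String displayStyle;
	final double weight;
	final boolean isStatic;

	FacetStub(String displayStyle, double weight, boolean isStatic) {
		this.displayStyle = displayStyle;
		this.weight = weight;
		this.isStatic = isStatic;
	}
}

final class FacetOrderSketch {

	// Returns the template paths to include, heaviest facet first,
	// skipping static facets because they have no rendered view.
	static List<String> templatePaths(List<FacetStub> facets) {
		List<FacetStub> sorted = new ArrayList<>(facets);

		// Assumption: descending order by weight.
		sorted.sort(
			Comparator.comparingDouble((FacetStub f) -> f.weight).reversed());

		List<String> paths = new ArrayList<>();

		for (FacetStub facet : sorted) {
			if (facet.isStatic) {
				continue;
			}

			paths.add(
				"/html/portlet/search/facets/" + facet.displayStyle + ".jsp");
		}

		return paths;
	}
}
```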

Facet Details (Terms and Frequencies)

A Facet's details are obtained by calling its getFacetCollector() method, which returns an instance of the com.liferay.portal.kernel.search.facet.collector.FacetCollector class.

FacetCollector facetCollector = facet.getFacetCollector();

The primary responsibility of this class is to in turn provide access to TermCollector instances primarily by calling the getTermCollectors() method, but also by getting a TermCollector by term value using the getTermCollector(String term) method. There will be a TermCollector for each term that matches the search criteria, as well as the facet configuration.

List<TermCollector> termCollectors = facetCollector.getTermCollectors();

OR

TermCollector termCollector = facetCollector.getTermCollector(term);

And finally, the com.liferay.portal.kernel.search.facet.collector.TermCollector class provides the getFrequency() method.

<%= termCollector.getTerm() %> <span class="frequency">(<%= termCollector.getFrequency() %>)</span>
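As a rough illustration of what a facet view does with those values, the sketch below renders "term (frequency)" labels. It is not the portal API: a term to frequency Map stands in for the list of TermCollector instances, the class TermFrequencySketch is hypothetical, and treating an unknown term as frequency 0 is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: a term -> frequency map plays the role of the
// TermCollector list returned by a FacetCollector.
final class TermFrequencySketch {

	// Renders one "term (frequency)" label per collected term, in map order.
	static List<String> labels(Map<String, Integer> termFrequencies) {
		List<String> labels = new ArrayList<>();

		for (Map.Entry<String, Integer> entry : termFrequencies.entrySet()) {
			labels.add(entry.getKey() + " (" + entry.getValue() + ")");
		}

		return labels;
	}

	// Looks up a single term. Returning 0 for a missing term is an
	// assumption of this sketch, not documented portal behavior.
	static int frequencyOf(Map<String, Integer> termFrequencies, String term) {
		return termFrequencies.getOrDefault(term, 0);
	}
}
```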

Rendered facet views (i.e. those of non-static facets) should produce UI code that passes facet parameters dynamically for interpretation by the facet implementation (see Facets as Filters). There are a number of examples in the /html/portlet/search/facets folder of the Search Portlet.

============================================================

Well, I hope that was useful information.

At the core of all content management lies search, and so I'm really excited about the potential of this new search API. As we work on 6.2 and introduce even more innovative search features, I hope to see Liferay become the most feature-rich and extensible search integration platform on the market.

Blogs
This is awesome Ray!!! Thanks for the post
Just a couple of questions about the frameworks used: did we try to use Solr, or did we continue with Lucene and write our own facet search framework?
We have abstracted our own Facet framework so that we could build it more or less on any engine. For Lucene, facet support is implemented using the Bobo engine (http://code.google.com/p/bobo-browse/) and in the case of Solr we just use its features directly. In both cases we wrap the underlying technology with our own API. The idea is that you could plug any engine in back there and our front end APIs wouldn't need to change. The API is very simple for facet support.
That is a lot of information. I think I'm going to need to read it several times in order to hold it all in my head. Nice work. It looks like a lot of time and effort went into this post.
Ray I always have an eagerness for reading your posts and articles because I know all your articles are informative as well as knowledgeable..
Hi..i m new to liferay..i want to create search portlet using lucene..can u suggest steps for me..any sample code...

Thank you..
Hi Ray,
I wanna thank you about this post because it's so informative, but I have just a problem: when for example I am searching for a word, and results are in more than one page when I click to go to the next page (I click "More") I got error and no results appear ..
@Siva, the best sample code is the search portlet itself: https://github.com/liferay/liferay-portal/blob/master/portal-web/docroot/html/portlet/search/main_search.jspf

@Firas, is the error with the default search portlet or with custom code?
@Ray Augé:
the error is with the default search portlet in Liferay 6.1: and this is the error syntax:

12:06:15,737 ERROR [IncludeTag:154] com.liferay.portal.kernel.search.SearchException: java.lang.IllegalArgumentException: fromIndex(20) > toIndex(5)
at com.liferay.portal.kernel.search.FacetedSearcher.search(FacetedSearcher.java:106)
at org.apache.jsp.html.portlet.search.search_jsp._jspService(search_jsp.java:1073)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:432)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:390)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:334)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilterChain.doFilter(InvokerFilterChain.java:72)
at com.liferay.portal.kernel.servlet.filters.invoker.InvokerFilter.doFilter(InvokerFilter.java:70)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:684)
at org.apache.catalina.core.ApplicationDispatcher.doInclude(ApplicationDispatcher.java:593)
at org.apache.catalina.core.ApplicationDispatcher.include(ApplicationDispatcher.java:530)
at com.liferay.taglib.util.IncludeTag.include(IncludeTag.java:323)
at com.liferay.taglib.util.IncludeTag._doInclude(IncludeTag.java:418)
at com.liferay.taglib.util.IncludeTag.doEndTag(IncludeTag.java:92)
Caused by: java.lang.IllegalArgumentException: fromIndex(20) > toIndex(5)
at java.util.SubList.<init>(AbstractList.java:604)
at java.util.RandomAccessSubList.<init>(AbstractList.java:758)
Can you file a ticket for this which outlines the exact steps you used to reproduce it? I'll take a look as soon as I get a chance.
@ Ray:
http://issues.liferay.com/browse/LPS-26227
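For readers hitting the same trace: the message comes from java.util.List.subList, which rejects a start index greater than the end index. The sketch below reproduces that condition with a plain list and shows one defensive way to clamp a page window. PageWindowSketch and page(...) are hypothetical names, and this is not necessarily how the portal resolves the ticket above.

```java
import java.util.Arrays;
import java.util.List;

final class PageWindowSketch {

	// Returns the requested window, clamping both bounds to the list size
	// so that List.subList never sees start > end.
	static <T> List<T> page(List<T> results, int start, int end) {
		int size = results.size();

		int from = Math.max(0, Math.min(start, size));
		int to = Math.max(from, Math.min(end, size));

		return results.subList(from, to);
	}

	public static void main(String[] args) {
		List<Integer> results = Arrays.asList(1, 2, 3, 4, 5);

		try {
			// Start of the second page (20) with the end already cut down
			// to the 5 available results: subList rejects this.
			results.subList(20, 5);
		}
		catch (IllegalArgumentException iae) {
			System.out.println(iae.getMessage());
		}

		// The clamped version yields an empty page instead of failing.
		System.out.println(page(results, 20, 40));
	}
}
```

An empty page is only a safe fallback. The underlying question is still why the paginator offers a page beyond the real result count.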
Hi Ray,

I'm new to Liferay and need your advise.

I've created a hook to a search portlet and have a couple of questions:
1. The start page is view.jsp and conains only <liferay-ui:search />. how would I bypass it so it would go directly to main_search.jsp?

2. In my Document library I have categories Monthly Reports and Quarterly Reports. I want to place two links(like facets) with category names and when user clicks it would process the search only for selected category. How would you suggest to approach this task?

Or maybe i need to create a custom search portlet based on a search portlet. Is it a good way to copy all the code from custom portlet into my portlet and modify it?

Thank you in advance
Hi,

Any progress on this. I have modified liferay's default Blog portlet via EXT to meet up my requirement.
In control panel when I click on Blogs portlet and try to search for a Blog with say "test" then I can see the list of Blogs with proper pagination.
For example , for 44 entries of corresponding Blogs I can see 3 pages i.e 4 links:
1 2 3 Next
But when I follow same step for my customized Blog portlet then for same results I can see 5 links:
1 2 3 4 Next
Now when I click on 4, no display appears with backend error saying :
java.lang.IllegalArgumentException: fromIndex(60) > toIndex(44)

Can anybody help me to find the exact cause?

Thanks
Nice post, and timely too. I was just about to implement search in a project I'm working on. A couple of questions: you said that proximity search and term boosting aren't supported. Is that just in the search portlet, or in the underlying API as well?
Also, in the development part of the article your code refers to something called mainSearchContainer . What is it, and where'd you get it from? Thanks.
Hi Ray,

Thanks for the article. I'm trying to get a faceted search to work using an "AND" condition for the assetCategoryIds field, however it never returns any results. I enabled debug output on both the SearchEngineUtils and LuceneIndexSearcherImpl classes and the query string output is "+(+(companyId:10154) +(assetCategoryIds:(11703 AND 11804)) +createDate:[19700101000000 TO 20121231235959] +((+(entryClassName:com.liferay.portlet.journal.model.JournalArticle) +(status:0))))".

To test the lucene syntax, I also used the code below to produce a Query object and it works fine. Do you have any ideas on what might be happening? Thanks in advance for the help.

Query query = StringQueryFactoryUtil.create("+(+(companyId:10154) +(assetCategoryIds:(11703 AND 11804)) +createDate:[19700101000000 TO 20121231235959] +((+(entryClassName:com.liferay.portlet.journal.model.JournalArticle) +(status:0))))");

Hits hits = SearchEngineUtil.search(searchContext.getCompanyId(), query, QueryUtil.ALL_POS, QueryUtil.ALL_POS);
Try

+(+assetCategoryIds:11703 +assetCategoryIds:11804)

it's easier on the parser and means the same thing.
AND should be used more specifically like so:

assetCategoryIds:11703 AND assetCategoryIds:11804
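As a small sketch of the first form, the helper below builds the query fragment from a list of values, repeating the field name for every value. RequiredTermsSketch and allOf(...) are hypothetical names, and the sketch does no escaping of special query characters, so it assumes simple values such as numeric ids.

```java
import java.util.List;
import java.util.StringJoiner;

final class RequiredTermsSketch {

	// Builds "+(+field:v1 +field:v2 ...)", so every value is required
	// and the field name is repeated for each one.
	static String allOf(String field, List<String> values) {
		StringJoiner clauses = new StringJoiner(" ", "+(", ")");

		for (String value : values) {
			// No escaping is done here (assumption: plain values).
			clauses.add("+" + field + ":" + value);
		}

		return clauses.toString();
	}
}
```

Calling allOf("assetCategoryIds", Arrays.asList("11703", "11804")) yields the same string as the suggestion above.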
Hi Ray,

Thanks for the quick reply! I have used "+" instead of "AND", but still get the same result.

I'm not very familiar with lucene, but it seems like the problem is that based on the way I am setting the attribute values in the SearchContext object and have the facet defined, the objects that build the query string will only output the "assetCategoryIds" text just once. For example, the query string would look like +(assetCategoryIds:+11703 +11804) instead of +(+assetCategoryIds:11703 +assetCategoryIds:11804).

As noted above, I'm using the SearchContext object to set the attributes of the assetCategoryIds field, but I don't see a way to include multiple arguments into that one field with out concatenating them together myself. Something like the code below. I've also tried setting the andSearch property on the SearchContext with no success.

searchContext.setAttribute("assetCategoryIds", convertCategoryIdsToAndString(categoryIds));
Ray, just found this post after looking into Search results and facets in more detail for our customization work.

I have one question though? can we get the facet tags and categories localized?(displayed in the facets/asset_tags.jsp as well as under relevant search item).
When you have multi-lingual site it looks odd if the categories are in the base locale for the site. The categories are translated when displayed within the relevant portlet (eg document library).
I understand these are now facets in a search engine result list but the user isnt going to understand that,
Hi Ray,

I have a problem with create pre-configured to search only returns That tag "noticia"
The search returns all the content I type "com.liferay.portlet.journal.model.JournalArticle"

My code is this:

{facets: [
{className: 'com.liferay.portal.kernel.search.facet.AssetEntriesFacet',
data: {frequencyThreshold: 1,
values: [
'com.liferay.portlet.journal.model.JournalArticle'
]},
displayStyle: 'asset_entries',
fieldName: 'entryClassName',
label: 'asset-type',
order: 'OrderHitsDesc',
static: false,
weight: 1.5},
{className: 'com.liferay.portal.kernel.search.facet.MultiValueFacet',
data: {frequencyThreshold: 1,
values: [
'noticia'
]},
displayStyle: 'asset_entries',
fieldName: 'assetTagNames',
label: 'tag',
order: 'OrderHitsDesc',
static: false,
weight: 1.5},
{className: 'com.liferay.portal.kernel.search.facet.MultiValueFacet',
data: {displayStyle: 'list',
frequencyThreshold: 1,
maxTerms: 10,
showAssetCount: true},
displayStyle: 'asset_tags',
fieldName: 'assetTagNames',
label: 'tag',
order: 'OrderHitsDesc',
static: false,
weight: 1.4},
{className: 'com.liferay.portal.kernel.search.facet.MultiValueFacet',
data: {displayStyle: 'list',
frequencyThreshold: 1,
maxTerms: 10,
showAssetCount: true},
displayStyle: 'asset_tags',
fieldName: 'assetCategoryNames',
label: 'category',
order: 'OrderHitsDesc',
static: false,
weight: 1.3},
{className: 'com.liferay.portal.kernel.search.facet.RangeFacet',
data: {frequencyThreshold: 1,
ranges: [{label:'modified',
range:'[19700101000000 TO *]'}]},
displayStyle: 'modified',
fieldName: 'modified',
label: 'modified',
order: 'OrderHitsDesc',
static: false,
weight: 1.1}]}

Thank you very much.
Best regards.
It is possible that Liferay can index content of PDF documents so it can show up in the search ?
@Salvador, make your "noticia" facet static since you are forcing passing a single value.

@Amit, Liferay makes its best attempt to index PDFs automatically (it can't get content from a PDF filled only with scanned images, for instance).
Wow, incredible article! I would like to know what is the best way to let the user filter by asset type before hand, before perfoming the search. For example, the user could enter the word "water", and then, from a checkboxes list, choose the asset type to look into, for example, Blog entries. And that would search the word water in blog entries only.
It's true that the search portlet doesn't start out having performed a default search (a search with no keywords). If it did, you would see all the facets available, and you could select the asset type you want to search within and then add keywords. I think that addresses the scenario you are asking about. Perhaps it would be possible to add that behavior as a configuration option of the portlet. It wouldn't be hard. Can you make a feature request in JIRA (http://issues.liferay.com)?
Thanks Ray for the prompt reply. I just added a request: http://issues.liferay.com/browse/LPS-27514
Thank you very much Ray. Solved.

Facet to filter by tag:

{facets: [
{className: 'com.liferay.portal.kernel.search.facet.AssetEntriesFacet',
data: {frequencyThreshold: 1,
values: [
'com.liferay.portlet.journal.model.JournalArticle'
]},
displayStyle: 'asset_entries',
fieldName: 'entryClassName',
label: 'asset-type',
order: 'OrderHitsDesc',
static: false,
weight: 1.5},
{className: 'com.liferay.portal.kernel.search.facet.MultiValueFacet',
data: {frequencyThreshold: 1,
values: [
'noticia'
]},
displayStyle: 'asset_entries',
fieldName: 'assetTagNames',
label: 'tag',
order: 'OrderHitsDesc',
static: true,
weight: 1.5}]}

Best regards
Hi,

I would like to know how drill down can be allowed... Even I select a tag in result view, each tag is replaced when I choose another one.

Thanks for your answer
Yeah, this was deferred to a later version. Technically the backend code can handle any number of arguments per facet, but in order to get a first cut, simple to use UI we opted to limit it to only a single argument per facet. BUT, since it's only a limitation from the UI, you can easily create custom facet view template (jsp hook) that overrides the default and allows multiple selection per facet.

Can you open a feature request so we can track this for future versions?
Hi Ray,
I would like to know what is the best way to add a custom facet implementation.
I am trying to implement drilled-down search, and I need to produce an AND query for MultiValueFacet.
Looking at source, I believe that this can't be done, because termQuery has booleanClause hardcoded (added request http://issues.liferay.com/browse/LPS-28228).

Thanks in advance
Facets are for single fields only (both Bobo [our lucene facet impl] and Solr only have APIs for collecting facet on single fields at a time). But if you need to refine the underlying query so that it reflects some custom logic, then you can do that by passing an array of BooleanClauses to the searchContext before calling search in the FacetedSearcher.

SearchContext searchContext = .. // setup the context
Indexer indexer = FacetedSearcher.getInstance();
searchContext.setBooleanClauses(booleanClauses); // booleanClauses is a BooleanClause[] built beforehand
Hits hits = indexer.search(searchContext);

Note: An AND is simply a MUST "Occur" clause around a number of other Query instances (or Clauses).
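To illustrate the difference in meaning, here is a simplified in-memory sketch of how MUST and SHOULD behave against one multi-value field. It is not the portal or Lucene API: OccurSketch, its Occur enum and matches(...) are hypothetical, and real engines treat SHOULD in a more nuanced way when it is mixed with other clauses.

```java
import java.util.List;
import java.util.Set;

final class OccurSketch {

	// Simplified stand-in for the "occur" setting of a clause.
	enum Occur { MUST, SHOULD }

	// True when the document's multi-value field satisfies the wanted terms:
	// MUST requires every term (AND), SHOULD requires at least one (OR).
	static boolean matches(
		Set<String> fieldValues, List<String> wanted, Occur occur) {

		if (occur == Occur.MUST) {
			return fieldValues.containsAll(wanted);
		}

		for (String term : wanted) {
			if (fieldValues.contains(term)) {
				return true;
			}
		}

		return false;
	}
}
```

A document tagged only with topic2 matches the terms topic2 and COUNTRY under SHOULD but not under MUST.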
BTW, the "Multi" in the name MultiValueFacet is not to reflect how it is used with respect to the number of fields to collect data from, but rather to indicate the "type" of field it can be used with. In this case fields with "multiple" values.

In indexing engines, fields generally fall into one of two broad classes:
- single value fields (like a number, fixed string token)
- multi-value fields (like text, or arrays of values)

There are certain types of operations that can take place on each of those two classes of fields. For instance, you can't do a Range query on a multi-value field, you generally can't sort a result set on a multi-value field, single value fields must generally be exact matches (setting aside regex matching obviously), etc.
Hi Ray,
I have the requirements to pass multiple categories names while searching. But Faceted searcher is not giving the results for multiple categories.
For example, I pass the assetCategoryNames field value as comma separated (say topic2,COUNTRY). The full query that is built in the FacetedSearcher.java class looks like ----

"+(+(companyId:10154) +(assetCategoryNames:topic2 COUNTRY) +((+(entryClassName:com.liferay.portlet.bookmarks.model.BookmarksEntry)) (+(entryClassName:com.liferay.portlet.blogs.model.BlogsEntry)) (+(entryClassName:com.liferay.portlet.calendar.model.CalEvent)) (+(entryClassName:com.liferay.portlet.documentlibrary.model.DLFileEntry) +(status:0)) ..........".

Above if you see the part of query---" (assetCategoryNames:topic2 COUNTRY) ", you will notice that category names are coming but there is no boolean operator added by the system. I have gone through all the above threads, I found ur comments but could not get it completely as what needs to be done to select multiple categories at one time while searching.
If any booleanClause needs to be set in searchContext, then how, and what should it be?

Please let me know the solution if you are aware of this scenario.

Thanks in advance,
Himanshu Modi
Typo--(assetCategoryNames:topic2 assetCategoryNames:COUNTRY) is the correct query I'm getting.
The correct way to do that would be by passing a set of additional BooleanClauses to the searchContext.setBooleanClauses(BooleanClause[] clauses) method just before making the search. Facets are not designed for doing filtering, only for collecting metrics.
Hi Ray,
is there any way to integrate a custom portlet into the asset entries facet ?
My portlet has it's custom Indexer that is registered via liferay-portlet.xml and I added my model in the search-portlet's configuration. If I add an OpenSearch implementation the results are displayed at the bottom of the search, but I can't get it to work with the faceted search. What am I missing?
I figured it out: my Indexer extends BaseIndexer, but somehow not all keywords are set for the facet query to match my documents. I was missing the fields "COMPANY_ID" and "GROUP_ID", which are only set in BaseIndexer if your model is an instance of "AuditedModel", which apparently my model class is not.
Ok, great! I'm glad you managed to figure it out. Yes, AuditedModel is a helper interface around entities designed to support multi-tenancy.

AuditedModel interface will be automatically added to your Model when the entity definition contains the fields: companyId, createDate, modifiedDate, userId, userName.

Similarly, the GroupedModel interface provides support for scoping models to groups (a.k.a. Sites) and is applied if the entity is an AuditedModel + has the groupId field.

There are several other automatically applied interfaces derived from entity columns, like workflow, attached, resourced, etc.

Apparently this is a subject still in need of documentation.
Ray, thanks for your reply!
What would think of a new feature in the service builder definition that would automatically create these audit fields i.e. a new property "auditedModel=true/false" ?
That would be cool (JIRA feature request?).
Hi Ray,

A really useful article. Can you tell me if the same functionalities are available through web services or JSON requests (via portal-client.jar) ?

Thanks,
Denis.
Unfortunately not at the moment (JIRA feature request?).
Hi Ray,
I do understand your point about facet's design. However, as others have pointed out, there's a strong use case regarding drilled down search.
As of 6.1GA2, I don't see any possible solution which can be developed by using a jsp hook. We can't use multiple facets on the same term, and current implementation has BooleanClauseOccur.SHOULD hardcoded.
MultiValueFacet (btw thanks for explaining its name) does handle multiple terms, so why you say that ain't the right place to edit the query?

Look at @Himanshu case as example:
- requirement: filter search with multiple terms on the same element
- terms:topic2,COUNTRY
- current implementation produces (assetCategoryNames:topic2 assetCategoryNames:COUNTRY): result is OR between these clauses
- my proposal http://issues.liferay.com/browse/LPS-28228: let admin decide facet clause
example result with (AND -BooleanClauseOccur.MUST ) -> (assetCategoryNames:topic2 AND assetCategoryNames:COUNTRY)

I don't want to use a single facet to search on multiple fields, instead I need the option to produce an AND booleanQuery.
I'm already using this approach on a couple of projects, because of deadlines, but I'm more than open to suggestions on a better way to implement this. Your explanations are really valuable.

Thanks for your response, and sorry for my English..
Matteo
Thanks for this detailed blog about Faceted Search API.
I am having an issue. The search is returning articles with old articleID.

I have a portlet to add articles with articleID set to article title, later I change the portlet to delete all articles and add articles with articleID generated by Liferay.

Problem happened, the data returned is the old article created before. The article is still using title as articleID instead of system generated ID.

I tried to clear cache from Server Administration and tried to restart server several times and still the Faceted Search API returns old article. I checked the database and it is not there.

Where is the data cached?
The data is not cached so much as it may be that the indexes are out of sync with the real data (in the DB). Try reindexing the search engines (you can do that all at once via the Admin portlet, or individually by portlet through the plugins configurations portlet).
@Matteo, As I stated previously, you can use the searchContext.setBooleanClauses(BooleanClause[] clauses) method to add more filtering. Such clauses can implement whatever complex logic you wish to add without causing the facet collector to process for that data (which if all you want to do is filter, is really the wrong mechanism).
Hi Ray, Thank you for very informative and helpful blog.

I would like to add something more in detail here:
We sometime keep data in web content structure fields, and custom fields. Following are naming pattern of these fields:

Structure fields can be referred as:
"web_content/structure-field-name"
Here structure-field-name name of field in webcontent structure.

Custom fields can be referred as:
"expando/custom_fields/custom_field_key"
Here custom_field_key is key of custom field for entity.

These can be used in facet configuration with additional display style (via hook).
@Yogesh, that is correct! Thanks for pointing that out. We wanted to make sure any field in the index was accessible for facet collection, including custom fields, including document type fields which will be encoded a little differently as "ddm/<ddmStructureId>/<fieldName>". Sadly, it's true that the use of <ddmStructureId> is not ideal and there has already been discussion to perhaps provide an alternative that is more "usable" in future versions.
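The three naming patterns mentioned in this exchange can be collected in a small helper, sketched below. FacetFieldNames and its method names are hypothetical. Only the string patterns themselves come from the comments above, and the resulting name would go into the fieldName entry of a facet configuration.

```java
// Hypothetical helper collecting the field name patterns quoted above.
final class FacetFieldNames {

	// Web content structure field: "web_content/<structure-field-name>"
	static String structureField(String structureFieldName) {
		return "web_content/" + structureFieldName;
	}

	// Custom (expando) field: "expando/custom_fields/<custom_field_key>"
	static String customField(String customFieldKey) {
		return "expando/custom_fields/" + customFieldKey;
	}

	// Document type field: "ddm/<ddmStructureId>/<fieldName>"
	static String ddmField(long ddmStructureId, String fieldName) {
		return "ddm/" + ddmStructureId + "/" + fieldName;
	}
}
```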
Hi Ray,

Is there a way to exclude certain fields? Here it only gives option to specify fields and values that are to be searched. What about the scenario : search all the facets except for one facet. Or search all except fieldX with valueY? I suppose this is only possible for now with PostProcessorHook?
A facet by its very nature can only search one field.. so I'm not clear on that question.

It's also possible to apply an array of QueryClauses on the SearchContext to filter the results. This is how I would implement the: "Or search all except fieldX with valueY?" req. But that would require a hook at the moment.
Hi Ray, Thank you for very informative blog.

Question:
suppose that there are two categories / tags with same name in Global group and current group; how could we distinguish them in facets by names? is it better to use category ID / tag ID?
Jonas, categoryId is better in this case.
Hi Ray, I'm new to Liferay

in my custom theme I put search portlet at runtime with the following code:

$velocityPortletPreferences.setValue("portlet-setup-show-borders","false")
$velocityPortletPreferences.setValue("advancedConfiguration","true")
$velocityPortletPreferences.setValue("searchConfiguration","'facets':[{'displayStyle':'asset_entries','weight':1.5,'s[......]")
$theme.runtime("3_INSTANCE_kw01","",$velocityPortletPreferences.toString())
$velocityPortletPreferences.reset()

where in "searchConfiguration", I insert a new line to make sure that the portlet can search a custom entity .

But when in the portal I push the search command, these configurations are not observed.
Ok, I think I see the problem. First, the search portlet is not instanceable, which means you can remove the "_INSTANCE_kw01" portion of the portletId. Secondly, because the portlet is not instanceable, you have to use a different technique to set its preferences. See this gist https://gist.github.com/4287391 (there are actually 2 different preferences, the preferences, and the setup). ;)
Hi Ray, This info. is really good. i have some requirement but i don't know how to do this. My requirement is "suppose with search portlet there is 3 radio button 1st company tag, 2nd is for message board and 3rd for site. after selecting any radio button where and which value i need set or what code modification i need to do."

Thank you in Advance
Ray,
Is putting the default search configuration in a custom theme (in the link you posted the portal_normal.vm file) the preferred way to change the default search configuration for all search portlets? I couldn't get it to work and was wondering if there's a better way. I'm also trying to add a custom entity to the advanced search configuration for all search portlets in my app.
Thanks.