Creating a Google Like Search Part III: Autocompletion / Suggestions

(Previous parts of this series can be found here (part 1) and here (part 2).)

This time we add an autocomplete / keyword suggester to the search field, as well as query suggestions with an automatic alternative search for queries that don’t give any results.

First, a few words about semantics and definitions. Autocompletion, keyword suggestions, query suggestions and spell checking are often used interchangeably in everyday speech, so before getting to the task I’d like to draw a slight distinction between these terms. Autocompletion, in my interpretation, is an inline completion component, like the predictive text input on mobile phones: a UI feature. Suggestions are what an autocomplete component offers, and the challenge is getting them right. Keyword suggestions are terms or phrases offered to you as a list while you type. Query suggestions are alternatives offered to you when your search doesn’t give any results. Spell checking tends to work like keyword suggestions, but its sole purpose is to check spelling. From the UI perspective they all have a lot in common, of course.

How do suggesters work?

There are multiple approaches to building a suggester.

Probably the easiest and most manageable approach is to use self-defined dictionaries. This is possible with Liferay even with the standard search portlet. If you take a look at portal.properties, you can see this:

#index.search.query.suggestion.dictionary[en_US]=\
     com/liferay/portal/search/dependencies/querysuggestions/en_US.txt

If you define these language-bound dictionaries and enable query indexing there, queries get indexed automatically to the querySuggestion Elasticsearch type and are ready to be used as query suggestions in the standard Liferay search portlet.

Another simple approach would be to run searches as you type and return suggestions to the UI in the preferred form. Obviously, this would probably kill the Elasticsearch servers under heavy traffic unless you had a good caching mechanism and proper delays between requests.
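The caching and delay mentioned above can be sketched as follows. This is a minimal illustration and not code from the GSearch module: createSuggestionFetcher, fetchFn and the default values are hypothetical names chosen for the example, and fetchFn is assumed to return a Promise that resolves to an array of suggestions.

```javascript
// Hypothetical helper: wraps a suggestion lookup with a cache and a delay.
// fetchFn(keywords) is assumed to return a Promise resolving to an array.
function createSuggestionFetcher(fetchFn, delayMs = 150, maxCacheSize = 100) {
	const cache = new Map();
	let timer = null;
	let pendingResolve = null;

	return function fetchSuggestions(keywords) {
		// Serve repeated inputs from the cache without touching the backend.
		if (cache.has(keywords)) {
			return Promise.resolve(cache.get(keywords));
		}

		// A newer keystroke supersedes the waiting one: cancel its timer
		// and resolve it with an empty list.
		if (timer !== null) {
			clearTimeout(timer);
			pendingResolve([]);
		}

		return new Promise((resolve, reject) => {
			pendingResolve = resolve;
			timer = setTimeout(() => {
				timer = null;
				pendingResolve = null;
				fetchFn(keywords).then((suggestions) => {
					// Evict the oldest entry when the cache is full.
					if (cache.size >= maxCacheSize) {
						cache.delete(cache.keys().next().value);
					}
					cache.set(keywords, suggestions);
					resolve(suggestions);
				}, reject);
			}, delayMs);
		});
	};
}
```

With a wrapper like this, only the last keystroke within the delay window reaches the search backend, and a phrase that has already been looked up is answered from memory.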

The approach we are using here for both keyword and query suggestions, and what Liferay offers out of the box, is query indexing. You make a successful search, and if the result count reaches a defined threshold value, the query gets indexed to the querySuggestion type in Elasticsearch. Google works on the same principle, though certainly with a lot of other intelligence and filtering involved in providing those suggestions to the UI.

The best thing about this option is manageable relevancy. You can configure (in the portlet configuration) the threshold at which queries get indexed. You could, for example, decide that a search phrase returning only 2 results is not relevant enough to become a suggestion. Also, as the suggestions live in a dedicated index type (or in a dedicated index), you can do index-level tuning there to adjust analyzers and improve the relevancy for your use case. In this exercise we are using the standard Liferay QueryIndexer with standard index settings, but I’ll revisit this topic in the coming parts to show some options for improving the relevancy of the suggestions.
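The threshold principle can be illustrated with a small in-memory model. This is only a sketch of the idea, not the Liferay QueryIndexer: the SuggestionStore class and its method names are hypothetical, and a real setup stores the queries in Elasticsearch rather than in a map.

```javascript
// Hypothetical in-memory stand-in for the query suggestion index.
class SuggestionStore {
	constructor(indexingThreshold) {
		this.indexingThreshold = indexingThreshold;
		// One set of indexed queries per language, e.g. 'en_US'.
		this.queriesByLanguage = new Map();
	}

	// Records a query only if the search produced enough hits.
	// Returns true when the query was accepted.
	recordSearch(languageId, keywords, hitCount) {
		if (hitCount < this.indexingThreshold) {
			return false;
		}
		if (!this.queriesByLanguage.has(languageId)) {
			this.queriesByLanguage.set(languageId, new Set());
		}
		this.queriesByLanguage.get(languageId).add(keywords.trim().toLowerCase());
		return true;
	}

	// Naive prefix lookup; a real suggester would rank and fuzzy-match.
	suggest(languageId, prefix) {
		const queries = this.queriesByLanguage.get(languageId) || new Set();
		const needle = prefix.trim().toLowerCase();
		return [...queries].filter((query) => query.startsWith(needle)).sort();
	}
}
```

With a threshold of 3, a query that returned 12 hits would be stored and later offered for a matching prefix, while a misspelled query that returned 2 hits would be ignored.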

Building a good keyword suggester is by no means a trivial task, and different scenarios would probably need different kinds of fine-tuning. The solution created here is just a starting point that you can build on.

About the Solution

There are a few things to keep in mind in this solution.

First, the suggestions are not persisted; they live only in the index. If you reindex, they are lost and the portal has to start learning suggestions again. With time, however, they’ll get back to the same level.

Suggestions are language bound. If you have a multilingual portal, users of different languages do not share the same suggestions.

Suggestions management: the solution here doesn’t do any filtering, at least not yet. If you search with an inappropriate search phrase and the result count reaches the threshold, the phrase gets indexed.

Query Indexing

How do the queries get indexed? In this solution I’m bypassing much of the automation that using SearchContext and FacetedSearcher brings, to get better low-level control of what is happening. That’s one of the reasons why I had to trigger query indexing myself, although still using the standard com.liferay.portal.search.internal.hits.QueryIndexingHitsProcessor.

If you are already worried about this app messing up Liferay indexing: I’m only working around hits processors and query indexing, which are bound to the search interface.

The query indexer processor is triggered after the search results have been returned, in the getResults() method of the fi.soveltia.liferay.gsearch.web.search.internal.GSearchImpl service:

_queryIndexerProcessor.process(searchContext, _gSearchDisplayConfiguration, _queryParams, hits);

Processing happens in QueryIndexerProcessorImpl service:


@Override
public boolean process(
	SearchContext searchContext,
	GSearchDisplayConfiguration gSearchDisplayConfiguration,
	QueryParams queryParams, Hits hits)
	throws Exception {

	if (_log.isDebugEnabled()) {
		_log.debug("Processing QueryIndexer");
	}

	if (!gSearchDisplayConfiguration.enableQuerySuggestions() &&
		!gSearchDisplayConfiguration.enableAutoComplete()) {
		return true;
	}

	if (_log.isDebugEnabled()) {
		_log.debug("QueryIndexer is enabled");
	}
	
	if (hits.getLength() >= gSearchDisplayConfiguration.queryIndexingThreshold()) {

		if (_log.isDebugEnabled()) {
			_log.debug("QueryIndexing threshold exceeded. " +
				"Indexing keywords: " + queryParams.getKeywords());
		}

		addDocument(
			queryParams.getCompanyId(), queryParams.getKeywords(),
			queryParams.getLocale());
	} else {
		if (_log.isDebugEnabled()) {
			_log.debug("QueryIndexing threshold wasn't exceeded." +
                                 " Not indexing keywords.");
		}
	}
	return true;
}

Keyword Suggester

The easiest part of building the keyword suggester was adding the autocompletion functionality to the search field. I’m using my own override of the Metal.js autocomplete class here, but the override only removes the SPACE key from the keys that select a suggestion.

The autocomplete class GSearchAutocomplete:


import Autocomplete from 'metal-autocomplete/src/Autocomplete';

const DOWN = 40;
const ENTER = 13;
const UP = 38;

/*
 * GSearch autocomplete component extending Metal.JS autocomplete.
 */
class GSearchAutocomplete extends Autocomplete {

	/**
	 * This is an override for the original Metal.js Autocomplete.
	 * It simply removes SPACE from the select keys.
	 *
	 * @param {!Event} event
	 * @protected
	 */
	handleKeyDown_(event) {
		
		if (this.visible) {
			switch (event.keyCode) {
				case UP:
					this.activateListItem_(this.decreaseIndex_());
					event.preventDefault();
					break;
				case DOWN:
					this.activateListItem_(this.increaseIndex_());
					event.preventDefault();
					break;
				case ENTER:
					this.handleActionKeys_();
					event.preventDefault();
					break;
			}
		}
	}
}
export default GSearchAutocomplete;

Binding the autocomplete component to the search field is done in the search fields component class GSearchFields.es.js:


initAutocomplete() {
	
	let _self = this;
	
	let autocomplete = new GSearchAutocomplete ({
		elementClasses: 'gsearch-autocomplete-list',
		inputElement: document.querySelector('#' + this.portletNamespace + 'SearchField'),
		data: function(keywords) {
			if (keywords.length >= _self.getQueryParam('queryMinLength') &&
				!_self.isSuggesting && keywords.slice(-1) !== ' ') {
				return _self.getSuggestions(keywords);
			} else {
				return;
			}
		},
		select: function(keywords, event) {
			$('#' + _self.portletNamespace + 'SearchField').val(keywords.text);
		}
	});
}

The autocomplete request is made in getSuggestions():


getSuggestions(keywords) {

	// Set this flag to manage concurrent suggest requests (delay between requests).
	
	this.isSuggesting = true;
	
	let _self = this;
	
	let params = new MultiMap();
	
	params.add(this.portletNamespace + 'q', keywords);
	
	return Ajax.request(
		this.suggestionsURL,
		'GET',
		null,
		null,
		params,
		this.requestTimeout
	).then((response) => {
		let suggestions = JSON.parse(response.responseText);

		_self.releaseSuggesting();

		return suggestions;

	}).catch(function(error) {

		_self.releaseSuggesting();

		console.log(error);
	});
}

I added a constant delay of 150 ms between subsequent requests. You can change it to your liking in the class constants.
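The releaseSuggesting() method itself is not shown in this post. As a rough sketch of how such a flag-and-delay gate could work, here is a hypothetical stand-alone version: the SuggestThrottle class, the tryAcquire() method and the SUGGEST_DELAY_MS constant are assumptions made for the example, not the actual GSearch code.

```javascript
// Assumed constant; the post mentions a 150 ms delay between requests.
const SUGGEST_DELAY_MS = 150;

// Hypothetical stand-in for the relevant part of the search field component.
class SuggestThrottle {
	constructor(delayMs = SUGGEST_DELAY_MS) {
		this.delayMs = delayMs;
		this.isSuggesting = false;
	}

	// Returns true if a request may start now and blocks further ones.
	tryAcquire() {
		if (this.isSuggesting) {
			return false;
		}
		this.isSuggesting = true;
		return true;
	}

	// Called when a request finishes; reopens the gate after the delay.
	// The returned Promise resolves once the gate is open again.
	releaseSuggesting() {
		return new Promise((resolve) => {
			setTimeout(() => {
				this.isSuggesting = false;
				resolve();
			}, this.delayMs);
		});
	}
}
```

While the flag is set, the data callback simply skips the request, so at most one suggestion request is in flight and the next one can start no sooner than the delay after the previous one finished.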

The autocomplete resource URL is put into the SOY template context in fi.soveltia.liferay.gsearch.web.portlet.action.ViewMVCRenderCommand:


template.put(
	GSearchWebKeys.SUGGESTIONS_URL,
	createResourceURL(renderResponse, GSearchResourceKeys.GET_SUGGESTIONS));

The next thing to do was to implement a resource command action, fi.soveltia.liferay.gsearch.web.portlet.action.GetSuggestionsMVCResourceCommand. In that class I’m injecting the suggester service:


@Reference
protected GSearchKeywordSuggester _gSearchSuggester;

...and doing the suggestions:


JSONArray response = null;

try {
	response = _gSearchSuggester.getSuggestions(
		resourceRequest,
		_gSearchDisplayConfiguration);
}
catch (Exception e) {

	_log.error(e, e);

	return;
}

The real work is then done in the fi.soveltia.liferay.gsearch.web.search.internal.suggest.GSearchKeywordSuggesterImpl service.

For the suggestions I chose the phrase suggester, to be able to suggest complete search phrases. Other options include the TermSuggester, which gives its results as an array of single terms, and the aggregate suggester, which allows you to combine different kinds of suggestions.
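For orientation, this is roughly what an Elasticsearch phrase suggester request body looks like. The sketch builds it as a plain object. The function name buildPhraseSuggestRequest and the suggester name keywordSuggestions are made up for the example, the field name follows the keywordSearch_{LANGUAGE_ID} pattern of the standard Liferay mapping, and the parameter values are placeholders rather than the module's defaults. In GSearch the request is produced through the Liferay suggester API, not by hand.

```javascript
// Builds the body of an Elasticsearch _search request that only asks for
// phrase suggestions. All values are illustrative placeholders.
function buildPhraseSuggestRequest(keywords, languageId, options = {}) {
	const { size = 5, confidence = 1.0, maxErrors = 1.0 } = options;

	return {
		size: 0, // no regular hits, suggestions only
		suggest: {
			text: keywords,
			keywordSuggestions: {
				phrase: {
					field: 'keywordSearch_' + languageId,
					size: size,
					confidence: confidence,
					max_errors: maxErrors
				}
			}
		}
	};
}
```

Calling buildPhraseSuggestRequest('liferay serach', 'en_US', { confidence: 0.1 }) would produce a body that targets the keywordSearch_en_US field with a lowered confidence level.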

Query suggestions

For the query suggestions, i.e. search phrase suggestions offered after getting no results (or fewer than the configured threshold), I again used a slightly customized version of the standard com.liferay.portal.search.internal.hits.QuerySuggestionHitsProcessor, which uses the same phrase suggester as the search field suggester. For the same reasons mentioned in the Query Indexing section, suggestions are processed manually in fi.soveltia.liferay.gsearch.web.search.internal.query.processor.QuerySuggestionsProcessorImpl.

What the QuerySuggestionsProcessor service does is basically find viable alternative search queries, run an alternative search based on the first of those and additionally, if configured, offer the other possible alternatives to the UI:


public boolean process(
	PortletRequest portletRequest, SearchContext searchContext,
	GSearchDisplayConfiguration gSearchDisplayConfiguration,
	QueryParams queryParams, Hits hits)
	throws Exception {

	if (_log.isDebugEnabled()) {
		_log.debug("Processing QuerySuggestions");
	}
	
	if (!gSearchDisplayConfiguration.enableQuerySuggestions()) {
		return true;
	}

	if (_log.isDebugEnabled()) {
		_log.debug("QuerySuggestions are enabled.");
	}
		
	if (hits.getLength() >= gSearchDisplayConfiguration.
             querySuggestionsHitsThreshold()) {
		
		if (_log.isDebugEnabled()) {
			_log.debug("Hits threshold was exceeded. Returning.");
		}

		return true;
	}

	if (_log.isDebugEnabled()) {
		_log.debug("Below threshold. Getting suggestions.");
	}
	
	// Have to put keywords here to searchcontext because
	// suggestKeywordQueries() expects them to be there

	searchContext.setKeywords(queryParams.getKeywords());

	if (_log.isDebugEnabled()) {
		_log.debug("Original keywords: " + queryParams.getKeywords());
	}
	
	// Get suggestions
	
	String[] querySuggestions = _gSearchSuggester.
          getSuggestionsAsStringArray(portletRequest, gSearchDisplayConfiguration);

	querySuggestions =
		ArrayUtil.remove(querySuggestions, searchContext.getKeywords());

	if (_log.isDebugEnabled()) {
		_log.debug("Query suggestions size: " + querySuggestions.length);
	}
	
	// Do alternative search based on suggestions (if found)
	
	if (ArrayUtil.isNotEmpty(querySuggestions)) {

		if (_log.isDebugEnabled()) {
			_log.debug("Suggestions found.");
		}
		
		// New keywords is plainly the first in the list.

		queryParams.setOriginalKeywords(queryParams.getKeywords());

		if (_log.isDebugEnabled()) {
			_log.debug("Using querySuggestions[0] for alternative search.");
		}

		queryParams.setKeywords(querySuggestions[0]);
		
		Query query = _queryBuilder.buildQuery(portletRequest, queryParams);

		BooleanClause booleanClause = BooleanClauseFactoryUtil.create(
			query, BooleanClauseOccur.MUST.getName());

		searchContext.setBooleanClauses(new BooleanClause[] {
			booleanClause
		});

		Hits alternativeHits =
			_indexSearcherHelper.search(searchContext, query);
		hits.copy(alternativeHits);
	}

	hits.setQuerySuggestions(querySuggestions);

	return true;
}

About the Configuration Options

Configuration options matter a lot for the suggestions. I added links to the related Elasticsearch documentation in the configuration options, but I’ll say it here already: the confidence level is especially important. Basically, the lower you set it, the more suggestions you get, but the margin of error grows.
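To make the effect of the confidence level concrete: according to the Elasticsearch documentation, the phrase suggester keeps only candidates that score higher than the score of the input phrase multiplied by the confidence value, and a value of 0.0 returns the top candidates regardless. The snippet below mimics that rule on made-up scores. It is not Elasticsearch code, and filterByConfidence is a hypothetical helper.

```javascript
// Mimics the documented confidence rule on plain numbers.
// candidates: [{ text, score }], inputScore: score of the phrase as typed.
function filterByConfidence(candidates, inputScore, confidence) {
	const threshold = inputScore * confidence;

	return candidates
		// With confidence 0 every candidate passes; otherwise a candidate
		// must beat the scaled score of the original input.
		.filter((candidate) => confidence === 0 || candidate.score > threshold)
		.sort((a, b) => b.score - a.score)
		.map((candidate) => candidate.text);
}
```

With an input score of 0.5 and candidates scoring 0.8 and 0.3, a confidence of 1.0 keeps only the first candidate, while 0.5 lowers the bar to 0.25 and lets both through.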

The code, again, can be found on GitHub: https://github.com/peerkar/liferay-gsearch. Please see the Requirements section in the Readme for what this module needs in order to work.


Hi

I tried to implement autocomplete suggestions using the Elasticsearch engine, following this post.

Step 1:

I indexed my keyword "news" with the code below when the number of hits was above my threshold:

_indexWriterHelper.indexKeyword(companyId,"news",0,SuggestionConstants.TYPE_QUERY_SUGGESTION, locale);

Step 2:

On checking the Elasticsearch server, I found that my keyword was indexed as below:

url:http://localhost:9200/liferay-20115/_search?pretty&q=news

"hits" : [ {
	"_index" : "liferay-20115",
	"_type" : "querySuggestion",
	"_id" : "20115_spellCheckWord_6RitQgdCR1qG3k8CzKTjdw==",
	"_score" : 1.9425526,
	"_source" : {
		"uid" : "20115_spellCheckWord_6RitQgdCR1qG3k8CzKTjdw==",
		"companyId" : "20115",
		"groupId" : "0",
		"keywordSearch_en_US" : "news",
		"priority" : "0.0",
		"spellCheckWord" : "true"
	}

Step 3:

Then I tried to find the indexed keyword with the code below:

field = "keywordSearch_en_US";
keyword = "news";

TermSuggester termSuggester = new TermSuggester("termSuggester", field, keyword);

/** Method 1 using QuerySuggester of com.liferay.portal.kernel.search.suggest.QuerySuggester */

SuggesterResults suggesters1 = _querySuggester.suggest(searchContext, termSuggester);

Collection<SuggesterResult> suggesterResults = suggesters1.getSuggesterResults();

if (suggesterResults != null) {
	for (SuggesterResult suggesterResult : suggesterResults) {
		for (Entry entry : suggesterResult.getEntries()) {
			for (Option option : entry.getOptions()) {
				if (!suggestions.contains(option.getText())) {
					suggestions.add(option.getText());
				}
			}
		}
	}
}

/** Method 2 using indexSearcher directly of com.liferay.portal.kernel.search.IndexSearcher */

SearchEngine searchEngine = SearchEngineHelperUtil.getSearchEngine(searchContext.getSearchEngineId());
IndexSearcher indexSearcher = searchEngine.getIndexSearcher();

SuggesterResults suggesters2 = indexSearcher.suggest(searchContext, termSuggester);

suggesterResults = suggesters2.getSuggesterResults();

if (suggesterResults != null) {
	for (SuggesterResult suggesterResult : suggesterResults) {
		for (Entry entry : suggesterResult.getEntries()) {
			for (Option option : entry.getOptions()) {
				if (!suggestions.contains(option.getText())) {
					suggestions.add(option.getText());
				}
			}
		}
	}
}

 

But in both methods (Method 1 and Method 2), entry.getOptions() returns an empty list.

I am not sure what the value of field should be when initializing the termSuggester.

Thanks in advance for the response.

Hi Aastha and sorry for the delay!  

The field you should be querying is keywordSearch_{LANGUAGE_ID} if you are using the standard mapping, but as this is an "old" post, I'd like to ask:

- Which portal and ES versions are you using (seems to be 6.1)? Embedded or standalone?

- If you are using GSearch, which version of the Core?

- Are you using the standard Liferay ES adapter or the custom one?

 

There can be a couple of reasons other than the field name for empty results, so I can only try to give some hints (before knowing your app versions):

- If you are using standalone ES, have you checked the log? If you were querying an unmapped field or there was a syntactical error etc. you should get an error.

- If you are using LR 7.1, there was a problem in the standard adapter. See workaround here: https://github.com/peerkar/liferay-portal/blob/fbdbd757a017a7f8ea8ae3e4f23a43ff6fe924fe/modules/apps/portal-search-elasticsearch6/portal-search-elasticsearch6-impl/src/main/java/com/liferay/portal/search/elasticsearch6/internal/suggest/AggregateSuggesterTranslatorImpl.java#L61

I've only implemented the Phrase and Completion suggesters there, but if you haven't done so already, please take a look at the current implementation in GSearch: https://github.com/peerkar/liferay-gsearch/blob/master/gsearch-core-impl/src/main/java/fi/soveltia/liferay/gsearch/core/impl/suggest/GSearchKeywordSuggesterImpl.java

and the corresponding default configuration:

https://github.com/peerkar/liferay-gsearch/blob/master/gsearch-core-impl/src/main/resources/configs/fi.soveltia.liferay.gsearch.core.impl.configuration.KeywordSuggesterConfiguration.config

I hope this helps, 

Petteri