Discussion Forums

Case insensitive search in elasticsearch

Charalampos Chrysikopoulos, modified 6 years ago.

Case insensitive search in elasticsearch

Junior Member Posts: 79 Join Date: 09/12/11
Hello,

I am trying to run a search query in Liferay DXP with the embedded Elasticsearch, and my problem is that the search is not case-insensitive.

The search is done on web content of a specific structure and template, and on specific fields of that structure. I tried to apply my knowledge from 6.2 (with Solr), and it works fine, except for the case sensitivity.

My code looks like this:

    BooleanQuery fullQuery = BooleanQueryFactoryUtil.create(searchContext);

    // Restrict results to approved, "head" content of the given group
    BooleanQuery searchQuery = BooleanQueryFactoryUtil.create(searchContext);
    searchQuery.addRequiredTerm(Field.STATUS, WorkflowConstants.STATUS_APPROVED);
    searchQuery.addRequiredTerm(Field.GROUP_ID, groupId);
    searchQuery.addRequiredTerm("head", Boolean.TRUE);
    if (Validator.isNotNull(searchContext.getEntryClassNames()) && searchContext.getEntryClassNames().length > 0) {
        searchQuery.addRequiredTerm(Field.ENTRY_CLASS_NAME, searchContext.getEntryClassNames()[0]);
    }

    // Optionally restrict to a specific DDM structure and template
    if (Validator.isNotNull(ddmStructureKey)) {
        searchQuery.addRequiredTerm("ddmStructureKey", ddmStructureKey);
    }

    if (Validator.isNotNull(ddmTemplateKey)) {
        searchQuery.addRequiredTerm("ddmTemplateKey", ddmTemplateKey);
    }

    // Keyword clause: the field must match either the keyword itself
    // or the keyword surrounded by wildcards
    BooleanQuery keywordQuery = new BooleanQueryImpl();
    WildcardQuery wildcardKeywordQuery = new WildcardQueryImpl(searchFieldName, keyword);
    WildcardQuery wildcardStarKeywordQuery = new WildcardQueryImpl(searchFieldName, "*" + keyword + "*");

    keywordQuery.add(wildcardKeywordQuery, BooleanClauseOccur.SHOULD);
    keywordQuery.add(wildcardStarKeywordQuery, BooleanClauseOccur.SHOULD);

    searchQuery.add(keywordQuery, BooleanClauseOccur.MUST);


So I expected that searching "searchFieldName" for the keyword "THI", where the field contains the string "This is the string I am looking for", would return at least the web content with this string. But I only get this result when searching for "Thi".

I need to search with a wildcard query.

Searching with the default search portlet works fine. Has something in the API changed, or is my code wrong?

Thank you in advance,
Harry
Jorge Díaz, modified 6 years ago.

RE: Case insensitive search in elasticsearch

Liferay Master Posts: 753 Join Date: 09/01/14
Hi Charalampos,

Regarding your case-sensitive search issue: are you searching in a particular field?

The search behavior of each field in Liferay is configured by the Elasticsearch type mapping configuration.
The default configuration is:
You can find information about Elasticsearch mapping configuration at the following link:
In short, the basic configurations that Liferay uses are:
  • "type": "string" + "index": "analyzed" => the standard analyzer is applied: https://www.elastic.co/guide/en/elasticsearch/reference/2.4/analysis-standard-analyzer.html (it lowercases the text)
  • "type": "string" + "index": "not_analyzed" => the data is stored without modification
  • "type": "string" + "analyzer": "keyword_lowercase" => the keyword_lowercase analyzer is applied (it only lowercases the data)
  • "type": "string" + a language-specific analyzer => that language's analyzer is applied

So if you are searching in a field that is "not_analyzed", you won't be able to do a case-insensitive search.
But if you are searching in a field that uses the standard analyzer or the keyword_lowercase analyzer, the data is lowercased, so you should be able to do a case-insensitive search.
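To make the difference concrete, here is a small in-memory simulation in plain Java (no Elasticsearch or Liferay involved). The "analyzer" below is a simplified assumption, splitting on non-alphanumeric characters and lowercasing; it is not the real standard analyzer, and language analyzers such as "english" also stem and drop stopwords. The wildcard matcher uses the pattern exactly as given, since wildcard patterns are not analyzed. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class AnalyzedVsNotAnalyzed {

    // Simplified stand-in for a lowercasing analyzer (NOT the real Lucene code):
    // split on anything that is not a letter or digit, then lowercase each token.
    static List<String> lowercasingAnalyzer(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // "not_analyzed": the whole value becomes one unmodified term.
    static List<String> notAnalyzed(String text) {
        return List.of(text);
    }

    // Wildcard matching: '*' = any run of characters, '?' = exactly one character.
    // The pattern is used as-is, with no analysis or lowercasing.
    static boolean wildcardMatches(String pattern, List<String> indexedTerms) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        Pattern compiled = Pattern.compile(regex.toString());
        return indexedTerms.stream().anyMatch(t -> compiled.matcher(t).matches());
    }

    public static void main(String[] args) {
        String value = "This is the string I am looking for";
        List<String> analyzed = lowercasingAnalyzer(value);
        List<String> raw = notAnalyzed(value);

        System.out.println(wildcardMatches("*thi*", analyzed)); // true
        System.out.println(wildcardMatches("*THI*", analyzed)); // false
        System.out.println(wildcardMatches("*thi*", raw));      // false
        System.out.println(wildcardMatches("*This*", raw));     // true
    }
}
```

Note that "*THI*" fails even against the lowercased terms: when the index side is lowercased, the query pattern has to be lowercased too.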

In order to verify the behavior, review the default configuration of the field you are using.

If you want to change the default Elasticsearch type mapping configuration, go to Control Panel => System Settings => Foundation => Elasticsearch and configure:
  • Additional Type Mappings: add type mappings for fields that are not included in the default liferay-type-mappings.json
  • Override Type Mappings: override the type mappings of fields that are already included in the default liferay-type-mappings.json

After making those changes, you have to reindex.
Charalampos Chrysikopoulos, modified 6 years ago.

RE: Case insensitive search in elasticsearch

Junior Member Posts: 79 Join Date: 09/12/11
Hi Jorge,

Thank you for your answer, and sorry for the delay. Yes, I am searching in some fields of a web content structure, for example a field named "shortText". In Elasticsearch, the metadata for this field is:


"ddm__text__68549__shortText_en_GB": {
    "analyzer": "english",
    "term_vector": "with_positions_offsets",
    "store": true,
    "type": "string"
},


So in my case the field should be analyzed, as expected for a field of a web content structure.

The default Liferay search seems to work properly, that is, it searches case-insensitively. So the problem seems to be in my query, not in Elasticsearch.

Another problem I have is that I cannot apply filters. I need to show only the web content that has an English translation of the field "shortText". If I understand ES correctly, I need to use the "ExistsFilter" on this field and apply it to the top-level query. But that doesn't seem to work either.
Charalampos Chrysikopoulos, modified 6 years ago.

RE: Case insensitive search in elasticsearch

Junior Member Posts: 79 Join Date: 09/12/11
Hi again, the solution was very simple: I was not lowercasing the keywords in the search query.
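A minimal sketch of that fix, assuming the keyword arrives as raw user input. Because wildcard patterns are not analyzed, the keyword has to be lowercased before the patterns are built. The class `KeywordPatterns` and its methods are hypothetical; the two resulting strings would take the place of `keyword` and `"*" + keyword + "*"` in the two `WildcardQueryImpl` calls from the first post. Trimming surrounding whitespace is an extra step added here, not part of the original code.

```java
import java.util.List;
import java.util.Locale;

public class KeywordPatterns {

    // Lowercase the keyword so it matches terms that were lowercased at index time.
    // Locale.ROOT avoids locale-specific rules such as the Turkish dotless i.
    static String normalizeKeyword(String keyword) {
        return keyword.trim().toLowerCase(Locale.ROOT);
    }

    // The two patterns used in the question: the bare keyword and "*keyword*".
    static List<String> wildcardPatterns(String keyword) {
        String k = normalizeKeyword(keyword);
        return List.of(k, "*" + k + "*");
    }

    public static void main(String[] args) {
        System.out.println(wildcardPatterns("THI"));   // [thi, *thi*]
        System.out.println(wildcardPatterns(" Thi ")); // [thi, *thi*]
    }
}
```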
Brigden Nicholas, modified 5 years ago.

RE: Case insensitive search in elasticsearch

New Member Posts: 1 Join Date: 30/10/18
A term query matches documents whose fields contain the exact term; the term itself is not analyzed. So the search term will not be analyzed, but at index time Elasticsearch analyzes the field (and lowercases it) unless you define a custom mapping. So if you want to use a term query, analyze the term yourself before querying.
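A plain-Java illustration of "analyze the term yourself", assuming the field's analyzer only splits on non-alphanumeric characters and lowercases (a real mapping may also stem or remove stopwords, so the client-side step would have to mirror that). The `analyze` and `termQueryMatches` helpers are hypothetical stand-ins, not Elasticsearch or Liferay APIs; a term query is modeled as an exact lookup in the set of indexed terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ClientSideAnalysis {

    // Hypothetical approximation of the index-time analyzer:
    // split on non-alphanumeric characters, then lowercase.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // A term query compares the given term exactly against the indexed terms.
    static boolean termQueryMatches(String term, Set<String> indexedTerms) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        Set<String> indexed = Set.copyOf(analyze("Case Insensitive Search"));

        // Raw user input is not analyzed by a term query, so it misses.
        System.out.println(termQueryMatches("Search", indexed)); // false

        // Analyzing the input first produces the same form that was indexed.
        for (String term : analyze("SEARCH")) {
            System.out.println(termQueryMatches(term, indexed)); // true
        }
    }
}
```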
Vikram Singla, modified 4 years ago.

RE: Case insensitive search in elasticsearch

New Member Posts: 3 Join Date: 21/07/17
How do I lowercase the pattern?
Vikram Singla, modified 4 years ago.

RE: Case insensitive search in elasticsearch

New Member Posts: 3 Join Date: 21/07/17
Can anybody show, with an example, how to apply case-insensitive logic to a WildcardQuery in Java code?
Charalampos Chrysikopoulos, modified 4 years ago.

RE: Case insensitive search in elasticsearch

Junior Member Posts: 79 Join Date: 09/12/11
Hi Vikram, sorry for the delay. I simply used the following code:

keyword = stringHelper.removeDiacritics(keyword).toLowerCase();

And the removeDiacritics method:


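    import java.text.Normalizer;

    public String removeDiacritics(String text) {
        // Nothing to strip from null or blank input
        if (text == null || text.trim().isEmpty()) {
            return text;
        }
        // NFD splits accented characters into a base letter plus combining marks
        String normalized = Normalizer.normalize(text, Normalizer.Form.NFD);
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < normalized.length(); i++) {
            char c = normalized.charAt(i);
            // Keep everything except the non-spacing (combining) marks
            if (Character.getType(c) != Character.NON_SPACING_MARK) {
                builder.append(c);
            }
        }
        // Recompose what is left into canonical composed form
        return Normalizer.normalize(builder.toString(), Normalizer.Form.NFC);
    }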

Hope it helps,
Harry
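For a shorter version of the same keyword preparation (strip diacritics, then lowercase), the sketch below swaps the character loop in removeDiacritics for the regular expression `\p{M}`, which removes every Unicode combining mark; that is slightly broader than `Character.NON_SPACING_MARK` alone. The class name `KeywordNormalizer` is hypothetical, and `Locale.ROOT` keeps lowercasing independent of the server's default locale. One caveat: stripping accents from the keyword only helps if the field's analyzer also removes them at index time; the standard analyzer, for example, lowercases but keeps accents.

```java
import java.text.Normalizer;
import java.util.Locale;

public class KeywordNormalizer {

    // Decompose to NFD, delete all combining marks (\p{M}), recompose, lowercase.
    static String normalizeKeyword(String keyword) {
        if (keyword == null || keyword.isEmpty()) {
            return keyword;
        }
        String decomposed = Normalizer.normalize(keyword, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("\\p{M}", "");
        return Normalizer.normalize(stripped, Normalizer.Form.NFC).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Input is "Elan Cafe" with an acute accent on the first E and the last e
        System.out.println(normalizeKeyword("\u00C9lan Caf\u00E9")); // elan cafe
    }
}
```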