Discussion Forums

Document Library lucene fields are indexed but not searchable

Florencia Gadea, modified 10 years ago.


Regular Member Posts: 146 Join Date: 27/03/12
Hi Everyone,

I created new Lucene fields for the PDF files in Documents and Media. To do so, I created a custom IndexerPostProcessor for the DLFileEntry class, where I add the new fields. This works fine: the field is properly indexed, and I can even search it through Luke.

The problem I have is retrieving documents that have this custom field.

Here is the code I use to search:

// servletRequest, term and entryClassPK come from the surrounding portlet code
SearchContext searchContext = SearchContextFactory.getInstance(servletRequest);
searchContext.setGroupIds(null);
searchContext.setUserId(0);

// Required clause on the custom field
BooleanQuery booleanQueryPageContent = BooleanQueryFactoryUtil.create(searchContext);
booleanQueryPageContent.addRequiredTerm("customfield", term, false);
BooleanClause booleanClauseGeneral = BooleanClauseFactoryUtil.create(
    booleanQueryPageContent, BooleanClauseOccur.MUST.getName());

// Required clause on the file entry's primary key
BooleanQuery booleanQueryEntryClassPK = BooleanQueryFactoryUtil.create(searchContext);
booleanQueryEntryClassPK.addRequiredTerm("entryClassPK", String.valueOf(entryClassPK), false);
BooleanClause booleanClauseEntryClassPK = BooleanClauseFactoryUtil.create(
    booleanQueryEntryClassPK, BooleanClauseOccur.MUST.getName());

searchContext.setBooleanClauses(new BooleanClause[] {booleanClauseEntryClassPK, booleanClauseGeneral});

// Run the search through the Document Library indexer
Indexer indexer = IndexerRegistryUtil.getIndexer(DLFileEntry.class);
Hits hits = indexer.search(searchContext);


It returns no results at all. The query it generates is:

+(+((+(entryClassName:com.liferay.portlet.documentlibrary.model.DLFileEntry) +(status:0)))) +(+(entryClassPK:308139)) +(+(customfield:energie))

In Luke, this query works fine, it returns the proper results. But in Liferay, this doesn't bring any result at all.
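Since every clause in that query carries a leading + (MUST), a document only matches if it satisfies all four clauses at once, not just customfield:energie. A minimal plain-Java sketch of that rule follows; it is not Lucene's implementation, and the class name, the exampleQuery helper and the model of a document as a map from field name to one indexed term are hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of a query made only of required (+) term clauses.
// Hypothetical: a document is a map of field name -> single indexed term.
public class RequiredClauses {

    // True only if every required field/term pair is present in the document.
    public static boolean matches(Map<String, String> document, Map<String, String> requiredTerms) {
        for (Map.Entry<String, String> clause : requiredTerms.entrySet()) {
            String value = document.get(clause.getKey());
            if (value == null || !value.equals(clause.getValue())) {
                return false; // one failed MUST clause rejects the whole document
            }
        }
        return true;
    }

    // The four required terms from the generated query in the post.
    public static Map<String, String> exampleQuery() {
        Map<String, String> query = new LinkedHashMap<String, String>();
        query.put("entryClassName", "com.liferay.portlet.documentlibrary.model.DLFileEntry");
        query.put("status", "0");
        query.put("entryClassPK", "308139");
        query.put("customfield", "energie");
        return query;
    }

    public static void main(String[] args) {
        Map<String, String> complete = new HashMap<String, String>(exampleQuery());
        Map<String, String> missingStatus = new HashMap<String, String>(exampleQuery());
        missingStatus.remove("status");
        System.out.println(matches(complete, exampleQuery()));      // true
        System.out.println(matches(missingStatus, exampleQuery())); // false
    }
}
```

Running main prints true, then false: removing a single field (here status) is enough to lose the hit, so an indexed copy has to carry every field Liferay adds to the query, not only the custom one.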

Do you know why? What should I do to make this work?

Cheers,

Flor.
Ray Augé, modified 10 years ago.

RE: Document Library lucene fields are indexed but not searchable

Liferay Legend Posts: 1197 Join Date: 8/02/05
Liferay checks permissions. Are you sure the query is not failing because of a permission check?

You have set the userId to 0.
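The idea can be pictured with a simplified model, assuming that Liferay runs the index query first and then drops every hit the current user is not allowed to view. The class, the canView rule (user ID 0 may view nothing) and the user ID 10198 are hypothetical; they only stand in for whatever Liferay's real permission checker decides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical model of "search the index, then filter by permission".
// Not Liferay's code: it only shows how a hit found in the raw index
// (as in Luke) can still be missing from the results returned to a user.
public class PermissionFilteredSearch {

    // Keeps only the entry primary keys the given user may view.
    public static List<Long> filterHits(List<Long> rawHits, long userId,
                                        BiPredicate<Long, Long> canView) {
        List<Long> visible = new ArrayList<Long>();
        for (Long entryClassPK : rawHits) {
            if (canView.test(userId, entryClassPK)) {
                visible.add(entryClassPK);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Long> rawHits = Arrays.asList(308139L);
        // Hypothetical rule: user ID 0 (no real user) may view nothing.
        BiPredicate<Long, Long> canView = (userId, pk) -> userId != 0L;
        System.out.println(filterHits(rawHits, 0L, canView));     // []
        System.out.println(filterHits(rawHits, 10198L, canView)); // [308139]
    }
}
```

With user ID 0 the filtered list is empty even though the raw index returned a hit, which is the same symptom as a query that works in Luke but returns nothing in Liferay.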
Florencia Gadea, modified 10 years ago.

RE: Document Library lucene fields are indexed but not searchable

Regular Member Posts: 146 Join Date: 27/03/12
Well, as you can see in the code, the userId is already set to 0.

Is there any other permission check I should be aware of?
Florencia Gadea, modified 10 years ago.

RE: Document Library lucene fields are indexed but not searchable

Regular Member Posts: 146 Join Date: 27/03/12
Maybe I should explain a little more about what I'm doing. After Liferay indexes a PDF document, I create one new document per page of the PDF: an exact copy of the indexed document plus two custom fields (the page number and the page content), and I index it too. To do so, I created a FileIndexerPostProcessor hook. Here is the code of the postProcessDocument method:

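// Imports assumed for Liferay 6.1 and PDFBox 1.x (package names may differ in other versions)
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.commons.lang.StringUtils;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;
import org.apache.pdfbox.util.Splitter;

import com.liferay.portal.kernel.search.Document;
import com.liferay.portal.kernel.search.DocumentImpl;
import com.liferay.portal.kernel.search.Field;
import com.liferay.portal.kernel.search.SearchContext;
import com.liferay.portal.kernel.search.SearchEngineUtil;
import com.liferay.portlet.documentlibrary.model.DLFileEntry;
import com.liferay.portlet.documentlibrary.service.DLFileEntryLocalServiceUtil;

public void postProcessDocument(Document document, Object obj) {
    DLFileEntry dlFileEntry = (DLFileEntry) obj;
    if (!"pdf".equals(dlFileEntry.getExtension())) {
        return;
    }

    InputStream fileInputStream = null;
    PDDocument pdfDocument = null;

    try {
        // Load the PDF document
        fileInputStream = DLFileEntryLocalServiceUtil.getFileAsStream(
            dlFileEntry.getUserId(), dlFileEntry.getFileEntryId(), dlFileEntry.getVersion());
        PDFParser parser = new PDFParser(fileInputStream);
        parser.parse();
        pdfDocument = parser.getPDDocument();
        if (pdfDocument == null) {
            return;
        }

        // Initialize the text extractor and split the PDF into single pages
        PDFTextStripper stripper = new PDFTextStripper();
        Splitter splitter = new Splitter();
        splitter.setSplitAtPage(1);
        List<PDDocument> pages = splitter.split(pdfDocument);

        // Add one document per non-empty page
        List<Document> documents = new ArrayList<Document>();
        for (int pageNr = 0; pageNr < pages.size(); pageNr++) {
            PDDocument pdfPage = pages.get(pageNr);
            try {
                // Extract the page content
                String pageContent = stripper.getText(pdfPage);
                if (StringUtils.isNotEmpty(pageContent)) {
                    // Copy every field of the document Liferay has just built
                    Document copy = new DocumentImpl();
                    Map<String, Field> fields = document.getFields();
                    for (Map.Entry<String, Field> field : fields.entrySet()) {
                        copy.add(field.getValue());
                    }

                    // Add the page number and the page content
                    copy.add(new Field("pagenr", String.valueOf(pageNr + 1)));
                    copy.addText("pagecontent", pageContent);
                    documents.add(copy);
                }
            }
            finally {
                // Close the page even if text extraction fails
                pdfPage.close();
            }
        }

        // Add the page documents to the default search engine index.
        // Note (assumption to verify): this SearchContext has no companyId set,
        // while Liferay's Lucene setup keeps one index per company, so it is worth
        // checking that these copies land in the index Liferay actually searches.
        SearchContext searchContext = new SearchContext();
        searchContext.setSearchEngineId(SearchEngineUtil.SYSTEM_ENGINE_ID);
        SearchEngineUtil.getSearchEngine().getIndexWriter().addDocuments(searchContext, documents);
    }
    catch (Exception e) {
        // Print the actual cause instead of only the word "exception"
        e.printStackTrace();
    }
    finally {
        // Release the PDF document and the file stream
        try {
            if (pdfDocument != null) {
                pdfDocument.close();
            }
            if (fileInputStream != null) {
                fileInputStream.close();
            }
        }
        catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}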


When I look at the indexed content of the original file and of the copy, I see that they have different indexed fields.
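One way to pin down that difference is to list the field names of both documents, as read in Luke, and compare them. A minimal plain-Java sketch; the class name is hypothetical, and the field names in main are made up for illustration rather than taken from the attached Luke views.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Compares the field names of two indexed documents.
public class FieldDiff {

    // Field names present in "a" but missing from "b", sorted for readable output.
    public static Set<String> missingFrom(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<String>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical field sets, for illustration only
        Set<String> original = new HashSet<String>(Arrays.asList(
            "uid", "entryClassName", "entryClassPK", "status", "content"));
        Set<String> copy = new HashSet<String>(Arrays.asList(
            "uid", "entryClassPK", "content", "pagenr", "pagecontent"));
        System.out.println("Only in original: " + missingFrom(original, copy)); // [entryClassName, status]
        System.out.println("Only in copy: " + missingFrom(copy, original));     // [pagecontent, pagenr]
    }
}
```

Any field that shows up only in the original and is also used by the search query (such as entryClassName or status in the query above) would be enough to keep the copy out of the results.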

Can you tell me why? Do you think this is the reason I can't retrieve the documents through Liferay search?

I have attached views of the original document and of the copy in the Lucene index, as shown in Luke.

Thanks.

Flor.
Nelson Borges, modified 9 years ago.

RE: Document Library lucene fields are indexed but not searchable

New Member Post: 1 Join Date: 29/07/13
Is there any solution to this search problem?
Enzo Terranova, modified 7 years ago.

RE: Document Library lucene fields are indexed but not searchable

New Member Posts: 3 Join Date: 20/03/17