Forum

Document Library lucene fields are indexed but not searchable

Florencia Gadea, modified 10 years ago.

Document Library lucene fields are indexed but not searchable

Regular Member Posts: 146 Join Date: 27/03/12 Recent Posts
Hi Everyone,

I added new Lucene fields for PDF files in Documents and Media. To do this, I created a custom IndexerPostProcessor for the DLFileEntry class, where I add the new fields. This works fine: the field is properly indexed, and I can even find it by searching in Luke.

The problem I have is retrieving documents that have this custom field.

Here is the code I use to search:

SearchContext searchContext = SearchContextFactory.getInstance(servletRequest);			
searchContext.setGroupIds(null);
searchContext.setUserId(0);

BooleanQuery booleanQueryPageContent = BooleanQueryFactoryUtil.create(searchContext);
booleanQueryPageContent.addRequiredTerm("customfield", term, false);	
BooleanClause booleanClauseGeneral = BooleanClauseFactoryUtil.create(booleanQueryPageContent, BooleanClauseOccur.MUST.getName());
					
BooleanQuery booleanQueryEntryClassPK = BooleanQueryFactoryUtil.create(searchContext);
booleanQueryEntryClassPK.addRequiredTerm("entryClassPK", ""+entryClassPK, false);
BooleanClause booleanClauseEntryClassPK = BooleanClauseFactoryUtil.create(booleanQueryEntryClassPK, BooleanClauseOccur.MUST.getName());			

searchContext.setBooleanClauses(new BooleanClause[] { booleanClauseEntryClassPK, booleanClauseGeneral});		

Indexer indexer = IndexerRegistryUtil.getIndexer(DLFileEntry.class);								
hits = indexer.search(searchContext);


It returns no results at all. The code produces the following query:

+(+((+(entryClassName:com.liferay.portlet.documentlibrary.model.DLFileEntry) +(status:0)))) +(+(entryClassPK:308139)) +(+(customfield:energie))

In Luke, this query works fine and returns the expected results, but in Liferay it returns nothing at all.

Do you know why? What should I do to make this work?

Cheers,

Flor.
Ray Augé, modified 10 years ago.

RE: Document Library lucene fields are indexed but not searchable

Liferay Legend Posts: 1197 Join Date: 08/02/05 Recent Posts
Liferay checks permissions. Are you sure the query is not failing because of a permission check?

You have set the userId to 0.
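
To make the concern concrete, here is a toy Java model, not Liferay code, of a search that first matches the index and then passes every hit through a permission check. All names (PostFilterSketch, Doc, matchIndex, canView) are hypothetical, and whether Liferay's indexer behaves this way in your setup is an assumption to verify. It shows how a query can match in Luke and still return nothing once a per-hit check rejects the hit:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

// Toy model only: none of these names are Liferay classes.
class PostFilterSketch {

    // Hypothetical stand-in for an indexed document.
    static final class Doc {
        final long entryClassPK;
        final String customfield;

        Doc(long entryClassPK, String customfield) {
            this.entryClassPK = entryClassPK;
            this.customfield = customfield;
        }
    }

    // Phase 1: the raw index match, which is all a tool like Luke evaluates.
    static List<Doc> matchIndex(List<Doc> index, long pk, String term) {
        List<Doc> hits = new ArrayList<Doc>();
        for (Doc doc : index) {
            if (doc.entryClassPK == pk && doc.customfield.equals(term)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    // Phase 2: drop every hit that the (assumed) permission check rejects.
    static List<Doc> search(List<Doc> index, long pk, String term, LongPredicate canView) {
        List<Doc> visible = new ArrayList<Doc>();
        for (Doc doc : matchIndex(index, pk, term)) {
            if (canView.test(doc.entryClassPK)) {
                visible.add(doc);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Doc> index = Arrays.asList(new Doc(308139L, "energie"), new Doc(42L, "other"));
        System.out.println(matchIndex(index, 308139L, "energie").size());           // prints 1
        System.out.println(search(index, 308139L, "energie", pk -> false).size()); // prints 0
    }
}
```

If something like this is happening, running the same search as a user who can view the file entry would be a quick way to confirm it.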
Florencia Gadea, modified 10 years ago.

RE: Document Library lucene fields are indexed but not searchable

Regular Member Posts: 146 Join Date: 27/03/12 Recent Posts
Well, as you can see in the code, the userId is already set to 0.

Is there any other permission check I should be aware of?
Florencia Gadea, modified 10 years ago.

RE: Document Library lucene fields are indexed but not searchable

Regular Member Posts: 146 Join Date: 27/03/12 Recent Posts
Maybe I should explain in a little more detail what I'm doing. After Liferay indexes a PDF document, I create a new index document, an exact copy of the original but with two additional custom fields, and index it too. I do this for every page of the PDF. To do so, I created a FileIndexerPostProcessor hook. Here is the code of the postProcessDocument method:

public void postProcessDocument(Document document, Object obj) {
    try {
        DLFileEntry dlFileEntry = (DLFileEntry) obj;
        if ("pdf".equals(dlFileEntry.getExtension())) {
            System.out.println("fileEntry: " + dlFileEntry);

            // Load PDF document
            InputStream fileInputStream = DLFileEntryLocalServiceUtil.getFileAsStream(dlFileEntry.getUserId(), dlFileEntry.getFileEntryId(), dlFileEntry.getVersion());
            PDFParser parser = new PDFParser(fileInputStream);
            parser.parse();
            PDDocument pdfDocument = parser.getPDDocument();

            // Initialize text extractor
            PDFTextStripper stripper = new PDFTextStripper();
            String pageContent = "";
            if (pdfDocument == null) {
                return;
            }

            // Split PDF document into pages
            Splitter splitter = new Splitter();
            splitter.setSplitAtPage(1);
            List<PDDocument> pages = splitter.split(pdfDocument);
            List<Document> documents = new ArrayList<Document>();

            // we will add one document per page, per document
            for (int pageNr = 0; pageNr < pages.size(); pageNr++) {

                // Extract page content
                PDDocument pdfPage = (PDDocument) pages.get(pageNr);
                pageContent = stripper.getText(pdfPage);

                if (StringUtils.isNotEmpty(pageContent)) {

                    // copy the current indexed document
                    Document copy = new DocumentImpl();
                    Map<String, Field> fields = document.getFields();
                    for (Map.Entry<String, Field> field : fields.entrySet()) {
                        copy.add(field.getValue());
                    }

                    // add a pagenr and add the content
                    copy.add(new Field("pagenr", "" + (pageNr + 1)));
                    //copy.add(new Field("pagecontent", pageContent));
                    copy.addText("pagecontent", pageContent);
                    documents.add(copy);
                }
                // Close page
                pdfPage.close();
            }
            // Close document
            pdfDocument.close();

            // add the documents to the default search engine index
            SearchContext searchContext = new SearchContext();
            searchContext.setSearchEngineId(SearchEngineUtil.SYSTEM_ENGINE_ID);
            SearchEngineUtil.getSearchEngine().getIndexWriter().addDocuments(searchContext, documents);
        }
    } catch (Exception e) {
        System.out.println("exception");
    }
}


When I compare the indexed content of the original file and the copy, I see that they have different indexed fields.

Can you tell me why? Do you think this is the reason why I can't retrieve the documents through a Liferay search?

I attached the view of the original document and the copy in the Lucene index through Luke.
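
One way to pin down the difference would be to diff the field names of the two documents. A minimal sketch in plain Java, with hypothetical names (FieldDiffSketch, missing) and invented example sets rather than the real contents of my index:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Toy helper only: field names are plain strings here, not Liferay Field objects.
class FieldDiffSketch {

    // Returns the names present in 'expected' but absent from 'actual', sorted.
    static Set<String> missing(Set<String> expected, Set<String> actual) {
        Set<String> result = new TreeSet<String>(expected);
        result.removeAll(actual);
        return result;
    }

    public static void main(String[] args) {
        // Invented example sets, not the real contents of the index.
        Set<String> original = new TreeSet<String>(Arrays.asList("entryClassName", "entryClassPK", "status"));
        Set<String> copy = new TreeSet<String>(Arrays.asList("entryClassPK", "pagenr", "pagecontent"));

        System.out.println(missing(original, copy)); // prints [entryClassName, status]
        System.out.println(missing(copy, original)); // prints [pagecontent, pagenr]
    }
}
```

Since the generated query requires entryClassName and status, a copy that lacked either field in a searchable form would presumably never match, whatever its custom fields contain.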

Thanks.

Flor.
Nelson Borges, modified 9 years ago.

RE: Document Library lucene fields are indexed but not searchable

New Member Posts: 1 Join Date: 29/07/13 Recent Posts
Is there a solution to this search problem?