Document Library lucene fields are indexed but not searchable
Florencia Gadea, modified 10 years ago.
Document Library lucene fields are indexed but not searchable
Regular Member Posts: 146 Join Date: 27/03/12
Hi Everyone,
I created new Lucene fields for the Documents and Media PDF files. To do so, I created a custom IndexerPostProcessor for the DLFileEntry class, which adds the new fields. This works fine: the field is properly indexed, and I can even search it through Luke.
The problem I have is retrieving documents that have this custom field.
Here is the code I use to search:
SearchContext searchContext = SearchContextFactory.getInstance(servletRequest);
searchContext.setGroupIds(null);
searchContext.setUserId(0);
// Required clause: the custom field must contain the search term
BooleanQuery booleanQueryPageContent = BooleanQueryFactoryUtil.create(searchContext);
booleanQueryPageContent.addRequiredTerm("customfield", term, false);
BooleanClause booleanClauseGeneral = BooleanClauseFactoryUtil.create(booleanQueryPageContent, BooleanClauseOccur.MUST.getName());
// Required clause: restrict the results to a single file entry
BooleanQuery booleanQueryEntryClassPK = BooleanQueryFactoryUtil.create(searchContext);
booleanQueryEntryClassPK.addRequiredTerm("entryClassPK", String.valueOf(entryClassPK), false);
BooleanClause booleanClauseEntryClassPK = BooleanClauseFactoryUtil.create(booleanQueryEntryClassPK, BooleanClauseOccur.MUST.getName());
searchContext.setBooleanClauses(new BooleanClause[] { booleanClauseEntryClassPK, booleanClauseGeneral });
// Run the search through the Documents and Media file entry indexer
Indexer indexer = IndexerRegistryUtil.getIndexer(DLFileEntry.class);
hits = indexer.search(searchContext);
It returns no results at all. The search generates the following query:
+(+((+(entryClassName:com.liferay.portlet.documentlibrary.model.DLFileEntry) +(status:0)))) +(+(entryClassPK:308139)) +(+(customfield:energie))
In Luke, this query works fine and returns the proper results. But in Liferay, it returns no results at all.
Do you know why? What should I do to make this work?
Cheers,
Flor.
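Every clause in the generated query is prefixed with "+", i.e. it is required, so a document is returned only if it matches all of them: the entryClassName and status:0 group added by the indexer, the entryClassPK group, and the customfield group. The plain-Java sketch below models just the last two required groups to make that explicit. RequiredClauses, Clause, Term and AllOf are hypothetical names, not Liferay or Lucene classes, and the rendering is simplified (Liferay adds extra parentheses around the indexer's own group).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model (not Liferay's or Lucene's API): every clause is required,
// mirroring BooleanClauseOccur.MUST in the search code above.
public class RequiredClauses {

    interface Clause {
        String render();                            // query-syntax text
        boolean matches(Map<String, String> doc);   // does a flat field map satisfy it?
    }

    // A single field:value term.
    record Term(String field, String value) implements Clause {
        public String render() {
            return field + ":" + value;
        }

        public boolean matches(Map<String, String> doc) {
            return value.equals(doc.get(field));
        }
    }

    // A group whose children are all required, rendered as "+(...)".
    record AllOf(List<Clause> children) implements Clause {
        static AllOf of(Clause... children) {
            return new AllOf(List.of(children));
        }

        public String render() {
            List<String> parts = new ArrayList<>();
            for (Clause child : children) {
                parts.add("+(" + child.render() + ")");
            }
            return String.join(" ", parts);
        }

        public boolean matches(Map<String, String> doc) {
            // One missing or mismatched field rejects the whole document.
            return children.stream().allMatch(c -> c.matches(doc));
        }
    }

    public static void main(String[] args) {
        Clause query = AllOf.of(
                AllOf.of(new Term("entryClassPK", "308139")),
                AllOf.of(new Term("customfield", "energie")));
        // prints: +(+(entryClassPK:308139)) +(+(customfield:energie))
        System.out.println(query.render());

        Map<String, String> page = Map.of("entryClassPK", "308139", "customfield", "energie");
        System.out.println(query.matches(page));                               // true
        System.out.println(query.matches(Map.of("customfield", "energie")));  // false
    }
}
```

In other words, each indexed page copy has to carry every field the full query requires, with exactly the values the query uses, or it is silently excluded.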
Ray Augé, modified 10 years ago.
RE: Document Library lucene fields are indexed but not searchable
Liferay Legend Posts: 1197 Join Date: 08/02/05
Liferay checks permissions. Are you sure the query is not failing because of a failed permission check?
You have set the userId to 0.
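Luke reads the Lucene index directly, with no permission layer, whereas the portal applies permission checks to search results. Assuming, as a simplification, that raw hits are collected first and then filtered by a view-permission check, the plain-Java sketch below shows how a hit that matches in the index can still vanish from the portal's results. PermissionFilterDemo, Hit, filter and the ownership rule are all hypothetical; Liferay's actual permission filtering is more involved and differs between versions.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical model: raw index hits are post-filtered by a permission check,
// so a document that Lucene matched can still be missing from the final results.
public class PermissionFilterDemo {

    // Minimal stand-in for an index hit; not a Liferay type.
    record Hit(long entryClassPK, long ownerUserId) {}

    // Keep only the hits that the permission predicate allows.
    static List<Hit> filter(List<Hit> rawHits, Predicate<Hit> canView) {
        return rawHits.stream().filter(canView).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 42 is a hypothetical owner user ID.
        List<Hit> raw = List.of(new Hit(308139L, 42L));

        // Hypothetical rule: only the owner may view the entry.
        // userId 0 mirrors searchContext.setUserId(0) in the search code.
        long currentUserId = 0L;
        Predicate<Hit> canView = hit -> hit.ownerUserId() == currentUserId;

        System.out.println(raw.size());                  // 1 (index-level match, as in Luke)
        System.out.println(filter(raw, canView).size()); // 0 (after the permission check)
    }
}
```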
Florencia Gadea, modified 10 years ago.
RE: Document Library lucene fields are indexed but not searchable
Regular Member Posts: 146 Join Date: 27/03/12
Well, as you can see in the code, the userId is already set to 0.
Is there any other permission check I should be aware of?
Florencia Gadea, modified 10 years ago.
RE: Document Library lucene fields are indexed but not searchable
Regular Member Posts: 146 Join Date: 27/03/12
Maybe I should explain a little more about what I'm doing. After Liferay indexes a PDF document, I create one new document for every page of the PDF: an exact copy of the original indexed document with two extra custom fields (the page number and the page content), and I index it too. To do so, I created a FileIndexerPostProcessor hook. Here is the code of the postProcessDocument method:
public void postProcessDocument(Document document, Object obj) {
    try {
        DLFileEntry dlFileEntry = (DLFileEntry) obj;
        if ("pdf".equals(dlFileEntry.getExtension())) {
            System.out.println("fileEntry: " + dlFileEntry);
            // Load the PDF document
            InputStream fileInputStream = DLFileEntryLocalServiceUtil.getFileAsStream(dlFileEntry.getUserId(), dlFileEntry.getFileEntryId(), dlFileEntry.getVersion());
            PDFParser parser = new PDFParser(fileInputStream);
            parser.parse();
            PDDocument pdfDocument = parser.getPDDocument();
            if (pdfDocument == null) {
                return;
            }
            // Initialize the text extractor
            PDFTextStripper stripper = new PDFTextStripper();
            String pageContent = "";
            // Split the PDF document into single pages
            Splitter splitter = new Splitter();
            splitter.setSplitAtPage(1);
            List<PDDocument> pages = splitter.split(pdfDocument);
            List<Document> documents = new ArrayList<Document>();
            // Add one index document per page of the PDF
            for (int pageNr = 0; pageNr < pages.size(); pageNr++) {
                // Extract the page content
                PDDocument pdfPage = pages.get(pageNr);
                pageContent = stripper.getText(pdfPage);
                if (StringUtils.isNotEmpty(pageContent)) {
                    // Copy every field of the current indexed document
                    Document copy = new DocumentImpl();
                    Map<String, Field> fields = document.getFields();
                    for (Map.Entry<String, Field> field : fields.entrySet()) {
                        copy.add(field.getValue());
                    }
                    // Add the page number and the page content
                    copy.add(new Field("pagenr", String.valueOf(pageNr + 1)));
                    //copy.add(new Field("pagecontent", pageContent));
                    copy.addText("pagecontent", pageContent);
                    documents.add(copy);
                }
                // Close the page
                pdfPage.close();
            }
            // Close the document
            pdfDocument.close();
            // Add the page documents to the default search engine index
            SearchContext searchContext = new SearchContext();
            searchContext.setSearchEngineId(SearchEngineUtil.SYSTEM_ENGINE_ID);
            // Possible fix (assumption): without a company ID, the writer may not
            // target the same company index that the portal searches
            searchContext.setCompanyId(dlFileEntry.getCompanyId());
            SearchEngineUtil.getSearchEngine().getIndexWriter().addDocuments(searchContext, documents);
        }
    } catch (Exception e) {
        // Report the real cause instead of a bare "exception" message
        e.printStackTrace();
    }
}
When I compare the indexed content of the original file and of the copy, I see that they have different indexed fields.
Can you tell me why? Do you think this is the reason why I can't retrieve these documents through Liferay's search?
I have attached Luke's view of the original document and of the copy in the Lucene index.
Thanks.
Flor.
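To pin down which indexed fields differ between the original document and a page copy, the two sets of field names can be diffed instead of compared by eye in Luke. This is a minimal plain-Java sketch, assuming the field names of each document have already been collected into a set (for example from the keys of document.getFields() in the post-processor); FieldDiff and onlyIn are hypothetical names, and the sample sets use only field names that appear in this thread.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical diagnostic: compare the field names of two indexed documents.
public class FieldDiff {

    // Field names present in 'left' but absent from 'right', sorted for stable output.
    static Set<String> onlyIn(Set<String> left, Set<String> right) {
        Set<String> names = new TreeSet<>(left);
        names.removeAll(right);
        return names;
    }

    public static void main(String[] args) {
        // Illustrative sets only; real ones would come from the two indexed documents.
        Set<String> original = Set.of("entryClassName", "entryClassPK", "status", "customfield");
        Set<String> copy = Set.of("entryClassName", "entryClassPK", "status", "customfield",
                "pagenr", "pagecontent");

        System.out.println("only in original: " + onlyIn(original, copy)); // only in original: []
        System.out.println("only in copy: " + onlyIn(copy, original));     // only in copy: [pagecontent, pagenr]
    }
}
```

Any field required by the search query that shows up in the "only in original" set would explain why the copies never match.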
Nelson Borges, modified 9 years ago.
RE: Document Library lucene fields are indexed but not searchable
New Member Post: 1 Join Date: 29/07/13
Is there any solution to this search problem?
Enzo Terranova, modified 7 years ago.