Message Boards

Convert MS Docx/Doc to Liferay Web content

Raja Seth, modified 9 years ago.

Convert MS Docx/Doc to Liferay Web content

Regular Member  Posts: 233  Join Date: 11-8-18
Hi All,

Has anyone worked on functionality for converting documents in DOCX/DOC format to Liferay web content? As far as I understand, the DOCX/DOC file needs to be transformed into XML. I created a simple web content article and checked the content column of the journalarticle table; the value is as follows:

<?xml version="1.0"?>

<root available-locales="en_US" default-locale="en_US">
<static-content language-id="en_US"><![CDATA[<p style="margin-bottom: 0in">This document demonstrates the ability of the calibre DOCX Input plugin to convert the various typographic features in a Microsoft Word (2007 and newer) document. Convert this document to a modern ebook format, such as AZW3 for Kindles or EPUB for other ebook readers, to see it in action.</p>

<p style="margin-bottom: 0in">There is support for images, tables, lists, footnotes, endnotes, links, dropcaps and various types of text and paragraph level formatting.</p>

<p style="margin-bottom: 0in">To see the DOCX conversion in action, simply add this file to calibre using the <b>"Add Books" </b>button and then click "<b>Convert". </b> Set the output format in the top right corner of the conversion dialog to EPUB or AZW3 and click <b>"OK"</b>.</p>

<table border="1" cellpadding="1" cellspacing="1" style="width: 500px;">
<tbody>
<tr>
<td><strong>Name</strong></td>
<td><strong>Age</strong></td>
</tr>
<tr>
<td>Ajay</td>
<td>20</td>
</tr>
<tr>
<td>Vijay</td>
<td>25</td>
</tr>
</tbody>
</table>
<img alt="Image" src="/documents/10184/10559/plasticcover.jpg/56748787-8cda-42ec-806e-e13f925a1005?t=1419246918468" style="width: 200px; height: 113px; border-width: 1px; border-style: solid; margin: 3px; float: right;" />]]></static-content>
</root>
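A minimal sketch of producing that XML from an HTML fragment in plain Java, assuming the Liferay 6.x root/static-content layout shown in the sample above. The class and method names (StaticContentWrapper, wrap, toCdata) are hypothetical, not a Liferay API. The HTML goes into a CDATA section, and any literal "]]>" inside it is split across two sections, because CDATA cannot contain that sequence.

```java
import java.util.Locale;

public class StaticContentWrapper {

    // Wraps an HTML fragment in the <root>/<static-content> layout of the sample.
    // The same language id (e.g. "en_US") is used for available-locales,
    // default-locale and language-id, as in the sample.
    public static String wrap(String html, Locale locale) {
        String languageId = locale.toString(); // Locale.US -> "en_US"
        return "<?xml version=\"1.0\"?>\n\n"
            + "<root available-locales=\"" + languageId
            + "\" default-locale=\"" + languageId + "\">\n"
            + "<static-content language-id=\"" + languageId + "\">"
            + toCdata(html)
            + "</static-content>\n"
            + "</root>";
    }

    // A CDATA section may not contain "]]>", so split every occurrence
    // into "]]" at the end of one section and ">" at the start of the next.
    static String toCdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>Liferay</b></p>";
        System.out.println(wrap(html, Locale.US));
    }
}
```

The output has the same shape as the sample content above; whether a particular Liferay version accepts it unchanged is worth checking against an article created through the UI.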

Is there any plugin or document conversion API available in Liferay for this, or is there a better way to proceed?

Thanks & Regards,
Raja
Raja Seth, modified 9 years ago.

RE: Convert MS Docx/Doc to Liferay Web content

Regular Member  Posts: 233  Join Date: 11-8-18
Any pointers on this?
Andew Jardine, modified 9 years ago.

RE: Convert MS Docx/Doc to Liferay Web content

Liferay Legend  Posts: 2416  Join Date: 10-12-22
Hi Raja,

Liferay has an option for integration with OpenOffice; that might help with what you are after. To be honest, I've not used this feature yet, but you could have a look at it. There are several properties you can override in portal-ext.properties:


##
## OpenOffice
##

    #
    # Enabling OpenOffice integration allows the Document Library portlet and
    # the Wiki portlet to provide conversion functionality. This is tested with
    # OpenOffice 2.3.x through 3.2.x. It is recommended that you have OpenOffice
    # on the same machine. Using a remote host for the instance is not fully
    # supported and could lead to various problems. To start OpenOffice as
    # service, run the command:
    #
    # soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;"
    #
    openoffice.server.enabled=false
    openoffice.server.host=127.0.0.1
    openoffice.server.port=8100
    openoffice.cache.enabled=true

    #
    # Specify the file extensions of files to allow conversions from. Entries
    # must be limited by what is supported by OpenOffice.
    #
    openoffice.conversion.source.extensions[drawing]=odg
    openoffice.conversion.source.extensions[presentation]=odp,ppt,pptx,sxi
    openoffice.conversion.source.extensions[spreadsheet]=csv,ods,sxc,tsv,xls,xlsx
    openoffice.conversion.source.extensions[text]=doc,docx,html,odt,rtf,sxw,txt,wpd

    #
    # Specify the file extensions of files to allow conversions to. Entries must
    # be limited by what is supported by OpenOffice.
    #
    openoffice.conversion.target.extensions[drawing]=pdf,svg,swf
    openoffice.conversion.target.extensions[presentation]=odp,pdf,ppt,swf,sxi
    openoffice.conversion.target.extensions[spreadsheet]=csv,ods,pdf,sxc,tsv,xls
    openoffice.conversion.target.extensions[text]=doc,odt,pdf,rtf,sxw,txt
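To see what these defaults allow for Word files, here is a small JDK-only sketch (the class OpenOfficeConversionCheck is hypothetical, not part of Liferay) that reads the two text entries with java.util.Properties and checks a source/target pair. It only mimics the lists; it is not how Liferay itself evaluates them. With the defaults above, docx is accepted as a source, but html is not among the text targets.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class OpenOfficeConversionCheck {

    // True if source is listed under source.extensions[type] and target
    // under target.extensions[type], e.g. type = "text".
    public static boolean isAllowed(Properties props, String type,
                                    String source, String target) {
        return listOf(props, "openoffice.conversion.source.extensions[" + type + "]").contains(source)
            && listOf(props, "openoffice.conversion.target.extensions[" + type + "]").contains(target);
    }

    // Splits a comma-separated property value; a missing key yields [""].
    private static List<String> listOf(Properties props, String key) {
        return Arrays.asList(props.getProperty(key, "").trim().split("\\s*,\\s*"));
    }

    public static void main(String[] args) throws IOException {
        // The two "text" entries, copied from the defaults above.
        String defaults =
            "openoffice.conversion.source.extensions[text]=doc,docx,html,odt,rtf,sxw,txt,wpd\n"
          + "openoffice.conversion.target.extensions[text]=doc,odt,pdf,rtf,sxw,txt\n";
        Properties props = new Properties();
        props.load(new StringReader(defaults));
        System.out.println(isAllowed(props, "text", "docx", "pdf"));  // true
        System.out.println(isAllowed(props, "text", "docx", "html")); // false
    }
}
```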


Ultimately, you could write your own custom portlet or hooks that let a user upload a Word document and have the code convert it into web content, but I suspect the main challenge would be identifying which structure to use and how to parse the fields out.
Raja Seth, modified 9 years ago.

RE: Convert MS Docx/Doc to Liferay Web content

Regular Member  Posts: 233  Join Date: 11-8-18
Hi Andrew,

Thanks for the reply. If I am not wrong, structures don't come into the picture here. It's like creating a new "Web Content" after adding the "Web Content Display" portlet to a page. As far as OpenOffice integration is concerned, I referred to this link, which states that we can convert documents to any of the formats below:

Portable Document Format (.pdf)
OpenDocument Text (.odt)
OpenOffice.org 1.0 Text (.sxw)
Rich Text Format (.rtf)
Microsoft Word (.doc)
Plain Text (.txt)


But in my case I want to convert MS DOCX/DOC to Liferay web content. I created a simple "Web Content" article as described above and checked the content column of the journalarticle table; it was saved in XML format.

Please let me know whether Liferay's OpenOffice integration can convert DOCX/DOC directly to web content, or to the XML format used in the content column of the journalarticle table.

Thanks & Regards,
Raja
Andew Jardine, modified 9 years ago.

RE: Convert MS Docx/Doc to Liferay Web content

Liferay Legend  Posts: 2416  Join Date: 10-12-22
Hi Raja,

As I mentioned, OpenOffice and its integration isn't something I have had to put into practice yet; it was just a feature I knew Liferay had. So, with that said, I don't really have any pointers, as I am pretty sure you and I would be starting from the same point.

As for your opening statement that structures don't come into the picture, just a clarification: ALL web content is the result of a structure + template in Liferay. When you add web content via the portlet directly, it chooses "Basic Web Content" for you as the default. The structure of basic web content is a title and a body field. Your point is valid, though; I suppose you could use it as a big bucket to simply store the contents of the file.

So, back to the problem at hand: there is nothing preventing you from using JournalArticleLocalServiceUtil to add articles programmatically; you just need to know all the parameters. The method signature I am looking at is:


	public JournalArticle addArticle(
			long userId, long groupId, long folderId, long classNameId,
			long classPK, String articleId, boolean autoArticleId,
			double version, Map<Locale, String> titleMap,
			Map<Locale, String> descriptionMap, String content, String type,
			String ddmStructureKey, String ddmTemplateKey, String layoutUuid,
			int displayDateMonth, int displayDateDay, int displayDateYear,
			int displayDateHour, int displayDateMinute, int expirationDateMonth,
			int expirationDateDay, int expirationDateYear,
			int expirationDateHour, int expirationDateMinute,
			boolean neverExpire, int reviewDateMonth, int reviewDateDay,
			int reviewDateYear, int reviewDateHour, int reviewDateMinute,
			boolean neverReview, boolean indexable, boolean smallImage,
			String smallImageURL, File smallImageFile,
			Map<String, byte[]> images, String articleURL,
			ServiceContext serviceContext)
		throws PortalException, SystemException {

		// ... method body omitted
	}


Most of that should not be difficult to obtain, and the ddmStructureKey and ddmTemplateKey for basic web content can probably be plucked out of the database. So that would take care of adding the Word document. Incidentally, I'm not sure whether you know this, but if you rename the .docx extension to .zip, you can "unpack" the Word document, which gives you a bunch of files (mostly XML) that represent the document. The markup is all based on Microsoft's Office Open XML schemas, though, so I still think plucking out the body content would be easier than using an XSL transform to convert it into the XML structure that Liferay stores.
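The .docx-as-zip point can be tried with the JDK alone. A minimal sketch, assuming only the paragraph text is needed: it opens the file with java.util.zip.ZipFile (no renaming required), parses word/document.xml, and joins the w:t text runs of each w:p paragraph. Formatting, table structure and images are ignored, and the class name DocxText is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DocxText {

    // WordprocessingML namespace used by word/document.xml.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the plain text of each w:p paragraph, in document order.
    public static List<String> paragraphs(String docxPath) throws Exception {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found; not a .docx?");
            }
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            // Uploaded files are untrusted input: refuse DOCTYPEs to avoid XXE.
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc;
            try (InputStream in = zip.getInputStream(entry)) {
                doc = dbf.newDocumentBuilder().parse(in);
            }
            List<String> result = new ArrayList<>();
            // Visible text lives in w:t elements. Paragraphs nested in text
            // boxes are matched too, so their text can appear twice.
            NodeList paras = doc.getElementsByTagNameNS(W_NS, "p");
            for (int i = 0; i < paras.getLength(); i++) {
                NodeList texts = ((Element) paras.item(i)).getElementsByTagNameNS(W_NS, "t");
                StringBuilder sb = new StringBuilder();
                for (int j = 0; j < texts.getLength(); j++) {
                    sb.append(texts.item(j).getTextContent());
                }
                result.add(sb.toString());
            }
            return result;
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java DocxText sample.docx
        for (String p : paragraphs(args[0])) {
            System.out.println(p);
        }
    }
}
```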

Last point: I think you are hoping to do this when a user chooses +Add Web Content, i.e., in that dialog the user would have the option to pick a Word file. This would be a neat feature and doable, too; it just needs some JSP hook work and probably a Struts action override.

Does any of that help?
Raja Seth, modified 9 years ago.

RE: Convert MS Docx/Doc to Liferay Web content

Regular Member  Posts: 233  Join Date: 11-8-18
Hi Andrew,

Thanks again for your reply. I searched and found that with the help of docx4j I am able to convert the DOCX format into HTML. After that, I simply need to take out the body section and put it into the web content XML format, the same way you suggested.
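Taking out the body section can be done with a regular expression when the input is a complete, well-formed HTML document such as converter output. A minimal sketch (the class HtmlBody is hypothetical; the docx4j conversion step itself is not shown): it returns the inner HTML of the body element, which could then go into the CDATA section of the static-content XML. If the converter emits its CSS in the head element, that styling is lost with this approach.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlBody {

    // Everything between <body ...> and </body>, case-insensitive and across
    // line breaks. Adequate for well-formed converter output; not a general
    // HTML parser.
    private static final Pattern BODY = Pattern.compile(
        "<body\\b[^>]*>(.*)</body\\s*>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed inner HTML of <body>, or the input unchanged if no body tag is found.
    public static String bodyOf(String html) {
        Matcher m = BODY.matcher(html);
        return m.find() ? m.group(1).trim() : html;
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p{margin:0}</style></head>"
            + "<body class=\"docx\">\n<p>Name: Ajay</p>\n</body></html>";
        System.out.println(bodyOf(html)); // prints <p>Name: Ajay</p>
    }
}
```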

Thanks & Regards,
Raja
sunil kumar, modified 6 years ago.

RE: Convert MS Docx/Doc to Liferay Web content

New Member  Posts: 13  Join Date: 17-3-28
Hi Rajesh,
Would it be possible to share demo code for writing Word document data into web content?
I have been looking for this for a long time. It would be greatly appreciated.
Please share ASAP.

Thanks & regards
Suneel