Message Boards
Encoding characters
Hi!
I'm trying to read values from a Language.properties file and then export them to a CSV file. I'm doing this with HtmlUtil.extractText, but the result is wrong: the encoded forms of special characters such as á are printed instead of the characters themselves.
Does anyone know what I should use to print them correctly?
Thanks for the help!
Daniel G:
I'm trying to read values from a Language.properties file and then export them to a CSV file. I'm doing this with HtmlUtil.extractText, but the result is wrong: the encoded forms of special characters such as á are printed instead of the characters themselves.
Does anyone know what I should use to print them correctly?
"Printing correctly" is subject to interpretation. Assuming you'd like to extract UTF-8 text to your CSV file, you'll still need to make sure to escape some characters, such as quotes, line breaks, commas, or semicolons. You'll also need to decide if (or how) you would like to see tags like "<b>" in your result. Thus, a single call won't be sufficient.
You've decided to use a method from HtmlUtil, a class that is intimately tied to the HTML format. You might want to look at the other methods that are also there and see if stripHtml, unescape or render fit your needs. Or you might want to check other methods of extraction.
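As a rough illustration of the CSV escaping step mentioned above, here is a minimal sketch in plain Java. It assumes the common convention of wrapping a field in double quotes and doubling any embedded quote. `CsvField` and `escape` are hypothetical names, not part of Liferay or HtmlUtil, and the delimiter is a parameter because some tools expect semicolons instead of commas.

```java
// Hypothetical helper (not a Liferay API): quotes a single CSV field using the
// widespread convention of double-quote wrapping and doubled embedded quotes.
final class CsvField {

    private CsvField() {
    }

    static String escape(String value, char delimiter) {
        if (value == null) {
            return "";
        }
        // A field needs quoting if it contains the delimiter, a quote, or a line break.
        boolean needsQuoting = value.indexOf(delimiter) >= 0
            || value.indexOf('"') >= 0
            || value.indexOf('\n') >= 0
            || value.indexOf('\r') >= 0;
        if (!needsQuoting) {
            return value;
        }
        // Double every embedded quote, then wrap the whole field in quotes.
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escape("one, two", ','));
        System.out.println(escape("say \"hi\"", ','));
        System.out.println(escape("nothing special", ','));
    }
}
```

When writing the file itself, passing an explicit charset such as StandardCharsets.UTF_8 to the writer avoids depending on the platform default encoding.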
First of all, thanks for the help.
What other methods of extraction can I use? I read the values from Language.properties with LanguageUtil.get, but I can't get this to work with any of the HtmlUtil methods I've tried, because the encoded characters are still printed.
Thanks again!
Updated 6 years ago by Christoph Rabel.
RE: Encoding characters
Liferay Legend Posts: 1554 Join Date: 09/09/24
We did something like that once, but used really ugly translation tables, something like this:
http://www.thesauruslex.com/typo/eng/enghtml.htm
Since we needed only a subset, it worked pretty well in the end.
Another wild idea we had was to use the browser, since the browser is able to translate all those special characters. The idea was to write the texts into divs, get the UTF-8 text using innerHTML, and send it back. We never implemented or even tried it, but it might work: write a page, print all the texts in a list, and add some JavaScript that sends all the text to the backend.
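A translation table of this kind could look roughly like the following sketch. It assumes the unwanted encoding consists of named HTML entities, covers only a small hand-picked subset, and uses hypothetical names (`EntityTable`, `decode`). It is not the code described above, and numeric references such as &#225; are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical translation table for a small subset of named HTML entities.
// Characters are written as Unicode escapes so the source compiles under any encoding.
final class EntityTable {

    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();

    static {
        ENTITIES.put("&aacute;", "\u00e1");
        ENTITIES.put("&eacute;", "\u00e9");
        ENTITIES.put("&iacute;", "\u00ed");
        ENTITIES.put("&oacute;", "\u00f3");
        ENTITIES.put("&uacute;", "\u00fa");
        ENTITIES.put("&ntilde;", "\u00f1");
        ENTITIES.put("&auml;", "\u00e4");
        ENTITIES.put("&ouml;", "\u00f6");
        ENTITIES.put("&uuml;", "\u00fc");
        ENTITIES.put("&szlig;", "\u00df");
        ENTITIES.put("&quot;", "\"");
        ENTITIES.put("&lt;", "<");
        ENTITIES.put("&gt;", ">");
        // "&amp;" goes last so that "&amp;oacute;" ends up as the literal text
        // "&oacute;" instead of being decoded a second time.
        ENTITIES.put("&amp;", "&");
    }

    private EntityTable() {
    }

    static String decode(String text) {
        String result = text;
        // LinkedHashMap keeps insertion order, so the replacement order is predictable.
        for (Map.Entry<String, String> entry : ENTITIES.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // The entity-encoded input is an assumed example value.
        System.out.println(decode("Adi&oacute;s"));
    }
}
```

Anything outside the table passes through unchanged, which is why this approach only works when the set of entities in the input is known and small.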
Daniel G:
What other methods of extraction can I use? I read the values from Language.properties with LanguageUtil.get, but I can't get this to work with any of the HtmlUtil methods I've tried, because the encoded characters are still printed.
First of all: Please list some of your input and the desired output. Let's say you'd like to have the following values in csv - what do you expect?
- press any key to continue, any other to quit
- A semicolon (";") is a valid character
- Single quotes look like this: '
- This is <b>valid</b> HTML<br/>with two lines.
- The german alphabet knows of a character ä - really!
- The german alphabet knows of a character ä. Really!
(The question is about commas, semicolons, quotes, HTML tags, and special characters (umlauts), in whatever form you find them.)
Thanks to all! Sorry for the delay; I've been busy these days and couldn't post.
I obtain this:
-Holan/Adiós
and I should obtain this:
- Hola/Adiós
Thanks!
Daniel G:
I obtain this:
-Holan/Adiós
and I should obtain this:
- Hola/Adiós
What about the other inputs that I asked about? The problem is that converting from one encoding to another is not a trivial task that can be answered with a single example. My samples above are not complete. Assume they all go into a CSV file together with their line number. What would be the correct output?
1,press any key to continue, any other to quit
2,A semicolon (";") is a valid character
3,Single quotes look like this: '
4,This is <b>valid</b> HTML<br>with two lines.
5,The german alphabet knows of a character ä - really!
6,The german alphabet knows of a character ä. Really!
Obviously, some lines have 2 entries, some have 3.
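The differing entry counts can be reproduced with a naive split on commas. This is only a sketch to show the problem: `FieldCount` and `naiveFieldCount` are hypothetical names, and a real CSV reader would honor quoting rather than split blindly.

```java
// Hypothetical demonstration: counts fields the way a naive comma split would.
final class FieldCount {

    private FieldCount() {
    }

    static int naiveFieldCount(String line) {
        // The -1 limit keeps trailing empty fields instead of dropping them.
        return line.split(",", -1).length;
    }

    public static void main(String[] args) {
        String first = "1,press any key to continue, any other to quit";
        String second = "2,A semicolon (\";\") is a valid character";
        System.out.println(naiveFieldCount(first));  // 3: the unescaped comma splits the text
        System.out.println(naiveFieldCount(second)); // 2: line number plus text
    }
}
```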
You have two conversions: from HTML to plain UTF-8, then from plain UTF-8 to CSV. However, with tags in the HTML, even that is not a well-defined requirement: will you keep the tags? Escape them? Simplify them?
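To make the tag question concrete, here is a small sketch of three possible policies applied to the HTML sample line. The names (`TagHandling`, `Policy`) are hypothetical, and the regular expressions are deliberately simplistic: they would also remove ordinary text that happens to sit between '<' and '>', so a real HTML parser is the safer choice for complex input.

```java
// Hypothetical sketch of three ways to treat HTML tags before writing to CSV.
final class TagHandling {

    enum Policy { KEEP, STRIP, BREAKS_TO_NEWLINES }

    private TagHandling() {
    }

    static String apply(String html, Policy policy) {
        switch (policy) {
            case KEEP:
                // Leave the markup untouched; the CSV consumer sees raw tags.
                return html;
            case STRIP:
                // Remove everything that looks like a tag, including <br/>.
                return html.replaceAll("<[^>]+>", "");
            case BREAKS_TO_NEWLINES:
                // Turn <br>, <br/>, <br /> into line breaks, then drop the other tags.
                return html.replaceAll("(?i)<br\\s*/?>", "\n").replaceAll("<[^>]+>", "");
            default:
                throw new IllegalArgumentException("Unknown policy: " + policy);
        }
    }

    public static void main(String[] args) {
        String sample = "This is <b>valid</b> HTML<br/>with two lines.";
        for (Policy policy : Policy.values()) {
            System.out.println(policy + ": " + apply(sample, policy));
        }
    }
}
```

Stripping alone glues "HTML" and "with" together, while converting the break first yields a two-line value that then has to be quoted in the CSV output.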