Default translations: Use machine-translated content or not? Flags?

Updated 14 years ago by Olaf Kock.

Liferay Legend Posts: 6403 Join Date: 08/09/23
As I just went through the current German translation file for Liferay, it was obvious again that the default translation seems to be machine generated. It's easy to claim translation to a language this way, but the overall result is (unless corrected) somewhere on a scale from "strange" to "completely utterly unusable" - e.g. sometimes I didn't even know what the short text was talking about until I had a look at the original English version. I'd rather see English text in this case than machine translations.

As updating the new texts requires either following the subversion history or completely reading/digesting the huge properties file, I'd like to propose a slightly different approach for default translations.

What about flagging new content - e.g. "[translation needed]" - in the translation itself? This would be easy to find when searching for new text, will be deleted during the translation process and can even be used as a flag for some translations that were found to be bad but without an idea of how to do them better.

As there's some translation script going through to convert the .properties.native files to .properties this could even filter out the flags, so that the flag(s) will not be visible in the product at all.
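
A minimal Python sketch of the flagging idea, assuming simple `key=value` lines and the literal marker `[translation needed]` proposed above. The function names are hypothetical, and this is only an illustration of the approach, not the existing build script.

```python
FLAG = "[translation needed]"  # marker text proposed in the post above


def flagged_keys(native_text):
    """Return the keys whose value still carries the flag."""
    keys = []
    for line in native_text.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if FLAG in value:
            keys.append(key.strip())
    return keys


def strip_flags(native_text):
    """Remove the flag and any trailing whitespace it leaves behind."""
    cleaned = []
    for line in native_text.splitlines():
        if FLAG in line:
            line = line.replace(FLAG, "").rstrip()
        cleaned.append(line)
    return "\n".join(cleaned) + "\n"
```

Translators would search for the flag (`flagged_keys`), while the conversion step would call something like `strip_flags` so the marker never reaches the product.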

For me this would be a tremendous help.
Updated 14 years ago by Tomasz Wojewodka.

RE: Default translations: Use machine-translated content or not? Flags?

Junior Member Posts: 56 Join Date: 07/09/07
Hi Olaf,

In my opinion default English is better than any machine-translated content. That's why I'm glad that the Polish language file is not modified by machines.
My situation is that in trunk the English properties file simply has more lines than my language-specific file.

Some time ago I wrote a small and ugly Perl script that:
- for each line gets a key
- looks in my specific language file if it exists (so it is translated)
- if a key exists puts translation line to the output file
- if a key doesn't exist puts english version to the output file

After that I've got an updated properties file with some untranslated entries.

Then a simple diff shows me which keys I have to translate.
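
The steps above can be sketched in Python as follows. This is not the attached Perl script: it assumes plain `key=value` lines without continuation lines, and the function names are hypothetical.

```python
def parse_properties(text):
    """Map each key to its value for simple key=value lines (hypothetical helper)."""
    entries = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments; continuation lines are not handled.
        if not stripped or stripped.startswith(("#", "!")) or "=" not in stripped:
            continue
        key, value = stripped.split("=", 1)
        entries[key.strip()] = value.strip()
    return entries


def merge_translations(english_text, translated_text):
    """Follow the English key order; use the translation if present, else English."""
    english = parse_properties(english_text)
    translated = parse_properties(translated_text)
    merged = []
    untranslated = []
    for key, english_value in english.items():
        if key in translated:
            merged.append(f"{key}={translated[key]}")
        else:
            merged.append(f"{key}={english_value}")
            untranslated.append(key)
    return "\n".join(merged) + "\n", untranslated
```

The second return value lists the keys that fell back to English, which is the same information the diff gives.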

It would be nice to have that made by ant scripts as you suggested.

In the meantime feel free to use my script if you find it useful.

Attachments:

Updated 14 years ago by Jorge Ferrer.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 2871 Join Date: 06/08/31
Hi Olaf,

That's a very smart idea. I'll double check with Brian (who implemented the existing process) that he doesn't have any issue with it.

Would you have time to implement it?
Updated 14 years ago by Jorge Ferrer.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 2871 Join Date: 06/08/31
Good news, Brian liked the idea. Olaf, let me know if you have time to work on it.

In case Olaf doesn't have time, is there any other volunteer?
Updated 14 years ago by Olaf Kock.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 6403 Join Date: 08/09/23
Well, it should not be that much effort - certainly way less than my online editing idea, which would try to solve a few more problems with a lot more effort.

I'm currently overwhelmed with work, so I couldn't attack this immediately, but I like this idea so much that I really want to see it in action. If someone else volunteers: Go ahead. Otherwise I'll post here when I start working on it (expect this in the not-too-distant future, but certainly not immediately).

If there are any valuable pointers into nonobvious parts of the current language-resource-processing that you, Brian or someone else can give, I'd be happy to leverage the shortcut and not have to discover them myself. But it can't be too bad, so don't bother if you can't think of pitfalls.

Thanks for the feedback (especially as it's been so positive)
Updated 14 years ago by Jorge Ferrer.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 2871 Join Date: 06/08/31
Hey Olaf,

As I was seeing several new messages get added in the last few weeks I couldn't help thinking how much time in reviewing would be saved if we had already implemented your idea.

So I decided to go ahead and give it a try, and luckily it was easy enough... voilà!

From now on look for "[automatic" within your language file to find new keys which have been either automatically translated or copied directly from English.

PS1: I also made a second improvement that will also save some time. Now you can execute build-lang for only 1 specific language, which is quite useful for testing the translation. See LPS-4198 for details.

PS2: Tomasz, if you can rewrite your Perl script in Java I'll add it as an Ant target.
Updated 14 years ago by Olaf Kock.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 6403 Join Date: 08/09/23
This is great news - thanks a lot.
I promise to make good use of that keeping the translation up to date.

Now that I've seen the ridiculously small patch you've done (congratulations - very good job!), I'd like to dive more into it and demand even more - this time I'll add an implementation right ahead:

I've frequently seen the situation where the English text was ambiguous (as the verb and subject were indistinguishable) and I had the feeling that the translation was bad, but couldn't tell for sure - and I'd like to use this flagging to signal other translators (or to remind me later).

Hooking into what you already did with Ant filters, FilterChain can be utilized easily:

<copy file="in.properties" tofile="out.properties">
    <filterchain>
        <replaceregex pattern="\[note:.+\]" replace="" />
        <replaceregex pattern="\[feedback:.+\]" replace="" />
    </filterchain>
</copy>


This will filter
center=Mitte [feedback: if this was a verb it'd be "zentrieren", this translation assumes that it's a noun]
consumer-agent=Verbraucher-Agent [note: need better translation]


where "feedback" can be thought of as a signal to the implementors of the English version that the key is ambiguous in some language and requires some more context, and "note" is a hint to translators coming along later. They could even be combined: the English version could supply the context requested via 'feedback' in a 'note' section (or a translator might provide the hint).

(Note: Somebody have a look at that regexp - it works in my test case, but I don't know all the pitfalls. It has garbled some UTF-8 test file, so filtering should be applied to the escaped file)
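
One pitfall worth illustrating: `.+` is greedy, so it runs to the last `]` on the line and can swallow text after the annotation. The sketch below uses Python's `re` module, which behaves like Java's regex engine for these simple patterns. The `BOUNDED` pattern is an assumed alternative, not something tested in the Ant build.

```python
import re

# Pattern as written in the Ant snippet above: greedy, runs to the last "]".
GREEDY = re.compile(r"\[note:.+\]")

# Assumed alternative: stop at the first "]" and also drop the whitespace
# in front of the annotation.
BOUNDED = re.compile(r"\s*\[note:[^\]]*\]")


def strip_notes(line, pattern):
    """Remove every match of the given pattern from one properties line."""
    return pattern.sub("", line)


line = "x=Foo [note: check] bar [1]"
greedy_result = strip_notes(line, GREEDY)    # also removes " bar [1]"
bounded_result = strip_notes(line, BOUNDED)  # removes only the annotation
```

With the bounded pattern a value that legitimately contains other brackets later on the line is left alone.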

This might be a slight abuse of such a flagging system, but it'd be soooo convenient.
Updated 14 years ago by Jorge Ferrer.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 2871 Join Date: 06/08/31
I like the idea of supporting notes from the translators. I would leave the two existing notes from LangBuilder the way they are since they are simple and explicit enough.

For translator's notes I would be more explicit. Something like:

consumer-agent=Verbraucher-Agent (Translator Note: need better translation)

Thoughts?
Updated 14 years ago by Olaf Kock.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 6403 Join Date: 08/09/23
+1: I don't care what the exact syntax is as long as I can leave those notes.

As the two examples I've given address two completely distinct groups of people, I'd really like to see them both: notes for translators and notes for those writing the English original keys. Sooner or later this might lead to translator notes in the English original providing required context for good translations.

Leaving the current notes as they are is also fine with me, I'm just proposing to add more filters to the ant script, not replace them. This implementation you did triggered my intense wish to place such notes in the translation files.

I can't wait to get my hands on these features...

Edit: I've probably given bad examples in my post suggesting these two distinct kinds of comments. But during translation I've sometimes wondered which of two or three grammatically possible structures I had to translate - this would be a comment for those writing the English original fragments.
Updated 14 years ago by Jorge Ferrer.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 2871 Join Date: 06/08/31
Hi Olaf,

I'm not sure the notes from the original translator would ever be used, because ambiguities usually only come up when translating. In fact, something we did some time ago, which is not very well known, is to allow for this:

<liferay-ui:message key="open[action]" />

Which would be included in Language.properties as:

open=Open


But in Spanish we would have:

open=Abierto
open[action]=Abrir


My point here is that it's usually the usage which needs a disambiguation, not the actual English message. And in the case there is a note to be made in the English message it can also qualify as a "Translator Note".
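
A minimal Python sketch of such a lookup, assuming that a qualified key like `open[action]` falls back to the plain key `open` when the language file has no dedicated entry (which is how the English example above reads). The name `get_message` and the final fallback of echoing the key are assumptions for illustration, not Liferay's implementation.

```python
import re

# A key of the form base[qualifier], e.g. open[action].
QUALIFIED_KEY = re.compile(r"^(?P<base>[^\[\]]+)\[[^\[\]]+\]$")


def get_message(messages, key):
    """Look up key; for base[qualifier] keys fall back to the base key."""
    if key in messages:
        return messages[key]
    match = QUALIFIED_KEY.match(key)
    if match and match.group("base") in messages:
        return messages[match.group("base")]
    return key  # assumption: echo the key when nothing matches


english = {"open": "Open"}
spanish = {"open": "Abierto", "open[action]": "Abrir"}
```

Here `get_message(english, "open[action]")` resolves through the fallback, while the Spanish file answers the qualified key directly.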

Thoughts?
Updated 14 years ago by Olaf Kock.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 6403 Join Date: 08/09/23
Hi Jorge,
let me disambiguate:
I was thinking about placing all these notes in the "foreign" language files, e.g. in the German translation for me.

Those who create the original English content could get their feedback through
grep -F "[feedback" LanguageResources_*.properties

if and when they care. The result of this could provide some insight into constructs (e.g. single word constructions) that are ambiguous in various languages.

I guess that this feedback would be rarely used, but if it was, it could lead to disambiguation in the English file. I've been thinking along the "translator note" lines, but I like your way of introducing a new key ("open[action]") better (this is also one of the words that can't be easily translated into German: something that is open uses a different word than the action of opening something).

If there was enough feedback in the translated files, feature developers could spot patterns in problematic keys and avoid them in future. This might help people to see these problems in advance even if they speak only English. I wouldn't expect anybody to run around looking for the ambiguous keys in order to fix them in code.
Updated 14 years ago by Jorge Ferrer.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 2871 Join Date: 06/08/31
Hi Olaf,

I've tried your suggestion of using a ReplaceRegex tokenfilter, but it fails with the following error:

No supported regular expression matcher found: java.lang.ClassNotFoundException: org.apache.tools.ant.util.regexp.Jdk14RegexpRegexp


Apparently for it to work it needs an additional dependency (ant-optional.jar). Did you add that when doing your tests?

Does anyone out there know of any alternative to implement this feature without requiring a new dependency?
Updated 14 years ago by Olaf Kock.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 6403 Join Date: 08/09/23
I've not had any problems - but I have used Ant out of the box as Eclipse offered it (if I remember correctly it was Eclipse, not the command line version; I tried it at home and am currently in the office).

The regular expression stuff is configurable in Ant, and they have a strong tendency towards ORO. If it currently doesn't work out of the box, I suppose that you can't break anything by redefining the engine to use. I can't tell if I've used Java 5 or 6 or if Eclipse preconfigured the Ant environment to use one of the available engines.

I'll have a look at my personal setup when I'm at home (I hope to remember...)
Updated 14 years ago by Olaf Kock.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 6403 Join Date: 08/09/23
Sorry, took longer than expected...

My test setup had a tremendous amount of additional Ant jars. It's been the standard eclipse-ganymede amount (I've missed the galileo install on that computer). Among the jars are ant.jar, ant-apache-oro.jar, ant-apache-regexp.jar. Together there are 24 jars from the Eclipse plugin org.apache.ant_1.7.0_...... I guess that either the oro or the regexp jar is responsible for my test working.

How should we continue? Do you need to know the minimal requirements to execute the ant script that I suggested? I'd be happy to provide and test it. Or do you oppose the feature if it requires tweaks to the build environment?
Updated 14 years ago by Jorge Ferrer.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 2871 Join Date: 06/08/31
Hi Olaf,

Knowing the minimal requirements would be very useful. For example, I don't think we need ORO for this simple use case. If we don't have to add heavy JARs there shouldn't be a problem.
Updated 14 years ago by Olaf Kock.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 6403 Join Date: 08/09/23
Hi Jorge,

the minimum classpath that I've managed to run this script with was: ant.jar, ant-nodeps.jar and ant-launcher.jar - all from eclipse ganymede. No extra regexp package.

Eclipse added its own "Additional Tasks&Support" that I guess make no difference but I've used this setup in order to be sure that no other jars from the system were picked up automagically. The additional task jars were $eclipse.home/configuration/org.eclipse.osgi/bundles/150/1/.cp/lib/remoteAnt.jar, tools.jar from the JDK (jdk 1.6.0.14) and $eclipse.home/plugins/org.eclipse.swt.gtk.linux.x85_3.4.1.v3449c.jar.

I've also tried JDK 1.5.0.18 successfully.
Updated 14 years ago by Jorge Ferrer.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 2871 Join Date: 06/08/31
Hi Olaf,

Thanks for doing this research.

I guess that means that we would need to add ant-launcher.jar and ant-nodeps.jar to lib/development?

Have you tested running it with these two libraries from the command line with Liferay's build file modified to add the regular expression as proposed in the previous posts?
Updated 14 years ago by Olaf Kock.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 6403 Join Date: 08/09/23
I'll do that and report here - I guess it'll be Thursday or Friday, in any case no later than the weekend.
Updated 14 years ago by Olaf Kock.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 6403 Join Date: 08/09/23
Ok, I've tested it, but found another issue. In my setup (just plain ant from commandline) everything was working well, without any jars placed anywhere.

The real issue I've found is that it's unclear which version of the Language_xx.properties[.native] is the leading one. The build process (in target build-lang-cmd) as I have seen it:
  • first calls build-lang-native2ascii
  • then invokes LangBuilder, which in _createProps() generates the native file again from the ascii version. During this process some translation magic happens
  • then build-lang-native2ascii is invoked again.


As LangBuilder re-generates the native file from the ascii file there's no chance to implement my suggestion without some more intense tweaking of this process. I'd fear to step on something I have not seen yet. I guess it's trivial for someone with insight to the build and translation process, unfortunately I currently can't dive deep into this. Even if I did I'd probably break something...

My expectations would be that the native file is the leading version, read by the build script. The ascii version IMHO could be generated during the build process and not be committed to svn. The native version would need to be touched by the automatic translation process for new keys, but other than that it should be manually updated.

Am I missing something?

BTW: In the process of trying this I've translated the marked German entries. Hunting for "(Automatic translation)" and "(Automatic copy)" works really well. Almost fun.
Updated 14 years ago by Jorge Ferrer.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 2871 Join Date: 06/08/31
Hi Olaf,

Sorry for not answering earlier, I was out on vacation.

The native version is always the leading version, in fact you can delete the ASCII version and it will be regenerated.
Updated 14 years ago by Olaf Kock.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 6403 Join Date: 08/09/23
Hi Jorge,

now I've been on holiday... Yes, I understand that native is the leading version. However, as I described, I expected the native file to be read during the build process, but it will also get written to again, eliminating the chance to keep more information than currently available in it.

Is this by design or has this been missed somehow?
Updated 14 years ago by Jorge Ferrer.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 2871 Join Date: 06/08/31
Hi Olaf,

I see. I didn't notice/remember that was the case. I agree with you that it would be very useful to be able to add notes as discussed in the previous posts.

Have you taken a look at the code? If so, do you think it would be easy to change it?
Updated 14 years ago by Olaf Kock.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 6403 Join Date: 08/09/23
Hi Jorge,
I think it would be fairly easy to change this behaviour, but I feared that this was by design - e.g. to explicitly eliminate any comments or annotations within the files and (e.g.) to provide a standard ordering for entries, possibly more. It's been a while and I'll look at the issue again. I'll provide feedback this week.

Looking forward to seeing you at the European Symposium,
Olaf
Updated 14 years ago by Jorge Ferrer.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 2871 Join Date: 06/08/31
It's Brian who designed it so I'm not sure. Although he may have just done it that way for simplicity.

Good to know that you'll be going to the Symposium. Looking forward to meeting you there.
Updated 14 years ago by Olaf Kock.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 6403 Join Date: 08/09/23
Oops, I missed working on this topic in time...

Of course, com.liferay.portal.tools.LangBuilder writes the native file in order to provide complete content - e.g. once a new value has been added to the English file, it will automagically appear in the translated native file. The code takes a few special cases into account, but once you understand why the native file is written and what workarounds have been made, it's fairly easy.

See you tomorrow...
Updated 14 years ago by Tomasz Wojewodka.

RE: Default translations: Use machine-translated content or not? Flags?

Junior Member Posts: 56 Join Date: 07/09/07
Hello Jorge,

Jorge Ferrer:

PS2: Tomasz, if you can rewrite your Perl script in Java I'll add it as an Ant target.


I'll try to find some time to do that.
Thanks :-)
Updated 14 years ago by Tomasz Wojewodka.

RE: Default translations: Use machine-translated content or not? Flags?

Junior Member Posts: 56 Join Date: 07/09/07
Jorge,

1. I've synchronized my sources with trunk and when I try to run ant build-lang I'm getting an exception:
[java] Exception in thread "main" java.lang.IllegalArgumentException
[java] 	at com.liferay.portal.tools.LangBuilder.main(LangBuilder.java:60)
[java] Java Result: 1


Is it a problem with my configuration or a general trunk issue?

2. I've been trying to do a quick analysis of LangBuilder and I see that it adds the _AUTOMATIC_COPY suffix to the native properties entry value only if it already ends with "_AUTOMATIC_COPY" or the value includes "{" or "<" characters. Correct me if I'm wrong, please.
I think that it should copy a line from the "content" if it doesn't find the key in the native properties file and the value can't be automatically translated.

What do you think?
Updated 14 years ago by Jorge Ferrer.

RE: Default translations: Use machine-translated content or not? Flags?

Liferay Legend Posts: 2871 Join Date: 06/08/31
Hi Tomasz,

Regarding your first question, build-lang is working fine for me. It's also strange that line 60 is a blank line. Are you sure you don't have a modified version of LangBuilder? What's in your line 60?

Regarding your other question, I just added (Automatic Copy) wherever it was already being copied. But now that we are done with that we can discuss when it's being copied right now (honestly I don't even know since I didn't do that), and how it should be instead. Can you open a different thread for that?