
English documentation sources

Posted in Announcements on August 22nd, 2010 by Clonk-Karl

Until recently, the sources of the developer mode documentation were in German. The reason is historical: the documentation is based on the Clonk Rage one, which was available only in German for some time and was translated into English later. However, this made it quite hard for international folks to contribute, since a German version had to be available before it could be translated into other languages (see bug #287). So after some initial work by Maikel, I sat down this weekend with the goal of turning the situation around: the main documentation sources should be in English, which can then be translated into German (or other languages).

The translation process works like this: a tool called xml2po scans all XML files in docs/sdk/, which make up the documentation, and creates so-called po files. A po file contains all strings that need to be translated, together with their translations into a certain language. The po files are edited by human translators and, once finished, are used to generate the translated documentation XML.
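To give a rough idea of what "scanning XML for strings" means, here is a minimal Python 3 sketch. It is not how xml2po is actually implemented; the function names and the sample XML are made up for illustration, and the sketch only looks at the text of leaf elements.

```python
import xml.etree.ElementTree as ET

def extract_strings(xml_text):
    """Collect the unique, non-empty text of leaf elements, in document order."""
    root = ET.fromstring(xml_text)
    seen = []
    for elem in root.iter():
        text = (elem.text or "").strip()
        # Only leaf elements are considered; each string is listed once.
        if len(elem) == 0 and text and text not in seen:
            seen.append(text)
    return seen

def po_escape(s):
    """Escape backslashes, double quotes and newlines for a po string."""
    return s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def to_po(strings):
    """Format strings as po entries with empty (untranslated) msgstr."""
    return "\n".join('msgid "%s"\nmsgstr ""\n' % po_escape(s) for s in strings)

# Hypothetical documentation snippet, not taken from docs/sdk/.
sample = (
    "<doc><title>Objects</title>"
    "<desc>Creates an object.</desc>"
    "<desc>Creates an object.</desc></doc>"
)
print(to_po(extract_strings(sample)))
```

Each `msgid` is a source string, and the translator fills in the matching `msgstr`. Identical strings end up in the catalog only once, which becomes relevant in step 3 below.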

The following steps were required (plus/minus some detail work):

  1. Finish the English translation: When I started, there were about 200 untranslated German strings in the documentation which I needed to translate. gtranslator is a nice tool for this, except that it does not show where the string to translate originates from (i.e. from which XML file and line); I had to look that up in the plain po file when I needed the information.
  2. Write a script which reads the po file of the English translation, replaces the German content of all XML files with the English version and creates a po file which contains the translation from English into German. This started off as a quite nice and clear Python program, but then I added handling for an increasing number of special cases until it resulted in code I am certainly not proud of anymore… but well, it did its job in the end. Another thing I learned while doing this: It’s a good idea to avoid mixing Python strings and unicode objects; instead always use either one or the other. Otherwise you get strange UnicodeDecodeError exceptions and whatnot all over the place… that’s why I normally prefer strongly typed programming languages.
  3. Get rid of duplicate entries in the newly created de.po. There were quite a few cases where two different German strings translated to the same English string, especially in tables which list (English) identifiers that should remain untranslated since they refer to the name of a setting in a configuration file. I decided to mark them with a different tag in the XML (<literal_col> instead of <col>), which causes them not to show up in the translation.
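The core of steps 2 and 3 can be sketched as follows. This is not the actual conversion script, only a simplified Python 3 illustration under the assumption that the German-to-English catalog has already been loaded into a plain dictionary; `invert_catalog` and the sample strings are hypothetical.

```python
from collections import defaultdict

def invert_catalog(de_to_en):
    """Turn {german: english} into ({english: german}, conflicts).

    conflicts maps every English string that has several distinct German
    sources to the sorted list of those German strings.
    """
    sources = defaultdict(set)
    for german, english in de_to_en.items():
        sources[english].add(german)

    en_to_de, conflicts = {}, {}
    for english, germans in sources.items():
        if len(germans) == 1:
            en_to_de[english] = next(iter(germans))
        else:
            # Ambiguous: needs a human decision (or a non-translated tag).
            conflicts[english] = sorted(germans)
    return en_to_de, conflicts

# Made-up sample entries for illustration only.
catalog = {
    "Beschreibung": "Description",
    "Name": "Name",
    "Bezeichnung": "Name",
}
en_to_de, conflicts = invert_catalog(catalog)
print(en_to_de)
print(conflicts)
```

Reporting the conflicts instead of silently picking one German string makes it easy to see which entries are really identifiers that should not be translated at all.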

Some of the strings to be translated contain XML fragments, for example for inserting placeholders or highlighting portions of the text. Quite a few of these XML fragments were broken in the English translation. I wonder why this did not show up previously when building the documentation.
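Broken fragments like these can be caught with a simple well-formedness check. The following Python 3 sketch wraps a translated string in a dummy root element and tries to parse it; the helper name and the sample strings are assumptions for illustration and not part of the documentation build.

```python
import xml.etree.ElementTree as ET

def is_well_formed_fragment(fragment):
    """Return True if the string parses as XML content inside a dummy root."""
    try:
        ET.fromstring("<root>%s</root>" % fragment)
        return True
    except ET.ParseError:
        return False

# Made-up translated strings.
samples = [
    "Returns <em>true</em> on success.",  # fine
    "Returns <em>true on success.",       # <em> is never closed
    "See <placeholder-1/> for details.",  # empty element, fine
    "Width & height",                     # bare ampersand is not valid XML
]
for s in samples:
    print(is_well_formed_fragment(s), s)
```

Note that this only checks well-formedness. It does not know which tags are allowed, and it would also reject named entities other than the five predefined XML ones.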

Now if you want to translate the documentation to German or even another language, all you need to do is edit docs/de.po with your favorite translation tool (which might be a simple text editor; the format of the file is pretty self-explanatory). On Linux you can then easily regenerate the .po file (by scanning the XML files for new strings) and create the translated version by running make in docs/ (for Windows that’s a bit more complicated, but you can have a look at README.cygwin.txt, which has instructions). This also converts the XML into HTML suitable for display by any web browser (for example, Firefox cannot handle the XMLs directly for some reason). If you want to create a translation to a new language from scratch and need a po file to start with (or want it to be integrated into the make process without knowing how to do so), feel free to approach a developer in the forums or the chat. A word of warning though: Translating all the docs is a huge effort since it includes more than 4,000 strings. However, it’s also perfectly fine to translate only parts of it as a start. Untranslated strings will remain in English.
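To illustrate what a po file looks like and how far a partial translation has come, here is a small Python 3 sketch. The reader is deliberately minimal and is only an assumption of mine about what is needed here: it ignores comments, fuzzy flags, contexts and plural forms, and the sample entries are made up rather than copied from docs/de.po.

```python
import ast

def parse_po(text):
    """Very small po reader: returns a list of (msgid, msgstr) pairs."""
    entries, current, field = [], {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("msgid "):
            # A new entry starts; store the previous one first.
            if "msgid" in current:
                entries.append((current["msgid"], current.get("msgstr", "")))
            current, field = {"msgid": ""}, "msgid"
            line = line[len("msgid "):]
        elif line.startswith("msgstr "):
            field = "msgstr"
            current["msgstr"] = ""
            line = line[len("msgstr "):]
        if field and line.startswith('"'):
            # Quoted chunks (also on continuation lines) are concatenated.
            current[field] += ast.literal_eval(line)
    if "msgid" in current:
        entries.append((current["msgid"], current.get("msgstr", "")))
    return entries

def progress(entries):
    """Return (translated, total), skipping the header entry (empty msgid)."""
    real = [(msgid, msgstr) for msgid, msgstr in entries if msgid]
    done = sum(1 for _, msgstr in real if msgstr)
    return done, len(real)

sample = '''
# header entry
msgid ""
msgstr "Content-Type: text/plain; charset=UTF-8\\n"

msgid "Creates an object."
msgstr "Erzeugt ein Objekt."

msgid "Returns the owner."
msgstr ""
'''
done, total = progress(parse_po(sample))
print("%d of %d strings translated" % (done, total))
```

An entry with an empty `msgstr` counts as untranslated, which matches the behaviour described above: such strings simply stay in English in the generated documentation.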

Despite this work there are still quite a few documentation tasks to be carried out. Newton gave a nice overview in the forums. It’s just that now you don’t have an excuse anymore not to do it, my English-speaking friends :). Additionally, some of the code examples still contain German string literals or comments. Eventually they should be turned into English as well, though I think most of the examples are understandable already since they are accompanied by an explanatory description. If you edit the documentation XMLs, you can directly review your changes with Internet Explorer (not Firefox though).