16. Other Data Formats

While the GNU gettext tools deal mainly with POT and PO files, they can also manipulate a couple of other data formats.

16.1 Internationalizable Data Formats

Here is a list of other data formats which can be internationalized using GNU gettext.

16.1.1 POT - Portable Object Template

RPMs

gettext

Ubuntu packages

gettext

File extension

pot, po

Extractor

xgettext

16.1.2 Resource String Table

RST is the resource string table format produced by Free Pascal compiler versions older than 3.0.0. RSJ is the newer resource string table format, produced by Free Pascal compiler version 3.0.0 or newer.

RPMs

fpk

Ubuntu packages

fp-compiler

File extension

rst, rsj

Extractor

xgettext, rstconv

16.1.3 Glade - GNOME user interface description

RPMs

glade, libglade, glade2, libglade2, intltool

Ubuntu packages

glade, libglade2-dev, intltool

File extension

glade, glade2, ui

Extractor

xgettext, libglade-xgettext, xml-i18n-extract, intltool-extract

16.1.4 GSettings - GNOME user configuration schema

RPMs

glib2

Ubuntu packages

libglib2.0-dev

File extension

gschema.xml

Extractor

xgettext, intltool-extract

16.1.5 AppData - freedesktop.org application description

This file format is specified in https://www.freedesktop.org/software/appstream/docs/.

RPMs

appdata-tools, appstream, libappstream-glib, libappstream-glib-builder

Ubuntu packages

appdata-tools, appstream, libappstream-glib-dev

File extension

appdata.xml, metainfo.xml

Extractor

xgettext, intltool-extract, itstool

16.1.6 Preparing Rules for XML Internationalization

Marking translatable strings in an XML file is done through a separate "rule" file, making use of the Internationalization Tag Set standard (ITS, https://www.w3.org/TR/its20/). The currently supported ITS data categories are: ‘Translate’, ‘Localization Note’, ‘Elements Within Text’, and ‘Preserve Space’. In addition to them, xgettext also recognizes the following extended data categories:

Context

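This data category associates msgctxt to the extracted text. In the global rule, the contextRule element contains the following: a required selector attribute, which holds an absolute selector for the nodes the rule applies to; a required contextPointer attribute, which holds a relative selector pointing to the node that provides the msgctxt value; and an optional textPointer attribute, which holds a relative selector pointing to the node that provides the msgid value.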

Escape Special Characters

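This data category indicates whether the special XML characters (<, >, &, ") are escaped with entity references. In the global rule, the escapeRule element contains the following: a required selector attribute, which holds an absolute selector for the nodes the rule applies to, and a required escape attribute with the value yes or no.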

Extended Preserve Space

This data category extends the standard ‘Preserve Space’ data category with the additional values ‘trim’ and ‘paragraph’. ‘trim’ means to remove the leading and trailing whitespace of the content, but not to normalize whitespace in the middle. ‘paragraph’ means to normalize the content but keep the paragraph boundaries. In the global rule, the preserveSpaceRule element contains the following: a required selector attribute, which holds an absolute selector for the nodes the rule applies to, and a required space attribute with the value default, preserve, trim, or paragraph.

All those extended data categories can only be expressed with global rules, and the rule elements have to have the https://www.gnu.org/s/gettext/ns/its/extensions/1.0 namespace.

Given the following XML document in a file ‘messages.xml’:

 
<?xml version="1.0"?>
<messages>
  <message>
    <p>A translatable string</p>
  </message>
  <message>
    <p translatable="no">A non-translatable string</p>
  </message>
</messages>

To extract the first text content ("A translatable string"), but not the second ("A non-translatable string"), the following ITS rules can be used:

 
<?xml version="1.0"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:translateRule selector="/messages" translate="no"/>
  <its:translateRule selector="//message/p" translate="yes"/>

  <!-- If 'p' has an attribute 'translatable' with the value 'no', then
       the content is not translatable.  -->
  <its:translateRule selector="//message/p[@translatable = 'no']"
    translate="no"/>
</its:rules>
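
The effect of these three rules can be traced with a small Python sketch. It is not how xgettext is implemented; it only emulates two ITS conventions: a later global rule overrides an earlier one, and an element that no rule selects inherits the setting of its parent. Python's ElementTree supports only a subset of XPath, so the selectors are rewritten relative to the root element. The function name translatable_texts is hypothetical.

```python
import xml.etree.ElementTree as ET

# The 'messages.xml' document from above.
DOCUMENT = """<?xml version="1.0"?>
<messages>
  <message>
    <p>A translatable string</p>
  </message>
  <message>
    <p translatable="no">A non-translatable string</p>
  </message>
</messages>"""

# (selector, translate) pairs in the order of the ITS rules above.
# Assumption: selectors are rewritten for ElementTree's limited XPath.
RULES = [
    (".", False),                                  # /messages
    (".//message/p", True),                        # //message/p
    (".//message/p[@translatable='no']", False),   # //message/p[@translatable = 'no']
]


def translatable_texts(xml_text, rules):
    """Return the text of every element whose effective setting is translate="yes"."""
    root = ET.fromstring(xml_text)

    # Record the setting chosen by the rules; a later rule overrides an earlier one.
    explicit = {}
    for selector, translate in rules:
        for node in root.findall(selector):
            explicit[node] = translate

    found = []

    def walk(node, inherited):
        # An element not selected by any rule inherits its parent's setting.
        translate = explicit.get(node, inherited)
        text = (node.text or "").strip()
        if translate and text:
            found.append(text)
        for child in node:
            walk(child, translate)

    walk(root, True)  # with no rule at all, ITS treats elements as translatable
    return found


print(translatable_texts(DOCUMENT, RULES))
```

With all three rules the sketch prints only "A translatable string". Dropping the third rule makes both paragraphs translatable, which shows why the more specific rule has to come last.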

‘xgettext’ needs another file, called a "locating rule" file, to associate an ITS rule with an XML file. If the above ITS file is saved as ‘messages.its’, the locating rule would look like:

 
<?xml version="1.0"?>
<locatingRules>
  <locatingRule name="Messages" pattern="*.xml">
    <documentRule localName="messages" target="messages.its"/>
  </locatingRule>
  <locatingRule name="Messages" pattern="*.msg" target="messages.its"/>
</locatingRules>

The locatingRule element must have a pattern attribute, which denotes either a literal file name or a wildcard pattern of the XML file(7). The locatingRule element can have a child documentRule element, which adds checks on the content of the XML file.

The first rule matches any file with the ‘.xml’ file extension, but it only applies to XML files whose root element is ‘<messages>’.

The second rule indicates that the same ITS rule file is also applicable to any file with the ‘.msg’ file extension. The optional name attribute of locatingRule allows choosing rules by name, typically with xgettext's -L option.

The associated ITS rule file is indicated by the target attribute of locatingRule or documentRule. If it is specified in a documentRule element, the parent locatingRule shouldn't have the target attribute.

Locating rule files must have the ‘.loc’ file extension. Both ITS rule files and locating rule files must be installed in the ‘$prefix/share/gettext/its’ directory. Once those files are properly installed, xgettext can extract translatable strings from the matching XML files.
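
The way the two locating rules above select an ITS file can be illustrated with a short Python sketch. It is only an approximation of the lookup described in this section, not xgettext's actual code: the wildcard is matched against the base name with Python's fnmatch, and the function name find_its_target is hypothetical.

```python
import fnmatch
import os.path
import xml.etree.ElementTree as ET

# The locating rule file shown above.
LOCATING_RULES = """<?xml version="1.0"?>
<locatingRules>
  <locatingRule name="Messages" pattern="*.xml">
    <documentRule localName="messages" target="messages.its"/>
  </locatingRule>
  <locatingRule name="Messages" pattern="*.msg" target="messages.its"/>
</locatingRules>"""


def find_its_target(file_name, root_local_name, rules_xml=LOCATING_RULES):
    """Return the ITS file associated with an XML file, or None if no rule applies."""
    base_name = os.path.basename(file_name)
    for rule in ET.fromstring(rules_xml).findall("locatingRule"):
        if not fnmatch.fnmatchcase(base_name, rule.get("pattern")):
            continue
        # A target on locatingRule applies to every file matching the pattern.
        if rule.get("target") is not None:
            return rule.get("target")
        # Otherwise each documentRule adds a check on the root element.
        for document_rule in rule.findall("documentRule"):
            if document_rule.get("localName") == root_local_name:
                return document_rule.get("target")
    return None


print(find_its_target("po/messages.xml", "messages"))
print(find_its_target("po/other.xml", "catalog"))
print(find_its_target("po/app.msg", "anything"))
```

The first and third calls resolve to messages.its. The second call returns None, because the file name matches ‘*.xml’ but the root element is not ‘<messages>’.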

16.1.6.1 Two Use-cases of Translated Strings in XML

For XML, there are two use-cases of translated strings. In the first, the translated strings are consumed directly by programs; in the second, the translated strings are merged back into the original XML document. In the first case, special characters in the extracted strings shouldn't be escaped, while in the second case they should. To control whether to escape special characters, the ‘Escape Special Characters’ data category can be used.
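
The difference between the two use-cases can be made concrete with Python's standard xml.sax.saxutils helpers. This is only an illustration of what "escaped" means for the four special characters, not of xgettext's internals; the function name extracted_string and the sample text are made up for this sketch.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > by default; the double quote is added explicitly.
QUOTE = {'"': "&quot;"}
UNQUOTE = {"&quot;": '"'}


def extracted_string(plain_text, escape_special):
    """Return the string as it would look with or without character escaping."""
    if escape_special:
        # Use-case 2: the translation is merged back into an XML document.
        return escape(plain_text, QUOTE)
    # Use-case 1: the translation is consumed directly by a program.
    return plain_text


plain = 'Save & "Quit" <now>'
print(extracted_string(plain, escape_special=False))
print(extracted_string(plain, escape_special=True))

# Escaping is reversible, so no information is lost either way.
assert unescape(extracted_string(plain, True), UNQUOTE) == plain
```

The second print shows the entity references (&amp;amp;, &amp;quot;, &amp;lt;, &amp;gt;) that a translator would see in the PO file when escaping is enabled.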

To merge the translations, the ‘msgfmt’ program can be used with the option --xml. See section Invoking the msgfmt Program, for more details about how one calls the ‘msgfmt’ program. ‘msgfmt’'s --xml option doesn't perform character escaping, so translated strings can have arbitrary XML constructs, such as elements for markup.

16.2 Localized Data Formats

Here is a list of file formats that contain localized data and that the GNU gettext tools can manipulate.

16.2.1 Editable Message Catalogs

These file formats can be used with all of the msg* tools and with the xgettext program.

If you just want to convert among these formats, you can use the msgcat program (with the appropriate option) or the xgettext program.
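
As a rough sketch of such a conversion, the following Python helper builds a msgcat command line from the file extensions listed in this section. The helper name msgcat_command and the file names are hypothetical; the options used are msgcat's --properties-input, --stringtable-input, --properties-output, --stringtable-output, and -o. The helper only builds the argument list, so it can be inspected without the gettext tools being installed.

```python
# msgcat option needed to read or write each format; PO needs no option.
INPUT_OPTION = {
    "po": None,
    "properties": "--properties-input",
    "strings": "--stringtable-input",
}
OUTPUT_OPTION = {
    "po": None,
    "properties": "--properties-output",
    "strings": "--stringtable-output",
}


def msgcat_command(source, target):
    """Build the msgcat argument list that converts source into target."""
    source_format = source.rsplit(".", 1)[-1]
    target_format = target.rsplit(".", 1)[-1]
    argv = ["msgcat"]
    if INPUT_OPTION[source_format]:
        argv.append(INPUT_OPTION[source_format])
    if OUTPUT_OPTION[target_format]:
        argv.append(OUTPUT_OPTION[target_format])
    argv += ["-o", target, source]
    return argv


print(" ".join(msgcat_command("de.po", "de.properties")))
print(" ".join(msgcat_command("de.strings", "de.po")))
```

To actually run the conversion, the returned list can be passed to subprocess.run(argv, check=True), provided msgcat is on the PATH.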

16.2.1.1 PO - Portable Object

File extension

po

16.2.1.2 Java .properties

File extension

properties

16.2.1.3 NeXTstep/GNUstep .strings

File extension

strings

16.2.2 Compiled Message Catalogs

These file formats can be created through msgfmt and converted back to PO format through msgunfmt.

16.2.2.1 MO - Machine Object

File extension

mo

See section The Format of GNU MO Files for details.

16.2.2.2 Java ResourceBundle

File extension

class

For more information, see the section Java and the examples hello-java, hello-java-awt, hello-java-swing.

16.2.2.3 C# Satellite Assembly

File extension

dll

For more information, see the section C#.

16.2.2.4 C# Resource

File extension

resources

For more information, see the section C#.

16.2.2.5 Tcl message catalog

File extension

msg

For more information, see the section Tcl - Tk's scripting language and the examples hello-tcl, hello-tcl-tk.

16.2.2.6 Qt message catalog

File extension

qm

For more information, see the examples hello-c++-qt and hello-c++-kde.

16.2.3 Desktop Entry files

The programmer produces a desktop entry file template with only the English strings. These strings get included in the POT file, by way of xgettext (usually by listing the template in po/POTFILES.in). The translators produce PO files, one for each language. Finally, an msgfmt --desktop invocation collects all the translations in the desktop entry file.

For more information, see the example hello-c-gnome3.

16.2.3.1 How to handle icons in Desktop Entry files

Icons are generally locale dependent, for the following reasons:

However, icons are not covered by GNU gettext localization, because

Desktop Entry files may contain an ‘Icon’ property, and this property is localizable. If a translator wishes to localize an icon, she should do so by bypassing the normal workflow with PO files:

  1. The translator contacts the package developers directly, sending them the icon appropriate for her locale, with a request to change the template file.
  2. The package developers add the icon file to their repository, and a line
     
    Icon[locale]=icon_file_name
    

    to the template file.

This line remains in place when this template file is merged with the translators' PO files, through msgfmt.
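
The following Python sketch illustrates the idea of that merge: translations from the PO files become Key[locale]=value lines, while a hand-written Icon[locale] line passes through untouched. It is a simplified model and not what msgfmt does internally; the set of translatable keys, the sample template, the catalog contents, and the function name merge_desktop_entry are all assumptions made for this example, and the line order of real msgfmt output may differ.

```python
# Assumption for this sketch: only these keys carry translatable strings.
TRANSLATABLE_KEYS = ("Name", "Comment")

# Hypothetical template, already containing a localized icon line.
TEMPLATE = """[Desktop Entry]
Type=Application
Name=Hello
Comment=Say hello
Icon=hello
Icon[de]=hello-de
Exec=hello
"""

# Hypothetical translations: locale -> {msgid: msgstr}.
CATALOGS = {"de": {"Hello": "Hallo", "Say hello": "Hallo sagen"}}


def merge_desktop_entry(template, catalogs):
    """Return the template with Key[locale]=msgstr lines added after each key."""
    merged = []
    for line in template.splitlines():
        merged.append(line)  # every template line, including Icon[de], is kept
        key, separator, value = line.partition("=")
        if separator and key in TRANSLATABLE_KEYS:
            for locale in sorted(catalogs):
                msgstr = catalogs[locale].get(value)
                if msgstr:
                    merged.append(f"{key}[{locale}]={msgstr}")
    return "\n".join(merged) + "\n"


print(merge_desktop_entry(TEMPLATE, CATALOGS), end="")
```

In the result, Name[de] and Comment[de] come from the catalog, whereas Icon[de]=hello-de is simply the line the developers added to the template.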

16.2.4 XML files

See the sections Preparing Rules for XML Internationalization and Invoking the msgfmt Program (subsection “XML mode operations”).

This document was generated by Bruno Haible on September, 19 2023 using texi2html 1.78a.