Overview

Summary

This wiki summarizes discussions about internationalization(i18n) in Apache Lenya. This summary should lead into a common understanding of the i18n problematic and possible or already implemented solutions in Lenya. Note that I18N was implemented in Lenya as of version 1.2, this page is of historical interest.

The following topics concerning i18n are covered here.

Discussion Threads

The following links will guide you to the beginning of mailinglist threads related to i18n and Lenya

  1. Introducing UserLocaleAction
    http://mail-archives.apache.org/mod_mbox/lenya-dev/200312.mbox/%3c1072122632.2610.278.camel@samson.rkunet.org%3e
  2. User language determination
    http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lenya-dev@cocoon.apache.org&msgId=1359385
  3. i18n and CMS UI
    http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lenya-dev@cocoon.apache.org&msgId=1359345
  4. Documents and language versions
    http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lenya-dev@cocoon.apache.org&msgId=1359117
  5. Information hiding for URIs
    http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lenya-dev@cocoon.apache.org&msgId=1359032
  6. Localization of Lenya
    http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lenya-dev@cocoon.apache.org&msgId=1359104

Current State Of I18N

Documents can be authored and served in multiple languages. Each language version is stored in a separate file fitting a file name pattern which currently is myfilename_locale.xml i.e. tutorial_de.xml or tutorial_en.xml. The language negotiation protocol is simply defined within the request url. Should mean, a Lenya class i.e. the DefaultDocumentBuilder searches for a language identifier like en or _de in the request uri. Based on that information the appropiate language version of the requested document is served. If you request i.e. tutorial_de.html all links in that page point to _de.html versions assuming that a german version of the destination exists. Request to uris without a language identifier are _always being routed to a document version in the publication's default language, which is imho not very user (agent) friendly.

The current simple protocol of language negotiation has a significant advantage compared to using i.e. mod_negotiation. See the next section for more information on language negotiation.

The Art Of Language Negotiation

There are different approaches for language negotiation, thus answers to the question in which language a document should be served to the user. Two of those approaches are

  1. automatic negotiation between a user agent and a web server i.e. using mod_negotiation or other appropiate functionality build into web servers
  2. user triggered via hard linked language switches

There are at least three big advantages of the second approach, which are obviously show stoppers for the first approach. Those advantages are

  1. Search engines can index all language versions of a document, which is often really important for site owners.
  2. Its easy to create multilingual offline versions of the website. Should mean the user can switch the langauge even if pages are not served via a http server with language negotiation capabilities.
  3. The user can switch a document's language without changing i.e. browser settings, which is sometimes not possible. Imagine a scenario in an university providing so called internet access points. On those access points changing browser settings is not possible. Of course this is seldom, but real.

Doing the language negotiation to 100% via a web server is not as flexible as we need it.

See the references section for must reads about language negotiation.

Naming Conventions

As we saw in the previous sections it is pragmatic to use language indentifiers in urls. Currently Lenya uses i.e. en or _de as identifiers. I would prefer a naming scheme like stated in section _Note on hyperlinks and naming conventions in mod_negotiation.

A good naming scheme is filename.locale.extension i.e. tutorial.en.html. Using language indentifiers in uris or better internationalization in general meets another problem discussed in LenyaAndCoolUris. Should mean urls get worse if one encodes to much information in it i.e. a language identifier. What about translating the hole url? Now things get really ridiculous? I think to translate uris or not is worth a separate discussion, but....hey.

I propose to create uri names in the systems default language (or better english names) and thats it. Otherwise the maintenance of a website would get much more complex.

Language Negotiation For System Messages

A system message in my opinion are strings on a LenyaCMSScreen like user adminstration or strings in a workflow notification email send to a reviewer or editor. In the latter case language negotiation via a request url or a accept-language header is not possible since the user is not directly requesting the message via an url. That means a user registered within Lenya cms should be able to specify the preferred language for system messages.

References

Serving I18N Documents

The current implementation to serve multilingual documents with Lenya is a really pragmatic approach. Nevertheless I want to propose the following enhancements/changes, which lead into a compromise of using automatic language negotiation and user triggered language changes via hyperlinks.

.

Serving I18N Messages

The common way to localize messages using cocoon is to use the I18NTransformer. It translates keys marked up with i18n elements. The transformer looks up keys in so called dictionaries. See docu of the I18NTransformer for details.

The transformer needs to be told for which locale the translation should be done. I propose to introduce a user settings dialog within the admin cms tab called My Settings. There each user of Lenya can define the langauge settings for system messages. I propose the follwing options for the language settings.

The first setting is useful, if a user wants to use a user agent i.e. a browser to define his preferred language for web pages. The second setting is to define the language of system messages, which are not directly requested by a user agent i.e. workflow notification emails send to the user by Lenya.

Putting it all together

To implement i18n of system messages and the serving of multilingual documents as proposed in previous sections we need to change some things in Lenya code and develop new components.

This wiki was created by RolfKulemann in december 2003. Please post comments to Lenya-Dev.