When we use the web, we’re used to seeing English everywhere we go, except for maybe some rogue search results in Google. That instantly divides internet users, which is only natural. While geographical barriers no longer restrain how we choose to interact with other people or businesses, language is a barrier the web hasn’t yet overcome.
I’ve recently had to modify our in-house CMS to accommodate multiple languages, 12 of them to start with. Once you start looking into the issue, you can see why this division needs to be recognised: like-for-like translation requires multilingual staff or translators, because literal translations produced by software can’t judge context or the way languages differ in phrasing. So in this project I’ve allowed the site the system produces, and its micro-sites, to have parallel structures. The homepage is the only common element in the hierarchy, and each language’s content can communicate more clearly with its intended audience.
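To make the idea of parallel structures concrete, here is a minimal sketch in Python. It is not how our CMS stores things; the paths, language codes and the `pages_for` helper are all made up for illustration. The only point is that each language owns its own page tree and the homepage is the single shared node.

```python
from typing import Dict, List

# Hypothetical layout: one page tree per language, sharing only the homepage.
HOMEPAGE = "/"

SITE_TREES: Dict[str, List[str]] = {
    "en": ["/en/about", "/en/products", "/en/support"],
    # The German tree is free to differ; it is not a mirror of the English one.
    "de": ["/de/ueber-uns", "/de/produkte"],
}


def pages_for(lang: str) -> List[str]:
    """Return the shared homepage plus the pages that exist for one language."""
    return [HOMEPAGE] + SITE_TREES.get(lang, [])
```

Because the trees are independent, a section can exist for one audience and not another without leaving an empty or machine-translated page behind.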
When it comes to the technical details and site design, this is the challenge: not all languages flow left to right, not all languages use the same character set (think Russian or Japanese) and, of course, there are more minor details like the name of the language in its own voice (e.g. German -> Deutsch). Going through the system, there are many places where technical problems could arise: the encoding used on the database columns; the encoding of the XHTML you’re serving; and the various XHTML language and encoding elements and attributes that can be used. Beyond that, users of a non-Latin character set will likely have a different keyboard configuration. Their characters need to be stored and read back exactly as they were entered.
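As a rough sketch of how those details might be held together, the Python below keeps a small table of languages with their native names and text direction, and builds the opening XHTML tag from it. The `Language` class, the `LANGUAGES` table and `html_open_tag` are names I have invented for this example, and Arabic is included only as a familiar right-to-left case. The `lang`, `xml:lang` and `dir` attributes and the `Content-Type` meta element are standard XHTML.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    code: str         # language tag used in lang / xml:lang
    native_name: str  # the language's name in its own voice
    direction: str    # "ltr" or "rtl"


# Illustrative table only; a real site would list the languages it supports.
LANGUAGES = {
    "en": Language("en", "English", "ltr"),
    "de": Language("de", "Deutsch", "ltr"),
    "ru": Language("ru", "Русский", "ltr"),
    "ja": Language("ja", "日本語", "ltr"),
    "ar": Language("ar", "العربية", "rtl"),
}

# Declares the page encoding inside the document head.
CHARSET_META = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'


def html_open_tag(code: str) -> str:
    """Build the opening <html> tag with language and direction attributes."""
    lang = LANGUAGES[code]
    return (
        '<html xmlns="http://www.w3.org/1999/xhtml" '
        f'lang="{lang.code}" xml:lang="{lang.code}" dir="{lang.direction}">'
    )
```

Keeping direction and native name next to the language code means a language switcher and the page template can both read from one place.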
This could be problematic with Russian or Chinese characters, for example, which can be served using UTF-8 (although specific regional encodings are available). These characters, unfamiliar to us, will be input directly into your system, and they need to be held and presented authentically.
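A quick way to see what "held and presented authentically" means is a round trip: encode the text to bytes as it would be stored, then decode it again. The sketch below uses Python's built-in `str.encode` and `bytes.decode`; the `store` and `load` helpers are stand-ins I have made up for whatever your database layer really does.

```python
def store(text: str) -> bytes:
    """Stand-in for writing to a UTF-8 column: text becomes bytes."""
    return text.encode("utf-8")


def load(raw: bytes) -> str:
    """Stand-in for reading back: bytes become text again."""
    return raw.decode("utf-8")


samples = ["Привет", "你好", "Grüße"]

# With UTF-8 at both ends, every sample survives the round trip unchanged.
round_trip_ok = all(load(store(s)) == s for s in samples)

# Reading the same bytes with a mismatched encoding does not raise an error;
# it quietly produces different characters, which is why it is easy to miss.
garbled = store("Привет").decode("latin-1")
```

The mismatch case is the one to watch for: nothing fails, the page just shows the wrong characters.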
Following on from the site structure itself, there are impacts on site assets. Some graphics may have text embedded in them, or have associated metadata, which can be used for alt text on images, page meta descriptions, etc. This associated data needs its own language version to realistically present a fully localised site. Retro-fitting an existing system to do all of this is difficult, but it can be done; as with most things, though, we can’t presume that it’s a quick switch. A literally translated site still requires language-specific content for its assets and metadata, or it’s a half-measure. Maybe that’s one of the reasons so few sites work globally?
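One way to keep asset metadata from becoming the forgotten half is to store it per language and audit the gaps. This is only a sketch: the file names, the alt text and the `missing_translations` helper are invented for the example and are not taken from our system.

```python
from typing import Dict, List

# Hypothetical store: asset -> language code -> metadata fields.
ASSET_META: Dict[str, Dict[str, Dict[str, str]]] = {
    "team-photo.jpg": {
        "en": {"alt": "Our team at work"},
        "de": {"alt": "Unser Team bei der Arbeit"},
    },
    "logo.png": {
        "en": {"alt": "Company logo"},
    },
}


def missing_translations(site_languages: List[str]) -> Dict[str, List[str]]:
    """List, per asset, the site languages that have no metadata yet."""
    gaps: Dict[str, List[str]] = {}
    for asset, versions in ASSET_META.items():
        missing = [lang for lang in site_languages if lang not in versions]
        if missing:
            gaps[asset] = missing
    return gaps
```

Running the audit for English and German would flag `logo.png` as lacking a German version, which is exactly the kind of gap that is easy to overlook once the page text itself has been translated.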