Browser Language Detection and Redirection [en]

[fr] An explanation of the method I used to make http://stephanie-booth.com redirect visitors to either the English or the French version of the site, depending on the language preferences set in their browser.

Update, 29.12.2007: scroll to the bottom of this post for a more straightforward solution, using Multiviews.

I’ve been working on stephanie-booth.com today. One of my objectives is to add an English version to the previously French-only site.

I’m doing this by using two separate installations of WordPress. The content in both languages should be roughly equivalent, and I’ll write a WordPress plugin that “automates” linking back and forth between equivalent content in different languages.

What I did today is solve a problem I’ve been wanting to attack for some time now: use people’s browser settings to direct them to the “correct” language for the site. Here is what I learnt in the process, and how I did it. It’s certainly not the most elegant way to do things, so if you have a better solution, please let me know in the comments below.

The first thing I needed to know was that browser language preferences are sent in the Accept-Language HTTP header (which PHP exposes as HTTP_ACCEPT_LANGUAGE). My first thought was to capture the information through PHP, but I discovered that Apache (logically, if you think about it) could handle it directly.
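To make the header concrete: a browser sends a comma-separated list of language tags, each optionally weighted with a quality value q between 0 and 1 (no q means 1). Below is a minimal Python sketch of how such a list can be read and ranked. It is only an illustration, since on this site Apache does the parsing, and the function name parse_accept_language is hypothetical.

```python
from typing import List, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept-Language header into (language-tag, q) pairs.

    Tags are lower-cased; a missing q counts as 1.0. The result is sorted
    by descending q, and tags with equal q keep their header order.
    """
    entries = []
    for position, item in enumerate(header.split(",")):
        parts = [part.strip() for part in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue  # skip empty items such as a trailing comma
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        entries.append((position, tag, q))
    # Highest q first; the original position breaks ties.
    entries.sort(key=lambda entry: (-entry[2], entry[0]))
    return [(tag, q) for _, tag, q in entries]
```

For example, a header of en-gb,en;q=0.8,fr;q=0.4 comes back as [("en-gb", 1.0), ("en", 0.8), ("fr", 0.4)].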

This tutorial was useful in getting me started, though I think it references an older version of Apache. Straight from the horse’s mouth, Apache’s content negotiation documentation had the final information I needed.

I’ll first explain my brief attempt with Multiviews (because it can come in handy) before going through the setup I currently have.

Multiviews

In this example, you request a file, e.g. test.html, which doesn’t physically exist, and Apache serves either test.html.en or test.html.fr depending on your language preferences. You’ll still see test.html in your browser’s address bar, though.

To do this, add the line:

Options +Multiviews

to your .htaccess file. Create the files test.html.en and test.html.fr with sample text (“English” and “French” will do if you’re just trying it out).

Then, request the file test.html in your browser. You should see the test content of the file corresponding to your language settings appear. Change your browser language prefs and reload to see what happens.
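Conceptually, Multiviews turns the request for the missing test.html into a search for files named test.html.*, then keeps the one whose language the browser ranks highest. The Python sketch below models only that language step, assuming the file extensions (.en, .fr) map directly to language tags; Apache's real algorithm also weighs content type, charset and encoding. pick_variant is a hypothetical name, and the preferences are passed in already parsed as (tag, q) pairs.

```python
from typing import List, Optional, Tuple


def pick_variant(requested: str, files: List[str],
                 prefs: List[Tuple[str, float]]) -> Optional[str]:
    """Simplified Multiviews: choose among files named '<requested>.<lang>'.

    Returns the file whose language has the highest q in prefs (earlier
    entries win ties), or None when no language matches, which is the
    situation where Apache would typically answer 406 Not Acceptable.
    """
    prefix = requested + "."
    # Map each language extension to its file, e.g. "fr" -> "test.html.fr".
    variants = {
        name[len(prefix):].lower(): name
        for name in files
        if name.startswith(prefix)
    }
    best, best_q = None, 0.0
    for tag, q in prefs:
        if tag in variants and q > best_q:
            best, best_q = variants[tag], q
    return best
```

Changing the order or weights in prefs plays the same role as changing your browser's language preferences and reloading.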

This is pretty neat, but it only works when you request a file, and I wanted / to redirect to either /en/ or /fr/.

It’s explained pretty well in this tutorial I already linked to, and this page has some useful information too.

Type maps

I used a type map and some PHP redirection magic to achieve my aim. A type map is not limited to languages, but that is what we’re going to use it for here. It’s a text file which you can name menu.var, for example. In that case, you need to add the following line to your .htaccess so that the file is dealt with as a type map:

AddHandler type-map .var

Here is the content of my type-map, which I named menu.var:

URI: en.php
Content-Type: text/html
Content-Language: en, en-us, en-gb

URI: fr.php
Content-Type: text/html
Content-Language: fr, fr-ch, fr-qc
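A type map is just a list of records separated by blank lines, each line being a header and its value. This Python sketch reads a file like menu.var into one dictionary per variant, which makes it easy to see which URI carries which languages. It is a model for illustration, not Apache's parser: the function names are hypothetical and it only handles the simple form used here.

```python
from typing import Dict, List


def parse_type_map(text: str) -> List[Dict[str, str]]:
    """Split a type map into records (blank-line separated) of header: value."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            if current:  # a blank line closes the current record
                records.append(current)
                current = {}
            continue
        name, _, value = line.partition(":")
        current[name.strip().lower()] = value.strip()
    if current:
        records.append(current)
    return records


def variant_languages(record: Dict[str, str]) -> List[str]:
    """Return the Content-Language values of one record as lower-case tags."""
    raw = record.get("content-language", "")
    return [tag.strip().lower() for tag in raw.split(",") if tag.strip()]


MENU_VAR = """\
URI: en.php
Content-Type: text/html
Content-Language: en, en-us, en-gb

URI: fr.php
Content-Type: text/html
Content-Language: fr, fr-ch, fr-qc
"""

for record in parse_type_map(MENU_VAR):
    print(record["uri"], variant_languages(record))
```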

Based on my tests, I concluded that the value for URI in the type map cannot be a directory, so I used a little workaround. This means that if you load menu.var in the browser, Apache will serve either en.php or fr.php depending on the content-language the browser accepts, and these two PHP files redirect to the correct URL of the localized sites. Here is what en.php looks like:

And fr.php, logically:

Just in case somebody comes by with a browser that lists neither English nor French in its Accept-Language header, I added this line to my .htaccess to catch any 406 errors (“not acceptable”):

ErrorDocument 406 /en.php

So, if something goes wrong, we’re redirected to the English version of the site.
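Put together, the type map and the ErrorDocument line behave roughly like the Python model below: pick the variant whose Content-Language list contains the browser's best-weighted tag, and fall back to /en.php when nothing matches. It is a simplification under stated assumptions (exact tag matching, preferences already parsed into (tag, q) pairs, earlier variants winning ties), and choose_uri is a hypothetical name. Apache itself signals the failure as a 406, which the ErrorDocument directive then turns into /en.php.

```python
from typing import List, Optional, Tuple

# Mirrors "ErrorDocument 406 /en.php" in the .htaccess.
FALLBACK_URI = "/en.php"

# The two records of menu.var: (URI, Content-Language values).
VARIANTS = [
    ("en.php", ["en", "en-us", "en-gb"]),
    ("fr.php", ["fr", "fr-ch", "fr-qc"]),
]


def negotiate(variants: List[Tuple[str, List[str]]],
              prefs: List[Tuple[str, float]]) -> Optional[str]:
    """Return the URI whose languages get the highest q, or None if none match."""
    best_uri, best_q = None, 0.0
    for uri, languages in variants:
        # A variant scores the best q of any preferred tag it lists.
        score = max((q for tag, q in prefs if tag in languages), default=0.0)
        if score > best_q:
            best_uri, best_q = uri, score
    return best_uri


def choose_uri(prefs: List[Tuple[str, float]]) -> str:
    """Negotiated URI, or the 406 fallback when no variant is acceptable."""
    return negotiate(VARIANTS, prefs) or FALLBACK_URI
```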

The last thing that needs to be done is to have menu.var (the type map) load automatically when we go to stephanie-booth.com. I first tried adding a DirectoryIndex directive to .htaccess, but that broke the use of index.php as the normal index file. Here’s the line for safekeeping, in case you need it in other circumstances or want to try it:

DirectoryIndex menu.var

Anyway, I used another PHP workaround. I created an index.php file with the following content:

And there we are!

Accepted language priority and regional flavours

In my browser settings, I’ve used en-GB and fr-CH to indicate that I prefer British English and Swiss French. Unfortunately, the header matching is strict. So if the order of your languages is “en-GB, fr-CH, fr, en” you will be shown the French page (en-GB and fr-CH are ignored, and fr comes before en). It’s all explained in the Apache documentation:

The server will also attempt to match language-subsets when no other match can be found. For example, if a client requests documents with the language en-GB for British English, the server is not normally allowed by the HTTP/1.1 standard to match that against a document that is marked as simply en. (Note that it is almost surely a configuration error to include en-GB and not en in the Accept-Language header, since it is very unlikely that a reader understands British English, but doesn’t understand English in general. Unfortunately, many current clients have default configurations that resemble this.) However, if no other language match is possible and the server is about to return a “No Acceptable Variants” error or fallback to the LanguagePriority, the server will ignore the subset specification and match en-GB against en documents. Implicitly, Apache will add the parent language to the client’s acceptable language list with a very low quality value. But note that if the client requests “en-GB; q=0.9, fr; q=0.8”, and the server has documents designated “en” and “fr”, then the “fr” document will be returned. This is necessary to maintain compliance with the HTTP/1.1 specification and to work effectively with properly configured clients.

Apache, Content Negotiation

This means that I added regional language codes to the type map (“fr, fr-ch, fr-qc”) and also that I changed the order of my language preferences in Firefox, making sure that all variations of one language were grouped together, in the order in which I prefer them:

Language Prefs in Firefox
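The matching behaviour described in the Apache quote can be modelled in two passes, sketched in Python below: a strict pass, where a preferred range such as en matches en or en-gb but en-gb does not match en, and, only if that finds nothing, a second pass where each regional tag's parent language is added with a very low weight. The 0.001 weight, the tie-breaking by header order and the function names are assumptions made for the sketch, not Apache internals.

```python
from typing import List, Optional, Tuple

PARENT_Q = 0.001  # assumed stand-in for Apache's "very low quality value"


def range_matches(lang_range: str, doc_lang: str) -> bool:
    """HTTP/1.1-style prefix match: 'en' matches 'en-gb', not the reverse."""
    return doc_lang == lang_range or doc_lang.startswith(lang_range + "-")


def best_match(available: List[str],
               candidates: List[Tuple[str, float]]) -> Optional[str]:
    """Highest-q available language; earlier candidates win ties."""
    chosen, chosen_q = None, 0.0
    for lang_range, q in candidates:
        for doc in available:
            if q > chosen_q and range_matches(lang_range, doc):
                chosen, chosen_q = doc, q
    return chosen


def pick_language(available: List[str],
                  prefs: List[Tuple[str, float]]) -> Optional[str]:
    strict = best_match(available, prefs)
    if strict is not None:
        return strict
    # Fallback pass: implicitly accept the parent of each regional tag.
    parents = [(tag.split("-")[0], PARENT_Q)
               for tag, q in prefs if "-" in tag and q > 0]
    return best_match(available, parents)
```

With documents in en and fr, the quote's own example (en-GB; q=0.9, fr; q=0.8) yields fr, the order en-GB, fr-CH, fr, en also yields fr, and a browser sending only en-GB and fr-CH gets en through the fallback pass.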

Catching old (now invalid) URLs

There are lots of incoming links to pages of the French site from when it used to live at the web root. For example, the contact page address used to be http://stephanie-booth.com/contact, but it is now http://stephanie-booth.com/fr/contact. I could write a whole list of permanent redirects in my .htaccess file, but this is simpler. I just copied and modified the rewrite rules that WordPress provides, to make sure that the correct blog installation does something useful with those old URLs (my modification is the /fr prefix in the RewriteRule line):

# BEGIN WordPress

RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /fr/index.php [L]


# END WordPress

In this way, as you can check, http://stephanie-booth.com/contact is not broken.
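The rewrite block reads as: if the requested path is neither an existing file nor an existing directory, hand it to /fr/index.php. The Python model below spells that decision out. It is only an illustration: existing_paths is a hypothetical set standing in for the files and directories at the web root, and the model covers only the root .htaccess, not the rules inside the /fr/ and /en/ installations.

```python
from typing import Set


def rewrite(request_path: str, existing_paths: Set[str]) -> str:
    """Model of the root .htaccess WordPress block with the /fr prefix."""
    relative = request_path.lstrip("/")
    if not relative:
        # The pattern "." needs at least one character, so "/" is left alone.
        return request_path
    if relative.rstrip("/") in existing_paths:
        # RewriteCond !-f / !-d: real files and directories are served as-is.
        return request_path
    # Anything else (e.g. an old URL like /contact) goes to the French blog.
    return "/fr/index.php"
```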

Next steps

My next mission is to write a small plugin which I will install on both WordPress sites (I’ve got to write it for a client too, so double benefit). This plugin will do the following:

  • add a field to the write/edit post screen in which to type the post slug of the corresponding page/post in the other language (e.g. “particuliers” in French will be “individuals” in English)
  • add a link to each post pointing to the equivalent page in the other language

It’s pretty basic, but it beats manual links, and remains very simple. (I like simple.)
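The plugin itself would be PHP inside WordPress, but the data it needs is small: each post stores the slug of its counterpart, and the link is built from the other site's base URL. The Python sketch below models that. The Post class, the /en/ and /fr/ base URLs, the trailing-slash permalink shape and the link texts are all assumptions for illustration, not the plugin's actual code.

```python
from dataclasses import dataclass

# Assumed base URLs of the two WordPress installations.
SITE_ROOTS = {
    "en": "http://stephanie-booth.com/en/",
    "fr": "http://stephanie-booth.com/fr/",
}
# Hypothetical link texts, written in the target language.
LINK_TEXT = {"en": "English version", "fr": "Version française"}


@dataclass
class Post:
    lang: str             # "en" or "fr"
    slug: str             # this post's own slug
    other_slug: str = ""  # slug typed into the extra write/edit field


def counterpart_link(post: Post) -> str:
    """HTML link to the equivalent post, or "" when no slug was entered."""
    if not post.other_slug:
        return ""
    other = "en" if post.lang == "fr" else "fr"
    url = f"{SITE_ROOTS[other]}{post.other_slug}/"
    return f'<a href="{url}" hreflang="{other}">{LINK_TEXT[other]}</a>'
```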

As I said, if you have a better (simpler!) way of doing all this, please send it my way.

A simpler solution [Added 29.12.2007]

For each language, create a file named index.php.lg where “lg” is the language code. For French, you would create index.php.fr with the following content:

Repeat for each language available.

Do not put an index.php file in your root directory, just the index.php.lg files.

Add the following two lines to your .htaccess:

Options +Multiviews
ErrorDocument 406 /fr/

…assuming French is the default language you want your site to show up in if your visitor’s browser doesn’t accept any of the languages you provide your site in.

You’re done!

Lars Trieloff: i18n for Web 2.0 (Web 2.0 Expo, Berlin) [en]

steph-note: incomplete notes. I was very disappointed by this session, mainly because I’m exhausted and I was expecting something else, I suppose. I should have read the description of the talk; it’s quite true to what was delivered. Please see my work on multilingualism to get an idea of where I come from.

Why internationalize? You have to speak in the language of your user.

e.g. DE rip-offs of popular EN apps like Facebook; CN, RU and Turkish versions of Facebook.

What is different in Web 2.0 internationalization? Much more complicated than normal software i18n, but some things are easier.

More difficult:

  • sites -> apps
  • web as platform
  • JS, Flash, etc…

The i18n challenge is multiplied by the different technologies.

Solution: consolidate i18n technology. Need a common framework for all.

steph-note: OK, this looks like more of a developer track. A little less disappointed.

Keep the i18n data in one place, extract the strings, etc. then pull them back into the application once localized.
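The general idea of keeping all strings in one place and feeding every layer from it can be sketched as a single message catalog, here in Python, that serves lookups on the server and exports the same strings as JSON for a JavaScript or Flash front end. This is a generic illustration of the principle, not how Mindquarry actually does it; the message IDs and translations are made up.

```python
import json

DEFAULT_LOCALE = "en"

# One catalog for the whole application: message ID -> locale -> text.
CATALOG = {
    "save": {"en": "Save", "de": "Speichern", "fr": "Enregistrer"},
    "cancel": {"en": "Cancel", "fr": "Annuler"},
}


def translate(msg_id: str, locale: str) -> str:
    """Look up a string, falling back to the default locale if untranslated."""
    entry = CATALOG[msg_id]
    return entry.get(locale, entry[DEFAULT_LOCALE])


def export_for_client(locale: str) -> str:
    """Serialize one locale's strings as JSON for client-side code."""
    strings = {msg_id: translate(msg_id, locale) for msg_id in CATALOG}
    return json.dumps(strings, ensure_ascii=False, sort_keys=True)
```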

Example of how things were done in Mindquarry.

steph-note: oh, this is in the Fundamentals track :-/ — this is way too tech-oriented for a Fundamentals track in my opinion.

steph-note: insert a whole bunch of technical stuff I’m skipping, because I can’t presently wrap my brain around it and it is not what interests me the most, to be honest.



Not All Switzerland Speaks German, Dammit! [en]

Here we go, yet another misguided attempt at localisation: my MySpace page is now in German.

MySpace now joins PayPal, eBay, Amazon and Google in defaulting to German for Swiss people.

Switzerland is a multilingual country. The linguistic majority speaks Swiss German (reasonably close to German, but largely incomprehensible to native German speakers who have not been exposed to it). The second language in the country is French. Third is Italian, and fourth is… (no, not English) …Romansh.

You know how linguistic minorities are. Touchy. Oh yeah.

As a French speaker with rather less-than-functional German, I do find it quite irritating that these big “multinational” web services assume that I speak German because I’m Swiss. I’d rather have English, and so would many of my non-bilingual fellow citizens (particularly amongst web-going people: we tend to be better at English than German).

Yes, I’ve said that English-only is a barrier to adoption. But getting the language wrong is just as bad, if not worse (most people have come to accept the fact that English is the “default” language on the internet, even if they don’t understand it). If I want my Amazon books to be shipped here free of charge, I have to use Amazon.de, which is in German, and doesn’t have a very wide choice of French books. My wishlist is therefore on Amazon.de too, which maybe explains why I never get anything from it.

PayPal is almost worse. I can’t really suggest it to clients as a solution for “selling stuff over the internet”, because all it offers in its Swiss version is a choice between German (default) and English. You can’t sell a book in French with a payment interface in German or English.

So please, remember that country != language, and that there is a little place called Switzerland scrunched up in the middle of Europe, caught between France, Italy, Germany and Austria (Liechtenstein is even worse off than us I suppose), and that not everyone in that little country speaks German.

Thank you.

MediaWiki [en]

I’ve installed MediaWiki. Explanation and solution of a bug I bumped into while installing (because of UTF-8 in MySQL 4.1.x) and comments on the method for interface translation.

[fr] I installed MediaWiki to revive the moribund SpiroLattic, which had succumbed to wiki spam. Here is the solution to a problem I ran into during installation (because I use MySQL 4.1.x with UTF-8), along with a description of how per-user localization of the interface works. Very interesting!

I recently managed to install MediaWiki to replace PhpWiki for SpiroLattic, which I took offline some time ago because the only activity it had become home to was the promotion of various ringtone, viagra, and poker sites.

MediaWiki is the wiki engine behind Wikipedia. It is PHP/MySQL (good for me, maybe not for the server) and has a strong multilingual community.

I bumped into one small problem installing MediaWiki 1.4: the install aborted while creating the tables. Unfortunately, I don’t have the error message anymore, but it was very close to the one given for this bug.

If I understood correctly, when you’re running MySQL 4.1.x in UTF-8, the index key becomes too big, and MySQL balks. The solution is to edit maintenance/tables.sql and shorten the index key MySQL is complaining about. In my case, the guilty part of the query was KEY cl_sortkey(cl_to,cl_sortkey(128)); I replaced 128 with 50 and it went fine. (Don’t forget to clean out the partially built database before reloading the install page; that way you don’t have to fill it all in again.)
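The arithmetic behind "the index key becomes too big" can be checked quickly. In MySQL 4.1, a utf8 column reserves up to 3 bytes per character, so a key's worst-case size in bytes is three times the number of characters it covers. The Python below assumes that cl_to is VARCHAR(255) and that the key limit is 1000 bytes (MyISAM's limit); both are assumptions, not values taken from the MediaWiki 1.4 schema or from the original error message.

```python
BYTES_PER_UTF8_CHAR = 3   # MySQL 4.1 utf8: at most 3 bytes per character
MAX_KEY_BYTES = 1000      # assumed limit (MyISAM's); other engines differ
CL_TO_CHARS = 255         # assumed width of cl_to, i.e. VARCHAR(255)


def key_bytes(*char_lengths: int) -> int:
    """Worst-case byte size of an index over columns/prefixes of these lengths."""
    return sum(char_lengths) * BYTES_PER_UTF8_CHAR


before = key_bytes(CL_TO_CHARS, 128)  # KEY cl_sortkey(cl_to,cl_sortkey(128))
after = key_bytes(CL_TO_CHARS, 50)    # same key with the prefix cut to 50

for label, size in (("128", before), ("50", after)):
    verdict = "too long" if size > MAX_KEY_BYTES else "fits"
    print(f"prefix {label}: {size} bytes, {verdict}")
```

Under these assumptions the original key needs 1149 bytes and the edited one 915, which is why shortening the prefix was enough.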

MediaWiki allows each user to choose his or her language of choice for the interface. That is absolutely great, particularly for a multilingual wiki! Even better than that, they let users tweak the interface translation strings directly on the wiki.

There is a page named “Special:Allmessages” which lists all the localized strings. If you’re not happy with one of the translations, just click on the string, and the wiki will create a new blank page where you can enter your translation for it, which will override the initial translation. How cool is that?

Something like that for WordPress would be great, in my opinion!
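The override mechanism described above is simple precedence: if someone has edited a message on the wiki, that text wins; otherwise the translation shipped with MediaWiki is used. MediaWiki keeps those edited messages as pages in its MediaWiki: namespace; in the Python model below, a plain dictionary keyed by (language, message key) stands in for them. The names and sample strings are hypothetical.

```python
from typing import Dict, Tuple

# Translations shipped with the software: language -> key -> text.
DEFAULT_MESSAGES: Dict[str, Dict[str, str]] = {
    "fr": {"edit": "Modifier", "search": "Rechercher"},
}


def message(key: str, lang: str,
            overrides: Dict[Tuple[str, str], str]) -> str:
    """Return the wiki-edited text if one exists, else the shipped one."""
    edited = overrides.get((lang, key))
    if edited is not None:
        return edited
    return DEFAULT_MESSAGES[lang][key]
```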

WordPress Polyglots [en]

A mailing-list for WordPress language (localization) issues. Join it!

[fr] If you are a multilingual WordPress user, join the polyglots list!

If you’re a multilingual or polyglot user of WordPress, please join the polyglots mailing-list.

It’s really great to have a mailing list devoted entirely to language issues!