Browser Language Detection and Redirection [en]

[fr] Une explication de la méthode que j'ai suivie pour que http://stephanie-booth.com redirige le visiteur soit vers la version anglaise du site, soit la française, en fonction des préférences linguistiques définies dans son navigateur.

Update, 29.12.2007: scroll to the bottom of this post for a more straightforward solution, using Multiviews.

I’ve been working on stephanie-booth.com today. One of my objectives is to add an English version to the previously French-only site.

I’m doing this by using two separate installations of WordPress. The content in both languages should be roughly equivalent, and I’ll write a WordPress plugin which will let me “automate” the process of linking back and forth between equivalent content in different languages.

What I did today is solve a problem I’ve been wanting to attack for some time now: use people’s browser settings to direct them to the “correct” language for the site. Here is what I learnt in the process, and how I did it. It’s certainly not the most elegant way to do things, so let me know if you have a better solution by using the comments below.

First, what I needed to know was that the browser’s language preferences are sent in the Accept-Language HTTP header (which PHP exposes as HTTP_ACCEPT_LANGUAGE). I initially thought of capturing the information through PHP, but I discovered that Apache (logical, if you think of it) could handle it directly.

This tutorial was useful in getting me started, though I think it references an older version of Apache. Straight from the horse’s mouth, Apache content negotiation had the final information I needed.

I’ll first explain the brief attempt I made with Multiviews (because it can come in handy) before going through the setup I currently have.

Multiviews

In this example, you request a file, e.g. test.html which doesn’t physically exist, and Apache uses either test.html.en or test.html.fr depending on your language preferences. You’ll still see test.html in your browser bar, though.

To do this, add the line:

Options +Multiviews

to your .htaccess file. Create the files test.html.en and test.html.fr with sample text (“English” and “French” will do if you’re just trying it out).

Then, request the file test.html in your browser. You should see the test content of the file corresponding to your language settings appear. Change your browser language prefs and reload to see what happens.

This is pretty neat, but it forces you to open a file — and I wanted / to redirect either to /en/ or to /fr/.

It’s explained pretty well in this tutorial I already linked to, and this page has some useful information too.

Type maps

I used a type map and some PHP redirection magic to achieve my aim. A type map is not limited to languages, but this is what we’re going to use it for here. It’s a text file which you can name menu.var, for example. In that case, you need to add the following line to your .htaccess so that the file is dealt with as a type map:

AddHandler type-map .var

Here is the content of my type-map, which I named menu.var:

URI: en.php
Content-Type: text/html
Content-Language: en, en-us, en-gb

URI: fr.php
Content-Type: text/html
Content-Language: fr, fr-ch, fr-qc

Based on my tests, I concluded that the value for URI in the type map cannot be a directory, so I used a little workaround. This means that if you load menu.var in the browser, Apache will serve either en.php or fr.php depending on the content-language the browser accepts, and these two PHP files redirect to the correct URL of the localized sites. Here is what en.php looks like:
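A minimal sketch of such a redirect file, assuming the English site lives at http://stephanie-booth.com/en/ (the /en/ path comes from the text above; the exact code used on the site may differ):

```php
<?php
// en.php (sketch): send the visitor on to the English site.
// Assumption: the English WordPress installation is served from /en/.
header('Location: http://stephanie-booth.com/en/');
exit;
```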

And fr.php, logically:
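A minimal sketch of fr.php, assuming the French site lives at http://stephanie-booth.com/fr/ (again, the exact code used on the site may differ):

```php
<?php
// fr.php (sketch): send the visitor on to the French site.
// Assumption: the French WordPress installation is served from /fr/.
header('Location: http://stephanie-booth.com/fr/');
exit;
```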

Just in case somebody came by with a browser providing neither English nor French in the Accept-Language header, I added this line to my .htaccess to catch any 406 errors (“not acceptable”):

ErrorDocument 406 /en.php

So, if something goes wrong, we’re redirected to the English version of the site.

The last thing that needs to be done is to have menu.var (the type map) load automatically when we go to stephanie-booth.com. I first tried by adding a DirectoryIndex directive to .htaccess, but that messed up the use of index.php as the normal index file. Here’s the line for safe-keeping, if you ever need it in other circumstances, or if you want to try:

DirectoryIndex menu.var

Anyway, I used another PHP workaround. I created an index.php file with the following content:
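A minimal sketch of what such an index.php could contain, assuming it does nothing more than forward the browser to the type map (the file name menu.var comes from the text above; the redirect itself is an assumption, not necessarily the code used on the site):

```php
<?php
// index.php (sketch): hand the request over to the type map.
// Assumption: menu.var sits in the web root next to this file, so Apache
// can negotiate between en.php and fr.php once the browser requests it.
header('Location: http://stephanie-booth.com/menu.var');
exit;
```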

And there we are!

Accepted language priority and regional flavours

In my browser settings, I’ve used en-GB and fr-CH to indicate that I prefer British English and Swiss French. Unfortunately, the header matching is strict. So if the order of your languages is “en-GB, fr-CH, fr, en” you will be shown the French page (en-GB and fr-CH are ignored, and fr comes before en). It’s all explained in the Apache documentation:

The server will also attempt to match language-subsets when no other match can be found. For example, if a client requests documents with the language en-GB for British English, the server is not normally allowed by the HTTP/1.1 standard to match that against a document that is marked as simply en. (Note that it is almost surely a configuration error to include en-GB and not en in the Accept-Language header, since it is very unlikely that a reader understands British English, but doesn’t understand English in general. Unfortunately, many current clients have default configurations that resemble this.) However, if no other language match is possible and the server is about to return a “No Acceptable Variants” error or fallback to the LanguagePriority, the server will ignore the subset specification and match en-GB against en documents. Implicitly, Apache will add the parent language to the client’s acceptable language list with a very low quality value. But note that if the client requests “en-GB; q=0.9, fr; q=0.8”, and the server has documents designated “en” and “fr”, then the “fr” document will be returned. This is necessary to maintain compliance with the HTTP/1.1 specification and to work effectively with properly configured clients.

Apache, Content Negotiation

This means that I added regional language codes to the type map (“fr, fr-ch, fr-qc”) and also that I changed the order of my language preferences in Firefox, making sure that all variations of one language were grouped together, in the order in which I prefer them:

Language Prefs in Firefox
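To make the matching rule quoted above more concrete, here is a small PHP sketch of it. It is a simplification written for illustration, not Apache's actual implementation: the function names parse_accept_language() and choose_language() are hypothetical, wildcards (*) are ignored, and the "very low quality value" Apache gives to parent languages is modelled as a second pass that only runs when the strict pass finds nothing.

```php
<?php
// Hypothetical helper: turn an Accept-Language value into [tag => quality].
function parse_accept_language(string $header): array
{
    $prefs = [];
    foreach (explode(',', $header) as $part) {
        $part = trim($part);
        if ($part === '') {
            continue;
        }
        $pieces = explode(';', $part);
        $tag = strtolower(trim($pieces[0]));
        $q = 1.0; // a missing q parameter means quality 1
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (stripos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        $prefs[$tag] = $q;
    }
    return $prefs;
}

// Hypothetical helper: pick one of the available languages, or null
// when nothing is acceptable (the case where Apache answers 406).
function choose_language(string $header, array $available): ?string
{
    $prefs = parse_accept_language($header);
    $best = null;
    $bestQ = 0.0;

    // Strict pass: "en-gb" in the header does not match a document marked "en".
    foreach ($available as $lang) {
        $q = $prefs[strtolower($lang)] ?? 0.0;
        if ($q > $bestQ) {
            $best = $lang;
            $bestQ = $q;
        }
    }
    if ($best !== null) {
        return $best;
    }

    // Fallback pass: drop the regional part ("en-gb" becomes "en") and retry.
    foreach ($prefs as $tag => $q) {
        $parent = explode('-', $tag)[0];
        foreach ($available as $lang) {
            if (strtolower($lang) === $parent && $q > $bestQ) {
                $best = $lang;
                $bestQ = $q;
            }
        }
    }
    return $best;
}
```

With documents marked only “en” and “fr”, a header such as “en-GB; q=0.9, fr; q=0.8” yields “fr” in this sketch, while a header containing only “en-GB” falls back to “en”. Browsers normally attach decreasing q values to the languages in the order you list them, which is why the order in the preferences matters.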

Catching old (now invalid) URLs

There are lots of incoming links to pages of the French site, where it used to live: at the web root. For example, the contact page address used to be http://stephanie-booth.com/contact, but it is now http://stephanie-booth.com/fr/contact. I could write a whole list of permanent redirects in my .htaccess file, but this is simpler. I just copied and modified the rewrite rules that WordPress provides, to make sure that the correct blog installation does something useful with those old URLs (my modification is the /fr prefix in the RewriteRule):

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /fr/index.php [L]
</IfModule>
# END WordPress

In this way, as you can check, http://stephanie-booth.com/contact is not broken.

Next steps

My next mission is to write a small plugin which I will install on both WordPress sites (I’ve got to write it for a client too, so double benefit). This plugin will do the following:

  • add a field to the write/edit post screen in which to type the post slug of the corresponding page/post in the other language (e.g. “particuliers” in French will be “individuals” in English)
  • add a link to each post pointing to the equivalent page in the other language

It’s pretty basic, but it beats manual links, and remains very simple. (I like simple.)

As I said, if you have a better (simpler!) way of doing all this, please send it my way.

A simpler solution [Added 29.12.2007]

For each language, create a file named index.php.lg where “lg” is the language code. For French, you would create index.php.fr with the following content:
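A minimal sketch of what index.php.fr could contain, assuming the French site lives at /fr/ (the path matches the ErrorDocument line further down; the exact code used on the site may differ):

```php
<?php
// index.php.fr (sketch): Multiviews serves this variant to visitors who
// prefer French, and it sends them on to the French site.
// Assumption: the French WordPress installation is served from /fr/.
header('Location: http://stephanie-booth.com/fr/');
exit;
```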

Repeat for each language available.

Do not put an index.php file in your root directory, just the index.php.lg files.

Add the two following lines to your .htaccess:

Options +Multiviews
ErrorDocument 406 /fr/

…assuming French is the default language you want your site to show up in if your visitor’s browser doesn’t accept any of the languages you provide your site in.

You’re done!

BlogTalk 2008 Proposal — Being Multilingual: Blogging in More Than One Language [en]

Here’s the proposal I just sent for BlogTalk 2008 (Cork, Ireland, March 3-4):

The strongest borders online are linguistic. In that respect, people who are comfortable in two languages have a key “bridge” role to play. Blogging is one of the mediums through which this can be done.

Most attempts at bilingual (or multilingual) blogging fall into three patterns:

  • separate and independent blogs, one per language
  • one blog with proper translation of all content, post by post
  • one blog with posts sometimes in one language, sometimes in another

These different strategies and other attempts (like community-driven translation) to use blogging as a means to bridge language barriers are worth examining in closer detail.

Considering that most people do have knowledge (at least passive, even if incomplete) of more than one language, multilingual blogging could be much more common than it is now. The tools we use, however, assume that blogs and web pages are in a single language. Many plugins do offer solutions to adapt existing tools like WordPress to the needs of multilingual bloggers. Could we go even further in building tools which encourage multilingualism rather than hindering it?


Extra material:

I’ve gathered pointers to previous talks and writings on the topic here: https://climbtothestars.org/focus/multilingual (most of them are about multilingualism on the internet in general, but this proposal is for a talk much more focused on blogging). Here is a video of the first talk I gave in this series (by far not the best, I’m afraid!), which was about multilingual blogging: http://video.google.com/videoplay?docid=2096847420084039011. It can give you an idea of what this talk could look like, though I’ve refined my thinking since then and have now fallen into the grip of presentation slides. I also intend to base my talk on real-world examples of what bloggers are doing in the field.

Please don’t hesitate to get in touch if you would like more details for evaluating this proposal.

We had a long discussion on IRC about the fact that the submission process required a 2-page paper for a talk (in all honesty, for me, almost the same amount of sweat and tears as preparing the talk itself — I’ll let you figure that one out yourself). BlogTalk is a conference which aims to bridge the space between academics and practitioners, and a 2-page paper, I understood, was actually a kind of compromise compared to the usual 10-15 page papers academics send in when they want to speak at conferences.

The form was changed, following this discussion, to make the inclusion of the paper optional. Of course, this might reflect badly on proposals like mine or Stowe’s which do not include a paper. We’ll see!

I’ll also be speaking on structured portable social networks during the workshop on social network portability, the day before the conference.

Basic Bilingual 0.3 for Multilingual Blogging [en]

[fr] Une mise à jour de mon plugin "Basic Bilingual" qui permet de rendre WordPress bilingue. Modification majeure: il n'y a plus besoin de bidouiller son template pour faire apparaître l'extrait du billet dans "l'autre langue". Par contre, c'est toujours nécessaire pour rajouter les attributs lang.

Long overdue, an upgrade of my plugin Basic Bilingual. Grab the tgz archive or check out the code.

Some explanations. First, you all know of my long-standing interest in all things multilingual and in multilingual blogging in particular.

Years ago, I switched to Movable Type and then to WordPress because I was blogging in two languages. Movable Type allowed me to assign more than one category to each post — so I used two huge categories, fr and en, to indicate what language I was blogging in. This soon made the rebuilds a real pain in the neck, and WordPress allowed me first of all to happily hack it into being multilingual, and then actually write a plugin to do it in a cleaner way. The plugin hasn’t changed much since, and this upgrade isn’t a major one, but it’s a step in the right direction.

Ideally, I’d like people to be able to use the plugin without having to modify their templates at all. I’d also like the plugin to allow filtering out one language if that is what the reader desires. I still hope that WordPress will one day “see the light” and let us define language at post-level (Matt saw the light for tagging ;-), so I do have hope). By the way, I stumbled upon this Ajax Language Switcher for Basic Bilingual earlier today, and it will probably greatly interest those courageous ones of you who tend to have translations of each post or page.

Back to the plugin. It installs normally (unzip everything in the /plugins directory). If you’re using other languages than French and English, you’ll have to manually change the language codes in the plugin file (not very difficult, you don’t have to know PHP to do it; just look for “en” and “fr” and put the language codes for your languages instead).

I’ve fixed an annoying problem with slashes that popped up at some point (somebody else gave me the fix, but I can’t remember who — let me know!).

But most of all, I’ve made the “other language excerpt” appear automatically in the post content. Yes, you heard me: no need to add <?php bb_the_other_excerpt(); ?> in your templates anymore. Yay! Added bonus: it will show up in the feeds, too. For that reason, I’ve added a text separator between the excerpt and the post so that there is a separation between the languages.

Basic Bilingual in Google Reader

Obviously, you’ll want to hide these separators and style your posts a little. Here is roughly what I’m using right now:

.other-excerpt {
font-style: italic;
background: #fff;
padding-left: 1em;
padding-right: 1em;
border: 1px solid #ccc;
}

.other-excerpt:lang(fr) p.oe-first-child:before {
content: "[fr] ";
font-weight: bold;
}

.other-excerpt:lang(en) p.oe-first-child:before {
content: "[en] ";
font-weight: bold;
}

.bb-post-separator {
display: none;
}

div.hentry:lang(fr) .entry-title:after {
    content: " [fr] ";
    vertical-align: middle;
    font-size: 80%;
    color: #bbb;
}

div.hentry:lang(en) .entry-title:after {
    content: " [en] ";
    vertical-align: middle;
    font-size: 80%;
    color: #bbb;
}

Now, notice there is fancy stuff in there which relies on the lang attribute. If you’re mixing languages on a page, you should use the lang attribute to indicate which language is where. This means (unfortunately, until I become buddies with PHP’s ob_start() function) that you need to touch your template. It’s not that hard, though.

Find the outermost <div> for each post in the template (it should have the CSS class hentry, by now). Add this inside the tag: lang="<?php bb_the_language(); ?>". Do so on every theme template which produces posts. With the Sandbox theme, it would look like this:

<div id="post-<?php the_ID(); ?>" class="<?php sandbox_post_class(); ?>" lang="<?php bb_the_language(); ?>">

That’s it!

If you’re using this plugin, please leave a link to your blog. I’m also always interested in hearing of other examples of multilingual blogging or multilingualism online.

Lars Trieloff: i18n for Web 2.0 (Web 2.0 Expo, Berlin) [en]

steph-note: incomplete notes. I was very disappointed by this session, mainly because I’m exhausted and I was expecting something else, I suppose. I should have read the description of the talk, it’s quite true to what was delivered. Please see my work on multilingualism to get an idea where I come from.

Why internationalize? You have to speak in the language of your user.

e.g. DE rip-offs of popular EN apps like Facebook. CN version of Facebook, and RU, and Turkish.

What is different in Web 2.0 internationalization? Much more complicated than normal software i18n, but some things are easier.

More difficult:

  • sites -> apps
  • web as platform
  • JS, Flash, etc…

The i18n challenge is multiplied by the different technologies.

Solution: consolidate i18n technology. Need a common framework for all.

steph-note: OK, this looks like more of a developer track. A little less disappointed.

Keep the i18n data in one place, extract the strings, etc. then pull them back into the application once localized.

Example of how things were done in Mindquarry.

steph-note: oh, this is in the Fundamentals track :-/ — this is way too tech-oriented for a Fundamentals track in my opinion.

steph-note: insert a whole bunch of technical stuff I’m skipping, because I can’t presently wrap my brain around it and it is not what interests me the most, to be honest.


Reminder: Speaking Tuesday at Web2Open, Berlin [en]

[fr] Je présente une session sur le multilinguisme ici à Berlin, à l'occasion de Web2Open, mardi (demain!) à 10h10.

Just a reminder: I’ll be giving my talk Waiting for the Babel Fish: Languages and Multilingualism Tuesday (tomorrow as of writing) at 10:10 during Web2Open at the Web2.0Expo in Berlin.

I also put together (for the occasion, but I’d been wanting to do it for a long time) a page entirely devoted to my work about languages and multilingualism on the internet. This is the first page of the Focus series which will showcase some of my work and the areas I’m currently active in.

For those of you who’ve been intrigued by this twitter of mine I’m going to make you wait a little more — but if we bump into each other at Web2.0Expo, don’t be shy to ask me about it!

Update: here’s the slideshow! Slightly upgraded since the last incarnation of this talk at Google:

Thanks to all those of you who came. I got lost on the way so arrived late — my apologies to any of you who might have been there on time and left before I arrived.

Advice for a Translating Tool [en]

[fr] Quelques conseils pour mettre en place un outil de traduction d'interfaces en ligne.

I was asked for some advice for a soon-to-be-released online interface translation tool. (Hint: maybe my advice would be more useful earlier on in the project…) Here’s what I said:

  1. allow for regional forking of languages. e.g. there was a merciless
    war on the French wikipedia between the French and the Belgians over
    “Endive” which is called “Chicon” in Belgium. One is not more right than
    another, and these differences can be important.

  2. remember that words which are the same in English can have two
    different translations in other languages. e.g. “Upload” can be
    translated as “Téléchargez” (imperative verb form) or “Téléchargement”
    (noun)

  3. if you’re doing some sort of string-based thing (which I suppose
    you are) like translate.wordpress.com, let people see what they’re
    translating in context. (See the interface in English, with the place
    the string is in highlighted, and then see the interface in French,
    with the string highlighted too.)

Note: yes, this person had already watched my Google Tech Talk on languages online — and yes, I’m going to collect my language stuff somewhere neat on a static page at some point.

Two Panel Submissions for SXSW Interactive (Language Issues) [en]

[fr] Il y a deux propositions portant mon nom pour SXSW -- merci de voter pour elles! Sinon, dates et description de mes prochaines conférences.

Je cherche aussi un "speaking agent" -- faites-moi signe si vous en connaissez un qui travaille avec des personnes basées en Europe. Merci d'avance!

Oh. My. God.

I just realised, reading Brian’s post, that I haven’t blogged about the two panel proposals I’m on for SXSW Interactive next March in Austin, Texas:

  • Opening the Web to Linguistic Realities (co-presenting with Stephanie Troeth)
    A basic assumption on the Internet is that everybody speaks and understands one language at a time. Globalism and immigration has created an even more prominent trend of multilingualism amongst the world’s inhabitants. How can the WWW and its core technologies keep up? How can we shift our biased perspectives?
  • Lost in Translation? Top Website Internationalization Lessons (panel I’m moderating)
    How do you publish software or content for a global audience? Our expert panel discusses lessons learned translating and localizing. Leaders from Flickr, Google, iStockphoto and the Worldwide Lexicon will tackle various marketing issues; how to translate the ‘feel’ of a Web site, and; best practices for software and content translation.

As you can see, both proposals revolve around the use of languages on the internet — and as you know, it’s one of the topics I care about nowadays. I’ve spoken on this topic a few times now (BlogCamp ZH, Reboot9, Google Tech Talks) and I’m looking forward to taking things further with these new chances to toss these problems around in public.

80 or so of the 700+ panel submissions to SXSW Interactive will be selected by public voting and actually take place. That’s not a lot (roughly 10%). So please go and vote for these two panels (“Amazing” will do) so that they make it into the selection. I really want to go to Austin! (Can you hear me begging? OK, over. But please vote.)

Other than that, I have a few more talks planned in the coming months:

My proposal for Web 2.0 Expo didn’t make it, it seems, but I’ll probably submit something for Web2Open.

And, as you might have heard, I’m looking for a speaking agent. If you can recommend any good speaking agents who work with European-based speakers, please drop me a line or a comment.

A Blog is Not a Post, Dammit! [en]

[fr] De plus en plus répandue, la confusion entre "blog" et "post/billet/article" est un cancer qui ronge la terminologie blogosphérique. Pour mémoire, un blog est un type de site composé d'une série d'articles (ou posts, ou billets). On ne dirait pas, dans le cas d'un magazine composé d'articles, "j'ai écrit un nouveau magazine" -- et donc on ne dit pas "j'ai écrit un nouveau blog sur le sujet".

Photographiez les coupables à coups de saisie d'écran et envoyez-les-moi -- je les ajouterai à la collection dans ce b... illet!

Lately, I (and others) have noticed an increasingly aggravating trend: saying “blog” instead of “post”.

To make it clear: a blog is a type of website, made of a collection of blog posts, or “posts”.

Just like a magazine is a collection of articles. You wouldn’t say “he just wrote a new magazine” instead of “he just wrote a new article”, would you?

So, you don’t say “to write a blog” instead of “to write a post”. It just doesn’t make sense.

I’ve started collecting screenshots of offenders and I’m gathering them here (Flickr tag: ablogisnotapost). Post your own screenshots on Flickr and I’ll add them to this blog… post (!), with credit, linkage, and everything, of course. Just drop me a line or leave a comment with the link.

Let’s fight back and get all those newcomers to get their terminology straight before it’s too late!

“Blog” and “post” confusion — offenders

How to Make a Blog:

Confusing 'blog' and 'blog post'

E-mail:

E-mail with "blog" and "post" confusion

StumbleUpon:

StumbleUpon » My Preferences

StumbleUpon » My Blog

Plasq, courtesy of Stowe Boyd:

plasq bad blog usage

Maria on Millions of Us, courtesy of Stowe Boyd (one could argue that this is, in fact, her “first blog”):

Her First Blog Ever

Foreign correspondent Telegraph Blog, courtesy of Adam Tinworth:

Not a Britney Blog - a Britney Post!

SAP Community Network:

SAP Blog_Post Confusion

Alan Patrick (his excuse: lots of beer and a late night, and an attempt at justification by invoking a semantic shift of the word “blog”):

broadstuff blog_post confusion

Dwayne Phillips commenting on /Message:

Comment on /Message, blog/post terminology confusion

Tim Berners-Lee himself 🙁:

OMG. TBL himself calling a post a blog :-(

Send me yours!

Most People Are Multilingual [en]

[fr] Une clarification de ce que j'entends par "la plupart des gens sont multilingues". Multilingues au sens large.

In a comment to my last post, Marie-Aude says I’m being a bit optimistic by stating that “most people are multilingual”. I’d like to clarify what I mean by that.

The “most people are multilingual” thing is not from me. I’ve seen it mentioned in varied settings, though I still need to find systematic studies to back it up (let me know if you have any handy).

It all depends how you define “multilingual”. If you define it in a broad sense (ie, school-level passive understanding of a language counts), then a little thinking shows it’s not that “optimistic”. Here is what would make somebody multilingual:

  • immigration, of course
  • learning a foreign language at school
  • living in a country with different linguistic groups.

Some examples:

  • in India, many people are fluent in their mother tongue, and to some extent in one of the country’s official languages: Hindi or English
  • in the US, think about the huge immigrant population; the whole country was built upon immigration, come to think of it; in the bus in San Francisco, I often heard more foreign languages than English
  • again in the US (because the English-speaking world is seen as a big “monolingual” block), think of the increasingly important hispanic/latino population (people who will often have knowledge of both English and Spanish)
  • in most European countries, people learn at least one foreign language in school; even if it’s not used, most people retain at least some passive knowledge of it. I’m not sure about Asia, Africa, South America, Australia: does anybody know?

So, I don’t think it’s that optimistic to say most people are multilingual. To say that most people are “perfectly multilingual”, of course, is way off the mark. But most people understand more than one language, at least to some extent.

Another Multilingual Talk Proposal (Web 2.0 Expo, Berlin) [en]

[fr] Une proposition de conférence sur le multilinguisme et internet, pour Web 2.0 Expo à Berlin en novembre. J'ai un peu laissé passer le délai, mais advienne que pourra.

I’m sending in a (very late) talk proposal for Web 2.0 Expo, Berlin. Here’s the description I sent them, for my personal records, mainly. We’ll see what happens.

Title: Waiting for the Babel Fish: Languages and Multilingualism

Short description: Languages are the new borders of our connected world, but our tools make them stronger than they have to be. Most people are multilingual: how can language-smart apps help us out of the Internet’s monolingual silos?

Full description: The Internet is the ideal space to reach out to a wide public. However, if geographical boundaries are non-existent, linguistic barriers are all the more present.

Localization is a first step. But though most people and organizations recognize the necessity of catering to non-English audiences, some assumptions on how to do it need to be challenged. For example, countries and languages do not overlap well. Also, most people do not live and function in exclusively one language.

However necessary, localization in itself is not sufficient in getting different linguistic communities to emerge from their silos and mingle.

Multilingual spaces and tools will weaken the linguistic borders by allowing multilingual people of varying proficiency to act as bridges between communities otherwise incapable of communicating.

Till today, unfortunately, our tools are primarily monolingual even when correctly localized, and multilingualism is perceived as an exception or a fringe case which is not worthy of much attention — when in fact, most human beings are multilingual to some extent.

Previous incarnations: for the record again, previous incarnations of this talk (or, to put it slightly differently, other talks I’ve given about this topic):

Speaker blurb: Stephanie Booth lives in Lausanne, Switzerland and Climb to the Stars, The Internet. After a degree in Indian religions and culture, she has been a project manager, a middle-school teacher, and is now an independent web consultant. More importantly, she’s been bilingual since she could talk, has lived in a multilingual country since she was two, and has been an active web citizen in both English and French since she landed online in the late 90s.