Browser Language Detection and Redirection [en]

[fr] An explanation of the method I followed so that http://stephanie-booth.com redirects visitors to either the English or the French version of the site, depending on the language preferences set in their browser.

Update, 29.12.2007: scroll to the bottom of this post for a more straightforward solution, using Multiviews.

I’ve been working on stephanie-booth.com today. One of my objectives is to add an English version to the previously French-only site.

I’m doing this by using two separate installations of WordPress. The content in both languages should be roughly equivalent, and I’ll write a WordPress plugin to “automate” the process of linking back and forth between equivalent content in the two languages.

What I did today is solve a problem I’ve been wanting to attack for some time now: use people’s browser settings to direct them to the “correct” language for the site. Here is what I learnt in the process, and how I did it. It’s certainly not the most elegant way to do things, so let me know if you have a better solution by using the comments below.

The first thing I needed to know was that browsers send their language preferences in the Accept-Language HTTP header (which PHP exposes as HTTP_ACCEPT_LANGUAGE). I initially thought of capturing the information through PHP, but I discovered that Apache (logical, if you think of it) could handle it directly.

This tutorial was useful in getting me started, though I think it references an older version of Apache. Straight from the horse’s mouth, the Apache content negotiation documentation had the final information I needed.

I’ll first explain the brief attempt I made with Multiviews (because it can come in handy) before going through the setup I currently have.

Multiviews

In this example, you request a file, e.g. test.html which doesn’t physically exist, and Apache uses either test.html.en or test.html.fr depending on your language preferences. You’ll still see test.html in your browser bar, though.

To do this, add the line:

Options +Multiviews

to your .htaccess file. Create the files test.html.en and test.html.fr with sample text (“English” and “French” will do if you’re just trying it out).

Then, request the file test.html in your browser. You should see the test content of the file corresponding to your language settings appear. Change your browser language prefs and reload to see what happens.

This is pretty neat, but it forces you to open a file — and I wanted / to redirect either to /en/ or to /fr/.

It’s explained pretty well in this tutorial I already linked to, and this page has some useful information too.

Type maps

I used a type map and some PHP redirection magic to achieve my aim. A type map is not limited to languages, but this is what we’re going to use it for here. It’s a text file which you can name menu.var, for example. In that case, you need to add the following line to your .htaccess so that the file is dealt with as a type map:

AddHandler type-map .var

Here is the content of my type-map, which I named menu.var:

URI: en.php
Content-Type: text/html
Content-Language: en, en-us, en-gb

URI: fr.php
Content-Type: text/html
Content-Language: fr, fr-ch, fr-qc

Based on my tests, I concluded that the value for URI in the type map cannot be a directory, so I used a little workaround. This means that if you load menu.var in the browser, Apache will serve either en.php or fr.php depending on the content-language the browser accepts, and these two PHP files redirect to the correct URL of the localized sites. Here is what en.php looks like:

And fr.php, logically:

Just in case somebody came by with a browser providing neither English nor French in the Accept-Language header, I added this line to my .htaccess to catch any 406 errors (“not acceptable”):

ErrorDocument 406 /en.php

So, if something goes wrong, we’re redirected to the English version of the site.

The last thing that needs to be done is to have menu.var (the type map) load automatically when we go to stephanie-booth.com. I first tried by adding a DirectoryIndex directive to .htaccess, but that messed up the use of index.php as the normal index file. Here’s the line for safe-keeping, if you ever need it in other circumstances, or if you want to try:

DirectoryIndex menu.var

Anyway, I used another PHP workaround. I created an index.php file with the following content:

And there we are!

Accepted language priority and regional flavours

In my browser settings, I’ve used en-GB and fr-CH to indicate that I prefer British English and Swiss French. Unfortunately, the header matching is strict. So if the order of your languages is “en-GB, fr-CH, fr, en” you will be shown the French page (en-GB and fr-CH are ignored, and fr comes before en). It’s all explained in the Apache documentation:

The server will also attempt to match language-subsets when no other match can be found. For example, if a client requests documents with the language en-GB for British English, the server is not normally allowed by the HTTP/1.1 standard to match that against a document that is marked as simply en. (Note that it is almost surely a configuration error to include en-GB and not en in the Accept-Language header, since it is very unlikely that a reader understands British English, but doesn’t understand English in general. Unfortunately, many current clients have default configurations that resemble this.) However, if no other language match is possible and the server is about to return a “No Acceptable Variants” error or fallback to the LanguagePriority, the server will ignore the subset specification and match en-GB against en documents. Implicitly, Apache will add the parent language to the client’s acceptable language list with a very low quality value. But note that if the client requests “en-GB; q=0.9, fr; q=0.8”, and the server has documents designated “en” and “fr”, then the “fr” document will be returned. This is necessary to maintain compliance with the HTTP/1.1 specification and to work effectively with properly configured clients.

Apache, Content Negotiation

This means that I added regional language codes to the type map (“fr, fr-ch, fr-qc”) and also that I changed the order of my language preferences in Firefox, making sure that all variations of one language were grouped together, in the order in which I prefer them:

Language Prefs in Firefox
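
To make the matching behaviour described above more concrete, here is a small model of it in Python. It is only a sketch of the rules quoted from the Apache documentation, not Apache's actual algorithm: it ignores wildcards, media types and LanguagePriority, the function names are mine, and the 0.001 used for the parent-language fallback is an arbitrary stand-in for "a very low quality value". It also assumes the browser sends decreasing q values in the order of your preference list.

```python
def parse_accept_language(header):
    """Split an Accept-Language value into (range, q) pairs, lower-cased."""
    ranges = []
    for part in header.split(","):
        tag, _, params = part.strip().partition(";")
        if not tag:
            continue
        params = params.strip()
        q = float(params[2:]) if params.startswith("q=") else 1.0
        ranges.append((tag.strip().lower(), q))
    return ranges


def variant_quality(variant_langs, ranges):
    """Best quality any of the variant's Content-Language tags can claim."""
    best = 0.0
    for lang in (l.lower() for l in variant_langs):
        for tag, q in ranges:
            if lang == tag or lang.startswith(tag + "-"):
                # Normal match: the range equals the tag or is its prefix.
                best = max(best, q)
            elif tag.startswith(lang + "-") and q > 0:
                # Fallback described in the Apache docs: the parent language
                # is accepted, but only with a very low quality value
                # (0.001 is an assumed placeholder).
                best = max(best, 0.001)
    return best


def choose_variant(type_map, header):
    """Return the URI of the best variant, or None (Apache would send 406)."""
    ranges = parse_accept_language(header)
    best_uri, best_q = None, 0.0
    for uri, langs in type_map.items():
        q = variant_quality(langs, ranges)
        if q > best_q:
            best_uri, best_q = uri, q
    return best_uri
```

With a header such as "en-GB,fr-CH;q=0.8,fr;q=0.5,en;q=0.3", a type map that only lists en and fr makes this model pick fr.php, because fr (0.5) beats en (0.3) and the regional tags only count as a last resort. Once en-gb and fr-ch are listed in the type map, en-GB matches directly and en.php wins. A header containing only "de" matches nothing, which is the case the ErrorDocument 406 line catches.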

Catching old (now invalid) URLs

There are lots of incoming links to pages of the French site, where it used to live: at the web root. For example, the contact page address used to be http://stephanie-booth.com/contact, but it is now http://stephanie-booth.com/fr/contact. I could write a whole list of permanent redirects in my .htaccess file, but this is simpler. I just copied and modified the rewrite rules that WordPress provides, to make sure that the correct blog installation does something useful with those old URLs (my modification is the /fr prefix in the last rule):

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /fr/index.php [L]
</IfModule>
# END WordPress

In this way, as you can check, http://stephanie-booth.com/contact is not broken.

Next steps

My next mission is to write a small plugin which I will install on both WordPress sites (I’ve got to write it for a client too, so double benefit). This plugin will do the following:

  • add a field to the write/edit post screen in which to type the post slug of the corresponding page/post in the other language (e.g. “particuliers” in French will be “individuals” in English)
  • add a link to each post pointing to the equivalent page in the other language

It’s pretty basic, but it beats manual links, and remains very simple. (I like simple.)

As I said, if you have a better (simpler!) way of doing all this, please send it my way.

A simpler solution [Added 29.12.2007]

For each language, create a file named index.php.lg where “lg” is the language code. For French, you would create index.php.fr with the following content:

Repeat for each language available.

Do not put an index.php file in your root directory, just the index.php.lg files.

Add the two following lines to your .htaccess:

Options +Multiviews
ErrorDocument 406 /fr/

…assuming French is the default language you want your site to show up in if your visitor’s browser doesn’t accept any of the languages you provide your site in.

You’re done!

Hairy .htaccess Dreamhost WordPress Problem Solved! [en]

[fr] The solution to a problem that has literally poisoned my holidays. Phew.

Thanks to grimboy, my “parent” .htaccess now has two extra lines and looks like this (and the problem that has kept me awake for the last week is solved):

AddDefaultCharset OFF
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_URI} ^/membres.*$ [OR]
RewriteCond %{REQUEST_URI} ^/failed_auth.html$
RewriteRule ^.*$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php
</IfModule>
# END WordPress
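
For anyone wondering what the two extra conditions change, here is a rough Python model of the rule set. It is an illustration only: the function name and the exists_on_disk flag (standing in for the -f and -d checks) are mine, and it ignores everything else Apache does while handling a request.

```python
import re

# Patterns copied from the two RewriteCond %{REQUEST_URI} lines.
PASS_THROUGH = [
    re.compile(r"^/membres.*$"),
    re.compile(r"^/failed_auth.html$"),
]


def route(uri, exists_on_disk):
    """Return what the rule set does with a request for `uri`.

    exists_on_disk stands in for the -f / -d tests on REQUEST_FILENAME.
    """
    # RewriteRule ^.*$ - [L]: leave the URL untouched and stop processing,
    # so the root WordPress never sees these requests.
    if any(pattern.search(uri) for pattern in PASS_THROUGH):
        return "unchanged"
    # Standard WordPress catch-all: anything that is not a real file or
    # directory is handed to the root blog's index.php.
    if not exists_on_disk:
        return "/index.php"
    return "unchanged"
```

In this model, requests under /membres and the request for /failed_auth.html are left alone whether or not a matching file exists, while any other missing path still ends up at /index.php as before.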

Thanks so much!

Hairy .htaccess Dreamhost WordPress Problem [en]

[fr] One of the last problems still resisting me on the new server.

Here’s roughly what I wrote in a support ticket I sent Dreamhost this morning. If you have any suggestions, I’ll take them.

> Hello,

> I have a site http://cafecafe.ch on which I have installed wordpress (http://cafecafe.ch/wp/ -> displays as http://cafecafe.ch/blog with Filosofo Homepage-Control plugin).

> That server has a subdirectory http://cafecafe.ch/membres/ which is password-protected using .htaccess. Inside is another wordpress install http://cafecafe.ch/membres/wp.

> I had this set up on my previous host and it worked fine.

> Now, if I go to http://cafecafe.ch/membres/ the request is caught by the blog installed in http://cafecafe.ch/wp/, and I’m shown the page http://cafecafe.ch/blog/.

> To make sure it wasn’t a conflict between the two wordpress installs, I created an empty directory http://cafecafe.ch/test/ which I tried to password-protect in the same way. The problem is the same (going to http://cafecafe.ch/test/ displays http://cafecafe.ch/blog). If I comment out the “require valid-user” line of the .htaccess, I get to see the directory listing.

> Similarly, if I come back to http://cafecafe.ch/membres and comment out that line in .htaccess, both wordpress installs work fine, with permalinks and all (only the private blog isn’t protected anymore, which won’t do).

> I’ve tried not doing the password protection manually, and using what is provided in the panel for that, but the problem remains exactly the same.

> Weird, isn’t it?

> Hope you can help me out on this. Tried checking error logs but they were empty.

Getting Rid of www [en]

[fr] A recipe to make that damned "www" magically disappear from the domain names I host...

I personally hate having “www” in front of a domain name. It’s redundant. If we’re visiting a website, we’re on the web anyway. It also brings no end of problems when people start writing for the web and creating links, because they think that what makes something a “website address” is the “www” in front of it, instead of “http://”. That’s how they end up with links like “http://example.com/www.yahoo.com” on their sites. But I digress.

On one of the sites I manage, we have a restricted members-only area. However, our users started reporting that when they used “www” in front of the domain name, they were being asked for the password twice. I tried myself, and I was simply asked for the password ad aeternum. Probably a server configuration glitch somewhere.

Anyway, I decided the simplest solution was to redirect all “www” requests to the non-www domain. I know I had that in place for CTTS at one point, but the setting must have got lost along the way. Instead of sticking rewrite rules in .htaccess as no-www.org suggests, I modified my vhost configuration slightly so that it looked like this:

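# The address in <VirtualHost ...> is an assumption; use whatever your server listens on
<VirtualHost *:80>
ServerName example.com
DocumentRoot /home/example/www/
ErrorLog logs/example-error
CustomLog logs/example-access combined
</VirtualHost>

<VirtualHost *:80>
ServerName www.example.com
Redirect permanent / http://example.com/
</VirtualHost>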

Try it!

http://www.cafecafe.ch/

Many thanks to those who gave suggestions and nudged me along the way to this solution.

Scripts for a WordPress Weblog Farm [en]

A first step to WordPress-farming: a shell script and a PHP script which allow you to easily install a whole lot of WordPress weblogs in only a few minutes (I installed over 30 in less than 5 minutes). Scripts require adapting to your environment, of course.

Update 03.11.06: Batiste made me realise I should point the many people landing here in search of multi-user WordPress to WordPress MU. All that I describe in this post is very pretty, but nowadays completely obsolete.

Here is the best solution I’ve managed to come up with in half a day to finally install over 30 WordPress weblogs in under 5 minutes (once the preparation work was done).

A shell script copies the image of a WordPress install to multiple directories and installs them. A PHP script then changes a certain number of options and settings in each weblog. It can be used later to run as a “patch” on all installed weblogs if a setting needs modifying globally.

Here are the details of what I did.

I first downloaded and unzipped WordPress into a directory.

wget http://wordpress.org/latest.tar.gz
tar -xzvf latest.tar.gz
mv wordpress wp-farm-image

I cleaned up the install (removing wp-comments-popup.php and the import*.php files, for example), added a language directory (as I’m wp-farming in French) and modified index.php to my liking; in particular, I edited the import statement for the stylesheet so that it looked like this:

@import url( http://edublogs.net/styles/toni/style.css );

The styles directory is a directory in which I place a bunch of WordPress styles. I don’t need the style switcher capability, but I do need the styles. Later, users will be able to change styles simply by editing that line in their index.php (or I can do it for them).

Another very important thing I did was rename wp-config-sample.php to wp-config.php and fill in the database and language information. I replaced wp_ by xxx_ so that I had $table_prefix = 'xxx_';.

To make it easier to install plugins for everyone, correct the language files, and edit whatever may be in wp-images, I moved these three directories out of the image install and replaced them with symbolic links, taking inspiration from Shelley’s method for installing multiple WordPress weblogs.

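# Create the shared directory first, otherwise the first mv just renames wp-content
mkdir common
mv image/wp-content image/wp-images image/wp-includes/languages common/
# Use absolute targets: a relative target is resolved from the link's own directory
ln -s "$PWD/common/wp-content" image/wp-content
ln -s "$PWD/common/wp-images" image/wp-images
ln -s "$PWD/common/languages" image/wp-includes/languages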

I also added an .htaccess file (after some painful tweaking on a test install).

Once my image was ready, I compiled a list of all the users I had to open weblogs for (one username per line) in a file named names.txt, which I placed in the root directory all the weblog subfolders were going to go in.

I then ran this shell script (many thanks to all those of you who helped me with it — you saved my life):

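# Read one username per line from names.txt
while read -r x
do
  cp -rv /home/edublogs/wp-farm/image/ "$x"
  # Set the table prefix by replacing the xxx placeholder with the username
  sed "s/xxx/${x}/" "$x/wp-config.php" > config.tmp
  mv config.tmp "$x/wp-config.php"
  # Quote the URLs so the shell does not treat "?" as a glob character
  wget "http://st-prex.edublogs.net/$x/wp-admin/install.php?step=1"
  wget "http://st-prex.edublogs.net/$x/wp-admin/install.php?step=2"
  wget "http://st-prex.edublogs.net/$x/wp-admin/install.php?step=3"
done < names.txt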

This assumes that my WordPress install image was located in /home/edublogs/wp-farm/image/ and that the weblog addresses were of the form http://st-prex.edublogs.net/username/.

This script copies the image to a directory named after the user, edits wp-config to set the table prefix to the username, and then successively wgets the install URLs to save me from loading them all in my browser.

After this step, I had a bunch of installed but badly configured weblogs (amongst other things, as I short-circuited the form before the third install step, they all think their siteurl is example.com).

Enter the PHP patch, which tweaks settings directly in the database. I spent some time with a test install and PHPMyAdmin to figure out which fields I wanted to change and which values I wanted to give them, but overall it wasn’t too complicated to do. You’ll certainly need to heavily edit this file before using it if you try and duplicate what I did, but the basic structure and queries should remain the same.

I edited the user list at the top of the file, loaded it in my browser, and within a few seconds all my weblogs were correctly configured. I’ll use modified versions of this script later on when I need to change settings for all the weblogs in one go (for example, if I want to quickly install a plugin for everyone).
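
To give an idea of the shape of such a patch, here is a minimal sketch in Python (not the PHP script I actually used). It only builds the SQL for one setting, the siteurl mentioned above, and returns the statements instead of running them. The function name is made up, and it assumes each weblog's options live in a table called username_options with option_name and option_value columns, which is what the xxx_ table prefix trick should produce; check this against your own database first.

```python
def siteurl_patch(users, base_url="http://st-prex.edublogs.net"):
    """Build one UPDATE statement per weblog, fixing its siteurl option."""
    statements = []
    for user in users:
        # Refuse anything that is not a plain username, since the name is
        # interpolated into a table identifier and a URL.
        if not user.replace("_", "").replace("-", "").isalnum():
            raise ValueError(f"unexpected username: {user!r}")
        table = f"`{user}_options`"  # assumed table name: <prefix>options
        url = f"{base_url}/{user}"
        statements.append(
            f"UPDATE {table} SET option_value = '{url}' "
            "WHERE option_name = 'siteurl';"
        )
    return statements
```

Feeding it the usernames from names.txt and reading through the generated statements before running them against the database is a cheap way to catch mistakes on two test weblogs rather than on thirty.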

In summary:

  1. compile list of users
  2. prepare image install
  3. run shell script
  4. run PHP script

If you try to do this, I suggest you start by putting only two users in your user list, and checking thoroughly that everything installs and works correctly before doing it for 30 users. I had to tweak the PHP script quite a bit until I had all my settings correctly configured.

Hope this can be useful to some!

Update 29.09.2005: WARNING! Hacking WordPress installs to build a farm like this one is neat, but it gets much less neat when your weblog farm is spammed with animal porn comments. You then realise (oh, horror!) that none of the anti-spam plugins work on your beautiful construction, so you weed them out by hand as you can, armed with many a MySQL query. And then the journalist steps in — because, frankly, “sex with dogs” on a school website is just too good to be true. And then you can spend half a day writing an angry reaction to the shitty badly-researched article.

My apologies for the bad language. Think of how you’re going to deal with spam beforehand when you’re setting up a school blog project.

Musings on a Multiblog WordPress [en]

Thinking about a solution to make WordPress MultiBlog. Comments, criticism and other ideas welcome — please join the fun. In particular, I bump into a hairy PHP include problem.

[fr] I’m thinking about how WordPress could be given the ability to manage several blogs with a single installation. I’m running into a problem with PHP includes. Feedback and other ideas welcome!

Update June 2007: Try WordPress Multi-User now.

I’ve used Shelley’s instructions using soft links. I tried Rubén’s proof-of-concept, but got stuck somewhere in the middle.

So I started thinking: how can we go about making WordPress MultiBlog-capable? Here is a rough transcript of my thoughts (I’ve removed some of the dead ends and hesitations) in the hope that they might contribute to the general resolution of the problem. I have to point out my position here: somebody with a dedicated server who’s thinking of setting up a “WordPress weblog-farm” (for my pupils, mainly). So I’m aware that I’m not the “standard user” and that my solution is going to be impractical to many. But hey, let’s see where it leads, all the same. Actually, I think I probably reconstructed most of Rubén’s strategy here — but I’m not sure to what extent what I suggest differs from what he has done.

From a system point of view, we want to have a unique installation of WordPress, and duplication of only the files which are different from one blog to another (index.php, wp-config.php, wp-comments.php, wp-layout.css, to name a few obvious ones). The whole point being that when the install needs to be upgraded, it only has to be upgraded in one place. When a plugin is downloaded and installed, it only has to be done once for all weblogs, though it can of course be activated individually for each weblog.

From the point of view of the weblogs themselves, they need to appear to be in different domains/subdomains/folders/whatever. What I’m most interested in is different subdomains, so I’ll stick to that in my thinking. (Then somebody can come and tell me that my “solution” doesn’t work for subfolders, and here’s one that works for subfolders and subdomains, and we’ll all be happy, thankyouverymuch.) So, when I’m working with blog1.example.com all the addresses need to refer to that subdomain (blog1.example.com/wp-admin/, etc); ditto for blog2.example.com, blog3.example.com, blogn.example.com (I used to like maths in High School a lot).

As Rubén puts it, the problem with symbolic links (“soft links”) is called “soft link hell”: think of a great number of rubber bands stretched all over your server. Ugh. So let’s try to go in his direction, for a while. First, map all the subdomains to the same folder on the server. Let’s say blog1.example.com, blog2.example.com (etc.) all point to /home/bunny/www/wordpress/. Neat, huh? Not so. They will all use the same wp-config.php file, and hence all be the same weblog.

This is where Rubén’s idea comes in: include a file at the top of wp-config.php which:

  1. identifies which blog we are working with (in my case, by parsing $HTTP_HOST, for example — there might be a more elegant solution)
  2. “replaces” the files in the master installation directory by the files in a special “blog” directory, if they exist

The second point is the tricky one, of course. We’d probably have a subfolder per blog in wordpress/blogs: wordpress/blogs/blog1, wordpress/blogs/blog2, etc. The included file would match the subdomain string with the equivalent folder, check if the page it’s trying to retrieve exists in the folder, and if it does, include that one and stop processing the initial script after that. Another (maybe more elegant) option would be to do some Apache magic (I’m dreaming, no idea if it’s possible) to systematically check if a file is available in the subdirectory matching the subdomain before using the one in the master directory. Anybody know if this is feasible?
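
The lookup itself is easy enough to sketch. Here it is in Python rather than PHP, purely to show the logic: the function names, the example.com base domain and the blogs/ folder layout are the hypothetical ones used in this post, and this says nothing yet about how to make WordPress's own includes go through such a lookup.

```python
import os
import re

# Hypothetical layout: <master>/blogs/<blog>/ holds per-blog overrides.
BLOG_NAME = re.compile(r"^[a-z0-9-]+$")


def blog_from_host(host, base_domain="example.com"):
    """Return the subdomain label for blogN.example.com, or None."""
    host = host.lower().split(":", 1)[0]  # drop any port
    suffix = "." + base_domain
    if not host.endswith(suffix):
        return None
    name = host[: -len(suffix)]
    # Only accept a single safe label, so the name can be used as a folder.
    return name if BLOG_NAME.match(name) else None


def resolve(master_dir, blog, filename):
    """Prefer the blog's own copy of a file, else the master copy."""
    if blog is not None:
        override = os.path.join(master_dir, "blogs", blog, filename)
        if os.path.isfile(override):
            return override
    return os.path.join(master_dir, filename)
```

So a request for blog1.example.com would use wordpress/blogs/blog1/wp-config.php if that file exists, and fall back to the master wordpress/wp-config.php otherwise. Restricting the blog name to a single label of letters, digits and hyphens is a deliberate choice: the name comes from the request, so it should not be able to point outside the blogs folder.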

The problem I see is with includes. We have (at least) three types of include calls:

  • include (ABSPATH . 'wp-comments.php');
  • require ('./wp-blog-header.php');
  • require_once(dirname(__FILE__).'/' . '/wp-config.php');

As far as I see it, they’ll all break if the calling include is in /home/bunny/www/wordpress/blogs/blog1 and the file to be called is in /home/bunny/www/wordpress. What is wrong with relative includes? Oh, they would break too. Dammit.

We would need some intelligence to determine if the file to be included or called exists in the subdirectory or not, and magically adapt the include call to point to the “right” file. I suspect this could be done, but would require modifying all (at least, a lot of) the include/requires in WordPress.

Maybe another path to explore would be to create a table in the database to keep track of existing blogs, and of the files that need to be “overridden” for each blog. But again, I suspect that would mean recoding all the includes in WordPress.

Another problem would be .htaccess. Apache would be retrieving the same .htaccess for all subdomains, and that happens before PHP comes into play, if I’m not mistaken.

Any bright ideas to get us out of this fix? Alternate solutions? Comments? Things I missed or got wrong? The comments and trackbacks are yours. Thanks for your attention.