Finally out of MySQL encoding hell [en]

[fr] Description de comment je me suis sortie des problèmes d'encodage qui résultaient en l'affichage de hiéroglyphes sur tous les sites hébergés sur mon serveur.

It took weeks, mainly because I was busy with a car accident and the end of school, but it also took about two real whole days of head-banging on the desk to get it fixed.

Here’s what happened: remember, a long time ago, I had trouble with stuff in my database which was supposed to be UTF-8 but seemed to be ISO-8859-1? And then, sometime later, I had a weird mixture of UTF-8 and ISO-8859-1 in the same database?

Well, somewhere along the line this is what I guess happened: my database installation must have been serving UTF-8 content as ISO-8859-1, leading me to believe it was ISO-8859-1 when it was in fact UTF-8. That led me to try to convert it to UTF-8 — meaning I took UTF-8 strings and ran them through a converter supposed to turn ISO-8859-1 into UTF-8. The result? Let’s call it “double-UTF-8” (doubly encoded UTF-8), for want of a better name.

Anyway, that’s what I had in my database. When we upgraded MySQL and PHP on the server, I suddenly started seeing a load of junk instead of my accented characters:

encoding-problem-2

What I was seeing looked furiously like UTF-8 looks when your server setup is messed up and serves it as ISO-8859-1 instead. But, as you can see on the picture above, this page was being served as UTF-8 by the server. How did I know it wasn’t ISO-8859-1 in my database instead of this hypothetical “double-UTF-8”? Well, for one, I knew the page was served as UTF-8, and I also know that ISO-8859-1 (latin-1) served as UTF-8 makes accented characters look like question marks. Then, if I wanted to be sure, I could just change the page encoding in Firefox to ISO-8859-1 (that should make it look right if it was ISO-8859-1, shouldn’t it?) Well, it made it look worse.

Another indication was that when the MySQL connection encoding (in my.cnf) was set back to latin-1 (ISO-8859-1), the pages seemed to display correctly, but WordPress broke.

The first post on the picture I’m showing here looks “OK”, because it was posted after the setup was changed. It really is UTF-8.

Now how did we solve this? My initial idea was to take the “double-UTF-8” content of the database (and don’t forget it was mixed with the more recent UTF-8 content) and convert it “from UTF-8 to ISO-8859-1”. I had a python script we had used to fix the last MySQL disaster which converted everything to UTF-8 — I figured I could reverse it. So I rounded up a bunch of smart people (dda_, sbp, bonsaikitten and Blackb|rd — and countless others, sorry if I forgot you!) and got to work.

It proved a hairier problem than expected. What also proved hairy was explaining the problem to people who wanted to help and insisted in misunderstanding the situation. In the end, we produced a script (well, “they” rather than “we”) which looked like it should work, only… it did nothing. If you’re really interested in looking at it, here it is — but be warned, don’t try it.

We tried recode. We tried iconv. We tried changing my.cnf settings, dumping the databases, changing them back, and importing the dumps. Finally, the problem was solved manually.

  1. Made a text file listing the databases which needed to be cured (dblist.txt).
  2. Dumped them all: for db in $(cat dblist.txt); do mysqldump --opt -u user -ppassword ${db} > ${db}-20060712.sql; done
  3. Sent them over to Blackb|rd who did some search and replace magic in vim, starting with this list of characters (just change the browser encoding to latin-1 to see what they look like when mangled)
  4. Imported the corrected dumps back in: for db in $(cat dblist.txt); do mysql -u user -ppassword ${db} < ${db}-20060712.sql; done

Blackb|rd produced a shell script for vim (?) which I’ll link to as soon as I lay my hands on the URL again. The list of characters to convert was produced by trial and error, knowing that corrupted characters appeared in the text file as A tilde or A circonflexe followed by something else. I’d then change the my.cnf setting back to latin-1 to view the character strings in context and allow Blackbr|d to see what they needed to be replaced with.

Thanks. Not looking forward to the next MySQL encoding problem. They just seem to get worse and worse. (And yes, I do use UTF-8 all over the place.)

Intégrer coComment sur votre blog [en]

Vous savez probablement que je ne jure que par coComment, dans la genèse duquel j’ai eu la chance de jouer un petit rôle.

Ça sert à quoi?

Ça sert principalement à choses pour l’instant:

  1. collectionner ses commentaires faits sur différents blogs en un seul endroit, comme on peut le faire en les bookmarquant avec del.icio.us, mais de façon bien plus pratique;

  2. voir facilement si quelqu’un d’autre a répondu à un de nos commentaires — mais attention, ceci ne marche pas très bien pour l’instant, car coComment est seulement capable de suivre les commentaires de gens ayant un compte coComment.

S’inscrire

Ouvrir un compte, c’est super facile, il suffit de donner nom et e-mail et de choisir un nom d’utilisateur et un mot de passe.

On notera qu’il n’est pas nécessaire d’avoir un blog pour trouver une utilité à coComment. Il y a des personnes qui participent activement à la blogosphère à travers leurs commentaires, sans pour autant être blogueurs. CoComment est pour vous!

Rendre coComment plus pratique

Une fois le compte ouvert, coComment vous fournit un bookmarklet, une sorte de lien “favori” que vous pouvez faire glisser dans la barre d’outils de votre navigateur ou dans vos favoris. Ensuite, quand vous laissez un commentaire chez quelqu’un, vous cliquez sur le bookmarklet dans votre navigateur avant de publier le commentaire.

Ça, ça devient très vite barbant. On oublie de cliquer sur le bookmarklet. Du coup, notre commentaire n’apparaît pas sur notre page de conversations coComment. Il paraît qu’on peut maintenant cliquer sur le bookmarklet après avoir envoyé le formulaire, mais personnellement je n’ai pas testé.

Les commentateurs peuvent faire quelque chose pour se simplifier la vie. Les auteurs de blogs peuvent faire quelque chose pour simplifier la vie de leurs commentateurs.

Plus pratique pour moi qui laisse des commentaires

Pour ça, il faut utiliser Firefox. Deux solutions s’offrent à vous.

  1. Le script GreaseMonkey. Ce script vous évite de devoir cliquer sur le bookmarklet à chaque fois. Vous pouvez donc oublier coComment quand vous laissez vos commentaires, c’est tout automatisé. Sympa, non?

    Script, GreaseMonkey, chinois? Pas peur, instructions pour les nuls. D’abord, installer l’extension GreaseMonkey (non, ça fait pas mal). Pour ça, on commence par s’assurer que l’on a Firefox 1.5 (voir lien ci-dessus), puis on va sur le site de l’extension GreaseMonkey. Une fois là -bas, on cliquera sur le lien qui s’appelle “Install GreaseMonkey” dans la deuxième moitié de la page. On dit oui à tout ce que nous demande Firefox (oui on veut installer l’extension, oui, oui, OK on ferme le navigateur et on le rouvre…)

    Ensuite, on clique sur ce lien-ci qui va installer le script et on dit également oui à tout.

    Voilà ! C’est fait 🙂

  2. L’extension Firefox pour coComment. Même chose que plus haut, on clique sur le lien, on clique ensuite sur “Download coComment! for Firefox”, et on dit oui, amen à tout ce que nous demande notre navigateur chéri. L’extension me paraît moins utile que le script GreaseMonkey, car elle ne fait qu’ajouter le bookmarklet coComment au menu contextuel qui apparaît lors d’un clic droit (ou long clic pour les personnes qui ont un Mac). Mais il paraît que c’est utile parfois lorsque les commentaires sont dans une fenêtre pop-up. Personnellement, j’ai vu que ça ne marchait pas tout le temps, mais j’essaie quand même.

    Note: j’ai désactivé l’extension Firefox vu que je ne l’utilise pas. A vous de voir si elle vous sert.

Plus pratique pour moi qui ai un blog

Si vous avez un blog, vous pouvez faire en sorte que vos commentateurs, s’ils ont un compte coComment, n’aient pas besoin de cliquer sur le bookmarklet, même s’ils n’ont pas installé l’extension GreaseMonkey décrite ci-dessus. Cool, non? Pour cela, il faut rajouter du code javascript pas trop loin du formulaire de commentaires.

Le code est fourni à la fin de ce billet par Merlin. Comme j’utilise une version assez standard de WordPress, je n’ai eu personnellement qu’à copier-coller ce qui était donné dans le billet. Bon, faut encore voir si ça fonctionne 😉 Attention, donc, si vous avez un autre outil de blog, il faut peut-être adapter le code. Cet autre billet explique plus précisément quel rôle joue chaque ligne et vous aidera certainement à modifier le code si nécessaire.

Attention! Pour le moment, j’arrive pas à faire marcher ça. Plus de nouvelles dès que c’est réglé. Ça marche maintenant, mais il faut faire attention aux guillemets malins pas si malins quand ils sont dans du code

Oui mais… DotClear, autres plateformes de blog, etc…?

Pas de panique. Premièrement, il faut savoir que les gars de coComment bossent d’arrache-pied pour augmenter le nombre de plate-formes avec lesquelles ils sont compatibles. Si votre blog n’est pas compatible avec coComment, mais que vous pouvez modifier votre formulaire de commentaires, tout n’est pas perdu.

Le dernier billet que j’ai mentionné explique comment faire. Il faut donc rajouter un certain nombre de lignes javascript dans le formulaire, et voilà ! Je suis certaine que DotClear fournit toutes les informations nécessaires mais avec d’autres noms que ceux auxquels s’attend coComment. Il suffirait donc qu’un(e) DotClearien(ne) prenne le taureau par les cornes et adapte le javascript aux variables de DotClear (la version publiée utilise les variables WordPress). Si vous faites ce travail, rendez-le public sur votre blog, et je lancerai un lien dans se direction! Qui s’y colle?

Mise à jour: Nicolas propose en commentaire le code à intégrer à DotClear. Quelqu’un peut confirmer que ça marche?

Fixing Newline Abuse in WordPress [en]

Newlines had crept into some of my old posts and corrupted the formatting. Quick and dirty script I used to solve the problem.

[fr] Des retours à  la ligne excédentaires se sont glissés dans mes anciens posts à  un moment donné, cassant des liens et le formattage en général. J'ai utilisé un petit script pour supprimer tous les retours à  la ligne de ces posts.

I only realised right now how badly broken about 500 of my old posts where. Somewhere in the process, newlines got added at the end of each line, and not just at the end of each paragraph. As WordPress kindly and intelligently adds the necessary HTML markup for paragraph and line breaks in posts, this meant that lots of my posts had broken links (when the newline occurred in the middle of a link) and uneven lines.

Victims: about 500 posts.

Solution: luckily, all the concerned posts had complete HTML formatting in them. Purely and simply removing all the newlines with a short PHP script was the way to go.

Thanks to all those who helped. You probably won’t be able to use the script as is, but if you have a similar problem at some point, it might help. The script pulls out the posts from the WordPress database, removes the newlines, and puts the post back into the table.

Scripts for a WordPress Weblog Farm [en]

A first step to WordPress-farming: a shell script and a PHP script which allow you to easily install a whole lot of WordPress weblogs in only a few minutes (I installed over 30 in less than 5 minutes). Scripts require adapting to your environment, of course.

Update 03.11.06: Batiste made me realise I should point the many people landing here in the search of multi-user WordPress to WordPress MU. All that I describe in this post is very pretty, but nowadays completely obsolete.

Here is the best solution I’ve managed to come up with in half a day to finally install over 30 WordPress weblogs in under 5 minutes (once the preparation work was done).

A shell script copies the image of a WordPress install to multiple directories and installs them. A PHP script then changes a certain number of options and settings in each weblog. It can be used later to run as a “patch” on all installed weblog if a setting needs modifying globally.

Here are the details of what I did.

I first downloaded and unzipped WordPress into a directory.

wget http://wordpress.org/latest.tar.gz
tar -xzvf latest.tar.gz
mv wordpress wp-farm-image

I cleaned up the install (removing wp-comments-popup.php and the import*.php files, for example), added a language directory (as I’m wp-farming in French) and modified index.php to my liking; in particular, I edited the import statement for the stylesheet so that it looked like this:

@import url( http://edublogs.net/styles/toni/style.css );

The styles directory is a directory in which I place a bunch of WordPress styles. I don’t need the style switcher capability, but I do need to styles. Later, users will be able to change styles simply by editing that line in their index.php (or I can do it for them).

Another very important thing I did was rename wp-config-sample.php to config-sample and fill in the database and language information. I replaced wp_ by xxx_ so that I had $table_prefix = 'xxx_';.

To make it easier to install plugins for everyone, correct the language files, and edit whatever may be in wp-images, I moved these three directories out of the image install and replaced them with symbolic links, taking inspiration from Shelley’s method for installing multiple WordPress weblogs.

mv image/wp-content common
mv image/wp-images common
mv image/wp-includes/languages common
ln -s common/wp-content image/wp-content
ln -s common/wp-images image/wp-images
ln -s common/languages image/wp-includes/languages

I also added an .htaccess file (after some painful tweaking on a test install).

Once my image was ready, I compiled a list of all the users I had to open weblogs for (one username per line) in a file named names.txt, which I placed in the root directory all the weblog subfolders were going to go in.

I then ran this shell script (many thanks to all those of you who helped me with it — you saved my life):

for x in `cat names.txt`
do
cp -rv /home/edublogs/wp-farm/image/ $x
cat $x/wp-config.php | sed "s/xxx/${x}/" > config.tmp
mv config.tmp $x/wp-config.php
wget http://st-prex.edublogs.net/$x/wp-admin/install.php?step=1
wget http://st-prex.edublogs.net/$x/wp-admin/install.php?step=2
wget http://st-prex.edublogs.net/$x/wp-admin/install.php?step=3
done

This assumes that my WordPress install image was located in /home/edublogs/wp-farm/image/ and that the weblog addresses were of the form http://st-prex.edublogs.net/username/.

This script copies the image to a directory named after the user, edits wp-config to set the table prefix to the username, and then successively wgets the install URLs to save me from loading them all in my browser.

After this step, I had a bunch of installed but badly configured weblogs (amongst other things, as I short-circuited the form before the third install step, they all think their siteurl is example.com).

Entered the PHP patch which tweaks settings directly in the database. I spent some time with a test install and PHPMyAdmin to figure out which fields I wanted to change and which values I wanted to give them, but overall it wasn’t too complicated to do. You’ll certainly need to heavily edit this file before using it if you try and duplicate what I did, but the basic structure and queries should remain the same.

I edited the user list at the top of the file, loaded it in my browser, and within less than a few seconds all my weblogs were correctly configured. I’ll use modified versions of this script later on when I need to change settings for all the weblogs in one go (for example, if I want to quickly install a plugin for everyone).

In summary:

  1. compile list of users
  2. prepare image install
  3. run shell script
  4. run PHP script

If you try to do this, I suggest you start by putting only two users in your user list, and checking thoroughly that everything installs and works correctly before doing it for 30 users. I had to tweak the PHP script quite a bit until I had all my settings correctly configured.

Hope this can be useful to some!

Update 29.09.2005: WARNING! Hacking WordPress installs to build a farm like this one is neat, but it gets much less neat when your weblog farm is spammed with animal porn comments. You then realise (oh, horror!) that none of the anti-spam plugins work on your beautiful construction, so you weed them out by hand as you can, armed with many a MySQL query. And then the journalist steps in — because, frankly, “sex with dogs” on a school website is just too good to be true. And then you can spend half a day writing an angry reaction to the shitty badly-researched article.

My apologies for the bad language. Think of how you’re going to deal with spam beforehand when you’re setting up a school blog project.

Requirements for a WordPress Installer Script [en]

To install 30+ WordPress blogs on a server, it would be nice to have an automatic installation script. Here is a list of what this script should do for me.

[fr] Pour installer plus de trente weblogs WordPress sur mon serveur, il serait utile d'avoir un script d'installation en PHP. Quelqu'un a offert d'en écrire un pour moi. Ce billet récapitule ce que devrait faire un tel script, de mon point de vue (installation et configuration de WordPress en fonction d'un nom d'utilisateur).

As you may know, I’m shortly going to install 30+ WordPress blogs on my server. Noderat on #wordpress kindly offered to have a go at writing a PHP script to automate WP installs. I sent him this list of what the ideal script should be able to do for me, but on second thoughts, I’m posting it here so that everybody may see it. Of course, if you know of an existing script which already does this, let me know!

  • take $username + $password as input
  • install wordpress in a subdir named “$username”, using table prefix “$username_” and with an extra user (on top of admin) named “$username” (password=”$password”), user level 3
  • mysql user should be “$username” too (password “$password” also), with grants only on the tables belonging to this weblog
  • set permalink scheme to
    /archives/%year%/%monthnum%/%day%/%postname%/ for monthlies and /categories for categories
  • generate .htaccess in directory, based on this template, with “blog” replaced by “$username” everywhere
  • in wp-config: define (‘WPLANG’, ‘fr’) and edit appropriate lines
  • wp-includes/languages and wp-content/plugins should be symlinked to directories I can specify in the script
  • blog admin password should be reset to something I can specify in the script

I should be able to edit the script config file to provide mysql root user + pass, wordpress database name.

It would also be interesting if the script was built in such a way that it could be further modified/developed to allow installation of blogs on separate subdomains rather than subfolders. From the point of view of the filesystem the blogs live in, this wouldn’t change much — but some extra WordPress options would need editing (e.g. blog address), as well as the Apache and Bind config files necessary to set up the subdomain. This is not mandatory, but could be useful at some point (if we’re thinking in the line of WordPress-farming).

Batch Category Editing For WordPress [en]

I put together an admin screen for WordPress today which allows changing multiple categories of multiple posts at the same time. Code available, no guarantees.

[fr] J'ai codé une extension à  WordPress qui permet d'éditer les catégories de nombreux billets en un coup. L'écran liste par exemple tous les billets d'une catégorie, accompagnés d'un certain nombre de selects. On effectue les modifications que l'on désire et on soumet le formulaire entier en une fois.

Update 13.07: A more recent version is out!

I had planned to give you a write-up of the beginning of my WordPress experience today. Unfortunately, I decided to clean up my categories somewhat before I did that, and I managed to badly mess things up.

The result is that I spent most of my day writing a Batch Categories admin screen to help me clean things up. It was something I had planned to do, and I suppose it will also be useful to other people.

If you want to play around: copy the code above into a file named batch-categories.php in your wp-admin directory. I highly recommend that you back up your wp_post2cat table before you get going. This script works for me, but hasn’t been tested much, and comes with no guarantees. It is not optimised either, so depending on how many posts and categories you list, the screen can very well take over half a minute to load!

There are still a few functionalities I want to add, in particular: assigning all listed posts to a category in one go (or removing them).

If you want pretty integration with the other screens of the Edit menu, you’ll have to tweak the navigation bar in edit.php, edit-comments.php, and moderation.php.

Update 24.06.04: I’ve uploaded a screenshot of the admin screen so you can see what it could look like.

Update II 24.06.04: Instead of hacking the Edit menu bars, you can also access the Batch Categories screen from the Plugins page: create a file called batch-access.php (e.g.) in your plugins directory. (Beware not to leave any whitespace after the ?>, though, or you’ll get errors. Promised, zips and more detailed documentation will follow.

Update 04.07.04: I tried using the script this morning, and it seems nastily broken (removed all categories for some posts). Use with caution, and get back to me if ever you hack it or modify it, I’m interested! I’ll look into this once I get back home from Vienna.

Update 12.07.04: The script now works as it should! Thanks to Ben and MooKitty for helping me nail the big nasty bug which was driving me bonkers! Two improvements I’m working on right now: making the code more efficient by using the category cache, and adding a “add all listed posts to category X” option.