SMS: circonflexe => 70 caractères! [fr]

[en] Did you know that using common French caracters such as ç, ê, ô (and others) made your phone switch encodings, thus reducing the maximum length of your text messages to 70 characters, or less?

Saviez-vous que le jeu de caractères à disposition pour envoyer des SMS était limité? Oui, les SMS font 160 caractères… mais si on utilise un caractère en-dehors du jeu de base, le téléphone change l’encodage (UTF-16) et vous vous retrouvez avec… 70 caractères, si c’est pas 35!

A moins d’avoir un forfait SMS illimité, cette multiplication des SMS peut se répercuter sur votre facture.

Et en français, certains caractères courants comme ç, ê, ô ne se trouvent pas dans le jeu de caractères de base. Je cite Claude, qui a attiré mon attention sur ce problème:

By default the 7bit encoding used is GSM 03.38, which has the following 128 characters alphabet: @, £, $, ¥, è, é, ù, ì, ò, Ç, LF, Ø, ø, CR, Å, å, Δ, _, Φ, Γ, Λ, Ω, Π, Ψ, Σ, Θ, Ξ, ESC, Æ, æ, ß, É, SP, !, “, #, ¤, %, &, ‘, (, ), *, +, ,, -, ., /, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, :, ;, <, =, >, ?, ¡, A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, Ä, Ö, Ñ, Ü, §, ¿, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z, ä, ö, ñ, ü, à

Sympa, non?

Du coup, suivez les instructions pour afficher le nombre de caractères restants sur votre téléphone histoire d’éviter les mauvaises surprise, surtout à l’étranger! (Pour l’iPhone, voyez en bas de l’article de Claude, c’est dans Réglages > Messages.)

C’est pas ça qui va encourager nos ados à écrire correctement! 😉

Invalid argument supplied for foreach() in wp-capabilities.php: Case Cracked! [en]

[fr] Le problème avec wp-capabilities.php qui fait qu'on peut se retrouver "exfermé" (enfermé dehors) de son blog WordPress (typiquement en cas de changement de serveur) semble avoir sa source dans le contenu du champ wp_user_roles dans la table wp_options. En particulier, pour la version française, "Abonné" est un rôle d'utilisateur, et en cas de problèmes d'encodage MySQL, le caractère accentué sera corrompu, causant ainsi l'erreur.

Il suffit de remplacer le caractère fautif dans PhpMyAdmin, et on retrouve l'accès à son blog. Bon, reste ensuite à régler les questions d'encodage... mais c'est déjà ça!

Finally. At last. Endlich. Enfin.

Once more, while trying to transfer a WordPress installation from one server to another, I found myself facing the dreaded problem which locks me out of my WordPress install with a rather cryptic message:

Warning: Invalid argument supplied for foreach() in /home/user/wp/wp-includes/capabilities.php on line 31

(Your lineage may vary.)

What happens is that WordPress cannot read user roles, and therefore, even though your password is accepted, you get a message telling you that you’re not welcome in the wp-admin section:

Vous n’avez pas les droits suffisants pour accéder à cette page.

Or, in English:

You do not have sufficient permissions to access this page.

A quick search on the WordPress forums told me that I was not alone in my fight with wp-capabilities.php, but that many problems had not been resolved, and more importantly, that suggested solutions often did not work for everyone.

I’ve bumped into this problem a couple of times before, and I knew that it was linked to encoding problems in the database. (I’ve had my share of encoding problems: once, twice, thrice — “once” being on of the most-visited posts on this blog, by the way, proof if needed that I’m not alone with mysql encoding issues either.)

I’ll leave the detailed resolution of how to avoid/cure the MySQL problems later (adding
mysql_query("SET NAMES 'utf8'");
to wp-db.php as detailed in this thread, and as zedrdave had already previously told me to do — should have listened! — should prevent them). So anyway, adding that line to my working WordPress install showed me that the problem was not so much in the database dumping process than in the way WordPress itself interacted with the database, because the dreaded wp-capabilities.php problem suddenly appeared on the original blog.

Now, this is where I got lucky. Browsing quickly through the first dozen or so of forum threads about wp-capability.php problems, this response caught my eye. It indicated that the source of the problem was the content of the wp_user_roles field (your prefix may vary). In this case, it had been split on more than one line.

I headed for the database, looked at the field, and didn’t see anything abnormal about it at first. All on one line, no weird characters… just before giving up, I moved the horizontal scrollbar to the end of the line, and there — Eurêka! I saw it.

Abonné

“Contributor”, in French, is “abonné”, with an accent. Accent which got horribly mangled by the MySQL problems which I’ll strive to resolve shorty. Mangled character which caused the foreach() loop to break in wp-capabilities.php, which caused the capabilities to not be loaded, which caused me to be locked out of my blog.

So, in summary: if you’re locked out of your blog and get a warning/error about wp-capabilities and some invalid foreach() loop thingy, head for PhpMyAdmin, and look carefully through the wp_user_roles field in the wp_options table. If it’s split over two or more lines, or contains funky characters, you have probably found the source of your problem.

Good luck!

Finally out of MySQL encoding hell [en]

[fr] Description de comment je me suis sortie des problèmes d'encodage qui résultaient en l'affichage de hiéroglyphes sur tous les sites hébergés sur mon serveur.

It took weeks, mainly because I was busy with a car accident and the end of school, but it also took about two real whole days of head-banging on the desk to get it fixed.

Here’s what happened: remember, a long time ago, I had trouble with stuff in my database which was supposed to be UTF-8 but seemed to be ISO-8859-1? And then, sometime later, I had a weird mixture of UTF-8 and ISO-8859-1 in the same database?

Well, somewhere along the line this is what I guess happened: my database installation must have been serving UTF-8 content as ISO-8859-1, leading me to believe it was ISO-8859-1 when it was in fact UTF-8. That led me to try to convert it to UTF-8 — meaning I took UTF-8 strings and ran them through a converter supposed to turn ISO-8859-1 into UTF-8. The result? Let’s call it “double-UTF-8” (doubly encoded UTF-8), for want of a better name.

Anyway, that’s what I had in my database. When we upgraded MySQL and PHP on the server, I suddenly started seeing a load of junk instead of my accented characters:

encoding-problem-2

What I was seeing looked furiously like UTF-8 looks when your server setup is messed up and serves it as ISO-8859-1 instead. But, as you can see on the picture above, this page was being served as UTF-8 by the server. How did I know it wasn’t ISO-8859-1 in my database instead of this hypothetical “double-UTF-8”? Well, for one, I knew the page was served as UTF-8, and I also know that ISO-8859-1 (latin-1) served as UTF-8 makes accented characters look like question marks. Then, if I wanted to be sure, I could just change the page encoding in Firefox to ISO-8859-1 (that should make it look right if it was ISO-8859-1, shouldn’t it?) Well, it made it look worse.

Another indication was that when the MySQL connection encoding (in my.cnf) was set back to latin-1 (ISO-8859-1), the pages seemed to display correctly, but WordPress broke.

The first post on the picture I’m showing here looks “OK”, because it was posted after the setup was changed. It really is UTF-8.

Now how did we solve this? My initial idea was to take the “double-UTF-8” content of the database (and don’t forget it was mixed with the more recent UTF-8 content) and convert it “from UTF-8 to ISO-8859-1”. I had a python script we had used to fix the last MySQL disaster which converted everything to UTF-8 — I figured I could reverse it. So I rounded up a bunch of smart people (dda_, sbp, bonsaikitten and Blackb|rd — and countless others, sorry if I forgot you!) and got to work.

It proved a hairier problem than expected. What also proved hairy was explaining the problem to people who wanted to help and insisted in misunderstanding the situation. In the end, we produced a script (well, “they” rather than “we”) which looked like it should work, only… it did nothing. If you’re really interested in looking at it, here it is — but be warned, don’t try it.

We tried recode. We tried iconv. We tried changing my.cnf settings, dumping the databases, changing them back, and importing the dumps. Finally, the problem was solved manually.

  1. Made a text file listing the databases which needed to be cured (dblist.txt).
  2. Dumped them all: for db in $(cat dblist.txt); do mysqldump --opt -u user -ppassword ${db} > ${db}-20060712.sql; done
  3. Sent them over to Blackb|rd who did some search and replace magic in vim, starting with this list of characters (just change the browser encoding to latin-1 to see what they look like when mangled)
  4. Imported the corrected dumps back in: for db in $(cat dblist.txt); do mysql -u user -ppassword ${db} < ${db}-20060712.sql; done

Blackb|rd produced a shell script for vim (?) which I’ll link to as soon as I lay my hands on the URL again. The list of characters to convert was produced by trial and error, knowing that corrupted characters appeared in the text file as A tilde or A circonflexe followed by something else. I’d then change the my.cnf setting back to latin-1 to view the character strings in context and allow Blackbr|d to see what they needed to be replaced with.

Thanks. Not looking forward to the next MySQL encoding problem. They just seem to get worse and worse. (And yes, I do use UTF-8 all over the place.)

MediaWiki [en]

I’ve installed MediaWiki. Explanation and solution of a bug I bumped into while installing (because of UTF-8 in MySQL 4.1.x) and comments on the method for interface translation.

[fr] J'ai installé MediaWiki pour récussiter le moribond SpiroLattic, tombé sous les coups du wiki-spam. Voici la solution à  un problème que j'ai rencontré durant l'installation (dû au fait que j'utilise MySQL 4.1.x avec UTF-8), et aussi une description de la façon dont est faite la localisation par utilisateur de l'interface. Très intéressant!

I recently managed to install MediaWiki to replace PhpWiki for SpiroLattic, which I took offline some time ago because the only activity it had become home to was the promotion of various ringtone, viagra, and poker sites.

MediaWiki is the wiki engine behind Wikipedia. It is PHP/MySQL (good for me, maybe not for the server) and has a strong multilingual community.

I bumped into one small problem installing MediaWiki 1.4: the install aborted while creating the tables. Unfortunately, I don’t have the error message anymore, but it was very close to the one given for this bug.

If I understood correctly, when you’re running MySQL 4.1.x in UTF-8, the index key becomes too big, and MySQL balks. The solution is to edit maintenance/tables.sql and to change the length of the index key MySQL was complaining about. In my case, the guilty part of the query was KEY cl_sortkey(cl_to,cl_sortkey(128)) — I replaced 128 by 50 and it went fine. (Don’t forget to clean out the partially built database before reloading the install page — like that you don’t have to fill it all in again.)

MediaWiki allows each user to choose his or her language of choice for the interface. That is absolutely great, particularly for a multilingual wiki! Even better than that, they let users tweak the interface translation strings directly on the wiki.

There is a page named “Special:Allmessages” which lists all the localized strings. If you’re not happy with one of the translations, just click on the string, and the wiki will create a new blank page where you can enter your translation for it, which will override the initial translation. How cool is that?

Something like that for WordPress would be great, in my opinion!

Problèmes d'encodage MySQL [fr]

Un joli mélange de latin-1 et d’utf-8 dans ma base de données. Un script python pour nettoyer tout ça.

[en] I've been to MySQL encoding hell and back. The little question marks you may have seen in place of accented characters a few weeks back were caused by a lovely mix-up of utf-8 and latin-1 inside my databases. Dda_ from #joiito kindly helped me by writing a python script to identify fields with non-utf-8 characters in them, and convert them back.

Vous avez peut-être remarqué, il y a une semaine ou deux, que les accents de ce site avaient été remplacés subrepticement par de vilains points d’interrogation. Une fois de plus, je me trouvais dans la situation où je croyais avoir de l’utf-8 dans mes bases de données, pour réaliser ensuite qu’il s’agissait en fait de latin-1. Et cette fois, c’était encore bien pire qu’avant: j’avais un mélange d’utf-8 et de latin-1.

Dda_ a eu la grande gentillesse de passer plusieurs heures à  me pondre un script en python qui fait le tour de tous les champs de toutes les tables de toutes les bases de données, et les convertit en utf-8 s’il y détecte des caractères non-utf-8 (ce qui signifierait, dans mon cas, qu’on se trouve en présence de latin-1). Une fois que c’est fait, le script va changer l’encodage des tables pour que tout nouveau contenu y soit stocké en utf-8.

Bref, voici l’explication et le script.

Converting MySQL Database Contents to UTF-8 [en]

I finally managed to convert my WordPress database content to UTF-8. It’s easy to do, but it wasn’t easy to figure out.

[fr] Voici comment j'ai converti le contenu de ma base de données WordPress en UTF-8. C'est assez simple en soi, mais ça m'a pris longtemps pour comprendre comment le faire!

A few weeks ago, I discovered (to my horror) that my site was not UTF-8, as I thought it was. Checking my pages in the validator produced this error:

The character encoding specified in the HTTP header (iso-8859-1) is different from the value in the XML declaration (utf-8). I will use the value from the HTTP header (iso-8859-1).

In all likeliness, my server adds a default header (iso-8859-1) to the pages it serves. When I switched to WordPress, I was careful to save all my import files as UTF-8, and I honestly thought that everything I had imported into the database was UTF-8. Somewhere in the process, it got switched back to iso-8859-1 (latin-1).

The solution to make sure the pages served are indeed UTF-8, as specified in the meta tags of my HTML pages, is to add the following line to .htaccess:

AddDefaultCharset OFF

(If one wanted to force UTF-8, AddDefaultCharset UTF-8 would do it, but actually, it’s better to leave the possibility to serve pages with different encodings, isn’t it?)

Now, when I did that, of course, all the accented characters in my site went beserk — proof if it was needed that my database content was not UTF-8. Here is the story of what I went through (and it took many days to find the solution, believe me, although it takes only 2 minutes to do once everything is ready) to convert my database content from ISO-8859-1 to UTF-8. Thanks a lot to all those who helped me through this — and they are many!

First thing, dump the database. I always forget the command for dumps, so here it is:

mysqldump --opt -u root -p wordpress > wordpress.sql

As we’re going to be doing stuff, it might be wise to make a copy of the working wordpress database. I did that by creating a new database in PhpMyAdmin, and importing my freshly dumped database into it:

mysql -u root -p wordpress-backup < wordpress.sql

Then, conversion. I tried a PHP script, I tried BBEdit, and they seemed to mess up. (Though as I had other issues elsewhere, they may well have worked but I mistakenly thought the problem was coming from there.) Anyway, command-line conversion with iconv is much easier to do:

iconv -f iso-8859-15 -t utf8 wordpress.sql > wordpress-iconv.sql

Then, import into the database. I first imported it into another database, edited wp-config.php to point to the new database, and checked that everything was ok:

mysql -u root -p wordpress-utf8 < wordpress-iconv.sql

Once I was happy that it was working, I imported my converted dump into the WordPress production database:

mysql -u root -p wordpress < wordpress-iconv.sql

On the way there, I had some trouble with MySQL. The MySQL dump more or less put the content of all my weblog posts on one line. For some reason, it didn’t cause any problems when importing the dump before conversion, to create the backup database, but it didn’t play nice after conversion.

I got this error when trying to import:

ERROR 1153 at line 378: Got a packet bigger than 'max_allowed_packet'

Line 378 contained half my weblog posts… and was obviously bigger than the 1Mb limit for max_allowed_packet (the whole dump is around 2Mb).

I had to edit my.cnf (/etc/mysql/my.cnf on my system) and change the value for max_allowed_packet in the section titled [mysqld]. I set it to 8Mb. Then, I had to stop mysql and restart it: mysqladmin -u root -p shutdown to stop it, and mysqld_safe & to start it again (as root).

This is not necessarily the best way to do it, and it might not work like that on your system, but it’s what I did and the site is now back up again. Comments welcome, and hope this can be useful to others!