Finally out of MySQL encoding hell [en]

[fr] A description of how I got out of the encoding problems that caused hieroglyphics to be displayed on every site hosted on my server.

It took weeks, mainly because I was busy with a car accident and the end of school, but it also took about two real whole days of head-banging on the desk to get it fixed.

Here’s what happened: remember, a long time ago, I had trouble with stuff in my database which was [supposed to be UTF-8 but seemed to be ISO-8859-1](http://climbtothestars.org/archives/2004/07/18/converting-mysql-database-contents-to-utf-8/)? And then, sometime later, I had a [weird mixture of UTF-8 and ISO-8859-1 in the same database](http://climbtothestars.org/archives/2005/02/19/problemes-dencodage-mysql/)?

Well, somewhere along the line this is what I guess happened: my database installation must have been serving UTF-8 content as ISO-8859-1, leading me to believe it was ISO-8859-1 when it was in fact UTF-8. That led *me* to try to convert it to UTF-8 — meaning I took UTF-8 strings and ran them through a converter supposed to turn ISO-8859-1 into UTF-8. The result? Let’s call it “double-UTF-8” (doubly encoded UTF-8), for want of a better name.
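To make the “double-UTF-8” idea concrete, here is a minimal Python sketch of how a single accented character could end up doubly encoded. It only illustrates the mechanism I'm guessing at above; it is not the converter that was actually run on the database, and the variable names are made up for the example.

```python
# One accented character, as it should have been stored.
original = "é"

# Step 1: the text really is UTF-8 (two bytes for "é").
utf8_bytes = original.encode("utf-8")        # b'\xc3\xa9'

# Step 2: something wrongly believes those bytes are ISO-8859-1,
# so each byte is read as a separate character.
misread = utf8_bytes.decode("latin-1")       # 'Ã©'

# Step 3: the "ISO-8859-1 to UTF-8" conversion encodes that
# misread text again. This is the "double-UTF-8".
double = misread.encode("utf-8")             # b'\xc3\x83\xc2\xa9'

# Undoing it means running the same steps backwards.
recovered = double.decode("utf-8").encode("latin-1").decode("utf-8")
```

The two bytes of a correct “é” become four bytes, which is why every accented character turns into an A-tilde followed by something else.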

Anyway, that’s what I had in my database. When we upgraded MySQL and PHP on the server, I suddenly started seeing a load of junk instead of my accented characters:

[Screenshot: encoding-problem-2]

What I was seeing looked an awful lot like what UTF-8 looks like when your server setup is messed up and serves it as ISO-8859-1 instead. But, as you can see in the picture above, this page was being served as UTF-8 by the server. How did I know it wasn’t ISO-8859-1 in my database instead of this hypothetical “double-UTF-8”? Well, for one, I knew the page was served as UTF-8, and I also know that ISO-8859-1 (latin-1) served as UTF-8 makes accented characters look like question marks. Then, if I wanted to be sure, I could just change the page encoding in Firefox to ISO-8859-1 (that should make it look right if it was ISO-8859-1, shouldn’t it?). Well, it made it [look worse](http://flickr.com/photos/bunny/179791233/).
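The reasoning above can be checked with a few lines of Python. This is a rough sketch: `render` is a hypothetical helper that mimics a browser decoding stored bytes with a given page encoding, and "café" is just a sample word.

```python
def render(stored: bytes, served_as: str) -> str:
    """Decode stored bytes the way a browser would for a given page encoding.

    Undecodable bytes become U+FFFD, the replacement character that
    browsers typically draw as a question mark.
    """
    return stored.decode(served_as, errors="replace")


word = "café"

# Case 1: real ISO-8859-1 content served as UTF-8.
latin1_stored = word.encode("latin-1")
case_latin1_as_utf8 = render(latin1_stored, "utf-8")    # 'caf\ufffd'

# Case 2: "double-UTF-8" content served as UTF-8 (what I was seeing).
double_stored = word.encode("utf-8").decode("latin-1").encode("utf-8")
case_double_as_utf8 = render(double_stored, "utf-8")    # 'cafÃ©'

# Case 3: the same double-encoded content with the browser forced to
# ISO-8859-1: four characters of junk instead of two, so it looks worse.
case_double_as_latin1 = render(double_stored, "latin-1")
```

Question marks would have pointed to latin-1 in the database. A-tilde junk that gets longer when you switch the browser to ISO-8859-1 points to double encoding.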

Another indication was that when the MySQL connection encoding (in my.cnf) was set back to latin-1 (ISO-8859-1), the pages seemed to display correctly, but WordPress broke.

The first post on the picture I’m showing here looks “OK”, because it was posted after the setup was changed. It really is UTF-8.

Now how did we solve this? My initial idea was to take the “double-UTF-8” content of the database (and don’t forget it was mixed with the more recent UTF-8 content) and convert it “from UTF-8 to ISO-8859-1”. I had [a python script we had used to fix the last MySQL disaster](http://sungnyemun.org/code/fixEncodings.py.gz) which converted everything to UTF-8 — I figured I could reverse it. So I rounded up a bunch of smart people ([dda_](http://sungnyemun.org/), [sbp](http://inamidst.com/sbp/), [bonsaikitten](http://gentooexperimental.org/~patrick/weblog/) and [Blackb|rd](http://www.oydt.de/) — and countless others, sorry if I forgot you!) and got to work.

It proved a hairier problem than expected. What also proved hairy was explaining the problem to people who wanted to help and insisted on misunderstanding the situation. In the end, we produced a script (well, “they” rather than “we”) which looked like it should work, only… it did nothing. If you’re really interested in looking at it, [here it is](/code/fixDoubleEncodings.py), but be warned: don’t try it.

We tried recode. We tried iconv. We tried changing my.cnf settings, dumping the databases, changing them back, and importing the dumps. Finally, the problem was solved manually.

1. Made a text file listing the databases which needed to be cured (dblist.txt).
2. Dumped them all: `for db in $(cat dblist.txt); do mysqldump --opt -u user -ppassword ${db} > ${db}-20060712.sql; done`
3. Sent them over to Blackb|rd who did some search and replace magic in vim, starting with [this list of characters](http://climbtothestars.org/play/characters.txt) (just change the browser encoding to latin-1 to see what they look like when mangled).
4. Imported the corrected dumps back in: `for db in $(cat dblist.txt); do mysql -u user -ppassword ${db} < ${db}-20060712.sql; done`
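For the record, the search-and-replace in step 3 could in principle be automated. The sketch below is not Blackb|rd's vim script and was not what we ran. It swaps in a different technique (a latin-1 round trip instead of a hand-built character list), and the function name is invented for the example. It assumes the mangling went through strict ISO-8859-1; a MySQL "latin1" that is really Windows-1252 would need `cp1252` instead.

```python
import re

# Mangled characters can only hide in runs of non-ASCII characters.
NON_ASCII_RUN = re.compile(r"[^\x00-\x7f]+")


def undo_double_encoding(text: str) -> str:
    """Repair doubly encoded runs and leave genuine UTF-8 text alone.

    Each non-ASCII run is pushed back through latin-1 and re-read as
    UTF-8. If that round trip fails, the run was most likely correct
    already (the more recent posts), so it is kept unchanged.
    """
    def fix(match: "re.Match[str]") -> str:
        run = match.group(0)
        try:
            return run.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            return run

    return NON_ASCII_RUN.sub(fix, text)
```

This copes with the mix of old doubly encoded posts and newer correct ones, because a correct “é” on its own is not valid UTF-8 once turned back into a latin-1 byte. A correct character sitting directly next to a mangled one would defeat it, so a dump fixed this way would still need checking by eye.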

Blackb|rd produced [a shell script for vim](http://www.schwarzvogel.de/software-misc.shtml) (?) which I’ll link to as soon as I lay my hands on the URL again. The list of characters to convert was produced by trial and error, knowing that corrupted characters appeared in the text file as A-tilde (Ã) or A-circumflex (Â) followed by something else. I’d then change the my.cnf setting back to latin-1 to view the character strings in context and allow Blackb|rd to see what they needed to be replaced with.

Thanks. Not looking forward to the next MySQL encoding problem. They just seem to get worse and worse. (And yes, I *do* use UTF-8 all over the place.)


Peu de connection internet ces prochains jours [fr]

[en] My motherboard probably died, so I'll be ibook-less for one or two weeks, which means I'll have limited internet connection. Best way to reach me is by gmail or phone. If we have a meeting set up these next two weeks, please get in touch to confirm -- my calendar is on the sick computer.

My beloved laptop is in for repair (probably a fried motherboard), so I’ll have little internet access for a week or two. To reach me, use my gmail address (it’s the one I’ll check most often) or archaic means such as the telephone (mobile or not).

Also, if we’re supposed to meet soon, it might be a good idea to contact me to confirm, since my calendar is on the sick computer…

Apologies for the inconvenience!


DailyMotion Problems Solved: View Robert's Video Now [en]

[fr] The video of Robert Scoble that I shot at LIFT'06 is now fixed and can be viewed in the original post.

Finally! With help from Olivier of DailyMotion, I’ve solved the [DailyMotion problems](http://climbtothestars.org/archives/2006/02/09/dailymotion-problems/ "A description of what was bothering me.") which prevented the [wild videocast of Robert Scoble](http://climbtothestars.org/archives/2006/02/04/wild-videocast-of-robert-scoble-interview/ "Read the post, enjoy the video.") from playing correctly in the post I’d written.

I had copied the code from [another video I’d embedded](http://steph.wordpress.com/2005/11/27/youtube-or-dailymotion/ "View the post and video. A fun one shot with Michel Valdrighi.") in a post on my [Cheese Sandwich Blog](http://steph.wordpress.com "This is where I test wordpress.com and writing boring stuff."), and changed the video ID. The only problem is that DailyMotion code includes a key which is blog-specific, so people were getting an error message when they tried to play the video on the blog. I tried republishing the video here using their “blog this” feature, but that didn’t embed it properly. Finally, Olivier pointed me to the “manual” option (which I hadn’t seen, although it was what I was looking for!), which simply spits out code for you to copy-paste into your blog.

So, if you gave up earlier, or didn’t have a chance to see it, go and [watch Robert being podcasted by two Swiss guys at LIFT’06](http://climbtothestars.org/archives/2006/02/04/wild-videocast-of-robert-scoble-interview/ "Optional bunny entertainment included.").


DailyMotion Problems [en]

[fr] A problem with DailyMotion, fortunately solved. If you weren't able to see the video where I goof around behind Robert Scoble, now's the time to go and watch it!

You probably know I like [DailyMotion](http://dailymotion.com/Steph "See my videos there."). I posted some [feedback about DailyMotion](http://steph.wordpress.com/2006/02/05/dailymotion-feedback/) yesterday, and bumped into some naughty problems today.

The problem with DailyMotion is that it doesn’t have a nice forum or a real devblog like [coComment](http://cocomment.com) where we can leave feedback. So I’m posting it to my blog and tagging it in hope it will be found. By the way, I’ve been wondering [what the best place is](http://climbtothestars.org/archives/2005/12/18/split-identity-crisis/) for this kind of feedback: here or [on the Cheese Sandwich Blog](http://steph.wordpress.org/tag/geek)? What’s your take?

After LIFT’06, I put [this video of Robert being interviewed](http://www.dailymotion.com/Steph/video/39332) online and [wrote a post about it](http://climbtothestars.org/archives/2006/02/04/wild-videocast-of-robert-scoble-interview/) here. Unfortunately, it seems at least one of my readers is [not able to view it](http://climbtothestars.org/archives/2006/02/04/wild-videocast-of-robert-scoble-interview/#comment-54798). (I guess there are at least 20 of you out there who just didn’t tell me about it.) The message says something about a key not being valid for this blog.

DailyMotion allows you to [blog your videos directly from the site](http://www.dailymotion.com/doc/faq#section_15). That’s neat, but as I’m a control freak, I like dealing with the code myself. Back in November I had [posted a video to my other blog](http://steph.wordpress.com/2005/11/27/youtube-or-dailymotion/), so I grabbed the code from over there, adapted it (video id), and it seemed to work. Actually, that was because I was still logged in to my DailyMotion account.

I first tried adding CTTS to my DailyMotion account, as a second blog. That failed (error message, just doesn’t work). As I was writing this post, I tried logging out of DailyMotion, and actually saw the message all my poor readers have been seeing these last days! In a click of my trackpad I was able to fix everything.

So, if you haven’t seen me [goofing off behind Robert Scoble as David and Marc-O try to podcast him](http://climbtothestars.org/archives/2006/02/04/wild-videocast-of-robert-scoble-interview/) (red wine and Apple hardware involved), now’s the time to do it! Sorry for the buggy post, and thanks a lot to [Raphael](http://www.electronlibre.ch/electronnews) for pointing out the problem to me.


Lift: Thanks for the Videos, but… [en]

[fr] Trouble viewing the LIFT videos with OS X and Firefox. How about you?

I tried to get to the [LIFT videos](http://www.freestudios.tv/?cdroite=tablo_lift06) but I can’t play them. I have the latest versions of Tiger and Firefox. I spent a minute in a pop-up configuration window (that was nasty to start with), and then it just didn’t work. Can’t we have [DailyMotion](http://dailymotion.com)-style videos that “simply work”?

Audio works, though. Would be nice to be able to download it instead of stream.

As for the podcast feed, it asks me if I want to open NNWL. A little button to subscribe in iTunes would be really neat.


MagpieRSS Caching Problem [en]

I have a caching problem using the PHP MagpieRSS library to parse feeds. Any help welcome.

[fr] J'ai un problème de cache utilisant la librarie PHP MagpieRSS. Toute aide bienvenue!

I’ve been stuck on a problem with MagpieRSS for weeks. This is a desperate call for help.

At the top of my sidebar, I have two lists of links which are generated by parsing RSS feeds: Delicious Linkball and Recently Playing. They don’t update.

If I delete the cache files, the script creates them all right. If I keep an eye on the cache files, I see their timestamp is updated every hour, but not the contents. I’ve uploaded the PHP code which parses the feeds.

Any suggestions welcome. I’m not far from giving up and setting cron jobs to regularly delete the cache files. Thanks in advance.

Update 13:00: The Recently Playing list updates once an hour (when the cache is “force-refreshed”), it seems — but not the Delicious Links one.

14:00: Some progress: http://del.icio.us/rss/steph/ doesn’t seem to update unless I clear the cache on my machine. (Huh?) http://ws.audioscrobbler.com/rdf/history/Steph-Tara, on the other hand, does update. But why does the cache update only once an hour, and not each time the feed is modified?

15:00: crschmidt just pointed out that the last-modified date on my del.icio.us RSS feed was horribly wrong. Might be something that was done at the time when my caching problems were causing me to nastily abuse the poor del.icio.us server. I’ve sent a mail to Joshua to see if indeed this could be the problem.

15:50: Still thanks to the excellent crschmidt, I’ve finally understood how this caching is supposed to work. (Yes, I know, we’re starting to have lots of edits on this post.) There is a setting which determines how old the cache must be to become “stale”. As long as the cache is not stale, any requests made will use the cache directly, without pulling the feed in question. If the cache is stale, a request is sent to the server hosting the feed to check if it has changed since it was last accessed. If it has changed (i.e., if Last-Modified is more recent than the cache), it gets a fresh version of the feed. Otherwise, nothing happens (the cache age is just “reset”).
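Here is how I now picture that logic, as a small Python sketch. It is not MagpieRSS's actual code (which is PHP). `FeedCache`, `CacheEntry` and the `fetch` callback are names invented for the illustration, and `fetch` stands in for an HTTP request that sends the cached Last-Modified value and may get back a 304 “not modified” answer.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# fetch(url, if_modified_since) -> (status, body, last_modified)
# A status of 304 means "unchanged"; body and last_modified are then None.
FetchResult = Tuple[int, Optional[str], Optional[str]]


@dataclass
class CacheEntry:
    body: str
    last_modified: Optional[str]
    stored_at: float  # when the entry was last fetched or revalidated


class FeedCache:
    def __init__(
        self,
        fetch: Callable[[str, Optional[str]], FetchResult],
        max_age: float = 1800.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.fetch = fetch
        self.max_age = max_age
        self.clock = clock
        self.entries: Dict[str, CacheEntry] = {}

    def get(self, url: str) -> str:
        now = self.clock()
        entry = self.entries.get(url)

        # Cache not stale yet: use it directly, no request is made.
        if entry is not None and now - entry.stored_at < self.max_age:
            return entry.body

        # Stale or missing: ask the server whether the feed has changed.
        since = entry.last_modified if entry is not None else None
        status, body, last_modified = self.fetch(url, since)

        # Unchanged: keep the old contents and just reset the cache age.
        if status == 304 and entry is not None:
            entry.stored_at = now
            return entry.body

        # Changed (or first request): store the fresh copy.
        fresh = CacheEntry(body or "", last_modified, now)
        self.entries[url] = fresh
        return fresh.body
```

Seen this way, a server that keeps answering “not modified” because of a wrong Last-Modified date would produce what I was observing: the cache timestamp moves forward every time the cache goes stale, but the contents never change.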

Now, for a LinkLog service like del.icio.us, setting the cache age to a couple of hours is more than enough as far as I’m concerned. However, for a list of recently played songs, every few minutes should be better. MagpieRSS seems to allow this to be set on a per-call basis by defining MAGPIE_CACHE_AGE, but it doesn’t seem to be working for me. Another variable is set on a per-installation basis: var $MAX_AGE = 1800; — but changing that won’t really help, as I want different values for Recently Playing and Delicious Links. Suggestions on this secondary problem welcome too!

16:40: After exchanging a few e-mails with Joshua, it seems that there was indeed a problem with the Last-Modified date on my feed. Not quite sure how it came about (somebody requesting the feed when I hadn’t posted in some time?), but it should be fixed now. I’ve cleared my cache files to see if my 30-minute “stale time” is working or not.

17:30: (See how I’m updating every 50 minutes? Freaky.) So, the not-so-nice thing about PHP constants is that they are constant: once one has been defined, it can’t be redefined later in the same script, so the first definition wins. (Not sure I got that bit right, but.) The important thing here is that MAGPIE_CACHE_AGE can’t be used to set different “stale cache” ages for different feeds. The stale cache age needs to be set at the bottom of rss_fetch.inc (the only place I hadn’t touched), so my cache is now refreshing every half-hour. (Which is a bit too often for del.icio.us, and not often enough for Audioscrobbler.) oqp says he can write a wrapper to get around this limitation, and I’m waiting impatiently for him to do it!
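What I'm after is something like the following, sketched in Python rather than PHP. This is not oqp's wrapper (which doesn't exist yet) and not MagpieRSS code. The function names and the two age values are assumptions picked to match “a couple of hours” and “every few minutes”.

```python
# Fallback stale age in seconds, matching the 30-minute setting above.
DEFAULT_MAX_AGE = 30 * 60

# Hypothetical per-feed overrides, in seconds.
MAX_AGE_BY_FEED = {
    "http://del.icio.us/rss/steph/": 2 * 60 * 60,
    "http://ws.audioscrobbler.com/rdf/history/Steph-Tara": 5 * 60,
}


def max_age_for(url: str) -> int:
    """Return the stale age for one feed, or the default if none is set."""
    return MAX_AGE_BY_FEED.get(url, DEFAULT_MAX_AGE)


def is_stale(url: str, cached_at: float, now: float) -> bool:
    """Decide per feed, not per installation, whether the cache is stale."""
    return now - cached_at >= max_age_for(url)
```

Because the age is looked up at the moment of each request instead of being fixed once in a constant, each feed can go stale on its own schedule.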


U-Blog, Six Apart, and Their Angry Bloggers [en]

This very long post is, for the first time in English, a pretty complete account of what has been going on with U-blog and Loïc Le Meur in the French blogosphere for some time now. With the acquisition of Ublog by Six Apart, these problems are bound to take another dimension for the English-speaking blogosphere.

[fr] This very long post tells, in English, the story of U-blog and the problems surrounding it. I have already written about this in French (read the comments too). For once the "language barrier" is keeping English speakers from knowing certain things, rather than the other way round!

So, why on earth are U-bloggers so angry?

I’m often concerned that the language divide makes non-English-speaking people miss out on a whole lot of interesting stuff. These past few days, I’ve been concerned that the language divide may be preventing English-speaking people from knowing about certain things. U-bloggers are angry, and they also have the sympathy of others in the franco-blogosphere, but all that is happening in French.

How aware is Six Apart that they have a bunch of angry French customers, who were encouraged to sign up for a paying version before the end of last year under promise of new features, which weren’t developed and seemingly never will be? Edit 06.01.05: see note.

Let’s rewind a bit, shall we? I always think that history explains a lot. Many of the dates here are taken from Laurent’s short history of the franco-blogosphere, a work in progress. Other information comes from my regular trips around the blogosphere and my conversations with people (in particular, here, with Stéphane, the creator of the U-blog weblogging platform). This is the story to the best of my knowledge. If there are any factual mistakes, I’ll be glad to correct them.

In November 2002, Stéphane Le Solliec starts working on a blogging platform he calls Meta-blog. A few months later, in December, U-blog (the new name for the platform) already has a few hundred users.

The interface is good, U-blog is pretty zippy, and it has a great community. Also, it’s French. Setting aside any primal xenophobia or anti-americanism, a great product designed in your language by a fellow countryman is not the same thing as another great product translated and adapted from English. (Ask somebody who lives in a country where most of the important stuff is “imported” from the German-speaking part…) And let’s face it, one does like to support a local product, whether one is French, Swiss, or American. I actually considered U-blog the best hosted solution for French-speakers, at some point, and recommended it to a few friends, who started weblogs. Joueb.com is a native French weblogging platform which has been around for far longer than U-blog, but for some reason it isn’t quite as popular.

About a year later, Stéphane is thinking about abandoning the platform. He’s doing it in his free time, he has a baby, and U-blog takes up a lot of time. He stalls development, and stops allowing the creation of new free blogs. (It will again be possible to create free blogs a few weeks later.) Existing free blogs remain in place, but lose visibility (pinging and home page) compared to paying blogs. (Paying U-blog customers pay 1€ per month.)

Around that time, Loïc, whose interest in weblogs has been sparked by meeting Joi at the World Economic Forum, and who has unsuccessfully approached the founder of Joueb.com, Stéphane Gigandet (yes! another Stéphane!), gets in touch with Stéphane Le Solliec in September (2003). As a result, he acquires the platform and user-base, and founds the company Ublog.com. Loïc really wants Stéphane to stay on board, and he does, before leaving a couple of months later (company-life isn’t really his cup of tea).

Loïc does a great job getting the French press (and later, politicians) interested in weblogs. He calls up journalists, educates them, and before long “Loïc, fondateur de Ublog” regularly appears in articles about weblogging. Inevitably, he starts appearing as “the guy who introduced weblogs in France”, and the expression “founder of Ublog” feeds a confusion between the blogging platform and the company (“founder” being at times replaced by “creator”). Loïc founded the company, but he in no way created the blogging platform U-blog.

You can imagine that the U-bloggers, who already weren’t very excited about having been “bought” (particularly by a guy who had the bad taste to start blogging in English), didn’t really like seeing Loïc shine so bright and Stéphane slowly fade into oblivion. Some long-standing French-speaking webloggers external to U-blog will start keeping a suspicious eye on this newcomer that so many are talking about, and who seems to be (God forbid!) making weblogs into a business (complete with press pack).

At the end of October, when Stéphane announces the changes at Ublog following the association with Loïc, the following structure is presented (as an aside, the fact that this page seems to have been taken down doesn’t make Ublog look good. If it’s a mistake, they should put it back up again):

  • Free U-blog: the basic offer, with an advertising banner.
  • U-blog Plus: the paying offer, with a few more bells and whistles than the free one (ping, home page listing) and lots of exciting new features (for 4€ per month instead of the current 1€).
  • U-blog Pro: more advanced, with own domain name, multi-author, etc… to be defined.

In a smart move, existing U-bloggers were given the chance to sign up for the second offer for 1€ instead of 4€ for the coming year, starting January 1st (date at which the new tariff would become active). It sounded attractive, and quite a few went for it. The future seemed bright, with promise of dynamic future development, despite the complaints about the increase in pricing (but which did not impact existing users that much).

During the next months, some new features are introduced. More are announced.

In March, Six Apart and Ublog SA sign an exclusive representation agreement in Europe. An announcement is made in the U-blog newsletter. April 29th, TypePad arrives on U-blog. The official Ublog weblog will publish another four or five brief posts related to TypePad before going quiet.

One can wonder: what sense does it make for a blogging platform like U-blog to sign an agreement with another, similar, hosted blogging platform like TypePad? Was the U-blog platform not good enough? Will development be stalled on the “old” platform, will it be abandoned? Overall, U-bloggers are worried and unhappy (I could add more, but those are two good starting-points and seem to sum it up pretty well). They are now offered three possibilities (as often, what is said in the comments is much more interesting than the post itself):

  • Free U-blog: the basic offer, same as before.
  • U-blog Plus: the paying offer for those who already have it, same as before, but no new features.
  • TypePad: a more advanced platform, where the active development will take place. Approx. 15€, but discount prices for current U-bloggers.

In short, all new development efforts seem to be going towards TypePad, and U-blog Plus will stop evolving, contrary to what had been promised at the end of October. Reactions are aggressive (we all know that end-users are not kind when they complain). When U-bloggers ask about the new features that had been promised to those of them with paying accounts, they are told that the features are on TypePad. Loïc, who has already ruffled a few feathers by demanding that a popular blogger remove a post about him, under threat of lawsuit, does not distinguish himself in the area of good customer relations. (In particular, his comment regarding the contents of Aurora’s weblog (bondage and S&M), in the middle of a thread about U-blog and TypePad, didn’t look very good.) U-bloggers (particularly the paying ones) feel a bit cheated.

There is no question for me that Loïc is being given a harder time than he deserves, but it is pretty clear that he is not doing a very good job communicating with his unhappy customers.

TypePad.fr does not seem to be a howling success. I have heard complaints from people who find it slow (slower than U-blog, in particular) and not intuitive. Jean-Luc Raymond, the blogger who runs MediaTIC, publishes a critical post about TypePad.fr. Now, JLR isn’t the blogger I respect the most. He doesn’t always verify his sources, and has been known to remove embarrassing comments and posts with little ceremony. However, if his article on TypePad is over the top (as I suspect it might be), it would in my opinion deserve more precise refutation than this dismissive comment of Loïc’s.

So, what is going on today? Basically, a continuation of what was already going wrong. Now that Six Apart has bought Ublog, the U-blog platform and community definitely seem doomed.

No official announcement of the transaction has been made on the U-blog site (as I mentioned, the official “corporate” weblog is dead). Loïc’s answer to my post raising the point is that U-bloggers who want information can contact him on his blog. Worse, in my opinion, Loïc withheld the announcement on his blog until it was published by the media. So in the franco-blogosphere, we learnt about it through the press rather than through Loïc’s weblog (the de facto official source of information for U-blog, as the company site has not been communicating anything these last months).

Aurora goes to war, and other U-bloggers are following suit. One can disapprove of their virulence, but calling them “Aurora’s fan-club” (in the comments to my post) does not get anybody anywhere, and mocking Aurora’s sexual preferences in response to her criticisms is distasteful, and unbecoming of the Director for Europe, Africa and the Middle-East and Executive VP of Six Apart.

Loïc may have a squeaky-clean image in the anglo-blogosphere, but it is far from being the case in the franco-blogosphere, particularly when you start digging around in comment threads. I find it especially disturbing that there seems to be a discrepancy in attitude between Loïc’s discourse on his weblog and his comments on other people’s weblogs.

I personally do not think Loïc is a bad person, or has bad intentions. He’s interested in “the business side of weblogs” (and in that we differ), and that of course will make him unsympathetic to some, but I do believe he is genuinely interested in what he’s doing. However, I think he does not understand his customers very well, and does not communicate with them well either. His ambition as a businessman, excited by the challenge of managing an American company, leader in its domain, does at times seem to overshadow his concern about his end-users’ well-being.

This has been a long post. If you’ve read it, thank you. If you’ve just skimmed it, let me briefly come back to my main points:

  • U-bloggers have been promised features for their pay-version, which will not come.
  • The acquisition of Ublog by Six Apart seems to point to a near death of the old blogging platform, and more dramatically for its users, of the very strong community built around it. (TypePad doesn’t really have this “community” thing to it.)
  • Ublog (and now, Six Apart Europe) is demonstrating pretty poor communication with its unhappy users.

Update, 24.07.04: a brief update after some comments I’ve received about this article.

  • I have now learnt that Six Apart did know about the problems at Ublog (since before the acquisition).
  • Although I considered it a possibility that they might not know, my main motivation for writing this article was that there was more to the Ublog story than what the English blogosphere in general was getting.
  • Of course, not all U-bloggers are unhappy. We’re talking about a bunch of very vocal and very angry people, not about the whole community. But in my opinion, the fact they are a minority does not mean they should not be taken seriously.
